Summary and Critique by Claude Bingham
"A Decision Tree Method for Building Energy Demand Modeling"
Energy consumption has been identified as a major factor in a building's long-term impact, and building energy use has risen steadily over time. The researchers in this project therefore set out to construct an accurate predictive model capable of estimating the future energy use of buildings.
They chose decision tree methodology to create the predictive models. Regression methods were judged too complex for users with limited mathematical training, and the researchers regarded neural networks as 'black boxes,' difficult to interpret and reproduce for some of the same reasons as regression. Building simulations cannot accurately predict occupant behavior and can therefore only estimate what a building's energy consumption would be under static, assumed conditions. Decision trees, by contrast, are relatively simple, can handle both numerical and categorical data, and do not require much computation.
Decision trees use a flowchart-like structure to show hierarchy, status, and category of data. In this study, for example, a decision tree depicted the outside temperature, if a room was occupied, and whether the air conditioning was on because of those previous two factors. Based on the number of recorded occurrences of each possible variable state, energy use can be approximated for an individual room.
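The flowchart example above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual model: the temperature threshold, observation data, and per-state energy figures are all invented for demonstration.

```python
# Minimal sketch of the flowchart example: outdoor temperature and room
# occupancy decide whether the air conditioning runs, and energy use is
# approximated from counts of each outcome. All thresholds and per-state
# energy figures are illustrative assumptions, not values from the study.
from collections import Counter

def ac_state(outdoor_temp_c, occupied):
    """Walk the tree: temperature first, then occupancy."""
    if outdoor_temp_c > 26 and occupied:
        return "AC_ON"
    return "AC_OFF"

# Hypothetical hourly observations for one room: (temperature, occupied).
observations = [(29, True), (31, True), (24, True), (30, False), (27, True)]

leaf_counts = Counter(ac_state(t, occ) for t, occ in observations)

# Approximate energy use by weighting each state's occurrence count by an
# assumed per-hour consumption (kWh).
ENERGY_PER_HOUR = {"AC_ON": 1.5, "AC_OFF": 0.2}
estimate = sum(ENERGY_PER_HOUR[s] * n for s, n in leaf_counts.items())
print(leaf_counts)           # Counter({'AC_ON': 3, 'AC_OFF': 2})
print(f"{estimate:.1f} kWh")  # 4.9 kWh
```

Counting how often each leaf of the tree is reached, then weighting by a typical consumption for that state, mirrors how the paper approximates a room's energy use from recorded variable states.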
To verify that such a model can produce reliably accurate predictions, the research team used the C4.5 decision tree algorithm with the open-source WEKA data-mining software. This pairing was chosen for its flexibility and its ability to handle multiple types of data. The constructed model was then tested by comparing its predictions against observed values. In this study, the model included six categorical variables and four numerical variables based on data collected from 80 buildings in six districts in Japan. The predicted value was either 'HIGH' or 'LOW' energy use intensity.
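At each node, C4.5 chooses the attribute to split on by maximizing the gain ratio: information gain normalized by the entropy of the split itself. The sketch below shows that criterion in plain Python; the toy building records and attribute names are invented for illustration, not taken from the study's 80-building data set, and this is the core splitting rule only, not a full C4.5 implementation.

```python
# Sketch of C4.5's split criterion: pick the attribute with the highest
# gain ratio. Records and attribute names are hypothetical examples.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(records, attr, label="eui"):
    """Gain ratio of splitting `records` on categorical attribute `attr`."""
    base = entropy([r[label] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r[label])
    total = len(records)
    # Weighted entropy of the label after the split.
    cond = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information: entropy of the attribute itself (the normalizer
    # that distinguishes C4.5's gain ratio from plain information gain).
    split_info = entropy([r[attr] for r in records])
    if split_info == 0:
        return 0.0
    return (base - cond) / split_info

# Hypothetical building records: two categorical predictors, HIGH/LOW EUI.
records = [
    {"use": "office", "district": "A", "eui": "HIGH"},
    {"use": "office", "district": "B", "eui": "HIGH"},
    {"use": "retail", "district": "A", "eui": "LOW"},
    {"use": "retail", "district": "B", "eui": "LOW"},
]

print(gain_ratio(records, "use"))       # 1.0 (perfectly separates HIGH/LOW)
print(gain_ratio(records, "district"))  # 0.0 (uninformative split)
```

The same criterion applies recursively: the winning attribute becomes a node, the records are partitioned by its values, and the process repeats on each partition until the leaves are pure enough to label 'HIGH' or 'LOW'.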
The test model correctly predicted 92% of expected cases. The researchers noted, however, that the confidence level was 80%, too low to be consistently reliable, and that the model misattributed variables at times. This was likely due to the size of the data set and its limited variable hierarchy (also tied to the data set's size and variety).
This research benefited greatly from examining the reasons for and against various methodologies for predictive studies. The experiment was well explained and well executed, with one exception: the sample size for the test data was too small. While the results were reasonably accurate and passably reliable, they suggest that decision trees do not scale down well to smaller data samples. The methodology appears to work well with large data sets, but not with smaller ones.