A Short-Term Forecast Approach of Public Buildings’ Power Demands upon Multi-source Data

Owing to the significant increase in global electricity demand and the growing urban population, electric consumption in cities has attracted increasing attention. Because public buildings account for a large proportion of that consumption, accurately predicting their electric consumption is crucial to rational electricity allocation and supply. This paper studies the possibility of using urban multi-source data, such as POI (point of interest) and pedestrian volume data, to predict buildings' electric consumption. From the multiple datasets, the key influencing factors are extracted and used to forecast the buildings' electric power demands with a probabilistic graphical algorithm named EMG. Our methodology is applied to reveal the relationships between the factors and to forecast the daily electric power demands of nine public buildings, including hotels, shopping malls, and office buildings, in the city of Hangzhou, China, over a period of one month. Computational experiments are conducted, and the results favor our approach.


Introduction
With population growth and economic development, global electric consumption is increasing yearly. Over the past decades, the urban share of the world's population has been rising and per capita electric consumption has gradually increased, so the electric power demand of cities is changing drastically. According to statistics released by the World Bank in 2016 [1], global per capita electric consumption rose from 2027.4 kWh (kilowatt-hours) in 2006 to 3104.7 kWh in 2013, and the percentage of the world's urban population increased by 3.9% from 2006 to 2014. As a result, urban electric consumption is expected to rise sharply worldwide. Meanwhile, urban public buildings have long been one of the major groups of electricity consumers around the world [2,3]. Demand-side building energy is a focus of current research because an accurate prediction of demand is important for every country in working out a reasonable energy production plan and reducing carbon emissions [4,5,6]. It also plays a vital role in rationally allocating a city's electricity, avoiding peak-hour power shortages, saving public funds, reducing economic risks, and reducing the environmental pollution caused by excessive electricity generation [7,8].
In the past, researchers mainly paid attention to four kinds of factors: meteorological factors, building attributes, time series, and occupancy rates. Meteorological, building attribute, and time series data can be obtained fairly easily, but occupancy is difficult to obtain accurately. There are two possible ways to obtain building occupancy data: manual counting and automatic sensing. Manual counting is time-consuming and cannot be conducted in real time, so the occupancy data obtained is usually a macroscopic statistic such as the annual average occupancy. Automatic sensing installs sensing devices and a data collection system in the building to record its occupancy in real time, but this requires expensive hardware and software investment, which makes it difficult to apply extensively. This motivates us to estimate occupancy from other data. In practice, the occupancy of a building is influenced by regional functions and regional vitality [9,10,11]. Therefore, we try to capture the occupancy rate of a building through urban multi-source data, such as POI and pedestrian volume data in the area surrounding the building, to support the forecast of building electric consumption. Furthermore, few scholars have studied the relationships between factors such as weather and time series and their impacts on the occupancy of a building. We use a probability graph model to represent the relationships between the various influence factors and electric consumption, and an approximate inference algorithm to predict public buildings' electric consumption. This paper makes the following contributions:
(1) It studies the relationships between urban multi-source data and electric consumption in order to predict building electric consumption.
(2) It applies the probability graph model to study and express the relationships between various factors, and proposes an approximate inference algorithm, EMG, to predict building electric consumption.
(3) It evaluates the proposed approach using real data from nine public buildings in the city of Hangzhou, China. In our computational experiments, the MAPE (mean absolute percentage error) is used as the quality criterion, and a comparative analysis is performed against regression analysis. The results indicate that our approach has better accuracy.
This paper is organized as follows. Related work is presented in Section 2. Section 3 gives the methodology, including grey correlation analysis and the probability graph algorithm. Section 4 presents the case study and results. Several tests and a statistical analysis are provided in Section 5. The paper is concluded with some remarks.

Related Work
In the past, researchers have explored four kinds of predictors for forecasting buildings' energy consumption: meteorological factors, architectural attributes, occupancy, and time series. Some studies explore the impact of meteorological factors on building energy consumption. Zheng et al. study the effects of the hour of the day and the outside air temperature on hot water energy consumption with a data-driven method [12]. Yang et al. investigate variables such as the time of day and the outdoor air dry-bulb temperature, and apply an artificial neural network to make short-term electric consumption predictions for commercial buildings [13]. Nelson et al. study the influence of meteorological factors on residential buildings' energy use and apply quadratic regression analysis to predict the buildings' energy demand [14]. Ambera et al. investigate the influence of five important factors (temperature, solar radiation, relative humidity, wind speed, and weekday index) on administration buildings' energy use, and use a multiple regression model and a genetic programming model to forecast daily electricity consumption [15]. James et al. look into the impact of climate change on peak and annual building energy consumption [16].
Many scholars pay closer attention to the effect of architectural attributes on electric consumption. Lu uses a physical-statistical approach, which combines a physical model with a statistical time series model, to predict the energy consumption of buildings. The physical model simulates the basic energy consumption of different buildings, and the statistical time series model reflects the heterogeneity of the various buildings [17]. Akin makes short-term predictions of electric demand from detailed data and information about the house [18]. Cara et al. develop auto-regressive models with building-specific inputs for forecasting power demands [19]. Kristopher et al. use statistical learning methods to predict future monthly energy consumption for single-family detached homes from building attributes and monthly climate data [20].
Researchers also try to use occupancy and time series factors to predict building energy consumption. Ferlito et al. use building properties, occupancy rate, and weather to predict buildings' energy consumption with an artificial neural network [21]. Sandels et al. explore the influence of weather, occupancy, and temporal factors on the electricity consumption of a Swedish office building [22]. Kim et al. study the influence of building occupancy and construction area allocation on building electric consumption, and use a linear equation method to predict the electric consumption of buildings [23]. López-Rodríguez et al. conclude that building electricity demand is highly correlated with occupancy time in buildings, and build a statistical occupancy model that generates active occupancy profiles in order to predict electricity consumption [24]. Kavousian studies the structural and behavioral determinants of residential electricity consumption. This study shows that electric consumption is not significantly related to income level, home ownership, or building age [25].

Grey correlation analysis
Grey relational analysis has been widely studied and applied since it was introduced [26,27]. It determines the degree of association between two series from the similarity of the curves they represent. This degree of association is expressed by the grey correlation degree.
The grey correlation degree is computed as follows:
Step 1: Set a data column (the reference column) of the historical electric consumptions, as shown in (1). Let m be the number of records for one of the underlying six buildings' electric consumptions during 92 days.
Step 2: The reference column and the comparing columns together form a matrix A. To improve the accuracy of the analysis, we normalize all data in the matrix. The normalized data matrix B is shown in (2), where n is the number of factors. The first column of B is the normalized reference column and the others are the normalized comparing columns. The normalization uses the initial value method (3).
Step 3: We compute, element by element, the absolute difference between each normalized comparing column and the normalized reference column, and then calculate the correlation coefficients between them by formula (4). In formula (4), ρ is the distinguishing coefficient, which takes values in the range (0, 1). The smaller ρ is, the greater the differences between the correlation coefficients. It may be adjusted according to the practical needs of the system.
Step 4: Using the outcome of Step 3, the mean value of the correlation coefficients of each comparing column is calculated by (5). This yields the grey correlation degree between each comparing column and the reference column. The larger the value, the greater the influence of the comparing column.
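As a rough illustration of Steps 1 to 4, the following Python sketch computes grey correlation degrees using the textbook form of grey relational analysis: initial-value normalization, element-wise absolute differences Δ(k), the coefficient (Δmin + ρ·Δmax) / (Δ(k) + ρ·Δmax), and the mean over the m records. It is an assumption that this matches formulas (1) to (5) term by term; the function names and the default ρ = 0.5 are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence


def initial_value_normalize(column: Sequence[float]) -> List[float]:
    """Divide every entry by the first entry of the column (initial-value method)."""
    first = column[0]
    if first == 0:
        raise ValueError("initial-value normalization needs a non-zero first entry")
    return [value / first for value in column]


def grey_correlation_degrees(
    reference: Sequence[float],
    comparisons: Sequence[Sequence[float]],
    rho: float = 0.5,  # distinguishing coefficient in (0, 1); 0.5 is an assumed default
) -> List[float]:
    """Return one grey correlation degree per comparing column."""
    # Steps 1-2: normalize the reference column and every comparing column.
    ref = initial_value_normalize(reference)
    comps = [initial_value_normalize(col) for col in comparisons]

    # Step 3: absolute differences, then their global minimum and maximum.
    deltas = [[abs(r - c) for r, c in zip(ref, col)] for col in comps]
    delta_min = min(min(row) for row in deltas)
    delta_max = max(max(row) for row in deltas)
    if delta_max == 0:
        # Every comparing column coincides with the reference column.
        return [1.0 for _ in comps]

    degrees = []
    for row in deltas:
        coefficients = [
            (delta_min + rho * delta_max) / (d + rho * delta_max) for d in row
        ]
        # Step 4: the degree is the mean coefficient over all records.
        degrees.append(sum(coefficients) / len(coefficients))
    return degrees


# Hypothetical toy series, not data from the paper.
consumption = [100.0, 110.0, 120.0, 130.0]
proportional_factor = [10.0, 11.0, 12.0, 13.0]
constant_factor = [5.0, 5.0, 5.0, 5.0]
toy_degrees = grey_correlation_degrees(
    consumption, [proportional_factor, constant_factor]
)
```

In this toy input, the factor that is exactly proportional to the reference column receives the highest possible degree, while the constant factor receives a lower one, which is the ordering Step 4 is meant to expose.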

Probability graph model
There are three steps to construct a probabilistic graphical model: structure learning, parameter learning, and inference.

Structure learning
The obtained data may be incomplete, so we use the SEM (Structural Expectation-Maximization) algorithm [28], which performs structure learning on an incomplete data set. Fig. 1 shows the pseudocode of the algorithm.

Parameter learning and inference
In the probabilistic graphical model, the EM (Expectation-Maximization) algorithm is an approximate learning and inference algorithm [29] that can cope with incomplete data. The Gibbs algorithm [30] is a sampling algorithm based on the Monte Carlo method, which reduces the amount of data and speeds up the calculation. Here we propose a hybrid algorithm, the EMG algorithm, which combines the advantages of both.
The pseudocode of the EMG algorithm is shown in Fig. 2. The algorithm has three parts. First, it generates a sample data set for each entity variable with the Gibbs algorithm; the distribution of the sample data set is similar to that of the real data set. Second, it obtains the current parameters of each entity from the sample data set, and uses the current parameters and the graph structure to compute the expected value of each entity variable. Finally, it recalculates the parameters of each entity from its expected value. It iterates until the estimated parameters reach a local optimum or the specified number of iterations is reached.
• The sample dataset: The approximate inference algorithm based on Gibbs sampling is one of the simplest and most popular sampling methods. It treats each node as a random variable and assigns an initial value to each variable to obtain an initial state. It then draws a new value for each node from the node's conditional probability given its Markov blanket, which yields a new state. These steps repeat until the number of samples reaches a given threshold, at which point the sample data set is obtained.
• Parameter learning: The weights of each entity in the graph are obtained by the parameter learning of the EMG algorithm, which is similar to parameter learning in the EM algorithm.
E-step: It uses the graph structure and the current parameters to calculate the expected values of the missing variables.
M-step: By scanning the results inferred in the E-step, the algorithm recalculates the parameters that maximize the likelihood and replaces the old parameters with the new ones. The two steps repeat until the parameters converge, at which point the unknown parameters have been learned.
• Inference: In the E-step (lines 12-16) of the EMG algorithm, we call an exact inference method, i.e., the simple Bayes rule, to compute the values of the hidden entity nodes for each instance of the observed data. This is in fact an inference process.
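To make the interplay of sampling, E-step, and M-step more concrete, the sketch below runs a Monte Carlo EM loop on a deliberately tiny model: one hidden binary node with one observed, unit-variance Gaussian child. It is not the EMG algorithm of Fig. 2. The model, the function names, the initialization, and the iteration counts are all assumptions made for illustration. The hidden node is sampled from its conditional distribution given its Markov blanket (here, its observed child and the current parameters), the sample average plays the role of the expected value, and the M-step re-estimates the parameters.

```python
import math
import random
from typing import List, Tuple


def normal_pdf(x: float, mean: float) -> float:
    """Density of a unit-variance normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def monte_carlo_em(
    data: List[float],
    n_iterations: int = 30,  # assumed iteration cap
    n_samples: int = 20,     # assumed number of draws per hidden variable
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Estimate (pi, mu0, mu1) where pi = P(Z = 1) and x | Z = z ~ N(mu_z, 1)."""
    rng = random.Random(seed)
    pi, mu0, mu1 = 0.5, min(data), max(data)  # crude initial parameters

    for _ in range(n_iterations):
        # E-step by sampling: draw each hidden Z from P(Z | x, parameters),
        # computed with Bayes' rule, and average the draws.
        weights = []
        for x in data:
            p1 = pi * normal_pdf(x, mu1)
            p0 = (1.0 - pi) * normal_pdf(x, mu0)
            posterior = p1 / (p0 + p1) if (p0 + p1) > 0.0 else 0.5
            draws = sum(rng.random() < posterior for _ in range(n_samples))
            weights.append(draws / n_samples)

        # M-step: maximum-likelihood update from the sampled expectations.
        total1 = sum(weights)
        total0 = len(data) - total1
        if total0 == 0 or total1 == 0:
            break  # degenerate assignment; keep the previous parameters
        pi = total1 / len(data)
        mu1 = sum(w * x for w, x in zip(weights, data)) / total1
        mu0 = sum((1.0 - w) * x for w, x in zip(weights, data)) / total0

    return pi, mu0, mu1
```

Because the hidden variables in this toy model are conditionally independent given the parameters, a single sampling sweep per iteration already draws from the exact posterior. In a larger graph, such as the building models discussed later, several Gibbs sweeps over all hidden nodes would be needed before the averages are usable.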

Case Study and Result
This paper uses nine public buildings in Hangzhou, China, comprising shopping malls, hotels, and office buildings, as a case study to illustrate the prediction methodology. It explores the correlations between the buildings' electric consumption and influencing factors including architectural properties, weather, air quality, population, and POI data. All of the collected data are processed further as described below.

Data preparation
• The architectural property data: Despite their different functions and structures, the buildings share some common attributes. The property data collected include building age, number of stories (above and below ground), total area (m²), and window/wall ratio.
• Historical Electric Consumption Data: Daily electric consumption data for these public buildings were acquired from January 1, 2015 to January 31, 2016. Each daily value covers 0:00:00 to 23:59:59, and the unit is kWh (kilowatt-hour). To verify the prediction, this paper divides the data set into two subsets: the data for May, June, and July are used as the training set, and the data for August are used as the testing set to validate the model.
• Weather and Air Quality Data: The weather data collected contain the daily average temperature and humidity in the city of Hangzhou, China, from January 1, 2015 to November 31. The temperature is in degrees centigrade and the humidity is a percentage.
• POI Data: POI data describe specific functional facilities such as restaurants and bus stops. The dataset records the name, address, coordinates, and other attributes of each facility. In this paper, six types of POI within 200 meters of the buildings concerned are included: office buildings, shopping malls, restaurants, hotels, metro stations, and bus stations. The number of bus stations is counted per distinct bus route. For example, if bus line 12 and bus line 39 both stop at the same station A, station A is counted as two.
• Pedestrian Volume Data: We collect pedestrian volume data within 50 meters of each building from January 1, 2015 to November 31, 2015. Owing to the widespread use of mobile phones, the number of mobile phone users can accurately reflect changes in pedestrian volume.
• Occupancy Data: We collect the average statistical data for each month of 2015. Occupancy is the ratio of the average number of people in a building to the building's total accommodation capacity.
• Time series Data: We divide the year into four quarters and represent each quarter with a vector. For example, the first quarter is expressed as (1, 0, 0, 0).
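Two of the encodings described in this list are simple enough to sketch in Python: the quarter vector for the time series data and the route-based count of bus stations. The function names and the dictionary layout of the station data are hypothetical; only the encoding rules come from the text.

```python
from datetime import date
from typing import Dict, List, Set


def quarter_one_hot(day: date) -> List[int]:
    """Encode the quarter of the year as a 4-element indicator vector."""
    quarter_index = (day.month - 1) // 3  # 0 for Jan-Mar, ..., 3 for Oct-Dec
    return [1 if i == quarter_index else 0 for i in range(4)]


def count_bus_stations(routes_by_station: Dict[str, Set[str]]) -> int:
    """Count each station once per distinct bus route that stops there."""
    return sum(len(routes) for routes in routes_by_station.values())


# The first quarter maps to (1, 0, 0, 0), as in the example in the text.
first_quarter = quarter_one_hot(date(2015, 2, 14))

# Station A serves lines 12 and 39, so it contributes two to the count;
# station B is a made-up second station serving a single line.
station_count = count_bus_stations({"A": {"12", "39"}, "B": {"7"}})
```

Counting per route rather than per physical stop presumably weights well-connected stations more heavily, which fits the use of POI counts as a proxy for regional vitality.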

Scatter diagram
The following scatter diagrams show the correlation of occupancy with pedestrian volume and with the number of POIs. Fig. 3 is the scatter diagram of the three hotel buildings, Fig. 4 that of the three office buildings, and Fig. 5 that of the three shopping buildings. The occupancy rate of each type of public building is highly correlated with the number of POIs and the pedestrian volume. Therefore, the number of POIs and the occupancy rate are treated as impact factors of building electric consumption.

Remove noisy factors
Although scatter plots can represent the relationships between factors and electric consumption, we use the grey relational analysis method introduced in Section 3.1 to analyze the correlations between the factors and public buildings' electric consumption more accurately and to remove the noisy factors.
As shown in Table 1, we extracted fifteen potential factors from the multi-source data prepared in Section 4.1. Grey correlation analysis is used to determine whether each of the fifteen factors (X1,…,X15) listed in Table 1 has a significant impact on the underlying public buildings' electric consumption. Grey theory [33] has the advantage of requiring less data while producing higher accuracy.
The grey correlation degrees between the 15 potential factors and the buildings' electric consumption are shown in Table 2. Following grey theory, factors with a correlation degree above 0.5 (the threshold value) are treated as key influence factors. As Table 2 shows, the correlations of the influence factors vary significantly with building type. For instance, pedestrian volume has the greatest impact on the electricity consumption of a shopping building, while its impact on that of an office building is minimal. The more pedestrians there are in a shopping building, the more electricity the building consumes. By contrast, the number of staff members in an office building is relatively stable during normal working hours.
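The threshold rule can be written as a short filter. The sketch below is illustrative only: the function name is hypothetical, and the degrees in the example are made-up numbers, not values from Table 2.

```python
from typing import Dict, List


def select_key_factors(degrees: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep factors whose grey correlation degree exceeds the threshold, strongest first."""
    kept = [name for name, degree in degrees.items() if degree > threshold]
    return sorted(kept, key=lambda name: degrees[name], reverse=True)


# Hypothetical degrees for three factors of one building (not from Table 2).
example_degrees = {"X1": 0.72, "X2": 0.41, "X3": 0.55}
key_factors = select_key_factors(example_degrees)  # X2 falls below 0.5 and is dropped
```

Since the degrees differ by building type, the filter would be applied once per building type, which leaves each type with its own set of key factors.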

Prediction
Section 4.2 explores the key factors that influence public buildings' electric consumption. The prediction of electric consumption is realized with a probabilistic graphical model, which sorts out the interrelations between the factors and their influences on the buildings' electric consumption. Following the modeling method given in Section 3.2, we construct the corresponding probability graph models for the various public buildings. Fig. 6 shows the probabilistic graphical models for the three kinds of public buildings. Based on these models and on the parameter learning and inference algorithms described in Section 3.2, we predict the electricity consumption of the nine public buildings for a given month. Our results are compared with those of a classic forecasting algorithm, the multivariable linear regression model. The prediction results are depicted in Fig. 7. As Fig. 7 shows, the consumption patterns predicted by our approach look similar to the actual electric consumption, but it is hard to identify the best forecasting model by visual inspection. To quantify the prediction quality, we apply the MAPE (Mean Absolute Percentage Error) to evaluate the algorithms.

Error Analysis and Discussion
The mean absolute percentage error (MAPE) indicates the prediction accuracy of a forecasting method. Generally, a lower MAPE indicates better prediction accuracy. Table 3 summarizes the results of our approach and the other benchmarked approaches. The average prediction error of our approach is 6.98%. The prediction errors of the other four methods are 7.86%, 7.69%, 8.29%, and 14.99%, respectively. Nevertheless, the errors of all five methods are within the 30% limit recommended by ASHRAE for predictions [31]. TC method: total consumption forecast using the proposed ANN and only the total consumption data. EUs method: total consumption forecast using the proposed method, with the prediction obtained as the aggregation of the different EUs [32]. The urban multi-source based method proposed in this paper produces predictions with better accuracy, while the datasets needed for the model are relatively easy to access. It is therefore expected to be easier to deploy in real applications.
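For reference, the MAPE in its usual form is the mean of |actual − predicted| / |actual| over all days, expressed as a percentage. The Python sketch below assumes this standard definition; the function name and the toy values are illustrative and are not results from Table 3.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy daily consumptions in kWh: each prediction is off by 10% of the actual value.
toy_error = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by the actual value, days with low consumption weigh as much as days with high consumption, which is worth keeping in mind when comparing building types with very different loads.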
In addition, our approach uses grey correlation analysis to extract the crucial factors from the fifteen potential ones before making predictions. This indicates that removing unimportant factors mitigates the noise introduced by loosely related factors and improves prediction accuracy. Using the different critical influence factors, we have constructed specific probability graph models for the various types of public buildings to help improve prediction accuracy. Our approach uses a probabilistic graphical algorithm (EMG) that combines the Expectation-Maximization and Gibbs methods. On average, the predictions of our algorithm have lower error than those of the other methods, which demonstrates its capability of dealing with different types of public buildings.

Conclusion
In this paper, we investigate the influences of various factors on public buildings' electric consumption. The critical influencing factors, ranging from architectural properties to spatiotemporal attributes such as POI and pedestrian volume, are collected and studied. This research reveals the profound influence of spatiotemporal data on electric consumption from a new perspective. Furthermore, the way our approach integrates various influencing factors is unique and more efficient compared with other methods. However, some issues remain to be addressed in future research. For example, the dataset is still not large enough in terms of time span because of data acquisition restrictions and costs, and the number of investigated buildings is relatively small. As more data (a longer time span, more public buildings, etc.) are collected, we will further explore how different forecasting algorithms fare and provide better insights into selecting suitable prediction methods for urban electric power demands under different circumstances.