A New Model for Building Energy Modeling and Management Using Predictive Analytics: Partitioned Hierarchical Multitask Regression (PHMR)

Introduction
In an era of escalating energy demands, the imperative to enhance building energy efficiency has become increasingly critical. The Department of Energy's 2023 report underscores this urgency, highlighting that buildings account for a substantial portion of total energy use in the United States [1]. This situation presents not only a challenge in resource management but also significant environmental and economic concerns. The integration of wireless sensors and IoT technologies has opened new avenues for understanding and controlling building energy usage. Combined with predictive analytics, these technologies show considerable promise in forecasting energy consumption and enabling the fine-tuning of building parameters to minimize waste.
Despite these technological advancements, the development of effective predictive models for building energy consumption is fraught with challenges, primarily due to the dynamic and multifaceted nature of energy usage. While recent studies, such as those by Himeur et al. [2, 3], Deng et al. [4], Elnour et al. [5], and Han et al. [6], have illuminated the potential of AI and big data in revolutionizing building energy management, they also reveal significant gaps in current predictive modeling approaches. These studies highlight the need for models that can effectively navigate the complex interplay between the various indoor and outdoor variables affecting energy consumption.
In response to this identified research gap, our study introduces the Partitioned Hierarchical Multitask Regression (PHMR) model. The PHMR model is designed to address the complexities of building energy consumption by integrating recursive partitioning regression (RPR) with hierarchical multitask learning (hierML). This approach not only enhances the accuracy of energy consumption predictions but also facilitates the nuanced control and adaptation of building parameters. Our model represents a significant advancement over existing RPR models (Chan et al. [7], Chaudhuri et al. [8], Landwehr et al. [9], Loh [10, 11], Zeileis et al. [12], Rusch and Zeileis [13]), filling a critical gap in the literature and offering a more sophisticated tool for energy management.
The key contributions of this study are threefold. First, we present the model formulation of PHMR, which bridges recursive partitioning on outdoor variables with hierarchical multitask learning to enhance prediction and control of building energy consumption. Second, we detail the transformation of the hierML objective into a convex optimization problem, ensuring optimal and computationally efficient solutions. Third, the practical application of PHMR to managing a modular house's HVAC system in Spain demonstrates the model's utility in reducing unnecessary energy consumption, aligning with the sustainability goals outlined in recent research by Borràs et al. [14] and Mehdizadeh Khorrami et al. [15]. The remainder of this paper details the PHMR model formulation, the algorithm used for model estimation, its comparison against other methods through simulation studies, and its practical application in building energy prediction and management.

Model Formulation for PHMR
The PHMR model consists of two components: (1) a tree-growing process that uses the outdoor variables for recursive partitioning and (2) hierML performed at each step of the tree-growing process. In this section (Section 2), we discuss the hierML model at a fixed step (the s-th step) of the tree-growing process; the other steps use hierML in the same manner. In the following section (Section 3), we present the algorithm used to estimate the model parameters of hierML, which is integrated with recursive partitioning to grow the tree.
The s-th step of the tree-growing process yields a tree T_s, consisting of internal and leaf nodes. All nodes of T_s, denoted as V_s, include the leaf nodes V_{l,s}, which do not have any children in the tree. At each internal node of V_s, a Z variable is used to partition the samples into left and right branches. The leaf nodes correspond to different subdivisions of the space defined by the outdoor variables, and for each leaf node v_l there exists a regression model linking the response with the indoor variables, y_{v_l} = X_{v_l} β_{v_l} + ε_{v_l}. Figure 1 provides an example of T_s with leaf nodes V_{l,s} = {3, 5, 6, 7}.
In previous recursive partitioning regression (RPR) models, including our SPR [16, 17], each leaf node's regression model was fitted independently. In contrast, our proposed hierML model incorporates the hierarchical multilevel similarity structure of the leaf nodes to fit the models jointly. According to the principle of recursive partitioning, models at lower levels of the tree should be more similar to each other. For instance, in Figure 1, the model at leaf node 6 should be more similar to the model at node 7 than to those at nodes 5 and 3 (and, in turn, less similar to node 3 than to node 5). This is because recursive partitioning performs the easiest splits at the earliest stages, producing the most dissimilar branches; splits performed later are harder because the resulting branches are increasingly similar.
To jointly estimate the regression models for the leaf nodes while incorporating the hierarchical multilevel similarity structure, a penalized formulation is proposed in equation (1). The first part of equation (1) is the least-squares error loss. For simplicity, the subscript "s" has been dropped from all notation in this section, since we focus on a single step of the tree-growing process. The second term in equation (1) is the hierML penalty, and some additional notation is needed before the penalty can be explained clearly. For each node in the tree, denoted v ∈ V, G_v represents the set of leaf nodes that grow from node v. Taking T_s in Figure 1 as an example, there are seven nodes in the tree, and the corresponding sets are G_1 = {3, 5, 6, 7}, G_2 = {5, 6, 7}, G_3 = {3}, G_4 = {6, 7}, G_5 = {5}, G_6 = {6}, and G_7 = {7}. Let β_j^{G_v} contain the set of regression coefficients corresponding to the j-th predictor (i.e., the j-th indoor variable) over the leaf nodes in G_v; for example, β_j^{G_4} = (β_j^{(6)}, β_j^{(7)}) and β_j^{G_2} = (β_j^{(5)}, β_j^{(6)}, β_j^{(7)}).
This penalized formulation is inspired by multitask learning with L1/L2 regularization (Obozinski et al. [18]). By treating the sets G_v as distinct tasks, we apply a weighted L2-norm, w_v ||β_j^{G_v}||_2, where w_v is a weight to be discussed later. An L1-norm is then placed outside the weighted L2-norms, i.e., Σ_v w_v ||β_j^{G_v}||_2, to enable the regression coefficients contained in each β_j^{G_v} to be selected as a group. A further L1-norm, over j, is placed outside that sum, to enable the coefficients corresponding to the j-th predictor to be selected as a group, i.e., Σ_j Σ_v w_v ||β_j^{G_v}||_2. This completes the explanation of the hierML penalty, which is the second term in the formulation. λ in the formulation is a tuning parameter that balances the least-squares loss and the proposed penalty [17].
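To make the notation concrete, the sets G_v for the example tree of Figure 1 and the resulting hierML penalty Σ_j Σ_v w_v ||β_j^{G_v}||_2 can be sketched in a few lines of Python. The tree layout follows Figure 1; the coefficient values and node weights below are hypothetical placeholders, not fitted quantities:

```python
import numpy as np

# Children of each internal node in the example tree of Figure 1
# (node 1 is the root; 3, 5, 6, 7 are the leaf nodes).
children = {1: (2, 3), 2: (4, 5), 4: (6, 7)}
leaves = [3, 5, 6, 7]

def leaf_descendants(v):
    """G_v: the set of leaf nodes that grow from node v."""
    if v not in children:
        return {v}
    out = set()
    for c in children[v]:
        out |= leaf_descendants(c)
    return out

G = {v: leaf_descendants(v) for v in range(1, 8)}
# e.g. G[1] == {3, 5, 6, 7} and G[4] == {6, 7}, as in the text

def hierml_penalty(beta, weights):
    """Sum over predictors j and nodes v of w_v * ||beta_j over G_v||_2.

    `beta` maps each leaf node to its coefficient vector; `weights`
    maps each node v to w_v (placeholder values here).
    """
    p = len(next(iter(beta.values())))
    total = 0.0
    for j in range(p):
        for v, w in weights.items():
            group = np.array([beta[leaf][j] for leaf in sorted(G[v])])
            total += w * np.linalg.norm(group)
    return total

# Toy coefficients (two indoor predictors) and uniform placeholder weights
beta = {leaf: np.array([1.0, -0.5]) for leaf in leaves}
weights = {v: 1.0 / 7 for v in range(1, 8)}
penalty = hierml_penalty(beta, weights)
```

The nested structure of the groups (e.g., G_4 ⊂ G_2 ⊂ G_1) falls out of `leaf_descendants` directly, which is what gives the penalty its hierarchical character.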
In addition, we explain how to select the weight w_v for each node v. When the tree splits into a left and a right branch at an internal node, the regression models of the two branches should exhibit some similarity, because they share the same internal node; however, the models should not be identical, or the internal node would not have been split. For instance, at the lowest internal node in Figure 1, v = 4, the tree splits into nodes 6 and 7 as the two branches. The regression coefficients of the j-th predictor in the models of nodes 6 and 7, β_j^{(6)} and β_j^{(7)}, should be comparable but not precisely identical. This motivates the penalty in (2), in which the weighted L2-norm of (β_j^{(6)}, β_j^{(7)}) encourages the two coefficients to be selected jointly, to account for their similarity, while the individual terms on β_j^{(6)} and β_j^{(7)} encourage separate selection, to account for their difference; g_4 and s_4 are the corresponding weights. Using the definition of β_j^{G_v}, (2) can be written as (3). Moving up to the next internal node, v = 2, the tree splits into the subtree rooted at node 4 as the left branch and node 5 as the right branch. Following the idea of (3), we can write down the penalization on the regression coefficients that simultaneously accounts for the similarity and difference between the two branches, as in (4). In the same way, the penalization associated with the internal node v = 1 is given in (5). To generalize this scheme, we define W_j^v as in (6), with g_v + s_v = 1 for identifiability. Using this definition, the hierML penalty in (1) can be written as in (7). By some algebraic manipulation, it can be shown that w_v depends on the g_v and s_v as in (8), where v_root denotes the tree's root node. This completes the design of the weight w_v for the proposed hierML penalty. For illustration, take T_s in Figure 1 as an example: expanding the right-hand side of (7) and comparing it with the left-hand side yields w_7 = s_1 s_2 s_4, w_5 = s_1 s_2, and w_3 = s_1. It is easy to verify that these weights comply with the formula in (8).
It is important to note that a coefficient can be penalized in multiple groups under the proposed hierML penalty. For example, β_j^{(6)} is penalized in four groups according to (9): once by itself, as β_j^{(6)}, and three more times in the groups G_4, G_2, and G_1. These groups are nested: {6} ⊂ G_4 ⊂ G_2 ⊂ G_1. This is a general property of hierML: the regression coefficient of each leaf node is penalized in multiple nested groups, each corresponding to an ancestor of the leaf node. However, the proposed weighting scheme ensures that the weights of the multiple nested groups for each leaf node sum to one, which balances the penalization of the regression coefficients across all leaf nodes [17]. This property is stated formally in Proposition 1.
Proposition 1. For any leaf node v_l, let Path(v_l) be the set of nodes comprising all ancestors of v_l and v_l itself. For each v ∈ Path(v_l), let w_v be the weight associated with node v in the hierML penalty in (1). Then Σ_{v ∈ Path(v_l)} w_v = 1. A detailed proof is available in the Supplementary Material (Proposition 1).
In summary, our proposed hierML model is defined by the optimization in (1), with the weight w_v given in (8), which depends on the choice of g_v and s_v. To determine s_v, recall that s_v reflects the idea that the regression coefficients at the branches of node v should be different, while g_v reflects the idea that they should be similar. A larger value of s_v therefore means that we want the coefficients to be estimated more independently of each other. Our proposed approach chooses s_v proportional to the distance between node v and the bottom level of the tree, based on the idea that the farther a node is from the bottom of the tree (i.e., the closer it is to the root), the more different the regression coefficients of its branches should be. This is because recursive partitioning generally produces easier splits at earlier stages, resulting in more dissimilar branches; splits at later stages are harder because the resulting branches are more similar. Making s_v proportional to the distance from the bottom level thus respects this principle of recursive partitioning.
Once the s_v for all nodes are specified, w_v can be obtained using (8). We use the example in Figure 2 to explain how the weight for node 6, w_6, is obtained. Because node 6 is a leaf node, we need to specify s_u for all u ∈ Ancestors(6), namely nodes 1, 2, and 4. Node 1 is three levels up from the bottom level, so s_1 = 3; likewise, s_2 = 2 and s_4 = 1. Normalizing these weights so that the equality in (10) holds gives s_1 = 1, s_2 = 0.67, and s_4 = 0.33. Then w_6 = 0.22 by (8). The weights of the other nodes can be obtained in the same way, and the hierML penalty term in (1) can then be written out explicitly with these weights.
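This weight computation can be sketched in a few lines. The sketch assumes, consistent with the expressions w_7 = s_1 s_2 s_4 and w_5 = s_1 s_2 above, that a leaf node's weight is the product of the normalized s_u over its ancestors:

```python
# Ancestors of leaf node 6 in the example tree (node 1 is the root).
ancestors_of_6 = [1, 2, 4]

# Raw s_v: the distance between node v and the bottom level of the tree.
s_raw = {1: 3, 2: 2, 4: 1}

# Normalize by the largest distance so the equality in (10) holds.
s_max = max(s_raw.values())
s = {v: d / s_max for v, d in s_raw.items()}
# s[1] = 1.0, s[2] ≈ 0.67, s[4] ≈ 0.33, as in the text

# Per (8), w_6 is the product of s_u over the ancestors of node 6.
w6 = 1.0
for u in ancestors_of_6:
    w6 *= s[u]

print(round(w6, 2))  # → 0.22
```

The result, w_6 = 1 × (2/3) × (1/3) ≈ 0.22, matches the value reported above.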

Algorithm for Model Estimation of PHMR
To solve the optimization problem in (1), our first step is to convert it into an equivalent convex optimization, following an idea proposed by Bach [19] for solving the group lasso; this yields (12). The optimization in (12) is convex but has a nonsmooth penalty, which is difficult to solve directly. We therefore relax the penalty term as in (13), introducing an auxiliary variable d_{j,v} for each group. Proposition 2. The equality in (13) holds; that is, the relaxed penalty attains the original penalty at its optimum. A detailed proof is available in the Supplementary Material (Proposition 2) [17].
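A compact numerical sketch of this alternating scheme is given below. For brevity it is applied to an ordinary group-lasso objective rather than the full hierML penalty; the d-step uses the Proposition 2-style closed form (d_g proportional to w_g ||β_g||), and the B-step is a generalized ridge solve. The data, groups, and λ are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with two coefficient groups; the second group is truly zero.
n, p = 200, 6
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
w = np.array([1.0, 1.0])      # group weights (playing the role of w_v)
lam, eps = 20.0, 1e-8         # tuning parameter and a smoothing floor

beta = np.linalg.lstsq(X, y, rcond=None)[0]   # warm start at least squares
for _ in range(100):
    # D-step: closed-form update, d_g proportional to w_g * ||beta_g||_2.
    d = np.array([w[g] * np.linalg.norm(beta[groups[g]]) + eps
                  for g in range(len(groups))])
    # B-step: with D fixed, the penalty is quadratic in beta, so beta
    # solves a generalized ridge system.
    pen = np.zeros(p)
    for g, idx in enumerate(groups):
        pen[idx] = w[g] ** 2 / d[g]
    beta = np.linalg.solve(X.T @ X + 0.5 * lam * np.diag(pen), X.T @ y)
# The inactive group is shrunk toward zero; the active group is recovered.
```

The alternation converges quickly in practice because each step is available in closed form, which is the computational appeal of the relaxation in (13).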
Using the results of Proposition 2, the optimization in (12) can be written in an equivalent form with a smooth penalty term, i.e., the formulation in (14).

The optimization in (14) can be solved using an iterative algorithm that alternates between solving for B and for D. Given D, B can be solved analytically; given B, each d_{j,v} can be obtained using the result in Proposition 2. The tuning parameter λ is selected by a line search from 0 to Λ, where 0 is the smallest possible value and Λ is the largest value beyond which no further improvement can be obtained. The optimal λ is chosen by minimizing the mean squared prediction error (MSPE) on a validation set or via a cross-validation scheme. We summarize this estimation procedure for the hierML model in Algorithm 1.

Note that Algorithm 1 provides the model estimation method for hierML at each step of the recursive partitioning process for growing the tree. Next, we present the steps of the entire process, which compose the algorithm for constructing the PHMR model. The input to the algorithm includes a training set and a validation set on the indoor variables X, the outdoor variables Z, and the response variable Y. At each step, the algorithm selects an outdoor variable Z_j to split the samples belonging to a leaf node v_s into a left and a right branch (i.e., a left child and a right child node), Z_j ≤ z_j and Z_j > z_j respectively, where z_j is the splitting point. To choose the optimal outdoor variable and the associated splitting point, the algorithm iterates over each outdoor variable in the dataset and each candidate splitting point, and chooses the pair yielding the largest empirical risk reduction on the validation set. The empirical risk reduction compares the validation risk of the parent model with the combined validation risks of the two child models, ΔR_val = R_val(β^{v_s}_tr) − [R_val(β^{Z_j ≤ z_j}_tr) + R_val(β^{Z_j > z_j}_tr)], where β^{v_s}_tr contains the regression coefficients for leaf node v_s estimated on the training set in the previous step, and β^{Z_j ≤ z_j}_tr and β^{Z_j > z_j}_tr contain the coefficients of the regression models at the two child nodes of v_s, obtained by splitting v_s on Z_j at z_j. These child-node coefficients are estimated by the hierML model, or by two lasso models for computational ease. A commonly used empirical risk function R_val(·) is the sum of squared prediction errors over all samples in the validation set. Suppose the outdoor variable Z_{j*} and associated splitting point z_{j*} lead to the highest empirical risk reduction; the algorithm then splits the leaf node using (Z_{j*}, z_{j*}), creating two new leaf nodes corresponding to Z_{j*} ≤ z_{j*} and Z_{j*} > z_{j*}. Algorithm 1 is then used to refit the hierML model for all leaf nodes. This completes the partitioning at one leaf node. The partitioning is performed recursively on each newly generated leaf node until no further reduction in the empirical risk can be found, at which point the algorithm stops [17]. We summarize this recursive partitioning tree-growing process in Algorithm 2.
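The split search can be sketched as follows. For brevity this sketch fits ordinary least-squares models at the candidate child nodes (the paper uses lasso or hierML fits), and the data, candidate-split grid, and train/validation assignment are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ols(X, y):
    """Least-squares fit; lasso/hierML would be used in the full algorithm."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sse(beta, X, y):
    """Sum of squared prediction errors, i.e., the empirical risk R_val."""
    return float(np.sum((y - X @ beta) ** 2))

# Toy node data: an intercept plus one indoor variable, two outdoor variables.
n = 400
Z = rng.uniform(-1, 1, size=(n, 2))           # Z[:, 0] is the true split variable
X = np.column_stack([np.ones(n), rng.normal(size=n)])
slope = np.where(Z[:, 0] <= 0.0, 3.0, -3.0)   # regime change at z = 0
y = slope * X[:, 1] + 0.1 * rng.normal(size=n)

tr = np.arange(n) % 2 == 0                    # even rows train, odd rows validate
va = ~tr

base = sse(fit_ols(X[tr], y[tr]), X[va], y[va])   # risk of the unsplit node
best = None
for j in range(Z.shape[1]):                   # each outdoor variable...
    for z in np.quantile(Z[tr, j], np.linspace(0.1, 0.9, 17)):  # ...each split
        left, right = Z[:, j] <= z, Z[:, j] > z
        if (tr & left).sum() < 5 or (tr & right).sum() < 5:
            continue
        risk = (sse(fit_ols(X[tr & left], y[tr & left]), X[va & left], y[va & left])
                + sse(fit_ols(X[tr & right], y[tr & right]), X[va & right], y[va & right]))
        if best is None or base - risk > best[0]:
            best = (base - risk, j, z)

reduction, j_star, z_star = best              # the winning (Z_j*, z_j*) pair
```

On this toy node, the search recovers the true split variable Z_1 with a splitting point near zero, because the child models eliminate the slope mismatch that dominates the parent's validation risk.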

Simulation Studies
The efficacy of the Partitioned Hierarchical Multitask Regression (PHMR) model was rigorously evaluated through a series of simulation studies designed to reflect complexities encountered in real-world data. This section details the data generation process, compares PHMR against existing methods, and provides an in-depth analysis of the results.
4.1. Data Generation. Data for the simulation studies were generated to mimic a real-world scenario with one indoor variable (I), five outdoor variables (Z_1 to Z_5), and a response variable (Y). Z_1 and Z_2 were the true partitioning variables, dividing the outdoor variable space into distinct subdivisions, while Z_3 to Z_5 were included as noise. The 75 input variables followed a multivariate normal distribution with correlation structure Σ_{75×75}, with entries σ_ij = 0.5^{|i−j|}, i, j = 1, …, 75. This setup aimed to test the robustness and adaptability of PHMR in a controlled yet complex environment. PHMR's performance was compared against Single Partition Regression (SPR), model-based recursive partitioning (MOB), and Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) [16, 17]. These methods provide a diverse range of comparison points, from traditional regression approaches to more modern partitioning techniques. The tuning parameters of all models were optimized based on the mean squared prediction error (MSPE) on a validation set, ensuring a fair and consistent evaluation framework.
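The stated correlation structure can be generated directly; the sample size of 1,000 draws below is an arbitrary choice for illustration:

```python
import numpy as np

p = 75
idx = np.arange(p)
# Entry (i, j) of the correlation matrix is 0.5^|i - j|.
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])

# Draw simulated input variables from the multivariate normal.
rng = np.random.default_rng(42)
Xsim = rng.multivariate_normal(np.zeros(p), Sigma, size=1000)

emp = np.corrcoef(Xsim, rowvar=False)   # empirical check of the structure
```

The empirical correlations recover the intended banded decay (adjacent variables correlate near 0.5, decaying geometrically with distance).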

4.2. Result Analysis and Model Performance. The mean squared prediction error (MSPE) of each method on the test set is reported in Table 1.
The results show that PHMR significantly outperforms the other methods, particularly with smaller training sizes. This superior predictive accuracy is attributed to its hierarchical structure, which facilitates information sharing across nodes and enhances prediction capability.
The recovery rate of the true tree structure, detailed in Table 2 and visualized in Figure 2(a), further demonstrates PHMR's proficiency in revealing and representing complex data relationships. A closer examination of the MSPE and Pearson correlation within each leaf node, shown in Table 3, illustrates the model's adaptability across varying data segments. Nodes closer to the root exhibited better prediction performance, with smaller MSPEs and larger Pearson correlations, than those nearer the bottom of the tree, such as nodes 8 and 9. Even in these more challenging nodes, however, PHMR achieved high Pearson correlations between the true and predicted responses on test data, underscoring its robustness.
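For reference, the two per-node metrics can be computed with a small helper; the toy data below are synthetic placeholders standing in for one well-predicted node, not the simulation output:

```python
import numpy as np

def node_metrics(y_true, y_pred):
    """MSPE and Pearson correlation for the test samples of one leaf node."""
    mspe = float(np.mean((y_true - y_pred) ** 2))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return mspe, r

# Synthetic example: predictions equal to the truth plus small noise.
rng = np.random.default_rng(7)
y_true = rng.normal(size=100)
y_pred = y_true + 0.2 * rng.normal(size=100)
mspe, r = node_metrics(y_true, y_pred)
```

A node with small prediction noise yields a small MSPE and a Pearson correlation near one, which is the pattern Table 3 reports for nodes close to the root.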
In 100 independent simulations, we assessed how often SPR and PHMR reconstructed the actual tree structure (Figure 2(a)); the outcomes, and the MSPE of the fully reconstructed runs, are detailed in Table 2. PHMR notably outperformed SPR in replicating the ground-truth structure, primarily because SPR often terminates node splitting prematurely. Further analysis of PHMR's performance across the outdoor variable subdivisions (nodes 2, 4, 6, 8, and 9) showed that nodes nearer the root had better predictions, with lower MSPE and higher Pearson correlation, than those at the bottom, as documented in Table 3. Despite the smaller sample sizes in lower nodes, PHMR achieved high Pearson correlations, indicating robust predictive capability across varied data segments.

Algorithm 1 (hierML model estimation). Input: a set of L leaf nodes v_l and their associated training and validation data on the indoor variables X_{v_l} and response variable y_{v_l}, together with a weight w_v for each node v of the tree. Initialize: set D to an initial value D_0 and start from iteration t = 1. For each λ ∈ [0, Λ]: while t < t_max and convergence has not been reached, alternate the B-step and D-step of Proposition 2; then calculate MSPE_val(λ) on the validation set using B(λ). Output the B(λ) with the smallest MSPE_val(λ), i.e., the regression coefficient estimates for each leaf node.

4.3. Implications. The simulation studies underscore the robustness and practical applicability of PHMR in predicting complex phenomena. Its ability to accurately capture and represent underlying data structures makes it a valuable tool across domains. The insights gained from these studies also provide a foundation for further research, suggesting avenues such as extending the hierML model to accommodate nonlinear relationships and developing more efficient algorithms for tree growth in the PHMR approach. Overall, these simulation studies demonstrate PHMR's capabilities and advantages: its superior performance, coupled with its methodological soundness, positions it as a promising approach for intricate predictive tasks in real-world scenarios.

Application of PHMR in Predictive HVAC Management
5.1. Dataset Description and Experimental Settings. This study utilized a dataset from a modular house in Madrid, Spain, collected every 15 minutes over 42 days, resulting in 4,137 samples for predicting indoor temperature. The dataset was collected from the SML system, a prototype dwelling equipped with cutting-edge energy-saving features (Bache and Lichman [20]; Zamora-Martinez et al. [21]). Figure 3 provides a comprehensive depiction of the sensors and actuators employed in the study. The dataset included six indoor variables, such as CO2 concentration and relative humidity, and six outdoor variables; their abbreviations and physical meanings are given in Table 4. The data were split into a training set (50%), a validation set (25%), and a test set (25%). PHMR was applied to this dataset, producing a tree structure (depicted in Figure 4) that highlighted the importance of variables such as Sun_light, Sun_irradiance, and Outdoor_temp in predicting indoor temperature variations [17]. This section elaborates on the dataset's specifics, the variable importance discovered by PHMR, and the model's implementation details.
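The 50/25/25 split can be sketched as below. The paper does not state whether the samples were shuffled, so a chronological split (common for time-series data) is assumed here:

```python
import numpy as np

n = 4137                              # 15-minute samples over 42 days
n_tr = int(0.50 * n)                  # 50% training
n_val = int(0.25 * n)                 # 25% validation; the remainder is test

# Chronological split: earliest samples train, latest samples test.
idx = np.arange(n)
train, val, test = np.split(idx, [n_tr, n_tr + n_val])
```

A chronological split avoids leaking future sensor readings into the training set, which matters when the goal is forecasting indoor temperature.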

5.2. Model Performance and Key Findings. The model's performance was evaluated against Single Partition Regression (SPR), with PHMR demonstrating a 37% improvement in prediction accuracy, evidenced by a lower mean squared prediction error (MSPE) of 4.57 compared with SPR's 7.27. This performance is depicted in a scatter plot (Figure 5) of predicted versus actual temperatures. Key findings from this application include the crucial role of sunlight-related variables in temperature prediction and the model's ability to fit different outdoor temperature bins, enabling the HVAC system to adjust its activation/deactivation strategy efficiently. This subsection details the performance metrics used, the comparative analysis with other models, and the critical insights gained from applying PHMR to the real-world dataset.
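As a quick consistency check, the reported 37% figure follows directly from the two MSPE values:

```python
mspe_phmr = 4.57   # PHMR's mean squared prediction error (from the text)
mspe_spr = 7.27    # SPR's mean squared prediction error (from the text)

# Relative reduction in MSPE achieved by PHMR over SPR.
improvement = (mspe_spr - mspe_phmr) / mspe_spr
print(f"{improvement:.1%}")   # → 37.1%
```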

5.3. Practical Implications. The application of PHMR to predictive HVAC management has underscored its substantial potential for enhancing building energy efficiency. By providing accurate indoor temperature forecasts, PHMR allows HVAC systems to operate more strategically, significantly reducing energy consumption while maintaining comfort levels. This highlights both the practical utility of the model and its contribution to advancing sustainable energy practices in building management. The robustness and adaptability of PHMR, demonstrated by its superior performance over conventional models such as SPR, suggest wide applicability and the potential to transform various predictive tasks within the energy management domain.
In light of the discussion of the PHMR model's capability, it is pertinent to highlight its adaptability and precision in processing diverse datasets. The model's design allows it to seamlessly accommodate a broad spectrum of variables, including those indicative of occupancy levels, such as CO2 concentrations. While the current case study found that CO2 levels in the dining and living areas did not significantly influence the model's predictions relative to other variables, this should not be construed as diminishing the potential relevance of CO2 in different contexts or datasets. The PHMR model's robust architecture is well equipped to handle both controlled and uncontrolled variables, demonstrating its applicability across a wide range of building types and environmental conditions. This ensures the model's utility in capturing the nuanced dynamics of indoor environments, reaffirming its broad applicability and value in predictive HVAC management and beyond.
As we look to the future, this study opens new avenues for research and development. Enhancing PHMR to accommodate nonlinear relationships could yield even more precise predictions in complex scenarios, and optimizing the algorithms for tree growth and model selection could improve its computational efficiency and scalability. Furthermore, integrating PHMR with other building management systems could provide a comprehensive approach to intelligent building operations. Collaborative efforts with industry practitioners will be crucial in validating the model's effectiveness and exploring its full potential. Ultimately, this research not only validates PHMR's effectiveness in a real-world scenario but also sets the stage for its broader application in creating energy-efficient and intelligent building systems.

Conclusion
This study introduced the PHMR model, a significant advancement in predictive modeling tailored to optimizing building energy management. PHMR's integration of recursive partitioning on outdoor variables with a hierarchical multitask learning (hierML) model for indoor variables at each partitioning node has demonstrated its strength. Notably, the model's ability to incorporate multilevel hierarchical similarity structures into the joint model-fitting process led to improved prediction accuracy and robust performance, outperforming traditional methods on both simulated and real-world datasets. The successful application of PHMR to accurately predicting building energy usage underscores its potential as a powerful tool in the field, opening new avenues for intelligent, data-driven decision-making in building management systems.
6.1. Limitations and Future Directions. Despite its promising capabilities, PHMR has limitations that future research should address. The current model relies on linear relationships within the hierarchical structure, potentially limiting its ability to capture the more intricate, nonlinear interactions prevalent in complex data. Additionally, its scalability and computational efficiency, particularly for larger datasets and real-time applications, require enhancement to make PHMR more accessible and practical for broader usage. Future developments might include extending the hierML model to encompass nonlinear relationships, thereby broadening the model's applicability and accuracy. Enhancements to the tree-growth algorithms and model selection processes are also critical to improving PHMR's computational efficiency and scalability. These improvements are necessary steps to evolve PHMR into a more comprehensive tool for a variety of predictive modeling applications.

6.2. Broader Applications beyond Building Energy Prediction. Beyond building energy management, the PHMR model holds potential for impactful applications across diverse industries. In healthcare, PHMR could support predictive diagnostics and personalized treatment planning, providing a more nuanced understanding of patient data. The financial sector could benefit from the model's predictive accuracy in areas such as risk assessment and market trend analysis, enabling more informed and strategic decisions. Manufacturing and supply chain operations could harness PHMR to predict maintenance needs and optimize production processes, enhancing efficiency and reducing operational costs. Environmental sciences could utilize the model for more accurate climate modeling and more effective pollution control strategies, contributing to conservation and sustainability efforts. These applications demonstrate PHMR's versatility and its potential to drive advances in numerous fields, making it a valuable tool for researchers and practitioners alike.

Nomenclature
Acronyms
PHMR: Partitioned Hierarchical Multitask Regression, a predictive modeling approach used for building energy prediction
HVAC: Heating, Ventilation, and Air Conditioning, the technology of indoor environmental comfort
hierML: Hierarchical multitask learning, a machine learning approach that exploits hierarchical structures within data
RPR: Recursive partitioning regression, a regression method that partitions data recursively
SPR: Single Partition Regression, a regression method involving a single partition of the data
MOB: Model-based recursive partitioning, a statistical method for recursive partitioning using model-based criteria
GUIDE: Generalized, Unbiased, Interaction Detection and Estimation, a method for detecting and estimating interactions in nonlinear models
MSPE: Mean squared prediction error, a measure of prediction accuracy in statistical models

I: Indoor variable, representing an indoor measurement or condition in the study
Z_1 to Z_5: Outdoor variables, representing outdoor conditions; Z_1 and Z_2 are the true partitioning variables
Y: Response variable, typically building energy consumption or indoor temperature in this study
β_{v_l}: Regression coefficients for leaf node v_l in the regression model
ε_{v_l}: Error term for leaf node v_l in the model
λ: Regularization parameter, used in the PHMR model to control model complexity
σ_ij: Elements of the correlation matrix of the input variables

Mathematical Notations
Σ_{75×75}: Correlation matrix, a 75 × 75 matrix of correlations among the 75 input variables
B: Coefficient vector, the regression coefficients across all leaf nodes in the PHMR model
G_v: Set of leaf nodes growing from node v in the tree structure of the model

Figure 1: An instance of the tree-growing process and its corresponding notation, showing only two consecutive steps for simplicity of presentation.

Figure 2: (a) True tree structure partitioned by Z_1 and Z_2. (b) Pattern of regression coefficients within each leaf node of the tree.

Figure 3: The SML system, a prototype house with advanced energy-saving technologies, and its sensor/actuator maps [21].

Figure 5: Predicted versus true response variable (indoor temperature) on the test set by PHMR, with Pearson's correlation = 0.8.

Table 1: MSPE on test data for the four methods under two different training sample sizes.

Algorithm 2 (constructing the PHMR model). Input: a training set D_tr and a validation set D_val on the indoor variables X, outdoor variables Z, and response variable Y. Initialize: fit a lasso model at the current node v_s using D_tr to obtain the current empirical risk R_val(β). For j = 1 to Q, where Q is the number of outdoor variables: split the data of node v_s into left and right child nodes, Z_j ≤ z_j and Z_j > z_j, and fit two lasso models in the child nodes to obtain their regression coefficients. Select the variable Z_{j*} and splitting point z_{j*} leading to the largest empirical risk reduction ΔR^{j*}_val; split the data of node v_s into two new leaf nodes, Z_{j*} ≤ z_{j*} and Z_{j*} > z_{j*}; apply Algorithm 1 to all leaf nodes to obtain the hierML estimates; recalculate the empirical risk of each leaf node on D_val using the new estimates β_{v_l} from hierML; and select the leaf node with the largest R_val(β_{v_l}) for the next split. Output: a set of leaf nodes and a fitted regression model for each leaf node.

Table 2: Recovery rate of the true tree structure, and MSPE of the fully recovered runs, under two training sample sizes.

Table 3: MSPE and Pearson's correlation on test data for each leaf node (mean (std) over 100 simulation runs).

Table 4: Abbreviations and physical meanings of the indoor, outdoor, and response variables in the application case study.