API Docs

This page documents the modules used to create MVBEP. The current state of the documentation includes only MVBEP`. The remaining modules (i.e. initializer, Transformer, Developer, Interpreter, and Writer) are not yet documented.

mvbep.mvbep module

Measurement and Verification Building Energy Prediction (MVBEP) is a class that encompasses different modules for reading and validating input data to transforming such data and using them to develop regression models for savings estimations in the post-retrofit period.

The class is fitted by using fit_training() which takes in the required input data. Followingly, an initialization summary is produced to check the data sufficiency requirements or the need for any actions to fix the input data. If the data met the requirements to build a model, the function develop_mvbep() is used to transform the data, train, and evaluate regression models. generate_development_summary() function can be used to see the summary of the development process. Finally, savings are estimated passing using post-retrofit data to predict_energy_consumption() function. The current state of the documentation covers only MVBEP class. Future additions to the project includes writing the documentation for the remaining modules (i.e. initializer, Transformer, Developer, Interpreter, and Writer).

Please check the provided Notebooks for the package demonstration.

class mvbep.mvbep.MVBEP(mvbep_state_path: Optional[str] = None)[source]

Bases: object

MVBEP class to perform all steps of building an energy consumption baseline.

The class incorporates the 4 required modules for building a baseline starting from initialization to savings quantification.

Parameters: mvbep_state_path (str, default 'None') – The file path for a saved MVBEP state in case the baseline creation process stopped before the final step and saved by save_mvbep_state() .

Example

In case a object of MVBEP was saved by using save_mvbep_state(), it can be loaded like

>>> mvbep_boulder = MVBEP(mvbep_state_path = 'mvbep_states/office-boulder_mvbep-state')

In case there was no object saved before, an instance of MVBEP is created by

>>> mvbep_boulder = MVBEP()

fit_training(data: DataFrame, frequency: str, country_code: Optional[str] = None, occupancy_schedule: Optional[dict] = None, mismatch_date_threshold: float = 0.3, total_missing=None, max_consec_missing=None, n_days=360)[source]

Fits a MVBEP object with raw data.

This is the first method in developing an energy consumption baseline. The method takes required historical data to prepare them for next processes.

Parameters

data (pd.DataFrame) –
A dataframe that includes the required data which includes at least
- Timestamps in 15-min or hourly intervals
- Energy consumption
- Outdoor dry-bulb temperature
frequency (str, {'15-min', 'hourly'}) – The timestamps intervals frequency.
country_code (str, default to 'None') – A two-letter str indicating the country code in which the building resides. The supported codes are listed in holiday package documentation
occupancy_schedule (dict, default to 'None') – A dict indicating the general occupancy density in the building. [Check the parameter structure ](??)
mismatch_date_threshold (float, default to 0.3) – Sets the threshold for values in timestamp column that cannot be converted from str to datetime object.
total_missing (int, default to 'None') – Sets a threshold for the total number of a feature’s missing observations to meet data sufficiency requirements. The default value is set based on frequency.
max_consec_missing (int, default to 'None') – Sets a threshold for consecutive missing observations in a single feature before the feature is dropped. The default value is set based on frequency.
n_days (int, default to 365) – Sets a threshold for the least number of days in data .

Example

Example of a building located in Boulder, CO, USA with hourly timestamps. The instance of MVBEP was created with a nmae of mvbep_boulder .

>>> mvbep_boulder.fit_training(
...     data = df_boulder_office,
...     frequency = 'hourly',
...     country_code = 'US'
... )

generate_initialization_summary(file_name: Optional[str] = None)[source]

Generates summary of the initialization performed after fit_training() .

The initialization summary is generated as an HTML file with highlights of the initialization process including plots, descriptive data, and data sufficiency result.

Parameters: file_name (str, default to 'None') – Sets the name of the HTML initialization summary. In case no name was provided, the resulting name will be initiation_time + init_sum_ .

Example

Writing the initialization summary of mvbep_boulder after running fit_training() .

>>> mvbep_boulder.generate_initialization_summary(file_name = 'mvbep_summaries/office-boulder_init-summary')

develop_mvbep(modeling_methods: Optional[dict] = None, test_size: float = 0.2, hyperparameter_tuning: bool = False, ranking_method: str = 'min_cvrmse')[source]

Transforms the cleaned data and develops regression models.

Takes the cleaned data after fit_training() and iterates over the possible transformations while using each transformation to generate regression models using the chosen modeling approaches in modeling_methods . With each transformation, outputs such as evaluation metrics and models are saved in the MVBEP object’s state (i.e. attribute mvbep_state ).

Parameters

modeling_methods (dict, default to 'None') –

The chosen modeling approaches to develop the baseline. In case None was passed, the argument is passed by:

>>> default_modeling_methods = {
...     'LR' : True, # TOWT (If the frequency is hourly otherwise it is WLS)
...     'RF' : True, # Random Regression Forest
...     'XGB': True, # Extreme Gradient Boosting
...     'SVR': True, # Support Vector Regressor
...     'SLP': True, # Feed Forward Neural Network
...     'KNN': True  # K-Nearest Neighbor
... }

test_size (float, default to 0.2) – Sets the testing set size out of the input data.
hyperparameter_tuning (bool, defalut to False) –
- If True : the hyperparameter tuning process is performed for any model with hyperparameters to be tuned.
- If False : No hyperparameter tuning process is performed (except for KNN).
ranking_method (str, {'min_cvrmse', 'min_nmbe'}, default to 'min_cvrmse') –
Sets the ranking method to choose the best model based on the testing set evaluation.
- If min_cvrmse : The best model is selected based on Coefficient of Variation of Root Mean Squared Error (CV(RMSE))
- If ‘min_nmbe’: The best model is selected based on Normalized Mean Bias Error (NMBE).

Example

Developing mvbep_boulder after running fit_training() .

>>> mvbep_boulder.develop_mvbep()

generate_development_summary(file_name: Optional[str] = None)[source]

Generates development summary after using develop_mvbep().

Outputs an HTML file that summarizes the development process after running develop_mvbep().

Parameters: file_name (str, default to 'None') – Sets the name of the HTML development summary. In case no name was provided, the resulting name will be initiation_time + dev_sum_.

Example

Writing the initialization summary of mvbep_boulder after running develop_mvbep().

>>> mvbep_boulder.generate_development_summary(file_name = 'mvbep_summaries/office-boulder_dev-summary')

save_mvbep_state(file_name: Optional[str] = None)[source]

Saves the current progress of the MVBEP object by storing mvbep_state.

Parameters: file_name (str, default to 'None') – Sets the name of the Joblib state file. In case no name was provided, the resulting name will be initiation_time + mvbep_state.

Example

Saving the state of either an initiated MVBEP by fit_training() or a developed one by develop_mvbep().

>>> mvbep.save_state('mvbep_states/office-boulder_mvbep-state')

predict_energy_consumption(data: DataFrame, generate_summary: bool = False, file_name: Optional[str] = None, mismatch_date_threshold=0.3, total_missing=None, max_consec_missing=None)[source]

Generates savings quantification summary after using develop_mvbep().

Outputs an HTML file that summarizes the quantification process after running develop_mvbep(). The quantification process requires post-retrofit data that matches the same frequency and features of the data used in initialization when running fit_training(). Features that was dropped in the initialization process are not required in the post-retrofit data. To see which features passed the initialization process, check the output of generate_initialization_summary().

Parameters

data (pd.DataFrame) – The post-retrofit data.
generate_summary (bool, default to False) –
Either generates a summary in an HTML file or return a list of baseline energy consumption. In case the passed data does not meet the requirements, an initialization summary is generated regardless of the passed argument in generate_summary.
- If True: A quantification summary is provided. The function does not return any object.
- If False: A list of baseline energy consumption for the provided post-retrofit period is returend.
file_name (str, default to 'None') – Sets the name of the HTML quantification summary. In case no name was provided, the resulting name will be initiation_time + quant_sum_.
mismatch_date_threshold (float, default to 0.3) – Sets the threshold for values in timestamp column that cannot be converted from str to pd.datetime object.
total_missing (int, default to 'None') – Sets a threshold for the total number of a feature’s missing observations to meet data sufficiency requirements. The value is set based on frequency.
max_consec_missing (int, default to 'None') – Sets a threshold for consecutive missing observations in a single feature before the feature is dropped. The value is set based on frequency.

Example

Writing the quantification summary of mvbep_boulder.

>>> mvbep_boulder.predict_energy_consumption(data = df_boulder_post_retrofit,
...                                         generate_summary = True,
...                                         file_name='mvbep_summaries/office-boulder_dev-summary')