In this project we train a predictive model on Supervisory Control and Data Acquisition (SCADA) data captured from a physical wind turbine. SCADA systems are used for controlling, monitoring, and analyzing industrial devices and processes. The SCADA concept was developed as a universal means of remote access to a variety of local control modules, which may come from different manufacturers, through standard automation protocols.
Here we demonstrate how to train a machine learning model using a freely available SCADA dataset from Kaggle.
The samples in this dataset are distributed as a .CSV file with the following attributes:
- Date/Time --- timestamp of the observation (10-minute intervals)
- LV ActivePower (kW) --- The amount of power generated by the turbine at that timestamp (in kW)
- Wind Speed (m/s) --- The wind speed as measured at the hub height of the turbine
- Theoretical_Power_Curve (KWh) --- The theoretical power the turbine should generate at that wind speed, as provided by the turbine manufacturer
- Wind Direction (degrees) --- The wind direction at the hub height of the turbine (the turbine turns in this direction automatically)
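The attributes listed above can be read with a few lines of Python. The following is a minimal sketch using only the standard library, not the code from the notebook. The helper name `load_scada`, the day-month-year timestamp format, and the sample values are assumptions made for illustration; adjust `time_format` if the real file differs.

```python
import csv
import io
from datetime import datetime


def load_scada(handle, time_column="Date/Time", time_format="%d %m %Y %H:%M"):
    """Read SCADA rows from an open text file into a list of dicts.

    The timestamp column is parsed with `time_format` (an assumed format);
    every other column is converted to float.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = {time_column: datetime.strptime(raw[time_column], time_format)}
        for name, value in raw.items():
            if name != time_column:
                row[name] = float(value)
        rows.append(row)
    return rows


# Tiny in-memory sample with made-up values, standing in for the real CSV.
sample = io.StringIO(
    "Date/Time,LV ActivePower (kW),Wind Speed (m/s),"
    "Theoretical_Power_Curve (KWh),Wind Direction (degrees)\n"
    "01 01 2018 00:00,380.0,5.3,416.3,259.9\n"
    "01 01 2018 00:10,453.8,5.7,519.9,268.6\n"
)
rows = load_scada(sample)

# Consecutive observations should be 10 minutes apart.
interval = rows[1]["Date/Time"] - rows[0]["Date/Time"]
print(len(rows), interval)
```

To read the dataset itself, pass an open file instead of the in-memory sample, for example a handle opened on the project's CSV file with `newline=""` as the `csv` module recommends.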
This project contains the following assets:
- WindTurbineScada.ipynb --- a notebook demonstrating data ingestion, exploratory data analysis, model building and evaluation
- train.py --- a model training script, which can be run as a Domino job to retrain the model (e.g. when new data is available)
- score.py --- a scoring function, which can be deployed as a Domino Model API
- model.bin --- a pickled version of a pre-trained ExtraTreesRegressor model
- data/T1.csv --- the original dataset
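To make the roles of the training script, the pickled model, and the scoring function more concrete, here is a rough sketch of how the pieces could fit together. It is not the content of train.py or score.py. The feature choice (wind speed and wind direction), the synthetic data, the 3600 kW cap, the hyperparameters, and the `score` signature are all assumptions made for illustration.

```python
import pickle

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the SCADA data (assumed features: wind speed and
# wind direction; assumed target: active power). Not the real dataset.
rng = np.random.default_rng(0)
wind_speed = rng.uniform(0.0, 25.0, size=500)
wind_direction = rng.uniform(0.0, 360.0, size=500)
X = np.column_stack([wind_speed, wind_direction])
# Toy power curve: cubic in wind speed, capped at a hypothetical 3600 kW.
y = np.minimum(3600.0, 3.0 * wind_speed**3)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training step: fit the regressor and check it on held-out rows.
model = ExtraTreesRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))

# Persistence step: round-trip through pickle, the same mechanism that
# would be used to write and read a file such as model.bin.
blob = pickle.dumps(model)
restored = pickle.loads(blob)


def score(wind_speed, wind_direction):
    """Hypothetical scoring entry point: predict active power in kW."""
    prediction = restored.predict([[wind_speed, wind_direction]])[0]
    return {"prediction": float(prediction)}


print(round(test_r2, 3), score(12.0, 180.0))
```

Keeping training and scoring separate in this way means the scoring function only needs the pickled model at request time, so retraining can replace the model file without changing the scoring code.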
This project can be run with the default 5.6 Domino Standard Environment Py3.9 R4.2.3 Compute Environment.
No additional customisation is needed, and all required Python packages are listed in the accompanying requirements.txt file.