Comparative Study of Statistical, Ensemble, and Deep Learning Models for Wind Power Forecasting Using Multivariate Time Series Data
This project compares five forecasting models for wind power prediction on multivariate time series data: LSTM, Random Forest, SVR, SARIMA, and XGBoost. All five are evaluated on real-world turbine data after the same preprocessing, feature engineering, and model tuning steps.
Accurate wind power forecasting is critical for integrating renewable energy into modern grids. This project benchmarks traditional and modern ML approaches under the same pipeline to identify the most efficient and reliable method.
- LSTM: Captures temporal dependencies in meteorological sequences.
- Random Forest (RF): Robust ensemble method; handles feature interactions well.
- Support Vector Regression (SVR): Handles non-linear regression in high-dimensional feature spaces.
- SARIMA: Captures seasonality and trend; interpretable statistical model.
- XGBoost: Gradient boosting model known for high accuracy and scalability.
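
To make the LSTM entry above concrete: a recurrent model is typically trained on fixed-length windows of past observations rather than on single rows. The sketch below shows one way to build such windows in plain Python. The function name `make_windows`, the window length, and the toy feature values are illustrative assumptions and are not taken from the notebook.

```python
def make_windows(features, target, window):
    """Turn a multivariate series into (X, y) pairs for sequence models.

    features: list of per-time-step feature lists, in time order
    target:   list of target values aligned with `features`
    window:   number of past steps each sample should contain

    X[i] holds `window` consecutive feature rows and y[i] is the target
    at the step immediately after that window.
    """
    X, y = [], []
    for end in range(window, len(features)):
        X.append(features[end - window:end])
        y.append(target[end])
    return X, y


# Toy data: 6 time steps, 2 features per step (hypothetical values).
features = [[float(i), float(i * 10)] for i in range(6)]
target = [float(i) for i in range(6)]

X, y = make_windows(features, target, window=3)
# X has 3 samples, each of shape (3 time steps, 2 features).
```

The resulting nesting (samples, time steps, features) is the layout recurrent layers in common deep learning frameworks generally expect, so the lists can be converted to an array and passed on without reshaping.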
- Source: Kaggle – Wind Turbine Power Forecasting
- Duration: Jan 2018 – Mar 2020 (10-min intervals)
- Attributes: DateTime, Wind Speed, Direction, Temperature, Pressure, Humidity, Active Power, etc.
- Size: 118,225 rows × 22 attributes
- Handled missing and erroneous power values (e.g., negative power set to 0)
- Normalized features and filtered outliers
- Derived lagged features to improve temporal modeling
- Train/Val/Test split: 70% / 20% / 10%
- Evaluated models on the last 15 days of the dataset
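
The preprocessing steps above can be sketched in a few lines. This is a minimal illustration in plain Python, not the notebook's code: the function names, the chosen lags, and the toy values are assumptions made for the example. In pandas, the same ideas map onto `Series.clip(lower=0)` and `Series.shift(k)`.

```python
def clip_negative_power(power):
    """Set negative readings to 0.0; leave missing values (None) untouched."""
    return [None if p is None else max(p, 0.0) for p in power]


def add_lag_features(series, lags=(1, 2, 3)):
    """Build one row per time step with the current value and its lags.

    The first max(lags) steps are dropped because they lack full history.
    """
    start = max(lags)
    rows = []
    for t in range(start, len(series)):
        row = {"target": series[t]}
        for k in lags:
            row[f"lag_{k}"] = series[t - k]
        rows.append(row)
    return rows


def chronological_split(rows, train_pct=70, val_pct=20):
    """Split in time order, without shuffling; the test set is the most recent data."""
    n = len(rows)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def rows_in_days(days, step_minutes=10):
    """Number of samples covering `days` at a fixed sampling interval."""
    return days * 24 * 60 // step_minutes


# Hypothetical readings: one negative value, one missing value.
cleaned = clip_negative_power([1.0, -0.5, None, 2.0])

series = [float(i) for i in range(12)]
rows = add_lag_features(series, lags=(1, 2))
train, val, test = chronological_split(rows)
holdout_rows = rows_in_days(15)
```

Splitting by position rather than at random keeps future observations out of the training set, which matters once lagged features are present. At 10-minute intervals, 15 days corresponds to 2,160 rows, assuming there are no gaps in the timestamps.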
| Model | R² Score | MAE | RMSE | MAPE |
|---|---|---|---|---|
| LSTM | Highest | Low | Low | Low |
| RF | High | Low | Low | Low |
| XGBoost | High | Low | Low | Low |
| SVR | Moderate | Moderate | Moderate | Moderate |
| SARIMA | Low | High | High | High |
Winner: LSTM slightly outperforms the other models, followed by XGBoost and Random Forest; its edge is attributed to capturing long-term dependencies.
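
For reference, the four metrics in the table can be computed as in the sketch below. It is written in plain Python so that the definitions are explicit. The function name `regression_metrics` and the sample values are hypothetical. Skipping zero-power samples when computing MAPE is an assumption of this sketch, since the percentage error is undefined when the actual value is 0; the README does not state how the notebook handles that case.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-8):
    """Return R², MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when the actual values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: samples with (near-)zero actual power are excluded from MAPE.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if abs(t) > eps]
    if nonzero:
        mape = 100.0 * sum(abs(e / t) for t, e in nonzero) / len(nonzero)
    else:
        mape = float("nan")

    return {"r2": r2, "mae": mae, "rmse": rmse, "mape": mape}


# Hypothetical actual vs. predicted power values, including one idle reading.
metrics = regression_metrics([100.0, 200.0, 0.0, 400.0],
                             [110.0, 190.0, 5.0, 380.0])
```

Because negative power readings are set to 0 during preprocessing, idle periods can make MAPE unstable, so it is worth reading it alongside MAE and RMSE rather than on its own.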
├── data/
│ └── Turbine_Data.csv
├── Wind_Power_Prediction.ipynb
├── README.md
└── LICENSE
- LSTM excels with temporal patterns and meteorological inputs.
- XGBoost and RF handle non-linearities and anomalies well.
- SARIMA, while interpretable, struggles with complex modern datasets.
- Feature engineering and preprocessing are crucial for model performance.
Fig. 1: Predicted vs Actual Wind Power
This project is licensed under the MIT License. See the LICENSE file for more details.
