24Fa-Microclimate-Geo-LSTM-Kriging
Urban Weather Generator Deep Learning: LSTM-Kriging Model
Description
This project aims to understand and reproduce the Geo-LSTM-Kriging Model, an approach to microclimate prediction published by researchers at the National University of Singapore (NUS). The model combines three key layers to produce accurate microclimate predictions from local weather station data: land use/land cover (LULC) data, historical data, and spatial distance information. It learns from previous data and applies what it has learned to present data. The approach is novel in combining the strength of LSTMs in time series prediction with Kriging's ability to model spatial dependencies.
The team cleaned up the open-source code from NUS to better understand the model's inputs. With more legible code, the adaptability of the model became clear, which allowed the team to integrate campus-specific data. This semester's work focused on replicating the NUS model for the Georgia Tech (GT) campus. The team used real weather data collected on the GT campus, coupled with Georgia Tech's Tree Viewer App, to improve the accuracy of results for the campus.
Requirements
- Python
- Pandas
- NumPy
- OSMnx
- scikit-learn
- PyTorch
- PyKrige
In addition, this program requires a dataframe of the features you want to measure, as well as grid point data for the Kriging regression. We’ve provided test dataframes in the “weather_station_5min” and “weather_station_60min” folders.
Repository Structure
This repo contains everything needed to replicate the Geo-LSTM-Kriging Model. Inside the “python” folder, you will find the notebook files needed to run it. The main ones are “Organized_Model_Eval.ipynb” and “osmnx 2.ipynb”. The latter gathers the grid point data needed for the kriging step, and the former runs the models on the data.
Installation
Each of the required libraries can be installed with pip, except OSMnx, which requires a separate installation process. You can find it at this link. Once installed, run “osmnx 2.ipynb” with the OX kernel.
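As a quick check after installation, the minimal sketch below reports which of the libraries listed under Requirements cannot be found in the active environment. The helper name `find_missing` is hypothetical and not part of this repo; the module names are the standard import names for each package.

```python
import importlib.util

# Import names for the packages listed under Requirements.
# Note that scikit-learn is imported as "sklearn" and PyTorch as "torch".
REQUIRED_MODULES = ["pandas", "numpy", "osmnx", "sklearn", "torch", "pykrige"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it once in the default environment and once in the OX kernel, since OSMnx may only be available in the latter.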
Methodology
The team used three workflows: pre-processing, machine learning (ML) training, and prediction and plotting. Pre-processing cleans up the data and makes it usable for the ML training workflow.
For the spatial component of the model, the team used osmnx, the OpenStreetMap Python library, to obtain a map of Georgia Tech's campus as a base. A grid of equally spaced points was overlaid on the map, and 12 distance vectors were obtained for each point. These are the distances from each grid point to the nearest centers of several feature types: buildings, libraries, parks, parking, footways, grass, fitness centers, woods, wetlands, trees on campus, and trees in Atlanta off Tech's campus. The tree data was not easily accessible through osmnx, so the team used Georgia Tech's Tree Viewer App for the tree locations and distances.
In the Geo-LSTM-Kriging model that the team followed, Kriging is a statistical method for spatial interpolation: it predicts values at locations without measurements by exploiting spatial correlations. Here, those spatial correlations are represented by the 12 distance vectors for each x, y point and are based on relationships between variables.
Weather data from the stations that Dr. Brian Stone set up on Tech's campus a year ago was also used as an input to pre-processing. It mainly provides temperature, dew point, and relative humidity. These are the quantities the team set out to predict, so the data serves both to train the ML model and as a basis for comparison. The raw station data was quite messy and required cleaning. The coordinates for each of the weather stations were obtained and separated into folders.
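The grid and nearest-center distances described above can be sketched as follows. This is a minimal illustration, not the repo's notebook code: it assumes the feature centers are already available as projected x, y coordinates in meters (so Euclidean distance is meaningful), and the function names, feature names, and coordinates are all hypothetical. Pulling the features from OpenStreetMap with osmnx is not shown.

```python
import numpy as np


def make_grid(x_min, x_max, y_min, y_max, spacing):
    """Return an (n_points, 2) array of equally spaced x, y grid points."""
    xs = np.arange(x_min, x_max + spacing / 2, spacing)
    ys = np.arange(y_min, y_max + spacing / 2, spacing)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])


def nearest_distances(grid, feature_centers):
    """For each grid point, return the distance to the closest center of each feature type.

    feature_centers maps a feature name to an (m, 2) array of center coordinates.
    The result has one column per feature type, in the dictionary's order.
    """
    columns = []
    for centers in feature_centers.values():
        # Pairwise differences between grid points and centers: shape (n_points, m, 2)
        diff = grid[:, None, :] - centers[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        columns.append(dist.min(axis=1))
    return np.column_stack(columns)


# Made-up projected coordinates in meters, not real campus data.
grid = make_grid(0.0, 100.0, 0.0, 100.0, spacing=50.0)
features = {
    "buildings": np.array([[0.0, 0.0], [100.0, 100.0]]),
    "parks": np.array([[50.0, 50.0]]),
}
distances = nearest_distances(grid, features)
print(distances.shape)  # (9, 2): 9 grid points, one distance column per feature type
```

With all of the feature types from the project, the same function would return one row of distances per grid point, which is the shape the later regression step expects.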
Within each weather station file, the data columns (temperature, dew point, and relative humidity) were averaged and then normalized to a range of zero to one. Normalization keeps the columns on a common scale, which helps avoid errors during training and when predicting new values. The normalized values for each station were then added to a dictionary for ML training. Originally, the team tried to train on all the weather station files as one large data set, but this was too much for a single computer to handle; splitting the data into per-station dictionary entries allowed training to run smoothly. The outputs of the pre-processing workflow are a CSV file of grid distances and a CSV file of normalized weather station values.
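A minimal sketch of the zero-to-one normalization and the per-station dictionary is shown below. It assumes min-max scaling, which is one common way to map values onto that range; the station names, column names, and readings are made up, and the helper `min_max_normalize` is hypothetical rather than taken from the repo.

```python
import numpy as np


def min_max_normalize(values):
    """Scale a 1-D array to the range [0, 1]; a constant column maps to zeros."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)


# Made-up readings for two hypothetical stations, not real campus data.
raw_stations = {
    "station_a": {"temperature": [20.0, 25.0, 30.0], "relative_humidity": [40.0, 60.0, 50.0]},
    "station_b": {"temperature": [18.0, 18.0, 18.0], "relative_humidity": [55.0, 65.0, 75.0]},
}

# One dictionary entry per station, so each station can be trained on separately.
normalized = {
    station: {column: min_max_normalize(values) for column, values in columns.items()}
    for station, columns in raw_stations.items()
}

print(normalized["station_a"]["temperature"])
```

Keeping the per-column minimum and maximum alongside the normalized values would make it possible to convert predictions back to physical units later.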
Using PyTorch, the team set temperature as the prediction target and relative humidity and dew point as the main features. Input sequences for the LSTM were created with the default length of 10 time steps. The team left most parameter values unchanged, following the documentation's recommendations. The hidden size was changed to about 8 and the number of epochs to 200; the epoch count is the number of passes the program makes over the training data, chosen so that the model is reasonably well trained without overfitting. After each weather station file is trained independently, the results are appended into a single data frame to obtain the average prediction results for each station.
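The training setup described above can be sketched in PyTorch as follows. This is an illustrative sketch under stated assumptions, not the NUS or team code: the class name `TemperatureLSTM`, the helper `make_sequences`, the single-layer architecture, the Adam optimizer, the learning rate, and the choice to predict the step right after each window are all assumptions, and the data is synthetic. Only the sequence length of 10, the hidden size of 8, the 200 epochs, and the choice of features and target come from the description.

```python
import numpy as np
import torch
from torch import nn


def make_sequences(features, target, seq_len=10):
    """Build sliding windows: each sample holds seq_len steps of features,
    and its label is the target value at the step right after the window."""
    xs, ys = [], []
    for start in range(len(features) - seq_len):
        xs.append(features[start:start + seq_len])
        ys.append(target[start + seq_len])
    return np.stack(xs), np.array(ys)


class TemperatureLSTM(nn.Module):
    def __init__(self, n_features=2, hidden_size=8):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])   # predict from the last time step


# Synthetic, already-normalized signals standing in for one station's file.
torch.manual_seed(0)
t = np.linspace(0, 4 * np.pi, 60)
features = np.column_stack(
    [0.5 + 0.5 * np.sin(t), 0.5 + 0.5 * np.cos(t)]  # "relative humidity", "dew point"
).astype(np.float32)
target = (0.5 - 0.4 * np.sin(t)).astype(np.float32)  # stand-in "temperature"

x_np, y_np = make_sequences(features, target, seq_len=10)
x = torch.from_numpy(x_np)
y = torch.from_numpy(y_np).unsqueeze(1)

model = TemperatureLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Each epoch here is one full-batch pass over the training windows.
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(tuple(x.shape), float(loss))
```

In the project, a loop like this would run once per weather station file, with the per-station predictions collected afterwards.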
The distance vectors obtained for each of the approximately 1300 grid points mentioned earlier were used for the Geo-Kriging part of the project. The model is based on random forest regression, because random forests typically work well with many different types of input features, especially when the relationships in the data turn out to be nonlinear.
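The random forest step can be sketched with scikit-learn as shown below. This is a minimal sketch on synthetic data: the number of stations (8), the distance ranges, and the temperature relationship are made up for illustration, and the kriging step itself, which the project runs with PyKrige, is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: 8 stations and 1300 grid points, each with 12 distance features.
station_distances = rng.uniform(0, 500, size=(8, 12))
station_temperature = 30.0 - 0.005 * station_distances[:, 0] + rng.normal(0, 0.1, size=8)
grid_distances = rng.uniform(0, 500, size=(1300, 12))

# Fit on the station locations, then predict at every grid point.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(station_distances, station_temperature)
grid_temperature = forest.predict(grid_distances)

print(grid_temperature.shape)  # (1300,)
```

Because a random forest averages training targets within its leaves, its predictions always stay inside the range of temperatures seen at the stations, which is worth keeping in mind when interpreting the resulting map.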
The team finally fit the model with these parameters to obtain a temperature graph displaying the heat distribution of Georgia Tech's campus.
Future Work
The team would like to run further validation studies to confirm that the model works in different climates and with consistent data. NUS ran their model on one month of data with steady annual temperatures, whereas our weather station data is inconsistent, so the model should continue to be validated before it is considered for further application.
To enhance the robustness and applicability of the LSTM-Kriging Model for urban weather generation, the following future developments are proposed:
- Accuracy Validation
A critical next step is to rigorously test the accuracy of the model under various scenarios. Comparative analysis with existing models and observations will help assess the reliability and identify potential areas for refinement, including:
• Assessing the model's predictions across various time scales, such as hourly, daily, and seasonal trends, to identify performance consistency over short- and long-term periods.
• Performing a detailed breakdown of prediction errors to uncover biases or patterns associated with specific weather variables (temperature, humidity, wind speed) and urban contexts.
- Handling Missing Data
Addressing missing data in weather and spatial datasets remains a key challenge. Future work will focus on integrating advanced imputation techniques or machine learning methods to fill data gaps effectively without compromising the model's performance, including:
• Using diffusion models to reconstruct missing data patterns.
- Generalizability Testing
The model will be tested on additional university campuses with contexts similar to Georgia Tech, particularly those featuring comparable weather station setups and spatial characteristics. This will help evaluate the model's adaptability to different urban microclimates and provide insights for broader applications.
Reference
Jintong Han, Adrian Chong, Joie Lim, Savitha Ramasamy, Nyuk Hien Wong, Filip Biljecki (2024). Microclimate spatio-temporal prediction using deep learning and land use data. Building and Environment. https://doi.org/10.1016/j.buildenv.2024.111358
Presentation
Team
Name | Seniority | Major | Department | GitHub Handle |
---|---|---|---|---|
Sofia Mujica | Junior | Mechanical Engineering | ME | sofia-mujica |
Ze Yu Jiang | Junior | Computer Science | COC | zeyujiang8800 |
Krish Gupta | Sophomore | Civil Engineering | CEE | krishgupta-CE |
Thanasarn Changnawa | PhD | Architecture | ARCH | Thanasarn-Changnawa |