
Microclimate Weather Prediction with Deep Learning

Georgia Tech VIP-SMUR Project | Fall 2025
High-resolution microclimate weather prediction combining a FusionLSTM deep learning architecture, Regression Kriging, and spatial analysis

Python 3.8+ PyTorch License

Overview

This project combines multiple advanced techniques to predict microclimate weather patterns on the Georgia Tech campus with high spatial and temporal resolution. The pipeline integrates FusionLSTM (a TFT-inspired architecture featuring multi-scale LSTM branches with attention) for time series prediction with Regression Kriging to generate campus-wide maps. The model predicts Temperature and Relative Humidity at 100,283 grid points using multivariate time series and geospatial features.

The model successfully captures extreme weather conditions across Georgia Tech campus. Below are comparison maps showing model predictions vs. actual station observations for four representative scenarios:

| Scenario | Conditions | Visualization |
|----------|------------|---------------|
| 🔥 Hottest | 37.30°C avg, 34.56% RH (2016-06-25 16:10) | Hottest |
| ❄️ Coldest | 8.45°C avg, 89.61% RH (2017-05-06 06:30) | Coldest |
| 🏜️ Driest | 18.13°C avg, 17.95% RH (2015-04-04 18:30) | Driest |
| 💧 Most Humid | 16.73°C avg, 99.50% RH (2015-04-18 04:40) | Most Humid |

Each comparison map shows side-by-side temperature (left) and relative humidity (right) predictions with station observations overlaid as square markers.

Why FusionLSTM?

We adapted core TFT components (Gated Residual Networks, Variable Selection, Multi-Head Attention) with three key modifications optimized for microclimate forecasting:

  1. Single-step point prediction - Better Kriging integration, avoids compounding uncertainty
  2. Parallel multi-scale LSTM branches - Explicitly captures both rapid fluctuations and diurnal cycles
  3. Physics-aware features only - No station IDs, enabling generalization to unseen locations

See docs/PROJECT_PLAN.md for detailed design rationale.

Key Features

  • FusionLSTM Model - Multi-scale LSTM with attention for single-step iterative forecasting (+10 minutes ahead)
  • 16-Station Warm-Season Training - April–September data with station holdout validation
  • Physics-Aware Features - Solar angles and time encodings (no station IDs for generalization)
  • High-Resolution Spatial Mapping - 100,283 grid points across campus (~3.5 km²)
  • Regression Kriging - PyKrige spatial interpolation with 16-station stability
  • Baseline Comparisons - Persistence, Linear Regression, Vanilla LSTM
  • Publication-Ready Visualizations - 300 DPI high-resolution maps
  • HPC Integration - SLURM scripts for Phoenix cluster (GPU/CPU)

Quick Start

Local Machine

Step 1: Installation

Option A: Using UV (Recommended)

# Clone the repository (requires GitHub SSH key for private repo)
git clone git@github.com:VIP-SMUR/25Fa-Microclimate-ML.git
cd 25Fa-Microclimate-ML

# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Create environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

Option B: Using pip

# Clone the repository (requires GitHub SSH key for private repo)
git clone git@github.com:VIP-SMUR/25Fa-Microclimate-ML.git
cd 25Fa-Microclimate-ML

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 2: Download Required Data

A. Weather Data (Required for Training)

Download the 16-station weather data with solar angles:

Download from Hugging Face (~112 MB)

Place in: 2_Data/gatech_16_stations_10_min_long_format_sun_angles.csv

B. Pre-trained Model (Required for Inference)

Download the trained model weights from Hugging Face:

Download Model (TFT_weather_model.pth) (1.4 MB)

Or visit: https://huggingface.co/yupengtang/TFT_weather_model

Place in: 3_Training_Model/weights/ or 3_Training_Model/weights_4heads/

Step 3: Run Inference

cd 4_Inference_and_Visualization

# Run inference
python _1_fusionlstm_inference_kriging.py --target_time "2017-06-25 14:00:00"
# → Generates: outputs/FusionLSTM_kriging_predictions_20170625_140000.csv

# Create visualization maps
python _2_mapping_visualization.py
# → Generates: figures/temperature_map_*.png, humidity_map_*.png, comparison_map_*.png

# Analyze LULC feature importance (optional)
python _3_lulc_analysis.py
# → Generates: figures/FusionLSTM_kriging_predictions_*_lulc_importance.png

# Generate extreme scenario maps (automatic scenario detection)
python _4_generate_representative_scenarios.py --device cuda --dpi 300
# → Generates: figures/HOTTEST_*.png, COLDEST_*.png, DRIEST_*.png, MOST_HUMID_*.png

Phoenix Cluster (HPC)

New to HPC? See PACE_PHOENIX_TUTORIAL.md for a complete beginner's guide including VPN setup and SSH connection.

Note: This is a private repository. Configure GitHub SSH key first if you haven't already.

# Connect to Phoenix
ssh <your_gt_username>@login-phoenix.pace.gatech.edu

# Navigate to your home directory (recommended for code)
cd $HOME
# Alternative: Use scratch for large temporary files
# cd /storage/scratch1/<your_gt_username>

# Clone repository (requires GitHub access)
git clone git@github.com:VIP-SMUR/25Fa-Microclimate-ML.git
cd 25Fa-Microclimate-ML

# ===== TRAINING =====
cd 3_Training_Model
sbatch phoenix_gpu_train.sbatch  # GPU: A100, 4 attention heads, ~2-4 hours
# OR
sbatch phoenix_cpu_train.sbatch  # CPU: 8 cores, 2 attention heads, ~24-48 hours

# ===== INFERENCE & VISUALIZATION =====
cd ../4_Inference_and_Visualization

# Step 1: Inference (~2 min 10 sec)
sbatch _1_inference.sbatch
# → outputs/FusionLSTM_kriging_predictions_*.csv (31 MB)
# → outputs/FusionLSTM_kriging_predictions_*_models.pkl (169 KB)

# Step 2: Visualization (~1 min 31 sec)
sbatch _2_visualization.sbatch
# → figures/temperature_map_*.png (1.7 MB)
# → figures/humidity_map_*.png (1.6 MB)
# → figures/comparison_map_*.png (2.1 MB)

# Step 3: LULC Analysis (~34 sec, optional)
sbatch _3_lulc.sbatch
# → figures/FusionLSTM_kriging_predictions_*_lulc_importance.png (237 KB)

# Step 4: Generate representative scenario maps (~3 min, recommended for publication)
sbatch _4_representative_scenarios.sbatch
# → figures/HOTTEST_*.png, COLDEST_*.png, DRIEST_*.png, MOST_HUMID_*.png

# Monitor jobs
squeue -u $USER
tail -f logs/inference_*.out

Complete Workflow

Option A: Use Pre-trained Model

cd 4_Inference_and_Visualization

# 1. Run inference with trained model
python _1_fusionlstm_inference_kriging.py --target_time "2017-06-25 14:00:00"
# Output: outputs/FusionLSTM_kriging_predictions_20170625_140000.csv (100,283 points)

# 2. Create high-resolution visualization maps
python _2_mapping_visualization.py --dpi 300
# Output: 
#   - figures/temperature_map_20170625_140000.png
#   - figures/humidity_map_20170625_140000.png
#   - figures/comparison_map_20170625_140000.png

# 3. Analyze spatial feature importance (optional)
python _3_lulc_analysis.py
# Output: figures/FusionLSTM_kriging_predictions_20170625_140000_lulc_importance.png

# 4. Generate extreme scenario maps (automatic scenario detection)
python _4_generate_representative_scenarios.py --device cuda --dpi 300
# Output: figures/HOTTEST_*.png, COLDEST_*.png, DRIEST_*.png, MOST_HUMID_*.png

Option B: Train Your Own Model

# Step 1: Train the model
cd 3_Training_Model

# GPU training (recommended)
python _5_main.py \
    --data_path ../2_Data/gatech_16_stations_10_min_long_format_sun_angles.csv \
    --device cuda \
    --attention_heads 4 \
    --save_dir ./weights_4heads

# OR CPU training
python _5_main.py \
    --data_path ../2_Data/gatech_16_stations_10_min_long_format_sun_angles.csv \
    --device cpu \
    --attention_heads 2 \
    --save_dir ./weights

# Step 2: Run inference with your model
cd ../4_Inference_and_Visualization
python _1_fusionlstm_inference_kriging.py \
    --model_path ../3_Training_Model/weights_4heads/TFT_weather_model.pth \
    --target_time "2017-06-25 14:00:00"

# Step 3: Create visualizations
python _2_mapping_visualization.py

Project Structure

25Fa-Microclimate-ML/
│
├── 1_Urban_Features/          # Geospatial feature extraction
│   ├── grid_generator.py          # Generate prediction grid
│   ├── get_osm_utils.py           # OpenStreetMap data processing
│   └── ...
│
├── 2_Data/                    # Weather data and features
│   ├── gatech_16_stations_10_min_long_format_sun_angles.csv  # Main dataset (~947K rows, 16 stations)
│   ├── gatech_gridpoint.csv                                  # Grid data (100,283 points)
│   └── ...
│
├── 3_Training_Model/          # Model training pipeline
│   ├── _1_model_architecture.py    # Single-step TFT implementation
│   ├── _2_data_preprocessing.py    # Data cleaning & sequences
│   ├── _3_model_training.py        # Training pipeline
│   ├── _4_model_evaluation.py      # Metrics & visualization
│   ├── _5_main.py                  # Main training script (CLI)
│   ├── _6_example_usage.py         # Usage examples
│   ├── phoenix_gpu_train.sbatch    # SLURM: GPU training (A100)
│   ├── phoenix_cpu_train.sbatch    # SLURM: CPU training
│   ├── weights/                    # CPU trained models
│   ├── weights_4heads/             # GPU trained models
│   ├── training_plots/             # Loss curves, LR schedule
│   ├── evaluation_plots/           # Predictions, residuals
│   ├── logs/                       # SLURM job logs
│   └── README.md                   # Training documentation
│
├── 4_Inference_and_Visualization/  # Inference & visualization
│   ├── _1_fusionlstm_inference_kriging.py      # FusionLSTM inference + Kriging
│   ├── _2_mapping_visualization.py             # Create spatial maps
│   ├── _3_lulc_analysis.py                     # LULC feature importance
│   ├── _4_generate_representative_scenarios.py # Auto-generate extreme scenarios
│   ├── _1_inference.sbatch                     # SLURM: Inference (A100 GPU)
│   ├── _2_visualization.sbatch                 # SLURM: Visualization (CPU)
│   ├── _3_lulc.sbatch                          # SLURM: LULC analysis (CPU)
│   ├── _4_representative_scenarios.sbatch      # SLURM: Generate all scenarios (GPU)
│   ├── outputs/                                # Prediction CSV + PKL files
│   ├── figures/                                # Generated maps (PNG)
│   ├── logs/                                   # SLURM job logs
│   └── README.md                               # Inference documentation
│
├── 5_Optional_Frontend/       # Interactive dashboard (optional)
│   ├── streamlit_dashboard.py      # Main dashboard app
│   └── README.md                   # Dashboard documentation
│
├── requirements.txt           # Python dependencies
└── README.md                  # This file

Model Architecture

FusionLSTM: Multi-Scale LSTM with Attention

Core Components:

  • Variable Selection Networks → Adaptive feature importance learning
  • Multi-scale LSTM → Parallel short-term (1 layer) + long-term (2 layers) branches
  • Multi-head Attention → Temporal dependency modeling (2 heads, optimized for efficiency)
  • Gated Residual Networks → Complex feature interactions with stable gradients
  • Position Encoding → Learnable temporal embeddings

Input Features (12 dimensions):

  1. Weather Observations (2): Temperature (°C), Relative Humidity (%)
  2. Solar Geometry (4): Altitude/azimuth angles (sin/cos encoding) - physics-aware features
  3. Temporal Features (6):
    - Hour of day (sin/cos, 24-hour cycle)
    - Day of year (sin/cos, 365-day cycle)
    - Minute (sin/cos, 10-minute resolution)
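The cyclical encodings above map each time variable onto the unit circle so that wrap-around points (e.g., 23:50 and 00:00) stay close in feature space. A minimal sketch of such an encoder (the function name and exact periods are illustrative assumptions, not the repository's code):

```python
import numpy as np

def cyclical_time_features(hour, day_of_year, minute):
    """Sin/cos pairs for the six temporal inputs (illustrative sketch)."""
    feats = {}
    for name, value, period in [("hour", hour, 24),
                                ("doy", day_of_year, 365),
                                ("minute", minute, 60)]:
        angle = 2.0 * np.pi * value / period   # position on the unit circle
        feats[f"{name}_sin"] = np.sin(angle)
        feats[f"{name}_cos"] = np.cos(angle)
    return feats

# Midnight sits at (sin, cos) = (0, 1); noon at (0, -1).
```

This keeps Euclidean distance in feature space consistent with distance in time, which a raw hour-of-day integer does not.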

Output Targets (2 dimensions):

  • Temperature (°C)
  • Relative Humidity (%)

Sequence Configuration:

  • Historical window: 144 timesteps (24 hours @ 10-minute resolution)
  • Prediction: Next timestep (+10 minutes)
  • Training: Warm season only (April–September) for focused heat stress modeling
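Forecasts beyond +10 minutes are produced iteratively: each single-step prediction is appended to the input window and the model is called again. A minimal sketch of this rollout (the model interface and channel layout are assumptions; in practice the solar and time channels would be recomputed for each new timestep rather than carried forward):

```python
import numpy as np

def iterative_forecast(model, history, n_steps):
    """Roll a single-step predictor forward n_steps times.

    history: (144, n_features) window; channels 0-1 are temp and RH.
    model:   callable mapping a window to the next (temp, rh) pair.
    """
    window = history.copy()
    preds = []
    for _ in range(n_steps):
        temp, rh = model(window)
        preds.append((temp, rh))
        nxt = window[-1].copy()                 # placeholder: reuse exogenous features
        nxt[0], nxt[1] = temp, rh               # overwrite the two weather channels
        window = np.vstack([window[1:], nxt])   # slide the 24 h window forward
    return np.array(preds)                      # shape (n_steps, 2)
```

Because input and output live in the same space, each step sees data shaped exactly like the training distribution, which is the rationale behind the single-step design.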

Spatial Interpolation: Regression Kriging with 16 Stations

Method: Regression Kriging (PyKrige)

  • Station Coverage: 16 stations provide robust spatial coverage and stable variogram estimation
  • Variogram: Spherical model with empirical fitting (sill, range, nugget)
  • Spatial Covariates: Elevation, building density/height, distances to features, shadow ratios
  • Stability: 16 stations eliminate the need for IDW fallback (previously required with only 3 stations)

Process:

  1. FusionLSTM generates predictions at all 16 weather station locations
  2. Regression Kriging interpolates to 100,283 grid points using spatial covariates
  3. Campus boundary overlay from OpenStreetMap (via OSMnx)
  4. Adaptive map boundaries fitted to campus extent
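The two-stage idea can be illustrated with a self-contained NumPy sketch: fit a linear trend on the spatial covariates, then apply ordinary kriging with a spherical semivariogram to the residuals. The repository uses PyKrige for this; all parameter values and function names here are illustrative assumptions:

```python
import numpy as np

def spherical_gamma(h, sill=1.0, rang=500.0, nugget=0.0):
    """Spherical semivariogram (illustrative parameter values)."""
    h = np.asarray(h, dtype=float)
    g = np.where(h >= rang,
                 nugget + sill,
                 nugget + sill * (1.5 * h / rang - 0.5 * (h / rang) ** 3))
    return np.where(h == 0.0, 0.0, g)

def regression_kriging(coords, covariates, values, grid_coords, grid_covariates):
    """Two-stage regression kriging: linear trend + ordinary kriging of residuals."""
    n = len(values)
    # Stage 1: linear regression of the target on spatial covariates
    X = np.column_stack([np.ones(n), covariates])
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    residuals = values - X @ beta
    # Stage 2: ordinary kriging system for the residuals
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = spherical_gamma(d)
    A[:n, n] = A[n, :n] = 1.0          # unbiasedness constraint (weights sum to 1)
    preds = []
    for gc, gcov in zip(grid_coords, grid_covariates):
        b = np.append(spherical_gamma(np.linalg.norm(coords - gc, axis=1)), 1.0)
        w = np.linalg.solve(A, b)[:n]  # kriging weights for this grid point
        trend = np.append(1.0, gcov) @ beta
        preds.append(trend + w @ residuals)
    return np.array(preds)
```

At a station location the kriging weights collapse onto that station's residual, so the interpolator reproduces the station values exactly while the trend carries covariate information into unsampled areas.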

Dataset & Coverage

Weather Stations: 16 stations

  • Time Range: 2015-2019 (warm season: April–September)
  • Resolution: 10-minute intervals
  • Total Records: ~947,000 warm-season observations
  • File Size: ~112 MB
  • Variables: Temperature, Relative Humidity, Dew Point, Solar Angles
  • Download: Hugging Face Dataset (see 2_Data/README.md)

Spatial Grid: 100,283 points

  • Area: Georgia Tech campus (~3.5 km²)
  • Average Spacing: ~5.9 meters
  • Spatial Features: 24 features including distances, elevation, building metrics, shadow ratios

Station Holdout Protocol: Station 10635579 (61,556 records) held out for testing; remaining 15 stations used for training/validation with 90/10 time-based split
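The 90/10 time-based split keeps validation strictly in the future of the training data, avoiding temporal leakage. A hypothetical helper illustrating the idea (not the repository's preprocessing code):

```python
def time_based_split(n_samples, val_frac=0.10):
    """Chronological 90/10 split: no shuffling, so every validation
    timestep comes after every training timestep."""
    cut = int(n_samples * (1.0 - val_frac))
    return list(range(cut)), list(range(cut, n_samples))

train_idx, val_idx = time_based_split(1000)
# train_idx covers timesteps 0..899, val_idx covers 900..999
```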

Performance

Model Performance (FusionLSTM Architecture)

Validation model (trained on 15 stations, 1 station held out for testing):

| Variable | RMSE | MAE | MAPE | R² | Max Error |
|-------------------|--------|--------|-------|--------|-----------|
| Temperature | 0.43°C | 0.29°C | 1.15% | 0.9928 | 6.98°C |
| Relative Humidity | 1.28% | 0.90% | 1.51% | 0.9952 | 22.23% |

Generalization Analysis (Test/Train Ratio):

  • Temperature: 1.057 (excellent, minimal overfitting)
  • Relative Humidity: 1.074 (excellent, minimal overfitting)
  • Average: 1.065 (healthy generalization)

Training Summary:

  • Best Validation Loss: 0.000161
  • Total Parameters: 337,742
  • Training Time: ~23 minutes (A100 GPU)
  • Status: Excellent fit with healthy generalization

Feature Importance Analysis

Top Contributing Features (by RMSE impact when removed):

| Rank | Feature | RMSE Change | Category |
|------|------------------------|-------------|--------------------------|
| 1 | RH (Relative Humidity) | +2789.47% | Weather Observations |
| 2 | Temp (Temperature) | +371.97% | Weather Observations |
| 3 | hour_sin | +6.98% | Temporal (Diurnal Cycle) |
| 4 | altitude_sin | +6.46% | Solar Geometry |
| 5 | hour_cos | +2.44% | Temporal (Diurnal Cycle) |
| 6 | azimuth_sin | +2.02% | Solar Geometry |

Key Insights:

  • Weather observations dominate predictions (expected for an autoregressive model)
  • Solar angles contribute significantly (+2.38% avg), validating the physics-aware design
  • Temporal features (hour encoding) capture diurnal patterns (+4.71% combined)
  • Day-of-year encoding captures seasonal variations effectively
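The "RMSE impact when removed" ranking can be approximated in spirit with a permutation test: shuffle one input channel and measure how much the error grows. A generic sketch (the helper name is an assumption, not the repository's exact ablation code):

```python
import numpy as np

def permutation_rmse_increase(predict, X, y, col, seed=0):
    """Percent RMSE increase when column `col` of X is shuffled.
    Large values mean the model leans heavily on that feature."""
    rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
    base = rmse(predict(X), y)
    Xp = X.copy()
    np.random.default_rng(seed).shuffle(Xp[:, col])  # break feature-target link
    return 100.0 * (rmse(predict(Xp), y) - base) / base
```

Shuffling preserves the column's marginal distribution while destroying its relationship to the target, so only features the model actually uses show a large RMSE increase.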

Computational Requirements

Training (Phoenix cluster, A100 GPU):

  • Validation Model (15 stations): ~23 minutes, 32.7 GB RAM
  • Deployment Model (16 stations): ~26 minutes, 34.0 GB RAM
  • CPU alternative (8 cores): ~24-48 hours, 64 GB RAM

Inference (single timestamp, 100,283 points):

  • GPU: ~2 min 10 sec (measured on A100)
  • CPU: ~10-15 minutes (estimated)

Visualization (100k+ points, DPI 300): ~1 min 31 sec for 3 maps (measured)

LULC Analysis (permutation importance): ~34 seconds (measured)

Extreme Scenarios (all 4 scenarios): ~3 min 4 sec total (measured on A100)

Usage Examples

Example 1: Complete Pipeline (Local)

cd 4_Inference_and_Visualization

# Run inference
python _1_fusionlstm_inference_kriging.py --target_time "2017-06-25 14:00:00" --device cuda

# Create maps with custom settings
python _2_mapping_visualization.py \
    --dpi 300 \
    --marker-size 8 \
    --marker-alpha 0.7

# Analyze LULC feature importance
python _3_lulc_analysis.py

Example 2: HPC Workflow (Phoenix)

# Navigate to project directory
cd $HOME/25Fa-Microclimate-ML/4_Inference_and_Visualization

# Submit all jobs sequentially
sbatch _1_inference.sbatch
# Wait ~2 min 10 sec for inference to complete

sbatch _2_visualization.sbatch  # ~1 min 31 sec
sbatch _3_lulc.sbatch           # ~34 sec

# Optional: Generate extreme scenarios
sbatch _4_representative_scenarios.sbatch  # ~3 min 4 sec

# Check status
squeue -u $USER

# View results
ls -lh outputs/
ls -lh figures/

Example 3: Custom Training

cd 3_Training_Model

python _5_main.py \
    --data_path ../2_Data/gatech_16_stations_10_min_long_format_sun_angles.csv \
    --sequence_length 144 \
    --epochs 200 \
    --patience 20 \
    --hidden_dim 128 \
    --num_layers 2 \
    --dropout 0.3 \
    --attention_heads 4 \
    --weight_decay 1e-3 \
    --learning_rate 0.001 \
    --device cuda \
    --save_dir ./weights_4heads

Example 4: Batch Processing Multiple Times

cd 4_Inference_and_Visualization

# Loop through multiple timestamps
for time in "2017-06-25 14:00:00" "2016-06-25 16:10:00" "2015-04-04 18:30:00"; do
    python _1_fusionlstm_inference_kriging.py --target_time "$time" --device cuda
    python _2_mapping_visualization.py
done

Output Files

Inference Outputs

Directory: 4_Inference_and_Visualization/outputs/

| File | Size | Description |
|------|------|-------------|
| FusionLSTM_kriging_predictions_TIMESTAMP.csv | ~31 MB | 100,283 points × 26 columns |
| FusionLSTM_kriging_predictions_TIMESTAMP_models.pkl | 169 KB | Regression Kriging models (Temperature + RH) |

CSV Columns:

  • Coordinates: latitude, longitude, point_id
  • Spatial features: 9 LULC features
  • Predictions: KrigingPrediction_Tem, KrigingPrediction_RH

Visualization Outputs

Directory: 4_Inference_and_Visualization/figures/

| File | Size | Resolution | Description |
|------|------|------------|-------------|
| temperature_map_TIMESTAMP.png | ~1.7 MB | 4800×3600 px | Temperature spatial map |
| humidity_map_TIMESTAMP.png | ~1.4 MB | 4800×3600 px | Humidity spatial map |
| comparison_map_TIMESTAMP.png | ~2.1 MB | 7200×3000 px | Side-by-side comparison |
| *_lulc_importance.png | 237 KB | 4800×1800 px | LULC feature importance analysis |
| HOTTEST_*.png, COLDEST_*.png, etc. | ~1-2 MB each | 4800×3600 px | Extreme scenario maps |

Training Outputs

Directory: 3_Training_Model/

  • weights/TFT_weather_model.pth - CPU trained model
  • weights_4heads/TFT_weather_model.pth - GPU trained model
  • training_plots/ - Loss curves, LR schedule
  • evaluation_plots/ - Predictions, residuals, scatter plots
  • logs/ - SLURM job outputs

Dependencies

Core Requirements

# Deep Learning
torch>=2.3.1
numpy>=1.26.4,<2.0.0
pandas==2.2.2
scikit-learn==1.7.2

# Spatial Analysis
pykrige==1.7.0
contextily==1.6.2

# Visualization
matplotlib==3.10.3
seaborn==0.13.2

# Utilities
tqdm==4.67.1

Installation Options

Using UV (Recommended):

uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

Using pip:

pip install -r requirements.txt

Why UV? UV provides 10-100x faster package installation compared to pip, with deterministic dependency resolution for reproducible environments. All SLURM scripts use UV by default for efficient HPC resource utilization.

Package Management with UV

This project uses UV for fast and reliable Python package management.

Installation

One-time setup:

# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Make permanent (add to ~/.bashrc)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

Usage on HPC (Phoenix Cluster)

All SLURM scripts automatically use UV for dependency installation:

#!/bin/bash
#SBATCH --account=gts-pkastner3

module load anaconda3
export PATH="$HOME/.local/bin:$PATH"

# UV automatically used (10-100x faster than pip)
uv pip install -r requirements.txt --system

python your_script.py

Performance: UV reduces package installation time from ~45 seconds (pip) to ~2 seconds (cached), saving over 1 hour for 100 job submissions.

Local Development

# Create virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -r requirements.txt

UV is a drop-in replacement for pip with identical command syntax and significantly improved performance.

Documentation

Detailed guides for each module:

  • 3_Training_Model/README.md - Complete training guide
    - Single-step architecture details
    - SLURM job configuration
    - Hyperparameter tuning
    - Performance optimization

  • 4_Inference_and_Visualization/README.md - Inference & visualization
    - Inference workflow
    - Kriging vs IDW fallback
    - Visualization settings
    - LULC analysis

  • PACE_PHOENIX_TUTORIAL.md - Complete beginner's guide
    - Step-by-step SSH connection
    - GlobalProtect VPN setup
    - File upload/download
    - Complete workflow examples

GitHub SSH Key Setup (For Private Repo)

This repository is private. Set up SSH key for access:

# On Phoenix (or local machine)
# 1. Generate SSH key
ssh-keygen -t ed25519 -C "your_email@gatech.edu"
# Press Enter for default location, optionally set passphrase

# 2. Display public key
cat ~/.ssh/id_ed25519.pub

# 3. Copy the output (starts with ssh-ed25519)
# 4. Add to GitHub: Settings β†’ SSH and GPG keys β†’ New SSH key
# 5. Paste key and save

# Test connection
ssh -T git@github.com
# Should see: "Hi USERNAME! You've successfully authenticated"

Model Validation & Generalization

Overfitting Analysis

The model demonstrates excellent generalization with minimal overfitting:

Test/Train RMSE Ratios:

  • Temperature: 1.057 (5.7% increase)
  • Relative Humidity: 1.074 (7.4% increase)
  • Average: 1.065 ✅ (healthy; indicates robust generalization)

Interpretation:

  • Ratio < 1.10: Excellent generalization
  • Ratio 1.10-1.20: Good generalization
  • Ratio > 1.20: Potential overfitting

Why This Works:

  • Strong regularization (dropout=0.3, weight_decay=1e-3)
  • Early stopping (patience=20) prevents overtraining
  • Station holdout validation ensures spatial generalization
  • 24-hour sequence length with proper regularization
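The early-stopping rule (patience=20) can be sketched as a generic training loop, where `run_epoch` stands in for one epoch of training followed by validation:

```python
def fit_with_early_stopping(run_epoch, max_epochs=200, patience=20):
    """Stop once the validation loss has not improved for `patience` epochs."""
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch   # new best checkpoint
        elif epoch - best_epoch >= patience:
            break                                     # patience exhausted
    return best_loss, best_epoch
```

In practice the model weights at `best_epoch` would be checkpointed and restored, so the deployed model is the one with the lowest validation loss rather than the last one trained.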

Training Stability

Loss Analysis:

  • Best Validation Loss: 0.000161
  • Final Train Loss: 0.000835
  • Train/Val Ratio: 5.17 (expected for scaled targets)

Convergence:

  • Smooth loss curves indicate stable training
  • Early stopping triggered appropriately
  • No gradient explosion or vanishing

Tips for Best Results

  1. GPU Acceleration: Use --device cuda for 10x faster training/inference (~23 min vs ~24-48 hours)
  2. Data Quality: Ensure clean, temporally ordered data with proper datetime format
  3. Target Time Selection: Use timestamps within warm season 2015-2019 (e.g., 2017-06-25 14:00:00 for summer afternoon)
  4. Visualization Quality:
  5. DPI 300 for standard quality
  6. DPI 600 for high-quality posters
  7. Adjust marker size/alpha for data density
  8. LULC Analysis:
  9. Works with 16-station Regression Kriging
  10. Random Forest permutation importance shows which spatial features matter most
  11. Visualizes feature importance across temperature and humidity predictions
  12. Model Performance:
  13. Temperature predictions: Sub-0.5Β°C RMSE, >99% RΒ²
  14. Humidity predictions: ~1.3% RMSE, >99% RΒ²
  15. Excellent generalization (test/train ratio ~1.065)
  16. HPC Usage:
  17. Use GPU partition for training (100x faster: 23 min vs 24+ hours)
  18. Use CPU partition for visualization
  19. Monitor job status with squeue -u $USER
  20. Set UV_CACHE_DIR to scratch to avoid disk quota issues

Known Limitations

Data Constraints

  • Temporal Coverage: Warm season only (April–September, 2015-2019)
    - Model optimized for heat stress conditions
    - Not validated for the cold season (October–March)

  • Spatial Extent: Georgia Tech campus only (~3.5 km²)
    - Generalization to other locations requires retraining or transfer learning

Technical Constraints

  • Grid Density: 100,283 points → moderate memory requirements (~2-4 GB)
  • Computational: GPU recommended for training (CPU training 10-20x slower)
  • Real-time: No live data integration yet (historical data only)

Troubleshooting

Common Issues

Q: No prediction files found
A: Run inference first: python _1_fusionlstm_inference_kriging.py or sbatch _1_inference.sbatch

Q: Model file not found
A: Train the model first (sbatch phoenix_gpu_train.sbatch) or download pre-trained weights from Hugging Face

Q: Weather data file not found
A: Download the 16-station dataset from Hugging Face and place as 2_Data/gatech_16_stations_10_min_long_format_sun_angles.csv (see 2_Data/README.md)

Q: CUDA out of memory
A: Use --device cpu or reduce batch size (default: 1024)

Q: ModuleNotFoundError
A: Install dependencies: uv pip install -r requirements.txt or pip install -r requirements.txt

Q: Visualization too slow
A: Reduce DPI (--dpi 150) or use GPU for inference

Q: Training job pending too long
A: Check queue status with squeue -p gpu-a100. A100 partition may have high demand; consider using A40 partition or CPU training


Research Context

This project is part of the Georgia Tech VIP-SMUR (Vertically Integrated Projects - Sustainable Microclimate Urban Research) initiative, focusing on understanding and predicting microclimate variations in urban environments for improved sustainability and resilience.

Use Cases

  • Urban Planning: Heat island effect analysis and mitigation
  • Climate Research: Fine-grained microclimate pattern studies
  • Smart Campus: Weather-aware systems and applications
  • Education: Teaching tool for ML and spatial weather modeling

Architecture Inspiration

The single-step iterative prediction approach demonstrates superior long-term forecasting performance compared to multi-step architectures by maintaining consistent input-output distribution and avoiding temporal distribution gaps.

Future Work

  • Increase weather station coverage (>3 stations) for robust Kriging ✅ Completed: 16 stations
  • Extend temporal coverage to full year (currently warm season 2015-2019)
  • Add more spatial features (vegetation density, sky view factor)
  • Real-time prediction with live weather data integration
  • Multi-location model generalization (transfer learning to other campuses)
  • Integration with IoT sensor networks for real-time validation
  • Ensemble predictions with uncertainty quantification

Citation

If you use this project in your research, please cite:

@software{gatech_microclimate_2025,
  title = {FusionLSTM: Multi-Scale Fusion LSTM with Attention for Urban Microclimate Prediction},
  author = {Georgia Tech VIP-SMUR Team},
  year = {2025},
  url = {https://github.com/VIP-SMUR/25Fa-Microclimate-ML},
  note = {TFT-inspired architecture with parallel LSTM branches for warm-season microclimate forecasting}
}

License

This project is part of Georgia Tech research. See individual files for specific licensing.

Contributing

This is an active research project. For contributions or questions, please:

  1. Check existing documentation in module READMEs
  2. Review open GitHub Issues
  3. Follow existing code style and structure
  4. Document new features thoroughly

Acknowledgments

  • Georgia Tech VIP Program - Project support and infrastructure
  • PACE Phoenix Cluster - Computational resources
  • PyTorch Team - Deep learning framework
  • OpenStreetMap Contributors - Geospatial data

Project Status: Active Development | Last Updated: November 20, 2024

Pipeline Version: 2.0 (Single-step architecture, SLURM integration, structured outputs)

For detailed usage instructions, see the module-specific READMEs:

  • Training Guide
  • Inference & Visualization Guide
  • Phoenix HPC Tutorial


Team

| Name | Seniority | Major | School | # Semesters | GitHub Handle |
|------|-----------|-------|--------|-------------|---------------|
| Thanasarn Changnawa | PhD | Architecture | ARCH | 3 | Thanasarn-Changnawa |
| Han-Syun Shih | Masters | Architecture (HBP) | ARCH | 3 | Benjaminhansyun |
| Edzel Sutanto | Sophomore | Industrial Engineering | ISYE | 1 | Edzelandika |
| Roshan Cerejo | Junior | Industrial Engineering | ISYE | 1 | rcerejo |
| Vikram Renganathan | Junior | Computer Science | SCS | 1 | viren108 |
| Yupeng Tang | Masters | Computer Science | SCS | 2 | yupengtang |
| Ze Yu Jiang | Junior | Computer Science | SCS | 4 | zeyujiang8800 |