FAMILY_MULTI_SPECTRAL_INDICES - Experiment Log

Experimental Design

Index Calculation Methodology

Data Pipeline: 1. Load Sentinel-2 zarr store for target time periods 2. Apply BRP parcel masks for consumption potato fields 3. Calculate six vegetation indices using real band values 4. Aggregate to weekly temporal resolution 5. Extract statistical features per index

Index Formulas Using Real Bands:

# NDVI - Normalized Difference Vegetation Index
ndvi = (B08 - B04) / (B08 + B04 + 1e-8)

# EVI - Enhanced Vegetation Index  
evi = 2.5 * (B08 - B04) / (B08 + 6*B04 - 7.5*B02 + 1)

# NDWI - Normalized Difference Water Index
ndwi = (B08 - B11) / (B08 + B11 + 1e-8)

# SAVI - Soil Adjusted Vegetation Index
savi = 1.5 * (B08 - B04) / (B08 + B04 + 0.5)

# GNDVI - Green Normalized Difference Vegetation Index
gndvi = (B08 - B03) / (B08 + B03 + 1e-8)

# MSI - Moisture Stress Index
msi = B11 / (B08 + 1e-8)

Anomaly Detection Strategy

Focus on Extreme Weather Years: - 2018: Spring/summer drought (April-August) - 2022: Record heat and drought - Detection method: Z-score > 2.0 from 5-year rolling mean

Anomaly Features:

# Calculate rolling statistics
rolling_mean = index.rolling(window=260, min_periods=52).mean()
rolling_std = index.rolling(window=260, min_periods=52).std()

# Z-score anomalies
z_score = (index - rolling_mean) / (rolling_std + 1e-8)
anomaly_flag = abs(z_score) > 2.0

# Cumulative stress
cumulative_stress = z_score.clip(lower=0).cumsum()

Feature Engineering Approach

Temporal Features per Index: 1. Current values: Raw index values at t 2. Moving averages: 2, 4, 8, 12 week windows 3. Growth rates: Week-over-week change 4. Volatility: Rolling standard deviation 5. Peak timing: Week of maximum value 6. Stress duration: Consecutive weeks below threshold

Index Interaction Features:

# Vegetation-moisture interaction
veg_moisture = ndvi * ndwi

# Stress indicator combining moisture and vegetation
stress_index = msi * (1 - ndvi)

# Enhanced vegetation accounting for moisture
evi_adjusted = evi * (1 + ndwi)

# Ratio features
ndvi_ndwi_ratio = ndvi / (ndwi + 0.1)
evi_savi_ratio = evi / (savi + 0.1)

Model Ensemble Strategy

Variant A - Core Ensemble: - Random Forest with 200 trees - Features: NDVI, NDWI, EVI (3 indices × 5 features = 15 total) - No hyperparameter tuning (baseline performance)

Variant B - Full Suite: - Gradient Boosting with interaction terms - Features: All 6 indices + ratios + interactions (~50 features) - Hyperparameter optimization via RandomizedSearchCV

Variant C - Anomaly Focus: - LightGBM with custom objective for extreme events - Features: Z-scores, cumulative stress, anomaly flags - Weighted training emphasizing 2018/2022 observations

Validation Methodology

Walk-Forward Cross-Validation:

# Proper time series split
train_end = len(data) - 156  # Reserve 3 years for test
for week in range(104, train_end, 4):  # Weekly steps
    train = data[:week]
    val = data[week:week+12]  # 12-week forecast

    # Fit model on train
    # Predict on validation
    # Store results

Baseline Comparison: All four mandatory baselines from experiments/_shared/baselines.py: 1. Persistent (current = future) 2. Seasonal naive (52-week lag) 3. AR(2) autoregressive 4. Naive (last value)

Expected Results

Hypothesis Support Criteria: - Variant A: >5% improvement (moderate support) - Variant B: >8% improvement (strong support) - Variant C: >10% improvement on extreme years (strong support)

Feature Importance Expected: 1. NDWI during drought periods (highest) 2. MSI cumulative stress 3. NDVI growth rate changes 4. EVI peak timing 5. Index interaction terms

Implementation Notes

Data Access Pattern

import xarray as xr
from src.sources.brp.brp_api.brp import BRPApi
from src.sources.boerderij_nl.boerderij_nl_api import BoerderijApi

# Load satellite data (REAL zarr store)
sentinel = xr.open_zarr("lake_31UFU_medium_polder.zarr")

# Get parcel masks (REAL boundaries)
brp = BRPApi()
mask_2018 = brp.get_consumption_potato_mask(
    bbox=(5.5, 52.5, 5.6, 52.6),
    date_range=(date(2018,1,1), date(2018,12,31)),
    grid_shape=(len(sentinel.y), len(sentinel.x)),
    grid_coords=(sentinel.x.values, sentinel.y.values),
    target_crs='EPSG:32631'
)

# Get price data (REAL market prices)
prices = BoerderijApi().get_data(
    product_id="NL.157.2086",
    date_range=("2015-01-01", "2024-12-31"),
    granularity="weekly"
)

Computational Optimizations

Zarr Chunking: Optimize for temporal access patterns
Dask Integration: Parallel index calculation
Caching: Store computed indices to disk
Batch Processing: Process multiple parcels simultaneously

Quality Controls

Cloud Filtering: Use SCL band, require >80% clear pixels
Outlier Detection: Remove index values >3 std from mean
Gap Filling: Linear interpolation for max 2-week gaps
Validation Split: Ensure no data leakage in time series

Results Tracking

Variant A Results

Date: [PENDING]
MAE: [PENDING]
Improvement: [PENDING]
Verdict: [PENDING]

Variant B Results

Date: [PENDING]
MAE: [PENDING]
Improvement: [PENDING]
Verdict: [PENDING]

Variant C Results

Date: [PENDING]
MAE: [PENDING]
Improvement: [PENDING]
Verdict: [PENDING]

Decision Log

HE Notes

2025-08-23: Created hypothesis based on FAMILY_SATELLITE_NDVI_REAL learnings
Multi-index approach addresses single NDVI weakness
Focus on real extreme weather events (2018, 2022)
All data sources verified as REAL through repository interfaces

EX Notes

[To be added after experiment execution]

Next Actions

Implement index calculation pipeline
Run Variant A as baseline
Optimize Variant B with full feature set
Test Variant C on extreme years specifically
Deploy best performer for operational use

FAMILY_MULTI_SPECTRAL_INDICES - Experiment Log

Experimentnotities

FAMILY_MULTI_SPECTRAL_INDICES - Experiment Log

Experimental Design

Index Calculation Methodology

Anomaly Detection Strategy

Feature Engineering Approach

Model Ensemble Strategy

Validation Methodology

Expected Results

Implementation Notes

Data Access Pattern

Computational Optimizations

Quality Controls

Results Tracking

Variant A Results

Variant B Results

Variant C Results

Decision Log

HE Notes

EX Notes

Next Actions

Geen Codex-samenvatting