Outlier Detection in Yield Maps

Improving sub-field yield predictions through automated data harmonization and cleaning.

Overview

Modern combine harvesters record real-time, geo-located yield data during harvesting. However, the resulting yield maps often contain measurement errors caused by sensor noise, positional inaccuracies, and data transmission delays. Cleaning these data is crucial for training accurate machine learning models for sub-field yield prediction.

Challenges

Raw yield maps often contain systematic errors. Left uncorrected, these errors propagate into the training data of machine learning models and reduce their ability to make reliable predictions at the sub-field scale.

Multi-Stage Data Cleaning Pipeline

We developed a multi-stage pipeline for outlier detection and yield data cleaning, consisting of the following stages:

Data Harmonization

Standardized units, translated column headers, and transformed coordinates into a common projection across datasets.
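A minimal sketch of the harmonization step in plain Python, assuming each record arrives as a dictionary mapping column names to values. The header map (including the German column names), the supported source units and the function names are illustrative assumptions, not the schema used in the study. Coordinate reprojection is omitted; in practice a library such as pyproj would handle that part.

```python
# Sketch of the harmonization step (hypothetical schema, not the study's code).

# Hypothetical map from manufacturer- or locale-specific headers to one schema.
HEADER_MAP = {
    "Ertrag": "yield",        # German: yield
    "Feuchte": "moisture",    # German: moisture
    "Breite": "lat",          # German: latitude
    "Laenge": "lon",          # German: longitude
    "Yield": "yield",
    "Moisture": "moisture",
    "Latitude": "lat",
    "Longitude": "lon",
}

KG_PER_LB = 0.45359237       # exact, by definition
HA_PER_ACRE = 0.40468564224  # exact, international acre


def bu_per_acre_to_t_per_ha(value, lb_per_bushel):
    """Convert bushels/acre to tonnes/hectare.

    Bushel weight is crop-specific, e.g. 60 lb for wheat and 56 lb for maize.
    """
    kg_per_acre = value * lb_per_bushel * KG_PER_LB
    return kg_per_acre / HA_PER_ACRE / 1000.0


def harmonize_record(record, yield_unit, lb_per_bushel=None):
    """Rename headers to the common schema and express yield in t/ha."""
    out = {HEADER_MAP.get(key, key): val for key, val in record.items()}
    if yield_unit == "bu/ac":
        if lb_per_bushel is None:
            raise ValueError("bu/ac input needs a crop-specific lb_per_bushel")
        out["yield"] = bu_per_acre_to_t_per_ha(out["yield"], lb_per_bushel)
    elif yield_unit == "kg/ha":
        out["yield"] = out["yield"] / 1000.0
    elif yield_unit != "t/ha":
        raise ValueError(f"unsupported yield unit: {yield_unit}")
    return out


record = {"Ertrag": 60.0, "Feuchte": 14.5}  # illustrative record
print(harmonize_record(record, "bu/ac", lb_per_bushel=60))
```

For example, 1 bu/ac of wheat (60 lb per bushel) is roughly 0.0673 t/ha, so a record reporting 60 bu/ac becomes about 4.04 t/ha after harmonization.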

Regional Thresholding

Applied agronomic domain knowledge to filter out physically unrealistic yield values.
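The thresholding stage can be sketched as a lookup of plausible bounds per region and crop, applied to yields already harmonized to t/ha. The region keys, crops and numeric bounds below are placeholders invented for illustration; the study's actual agronomic thresholds are not given here.

```python
# Hypothetical plausible yield bounds in t/ha per (region, crop).
# Placeholder numbers for illustration only, not the study's thresholds.
PLAUSIBLE_RANGES = {
    ("region_a", "wheat"): (0.5, 12.0),
    ("region_a", "maize"): (1.0, 16.0),
}


def regional_threshold(yields, region, crop, ranges=PLAUSIBLE_RANGES):
    """Split yields into (kept, removed) using the bounds for region and crop."""
    low, high = ranges[(region, crop)]
    kept, removed = [], []
    for y in yields:
        # Values inside the closed interval [low, high] are physically plausible.
        (kept if low <= y <= high else removed).append(y)
    return kept, removed


kept, removed = regional_threshold([0.0, 7.5, 8.1, 45.0], "region_a", "wheat")
print(kept, removed)
```

Returning the removed values separately makes it easy to report how much of each field a given stage discards.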

Statistical Outlier Detection

Used the three-sigma rule and the inter-quartile range (IQR) method to detect anomalous yield values.
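Both statistical rules can be expressed as boolean masks over a field's yield values. This is a sketch using only Python's standard library; the defaults (k = 3 for the three-sigma rule, k = 1.5 for Tukey's IQR fences) are the conventional choices, not parameters confirmed by the study, and the function names are hypothetical.

```python
import statistics


def three_sigma_mask(values, k=3.0):
    """True where a value lies within mean +/- k population standard deviations."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [True] * len(values)  # constant data: nothing to flag
    return [abs(v - mu) <= k * sigma for v in values]


def iqr_mask(values, k=1.5):
    """True where a value lies within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


yields = [7.6, 7.8, 8.0, 8.2, 8.4] * 4 + [30.0]  # illustrative values in t/ha
print(three_sigma_mask(yields)[-1], iqr_mask(yields)[-1])
```

The IQR rule is generally the more robust of the two: a single extreme value inflates the mean and standard deviation used by the three-sigma rule, which can hide that very outlier in small samples, whereas the quartiles barely move.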

Spatio-Temporal Filtering

Applied DBSCAN-based clustering to detect and correct spatial inconsistencies.
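DBSCAN labels points that lack enough neighbours within a radius `eps` as noise, which makes isolated, mispositioned records easy to flag. The sketch below is a naive pure-Python DBSCAN over projected coordinates in metres, assuming harmonization has already reprojected the data. The function names, the choice to cluster on coordinates alone and the decision to simply drop noise points are assumptions, since the page does not describe how flagged points are corrected. A library implementation such as scikit-learn's `DBSCAN` would be the practical choice for full-size yield maps.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Naive O(n^2) DBSCAN with Euclidean distance; min_samples counts the
    point itself. Suitable for a sketch, not for large yield maps.
    """
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_samples:  # j is a core point: expand from it
                queue.extend(j_neigh)
        cluster_id += 1
    return labels


def spatial_filter(points, values, eps, min_samples):
    """Drop records whose position DBSCAN labels as noise."""
    labels = dbscan(points, eps, min_samples)
    return [(p, v) for p, v, lab in zip(points, values, labels) if lab != NOISE]


# Illustrative 5 x 5 grid of records 1 m apart plus one far-off GPS fix.
pts = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(50.0, 50.0)]
print(dbscan(pts, eps=1.5, min_samples=3)[-1])
```

Appending further scaled dimensions to each point (for example the timestamp or the yield value) would extend the same idea toward spatio-temporal consistency, at the cost of choosing a scale for each dimension.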

Key Findings

Our study found that:

Publication

This research was presented at IGARSS 2023, the IEEE International Geoscience and Remote Sensing Symposium.

Want to Know More?

Feel free to reach out for details or collaboration opportunities.