Active Learning for Forest Cover Classification

2022 · Machine Learning

Machine Learning, Active Learning, Random Forest

BITS Pilani | Sep 2022 – Dec 2022

Python, Scikit-Learn, Active Learning, Random Forest, UCI Dataset

Project Overview

Implemented an active learning framework that classifies forest cover types from a reduced set of labeled data, cutting annotation cost by 30% relative to a passive learning baseline.

Key Contributions

Active Learning Framework: Implemented uncertainty sampling and query-by-committee strategies on the UCI Forest Covertype dataset (581,012 instances, 7 classes).

Efficiency Gains: Reached 94% classification accuracy with only 70% of the training labels that the passive learning baseline requires, reducing annotation costs by 30%.

Comparative Analysis: Compared learning curves and query efficiency; uncertainty sampling outperformed random sampling by 12% at a 50% label budget.
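
The two query strategies named above can be illustrated with a few scoring functions. This is a minimal sketch using only the standard library, not the project's actual code: the function names are hypothetical, and the project may have used a different uncertainty measure (least confidence, margin, or entropy) or a different committee disagreement score than the vote entropy shown here.

```python
import math
from collections import Counter


def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most probable classes; smaller means more uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy of the predicted distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Query-by-committee disagreement: entropy of the members' predicted labels."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def select_most_uncertain(prob_rows, k):
    """Indices of the k instances with the highest predictive entropy."""
    scores = [entropy(p) for p in prob_rows]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

For uncertainty sampling, each row passed to `select_most_uncertain` would be one unlabeled instance's class probabilities from the current classifier. For query-by-committee, `vote_entropy` would be applied to the labels that each committee member predicts for the same instance.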

Technologies Used

Languages: Python
Libraries: Scikit-Learn, modAL
Methods: Uncertainty Sampling, Query-by-Committee
Dataset: UCI Forest Covertype (581K instances)

Key Results

Metric	Value
Dataset Size	581,012 instances
Classes	7 forest cover types
Final Accuracy	94%
Label Reduction	30%
Improvement over Random	+12% at 50% budget

Active Learning Workflow

1. Initialize with a small labeled seed set
2. Train the classifier on the current labeled data
3. Score unlabeled instances by uncertainty
4. Query the most informative samples for labeling
5. Update the labeled set and retrain
6. Repeat until the budget is exhausted
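
The workflow above can be sketched as a pool-based loop with scikit-learn. This is an illustration under stated assumptions rather than the project's implementation: it uses a small synthetic dataset from `make_classification` in place of Covertype, least-confidence scoring as the uncertainty measure, and hypothetical values for `seed_size`, `batch_size`, and `budget`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def run_active_learning(X_pool, y_pool, X_test, y_test,
                        seed_size=50, batch_size=50, budget=300, random_state=0):
    """Pool-based active learning; returns (n_labels, test_accuracy) per round."""
    rng = np.random.default_rng(random_state)
    # Step 1: small random labeled seed set; everything else is the unlabeled pool.
    labeled = rng.choice(len(X_pool), size=seed_size, replace=False)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    history = []
    while True:
        # Step 2: train on the current labeled data.
        model = RandomForestClassifier(n_estimators=50, random_state=random_state)
        model.fit(X_pool[labeled], y_pool[labeled])
        history.append((len(labeled), model.score(X_test, y_test)))
        # Step 6: stop once the label budget is spent.
        if len(labeled) >= budget or len(unlabeled) == 0:
            break
        # Step 3: least-confidence uncertainty for each unlabeled instance.
        proba = model.predict_proba(X_pool[unlabeled])
        uncertainty = 1.0 - proba.max(axis=1)
        # Step 4: query the k most uncertain instances (labels are read from
        # y_pool here, standing in for a human annotator).
        k = min(batch_size, budget - len(labeled), len(unlabeled))
        picked = unlabeled[np.argsort(uncertainty)[-k:]]
        # Step 5: move the queried instances into the labeled set.
        labeled = np.concatenate([labeled, picked])
        unlabeled = np.setdiff1d(unlabeled, picked)
    return history


# Synthetic stand-in data (an assumption; the project used UCI Forest Covertype).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)
history = run_active_learning(X_pool, y_pool, X_test, y_test)
for n_labels, acc in history:
    print(f"{n_labels:4d} labels -> test accuracy {acc:.3f}")
```

Replacing the uncertainty-based pick with a random draw from `unlabeled` gives the random-sampling baseline, so the two `history` lists can be compared as learning curves.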

Namit Shrivastava
