2022 · Machine Learning

Active Learning for Forest Cover Classification

Machine Learning · Active Learning · Random Forest

BITS Pilani | Sep 2022 – Dec 2022

Python · Scikit-Learn · Active Learning · Random Forest · UCI Dataset

Project Overview

Implemented an active learning framework that classifies forest cover types from a reduced set of labeled data, cutting annotation cost by 30% relative to a passive learning baseline.

Key Contributions

  • Active Learning Framework: Implemented uncertainty sampling and query-by-committee strategies on the UCI Forest Covertype dataset (581,012 instances, 7 classes).
  • Efficiency Gains: Reached 94% classification accuracy with 70% of the training labels used by the passive learning baseline, reducing annotation cost by 30%.
  • Comparative Analysis: Analyzed learning curves and query efficiency, showing that uncertainty sampling outperforms random sampling by 12% at a 50% label budget.
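
For readers unfamiliar with the two query strategies, the sketch below shows one common way to score unlabeled instances. It is an illustration, not the project's code: least-confidence scoring and vote entropy are assumed variants (the write-up does not say which measures were used), and the helper names `least_confidence`, `vote_entropy`, and `top_k` are hypothetical.

```python
import numpy as np


def least_confidence(proba):
    """Uncertainty sampling score: 1 minus the top class probability.

    proba has shape (n_samples, n_classes); higher score = less certain.
    """
    return 1.0 - proba.max(axis=1)


def vote_entropy(votes, n_classes):
    """Query-by-committee score: entropy of the committee's vote distribution.

    votes is an int array of shape (n_members, n_samples), one predicted
    class label per committee member per sample.
    """
    n_members, n_samples = votes.shape
    counts = np.zeros((n_samples, n_classes))
    for c in range(n_classes):
        counts[:, c] = (votes == c).sum(axis=0)
    frac = counts / n_members
    # Treat 0 * log(0) as 0 so unanimous votes get zero entropy.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(frac > 0, frac * np.log(frac), 0.0)
    return -terms.sum(axis=1)


def top_k(scores, k):
    """Indices of the k highest-scoring (most informative) samples."""
    return np.argsort(scores)[::-1][:k]


# Toy inputs (made up for illustration only).
proba = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.60, 0.30, 0.10]])
votes = np.array([[0, 1],
                  [0, 2],
                  [0, 1]])  # 3 committee members, 2 samples

uncertain_first = top_k(least_confidence(proba), 1)
disagreement = vote_entropy(votes, n_classes=3)
```

In the toy data, the second row of `proba` has the flattest distribution, so it is queried first; the committee is unanimous on the first sample and split on the second, so only the second has non-zero vote entropy.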

Technologies Used

  • Languages: Python
  • Libraries: Scikit-Learn, modAL
  • Methods: Uncertainty Sampling, Query-by-Committee
  • Dataset: UCI Forest Covertype (581K instances)

Key Results

Metric                     Value
Dataset Size               581,012 instances
Classes                    7 forest cover types
Final Accuracy             94%
Label Reduction            30%
Improvement over Random    +12% at 50% budget

Active Learning Workflow

1. Initialize with a small labeled seed set
2. Train the classifier on the current labeled data
3. Score unlabeled instances by uncertainty
4. Query the most informative samples for labeling
5. Update the labeled set and retrain
6. Repeat until the labeling budget is exhausted
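
The steps above can be sketched as a pool-based loop in Scikit-Learn. This is a minimal illustration under stated assumptions, not the project's implementation: it uses synthetic 7-class data from `make_classification` in place of the Covertype data (which could be loaded with `sklearn.datasets.fetch_covtype`), and the function name `run_active_learning`, the seed size, batch size, and budget are all hypothetical choices. A `"random"` strategy is included so that learning curves can be compared against a passive baseline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def run_active_learning(X_pool, y_pool, X_test, y_test, strategy="uncertainty",
                        seed_size=50, batch_size=50, budget=500, random_state=0):
    """Return a learning curve as a list of (n_labeled, test_accuracy) pairs."""
    rng = np.random.default_rng(random_state)
    # Step 1: small labeled seed set drawn at random from the pool.
    labeled = rng.choice(len(X_pool), size=seed_size, replace=False)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    history = []
    while True:
        # Step 2: train on the current labeled data.
        model = RandomForestClassifier(n_estimators=50, random_state=random_state)
        model.fit(X_pool[labeled], y_pool[labeled])
        history.append((len(labeled), model.score(X_test, y_test)))
        # Step 6: stop once the label budget is spent.
        if len(labeled) >= budget or len(unlabeled) == 0:
            break
        k = min(batch_size, budget - len(labeled), len(unlabeled))
        if strategy == "uncertainty":
            # Step 3: least-confidence score for every unlabeled instance.
            proba = model.predict_proba(X_pool[unlabeled])
            scores = 1.0 - proba.max(axis=1)
            # Step 4: pick the k most uncertain instances.
            picked = np.argsort(scores)[-k:]
        else:
            # Passive baseline: pick k instances at random.
            picked = rng.choice(len(unlabeled), size=k, replace=False)
        # Step 5: move the queried instances into the labeled set.
        labeled = np.concatenate([labeled, unlabeled[picked]])
        unlabeled = np.delete(unlabeled, picked)
    return history


# Synthetic stand-in data (assumption: 7 classes, like Covertype).
X, y = make_classification(n_samples=3000, n_features=20, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

curve_uncertainty = run_active_learning(X_pool, y_pool, X_test, y_test,
                                        strategy="uncertainty")
curve_random = run_active_learning(X_pool, y_pool, X_test, y_test,
                                   strategy="random")
```

Plotting the two curves against the number of labels gives the kind of learning-curve comparison described under Key Contributions; on synthetic data the gap between strategies may differ from the reported results.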