2022 · Machine Learning
Active Learning for Forest Cover Classification
Machine Learning · Active Learning · Random Forest
BITS Pilani | Sep 2022 – Dec 2022
Python · Scikit-Learn · Active Learning · Random Forest · UCI Dataset
Project Overview
Implemented an active learning framework that classifies forest cover types from a reduced set of labeled examples, cutting labeling cost by 30% relative to a passive learning baseline.
Key Contributions
Active Learning Framework: Implemented uncertainty sampling and query-by-committee strategies on the UCI Forest Covertype dataset (581,012 instances, 7 classes)
Efficiency Gains: Reached 94% classification accuracy with only 70% of the training labels used by the passive learning baseline, a 30% reduction in annotation cost
Comparative Analysis: Compared learning curves and query efficiency across strategies; uncertainty sampling outperformed random sampling by 12% at a 50% label budget
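To make the query-by-committee strategy concrete, here is a minimal sketch of one common disagreement measure, vote entropy. It is not the project's actual code: the committee construction (bootstrap-resampled random forests), the helper names `vote_entropy` and `qbc_query`, and the synthetic data from `make_classification` are all assumptions chosen so the example runs without downloading the Covertype dataset. It also uses plain scikit-learn rather than modAL.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample


def vote_entropy(votes, n_classes):
    """Vote entropy per sample; `votes` has shape (n_members, n_samples)."""
    n_members = votes.shape[0]
    entropy = np.zeros(votes.shape[1])
    for c in range(n_classes):
        # Fraction of committee members voting for class c on each sample.
        frac = (votes == c).sum(axis=0) / n_members
        nonzero = frac > 0  # skip zero fractions so log() stays finite
        entropy[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    return entropy


def qbc_query(committee, X_pool, n_classes, batch_size):
    """Return pool indices of the samples the committee disagrees on most."""
    votes = np.stack([member.predict(X_pool) for member in committee])
    scores = vote_entropy(votes, n_classes)
    return np.argsort(scores)[-batch_size:]


# Synthetic stand-in data (assumption: NOT the Covertype dataset).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_lab, y_lab = X[:60], y[:60]   # small labeled set
X_pool = X[60:]                 # unlabeled pool

# Hypothetical committee: five forests, each fit on a bootstrap resample.
committee = []
for seed in range(5):
    X_boot, y_boot = resample(X_lab, y_lab, random_state=seed)
    member = RandomForestClassifier(n_estimators=25, random_state=seed)
    committee.append(member.fit(X_boot, y_boot))

query_idx = qbc_query(committee, X_pool, n_classes=3, batch_size=10)
print("Pool indices to send for labeling:", query_idx)
```

Vote entropy is zero when every member predicts the same class and grows as the votes spread across classes, so the highest-scoring samples are the ones the committee disagrees on most.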
Technologies Used
- Languages: Python
- Libraries: Scikit-Learn, modAL
- Methods: Uncertainty Sampling, Query-by-Committee
- Dataset: UCI Forest Covertype (581K instances)
Key Results
| Metric | Value |
|---|---|
| Dataset Size | 581,012 instances |
| Classes | 7 forest cover types |
| Final Accuracy | 94% |
| Label Reduction | 30% |
| Improvement over Random | +12% at 50% budget |
Active Learning Workflow
1. Initialize with small labeled seed set
2. Train classifier on current labeled data
3. Score unlabeled instances by uncertainty
4. Query most informative samples for labeling
5. Update labeled set and retrain
6. Repeat until the labeling budget is exhausted
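The six steps above can be sketched as a pool-based loop with least-confidence uncertainty sampling. This is an illustrative sketch under stated assumptions, not the project's implementation: the function names, the seed size, batch size and budget values, and the synthetic data are all hypothetical, and the loop is written in plain scikit-learn rather than modAL.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def least_confidence(model, X_pool):
    """Uncertainty score: 1 minus the probability of the predicted class."""
    proba = model.predict_proba(X_pool)
    return 1.0 - proba.max(axis=1)


def active_learning_loop(X_pool, y_pool, X_test, y_test,
                         seed_size=50, batch_size=25, budget=300,
                         random_state=0):
    rng = np.random.default_rng(random_state)
    # Step 1: initialize with a small random labeled seed set.
    labeled = rng.choice(len(X_pool), size=seed_size, replace=False)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    history = []  # (number of labels used, test accuracy) per round

    while True:
        # Step 2: train the classifier on the current labeled data.
        model = RandomForestClassifier(n_estimators=50,
                                       random_state=random_state)
        model.fit(X_pool[labeled], y_pool[labeled])
        history.append((len(labeled), model.score(X_test, y_test)))

        # Step 6: stop once the label budget or the pool is exhausted.
        if len(labeled) >= budget or len(unlabeled) == 0:
            break

        # Step 3: score the unlabeled instances by uncertainty.
        scores = least_confidence(model, X_pool[unlabeled])

        # Step 4: query the most uncertain samples. Here y_pool plays the
        # role of the annotator, since the labels are already known.
        n_query = min(batch_size, budget - len(labeled), len(unlabeled))
        picked = np.argsort(scores)[-n_query:]

        # Step 5: move the queried samples into the labeled set.
        labeled = np.concatenate([labeled, unlabeled[picked]])
        unlabeled = np.delete(unlabeled, picked)

    return model, history


# Synthetic stand-in data (assumption: NOT the Covertype dataset).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=4,
    random_state=0,
)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model, history = active_learning_loop(X_pool, y_pool, X_test, y_test)
for n_labels, accuracy in history:
    print(f"{n_labels:4d} labels -> test accuracy {accuracy:.3f}")
```

The `history` list is the raw material for a learning curve. Replacing the `np.argsort(scores)` selection with a random draw from `unlabeled` would give a random-sampling baseline to plot against it.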
