Unsupervised machine learning for humanitarian needs assessment and visualization
AidMind
Unsupervised machine learning for humanitarian needs assessment at ANY geographic level
AidMind is a production-ready Python tool that helps humanitarian data analysts quickly identify the areas with the highest need for aid using unsupervised machine learning. It works with provinces, districts, villages, refugee camps, neighborhoods, or any custom geographic units. It automatically clusters the units, ranks them by need level, and generates interactive choropleth maps with discrete color-coded need levels.
Fully generalized: Works with any CSV structure and any GeoJSON boundaries.
Features
- Works at ANY geographic level: Provinces, districts, villages, refugee camps, neighborhoods, or any custom zones
- Completely generalized: Works with ANY CSV structure and ANY column names
- Easy to use: Single function call with dataset path
- Flexible inputs: Works with any numeric indicators (any column names accepted)
- Custom boundaries: Use your own GeoJSON for villages or custom units
- Automatic preprocessing: Handles missing values, duplicates, and name variations
- Intelligent clustering: Uses KMeans to identify need patterns across indicators
- Geographic visualization: Generates interactive HTML maps with 4 discrete need levels (high, medium, low, lowest)
- Online or offline: Use GeoBoundaries or custom GeoJSON files
- International ready: Works with any country, any admin level (ADM1, ADM2, ADM3, custom)
- CSV export: Outputs structured data with need scores, ranks, and levels
- Professional logging: Transparent processing with diagnostic information
Installation
Option 1: Pip install (recommended)
pip install aidmind
Option 2: From source
git clone https://github.com/yourorg/aidmind.git
cd aidmind
pip install -r requirements.txt
pip install -e .
Requirements
- Python 3.8+
- pandas >= 2.0
- numpy >= 1.24
- scikit-learn >= 1.3
- folium >= 0.15
- requests >= 2.31
- pycountry >= 22.3.5
- branca >= 0.7
- shapely >= 2.0
Quick Start
Province-level (with GeoBoundaries)
from aidmind import analyze_needs
# Analyze provinces
output = analyze_needs("provinces.csv", "Afghanistan", admin_level="ADM1")
print(f"Map saved to: {output}")
District-level (with GeoBoundaries)
# Analyze districts
output = analyze_needs(
    "districts.csv",
    "Afghanistan",
    admin_level="ADM2",
    admin_col="district",
)
Village-level (with custom boundaries)
# Analyze villages using your own GeoJSON
output = analyze_needs(
    "villages.csv",
    local_geojson="village_boundaries.geojson",
    admin_col="village_name",
)
Any custom geographic unit
# Works with refugee camps, neighborhoods, health zones, etc.
output = analyze_needs(
    "refugee_camps.csv",
    local_geojson="camp_boundaries.geojson",
    admin_col="camp_name",
    fixed_thresholds=(0.25, 0.50, 0.75),  # Optional: fixed thresholds
)
Command line
# Province-level
python -m aidmind provinces.csv "Afghanistan" --admin-level ADM1
# District-level
python -m aidmind districts.csv "Kenya" --admin-level ADM2 --admin-col district
# Village-level with custom boundaries
python -m aidmind villages.csv --geojson villages.geojson --admin-col village_name
See USAGE_EXAMPLES.md for complete documentation with 10+ examples.
Data Requirements
Required
- One geographic unit column: Any column with location names (province, district, village, camp, zone, etc.)
- At least one numeric indicator: Any metric columns with numeric values
Supported formats
- CSV files with UTF-8 encoding
- ANY column names: Tool auto-detects geographic column and uses all numeric columns
- GeoJSON boundaries: Either from GeoBoundaries or your own custom file
Example: Province-level
province,health_index,education_index,income_index,food_security,water_access
Kabul,0.75,0.80,0.70,0.85,0.78
Kandahar,0.45,0.40,0.50,0.35,0.44
Herat,0.60,0.65,0.55,0.60,0.63
Example: Village-level
village_name,health_access,school_access,water_quality,food_availability
Qala-e-Fatullah,0.30,0.25,0.40,0.35
Deh-e-Bagh,0.45,0.40,0.55,0.50
Karez-e-Mir,0.25,0.20,0.35,0.30
Example: Refugee camps
camp_name,shelter,water,sanitation,food,health
Camp Dadaab 1,0.40,0.35,0.30,0.45,0.50
Camp Kakuma,0.55,0.50,0.45,0.60,0.65
Camp Nyarugusu,0.30,0.25,0.20,0.35,0.40
Handling duplicates
If you have multiple records per unit (e.g., Kabul_1, Kabul_2), the tool automatically:
- Strips trailing numeric suffixes
- Aggregates by averaging indicators
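The two steps above can be sketched with pandas. This is an illustration only, not AidMind's actual code: the function name `aggregate_duplicates` is hypothetical, and the suffix pattern (an underscore followed by digits, as in `Kabul_1`) is an assumption based on the example given.

```python
import pandas as pd

def aggregate_duplicates(df: pd.DataFrame, admin_col: str) -> pd.DataFrame:
    """Strip trailing "_<digits>" suffixes and average indicators per unit."""
    out = df.copy()
    # "Kabul_1" -> "Kabul". Assumed pattern: underscore + digits at the end.
    out[admin_col] = out[admin_col].str.replace(r"_\d+$", "", regex=True).str.strip()
    # Average every numeric column within each unit.
    return out.groupby(admin_col, as_index=False).mean(numeric_only=True)

records = pd.DataFrame({
    "province": ["Kabul_1", "Kabul_2", "Herat"],
    "health_index": [0.70, 0.80, 0.60],
})
aggregated = aggregate_duplicates(records, "province")
print(aggregated)
```

Restricting the pattern to an underscore keeps names such as "Camp Dadaab 1" intact; a looser pattern would merge units whose real names end in a number.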
How It Works
1. Preprocessing
- Auto-detects admin column or uses specified admin_col
- Aggregates duplicate admin records by averaging
- Imputes missing numeric values with median
- Standardizes all indicators (zero mean, unit variance)
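As a rough sketch of the imputation and standardization steps, the snippet below fills gaps with each column's median and then z-scores the columns. The function name is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption; AidMind's internals may differ.

```python
import pandas as pd

def impute_and_standardize(df: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    features = df[feature_cols].astype(float)
    # Fill each column's missing values with that column's median.
    features = features.fillna(features.median())
    # Z-score: subtract the mean, divide by the population standard deviation.
    std = features.std(ddof=0).replace(0, 1.0)  # constant columns: avoid dividing by zero
    return (features - features.mean()) / std

raw = pd.DataFrame({
    "health_index": [0.75, 0.45, None, 0.60],
    "water_access": [0.78, 0.44, 0.63, 0.50],
})
scaled = impute_and_standardize(raw, ["health_index", "water_access"])
print(scaled.round(3))
```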
2. Need Assessment
- Computes composite need score (mean of standardized indicators)
- Applies KMeans clustering (3-5 clusters depending on data size)
- Ranks clusters by mean need score
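The scoring and clustering steps might look roughly like the following with scikit-learn. Treat it as a sketch under assumptions: `score_and_cluster` is a hypothetical name, the cluster count is fixed at 3 for the demo, and the indicators are assumed to be oriented so that larger standardized values mean greater need (the documentation does not say how AidMind orients them).

```python
import numpy as np
from sklearn.cluster import KMeans

def score_and_cluster(z: np.ndarray, n_clusters: int = 3, seed: int = 0):
    """z: standardized indicators, shape (n_units, n_indicators)."""
    # Composite score: mean of the standardized indicators for each unit.
    score = z.mean(axis=1)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(z)
    # Order clusters by mean score, highest first, so rank 0 is the top cluster.
    cluster_means = {c: score[labels == c].mean() for c in np.unique(labels)}
    order = sorted(cluster_means, key=cluster_means.get, reverse=True)
    rank_of = {c: r for r, c in enumerate(order)}
    cluster_rank = np.array([rank_of[c] for c in labels])
    return score, cluster_rank

z = np.array([
    [1.2, 1.0], [1.1, 1.3],      # high values
    [0.0, 0.1], [-0.1, 0.0],     # middle values
    [-1.2, -1.1], [-1.0, -1.3],  # low values
])
score, cluster_rank = score_and_cluster(z)
print(cluster_rank)
```

KMeans assigns arbitrary label numbers, which is why the clusters are relabeled by their mean score before being reported.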
3. Name Harmonization
- Normalizes admin names (lowercase, remove special characters)
- Applies fuzzy matching to align with GeoBoundaries names
- Logs match rate and coverage improvements
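One way to picture the harmonization step is with the standard library's `difflib`. The matcher AidMind actually uses is not documented here, so the helper names, the accent stripping, and the 0.75 similarity cutoff below are all assumptions made for illustration.

```python
import difflib
import re
import unicodedata

def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and keep only letters, digits, and single spaces."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def match_names(data_names, boundary_names, cutoff=0.75):
    """Map each dataset name to the closest boundary name, or None if none is close enough."""
    lookup = {normalize_name(b): b for b in boundary_names}
    matches = {}
    for name in data_names:
        close = difflib.get_close_matches(normalize_name(name), list(lookup), n=1, cutoff=cutoff)
        matches[name] = lookup[close[0]] if close else None
    return matches

matches = match_names(["Kandahār", "Hirat", "Atlantis"], ["Kandahar", "Herat", "Kabul"])
match_rate = sum(v is not None for v in matches.values()) / len(matches)
print(matches, f"match rate: {match_rate:.0%}")
```

A lower cutoff raises the match rate but also the risk of pairing two different places with similar names, so unmatched or borderline names are worth reviewing by hand.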
4. Visualization
- Fetches admin boundaries from GeoBoundaries (or uses local file)
- Assigns discrete color levels based on quartiles or fixed thresholds:
  - High (red-700): Top 25% need scores
  - Medium (red-400): 50th-75th percentile
  - Low (green-300): 25th-50th percentile
  - Lowest (green-600): Bottom 25%
- Generates interactive Folium map with tooltips
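The level assignment can be sketched with NumPy as below. The function name and the boundary handling (a score exactly equal to a cut point goes to the higher level) are assumptions, not confirmed AidMind behavior.

```python
import numpy as np

LEVELS = ("lowest", "low", "medium", "high")

def assign_levels(scores, fixed_thresholds=None):
    scores = np.asarray(scores, dtype=float)
    if fixed_thresholds is None:
        # Data-driven cut points: the 25th, 50th, and 75th percentiles.
        cuts = np.quantile(scores, [0.25, 0.50, 0.75])
    else:
        cuts = np.asarray(fixed_thresholds, dtype=float)
    # np.digitize returns 0..3: how many cut points are at or below each score.
    bins = np.digitize(scores, cuts)
    return [LEVELS[b] for b in bins]

by_quartile = assign_levels([0.10, 0.20, 0.35, 0.45, 0.55, 0.65, 0.80, 0.90])
by_fixed = assign_levels([0.10, 0.30, 0.60, 0.90], fixed_thresholds=(0.25, 0.50, 0.75))
print(by_quartile)
print(by_fixed)
```

With quartiles, every dataset gets roughly a quarter of its units in each level; with fixed thresholds, a level means the same score range in every dataset.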
5. Output
- HTML map: output/needs_map_<ISO3>.html
- CSV scores: output/needs_scores_<ISO3>.csv
Outputs
Interactive HTML Map
- Choropleth with 4 discrete color levels
- Hover tooltips showing: Province, Need Score, Need Rank, Level
- Legend with color key
- Highlight on hover
CSV Export
Example needs_scores_AFG.csv:
admin1,need_score,need_rank,cluster,need_level
Kabul,0.142,3,2,lowest
Kandahar,0.856,0,0,high
Herat,0.487,2,1,medium
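A short sketch of how the exported scores could be consumed downstream with pandas. The CSV text is inlined from the rows shown above so the snippet runs on its own; in practice you would pass the path of the generated file to `pd.read_csv`.

```python
import io
import pandas as pd

# Stand-in for the generated file, using the example rows shown above.
csv_text = """admin1,need_score,need_rank,cluster,need_level
Kabul,0.142,3,2,lowest
Kandahar,0.856,0,0,high
Herat,0.487,2,1,medium
"""
scores = pd.read_csv(io.StringIO(csv_text))

# Keep only the units flagged as high need, highest score first.
priority = scores[scores["need_level"] == "high"].sort_values("need_score", ascending=False)
print(priority[["admin1", "need_score"]])
```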
Advanced Usage
Fixed thresholds for cross-country comparison
# Use consistent cutoffs across all countries
output = analyze_needs(
    "country1.csv",
    "Afghanistan",
    fixed_thresholds=(0.25, 0.50, 0.75),
)
Offline mode with local boundaries
# No internet required after initial download
output = analyze_needs(
    "data.csv",
    "Kenya",
    local_geojson="boundaries/kenya_adm1.geojson",
)
ADM2 (district-level) analysis
output = analyze_needs(
    "district_data.csv",
    "Ethiopia",
    admin_level="ADM2",
    admin_col="district",
)
Troubleshooting
Low match rate warning
Problem: WARNING: Low admin name match rate: 45%
Solution:
- Ensure admin names in your dataset match official names in GeoBoundaries
- Check for typos, spelling variations, or extra characters
- Use official admin names from GeoBoundaries
- Or provide a local GeoJSON with matching name properties
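To see which names are failing to match before running the tool, a quick check along these lines may help. It is a sketch, not part of AidMind: the helper name is hypothetical, the feature collection is made up for the demo, and `shapeName` is assumed to be the property holding the unit name (adjust `name_property` for your own GeoJSON).

```python
import json

def unmatched_units(data_names, geojson, name_property="shapeName"):
    """Return dataset names with no case-insensitive exact match among the boundary features."""
    boundary_names = {
        str(feature["properties"].get(name_property, "")).strip().lower()
        for feature in geojson["features"]
    }
    return sorted(n for n in data_names if n.strip().lower() not in boundary_names)

# Inline stand-in for a boundary file loaded with json.load(); geometry omitted for brevity.
boundaries = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"shapeName": "Kabul"}, "geometry": null},
  {"type": "Feature", "properties": {"shapeName": "Hirat"}, "geometry": null}
]}
""")
missing = unmatched_units(["Kabul", "Herat", "kandahar "], boundaries)
print(missing)
```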
No numeric columns found
Problem: ValueError: No numeric feature columns found
Solution:
- Ensure at least one column contains numeric values
- Check for non-numeric characters in indicator columns
- Remove or fix text values in numeric columns
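If indicator columns were read as text, something like the following can clean them before analysis. It is a minimal sketch: the helper name is hypothetical, and stripping only whitespace and percent signs is an assumption about what the stray characters are.

```python
import pandas as pd

def coerce_indicators(df: pd.DataFrame, admin_col: str) -> pd.DataFrame:
    out = df.copy()
    for col in out.columns:
        if col == admin_col:
            continue
        # Drop whitespace and percent signs, then convert; unparseable values become NaN.
        cleaned = out[col].astype(str).str.replace(r"[%\s]", "", regex=True)
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out

messy = pd.DataFrame({
    "province": ["Kabul", "Herat", "Kandahar"],
    "health_index": ["0.75", "n/a", " 0.45 "],
    "water_access": ["78%", "63%", "44%"],
})
fixed = coerce_indicators(messy, "province")
print(fixed.dtypes)
```

Percent values are converted as-is (78% becomes 78.0, not 0.78), so rescale them yourself if your other indicators are on a 0-1 scale.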
Admin column not detected
Problem: ValueError: Could not detect an admin name column
Solution:
- Rename your admin column to province, admin1, region, or state
- Or specify it explicitly: admin_col="your_column_name"
Empty or very small dataset
Problem: WARNING: Dataset has only 2 rows
Solution:
- AidMind requires at least 3 rows for clustering
- For reliable results, use datasets with 10+ admin units
API Reference
analyze_needs()
def analyze_needs(
    dataset_path: str,
    country_name: Optional[str] = None,
    output_html_path: Optional[str] = None,
    *,
    admin_level: Optional[str] = None,
    admin_col: Optional[str] = None,
    local_geojson: Optional[str] = None,
    fixed_thresholds: Optional[Tuple[float, float, float]] = None,
) -> str
Parameters:
- dataset_path (str): Path to CSV file with geographic units and indicators
- country_name (str, optional): Country name (e.g., "Afghanistan", "Kenya"). Required only if using GeoBoundaries. Can be None if providing local_geojson
- output_html_path (str, optional): Custom output path for HTML
- admin_level (str, optional): Admin level ("ADM1", "ADM2", "ADM3", or any custom). Only used with GeoBoundaries
- admin_col (str, optional): Name of geographic unit column (auto-detected if None)
- local_geojson (str, optional): Path to local GeoJSON boundaries. Use this for villages or custom units
- fixed_thresholds (tuple, optional): (q25, q50, q75) for color levels
Returns:
str: Path to generated HTML file
Raises:
- FileNotFoundError: If dataset or local_geojson not found
- ValueError: If invalid inputs, empty dataset, or both country_name and local_geojson missing
Examples:
# Province-level with GeoBoundaries
analyze_needs("provinces.csv", "Afghanistan", admin_level="ADM1")
# District-level with GeoBoundaries
analyze_needs("districts.csv", "Kenya", admin_level="ADM2")
# Village-level with custom boundaries
analyze_needs("villages.csv", local_geojson="villages.geojson")
# Custom zones
analyze_needs("camps.csv", local_geojson="camps.geojson", admin_col="camp_name")
Use Cases
Humanitarian Organizations
- Rapid needs assessment: Identify priority areas for intervention
- Resource allocation: Visualize where aid is most needed
- Monitoring & evaluation: Track changes in need levels over time
- Reporting: Generate maps and data exports for donors
Example Organizations
- UN agencies (UNHCR, UNICEF, WFP)
- International NGOs (MSF, Oxfam, Save the Children)
- National disaster management agencies
- Research institutions studying humanitarian crises
Best Practices
Data Quality
- Use official admin names from GeoBoundaries or national sources
- Include multiple indicators (3-5+) for robust assessment
- Check for outliers and data quality issues before analysis
- Document data sources and collection methodology
Interpretation
- Need scores are relative within the dataset (0-1 scale)
- Clustering is unsupervised: No ground truth labels used
- Combine with qualitative data for complete picture
- Validate results with local experts and stakeholders
Production Deployment
- Use fixed thresholds for consistent cross-country comparison
- Cache boundaries locally for offline or restricted environments
- Version control datasets and track changes over time
- Automate workflows with CI/CD pipelines
Examples
See examples/ directory for:
- basic_usage.ipynb: Step-by-step tutorial
- multi_country.py: Batch processing multiple countries
- custom_config.py: Advanced configuration options
Contributing
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
License
MIT License - see LICENSE file for details.
Citation
If you use AidMind in your research or reports, please cite:
AidMind: Unsupervised Machine Learning for Humanitarian Needs Assessment
Version 1.0.0
https://github.com/yourorg/aidmind
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@aidmind.org
Acknowledgments
- GeoBoundaries: For providing open administrative boundary data
- Humanitarian Data Exchange: For inspiring accessible data tools
- Open-source community: For the amazing libraries this tool builds on
Changelog
See CHANGELOG.md for version history and updates.