Unsupervised machine learning for humanitarian needs assessment and visualization

AidMind

Unsupervised machine learning for humanitarian needs assessment at ANY geographic level

AidMind is a production-ready Python tool that helps humanitarian data analysts quickly identify the areas with the highest need for aid using unsupervised machine learning. It works with provinces, districts, villages, refugee camps, neighborhoods, or any custom geographic units. It automatically clusters the units, ranks them by need level, and generates interactive choropleth maps with discrete color-coded need levels.

Fully generalized: Works with any CSV structure and any GeoJSON boundaries.


Features

  • Works at ANY geographic level: Provinces, districts, villages, refugee camps, neighborhoods, or any custom zones
  • Completely generalized: Works with ANY CSV structure and ANY column names
  • Easy to use: Single function call with dataset path
  • Flexible inputs: Works with any numeric indicators (any column names accepted)
  • Custom boundaries: Use your own GeoJSON for villages or custom units
  • Automatic preprocessing: Handles missing values, duplicates, and name variations
  • Intelligent clustering: Uses KMeans to identify need patterns across indicators
  • Geographic visualization: Generates interactive HTML maps with 4 discrete need levels (high, medium, low, lowest)
  • Online or offline: Use GeoBoundaries or custom GeoJSON files
  • International ready: Works with any country, any admin level (ADM1, ADM2, ADM3, custom)
  • CSV export: Outputs structured data with need scores, ranks, and levels
  • Professional logging: Transparent processing with diagnostic information

Installation

Option 1: Pip install (recommended)

pip install aidmind

Option 2: From source

git clone https://github.com/yourorg/aidmind.git
cd aidmind
pip install -r requirements.txt
pip install -e .

Requirements

  • Python 3.8+
  • pandas >= 2.0
  • numpy >= 1.24
  • scikit-learn >= 1.3
  • folium >= 0.15
  • requests >= 2.31
  • pycountry >= 22.3.5
  • branca >= 0.7
  • shapely >= 2.0

Quick Start

Province-level (with GeoBoundaries)

from aidmind import analyze_needs

# Analyze provinces
output = analyze_needs("provinces.csv", "Afghanistan", admin_level="ADM1")
print(f"Map saved to: {output}")

District-level (with GeoBoundaries)

# Analyze districts
output = analyze_needs(
    "districts.csv",
    "Afghanistan",
    admin_level="ADM2",
    admin_col="district"
)

Village-level (with custom boundaries)

# Analyze villages using your own GeoJSON
output = analyze_needs(
    "villages.csv",
    local_geojson="village_boundaries.geojson",
    admin_col="village_name"
)

Any custom geographic unit

# Works with refugee camps, neighborhoods, health zones, etc.
output = analyze_needs(
    "refugee_camps.csv",
    local_geojson="camp_boundaries.geojson",
    admin_col="camp_name",
    fixed_thresholds=(0.25, 0.50, 0.75)  # Optional: fixed thresholds
)

Command line

# Province-level
python -m aidmind provinces.csv "Afghanistan" --admin-level ADM1

# District-level
python -m aidmind districts.csv "Kenya" --admin-level ADM2 --admin-col district

# Village-level with custom boundaries
python -m aidmind villages.csv --geojson villages.geojson --admin-col village_name

See USAGE_EXAMPLES.md for complete documentation with 10+ examples.


Data Requirements

Required

  • One geographic unit column: Any column with location names (province, district, village, camp, zone, etc.)
  • At least one numeric indicator: Any metric columns with numeric values

Supported formats

  • CSV files with UTF-8 encoding
  • ANY column names: Tool auto-detects geographic column and uses all numeric columns
  • GeoJSON boundaries: Either from GeoBoundaries or your own custom file

Example: Province-level

province,health_index,education_index,income_index,food_security,water_access
Kabul,0.75,0.80,0.70,0.85,0.78
Kandahar,0.45,0.40,0.50,0.35,0.44
Herat,0.60,0.65,0.55,0.60,0.63

Example: Village-level

village_name,health_access,school_access,water_quality,food_availability
Qala-e-Fatullah,0.30,0.25,0.40,0.35
Deh-e-Bagh,0.45,0.40,0.55,0.50
Karez-e-Mir,0.25,0.20,0.35,0.30

Example: Refugee camps

camp_name,shelter,water,sanitation,food,health
Camp Dadaab 1,0.40,0.35,0.30,0.45,0.50
Camp Kakuma,0.55,0.50,0.45,0.60,0.65
Camp Nyarugusu,0.30,0.25,0.20,0.35,0.40

Handling duplicates

If you have multiple records per unit (e.g., Kabul_1, Kabul_2), the tool automatically:

  • Strips trailing numeric suffixes
  • Aggregates by averaging indicators

How It Works

1. Preprocessing

  • Auto-detects admin column or uses specified admin_col
  • Aggregates duplicate admin records by averaging
  • Imputes missing numeric values with median
  • Standardizes all indicators (zero mean, unit variance)
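
For a single indicator column, median imputation followed by standardization might look like the sketch below. It uses only the standard library and assumes the population standard deviation (the convention scikit-learn's StandardScaler follows); the function name impute_and_standardize is hypothetical and AidMind's internals may differ.

```python
from statistics import mean, median, pstdev


def impute_and_standardize(values):
    """Fill None with the column median, then scale to zero mean and unit variance."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]

    mu = mean(filled)
    sigma = pstdev(filled)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in filled]
    return [(v - mu) / sigma for v in filled]


health_index = [0.75, 0.45, None, 0.60]  # None marks a missing value
z_scores = impute_and_standardize(health_index)
```

The missing entry is filled with the median of the observed values (0.60) before scaling, so it ends up at the column mean after standardization.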

2. Need Assessment

  • Computes composite need score (mean of standardized indicators)
  • Applies KMeans clustering (3-5 clusters depending on data size)
  • Ranks clusters by mean need score
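
The scoring and ranking steps can be illustrated as follows. This sketch takes cluster labels as given (for example from KMeans) rather than fitting a model, and it assumes rank 0 means highest need, as the example CSV export suggests. The function names and the sample units A to D are hypothetical, and any indicator inversion or rescaling AidMind applies is not shown.

```python
from collections import defaultdict
from statistics import mean


def composite_scores(standardized):
    """Map each unit to the mean of its standardized indicator values."""
    return {unit: mean(values) for unit, values in standardized.items()}


def rank_clusters(scores, labels):
    """Order cluster ids by mean composite score, highest first (rank 0)."""
    members = defaultdict(list)
    for unit, cluster in labels.items():
        members[cluster].append(scores[unit])
    ordered = sorted(members, key=lambda c: mean(members[c]), reverse=True)
    return {cluster: rank for rank, cluster in enumerate(ordered)}


# Hypothetical standardized indicators for four units.
standardized = {
    "A": [1.2, 0.8, 1.0],
    "B": [1.0, 1.1, 0.9],
    "C": [-0.2, 0.1, 0.1],
    "D": [-1.1, -0.9, -1.0],
}
# Hypothetical cluster assignments, e.g. the labels a KMeans fit would return.
labels = {"A": 2, "B": 2, "C": 0, "D": 1}

scores = composite_scores(standardized)
cluster_rank = rank_clusters(scores, labels)
```

Cluster ids from KMeans are arbitrary, which is why ranking by mean score is needed before the clusters can be read as need levels.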

3. Name Harmonization

  • Normalizes admin names (lowercase, remove special characters)
  • Applies fuzzy matching to align with GeoBoundaries names
  • Logs match rate and coverage improvements
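
A rough picture of normalization plus fuzzy matching, using difflib from the standard library. The matcher AidMind actually uses, and its similarity cutoff, are not specified here; normalize_name, match_names, and the 0.8 cutoff are assumptions made for illustration.

```python
import re
from difflib import get_close_matches


def normalize_name(name):
    """Lowercase, turn runs of non-alphanumeric characters into spaces, trim."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def match_names(dataset_names, boundary_names, cutoff=0.8):
    """Return ({dataset name: boundary name or None}, match rate)."""
    lookup = {normalize_name(b): b for b in boundary_names}
    matches = {}
    for name in dataset_names:
        key = normalize_name(name)
        if key in lookup:
            # Exact match after normalization.
            matches[name] = lookup[key]
            continue
        # Otherwise fall back to the single closest boundary name, if any.
        close = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        matches[name] = lookup[close[0]] if close else None

    matched = sum(1 for value in matches.values() if value is not None)
    rate = matched / len(matches) if matches else 0.0
    return matches, rate


boundary_names = ["Kabul", "Kandahar", "Herat"]
dataset_names = ["KABUL ", "Kandahr", "Herat-", "Atlantis"]
matches, rate = match_names(dataset_names, boundary_names)
```

In this sample, three of the four dataset names resolve to a boundary name and "Atlantis" stays unmatched, giving a match rate of 0.75.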

4. Visualization

  • Fetches admin boundaries from GeoBoundaries (or uses local file)
  • Assigns discrete color levels based on quartiles or fixed thresholds:
    • High (red-700): Top 25% need scores
    • Medium (red-400): 50th-75th percentile
    • Low (green-300): 25th-50th percentile
    • Lowest (green-600): Bottom 25%
  • Generates interactive Folium map with tooltips
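
The level assignment can be sketched as below. The cutoffs come either from the data's own quartiles or from a fixed_thresholds tuple. The function name assign_levels, the inclusive quantile method, and the strict greater-than comparison at each cutoff are assumptions, not confirmed details of AidMind.

```python
from statistics import quantiles

LEVELS = ("lowest", "low", "medium", "high")


def assign_levels(scores, fixed_thresholds=None):
    """Label each score using three ascending cutoffs (data quartiles by default)."""
    if fixed_thresholds is None:
        # Needs at least two scores; the interpolation method is an assumption.
        q25, q50, q75 = quantiles(scores.values(), n=4, method="inclusive")
    else:
        q25, q50, q75 = fixed_thresholds

    levels = {}
    for unit, score in scores.items():
        # Count how many cutoffs the score exceeds: 0 -> lowest ... 3 -> high.
        index = (score > q25) + (score > q50) + (score > q75)
        levels[unit] = LEVELS[index]
    return levels


scores = {"A": 0.10, "B": 0.30, "C": 0.60, "D": 0.90}  # hypothetical need scores
by_quartile = assign_levels(scores)
by_fixed = assign_levels(scores, fixed_thresholds=(0.25, 0.50, 0.75))
```

With quartiles, every dataset is split into four roughly equal groups regardless of how severe conditions are overall. Fixed thresholds keep the same score mapped to the same level across datasets, which is why they suit cross-country comparison.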

5. Output

  • HTML map: output/needs_map_<ISO3>.html
  • CSV scores: output/needs_scores_<ISO3>.csv

Outputs

Interactive HTML Map

  • Choropleth with 4 discrete color levels
  • Hover tooltips showing: Province, Need Score, Need Rank, Level
  • Legend with color key
  • Highlight on hover

CSV Export

Example needs_scores_AFG.csv:

admin1,need_score,need_rank,cluster,need_level
Kabul,0.142,3,2,lowest
Kandahar,0.856,0,0,high
Herat,0.487,2,1,medium
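
The export is a plain CSV, so it can be post-processed with any tool. As a small sketch, the snippet below reads an in-memory copy of the sample above with the standard csv module and lists the units at the "high" or "medium" level, ordered by need score. In practice you would open the file under output/ instead of using io.StringIO.

```python
import csv
import io

# In-memory copy of the sample export shown above.
export = io.StringIO(
    "admin1,need_score,need_rank,cluster,need_level\n"
    "Kabul,0.142,3,2,lowest\n"
    "Kandahar,0.856,0,0,high\n"
    "Herat,0.487,2,1,medium\n"
)

rows = list(csv.DictReader(export))

# Keep the two highest need levels and sort by score, highest first.
priority = sorted(
    (row for row in rows if row["need_level"] in {"high", "medium"}),
    key=lambda row: float(row["need_score"]),
    reverse=True,
)
priority_names = [row["admin1"] for row in priority]
```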

Advanced Usage

Fixed thresholds for cross-country comparison

from aidmind import analyze_needs

# Use consistent cutoffs across all countries
output = analyze_needs(
    "country1.csv",
    "Afghanistan",
    fixed_thresholds=(0.25, 0.50, 0.75)
)

Offline mode with local boundaries

# No internet required after initial download
output = analyze_needs(
    "data.csv",
    "Kenya",
    local_geojson="boundaries/kenya_adm1.geojson"
)

ADM2 (district-level) analysis

output = analyze_needs(
    "district_data.csv",
    "Ethiopia",
    admin_level="ADM2",
    admin_col="district"
)

Troubleshooting

Low match rate warning

Problem: WARNING: Low admin name match rate: 45%

Solution:

  • Ensure admin names in your dataset match official names in GeoBoundaries
  • Check for typos, spelling variations, or extra characters
  • Use official admin names from GeoBoundaries
  • Or provide a local GeoJSON with matching name properties
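
Before running an analysis, it can help to list which dataset names have no counterpart in the boundary file. The sketch below does a case-insensitive exact comparison against a GeoJSON name property. The property name "shapeName" is an assumption (it is the field GeoBoundaries files typically use; a custom GeoJSON may use another), and the helper names are hypothetical.

```python
import json


def boundary_names(geojson, name_property="shapeName"):
    """Collect the name property from every feature that has one."""
    names = set()
    for feature in geojson.get("features", []):
        properties = feature.get("properties") or {}  # properties may be null
        if name_property in properties:
            names.add(properties[name_property])
    return names


def unmatched_units(dataset_names, geojson, name_property="shapeName"):
    """Return dataset names with no case-insensitive exact match in the boundaries."""
    known = {n.strip().lower() for n in boundary_names(geojson, name_property)}
    return sorted(n for n in dataset_names if n.strip().lower() not in known)


# Minimal in-memory boundary file; normally loaded with json.load() from disk.
geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"shapeName": "Kabul"}, "geometry": null},
    {"type": "Feature", "properties": {"shapeName": "Kandahar"}, "geometry": null},
    {"type": "Feature", "properties": {"shapeName": "Herat"}, "geometry": null}
  ]
}
""")

missing = unmatched_units(["Kabul", "Kandahr", "herat "], geojson)
```

Only the misspelled "Kandahr" is reported, since case and surrounding whitespace are ignored; names in the returned list are the ones to correct in the CSV.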

No numeric columns found

Problem: ValueError: No numeric feature columns found

Solution:

  • Ensure at least one column contains numeric values
  • Check for non-numeric characters in indicator columns
  • Remove or fix text values in numeric columns
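
To locate the offending cells, a quick scan such as the following can be run on the CSV first. It is a standalone diagnostic sketch using the standard library, not part of AidMind; the function name non_numeric_cells is hypothetical, and blank cells are skipped because they are treated as missing values.

```python
import csv
import io


def non_numeric_cells(csv_text, unit_col):
    """List (line number, column, value) for non-blank cells float() cannot parse."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        for col, value in row.items():
            # Skip the unit column, blanks, and malformed extra/missing fields.
            if col == unit_col or not isinstance(value, str) or not value.strip():
                continue
            try:
                float(value)
            except ValueError:
                problems.append((line_no, col, value))
    return problems


sample = (
    "province,health_index,water_access\n"
    "Kabul,0.75,78%\n"
    "Kandahar,n/a,0.44\n"
    "Herat,0.60,\n"
)
problems = non_numeric_cells(sample, unit_col="province")
```

In this sample the scan flags "78%" on line 2 and "n/a" on line 3, while the blank water_access cell for Herat is ignored.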

Admin column not detected

Problem: ValueError: Could not detect an admin name column

Solution:

  • Rename your admin column to: province, admin1, region, or state
  • Or specify it explicitly: admin_col="your_column_name"
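
To check ahead of time whether a header would need an explicit admin_col, a simple lookup against the column names listed above is enough. This is only a sketch: find_admin_column is a hypothetical helper, and AidMind's real auto-detection may accept more names or use other heuristics.

```python
# Column names taken from the list above; the real detector may know more.
CANDIDATES = ("province", "admin1", "region", "state")


def find_admin_column(header, candidates=CANDIDATES):
    """Return the first header whose lowercase form is a known candidate, else None."""
    for col in header:
        if col.strip().lower() in candidates:
            return col
    return None


detected = find_admin_column(["Province", "health_index", "water_access"])
undetected = find_admin_column(["camp_name", "shelter", "water"])
```

When the result is None, as for the camp_name header here, pass the column explicitly with admin_col.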

Empty or very small dataset

Problem: WARNING: Dataset has only 2 rows

Solution:

  • AidMind requires at least 3 rows for clustering
  • For reliable results, use datasets with 10+ admin units

API Reference

analyze_needs()

def analyze_needs(
    dataset_path: str,
    country_name: Optional[str] = None,
    output_html_path: Optional[str] = None,
    *,
    admin_level: Optional[str] = None,
    admin_col: Optional[str] = None,
    local_geojson: Optional[str] = None,
    fixed_thresholds: Optional[Tuple[float, float, float]] = None,
) -> str

Parameters:

  • dataset_path (str): Path to CSV file with geographic units and indicators
  • country_name (str, optional): Country name (e.g., "Afghanistan", "Kenya"). Required only when boundaries come from GeoBoundaries; can be None if local_geojson is provided
  • output_html_path (str, optional): Custom output path for HTML
  • admin_level (str, optional): Admin level ("ADM1", "ADM2", "ADM3", or any custom). Only used with GeoBoundaries
  • admin_col (str, optional): Name of geographic unit column (auto-detected if None)
  • local_geojson (str, optional): Path to local GeoJSON boundaries. Use this for villages or custom units
  • fixed_thresholds (tuple, optional): Three need-score cutoffs (q25, q50, q75) that separate the four color levels. If None, quartiles of the dataset are used

Returns:

  • str: Path to generated HTML file

Raises:

  • FileNotFoundError: If dataset or local_geojson not found
  • ValueError: If inputs are invalid, the dataset is empty, or both country_name and local_geojson are missing

Examples:

from aidmind import analyze_needs

# Province-level with GeoBoundaries
analyze_needs("provinces.csv", "Afghanistan", admin_level="ADM1")

# District-level with GeoBoundaries
analyze_needs("districts.csv", "Kenya", admin_level="ADM2")

# Village-level with custom boundaries
analyze_needs("villages.csv", local_geojson="villages.geojson")

# Custom zones
analyze_needs("camps.csv", local_geojson="camps.geojson", admin_col="camp_name")

Use Cases

Humanitarian Organizations

  • Rapid needs assessment: Identify priority areas for intervention
  • Resource allocation: Visualize where aid is most needed
  • Monitoring & evaluation: Track changes in need levels over time
  • Reporting: Generate maps and data exports for donors

Example Organizations

  • UN agencies (UNHCR, UNICEF, WFP)
  • International NGOs (MSF, Oxfam, Save the Children)
  • National disaster management agencies
  • Research institutions studying humanitarian crises

Best Practices

Data Quality

  1. Use official admin names from GeoBoundaries or national sources
  2. Include multiple indicators (3-5+) for robust assessment
  3. Check for outliers and data quality issues before analysis
  4. Document data sources and collection methodology

Interpretation

  1. Need scores are relative within the dataset (0-1 scale)
  2. Clustering is unsupervised: No ground truth labels used
  3. Combine with qualitative data for complete picture
  4. Validate results with local experts and stakeholders

Production Deployment

  1. Use fixed thresholds for consistent cross-country comparison
  2. Cache boundaries locally for offline or restricted environments
  3. Version control datasets and track changes over time
  4. Automate workflows with CI/CD pipelines

Examples

See examples/ directory for:

  • basic_usage.ipynb: Step-by-step tutorial
  • multi_country.py: Batch processing multiple countries
  • custom_config.py: Advanced configuration options

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

See CONTRIBUTING.md for detailed guidelines.


License

MIT License - see LICENSE file for details.


Citation

If you use AidMind in your research or reports, please cite:

AidMind: Unsupervised Machine Learning for Humanitarian Needs Assessment
Version 1.0.0
https://github.com/yourorg/aidmind

Acknowledgments

  • GeoBoundaries: For providing open administrative boundary data
  • Humanitarian Data Exchange: For inspiring accessible data tools
  • Open-source community: For the amazing libraries this tool builds on

Changelog

See CHANGELOG.md for version history and updates.
