Unsupervised machine learning for humanitarian needs assessment and visualization

AidMind

Unsupervised machine learning for humanitarian needs assessment at ANY geographic level

AidMind is a production-ready Python tool that helps humanitarian data analysts quickly identify the areas with the highest need for aid using unsupervised machine learning. It works with provinces, districts, villages, refugee camps, neighborhoods, or any custom geographic units. It automatically clusters the units, ranks them by need level, and generates interactive choropleth maps with discrete color-coded need levels.

Fully generalized: Works with any CSV that contains one geographic unit column and at least one numeric indicator, and with any GeoJSON boundaries.

Features

Works at ANY geographic level: Provinces, districts, villages, refugee camps, neighborhoods, or any custom zones
Completely generalized: Works with ANY CSV structure and ANY column names
Easy to use: Single function call with dataset path
Flexible inputs: Works with any numeric indicators (any column names accepted)
Custom boundaries: Use your own GeoJSON for villages or custom units
Automatic preprocessing: Handles missing values, duplicates, and name variations
Intelligent clustering: Uses KMeans to identify need patterns across indicators
Geographic visualization: Generates interactive HTML maps with 4 discrete need levels (high, medium, low, lowest)
Online or offline: Use GeoBoundaries or custom GeoJSON files
International ready: Works with any country, any admin level (ADM1, ADM2, ADM3, custom)
CSV export: Outputs structured data with need scores, ranks, and levels
Professional logging: Transparent processing with diagnostic information

Installation

Option 1: Pip install (recommended)

pip install aidmind

Option 2: From source

git clone https://github.com/yourorg/aidmind.git
cd aidmind
pip install -r requirements.txt
pip install -e .

Requirements

Python 3.8+
pandas >= 2.0
numpy >= 1.24
scikit-learn >= 1.3
folium >= 0.15
requests >= 2.31
pycountry >= 22.3.5
branca >= 0.7
shapely >= 2.0

Quick Start

Province-level (with GeoBoundaries)

from aidmind import analyze_needs

# Analyze provinces
output = analyze_needs("provinces.csv", "Afghanistan", admin_level="ADM1")
print(f"Map saved to: {output}")

District-level (with GeoBoundaries)

# Analyze districts
output = analyze_needs(
    "districts.csv",
    "Afghanistan",
    admin_level="ADM2",
    admin_col="district"
)

Village-level (with custom boundaries)

# Analyze villages using your own GeoJSON
output = analyze_needs(
    "villages.csv",
    local_geojson="village_boundaries.geojson",
    admin_col="village_name"
)

Any custom geographic unit

# Works with refugee camps, neighborhoods, health zones, etc.
output = analyze_needs(
    "refugee_camps.csv",
    local_geojson="camp_boundaries.geojson",
    admin_col="camp_name",
    fixed_thresholds=(0.25, 0.50, 0.75)  # Optional: fixed thresholds
)

Command line

# Province-level
python -m aidmind provinces.csv "Afghanistan" --admin-level ADM1

# District-level
python -m aidmind districts.csv "Kenya" --admin-level ADM2 --admin-col district

# Village-level with custom boundaries
python -m aidmind villages.csv --geojson villages.geojson --admin-col village_name

See USAGE_EXAMPLES.md for complete documentation with 10+ examples.

Data Requirements

Required

One geographic unit column: Any column with location names (province, district, village, camp, zone, etc.)
At least one numeric indicator: Any metric columns with numeric values

Supported formats

CSV files with UTF-8 encoding
ANY column names: Tool auto-detects geographic column and uses all numeric columns
GeoJSON boundaries: Either from GeoBoundaries or your own custom file

Example: Province-level

province,health_index,education_index,income_index,food_security,water_access
Kabul,0.75,0.80,0.70,0.85,0.78
Kandahar,0.45,0.40,0.50,0.35,0.44
Herat,0.60,0.65,0.55,0.60,0.63

Example: Village-level

village_name,health_access,school_access,water_quality,food_availability
Qala-e-Fatullah,0.30,0.25,0.40,0.35
Deh-e-Bagh,0.45,0.40,0.55,0.50
Karez-e-Mir,0.25,0.20,0.35,0.30

Example: Refugee camps

camp_name,shelter,water,sanitation,food,health
Camp Dadaab 1,0.40,0.35,0.30,0.45,0.50
Camp Kakuma,0.55,0.50,0.45,0.60,0.65
Camp Nyarugusu,0.30,0.25,0.20,0.35,0.40

Handling duplicates

If you have multiple records per unit (e.g., Kabul_1, Kabul_2), the tool automatically:

Strips trailing numeric suffixes
Aggregates by averaging indicators

How It Works

1. Preprocessing

Auto-detects admin column or uses specified admin_col
Aggregates duplicate admin records by averaging
Imputes missing numeric values with median
Standardizes all indicators (zero mean, unit variance)
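The imputation and standardization steps can be sketched with pandas as follows. This is an illustration under assumptions, not the package's implementation: `impute_and_standardize` is a hypothetical name, the standard deviation is the population one (ddof=0, the convention scikit-learn's StandardScaler uses), and constant columns are left at zero rather than divided by zero.

```python
import pandas as pd


def impute_and_standardize(df: pd.DataFrame, admin_col: str) -> pd.DataFrame:
    """Median-impute numeric indicators, then scale to zero mean, unit variance."""
    indicators = df.drop(columns=[admin_col]).select_dtypes(include="number")
    # Fill each missing value with the median of its own column.
    filled = indicators.fillna(indicators.median())
    # Guard against constant columns, whose standard deviation is zero.
    std = filled.std(ddof=0).replace(0, 1.0)
    z = (filled - filled.mean()) / std
    return pd.concat([df[[admin_col]], z], axis=1)


df = pd.DataFrame(
    {
        "province": ["Kabul", "Kandahar", "Herat", "Balkh"],
        "health_index": [0.75, None, 0.45, 0.60],
        "water_access": [0.78, 0.44, 0.63, 0.55],
    }
)
prepared = impute_and_standardize(df, "province")
print(prepared.round(3))
```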

2. Need Assessment

Computes composite need score (mean of standardized indicators)
Applies KMeans clustering (3-5 clusters depending on data size)
Ranks clusters by mean need score
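A rough sketch of these three steps using scikit-learn's KMeans. Several details are assumptions, because this README does not spell them out: the sign is flipped so that higher indicator values mean lower need, the composite is min-max scaled to 0-1, the cluster count is fixed at 3, and rank 0 is the highest-need cluster. `score_and_cluster` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans


def score_and_cluster(z: np.ndarray, n_clusters: int = 3, seed: int = 0):
    """Return a 0-1 need score per unit and a need rank per unit's cluster."""
    # Assumption: indicators measure well-being, so low values mean high need.
    raw = -z.mean(axis=1)
    span = raw.max() - raw.min()
    score = (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)

    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(z)

    # Rank clusters by mean need score; rank 0 is the neediest cluster.
    cluster_means = {c: score[labels == c].mean() for c in np.unique(labels)}
    order = sorted(cluster_means, key=cluster_means.get, reverse=True)
    rank_of = {c: rank for rank, c in enumerate(order)}
    cluster_rank = np.array([rank_of[c] for c in labels])
    return score, cluster_rank


# Standardized indicators for six units forming three well-separated groups.
z = np.array(
    [[-1.2, -1.1], [-1.0, -1.3], [0.0, 0.1], [0.1, -0.1], [1.1, 1.2], [1.2, 1.0]]
)
score, cluster_rank = score_and_cluster(z)
print(score.round(2), cluster_rank)
```

KMeans label numbers are arbitrary, which is why the sketch reorders clusters by their mean score instead of using the raw labels.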

3. Name Harmonization

Normalizes admin names (lowercase, remove special characters)
Applies fuzzy matching to align with GeoBoundaries names
Logs match rate and coverage improvements
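One way to picture normalization plus fuzzy matching, using only Python's standard library (`difflib`). The function names, the 0.8 similarity cutoff, and the exact normalization rule are assumptions for illustration; the matcher AidMind uses may differ.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase and collapse anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def harmonize(dataset_names, boundary_names, cutoff=0.8):
    """Map each dataset name to its closest boundary name, or None."""
    lookup = {normalize(b): b for b in boundary_names}
    matches = {}
    for name in dataset_names:
        close = difflib.get_close_matches(
            normalize(name), list(lookup), n=1, cutoff=cutoff
        )
        matches[name] = lookup[close[0]] if close else None
    return matches


matches = harmonize(
    ["Qandahar", "HERAT", "Atlantis"],
    ["Kandahar", "Herat", "Kabul"],
)
match_rate = sum(v is not None for v in matches.values()) / len(matches)
print(matches, f"match rate: {match_rate:.0%}")
```

Unmatched names map to None here, which makes it easy to list them for manual review.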

4. Visualization

Fetches admin boundaries from GeoBoundaries (or uses local file)
Assigns discrete color levels based on quartiles or fixed thresholds:
- High (red-700): Top 25% need scores
- Medium (red-400): 50th-75th percentile
- Low (green-300): 25th-50th percentile
- Lowest (green-600): Bottom 25%
Generates interactive Folium map with tooltips
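The level assignment can be sketched with the standard library alone. This is a hedged illustration: `assign_levels` is a hypothetical name, quartiles are computed with the "inclusive" method, and a score exactly equal to a cutoff is placed in the higher level. None of these choices is confirmed by this README.

```python
from bisect import bisect_right
from statistics import quantiles

# Ordered from lowest to highest need, matching the four map colors.
LEVELS = ("lowest", "low", "medium", "high")


def assign_levels(scores, fixed_thresholds=None):
    """Bucket need scores into four levels by quartiles or fixed cutoffs."""
    # Three cut points split the scores into four buckets.
    cuts = fixed_thresholds or quantiles(scores, n=4, method="inclusive")
    return [LEVELS[bisect_right(cuts, s)] for s in scores]


scores = [0.10, 0.30, 0.60, 0.90]
by_quartile = assign_levels(scores)
by_fixed = assign_levels(scores, fixed_thresholds=(0.25, 0.50, 0.75))
print(by_quartile, by_fixed)
```

Quartile cutoffs move with each dataset, while fixed thresholds keep the same meaning across runs.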

5. Output

HTML map: output/needs_map_<ISO3>.html
CSV scores: output/needs_scores_<ISO3>.csv

Outputs

Interactive HTML Map

Choropleth with 4 discrete color levels
Hover tooltips showing: Province, Need Score, Need Rank, Level
Legend with color key
Highlight on hover

CSV Export

Example excerpt from needs_scores_AFG.csv:

admin1,need_score,need_rank,cluster,need_level
Kabul,0.142,3,2,lowest
Kandahar,0.856,0,0,high
Herat,0.487,2,1,medium
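The export is a plain CSV, so it can be post-processed with any tool. As a small sketch, the standard `csv` module is used below to list high and medium units in rank order. The rows are the sample rows shown above, inlined as a string; in practice you would open the file under output/ instead.

```python
import csv
import io

# Sample rows copied from the example above.
sample = """admin1,need_score,need_rank,cluster,need_level
Kabul,0.142,3,2,lowest
Kandahar,0.856,0,0,high
Herat,0.487,2,1,medium
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Keep the two highest levels and order them by rank (0 = highest need).
priority = sorted(
    (r for r in rows if r["need_level"] in {"high", "medium"}),
    key=lambda r: int(r["need_rank"]),
)
names = [r["admin1"] for r in priority]
print(names)
```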

Advanced Usage

Fixed thresholds for cross-country comparison

from aidmind import analyze_needs

# Use consistent cutoffs across all countries
output = analyze_needs(
    "country1.csv",
    "Afghanistan",
    fixed_thresholds=(0.25, 0.50, 0.75)
)

Offline mode with local boundaries

# No internet required after initial download
output = analyze_needs(
    "data.csv",
    "Kenya",
    local_geojson="boundaries/kenya_adm1.geojson"
)

ADM2 (district-level) analysis

output = analyze_needs(
    "district_data.csv",
    "Ethiopia",
    admin_level="ADM2",
    admin_col="district"
)

Troubleshooting

Low match rate warning

Problem: WARNING: Low admin name match rate: 45%

Solution:

Ensure admin names in your dataset match the official names used in GeoBoundaries
Check for typos, spelling variations, or extra characters
Or provide a local GeoJSON with matching name properties
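To see which names fail to match, you can compare your dataset against the name property in the boundary file. The sketch below is a standalone diagnostic, not part of AidMind. It assumes the names live in a `shapeName` property, which is the key GeoBoundaries files use; pass a different `name_property` for a custom GeoJSON. `unmatched_names` is a hypothetical helper.

```python
def unmatched_names(dataset_names, geojson, name_property="shapeName"):
    """Return dataset names with no case-insensitive match in the GeoJSON."""
    boundary = {
        str(feature["properties"].get(name_property, "")).strip().lower()
        for feature in geojson["features"]
    }
    return sorted(n for n in dataset_names if n.strip().lower() not in boundary)


# Tiny stand-in for a boundary file; load yours with json.load(open(path)).
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"shapeName": "Kabul"}, "geometry": None},
        {"type": "Feature", "properties": {"shapeName": "Hirat"}, "geometry": None},
    ],
}
missing = unmatched_names(["kabul", "Herat"], geojson)
print(missing)
```

Here "Herat" is reported because the stand-in boundary file spells it "Hirat", the kind of variation that lowers the match rate.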

No numeric columns found

Problem: ValueError: No numeric feature columns found

Solution:

Ensure at least one column contains numeric values
Check for non-numeric characters in indicator columns
Remove or fix text values in numeric columns
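A quick way to locate the offending cells before running the analysis is to coerce each indicator column with pandas and report what fails to parse. This is a diagnostic sketch under assumptions (`non_numeric_cells` is a hypothetical name), not an AidMind function.

```python
import pandas as pd


def non_numeric_cells(df: pd.DataFrame, admin_col: str) -> dict:
    """Map each indicator column to its values that cannot be parsed as numbers."""
    problems = {}
    for col in df.columns.drop(admin_col):
        coerced = pd.to_numeric(df[col], errors="coerce")
        # A cell is a problem if it had a value but coercion produced NaN.
        bad = df.loc[coerced.isna() & df[col].notna(), col]
        if not bad.empty:
            problems[col] = bad.tolist()
    return problems


df = pd.DataFrame(
    {
        "province": ["Kabul", "Kandahar", "Herat"],
        "health_index": ["0.75", "45%", "0.60"],
        "water_access": [0.78, 0.44, 0.63],
    }
)
problems = non_numeric_cells(df, "province")
print(problems)
```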

Admin column not detected

Problem: ValueError: Could not detect an admin name column

Solution:

Rename your admin column to: province, admin1, region, or state
Or specify it explicitly: admin_col="your_column_name"
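To check a file before calling the tool, a simplified detection rule can be written in a few lines. The four candidate names come from the list above; the case-insensitive exact match and the name `detect_admin_col` are assumptions, and AidMind's real detection is likely broader than this.

```python
# Candidate names taken from the troubleshooting advice above.
CANDIDATES = ("province", "admin1", "region", "state")


def detect_admin_col(columns):
    """Return the first column whose name matches a known candidate."""
    for col in columns:
        if col.strip().lower() in CANDIDATES:
            return col
    raise ValueError("Could not detect an admin name column")


detected = detect_admin_col(["Province", "health_index", "water_access"])
print(detected)
```

A header such as village_name would raise here, which is the case where passing admin_col explicitly is the simpler fix.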

Empty or very small dataset

Problem: WARNING: Dataset has only 2 rows

Solution:

AidMind requires at least 3 rows for clustering
For reliable results, use datasets with 10+ admin units

API Reference

`analyze_needs()`

def analyze_needs(
    dataset_path: str,
    country_name: Optional[str] = None,
    output_html_path: Optional[str] = None,
    *,
    admin_level: Optional[str] = None,
    admin_col: Optional[str] = None,
    local_geojson: Optional[str] = None,
    fixed_thresholds: Optional[Tuple[float, float, float]] = None,
) -> str

Parameters:

dataset_path (str): Path to CSV file with geographic units and indicators
country_name (str, optional): Country name (e.g., "Afghanistan", "Kenya"). Required only if using GeoBoundaries. Can be None if providing local_geojson
output_html_path (str, optional): Custom output path for HTML
admin_level (str, optional): Admin level ("ADM1", "ADM2", "ADM3", or any custom). Only used with GeoBoundaries
admin_col (str, optional): Name of geographic unit column (auto-detected if None)
local_geojson (str, optional): Path to local GeoJSON boundaries. Use this for villages or custom units
fixed_thresholds (tuple, optional): (q25, q50, q75) for color levels

Returns:

str: Path to generated HTML file

Raises:

FileNotFoundError: If dataset or local_geojson not found
ValueError: If inputs are invalid, the dataset is empty, or both country_name and local_geojson are missing

Examples:

from aidmind import analyze_needs

# Province-level with GeoBoundaries
analyze_needs("provinces.csv", "Afghanistan", admin_level="ADM1")

# District-level with GeoBoundaries
analyze_needs("districts.csv", "Kenya", admin_level="ADM2")

# Village-level with custom boundaries
analyze_needs("villages.csv", local_geojson="villages.geojson")

# Custom zones
analyze_needs("camps.csv", local_geojson="camps.geojson", admin_col="camp_name")

Use Cases

Humanitarian Organizations

Rapid needs assessment: Identify priority areas for intervention
Resource allocation: Visualize where aid is most needed
Monitoring & evaluation: Track changes in need levels over time
Reporting: Generate maps and data exports for donors

Example Organizations

UN agencies (UNHCR, UNICEF, WFP)
International NGOs (MSF, Oxfam, Save the Children)
National disaster management agencies
Research institutions studying humanitarian crises

Best Practices

Data Quality

Use official admin names from GeoBoundaries or national sources
Include multiple indicators (3-5+) for robust assessment
Check for outliers and data quality issues before analysis
Document data sources and collection methodology

Interpretation

Need scores are relative within the dataset (0-1 scale)
Clustering is unsupervised: No ground truth labels used
Combine with qualitative data for complete picture
Validate results with local experts and stakeholders

Production Deployment

Use fixed thresholds for consistent cross-country comparison
Cache boundaries locally for offline or restricted environments
Version control datasets and track changes over time
Automate workflows with CI/CD pipelines

Examples

See examples/ directory for:

basic_usage.ipynb: Step-by-step tutorial
multi_country.py: Batch processing multiple countries
custom_config.py: Advanced configuration options

Contributing

We welcome contributions! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

See CONTRIBUTING.md for detailed guidelines.

License

MIT License - see LICENSE file for details.

Citation

If you use AidMind in your research or reports, please cite:

AidMind: Unsupervised Machine Learning for Humanitarian Needs Assessment
Version 1.0.0
https://github.com/yourorg/aidmind

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@aidmind.org

Acknowledgments

GeoBoundaries: For providing open administrative boundary data
Humanitarian Data Exchange: For inspiring accessible data tools
Open-source community: For the amazing libraries this tool builds on

Changelog

See CHANGELOG.md for version history and updates.

Release history

1.0.1 (this version): Nov 2, 2025
1.0.0: Nov 1, 2025

Download files

Source Distribution

aidmind-1.0.1.tar.gz (20.4 kB)

Uploaded Nov 2, 2025 Source

Built Distribution

aidmind-1.0.1-py3-none-any.whl (16.0 kB)

Uploaded Nov 2, 2025 Python 3

File details

Details for the file aidmind-1.0.1.tar.gz.

File metadata

Download URL: aidmind-1.0.1.tar.gz
Upload date: Nov 2, 2025
Size: 20.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for aidmind-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`79e6e1405150c56ff5488a66b937c599d467c42798d0bd217b09ec6bfc392005`
MD5	`8c1a9c1f260dc77345a1d28063b7fe7b`
BLAKE2b-256	`d70b5047a0d98bdd2b93c9d7deb92ea30770b19164856bd4bb65c794221a212e`

File details

Details for the file aidmind-1.0.1-py3-none-any.whl.

File metadata

Download URL: aidmind-1.0.1-py3-none-any.whl
Upload date: Nov 2, 2025
Size: 16.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for aidmind-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a59f5b16290aab7d565c4760d0f0b03cdffeaaf5ac52c34eead4385defa096f4`
MD5	`852a6ef3d6e1bcb07aac1fada1a13ae8`
BLAKE2b-256	`e8918da1116d2b842ab8e5753aef0b335a6205ade278f5f64a5cbbf90a6a8b8f`
