A python package project for doing analysis on coordinates and clustering them.
Project description
Hedron
A Python package for geolocation clustering analysis using geohash-based spatial indexing.
Overview
Hedron provides tools for analyzing and clustering coordinate data based on geographic proximity. It uses geohash encoding to efficiently group locations and identify patterns such as:
- Spatial clusters of coordinates
- Colocation events (multiple IDs at the same location)
- Temporal colocation (multiple IDs at the same location on the same day)
Features
- Geohash-based clustering: Efficient spatial grouping using configurable precision
- Colocation detection: Identify when multiple entities appear at the same location
- Temporal analysis: Find colocation events on specific days
- Visualization: Built-in map plotting and heat map generation with OpenStreetMap tiles
- Pandas integration: Works seamlessly with pandas DataFrames
- Type hints: Full type annotation support for better IDE integration
- Export capabilities: Save clusters to Excel (with separate sheets) or CSV
- Comprehensive examples: 5 Jupyter notebooks with working visualizations
Installation
From PyPI (recommended)
pip install hedron
Note: For visualization features to work, Pillow version 9.x is required (automatically installed). If you have Pillow 10+ already installed, the maps may not render correctly.
From source
git clone https://github.com/eddiethedean/hedron.git
cd hedron
pip install -e .
Development installation
pip install -e ".[dev]"
pre-commit install
Examples
Check out the examples/ directory for complete Jupyter notebooks with working code:
- 01_basic_clustering.ipynb - Introduction to clustering
- 02_colocation_analysis.ipynb - Finding when entities meet
- 03_temporal_analysis.ipynb - Day-based colocation detection
- 04_advanced_usage.ipynb - Performance and large datasets
- 05_visualization_and_maps.ipynb - Map generation with examples
All notebooks include executed outputs and are ready to run!
Quick Start
import pandas as pd
import hedron as hd
# Create a DataFrame with coordinate data
data = pd.DataFrame({
'ID': ['a', 'b', 'c', 'd', 'e', 'f'],
'Date': pd.to_datetime([
'Dec 6, 2019 2:27:45 PM',
'Dec 6, 2019 2:27:45 PM',
'Dec 8, 2019 2:27:45 PM',
'Dec 8, 2019 2:27:45 PM',
'Dec 10, 2019 2:27:45 PM',
'Dec 11, 2019 2:27:45 PM'
]),
'Latitude': [29.4259671, 29.42525, 29.4237056, 29.423606, 29.4239835, 29.4239835],
'Longitude': [-98.4861419, -98.4860167, -98.4868973, -98.4860462, -98.4851705, -98.4851705]
})
# Create a Cluster object
cluster = hd.Cluster(
data,
lat_header='Latitude',
lon_header='Longitude',
date_time_header='Date',
id_header='ID'
)
# Create spatial clusters (precision 7 ≈ 150m accuracy)
clusters = cluster.make_clusters(digits=7)
print(f"Found {len(clusters)} clusters")
# Get only colocations (multiple IDs in same location)
colocations = clusters.colocation_clusters()
print(f"Found {len(colocations)} colocation clusters")
# Visualize clusters on a map
image = cluster.plot(size=(800, 500))
image.save('cluster_map.png') # Save as PNG
# or display in Jupyter: display(image)
# Create a heat map (shows density: blue=low, red=high)
heat_map = clusters.plot_heat(precision=6)
heat_map.save('heatmap.png')
# Export to Excel (each cluster = one sheet)
clusters.to_xlsx('analysis_results.xlsx')
Geohash Precision Guide
The digits parameter controls clustering granularity:
| Digits | Lat/Lon Accuracy | Approximate Size |
|---|---|---|
| 4 | ±2.4 km | Large neighborhood |
| 5 | ±610 m | Neighborhood |
| 6 | ±76 m | Street block |
| 7 | ±19 m | Building |
| 8 | ±2.4 m | Room |
API Reference
Cluster Class
The main class for working with coordinate data.
cluster = Cluster(
df, # pandas DataFrame
lat_header, # Latitude column name
lon_header, # Longitude column name
date_time_header, # Datetime column name
id_header, # ID column name
colors=None # Optional color mapping for IDs
)
Methods
make_clusters(digits)- Create spatial clusters by geohashcolocation_clusters(digits)- Create clusters with multiple IDsday_colocation_cluster()- Get colocations on same dayday_colocation_clusters()- Get separate clusters per dayplot(size=(800, 500))- Visualize cluster on a map
Properties
lats- Latitude values as Serieslons- Longitude values as Seriesids- ID values as Seriesdates- Datetime values as Seriesdays- Date values as Series
SuperCluster Class
Container for multiple Cluster objects.
super_cluster = SuperCluster(
clusters_dict, # Dict of cluster name to Cluster
colors=None # Optional color mapping
)
Methods
plot()- Visualize all clusters on a mapplot_heat(precision)- Create a heat mapcolocation_clusters()- Filter to only colocation clustersday_colocation_clusters()- Get day colocations for each clustermerge()- Combine all clusters into a single Clusterto_xlsx(filename)- Save each cluster to Excel sheet
Properties
lats- All latitude valueslons- All longitude valuesids- All ID values
Visualization Examples
Hedron includes powerful visualization capabilities:
# Single cluster map
map_image = cluster.plot(size=(800, 500))
map_image.save('locations.png')
# SuperCluster with geohash boundaries
cluster_map = clusters.plot()
# Density heat map (blue → green → yellow → orange → red)
heat_map = clusters.plot_heat(precision=6)
See examples/05_visualization_and_maps.ipynb for working map examples with outputs!
Use Cases
Finding Meeting Points
Identify locations where multiple people/devices appear together:
# Find all locations where 2+ IDs appear
colocations = cluster.colocation_clusters(digits=7)
# Further refine to same-day meetings
meetings = colocations.day_colocation_clusters()
Analyzing Movement Patterns
Cluster GPS tracks to find frequently visited areas:
# Group all coordinates into neighborhood-sized clusters
areas = cluster.make_clusters(digits=5)
# Export to Excel for analysis
areas.to_xlsx('movement_patterns.xlsx')
Heat Map Generation
Visualize activity density across a geographic area:
clusters = cluster.make_clusters(digits=6)
heat_map = clusters.plot_heat(precision=5)
heat_map.save('activity_heatmap.png')
Advanced Usage
Working with Large Datasets
For large datasets, consider:
- Filtering data by time windows before clustering
- Using lower precision (fewer digits) for initial analysis
- Leveraging the built-in caching for repeated analyses
# Filter to specific time period
recent = data[data['Date'] > '2019-12-01']
cluster = hd.Cluster(recent, 'Latitude', 'Longitude', 'Date', 'ID')
# Start with broad clusters
broad = cluster.make_clusters(digits=5)
# Refine specific areas of interest
for name, cluster in broad.items():
detailed = cluster.make_clusters(digits=7)
print(f"Area {name}: {len(detailed)} detailed clusters")
Custom Visualization
import staticmaps
# Create custom colored markers
colors = {
'a': staticmaps.RED,
'b': staticmaps.BLUE,
'c': staticmaps.GREEN
}
cluster = hd.Cluster(data, 'Latitude', 'Longitude', 'Date', 'ID', colors=colors)
image = cluster.plot()
Development
Running Tests
# Run full test suite
pytest
# With coverage report
pytest --cov=hedron --cov-report=html
Code Quality
# Format code
black hedron tests
# Lint
ruff check hedron tests
# Type check
mypy hedron --ignore-missing-imports
# Run all checks
pytest && ruff check hedron tests && mypy hedron --ignore-missing-imports
Pre-commit Hooks
pre-commit install
pre-commit run --all-files
Exploring Examples
# Launch Jupyter with examples
cd examples
jupyter notebook
# Or open a specific notebook
jupyter notebook 05_visualization_and_maps.ipynb
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Make your changes with tests
- Run the test suite and linters
- Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Requirements
- Python 3.9+
- pandas >= 1.5.0
- pygeodesy >= 21.6.9
- py-staticmaps >= 0.4.0
- range-key-dict >= 1.1.0
- Pillow >= 9.0.0, < 10.0.0 (for visualization compatibility)
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Uses pygeodesy for geohash encoding
- Map rendering via py-staticmaps
Citation
If you use Hedron in your research, please cite:
@software{hedron,
author = {Matthews, Odos},
title = {Hedron: Geolocation Clustering Analysis},
year = {2021},
url = {https://github.com/eddiethedean/hedron}
}
Support
- Issues: GitHub Issues
- Email: odosmatthews@gmail.com
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hedron-0.0.6.tar.gz.
File metadata
- Download URL: hedron-0.0.6.tar.gz
- Upload date:
- Size: 43.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9bc61f6fb54672ca42c7af5a4f1ab949fcf6bf96069d723facd232e058e2f00a
|
|
| MD5 |
dbd437d4cbc0246f4c3b2e9e7eb638bd
|
|
| BLAKE2b-256 |
5e6234f10ecb1105397f89f9b544ab5f46e684dbd167a6c3ae336ff6ad7f8b6e
|
File details
Details for the file hedron-0.0.6-py3-none-any.whl.
File metadata
- Download URL: hedron-0.0.6-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0784741c0f1ca2385f7dde790132d4cc80c66e7429ba5d9728ce803d430a0e84
|
|
| MD5 |
b55756b901825062fc1e50366db4c1f5
|
|
| BLAKE2b-256 |
ecff753fee00dd73edc6a5b44b48a4c25fb696fa5b60a22bf042290a7a73e88f
|