A Haystack component integrating Overpass API for OpenStreetMap
OSM Integration Haystack
Haystack component to fetch geographic data via the freely available OpenStreetMap (OSM) Overpass API.
Installation
pip install osm-integration-haystack
Overview
This repository implements a Haystack component that integrates with OpenStreetMap data through the Overpass API. It allows you to fetch geographic information and convert it into Haystack Documents for use in RAG (Retrieval-Augmented Generation) pipelines.
When you give OSMFetcher a location and radius, it returns a list of nearby points of interest (POIs) as Haystack Documents. It uses the Overpass API to query OpenStreetMap data and converts the results into structured documents with geographic metadata.
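To make the location-plus-radius idea concrete, here is a minimal sketch of the kind of Overpass QL radius query such a component could send. The function name `build_overpass_query` and its parameters are hypothetical and are not part of this package; the query OSMFetcher actually builds may differ.

```python
from typing import Iterable, Tuple


def build_overpass_query(
    center: Tuple[float, float],
    radius_m: int,
    osm_types: Iterable[str] = ("node",),
    osm_tags: Iterable[str] = ("amenity",),
    timeout_s: int = 20,
) -> str:
    """Build an Overpass QL radius query (illustrative; not OSMFetcher's own code)."""
    lat, lon = center
    # One statement per (element type, tag key) pair,
    # e.g. node["amenity"](around:500,lat,lon);
    statements = [
        f'{osm_type}["{tag}"](around:{radius_m},{lat},{lon});'
        for osm_type in osm_types
        for tag in osm_tags
    ]
    # "out center" asks Overpass to add a representative point for ways and relations.
    return f"[out:json][timeout:{timeout_s}];(" + "".join(statements) + ");out center;"


query = build_overpass_query((51.898403, -8.473978), 500)
print(query)
# [out:json][timeout:20];(node["amenity"](around:500,51.898403,-8.473978););out center;
```

The `around` filter selects elements within the given number of meters of a coordinate, which corresponds to the center and radius described above.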
Basic Usage
Here's a simple example of how to use the OSMFetcher component:
from osm_integration_haystack import OSMFetcher
# Create an instance of OSMFetcher
osm_fetcher = OSMFetcher(
    preset_center=(51.898403, -8.473978),  # Cork, Ireland
    preset_radius_m=500,  # 500m radius
    target_osm_types=["node"],  # Search nodes
    target_osm_tags=["amenity"],  # Search amenity types
    maximum_query_mb=2,  # Limit query size
    overpass_timeout=20,
)
# Fetch nearby locations
results = osm_fetcher.run()
# Access the documents
documents = results["documents"]
print("Found locations:")
for doc in documents[:5]:  # Show first 5
    print(f"Name: {doc.meta.get('name', 'Unknown')}")
    print(f"Type: {doc.meta.get('category', 'Unknown')}")
    print(f"Distance: {doc.meta.get('distance_m', 0):.1f}m")
    print(f"Content: {doc.content}")
    print("\n")
Haystack Pipeline Integration
You can also integrate OSMFetcher into a complete Haystack pipeline:
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
from osm_integration_haystack import OSMFetcher
# Create pipeline components
osm_fetcher = OSMFetcher(
    preset_center=(51.898403, -8.473978),
    preset_radius_m=200,
    target_osm_types=["node"],
    target_osm_tags=["amenity"],
    maximum_query_mb=2,
    overpass_timeout=20,
)
prompt_builder = PromptBuilder(template="""
You are a geographic information assistant. Based on the provided OpenStreetMap data, help me find the nearest coffee shops.
User location: {{ user_location }}
Search radius: {{ radius }}m
Available location data:
{% for document in documents[:10] %}
- {{ document.content }}
Location: ({{ document.meta.lat }}, {{ document.meta.lon }})
Distance: {{ document.meta.distance_m }}m
Type: {{ document.meta.category }}
{% endfor %}
Please help me find coffee shop related locations and recommend the nearest 3.
""")
llm_generator = OpenAIGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-4-turbo",
)
# Create and connect pipeline
pipeline = Pipeline()
pipeline.add_component("osm_fetcher", osm_fetcher)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm_generator", llm_generator)
pipeline.connect("osm_fetcher.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm_generator.prompt")
# Run the pipeline
result = pipeline.run({
    "osm_fetcher": {},
    "prompt_builder": {
        "user_location": "Cork, Ireland (51.898403, -8.473978)",
        "radius": 200,
    },
})
print(result["llm_generator"]["replies"][0])
GeoRadiusFilter
The GeoRadiusFilter component applies an additional distance filter to OSM documents that have already been fetched. It is designed for agent workflows, where an agent can decide after an initial fetch whether to narrow the results to a smaller radius.
from osm_integration_haystack import OSMFetcher
from osm_integration_haystack.utils import GeoRadiusFilter
# First, fetch OSM data
osm_fetcher = OSMFetcher(
    preset_center=(51.898403, -8.473978),
    preset_radius_m=1000,  # Large initial radius
    target_osm_types=["node"],
    target_osm_tags=["amenity"],
)
# Get all nearby locations
results = osm_fetcher.run()
all_documents = results["documents"]
# Then apply additional radius filtering
geo_filter = GeoRadiusFilter(max_radius_m=500)  # Maximum allowed radius: 500m
filtered_results = geo_filter.run(
    documents=all_documents,
    center=(51.898403, -8.473978),  # Same center
    radius_m=300,  # Filter to 300m radius
)
filtered_documents = filtered_results["documents"]
print(f"Filtered from {len(all_documents)} to {len(filtered_documents)} documents")
Use Cases:
- Agent Decision Making: Help AI agents decide whether to apply additional geographic filtering
- Multi-stage Filtering: First fetch a large area, then filter to smaller specific regions
- Dynamic Radius Adjustment: Allow agents to adjust search radius based on initial results
- Distance-based Ranking: Ensure all returned documents are within a specific distance threshold
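As an illustration of the dynamic radius adjustment use case, the sketch below picks the smallest candidate radius that still contains a minimum number of results, using only the distance_m values of documents that were already fetched. The helper `smallest_radius_with` is hypothetical and not part of this package; an agent could pass the chosen value as radius_m to the filter.

```python
from typing import Iterable, Optional


def smallest_radius_with(
    distances_m: Iterable[float],
    candidate_radii_m: Iterable[int],
    min_results: int,
) -> Optional[int]:
    """Return the smallest candidate radius containing at least min_results points."""
    distances = list(distances_m)
    for radius in sorted(candidate_radii_m):
        within = sum(1 for d in distances if d <= radius)
        if within >= min_results:
            return radius
    return None  # no candidate radius is large enough


# Placeholder distances, as they might appear in doc.meta["distance_m"].
distances = [40.0, 120.0, 310.0, 480.0, 950.0]
print(smallest_radius_with(distances, [100, 300, 500, 1000], min_results=3))  # 500
```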
Configuration Parameters:
- max_radius_m (int): Maximum allowed radius in meters (default: 5000)
- center (Tuple[float, float]): Center coordinates for distance calculation
- radius_m (int): Target radius for filtering
Features:
- Distance Calculation: Uses Haversine formula for accurate geographic distance
- Automatic Sorting: Returns documents sorted by distance from center
- Validation: Validates coordinate ranges and radius values
- Flexible Input: Works with any list of Haystack Documents containing lat/lon metadata
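The distance calculation and sorting described above can be sketched as follows. This is a minimal standalone illustration of the Haversine formula applied to plain dicts with lat/lon keys, not the GeoRadiusFilter source; the names `haversine_m` and `filter_by_radius` and the Earth radius constant are assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a common approximation


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def filter_by_radius(metas: List[Dict], center: Tuple[float, float], radius_m: float) -> List[Dict]:
    """Keep entries within radius_m of center, sorted nearest first."""
    scored = [(haversine_m(center, (m["lat"], m["lon"])), m) for m in metas]
    within = [(d, m) for d, m in scored if d <= radius_m]
    within.sort(key=lambda pair: pair[0])
    return [{**m, "distance_m": d} for d, m in within]


center = (51.898403, -8.473978)
places = [  # placeholder points north of the center
    {"name": "far", "lat": 51.908403, "lon": -8.473978},
    {"name": "near", "lat": 51.899403, "lon": -8.473978},
    {"name": "mid", "lat": 51.901403, "lon": -8.473978},
]
nearby = filter_by_radius(places, center, 500)
print([p["name"] for p in nearby])  # ['near', 'mid']
```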
Agent-Friendly Lite Mode (slim_output)
When OSMFetcher is used as a ComponentTool inside a Haystack Agent, the full document output is serialised as a tool message and forwarded to the LLM. For dense urban areas this can easily exceed the model's context window.
Set slim_output=True to compress each document to only the fields an LLM needs for reasoning (name, category, lat, lon, distance_m, address). The token budget (max_token) still applies on top: if the slim documents collectively exceed the budget, the farthest ones are dropped.
from osm_integration_haystack import OSMFetcher
fetcher = OSMFetcher(
    preset_center=(51.898403, -8.473978),
    preset_radius_m=500,
    target_osm_types=["node"],
    target_osm_tags=["amenity", "tourism", "leisure"],
    slim_output=True,  # compact output safe for Agent tool-calling
    max_token=12000,  # drop farthest POIs if still over budget
)
results = fetcher.run()
documents = results["documents"]
# Each document now contains only: name, category, lat, lon, distance_m, address
slim_output=False (default) preserves the existing full-metadata behaviour for standard RAG pipelines.
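To illustrate the idea behind slim output and the token budget, here is a rough sketch operating on plain metadata dicts. It is not the package's implementation: the helper names are hypothetical, and the estimate of about 4 characters per token is an assumption, since the README does not say how tokens are counted.

```python
import json
from typing import Dict, List

SLIM_FIELDS = ("name", "category", "lat", "lon", "distance_m", "address")


def slim_meta(meta: Dict) -> Dict:
    """Keep only the fields listed for slim output."""
    return {key: meta[key] for key in SLIM_FIELDS if key in meta}


def estimate_tokens(meta: Dict) -> int:
    # Assumption: roughly 4 characters per token.
    return max(1, len(json.dumps(meta)) // 4)


def apply_token_budget(metas: List[Dict], max_token: int) -> List[Dict]:
    """Walk entries nearest first and stop once the budget would be exceeded."""
    kept, used = [], 0
    for meta in sorted(metas, key=lambda m: m.get("distance_m", float("inf"))):
        cost = estimate_tokens(meta)
        if used + cost > max_token:
            break  # everything after this point is farther away and is dropped
        kept.append(meta)
        used += cost
    return kept


full = [  # placeholder entries for illustration only
    {"name": "B", "category": "pub", "lat": 51.9, "lon": -8.47,
     "distance_m": 240.0, "osm_id": 2, "tags": {"wheelchair": "yes"}},
    {"name": "A", "category": "cafe", "lat": 51.9, "lon": -8.47,
     "distance_m": 80.0, "osm_id": 1, "tags": {}},
]
slim = [slim_meta(m) for m in full]
budget = estimate_tokens(slim[1])  # room for exactly one entry
print([m["name"] for m in apply_token_budget(slim, budget)])  # ['A']
```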
Configuration Parameters
The OSMFetcher component accepts several parameters to customize its behavior:
- preset_center (Tuple[float, float], optional): Default center coordinates (latitude, longitude).
- preset_radius_m (int, optional): Default search radius in meters.
- target_osm_types (Union[str, List[str]], optional): OSM element types to search ("node", "way", "relation"). Default: ["node", "way", "relation"].
- target_osm_tags (Union[str, List[str]], optional): OSM tags to filter by (e.g., ["amenity", "shop"]). Default: None (all tags).
- maximum_query_mb (int, optional): Maximum query size in MB to prevent API timeouts. Default: 5.
- overpass_timeout (int, optional): Timeout for Overpass API requests in seconds. Default: 25.
- slim_output (bool, optional): When True, compresses each returned Document to only the fields essential for LLM reasoning (name, category, lat, lon, distance_m, address) and truncates content to 300 characters. Use in Agent / ComponentTool workflows to stay within model context limits. Default: False.
Document Structure
Each returned document contains:
- content: Human-readable description of the location
- meta: Geographic and OSM metadata including:
  - lat, lon: Coordinates
  - distance_m: Distance from search center
  - osm_id: OSM element ID
  - osm_type: OSM element type
  - name: Location name
  - category: Primary category
  - address: Address information (if available)
  - tags: Additional OSM tags
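As a small illustration of working with this metadata, the sketch below summarises a list of meta dicts by category and finds the nearest entry of a given category. The sample values are placeholders shaped like the fields listed above, and the exact category strings returned by OSMFetcher (for example "cafe") are an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional

# Placeholder stand-ins for doc.meta; real values come from OSMFetcher.
sample_metas: List[Dict] = [
    {"name": "Example Cafe", "category": "cafe", "distance_m": 42.0, "osm_type": "node"},
    {"name": "Example Pub", "category": "pub", "distance_m": 95.5, "osm_type": "node"},
    {"name": "Other Cafe", "category": "cafe", "distance_m": 310.2, "osm_type": "node"},
]


def count_by_category(metas: List[Dict]) -> Counter:
    """Count entries per primary category, using 'Unknown' when it is missing."""
    return Counter(m.get("category", "Unknown") for m in metas)


def nearest_in_category(metas: List[Dict], category: str) -> Optional[Dict]:
    """Return the closest entry of the given category, or None if there is none."""
    matches = [m for m in metas if m.get("category") == category]
    return min(matches, key=lambda m: m["distance_m"]) if matches else None


print(count_by_category(sample_metas))
print(nearest_in_category(sample_metas, "cafe")["name"])  # Example Cafe
```

With real results, the same helpers could be fed `[doc.meta for doc in documents]`.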
Examples
Coffee Shop Finder
Find nearby coffee shops and restaurants. You can run the example directly:
# Clone the repository
git clone https://github.com/grexrr/osm-integration-haystack.git
cd osm-integration-haystack
# Install dependencies
pip install -r requirements.txt
# Run the example
python examples/agent_osm_demo.py
The script will prompt you to choose between:
- Full version (requires OpenAI API key) - Uses Haystack pipeline with LLM
- Simplified version (no API key needed) - Direct results display
from osm_integration_haystack import OSMFetcher
# Search for coffee shops
coffee_fetcher = OSMFetcher(
    preset_center=(51.898403, -8.473978),
    preset_radius_m=500,
    target_osm_types=["node"],
    target_osm_tags=["amenity"],
    maximum_query_mb=2,
)
results = coffee_fetcher.run()
documents = results["documents"]
# Filter for coffee-related locations
coffee_keywords = ["cafe", "coffee", "restaurant", "bar", "pub", "food"]
coffee_related = []
for doc in documents:
    content_lower = doc.content.lower()
    category_lower = doc.meta.get("category", "").lower()
    # Keep the document if any keyword appears in its content or category
    if any(keyword in content_lower or keyword in category_lower
           for keyword in coffee_keywords):
        coffee_related.append(doc)
# Display results
for i, doc in enumerate(coffee_related[:5]):
    print(f"{i+1}. {doc.meta.get('name', 'Unknown')}")
    print(f"   Type: {doc.meta.get('category', 'Unknown')}")
    print(f"   Distance: {doc.meta.get('distance_m', 0):.1f}m")
API Rate Limitations
The Overpass API has rate limitations to prevent abuse. If you encounter rate limiting:
- Reduce query frequency
- Use smaller search radii
- Limit the maximum_query_mb parameter
- Implement retry logic with exponential backoff
For production use, consider using a commercial OSM data provider or hosting your own Overpass instance.
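The retry-with-backoff suggestion could look roughly like the sketch below. The wrapper `with_backoff` is hypothetical and not part of this package. It catches any exception for simplicity; the README does not say which error OSMFetcher raises when rate limited, so in practice the except clause should be narrowed to the error you actually observe.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call(), doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


# Demo with a fake fetch that fails twice, and a fake sleep that records delays.
attempts = {"count": 0}


def flaky_fetch() -> dict:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return {"documents": []}


delays: List[float] = []
result = with_backoff(flaky_fetch, sleep=delays.append)
print(delays)  # [1.0, 2.0]
```

With a real fetcher, the call would be along the lines of `with_backoff(osm_fetcher.run)`.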
License
osm-integration-haystack is distributed under the terms of the MIT license.