A Python package for accessing Toronto Open Data Portal
TorontoOpenData Python Package
Overview
The TorontoOpenData package provides a Python interface to interact with the Toronto Open Data portal. It allows users to list, search, and download datasets, as well as load specific resources.
Installation
To install the package, run:
pip install toronto-open-data
Development Installation
For development and contributing:
git clone https://github.com/alexwolson/toronto-open-data.git
cd toronto-open-data
pip install -e ".[dev]"
make pre-commit # Install pre-commit hooks
Dependencies
- pandas
- requests
- tqdm
- ckanapi
Usage
Initialization
Initialize the TorontoOpenData class:
from toronto_open_data import TorontoOpenData
tod = TorontoOpenData()
List All Datasets
List all available datasets:
datasets = tod.list_all_datasets()
Search Datasets
Search datasets by keyword:
search_results = tod.search_datasets('parks')
Download Dataset
Download a specific dataset:
tod.download_dataset('dataset_name')
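As the Methods section notes, download_dataset caches files under ./cache/ by default and has an overwrite flag. A minimal sketch of that skip-if-cached behaviour, with a caller-supplied fetch function standing in for the real HTTP download (the names here are illustrative, not the package's internals):

```python
from pathlib import Path
from typing import Callable


def download_if_missing(
    fetch: Callable[[str], bytes],
    url: str,
    dest: Path,
    overwrite: bool = False,
) -> bool:
    """Write fetched bytes to dest unless a cached copy already exists.

    fetch stands in for a real HTTP GET (e.g. requests.get(url).content).
    Returns True if a download happened, False if the cache was reused.
    """
    if dest.exists() and not overwrite:
        return False  # reuse the cached file, no network call
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch(url))
    return True
```

Pass overwrite=True to force a fresh download even when a cached copy exists.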
Load Dataset
Load a specific file from a dataset:
file_path = tod.load('dataset_name', 'file_name.csv', smart_return=False)
Load a specific file, returning an object if supported (default behaviour):
file_object = tod.load('dataset_name', 'file_name.csv', smart_return=True)
Using the Datastore API (New!)
For datasets that support CKAN's datastore, you can query data directly without downloading files:
Basic Datastore Search
# Get type-enforced data directly from the datastore
data = tod.datastore_search('resource-id-here', limit=100)
print(data.dtypes) # Shows proper data types (dates, numbers, etc.)
Filtered Search
# Search with filters and sorting
filtered_data = tod.datastore_search(
    'resource-id-here',
    filters={'status': 'active', 'year': 2023},
    sort='date_created desc',
    limit=50
)
Get Resource Metadata
# Get field information and descriptions
info = tod.datastore_info('resource-id-here')
for field in info['fields']:
    print(f"{field['id']}: {field.get('type')} - {field.get('info', {}).get('label', 'No description')}")
Custom SQL Queries
# Advanced querying with SQL
data = tod.datastore_search_sql('''
SELECT category, COUNT(*) as count, AVG(value) as avg_value
FROM "resource-id-here"
WHERE status = 'active'
GROUP BY category
ORDER BY count DESC
LIMIT 10
''')
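In CKAN's SQL dialect the resource id appears as a double-quoted identifier, as in the query above. A small helper for composing such query strings is sketched below; build_datastore_sql is a hypothetical name, not part of this package, and the where clause is passed through verbatim, so it must never contain untrusted input:

```python
def build_datastore_sql(resource_id: str, where: str = "", limit: int = 100) -> str:
    """Compose a SELECT query for a datastore resource.

    The resource id becomes a double-quoted SQL identifier; embedded
    double quotes are escaped by doubling them. `where` is inserted
    as-is and must come from trusted code, not user input.
    """
    ident = '"' + resource_id.replace('"', '""') + '"'
    sql = f"SELECT * FROM {ident}"
    if where:
        sql += f" WHERE {where}"
    return sql + f" LIMIT {int(limit)}"
```

The resulting string can then be handed to datastore_search_sql.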
Find Datastore Resources
# Check which resources support datastore
datastore_resources = tod.get_datastore_resources('dataset-name')
for resource in datastore_resources:
    print(f"Datastore resource: {resource['name']} (ID: {resource['id']})")
Datastore vs File Download
| Feature | File Download (load()) | Datastore API |
|---|---|---|
| Data freshness | Static files | Real-time data |
| Type enforcement | Basic pandas inference | CKAN-defined types |
| Filtering | Client-side (after download) | Server-side |
| Metadata | Limited | Rich field descriptions |
| Query flexibility | None | Full SQL support |
| Network usage | Downloads entire file | Only requested data |
Methods
Basic Dataset Operations
- list_all_datasets(as_frame=True): List all datasets.
- search_datasets(query, as_frame=True): Search datasets by keyword.
- search_resources_by_name(name, as_frame=True): Get a dataset by name.
- download_dataset(name, file_path='./cache/', overwrite=False): Download a dataset's resources.
- load(name, filename, file_path='./cache/', reload=False, smart_return=True): Load a file from a dataset.
Datastore API Methods (New!)
- datastore_search(resource_id, filters=None, q=None, limit=100, offset=0, fields=None, sort=None, as_frame=True): Search datastore records with type-enforced results and filtering.
- datastore_info(resource_id): Get metadata about a datastore resource's fields, types, and descriptions.
- datastore_search_sql(sql, as_frame=True): Execute SQL queries on datastore resources.
- get_datastore_resources(name, as_frame=True): Get only the datastore-enabled resources for a dataset.
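Because datastore_search exposes limit and offset, larger result sets can be paged server-side. The generator below sketches that loop; the search argument is a stand-in for a call like tod.datastore_search(resource_id, limit=..., offset=..., as_frame=False), assumed here to return a list of record dicts:

```python
from typing import Callable, Iterator, List


def iter_records(
    search: Callable[..., List[dict]],
    resource_id: str,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every record by paging with limit/offset.

    Stops when a page comes back shorter than page_size, which signals
    the end of the result set.
    """
    offset = 0
    while True:
        page = search(resource_id, limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            return  # short (or empty) page: no more records
        offset += page_size
```

Streaming pages this way keeps memory bounded even for large resources.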
Smart Return File Types
The package supports smart return for the following file types:
- csv
- docx
- gpkg
- geojson
- jpeg
- json
- kml
- sav
- shp
- txt
- xlsm
- xlsx
- xml
- xsd
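Smart return presumably dispatches on the file extension to pick a loader, falling back to the raw path when smart_return=False or the format is unsupported. A minimal stdlib-only sketch of that idea (the real package likely uses pandas and friends for most of the formats above; LOADERS and smart_load are illustrative names, not its API):

```python
import csv
import json
from pathlib import Path


def _load_csv(p: Path):
    with p.open(newline="") as fh:
        return list(csv.DictReader(fh))


# Hypothetical extension -> loader table; the real package covers many
# more formats (gpkg, shp, xlsx, ...) via richer libraries.
LOADERS = {
    ".csv": _load_csv,
    ".json": lambda p: json.loads(p.read_text()),
    ".txt": lambda p: p.read_text(),
}


def smart_load(path: Path, smart_return: bool = True):
    """Return a parsed object when the extension is supported, else the path."""
    loader = LOADERS.get(path.suffix.lower())
    if smart_return and loader:
        return loader(path)
    return path
```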
Development
Running Tests
# Run all tests
make test
# Run tests with coverage
make test-cov
# Run linting checks
make lint
Code Quality
This project uses several tools to maintain code quality:
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
- pre-commit: Automated checks
Contributing
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
License
MIT License
Changelog
See CHANGELOG.md for a list of changes and version history.
File details
Details for the file toronto_open_data-0.2.0.tar.gz.
File metadata
- Download URL: toronto_open_data-0.2.0.tar.gz
- Size: 29.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 326a0e8d7cf8e6f16cde8f701bfccf6889a5054a66c35ba5f08f75306c8f0a0f |
| MD5 | 999ae3bb2330776f5e30087b3006e282 |
| BLAKE2b-256 | 027cfb5d9a27a14a179984befc08c67b607e89731567e66146ef06f4b72df01d |
File details
Details for the file toronto_open_data-0.2.0-py3-none-any.whl.
File metadata
- Download URL: toronto_open_data-0.2.0-py3-none-any.whl
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7f4d217aec361e47e1d692b448fcc2eb3f56302b515b57a4c26922d02abfa98 |
| MD5 | 50bfaaf0c9f1d9c83c89fc6c4503b7a1 |
| BLAKE2b-256 | b98ea20547c1f994088f338092312420e3eb6237fdac4c2159da86b807e40179 |