A Python package for accessing the Toronto Open Data Portal

TorontoOpenData Python Package

Overview

The TorontoOpenData package provides a Python interface to the Toronto Open Data portal. It lets users list, search, and download datasets, load individual resources, and query datastore-enabled resources directly.

Installation

To install the package, run:

pip install toronto-open-data

Development Installation

For development and contributing:

git clone https://github.com/alexwolson/toronto-open-data.git
cd toronto-open-data
pip install -e ".[dev]"
make pre-commit  # Install pre-commit hooks

Dependencies

  • pandas
  • requests
  • tqdm
  • ckanapi

Usage

Initialization

Initialize the TorontoOpenData class:

from toronto_open_data import TorontoOpenData

tod = TorontoOpenData()

List All Datasets

List all available datasets:

datasets = tod.list_all_datasets()

Search Datasets

Search datasets by keyword:

search_results = tod.search_datasets('parks')

Download Dataset

Download a specific dataset:

tod.download_dataset('dataset_name')

Load Dataset

Load a specific file from a dataset:

file_path = tod.load('dataset_name', 'file_name.csv', smart_return=False)

Load a specific file, returning an object if supported (default behaviour):

file_object = tod.load('dataset_name', 'file_name.csv', smart_return=True)

Using the Datastore API (New!)

For datasets that support CKAN's datastore, you can query data directly without downloading files:

Basic Datastore Search

# Get type-enforced data directly from the datastore
data = tod.datastore_search('resource-id-here', limit=100)
print(data.dtypes)  # Shows proper data types (dates, numbers, etc.)

Filtered Search

# Search with filters and sorting
filtered_data = tod.datastore_search(
    'resource-id-here',
    filters={'status': 'active', 'year': 2023},
    sort='date_created desc',
    limit=50
)

Get Resource Metadata

# Get field information and descriptions
info = tod.datastore_info('resource-id-here')
for field in info['fields']:
    print(f"{field['id']}: {field.get('type')} - {field.get('info', {}).get('label', 'No description')}")

Custom SQL Queries

# Advanced querying with SQL
data = tod.datastore_search_sql('''
    SELECT category, COUNT(*) as count, AVG(value) as avg_value
    FROM "resource-id-here"
    WHERE status = 'active'
    GROUP BY category
    ORDER BY count DESC
    LIMIT 10
''')
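
In CKAN's datastore SQL, each resource is exposed as a table named by its resource ID, so the ID must appear in double quotes, and single quotes inside string literals are escaped by doubling them. As a minimal illustration (the helper below is hypothetical, not part of the package; for untrusted filter values, prefer datastore_search(filters=...)):

```python
# Hypothetical helper: build a datastore SQL string with the resource
# ID double-quoted as the table name. Illustration only.
def build_count_query(resource_id: str, status: str) -> str:
    # Double any single quotes in the literal, per SQL convention.
    safe_status = status.replace("'", "''")
    return (
        f'SELECT category, COUNT(*) AS count '
        f'FROM "{resource_id}" '
        f"WHERE status = '{safe_status}' "
        f'GROUP BY category'
    )

sql = build_count_query("abc-123", "active")
print(sql)
```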

Find Datastore Resources

# Check which resources support datastore
datastore_resources = tod.get_datastore_resources('dataset-name')
for resource in datastore_resources:
    print(f"Datastore resource: {resource['name']} (ID: {resource['id']})")

Datastore vs File Download

| Feature | File Download (load()) | Datastore API |
|---|---|---|
| Data freshness | Static files | Real-time data |
| Type enforcement | Basic pandas inference | CKAN-defined types |
| Filtering | Client-side (after download) | Server-side |
| Metadata | Limited | Rich field descriptions |
| Query flexibility | None | Full SQL support |
| Network usage | Downloads entire file | Only requested data |
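
The filtering row is the practical difference: after load(), every record crosses the network and you filter in Python; with datastore filters, the predicate runs on the server. A small sketch of the client-side half, using made-up sample records in place of a downloaded dataset:

```python
# Client-side filtering: what you'd do after load() returns a full
# table. These sample records stand in for a downloaded dataset.
records = [
    {"name": "High Park", "status": "active", "year": 2023},
    {"name": "Old Depot", "status": "closed", "year": 2019},
    {"name": "Trinity Bellwoods", "status": "active", "year": 2023},
]

# Every record was transferred before this line runs; with
# datastore_search(filters={"status": "active", "year": 2023}) the
# same predicate would be evaluated server-side instead.
active_2023 = [r for r in records
               if r["status"] == "active" and r["year"] == 2023]
print([r["name"] for r in active_2023])
```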

Methods

Basic Dataset Operations

  • list_all_datasets(as_frame=True): List all datasets.
  • search_datasets(query, as_frame=True): Search datasets by keyword.
  • search_resources_by_name(name, as_frame=True): Get a dataset's resources by name.
  • download_dataset(name, file_path='./cache/', overwrite=False): Download a dataset's resources to file_path.
  • load(name, filename, file_path='./cache/', reload=False, smart_return=True): Load a file from the dataset.
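
The file_path and reload parameters imply a cache-on-disk policy: fetch only when the file is absent or a re-download is forced. A hypothetical sketch of that logic (names and the fetch callable are illustrative, not the package's internals):

```python
import os
import tempfile

# Hypothetical sketch of the caching implied by
# load(..., file_path='./cache/', reload=False). `fetch` stands in
# for the real HTTP download.
def cached_path(name, filename, fetch, file_path="./cache/", reload=False):
    target = os.path.join(file_path, name, filename)
    if reload or not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        fetch(target)  # write the resource to `target`
    return target

# Demonstration with a throwaway directory and a fake fetch.
cache_dir = tempfile.mkdtemp()
fetches = []

def fake_fetch(target):
    fetches.append(target)
    with open(target, "w") as f:
        f.write("col\n1\n")

first = cached_path("parks", "data.csv", fake_fetch, file_path=cache_dir)
second = cached_path("parks", "data.csv", fake_fetch, file_path=cache_dir)
print(len(fetches))  # the second call hits the cache, so only one fetch
```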

Datastore API Methods (New!)

  • datastore_search(resource_id, filters=None, q=None, limit=100, offset=0, fields=None, sort=None, as_frame=True): Search datastore records with type-enforced results and filtering.
  • datastore_info(resource_id): Get metadata about datastore resource fields, types, and descriptions.
  • datastore_search_sql(sql, as_frame=True): Execute SQL queries on datastore resources.
  • get_datastore_resources(name, as_frame=True): Get only datastore-enabled resources for a dataset.
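
The limit and offset parameters of datastore_search support paging through large resources. A sketch of that loop, where fetch_page is a stand-in for tod.datastore_search(resource_id, limit=..., offset=...) rather than a real network call:

```python
# Paging with limit/offset, as datastore_search exposes them.
def fetch_all(fetch_page, page_size=100):
    records, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        records.extend(page)
        if len(page) < page_size:  # a short page means we're done
            return records
        offset += page_size

# Fake backend holding 250 records to exercise the loop.
backend = list(range(250))
pages_seen = []

def fake_page(limit, offset):
    pages_seen.append(offset)
    return backend[offset:offset + limit]

all_records = fetch_all(fake_page)
print(len(all_records), pages_seen)
```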

Smart Return File Types

The package supports smart return for the following file types:

  • csv
  • docx
  • gpkg
  • geojson
  • jpeg
  • json
  • kml
  • pdf
  • sav
  • shp
  • txt
  • xlsm
  • xlsx
  • xml
  • xsd
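
Smart return presumably dispatches on file extension to a matching loader (e.g. a pandas reader for csv). A minimal illustration of that pattern with placeholder loaders; the package's actual loader choices may differ:

```python
# Extension-based dispatch, the pattern smart_return suggests. The
# loaders here are placeholder stubs standing in for real readers
# (pandas for csv/xlsx, and so on).
LOADERS = {
    "csv": lambda path: f"DataFrame from {path}",
    "json": lambda path: f"dict from {path}",
    "txt": lambda path: f"str from {path}",
}

def smart_load(path, smart_return=True):
    ext = path.rsplit(".", 1)[-1].lower()
    loader = LOADERS.get(ext)
    if smart_return and loader is not None:
        return loader(path)
    return path  # unsupported type, or smart_return=False: return the path

print(smart_load("cache/parks.csv"))
print(smart_load("cache/parks.csv", smart_return=False))
```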

Development

Running Tests

# Run all tests
make test

# Run tests with coverage
make test-cov

# Run linting checks
make lint

Code Quality

This project uses several tools to maintain code quality:

  • Black: Code formatting
  • isort: Import sorting
  • flake8: Linting
  • mypy: Type checking
  • pre-commit: Automated checks

PyPI Publishing

The package is automatically published to PyPI when you create a new release on GitHub:

  1. Update the version in pyproject.toml
  2. Commit and push your changes
  3. Create a new release on GitHub (this triggers the publishing workflow)
  4. The workflow runs tests and publishes automatically using Trusted Publishing

For detailed instructions, see docs/PYPI_PUBLISHING.md.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

MIT License

Changelog

See CHANGELOG.md for a list of changes and version history.


