
A Python package for accessing the Toronto Open Data Portal


TorontoOpenData Python Package

Overview

The TorontoOpenData package provides a Python interface to the Toronto Open Data portal. It lets users list, search, and download datasets, and load individual resources into Python objects.

Installation

To install the package, run:

pip install toronto-open-data

Development Installation

For development and contributing:

git clone https://github.com/alexwolson/toronto-open-data.git
cd toronto-open-data
pip install -e ".[dev]"
make pre-commit  # Install pre-commit hooks

Dependencies

  • pandas
  • requests
  • tqdm
  • ckanapi

Usage

Initialization

Initialize the TorontoOpenData class:

from toronto_open_data import TorontoOpenData

tod = TorontoOpenData()

List All Datasets

List all available datasets:

datasets = tod.list_all_datasets()

Search Datasets

Search datasets by keyword:

search_results = tod.search_datasets('parks')

Download Dataset

Download a specific dataset:

tod.download_dataset('dataset_name')

Load Dataset

Load a specific file from a dataset:

file_path = tod.load('dataset_name', 'file_name.csv', smart_return=False)

Load a specific file, returning an object if supported (default behaviour):

file_object = tod.load('dataset_name', 'file_name.csv', smart_return=True)
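With smart_return=False the call returns a filesystem path, while smart_return=True may return a parsed object for supported file types. A small check (a hypothetical helper, not part of the package) keeps downstream code explicit about which it received:

```python
from pathlib import Path

def is_file_path(result) -> bool:
    """True when load() handed back a path rather than a parsed object."""
    return isinstance(result, (str, Path))

# With smart_return=False, load() returns a path we could open manually;
# with smart_return=True, a supported type such as csv comes back parsed.
print(is_file_path(Path("./cache/dataset_name/file_name.csv")))  # True
print(is_file_path({"rows": []}))                                # False
```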

Using the Datastore API (New!)

For datasets that support CKAN's datastore, you can query data directly without downloading files:

Basic Datastore Search

# Get type-enforced data directly from the datastore
data = tod.datastore_search('resource-id-here', limit=100)
print(data.dtypes)  # Shows proper data types (dates, numbers, etc.)

Filtered Search

# Search with filters and sorting
filtered_data = tod.datastore_search(
    'resource-id-here',
    filters={'status': 'active', 'year': 2023},
    sort='date_created desc',
    limit=50
)

Get Resource Metadata

# Get field information and descriptions
info = tod.datastore_info('resource-id-here')
for field in info['fields']:
    print(f"{field['id']}: {field.get('type')} - {field.get('info', {}).get('label', 'No description')}")
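The fields payload can also be reshaped into a simple lookup table. The dictionary below is illustrative sample data mirroring the shape iterated above, not a real API response (in practice it would come from tod.datastore_info):

```python
# Illustrative payload shaped like the info['fields'] list iterated above.
sample_info = {
    "fields": [
        {"id": "date_created", "type": "timestamp", "info": {"label": "Creation date"}},
        {"id": "status", "type": "text"},
    ]
}

# Map each field id to its declared type for quick lookups.
field_types = {f["id"]: f.get("type") for f in sample_info["fields"]}
print(field_types)
# {'date_created': 'timestamp', 'status': 'text'}
```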

Custom SQL Queries

# Advanced querying with SQL
data = tod.datastore_search_sql('''
    SELECT category, COUNT(*) as count, AVG(value) as avg_value
    FROM "resource-id-here"
    WHERE status = 'active'
    GROUP BY category
    ORDER BY count DESC
    LIMIT 10
''')
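Because CKAN resource IDs contain hyphens, the ID must be double-quoted inside the SQL, as in the query above. A small helper (hypothetical, not part of the package) makes that explicit:

```python
def build_count_by_category_sql(resource_id: str, limit: int = 10) -> str:
    """Assemble a grouped-count query for a datastore resource.

    The resource ID is wrapped in double quotes because hyphenated
    IDs are not valid bare SQL identifiers.
    """
    return (
        f'SELECT category, COUNT(*) AS count '
        f'FROM "{resource_id}" '
        f'GROUP BY category ORDER BY count DESC '
        f'LIMIT {int(limit)}'
    )

sql = build_count_by_category_sql('abcd-1234-efgh', limit=5)
# The assembled string would then be passed to tod.datastore_search_sql(sql).
```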

Find Datastore Resources

# Check which resources support datastore
datastore_resources = tod.get_datastore_resources('dataset-name')
for resource in datastore_resources:
    print(f"Datastore resource: {resource['name']} (ID: {resource['id']})")

Datastore vs File Download

Feature           | File Download (load())        | Datastore API
------------------|-------------------------------|-------------------------
Data freshness    | Static files                  | Real-time data
Type enforcement  | Basic pandas inference        | CKAN-defined types
Filtering         | Client-side (after download)  | Server-side
Metadata          | Limited                       | Rich field descriptions
Query flexibility | None                          | Full SQL support
Network usage     | Downloads entire file         | Only requested data
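The trade-offs above suggest a simple rule: query the datastore when a dataset exposes datastore-enabled resources, and fall back to file download otherwise. A minimal sketch of that decision (hypothetical helper; in practice the resource list would come from tod.get_datastore_resources):

```python
def pick_access_path(datastore_resources) -> str:
    """Prefer the datastore API when any datastore-enabled resource exists."""
    return "datastore" if datastore_resources else "download"

# The list would come from tod.get_datastore_resources('dataset-name').
print(pick_access_path([{"id": "r1", "name": "parks"}]))  # datastore
print(pick_access_path([]))                               # download
```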

Methods

Basic Dataset Operations

  • list_all_datasets(as_frame=True): List all available datasets.
  • search_datasets(query, as_frame=True): Search datasets by keyword.
  • search_resources_by_name(name, as_frame=True): Get a dataset's resources by name.
  • download_dataset(name, file_path='./cache/', overwrite=False): Download a dataset's files.
  • load(name, filename, file_path='./cache/', reload=False, smart_return=True): Load a file from a dataset.

Datastore API Methods (New!)

  • datastore_search(resource_id, filters=None, q=None, limit=100, offset=0, fields=None, sort=None, as_frame=True): Search datastore records with type-enforced results and filtering.
  • datastore_info(resource_id): Get metadata about datastore resource fields, types, and descriptions.
  • datastore_search_sql(sql, as_frame=True): Execute SQL queries on datastore resources.
  • get_datastore_resources(name, as_frame=True): Get only datastore-enabled resources for a dataset.

Smart Return File Types

The package supports smart return for the following file types:

  • csv
  • docx
  • gpkg
  • geojson
  • jpeg
  • json
  • kml
  • pdf
  • sav
  • shp
  • txt
  • xlsm
  • xlsx
  • xml
  • xsd
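A quick way to check a filename against this list before calling load(). The set below is hand-copied from the list above, not an attribute exported by the package:

```python
from pathlib import Path

# Hand-copied from the list above; assumed to mirror the package's support.
SMART_RETURN_EXTENSIONS = {
    "csv", "docx", "gpkg", "geojson", "jpeg", "json", "kml",
    "pdf", "sav", "shp", "txt", "xlsm", "xlsx", "xml", "xsd",
}

def supports_smart_return(filename: str) -> bool:
    """True if the file's extension is one the package can smart-return."""
    return Path(filename).suffix.lstrip(".").lower() in SMART_RETURN_EXTENSIONS

print(supports_smart_return("parks.geojson"))  # True
print(supports_smart_return("archive.zip"))    # False
```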

Development

Running Tests

# Run all tests
make test

# Run tests with coverage
make test-cov

# Run linting checks
make lint

Code Quality

This project uses several tools to maintain code quality:

  • Black: Code formatting
  • isort: Import sorting
  • flake8: Linting
  • mypy: Type checking
  • pre-commit: Automated checks

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

MIT License

Changelog

See CHANGELOG.md for a list of changes and version history.
