pycancensus
Access, retrieve, and work with Canadian Census data and geography.
pycancensus is a Python package that provides integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the CensusMapper API. This package produces analysis-ready tidy DataFrames and spatial data in multiple formats, with full equivalence to the R cancensus library.
Recent Updates
- Full R Library Equivalence: Verified 100% data compatibility with R cancensus
- Enhanced API Reliability: Production-grade error handling and retry logic
- Vector Hierarchy Functions: Navigate parent and child relationships between census variables, as in R cancensus
- Improved Data Quality: Fixed column naming and data processing issues
- Comprehensive Testing: 450+ integration tests covering real-world scenarios
- National-Level Support: Added level='C' for Canada-wide baseline comparisons
Features
Data Access
- Download Census data and geography in analysis-ready format
- Support for multiple Census years: 2021, 2016, 2011, 2006, 2001, 1996
- All Census geographic levels: PR, CMA, CD, CSD, CT, DA, EA, DB
- Taxfiler data at Census Tract level (2000-2018)
Variable Discovery
- list_census_vectors() - Browse all available variables
- search_census_vectors() - Search variables by keyword
- parent_census_vectors() - Navigate variable hierarchies upward
- child_census_vectors() - Navigate variable hierarchies downward
- find_census_vectors() - Enhanced variable search with fuzzy matching
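To make the idea of navigating a hierarchy upward and downward concrete, here is a minimal sketch on a toy table. It does not call pycancensus: the vector IDs, labels, and the two helper functions are hypothetical, and the package's own functions may organize this differently.

```python
# Toy vector table: each entry maps a vector ID to (label, parent ID).
# The IDs and labels are hypothetical placeholders, not real CensusMapper vectors.
VECTORS = {
    "v_demo_1": ("Total", None),
    "v_demo_2": ("Group A", "v_demo_1"),
    "v_demo_3": ("Group B", "v_demo_1"),
    "v_demo_4": ("Subgroup A1", "v_demo_2"),
}


def parents(vector_id):
    """Walk upward and return all ancestors, nearest first."""
    result = []
    parent = VECTORS[vector_id][1]
    while parent is not None:
        result.append(parent)
        parent = VECTORS[parent][1]
    return result


def children(vector_id):
    """Walk downward and return all descendants in breadth-first order."""
    result = []
    queue = [vector_id]
    while queue:
        current = queue.pop(0)
        for vid, (_label, parent) in VECTORS.items():
            if parent == current:
                result.append(vid)
                queue.append(vid)
    return result
```

In this toy table, parents("v_demo_4") walks up through "v_demo_2" to "v_demo_1", while children("v_demo_1") collects every vector beneath the total.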
Geographic Capabilities
- GeoPandas integration for spatial analysis
- Multiple resolution options (simplified/high)
- Seamless geometry + data integration
Reliability & Performance
- Production-grade error handling with helpful messages
- Automatic retry logic with exponential backoff
- Connection pooling for improved performance
- Rate limiting to respect API constraints
- Comprehensive caching system
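For readers unfamiliar with retry logic using exponential backoff, the following is a generic sketch of the technique. It is not the package's implementation: the function names, the default delays, the 10% jitter, and the choice of ConnectionError as the retryable error are all assumptions made for illustration.

```python
import random
import time


def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Return the wait (seconds) before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_retry(func, max_retries=3, base=1.0, cap=30.0, sleep=time.sleep):
    """Call func(); on ConnectionError wait and retry, re-raising after the last attempt."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                raise
            # Add up to 10% random jitter so clients do not retry in lockstep.
            sleep(delays[attempt] * (1 + random.uniform(0, 0.1)))
```

Doubling the wait after each failure gives a struggling server progressively more room to recover, and the cap keeps the longest wait bounded.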
Installation
Note: pycancensus is not yet published on PyPI. Install directly from GitHub:
# Install latest version from GitHub
pip install git+https://github.com/dshkol/pycancensus.git
Or for development:
git clone https://github.com/dshkol/pycancensus.git
cd pycancensus
pip install -e .[dev]
Coming soon: publication to PyPI, after which pip install pycancensus will work.
API Key
pycancensus requires a valid CensusMapper API key to use. You can obtain a free API key by signing up for a CensusMapper account.
Set your API key as an environment variable:
export CANCENSUS_API_KEY="your_api_key_here"
Or set it programmatically:
import pycancensus as pc
pc.set_api_key("your_api_key_here")
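In scripts that need to support both options, one possible pattern is to resolve the key once and pass the result to pc.set_api_key(). The helper below is a hypothetical sketch, not part of pycancensus; the precedence (explicit argument first, then the CANCENSUS_API_KEY variable) is an assumption.

```python
import os


def resolve_api_key(explicit_key=None, env=os.environ):
    """Return the explicit key if given, else the CANCENSUS_API_KEY variable."""
    key = explicit_key or env.get("CANCENSUS_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set CANCENSUS_API_KEY."
        )
    return key
```

Failing early with a clear message avoids a less obvious error later, when the first API request is made.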
Quick Start
import pycancensus as pc
# Set your API key
pc.set_api_key("your_api_key_here")
# List available datasets
datasets = pc.list_census_datasets()
# Discover variables with new hierarchy functions
vectors = pc.list_census_vectors("CA21")
income_vars = pc.search_census_vectors("income", "CA21")
related_vars = pc.child_census_vectors("v_CA21_1", dataset="CA21")
# Get census data
data = pc.get_census(
    dataset="CA21",
    regions={"CMA": "35535"},  # Toronto CMA
    vectors=["v_CA21_1", "v_CA21_2", "v_CA21_3"],  # Population 2021, population 2016, % change
    level="CSD"
)
# Get census data with geography for mapping
geo_data = pc.get_census(
    dataset="CA21",
    regions={"PR": "35"},  # Ontario
    vectors=["v_CA21_1"],  # Total population
    level="CSD",
    geo_format="geopandas"  # Returns GeoDataFrame
)
# Advanced: Compare multiple Census years
data_2021 = pc.get_census("CA21", {"CSD": "5915022"}, ["v_CA21_1"], "CSD")
data_2016 = pc.get_census("CA16", {"CSD": "5915022"}, ["v_CA16_401"], "CSD")
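Once two census years have been retrieved, the comparison itself is a join on the region identifier followed by a difference. The sketch below shows that step on plain dictionaries so it runs without an API key. The helper name and the numbers are made up for illustration; with real results you would read the values from the returned DataFrames instead.

```python
def population_change(pop_new, pop_old):
    """Return {region_id: (absolute change, percent change)} for shared regions."""
    changes = {}
    for region_id in pop_new.keys() & pop_old.keys():
        new, old = pop_new[region_id], pop_old[region_id]
        # Percent change is undefined when the earlier count is zero.
        pct = (new - old) * 100 / old if old else None
        changes[region_id] = (new - old, pct)
    return changes


# Made-up numbers for illustration only, not real census counts.
example = population_change({"A": 1100, "B": 500}, {"A": 1000, "C": 200})
```

Only regions present in both years are compared, which matters because census boundaries can change between years.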
Variable Discovery Examples
# Search for housing-related variables
housing = pc.search_census_vectors("dwelling", "CA21")
# Navigate variable hierarchies
population_base = "v_CA21_1"
breakdowns = pc.child_census_vectors(population_base, dataset="CA21")
parent_categories = pc.parent_census_vectors(population_base, dataset="CA21")
# Enhanced search with fuzzy matching
income_vectors = pc.find_census_vectors("CA21", "median household income")
Error Handling & Resilience
pycancensus includes production-grade error handling:
import pycancensus as pc
from pycancensus.resilience import CensusAPIError, RateLimitError

try:
    data = pc.get_census("CA21", {"PR": "35"}, ["v_CA21_1"], "PR")
except RateLimitError as e:
    print(f"Rate limited: {e}")
    print(f"Retry after: {e.retry_after} seconds")
except CensusAPIError as e:
    print(f"API error: {e}")
    print(f"Suggestion: {e.suggestion}")
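Rather than only printing the retry_after value, a caller may want to wait and try again. The sketch below shows one way to do that. It uses a stand-in exception class (DemoRateLimitError) so that it runs without the package installed; the class, the helper name, and the attempt limit are hypothetical, and only the retry_after attribute is taken from the example above.

```python
import time


class DemoRateLimitError(Exception):
    """Stand-in for a rate-limit error carrying a retry_after hint (seconds)."""

    def __init__(self, message, retry_after):
        super().__init__(message)
        self.retry_after = retry_after


def fetch_with_wait(fetch, max_attempts=3, sleep=time.sleep):
    """Call fetch(); on a rate-limit error, wait retry_after seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except DemoRateLimitError as err:
            if attempt == max_attempts:
                raise
            sleep(err.retry_after)
```

With the real package, the same pattern would presumably catch RateLimitError in place of the stand-in class.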
Testing & Verification
pycancensus includes comprehensive testing to ensure reliability and R equivalence:
Cross-Validation with R cancensus
- 4/4 tests passing with full data equivalence
- Identical results for vector listing, data retrieval, and multi-region queries
- Automated testing against R cancensus library
Integration Testing
- 6 real-world scenarios covering typical data analysis workflows
- Provincial population analysis, demographic breakdowns, income analysis
- Vector hierarchy navigation, time series comparisons, geographic analysis
- Performance benchmarking with large datasets
Robustness Testing
- Error handling with invalid regions/vectors
- Large dataset performance testing
- API resilience and retry logic validation
# Run the test suite
python -m pytest tests/ -v
# Run cross-validation against R
python tests/cross_validation/test_r_equivalence.py
# Run integration scenarios
python tests/integration/test_comprehensive_scenarios.py
See tests/cross_validation/results/ for detailed test results and validation reports.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:
- Development setup
- Running tests
- Code style (Black, flake8)
- Submitting pull requests
- Reporting issues
License
This project is licensed under the MIT License - see the LICENSE file for details.
Related Packages
This package is inspired by and based on the R cancensus package.
Statistics Canada Attribution
Subject to the Statistics Canada Open Data License Agreement, licensed products using Statistics Canada data should employ the following acknowledgement of source:
Acknowledgment of Source
(a) You shall include and maintain the following notice on all licensed rights of the Information:
- Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.
(b) Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:
- Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.