A convenience tool for downloading census data from different countries
poppusher
What is popgetter?
Popgetter is a collection of tools designed to make it convenient to download census data from a number of different jurisdictions and coerce the data into common formats. The aim is that city- or region-scale analysis can be easily replicated for different geographies, using the most detailed locally available data.
What is poppusher?
This repo is "poppusher", which is one component of the popgetter project. Poppusher is a pipeline which downloads data from a number of different jurisdictions and then processes it into a common format. The data is then stored in a cloud-based data store, which can be accessed by other components of the popgetter project.
See the flow diagram for more details.
What the popgetter system does and doesn't do
Popgetter DOES:
For each of the implemented countries:
- Download the most detailed geometries for which census data is available.
- Download the most detailed census data available for most, if not all, variables published by the census.
- Ensure that the geometries and census data join correctly.
- Present some standard metadata to allow the user to explore which variables are available.
- Publish the data in a set of common file types (e.g. CloudGeoBuff, Parquet, PMTiles).
Popgetter DOES NOT:
- Attempt to ensure that census variables are comparable between different jurisdictions, or that the results of any analysis can be directly compared across multiple countries.
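To make the "join correctly" check in the list above concrete, here is a minimal sketch of comparing the identifiers found in a geometry file with those found in a census table. It is not taken from the poppusher codebase: the names `JoinReport` and `check_join` and the sample identifiers are hypothetical, and the real pipeline may validate joins differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinReport:
    """Summary of how well two sets of identifiers line up."""

    matched: frozenset
    geometries_without_data: frozenset
    data_without_geometry: frozenset

    @property
    def is_clean(self) -> bool:
        # A clean join leaves nothing unmatched on either side.
        return not self.geometries_without_data and not self.data_without_geometry


def check_join(geometry_ids, census_ids) -> JoinReport:
    """Compare the join keys of a geometry layer and a census table."""
    geo = frozenset(geometry_ids)
    census = frozenset(census_ids)
    return JoinReport(
        matched=geo & census,
        geometries_without_data=geo - census,
        data_without_geometry=census - geo,
    )


# Hypothetical identifiers, for illustration only.
report = check_join(["A001", "A002", "A003"], ["A001", "A002", "A004"])
print(report.is_clean)                         # False
print(sorted(report.geometries_without_data))  # ['A003']
print(sorted(report.data_without_geometry))    # ['A004']
```

Reporting both directions of the mismatch separately makes it easier to tell a missing geometry from a missing census row.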
Getting started
At present, this is still a development project, so the first step is to clone
the repo and then install using pip, with the --editable option:
- Create a virtual environment and activate it (you should be able to use your
own choice of environment manager, such as
conda or venv, but so far pyenv is the most tested with popgetter). e.g.:
python -m venv poppusher_venv # create a virtual environment called `poppusher_venv`
source poppusher_venv/bin/activate # activate the virtual environment
- Clone the repo and then do an 'editable' install:
git clone https://github.com/Urban-Analytics-Technology-Platform/poppusher.git
cd poppusher
pip install -e ".[dev]"
- Then, start the Dagster UI web server:
dagster dev
Open http://localhost:3000 with your browser to see the project.
Development
You can start writing assets in the poppusher/assets/ directory. New assets and
jobs will need to be added to the poppusher/__init__.py file.
Adding new Python dependencies
You can specify new Python dependencies in pyproject.toml.
Unit testing
Tests are in the poppusher_tests directory and you can run tests using
pytest:
pytest
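For reference, a test that pytest will discover can be as small as the sketch below. The file name and the `normalise_geo_id` helper are hypothetical and only illustrate the shape of a test module. By default pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`.

```python
# Hypothetical file: poppusher_tests/test_example.py


def normalise_geo_id(raw: str) -> str:
    """Hypothetical helper: trim whitespace and upper-case an identifier."""
    return raw.strip().upper()


def test_normalise_geo_id_strips_and_uppercases():
    assert normalise_geo_id("  a001 ") == "A001"


def test_normalise_geo_id_is_idempotent():
    once = normalise_geo_id(" a001")
    # Applying the helper a second time should change nothing.
    assert normalise_geo_id(once) == once
```

In a real test module the helper would be imported from the package rather than defined next to the tests.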
Repo structure
This is a Dagster project. The repo layout was initially created with the
dagster project scaffold command. It has been subsequently updated using the
copier command and the Scientific Python template.
The previous_code directory contains code that predates the migration to
Dagster. In due course, this will be removed as the remaining countries are
migrated to Dagster. There are usage instructions for this old code in
previous_code/previous_code_usage.md.