A pipeline for processing open medical examiner's data using GitHub Actions CI/CD.
TODO:
- Add A LOT more PRINT statements
- Add comments
- Add documentation (README and docs site)
- The latter will be necessary once we move to Dockerfiles and Actions
- Add tests 😅
- including CLI tests
- Use arcgis package for geocoding
- Use batch geocoding (had a problem with the token; can we register as an anonymous user?)
- [x] Use Socrata package (register API key) for data fetching from datasets published on Socrata
- Use requests package for data fetching from datasets published on OData
- Use the GitHub Python package to keep config.yaml updated after successful runs
- Can also use it to update the JS data files at the end of the analysis (see below)
- Just used requests and the API directly
- These should be very small and generated by pandas analysis of the data
- Results should be published as a GitHub release (data files, can zip them)
- Use the GH CLI in a bash script because it is pre-installed in Actions
- We can then just use the Octokit JS package to point to the LINKS of the files, so clicking a link downloads the file
- Then a web page to enable file downloads and show some graphs (basic: records over time for each dataset)
- What charting framework to use?
- Need an action to update the frontend codebase with the new data
- Store in JSON format
- [ ] add website to socrata key
- Make a container to run the whole pipeline (so users don't have to download anything)
- Host on GHCR
- MAKE OUR OWN UNIQUE IDENTIFIERS FOR ALL DATASETS COMBINED
- SAME COLUMN NAME IN ALL DATASETS, THEN DON'T HAVE TO PROVIDE IDENTIFIER COLUMN IN config.yaml
- Also allows for better merging of datasets (i.e. records + drugs + geo)
- [ ] Do we want to publish a Web API as well?
- [ ] Then we would need a DB
- No Windows support because of the drug extraction tool
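The "our own unique identifiers" item could be approached roughly as follows. This is a minimal sketch, assuming each dataset arrives as a list of dicts and that its local identifier column is known (as config.yaml provides it today). It derives a shared `record_id` column by hashing the dataset name together with the source identifier, so two datasets that reuse the same local ID still get distinct, run-to-run stable IDs. The names `record_id`, `make_record_id`, `add_record_ids` and the example datasets are hypothetical.

```python
import hashlib
from typing import Iterable


def make_record_id(dataset: str, source_id: str) -> str:
    """Return a stable ID for one record of one dataset (hypothetical helper).

    Hashing "dataset:source_id" keeps IDs unique across datasets and
    deterministic across pipeline runs. The SHA-256 digest is truncated to
    16 hex characters (64 bits) to keep files small.
    """
    key = f"{dataset}:{source_id}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]


def add_record_ids(dataset: str, rows: Iterable[dict], id_column: str) -> list[dict]:
    """Return copies of the rows with shared `record_id` and `dataset` columns."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        new_row["record_id"] = make_record_id(dataset, str(row[id_column]))
        new_row["dataset"] = dataset
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Two hypothetical datasets that happen to reuse the same local ID.
    a = add_record_ids("county_a", [{"CaseNum": "1001"}], id_column="CaseNum")
    b = add_record_ids("county_b", [{"case_id": 1001}], id_column="case_id")
    print(a[0]["record_id"] != b[0]["record_id"])  # True
```

Once every dataset carries the same `record_id` column, merging records with drug and geocoding outputs becomes a join on one column name, and the per-dataset identifier column is only needed once, when the ID is first generated.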
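For the small frontend data files (basic records over time for each dataset, stored as JSON), the summary could be as simple as counting records per dataset per year. The plan mentions pandas; this sketch uses only the standard library so it is self-contained. It assumes each record has a `dataset` field and an ISO-8601 date string in a hypothetical `death_date` field; both field names and the output layout are assumptions, not a format the frontend already expects.

```python
import json
from collections import Counter, defaultdict
from datetime import date


def records_per_year(records: list[dict], date_field: str = "death_date") -> dict:
    """Count records per dataset per year, skipping rows without a usable date."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        raw = rec.get(date_field)
        if not raw:
            continue  # missing date: leave the record out of the summary
        try:
            year = date.fromisoformat(raw[:10]).year  # keep only "YYYY-MM-DD"
        except ValueError:
            continue  # malformed date: also skipped
        counts[rec["dataset"]][year] += 1
    # JSON object keys must be strings; sorting keeps the file stable between runs.
    return {
        ds: {str(y): n for y, n in sorted(c.items())}
        for ds, c in sorted(counts.items())
    }


if __name__ == "__main__":
    sample = [
        {"dataset": "county_a", "death_date": "2021-03-04"},
        {"dataset": "county_a", "death_date": "2021-11-30T00:00:00"},
        {"dataset": "county_a", "death_date": "2022-01-15"},
        {"dataset": "county_b", "death_date": None},
    ]
    print(json.dumps(records_per_year(sample)))
    # {"county_a": {"2021": 2, "2022": 1}}
```

The resulting JSON is tiny (one number per dataset per year), which fits the goal of small files that a charting library can plot directly.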
I think, if my math is right, we can use about 20 minutes of Actions per day (the free tier is limited to 2,000 minutes per month).
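One way to sanity-check that estimate is to divide the monthly allowance by the number of days and by the runner's minute multiplier. The multipliers below (Linux 1x, Windows 2x, macOS 10x) are GitHub's documented rates for hosted runners, but check the current billing docs before relying on them; note also that the 2,000-minute quota applies to private repositories on the Free plan. The 31-day month and the `runs_per_day` parameter are assumptions for illustration.

```python
import math

# Minute multipliers GitHub documents for its hosted runners
# (verify against the current billing docs).
MULTIPLIERS = {"linux": 1, "windows": 2, "macos": 10}


def daily_minutes(monthly_limit: int = 2000, days: int = 31,
                  runner: str = "linux", runs_per_day: int = 1) -> int:
    """Whole wall-clock minutes each run may take without exceeding the monthly limit."""
    billable_per_day = monthly_limit / days
    return math.floor(billable_per_day / MULTIPLIERS[runner] / runs_per_day)


if __name__ == "__main__":
    print(daily_minutes())                 # 64
    print(daily_minutes(runner="macos"))   # 6
    print(daily_minutes(runs_per_day=3))   # 21
```

The runner choice matters most: the same budget that allows about an hour per day on Linux allows only a few minutes on macOS.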
*** Make a note that it is very important to PULL often to stay up to date with the CONFIG (config.yaml is updated after successful runs).
Download files
Source distribution: opendata-pipeline-0.2.1.tar.gz (14.0 kB)
Built distribution: opendata_pipeline-0.2.1-py3-none-any.whl

Hashes for opendata_pipeline-0.2.1-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 637c85d2c969c565089b46f9a165e91c1cebfddfc19254dc3c947bb2adbfdb43 |
| MD5 | cffc22ad7d50a4aba74ccd4852a3677b |
| BLAKE2b-256 | b3bb57a53ddb33db490d196e8d42a415c670f448b64bb3daf24da0fd6e1a2ead |