A pipeline for processing open medical examiner's data using GitHub Actions CI/CD.
TODO:
- Add A LOT more PRINT statements
- Add comments
- Add documentation (README and docs site)
- The latter will be necessary once we move to Dockerfiles and Actions
- Add tests 😅
- including CLI tests
- Use arcgis package for geocoding
- Use batch geocoding (had problem with Token... can register as anonymous user?)
- [x] Use Socrata package (register API key) for data fetching from datasets published on Socrata
- Use requests package for data fetching from datasets published on OData
- Use GitHub Python package to keep config.yaml updated after successful runs
- Can also use it to update JS datafiles at end of analysis (see below)
- Just used requests and the API directly
- These should be very small and generated by pandas analysis of the data
- Results should be in a GitHub release (data files; can zip them)
- Use GH CLI in bash script because it is pre-installed in Actions
- We can then just use the Octokit JS package to point to the LINKS of the files; clicking a link downloads the file
- Then a web page to enable file downloads and show some graphs (basic --> records over time for each dataset)
- What charting framework to use?
- Need an action to update the frontend codebase with the new data
- Store in JSON format
- [ ] Add website to Socrata key
- Make a container to run the whole pipeline (so no downloads for users)
- Host on GHCR
- MAKE OUR OWN UNIQUE IDENTIFIERS FOR ALL DATASETS COMBINED
- SAME COLUMN NAME IN ALL DATASETS, THEN DON'T HAVE TO PROVIDE IDENTIFIER COLUMN IN config.yaml
- Also allows for better merging of datasets (i.e. records + drugs + geo)
- [ ] Do we want to publish a Web API as well?
- [ ] Then we would need a DB
- No Windows support due to drug extraction tool usage
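
As a rough sketch of the combined-identifier item in the list above: one option is to derive a deterministic ID from the dataset name plus the record's own content, and store it under the same column name in every dataset, so no per-dataset identifier column has to be configured. The column name `CaseIdentifier`, the function names, the hashing scheme, and the sample records are all hypothetical and only illustrate the idea.

```python
import hashlib
import json

# Hypothetical shared column name; the real pipeline may choose another.
ID_COLUMN = "CaseIdentifier"


def make_identifier(dataset_name, record):
    """Build a deterministic ID from the dataset name and record content."""
    # sort_keys gives a stable serialization regardless of key order.
    canonical = json.dumps(record, sort_keys=True, default=str)
    raw = f"{dataset_name}:{canonical}".encode("utf-8")
    # Truncating SHA-256 to 16 hex characters is an arbitrary choice.
    return hashlib.sha256(raw).hexdigest()[:16]


def add_identifiers(dataset_name, records):
    """Return copies of the records with the shared ID column added."""
    return [
        {**record, ID_COLUMN: make_identifier(dataset_name, record)}
        for record in records
    ]


# Made-up records from two made-up datasets with different native columns.
dataset_a = add_identifiers("dataset_a", [{"casenumber": "A-1", "age": 40}])
dataset_b = add_identifiers("dataset_b", [{"ObjectID": 1, "age": 40}])

combined = dataset_a + dataset_b
print([row[ID_COLUMN] for row in combined])
```

One caveat with this particular scheme: two records in the same dataset with identical content would get the same ID, so a real implementation might need to mix in a row position or fetch timestamp.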
I think, if my math is right, we can do ~20 minutes / day of Actions (2,000 minutes per month limit for free).
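
To check the estimate above, here is the arithmetic written out. It assumes one scheduled run per day and a 31-day month as the worst case, and it ignores billing details such as per-job rounding and runner multipliers; the variable names are only illustrative.

```python
FREE_MINUTES_PER_MONTH = 2_000  # free limit quoted above
DAYS_IN_LONGEST_MONTH = 31      # worst case for a daily schedule
PLANNED_MINUTES_PER_DAY = 20    # estimate quoted above

# Upper bound on minutes per day before the monthly limit is hit.
daily_budget = FREE_MINUTES_PER_MONTH / DAYS_IN_LONGEST_MONTH

# Total usage if every day uses the planned 20 minutes.
monthly_usage = PLANNED_MINUTES_PER_DAY * DAYS_IN_LONGEST_MONTH
remaining = FREE_MINUTES_PER_MONTH - monthly_usage

print(f"Daily budget: {daily_budget:.1f} min")          # Daily budget: 64.5 min
print(f"Usage at 20 min/day: {monthly_usage} min")      # Usage at 20 min/day: 620 min
print(f"Left over: {remaining} min")                    # Left over: 1380 min
```

Under these assumptions, 20 minutes per day uses about a third of the free allowance, leaving room for reruns and manual triggers.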
*** Make a note: it is very important to PULL often to stay up to date with the CONFIG