datagov-harvesting-logic

A library for metadata extraction, validation, transformation, and loading into the data.gov catalog.

Features

Extract
- General-purpose fetching and downloading of web resources.
- Extraction tailored to the following data formats:
  - DCAT-US
Validation
- DCAT-US
  - JSON Schema validation using draft 2020-12.
Load
- DCAT-US
  - Conversion of a DCAT-US catalog into the CKAN dataset schema.
  - Create, delete, update, and patch operations on CKAN packages/datasets.

Requirements

This project uses Poetry for dependency management. Install Poetry before proceeding.

Once Poetry is installed, run `poetry install` to install the dependencies into a local virtual environment.

Testing

CKAN load testing

CKAN load testing doesn't require the services defined in docker-compose.yml.
catalog-dev is used for CKAN load testing.
Create an API key by signing in to catalog-dev.
Create a credentials.py file at the root of the project containing the variable `ckan_catalog_dev_api_key`, assigned to the API key.
Run the tests with the command `poetry run pytest ./tests/load/ckan`.
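
As a minimal sketch of the credentials.py file described above: only the variable name `ckan_catalog_dev_api_key` comes from this README. Reading the key from an environment variable, the variable name `CKAN_CATALOG_DEV_API_KEY`, and the placeholder string are assumptions made here so that a real key is less likely to be committed by accident.

```python
# credentials.py (at the root of the project)
import os

# Hypothetical environment variable name; this README only requires that
# ckan_catalog_dev_api_key ends up holding the catalog-dev API key.
ckan_catalog_dev_api_key = os.environ.get(
    "CKAN_CATALOG_DEV_API_KEY",
    "replace-with-your-catalog-dev-api-key",  # placeholder, not a real key
)
```

Assigning the key as a plain string literal also satisfies the step above; in either case, consider keeping credentials.py out of version control.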

Harvester testing

These tests are found in extract and validate. Some of them rely on services defined in docker-compose.yml. Start the services with `docker compose up -d`, then run the tests with `poetry run pytest --ignore=./tests/load/ckan`.

If you followed the instructions for both CKAN load testing and Harvester testing, you can simply run `poetry run pytest` to run all tests.

Integration testing

To run integration tests locally, add the following environment variables, with their appropriate values, to your .env file:
- CF_SERVICE_USER = "put username here"
- CF_SERVICE_AUTH = "put password here"
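
Before running the integration tests, it can help to confirm that both variables are actually visible to the process. The sketch below is illustrative only: the helper name `missing_integration_vars` is hypothetical, and it assumes the values from .env have already been loaded into the environment (for example by Docker Compose or a dotenv loader), which Python does not do by itself.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("CF_SERVICE_USER", "CF_SERVICE_AUTH")


def missing_integration_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_integration_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Integration test credentials are set.")
```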

Comparison

./tests/harvest_sources/ckan_datasets_resp.json
- Represents what ckan would respond with after querying for the harvest source name

./tests/harvest_sources/dcatus_compare.json
- Represents a changed harvest source.

Created:
- datasets[0]
```
+ "identifier" = "cftc-dc10"
```
Deleted:
- datasets[0]
```
- "identifier" = "cftc-dc1"
```

Updated:

datasets[1]

- "modified": "R/P1M"
+ "modified": "R/P1M Update"

datasets[2]

- "keyword": ["cotton on call", "cotton on-call"]
+ "keyword": ["cotton on call", "cotton on-call", "update keyword"]

datasets[3]

"publisher": {
  "name": "U.S. Commodity Futures Trading Commission",
  "subOrganizationOf": {
-   "name": "U.S. Government"
+   "name": "Changed Value"
  }
}

./tests/harvest_sources/dcatus.json
- Represents the original harvest source before any changes occurred.
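
The create, delete, and update cases listed above can be illustrated with a small comparison sketch. This is not the library's implementation: the function names `dataset_hash` and `compare_sources` are hypothetical, hashing the sorted JSON of each dataset is one possible way to detect changes, and the identifier "cftc-dc2" in the sample data is invented for illustration.

```python
from __future__ import annotations

import hashlib
import json


def dataset_hash(dataset: dict) -> str:
    """Hash a dataset's JSON with sorted keys so key order does not matter."""
    serialized = json.dumps(dataset, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def compare_sources(original: list[dict], changed: list[dict]) -> dict[str, list[str]]:
    """Classify dataset identifiers as created, deleted, or updated."""
    old = {d["identifier"]: dataset_hash(d) for d in original}
    new = {d["identifier"]: dataset_hash(d) for d in changed}
    return {
        "create": sorted(new.keys() - old.keys()),
        "delete": sorted(old.keys() - new.keys()),
        "update": sorted(i for i in old.keys() & new.keys() if old[i] != new[i]),
    }


original = [
    {"identifier": "cftc-dc1", "modified": "R/P1M"},
    {"identifier": "cftc-dc2", "modified": "R/P1M"},  # hypothetical identifier
]
changed = [
    {"identifier": "cftc-dc10", "modified": "R/P1M"},
    {"identifier": "cftc-dc2", "modified": "R/P1M Update"},
]

print(compare_sources(original, changed))
# {'create': ['cftc-dc10'], 'delete': ['cftc-dc1'], 'update': ['cftc-dc2']}
```

Keying on `identifier` means a dataset whose identifier changes shows up as one delete plus one create, which matches how datasets[0] is described above.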

Flask App

Local development

Set your local configuration in the .env file.
Use the Makefile to set up local Docker containers, including a PostgreSQL database and the Flask application:
```
make build
make up
make test
make clean
```
These commands start the necessary services and run the tests.

When there are database DDL changes, use the following steps to generate migration scripts and update the database:

docker compose up -d db
docker compose run app flask db migrate -m "migration description"
docker compose run app flask db upgrade

Deployment to cloud.gov

Database Service Setup

A database service is required for use on cloud.gov.

In a given Cloud Foundry space, a db can be created with cf create-service <service offering> <plan> <service instance>.

In dev, for example, the db was created with cf create-service aws-rds micro-psql harvesting-logic-db.

Creating databases for the other spaces should follow the same pattern, though the size may need to be adjusted (see available AWS RDS service offerings with cf marketplace -e aws-rds).

Any created service needs to be bound to an app with cf bind-service <app> <service>. With the above example, the db can be bound with cf bind-service harvesting-logic harvesting-logic-db.

The service can be accessed with service keys. They can be created with cf create-service-key <service instance> <service-key-name>, listed with cf service-keys <service instance>, and shown with cf service-key <service instance> <service-key-name>.
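
Once a service is bound, Cloud Foundry exposes its credentials to the app through the `VCAP_SERVICES` environment variable as JSON. The sketch below shows one way an app might look up the connection string for the bound database. The helper name `database_uri_from_vcap` is hypothetical, and the presence of a `uri` field in the credentials is an assumption about the aws-rds broker; check the output of cf env for the actual structure.

```python
import json
import os


def database_uri_from_vcap(vcap_services_json, instance_name):
    """Return the 'uri' credential of the named service instance, or None."""
    services = json.loads(vcap_services_json)
    # VCAP_SERVICES maps each service offering to a list of bound instances.
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == instance_name:
                return instance.get("credentials", {}).get("uri")
    return None


if __name__ == "__main__":
    raw = os.environ.get("VCAP_SERVICES", "{}")
    uri = database_uri_from_vcap(raw, "harvesting-logic-db")
    print("database uri found" if uri else "no bound database found")
```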

Manually Deploying the Flask Application to development

Ensure that manifest.yml and vars.development.yml are configured for your Flask application. The vars file may include variables such as:
```
app_name: harvesting-logic
database_name: harvesting-logic-db
route-external: harvester-dev-datagov.app.cloud.gov
```

Export the dependencies to requirements.txt, then deploy the application using Cloud Foundry's cf push command with the vars file:

poetry export -f requirements.txt --output requirements.txt --without-hashes
cf push --vars-file vars.development.yml

When there are database DDL changes, run the following to update the database:

cf run-task harvesting-logic --command "flask db upgrade" --name database-upgrade

Release history

0.4.2

May 10, 2024

0.4.1

May 7, 2024

0.4.0

May 6, 2024

0.3.10

May 1, 2024

This version

0.3.9

Apr 25, 2024

0.3.8

Apr 24, 2024

0.3.7

Apr 24, 2024

0.3.6

Apr 3, 2024

0.3.5

Mar 21, 2024

0.3.4

Mar 18, 2024

0.3.3

Feb 27, 2024

0.3.2

Feb 6, 2024

0.3.1

Feb 6, 2024

0.3.0

Jan 30, 2024

0.2.0

Jan 22, 2024

0.1.0

Jan 4, 2024

0.0.4

Dec 22, 2023

0.0.3.post2

Dec 19, 2023

0.0.3.post1

Dec 19, 2023

0.0.2.post6

Dec 18, 2023

0.0.2.post5

Dec 18, 2023

0.0.2.post4

Dec 18, 2023

0.0.2.post3

Dec 15, 2023

0.0.1

Dec 13, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datagov_harvesting_logic-0.3.9.tar.gz (125.0 kB)

Uploaded Apr 25, 2024 Source

Built Distribution

datagov_harvesting_logic-0.3.9-py3-none-any.whl (131.0 kB)

Uploaded Apr 25, 2024 Python 3

Hashes for datagov_harvesting_logic-0.3.9.tar.gz

Algorithm	Hash digest
SHA256	`08a6fd8b0ccbc60903521cd8a6df1a3930cbba30aae289ecdd72280c1652b7ff`
MD5	`4a041f68efe6a7c153dca6aaffe5e363`
BLAKE2b-256	`435ad454fb25d97f94c843807aa8493adaa9651d71856a9ad8bbb1323535bd66`

Hashes for datagov_harvesting_logic-0.3.9-py3-none-any.whl

Algorithm	Hash digest
SHA256	`fb9e0fa413c7b389ee6d28b97843385c217bbf9aaacdeb2181f8f8c6a3e00eed`
MD5	`a6fafc76a057c696dd448daac64903fa`
BLAKE2b-256	`b0bf30053d776630f91d73ca4ad90f0e0645a1a14bd5e3ee97e54fb38d4bf61f`