Harvester Next Generation Core for CKAN

Harvester Next Generation for CKAN

Install

pip install ckan-harvesters

Use data.json sources

from harvesters.datajson.harvester import DataJSON
dj = DataJSON()
dj.url = 'https://data.iowa.gov/data.json'
try:
	dj.fetch()
except Exception as e:
	print(e)

valid = dj.validate(validator_schema='non-federal-v1.1')
print(dj.errors)
# ['Error validating JsonSchema: \'bureauCode\' is a required property ...

# full dict with the source
print(dj.as_json())
"""
{
	'@context': 'https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld',
	'@id': 'https://data.iowa.gov/data.json',
	'@type': 'dcat:Catalog',
	'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',
	'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',
	'dataset': [{
		'accessLevel': 'public',
		'landingPage': 'https://data.iowa.gov/d/23jk-3uwr',
		'issued': '2017-01-30',
		'@type': 'dcat:Dataset',

        ... 
"""
# just headers
print(dj.headers)

"""
{
'@context': 'https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld',
'@id': 'https://data.iowa.gov/data.json',
'@type': 'dcat:Catalog',
'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',
'describedBy': 'https://project-open-data.cio.gov/v1.1/schema/catalog.json',
}
"""

for dataset in dj.datasets:
    print(dataset['title'])

Impaired Streams 2014
2009-2010 Iowa Public School District Boundaries
2015 - 2016 Iowa Public School District Boundaries
Impaired Streams 2010
Impaired Lakes 2014
2007-2008 Iowa Public School District Boundaries
Impaired Streams 2012
2011-2012 Iowa Public School District Boundaries
Active and Completed Watershed Projects - IDALS
2012-2013 Iowa Public School District Boundaries
2010-2011 Iowa Public School District Boundaries
2016-2017 Iowa Public School District Boundaries
2014 - 2015 Iowa Public School District Boundaries
Impaired Lakes 2008
2008-2009 Iowa Public School District Boundaries
2013-2014 Iowa Public School District Boundaries
Impaired Lakes 2010
Impaired Lakes 2012
Impaired Streams 2008
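
Because the loop above treats each entry of dj.datasets as a dict with a 'title' key, plain Python is enough to filter the results. The following is a minimal sketch, not part of the library: titles_matching is a hypothetical helper, and sample_datasets is a stand-in for dj.datasets built from a few of the titles printed above.

```python
def titles_matching(datasets, keyword):
    """Return the sorted titles that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(
        d['title'] for d in datasets
        if keyword in d.get('title', '').lower()
    )


# Stand-in for dj.datasets (assumption: a list of dicts with a 'title' key)
sample_datasets = [
    {'title': 'Impaired Streams 2014'},
    {'title': '2009-2010 Iowa Public School District Boundaries'},
    {'title': 'Impaired Lakes 2014'},
    {'title': 'Impaired Streams 2010'},
]

for title in titles_matching(sample_datasets, 'impaired streams'):
    print(title)
```

Entries without a 'title' key are skipped rather than raising, which may or may not be the behavior you want for a real source.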

Use CSW sources

from harvesters.csw.harvester import CSWSource
csw = CSWSource(url='http://data.nconemap.com/geoportal/csw?Request=GetCapabilities&Service=CSW&Version=2.0.2')

csw.fetch()
csw_info = csw.as_json()
print('CSW title: {}'.format(csw_info['identification']['title']))
 # CSW title: ArcGIS Server Geoportal Extension 10 - OGC CSW 2.0.2 ISO AP
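
The example above indexes the result of as_json() as nested dicts, which raises a KeyError if a section is missing. A minimal sketch of a more defensive lookup follows; get_nested is a hypothetical helper (not part of the library), and sample_info is a stand-in that assumes the same nested shape as the example.

```python
def get_nested(data, *keys, default=None):
    """Walk nested dicts key by key; return default if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Stand-in for csw.as_json() (assumption: nested dicts, as in the example above)
sample_info = {
    'identification': {
        'title': 'ArcGIS Server Geoportal Extension 10 - OGC CSW 2.0.2 ISO AP',
    },
}

title = get_nested(sample_info, 'identification', 'title', default='(untitled)')
print('CSW title: {}'.format(title))

# A source without an identification section falls back to the default
print('CSW title: {}'.format(get_nested({}, 'identification', 'title', default='(untitled)')))
```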

Development

To set up a development environment, clone the repository and install the dependencies in a virtualenv:

pip install -r requirements.txt

This installs the library in development mode, along with the other libraries needed for the tests.

Test

To run the test suite with pytest:

pytest

We use pytest-vcr, which is based on the wonderful VCR.py, to mock HTTP requests. This way the tests don't need to hit the real internet (which is fragile and slow), because a mocked version of each response the tests need is stored in VCR's cassette format.

To update these cassettes, run:

pytest --vcr-record=all

To hit the real internet without using mocks, disable the plugin:

pytest --disable-vcr

To read only from the existing cassettes, without recording anything new, run:

pytest --vcr-record=none

Tests without a CKAN instance

python -m pytest tests

================ test session starts =============
platform linux -- Python 3.6.8, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
rootdir: /home/hudson/dev/datopian/ckan-ng-harvester-core
plugins: vcr-1.0.2
collected 17 items

tests/test_csw_dataset_adapter.py ....      [ 23%]
tests/test_data_json.py .......             [ 64%]
tests/test_datajson_dataset_adapter.py .....[100%]

=============== 17 passed in 17.52s ==============

Tests with a CKAN instance

Copy the settings.py file to local_settings.py and fill in the required values.
You can use a local or a remote CKAN instance.

python -m pytest tests_with_ckan/test_harvest.py
