
OECD fetcher building templates and helpers


OECD Toolbox

A library of abstract classes that can serve as a skeleton for writing downloaders and converters for DbNomics-style fetchers. It also contains utility/helper functions for common operations and transformations in both the download and conversion processes.

Installation

To install the package, proceed the usual way:

pip install oecd_toolbox

Or, if you already have it installed, upgrade it:

pip install oecd_toolbox --upgrade

Build the project

After making changes, update the version number in setup.cfg before building the project. Then run the following command:

python -m build

Publish the project on pypi.org

WARNING: make sure no confidential data is included in the published package.

The package is published on pypi.org, where the available variants of the package are managed manually. Access details are stored in the Practices team's access details store.

To upload the distributions in the toolbox's dist folder, run twine from the command line:

twine upload dist/*

DataCapture conversions

NEW in version 0.3.27: the package includes a converter that generates DataCapture-style CSV files. To use it, copy the 'tests\datacapture-postprocessor.py' file (or just its contents, shown below) into your fetcher project.

import asyncio
import sys
from oecd_toolbox import csv_writers as lbc

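def main():
    # Create the DataCapture converter
    cnvtr = lbc.DataCaptureConverter()
    # Read the command-line arguments (source folder, target folder, behaviour modifiers) and set up logging
    cnvtr.init_arguments_and_logging()
    # Convert the selected resources, using process_single_resource as the per-resource handler
    asyncio.run(cnvtr.convert_resources(cnvtr.prepare_resources(), cnvtr.process_single_resource))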

if __name__ == "__main__":
    sys.exit(main())

To run the conversion from JSONL files, use the following PowerShell command. It assumes you have already created a folder <projectname-datacapture-data> to receive the CSV files:

python datacapture-postprocessor.py <path-to\projectname-json-data> <path-to\projectname-datacapture-data> --force

A similar command could be added to the postprocessor in the continuous integration pipeline. The usual behaviour modifiers (--only, --exclude, --limit) are available.

NEW in version 0.4.0: beyond the basic converter, two new converter flavours are available:

  • DataCaptureConverterWithRegex can filter the series of a resource, making the conversion lighter both in the resources it uses and in the size of the resulting file. Filtering uses an SDMX web-service-like syntax: dots separate dimensions, '+' joins the eligible members of a dimension, and an empty position allows all members. The sample code above, adapted:

    def main():
        # Each entry: (resource ID, SDMX-style series key filter)
        filterset = [
            ('sts_trtu_m', '.TOVT+TOVV.G46+G47..I15.'),
            ('prc_hicp_ctrb', '.I15+INX_A_AVG+RCH_A+RCH_A_AVG+RCH_M..CP00+CP01+CP02+CP03+CP04+CP041+CP043+CP044+CP045')
        ]
    
        cnvtr = lbc.DataCaptureConverterWithRegex()
        cnvtr.init_arguments_and_logging()
        asyncio.run(cnvtr.convert_resources(cnvtr.prepare_resources(filterset), cnvtr.process_single_resource))
    
  • DataCaptureConverterWithRegexAndAggregator can handle frequency aggregations: it is sufficient to provide a target frequency from the list ['M', 'Q', 'A'] and a pandas-style aggregator function:

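    import pandas as pd

    def main():
        # Each entry: (resource ID, series key filter or None, target frequency, aggregator)
        filterset = [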
            ("MMSD008A", None, "M", pd.DataFrame.mean),
            ("MMSD402A", None, "A", pd.DataFrame.median)
        ]
    
        cnvtr = lbc.DataCaptureConverterWithRegexAndAggregator()
        cnvtr.init_arguments_and_logging()
        asyncio.run(cnvtr.convert_resources(cnvtr.prepare_resources(filterset), cnvtr.process_single_resource))
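
To illustrate the SDMX-like filter syntax accepted by DataCaptureConverterWithRegex, here is a minimal sketch of how such a filter could be turned into a regular expression and matched against dot-separated series keys. The helpers `sdmx_filter_to_regex` and `matches` are hypothetical and not part of the toolbox API; the toolbox's own implementation may escape, anchor or compile the pattern differently. The series keys in the usage lines are made up for illustration.

```python
import re


def sdmx_filter_to_regex(key_filter: str) -> re.Pattern:
    """Translate an SDMX-style key filter into a compiled regex (hypothetical helper).

    Dots separate dimensions, '+' joins the allowed members of one dimension,
    and an empty position accepts any member.
    """
    parts = []
    for position in key_filter.split("."):
        if position == "":
            # Empty position: any member code (dots are not allowed inside a member)
            parts.append(r"[^.]*")
        else:
            # One or more allowed members, joined with '+' in the filter
            members = (re.escape(member) for member in position.split("+"))
            parts.append("(?:" + "|".join(members) + ")")
    # Dimensions are separated by literal dots in the series key
    return re.compile(r"\.".join(parts))


def matches(key_filter: str, series_key: str) -> bool:
    """Return True if the whole series key satisfies the filter."""
    return sdmx_filter_to_regex(key_filter).fullmatch(series_key) is not None


if __name__ == "__main__":
    key_filter = ".TOVT+TOVV.G46+G47..I15."
    print(matches(key_filter, "M.TOVT.G46.SCA.I15.AT"))  # True
    print(matches(key_filter, "M.TOVX.G46.SCA.I15.AT"))  # False
```

Using fullmatch means a key with the wrong number of dimensions is rejected, so each filter position lines up with exactly one dimension of the key.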
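
The frequency aggregation performed by DataCaptureConverterWithRegexAndAggregator can be pictured with plain pandas. The sketch below is an assumption about the general idea, not the toolbox's code: the helper `aggregate_to_frequency` and the mapping `PANDAS_FREQ` are hypothetical. It takes a DataFrame indexed by monthly periods, groups the rows by their target period, and calls a pandas-style aggregator such as `pd.DataFrame.mean` once per group.

```python
from typing import Callable

import pandas as pd

# Hypothetical mapping from the frequency codes used above to pandas period
# aliases ('A' is spelled 'Y' in recent pandas versions)
PANDAS_FREQ = {"M": "M", "Q": "Q", "A": "Y"}


def aggregate_to_frequency(
    df: pd.DataFrame,
    target: str,
    aggregator: Callable[[pd.DataFrame], pd.Series],
) -> pd.DataFrame:
    """Aggregate a DataFrame with a monthly PeriodIndex to the target frequency."""
    if target not in PANDAS_FREQ:
        raise ValueError(f"target must be one of {sorted(PANDAS_FREQ)}, got {target!r}")
    # Map each monthly period to the period it belongs to at the target frequency
    target_periods = df.index.asfreq(PANDAS_FREQ[target])
    grouped = df.groupby(target_periods)
    # The aggregator reduces each sub-frame to one value per column;
    # transposing gives one row per target period
    return pd.DataFrame({period: aggregator(group) for period, group in grouped}).T


if __name__ == "__main__":
    months = pd.period_range("2020-01", periods=6, freq="M")
    monthly = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=months)
    print(aggregate_to_frequency(monthly, "Q", pd.DataFrame.mean))
    print(aggregate_to_frequency(monthly, "A", pd.DataFrame.median))
```

Passing the unbound method (pd.DataFrame.mean, pd.DataFrame.median) mirrors the filterset entries shown above: the aggregator is simply called with each group's DataFrame.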
    

Common behaviour modifiers

All fetcher components that adopt the toolbox inherit a set of behaviour modifiers from it. These command-line arguments change how the programs iterate through resources:

  • excluded resources are skipped if the --exclude option is used; provide a space-separated list of resource IDs
  • only the listed resources are kept if the --only option is used; provide a space-separated list of resource IDs
  • only a limited number of resources is processed if the --limit option is used; provide an integer after the argument
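
The three options above can be sketched with argparse. This is a hypothetical reconstruction for illustration only: `build_parser` and `select_resources` are not toolbox functions, and the order in which the toolbox applies the filters (exclude, then only, then limit) is an assumption here.

```python
import argparse
from typing import Iterable, List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing --exclude, --only and --limit."""
    parser = argparse.ArgumentParser(description="Hypothetical fetcher CLI")
    # Space-separated lists of resource IDs, e.g. --only A B C
    parser.add_argument("--exclude", nargs="+", default=[], metavar="RESOURCE_ID")
    parser.add_argument("--only", nargs="+", default=[], metavar="RESOURCE_ID")
    # Maximum number of resources to process
    parser.add_argument("--limit", type=int, default=None)
    return parser


def select_resources(resource_ids: Iterable[str], args: argparse.Namespace) -> List[str]:
    """Apply --exclude, --only and --limit (in that order), keeping the input order."""
    excluded = set(args.exclude)
    only = set(args.only)
    selected = [rid for rid in resource_ids if rid not in excluded]
    if only:
        selected = [rid for rid in selected if rid in only]
    if args.limit is not None:
        selected = selected[: args.limit]
    return selected


if __name__ == "__main__":
    args = build_parser().parse_args(["--exclude", "B", "--limit", "2"])
    print(select_resources(["A", "B", "C", "D"], args))  # ['A', 'C']
```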

By default, resources that were already processed with a SUCCESS or FAILURE status are not processed again. If the --retry-failed option is used, resources with a FAILURE status are retried. If the --force option is used, all resources are processed; this is often needed when the status log is not cleared after each execution.
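
The skip rules above reduce to a small decision function. A minimal sketch follows; the `Status` enum and `should_process` are hypothetical names, and the toolbox keeps statuses in its own status log rather than in the dictionary used in the usage lines.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


def should_process(
    previous: Optional[Status], retry_failed: bool = False, force: bool = False
) -> bool:
    """Decide whether a resource must be (re)processed.

    `previous` is the status recorded in the status log, or None if the
    resource has never been processed.
    """
    if force:
        return True  # --force: process every resource
    if previous is None:
        return True  # never processed before
    if previous is Status.FAILURE:
        return retry_failed  # --retry-failed: retry only failed resources
    return False  # already processed with SUCCESS


if __name__ == "__main__":
    status_log = {"A": Status.SUCCESS, "B": Status.FAILURE}
    todo = [rid for rid in ["A", "B", "C"] if should_process(status_log.get(rid), retry_failed=True)]
    print(todo)  # ['B', 'C']
```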

By default, the iterator calls process_resource(resource) for each resource and logs messages so that processing progress can be tracked. If an exception is raised during process_resource:

  • the error is logged and the next resource is processed, or the exception is re-raised if the --fail-fast option is used
  • resource.delete() is called if the --delete-on-error option is used
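
The error-handling rules can be illustrated with a plain synchronous loop. This is a sketch under assumptions: the `Resource` class, its `resource_id` attribute and the `run` function are hypothetical; only process_resource and resource.delete() are named in the text above. Deleting the resource before re-raising under --fail-fast is also an assumption.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


class Resource:
    """Hypothetical resource with an ID and a delete() hook."""

    def __init__(self, resource_id: str) -> None:
        self.resource_id = resource_id
        self.deleted = False

    def delete(self) -> None:
        # A real resource would remove its partially written output here
        self.deleted = True


def run(
    resources: Iterable[Resource],
    process_resource: Callable[[Resource], None],
    fail_fast: bool = False,
    delete_on_error: bool = False,
) -> List[str]:
    """Process resources one by one and return the IDs of those that failed."""
    failed = []
    for resource in resources:
        logger.info("Processing resource %s", resource.resource_id)
        try:
            process_resource(resource)
        except Exception:
            # Log the error together with its traceback
            logger.exception("Error while processing resource %s", resource.resource_id)
            failed.append(resource.resource_id)
            if delete_on_error:
                resource.delete()  # --delete-on-error
            if fail_fast:
                raise  # --fail-fast: stop at the first error
    return failed
```

Without --fail-fast the loop moves on to the next resource after logging, which matches the default behaviour described above.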


Download files

Download the file for your platform.

Source Distribution

oecd_toolbox-0.4.3.tar.gz (12.0 kB)

Uploaded Source

Built Distribution


oecd_toolbox-0.4.3-py3-none-any.whl (11.7 kB)

Uploaded Python 3

File details

Details for the file oecd_toolbox-0.4.3.tar.gz.

File metadata

  • Download URL: oecd_toolbox-0.4.3.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for oecd_toolbox-0.4.3.tar.gz
Algorithm Hash digest
SHA256 cbcf6c3538df22b64d19e9f83eb5a29c5f8e8b7136a3a50127b01c75c7e9a4b9
MD5 579f91575ee62f9a05884e6d0c3237f8
BLAKE2b-256 3528a9d0b4bf745cc5b2ecf11be04c62d05393a21970b2853761cbb51def2bcf


File details

Details for the file oecd_toolbox-0.4.3-py3-none-any.whl.

File metadata

  • Download URL: oecd_toolbox-0.4.3-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for oecd_toolbox-0.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 df32575251e2116c9abecc8950a307cdbc6e1678c10faa7b716d80e214e91829
MD5 ad8f82cca0b30120a863d996a5096205
BLAKE2b-256 4bea1996a1814f324a1371015e5f4eae1380a73e5baa039d7dae67cf3d73647a

