
Project description

Social Finance Data Pipeline

This is a package that aims to standardise one or more datasets against a defined schema. It is currently in its very early stages.

Check the wiki for more details on the plan.

Current status

Currently, the pipeline only works with XML files and is limited to one schema and one file. It is a first, generic approach that was tested against the CIN and SWWF datasets.

It partially covers the following steps described in the wiki:

  • Identify - identifies the stream of data against the schema (XML files only, for now).
  • Convert - tries to convert the datatypes and emits a warning when this is not possible. It uses the xsdata package for this.
  • Normalise - adds the primary and foreign keys for each record.
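The three steps above can be illustrated in plain Python with the standard library's xml.etree.ElementTree. This is a minimal sketch under stated assumptions, not the package's implementation: the `SCHEMA` dictionary, the sample XML, the `run`/`convert` helpers and the `_id`/`<Parent>_id` key columns are all hypothetical, Identify is reduced to matching element tags against schema record names, and the real Convert step uses xsdata rather than hand-written converters. Each node's position is tracked as a tuple of ancestor tags.

```python
import warnings
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical schema: record tag -> {field tag: converter}.
# NOT the package's schema format; the real Convert step uses xsdata.
SCHEMA = {
    "Child": {"ChildID": str},
    "Episode": {"StartDate": date.fromisoformat, "Cost": float},
}

# Hypothetical sample input.
SAMPLE = """<Message>
  <Child>
    <ChildID>A1</ChildID>
    <Episode><StartDate>2023-04-01</StartDate><Cost>12.5</Cost></Episode>
    <Episode><StartDate>not-a-date</StartDate><Cost>7</Cost></Episode>
  </Child>
</Message>"""


def convert(value, converter, context):
    """Convert: try to cast a raw string; warn and keep the original on failure."""
    try:
        return converter(value)
    except (TypeError, ValueError):
        warnings.warn(f"Could not convert {value!r} at {'/'.join(context)}")
        return value


def run(xml_text):
    """Return {record tag: list of row dicts} for every record found in xml_text."""
    tables = {tag: [] for tag in SCHEMA}
    counters = {tag: 0 for tag in SCHEMA}

    def walk(element, context, parent_key):
        context = context + (element.tag,)  # position in the hierarchy as a tuple
        key = parent_key
        if element.tag in SCHEMA:  # Identify: this element is a schema record
            counters[element.tag] += 1
            key = (element.tag, counters[element.tag])
            row = {"_id": key[1]}  # Normalise: primary key
            if parent_key is not None:
                row[f"{parent_key[0]}_id"] = parent_key[1]  # Normalise: foreign key
            for field, converter in SCHEMA[element.tag].items():
                node = element.find(field)
                if node is not None:  # Convert each field the schema defines
                    row[field] = convert(node.text, converter, context + (field,))
            tables[element.tag].append(row)
        for child in element:
            walk(child, context, key)

    walk(ET.fromstring(xml_text), (), None)
    return tables
```

With this sample, `run(SAMPLE)` returns one `Child` row and two `Episode` rows, each `Episode` carrying `Child_id = 1`. The second start date cannot be parsed, so a single warning is emitted and the raw text is kept.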

How to run

Check the demo.py file. It has two functions that run against the CIN and SWWF datasets in the samples directory.

There are also smaller samples of those datasets.

These functions print a set of dataframes for each dataset.

Improvements

There's still a lot to be done. Besides completing what's in the wiki, here are some things I believe should be done first:

  • Datastore - the values are pulled directly into a tablib Databook using a very inefficient nested for loop. This is a big no. We should use the RTOF datastore for this.

  • Datatypes - it should be possible to define how datatypes are formatted on export. For example, a user may want dates to come out in the "dd-mm-yyyy" format, or just "mm-yyyy". Currently, a fixed format is assumed in export_value; this should be made configurable.

  • Path vs context - I was using path as a reference for where each node sits in the hierarchy. However, storing that position as a tuple in the context is probably a better approach. I'm currently using both, which should not be the case: use just the context.
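For the Datatypes point, one option is to pass export formats in as data instead of assuming them inside the export function. The sketch below is hypothetical: `export_with_formats` and `DEFAULT_FORMATS` are illustrative names, not part of the package, and only date and datetime values are handled, using standard strftime patterns.

```python
from datetime import date, datetime

# Hypothetical defaults: Python type -> strftime pattern used on export.
DEFAULT_FORMATS = {date: "%d-%m-%Y"}


def export_with_formats(value, formats=None):
    """Format a converted value for export, letting the caller override formats."""
    formats = {**DEFAULT_FORMATS, **(formats or {})}
    # datetime subclasses date, so the more specific type is checked first.
    for value_type in (datetime, date):
        if isinstance(value, value_type) and value_type in formats:
            return value.strftime(formats[value_type])
    return value  # non-date values pass through unchanged
```

For example, `export_with_formats(date(2023, 4, 1))` gives "01-04-2023", while passing `{date: "%m-%Y"}` as `formats` gives "04-2023". Keeping the mapping outside the function means a user can change the output format without touching the export code.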

Download files


Source Distribution

fmsfdata-3.1.0.tar.gz (21.2 kB)

Uploaded Source

Built Distribution

fmsfdata-3.1.0-py3-none-any.whl (29.7 kB)

Uploaded Python 3

File details

Details for the file fmsfdata-3.1.0.tar.gz.

File metadata

  • Download URL: fmsfdata-3.1.0.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/22.5.0

File hashes

Hashes for fmsfdata-3.1.0.tar.gz
Algorithm Hash digest
SHA256 a11cc74af1777308d80d848e7c3a56a46d5e017eea5a3922aa80b60adde88eb4
MD5 ff9d280a4c3b32238f26d0e47d92eae7
BLAKE2b-256 d3756c3c46ce6430531b3de0b6e953d7f8d5f7480bad37553a1b2208cf7b434d


File details

Details for the file fmsfdata-3.1.0-py3-none-any.whl.

File metadata

  • Download URL: fmsfdata-3.1.0-py3-none-any.whl
  • Upload date:
  • Size: 29.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/22.5.0

File hashes

Hashes for fmsfdata-3.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8064b1d30e705c236485c850edb351eeb510ef7ba595f60594389b3c48f5fdbb
MD5 f5eb3a10d95c3b03a2a19bf7edd9500b
BLAKE2b-256 d6fe17c7ef93b29a7ed624bd5c6c0b188d3fccdca86b40a39a232c95af26c0f3
