Data wrangling with simplicity, complete audit transparency, and speed

Project description

whyqd provides an intuitive method for restructuring messy data to conform to a standardised metadata schema. It supports data managers and researchers looking to rapidly, and continuously, normalise any messy spreadsheets using a simple series of steps. Once complete, you can import wrangled data into more complex analytical systems or full-feature wrangling tools.

It aims to get you to the point where you can perform automated data munging prior to committing your data into a database, and no further. It is built on pandas, and plays well with existing Python-based data-analytical tools. Each raw source file produces a JSON schema, a method file that defines the set of actions to be performed to produce refined data, and a destination file validated against that schema.
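To make "a destination file validated against that schema" concrete, here is a minimal sketch of checking one destination row against a schema. It uses only plain Python. The schema layout, the field names and the `validate_row` helper are all assumptions made for illustration and are not whyqd's actual schema format or API.

```python
# Hypothetical schema: the layout and field names are illustrative only,
# not whyqd's real schema format.
SCHEMA = {
    "name": "rates",
    "fields": [
        {"name": "la_code", "type": "string", "required": True},
        {"name": "occupation_state", "type": "boolean", "required": False},
        {"name": "rateable_value", "type": "number", "required": True},
    ],
}

# One predicate per supported type. bool is excluded from "number"
# because Python treats bool as a subclass of int.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate_row(row, schema):
    """Return a list of problems found in row; an empty list means valid."""
    problems = []
    known = {field["name"] for field in schema["fields"]}
    for field in schema["fields"]:
        name = field["name"]
        value = row.get(name)
        if value is None:
            if field.get("required"):
                problems.append(f"missing required field: {name}")
            continue
        if not TYPE_CHECKS[field["type"]](value):
            problems.append(
                f"{name}: expected {field['type']}, got {type(value).__name__}"
            )
    # Flag anything the schema does not describe.
    for name in row:
        if name not in known:
            problems.append(f"unexpected field: {name}")
    return problems
```

Calling `validate_row(row, SCHEMA)` on every row of a destination file and collecting the non-empty results gives a simple validation report.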

whyqd ensures complete audit transparency by saving all actions performed to restructure your input data to a separate JSON-defined method file. This permits others to scrutinise your approach, validate your methodology, or even use your methods to import data in production.

Once complete, a method file can be shared, along with your input data, and anyone can import whyqd and validate your method to verify that your output data is the product of these inputs.
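The idea of a shareable, verifiable method can be sketched as follows: record the ordered actions together with checksums of the input and output, then let anyone replay the actions and compare. This is only an illustration in plain Python. The action vocabulary (`rename`, `drop`, `filter_not_empty`), the method layout and every function name here are assumptions, not whyqd's real method format or API.

```python
import hashlib
import json


def checksum(rows):
    """SHA-256 over a canonical JSON encoding of a list of row dicts."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_action(rows, action):
    """Apply one recorded action (hypothetical vocabulary) to the rows."""
    kind = action["action"]
    if kind == "rename":
        return [
            {(action["to"] if k == action["from"] else k): v for k, v in r.items()}
            for r in rows
        ]
    if kind == "drop":
        return [{k: v for k, v in r.items() if k != action["field"]} for r in rows]
    if kind == "filter_not_empty":
        return [r for r in rows if r.get(action["field"]) not in (None, "")]
    raise ValueError(f"unknown action: {kind}")


def run_method(rows, actions):
    """Replay the actions in order and return the restructured rows."""
    for action in actions:
        rows = apply_action(rows, action)
    return rows


def build_method(input_rows, actions):
    """Create a JSON-serialisable record of what was done to the input."""
    output_rows = run_method(input_rows, actions)
    return {
        "input_checksum": checksum(input_rows),
        "actions": actions,
        "output_checksum": checksum(output_rows),
    }


def validate_method(input_rows, method):
    """Check that the input matches and that replaying gives the same output."""
    if checksum(input_rows) != method["input_checksum"]:
        return False
    replayed = run_method(input_rows, method["actions"])
    return checksum(replayed) == method["output_checksum"]
```

Because the record is plain JSON, it can be written with `json.dump`, shared alongside the input data, and checked by a third party with `validate_method`.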

Why use it?

If all you want to do is test whether your source data are even useful, spending days or weeks slogging through data restructuring could kill a project. If you already have a workflow and established software which includes Python and pandas, having to change your code every time your source data changes is really, really frustrating.

There are two complex and time-consuming parts to preparing data for analysis: social, and technical.

The social part requires multi-stakeholder engagement with source data-publishers, and with destination database users, to agree structural metadata. Without any agreement on data publication formats or destination structure, you are left with the tedious frustration of manually wrangling each independent dataset into a single schema.

whyqd allows you to get to work without requiring you to achieve buy-in from anyone or change your existing code.

Wrangling process

  • Create, update or import a data schema which defines the destination data structure;
  • Create a new method and associate it with your schema and input data source/s;
  • Assign a foreign key column and (if required) merge input data sources;
  • Structure input data fields to conform to the requirements for each schema field;
  • Assign categorical data identified during structuring;
  • Transform and filter input data to produce a final destination data file;
  • Share your data and a citation.
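The middle steps of this process (merge on a key, structure fields, assign categories, filter) can be sketched in plain Python as below. The sample data, the column names, the `CATEGORIES` mapping and all function names are hypothetical and chosen only for illustration. whyqd itself is built on pandas and exposes its own interface, which this sketch does not attempt to reproduce.

```python
# Two hypothetical input sources that share the key column "ref".
properties = [
    {"ref": "A1", "Address": "1 High St", "Status": "occupied"},
    {"ref": "A2", "Address": "2 High St ", "Status": "VACANT"},
    {"ref": "A3", "Address": "", "Status": "empty"},
]
values = [
    {"ref": "A1", "Value": "12500"},
    {"ref": "A2", "Value": "8000"},
    {"ref": "A3", "Value": "300"},
]

# Categorical assignment: messy source terms mapped to schema categories.
CATEGORIES = {"occupied": True, "vacant": False, "empty": False}


def merge_on_key(left, right, key):
    """Left join: add the matching right-hand columns to each left row."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


def restructure(row):
    """Map one merged input row onto the destination schema fields."""
    return {
        "property_ref": row["ref"],
        "address": row["Address"].strip(),
        "is_occupied": CATEGORIES[row["Status"].strip().lower()],
        "rateable_value": float(row["Value"]),
    }


def wrangle(left, right):
    """Merge, restructure, then drop rows that have no address."""
    merged = merge_on_key(left, right, "ref")
    structured = [restructure(row) for row in merged]
    return [row for row in structured if row["address"]]
```

Here `wrangle(properties, values)` returns two rows, because the third property has an empty address and is filtered out in the final step.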
