This project provides a collection of utilities for doing lightweight data wrangling.

Project description

datashaper

The project has two goals:

  1. Create a shareable client/server schema for serialized wrangling instructions
  2. Maintain an implementation of a basic wrangling engine: the JavaScript version is based on Arquero, and the Python version is implemented in pandas
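
To make the first goal concrete, a serialized wrangling instruction can be pictured as plain JSON that a client and a server exchange. The sketch below is an illustration under stated assumptions, not the project's published schema: the field names simply mirror the Step arguments used in the usage example (verb, input, output, args), the top-level "steps" key is an assumption, and the helper functions serialize_steps and deserialize_steps are hypothetical.

```python
import json

# Hypothetical step layout: the field names mirror the Step arguments in the
# usage example and are NOT taken from the project's published schema.
join_step = {
    "verb": "join",
    "input": "parents",
    "output": "output",
    "args": {"other": "kids", "on": ["id"]},
}


def serialize_steps(steps):
    """Encode a list of step dictionaries as a JSON string."""
    # The top-level "steps" key is an assumption made for this sketch.
    return json.dumps({"steps": steps}, indent=2)


def deserialize_steps(text):
    """Decode a JSON string back into a list of step dictionaries."""
    return json.loads(text)["steps"]


payload = serialize_steps([join_step])
restored = deserialize_steps(payload)
print(payload)
```

Because the payload is ordinary JSON, any implementation that agrees on the field names could read it, which is the point of sharing one schema between client and server.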

Building

  • Install Poetry, the Python package manager.
  • From the project root, run: poetry install

Usage

This project is intended to be used as a library for lightweight data wrangling. The examples folder contains a notebook with several examples of how to create data wrangling pipelines and how to read JSON specifications generated by the JavaScript implementation.

Example of joining two tables:

from datashaper.pipeline import Pipeline
# Step and Verb are used below; they are assumed to be exported by datashaper.types
from datashaper.types import Step, Verb
import pandas as pd

# id   name
# 1    bob
# 2    joe
# 3    jane
parents = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ['bob', 'joe', 'jane']
})

# id   kid
# 1    billy
# 1    jill
# 2    kaden
# 2    kyle
# 3    moe
kids = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "kid": ['billy', 'jill', 'kaden', 'kyle', 'moe']
})

pipeline = Pipeline()

pipeline.add_dataset('parents', parents)
pipeline.add_dataset('kids', kids)

pipeline.add(Step(
    verb=Verb.join,
    input="parents",
    output="output",
    args={
        "other": "kids",
        "on":["id"]
    }
))

# id   name    kid
# 1    bob     billy
# 1    bob     jill
# 2    joe     kaden
# 2    joe     kyle
# 3    jane    moe
result = pipeline.run()
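
Since the Python engine is implemented in pandas, the expected result can be cross-checked with pandas alone. This is a minimal sketch that assumes the join verb behaves like an inner merge on the listed columns; the source does not confirm the join type, and the variable name expected is illustrative.

```python
import pandas as pd

parents = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["bob", "joe", "jane"],
})

kids = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "kid": ["billy", "jill", "kaden", "kyle", "moe"],
})

# Assumption: the join verb corresponds to an inner merge on "id".
# Each parent row is repeated once per matching kid row.
expected = parents.merge(kids, on="id", how="inner")
print(expected)
```

The merged frame has the columns id, name, and kid and five rows, one per kid, which matches the table shown in the comments of the pipeline example.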

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
