A framework that allows the definition of data transformations in a composable way, agnostic of data processing engine.

Nanbi

Nanbiquara: speech of smart people, of clever people

Translated from the Tupi Guarani Illustrated Dictionary

Nanbi is a framework that lets you define data transformations in a composable way, independent of the data processing engine (Pandas, MySQL, Spark, etc.).

Its syntax is SQL-like, inspired by the PySpark and Scala Spark approaches.
It lets you define a set of data transformations in a more composable way than SQL, which improves readability, especially for complex queries.
It lets you execute your data transformation definitions on multiple engines (Pandas, MySQL, Spark, etc.) without changing the definitions themselves.
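
To make the idea of an engine-agnostic definition concrete, here is a minimal sketch in plain Python. It is not Nanbi's implementation: the names `Col`, `Add`, `eval_on_row`, and `to_sql` are hypothetical and exist only for illustration. The sketch shows one formula being described once as data and then interpreted by two different backends.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    """Reference to a column by name (hypothetical, not Nanbi's class)."""
    name: str

    def __add__(self, other: "Expr") -> "Add":
        return Add(self, other)


@dataclass(frozen=True)
class Add:
    """Sum of two sub-expressions (hypothetical, not Nanbi's class)."""
    left: "Expr"
    right: "Expr"

    def __add__(self, other: "Expr") -> "Add":
        return Add(self, other)


Expr = Union[Col, Add]


def eval_on_row(expr: Expr, row: dict) -> int:
    """Backend 1: evaluate the expression against an in-memory row."""
    if isinstance(expr, Col):
        return row[expr.name]
    return eval_on_row(expr.left, row) + eval_on_row(expr.right, row)


def to_sql(expr: Expr) -> str:
    """Backend 2: render the same expression as a SQL fragment."""
    if isinstance(expr, Col):
        return expr.name
    return f"({to_sql(expr.left)} + {to_sql(expr.right)})"


# The formula is defined once and never mentions an engine.
formula = Col("col_a") + Col("col_b")

print(eval_on_row(formula, {"col_a": 50, "col_b": 51}))  # 101
print(to_sql(formula))                                   # (col_a + col_b)
```

Because the formula is only a description of the computation, supporting another engine in this sketch means adding another interpreter function, not rewriting the formula.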

Nanbi is currently in the initial stages of development and is not yet ready for a version 1 release. So far, Pandas is the only supported engine.

Please get in touch if you are interested in using Nanbi in your work or personal project. Feature requests are welcome.

Setup

While the library isn't published on PyPI:

Clone the repo
Create a symlink to the repo

TODO(eemery): Add installation details once package gets published in PyPI

Getting Started

1. Creating a DataFrame

Nanbi uses the concept of a DataFrame to represent a table and its annotations (or metadata). Currently, Nanbi supports the creation of DataFrames from Pandas DataFrames and CSV files (using Pandas behind the scenes).

From a Pandas DataFrame

import pandas as pd
import nanbi.connectors.pandas as nb

pandas_df = pd.DataFrame({"num_a": [10, 50, 20, 50, 20],
                          "num_b": [41, 51, 21, 31, 11]})

df = nb.from_data_frame(pandas_df)

From a CSV file (with Pandas)

import nanbi.connectors.pandas as nb

df = nb.from_csv("path/to/my-file.csv")

Viewing your imported data

To visualize the imported or created data, just use the .display() method:

import nanbi.connectors.pandas as nb

df = nb.from_csv("path/to/my-file.csv")

df.display()

The output will be a Pandas DataFrame, for example:

  col_a col_b
0 50    51
1 50    31
2 20    21
3 20    11
4 10    51

2. Enriching tables (`.with_column()`)

Nanbi's goal is to let you define data transformations that enrich your table with derived data in a composable way. One of the main ways to achieve this is the .with_column() method, which creates a new column in your table according to the transformation formula you give it. For example:

import nanbi.connectors.pandas as nb

df = nb.from_csv("path/to/my-file.csv")

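# col builds a column reference; it must be imported from Nanbi
# (its module path is not shown in this README).
enriched_df = df.with_column("result", col("col_a") + col("col_b"))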

enriched_df.display()

The output will be a Pandas DataFrame in the form of:

  col_a col_b result
0 50    51    101
1 50    31    81
2 20    21    41
3 20    11    31
4 10    51    61
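
For comparison, the same derived column can be computed directly in Pandas. This is plain Pandas rather than Nanbi, and the sample values are copied from the output above; it is included only to make explicit what the result column contains.

```python
import pandas as pd

# Sample data matching the table shown above.
pandas_df = pd.DataFrame({"col_a": [50, 50, 20, 20, 10],
                          "col_b": [51, 31, 21, 11, 51]})

# assign() returns a new DataFrame and leaves pandas_df unchanged.
enriched = pandas_df.assign(result=pandas_df["col_a"] + pandas_df["col_b"])

print(enriched)
```

Each value in result is the row-wise sum of col_a and col_b, for example 50 + 51 = 101 in the first row.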

Chaining Transformations

One improvement we can make to the code above is to chain the transformations. The same code could have been written as:

import nanbi.connectors.pandas as nb

# Parentheses let the method chain span several lines.
df = (
    nb.from_csv("path/to/my-file.csv")
    .with_column("result", col("col_a") + col("col_b"))
)

df.display()
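
Chaining of this kind is possible whenever each transformation returns a new table object instead of modifying the existing one. The sketch below illustrates that pattern in plain Python. The `Table` class is a hypothetical stand-in, formulas are plain functions rather than column expressions, and whether Nanbi works this way internally is an assumption.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


class Table:
    """Hypothetical stand-in for a Nanbi DataFrame; holds a list of row dicts."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def with_column(self, name: str, formula: Callable[[Row], int]) -> "Table":
        # Build new rows instead of mutating the old ones, then return a
        # new Table so that another method call can follow immediately.
        return Table([{**row, name: formula(row)} for row in self.rows])


# In Python, a chain that spans several lines must be wrapped in parentheses.
table = (
    Table([{"col_a": 50, "col_b": 51}, {"col_a": 20, "col_b": 21}])
    .with_column("result", lambda r: r["col_a"] + r["col_b"])
    .with_column("double", lambda r: r["result"] * 2)
)

print(table.rows)
```

The second with_column call can read the result column because the first call has already returned a table that contains it.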

Improving Transformations Readability and Reusability

Another improvement, especially when transformations get complex, is to move the formula definition (i.e., col("col_a") + col("col_b")) into its own variable. For the code above, this would look like:

import nanbi.connectors.pandas as nb

my_complex_formula = col("col_a") + col("col_b")

# Parentheses let the method chain span several lines.
df = (
    nb.from_csv("path/to/my-file.csv")
    .with_column("result", my_complex_formula)
)

df.display()

Release history

Version 0.0.1, released Mar 13, 2023
