Tool to adapt multiple dataframes to one unique format

Data + Adapter

Dapter is a convenient tool that helps you work with multiple data sources. It lets you rename columns and transform your data in one go.

With Dapter, you can store a series of instructions for your data cleaning routines in custom objects. You can then reuse such an object on any DataFrame, anywhere in your code. See the step-by-step example below.

📝 Example

Column renaming and transformations can be set up "lazily" in a tuple:

import pandas as pd
from dapter import accepts

def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92

euro_col = (accepts("Amount USD", "amount_usd", "USD"), convert_to_eur)

euro_col is a series of instructions that will tell dapter to:

  • Match any column whose name is one of the names passed to accepts
  • Apply convert_to_eur to those columns
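
To make the two steps concrete, here is a minimal sketch of the same idea in plain pandas. It does not use dapter and is not how dapter is implemented; `apply_instruction` is a hypothetical helper written only for illustration, and the sample amounts are made up.

```python
import pandas as pd


def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92


def apply_instruction(df, accepted_names, transform, target):
    """Hypothetical helper (not part of dapter): find the first accepted
    column name present in df, rename it to target, then transform it."""
    matches = [name for name in accepted_names if name in df.columns]
    if not matches:
        raise KeyError(f"none of {accepted_names} found in {list(df.columns)}")
    out = df.rename(columns={matches[0]: target})  # rename returns a new frame
    out[target] = transform(out[target])
    return out


df = pd.DataFrame({"amount_usd": [100.0, 250.0]})
result = apply_instruction(
    df, ("Amount USD", "amount_usd", "USD"), convert_to_eur, "amount_eur"
)
print(result)
```

The input frame is left untouched; only the returned copy carries the renamed and converted column.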

Once all the column "instructions" are defined, we can store them together in a custom object that inherits from dapter.BaseMapper:

from dapter import BaseMapper

class TransactionMapper(BaseMapper):
    amount_eur = euro_col

This tells dapter to write the result of the euro_col instructions to a new column called amount_eur.

The mapper can then apply all the renaming and transformations stored inside it to any DataFrame:

mapper = TransactionMapper()

dfs = mapper.apply(df1, df2, df3)
df = pd.concat(dfs)
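
One detail of the `pd.concat` step is worth a small sketch. By default, pandas keeps each frame's original index, so stacking several single-row frames yields repeated index labels. The frames below are made-up stand-ins for the mapper output, not real dapter results.

```python
import pandas as pd

# Stand-ins for already-mapped frames (illustrative values only).
frames = [
    pd.DataFrame({"amount_eur": [45.99]}),
    pd.DataFrame({"amount_eur": [2759.91]}),
]

stacked = pd.concat(frames)                         # keeps each frame's index
renumbered = pd.concat(frames, ignore_index=True)   # builds a fresh 0..n-1 index

print(list(stacked.index))
print(list(renumbered.index))
```

Passing `ignore_index=True` is a reasonable choice when the original row labels carry no meaning.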

🧰 Installation

Using pip:

pip install dapter

🔄 Infinite DataFrame compatibility

Dapter uses narwhals in the background, so it accepts DataFrames from any supported library[^1].

This means you can define Polars Series and Expr transformations for pandas Series, and vice versa!

You can also feed any DataFrame to the apply method.

[^1]: cuDF, Modin, pandas, Polars, PyArrow, Dask, Ibis, Vaex

Full sample code

from dapter import BaseMapper, accepts, accepts_anycases
import pandas as pd

df1 = pd.DataFrame(
    [
        {
            "Date": "2023-02-01 10:00:01",
            "Vendor Name": "Golden Oil LLC",
            "Amount USD": 49.99,
            "Category": "Personal",
        }
    ]
)

df2 = pd.DataFrame(
    [
        {
            "transaction_date": "2023-03-01 10:00:01",
            "vendor_name": "Get Cars Inc.",
            "amount_usd": 2999.9,
            "category": "Transportation",
        }
    ]
)
df3 = pd.DataFrame(
    [
        {
            "DATE": "2023-04-01 10:00:01",
            "VENDOR_NAME": "Maintainers Exc.",
            "USD": 5249.0,
            "CAT": "Personal",
        }
    ]
)


def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92

def clean_str(col: pd.Series) -> pd.Series:
    # pandas exposes str.lower(); there is no str.to_lower()
    return col.str.lower().str.replace(" ", "_")

class TransactionMapper(BaseMapper):
    transaction_date = accepts("transaction_date", "Date", "DATE")
    vendor_name = accepts_anycases()
    amount_eur = accepts("Amount USD", "amount_usd", "USD"), convert_to_eur
    category = accepts("Category", "category", "CAT"), clean_str

mapper = TransactionMapper()

dfs = mapper.apply(df1, df2, df3)
df = pd.concat(dfs)
df
transaction_date     vendor_name       amount_eur  category
2023-02-01 10:00:01  Golden Oil LLC    45.99       personal
2023-03-01 10:00:01  Get Cars Inc.     2759.91     transportation
2023-04-01 10:00:01  Maintainers Exc.  4829.08     personal
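
The two transformation functions from the sample can be sanity-checked on their own with plain pandas, without going through dapter. This sketch assumes the functions receive pandas Series, as their type hints suggest; `clean_str` is written here with pandas' `Series.str.lower`, and the amounts are rounded to two decimals for readability.

```python
import pandas as pd


def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92


def clean_str(col: pd.Series) -> pd.Series:
    return col.str.lower().str.replace(" ", "_")


# Values taken from the three sample DataFrames.
amounts = pd.Series([49.99, 2999.9, 5249.0])
categories = pd.Series(["Personal", "Transportation", "Personal"])

eur = convert_to_eur(amounts).round(2)
cats = clean_str(categories)

print(eur.tolist())
print(cats.tolist())
```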
