Tool to adapt multiple dataframes to one unique format
Data + Adapter
Dapter is a convenient tool for working with multiple data sources. It lets you rename columns and transform your data in one go.
With Dapter, you can store a series of instructions for your data-cleaning routines in custom objects. You can then reuse such an object on any DataFrame anywhere in your code. See the step-by-step example below.
📝 Example
Renaming columns and adding transformations can be set up "lazily" in a tuple:
```python
import pandas as pd

from dapter import accepts

def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92

euro_col = (accepts("Amount USD", "amount_usd", "USD"), convert_to_eur)
```
`euro_col` is a series of instructions that tells Dapter to:

- Consider any column named after one of the names passed to `accepts`
- Apply `convert_to_eur` to those columns
Once we have defined all the column "instructions", we can store them together in a custom object that inherits from `dapter.BaseMapper`:
```python
from dapter import BaseMapper

class TransactionMapper(BaseMapper):
    amount_eur = euro_col
```
We have just defined that all the instructions of `euro_col` will be assigned to a new column called `amount_eur`.
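A transformation is optional, by the way: as the full sample further down suggests, an `accepts(...)` entry on its own simply renames the matched column. A minimal, illustrative sketch (the `DateOnlyMapper` name is hypothetical):

```python
from dapter import BaseMapper, accepts

class DateOnlyMapper(BaseMapper):
    # Rename-only: any of these source names becomes "transaction_date";
    # no transformation function is attached.
    transaction_date = accepts("transaction_date", "Date", "DATE")
```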
This object can then be used to apply all the renaming and transformations stored inside it to any DataFrame; because the cleaned results share the same columns, they can be concatenated directly:
```python
mapper = TransactionMapper()

dfs = mapper.apply(df1, df2, df3)
df = pd.concat(dfs)
```
🧰 Installation
Using pip:
```
pip install dapter
```
🔄 Infinite DataFrame compatibility
Dapter uses narwhals in the background, so it accepts any supported[^1] DataFrame library.
This means you can define Polars `Series` and `Expr` transformations for pandas `Series`, and vice versa! You can also feed any DataFrame to the `apply` method.
[^1]: cuDF, Modin, pandas, Polars, PyArrow, Dask, Ibis, Vaex
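For instance, here is a hypothetical sketch of that cross-library claim, assuming the same `apply` behaviour as in the example above (the `CrossMapper` name and data are illustrative, not part of Dapter's documented API):

```python
import pandas as pd
import polars as pl

from dapter import BaseMapper, accepts

# A transformation written against the Polars Series API...
def convert_to_eur(col: pl.Series) -> pl.Series:
    return col * 0.92

class CrossMapper(BaseMapper):
    amount_eur = accepts("Amount USD", "amount_usd"), convert_to_eur

# ...fed a plain pandas DataFrame.
mapper = CrossMapper()
dfs = mapper.apply(pd.DataFrame([{"Amount USD": 49.99}]))
```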
Full sample code
```python
import pandas as pd

from dapter import BaseMapper, accepts, accepts_anycases

df1 = pd.DataFrame(
    [
        {
            "Date": "2023-02-01 10:00:01",
            "Vendor Name": "Golden Oil LLC",
            "Amount USD": 49.99,
            "Category": "Personal",
        }
    ]
)

df2 = pd.DataFrame(
    [
        {
            "transaction_date": "2023-03-01 10:00:01",
            "vendor_name": "Get Cars Inc.",
            "amount_usd": 2999.9,
            "category": "Transportation",
        }
    ]
)

df3 = pd.DataFrame(
    [
        {
            "DATE": "2023-04-01 10:00:01",
            "VENDOR_NAME": "Maintainers Exc.",
            "USD": 5249.0,
            "CAT": "Personal",
        }
    ]
)

def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92

def clean_str(col: pd.Series) -> pd.Series:
    return col.str.lower().str.replace(" ", "_")

class TransactionMapper(BaseMapper):
    # Rename-only instruction.
    transaction_date = accepts("transaction_date", "Date", "DATE")
    # Match the attribute's name in any casing, e.g. "Vendor Name",
    # "vendor_name", "VENDOR_NAME".
    vendor_name = accepts_anycases()
    # Rename and transform in one go.
    amount_eur = accepts("Amount USD", "amount_usd", "USD"), convert_to_eur
    category = accepts("Category", "category", "CAT"), clean_str

mapper = TransactionMapper()

dfs = mapper.apply(df1, df2, df3)
df = pd.concat(dfs)
df
```
| transaction_date | vendor_name | amount_eur | category |
|---|---|---|---|
| 2023-02-01 10:00:01 | Golden Oil LLC | 45.9908 | personal |
| 2023-03-01 10:00:01 | Get Cars Inc. | 2759.908 | transportation |
| 2023-04-01 10:00:01 | Maintainers Exc. | 4829.08 | personal |