Datastore supporting merql
I took a career break to build this. If you like it and are looking to hire an ML engineer, please contact me :)
CAUTION: PROTOTYPE! Not for production use, not even for development.
Merdb is a data processing library that
- provides a relational API, similar to SQL, for querying data
- uses Unix-like pipes to compose operators with the | syntax
- scales to multiple cores or a cluster (via Modin)
- processes data too big to fit into memory
- supports interactive and optimized processing (optimizations are on the roadmap)
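The pipe-based composition listed above can be illustrated with a small, self-contained sketch. This is not merdb's implementation: the `Op` and `Table` classes and the `map_col` helper are hypothetical names invented for illustration, and rows are plain dicts rather than DataFrames. It only shows one way Python's `__or__` can support both `table | operator` and `operator | operator`.

```python
class Op:
    """A pipeline stage wrapping a rows -> rows function (hypothetical, not merdb's class)."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # op | op builds a new Op that runs self first, then other
        return Op(lambda rows: other.fn(self.fn(rows)))

    def __call__(self, rows):
        return self.fn(rows)


class Table:
    """Holds rows as a list of dicts; `table | op` applies the op eagerly."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __or__(self, op):
        return Table(op(self.rows))


def where(pred):
    # Keep only the rows for which pred(row) is true
    return Op(lambda rows: [r for r in rows if pred(r)])


def map_col(fn, col):
    # Replace column `col` with fn(row) in every row
    return Op(lambda rows: [{**r, col: fn(r)} for r in rows])


def select(*cols):
    # Keep only the named columns
    return Op(lambda rows: [{c: r[c] for c in cols} for r in rows])


def double_age(row):
    return row["age"] * 2


people = Table([
    {"name": "Raj", "age": 35},
    {"name": "Abby", "age": 70},
])

# Operators compose without any source data
quadruple_age = map_col(double_age, "age") | map_col(double_age, "age")

result = people | where(lambda r: r["age"] > 35) | quadruple_age | select("age")
print(result.rows)  # [{'age': 280}]
```

Because `|` is left-associative, the table absorbs each operator in turn, while two operators joined on their own produce a reusable stage.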
Install
pip install merdb
Example
import pandas as pd
from merdb.interactive import *
# for lazy(TBD) use `from merdb.lazy import *`
def is_senior(row) -> bool:
    return row['age'] > 35

def double_age(row) -> int:
    return row["age"] * 2

cols = ["name", "age"]
people_df = pd.DataFrame([
    ["Raj", 35],
    ["Sona", 20],
    ["Abby", 70],
    ["Abba", 90],
], columns=cols)

# One can specify functions without any source data like quadruple age
quadruple_age = map(double_age, "age") | map(double_age, "age")

result = (t(people_df)  # convert people_df to a merdb table
          | where(is_senior)
          | order_by("name", "asc")
          | quadruple_age  # Unix like pipe syntax making it easy to refactor out intermediate processing
          | select("age")
          | rename({"age": "new_age"})
          )
# Convert to Pandas Dataframe and print
print(result.df())
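For comparison, here is a rough plain-pandas version of the same pipeline. It assumes each merdb operator behaves as its name suggests, and in particular that `map(double_age, "age")` writes the function's result back into the `age` column; neither assumption has been checked against merdb itself. The `reset_index` call is only there to make the printed frame tidy.

```python
import pandas as pd

people_df = pd.DataFrame(
    [["Raj", 35], ["Sona", 20], ["Abby", 70], ["Abba", 90]],
    columns=["name", "age"],
)

# where(is_senior): keep rows with age > 35
seniors = people_df[people_df["age"] > 35]

# order_by("name", "asc")
ordered = seniors.sort_values("name", ascending=True)

# quadruple_age: doubling the age twice is the same as multiplying by 4
quadrupled = ordered.assign(age=ordered["age"] * 4)

# select("age") followed by rename({"age": "new_age"})
expected = (
    quadrupled[["age"]]
    .rename(columns={"age": "new_age"})
    .reset_index(drop=True)
)

print(expected)
```

Under these assumptions the frame has a single `new_age` column with the values 360 (Abba) and 280 (Abby). Raj is excluded because 35 is not greater than 35.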
Download files
Source Distribution: merdb-0.0.2.tar.gz (11.7 kB)
Built Distribution: merdb-0.0.2-py3-none-any.whl (9.8 kB)