Extract data from a bunch of files and load into a table
Project description
Elbow
Elbow is a library for extracting data from a bunch of files and loading it into a table (and that's it).
Examples
import json
from elbow import load_table, load_parquet
# Extract records from JSON-lines
def extract(path):
    with open(path) as f:
        # Iterate over the lines of the open file, not the characters of the path
        for line in f:
            record = json.loads(line)
            yield record
# Load as a pandas dataframe
df = load_table(
    pattern="**/*.json",
    extract=extract,
)
# Load as a parquet dataset (in parallel)
dset = load_parquet(
    pattern="**/*.json",
    extract=extract,
    where="dset.parquet",
    workers=8,
)
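To try an extract function on its own before handing it to Elbow, a standard-library-only sketch like the one below may help. It writes two small JSON-lines files to a temporary directory, finds them with the same "**/*.json" pattern, and gathers the yielded records into a list. The helpers `collect_records` and `make_sample_tree` and the sample data are hypothetical, made up for illustration. They are not part of Elbow and do not show how Elbow builds its table.

```python
import json
import tempfile
from pathlib import Path


def extract(path):
    """Yield one record (a dict) per non-empty line of a JSON-lines file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def collect_records(root, pattern="**/*.json"):
    """Hypothetical helper (not an Elbow function): run extract on every match."""
    records = []
    for path in sorted(Path(root).glob(pattern)):
        records.extend(extract(path))
    return records


def make_sample_tree(root):
    """Hypothetical helper: write two sample files, one inside a subdirectory."""
    root = Path(root)
    (root / "sub").mkdir(parents=True, exist_ok=True)
    (root / "a.json").write_text('{"id": 1, "name": "x"}\n{"id": 2, "name": "y"}\n')
    (root / "sub" / "b.json").write_text('{"id": 3, "name": "z"}\n')


with tempfile.TemporaryDirectory() as tmp:
    make_sample_tree(tmp)
    records = collect_records(tmp)

print(records)
```

Checking the records this way makes it easier to spot bugs in the extract function itself, such as iterating over the path string instead of the open file.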
Installation
A pre-release version can be installed with
pip install git+https://github.com/clane9/elbow
Source Distribution
elbow-0.1.0a0.tar.gz (9.0 kB)