Summarize data in Dask DataFrames.
Project description
TSum - Table Summarization
Given a table where rows correspond to records and columns correspond to attributes, we want to find a small number of patterns that succinctly summarize the dataset.
TSum is a table summarization algorithm published by Google Research. This is a Python implementation of the algorithm built on Dask DataFrames so that it can scale to large tables.
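To make "patterns that summarize a table" concrete, the sketch below uses a deliberately simple, hypothetical model: a pattern is a dict of attribute -> required value, and any attribute it does not mention is "don't care". The helper names `matches` and `coverage` and the toy data are illustrative assumptions, not the `tsum` package's `Pattern` class or API. The sketch does not reproduce TSum's pattern-scoring or search procedure. It only shows how a couple of patterns can describe every row of a small table.

```python
# Toy table: each dict is a record (row), each key an attribute (column).
rows = [
    {"gender": "M", "age": "young", "city": "NYC"},
    {"gender": "M", "age": "young", "city": "SF"},
    {"gender": "M", "age": "young", "city": "NYC"},
    {"gender": "F", "age": "old", "city": "NYC"},
    {"gender": "F", "age": "old", "city": "SF"},
    {"gender": "F", "age": "old", "city": "NYC"},
]


# Hypothetical pattern model (not tsum's Pattern): attribute -> required value.
# Attributes absent from the pattern are treated as "don't care".
def matches(pattern: dict, row: dict) -> bool:
    return all(row.get(attr) == value for attr, value in pattern.items())


def coverage(pattern: dict, table: list[dict]) -> float:
    """Fraction of rows in the table that the pattern describes."""
    if not table:
        return 0.0
    return sum(matches(pattern, r) for r in table) / len(table)


patterns = [
    {"gender": "M", "age": "young"},
    {"gender": "F", "age": "old"},
]

for p in patterns:
    print(p, coverage(p, rows))

# True when every row is described by at least one pattern.
print(all(any(matches(p, r) for p in patterns) for r in rows))
```

Here two patterns, each covering half of the rows, together describe the whole six-row table. That is the kind of compact summary the algorithm looks for, although TSum decides which patterns to keep by its own criterion.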
Usage
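import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
from tsum import summarize, Pattern

# Start a local Dask cluster: one worker with 8 threads, dashboard served on port 8787
cluster = LocalCluster(n_workers=1, threads_per_worker=8, dashboard_address=":8787")
client = Client(cluster)

# The table to summarize, as a Dask DataFrame (rows are records, columns are attributes)
ddf: dd.DataFrame = ...
patterns: list[Pattern] = summarize(ddf=ddf)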
Download files
Source Distribution
tsum-0.1.0.tar.gz (7.7 kB)
Built Distribution
tsum-0.1.0-py3-none-any.whl (14.0 kB)