Summarize data in Dask DataFrames.

Project description

TSum - Table Summarization

Given a table where rows correspond to records and columns correspond to attributes, we want to find a small number of patterns that succinctly summarize the dataset.

TSum is a table summarization algorithm published by Google Research. This package is a Python implementation of the algorithm that uses Dask DataFrames to scale to large tables.
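
To make the notion of a "pattern" concrete, the following is a minimal sketch in plain Python. It assumes a pattern fixes the values of some attributes and leaves the others as wildcards, and it measures only what fraction of rows a pattern describes. The helper names (`matches`, `coverage`) and the toy records are hypothetical; they are not part of the tsum package, and the sketch does not reproduce how TSum chooses or ranks patterns.

```python
from typing import Any, Mapping, Sequence

Row = Mapping[str, Any]


def matches(pattern: Mapping[str, Any], row: Row) -> bool:
    """Return True if the row has the required value for every attribute
    the pattern fixes. Attributes absent from the pattern are wildcards."""
    return all(col in row and row[col] == val for col, val in pattern.items())


def coverage(pattern: Mapping[str, Any], rows: Sequence[Row]) -> float:
    """Fraction of rows described by the pattern (0.0 for an empty table)."""
    if not rows:
        return 0.0
    return sum(matches(pattern, row) for row in rows) / len(rows)


# Toy table (hypothetical data): each dict is one record.
rows = [
    {"gender": "M", "age": "adult", "blood_pressure": "normal"},
    {"gender": "M", "age": "adult", "blood_pressure": "low"},
    {"gender": "M", "age": "adult", "blood_pressure": "normal"},
    {"gender": "F", "age": "child", "blood_pressure": "low"},
]

# This pattern fixes two attributes and leaves blood_pressure as a wildcard.
pattern = {"gender": "M", "age": "adult"}
print(coverage(pattern, rows))  # 0.75
```

A single pattern such as `{"gender": "M", "age": "adult"}` describes three of the four records here, which is the sense in which a few patterns can summarize a table.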

Usage

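import dask.dataframe as dd
from tsum import summarize, Pattern
from dask.distributed import LocalCluster

# Start a local cluster: one worker with eight threads, dashboard on port 8787.
# threads_per_worker and dashboard_address are the documented LocalCluster
# keywords (diagnostics_port is deprecated in favor of dashboard_address).
cluster = LocalCluster(n_workers=1, threads_per_worker=8, dashboard_address=":8787")
client = cluster.get_client()

# Supply the table to summarize as a Dask DataFrame.
ddf: dd.DataFrame = ...
patterns: list[Pattern] = summarize(ddf=ddf)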

Download files

Source Distribution

tsum-0.1.0.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

tsum-0.1.0-py3-none-any.whl (14.0 kB)

Uploaded Python 3

File details

Details for the file tsum-0.1.0.tar.gz.

File metadata

  • Download URL: tsum-0.1.0.tar.gz
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.9.19 Linux/6.5.0-26-generic

File hashes

Hashes for tsum-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4828d78e827290848a19c028dce1056abca56a75a4c62fea3b0d50bac2f4eb67
MD5 9f33d1eb019852e10e6d727b3a49320f
BLAKE2b-256 049e51a253c6bb4690411190d5acaa78d324ec11fd62e25cb0ef96877483b432

File details

Details for the file tsum-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tsum-0.1.0-py3-none-any.whl
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.9.19 Linux/6.5.0-26-generic

File hashes

Hashes for tsum-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8714a2e3e34d229bf584ddec635b3259ef1f3b9c58a53b11842edebc38f27bee
MD5 884cea66f693887829e87e5fa2101e0b
BLAKE2b-256 f128c22302bfd944aa9f0b4616b4a457a51de7f89de104118b552338472d8c3b
