Skip to main content

Streaming DataFrame: streaming over pandas.

Project description

Build Status Windows https://dev.azure.com/xavierdupre3/pandas_streaming/_apis/build/status/sdpython.pandas_streaming https://badge.fury.io/py/pandas_streaming.svg MIT License https://codecov.io/gh/sdpython/pandas-streaming/branch/main/graph/badge.svg?token=0caHX1rhr8 GitHub Issues Downloads Forks Stars size

pandas-streaming aims at processing big files with pandas, too big to hold in memory, too small to be parallelized with a significant gain. The module replicates a subset of pandas API and implements other functionalities for machine learning.

from pandas_streaming.df import StreamingDataFrame
sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)

The module can also stream an existing dataframe.

import pandas
df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                       dict(cf=1, cint=1, cstr="1"),
                       dict(cf=3, cint=3, cstr="3")])

from pandas_streaming.df import StreamingDataFrame
sdf = StreamingDataFrame.read_df(df)

for df in sdf:
    # process this chunk of data
    # df is a dataframe
    print(df)

It contains other helpers to split datasets into train and test with some weird constraints.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas_streaming-0.5.2.tar.gz (35.6 kB view details)

Uploaded Source

File details

Details for the file pandas_streaming-0.5.2.tar.gz.

File metadata

  • Download URL: pandas_streaming-0.5.2.tar.gz
  • Upload date:
  • Size: 35.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pandas_streaming-0.5.2.tar.gz
Algorithm Hash digest
SHA256 18022bed8adcc55db17ed4851be26de3d6fca5888152c674f82df78c5199f9fd
MD5 5f3c4c50c1a0047f6ddea6fb8e2a59d6
BLAKE2b-256 c755e0096d6755c779a5e9be9f16d2e9b683dac7d2f5d0fbb5934562274ba312

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page