Stream and query zipped datasets using LLMs

Project description

zipstream-ai

Stream, Parse, and Chat with Compressed Datasets Using LLMs

zipstream-ai is a Python package that lets you interact with .zip and .tar.gz files directly, with no need to extract them manually. It integrates archive streaming, format detection, data parsing (e.g., CSV, JSON), and natural language querying with LLMs like Gemini, all through a unified interface.


Installation

pip install zipstream-ai

Features

Feature                    Description
📂 Archive Streaming       Stream .zip and .tar.gz files without extraction
🔍 Format Auto-Detection   Automatically detects file types (CSV, JSON, TXT, etc.)
📊 DataFrame Integration   Parses tabular data directly into pandas DataFrames
💬 LLM Querying            Ask questions about your data using Gemini (Google's LLM)
🧩 Modular Design          Easily extensible for new formats or models
🖥️ Python + CLI Support    Use via command line or as a Python package
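To make the format auto-detection row concrete, here is a minimal sketch of one way a member's type could be guessed from its file extension. It is not taken from zipstream-ai's source: the `detect_format` helper and the `EXTENSION_FORMATS` mapping are hypothetical names, and the package may use a different strategy entirely.

```python
from pathlib import PurePosixPath

# Hypothetical mapping for illustration; the formats zipstream-ai
# actually recognises (and how it detects them) may differ.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".txt": "text",
}


def detect_format(member_name: str) -> str:
    """Guess an archive member's format from its extension (case-insensitive)."""
    # Archive member names use forward slashes, hence PurePosixPath.
    suffix = PurePosixPath(member_name).suffix.lower()
    return EXTENSION_FORMATS.get(suffix, "unknown")


for name in ["data/train.CSV", "meta.json", "README.txt", "image.png"]:
    print(name, "->", detect_format(name))
```

Extension-based detection is cheap because it needs only the member name, but it can be wrong for mislabeled files; inspecting the first bytes of the member is a common complement.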

Use Case Examples

1. Load & Explore ZIP

from zipstream_ai import ZipStreamReader

reader = ZipStreamReader("dataset.zip")
print(reader.list_files())
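For comparison, the same kind of listing can be done with the standard library alone. The sketch below is an illustration of the idea, not zipstream-ai's implementation: it builds two small in-memory archives and lists their members without writing anything to disk. The helper names (`build_demo_zip`, `build_demo_targz`, `list_members`) are hypothetical.

```python
import io
import tarfile
import zipfile


def build_demo_zip() -> bytes:
    """Create a tiny .zip archive in memory so the example is self-contained."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", "name,score\nada,91\n")
        zf.writestr("notes.txt", "demo archive\n")
    return buf.getvalue()


def build_demo_targz() -> bytes:
    """Create a tiny .tar.gz archive in memory."""
    buf = io.BytesIO()
    payload = b'{"ok": true}\n'
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo(name="meta.json")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(archive: bytes) -> list[str]:
    """Return member names from a zip or tar archive without extracting it."""
    stream = io.BytesIO(archive)
    if zipfile.is_zipfile(stream):
        stream.seek(0)
        with zipfile.ZipFile(stream) as zf:
            return zf.namelist()
    stream.seek(0)
    # "r:*" lets tarfile detect the compression (gzip here) on its own.
    with tarfile.open(fileobj=stream, mode="r:*") as tf:
        return tf.getnames()


print(list_members(build_demo_zip()))    # ['data.csv', 'notes.txt']
print(list_members(build_demo_targz()))  # ['meta.json']
```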

2. Parse CSV from ZIP

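from zipstream_ai import ZipStreamReader, FileParser

# Open the archive, then parse one member into a pandas DataFrame
reader = ZipStreamReader("dataset.zip")
parser = FileParser(reader)
df = parser.load("data.csv")
print(df.head())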
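As a rough picture of what parsing a member without extraction involves, the sketch below reads a CSV straight out of an in-memory zip using only the standard library. It is an assumption-labeled illustration rather than the package's code; `read_csv_member` is a hypothetical helper, and it returns plain dictionaries instead of a DataFrame to stay dependency-free.

```python
import csv
import io
import zipfile

# Build a small demo archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "name,score\nada,91\ngrace,88\nlinus,75\n")


def read_csv_member(archive: io.BytesIO, member: str) -> list[dict[str, str]]:
    """Parse one CSV member directly from the archive, without extracting it."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.open returns a binary file-like object that decompresses
        # on read; TextIOWrapper decodes it so the csv module can consume it.
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


rows = read_csv_member(buf, "data.csv")
print(rows[0])  # {'name': 'ada', 'score': '91'}
```

pandas.read_csv also accepts a binary file object such as the one returned by ZipFile.open, so the same pattern can produce a DataFrame when pandas is available.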

3. Ask Questions with Gemini

from zipstream_ai import ask

# df is the DataFrame loaded in example 2
response = ask(df, "Which 3 rows have the highest 'score'?")
print(response)
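How ask() turns a DataFrame and a question into an LLM request is not described here. One plausible shape, sketched under that caveat, is to render a small preview of the table as text and send it together with the question. Everything in this block is hypothetical (`build_prompt`, `ask_with`, and the stub `echo_llm` that stands in for a real model call); it does not contact Gemini or any other service.

```python
from typing import Callable


def build_prompt(rows: list[dict[str, str]], question: str, max_rows: int = 20) -> str:
    """Render a table preview plus the question as one prompt string."""
    if not rows:
        return f"The table is empty.\n\nQuestion: {question}"
    columns = list(rows[0])
    lines = [",".join(columns)]
    # Cap the preview so a large table does not produce an oversized prompt.
    for row in rows[:max_rows]:
        lines.append(",".join(row[c] for c in columns))
    preview = "\n".join(lines)
    return f"Here is a table preview in CSV form:\n{preview}\n\nQuestion: {question}"


def ask_with(llm: Callable[[str], str], rows: list[dict[str, str]], question: str) -> str:
    """Send the rendered prompt to any callable that maps a prompt to a reply."""
    return llm(build_prompt(rows, question))


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model client; it only reports the prompt size.
    return f"[stub reply to a {len(prompt)}-character prompt]"


demo_rows = [
    {"name": "ada", "score": "91"},
    {"name": "grace", "score": "88"},
]
print(build_prompt(demo_rows, "Which row has the highest 'score'?"))
print(ask_with(echo_llm, demo_rows, "Which row has the highest 'score'?"))
```

Passing the model as a callable keeps the prompt logic testable without network access or credentials.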

Why zipstream-ai?

Traditional Workflow              With zipstream-ai
Manually unzip files              Stream directly from archive
Write boilerplate code to parse   Built-in file parsers (CSV, JSON, etc.)
Switch between tools for LLMs     One-liner ask(df, question) integration

Architecture Diagram

         ┌──────────────┐
         │  .zip/.tar   │
         └────┬─────────┘
              │
   ┌──────────▼──────────┐
   │  ZipStreamReader    │
   └──────────┬──────────┘
              │
     ┌────────▼────────┐
     │   FileParser    │────>  pd.DataFrame
     └────────┬────────┘
              │
     ┌────────▼────────┐
     │     ask()       │────> Gemini LLM Output
     └─────────────────┘
