
Stream and query zipped datasets using LLMs

Stream, Parse, and Chat with Compressed Datasets Using LLMs

zipstream-ai is a Python package for working with .zip and .tar.gz files directly, without extracting them manually first. It combines archive streaming, format detection, data parsing (e.g., CSV, JSON), and natural language querying with LLMs such as Gemini behind a single interface.


Installation

Option 1: Install from PyPI (Recommended)

pip install zipstream-ai

Option 2: Install from Conda

# Install from conda
conda install -c pranav_motarwar zipstream-ai

# Install PyPI-only dependencies (required)
pip install openai typer python-dotenv google-generativeai

Note: The conda package includes core dependencies, but you'll need to install PyPI-only dependencies (openai, typer, python-dotenv, google-generativeai) separately via pip.


Features

Feature                 Description
----------------------  ----------------------------------------------------------
Archive Streaming       Stream .zip and .tar.gz files without extraction
Format Auto-Detection   Automatically detects file types (CSV, JSON, TXT, etc.)
DataFrame Integration   Parses tabular data directly into pandas DataFrames
LLM Querying            Ask questions about your data using Gemini (Google's LLM)
Modular Design          Easily extensible for new formats or models
Python + CLI Support    Use via command line or as a Python package
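
To make "Format Auto-Detection" and "Modular Design" more concrete, here is a minimal standard-library sketch of one way extension-based detection with pluggable parsers could work. It is not zipstream-ai's actual implementation: the names PARSERS, register, and detect_and_parse are hypothetical and exist only for illustration.

```python
import csv
import io
import json
from pathlib import PurePosixPath

# Hypothetical registry (not part of zipstream-ai): extension -> parser function.
PARSERS = {}


def register(extension):
    """Decorator that registers a parser for one file extension."""
    def decorator(func):
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register(".csv")
def parse_csv(raw: bytes):
    # Read rows as dictionaries keyed by the header row.
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


@register(".json")
def parse_json(raw: bytes):
    return json.loads(raw.decode("utf-8"))


@register(".txt")
def parse_text(raw: bytes):
    return raw.decode("utf-8")


def detect_and_parse(name: str, raw: bytes):
    """Pick a parser from the member name's extension and apply it."""
    suffix = PurePosixPath(name).suffix.lower()
    try:
        parser = PARSERS[suffix]
    except KeyError:
        raise ValueError(f"No parser registered for {suffix!r}") from None
    return parser(raw)


print(detect_and_parse("reports/data.csv", b"id,score\n1,90\n"))
```

Under this design, supporting a new format means registering one more function; the dispatch code stays unchanged.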

Use Case Examples

1. Load & Explore ZIP

from zipstream_ai import ZipStreamReader

reader = ZipStreamReader("dataset.zip")
print(reader.list_files())
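
For comparison, listing archive members without extraction can be done with the standard library alone. The sketch below is an assumption about the general technique, not the code behind ZipStreamReader.list_files(); list_members is a hypothetical helper, and the archives are built in memory so the example runs without any files on disk.

```python
import io
import tarfile
import zipfile


def list_members(fileobj) -> list[str]:
    """Return file names inside a .zip or tar archive without extracting it."""
    if zipfile.is_zipfile(fileobj):
        fileobj.seek(0)
        with zipfile.ZipFile(fileobj) as zf:
            return [i.filename for i in zf.infolist() if not i.is_dir()]
    fileobj.seek(0)
    # mode "r:*" lets tarfile detect the compression (gzip here) on its own.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        return [m.name for m in tf.getmembers() if m.isfile()]


# Build a small .zip in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("data.csv", "id,score\n1,90\n")
    zf.writestr("notes/readme.txt", "hello")

# Build a small .tar.gz in memory.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
    payload = b'{"k": 1}'
    info = tarfile.TarInfo(name="meta.json")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

print(list_members(zip_buf))
print(list_members(tar_buf))
```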

2. Parse CSV from ZIP

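from zipstream_ai import FileParser, ZipStreamReader

# Same reader as in example 1, repeated so this snippet runs on its own
reader = ZipStreamReader("dataset.zip")
parser = FileParser(reader)
df = parser.load("data.csv")  # parsed into a pandas DataFrame
print(df.head())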
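
What "parsing without extraction" means in practice can be sketched with the standard library: the archive member is opened as a stream and decoded row by row, so nothing is written to disk. This illustrates the general idea only; iter_csv_rows is a hypothetical helper, not a zipstream-ai function.

```python
import csv
import io
import zipfile


def iter_csv_rows(archive, member: str, encoding: str = "utf-8"):
    """Yield CSV rows as dicts, decompressing the member lazily."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # Wrap the binary stream so csv can read text from it.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)


# Build a small archive in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "id,score\n1,90\n2,75\n")

rows = list(iter_csv_rows(buf, "data.csv"))
print(rows)
```

pandas.read_csv also accepts a file-like object, so the handle returned by zf.open(member) could be passed to it directly to obtain a DataFrame instead of a list of dicts.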

3. Ask Questions with Gemini

from zipstream_ai import ask

# df is the DataFrame loaded in example 2
response = ask(df, "Which 3 rows have the highest 'score'?")
print(response)
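
How ask() turns a DataFrame and a question into a model request is not documented here. One common approach, shown as an assumption rather than as the package's behavior, is to render a small sample of the table as text and append the question. build_prompt is a hypothetical helper; it works on a list of dicts so that it runs with the standard library only.

```python
import csv
import io


def build_prompt(rows: list[dict], question: str, max_rows: int = 5) -> str:
    """Render a small sample of the table plus the question as one prompt."""
    if not rows:
        raise ValueError("rows must not be empty")
    columns = list(rows[0].keys())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows[:max_rows])  # only a sample, to keep the prompt short
    shown = min(max_rows, len(rows))
    return (
        f"You are given a table with {len(rows)} rows and columns {columns}.\n"
        f"First {shown} rows as CSV:\n"
        f"{buf.getvalue()}\n"
        f"Question: {question}"
    )


sample = [{"name": "a", "score": 1}, {"name": "b", "score": 2}]
print(build_prompt(sample, "Which row has the highest 'score'?", max_rows=1))
```

Sending only a sample keeps the request small, but it also means the model cannot answer questions that depend on rows it never saw.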

Why zipstream-ai?

Traditional Workflow              With zipstream-ai
--------------------------------  ---------------------------------------
Manually unzip files              Stream directly from archive
Write boilerplate code to parse   Built-in file parsers (CSV, JSON, etc.)
Switch between tools for LLMs     One-liner ask(df, question) integration

Architecture Diagram

         ┌──────────────┐
         │  .zip/.tar   │
         └────┬─────────┘
              │
   ┌──────────▼──────────┐
   │  ZipStreamReader    │
   └──────────┬──────────┘
              │
     ┌────────▼────────┐
     │   FileParser    │────>  pd.DataFrame
     └────────┬────────┘
              │
     ┌────────▼────────┐
     │     ask()       │────> Gemini LLM Output
     └─────────────────┘
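
The three stages in the diagram can be mimicked end to end with a stand-in model. The sketch below is illustrative only: run_pipeline and fake_llm are hypothetical names, the "model" is a plain function, and no Gemini call is made. Passing the model in as a callable is one way to keep the last stage swappable.

```python
import csv
import io
import zipfile
from typing import Callable


def run_pipeline(archive, member: str, question: str,
                 llm: Callable[[str], str]) -> str:
    # Stage 1: open the archive and stream one member (nothing hits the disk).
    with zipfile.ZipFile(archive) as zf, zf.open(member) as raw:
        # Stage 2: parse the member into rows.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))
    # Stage 3: hand a text rendering of the rows plus the question to the model.
    prompt = f"Data: {rows}\nQuestion: {question}"
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: reports how many lines it received."""
    return f"received {len(prompt.splitlines())} lines"


archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.csv", "id,score\n1,90\n2,75\n")

answer = run_pipeline(archive, "data.csv", "Which id has the top score?", fake_llm)
print(answer)
```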

