Fundas
Fundamental Data Source - an AI-powered Python library that extends pandas to import and analyze complex, unstructured files.
Overview
Fundas leverages the OpenRouter API and generative AI to intelligently extract features and structured data from various file types based on simple prompts. It seamlessly converts any file into a clean pandas DataFrame for immediate analysis.
Features
- 📄 read_pdf() - Extract structured data from PDF documents
- 🖼️ read_image() - Extract data and text from images
- 🎵 read_audio() - Process audio files and extract information
- 🌐 read_webpage() - Scrape and structure web content
- 🎥 read_video() - Analyze video content from frames, audio, or both
All functions return pandas DataFrames, making the data ready for immediate analysis!
Installation
pip install fundas
Or install from source:
git clone https://github.com/AMSeify/fundas.git
cd fundas
pip install -e .
Quick Start
Setup
First, set your OpenRouter API key. You can do this in one of three ways:
Option 1: Use environment variable
export OPENROUTER_API_KEY="your-api-key-here"
Option 2: Use .env file (recommended)
# Copy the example file
cp .env.example .env
# Edit .env and add your credentials:
# OPENROUTER_API_KEY=your-api-key-here
# OPENROUTER_MODEL=openai/gpt-3.5-turbo # Optional: set default model
Option 3: Pass directly to functions
import fundas as fd
df = fd.read_pdf("document.pdf", api_key="your-api-key-here")
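Whichever option you use, resolution typically falls back from the explicit argument to the environment. The sketch below illustrates that precedence; it is an illustration of the pattern, not Fundas internals:

```python
import os

def resolve_api_key(explicit=None):
    """Prefer an explicitly passed key, else fall back to the environment."""
    key = explicit or os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise ValueError("Set OPENROUTER_API_KEY or pass api_key explicitly")
    return key

os.environ["OPENROUTER_API_KEY"] = "demo-key"
print(resolve_api_key())            # "demo-key" (from environment)
print(resolve_api_key("direct"))    # "direct" (explicit argument wins)
```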
Basic Usage
Read PDF Files
import fundas as fd
# Extract invoice data
df = fd.read_pdf(
"invoice.pdf",
prompt="Extract invoice items with product name, quantity, and price"
)
print(df)
Read Images
# Extract data from a chart or screenshot
df = fd.read_image(
"sales_chart.png",
prompt="Extract the sales data points from this chart"
)
print(df)
# Process a receipt
df = fd.read_image(
"receipt.jpg",
prompt="Extract items and their prices",
columns=["item", "price", "quantity"]
)
Read Webpages
# Scrape product information
df = fd.read_webpage(
"https://example.com/products",
prompt="Extract product names, descriptions, and prices"
)
print(df)
# Extract article data
df = fd.read_webpage(
"https://news.example.com/article",
columns=["title", "author", "date", "content"]
)
Read Audio Files
# Transcribe and extract meeting notes
df = fd.read_audio(
"meeting.mp3",
prompt="Extract speaker names and key discussion points"
)
Read Video Files
# Analyze video frames
df = fd.read_video(
"presentation.mp4",
prompt="Extract slide titles and key points from this presentation",
from_="pics" # Extract from video frames
)
# Process audio track
df = fd.read_video(
"lecture.mp4",
prompt="Transcribe the lecture and identify key topics",
from_="audios" # Extract from audio track
)
# Analyze both video and audio
df = fd.read_video(
"interview.mp4",
prompt="Extract interview questions and answers",
from_="both" # or from_=["pics", "audios"]
)
Advanced Usage
Specify Columns
You can specify which columns you want to extract:
df = fd.read_pdf(
"report.pdf",
prompt="Extract quarterly financial data",
columns=["quarter", "revenue", "expenses", "profit"]
)
Custom AI Models
Use different AI models via OpenRouter:
# Option 1: Pass model parameter to each function
df = fd.read_image(
"complex_diagram.png",
prompt="Extract relationships between components",
model="anthropic/claude-3-opus"
)
# Option 2: Set default model in .env file
# OPENROUTER_MODEL=anthropic/claude-3-sonnet
df = fd.read_image("diagram.png", prompt="Extract data") # Uses model from .env
# Option 3: Set via environment variable
import os
os.environ["OPENROUTER_MODEL"] = "openai/gpt-4"
df = fd.read_pdf("document.pdf", prompt="Extract info")
DataFrame Operations
Since all functions return pandas DataFrames, you can immediately use pandas operations:
import fundas as fd
# Read and analyze in one workflow
df = fd.read_pdf("sales.pdf", prompt="Extract sales data")
print(df.head())
print(df.describe())
print(df.groupby('region')['sales'].sum())
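Because the result is ordinary pandas, the same pipeline works on any DataFrame. Here is a self-contained version of that workflow using a hand-built frame in place of the read_pdf() result (the column names are illustrative):

```python
import pandas as pd

# Stand-in for the frame read_pdf() would return
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "sales": [120, 80, 150, 95],
})

# Standard pandas operations apply immediately
totals = df.groupby("region")["sales"].sum()
print(totals)  # North -> 270, South -> 175
```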
Requirements
- Python >= 3.8
- pandas >= 1.3.0
- requests >= 2.25.0
- PyPDF2 >= 3.0.0
- Pillow >= 10.3.0
- beautifulsoup4 >= 4.9.0
- opencv-python >= 4.8.1.78
Advanced Features
Caching
Fundas includes an intelligent caching system to reduce redundant API calls:
import fundas as fd
# Enable caching (enabled by default)
df = fd.read_pdf("document.pdf", prompt="Extract data")
# The same file with the same prompt will use cached results
df2 = fd.read_pdf("document.pdf", prompt="Extract data") # No API call
# Disable caching if needed
from fundas import OpenRouterClient
client = OpenRouterClient(api_key="key", use_cache=False)
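The cache idea can be sketched in a few lines: a key derived from the file bytes, the prompt, and the model means identical (file, prompt) pairs map to the same entry, while any change forces a fresh API call. This is a conceptual illustration, not Fundas's actual key scheme:

```python
import hashlib

def cache_key(file_bytes, prompt, model=""):
    """Illustrative cache key: same file + prompt + model -> same key."""
    h = hashlib.sha256()
    h.update(file_bytes)
    h.update(prompt.encode("utf-8"))
    h.update(model.encode("utf-8"))
    return h.hexdigest()

# Identical inputs hit the same cache slot; any change misses
k1 = cache_key(b"%PDF-1.4 ...", "Extract data")
k2 = cache_key(b"%PDF-1.4 ...", "Extract data")
k3 = cache_key(b"%PDF-1.4 ...", "Extract totals")
print(k1 == k2, k1 == k3)  # True False
```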
Exporting Data
Export your DataFrames with AI-powered summarization:
import fundas as fd
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({"product": ["A", "B", "C"], "sales": [100, 200, 150]})
# Export to CSV
fd.to_summarized_csv(df, "output.csv")
# Export to Excel with AI summary
fd.to_summarized_excel(
df,
"summary.xlsx",
prompt="Add a summary row with totals"
)
# Generate AI summary
summary = fd.summarize_dataframe(df, prompt="Summarize sales performance")
print(summary)
Error Handling
Fundas includes robust error handling with automatic retries:
import fundas as fd
try:
df = fd.read_pdf("document.pdf", prompt="Extract data")
except FileNotFoundError:
print("File not found")
except ValueError as e:
print(f"Invalid parameters: {e}")
except RuntimeError as e:
print(f"API error: {e}")
Cache Management
Control the cache behavior:
from fundas import get_cache
cache = get_cache()
# Clear all cache entries
cache.clear()
# Clear only expired entries
cache.clear_expired()
# Disable/enable cache
cache.disable()
cache.enable()
API Reference
Read Functions
All read functions share similar parameters:
Common Parameters:
- filepath or url (str | Path): Source file or URL
- prompt (str): Description of what data to extract
- columns (List[str], optional): Column names to extract
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use (default: gpt-3.5-turbo)
Returns: pandas DataFrame
Export Functions
All export functions accept:
Parameters:
- df (pd.DataFrame): DataFrame to export
- filepath (str | Path): Output file path
- prompt (str, optional): AI transformation prompt
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
read_pdf(filepath, prompt, columns=None, api_key=None, model=None)
Extract structured data from PDF files.
Parameters:
- filepath (str | Path): Path to the PDF file
- prompt (str): Description of what data to extract
- columns (List[str], optional): Column names to extract
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
Returns: pandas DataFrame
read_image(filepath, prompt, columns=None, api_key=None, model=None)
Extract structured data from image files.
Parameters:
- filepath (str | Path): Path to the image file
- prompt (str): Description of what data to extract
- columns (List[str], optional): Column names to extract
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
Returns: pandas DataFrame
read_audio(filepath, prompt, columns=None, api_key=None, model=None)
Extract structured data from audio files.
Parameters:
- filepath (str | Path): Path to the audio file
- prompt (str): Description of what data to extract
- columns (List[str], optional): Column names to extract
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
Returns: pandas DataFrame
read_webpage(url, prompt, columns=None, api_key=None, model=None)
Extract structured data from web pages.
Parameters:
- url (str): URL of the webpage
- prompt (str): Description of what data to extract
- columns (List[str], optional): Column names to extract
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
Returns: pandas DataFrame
read_video(filepath, prompt, from_='both', columns=None, api_key=None, model=None, sample_rate=30)
Extract structured data from video files.
Parameters:
- filepath (str | Path): Path to the video file
- prompt (str): Description of what data to extract
- from_ (str | List[str]): Source to extract from - 'pics', 'audios', or 'both'
- columns (List[str], optional): Column names to extract
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
- sample_rate (int): Frame sampling rate (default: 30)
Returns: pandas DataFrame
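Assuming sample_rate means "keep one frame out of every sample_rate frames" (an interpretation, not confirmed by the source), the frame selection reduces to simple index arithmetic:

```python
def sampled_frames(total_frames, sample_rate=30):
    """Indices of the frames kept when sampling one frame per `sample_rate`."""
    return list(range(0, total_frames, sample_rate))

# A 10-second clip at 30 fps yields 300 frames; sample_rate=30 keeps 10 of them
print(sampled_frames(300, 30))  # [0, 30, 60, ..., 270]
```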
to_summarized_csv(df, filepath, prompt=None, api_key=None, model=None, **kwargs)
Export DataFrame to CSV with optional AI-powered summarization.
Parameters:
- df (pd.DataFrame): DataFrame to export
- filepath (str | Path): Path to save the CSV file
- prompt (str, optional): Prompt to transform/summarize data
- api_key (str, optional): OpenRouter API key
- model (str, optional): AI model to use
- **kwargs: Additional arguments for pd.DataFrame.to_csv()
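The **kwargs pass straight through to pandas, so options such as index=False or a custom separator behave exactly as they do in DataFrame.to_csv(). The plain-pandas equivalent of that passthrough:

```python
import io
import pandas as pd

df = pd.DataFrame({"product": ["A", "B"], "sales": [100, 200]})

# The same kwargs to_summarized_csv() forwards to pandas
buf = io.StringIO()
df.to_csv(buf, index=False, sep=";")
print(buf.getvalue())
# product;sales
# A;100
# B;200
```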
to_summarized_excel(df, filepath, prompt=None, sheet_name="Sheet1", api_key=None, model=None, **kwargs)
Export DataFrame to Excel with optional AI-powered summarization.
to_summarized_json(df, filepath, prompt=None, api_key=None, model=None, orient="records", **kwargs)
Export DataFrame to JSON with optional AI-powered summarization.
summarize_dataframe(df, prompt="Provide a summary of this data", api_key=None, model=None)
Generate an AI-powered summary of a DataFrame.
Returns: str (AI-generated summary)
Configuration
Environment Variables
- OPENROUTER_API_KEY: Your OpenRouter API key
- OPENROUTER_MODEL: Default AI model to use (optional, e.g. openai/gpt-3.5-turbo)
Cache Settings
The cache is stored in ~/.fundas/cache/ by default. You can configure:
- Cache directory location
- Time-to-live (TTL) for cache entries
- Enable/disable caching
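Time-to-live works as expiry metadata stored alongside each entry: an entry older than its TTL is treated as a cache miss. A minimal sketch of the idea (not the library's actual storage format):

```python
import time

def is_expired(created_at, ttl_seconds):
    """An entry is stale once more than ttl_seconds have passed since creation."""
    return (time.time() - created_at) > ttl_seconds

entry_time = time.time() - 7200          # entry written two hours ago
print(is_expired(entry_time, 3600))      # True: a 1-hour TTL has lapsed
print(is_expired(entry_time, 86400))     # False: a 24-hour TTL is still valid
```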
Performance Tips
- Use caching: Keep caching enabled (default) to avoid redundant API calls
- Specify columns: When you know what columns you need, specify them explicitly
- Choose the right model: Balance speed, cost, and accuracy by selecting appropriate models
- Batch operations: Process multiple files in sequence to leverage cache warming
License
MIT License - see LICENSE file for details
Contributing
Contributions are welcome! We appreciate bug fixes, new features, documentation improvements, and more.
Please see our Contributing Guide for details on:
- Setting up your development environment
- Coding standards and style guide
- Testing requirements
- Pull request process
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
Support
For issues and questions, please open an issue on GitHub.