Next-generation native DataFrame for Python - Simple like Excel, Powerful like SQL, Smart like AI

🚀 PyFrameX

Next-Generation Native DataFrame for Python

Simple like Excel, Powerful like SQL, Smart like AI

PyFrameX is a DataFrame engine built from scratch in pure Python. It combines Excel-style column operations, SQL queries, and automated machine-learning helpers in one intuitive package.

🌟 What Makes PyFrameX Different?

❌ The Problem

Pandas: Powerful but complicated (.loc, .iloc, .apply confusion)
Polars: Fast but too technical for beginners
Excel: Simple but limited in scale and automation

✅ The Solution: PyFrameX

from pyframex import Frame

# Load data - just like Excel
df = Frame("sales.csv")

# Excel-style operations
df["profit"] = df["revenue"] - df["cost"]

# SQL-style queries
df.sql("SELECT region, SUM(revenue) FROM df GROUP BY region")

# AI-powered automation
df.auto_predict(target="sales")

🎯 Key Features

1️⃣ Pure Python Native Engine

Zero dependencies for core functionality
Custom column store implementation
Type-aware operations (Int, Float, String, Date, Bool)
Automatic type inference
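
To make "automatic type inference" concrete, here is a minimal sketch of how a column of raw strings could be mapped to one of the five type names listed above. It is an illustration under assumptions, not PyFrameX's actual implementation: the function name `infer_type`, the order of the checks, and the ISO date format are all hypothetical.

```python
from datetime import date


def infer_type(values):
    """Return the most specific type name that fits every non-empty string."""

    def is_bool(s):
        return s.lower() in {"true", "false"}

    def is_int(s):
        try:
            int(s)
            return True
        except ValueError:
            return False

    def is_float(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    def is_date(s):
        # Assumption: dates are written in ISO format (YYYY-MM-DD).
        try:
            date.fromisoformat(s)
            return True
        except ValueError:
            return False

    present = [v.strip() for v in values if v.strip()]  # skip missing cells
    if not present:
        return "String"

    # Try the stricter checks first; String is the fallback.
    checks = [("Bool", is_bool), ("Int", is_int),
              ("Float", is_float), ("Date", is_date)]
    for name, check in checks:
        if all(check(v) for v in present):
            return name
    return "String"


print(infer_type(["1", "2", "3"]))
print(infer_type(["1.5", "2", ""]))
print(infer_type(["2024-01-15", "2024-02-01"]))
```

A single non-conforming value is enough to fall back to a wider type, which is why mixed columns end up as String in this sketch.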

2️⃣ Excel-Like Simplicity

# Simple, intuitive operations
df["ratio"] = df["sales"] / df["visits"]
df["status"] = "active"

# No confusing .loc or .iloc needed!

3️⃣ Built-in SQL Engine

# Execute SQL queries directly on DataFrames
result = df.sql("""
    SELECT 
        region, 
        SUM(revenue) as total_revenue,
        AVG(profit) as avg_profit
    FROM df 
    WHERE year = 2024 
    GROUP BY region
    ORDER BY total_revenue DESC
    LIMIT 10
""")

4️⃣ AI-Powered Automation

# Automatic data cleaning
clean_df = df.auto_clean()

# Automatic predictive modeling
results = df.auto_predict(target="price")
print(f"Accuracy: {results['metrics']['accuracy']}")

# Automatic clustering
clustered = df.auto_cluster(n_clusters=3)

# Automatic feature engineering
enriched = df.auto_feature_engineering()

5️⃣ Optimized Performance

Lazy evaluation
Column-oriented storage
Cached statistics
Query optimization
Filter pushdown
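
The sketch below illustrates what lazy evaluation and filter pushdown mean in general. It does not use PyFrameX and makes no claim about its internals: the class `LazyRows` and its methods are hypothetical names invented for this example. Operations are only recorded at first, and the planner moves a filter ahead of any derived column it does not depend on, so fewer rows reach the more expensive step.

```python
class LazyRows:
    """Records operations and runs them only when collect() is called."""

    def __init__(self, rows):
        self.rows = rows
        self.steps = []  # each step is ("filter" | "derive", (column, func))

    def filter(self, column, predicate):
        self.steps.append(("filter", (column, predicate)))
        return self

    def derive(self, new_column, func):
        self.steps.append(("derive", (new_column, func)))
        return self

    def plan(self):
        """Push each filter ahead of derive steps that do not create its column."""
        planned = []
        for step in self.steps:
            planned.append(step)
            if step[0] != "filter":
                continue
            i = len(planned) - 1
            while (i > 0
                   and planned[i - 1][0] == "derive"
                   and planned[i - 1][1][0] != step[1][0]):
                planned[i - 1], planned[i] = planned[i], planned[i - 1]
                i -= 1
        return planned

    def collect(self):
        rows = [dict(r) for r in self.rows]  # copy so the input is untouched
        for kind, (column, func) in self.plan():
            if kind == "filter":
                rows = [r for r in rows if func(r[column])]
            else:
                for r in rows:
                    r[column] = func(r)
        return rows


rows = [
    {"region": "EU", "revenue": 10, "cost": 4},
    {"region": "US", "revenue": 20, "cost": 5},
]
query = (LazyRows(rows)
         .derive("profit", lambda r: r["revenue"] - r["cost"])
         .filter("region", lambda v: v == "EU"))

print([kind for kind, _ in query.plan()])
print(query.collect())
```

Nothing is computed until `collect()` runs, and the `region` filter is applied before `profit` is derived even though it was written afterwards.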

📦 Installation

# Basic installation
pip install pyframex

# With ML capabilities
pip install pyframex[ml]

# Install all features
pip install pyframex[all]

🚀 Quick Start

Loading Data

from pyframex import Frame

# From CSV
df = Frame("data.csv")

# From JSON
df = Frame("data.json")

# From dictionary
df = Frame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})

# From list of dictionaries
df = Frame([
    {"name": "Alice", "age": 25, "salary": 50000},
    {"name": "Bob", "age": 30, "salary": 60000},
    {"name": "Charlie", "age": 35, "salary": 70000}
])

Basic Operations

# View data
print(df)
print(df.head(10))
print(df.tail(5))

# Get info
print(df.summary())
print(df.shape())  # (rows, columns)
print(df.dtypes())  # Column types

# Select columns
names = df["name"]
subset = df[["name", "salary"]]

# Add/modify columns
df["bonus"] = df["salary"] * 0.1
df["total"] = df["salary"] + df["bonus"]

Filtering

# Excel-style filtering
high_earners = df.filter("salary > 60000")
young_staff = df.filter("age < 30")

# Combined conditions
filtered = df.filter("age > 25 and salary < 70000")

# Using column comparisons
mask = df["age"] > 30
filtered = df.filter(mask)

Sorting & Grouping

# Sort
sorted_df = df.sort("salary", ascending=False)

# Group by
by_region = df.groupby("region").agg({
    "revenue": "sum",
    "orders": "count"
})

# Multiple aggregations
summary = df.groupby(["region", "category"]).agg({
    "revenue": "sum",
    "profit": "mean",
    "orders": "count"
})
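
For readers who want to see what a grouped aggregation computes, here is a plain-Python sketch of the same idea. It is independent of PyFrameX: `group_aggregate` and the `AGGREGATORS` table are hypothetical names, and only the three aggregation names used above ("sum", "mean", "count") are covered.

```python
from collections import defaultdict
from statistics import mean

# Maps an aggregation name to a function over a list of values.
AGGREGATORS = {"sum": sum, "mean": mean, "count": len}


def group_aggregate(rows, keys, spec):
    """Group rows (dicts) by `keys` and apply one named aggregation per column."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    result = []
    for group_key, members in groups.items():
        out = dict(zip(keys, group_key))
        for column, agg_name in spec.items():
            out[column] = AGGREGATORS[agg_name]([m[column] for m in members])
        result.append(out)
    return result


sales = [
    {"region": "EU", "revenue": 10, "orders": 1},
    {"region": "EU", "revenue": 5, "orders": 2},
    {"region": "US", "revenue": 7, "orders": 3},
]
print(group_aggregate(sales, ["region"], {"revenue": "sum", "orders": "count"}))
```

Note that "count" here counts rows in the group, not the sum of the `orders` values.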

SQL Queries

# Simple query
result = df.sql("SELECT name, salary FROM df WHERE age > 30")

# With aggregation
result = df.sql("""
    SELECT 
        region, 
        SUM(revenue) as total,
        AVG(profit) as avg_profit
    FROM df 
    GROUP BY region
""")

# With ordering and limit
result = df.sql("""
    SELECT * FROM df 
    WHERE status = 'active' 
    ORDER BY created_date DESC 
    LIMIT 100
""")

# Explain query plan
from pyframex.query import QueryPlanner
planner = QueryPlanner()
print(planner.explain("SELECT * FROM df WHERE revenue > 1000"))
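
One way to sanity-check a query result is to run the same statement through `sqlite3` from the Python standard library. The sketch below does that for a small in-memory table named `df`. The sample rows are invented for illustration, and PyFrameX's SQL dialect may not match SQLite's in every detail, so treat this as a cross-check rather than a specification.

```python
import sqlite3

# Invented sample data: (region, year, revenue, profit)
sample = [
    ("North", 2024, 120.0, 30.0),
    ("North", 2024, 80.0, 10.0),
    ("South", 2024, 150.0, 45.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df (region TEXT, year INTEGER, revenue REAL, profit REAL)"
)
conn.executemany("INSERT INTO df VALUES (?, ?, ?, ?)", sample)

result = conn.execute("""
    SELECT region, SUM(revenue) AS total, AVG(profit) AS avg_profit
    FROM df
    GROUP BY region
    ORDER BY total DESC
""").fetchall()
conn.close()

for region, total, avg_profit in result:
    print(region, total, avg_profit)
```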

🤖 Machine Learning Integration

Auto Clean

# Automatically:
# - Remove duplicates
# - Handle missing values (median/mode imputation)
# - Remove outliers
# - Fix data types
clean_df = df.auto_clean()
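
The cleaning steps listed above can be sketched in plain Python as follows. This is an assumption-laden illustration, not the code behind `auto_clean()`: the function name `clean_rows` is hypothetical, missing values are represented as `None`, and outliers are detected with the common 1.5 x IQR rule, which the description above does not specify.

```python
from statistics import median, quantiles


def clean_rows(rows, column, iqr_factor=1.5):
    """Deduplicate rows, impute missing values in `column`, drop outliers."""
    # 1. Remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # 2. Replace missing values (None) with the median of the observed ones.
    observed = [r[column] for r in unique if r[column] is not None]
    fill = median(observed)
    for r in unique:
        if r[column] is None:
            r[column] = fill

    # 3. Keep only rows inside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule).
    q1, _, q3 = quantiles([r[column] for r in unique], n=4)
    spread = q3 - q1
    low, high = q1 - iqr_factor * spread, q3 + iqr_factor * spread
    return [r for r in unique if low <= r[column] <= high]


raw = [
    {"id": 1, "price": 10},
    {"id": 1, "price": 10},      # exact duplicate
    {"id": 2, "price": None},    # missing value
    {"id": 3, "price": 12},
    {"id": 4, "price": 11},
    {"id": 5, "price": 13},
    {"id": 6, "price": 12},
    {"id": 7, "price": 11},
    {"id": 8, "price": 10},
    {"id": 9, "price": 1000},    # outlier
]
cleaned = clean_rows(raw, "price")
print(len(cleaned), [r["price"] for r in cleaned])
```

The order of the steps matters: imputing before outlier removal means the median is computed with the outlier still present, which is one reason the median is a safer choice than the mean here.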

Auto Predict

# Automatic model training
results = df.auto_predict(
    target="price",
    test_size=0.2
)

# Results include:
print(results['metrics'])  # Performance metrics
print(results['model'])  # Trained model
print(results['predictions'])  # Test predictions

# Feature importance
for feature, importance in results['metrics']['feature_importance'].items():
    print(f"{feature}: {importance:.4f}")

Auto Cluster

# Automatic clustering
clustered = df.auto_cluster(n_clusters=3)
print(clustered["cluster"].value_counts())

Feature Engineering

# Automatically create:
# - Polynomial features
# - Interaction terms
# - Date extractions
enriched = df.auto_feature_engineering()
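
As a rough picture of the three feature families named above, the sketch below derives squared terms, pairwise products, and date parts for a single row. The function `engineer_features` and the generated column names (`price_squared`, `price_x_quantity`, `sold_year`, and so on) are hypothetical; the columns PyFrameX actually produces may be named and chosen differently.

```python
from datetime import date
from itertools import combinations


def engineer_features(row, numeric_columns, date_columns):
    """Return a copy of `row` with polynomial, interaction, and date features."""
    out = dict(row)

    # Polynomial features: degree 2 only in this sketch.
    for col in numeric_columns:
        out[f"{col}_squared"] = row[col] ** 2

    # Interaction terms: product of every pair of numeric columns.
    for a, b in combinations(numeric_columns, 2):
        out[f"{a}_x_{b}"] = row[a] * row[b]

    # Date extractions: year, month, and weekday (Monday is 0).
    for col in date_columns:
        d = row[col]
        out[f"{col}_year"] = d.year
        out[f"{col}_month"] = d.month
        out[f"{col}_weekday"] = d.weekday()

    return out


row = {"price": 3, "quantity": 4, "sold": date(2024, 1, 15)}
features = engineer_features(row, ["price", "quantity"], ["sold"])
print(features)
```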

Smart Suggestions

# Get transformation suggestions
suggestions = df._ml_engine.suggest_transformations(df)
for suggestion in suggestions:
    print(f"💡 {suggestion}")

🔧 Advanced Features

Column Operations

# Numeric columns
df["price"].sum()
df["price"].mean()
df["price"].median()
df["price"].min()
df["price"].max()
df["price"].std()  # Standard deviation

# String columns
df["name"].lower()
df["name"].upper()
df["name"].strip()
df["name"].contains("alice")
df["name"].replace("old", "new")
df["name"].len()  # String lengths

# Date columns
df["date"].year()
df["date"].month()
df["date"].day()
df["date"].weekday()

Mathematical Operations

# Column arithmetic
df["total"] = df["price"] * df["quantity"]
df["discount_price"] = df["price"] * 0.9
df["profit"] = df["revenue"] - df["cost"]

# Column-to-column operations
df["ratio"] = df["sales"] / df["visits"]
df["growth"] = df["current"] - df["previous"]

Data Export

# Save to CSV
df.to_csv("output.csv")

# Save to JSON
df.to_json("output.json")

# Convert to dictionary
data_dict = df.to_dict()

📊 Real-World Examples

Example 1: Sales Analysis

from pyframex import Frame

# Load sales data
df = Frame("sales.csv")

# Calculate profit
df["profit"] = df["revenue"] - df["cost"]
df["margin"] = df["profit"] / df["revenue"]

# Find top performing regions
top_regions = df.sql("""
    SELECT 
        region,
        SUM(revenue) as total_revenue,
        AVG(margin) as avg_margin
    FROM df
    GROUP BY region
    ORDER BY total_revenue DESC
    LIMIT 5
""")

print(top_regions)

Example 2: Customer Segmentation

# Load customer data
customers = Frame("customers.csv")

# Auto-clean data
customers = customers.auto_clean()

# Perform clustering
segmented = customers.auto_cluster(n_clusters=4)

# Analyze clusters
cluster_summary = segmented.groupby("cluster").agg({
    "age": "mean",
    "purchases": "sum",
    "lifetime_value": "mean"
})

print(cluster_summary)

Example 3: Predictive Modeling

# Load historical data
data = Frame("historical_sales.csv")

# Engineer features
data = data.auto_feature_engineering()

# Train model
results = data.auto_predict(target="next_month_sales")

print(f"Model R²: {results['metrics']['r2']:.4f}")
print(f"RMSE: {results['metrics']['rmse']:.2f}")

# Feature importance
for feature, importance in results['metrics']['feature_importance'].items():
    if importance > 0.05:
        print(f"  {feature}: {importance:.2%}")
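
To help interpret the two metrics printed above, here are the standard textbook definitions of R² and RMSE in plain Python. Whether PyFrameX computes them in exactly this way is not stated in this document, so the functions below (`rmse`, `r2`) are a reference sketch for checking results by hand.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(f"R²: {r2(actual, predicted):.4f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
```

An R² of 1 means perfect predictions, 0 means no better than always predicting the mean, and RMSE is expressed in the same units as the target column.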

🎯 Use Cases

Perfect For:

✅ Data Analysts - Excel-like simplicity with SQL power
✅ Data Scientists - Built-in ML with no setup
✅ Python Beginners - Intuitive, no steep learning curve
✅ Rapid Prototyping - Fast iteration with auto features
✅ Educational Projects - Learn data science easily
✅ Small to Medium Data - Pure Python, no heavy dependencies

Not Ideal For:

❌ Massive datasets (100M+ rows) - Use Polars/DuckDB
❌ Distributed computing - Use Spark/Dask
❌ Production big data pipelines - Use enterprise solutions

🏗️ Architecture

PyFrameX is built from the following core components:

┌─────────────────────────────────────────┐
│           Frame (Main API)              │
│   Simple like Excel, Powerful like SQL  │
└─────────────────────────────────────────┘
                  │
        ┌─────────┴─────────┐
        │                   │
┌───────▼────────┐  ┌───────▼────────┐
│ Column Engine  │  │  Query Planner │
│ - IntColumn    │  │  - SQL Parser  │
│ - FloatColumn  │  │  - Optimizer   │
│ - StringColumn │  │  - Executor    │
│ - DateColumn   │  │  - Cache       │
│ - BoolColumn   │  └────────────────┘
└────────────────┘
        │
┌───────▼────────┐  ┌────────────────┐
│   AutoML       │  │  Visualizer    │
│ - auto_clean   │  │  - Charts      │
│ - auto_predict │  │  - Summaries   │
│ - auto_cluster │  │  - Reports     │
└────────────────┘  └────────────────┘

📈 Performance

PyFrameX is optimized for clarity and moderate-sized datasets:

Column-oriented storage for efficient operations
Lazy evaluation where possible
Cached statistics to avoid recomputation
Type-specific optimizations for each column type
Query optimization with filter pushdown
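
The idea behind cached statistics can be shown in a few lines. The class below is a hypothetical stand-in for a column object (the name `CachedColumn` and its `computations` counter are invented for this sketch): a statistic is computed once, reused on later calls, and discarded as soon as the data changes.

```python
class CachedColumn:
    """A list of values whose statistics are computed at most once per state."""

    def __init__(self, values):
        self._values = list(values)
        self._cache = {}
        self.computations = 0  # counts real computations, for demonstration

    def _stat(self, name, func):
        if name not in self._cache:
            self._cache[name] = func(self._values)
            self.computations += 1
        return self._cache[name]

    def sum(self):
        return self._stat("sum", sum)

    def mean(self):
        # Reuses the cached sum instead of scanning the values again.
        return self.sum() / len(self._values)

    def append(self, value):
        self._values.append(value)
        self._cache.clear()  # stale statistics must not survive a mutation


col = CachedColumn([1, 2, 3])
col.sum()
col.sum()                 # served from the cache
print(col.computations)
col.append(4)             # invalidates the cache
print(col.mean(), col.computations)
```

The invalidation step is the important design choice: a cache that is not cleared on mutation returns wrong answers silently.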

Approximate timings on 1M rows (hardware and dataset not specified):

Loading CSV: ~2-3 seconds
Filtering: ~0.1-0.5 seconds
Grouping: ~0.5-1 second
SQL query: ~0.5-2 seconds

🛠️ CLI Usage

# Show DataFrame info
pyframex info data.csv

# Show first 10 rows
pyframex head data.csv -n 10

# Execute SQL query
pyframex query data.csv "SELECT * FROM df WHERE age > 30"

# Auto-clean data
pyframex clean data.csv cleaned_data.csv

# Show version
pyframex version

🤝 Contributing

Contributions are welcome! Here's how you can help:

Report bugs - Open an issue on GitHub
Suggest features - Describe your use case
Submit PRs - Fix bugs or add features
Write docs - Improve documentation
Share examples - Show how you use PyFrameX

📝 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

PyFrameX is inspired by:

Pandas - The gold standard for DataFrame operations
Polars - Modern columnar data processing
DuckDB - Fast in-process SQL
Excel - Universal data manipulation tool

📧 Contact & Support

Author: Idriss Bado
Email: idrissbadoolivier@gmail.com
GitHub: https://github.com/idrissbado/PyFrameX
Issues: GitHub Issues

🎓 Citation

If you use PyFrameX in your research, please cite:

@software{pyframex2024,
  author = {Bado, Idriss},
  title = {PyFrameX: Next-Generation Native DataFrame for Python},
  year = {2024},
  url = {https://github.com/idrissbado/PyFrameX}
}

⭐ Star History

If you find PyFrameX useful, please give it a star on GitHub! ⭐

Made with ❤️ by Idriss Bado

Simple like Excel, Powerful like SQL, Smart like AI

Release history

This version: 0.1.0 (Dec 3, 2025)

Download files

Source Distribution

pyframex-0.1.0.tar.gz (27.4 kB view details)

Uploaded Dec 3, 2025 Source

Built Distribution

pyframex-0.1.0-py3-none-any.whl (22.4 kB view details)

Uploaded Dec 3, 2025 Python 3

File details

Details for the file pyframex-0.1.0.tar.gz.

File metadata

Download URL: pyframex-0.1.0.tar.gz
Upload date: Dec 3, 2025
Size: 27.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for pyframex-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e141c4e37c684356f1280d1ba80d685dd3dd7a9ca2b7a3aa1978d8fc0c442592`
MD5	`9854214fcd0184a2fb676aec149a7180`
BLAKE2b-256	`513f24067d13bc94dec56c87d8655566b2a5bf291b018c5eb2d33fb11b5a01bb`
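
A downloaded archive can be checked against the SHA256 digest listed above with `hashlib` from the standard library. The helper name `sha256_of` and the local file path are assumptions for this sketch; the expected digest is copied from the table above.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest for pyframex-0.1.0.tar.gz as listed in the table above.
EXPECTED_SHA256 = "e141c4e37c684356f1280d1ba80d685dd3dd7a9ca2b7a3aa1978d8fc0c442592"

# Assumption: the archive was downloaded into the current directory.
archive = "pyframex-0.1.0.tar.gz"
if os.path.exists(archive):
    if sha256_of(archive) == EXPECTED_SHA256:
        print("Digest matches")
    else:
        print("Digest mismatch: do not install this file")
```

Reading in chunks keeps memory use constant regardless of file size.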

File details

Details for the file pyframex-0.1.0-py3-none-any.whl.

File metadata

Download URL: pyframex-0.1.0-py3-none-any.whl
Upload date: Dec 3, 2025
Size: 22.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.0

File hashes

Hashes for pyframex-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d3a52aad2c9c272b80dae5f2600477e34065decee7391f1469321edca8017cc5`
MD5	`4de1e176347e3c68885f91ace9ac690a`
BLAKE2b-256	`dff17fabe9fd4e27e934e705c5fbac650ed6a8e280c799dd7a5285ee067285e4`
