A high-performance DataFrame implementation built on top of NumPy
FastDF: High-Performance DataFrame for Python
FastDF is a lightning-fast, memory-efficient DataFrame implementation built on top of NumPy, designed to overcome the performance limitations of pandas for basic data operations.
🚀 Key Features
- Blazing Fast: Up to 126x faster data access compared to pandas (in the project's own benchmarks)
- Memory Efficient: Optimized memory usage with NumPy 2D arrays
- Pandas-Compatible: Seamless integration with existing pandas-based projects
- Minimalist: Focuses on core functionality for maximum performance
🎯 Motivation
FastDF was born out of frustration with the sluggish performance of pandas, especially when dealing with large datasets. After exploring various alternatives that either didn't work as expected or introduced complex syntax changes, we realized that for many data analysis tasks, we only need a handful of core features:
- Named columns
- Efficient slicing
- Basic operations like `shift` and `any`
By leveraging the power of NumPy's 2D arrays and implementing only the essential features, FastDF achieves remarkable performance improvements without sacrificing ease of use.
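The core idea described here, named columns on top of a single NumPy 2D array, can be sketched in a few lines. The class `MiniFrame` below is hypothetical and is not FastDF's implementation or API. It only illustrates why column access can be cheap in this design: a column name is resolved to an integer position through a dictionary, and the result is a NumPy view rather than a copy.

```python
import numpy as np


class MiniFrame:
    """Hypothetical illustration: named columns stored in one 2D NumPy array."""

    def __init__(self, data, columns):
        self.values = np.asarray(data)  # shape (n_rows, n_cols); no copy if already an array
        if self.values.ndim != 2 or self.values.shape[1] != len(columns):
            raise ValueError("data must be 2D with exactly one column per name")
        self.columns = list(columns)
        # Map each column name to its integer position for constant-time lookup.
        self._pos = {name: i for i, name in enumerate(self.columns)}

    def __getitem__(self, name):
        # Column access is plain NumPy basic slicing, so it returns a view.
        return self.values[:, self._pos[name]]

    def rows(self, start, stop):
        # Positional row slicing (half-open, like NumPy); the new frame shares memory.
        return MiniFrame(self.values[start:stop], self.columns)


if __name__ == "__main__":
    mf = MiniFrame(np.arange(12.0).reshape(4, 3), ["A", "B", "C"])
    print(mf["B"])             # column B holds 1, 4, 7, 10
    print(mf.rows(1, 3)["A"])  # rows 1 and 2 of column A hold 3, 6
```

Because both operations return views, neither copies the underlying data; the trade-off is that writing into a returned column also modifies the frame.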
⚡ Performance
In our benchmarks, FastDF has shown:
- 126x faster data access compared to pandas
- Significantly faster slicing operations
- Reduced memory footprint
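The README does not describe the benchmark setup behind these figures. To measure access costs on your own data, a minimal timing sketch using the standard-library `timeit` module follows. It compares a single-cell read through pandas `.loc` with direct indexing into the same data held as a 2D NumPy array (the general approach FastDF describes). It does not call FastDF itself. The helper name `time_access` and the default row and repeat counts are arbitrary choices for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def time_access(n_rows=100_000, repeats=10_000):
    """Hypothetical helper: time repeated single-cell reads, pandas vs. NumPy."""
    pdf = pd.DataFrame({"A": np.random.rand(n_rows), "B": np.random.rand(n_rows)})
    arr = pdf.to_numpy()              # one 2D float64 array, columns in order A, B
    col_b = pdf.columns.get_loc("B")  # resolve the column name to a position once
    row = n_rows // 2

    t_pandas = timeit.timeit(lambda: pdf.loc[row, "B"], number=repeats)
    t_numpy = timeit.timeit(lambda: arr[row, col_b], number=repeats)

    # Memory: the DataFrame total includes its index; the array holds raw data only.
    mem_pandas = int(pdf.memory_usage(deep=True).sum())
    mem_numpy = arr.nbytes
    return {"pandas_s": t_pandas, "numpy_s": t_numpy,
            "pandas_bytes": mem_pandas, "numpy_bytes": mem_numpy}


if __name__ == "__main__":
    for key, value in time_access().items():
        print(f"{key}: {value}")
```

The ratio you observe will likely depend on data shape, dtypes, access pattern, and the installed pandas and NumPy versions, so treat any single run as indicative only.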
🛠 Installation
```shell
pip install git+https://github.com/stwrn/fastdf.git
```
🚦 Quick Start
```python
import numpy as np
import pandas as pd
from fastdf import fdf

# Create a pandas DataFrame
pdf = pd.DataFrame({'A': np.random.rand(1000000), 'B': np.random.rand(1000000)})

# Convert to FastDF
fast_df = fdf.from_pandas(pdf)

# Use FastDF with familiar pandas-like syntax
print(fast_df.loc[0:5, 'A'])
print(fast_df['B'].shift(1))
print(fast_df.any())
```
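The README does not spell out exactly what FastDF's `shift` and `any` return. For reference, pandas' defaults are: `Series.shift(1)` moves values down one position and fills the vacated slot with NaN, and `DataFrame.any()` reports, per column, whether any value is non-zero, skipping NaN by default. A NumPy sketch of those pandas semantics follows, using hypothetical helper names `shift_1d` and `any_columns`. It is not FastDF's code, but it can serve as a reference when checking results.

```python
import numpy as np


def shift_1d(values, periods=1):
    """Shift a 1D array like pandas' Series.shift, filling vacated slots with NaN."""
    values = np.asarray(values, dtype=float)  # float dtype so NaN can be stored
    out = np.full_like(values, np.nan)
    if periods == 0:
        out[:] = values
    elif periods > 0:
        out[periods:] = values[:-periods]     # move values down (toward later rows)
    else:
        out[:periods] = values[-periods:]     # move values up (toward earlier rows)
    return out


def any_columns(arr):
    """Per-column 'any non-zero value', ignoring NaN like pandas' skipna=True."""
    arr = np.asarray(arr, dtype=float)
    # NaN != 0 is True, so NaN must be masked out explicitly.
    return np.any((arr != 0) & ~np.isnan(arr), axis=0)


if __name__ == "__main__":
    print(shift_1d([1.0, 2.0, 3.0]))           # first slot NaN, then 1 and 2
    print(any_columns([[0, 1], [np.nan, 0]]))  # first column False, second True
```

A plain `np.any(arr, axis=0)` would treat NaN as true because NaN is non-zero; the explicit mask is what reproduces pandas' `skipna=True` behaviour.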
🔄 Compatibility
FastDF is designed to be a drop-in replacement for basic pandas operations. You can easily convert your pandas DataFrame to FastDF and continue using the familiar syntax:
```python
# Your existing pandas code (df is an existing pandas DataFrame)
result = df.loc[1000:2000, 'column_name']

# With FastDF
from fastdf import fdf

fast_df = fdf.from_pandas(df)
result = fast_df.loc[1000:2000, 'column_name']
```
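One detail worth checking when moving code across: in pandas, `.loc` slices by label and includes both endpoints, so with the default integer index `df.loc[1000:2000, 'column_name']` returns 1001 rows, whereas positional NumPy slicing such as `values[1000:2000]` is half-open and returns 1000. The README does not say which convention FastDF's `.loc` follows. The sketch below uses only pandas and NumPy (`column_name` is the README's placeholder) to show the difference, so you can run the same length check against your FastDF results.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_name": np.arange(5000, dtype=float)})

# pandas .loc: label-based, includes both 1000 and 2000.
by_label = df.loc[1000:2000, "column_name"]

# NumPy slicing: position-based and half-open, so 2000 is excluded.
values = df["column_name"].to_numpy()
by_position = values[1000:2000]

print(len(by_label), len(by_position))  # 1001 1000

# Reproducing the pandas result positionally needs stop + 1.
assert np.array_equal(by_label.to_numpy(), values[1000:2001])
```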
🤝 Contributing
We welcome contributions to FastDF! Whether it's bug reports, feature requests, or code contributions, please feel free to make a pull request or open an issue.
📜 License
FastDF is released under the MIT License. See the LICENSE file for more details.
🙏 Acknowledgements
Special thanks to the NumPy and pandas teams for their incredible work, which laid the foundation for this project.
FastDF is still in active development. We're excited to see how it can help accelerate your data analysis workflows!
File details
Details for the file fastdf-0.1.0.tar.gz.
File metadata
- Download URL: fastdf-0.1.0.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 40191121bc66e0d8b0dd5cec225d4aa94aedf1e69c4ca18ed25ba298d560fdd6
MD5 | 2ce97abc9264904ceda2825c83301929
BLAKE2b-256 | 952ad04f0d71146753a7b3f3ae110c7340c5f664acec625aa822794a9dfaf6da
File details
Details for the file fastdf-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fastdf-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 440cc5cbf78b9025613966c9bdb1bb6e094007dcddc7c0b725cc4a7901fcfef8
MD5 | d83036b0b3591c3c00fec6b6a14bda9b
BLAKE2b-256 | 87e0172c95821e07141ca9cdcb63fa311580c73f4b37edbe4537687e6fa38c6c