A Python library for data manipulation and analysis
DataLib Project
DataLib is a Python library designed to simplify data manipulation and analysis. It serves a wide range of users, from beginners learning the basics of data processing to experts who need tools for statistical analysis and machine learning.
Installation
You can install DataLib using pip:
pip install nf-datalib
Features
Data Manipulation
- Load and process CSV files (read, write, filter).
- Data transformations (normalization, handling missing values).
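As a rough illustration of what these operations involve, here is a minimal sketch that uses only Python's standard csv module rather than DataLib's own API, which this page does not document. The helper names load_rows, fill_missing, and filter_rows, and the sample data, are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; the empty field on carol's row is a missing value.
SAMPLE_CSV = """name,score
alice,10
bob,20
carol,
dave,40
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fill_missing(rows, column):
    """Convert `column` to float, replacing empty values with the column mean."""
    present = [float(r[column]) for r in rows if r[column] != ""]
    fill = mean(present)
    return [
        {**r, column: float(r[column]) if r[column] != "" else fill}
        for r in rows
    ]


def filter_rows(rows, column, minimum):
    """Keep only rows whose `column` value is at least `minimum`."""
    return [r for r in rows if r[column] >= minimum]


rows = fill_missing(load_rows(SAMPLE_CSV), "score")
high = filter_rows(rows, "score", 20)
print([r["name"] for r in high])
```

Mean imputation is only one possible strategy for missing values; dropping incomplete rows or filling with the median are common alternatives.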
Statistical Computations
- Mean, median, mode, standard deviation, correlation.
- Basic statistical tests (t-test, chi-square test).
Data Visualization
- Generate simple graphs (bar charts, histograms, scatter plots).
- Support for advanced visualizations like correlation matrices.
Advanced Analysis
- Linear and polynomial regression models.
- Supervised classification algorithms (k-NN, decision trees).
- Unsupervised methods (k-means, principal component analysis).
Usage
from datalib.data_manipulation import normalize_column
from datalib.visualization import plot_histogram
import pandas as pd
# Load and normalize data
data = pd.DataFrame({"values": [1, 2, 3, 4, 5]})
normalized = normalize_column(data, "values", method="minmax")
# Create visualization
fig = plot_histogram(data, "values")
fig.savefig("histogram.png")
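The method="minmax" argument presumably refers to conventional min-max scaling, x' = (x - min) / (max - min), which maps a column onto the range [0, 1]. The sketch below shows that formula in plain Python. It is an assumption about what normalize_column computes, not DataLib's actual implementation, and minmax_scale is a hypothetical name.

```python
def minmax_scale(values):
    """Scale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = minmax_scale([1, 2, 3, 4, 5])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```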
Development
Setting up the development environment
- Clone the repository:
git clone https://github.com/NaderFerjani/datalib.git
cd datalib
- Install development dependencies:
pip install -e ".[test,doc]"
Running tests
pytest tests/
Building documentation
cd docs
python -m sphinx -b html . _build/html
Versioning
DataLib follows Semantic Versioning. Version numbers follow the format MAJOR.MINOR.PATCH:
- MAJOR version for incompatible API changes
- MINOR version for new functionality added in a backward-compatible manner
- PATCH version for backward-compatible bug fixes
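The bump rules can be sketched as a small function. This is an illustration of Semantic Versioning itself, not the contents of scripts/release.py, which are not shown here; bump_version is a hypothetical name.

```python
def bump_version(version, part):
    """Return `version` (MAJOR.MINOR.PATCH) with `part` incremented.

    Components to the right of the bumped one reset to zero,
    as Semantic Versioning requires.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.1.0", "minor"))  # 0.2.0
```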
Creating a new release
- Update version:
python scripts/release.py [major|minor|patch]
- Review and commit changes
- Push to GitHub with tags
- The GitHub Actions workflow will automatically publish to PyPI
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Author
- Nader Ferjani - GitHub
- Email: ferjani.nader@hotmail.fr
Project Goals
The main objective is to develop a professional packaging system for the DataLib library, enabling:
- Easy and intuitive installation through package managers like pip.
- Distribution on platforms like PyPI (Python Package Index).
- Integrated, clear, and accessible documentation.
Work Plan
1. Project Structure
- Organize the source code in a modular format (e.g., a src/ directory).
- Define essential files like setup.py, pyproject.toml, or setup.cfg.
2. Dependencies and Compatibility
- Identify and include necessary dependencies (e.g., numpy, pandas, matplotlib, scikit-learn).
- Ensure compatibility with recent Python versions.
3. Documentation
- Write a detailed README.md or README.rst outlining the library's usage and features.
- Add concrete usage examples.
- Generate technical documentation using tools like Sphinx.
4. Testing
- Write unit tests for main functions using pytest.
- Integrate CI/CD workflows (e.g., GitHub Actions) to validate changes.
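A unit test in this style might look like the following sketch. pytest collects functions whose names start with test_ and reports any failing assert. The function under test, replace_none, is a hypothetical stand-in defined inline so the example is self-contained; it is not part of DataLib.

```python
def replace_none(values, fill):
    """Stand-in function under test (hypothetical): replace None with `fill`."""
    return [fill if v is None else v for v in values]


def test_replace_none_fills_gaps():
    assert replace_none([1, None, 3], 0) == [1, 0, 3]


def test_replace_none_leaves_complete_data_unchanged():
    assert replace_none([1, 2, 3], 0) == [1, 2, 3]
```

In a real project the function would be imported from the package and the tests would live in a file such as tests/test_data_manipulation.py (a hypothetical path).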
5. Publication
- Prepare and publish the library on PyPI.
- Regularly update the version following semantic versioning (SemVer).
Deliverables
- A functional library distributable via pip.
- Online documentation (e.g., hosted on Read the Docs).
- Automated tests and code quality monitoring.
Evaluation Criteria
- Packaging Quality: Ease of installation and compatibility.
- Documentation Clarity: Completeness and ease of understanding.
- Functionality and Robustness: Reliability of the library's tools.
- Test Coverage: Quality and extent of automated testing.
DataLib aims to become a reliable and user-friendly library for data enthusiasts and professionals alike, enhancing the Python data ecosystem.
File details
Details for the file nf_datalib-0.1.0.tar.gz.
File metadata
- Download URL: nf_datalib-0.1.0.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86680fec3237b1ef0c0f50c03807c8892ab73d9e7f0fef902830f664ae72618d |
| MD5 | d63343b45142b5e288da8a9920e1bfd1 |
| BLAKE2b-256 | a8a8dd5fd639fb3b43a8146493c8ea1fc395c234daf1452ecde5207455877927 |
File details
Details for the file nf_datalib-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nf_datalib-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ee6f112a052ce782c8c48b39dca8be1bf37cbd80e7408d78d16b849616ef650 |
| MD5 | baa4c6e328b721c4252b0b4f5a36cc7f |
| BLAKE2b-256 | c8513550b49270e520e7513d6e8911bd88576fdfadb1008cb49d6f24610bd359 |