A Python library for data manipulation and analysis

Project description

DataLib Project

DataLib is a Python library that simplifies data manipulation and analysis. It serves a wide range of users, from beginners learning the basics of data processing to experts who need tools for statistical analysis and machine learning.

Installation

You can install DataLib using pip:

pip install nf-datalib

Features

Data Manipulation

Load and process CSV files (read, write, filter).
Data transformations (normalization, handling missing values).
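As a rough illustration of the two transformations named above, the following sketch uses pandas directly rather than DataLib's own functions. The helper name minmax_normalize is hypothetical and is not part of the DataLib API; mean imputation is just one common way to handle missing values.

```python
import pandas as pd


def minmax_normalize(df: pd.DataFrame, column: str) -> pd.Series:
    """Rescale one column to the [0, 1] range (min-max normalization).

    Hypothetical helper for illustration; not part of DataLib.
    """
    col = df[column].astype(float)
    span = col.max() - col.min()
    if span == 0:
        # A constant column has no spread; map every value to 0.0.
        return pd.Series(0.0, index=col.index, name=column)
    return (col - col.min()) / span


# One value is missing: fill it with the column mean, then normalize.
data = pd.DataFrame({"values": [1.0, 2.0, None, 4.0, 5.0]})
data["values"] = data["values"].fillna(data["values"].mean())
normalized = minmax_normalize(data, "values")
print(normalized.tolist())
```

Filling with the mean keeps the column's average unchanged; dropping the row or filling with the median are equally valid choices depending on the data.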

Statistical Computations

Mean, median, mode, standard deviation, correlation.
Basic statistical tests (t-test, chi-square test).

Data Visualization

Generate simple graphs (bar charts, histograms, scatter plots).
Support for advanced visualizations like correlation matrices.

Advanced Analysis

Linear and polynomial regression models.
Supervised classification algorithms (k-NN, decision trees).
Unsupervised methods (k-means, principal component analysis).
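To make the regression feature concrete, here is a minimal sketch of linear and polynomial least-squares fitting with NumPy's polyfit. It is a stand-in for illustration only: DataLib's own regression interface is not shown in this description, and the data below is synthetic.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Linear regression: points lying exactly on y = 2x + 1.
y_linear = 2.0 * x + 1.0
slope, intercept = np.polyfit(x, y_linear, deg=1)

# Polynomial regression: points lying exactly on y = x^2 - 3x + 2.
y_quadratic = x**2 - 3.0 * x + 2.0
# Coefficients are returned highest power first: [a, b, c] for ax^2 + bx + c.
coeffs = np.polyfit(x, y_quadratic, deg=2)

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print("quadratic coefficients:", np.round(coeffs, 2))
```

Because the synthetic points contain no noise, the fitted coefficients recover the generating line and parabola up to floating-point error.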

Usage

from datalib.data_manipulation import normalize_column
from datalib.visualization import plot_histogram
import pandas as pd

# Load and normalize data
data = pd.DataFrame({"values": [1, 2, 3, 4, 5]})
normalized = normalize_column(data, "values", method="minmax")

# Create visualization
fig = plot_histogram(data, "values")
fig.savefig("histogram.png")

Development

Setting up the development environment

Clone the repository:

git clone https://github.com/NaderFerjani/datalib.git
cd datalib

Install development dependencies:

pip install -e ".[test,doc]"

Running tests

pytest tests/

Building documentation

cd docs
python -m sphinx -b html . _build/html

Versioning

DataLib follows Semantic Versioning. Version numbers use the format MAJOR.MINOR.PATCH:

MAJOR version for incompatible API changes
MINOR version for new, backward-compatible functionality
PATCH version for backward-compatible bug fixes
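The bump rules above can be sketched in a few lines. The function bump_version is hypothetical and is not taken from the project's release script; it only illustrates how each part resets the parts to its right.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given part.

    Hypothetical helper for illustration; not the project's release script.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if part == "minor":
        # New backward-compatible functionality: reset PATCH.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "major"))  # 1.0.0
```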

Creating a new release

1. Update the version:

python scripts/release.py [major|minor|patch]

2. Review and commit the changes.
3. Push to GitHub with tags.
4. The GitHub Actions workflow then publishes the release to PyPI automatically.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author

Nader Ferjani - GitHub: https://github.com/NaderFerjani
Email: ferjani.nader@hotmail.fr

Project Goals

The main objective is to develop a professional packaging system for the DataLib library, enabling:

Easy and intuitive installation through package managers like pip.
Distribution on platforms like PyPI (Python Package Index).
Integrated, clear, and accessible documentation.

Work Plan

1. Project Structure

Organize the source code in a modular format (e.g., src/ directory).
Define essential files like setup.py, pyproject.toml, or setup.cfg.

2. Dependencies and Compatibility

Identify and include necessary dependencies (e.g., numpy, pandas, matplotlib, scikit-learn).
Ensure compatibility with recent Python versions.

3. Documentation

Write a detailed README.md or README.rst outlining the library's usage and features.
Add concrete usage examples.
Generate technical documentation using tools like Sphinx.

4. Testing

Write unit tests for main functions using pytest.
Integrate CI/CD workflows (e.g., GitHub Actions) to validate changes.
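A unit test in the pytest style might look like the sketch below. The function fill_missing is a hypothetical stand-in defined inline so the example is self-contained; real tests would import the function under test from the library instead. pytest collects any function whose name starts with test_ and treats a failed assert as a test failure.

```python
def fill_missing(values, fill_value):
    """Hypothetical stand-in for a library function: replace None entries."""
    return [fill_value if value is None else value for value in values]


def test_fill_missing_replaces_none():
    assert fill_missing([1, None, 3], 0) == [1, 0, 3]


def test_fill_missing_leaves_complete_data_unchanged():
    assert fill_missing([1, 2], 0) == [1, 2]


def test_fill_missing_handles_empty_input():
    assert fill_missing([], 0) == []
```

Saved as a file such as tests/test_fill_missing.py (an assumed name), these tests would be picked up by the pytest tests/ command shown earlier.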

5. Publication

Prepare and publish the library on PyPI.
Regularly update the version following semantic versioning (SemVer).

Deliverables

A functional library distributable via pip.
Online documentation (e.g., hosted on Read the Docs).
Automated tests and code quality monitoring.

Evaluation Criteria

Packaging Quality: Ease of installation and compatibility.
Documentation Clarity: Completeness and ease of understanding.
Functionality and Robustness: Reliability of the library's tools.
Test Coverage: Quality and extent of automated testing.

DataLib aims to become a reliable and user-friendly library for data enthusiasts and professionals alike, enhancing the Python data ecosystem.

Project details

Release history

0.1.0 (this version), Jan 21, 2025

Download files

Download the file for your platform.

Source Distribution

nf_datalib-0.1.0.tar.gz (13.0 kB)

Uploaded Jan 21, 2025 Source

Built Distribution

nf_datalib-0.1.0-py3-none-any.whl (8.2 kB)

Uploaded Jan 21, 2025 Python 3

File details

Details for the file nf_datalib-0.1.0.tar.gz.

File metadata

Download URL: nf_datalib-0.1.0.tar.gz
Upload date: Jan 21, 2025
Size: 13.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.11.2

File hashes

Hashes for nf_datalib-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`86680fec3237b1ef0c0f50c03807c8892ab73d9e7f0fef902830f664ae72618d`
MD5	`d63343b45142b5e288da8a9920e1bfd1`
BLAKE2b-256	`a8a8dd5fd639fb3b43a8146493c8ea1fc395c234daf1452ecde5207455877927`

File details

Details for the file nf_datalib-0.1.0-py3-none-any.whl.

File metadata

Download URL: nf_datalib-0.1.0-py3-none-any.whl
Upload date: Jan 21, 2025
Size: 8.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.11.2

File hashes

Hashes for nf_datalib-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ee6f112a052ce782c8c48b39dca8be1bf37cbd80e7408d78d16b849616ef650`
MD5	`baa4c6e328b721c4252b0b4f5a36cc7f`
BLAKE2b-256	`c8513550b49270e520e7513d6e8911bd88576fdfadb1008cb49d6f24610bd359`
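A downloaded file can be checked against the published SHA256 digest with Python's hashlib. The sketch below is one way to do it; the helper names sha256_of and matches_published_hash are hypothetical, and the file path in the closing comment assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, assuming the wheel is in the current directory:
# matches_published_hash(
#     "nf_datalib-0.1.0-py3-none-any.whl",
#     "4ee6f112a052ce782c8c48b39dca8be1bf37cbd80e7408d78d16b849616ef650",
# )
```

pip can perform the same check during installation through hash-checking mode, where expected digests are listed in a requirements file.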

nf-datalib 0.1.0
