A Python library for data manipulation and analysis

Project description

DataLib Project

DataLib is a Python library that simplifies data manipulation and analysis. It targets a wide range of users, from beginners learning the basics of data processing to experts who need tools for statistical analysis and machine learning.

Installation

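You can install DataLib from PyPI using pip. The distribution is published under the name nf-datalib, while the import package is datalib:

pip install nf-datalib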

Features

Data Manipulation

  • Load and process CSV files (read, write, filter).
  • Data transformations (normalization, handling missing values).
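The CSV workflow listed above (read, handle missing values, filter, write) can be sketched with the Python standard library alone. This is not DataLib's API: the helper names `load_rows`, `fill_missing`, `filter_rows`, and `dump_rows`, as well as the sample data, are hypothetical and only illustrate the kind of operations involved.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk.
RAW_CSV = "name,score\nalice,82\nbob,\ncarol,67\n"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fill_missing(rows, column, default):
    """Replace empty strings in `column` with `default`."""
    return [{**row, column: row[column] or default} for row in rows]


def filter_rows(rows, column, minimum):
    """Keep rows whose numeric value in `column` is at least `minimum`."""
    return [row for row in rows if float(row[column]) >= minimum]


def dump_rows(rows, fieldnames):
    """Serialize rows back to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = fill_missing(load_rows(RAW_CSV), "score", "0")
passed = filter_rows(rows, "score", 70)
csv_text = dump_rows(passed, ["name", "score"])
print(csv_text)
```

Replacing a missing score with "0" is only one possible policy; dropping the row or imputing a mean are equally valid choices depending on the data.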

Statistical Computations

  • Mean, median, mode, standard deviation, correlation.
  • Basic statistical tests (t-test, chi-square test).

Data Visualization

  • Generate simple graphs (bar charts, histograms, scatter plots).
  • Support for advanced visualizations like correlation matrices.

Advanced Analysis

  • Linear and polynomial regression models.
  • Supervised classification algorithms (k-NN, decision trees).
  • Unsupervised methods (k-means, principal component analysis).
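To make one of the listed algorithms concrete, here is a minimal pure-Python sketch of k-nearest-neighbours classification. It illustrates the general technique, not DataLib's implementation; the function name `knn_predict` and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Pair each Euclidean distance with its label and sort by distance.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) yields the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

prediction = knn_predict(points, labels, (2.0, 2.0), k=3)
print(prediction)
```

An odd `k` avoids tied votes in two-class problems; with more classes, a tie-breaking rule would need to be chosen explicitly.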

Usage

from datalib.data_manipulation import normalize_column
from datalib.visualization import plot_histogram
import pandas as pd

# Load and normalize data
data = pd.DataFrame({"values": [1, 2, 3, 4, 5]})
normalized = normalize_column(data, "values", method="minmax")

# Create visualization
fig = plot_histogram(data, "values")
fig.savefig("histogram.png")
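The `method="minmax"` argument above presumably refers to min-max scaling, which maps the smallest value to 0 and the largest to 1. The following sketch shows that formula on a plain list; `minmax_scale` is a hypothetical helper written for this example and is not DataLib's `normalize_column`, whose exact behavior (for instance on constant columns) is not documented here.

```python
def minmax_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are equal; map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]


scaled = minmax_scale([1, 2, 3, 4, 5])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```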

Development

Setting up the development environment

  1. Clone the repository:
git clone https://github.com/NaderFerjani/datalib.git
cd datalib
  2. Install development dependencies:
pip install -e ".[test,doc]"

Running tests

pytest tests/

Building documentation

cd docs
python -m sphinx -b html . _build/html

Versioning

DataLib follows Semantic Versioning. Version numbers follow the format MAJOR.MINOR.PATCH:

  • MAJOR version for incompatible API changes
  • MINOR version for new, backward-compatible functionality
  • PATCH version for backward-compatible bug fixes
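The bump rules above can be expressed as a small function. This is an illustrative sketch only: `bump_version` is a hypothetical name, and the code does not reproduce the project's own release script.

```python
def bump_version(version, part):
    """Return `version` (MAJOR.MINOR.PATCH) with the given part incremented.

    Bumping a part resets every less significant part to zero.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.0", "minor"))  # 0.2.0
print(bump_version("0.1.0", "major"))  # 1.0.0
```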

Creating a new release

  1. Update version:
python scripts/release.py [major|minor|patch]
  2. Review and commit changes
  3. Push to GitHub with tags
  4. The GitHub Actions workflow will automatically publish to PyPI

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author


Project Goals

The main objective is to develop a professional packaging system for the DataLib library, enabling:

  1. Easy and intuitive installation through package managers like pip.
  2. Distribution on platforms like PyPI (Python Package Index).
  3. Integrated, clear, and accessible documentation.

Work Plan

1. Project Structure

  • Organize the source code in a modular format (e.g., src/ directory).
  • Define essential files like setup.py, pyproject.toml, or setup.cfg.

2. Dependencies and Compatibility

  • Identify and include necessary dependencies (e.g., numpy, pandas, matplotlib, scikit-learn).
  • Ensure compatibility with recent Python versions.

3. Documentation

  • Write a detailed README.md or README.rst outlining the library's usage and features.
  • Add concrete usage examples.
  • Generate technical documentation using tools like Sphinx.

4. Testing

  • Write unit tests for main functions using pytest.
  • Integrate CI/CD workflows (e.g., GitHub Actions) to validate changes.
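As an illustration of the testing step, a pytest-style unit test is just a function whose name starts with `test_` and that uses plain `assert` statements. The `mean` function below is a hypothetical stand-in for a library function, not DataLib code.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # isclose guards against floating-point rounding in the comparison.
    assert math.isclose(mean([1, 2, 3, 4]), 2.5)


def test_mean_rejects_empty_input():
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Saved in a file such as tests/test_mean.py, these functions would be collected by the `pytest tests/` command. In a real test suite, `pytest.raises(ValueError)` is the more idiomatic way to check for the exception.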

5. Publication

  • Prepare and publish the library on PyPI.
  • Regularly update the version following semantic versioning (SemVer).

Deliverables

  • A functional library distributable via pip.
  • Online documentation (e.g., hosted on Read the Docs).
  • Automated tests and code quality monitoring.

Evaluation Criteria

  • Packaging Quality: Ease of installation and compatibility.
  • Documentation Clarity: Completeness and ease of understanding.
  • Functionality and Robustness: Reliability of the library's tools.
  • Test Coverage: Quality and extent of automated testing.

DataLib aims to become a reliable and user-friendly library for data enthusiasts and professionals alike, enhancing the Python data ecosystem.

Download files

Source Distribution

nf_datalib-0.1.0.tar.gz (13.0 kB view details)

Uploaded Source

Built Distribution

nf_datalib-0.1.0-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file nf_datalib-0.1.0.tar.gz.

File metadata

  • Download URL: nf_datalib-0.1.0.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.2

File hashes

Hashes for nf_datalib-0.1.0.tar.gz
Algorithm Hash digest
SHA256 86680fec3237b1ef0c0f50c03807c8892ab73d9e7f0fef902830f664ae72618d
MD5 d63343b45142b5e288da8a9920e1bfd1
BLAKE2b-256 a8a8dd5fd639fb3b43a8146493c8ea1fc395c234daf1452ecde5207455877927

File details

Details for the file nf_datalib-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nf_datalib-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.2

File hashes

Hashes for nf_datalib-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4ee6f112a052ce782c8c48b39dca8be1bf37cbd80e7408d78d16b849616ef650
MD5 baa4c6e328b721c4252b0b4f5a36cc7f
BLAKE2b-256 c8513550b49270e520e7513d6e8911bd88576fdfadb1008cb49d6f24610bd359
