data-harvest-reader

A class to handle and process multiple files with identical structures within a directory.

Features

  1. Reading Various File Formats: Supports reading CSV, JSON, Parquet, and Excel files.

  2. Directory and ZIP File Handling: Reads data from directories and ZIP files, as well as from raw bytes and zipfile.ZipFile objects.

  3. Data Joining: Joins DataFrames that share similar columns.

  4. Deduplication: Removes duplicate rows based on specific columns.

  5. Custom Filters: Applies custom filters to the DataFrames.

  6. Logging: Detailed logging of read and data-manipulation operations.

Installation

Install the package from PyPI together with its dependencies, polars and loguru:

pip install data-harvest-reader polars loguru

Usage

Initialization

from data_harvest_reader import DataReader

data_reader = DataReader(log_to_file=True, log_file="data_reader.log")

Reading Data

From Directory

data = data_reader.read_data('path/to/directory', join_similar=True)
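The structure of the returned object is not documented here, but since the deduplication and filter options below are keyed by file name, it is presumably a mapping from file name to polars DataFrame. A minimal sketch of consuming the result under that assumption:

# Assumption: read_data returns a dict mapping file names to polars
# DataFrames (inferred from the per-file options below, not confirmed).
data = data_reader.read_data('path/to/directory', join_similar=True)
for name, df in data.items():
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")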

From ZIP File

data = data_reader.read_data('path/to/zipfile.zip', join_similar=False)

From Bytes

with open('path/to/zipfile.zip', 'rb') as f:
    zip_bytes = f.read()

data = data_reader.read_data(zip_bytes, join_similar=False)

From `zipfile.ZipFile` Object

import zipfile

with zipfile.ZipFile('path/to/zipfile.zip', 'r') as zip_file:
    data = data_reader.read_data(zip_file, join_similar=False)

Applying Deduplication

duplicated_subset_dict = {'file1': ['column1', 'column2']}
data = data_reader.read_data('path/to/source', duplicated_subset_dict=duplicated_subset_dict)
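For reference, a minimal sketch of what subset-based deduplication means in plain polars; whether DataReader keeps the first occurrence or another row is an assumption here:

import polars as pl

# Illustrative only: deduplicate on the (column1, column2) pair, as
# duplicated_subset_dict presumably does under the hood (assumption).
df = pl.DataFrame({
    "column1": [1, 1, 2],
    "column2": ["a", "a", "b"],
    "other":   [10, 20, 30],
})

# Keep one row per unique (column1, column2) combination.
deduped = df.unique(subset=["column1", "column2"], keep="first")
print(deduped)  # two rows remain: (1, "a", 10) and (2, "b", 30)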

Applying Filters

filter_subset = {
    'file1': [{'column': 'Col1', 'operation': '>', 'values': 100},
              {'column': 'Col2', 'operation': '==', 'values': 'Value'}]
}

data = data_reader.read_data('path/to/source', filter_subset=filter_subset)
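Each filter entry pairs a column with an operation and a comparison value. A rough equivalent written directly in polars (illustrative only; the actual evaluation happens inside DataReader, and combining multiple entries with AND is an assumption):

import polars as pl

# Sample frame standing in for the 'file1' DataFrame.
df = pl.DataFrame({"Col1": [50, 150, 200], "Col2": ["Value", "Value", "Other"]})

# Keep rows where Col1 > 100 AND Col2 == 'Value'.
filtered = df.filter((pl.col("Col1") > 100) & (pl.col("Col2") == "Value"))
print(filtered)  # the single row where Col1 == 150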

Handling Exceptions

# Assumption: the exception classes are importable from the package root.
from data_harvest_reader import UnsupportedFormatError, FilterConfigurationError

try:
    data = data_reader.read_data('path/to/source')
except UnsupportedFormatError:
    print("Unsupported file format provided")
except FilterConfigurationError:
    print("Error in filter configuration")

Example

data_reader = DataReader()

data = data_reader.read_data(r'C:\path\to\data', join_similar=True,
                             filter_subset={'example_file': [{'column': 'Age', 'operation': '>', 'values': 30}]})

Contributing to DataReader

Getting Started

  1. Fork the Repository: Start by forking the main repository. This creates your own copy of the project where you can make changes.

  2. Clone the Forked Repository: Clone your fork to your local machine. This step allows you to work on the codebase directly.

  3. Set Up the Development Environment: Ensure you have all necessary dependencies installed. It's recommended to use a virtual environment.

  4. Create a New Branch: Always create a new branch for your changes. This keeps the main branch stable and makes reviewing changes easier.

Making Contributions

  1. Make Your Changes: Implement your feature, fix a bug, or make your proposed changes. Ensure your code adheres to the project's coding standards and guidelines.

  2. Test Your Changes: Before submitting, test your changes thoroughly. Write unit tests if applicable, and ensure all existing tests pass.

  3. Document Your Changes: Update the documentation to reflect your changes. If you're adding a new feature, include usage examples.

  4. Commit Your Changes: Make concise and clear commit messages, describing what each commit does.

  5. Push to Your Fork: Push your changes to your fork on GitHub.

  6. Create a Pull Request (PR): Go to the original `DataReader` repository and create a pull request from your fork. Ensure you describe your changes in detail and link any relevant issues.

Review Process

After submitting your PR, the maintainers will review your changes. Be responsive to feedback:

  1. Respond to Comments: If the reviewers ask for changes, make them promptly. Discuss any suggestions or concerns.

  2. Update Your PR: If needed, update your PR based on feedback. This may involve adding more tests or tweaking your approach.

Final Steps

Once your PR is approved:

  1. Merge: The maintainers will merge your changes into the main codebase.

  2. Stay Engaged: Continue to stay involved in the project. Look out for feedback from users on your new feature or fix.

Conclusion

Contributing to `DataReader` is a rewarding experience that benefits the entire user community. Your contributions help make `DataReader` a more robust and versatile tool. We welcome developers of all skill levels and appreciate every form of contribution, from code to documentation. Thank you for considering contributing to `DataReader`!
