A library that prepares raw documents for downstream ML tasks.
# Data Processing
## Current Version Main Features
Data Processing processes data from sources such as MinIO, databases, and Web APIs. Supported data types include:

- txt
- json
- doc
- html
- excel
- csv
- pdf
- markdown
- ppt
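One plausible way to route incoming files to the right handler is to dispatch on the file extension. The sketch below is a hypothetical illustration, not this library's API: the `DataType` enum, the `EXTENSION_MAP` table and `detect_data_type` are names invented here, and the extension list (for example mapping both `.doc` and `.docx` to `doc`) is an assumption.

```python
from enum import Enum
from pathlib import Path


class DataType(Enum):
    """Hypothetical enum mirroring the supported data types listed above."""
    TXT = "txt"
    JSON = "json"
    DOC = "doc"
    HTML = "html"
    EXCEL = "excel"
    CSV = "csv"
    PDF = "pdf"
    MARKDOWN = "markdown"
    PPT = "ppt"


# Assumed extension-to-type table; the real project may detect types differently.
EXTENSION_MAP = {
    ".txt": DataType.TXT,
    ".json": DataType.JSON,
    ".doc": DataType.DOC,
    ".docx": DataType.DOC,
    ".html": DataType.HTML,
    ".htm": DataType.HTML,
    ".xls": DataType.EXCEL,
    ".xlsx": DataType.EXCEL,
    ".csv": DataType.CSV,
    ".pdf": DataType.PDF,
    ".md": DataType.MARKDOWN,
    ".markdown": DataType.MARKDOWN,
    ".ppt": DataType.PPT,
    ".pptx": DataType.PPT,
}


def detect_data_type(filename: str) -> DataType:
    """Return the DataType for a file name, based only on its (case-insensitive) extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_MAP[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {filename!r}") from None
```

For example, `detect_data_type("report.PDF")` returns `DataType.PDF`, while an unknown extension such as `.zip` raises `ValueError`. Extension-based detection is cheap but trusts the file name; content sniffing would be more robust for files fetched from MinIO or Web APIs.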
### Current Text Type Processing
For text data, processing currently includes cleaning abnormal data, filtering, de-duplication, and anonymization.
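The four text-processing steps can be pictured as a simple pipeline. This is a minimal sketch under stated assumptions, not the project's implementation: "abnormal data" is taken to mean control characters and irregular whitespace, filtering is a hypothetical minimum-length rule, de-duplication is exact matching on a SHA-256 digest of the cleaned text, and anonymization masks only e-mail addresses and phone-like digit runs with illustrative regular expressions. All function names are hypothetical.

```python
import hashlib
import re
import unicodedata

# Illustrative patterns only; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d -]{7,}\d")
# Control characters other than tab, newline and carriage return count as "abnormal".
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean(text: str) -> str:
    """Normalize Unicode (NFKC), drop control characters and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = CONTROL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def keep(text: str, min_chars: int = 10) -> bool:
    """Filter step: discard documents that are too short after cleaning."""
    return len(text) >= min_chars


def anonymize(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def process(docs: list[str]) -> list[str]:
    """Run clean -> filter -> exact de-duplication -> anonymization over a batch."""
    seen: set[str] = set()
    out: list[str] = []
    for doc in docs:
        text = clean(doc)
        if not keep(text):
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate of an earlier cleaned document
            continue
        seen.add(digest)
        out.append(anonymize(text))
    return out
```

De-duplicating on the cleaned text means documents that differ only in spacing or control characters collapse into one; near-duplicate detection (for example MinHash) would need a different technique.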
## Design
![Design](../../docs/images/data-process.drawio.png)
## Local Development
### Software Requirements
Before setting up the local data-process environment, please make sure the following software is installed:
- Python 3.10.x
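Because the requirement pins the 3.10 series, a startup guard can fail fast on the wrong interpreter. The sketch below is a hypothetical helper (`python_version_ok` is not part of the project); it accepts any version tuple so it can be tested independently of the running interpreter.

```python
import sys


def python_version_ok(version_info=sys.version_info) -> bool:
    """Return True if the given version (default: current interpreter) is 3.10.x."""
    return tuple(version_info[:2]) == (3, 10)
```

A caller could, for instance, check `python_version_ok()` at the top of an entry script and exit with a readable message when it returns `False`.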
### Environment Setup
Install the Python dependencies listed in `requirements.txt` (for example, `pip install -r requirements.txt`).
### Running
Run the `server.py` file in the `src` directory.
# isort
isort is a tool that sorts the imports in your Python code alphabetically and groups them into sections (standard library, third-party, first-party). It helps maintain a consistent and clean import order.
## Install
```shell
pip install isort
```
## isort a file
```shell
isort src/server.py
```
## isort a directory
```shell
isort .
```
Hashes for a_data_processing-0.0.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b17845d30a734266a7ced56d0625404de65b5b91391d14ec7d2e45b577153a5 |
| MD5 | d377e56c6410d49f4bc01b9fc7745376 |
| BLAKE2b-256 | 902484ecf0ab0a70ea980e2decdfe055021eb5b4086e99bb8df0d5da905f2601 |