A library that prepares raw documents for downstream ML tasks.
# Data Processing
## Main Features of the Current Version
Data Processing processes data obtained from MinIO, databases, Web APIs, and similar sources. The supported data types are:

- txt
- json
- doc
- html
- excel
- csv
- pdf
- markdown
- ppt
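One way to route incoming files or MinIO object keys to the right handler is by file extension. The sketch below assumes that approach; `SUFFIX_TO_TYPE` and `detect_data_type` are hypothetical names, and the project may detect types differently (for example, by MIME type or by inspecting content).

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file suffix to the data types listed above.
SUFFIX_TO_TYPE = {
    ".txt": "txt",
    ".json": "json",
    ".doc": "doc",
    ".docx": "doc",
    ".html": "html",
    ".htm": "html",
    ".xls": "excel",
    ".xlsx": "excel",
    ".csv": "csv",
    ".pdf": "pdf",
    ".md": "markdown",
    ".markdown": "markdown",
    ".ppt": "ppt",
    ".pptx": "ppt",
}


def detect_data_type(name: str) -> Optional[str]:
    """Return the data type for a file path or object key, or None if unsupported."""
    # Path also splits MinIO-style keys such as "bucket/raw/report.pdf" on "/".
    return SUFFIX_TO_TYPE.get(Path(name).suffix.lower())
```

For example, `detect_data_type("reports/Q1.XLSX")` yields `"excel"`, while an unsupported name such as `"archive.zip"` yields `None`, so the caller can skip or log it.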
### Current Text Processing
Text processing currently consists of four stages: cleaning abnormal data, filtering, de-duplication, and anonymization.
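The four stages can be sketched as a simple sequential pipeline. This is a minimal illustration, not the project's implementation: the minimum-length filter, the regular expressions for e-mail addresses and phone numbers, and exact-match de-duplication via SHA-256 are all assumptions chosen for the example.

```python
import hashlib
import re
from typing import Iterable, List

# Hypothetical settings; the real project's rules are not documented here.
MIN_LENGTH = 10
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d -]{7,}\d")


def clean(text: str) -> str:
    """Cleaning: drop control characters and collapse runs of whitespace."""
    text = CONTROL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def keep(text: str) -> bool:
    """Filtering: discard texts too short to be useful."""
    return len(text) >= MIN_LENGTH


def anonymize(text: str) -> str:
    """Anonymization: mask e-mail addresses and phone-number-like digit runs."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def process(texts: Iterable[str]) -> List[str]:
    """Run clean -> filter -> de-duplicate -> anonymize over a batch of texts."""
    seen = set()
    out = []
    for raw in texts:
        text = clean(raw)
        if not keep(text):
            continue
        # De-duplication: exact match on the cleaned text's SHA-256 digest.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        out.append(anonymize(text))
    return out
```

In this sketch de-duplication compares the cleaned text before masking; running it after anonymization instead would also merge records that differ only in their personal data, which may or may not be desirable.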
## Design
![Design](../../docs/images/data-process.drawio.png)
## Local Development
### Software Requirements
Before setting up the local data-process environment, please make sure the following software is installed:
- Python 3.10.x
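A script can fail fast when it is started with the wrong interpreter. The sketch below assumes that "3.10.x" means any 3.10 patch release; `python_version_ok` is a hypothetical helper, not part of the project.

```python
import sys

# Assumption: any 3.10 patch release satisfies the "Python 3.10.x" requirement.
REQUIRED_MAJOR_MINOR = (3, 10)


def python_version_ok(version_info=sys.version_info) -> bool:
    """Return True when the interpreter's major.minor version is 3.10."""
    return tuple(version_info[:2]) == REQUIRED_MAJOR_MINOR
```

A caller could then stop early with `if not python_version_ok(): sys.exit("Python 3.10.x is required")`.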
### Environment Setup
Install the Python dependencies listed in `requirements.txt`, for example with `pip install -r requirements.txt`.
### Running
Run the `server.py` file in the `src` directory (for example, `cd src && python server.py`).
# isort
isort is a tool for sorting the imports in your Python code: it orders them alphabetically and groups them into sections by type, which keeps the import order consistent and clean.
## Install
```shell
pip install isort
```
## Sort imports in a file
```shell
isort src/server.py
```
## Sort imports in a directory
```shell
isort .
```
File details

Details for the file `a-data-processing-0.0.1.tar.gz`.

File metadata

- Download URL: a-data-processing-0.0.1.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 6be65c32a4e8ba62324fb12b19c121d692a623c40fe417caed744a73a9af4a0d |
| MD5 | ee67abe21e7989f1511716fbe83024dd |
| BLAKE2b-256 | 3070001f4d1841f58cb92d82478c450a8ae0f21712ebdb93d45ca3f9ad6c3a5f |
File details

Details for the file `a_data_processing-0.0.1-py3-none-any.whl`.

File metadata

- Download URL: a_data_processing-0.0.1-py3-none-any.whl
- Upload date:
- Size: 2.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b17845d30a734266a7ced56d0625404de65b5b91391d14ec7d2e45b577153a5 |
| MD5 | d377e56c6410d49f4bc01b9fc7745376 |
| BLAKE2b-256 | 902484ecf0ab0a70ea980e2decdfe055021eb5b4086e99bb8df0d5da905f2601 |
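To check a downloaded file against the digests listed above, a minimal sketch using Python's standard `hashlib` module follows. `file_digest` and `verify` are hypothetical helpers, not part of this package, and "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib
from functools import partial
from pathlib import Path

# Map the algorithm names used in the hash tables to hashlib constructors.
HASHERS = {
    "SHA256": hashlib.sha256,
    "MD5": hashlib.md5,
    "BLAKE2b-256": partial(hashlib.blake2b, digest_size=32),  # 256-bit BLAKE2b
}


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in 64 KiB chunks."""
    hasher = HASHERS[algorithm]()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: str, algorithm: str, expected: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

For example, `verify("a-data-processing-0.0.1.tar.gz", "SHA256", "<digest from the table>")` returns `True` only if the downloaded archive matches the published SHA256 value.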