
A library that prepares raw documents for downstream ML tasks.


# Data Processing

## Current Version Main Features

Data Processing processes data obtained through MinIO, databases, Web APIs, and similar sources. The data types handled include:

- txt
- json
- doc
- html
- excel
- csv
- pdf
- markdown
- ppt
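As an illustration of how incoming files might be routed to one of the data types listed above, here is a minimal sketch. The extension table and the `detect_type` helper are hypothetical and are not taken from this project's code; the README only names the data types, not the extensions mapped to them.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from file extension to the data types named in this README.
EXTENSION_TO_TYPE = {
    ".txt": "txt",
    ".json": "json",
    ".doc": "doc",
    ".docx": "doc",
    ".html": "html",
    ".htm": "html",
    ".xls": "excel",
    ".xlsx": "excel",
    ".csv": "csv",
    ".pdf": "pdf",
    ".md": "markdown",
    ".ppt": "ppt",
    ".pptx": "ppt",
}


def detect_type(filename: str) -> Optional[str]:
    """Return the data type for a file name, or None if it is not supported."""
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)
```

Returning `None` for unknown extensions lets the caller decide whether to skip the file or raise an error.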

### Current Text Type Processing

Text processing consists of four steps: cleaning abnormal data, filtering, de-duplication, and anonymization.
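The four steps can be sketched as a small pipeline. This is only an illustration under stated assumptions, not this project's implementation: the function names, the control-character cleanup, the minimum-length filter, the hash-based exact de-duplication, and the email masking rule are all hypothetical choices made for the example.

```python
import hashlib
import re

# Assumed rules for the example: mask emails, strip non-printable control chars.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean(text: str) -> str:
    """Cleaning: drop control characters and collapse runs of whitespace."""
    text = CONTROL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def keep(text: str, min_chars: int = 10) -> bool:
    """Filtering: keep only texts with at least min_chars characters."""
    return len(text) >= min_chars


def anonymize(text: str) -> str:
    """Anonymization: replace email addresses with a placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def process(docs):
    """Run cleaning, filtering, exact de-duplication, and anonymization."""
    seen = set()
    result = []
    for raw in docs:
        text = clean(raw)
        if not keep(text):
            continue
        # De-duplication: skip texts whose cleaned content was already seen.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        result.append(anonymize(text))
    return result
```

De-duplicating after cleaning means that texts differing only in whitespace or control characters count as duplicates.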

## Design

![Design](../../docs/images/data-process.drawio.png)

## Local Development

### Software Requirements

Before setting up the local data-process environment, please make sure the following software is installed:

  • Python 3.10.x
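A quick way to confirm that the interpreter matches this requirement is a small check like the one below. It is a hypothetical helper, not part of this project.

```python
import sys

# Required major and minor version, as stated in the requirements above.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is some Python 3.10.x release."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    if not is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python 3.10.x is required, found {found}")
```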

### Environment Setup

Install the Python dependencies listed in the `requirements.txt` file.

### Running

Run the `server.py` file in the `src` directory.

# isort

isort is a tool for sorting imports alphabetically within your Python code. It helps maintain a consistent and clean import order.

## install

```shell
pip install isort
```

## isort a file

```shell
isort src/server.py
```

## isort a directory

```shell
isort .
```
