A tool for transforming conversational data to a unified format

Project description

Convector: Process Your Conversational Datasets

Convector simplifies the process of transforming and standardizing conversational datasets. It aims to consolidate conversational data so it can be used effectively, enabling a harmonized approach to data analysis, model training or fine-tuning, and other natural language processing (NLP) applications.

Core Utility

Convector focuses on automatically recognizing and adapting various conversational data structures into a unified, consistent format, essential for users who manage or engage with multiple conversational datasets.

Use Cases

Convector is beneficial for various applications such as:

  • Data Consolidation: Merging and utilizing multiple conversational datasets for comprehensive analysis.
  • Model Training: Preparing datasets effectively for training and fine-tuning conversational models.
  • Research and Development: Facilitating a consistent data format for research and application development in NLP.

Features

  • Automated Recognition: Adept at recognizing and adapting to various conversational data structures.
  • Flexibility: Customizable keys for a more personalized data handling experience.
  • Command-Line Operability: Designed for easy operation directly from the command line.
  • Broad Compatibility: Fluent in JSON, JSONL, CSV, Parquet, and NZST.
  • Performance Tuned: Built to process your data quickly and efficiently.

🚶‍♂️ Installation

  • This assumes you have Python 3 installed with pip.

Open a terminal:

  • Type:
git clone https://github.com/teilomillet/convector
cd convector
pip install .
  • You can use convector directly from the command line.
convector --help

🌐 Universal Ticket - CLI Access

Convector’s CLI simplifies your interaction, making data processing just a command away.

Auto-Process

Convector’s auto-recognition detects the structure of your data and converts it into the unified format.

convector process file/path/data.*

Custom Key Express

Tailor your process by specifying custom keys that suit your unique dataset.

Example: This processes a JSONL file with specific column names, transforms it into the unified format, and keeps the 'id' column.

convector process data.jsonl --input='user_message' --output='bot_message' --instruction='system_prompt' --add='id'
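To make the effect of these flags concrete, here is a minimal sketch of the kind of key remapping the command above describes. It is plain Python and does not call Convector; the function name `remap_record` and the sample record are hypothetical, and Convector's actual implementation may differ.

```python
import json

def remap_record(record, input_key, output_key, instruction_key=None, keep=()):
    """Map one source record onto the unified instruction/input/output keys.

    Hypothetical helper for illustration only; not part of Convector's API.
    """
    unified = {
        # A missing instruction becomes an empty string, as described under Delivery.
        "instruction": record.get(instruction_key, "") if instruction_key else "",
        "input": record[input_key],
        "output": record[output_key],
    }
    # Carry over any extra columns the caller asked to keep (the --add flag above).
    for key in keep:
        if key in record:
            unified[key] = record[key]
    return unified

# Invented sample line using the column names from the command above.
source_line = '{"id": 7, "system_prompt": "Be brief.", "user_message": "Hi", "bot_message": "Hello!"}'
unified = remap_record(
    json.loads(source_line),
    input_key="user_message",
    output_key="bot_message",
    instruction_key="system_prompt",
    keep=("id",),
)
print(json.dumps(unified))
# {"instruction": "Be brief.", "input": "Hi", "output": "Hello!", "id": 7}
```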

Advanced Toolkit

Explore advanced features such as byte and line limits.

convector process "PATH/TO/data.parquet" --bytes=10000

Setting '--output_file=' saves the dataset under the specified name.

convector process "PATH/TO/data.csv" --lines=333 --output_file=dataset.jsonl
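As a rough illustration of what a line limit amounts to, the sketch below copies only the first few data rows of a CSV stream into JSONL. It does not use Convector; the function `csv_head_to_jsonl` is hypothetical, and it assumes the limit counts data rows rather than the header, which the text above does not specify.

```python
import csv
import io
import itertools
import json

def csv_head_to_jsonl(csv_file, jsonl_file, max_rows):
    """Copy at most max_rows data rows from a CSV stream to a JSONL stream.

    Hypothetical helper; assumes the limit applies to data rows, not the header.
    """
    reader = csv.DictReader(csv_file)
    written = 0
    # islice stops reading once max_rows rows have been consumed.
    for row in itertools.islice(reader, max_rows):
        jsonl_file.write(json.dumps(row) + "\n")
        written += 1
    return written

# In-memory streams stand in for real files so the sketch is self-contained.
source = io.StringIO("input,output\nhi,hello\nbye,goodbye\nthanks,welcome\n")
target = io.StringIO()
count = csv_head_to_jsonl(source, target, max_rows=2)
print(count)
print(target.getvalue(), end="")
```

Because the reader is consumed lazily, rows past the limit are never parsed, which is the main reason to cap a large file this way.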

Python usage

You can also use Convector from Python with the following code.

import convector

filepath = "PATH/TO/data.jsonl"  # path to the dataset to transform
convector.process(filepath)

Delivery

The transformed datasets are saved in a standardized JSONL format, ensuring consistency and compatibility across applications. If no output_file is set, the result is saved by default under the input file's name with "_tr" appended.
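The default naming rule can be sketched as follows. This is an assumption about how the rule plays out, not Convector's code: the helper `default_output_path` is hypothetical, and it assumes "_tr" is added to the file name before a ".jsonl" extension, in the same directory as the input.

```python
from pathlib import Path

def default_output_path(input_path, suffix="_tr"):
    """Guess the default output path for a given input file.

    Hypothetical helper: assumes "<stem>_tr.jsonl" next to the input file.
    """
    source = Path(input_path)
    return source.with_name(source.stem + suffix + ".jsonl")

print(default_output_path("PATH/TO/data.csv").name)
# data_tr.jsonl
```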

If the dataset has no instruction equivalent, the instruction is set to an empty string ("").

Default:

{"instruction": "...","input": "...","output": "..."}
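Before feeding a transformed file to a training pipeline, it can help to confirm that every record follows this default shape. The sketch below is one way to do that in plain Python; `check_unified` and the sample lines are hypothetical and not part of Convector.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

def check_unified(lines):
    """Return (line_number, missing_keys) for every record lacking a required key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append((number, missing))
    return problems

# Invented sample: the first record is complete, the second is not.
sample = [
    '{"instruction": "", "input": "Hi", "output": "Hello!"}',
    '{"input": "Bye"}',
]
problems = check_unified(sample)
print(problems)
# [(2, ['instruction', 'output'])]
```

The function accepts any iterable of lines, so an open file handle for a transformed dataset can be passed in directly.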

Conclusion

Convector offers a reliable and efficient solution to handle conversational data transformation needs, ensuring that your data is consistent, usable, and ready for a multitude of applications.
