A tool for transforming conversational data to a unified format

Convector: Process Your Conversational Datasets

Convector simplifies the process of transforming and standardizing conversational datasets. It aims to consolidate conversational data into a single consistent format, giving you a harmonized basis for data analysis, model training or fine-tuning, and other natural language processing (NLP) applications.

Core Utility

Convector focuses on automatically recognizing and adapting various conversational data structures into a unified, consistent format, essential for users who manage or engage with multiple conversational datasets.

Use Cases

Convector is beneficial for various applications such as:

  • Data Consolidation: Merging and utilizing multiple conversational datasets for comprehensive analysis.
  • Model Training: Preparing datasets effectively for training and fine-tuning conversational models.
  • Research and Development: Facilitating a consistent data format for research and application development in NLP.

Features

  • Automated Recognition: Adept at recognizing and adapting to various conversational data structures.
  • Flexibility: Customizable keys for a more personalized data handling experience.
  • Command-Line Operability: Designed for easy operation directly from the command line.
  • Broad Compatibility: Fluent in JSON, JSONL, CSV, Parquet, and NZST.
  • Performance Tuned: Built to process your data quickly and efficiently.

🚶‍♂️ Installation

  • This assumes you have Python 3 and pip installed.

Open a terminal:

  • Type:
git clone https://github.com/teilomillet/convector
cd convector
pip install .
  • You can use convector directly from the command line.
convector --help

🌐 Universal Ticket - CLI Access

Convector’s CLI keeps interaction simple: data processing is just a command away.

Auto-Process

Convector’s auto-recognition detects the structure of your data and converts it to the unified format.

convector process file/path/data.*

Custom Key Express

Tailor your process by specifying custom keys that suit your unique dataset.

Example: This processes a JSONL file with specific column names, transforms it into the unified format, and keeps the 'id' column.

convector process data.jsonl --input='user_message' --output='bot_message' --instruction='system_prompt' --add='id'
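To make the key mapping concrete, here is a minimal Python sketch of what the command above asks for, record by record. It is not Convector's implementation: the `to_unified` helper and its parameter names are hypothetical, chosen only to mirror the `--input`, `--output`, `--instruction`, and `--add` options.

```python
from typing import Any


def to_unified(
    record: dict[str, Any],
    input_key: str = "input",
    output_key: str = "output",
    instruction_key: str = "instruction",
    add_keys: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Hypothetical helper: rename custom columns to the unified keys."""
    unified = {
        # A missing instruction column falls back to an empty string.
        "instruction": record.get(instruction_key, ""),
        "input": record[input_key],
        "output": record[output_key],
    }
    # Carry over any extra columns the caller asked to keep (like --add='id').
    for key in add_keys:
        unified[key] = record[key]
    return unified


row = {
    "id": 7,
    "system_prompt": "Be brief.",
    "user_message": "Hi",
    "bot_message": "Hello!",
}
result = to_unified(
    row,
    input_key="user_message",
    output_key="bot_message",
    instruction_key="system_prompt",
    add_keys=("id",),
)
print(result)
```

The sketch handles one record at a time, so the same function could be applied to rows read from any of the supported file formats.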

Advanced Toolkit

Explore advanced features such as byte and line limits.

convector process "PATH/TO/data.parquet" --bytes=10000

Setting '--output_file=' saves the dataset under the specified name.

convector process "PATH/TO/data.csv" --lines=333 --output_file=dataset.jsonl
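The page does not say exactly how Convector applies these limits, so the following is only a sketch of one plausible reading: stop reading once the line count is reached, or once the next line would exceed the byte budget. The `limited_lines` helper is hypothetical and not part of Convector.

```python
import io
from itertools import islice
from typing import Iterator, Optional, TextIO


def limited_lines(
    stream: TextIO,
    max_lines: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> Iterator[str]:
    """Hypothetical helper: yield lines until either limit is reached."""
    used = 0
    # islice with stop=None imposes no line limit.
    for line in islice(stream, max_lines):
        size = len(line.encode("utf-8"))
        # Assumption: a line that would exceed the byte budget is not emitted.
        if max_bytes is not None and used + size > max_bytes:
            break
        used += size
        yield line


sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
first_two = list(limited_lines(sample, max_lines=2))
print(first_two)
```

Reading lazily like this avoids loading the whole file when only a small sample is needed.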

Python usage

You can also use Convector from Python with the following code.

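import convector

# Path to the dataset to convert (placeholder; replace with your own file).
filepath = "file/path/data.jsonl"
convector.process(filepath)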

Delivery

The transformed datasets are saved in a standardized JSONL format, ensuring consistency and compatibility across applications. If no output_file is set, the result is saved under the original file name with "_tr" appended.

If the dataset has no equivalent of an instruction, the instruction field is set to an empty string ("").

Default:

{"instruction": "...","input": "...","output": "..."}
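As a quick sanity check on a transformed file, the sketch below parses JSONL text with the standard library and confirms that each record carries the three default keys. The `parse_unified` helper is hypothetical and not part of Convector; it assumes one JSON object per line, as shown above.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def parse_unified(jsonl_text: str) -> list[dict]:
    """Hypothetical helper: parse JSONL text and check the unified keys."""
    records = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {number}: missing keys {missing}")
        records.append(record)
    return records


# An empty instruction is valid: it is the stated default when none exists.
sample = '{"instruction": "", "input": "Hi", "output": "Hello!"}\n'
records = parse_unified(sample)
print(records)
```

To check a real file, pass in the contents of the transformed dataset, for example the text read from the path given to '--output_file='.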

Conclusion

Convector offers a reliable and efficient solution to handle conversational data transformation needs, ensuring that your data is consistent, usable, and ready for a multitude of applications.

