
A dataset module for your projects

Project description

DataNexus

DataNexus is a simple-to-use Python module for getting transcripts, datasets and similar resources into your projects. It can also extract a single character's lines from a transcript, which makes it easier to prepare data for tasks such as fine-tuning a GPT-2 model.

Key features

  • Download datasets and transcripts
  • Extract a character's lines from transcripts

Installation

To get started:

pip install datanexus

Usage

⚠️ | Full documentation will be linked in the future. The code is still being tested and may be unstable!

Downloading datasets and transcripts

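from datanexus import datanexus

# Create a DataNexus client (named dn to avoid shadowing the imported class)
dn = datanexus()

# Download a dataset/transcript by file name and print the result
dataset = dn.download_dataset('ironman.txt')
print(dataset)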

Extracting character lines from transcripts

Before extracting a character's lines, you need to create a folder called Models; the extraction will not succeed without it.

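from datanexus import datanexus

# Create a DataNexus client first, as in the download example
dn = datanexus()

# Extract the lines spoken by "Tony Stark:" in ironman.txt into Tony.txt
character = dn.save_character('ironman.txt', 'Tony Stark:', 'Tony.txt')
print(character)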
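The README does not say how save_character decides which lines belong to a character. As a rough mental model only, here is a minimal sketch that assumes each transcript line has the form "Speaker: dialogue" and that the output keeps only the dialogue after the given speaker label. The helpers list_speakers and extract_character_lines are hypothetical and are not part of datanexus; list_speakers is handy for finding the exact label (colon included) to pass as the second argument.

```python
import os
import re
from collections import Counter

# Hypothetical helpers: a sketch of speaker-based line extraction, NOT the
# datanexus implementation. Assumes transcript lines look like "Speaker: text".

# A speaker label is up to 40 non-colon characters at the start of a line,
# followed by a colon.
SPEAKER_RE = re.compile(r"^([^:\n]{1,40}):")


def list_speakers(transcript_path):
    """Count speaker labels (e.g. 'Tony Stark:') so you can pick the exact prefix."""
    counts = Counter()
    with open(transcript_path, encoding="utf-8") as f:
        for line in f:
            match = SPEAKER_RE.match(line.strip())
            if match:
                counts[match.group(1) + ":"] += 1
    return counts


def extract_character_lines(transcript_path, speaker_prefix, output_name, models_dir="Models"):
    """Write every line spoken by speaker_prefix to models_dir/output_name.

    Assumption: the speaker label is stripped and each piece of dialogue
    is written on its own output line.
    """
    os.makedirs(models_dir, exist_ok=True)  # create the Models folder if missing
    lines = []
    with open(transcript_path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith(speaker_prefix):
                # Keep only the dialogue that follows the label
                lines.append(stripped[len(speaker_prefix):].strip())
    output_path = os.path.join(models_dir, output_name)
    with open(output_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines))
    return output_path, len(lines)
```

For example, list_speakers('ironman.txt').most_common(5) would show the five most frequent labels, and extract_character_lines('ironman.txt', 'Tony Stark:', 'Tony.txt') would write Models/Tony.txt under these assumptions.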

Listing all available datasets

The possible_models() function returns every dataset that is available to use.

from datanexus import datanexus

dn = datanexus()

# Print each available dataset
datasets = dn.possible_models()

for dataset in datasets:
    print(dataset)

Support

If you have any questions or run into problems, feel free to open an issue on GitHub.

You can also join The Workshop Discord server and ping me (_Ethan_).


Download files

Source Distribution

datanexus-0.0.8.tar.gz (2.7 kB)

Uploaded Source

File details

Details for the file datanexus-0.0.8.tar.gz.

File metadata

  • Download URL: datanexus-0.0.8.tar.gz
  • Upload date:
  • Size: 2.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for datanexus-0.0.8.tar.gz

  • SHA256: 93a0cf9a41fce22c683e113f8517866e5ad30ebd7ca19307a504ddbbf2263750
  • MD5: 31af4b84a5bad290919ff21c5fe3ce19
  • BLAKE2b-256: 54ed176f25de24f517c5020081011df52adc5f2588013857bcab8c52511f89d9
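These digests let you check that a downloaded archive has not been altered. A minimal sketch using Python's standard hashlib module, assuming the archive has been saved locally as datanexus-0.0.8.tar.gz; the helpers file_sha256 and verify_download are illustrative and not part of datanexus:

```python
import hashlib

# Published SHA256 digest for datanexus-0.0.8.tar.gz (copied from this page)
EXPECTED_SHA256 = "93a0cf9a41fce22c683e113f8517866e5ad30ebd7ca19307a504ddbbf2263750"


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected one."""
    return file_sha256(path) == expected.lower()
```

Calling verify_download('datanexus-0.0.8.tar.gz') should return True for an unmodified copy. pip can enforce the same check in hash-checking mode: put datanexus==0.0.8 --hash=sha256:93a0cf9a41fce22c683e113f8517866e5ad30ebd7ca19307a504ddbbf2263750 in a requirements file and install with pip install --require-hashes -r requirements.txt.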
