
This is a helper library to push data to the Hugging Face Hub.


huggify-data

Introduction

huggify-data 📦 is a Python library 🐍 designed to simplify the process of uploading datasets 📊 to the Hugging Face Hub 🤗. It allows you to verify ✅, process 🔄, and push 🚀 your pandas DataFrame directly to Hugging Face, making it easier to share and collaborate 🤝 on datasets.

Installation

To use huggify-data, make sure the library is installed. You can install it with pip:

pip install huggify-data

Usage

Here's a step-by-step guide on how to use huggify-data:

  1. Import the necessary libraries:

import pandas as pd
from huggify_data import DataFrameUploader

  2. Load your DataFrame:

Make sure your DataFrame has columns named questions and answers (a minimal example follows this list).

df = pd.read_csv('/content/toy_data.csv')

  3. Initialize the DataFrameUploader:

Provide your Hugging Face token, desired repository name, and username.

uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')

  4. Process your data:

Convert the DataFrame into a DatasetDict object.

uploader.process_data()

  5. Push to the Hugging Face Hub:

Upload your processed data to the Hugging Face Hub.

uploader.push_to_hub()
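
As a reference for step 2, here is a minimal DataFrame with the required columns (the example rows are invented purely for illustration):

import pandas as pd

df = pd.DataFrame({
    'questions': ['What does huggify-data do?'],
    'answers': ['It pushes a pandas DataFrame to the Hugging Face Hub.'],
})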

Example: Generate question-answer pairs from a PDF

Here's a complete example illustrating how to use huggify-data to scrape a PDF and turn it into question-answer pairs. The block of code below scrapes the PDF, generates question-answer pairs, and returns them as a pandas DataFrame, which you can then save locally as a .csv file (see the note after the code).

# Example usage:
from huggify_data import PDFQnAGenerator

pdf_path = "path_of_pdf.pdf"
openai_api_key = "sk-API_KEY_HERE"
generator = PDFQnAGenerator(pdf_path, openai_api_key)
generator.process_scraped_content()
generator.generate_questions_answers()
df = generator.convert_to_dataframe()
print(df)
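
To actually write the generated pairs to disk as described above, save the DataFrame with pandas (the filename here is just an example):

df.to_csv('toy_data.csv', index=False)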

Example: Upload a DataFrame to the Hugging Face Hub

Here's a complete example illustrating how to use the huggify-data library to upload a DataFrame:

import pandas as pd
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi, create_repo
from huggify_data import DataFrameUploader

# Example usage:
df = pd.read_csv('/content/toy_data.csv')
uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')
uploader.process_data()
uploader.push_to_hub()
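
Once the push completes, the dataset can be loaded back from the Hub (the repository id below simply mirrors the placeholders used above):

from datasets import load_dataset

dataset = load_dataset('<your-username>/<desired-repo-name>')
print(dataset)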

Class Details

DataFrameUploader

DataFrameUploader is the main class provided by huggify-data.

Initialization

uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')
  • df: A pandas DataFrame containing the data.
  • hf_token: Your Hugging Face API token.
  • repo_name: The desired name for the Hugging Face repository.
  • username: Your Hugging Face username.

Methods

  • verify_dataframe():
    • Checks if the DataFrame has columns named questions and answers.
    • Raises a ValueError if the columns are not present.
  • process_data():
    • Verifies the DataFrame.
    • Converts the data into a DatasetDict object.
  • push_to_hub():
    • Creates a repository on the Hugging Face Hub.
    • Pushes the DatasetDict to the repository (see the sketch after this list for an illustration of the flow).
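
For a feel of what these steps involve, here is a minimal sketch of the equivalent plain-Python flow. The standalone functions below are illustrative only and are not the library's actual source:

import pandas as pd
from datasets import Dataset, DatasetDict
from huggingface_hub import create_repo

# Illustrative sketch only -- not DataFrameUploader's actual implementation.
def verify_dataframe(df: pd.DataFrame) -> None:
    # Raise if the required columns are missing, mirroring verify_dataframe().
    if not {'questions', 'answers'}.issubset(df.columns):
        raise ValueError("DataFrame must have 'questions' and 'answers' columns")

def process_data(df: pd.DataFrame) -> DatasetDict:
    # Wrap the verified DataFrame in a DatasetDict, mirroring process_data().
    verify_dataframe(df)
    return DatasetDict({'train': Dataset.from_pandas(df)})

def push_to_hub(data: DatasetDict, repo_id: str, hf_token: str) -> None:
    # Create the dataset repo (if needed) and upload it, mirroring push_to_hub().
    create_repo(repo_id, token=hf_token, repo_type='dataset', exist_ok=True)
    data.push_to_hub(repo_id, token=hf_token)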

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.

Contact

For any questions or support, please contact [your-email@example.com].
