This is a helper library to push data to HuggingFace.
huggify-data
Introduction
huggify-data 📦 is a Python library 🐍 that simplifies scraping .pdf documents, generating question-answer pairs with OpenAI, and uploading the resulting datasets 📊 to the Hugging Face Hub 🤗. It lets you verify ✅, process 🔄, and push 🚀 a pandas DataFrame directly to Hugging Face, making it easier to share and collaborate 🤝 on datasets.
Installation
Install huggify-data with pip:
pip install huggify-data
Examples
Here's a complete example of using huggify-data to scrape a PDF and save the result as question-answer pairs in a .csv file. The code below scrapes the PDF, converts the content into question-answer pairs, and saves the file locally.
from huggify_data.scrape_modules import PDFQnAGenerator
# Example usage:
pdf_path = "path_of_pdf.pdf"
openai_api_key = "<sk-API_KEY_HERE>"
generator = PDFQnAGenerator(pdf_path, openai_api_key)
generator.process_scraped_content()
generator.generate_questions_answers()
df = generator.convert_to_dataframe()
print(df)
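The example above places the OpenAI key directly in the source. A minimal sketch of reading it from the environment instead is shown below. The helper `read_api_key` is hypothetical and not part of huggify-data, and the variable name `OPENAI_API_KEY` is an assumed convention; adjust both to your setup.

```python
import os


def read_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `var_name`.

    Hypothetical helper, not part of huggify-data.
    """
    # Treat a missing or whitespace-only value as "not set".
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The returned string can then be passed as the `openai_api_key` argument in place of the literal, which keeps the key out of version control.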
Here's a complete example of using huggify-data to push data to the Hugging Face Hub, assuming an existing .csv file with questions and answers columns:
import pandas as pd
from huggify_data.push_modules import DataFrameUploader
# Example usage:
df = pd.read_csv('/content/toy_data.csv')
uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')
uploader.process_data()
uploader.push_to_hub()
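Because the upload example assumes a .csv file with questions and answers columns, it can help to check the header before pushing. The sketch below is one way to do that with the standard library only. The function `missing_columns` is a hypothetical helper, not part of huggify-data, and the exact validation the library itself performs is not shown here.

```python
import csv

# Column names taken from the description above.
REQUIRED_COLUMNS = {"questions", "answers"}


def missing_columns(csv_path: str) -> set:
    """Return the required column names absent from the CSV header.

    Hypothetical helper, not part of huggify-data.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # An empty file has no header row, so fall back to an empty list.
        header = next(csv.reader(handle), [])
    return REQUIRED_COLUMNS - {name.strip() for name in header}
```

An empty result means both columns are present; anything else lists the columns to add or rename before uploading.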
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributing
Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.
Contact
For any questions or support, please contact [eagle0504@gmail.com](mailto:eagle0504@gmail.com).
Built Distribution
Hashes for huggify_data-0.2.5-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | ac3b83cbfd4c2e35f6627ec3126661d54cb58a6f425cd74cd4c1906b1af6dd01 |
MD5 | dbe24276e482ade05c1bb7de7a3d0499 |
BLAKE2b-256 | f94b6c61ebe2873f21504505bbc0552ec2920157b9d329b6ac78a046025bc536 |