huggify-data
Introduction
huggify-data is a Python library designed to simplify the process of uploading datasets to the Hugging Face Hub. It allows you to verify, process, and push your pandas DataFrame directly to Hugging Face, making it easier to share and collaborate on datasets.
Installation
Install huggify-data from PyPI using pip:
pip install huggify-data
Usage
Here's a step-by-step guide on how to use huggify-data:
- Import the necessary libraries:
import pandas as pd
from huggify_data import DataFrameUploader
- Load your DataFrame:
Make sure your DataFrame has columns named questions and answers.
df = pd.read_csv('/content/toy_data.csv')
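If you do not have a suitable file yet, a small test CSV with the two required column names can be generated with the standard library. This is a minimal sketch: the file name toy_data.csv matches the path used above, but the rows and the helper name write_toy_csv are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_toy_csv(path):
    """Write a tiny question/answer CSV (illustrative rows, not real data)."""
    rows = [
        {"questions": "What is 2 + 2?", "answers": "4"},
        {"questions": "What color is the sky?", "answers": "Blue"},
    ]
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["questions", "answers"])
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)


# Write to a temporary directory; adjust the location as needed.
toy_path = write_toy_csv(Path(tempfile.mkdtemp()) / "toy_data.csv")
print(toy_path)
```

The resulting file can then be loaded with pd.read_csv(toy_path).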
- Initialize the DataFrameUploader:
Provide your Hugging Face token, desired repository name, and username.
uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')
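Hard-coding a token in a script or notebook makes it easy to leak. One common alternative is to read it from an environment variable and pass the value as hf_token. A minimal sketch, assuming the token is stored in a variable named HF_TOKEN; the helper read_hf_token is hypothetical and not part of huggify-data:

```python
import os


def read_hf_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or raise if it is missing.

    `HF_TOKEN` is an assumed variable name; use whatever name you export.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Demonstration with a stand-in mapping instead of the real environment.
print(read_hf_token({"HF_TOKEN": "hf_example_token"}))
```

In real use, call read_hf_token() with no arguments and pass the result as hf_token to DataFrameUploader.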
- Process your data:
Convert the DataFrame into a DatasetDict object.
uploader.process_data()
- Push to Hugging Face Hub:
Upload your processed data to the Hugging Face Hub.
uploader.push_to_hub()
Example
Here's a complete example to illustrate how to use the huggify-data library:
import pandas as pd
from huggify_data import DataFrameUploader

# Load the data, then verify, convert, and upload it.
df = pd.read_csv('/content/toy_data.csv')
uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')
uploader.process_data()
uploader.push_to_hub()
Class Details
DataFrameUploader
DataFrameUploader is the main class provided by huggify-data.
Initialization
uploader = DataFrameUploader(df, hf_token="<huggingface-token-here>", repo_name='<desired-repo-name>', username='<your-username>')
- df: A pandas DataFrame containing the data.
- hf_token: Your Hugging Face API token.
- repo_name: The desired name for the Hugging Face repository.
- username: Your Hugging Face username.
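On the Hugging Face Hub, a dataset repository is addressed by an ID of the form username/repo_name. The sketch below shows how the two arguments would combine into such an ID; whether DataFrameUploader builds it exactly this way is an assumption, and build_repo_id is a hypothetical helper.

```python
def build_repo_id(username, repo_name):
    """Join a Hub username and repository name into 'username/repo_name'."""
    username = username.strip().strip("/")
    repo_name = repo_name.strip().strip("/")
    if not username or not repo_name:
        raise ValueError("Both username and repo_name must be non-empty.")
    if "/" in username or "/" in repo_name:
        raise ValueError("Neither part may contain '/'.")
    return f"{username}/{repo_name}"


print(build_repo_id("your-username", "toy-qa-dataset"))
```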
Methods
- verify_dataframe():
  - Checks if the DataFrame has columns named questions and answers.
  - Raises a ValueError if the columns are not present.
- process_data():
  - Verifies the DataFrame.
  - Converts the data into a DatasetDict object.
- push_to_hub():
  - Creates a repository on the Hugging Face Hub.
  - Pushes the DatasetDict to the repository.
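The check described for verify_dataframe() can be approximated as follows. This is a sketch of the described behavior, not the library's source: find_missing_columns and check_columns are hypothetical names, and the functions take any iterable of column names (for a DataFrame, df.columns) so the example runs without pandas.

```python
REQUIRED_COLUMNS = ("questions", "answers")


def find_missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


def check_columns(columns):
    """Raise ValueError if any required column is missing."""
    missing = find_missing_columns(columns)
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")


# Extra columns are fine; only the two required names are checked.
check_columns(["questions", "answers", "source"])

# A singular "question" does not satisfy the check.
print(find_missing_columns(["question", "answers"]))  # prints ['questions']
```

Running a check like this on df.columns before constructing the uploader surfaces a misnamed column early, before any network call is made.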
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributing
Contributions are welcome! Please open an issue or submit a pull request if you have any improvements or suggestions.
Contact
For any questions or support, please contact [your-email@example.com].
Hashes for huggify_data-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 79a4baa0ab2f66e4185e3618bd1bffe7f3e14538c6a1e55ecbd94f52e8091659
MD5 | 38ccdebab645509ac3f40e283448d1bf
BLAKE2b-256 | 555216bb28981cb770c0a7af6d05bbbdd2fbdb237c95c7a7160749f78ab720d7