A one-stop Python library for dataset compilation and processing.
Project description
Introducing Mdataset (/em-dataset/): a comprehensive solution for researchers and students who want a streamlined way to compile and process high-quality datasets.
Why is it necessary?
Compiling and processing datasets can be a formidable challenge in any field that relies on big data. The difficulty usually lies either in navigating unfamiliar data banks or in working out how to download a specific dataset. Mdataset addresses these hurdles by providing the essential tools and methods needed to download existing datasets from well-known sources such as Kaggle and Hugging Face. The main remaining limit is your own computational power.
What do we offer?
Mdataset provides a set of functions that either wrap existing tools developed by other researchers or implement our own solutions. With just a few lines of code, users can download and compile datasets and perform a range of processing tasks. Our offerings include:
- Downloading datasets from high-quality sources
- Video transcription
- Text-to-audio conversion
- User-friendly web scraping tools
- Secure local-to-internet file transfer on demand
- Scraping popular image boards
- Synthetic generation of tabular and text data
- Secure one-on-one interview method that researchers can use to collect data
- YouTube video and audio download
- Powerful Optical Character Recognition for extracting data from PDFs
- Unrestricted in-terminal search engine
- Routing through Tor circuits for secure communication
Choose Mdataset for a comprehensive and efficient solution to your dataset compilation and processing needs.
Quickstart
pip install mdatasets
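from mdataset import scrape_data

# Page to scrape, given as a full URL including the scheme
url = "https://www.example.com"
# Example text from the page that you want to extract
wanted_list = ["What's the data ca?"]
scrape_data(url, wanted_list)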
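The Quickstart passes `scrape_data` a page URL and a list of sample strings (`wanted_list`). How mdataset uses these internally is not documented here; one common design, example-based scraping, treats each wanted string as a sample of text that appears on the page and locates the elements that contain it. The standard-library sketch below illustrates that idea on an HTML string only. `WantedTextFinder`, `find_wanted`, and the sample HTML are hypothetical and are not part of mdataset's API.

```python
from html.parser import HTMLParser


class WantedTextFinder(HTMLParser):
    """Hypothetical helper: record (enclosing tag, text) for text nodes containing a wanted string."""

    def __init__(self, wanted_list):
        super().__init__()
        self.wanted_list = wanted_list
        self.tag_stack = []  # currently open tags, innermost last
        self.matches = []

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.tag_stack:
            while self.tag_stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        for wanted in self.wanted_list:
            if wanted in text:
                parent = self.tag_stack[-1] if self.tag_stack else None
                self.matches.append((parent, text))
                break  # record each text node at most once


def find_wanted(html, wanted_list):
    """Hypothetical helper (not mdataset): return matches in document order."""
    finder = WantedTextFinder(wanted_list)
    finder.feed(html)
    finder.close()
    return finder.matches


sample_html = """
<html><body>
  <h1>Example Domain</h1>
  <p>What's the data catalogue?</p>
  <p>Unrelated text</p>
</body></html>
"""

print(find_wanted(sample_html, ["data catalogue"]))
# → [('p', "What's the data catalogue?")]
```

A real scraper would fetch the page over HTTP first and would typically generalize from the matched elements to similar ones elsewhere on the page; this sketch stops at locating the examples.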
Download files
Source distribution: mdatasets-0.1.4.tar.gz
Built distribution: mdatasets-0.1.4-py3-none-any.whl
File details
Details for the file mdatasets-0.1.4.tar.gz.
File metadata
- Download URL: mdatasets-0.1.4.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 683919dd3ed58aa09859a2d34dc29d9a01fb15595d6d3042ab42159b9f7d4b56
MD5 | ea74f056d63b5e450cc5cad70e2e9136
BLAKE2b-256 | bec81db0c6475b3928c07b2606f1a13ae6236b15bf12b7e16cb9eba2b2185916
File details
Details for the file mdatasets-0.1.4-py3-none-any.whl.
File metadata
- Download URL: mdatasets-0.1.4-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1804fc7852c4d722016c301d7007ef17b8961a656b293155276fe0821423a657
MD5 | e33150d8c051366a4f269b2edc294f67
BLAKE2b-256 | a92b70462f29eb93875de41f4700417d70ebb4e899e66b3a5713537ab8c5747e
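To check that a downloaded file matches the SHA256 digest published for release 0.1.4, you can hash it locally with Python's standard `hashlib` module. The sketch below copies the two SHA256 values from the file-hashes tables; the helper names `sha256_of` and `verify` are our own, for illustration only, and are not part of mdataset.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 0.1.4.
EXPECTED_SHA256 = {
    "mdatasets-0.1.4.tar.gz": "683919dd3ed58aa09859a2d34dc29d9a01fb15595d6d3042ab42159b9f7d4b56",
    "mdatasets-0.1.4-py3-none-any.whl": "1804fc7852c4d722016c301d7007ef17b8961a656b293155276fe0821423a657",
}


def sha256_of(path, chunk_size=65536):
    """Hypothetical helper: hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Hypothetical helper: True if the file's SHA256 matches the published digest for its filename."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify("mdatasets-0.1.4.tar.gz")` returns `True` only if the local archive is byte-identical to the published one. pip can enforce the same kind of check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).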