A one-stop Python library for dataset compilation and processing.

Project description

Introducing Mdataset (/em-dataset/): a comprehensive solution for researchers and students who want a streamlined way to compile and process high-quality datasets.

Why is it necessary?

Compiling and processing datasets can be a formidable challenge in any field that relies on big data. The difficulty lies either in navigating unfamiliar data banks or in the intricacies of downloading specific datasets. Mdataset addresses both hurdles by giving users the tools and methods needed to download existing datasets from well-known sources such as Kaggle and Hugging Face. The only limiting factor is your computational power.

What do we offer?

Mdataset delivers a set of wrappers and functions that either wrap existing tools developed by researchers or provide our own solutions. With a simple three-line command, users can download and compile datasets and perform various processing tasks. Our offerings include:

  • Downloading datasets from high-quality sources
  • Video transcription
  • Text-to-audio conversion
  • User-friendly web scraping tools
  • Secure local-to-internet file transfer on demand
  • Scraping popular image boards
  • Synthetic generation of tabular and text data
  • Secure one-on-one interview method for data extraction by researchers
  • YouTube video and audio download
  • Powerful Optical Character Recognition for extracting data from PDFs
  • Unrestricted on-terminal search engine
  • Tor route circuits for secure communication

Choose Mdataset for a comprehensive and efficient solution to your dataset compilation and processing needs.

Quickstart

pip install mdataset

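from mdataset import scrape_data

# Page to scrape; the scheme is included so the URL is unambiguous.
url = "https://www.example.com"
# List of wanted items passed to the scraper.
wanted_list = ["What's the data ca?"]
scrape_data(url, wanted_list)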
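
URLs copied from a browser bar often lack a scheme, as "www.example.com" does. The sketch below shows one way to normalize such a URL before handing it to a scraper. It uses only the standard library; the helper name ensure_scheme and the default of "https" are illustrative assumptions and are not part of Mdataset.

```python
from urllib.parse import urlparse


def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Return url with a scheme, adding default_scheme when none is present.

    Hypothetical helper for illustration; not part of the Mdataset API.
    """
    url = url.strip()
    if not url:
        raise ValueError("URL must not be empty")
    # A scheme-less URL such as "www.example.com" has no "://" separator.
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    # Reject inputs that still have no host after normalization.
    if not urlparse(url).netloc:
        raise ValueError(f"URL has no host: {url!r}")
    return url
```

The returned string can then be passed as the url argument in the Quickstart above.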

Download files

Source Distribution

mdatasets-0.1.4.tar.gz (7.9 kB)

Uploaded Source

Built Distribution

mdatasets-0.1.4-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file mdatasets-0.1.4.tar.gz.

File metadata

  • Download URL: mdatasets-0.1.4.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for mdatasets-0.1.4.tar.gz
Algorithm     Hash digest
SHA256        683919dd3ed58aa09859a2d34dc29d9a01fb15595d6d3042ab42159b9f7d4b56
MD5           ea74f056d63b5e450cc5cad70e2e9136
BLAKE2b-256   bec81db0c6475b3928c07b2606f1a13ae6236b15bf12b7e16cb9eba2b2185916
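
A published digest is only useful if it is compared against the file that was actually downloaded. The following is a minimal sketch of that check using Python's standard hashlib module. The function names and the chunk size are illustrative choices, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the archive was saved to the current directory):
# matches_published_hash(
#     "mdatasets-0.1.4.tar.gz",
#     "683919dd3ed58aa09859a2d34dc29d9a01fb15595d6d3042ab42159b9f7d4b56",
# )
```

The same functions apply to the wheel file when given its own SHA256 value. A mismatch means the download is corrupt or is not the file that was published, and it should not be installed.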

File details

Details for the file mdatasets-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: mdatasets-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for mdatasets-0.1.4-py3-none-any.whl
Algorithm     Hash digest
SHA256        1804fc7852c4d722016c301d7007ef17b8961a656b293155276fe0821423a657
MD5           e33150d8c051366a4f269b2edc294f67
BLAKE2b-256   a92b70462f29eb93875de41f4700417d70ebb4e899e66b3a5713537ab8c5747e
