Fetch sample data like images, videos, code, GIFs, text, and JSON files.

Project description

📦 smpldta: Sample Data Fetcher

smpldta is a Python library to fetch and generate real sample data like images, videos, GIFs, code, JSON, text files, and PDFs, in any quantity you specify.

It helps developers, testers, and data scientists quickly build test environments, validate pipelines, or create dummy data for demos, machine learning, or automation.


🚀 Features

  • 📸 Download real images from the web with size/dimension constraints
  • 🎥 Fetch videos in multiple formats like mp4, mkv, flv, 3gp
  • 🎞️ Get animated GIFs from Giphy
  • 📄 Generate PDFs with size limits
  • Create structured JSON files with a schema
  • 💬 Generate random text files with word/size limits
  • 💻 Generate code files in Python, Java, JavaScript, C, etc.

📦 smpldta

smpldta is a Python-based utility that generates sample data files for testing and prototyping. It supports fetching images, videos, code snippets, text, JSON, and PDFs, organized in a clear directory structure.


🛠️ Installation

pip install smpldta

📦 Usage

from smpldta import smpldta

fetcher = smpldta("FOLDER_NAME")

Replace FOLDER_NAME with the name of the folder where the data will be stored.

🖼️ fetch_images(config)

config = {
    "jpg": {
        "count": 3,
        "min_size": "5kb",
        "max_size": "500kb",
        "height": 400,
        "width": 400
    },
    "jpeg": {
        "count": 2,
        "min_size": "10kb",
        "max_size": "1mb",
        "height": 600,
        "width": 300
    }
}
fetcher.fetch_images(config = config, subdir="name_of_the_folder(images)")
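The size limits above are plain strings such as "5kb" and "1mb". As a minimal sketch (not part of smpldta), a config can be sanity-checked before it is passed to fetch_images. The helpers parse_size and check_size_limits are hypothetical, and the 1024-byte multiplier is an assumption, since the README does not say how the library interprets units.

```python
import re

# Assumed multipliers: the README does not state whether "kb" means 1000 or 1024 bytes.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size string such as "500kb" into a byte count (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text.lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def check_size_limits(config: dict) -> None:
    """Raise ValueError if any format has min_size greater than max_size."""
    for fmt, opts in config.items():
        if parse_size(opts["min_size"]) > parse_size(opts["max_size"]):
            raise ValueError(f"{fmt}: min_size exceeds max_size")


# Passes silently for a consistent config.
check_size_limits({"jpg": {"count": 3, "min_size": "5kb", "max_size": "500kb"}})
```

Running a check like this first can catch a swapped min_size and max_size before any download starts.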

🎥 fetch_videos(config)

config = {
    "mp4": {
        "count": 3,
        "min_size": "100kb",
        "max_size": "20mb"
    },
    "3gp": {
        "count": 2,
        "min_size": "10kb",
        "max_size": "10mb"
    }
}
fetcher.fetch_videos(config = config, subdir="name_of_the_folder(videos)")

Note: Supported formats are mp4, flv, mkv, and 3gp.
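Because only four video formats are listed, it can help to reject other keys before calling fetch_videos. The following is a sketch under that assumption. unsupported_formats is a hypothetical helper, not a smpldta function, and the README does not say how the library itself reacts to an unknown format.

```python
# Formats taken from the note in this README.
SUPPORTED_VIDEO_FORMATS = {"mp4", "flv", "mkv", "3gp"}


def unsupported_formats(config: dict) -> list:
    """Return the config keys that are not in the supported list (hypothetical helper)."""
    return sorted(fmt for fmt in config if fmt.lower() not in SUPPORTED_VIDEO_FORMATS)


config = {
    "mp4": {"count": 3, "min_size": "100kb", "max_size": "20mb"},
    "avi": {"count": 1, "min_size": "10kb", "max_size": "10mb"},
}
bad = unsupported_formats(config)
if bad:
    print("Remove these formats before fetching:", bad)
```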


🎞️ fetch_gifs(count)

fetcher.fetch_gifs(count=5, subdir="name_of_the_folder(gifs)")

💻 fetch_code(config)

config = {
    "python": 2,
    "java": 2,
    "c": 1,
    "cpp": 1,
    "javascript": 1,
    "typescript": 1
}
fetcher.fetch_code(config = config, subdir="name_of_the_folder(code)")

Note: Only Python, C, C++ (cpp), Java, TypeScript, and JavaScript are supported.


fetch_text(config)

config = {
    "count": 4,
    "min_words": 100,
    "max_words": 1000,
    "max_size": "200kb"
}
fetcher.fetch_text(config = config, subdir="name_of_the_folder(text)")
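To confirm that generated files respect min_words and max_words, the word counts can be checked after the call. This is a sketch only. It assumes the text files are saved with a .txt extension, which the README does not state, and out_of_range is a hypothetical helper.

```python
from pathlib import Path


def out_of_range(folder, min_words: int, max_words: int) -> list:
    """List .txt files in folder whose word count falls outside the limits.

    Hypothetical helper; the .txt extension is an assumption.
    """
    failures = []
    for path in sorted(Path(folder).glob("*.txt")):
        words = path.read_text(encoding="utf-8", errors="replace").split()
        if not (min_words <= len(words) <= max_words):
            failures.append(path.name)
    return failures


# Example, assuming the files were written to FOLDER_NAME/text:
# print(out_of_range("FOLDER_NAME/text", min_words=100, max_words=1000))
```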

🧾 fetch_json(config)

config = {
    "schema": {
        "id": "uuid",
        "name": "str",
        "email": "email",
        "age": "int",
        "joined": "date",
        "score": "float"
    },
    "min_data_per_file": 5,
    "max_data_per_file": 15,
    "count": 5
}
fetcher.fetch_json(config = config, subdir="name_of_the_folder(json)")
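The schema maps each field name to a type label. To illustrate what one record matching this schema might look like, here is a standard-library sketch. It is not smpldta's implementation: the GENERATORS table, the value ranges, and make_record are all assumptions made for illustration.

```python
import random
import uuid
from datetime import date, timedelta

# Hypothetical generators, one per type label used in the schema above.
GENERATORS = {
    "uuid": lambda: str(uuid.uuid4()),
    "str": lambda: "".join(random.choices("abcdefghijklmnopqrstuvwxyz", k=8)),
    "email": lambda: f"user{random.randint(1, 9999)}@example.com",
    "int": lambda: random.randint(0, 100),
    "date": lambda: (date(2020, 1, 1) + timedelta(days=random.randint(0, 1500))).isoformat(),
    "float": lambda: round(random.uniform(0, 100), 2),
}


def make_record(schema: dict) -> dict:
    """Build one record with a generated value for every schema field."""
    return {field: GENERATORS[kind]() for field, kind in schema.items()}


schema = {"id": "uuid", "name": "str", "email": "email",
          "age": "int", "joined": "date", "score": "float"}
print(make_record(schema))
```

Under this reading, a file would hold between min_data_per_file and max_data_per_file such records.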

📄 fetch_pdfs(config)

config = {
    "count": 3,
    "min_size": "100kb",
    "max_size": "500kb"
}
fetcher.fetch_pdfs(config = config, subdir="name_of_the_folder(pdfs)")

Note: Keep the requested PDF sizes small.


📂 Output Structure

output/
├── images/
├── videos/
├── gifs/
├── code/
├── text/
├── json/
└── pdfs/

Each data type is saved in its own subfolder with unique filenames.
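A quick way to confirm what was written is to count the files in each subfolder. This sketch uses only pathlib. count_files is a hypothetical helper, and "output" stands in for whatever folder name was passed when creating the fetcher.

```python
from pathlib import Path


def count_files(root) -> dict:
    """Map each immediate subfolder of root to the number of files inside it."""
    root = Path(root)
    return {
        sub.name: sum(1 for p in sub.rglob("*") if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Only runs if the folder exists, so the snippet is safe to paste as-is.
if Path("output").is_dir():
    print(count_files("output"))
```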

To change the name of a folder, pass the subdir argument:

subdir="name_of_the_folder"

Example:
fetcher.fetch_images(config, subdir="name_of_the_folder")

💡 Why Use smpldta?

  • Eliminates the need to manually source or generate test data
  • Supports a variety of formats and customization options
  • Ideal for pipelines, automated tests, and demos

📄 License

MIT License

Author: Parteek

Download files

Source Distribution

smpldta-0.2.0.tar.gz (11.7 kB view details)

Uploaded Source

Built Distribution

smpldta-0.2.0-py3-none-any.whl (13.4 kB view details)

Uploaded Python 3

File details

Details for the file smpldta-0.2.0.tar.gz.

File metadata

  • Download URL: smpldta-0.2.0.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for smpldta-0.2.0.tar.gz
Algorithm Hash digest
SHA256 826f80e2a086f4aaff3f7a80eeb781eed4bb8a4e925df566742935f6ac2fb15d
MD5 07382e8de52555e54b0cf08fa2fe789f
BLAKE2b-256 0834105711eeb5a14350b15f77f684bc8b1356f18f9ad40835455205ac2c616f
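A downloaded archive can be compared against the SHA256 digest listed above using Python's hashlib. This is a minimal sketch: sha256_of is a hypothetical helper, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above.
EXPECTED = "826f80e2a086f4aaff3f7a80eeb781eed4bb8a4e925df566742935f6ac2fb15d"

# Usage, once the file has been downloaded:
# print(sha256_of("smpldta-0.2.0.tar.gz") == EXPECTED)
```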

File details

Details for the file smpldta-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: smpldta-0.2.0-py3-none-any.whl
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for smpldta-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1cc16a373142df859cbb689fc9988ae544f1619139a9b7211c3484f95f0da06c
MD5 56e25447e2c21c09f8aed25519f586e4
BLAKE2b-256 9aa7b6d63b6ea2db095c0601039c5a13ea519b04e9b3dd5a4b5b0eeacb93c100
