
Fetch sample data like images, videos, code, GIFs, text, and JSON files.


smpldta: Sample Data Fetcher

smpldta is a Python library that fetches and generates real sample data such as images, videos, GIFs, code, JSON, text files, and PDFs, in any quantity you specify.

It helps developers, testers, and data scientists quickly build test environments, validate pipelines, or create dummy data for demos, machine learning, or automation.


Features

  • Download real images from the web with size and dimension constraints
  • Fetch videos in multiple formats: mp4, mkv, flv, and 3gp
  • Get animated GIFs from Giphy
  • Generate PDFs with size limits
  • Create structured JSON files from a schema
  • Generate random text files with word and size limits
  • Generate code files in Python, Java, JavaScript, C, C++, and TypeScript

smpldta

smpldta is a Python utility that generates sample data files for testing and prototyping. It supports fetching images, videos, GIFs, code snippets, text, JSON, and PDFs, organized in a clear directory structure.




Installation

pip install smpldta

Requires Python 3.0 or later.


Usage

from smpldta import Smpldta

fetcher = Smpldta("FOLDER_NAME")

Replace FOLDER_NAME with the name of the folder where the data will be stored.

fetch_images(config)

# One entry per image format; each sets the file count, size bounds, and dimensions.
config = {
    "jpg": {
        "count": 3,
        "min_size": "5kb",
        "max_size": "500kb",
        "height": 400,
        "width": 400
    },
    "jpeg": {
        "count": 2,
        "min_size": "10kb",
        "max_size": "1mb",
        "height": 600,
        "width": 300
    }
}
# subdir names the subfolder that receives the images.
fetcher.fetch_images(config=config, subdir="name_of_the_folder(images)")
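The size strings in the config above ("5kb", "1mb") can be sanity-checked before any download starts. The following is a minimal sketch, not part of smpldta: parse_size and check_size_ranges are hypothetical helpers, and the sketch assumes binary units (1 kb = 1024 bytes), which may differ from how the library interprets them.

```python
import re

# Hypothetical helpers for illustration; smpldta does not provide these.
# Assumption: binary units (1 kb = 1024 bytes). The library may use 1000.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as "5kb" or "1mb" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def check_size_ranges(config):
    """Return a list of problems found in a per-format config dict."""
    problems = []
    for fmt, options in config.items():
        low = parse_size(options.get("min_size", "0b"))
        high = parse_size(options["max_size"]) if "max_size" in options else None
        if high is not None and low > high:
            problems.append(f"{fmt}: min_size exceeds max_size")
        if options.get("count", 0) <= 0:
            problems.append(f"{fmt}: count must be positive")
    return problems
```

An empty list from check_size_ranges(config) means the size bounds and counts are at least internally consistent; it says nothing about whether matching files can actually be found.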

fetch_videos(config)

# One entry per video format; each sets the file count and size bounds.
config = {
    "mp4": {
        "count": 3,
        "min_size": "100kb",
        "max_size": "20mb"
    },
    "3gp": {
        "count": 2,
        "min_size": "10kb",
        "max_size": "10mb"
    }
}
fetcher.fetch_videos(config=config, subdir="name_of_the_folder(videos)")

Note: Supported formats are mp4, flv, mkv, and 3gp.
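Because only four video formats are listed, a config can be checked for stray keys before it is passed to fetch_videos. This is a small sketch under that assumption; unsupported_formats is a hypothetical helper, not a smpldta function, and how the library itself reacts to an unknown format is not documented here.

```python
# Formats taken from the note above.
SUPPORTED_VIDEO_FORMATS = {"mp4", "flv", "mkv", "3gp"}


def unsupported_formats(config, supported=SUPPORTED_VIDEO_FORMATS):
    """Return the config keys that are not in the supported set, sorted."""
    return sorted(fmt for fmt in config if fmt.lower() not in supported)
```

For example, a config with keys "mp4" and "avi" would report "avi" as unsupported.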


fetch_gifs(count)

fetcher.fetch_gifs(count=5, subdir="name_of_the_folder(gifs)")

fetch_code(config)

# Maps each language to the number of code files to generate.
config = {
    "python": 2,
    "java": 2,
    "c": 1,
    "cpp": 1,
    "javascript": 1,
    "typescript": 1
}
fetcher.fetch_code(config=config, subdir="name_of_the_folder(code)")

Note: Only python, cpp, c, java, typescript, and javascript are supported.
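When only a total number of code files matters, the per-language counts can be derived rather than typed by hand. The sketch below is illustrative: spread_code_files is a hypothetical helper that distributes a total as evenly as possible across the six supported language keys and produces a dict in the shape fetch_code expects above.

```python
# Language keys taken from the note above.
SUPPORTED_LANGUAGES = ["python", "java", "c", "cpp", "javascript", "typescript"]


def spread_code_files(total, languages=SUPPORTED_LANGUAGES):
    """Split `total` files across languages; earlier languages get the remainder."""
    base, extra = divmod(total, len(languages))
    config = {}
    for index, language in enumerate(languages):
        count = base + (1 if index < extra else 0)
        if count:  # skip languages that would receive zero files
            config[language] = count
    return config
```

Calling spread_code_files(8) yields the same counts as the example config above.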


fetch_text(config)

# Number of text files, word-count bounds per file, and a size cap.
config = {
    "count": 4,
    "min_words": 100,
    "max_words": 1000,
    "max_size": "200kb"
}
fetcher.fetch_text(config=config, subdir="name_of_the_folder(text)")
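After a run, a generated text file can be checked against the limits from the config. This is a minimal sketch using only the standard library; text_file_within_limits is a hypothetical helper, words are counted by splitting on whitespace (the library may count differently), and the size cap is passed in bytes.

```python
from pathlib import Path


def text_file_within_limits(path, min_words, max_words, max_bytes):
    """Return True if the file's word count and byte size fit the limits."""
    data = Path(path).read_bytes()
    # Whitespace-separated tokens are treated as words (an assumption).
    word_count = len(data.decode("utf-8", errors="replace").split())
    return min_words <= word_count <= max_words and len(data) <= max_bytes
```

With the config above, the call would be text_file_within_limits(path, 100, 1000, 200 * 1024), assuming "200kb" means 200 × 1024 bytes.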

fetch_json(config)

# schema maps each field name to a type name; the remaining keys set
# the number of files and the record-count bounds per file.
config = {
    "schema": {
        "id": "uuid",
        "name": "str",
        "email": "email",
        "age": "int",
        "joined": "date",
        "score": "float"
    },
    "min_data_per_file": 5,
    "max_data_per_file": 15,
    "count": 5
}
fetcher.fetch_json(config=config, subdir="name_of_the_folder(json)")
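To make the schema idea concrete, here is a rough sketch of how type names like those above could be turned into records with the standard library. It is not smpldta's implementation: GENERATORS and make_records are hypothetical, and the value ranges (ages below 100, dates in 2020, example.com addresses) are arbitrary choices for illustration.

```python
import random
import uuid
from datetime import date, timedelta

# Hypothetical generators, one per type name used in the schema above.
GENERATORS = {
    "uuid": lambda rng: str(uuid.UUID(int=rng.getrandbits(128), version=4)),
    "str": lambda rng: "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)),
    "email": lambda rng: f"user{rng.randrange(10000)}@example.com",
    "int": lambda rng: rng.randrange(0, 100),
    "date": lambda rng: (date(2020, 1, 1) + timedelta(days=rng.randrange(365))).isoformat(),
    "float": lambda rng: round(rng.uniform(0.0, 100.0), 2),
}


def make_records(schema, min_items, max_items, seed=None):
    """Build between min_items and max_items records that follow the schema."""
    rng = random.Random(seed)
    size = rng.randint(min_items, max_items)
    return [
        {field: GENERATORS[kind](rng) for field, kind in schema.items()}
        for _ in range(size)
    ]
```

Passing config["schema"], config["min_data_per_file"], and config["max_data_per_file"] to make_records gives one file's worth of records, which json.dumps can then serialize.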

fetch_pdfs(config)

# Number of PDFs to generate and the size bounds for each.
config = {
    "count": 3,
    "min_size": "100kb",
    "max_size": "500kb"
}
fetcher.fetch_pdfs(config=config, subdir="name_of_the_folder(pdfs)")

Note: Keep the requested PDF sizes small.


Output Structure

output/
├── images/
├── videos/
├── gifs/
├── code/
├── text/
├── json/
└── pdfs/

Each data type is saved in its own subfolder with unique filenames.

To change a folder's name, pass the subdir argument:

subdir="name_of_the_folder"

Example:
fetcher.fetch_images(config, subdir="name_of_the_folder")
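Since each data type lands in its own subfolder, a quick way to confirm a run produced what was requested is to count the files per subfolder. The sketch below uses only pathlib; count_files_by_subdir is a hypothetical helper, and the root path is whatever folder name was passed to Smpldta.

```python
from pathlib import Path


def count_files_by_subdir(root):
    """Map each immediate subfolder of `root` to the number of files inside it."""
    root = Path(root)
    return {
        child.name: sum(1 for item in child.rglob("*") if item.is_file())
        for child in sorted(root.iterdir())
        if child.is_dir()
    }
```

The resulting dict can be compared against the counts in each config, for example expecting 5 under the images subfolder for the fetch_images config above (3 jpg plus 2 jpeg).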

Why Use smpldta?

  • Eliminates the need to manually source or generate test data
  • Supports a variety of formats and customization options
  • Ideal for pipelines, automated tests, and demos

License

MIT License

Author: Parteek


Source Distribution

smpldta-0.2.1.tar.gz (11.7 kB)

Uploaded Source

Built Distribution


smpldta-0.2.1-py3-none-any.whl (13.4 kB)

Uploaded Python 3

File details

Details for the file smpldta-0.2.1.tar.gz.

File metadata

  • Download URL: smpldta-0.2.1.tar.gz
  • Upload date:
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for smpldta-0.2.1.tar.gz
Algorithm Hash digest
SHA256 17d53185566ae68495f514c4e8662fef345c3977dacd82aa5121e9414c447f1b
MD5 24129af6377bb888699b603d33c9c157
BLAKE2b-256 350c534b347fc977e724140dad4939c391c22ac0132813f0751c8ef11e82a201
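A downloaded archive can be compared against the SHA256 digest listed above using hashlib from the standard library. This is a minimal sketch; sha256_matches is a hypothetical helper name, and the file path is wherever the archive was saved locally.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 string shown in the table above.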


File details

Details for the file smpldta-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: smpldta-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for smpldta-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0be301033999c87d92cc2cc25ce038e7566964ea6aa51f0718a66d2bfe6227c8
MD5 3dcc0fc88f2c14c0bd694ea3e8c2c287
BLAKE2b-256 d60ddf9717d472fb38acdcb36b21ba053599ed90472bf69e838fcfe1cbd917d1

