Python SDK for PixCrawler image dataset platform - simple, lightweight, ML-ready

PixCrawler Python SDK

A simple, lightweight Python SDK for accessing PixCrawler datasets. Designed for ML workflows with minimal API surface and maximum ease of use.

Installation

pip install pixcrawler-sdk

Or install from source:

cd sdk
pip install -e .

Quick Start

import pixcrawler as pix

# Set authentication (optional if using environment variables)
pix.auth(token="your_api_key")

# Load dataset into memory
project = pix.project("project-id")
dataset = project.dataset("dataset-id-123")
data = dataset.load()

# Iterate over items
for item in data:
    print(item)

Authentication

The SDK supports three authentication methods (in priority order):

1. Programmatic Authentication (Recommended for Scripts)

import pixcrawler as pix

pix.auth(token="your_api_key")

2. Environment Variables (Recommended for Production)

export PIXCRAWLER_SERVICE_KEY="your_api_key"

3. Per-Request Configuration

import pixcrawler as pix

dataset = pix.dataset(
    "dataset-id-123",
    config={"api_key": "your_api_key", "project_id": "project-id-123"}
).load()
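The priority order above can be pictured as a simple lookup chain. The following is a minimal sketch, assuming the listed order is also the lookup order; `resolve_api_key` is a hypothetical helper written for illustration and is not part of the SDK.

```python
import os

ENV_VAR = "PIXCRAWLER_SERVICE_KEY"  # variable name taken from the section above


def resolve_api_key(token=None, config=None, environ=None):
    """Return the first credential found, following the order listed above.

    Hypothetical helper: it illustrates the documented priority order and is
    not the SDK's internal implementation.
    """
    environ = os.environ if environ is None else environ
    candidates = [
        token,                          # 1. programmatic: pix.auth(token=...)
        environ.get(ENV_VAR),           # 2. environment variable
        (config or {}).get("api_key"),  # 3. per-request config dict
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    raise ValueError(f"No API key found; pass a token or set {ENV_VAR}")
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.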

API Reference

`auth(token, base_url=None)`

Set global authentication token for the session.

Parameters:

token (str): API token or JWT from Supabase Auth
base_url (str, optional): Override API base URL (default: https://api.pixcrawler.com/v1)

Example:

import pixcrawler as pix

pix.auth(token="your_api_key")
# All subsequent calls will use this token

`dataset(dataset_id, config=None)`

Load dataset into memory for iteration.

Parameters:

dataset_id (str): UUID of the dataset
config (dict, optional): Configuration with 'api_key' and 'base_url'

Returns:

Dataset: In-memory dataset object

Raises:

AuthenticationError: If authentication fails
NotFoundError: If dataset not found
RuntimeError: If dataset exceeds memory limit (300MB)

Example:

import pixcrawler as pix

# Load dataset
dataset = pix.dataset("dataset-id-123")

# Iterate over items
for item in dataset:
    image_url = item['url']
    label = item['label']
    print(f"{label}: {image_url}")
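Before training, it is often useful to check how many items each label has. The sketch below assumes each item is a dict with the 'url' and 'label' keys shown in the example above; the `items` list is stand-in data, not real SDK output.

```python
from collections import Counter

# Stand-in items using the same 'url' and 'label' keys as the example above.
items = [
    {"url": "https://example.com/img/1.jpg", "label": "cat"},
    {"url": "https://example.com/img/2.jpg", "label": "dog"},
    {"url": "https://example.com/img/3.jpg", "label": "cat"},
]


def label_counts(items):
    """Count how many items carry each label."""
    return Counter(item["label"] for item in items)


counts = label_counts(items)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

With a real dataset, the same function could be called on the loaded dataset object, assuming it yields dicts of this shape.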

`datasets(config=None)`

List user's datasets with pagination.

Parameters:

config (dict, optional): Configuration with 'api_key' and 'base_url'

Returns:

List[dict]: List of dataset metadata dictionaries

Raises:

AuthenticationError: If authentication fails
APIError: If API request fails

Example:

import pixcrawler as pix

pix.auth(token="your_api_key")

# List all datasets
datasets = pix.datasets()

for dataset in datasets:
    print(f"{dataset['id']}: {dataset['name']} ({dataset['image_count']} images)")

`get_dataset_info(dataset_id, config=None)`

Get dataset metadata without downloading.

Parameters:

dataset_id (str): UUID of the dataset
config (dict, optional): Configuration with 'api_key' and 'base_url'

Returns:

dict: Dataset metadata (image_count, size_mb, labels, etc.)

Raises:

AuthenticationError: If authentication fails
NotFoundError: If dataset not found

Example:

import pixcrawler as pix

# Get metadata without downloading the dataset
info = pix.get_dataset_info("dataset-id-123")

print(f"Name: {info['name']}")
print(f"Images: {info['image_count']}")
print(f"Size: {info['size_mb']} MB")

`download_dataset(dataset_id, output_path, config=None)`

Download dataset archive to local file.

Parameters:

dataset_id (str): UUID of the dataset
output_path (str): Local file path (e.g., "./wildlife.zip")
config (dict, optional): Configuration with 'api_key' and 'base_url'

Returns:

str: Absolute path to downloaded file

Raises:

AuthenticationError: If authentication fails
NotFoundError: If dataset not found
PixCrawlerError: If download fails

Example:

import pixcrawler as pix

pix.auth(token="your_api_key")

# Download to file (doesn't load into memory)
path = pix.download_dataset("dataset-id-123", "./my_dataset.zip")
print(f"Downloaded to: {path}")
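After a download it can be worth confirming that the file is a readable archive before processing it. This is a minimal sketch using only the standard library, assuming the download is a ZIP file as the example file names suggest; `check_archive` is a hypothetical helper, not an SDK function.

```python
import zipfile


def check_archive(path):
    """Return the number of members in a ZIP archive, or raise ValueError if it is unusable."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    with zipfile.ZipFile(path) as zf:
        bad_member = zf.testzip()  # first corrupt member name, or None if all are fine
        if bad_member is not None:
            raise ValueError(f"Corrupt member in {path}: {bad_member}")
        return len(zf.namelist())
```

Calling `check_archive(path)` on the returned path gives a quick sanity check without extracting anything to disk.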

Exception Handling

The SDK provides custom exceptions for different error scenarios:

import pixcrawler as pix
from pixcrawler import (
    PixCrawlerError,  # Base exception
    APIError,  # API returned error
    AuthenticationError,  # Auth failed
    NotFoundError,  # Resource not found
    RateLimitError,  # Rate limit exceeded
)

try:
    dataset = pix.dataset("dataset-id-123")
except AuthenticationError:
    print("Authentication failed. Check your API key.")
except NotFoundError:
    print("Dataset not found.")
except RateLimitError:
    print("Rate limit exceeded. Please try again later.")
except APIError as e:
    print(f"API error {e.status_code}: {e.message}")
except PixCrawlerError as e:
    print(f"SDK error: {e}")
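Rate limit errors are usually transient, so one common pattern is to retry with growing delays instead of failing immediately. The sketch below is an illustration under assumptions: `call_with_backoff` is a hypothetical helper, and the local `RateLimitError` class is a stand-in so the code runs without the SDK installed.

```python
import time


class RateLimitError(Exception):
    """Local stand-in so the sketch runs without the SDK installed."""


def call_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: let the caller handle it
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

In real code the stand-in class would be replaced by `from pixcrawler import RateLimitError`, and the call might look like `call_with_backoff(lambda: pix.dataset("dataset-id-123"))`.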

Complete Examples

Example 1: Load and Process Dataset

import pixcrawler as pix

# Authenticate
pix.auth(token="your_api_key")

# Load dataset
dataset = pix.dataset("dataset-id-123")

# Process items
for item in dataset:
    # Your ML preprocessing here
    image_url = item['url']
    label = item['label']
    # Download image, apply transforms, etc.
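One typical preprocessing step is splitting items into training and validation sets. The sketch below assigns each item by hashing its URL, so the split stays stable across runs and machines. It assumes items are dicts with a 'url' key as in the loop above; `assign_split` and `split_items` are hypothetical helpers, not SDK functions.

```python
import hashlib


def assign_split(url, val_fraction=0.2):
    """Map a URL to 'train' or 'val' deterministically from its SHA-256 hash."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_items(items, val_fraction=0.2):
    """Group items into 'train' and 'val' lists based on their URLs."""
    splits = {"train": [], "val": []}
    for item in items:
        splits[assign_split(item["url"], val_fraction)].append(item)
    return splits
```

Because the assignment depends only on the URL, adding new items to a dataset does not move existing items between splits.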

Example 2: List and Download Datasets

import pixcrawler as pix

pix.auth(token="your_api_key")

# List all datasets
datasets = pix.datasets()

# Find specific dataset
target_dataset = next(
    (d for d in datasets if d['name'] == 'Wildlife Images'),
    None
)

if target_dataset:
    # Get detailed info
    info = pix.dataset(target_dataset['id']).info()
    print(f"Found dataset: {info['name']} ({info['image_count']} images)")
    
    # Download to file
    path = pix.download_dataset(target_dataset['id'], "./wildlife.zip")
    print(f"Downloaded to: {path}")

Example 3: Environment-Based Authentication

# Set environment variable first:
# export PIXCRAWLER_SERVICE_KEY="your_api_key"

import pixcrawler as pix

# No need to call auth() - uses environment variable
dataset = pix.dataset("dataset-id-123")

for item in dataset:
    print(item)

Example 4: Custom Base URL (Testing)

import pixcrawler as pix

# Use a custom API URL (e.g., for testing); the URL below is a placeholder
pix.auth(
    token="your_api_key",
    base_url="http://localhost:8000/v1",
)

datasets = pix.datasets()

Memory Considerations

The dataset() function loads data into memory and has a 300MB limit to prevent memory issues. For larger datasets:

Use dataset().download() to save to disk
Process the downloaded file in chunks
Or use the API directly for streaming

import zipfile

import pixcrawler as pix

# For large datasets, download to file instead
path = pix.dataset("large-dataset-id").download("./large_dataset.zip")

# Then process the ZIP file in chunks
with zipfile.ZipFile(path, 'r') as zf:
    # Process files one at a time
    for filename in zf.namelist():
        with zf.open(filename) as f:
            # Process file
            pass
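The choice between loading and downloading can be made up front from the dataset metadata. This is a minimal sketch, assuming the metadata is a dict with a `size_mb` field as described under `get_dataset_info`; `choose_strategy` is a hypothetical helper, not part of the SDK.

```python
MEMORY_LIMIT_MB = 300  # in-memory limit stated above


def choose_strategy(info, limit_mb=MEMORY_LIMIT_MB):
    """Return 'load' if the dataset fits the in-memory limit, else 'download'."""
    size_mb = info.get("size_mb")
    if size_mb is None:
        return "download"  # unknown size: take the safe path
    return "load" if size_mb <= limit_mb else "download"
```

A caller could branch on the result, loading small datasets into memory and saving larger ones to disk for chunked processing.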

Requirements

Python 3.8+
requests
python-dotenv

License

MIT License

Support

Documentation: https://docs.pixcrawler.com
Issues: https://github.com/pixcrawler/pixcrawler/issues
Email: support@pixcrawler.com

This version

0.2.0.dev1 pre-release

Dec 11, 2025

Download files

Source Distribution

pixcrawler_sdk-0.2.0.dev1.tar.gz (26.9 kB)

Uploaded Dec 11, 2025 Source

Built Distribution

pixcrawler_sdk-0.2.0.dev1-py2.py3-none-any.whl (10.5 kB)

Uploaded Dec 11, 2025 Python 2, Python 3

File details

Details for the file pixcrawler_sdk-0.2.0.dev1.tar.gz.

File metadata

Download URL: pixcrawler_sdk-0.2.0.dev1.tar.gz
Upload date: Dec 11, 2025
Size: 26.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for pixcrawler_sdk-0.2.0.dev1.tar.gz
Algorithm	Hash digest
SHA256	`dc7ebe9c747bbc8f07831a7f1044cee9ce7c11fb5b3d75bd076d0353fd631de9`
MD5	`5ba2bddcb091d28ead5735ef8ceec4cf`
BLAKE2b-256	`1bba7d484fbac5b7add19b6e1d4ae7bba2cba1f0f048490cdafbe6ce1475c048`

File details

Details for the file pixcrawler_sdk-0.2.0.dev1-py2.py3-none-any.whl.

File metadata

Download URL: pixcrawler_sdk-0.2.0.dev1-py2.py3-none-any.whl
Upload date: Dec 11, 2025
Size: 10.5 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for pixcrawler_sdk-0.2.0.dev1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`df7d34fe07d52c6d4aa24daa0c62c2d54184f8cba1e0d7d6c3706e4115b348d8`
MD5	`e8d775a4e15d867d83469620b535653f`
BLAKE2b-256	`d09959dc0fd43fe33c6630d438b294783507986de2cbb5fc394e54d3d792d2ef`
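The published digests can be used to confirm that a downloaded distribution file is intact. The sketch below computes a file's SHA256 with the standard library and compares it to an expected hex digest; `sha256_of` and `matches_published` are hypothetical helper names chosen for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("pixcrawler_sdk-0.2.0.dev1.tar.gz", "<SHA256 from the table above>")` should return True for an unmodified download.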
