
The official Python implementation for the Synaptic Data Format (SDF), a modern, tensor-native format for AI.


SDF-PY 🧠: Synaptic Data Format


Welcome to sdf-py, the official Python implementation of the Synaptic Data Format (SDF).

SDF is a next-generation binary format engineered for the age of AI. It's designed from the ground up to be the nervous system for modern data pipelines, seamlessly connecting data sources, training loops, and inference engines.

✨ Key Features

  • 🧠 Tensor-Native: Tensors are first-class citizens. No more flattening or base64 encoding: NumPy arrays are stored directly and efficiently.
  • 🛤️ Sequence-Aware: Natively represent sequences like RL trajectories or time-series events within a single record, preserving temporal context.
  • 🌊 Streaming-First: Built as an append-only log, SDF is perfect for real-time data streams and efficient sequential reads during model training.
  • 🔒 Schema-Driven & Self-Describing: Files are strongly typed and contain their own schema, eliminating ambiguity and making datasets portable.
  • 🛠️ Powerful CLI: Inspect metadata, check record counts, and peek at data directly from your terminal.
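
To make the "no base64" point concrete, the sketch below compares a raw byte buffer with its base64 text encoding, which is how binary tensors usually have to travel inside text formats such as JSON. It uses only the Python standard library and does not touch the sdf-py API; the 64x64x3 buffer size is borrowed from the Quickstart image shape as an illustration.

```python
import base64

# A stand-in for a 64x64x3 uint8 image: 12,288 raw bytes of zeros.
raw = bytes(64 * 64 * 3)

# Base64 maps every 3 input bytes to 4 ASCII characters.
encoded = base64.b64encode(raw)

ratio = len(encoded) / len(raw)
print(f"raw: {len(raw)} bytes, base64: {len(encoded)} bytes, ratio: {ratio:.2f}")
```

A format that stores the bytes directly avoids this roughly one-third size increase, as well as the encode and decode step on every read and write.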

🚀 Installation

Install sdf-py directly from PyPI:

pip install sdf-py

⚡ Quickstart

Let's create, write to, and read from an .sdf file in one short script.

import numpy as np
from sdf_py import SDFWriter, SDFReader

# 1. Define the schema for your dataset
schema = {
    "image": {"type": "tensor", "dtype": "uint8", "shape": [64, 64, 3]},
    "label": {"type": "scalar", "dtype": "int32"}
}

file_path = "my_first_dataset.sdf"

# 2. Write data using a context manager
with SDFWriter(file_path, schema=schema) as writer:
    for i in range(10):
        record = {
            "image": np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8),
            "label": i
        }
        writer.write(record)

print(f"✅ Dataset created at '{file_path}'")

# 3. Read and iterate over the data
with SDFReader(file_path) as reader:
    print("\n📖 Reading first record...")
    # The reader is an iterator; each item unpacks into a pair, and only the record (first element) is used here
    first_record, _ = next(reader)
    
    image_tensor = first_record['image']
    label = first_record['label']
    
    print(f"Read back label: {label}")
    print(f"Read back image tensor with shape: {image_tensor.shape} and dtype: {image_tensor.dtype}")
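
Because the schema is a plain dictionary, it can also be used to check records before they reach the writer. The following is a minimal sketch, not part of sdf-py: `validate_record` is a hypothetical helper, it only understands the "tensor" and "scalar" field types shown above, and the library's own validation (if any) may behave differently.

```python
import numpy as np

def validate_record(record, schema):
    """Return a list of problems; an empty list means the record fits the schema.

    Hypothetical helper for illustration only; it is not an sdf-py function.
    """
    problems = []
    for name, spec in schema.items():
        if name not in record:
            problems.append(f"missing field '{name}'")
            continue
        value = np.asarray(record[name])
        expected_shape = tuple(spec.get("shape", ()))  # scalars have shape ()
        if value.shape != expected_shape:
            problems.append(f"'{name}': shape {value.shape} != {expected_shape}")
        # Allow int64 -> int32 style casts, but reject float -> int and similar.
        if not np.can_cast(value.dtype, np.dtype(spec["dtype"]), casting="same_kind"):
            problems.append(f"'{name}': dtype {value.dtype} does not fit {spec['dtype']}")
    for name in record:
        if name not in schema:
            problems.append(f"unexpected field '{name}'")
    return problems

schema = {
    "image": {"type": "tensor", "dtype": "uint8", "shape": [64, 64, 3]},
    "label": {"type": "scalar", "dtype": "int32"},
}

good = {"image": np.zeros((64, 64, 3), dtype=np.uint8), "label": 3}
bad = {"image": np.zeros((32, 32, 3), dtype=np.uint8), "label": 0.5}

print(validate_record(good, schema))
print(validate_record(bad, schema))
```

Returning a list of messages rather than raising on the first error makes it easier to report every mismatch in a record at once.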

💻 Command-Line Interface (CLI)

Installing the package also gives you the sdf command-line tool.

sdf inspect

Get a high-level overview of your file, including its schema and total record count.

sdf inspect my_first_dataset.sdf

sdf head

Peek at the first few records in a human-readable format.

# Show the first 3 records
sdf head my_first_dataset.sdf -n 3
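
If you want to call the CLI from a Python script (for example in a data-pipeline check), one option is to shell out with the standard library. This is a sketch under assumptions: `run_sdf` is a hypothetical wrapper, only the `inspect` and `head` subcommands shown above are used, and the output is returned as raw text because its exact layout is not documented here.

```python
import shutil
import subprocess

def run_sdf(args, executable="sdf"):
    """Run the sdf CLI and return its stdout, or None if it is not on PATH.

    Hypothetical wrapper for illustration; it is not part of sdf-py.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: a non-zero exit code does not raise; stdout is returned as-is.
    result = subprocess.run([path, *args], capture_output=True, text=True, check=False)
    return result.stdout

output = run_sdf(["inspect", "my_first_dataset.sdf"])
if output is None:
    print("sdf command not found; is sdf-py installed in this environment?")
else:
    print(output)
```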

🤖 Advanced Usage: RL Trajectories

SDF's native sequence support makes it ideal for RL. A single record can hold an entire episode.

import numpy as np
from sdf_py import SDFWriter

# Schema for an RL trajectory
rl_schema = {
    "trajectory": {
        "type": "sequence",
        "timesteps": {
            "state": {"type": "tensor", "dtype": "float32", "shape": [84, 84, 4]},
            "action": {"type": "scalar", "dtype": "int32"},
            "reward": {"type": "scalar", "dtype": "float32"},
        }
    }
}

# Placeholder timestep builder (illustrative only); in practice these values
# come from your environment rollout
def make_timestep(action, reward):
    return {
        "state": np.zeros((84, 84, 4), dtype=np.float32),
        "action": action,
        "reward": reward,
    }

# An entire episode is just one record
episode_trajectory = [
    make_timestep(action=0, reward=1.0),
    make_timestep(action=1, reward=1.0),
    # ... more timesteps
]

# Write it as a single entry
with SDFWriter("cartpole_episodes.sdf", schema=rl_schema) as writer:
    writer.write({"trajectory": episode_trajectory})
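
Rollout code often collects an episode as stacked arrays rather than as a list of per-timestep dictionaries. The sketch below shows one way to convert between the two so the result matches the shape of `episode_trajectory` above. `episode_from_arrays` is a hypothetical helper, not an sdf-py function, and the zero-filled arrays are placeholders rather than real environment data.

```python
import numpy as np

def episode_from_arrays(states, actions, rewards):
    """Turn stacked per-episode arrays into a list of per-timestep dicts.

    Hypothetical helper for illustration; it is not part of sdf-py.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have the same length")
    return [
        {"state": np.asarray(s, dtype=np.float32), "action": int(a), "reward": float(r)}
        for s, a, r in zip(states, actions, rewards)
    ]

# Placeholder rollout buffers: 5 timesteps, shapes follow the schema above.
num_steps = 5
states = np.zeros((num_steps, 84, 84, 4), dtype=np.float32)
actions = np.zeros(num_steps, dtype=np.int32)
rewards = np.ones(num_steps, dtype=np.float32)

episode_trajectory = episode_from_arrays(states, actions, rewards)
record = {"trajectory": episode_trajectory}
print(len(record["trajectory"]), record["trajectory"][0]["state"].shape)
```

The length check up front catches misaligned buffers before they turn into a silently truncated episode, since `zip` would otherwise stop at the shortest input.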

👨‍💻 About the Founder

The Synaptic Data Format (SDF) and its Python implementation were created by Louati Mahdi, a Data Engineer from Tunisia with a passion for building efficient, next-generation data systems for AI.

  • Email: louatimahdi390@gmail.com
  • GitHub: https://github.com/mahdi123-tech
  • LinkedIn: https://www.linkedin.com/in/mahdi1234/

We invite the community to explore, critique, and help build the ecosystem around the Synaptic Data Format.

📄 License

This project is licensed under the MIT License.
