
Training and inference templates based on the D-FINE architecture.

Sinapsis D-FINE

Templates for training and inference with the D-FINE model

🐍 Installation 🚀 Features 📚 Usage example 🌐 Webapp 📙 Documentation 🔍 License

The Sinapsis D-FINE module provides templates for training and inference with the D-FINE model, enabling advanced object detection tasks.

🐍 Installation

Install using your package manager of choice. We encourage the use of uv.

Example with uv:

  uv pip install sinapsis-dfine --extra-index-url https://pypi.sinapsis.tech

or with raw pip:

  pip install sinapsis-dfine --extra-index-url https://pypi.sinapsis.tech

🚀 Features

Templates Supported

The Sinapsis D-FINE module provides two main templates for inference and training:

  • DFINETraining: A highly flexible template for fine-tuning D-FINE models on custom data. It is designed for rapid setup while still offering deep control.
    • Effortless Setup: Automatically infers class labels directly from the dataset, eliminating the need to manually create id2label maps.
    • Flexible Data Sources: Seamlessly loads datasets from both local directories and the Hugging Face Hub.
    • Adaptable to Your Data: Easily adapts to different dataset schemas by allowing users to specify custom keys for annotations (bbox, category, etc.) via the annotation_keys attribute.
    • Powerful Customization: Provides granular control over every aspect of training through structured Pydantic models for hyperparameters, data mapping, and more.
  • DFINEInference: A streamlined and efficient template for running trained D-FINE models.
    • High-Performance: Processes images in batches for maximum throughput on the target hardware.
    • Structured Output: Generates clear, structured annotations for each image, including bounding boxes, confidence scores, and class labels, ready for downstream tasks.
🌍 General Attributes

Both templates share the following attributes:

  • model_path (str, optional): The model identifier from the Hugging Face Hub or a local path to the model and processor files. Defaults to "ustc-community/dfine-nano-coco".
  • model_cache_dir (str, optional): Directory to cache downloaded model files. Defaults to the path specified by the SINAPSIS_CACHE_DIR environment variable.
  • threshold (float, required): The confidence score threshold (from 0.0 to 1.0) for filtering detections. For inference, it discards all detections below this value from the final output. For training, it is used on the validation dataset to filter predictions before calculating evaluation metrics.
  • device (Literal["auto", "cuda", "cpu"], optional): The hardware device to run the model on. Defaults to "auto", which automatically selects "cuda" if a compatible GPU is available, otherwise falls back to "cpu".
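The effect of the threshold attribute at inference time can be illustrated with a minimal sketch. This is not the template's internal code: the filter_detections helper and the dictionary field names (label, score, bbox) are hypothetical and only show the described behavior, where detections scoring below the threshold are dropped.

```python
def filter_detections(detections, threshold):
    """Keep detections whose confidence score is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [det for det in detections if det["score"] >= threshold]


# Hypothetical detections; the field names are illustrative only.
detections = [
    {"label": "person", "score": 0.91, "bbox": [22, 34, 100, 150]},
    {"label": "dog", "score": 0.42, "bbox": [5, 10, 60, 40]},
]

kept = filter_detections(detections, threshold=0.5)
print([det["label"] for det in kept])
```

A higher threshold yields fewer but more confident detections; a lower one keeps more candidates at the cost of more false positives.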
Specific Attributes

There are some attributes specific to the templates used:

  • DFINEInference has one additional attribute:
    • batch_size (int, optional): The number of images to process in a single batch. Defaults to 8.
  • DFINETraining has nine additional attributes:
    • training_mode (Literal["fine-tune", "from-scratch"], optional): Specifies the training strategy.
    • dataset_path (str, required): Path to the dataset to be loaded.
    • id2label (dict[int, str] | None, optional): An optional mapping from class ID to label name. It's recommended to let the template infer this from the dataset. This attribute should only be used as a fallback if the dataset features are non-standard.
    • annotation_keys (AnnotationKeys, optional): A configuration object that specifies the dictionary keys for accessing annotation data within the dataset.
      • bbox (str, optional): The dictionary key for the bounding box annotations. Defaults to "bbox".
      • category (str, optional): The dictionary key for the category/class label annotations. Defaults to "category".
      • area (str, optional): The dictionary key for the bounding box area. If not provided, area will be calculated from the bbox. Defaults to "area".
    • validation_split_size (float, optional): The proportion of the dataset to reserve for validation. Defaults to 0.15.
    • mapping_args (DatasetMappingArgs, optional): Parameters for the dataset preprocessing step.
      • batch_size (int, optional): The batch size for applying transformations. A larger size can speed up preprocessing but requires more RAM. Defaults to 16.
      • num_proc (int, optional): The number of CPU processes to use for mapping. Defaults to 0 (no multiprocessing).
    • image_size (TrainingImageSize, optional): The target image size for image resizing.
      • width (int, optional): The target width for image resizing. Defaults to 640.
      • height (int, optional): The target height for image resizing. Defaults to 640.
    • training_args (TrainingArgs, optional): A nested configuration object for all Hugging Face Trainer hyperparameters. Refer to the official documentation for the full list of possible arguments.
    • save_dir (str, required): Path to the directory where the fine-tuned model will be saved.
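To give an intuition for the batch_size attribute of DFINEInference, the sketch below splits a list of inputs into consecutive batches. The iter_batches helper and the file names are hypothetical; the template's actual batching logic may differ.

```python
from itertools import islice


def iter_batches(items, batch_size=8):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical input: 20 image paths processed with the default batch size.
image_paths = [f"image{i}.jpg" for i in range(20)]
batch_lengths = [len(batch) for batch in iter_batches(image_paths, batch_size=8)]
print(batch_lengths)  # [8, 8, 4]
```

The last batch is simply smaller when the number of images is not a multiple of batch_size.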
📁 Supported Dataset Structure

To ensure compatibility and smooth training, the DFINETraining template relies on a specific dataset structure. This format is inspired by the widely used COCO dataset, making it easy to adapt many existing object detection datasets.

IMPORTANT: The DFINETraining template expects datasets to follow a specific nested (COCO-style) format. This ensures consistency and reliability during the data transformation process.

Each example in your dataset must contain at least two features:

  1. image: A PIL Image object.
  2. objects: A dictionary that acts as a container for all annotations related to the image.

The objects dictionary must contain parallel lists for the annotations. The keys for these lists are configurable via the annotation_keys attribute.

Example of a single dataset entry:

{
  'image': <PIL.Image object>,
  'objects': {
    'bbox': [[x, y, width, height], [x, y, width, height], ...],
    'category': [label_id_1, label_id_2, ...],
    'area': [area_1, area_2, ...]  # This is optional and will be calculated if not present
  }
}
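As a rough illustration of this structure, the sketch below checks that the annotation lists are parallel and fills in the area when it is missing. The normalize_objects helper is hypothetical and is not the template's preprocessing code; it assumes COCO-style boxes in [x, y, width, height] order, as in the entry above, so the area is width times height.

```python
def normalize_objects(objects, bbox_key="bbox", category_key="category", area_key="area"):
    """Validate parallel annotation lists and compute missing areas."""
    bboxes = objects[bbox_key]
    categories = objects[category_key]
    if len(bboxes) != len(categories):
        raise ValueError("bbox and category lists must have the same length")

    areas = objects.get(area_key)
    if areas is None:
        # Boxes are [x, y, width, height], so area = width * height.
        areas = [width * height for _, _, width, height in bboxes]
    elif len(areas) != len(bboxes):
        raise ValueError("area list must have the same length as bbox list")

    return {"bbox": bboxes, "category": categories, "area": areas}


# Hypothetical annotations for one image, without an 'area' list.
entry_objects = {"bbox": [[22, 34, 100, 150], [0, 0, 10, 20]], "category": [3, 1]}
normalized = normalize_objects(entry_objects)
print(normalized["area"])  # [15000, 200]
```

The key arguments mirror the idea behind annotation_keys: a dataset that stores boxes under a different name can be read by passing that name instead of the default.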
Preparing a Local Dataset

To load a local dataset of images, the files must be structured with a metadata.jsonl file, which is the standard method for the Hugging Face datasets library.

  1. The folder structure should be organized as follows:
my_dataset/
|--- train/
|   |--- image1.jpg
|   |--- image2.png
|   |--- metadata.jsonl
|--- validation/
    |--- image3.jpg
    |--- metadata.jsonl
  2. A metadata.jsonl file must be created. Each line in this file is a JSON object describing one image and its annotations.

Example line in train/metadata.jsonl:

{"file_name": "image1.jpg", "objects": {"bbox": [[22, 34, 100, 150]], "category": [3]}}
  3. The dataset can be loaded by providing the path to the root folder (my_dataset/). The template will automatically find and parse the metadata.jsonl files.
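
A metadata.jsonl file can be generated with a few lines of Python. The sketch below is only an illustration: the write_metadata helper and the annotation values are hypothetical, and a temporary directory stands in for the real dataset root so the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(split_dir, records):
    """Write one JSON object per line to <split_dir>/metadata.jsonl."""
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = split_dir / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return metadata_path


# Hypothetical annotations; file_name is relative to the split folder.
records = [
    {"file_name": "image1.jpg", "objects": {"bbox": [[22, 34, 100, 150]], "category": [3]}},
    {"file_name": "image2.png", "objects": {"bbox": [[5, 10, 60, 40]], "category": [1]}},
]

with tempfile.TemporaryDirectory() as root:
    path = write_metadata(Path(root) / "my_dataset" / "train", records)
    lines = path.read_text(encoding="utf-8").splitlines()

print(len(lines))
```

The same helper would be called once per split (train/ and validation/), each with the records for the images stored in that folder.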

For more detailed information on creating image datasets for object detection, refer to the official Hugging Face documentation.

Advanced Configuration

License Validation for Hub Datasets

For commercial safety, the DFINETraining template can validate that datasets from the Hugging Face Hub have a permissive license. Whether this check runs is controlled by an environment variable.

  • ALLOW_UNVETTED_DATASETS:
    • Default Behavior (True): By default, the license check is skipped. This is to provide a smooth experience for local development and testing.
    • Production Behavior (False): For production environments, this variable must be explicitly set to False to enforce the license validation and ensure only commercially safe datasets are used.

Example (for production):

export ALLOW_UNVETTED_DATASETS=False
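
The documented default can be pictured with a minimal sketch. How the package actually parses the variable is not specified here, so the unvetted_datasets_allowed helper and the set of accepted truthy strings are assumptions; only the variable name and the default of True come from the description above.

```python
import os


def unvetted_datasets_allowed(environ=None):
    """Return True when the license check would be skipped."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset variable behaves like "True", the documented default.
    raw = environ.get("ALLOW_UNVETTED_DATASETS", "True")
    return raw.strip().lower() in {"1", "true", "yes"}


print(unvetted_datasets_allowed({}))
print(unvetted_datasets_allowed({"ALLOW_UNVETTED_DATASETS": "False"}))
```

With the variable unset the check is skipped, and with the production export shown above it is enforced.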

[!TIP] Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.

For example, for DFINEInference use sinapsis info --example-template-config DFINEInference to produce an example config like:

agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: DFINEInference
  class_name: DFINEInference
  template_input: InputTemplate
  attributes:
    model_path: ustc-community/dfine-nano-coco
    model_cache_dir: '/path/to/sinapsis/cache'
    threshold: '`replace_me:<class ''float''>`'
    device: auto
    batch_size: 8

📚 Usage example

The following example demonstrates how to use the DFINEInference template for object detection. This setup processes a folder of images, runs inference using the D-FINE model, and saves the results, including detected bounding boxes.

Config
agent:
  name: dfine_inference
  description: "run inferences with D-FINE"

templates:
  - template_name: InputTemplate
    class_name: InputTemplate
    attributes: {}

  - template_name: FolderImageDatasetCV2
    class_name: FolderImageDatasetCV2
    template_input: InputTemplate
    attributes:
      data_dir: datasets/coco

  - template_name: DFINEInference
    class_name: DFINEInference
    template_input: FolderImageDatasetCV2
    attributes:
      model_path: ustc-community/dfine-small-coco
      batch_size: 16
      threshold: 0.5
      device: cuda

  - template_name: BBoxDrawer
    class_name: BBoxDrawer
    template_input: DFINEInference
    attributes:
      overwrite: true
      randomized_color: false

  - template_name: ImageSaver
    class_name: ImageSaver
    template_input: BBoxDrawer
    attributes:
      root_dir: datasets
      save_dir: output
      extension: png

This configuration defines an agent and a sequence of templates to run object detection with D-FINE.
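
The order in which the templates run follows from the template_input links. The sketch below reproduces that chain for the config above using plain Python dictionaries. It is only an illustration of the linking idea: the linear_order helper is hypothetical, handles only a single linear chain, and is not how sinapsis resolves its execution graph.

```python
def linear_order(templates):
    """Return template names ordered by following template_input links."""
    # Map each input name (None for the entry template) to the template fed by it.
    next_by_input = {t.get("template_input"): t["template_name"] for t in templates}
    order = []
    current = next_by_input.get(None)
    while current is not None:
        order.append(current)
        current = next_by_input.get(current)
    if len(order) != len(templates):
        raise ValueError("templates do not form a single linear chain")
    return order


# The same links as in the YAML config, in shuffled order.
templates = [
    {"template_name": "ImageSaver", "template_input": "BBoxDrawer"},
    {"template_name": "DFINEInference", "template_input": "FolderImageDatasetCV2"},
    {"template_name": "InputTemplate"},
    {"template_name": "BBoxDrawer", "template_input": "DFINEInference"},
    {"template_name": "FolderImageDatasetCV2", "template_input": "InputTemplate"},
]

print(" -> ".join(linear_order(templates)))
```

Images are read from the folder, passed through D-FINE, annotated with bounding boxes, and finally written to disk.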

[!IMPORTANT] The FolderImageDatasetCV2, BBoxDrawer and ImageSaver templates belong to the sinapsis-data-readers, sinapsis-data-visualization and sinapsis-data-writers packages, respectively. To run this example, make sure those packages are installed.

To run the config, use the CLI:

sinapsis run name_of_config.yml

🌐 Webapp

The webapps included in this project demonstrate the modularity of the templates, showcasing the capabilities of various object detection models for different tasks.

[!IMPORTANT] To run the app, you first need to clone this repository:

git clone git@github.com:Sinapsis-ai/sinapsis-object-detection.git
cd sinapsis-object-detection

[!NOTE] If you'd like to enable external app sharing in Gradio, export GRADIO_SHARE_APP=True

[!NOTE] The agent configuration can be changed through the AGENT_CONFIG_PATH env var. You can check the available configurations in each package's configs folder.

[!NOTE] When running the app with the D-FINE model, it defaults to a confidence threshold of 0.5, uses CUDA for acceleration, and employs the nano-sized D-FINE model trained on the COCO dataset. These settings can be customized by modifying the demo.yml file inside the packages/sinapsis_dfine/src/sinapsis_dfine/configs directory and restarting the webapp.

🐳 Docker

IMPORTANT: This docker image depends on the sinapsis-nvidia:base image. Please refer to the official sinapsis instructions to Build with Docker.

  1. Build the sinapsis-object-detection image:
docker compose -f docker/compose.yaml build
  2. Start the app container:
docker compose -f docker/compose_apps.yaml up sinapsis-dfine-gradio -d
  3. Check the status:
docker logs -f sinapsis-dfine-gradio
  4. The logs will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL may differ; check the output of the logs.

  5. To stop the app:
docker compose -f docker/compose_apps.yaml down
💻 UV

To run the webapp using the uv package manager, follow these steps:

  1. Create the virtual environment and sync the dependencies:
uv sync --frozen
  2. Install the sinapsis-object-detection package:
uv pip install sinapsis-object-detection[all] --extra-index-url https://pypi.sinapsis.tech
  3. Run the webapp:
uv run webapps/detection_demo.py
  4. The terminal will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL may differ; check the output of the terminal.

📙 Documentation

Documentation for this and other sinapsis packages is available on the sinapsis website.

Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
