Data bundle for conformal-clip examples and tests
conformal-clip-data
A companion data package providing benchmark datasets for the clip-conformal package.
Overview
This package bundles the simulated textile image dataset used in Megahed et al. (2025) to demonstrate conformal prediction with CLIP-based few-shot image classification in manufacturing quality control applications.
This is a data-only package designed to work seamlessly with the clip-conformal package (coming soon to PyPI), which provides the core implementation of conformal prediction methods for CLIP models. By separating data from implementation, we keep the main package lightweight while providing easy access to reproducible benchmark datasets.
Installation
Install directly from PyPI:
pip install conformal-clip-data
Or install from source:
git clone https://github.com/fmegahed/conformal-clip-data.git
cd conformal-clip-data
pip install -e .
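To confirm that either installation route succeeded, one option is to ask the standard library for the installed distribution's version. This is a minimal sketch; the helper name `installed_version` is ours for illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "conformal-clip-data") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string when the package is installed, otherwise None.
    print(installed_version())
```

A `None` result means the distribution is not visible to the current interpreter, which usually points to a different virtual environment than the one `pip` installed into.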
Quick Start
from conformal_clip_data import get_data_path
# Access the textile dataset
data_path = get_data_path("textile")
print(f"Dataset location: {data_path}")
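Once the dataset location is known, a quick way to see what it contains is to count the image files in each folder. The sketch below assumes that `get_data_path("textile")` returns a path-like value pointing to a directory; the folder layout and image file format are not documented here, so the hypothetical helper `count_images_by_folder` simply walks whatever it finds and matches a guessed set of extensions.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; the README does not state the file format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def count_images_by_folder(root) -> dict:
    """Count image files under `root`, keyed by parent folder relative to `root`."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.relative_to(root).as_posix()] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    try:
        from conformal_clip_data import get_data_path
    except ImportError:
        print("conformal-clip-data is not installed")
    else:
        # Assumption: the returned value can be passed to pathlib.Path.
        for folder, n in count_images_by_folder(get_data_path("textile")).items():
            print(f"{folder}: {n}")
```

Files that sit directly in the root directory are reported under the key `"."`.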
Dataset Provenance
These images were originally generated using the R script below and were previously released under an MIT License in our repository:
- Image generation script: https://raw.githubusercontent.com/fmegahed/qe_genai/refs/heads/main/data/textile_images/extract_textile_images_from_r_textile_pkg.R
- Original dataset release: https://github.com/fmegahed/qe_genai/tree/main/data/textile_images
Dataset Summary
To systematically evaluate CLIP's performance on stochastic textured surface (STS) image classification, we used the spc4sts R package to create a controlled dataset of simulated textile fabric textures. This approach allowed us to precisely model both nominal and defective weave structures and to control defect type and severity.
Our dataset contains:
| Class | Description | Count |
|---|---|---|
| Nominal | Standard textile weave patterns | 1,000 |
| Local defects | Localized disruptions in the weave | 500 |
| Global defects | Systematic shifts in weave parameters | 500 |
Each image is 250 × 250 px, generated using the parameters recommended by spc4sts:
- Nominal images: Spatial autoregressive parameters ϕ₁ = 0.6, ϕ₂ = 0.35
- Global defects: Both parameters reduced by 5%
- Local defects: Generated using the package's defect-insertion functions
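As a worked example of the global-defect setting, the short sketch below computes the shifted parameters. It assumes that "reduced by 5%" means a relative reduction (each parameter multiplied by 0.95); the helper name `reduce_parameters` is hypothetical and does not come from spc4sts or this package.

```python
# Nominal spatial autoregressive parameters, as stated above.
PHI_NOMINAL = (0.6, 0.35)

# Assumption: "reduced by 5%" is a relative reduction, i.e. multiply by 0.95.
REDUCTION = 0.05


def reduce_parameters(phis, reduction=REDUCTION):
    """Scale each parameter down by the given relative reduction."""
    return tuple(round(phi * (1.0 - reduction), 6) for phi in phis)


phi_global = reduce_parameters(PHI_NOMINAL)
print(phi_global)
```

Under that reading, the global-defect images correspond to ϕ₁ = 0.57 and ϕ₂ = 0.3325.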
Relationship with clip-conformal
This data package is designed as a companion to the clip-conformal package, which will be released to PyPI shortly. The separation of concerns provides several benefits:
- Lightweight installation: The clip-conformal package remains small and fast to install
- Reproducibility: Benchmark datasets are versioned and distributed consistently
- Extensibility: Additional datasets can be added without modifying the core package
- Optional usage: Users can work with clip-conformal using their own data without downloading benchmark datasets
For the full implementation of conformal prediction methods for CLIP models and complete examples using this dataset, please install the clip-conformal package (coming soon).
Citation
If you use this dataset in your research, please cite:
@misc{megahed2025adaptingopenaisclipmodel,
title={Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples},
author={Fadel M. Megahed and Ying-Ju Chen and Bianca Maria Colosimo and Marco Luigi Giuseppe Grasso and L. Allison Jones-Farmer and Sven Knoth and Hongyue Sun and Inez Zwetsloot},
year={2025},
eprint={2501.12596},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.12596},
}
And the original spc4sts package used to generate the textile images:
@article{bui2020spc4sts,
title={spc4sts: Statistical process control for stochastic textured surfaces in R},
author={Bui, Anh Tuan and Apley, Daniel W},
journal={Journal of Quality Technology},
volume={53},
number={3},
pages={219--242},
year={2020},
doi={10.1080/00224065.2019.1707730}
}
License
These images were generated by the authors and are released under the MIT License. See LICENSE for details.