HuggingFace dataset wrappers for SAT-verified training data
Project description
satisfiable-ai
HuggingFace dataset wrappers for SAT-verified training data.
Every training sample passes through a SAT verification gate before entering any pipeline. Only logically consistent samples survive. Built on satisfaction-suffices.
Critical Path
Read This First
| Time | What you get |
|---|---|
| 30 seconds | What this is and why it exists |
| 60 seconds | Why verified training data matters |
| 2 minutes | How to hook in a transformers architecture |
| 10 minutes | Full architecture + curriculum system |
| 15.5 minutes | Training on the four-verdict signal as frontier detection |
⏱ 30 seconds
Models trained on logically inconsistent data learn inconsistent reasoning. This library filters every training sample through a SAT verification gate — the same gate from satisfaction-suffices — before it enters your pipeline. Only verified samples survive by default.
⏱ 60 seconds
Current safety training (RLHF, DPO) adjusts output distribution. It does not verify logical consistency. A model can learn to sound safe while reasoning incorrectly.
Training on SAT-verified data builds a model whose training distribution is structurally consistent. Every sample that reached it passed a proof. The model learns from a population that couldn't contradict itself.
⏱ 2 minutes — Hook into transformers
from satisfiable_ai import VerifiedDataset, default_schedule
# Filter your data to SAT-verified samples only
dataset = VerifiedDataset(
samples=your_data, # list of {"text": "..."} dicts
domain="logic", # "logic" | "code" | "math" | "proof"
strict=True, # True = only VERIFIED; False = exclude CONTRADICTION only
)
# Curriculum scheduling — staggers complexity across training phases
schedule = default_schedule()
# schedule.phases: Foundation → Execution → Domain → Consolidate
# Feed each phase to your HuggingFace Trainer as a separate training stage
for phase in schedule.phases:
trainer.train_dataset = phase.sources
trainer.train()
The four verdict classes are available as labels if you want to train on the full signal:
from satisfiable_ai import verify
result = verify("if A then B. A. therefore B.", domain="logic")
# result.verdict: "VERIFIED" | "CONTRADICTION" | "PARADOX" | "TIMEOUT"
# result.sat_ratio: float — fraction of constraint groups that resolved satisfiable
⏱ 10 minutes
Architecture:
satisfiable-ai
├── satisfiable_ai/
│ ├── dataset_builder/
│ │ ├── sources.py — HuggingFace dataset registry
│ │ ├── curriculum.py — phase scheduling (Foundation → Execution → Domain → Consolidate)
│ │ └── runtime.py — VerifiedDataset / StreamingVerifiedDataset (requires torch)
│ ├── verifier/ — imported from satisfaction-suffices
│ │ ├── sat.py — DPLL solver with WalkSAT fallback
│ │ ├── verify.py — gate: extract → solve → aggregate → verdict
│ │ └── partial.py — prefix feasibility for streaming use
│ └── __init__.py — public API surface
Install:
pip install satisfiable-ai
# with torch:
pip install "satisfiable-ai[torch]"
Verdict reference:
| Verdict | Meaning | Default behavior |
|---|---|---|
| VERIFIED | Constraints satisfiable, gate opens | Include in training data |
| CONTRADICTION | Provably impossible | Exclude (strict and non-strict) |
| PARADOX | Each clause valid, conjunction unsatisfiable | Exclude (use as evolution seed) |
| TIMEOUT | Solver budget exhausted, verdict unknown | Exclude in strict mode |
⏱ 15.5 minutes — Training on the frontier
The four verdict classes are not just filters. They are training signal.
Verdict-conditioned training: label each sample with its verdict and train the model to predict verdict alongside output. A model that can predict whether its own output is VERIFIED, PARADOX, or TIMEOUT is building an internal model of its logical boundary.
Timeout as emergence detector: The 3-SAT phase transition concentrates solver timeouts at clause/variable ratio ~4.267. As a model's outputs grow more complex, timeout density in your verification pass may spike before capability emergence is otherwise measurable. Monitor result.sat_ratio distributions across training checkpoints.
Paradox curriculum: PARADOX samples — individually satisfiable clauses that are jointly unsatisfiable — represent the structural frontier of the model's current reasoning. Feed them back as the hardest curriculum stage. The model learns to recognize the shape of its own inconsistency boundaries.
For the full theoretical grounding, read the paper in satisfaction-suffices/paper/paper_01_submission.md.
License
AGPL-3.0-or-later
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file satisfiable_ai-0.1.0.tar.gz.
File metadata
- Download URL: satisfiable_ai-0.1.0.tar.gz
- Upload date:
- Size: 26.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a1ce2d2ebefbc80c262e420a4b2605f5a87f2e67f5b460b3821a19b64c150a8d
|
|
| MD5 |
1419ba80cd4408b2c2139012d79ea6ec
|
|
| BLAKE2b-256 |
f2ca0528098ec29ba44ebbbd10b31a7aab857e0236ba68e7948f5fd28f884623
|
Provenance
The following attestation bundles were made for satisfiable_ai-0.1.0.tar.gz:
Publisher:
publish.yml on TimeLordRaps/satisfiable-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
satisfiable_ai-0.1.0.tar.gz -
Subject digest:
a1ce2d2ebefbc80c262e420a4b2605f5a87f2e67f5b460b3821a19b64c150a8d - Sigstore transparency entry: 1069297767
- Sigstore integration time:
-
Permalink:
TimeLordRaps/satisfiable-ai@81a368565cb5c98c69d2cc60fad567eae9b8325f -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/TimeLordRaps
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@81a368565cb5c98c69d2cc60fad567eae9b8325f -
Trigger Event:
push
-
Statement type:
File details
Details for the file satisfiable_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: satisfiable_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 24.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
91ee263f13e58e436c4f8888318878970aefc3cc422529eddf6974492f9e9fe0
|
|
| MD5 |
ba9ba4e08fa4e413822d5c264c76913b
|
|
| BLAKE2b-256 |
62d9f041cd46541b2480948580321dcdb52cd08af63bafa24abec79b02e7f8f6
|
Provenance
The following attestation bundles were made for satisfiable_ai-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on TimeLordRaps/satisfiable-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
satisfiable_ai-0.1.0-py3-none-any.whl -
Subject digest:
91ee263f13e58e436c4f8888318878970aefc3cc422529eddf6974492f9e9fe0 - Sigstore transparency entry: 1069297771
- Sigstore integration time:
-
Permalink:
TimeLordRaps/satisfiable-ai@81a368565cb5c98c69d2cc60fad567eae9b8325f -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/TimeLordRaps
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@81a368565cb5c98c69d2cc60fad567eae9b8325f -
Trigger Event:
push
-
Statement type: