Skip to main content

HuggingFace dataset wrappers for SAT-verified training data

Project description

satisfiable-ai

CI PyPI version License: CCUL v1.0 Python 3.10+

HuggingFace dataset wrappers for SAT-verified training data.

Every training sample passes through a SAT verification gate before entering any pipeline. Only logically consistent samples survive. Built on satisfaction-suffices.


Critical Path

Critical execution path


Read This First

Time What you get
30 seconds What this is and why it exists
60 seconds Why verified training data matters
2 minutes How to hook in a transformers architecture
10 minutes Full architecture + curriculum system
15.5 minutes Training on the four-verdict signal as frontier detection

⏱ 30 seconds

Models trained on logically inconsistent data learn inconsistent reasoning. This library filters every training sample through a SAT verification gate — the same gate from satisfaction-suffices — before it enters your pipeline. Only verified samples survive by default.


⏱ 60 seconds

Current safety training (RLHF, DPO) adjusts output distribution. It does not verify logical consistency. A model can learn to sound safe while reasoning incorrectly.

Training on SAT-verified data builds a model whose training distribution is structurally consistent. Every sample that reached it passed a proof. The model learns from a population that couldn't contradict itself.


⏱ 2 minutes — Hook into transformers

from satisfiable_ai import VerifiedDataset, default_schedule

# Filter your data to SAT-verified samples only
dataset = VerifiedDataset(
    samples=your_data,   # list of {"text": "..."} dicts
    domain="logic",      # "logic" | "code" | "math" | "proof"
    strict=True,         # True = only VERIFIED; False = exclude CONTRADICTION only
)

# Curriculum scheduling — staggers complexity across training phases
schedule = default_schedule()
# schedule.phases: Foundation → Execution → Domain → Consolidate
# Feed each phase to your HuggingFace Trainer as a separate training stage

for phase in schedule.phases:
    trainer.train_dataset = phase.sources
    trainer.train()

The four verdict classes are available as labels if you want to train on the full signal:

from satisfiable_ai import verify
result = verify("if A then B. A. therefore B.", domain="logic")
# result.verdict: "VERIFIED" | "CONTRADICTION" | "PARADOX" | "TIMEOUT"
# result.sat_ratio: float — fraction of constraint groups that resolved satisfiable

⏱ 10 minutes

Architecture:

satisfiable-ai
├── satisfiable_ai/
│   ├── dataset_builder/
│   │   ├── sources.py      — HuggingFace dataset registry
│   │   ├── curriculum.py   — phase scheduling (Foundation → Execution → Domain → Consolidate)
│   │   └── runtime.py      — VerifiedDataset / StreamingVerifiedDataset (requires torch)
│   ├── verifier/           — imported from satisfaction-suffices
│   │   ├── sat.py          — DPLL solver with WalkSAT fallback
│   │   ├── verify.py       — gate: extract → solve → aggregate → verdict
│   │   └── partial.py      — prefix feasibility for streaming use
│   └── __init__.py         — public API surface

Install:

pip install satisfiable-ai
# with torch:
pip install "satisfiable-ai[torch]"

Verdict reference:

Verdict Meaning Default behavior
VERIFIED Constraints satisfiable, gate opens Include in training data
CONTRADICTION Provably impossible Exclude (strict and non-strict)
PARADOX Each clause valid, conjunction unsatisfiable Exclude (use as evolution seed)
TIMEOUT Solver budget exhausted, verdict unknown Exclude in strict mode

⏱ 15.5 minutes — Training on the frontier

The four verdict classes are not just filters. They are training signal.

Verdict-conditioned training: label each sample with its verdict and train the model to predict verdict alongside output. A model that can predict whether its own output is VERIFIED, PARADOX, or TIMEOUT is building an internal model of its logical boundary.

Timeout as emergence detector: The 3-SAT phase transition concentrates solver timeouts at clause/variable ratio ~4.267. As a model's outputs grow more complex, timeout density in your verification pass may spike before capability emergence is otherwise measurable. Monitor result.sat_ratio distributions across training checkpoints.

Paradox curriculum: PARADOX samples — individually satisfiable clauses that are jointly unsatisfiable — represent the structural frontier of the model's current reasoning. Feed them back as the hardest curriculum stage. The model learns to recognize the shape of its own inconsistency boundaries.

For the full theoretical grounding, read the paper in satisfaction-suffices/paper/paper_01_submission.md.


License

AGPL-3.0-or-later

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

satisfiable_ai-0.1.0.tar.gz (26.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

satisfiable_ai-0.1.0-py3-none-any.whl (24.4 kB view details)

Uploaded Python 3

File details

Details for the file satisfiable_ai-0.1.0.tar.gz.

File metadata

  • Download URL: satisfiable_ai-0.1.0.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for satisfiable_ai-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a1ce2d2ebefbc80c262e420a4b2605f5a87f2e67f5b460b3821a19b64c150a8d
MD5 1419ba80cd4408b2c2139012d79ea6ec
BLAKE2b-256 f2ca0528098ec29ba44ebbbd10b31a7aab857e0236ba68e7948f5fd28f884623

See more details on using hashes here.

Provenance

The following attestation bundles were made for satisfiable_ai-0.1.0.tar.gz:

Publisher: publish.yml on TimeLordRaps/satisfiable-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file satisfiable_ai-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: satisfiable_ai-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for satisfiable_ai-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 91ee263f13e58e436c4f8888318878970aefc3cc422529eddf6974492f9e9fe0
MD5 ba9ba4e08fa4e413822d5c264c76913b
BLAKE2b-256 62d9f041cd46541b2480948580321dcdb52cd08af63bafa24abec79b02e7f8f6

See more details on using hashes here.

Provenance

The following attestation bundles were made for satisfiable_ai-0.1.0-py3-none-any.whl:

Publisher: publish.yml on TimeLordRaps/satisfiable-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page