Structural analysis for integers — classify, scan, compare, and track number structure using a shared label language
primehelix
primehelix shows how structural constraints reshape integer distributions — beyond what naive prime-counting predicts.
Every integer receives a compact structure label that encodes classification, geometric balance, and residue-family membership in one token, for example semiprime | lopsided | mod4_1x3. Those labels are the common currency across the commands: classify one number, scan a million, compare two ranges, track trends over time.
Findings
All measurements below come from scanning [1, 1 000 000) unless a 10M range is stated. Every command shown is fully reproducible.
At 1M scale, ~73% of semiprimes are lopsided; at 10M that rises to ~79%. Balanced (RSA-like) semiprimes make up 0.80% at 1M and fall below 0.7% at 10M. Restricting to lopsided semiprimes lifts the even-involved share from 19.8% to 27.0%. The skew strengthens, rather than washing out, as the range grows.
Lopsided semiprimes dominate — and grow more dominant with range
In [1, 1M), semiprimes break into three balance tiers:
| Balance tier | Share |
|---|---|
| lopsided (factors differ by > 8 bits) | 73.2% |
| moderate | 25.9% |
| balanced (RSA-like — factors nearly equal bit-length) | 0.80% |
Balanced semiprimes account for about 1 in 125. The bias compounds: at 10M scale the lopsided share reaches 78.5% and the balanced share falls to 0.66%. As the range grows, lopsided pairs gain share and moderate ones shrink, in each mod4 residue family shown below:
| Structure | delta [1,500k) → [500k,1M) | delta [1,5M) → [5M,10M) |
|---|---|---|
| semiprime \| lopsided \| mod4_1x3 | +2.69% | +1.83% |
| semiprime \| moderate \| mod4_1x3 | −2.05% | −1.39% |
| semiprime \| lopsided \| mod4_3x3 | +1.51% | +0.89% |
| semiprime \| moderate \| mod4_3x3 | −1.39% | −0.82% |
The mechanism: small primes (2, 3, 5, 7, …) are reused repeatedly as the smaller factor of larger and larger semiprimes, so the bit-gap widens as the larger factor grows. The effect is self-reinforcing and shows no sign of saturating in the ranges measured (up to 10M).
The lopsided constraint shifts residue families
Filtering to lopsided semiprimes measurably shifts the mod4 pair distribution, and in a predictable direction:
| Mod4 pair | All semiprimes | Lopsided only | Shift |
|---|---|---|---|
| mod4_1x3 (mixed families) | 40.0% | 36.4% | −3.6 pp |
| mod4_3x3 (both gaussian) | 23.7% | 22.9% | −0.9 pp |
| mod4_1x1 (both pythagorean) | 16.4% | 13.7% | −2.7 pp |
| even-involved (factor of 2) | 19.8% | 27.0% | +7.2 pp |
The lopsided bucket absorbs nearly all 2×p semiprimes: 2 paired with any prime more than 8 bits longer is always lopsided, and that covers almost every 2×p in the range. This inflates the even-involved share and compresses every odd pair class.
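To make the bucketing concrete, here is a small sketch of how a factor pair could be assigned to the four rows of the table. It is an illustration under assumptions: the helper names are hypothetical, the "even-involved" string is taken from the table rather than from primehelix's label output, and lopsidedness is taken to mean a bit-length difference greater than 8.

```python
def mod4_pair(p: int, q: int) -> str:
    """Bucket a semiprime p * q (p, q prime) by its factors' residues mod 4."""
    if p == 2 or q == 2:
        return "even-involved"
    a, b = sorted((p % 4, q % 4))  # odd primes are 1 or 3 mod 4
    return f"mod4_{a}x{b}"


def is_lopsided(p: int, q: int, threshold: int = 8) -> bool:
    """Assumed rule: bit lengths differ by more than `threshold` bits."""
    return abs(p.bit_length() - q.bit_length()) > threshold


for p, q in [(13, 100003), (2, 1031), (2, 509), (3, 7)]:
    print(p, q, mod4_pair(p, q), "lopsided" if is_lopsided(p, q) else "not lopsided")
```

Under this rule 2 has bit length 2, so 2×p becomes lopsided as soon as p needs 11 bits or more, which is why the even-involved row grows when the filter is applied.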
Primes split evenly by residue family
Among 78,498 primes in [1, 1M): 50.09% gaussian (p ≡ 3 mod 4), 49.91% pythagorean (p ≡ 1 mod 4). The near-perfect symmetry is consistent with Dirichlet's theorem and stable across ranges.
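The split can be reproduced with a plain sieve. This is a standalone sketch rather than primehelix code, and the function names are hypothetical. Note that the prime 2 belongs to neither family, so it is excluded from both counts here.

```python
def primes_below(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p with p < limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p * p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [n for n in range(limit) if sieve[n]]


def mod4_split(limit: int) -> dict[int, int]:
    """Count odd primes below limit by residue class mod 4."""
    counts = {1: 0, 3: 0}
    for p in primes_below(limit):
        if p != 2:
            counts[p % 4] += 1
    return counts


counts = mod4_split(1_000_000)
odd_total = counts[1] + counts[3]
print(f"p = 1 mod 4 (pythagorean): {counts[1]} ({counts[1] / odd_total:.2%})")
print(f"p = 3 mod 4 (gaussian):    {counts[3]} ({counts[3] / odd_total:.2%})")
```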
primehelix structure-scan --start 1 --stop 1000000 --json
primehelix compare-ranges --a-start 1 --a-stop 500000 --b-start 500000 --b-stop 1000000 \
--only-classification semiprime --top-delta 6 --json
Install
pip install primehelix # core: classify, factor, scan, compare
pip install 'primehelix[plot]' # add matplotlib for --plot
On Debian or Ubuntu Linux, install GMP first for full performance (gmpy2):
sudo apt install libgmp-dev libmpfr-dev libmpc-dev
pip install primehelix
Commands
Core workflow: classify one number → scan a range → compare two ranges → track structure over time.
classify — inspect one integer
primehelix classify 1300039
primehelix classify 1300039 --helix # ASCII double-helix visualization
primehelix classify 1300039 --coil # geometric footprint metrics
primehelix classify 1300039 --residue # full residue profile
primehelix classify 1300039 --json # machine-readable output
--helix output (1300039 = 13 × 100003, bit_gap=13):
1300039 → semiprime
Helix (p=13, q=100003)
balance=87.696, bit_gap=13
+-------------------*
+ *
*---------------------+
* +
+~~~~~~~*
+
--json output:
{
  "command": "classify",
  "n": 1300039,
  "classification": "semiprime",
  "factors": {"13": 1, "100003": 1},
  "factorization": "13 * 100003",
  "method": "trial",
  "complete": true,
  "structure": "semiprime | lopsided | mod4_1x3",
  "residue": {
    "semiprime_mod4_pair": "1x3",
    "semiprime_mod4_note": "mixed 1 mod 4 and 3 mod 4 factor families",
    "factor_families_mod4": ["pythagorean", "gaussian"]
  }
}
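A consumer of this output may want to sanity-check it before trusting it downstream. The sketch below parses the sample payload shown above with the standard library and confirms that the reported factors multiply back to n. It relies only on the fields visible in that sample; the helper name is hypothetical.

```python
import json
from math import prod

# The sample payload from above, trimmed to the fields used here.
payload = json.loads("""
{
  "command": "classify",
  "n": 1300039,
  "classification": "semiprime",
  "factors": {"13": 1, "100003": 1},
  "complete": true,
  "structure": "semiprime | lopsided | mod4_1x3"
}
""")


def factors_multiply_to_n(result: dict) -> bool:
    """JSON object keys are strings, so convert each prime back to int."""
    product = prod(int(p) ** e for p, e in result["factors"].items())
    return product == result["n"]


if payload["complete"] and factors_multiply_to_n(payload):
    print(payload["n"], "->", payload["structure"])
```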
factor — full factoring pipeline
primehelix factor 2147483646
primehelix factor 2147483646 --verbose # show pipeline steps
primehelix factor 2147483646 --json --verbose
Pipeline: trial division → Pollard p−1 → Williams p+1 → Pollard Rho (Brent) → Lenstra ECM → Quadratic Sieve
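Primality testing uses Baillie–PSW, which has no pseudoprimes below 2⁶⁴ and is therefore deterministic for all 64-bit integers. complete: true means every factor has passed that primality test, which is a proof for factors below 2⁶⁴.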
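The staged idea behind the pipeline (cheap methods first, heavier ones only on what is left) can be sketched in a few lines. This is a simplified illustration, not primehelix's code: it keeps only two stages (trial division, then Pollard Rho with Floyd cycle detection rather than the Brent variant), and it substitutes fixed-base Miller-Rabin, which is deterministic below 2⁶⁴, for Baillie–PSW. The p−1, p+1, ECM, and Quadratic Sieve stages are omitted, and all names are hypothetical.

```python
from math import gcd

_MR_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime(n: int) -> bool:
    """Miller-Rabin with fixed bases; deterministic for n < 2**64."""
    if n < 2:
        return False
    for p in _MR_BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _MR_BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def pollard_rho(n: int) -> int:
    """Return a nontrivial factor of an odd composite n."""
    for c in range(1, 100):
        x = y = 2
        d = 1
        while d == 1:
            x = (x * x + c) % n          # tortoise: one step
            y = (y * y + c) % n          # hare: two steps
            y = (y * y + c) % n
            d = gcd(abs(x - y), n)
        if d != n:                       # d == n means this c failed
            return d
    raise ValueError("no factor found")


def factor(n: int, trial_bound: int = 1000) -> dict[int, int]:
    """Stage 1: trial division. Stage 2: Pollard Rho on the cofactor."""
    factors: dict[int, int] = {}
    p = 2
    while p <= trial_bound and p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    stack = [n] if n > 1 else []
    while stack:
        m = stack.pop()
        if is_prime(m):
            factors[m] = factors.get(m, 0) + 1
        else:
            d = pollard_rho(m)
            stack += [d, m // d]
    return factors


print(factor(2147483646))
print(factor(1300039))
```

Both example inputs are resolved by the trial-division stage alone; the Rho stage only runs when the remaining cofactor is composite with no factor at or below the trial bound.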
structure-scan — count structure labels across a range
primehelix structure-scan --start 1 --stop 1000000
primehelix structure-scan --start 1 --stop 1000000 --only-classification semiprime
primehelix structure-scan --start 1 --stop 1000000 --profile # show method distribution
primehelix structure-scan --start 1 --stop 1000000 --json
Scans every integer in [start, stop), assigns a structure label, returns counts, histogram, and Shannon entropy of the distribution. Progress shown on stderr for ranges over 10,000.
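For readers unfamiliar with the entropy figure, the following sketch computes Shannon entropy in bits from a table of label counts. It is a generic implementation of the standard formula, not a copy of primehelix's, and the counts used are toy values chosen only to show the two boundary cases.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts: Counter) -> float:
    """H = sum over labels of p * log2(1 / p), with p = count / total."""
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values() if c)


single = Counter({"composite": 10})                     # one label only
uniform = Counter({"a": 5, "b": 5, "c": 5, "d": 5})     # four equal labels

print(shannon_entropy(single))
print(shannon_entropy(uniform))
```

A single label gives zero entropy, and k equally frequent labels give log₂(k) bits, which is the upper bound for a distribution over k labels.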
compare-ranges — diff structure distributions
primehelix compare-ranges \
--a-start 1 --a-stop 500000 \
--b-start 500000 --b-stop 1000000 \
--only-classification semiprime --top-delta 6
Shows which structure labels gained or lost share between two ranges, with delta, ratio, and per-range entropy.
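The comparison itself is simple arithmetic on shares. As a rough sketch (hypothetical helper, toy counts, and not the library's compare_summaries), the delta for each label is its share in range B minus its share in range A, expressed in percentage points:

```python
from collections import Counter


def share_deltas(a: Counter, b: Counter) -> list[tuple[str, float]]:
    """Per-label share change from a to b in percentage points, largest first."""
    total_a, total_b = sum(a.values()), sum(b.values())
    rows = []
    for label in set(a) | set(b):
        share_a = 100 * a[label] / total_a   # Counter returns 0 for missing labels
        share_b = 100 * b[label] / total_b
        rows.append((label, share_b - share_a))
    return sorted(rows, key=lambda row: -abs(row[1]))


# Toy counts for illustration only; these are not measured values.
range_a = Counter({"x": 50, "y": 50})
range_b = Counter({"x": 75, "y": 25})

for label, delta in share_deltas(range_a, range_b):
    print(f"{delta:+.2f}pp  {label}")
```

Because shares in each range sum to 100%, the deltas always sum to zero: any label that gains share does so at the expense of others.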
structure-time-series — track structural trends over sliding windows
primehelix structure-time-series \
--start 1 --stop 1000000 \
--window 100000 --step 100000 \
--only-classification semiprime \
--top 5 \
--plot semiprime_ts.png
Divides [start, stop) into windows, computes structure distributions in each, and plots the top-N label series as a line chart. Omit --plot for a text summary.
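The windowing step can be pictured as a generator of half-open intervals. The sketch below is one plausible reading of --window and --step, with a hypothetical function name; in particular, clipping the final window at stop is an assumption, and primehelix may instead drop or handle a partial last window differently.

```python
from typing import Iterator


def windows(start: int, stop: int, window: int, step: int) -> Iterator[tuple[int, int]]:
    """Yield half-open [lo, hi) windows covering [start, stop)."""
    lo = start
    while lo < stop:
        yield lo, min(lo + window, stop)  # assumption: clip the last window
        lo += step


for lo, hi in windows(1, 1_000_000, 100_000, 100_000):
    print(f"[{lo}, {hi})")
```

With step equal to window the intervals tile the range without overlap; a smaller step produces overlapping windows, which smooths the resulting series.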
Python API
All analysis functions work as a library — no CLI required. Results are typed dataclasses.
from primehelix.analysis import scan_range, compare_summaries, build_time_series

# Scan a range and inspect label counts
scan = scan_range(1, 100_000)
print(scan.total)                  # total integers counted
print(scan.counts.most_common(5))  # top 5 structure labels

# Compare two ranges: see which labels gained or lost share
s1 = scan_range(1, 500_000, only_classification="semiprime")
s2 = scan_range(500_000, 1_000_000, only_classification="semiprime")
rows = compare_summaries(s1, s2)
for row in sorted(rows, key=lambda r: -abs(r.delta))[:5]:
    print(f"{row.delta:+.2f}pp {row.structure}")

# Track structure trends across windows
ts = build_time_series(1, 1_000_000, window=100_000, step=100_000,
                       only_classification="semiprime")
for label in ts.top_labels:
    print(label, ts.series_map[label])

# Export results directly from the API
import json
with open("scan.json", "w") as f:
    json.dump({"start": 1, "stop": 100_000, **scan.to_json_dict()}, f, indent=2)
Use detail="classification" for fast classification-only counts (no geometry, ~10% faster):
scan = scan_range(1, 10_000_000, only_classification="prime", detail="classification")
print(scan.total) # prime count in [1, 10M) — no residue family breakdown
Structure labels
Every integer gets a label of up to three parts joined by |:
semiprime | lopsided | mod4_1x3
prime | gaussian
composite
invalid
| Part | What it encodes |
|---|---|
| Classification | prime, semiprime, composite, invalid |
| Balance | balanced, moderate, lopsided — bit-length gap between factors; semiprimes only |
| Residue family | mod4_1x3, mod4_3x3, pythagorean, gaussian, etc. |
Labels are stable strings — safe to grep, aggregate, diff between ranges, and use as dict keys across runs. The grammar is fixed: classification first, balance second (when present), residue family last.
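Because the grammar is fixed, a label can be split back into its parts with a few lines of code. The parser below is a sketch based only on the grammar and the balance tier names given in the table above; the Label type and parse_label function are hypothetical and not part of primehelix.

```python
from typing import NamedTuple, Optional

BALANCE_TIERS = {"balanced", "moderate", "lopsided"}


class Label(NamedTuple):
    classification: str
    balance: Optional[str]
    residue: Optional[str]


def parse_label(label: str) -> Label:
    """Split 'classification [| balance] [| residue]' into named parts."""
    parts = [part.strip() for part in label.split("|")]
    classification, rest = parts[0], parts[1:]
    # Balance is optional, so recognise it by name rather than by position.
    balance = rest.pop(0) if rest and rest[0] in BALANCE_TIERS else None
    residue = rest.pop(0) if rest else None
    return Label(classification, balance, residue)


for text in ["semiprime | lopsided | mod4_1x3", "prime | gaussian", "composite"]:
    print(parse_label(text))
```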
JSON schema
All commands support --json. The schema is stable across patch versions.
classify and factor:
| Field | Present in | Notes |
|---|---|---|
| command | both | "classify" or "factor" |
| n | both | integer |
| classification | classify | "prime", "semiprime", "composite", "invalid" |
| factors | both | {"p": exponent, ...} |
| prime_factors | both | flat list, e.g. [3, 3, 7] for 3²×7 |
| factorization | both | "2 * 3^2 * 7" (ASCII) |
| method | both | last algorithm used |
| elapsed_ms | both | wall time in milliseconds |
| complete | both | true if all factors proven prime |
| structure | classify | compact label string |
| steps | factor with --verbose | pipeline step trail; [] otherwise |
| coil | classify with --coil | geometric footprint + insight string |
| residue | classify | mod4/mod6/mod30 profile |
structure-scan and compare-ranges:
| Field | Notes |
|---|---|
| entropy | Shannon entropy (bits) of label distribution: 0 = single label, log₂(k) = uniform over k labels |
| a.entropy, b.entropy | per-range entropy in compare-ranges |
| entropy_delta | b.entropy − a.entropy; positive = B more structurally diverse |
| methods | factorization method counts (structure-scan with --profile) |
Breaking changes will be documented in release notes with a minor version bump.
Guarantees and limits
Deterministic: Structure labels are computed from factorization alone — identical input always produces identical output. Baillie–PSW is deterministic for all integers up to 2⁶⁴.
May time out: The factoring pipeline has a configurable budget (--budget, default 10 000 ms). Hard numbers may return complete: false with a partial factorization.
Stable and scriptable: classify, structure-scan, compare-ranges, and structure-time-series with --json produce output safe to pipe, grep, and aggregate across runs.
Experimental: --coil and --helix geometry output is under active development. Coordinate values and balance thresholds may change between minor versions. Do not parse coil.insight strings programmatically.
Develop and test
git clone https://github.com/onojk/primehelix.git
cd primehelix
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v
Architecture
primehelix/
├── cli.py — 5 Click commands + scan helpers
├── core/
│ ├── primes.py — Baillie-PSW (Miller-Rabin + strong Lucas PRP)
│ ├── factor.py — Pipeline orchestration
│ ├── rho.py — Pollard Rho (Brent, batch-GCD)
│ ├── pm1.py — Pollard p−1 / Williams p+1
│ ├── ecm.py — Lenstra ECM (pure Python + gmpy2)
│ └── qs.py — Quadratic Sieve (GF(2) left nullspace)
├── geometry/
│ ├── coil.py — Conical helix model, CoilFootprint, CoilBalance
│ ├── residue.py — Mod4/mod6/mod30 residue profiling
│ ├── bitbucket.py — Bit-bucket placement and density
│ └── tangent.py — Equal/tangent/ideal split diagnostics
├── display/
│ ├── output.py — Rich terminal panels and tables
│ ├── json_output.py — JSON schema, structure_summary label builder
│ ├── plots.py — Matplotlib time-series line charts
│ └── ascii_helix.py — ASCII double-helix renderer
└── scan/
└── wheel.py — Mod-210 wheel scanner, resumable gzip CSV
primehelix consolidates five research repositories: geom_factor (Quadratic Sieve, geometric model), rsacrack (factoring pipeline, coil classifier), ECC-Tools (ECM reference), Cprime (GMP-backed CLI), onojk123 (wheel scanner, tangent prime test).
Integer structure is not uniformly distributed — it is shaped by reusable factor patterns and structural constraints that produce stable, predictable statistical behavior. primehelix makes that behavior visible and measurable.
Author
Jonathan Kendall — https://github.com/onojk
License
MIT