Structural analysis for integers — classify, scan, compare, and track number structure using a shared label language
primehelix
Structural analysis for integers — explore how numbers are built, not just what they are.
Most number theory tools answer what: is this prime, what are the factors. primehelix also answers how: what arithmetic family does each factor belong to, how balanced is the factor pair, where does the number sit on a conical helix, and how do these structural patterns shift across large ranges.
Every integer gets a compact structure label — semiprime | lopsided | mod4_1x3, prime | gaussian, composite — that encodes classification, geometric balance, and residue-family membership into one scannable token. Those labels are the spine of the tool: classify one number, scan a million, compare two ranges, plot trends over time.
Quick start
git clone https://github.com/onojk/primehelix.git
cd primehelix
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
primehelix classify 1300039 --helix
primehelix classify 1300039 --json
primehelix structure-scan --start 1 --stop 100000
primehelix compare-ranges --a-start 1 --a-stop 50000 --b-start 50000 --b-stop 100000 --top-delta 6
On Linux, install GMP before pip install:
sudo apt install libgmp-dev libmpfr-dev libmpc-dev
Core concepts
Structure labels
Every integer is assigned a structure label composed of up to three parts joined by |:
semiprime | lopsided | mod4_1x3
prime | gaussian
composite
invalid
| Part | What it encodes |
|---|---|
| Classification | prime, semiprime, composite, invalid |
| Balance | balanced, moderate, lopsided — based on bit-length gap between factors |
| Residue family | mod4_1x3, mod4_3x3, pythagorean, gaussian, etc. |
These labels are stable strings — safe to grep, aggregate, diff between ranges, and track over time.
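Because the parts are joined by a fixed separator, a label can be split back into its fields in a few lines. The sketch below is not part of primehelix: the `parse_label` helper and its dictionary keys are hypothetical, and it assumes the balance part is always one of the three tier names listed in the table above.

```python
BALANCE_TIERS = {"balanced", "moderate", "lopsided"}

def parse_label(label: str) -> dict:
    """Split a structure label into classification, balance, and residue parts."""
    parts = [part.strip() for part in label.split("|")]
    parsed = {"classification": parts[0], "balance": None, "residue": None}
    for part in parts[1:]:
        # Anything after the classification is either a balance tier
        # or a residue-family name.
        if part in BALANCE_TIERS:
            parsed["balance"] = part
        else:
            parsed["residue"] = part
    return parsed

print(parse_label("semiprime | lopsided | mod4_1x3"))
print(parse_label("prime | gaussian"))
print(parse_label("composite"))
```

Missing parts come back as `None`, so a label such as `composite` still yields a dictionary with the same keys.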
Residue families
Odd primes split into two families by their residue mod 4:
- Pythagorean primes (p ≡ 1 mod 4) — expressible as a sum of two squares
- Gaussian primes (p ≡ 3 mod 4) — remain prime in the Gaussian integers
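Semiprimes inherit a mod4 pair from their two factors: 1x1, 1x3, or 3x3 when both factors are odd. Semiprimes with a factor of 2 fall outside these three pairs. Because Dirichlet's theorem splits the primes asymptotically evenly between the two residue classes, the pair distribution drifts predictably as the range grows, and shifts between ranges are measurable.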
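The family names and the pair notation can be reproduced with plain modular arithmetic. This is a minimal sketch rather than the code in primehelix; the function names `mod4_family` and `mod4_pair` are made up for illustration, and the inputs are assumed to be odd primes (primality is not checked).

```python
def mod4_family(p: int) -> str:
    """Name the residue family of an odd prime p."""
    if p % 4 == 1:
        return "pythagorean"
    if p % 4 == 3:
        return "gaussian"
    raise ValueError("only odd primes belong to a mod 4 family")

def mod4_pair(p: int, q: int) -> str:
    """Return the mod 4 pair of the odd semiprime p*q, smaller residue first."""
    a, b = sorted((p % 4, q % 4))
    if (a, b) not in {(1, 1), (1, 3), (3, 3)}:
        raise ValueError("both factors must be odd")
    return f"{a}x{b}"

print(mod4_family(13), mod4_family(100003), mod4_pair(13, 100003))
# → pythagorean gaussian 1x3
```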
Conical helix geometry
Integers are mapped to points on a conical helix in 3D:
r(n) = r₀ + α·n    (radius grows with n)
θ(n) = 2π·n / L    (angular position)
z(n) = β·n         (vertical position)
For a semiprime n = p·q, the arc distances between n, q, p, and 1 form a geometric footprint. The bit-gap between the factors controls how the helix spreads: balanced, RSA-like factor pairs produce a symmetric shape, while lopsided pairs compress one strand. The --helix flag renders this as an ASCII double-helix in the terminal.
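The three formulas translate directly into Cartesian coordinates. The sketch below only illustrates the mapping: the default values for `r0`, `alpha`, `L`, and `beta` are placeholders chosen for this example, not the constants primehelix uses, and `helix_point` is a hypothetical name.

```python
import math

def helix_point(n, r0=1.0, alpha=0.01, L=360.0, beta=0.05):
    """Map integer n to (x, y, z) on a conical helix.

    r0, alpha, L, and beta are placeholder values for illustration only.
    """
    r = r0 + alpha * n             # radius grows linearly with n
    theta = 2 * math.pi * n / L    # one full turn every L integers
    z = beta * n                   # height grows linearly with n
    return (r * math.cos(theta), r * math.sin(theta), z)

for n in (1, 13, 100003, 1300039):
    x, y, z = helix_point(n)
    print(n, round(x, 3), round(y, 3), round(z, 3))
```

Because radius and height both grow linearly in n, every point lies on a cone, and the distance from the axis recovers r(n) regardless of the angle.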
Commands
classify — classify and inspect one integer
primehelix classify 1300039
primehelix classify 1300039 --helix # ASCII helix visualization
primehelix classify 1300039 --coil # geometric footprint metrics
primehelix classify 1300039 --residue # full residue profile
primehelix classify 1300039 --json # machine-readable output
--helix output (1300039 = 13 × 100003, bit_gap=13):
1300039 → semiprime
Helix (p=13, q=100003)
balance=87.696, bit_gap=13
+-------------------*
+ *
*---------------------+
* +
+~~~~~~~*
+
+~~~~~~~*
+ *
*-----------------+
* +
+-------------------------------*
+ *
*---------------------------------+
The spread and compression of the helix reflect the actual factor structure — a balanced semiprime like 110000479000513 (= 10000019 × 11000027, bit_gap=0) produces a tight symmetric pattern; a lopsided one like 1300039 produces a wide expanding cone.
--json output:
{
"command": "classify",
"n": 1300039,
"classification": "semiprime",
"factors": {"13": 1, "100003": 1},
"factorization": "13 * 100003",
"method": "trial",
"complete": true,
"structure": "semiprime | lopsided | mod4_1x3",
"residue": {
"semiprime_mod4_pair": "1x3",
"semiprime_mod4_note": "mixed 1 mod 4 and 3 mod 4 factor families",
"factor_families_mod4": ["pythagorean", "gaussian"]
}
}
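A downstream script would typically read this object from the command's standard output. To stay self-contained, the sketch below parses the sample shown above from a string instead of invoking the CLI; `rebuild_n` is a hypothetical helper, not a primehelix function.

```python
import json

SAMPLE = """
{
  "command": "classify",
  "n": 1300039,
  "classification": "semiprime",
  "factors": {"13": 1, "100003": 1},
  "factorization": "13 * 100003",
  "method": "trial",
  "complete": true,
  "structure": "semiprime | lopsided | mod4_1x3",
  "residue": {
    "semiprime_mod4_pair": "1x3",
    "semiprime_mod4_note": "mixed 1 mod 4 and 3 mod 4 factor families",
    "factor_families_mod4": ["pythagorean", "gaussian"]
  }
}
"""

def rebuild_n(record: dict) -> int:
    """Multiply the factors back together; JSON object keys arrive as strings."""
    n = 1
    for prime, exponent in record["factors"].items():
        n *= int(prime) ** exponent
    return n

record = json.loads(SAMPLE)
# Only trust the label when the factorization is marked complete.
assert record["complete"], "partial factorization"
print(record["structure"], rebuild_n(record) == record["n"])
# → semiprime | lopsided | mod4_1x3 True
```

Note that the primes in `factors` are JSON object keys and therefore strings, so they need an `int()` conversion before arithmetic.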
factor — full factoring pipeline
primehelix factor 2147483646
primehelix factor 2147483646 --verbose # show pipeline steps
primehelix factor 2147483646 --json
primehelix factor 2147483646 --json --verbose
Pipeline: trial division → Pollard p−1 → Williams p+1 → Pollard Rho (Brent) → Lenstra ECM → Quadratic Sieve
Output:
n │ 2147483646
factorization │ 2 × 3^2 × 7 × 11 × 31 × 151 × 331
method │ rho
complete │ yes
Pipeline steps (--verbose):
· trial: 2 · trial: 3 · trial: 3 · trial: 7
· trial: 11 · trial: 31 · rho: 151
Primality testing uses Baillie–PSW (Miller–Rabin base-2 + strong Lucas PRP) — deterministic for all 64-bit integers. Prime cofactors are always proven before the factorization is marked complete.
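To make the first pipeline stage concrete, here is a minimal trial-division sketch. It is not the primehelix implementation: it runs trial division all the way to the square root instead of handing off to the later stages, so on this input it finds 151 by itself where the real pipeline reports rho. The name `trial_factor` is an assumption for this example.

```python
def trial_factor(n: int) -> dict:
    """Factor n by trial division alone, returning {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1 if d == 2 else 2  # after 2, try odd divisors only
    if n > 1:
        # The remaining cofactor has no divisor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors

print(trial_factor(2147483646))
# → {2: 1, 3: 2, 7: 1, 11: 1, 31: 1, 151: 1, 331: 1}
```

This approach is only practical while the second-largest prime factor is small, which is why a staged pipeline switches to other methods for harder cofactors.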
structure-scan — count structure labels across a range
primehelix structure-scan --start 1 --stop 1000000
primehelix structure-scan --start 1 --stop 1000000 --only-classification semiprime
primehelix structure-scan --start 1 --stop 1000000 --json
Scans every integer in [start, stop), assigns a structure label, and returns counts with a histogram. Progress is shown on stderr for ranges over 10,000 numbers.
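Output (abridged; a sample of the layout, whose counts differ from the measured [1, 1 000 000) figures under Empirical findings):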
structure summary
┌────────────────────────────────┬────────┬────────┬──────────────────────┐
│ structure │ count │percent │ histogram │
├────────────────────────────────┼────────┼────────┼──────────────────────┤
│ composite │ 531820 │ 53.18% │ ██████████████████████│
│ prime | gaussian │ 87432 │ 8.74% │ ████████ │
│ prime | pythagorean │ 80251 │ 8.03% │ ███████ │
│ semiprime | moderate | mod4_1x3│ 93114 │ 9.31% │ ████████ │
│ semiprime | lopsided | mod4_1x3│ ... │ ... │ ... │
└────────────────────────────────┴────────┴────────┴──────────────────────┘
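The classification part of a scan can be approximated with a sieve. The sketch below is an illustration, not the scanner primehelix ships: it assumes that "semiprime" means exactly two prime factors counted with multiplicity (so squares of primes count), that "composite" means three or more, and that n ≤ 1 is "invalid". The function names are hypothetical.

```python
from collections import Counter

def big_omega_table(limit: int) -> list:
    """omega[n] = number of prime factors of n, counted with multiplicity, for n < limit."""
    spf = list(range(limit))  # spf[n] will hold the smallest prime factor of n
    p = 2
    while p * p < limit:
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit, p):
                if spf[m] == m:
                    spf[m] = p
        p += 1
    omega = [0] * limit
    for n in range(2, limit):
        omega[n] = omega[n // spf[n]] + 1
    return omega

def scan(start: int, stop: int) -> Counter:
    """Count classifications over the half-open range [start, stop)."""
    omega = big_omega_table(max(stop, 2))
    counts = Counter()
    for n in range(start, stop):
        if n <= 1:
            counts["invalid"] += 1
        elif omega[n] == 1:
            counts["prime"] += 1
        elif omega[n] == 2:
            counts["semiprime"] += 1
        else:
            counts["composite"] += 1
    return counts

print(scan(1, 100))
```

The sketch produces only the first part of each label; balance tiers and residue families would need the actual factors, not just their count.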
compare-ranges — diff structure distributions between two ranges
primehelix compare-ranges \
--a-start 1 --a-stop 50000 \
--b-start 50000 --b-stop 100000 \
--top-delta 6
Shows which structure types grew or shrank most between two ranges, with counts, percentages, delta, and ratio.
Output:
range comparison | top delta 6
┌─────────────────────────────────┬───────┬───────┬───────┬───────┬───────┬───────┐
│ structure │ [1,50k│ [1, │[50k, │ [50k, │ delta │ ratio │
│ │ count │ 50k% │100k) │100k)% │ │ │
├─────────────────────────────────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ composite │ 32755 │ 65.5% │ 34273 │ 68.5% │ +3.0% │ 1.05x │
│ semiprime | moderate | mod4_1x3 │ 2203 │ 4.4% │ 1711 │ 3.4% │ -0.9% │ 0.78x │
│ prime | gaussian │ 2583 │ 5.2% │ 2225 │ 4.5% │ -0.7% │ 0.86x │
│ semiprime | lopsided | mod4_1x3 │ 2376 │ 4.8% │ 2684 │ 5.4% │ +0.6% │ 1.13x │
└─────────────────────────────────┴───────┴───────┴───────┴───────┴───────┴───────┘
Use --only-classification semiprime to isolate one class. Use --json to pipe results downstream.
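The delta and ratio columns can be reproduced from the raw counts. The sketch below assumes that delta is the difference in percentage share (range B minus range A) and that ratio is the quotient of the two shares; both assumptions agree with the composite row of the sample table, but they are inferred from the output, not taken from the source code. `share_delta` is a hypothetical name.

```python
def share_delta(count_a, total_a, count_b, total_b):
    """Return (delta in percentage points, ratio of shares) for one label."""
    share_a = 100.0 * count_a / total_a
    share_b = 100.0 * count_b / total_b
    ratio = share_b / share_a if share_a else float("inf")
    return share_b - share_a, ratio

# Composite row of the sample table: [1, 50000) holds 49,999 integers,
# [50000, 100000) holds 50,000.
delta, ratio = share_delta(32755, 49_999, 34273, 50_000)
print(f"{delta:+.1f}% {ratio:.2f}x")
# → +3.0% 1.05x
```

Comparing shares instead of raw counts keeps the result meaningful when the two ranges differ in length.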
structure-time-series — track structural trends over sliding windows
primehelix structure-time-series \
--start 1 --stop 1000000 \
--window 100000 --step 100000 \
--only-classification semiprime \
--top 5 \
--plot semiprime_ts.png
Divides [start, stop) into overlapping or non-overlapping windows, computes structure distributions in each, selects the top-N series by aggregate weight, and plots them as a line chart. Omit --plot for a compact text summary instead.
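The windowing step can be sketched as a generator of half-open intervals. How primehelix treats a final window that would run past stop is not stated above, so clipping it to stop is an assumption of this example, as is the name `windows`.

```python
def windows(start: int, stop: int, window: int, step: int):
    """Yield half-open (lo, hi) windows covering [start, stop).

    step < window gives overlapping windows, step == window gives adjacent ones.
    The last window is clipped to stop (an assumption of this sketch).
    """
    lo = start
    while lo < stop:
        yield lo, min(lo + window, stop)
        lo += step

spans = list(windows(1, 1_000_000, 100_000, 100_000))
print(len(spans), spans[0], spans[-1])
# → 10 (1, 100001) (900001, 1000000)
```

Each window would then be scanned independently and the per-label shares collected into one series per label.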
JSON output
classify and factor both support --json. The schema is stable across patch versions:
| Field | Present in | Notes |
|---|---|---|
| command | both | "classify" or "factor" |
| n | both | integer |
| classification | classify | "prime", "semiprime", "composite", "invalid" |
| factors | both | {"p": exponent, ...} |
| prime_factors | both | flat list, e.g. [3, 3, 7] for 3²×7 |
| factorization | both | "2 * 3^2 * 7" (ASCII) |
| method | both | last algorithm used |
| elapsed_ms | both | wall time in milliseconds |
| complete | both | true if all factors proven prime |
| structure | classify | compact label, e.g. "semiprime \| lopsided \| mod4_1x3" |
| steps | factor with --verbose | pipeline step trail; empty list otherwise |
| coil | classify with --coil | geometric footprint + insight string |
| residue | classify | mod4/mod6/mod30 profile |
The structure-scan and compare-ranges commands also include:
| Field | Command | Notes |
|---|---|---|
| entropy | structure-scan | Shannon entropy (bits) of the label distribution. 0 = single label, log₂(k) = uniform over k labels |
| a.entropy, b.entropy | compare-ranges | Entropy of each range independently |
| entropy_delta | compare-ranges | b.entropy − a.entropy; positive = B more diverse |
Breaking changes to this schema will be documented in release notes and accompanied by a minor version bump.
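For reference, the entropy field follows the standard Shannon formula H = −Σ pᵢ·log₂(pᵢ) over the label shares. The sketch below applies it to the four classification totals reported under Empirical findings; since the tool computes entropy over full structure labels, the value printed here is not expected to match the `entropy` field. `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def shannon_entropy(counts) -> float:
    """Shannon entropy in bits of a {label: count} mapping."""
    total = sum(counts.values())
    return sum(
        -(c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0  # labels with zero count contribute nothing
    )

# Classification totals for [1, 1 000 000), coarser than full structure labels.
classes = Counter(composite=711_465, semiprime=210_035, prime=78_498, invalid=1)
print(round(shannon_entropy(classes), 3))

# entropy_delta for two ranges is simply the difference of the two entropies.
a = {"x": 3, "y": 1}
b = {"x": 1, "y": 1}
print(shannon_entropy(b) - shannon_entropy(a) > 0)  # B is more evenly spread
```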
Guarantees and limits
Deterministic:
- Structure labels and residue families are computed from factorization alone — identical input always produces identical output.
- Primality testing uses Baillie–PSW (Miller-Rabin base-2 + strong Lucas PRP), which is deterministic for all integers up to 2⁶⁴. No known counterexamples exist.
- complete: true means every factor has been proven prime. The factorization is exact.
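The Miller-Rabin half of Baillie–PSW is short enough to show in full. The sketch below implements only the base-2 strong probable-prime test and is not the code in primes.py. On its own this test accepts some composites, which is why Baillie–PSW pairs it with a strong Lucas test.

```python
def is_strong_prp_base2(n: int) -> bool:
    """Strong probable-prime test to base 2 (one half of Baillie-PSW)."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(2, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

print(is_strong_prp_base2(331))      # True: 331 is prime
print(is_strong_prp_base2(1300039))  # False: 13 * 100003 is caught
print(is_strong_prp_base2(2047))     # True, yet 2047 = 23 * 89 is composite
```

The last line shows the smallest composite that passes the base-2 test alone.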
May time out:
- The factoring pipeline has a configurable budget (--budget, default 10 000 ms). For numbers with large prime factors that resist trial division and Pollard Rho, the pipeline may exhaust its budget and return complete: false with a partial factorization.
- For most integers up to ~15 digits, factorization completes in milliseconds. Harder numbers (e.g. RSA-like products of two large primes) may time out.
Stable and scriptable:
- classify, structure-scan, compare-ranges, structure-time-series with --json produce stable, machine-readable output safe to pipe, grep, and aggregate.
- Structure labels are stable strings, designed to be safe keys for counting and comparison across runs.
Experimental:
- --coil and --helix output (geometric footprint, ASCII visualization) reflects a model under active development. The coordinate values and balance thresholds may change between minor versions.
- The insight strings in coil.insight are heuristic and human-readable only; do not parse them programmatically.
Empirical findings
All measurements below were produced by running primehelix against [1, 1 000 000). The commands are fully reproducible.
Overall structure distribution
primehelix structure-scan --start 1 --stop 1000000 --json
| Classification | Count | Share |
|---|---|---|
| composite | 711,465 | 71.15% |
| semiprime | 210,035 | 21.00% |
| prime | 78,498 | 7.85% |
| invalid (n ≤ 1) | 1 | — |
Primes split almost exactly evenly between the two residue families — 50.09% gaussian (p ≡ 3 mod 4) and 49.91% pythagorean (p ≡ 1 mod 4) — consistent with Dirichlet's theorem on primes in arithmetic progressions.
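The split can be checked independently of primehelix with a plain sieve. This sketch counts odd primes below one million by residue mod 4; the helper names are made up for this example.

```python
def primes_below(limit: int) -> list:
    """Sieve of Eratosthenes: all primes p < limit."""
    if limit < 3:
        return []
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... in one slice assignment.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [n for n in range(limit) if is_prime[n]]

def mod4_split(limit: int) -> dict:
    """Count odd primes below limit by residue mod 4 (the prime 2 is skipped)."""
    counts = {1: 0, 3: 0}
    for p in primes_below(limit):
        if p != 2:
            counts[p % 4] += 1
    return counts

split = mod4_split(1_000_000)
odd_total = split[1] + split[3]
print(split, f"{100 * split[3] / odd_total:.2f}% gaussian")
```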
Semiprime balance distribution
Among the 210,035 semiprimes in [1, 1M):
| Balance tier | Count | Share |
|---|---|---|
| lopsided (bit_gap > 8 or balance ≥ 10) | 153,718 | 73.2% |
| moderate | 54,427 | 25.9% |
| balanced (bit_gap ≤ 1 and balance < 0.15) | 1,677 | 0.80% |
Lopsided pairs dominate by a wide margin. Balanced semiprimes, the RSA-like products of two primes of nearly equal bit-length, are rare below 1M: about 1 in 125. Note that the three tier counts sum to 209,822, which is 213 short of the 210,035 total.
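The tier thresholds in the table can be written as a small classifier. Two things in this sketch are assumptions, not documented behavior: bit_gap is taken to be the difference in bit lengths of the two factors, and balance is taken to be |q − p| / √(p·q), a formula chosen because it reproduces the balance=87.696 shown for 13 × 100003 in the --helix output.

```python
import math

def bit_gap(p: int, q: int) -> int:
    """Difference in bit length between the two factors (assumed definition)."""
    return abs(q.bit_length() - p.bit_length())

def balance(p: int, q: int) -> float:
    """Assumed metric: |q - p| / sqrt(p * q)."""
    return abs(q - p) / math.sqrt(p * q)

def balance_tier(p: int, q: int) -> str:
    """Apply the thresholds from the balance table."""
    gap, bal = bit_gap(p, q), balance(p, q)
    if gap > 8 or bal >= 10:
        return "lopsided"
    if gap <= 1 and bal < 0.15:
        return "balanced"
    return "moderate"

print(bit_gap(13, 100003), round(balance(13, 100003), 3), balance_tier(13, 100003))
# → 13 87.696 lopsided
print(bit_gap(10000019, 11000027), balance_tier(10000019, 11000027))
# → 0 balanced
```

Under these assumptions the balanced tier requires the two factors to differ by less than 15% of √n, which fits its small share in the table.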
Mod4 pair distribution: all semiprimes vs lopsided-only
| Mod4 pair | All semiprimes | Lopsided only | Shift |
|---|---|---|---|
| mod4_1x3 (mixed families) | 40.0% | 36.4% | −3.6 pp |
| mod4_3x3 (both gaussian) | 23.7% | 22.9% | −0.9 pp |
| mod4_1x1 (both pythagorean) | 16.4% | 13.7% | −2.7 pp |
| even-involved (factor of 2) | 19.8% | 27.0% | +7.2 pp |
The lopsided constraint systematically shifts the distribution toward even-involved pairs. The explanation is structural: in a semiprime of the form 2×p the gap between 2 and p grows with p, so all but the smallest even semiprimes exceed the lopsided thresholds and almost the entire even-semiprime population lands in the lopsided bucket. Mixed (1x3) and same-family (1x1, 3x3) pairs all lose share as a result.
Lopsidedness grows with range
primehelix compare-ranges \
--a-start 1 --a-stop 500000 \
--b-start 500000 --b-stop 1000000 \
--only-classification semiprime --top-delta 6 --json
| Structure | delta | ratio |
|---|---|---|
| semiprime \| lopsided \| mod4_1x3 | +2.69% | 1.11× |
| semiprime \| moderate \| mod4_1x3 | −2.05% | 0.85× |
| semiprime \| lopsided \| mod4_3x3 | +1.51% | 1.09× |
| semiprime \| moderate \| mod4_3x3 | −1.39% | 0.81× |
| semiprime \| lopsided \| mod4_1x1 | +1.29% | 1.14× |
| semiprime \| moderate \| mod4_1x1 | −0.66% | 0.90× |
As the range shifts from [1, 500k) to [500k, 1M), lopsided semiprimes gain share and moderate ones shrink — uniformly across all three odd mod4 families. The mechanism: small primes (2, 3, 5, 7, …) are repeatedly reused as the smaller factor in semiprimes that reach into higher ranges, producing an ever-wider bit-gap between the two factors.
Install and test
git clone https://github.com/onojk/primehelix.git
cd primehelix
python3 -m venv .venv
source .venv/bin/activate
pip install -e . # core: classify, factor, scan, compare
pip install -e ".[plot]" # add matplotlib for --plot
pip install -e ".[dev]" # everything including tests
pytest tests/ -v
Sync after updates:
cd ~/primehelix && git pull && source .venv/bin/activate && pytest tests/ -v
Architecture
primehelix/
├── cli.py — 5 Click commands + scan helpers
├── core/
│ ├── primes.py — Baillie-PSW (Miller-Rabin + strong Lucas PRP)
│ ├── factor.py — Pipeline orchestration
│ ├── rho.py — Pollard Rho (Brent, batch-GCD)
│ ├── pm1.py — Pollard p−1 / Williams p+1
│ ├── ecm.py — Lenstra ECM (pure Python + gmpy2)
│ └── qs.py — Quadratic Sieve (GF(2) left nullspace)
├── geometry/
│ ├── coil.py — Conical helix model, CoilFootprint, CoilBalance
│ ├── residue.py — Mod4/mod6/mod30 residue profiling
│ ├── bitbucket.py — Bit-bucket placement and density
│ └── tangent.py — Equal/tangent/ideal split diagnostics
├── display/
│ ├── output.py — Rich terminal panels and tables
│ ├── json_output.py — JSON schema, structure_summary label builder
│ ├── plots.py — Matplotlib time-series line charts
│ └── ascii_helix.py — ASCII double-helix renderer
└── scan/
└── wheel.py — Mod-210 wheel scanner, resumable gzip CSV
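As background for the mod-210 wheel mentioned in the tree, the sketch below shows the general technique: skip every integer divisible by 2, 3, 5, or 7 and visit only the remaining residues. It is not the code in wheel.py, and it leaves out the resumable gzip CSV output entirely; the names here are hypothetical.

```python
from math import gcd

WHEEL = 210  # 2 * 3 * 5 * 7
RESIDUES = [r for r in range(WHEEL) if gcd(r, WHEEL) == 1]

def wheel_candidates(start: int, stop: int):
    """Yield integers in [start, stop) that are coprime to 210."""
    base = (start // WHEEL) * WHEEL  # first multiple of 210 at or below start
    while base < stop:
        for r in RESIDUES:
            n = base + r
            if start <= n < stop:
                yield n
        base += WHEEL

print(len(RESIDUES), f"{len(RESIDUES) / WHEEL:.1%} of integers survive the wheel")
print(list(wheel_candidates(200, 230)))
```

Survivors are only candidates: they can still be composite (209 = 11 × 19, for example), so each one still needs a primality test or factoring step.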
Origins
primehelix consolidates five research repositories:
| Repo | Contribution |
|---|---|
| geom_factor | Quadratic Sieve, bit-bucket theory, geometric model |
| rsacrack | Factoring pipeline, coil classifier |
| ECC-Tools | ECM reference (C + libecm) |
| Cprime | GMP-backed C CLI (trial + p−1 + Rho) |
| onojk123 | Wheel scanner, tangent prime test |
Author
Jonathan Kendall https://github.com/onojk
License
MIT