Structural analysis for integers — classify, scan, compare, and track number structure using a shared label language

primehelix

primehelix shows how structural constraints reshape integer distributions — beyond what naive prime-counting predicts.

Every integer receives a compact structure label that encodes classification, geometric balance, and residue-family membership in one token, for example semiprime | lopsided | mod4_1x3. These labels are the common currency of the workflow: classify one number, scan a million, compare two ranges, track trends over time.

Findings

Unless a 10M range is noted, all measurements below come from scanning [1, 1 000 000). Every command shown is reproducible.

At 1M scale, ~73% of semiprimes are lopsided; at 10M that rises to ~79%. Balanced (RSA-like) semiprimes make up 0.80% at 1M and fall below 0.7% at 10M. Restricting to lopsided semiprimes raises the even-involved share from 19.8% to 27.0%. The skew strengthens rather than washing out as the range grows.

Lopsided semiprimes dominate — and grow more dominant with range

In [1, 1M), semiprimes break into three balance tiers:

Balance tier	Share
lopsided (factors differ by > 8 bits)	73.2%
moderate	25.9%
balanced (RSA-like — factors nearly equal bit-length)	0.80%

Balanced semiprimes account for about 1 in 125. The bias compounds: at 10M scale the lopsided share reaches 78.5% and the balanced share falls to 0.66%. As the range grows, lopsided pairs gain share and moderate ones lose it, in both of the mod4 residue families shown below:

Structure	delta [1,500k) → [500k,1M)	delta [1,5M) → [5M,10M)
semiprime | lopsided | mod4_1x3	+2.69%	+1.83%
semiprime | moderate | mod4_1x3	−2.05%	−1.39%
semiprime | lopsided | mod4_3x3	+1.51%	+0.89%
semiprime | moderate | mod4_3x3	−1.39%	−0.82%

The mechanism: small primes (2, 3, 5, 7, …) recur as the smaller factor of ever larger semiprimes, so the bit gap between the two factors widens as the range grows. Across the ranges measured here the effect shows no sign of saturating.
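
The lopsided share can also be measured independently of primehelix. The following is a minimal sketch, assuming the bit gap is the difference in factor bit lengths (which matches the bit_gap=13 reported for 13 × 100003) and using the > 8 bit lopsided threshold from the tier table. The helper names are illustrative and not part of the primehelix API.

```python
from collections import Counter
from math import isqrt


def smallest_prime_factors(limit):
    """Return a list spf where spf[n] is the smallest prime factor of n (n >= 2)."""
    spf = list(range(limit))
    for p in range(2, isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for multiple in range(p * p, limit, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p
    return spf


def bit_gap(p, q):
    """Difference in bit length between two factors (assumed definition)."""
    return abs(p.bit_length() - q.bit_length())


def semiprime_bit_gaps(limit):
    """Count semiprimes n < limit, keyed by the bit gap of their two prime factors."""
    spf = smallest_prime_factors(limit)
    gaps = Counter()
    for n in range(4, limit):
        p = spf[n]
        q = n // p
        if q > 1 and spf[q] == q:  # n = p * q with both factors prime
            gaps[bit_gap(p, q)] += 1
    return gaps


def lopsided_share(gaps, threshold=8):
    """Fraction of counted semiprimes whose bit gap exceeds the threshold."""
    total = sum(gaps.values())
    if total == 0:
        return 0.0
    return sum(count for gap, count in gaps.items() if gap > threshold) / total
```

Calling lopsided_share(semiprime_bit_gaps(1_000_000)) should give a figure comparable to the lopsided share reported above, provided the assumed gap definition and threshold match the ones primehelix uses.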

The lopsided constraint shifts residue families

Filtering to lopsided semiprimes measurably shifts the mod4 pair distribution, and in a predictable direction:

Mod4 pair	All semiprimes	Lopsided only	Shift
mod4_1x3 (mixed families)	40.0%	36.4%	−3.6 pp
mod4_3x3 (both gaussian)	23.7%	22.9%	−0.9 pp
mod4_1x1 (both pythagorean)	16.4%	13.7%	−2.7 pp
even-involved (factor of 2)	19.8%	27.0%	+7.2 pp

The lopsided bucket absorbs nearly all 2×p semiprimes, because 2 paired with a large prime is always lopsided. This inflates the even-involved share and compresses every odd pair class.
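
To make the pair classes concrete, here is a small sketch of how a semiprime's two factors could be assigned to a mod4 pair and tested for lopsidedness. The function names, the "even" tag, and the bit-length definition of the gap are assumptions made for illustration; only the mod4_1x1, mod4_1x3, and mod4_3x3 names and the > 8 bit threshold come from the text above.

```python
def mod4_pair(p, q):
    """Label the mod 4 residue pair of a semiprime's two prime factors."""
    if p == 2 or q == 2:
        return "even"  # hypothetical tag for the even-involved class
    low, high = sorted((p % 4, q % 4))
    return f"mod4_{low}x{high}"


def is_lopsided(p, q, threshold=8):
    """True when the factor bit lengths differ by more than the threshold."""
    return abs(p.bit_length() - q.bit_length()) > threshold


print(mod4_pair(13, 100003))   # mixed families
print(is_lopsided(2, 1031))    # 2 bits against 11 bits
print(is_lopsided(2, 3))       # both factors are 2 bits long
```

Under these assumptions 2×p stops being lopsided only when p is small, which is why the even-involved class gains so much share under the filter.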

Primes split evenly by residue family

Among the 78,498 primes in [1, 1M), 50.09% are gaussian (p ≡ 3 mod 4) and 49.91% are pythagorean (p ≡ 1 mod 4); the prime 2 belongs to neither family. The near-even split is consistent with Dirichlet's theorem on primes in arithmetic progressions and is stable across ranges.
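
The split can be cross-checked without primehelix. This is a rough sketch using a plain sieve of Eratosthenes; the function names and the "even" tag for the prime 2 are illustrative choices, not primehelix identifiers.

```python
from collections import Counter
from math import isqrt


def primes_below(limit):
    """Return all primes p < limit using a bytearray sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(limit - 1) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p * p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [n for n in range(limit) if sieve[n]]


def mod4_family(p):
    """Map a prime to its residue family modulo 4."""
    if p % 4 == 1:
        return "pythagorean"
    if p % 4 == 3:
        return "gaussian"
    return "even"  # only the prime 2 lands here


def prime_family_counts(limit):
    """Count primes below limit by residue family."""
    return Counter(mod4_family(p) for p in primes_below(limit))


counts = prime_family_counts(1_000_000)
total = sum(counts.values())
for family, count in counts.most_common():
    print(f"{family:12s} {count:6d}  {100 * count / total:.2f}%")
```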

primehelix structure-scan --start 1 --stop 1000000 --json
primehelix compare-ranges --a-start 1 --a-stop 500000 --b-start 500000 --b-stop 1000000 \
  --only-classification semiprime --top-delta 6 --json

Install

pip install primehelix                # core: classify, factor, scan, compare
pip install 'primehelix[plot]'        # add matplotlib for --plot

On Debian or Ubuntu style Linux systems, install the GMP development libraries first for full performance (gmpy2):

sudo apt install libgmp-dev libmpfr-dev libmpc-dev
pip install primehelix

Commands

Core workflow: classify one number → scan a range → compare two ranges → track structure over time.

`classify` — inspect one integer

primehelix classify 1300039
primehelix classify 1300039 --helix       # ASCII double-helix visualization
primehelix classify 1300039 --coil        # geometric footprint metrics
primehelix classify 1300039 --residue     # full residue profile
primehelix classify 1300039 --json        # machine-readable output

--helix output (1300039 = 13 × 100003, bit_gap=13):

1300039 → semiprime

Helix (p=13, q=100003)
balance=87.696, bit_gap=13

                      +-------------------*
                     +                     *
                     *---------------------+
                        *               +
                            +~~~~~~~*
                                +

--json output:

{
  "command": "classify",
  "n": 1300039,
  "classification": "semiprime",
  "factors": {"13": 1, "100003": 1},
  "factorization": "13 * 100003",
  "method": "trial",
  "complete": true,
  "structure": "semiprime | lopsided | mod4_1x3",
  "residue": {
    "semiprime_mod4_pair": "1x3",
    "semiprime_mod4_note": "mixed 1 mod 4 and 3 mod 4 factor families",
    "factor_families_mod4": ["pythagorean", "gaussian"]
  }
}
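
Because the factors object uses string keys, a consumer has to convert them back to integers before doing arithmetic. A minimal sketch of reading such a record and checking it against n, using only fields shown in the sample output above (the record here is a trimmed copy, and the helper name is made up for illustration):

```python
import json
from math import prod

# Trimmed copy of the sample record shown above.
SAMPLE = """
{
  "command": "classify",
  "n": 1300039,
  "factors": {"13": 1, "100003": 1},
  "complete": true,
  "structure": "semiprime | lopsided | mod4_1x3"
}
"""


def load_factors(record):
    """Convert the JSON factors object into a {prime: exponent} dict of ints."""
    return {int(p): int(e) for p, e in record["factors"].items()}


record = json.loads(SAMPLE)
factors = load_factors(record)

# The product of prime powers should reproduce n when complete is true.
product = prod(p ** e for p, e in factors.items())
print(factors, product == record["n"])
```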

`factor` — full factoring pipeline

primehelix factor 2147483646
primehelix factor 2147483646 --verbose    # show pipeline steps
primehelix factor 2147483646 --json --verbose

Pipeline: trial division → Pollard p−1 → Williams p+1 → Pollard Rho (Brent) → Lenstra ECM → Quadratic Sieve

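Primality testing uses Baillie–PSW, which is known to give no false positives for any integer below 2⁶⁴, so results in that range are deterministic. complete: true means every remaining factor has passed this test; above 2⁶⁴, Baillie–PSW is a probable-prime test with no known counterexample.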
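
As an illustration of the first pipeline stage only, here is a plain trial-division sketch that produces a factors mapping and an ASCII factorization string in the same shape as the JSON fields. It is not primehelix's implementation, the function names are invented for this example, and the later stages (p−1, p+1, Rho, ECM, Quadratic Sieve) are not shown.

```python
def trial_division(n):
    """Factor n >= 2 by trial division; returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1 if d == 2 else 2  # after 2, test odd candidates only
    if n > 1:  # whatever remains is a prime cofactor
        factors[n] = factors.get(n, 0) + 1
    return factors


def format_factorization(factors):
    """Render {prime: exponent} as an ASCII string such as '2 * 3^2 * 7'."""
    return " * ".join(
        f"{p}^{e}" if e > 1 else str(p) for p, e in sorted(factors.items())
    )


print(format_factorization(trial_division(2147483646)))
```

Trial division alone is enough for this input because every prime factor is small; the later stages exist for inputs where it is not.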

`structure-scan` — count structure labels across a range

primehelix structure-scan --start 1 --stop 1000000
primehelix structure-scan --start 1 --stop 1000000 --only-classification semiprime
primehelix structure-scan --start 1 --stop 1000000 --profile   # show method distribution
primehelix structure-scan --start 1 --stop 1000000 --json

Scans every integer in [start, stop), assigns each one a structure label, and returns the counts, a histogram, and the Shannon entropy of the label distribution. Progress is shown on stderr for ranges over 10,000.
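
For readers unfamiliar with the entropy figure, this sketch shows the standard Shannon entropy (in bits) of a label count table. It is the textbook formula rather than code taken from primehelix, and the function name is illustrative.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts):
    """Shannon entropy in bits of a mapping from label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1 / p) over labels that actually occur.
    return sum((c / total) * log2(total / c) for c in counts.values() if c > 0)


uniform = Counter({"a": 5, "b": 5, "c": 5, "d": 5})
single = Counter({"a": 20})
print(shannon_entropy(uniform))  # four equally likely labels
print(shannon_entropy(single))   # one label only
```

A single label gives 0 bits and k equally frequent labels give log₂(k) bits, so the value reads as a measure of how evenly the labels are spread.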

`compare-ranges` — diff structure distributions

primehelix compare-ranges \
  --a-start 1 --a-stop 500000 \
  --b-start 500000 --b-stop 1000000 \
  --only-classification semiprime --top-delta 6

Shows which structure labels gained or lost share between two ranges, with delta, ratio, and per-range entropy.
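
A sketch of how such a comparison could be computed from two label count tables. It assumes delta is the change in share measured in percentage points and ratio is share in B divided by share in A; the DeltaRow type and compare_counts function are hypothetical and are not the primehelix API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DeltaRow:
    structure: str
    share_a: float  # percent of range A
    share_b: float  # percent of range B
    delta: float    # share_b - share_a, in percentage points
    ratio: float    # share_b / share_a, inf when the label is absent from A


def compare_counts(a, b):
    """Return one DeltaRow per label, largest absolute delta first."""
    total_a, total_b = sum(a.values()), sum(b.values())
    rows = []
    for label in sorted(set(a) | set(b)):
        share_a = 100 * a.get(label, 0) / total_a if total_a else 0.0
        share_b = 100 * b.get(label, 0) / total_b if total_b else 0.0
        ratio = share_b / share_a if share_a else float("inf")
        rows.append(DeltaRow(label, share_a, share_b, share_b - share_a, ratio))
    rows.sort(key=lambda row: -abs(row.delta))
    return rows


range_a = Counter({"semiprime | lopsided | mod4_1x3": 50, "semiprime | moderate | mod4_1x3": 50})
range_b = Counter({"semiprime | lopsided | mod4_1x3": 75, "semiprime | moderate | mod4_1x3": 25})
for row in compare_counts(range_a, range_b):
    print(f"{row.delta:+.2f}pp  x{row.ratio:.2f}  {row.structure}")
```

The counts in the usage lines are made up to keep the arithmetic easy to follow; they are not measurements.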

`structure-time-series` — track structural trends over sliding windows

primehelix structure-time-series \
  --start 1 --stop 1000000 \
  --window 100000 --step 100000 \
  --only-classification semiprime \
  --top 5 \
  --plot semiprime_ts.png

Divides [start, stop) into windows, computes structure distributions in each, and plots the top-N label series as a line chart. Omit --plot for a text summary.
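
The windowing itself is simple to picture. A minimal sketch, assuming half-open windows and assuming that a trailing window is clipped at stop (how primehelix treats a partial final window is not stated here); the generator name is illustrative.

```python
def sliding_windows(start, stop, window, step):
    """Yield half-open (lo, hi) windows that cover [start, stop)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    lo = start
    while lo < stop:
        yield lo, min(lo + window, stop)  # clip the final window at stop
        lo += step


windows = list(sliding_windows(1, 1_000_000, 100_000, 100_000))
print(len(windows), windows[0], windows[-1])
```

With step equal to window the windows tile the range without overlap; a smaller step gives overlapping windows and a smoother series.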

Python API

All analysis functions work as a library — no CLI required. Results are typed dataclasses.

from primehelix.analysis import scan_range, compare_summaries, build_time_series

# Scan a range and inspect label counts
scan = scan_range(1, 100_000)
print(scan.total)                        # total integers counted
print(scan.counts.most_common(5))        # top 5 structure labels

# Compare two ranges — see which labels gained or lost share
s1 = scan_range(1, 500_000, only_classification="semiprime")
s2 = scan_range(500_000, 1_000_000, only_classification="semiprime")
rows = compare_summaries(s1, s2)
for row in sorted(rows, key=lambda r: -abs(r.delta))[:5]:
    print(f"{row.delta:+.2f}pp  {row.structure}")

# Track structure trends across windows
ts = build_time_series(1, 1_000_000, window=100_000, step=100_000,
                       only_classification="semiprime")
for label in ts.top_labels:
    print(label, ts.series_map[label])

# Export results directly from the API
import json
with open("scan.json", "w") as f:
    json.dump({"start": 1, "stop": 100_000, **scan.to_json_dict()}, f, indent=2)

Use detail="classification" for fast classification-only counts (no geometry, ~10% faster):

scan = scan_range(1, 10_000_000, only_classification="prime", detail="classification")
print(scan.total)   # prime count in [1, 10M) — no residue family breakdown

Structure labels

Every integer gets a label of up to three parts joined by |:

semiprime | lopsided | mod4_1x3
prime | gaussian
composite
invalid

Part	What it encodes
Classification	`prime`, `semiprime`, `composite`, `invalid`
Balance	`balanced`, `moderate`, `lopsided` — bit-length gap between factors; semiprimes only
Residue family	`mod4_1x3`, `mod4_3x3`, `pythagorean`, `gaussian`, etc.

Labels are stable strings — safe to grep, aggregate, diff between ranges, and use as dict keys across runs. The grammar is fixed: classification first, balance second (when present), residue family last.
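
Since the grammar is fixed, a label can be split back into its parts with a few lines of code. This is a sketch built only from the vocabulary listed in the table above; the StructureLabel type and parse_label function are not part of primehelix.

```python
from typing import NamedTuple, Optional

CLASSIFICATIONS = {"prime", "semiprime", "composite", "invalid"}
BALANCES = {"balanced", "moderate", "lopsided"}


class StructureLabel(NamedTuple):
    classification: str
    balance: Optional[str]
    residue_family: Optional[str]


def parse_label(label):
    """Split 'classification | balance | residue' into named parts."""
    parts = [part.strip() for part in label.split("|")]
    if not 1 <= len(parts) <= 3 or parts[0] not in CLASSIFICATIONS:
        raise ValueError(f"unrecognized structure label: {label!r}")
    rest = parts[1:]
    # Balance is optional and, when present, always precedes the residue family.
    balance = rest.pop(0) if rest and rest[0] in BALANCES else None
    if len(rest) > 1:
        raise ValueError(f"unrecognized structure label: {label!r}")
    residue_family = rest[0] if rest else None
    return StructureLabel(parts[0], balance, residue_family)


print(parse_label("semiprime | lopsided | mod4_1x3"))
print(parse_label("prime | gaussian"))
print(parse_label("composite"))
```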

JSON schema

All commands support --json. The schema is stable across patch versions.

classify and factor:

Field	Present in	Notes
`command`	both	`"classify"` or `"factor"`
`n`	both	integer
`classification`	classify	`"prime"`, `"semiprime"`, `"composite"`, `"invalid"`
`factors`	both	`{"p": exponent, ...}`
`prime_factors`	both	flat list, e.g. `[3, 3, 7]` for 3²×7
`factorization`	both	`"2 * 3^2 * 7"` (ASCII)
`method`	both	last algorithm used
`elapsed_ms`	both	wall time in milliseconds
`complete`	both	`true` if all factors proven prime
`structure`	classify	compact label string
`steps`	factor with `--verbose`	pipeline step trail; `[]` otherwise
`coil`	classify with `--coil`	geometric footprint + insight string
`residue`	classify	mod4/mod6/mod30 profile

structure-scan and compare-ranges:

Field	Notes
`entropy`	Shannon entropy (bits) of the label distribution: 0 for a single label, log₂(k) for k equally frequent labels
`a.entropy`, `b.entropy`	per-range entropy in compare-ranges
`entropy_delta`	`b.entropy − a.entropy`; positive = B more structurally diverse
`methods`	factorization method counts (structure-scan with `--profile`)

Breaking changes will be documented in release notes with a minor version bump.

Guarantees and limits

Deterministic: Structure labels are computed from factorization alone — identical input always produces identical output. Baillie–PSW is deterministic for all integers up to 2⁶⁴.

May time out: The factoring pipeline has a configurable budget (--budget, default 10 000 ms). Hard numbers may return complete: false with a partial factorization.

Stable and scriptable: classify, structure-scan, compare-ranges, and structure-time-series with --json produce output safe to pipe, grep, and aggregate across runs.

Experimental: --coil and --helix geometry output is under active development. Coordinate values and balance thresholds may change between minor versions. Do not parse coil.insight strings programmatically.

Develop and test

git clone https://github.com/onojk/primehelix.git
cd primehelix
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v

Architecture

primehelix/
├── cli.py                  — 5 Click commands + scan helpers
├── core/
│   ├── primes.py           — Baillie-PSW (Miller-Rabin + strong Lucas PRP)
│   ├── factor.py           — Pipeline orchestration
│   ├── rho.py              — Pollard Rho (Brent, batch-GCD)
│   ├── pm1.py              — Pollard p−1 / Williams p+1
│   ├── ecm.py              — Lenstra ECM (pure Python + gmpy2)
│   └── qs.py               — Quadratic Sieve (GF(2) left nullspace)
├── geometry/
│   ├── coil.py             — Conical helix model, CoilFootprint, CoilBalance
│   ├── residue.py          — Mod4/mod6/mod30 residue profiling
│   ├── bitbucket.py        — Bit-bucket placement and density
│   └── tangent.py          — Equal/tangent/ideal split diagnostics
├── display/
│   ├── output.py           — Rich terminal panels and tables
│   ├── json_output.py      — JSON schema, structure_summary label builder
│   ├── plots.py            — Matplotlib time-series line charts
│   └── ascii_helix.py      — ASCII double-helix renderer
└── scan/
    └── wheel.py            — Mod-210 wheel scanner, resumable gzip CSV

primehelix consolidates five research repositories: geom_factor (Quadratic Sieve, geometric model), rsacrack (factoring pipeline, coil classifier), ECC-Tools (ECM reference), Cprime (GMP-backed CLI), onojk123 (wheel scanner, tangent prime test).

Integer structure is not uniformly distributed — it is shaped by reusable factor patterns and structural constraints that produce stable, predictable statistical behavior. primehelix makes that behavior visible and measurable.

Author

Jonathan Kendall — https://github.com/onojk

License

MIT

Release history

1.0.0 (Apr 26, 2026)
0.3.0 (Apr 23, 2026; the version described on this page)
0.2.0 (Apr 23, 2026)
