
Pure Python computational science system — quantum simulation, FFT, optimization, probability, statistics, and more.


cognitive-discovery-system


PyPI version · PyPI downloads · Python 3.10+ · Typed · codecov · License: MIT · CI · CodeQL · Docs · GitHub stars

Open-source computational science platform for research, simulation, and discovery.

CDS brings together quantum computing simulation, statistical analysis, signal processing, optimization, probability, and scientific computing in a single, dependency-light Python package. Every module is pure Python — no NumPy or SciPy required.

The system also includes built-in support for structured hypothesis generation, making it easier to explore ideas and connect them to simulation or analysis tools.

Stable release — see the PyPI version badge above for the latest tag. Contributions welcome!

Documentation | Tutorials | Quick Start


Latest update: a maintenance release that fixes the automated publish pipeline (publishing moved from OIDC to a scoped PyPI API token, and release.yml is now the sole publish authority). There are no API or behavior changes. The current version is shown in the PyPI badge at the top of this README; the CI and codecov badges show the live test count and coverage.


Why CDS?

  • Zero heavy dependencies — pure Python implementations you can read and learn from
  • Quantum simulation — single & multi-qubit circuits with entanglement
  • Built for discovery — hypothesis generation with structured outputs (assumptions, predictions, confidence) plus a Protocol for custom implementations
  • Broad scope — the module catalog below covers math, physics, stats, ML, signals, optimization, graph theory, ODEs, numerical integration, Monte Carlo, knowledge organization, and educational NLP (BPE + embeddings)
  • Thoroughly tested — the full suite runs on every push; see the CI badge for the live test count and the codecov badge for 100% code coverage (statement + branch)
  • Practical automation — workflows for PR checklists, dependency updates, and releases to keep maintenance manageable
  • CLI included — interactive tools, demos, and ASCII visualization from your terminal

CDS vs other libraries

| Need | CDS | NumPy/SciPy | SymPy | PennyLane |
|---|---|---|---|---|
| Pure-Python (no compile, no binary) | yes | no (C/Fortran) | yes | no (needs Qiskit/Cirq) |
| Quantum simulation | yes, single/multi-qubit | no | minimal | yes, full SDK |
| Hypothesis generation (structured) | yes | no | no | no |
| Educational NLP (BPE, embeddings) | yes, from scratch | no | no | no |
| Single-package umbrella (math+physics+stats+ML+signals+NLP) | yes | no, split across 6+ | partial | no, focused |
| Production-ready CI/CD (multi-OS matrix, signed releases) | yes | n/a | partial | yes |
| Educational / readable source | yes | no, large surface | yes | no |
| Edge runtime (no BLAS) | yes | no | partial | no |
| Heavy numerical performance (>10⁷ ops) | no, use NumPy instead | yes | no | yes (GPU) |

When to use CDS: teaching, prototyping, scientific exploration, edge deployments, custom algorithm development. When to reach for NumPy/SciPy/PennyLane: production HPC, GPU-accelerated quantum, distributed compute.

Citing CDS

If CDS is useful in your research or publications, please cite it using the information in CITATION.cff at the repository root. This helps give proper credit and track adoption in scientific work.

Modules

| Module | Description |
|---|---|
| cds.quantum | Single & multi-qubit simulation: Hadamard, Pauli, CNOT, SWAP, Toffoli, Bell/GHZ states, entanglement detection |
| cds.optimization | Gradient descent, Newton's method, Adam optimizer, golden-section search |
| cds.ml | Pure-Python neural networks: MLP, dense layers, Adam-based training |
| cds.signals | DFT, radix-2 FFT/IFFT (O(N log N)), 2D FFT/IFFT, convolution, power spectrum, filtering |
| cds.probability | Gaussian, uniform, exponential, binomial, Poisson distributions |
| cds.stats | Descriptive stats, Pearson correlation, linear regression, t-test, chi-square, ANOVA |
| cds.math_utils | Numerical calculus, O(N³) LU / QR / Cholesky, eigenvalue (power iteration), Gram-Schmidt, matrix inverse |
| cds.data_analysis | Mini-Pandas DataSet for filtering/grouping, CSV loading, ASCII visualization |
| cds.scientific | Physical constants, formulas (KE, gravity, gas law, Schwarzschild, de Broglie, escape velocity) |
| cds.graph | BFS, DFS, Dijkstra shortest path, Kruskal MST, topological sort, cycle detection |
| cds.modeling | Symbolic algebra: expressions, symbolic differentiation, simplification, LaTeX export, MathModel equation systems, root-finding & parameter fitting |
| cds.knowledge | Knowledge organization: concept graph with typed relations, research-notes notebook, ranked structured retrieval (JSON persistence) |
| cds.montecarlo | Monte Carlo integration, π estimation, Buffon's needle, random walks (1D/2D) |
| cds.diffeq | Euler method, RK4, midpoint method, ODE system solver |
| cds.numerical_integration | Deterministic quadrature: trapezoid, Simpson 1/3 & 3/8, Romberg, Gauss-Legendre, adaptive Simpson |
| cds.nlp | Educational NLP from scratch: BPE tokenizer, sinusoidal embeddings, multi-head attention, Transformer block, scalar autograd (SGD/Adam), MiniGPT demo |
| cds.hypothesis | Structured hypothesis generation with prompt templates for custom research workflows |

Quick Start

git clone https://github.com/Furox88/cognitive-discovery-system.git
cd cognitive-discovery-system
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# CLI usage
cds --help
cds constants
cds calc ke
cds modules
cds hypothesis "What causes the Hubble tension?"

Intelligence over Brute Force

CDS is built on the principle that algorithmic improvements matter more than raw loop speed. Pure Python cannot match C-extensions for tight loops, so CDS closes the gap where it can through better algorithms:

  • Quantum Simulation: Instead of multiplying state matrices for every shot, CDS computes the state once and then draws shots by probabilistic sampling with explicit state collapse (no per-shot matrix work), which is significantly faster than the naive shot-by-shot approach.
  • Linear Algebra: Computes determinants with O(N³) LU decomposition with partial pivoting instead of the naive O(N!) cofactor expansion.
  • Signal Processing: Zero-padded O(N log N) FFT and FFT-based convolution via the Convolution Theorem.
  • Neural Networks: Adam optimizers with momentum state persistence.
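To make the linear-algebra point concrete, here is a minimal pure-Python sketch (not CDS's own implementation; `lu_determinant` is a hypothetical name). Gaussian elimination with partial pivoting is the LU factorization done in place: the determinant is the product of the pivots, with one sign flip per row swap, for O(N³) work.

```python
def lu_determinant(matrix: list[list[float]]) -> float:
    """Determinant via Gaussian elimination (LU) with partial pivoting, O(N^3)."""
    a = [row[:] for row in matrix]  # work on a copy so the input is untouched
    n = len(a)
    det = 1.0
    for k in range(n):
        # Partial pivoting: choose the row (at or below k) with the largest |a[i][k]|
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            return 0.0  # no usable pivot: the matrix is singular
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # each row swap flips the sign of the determinant
        det *= a[k][k]
        # Eliminate entries below the pivot; the factors are the entries of L
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return det


print(lu_determinant([[2.0, 1.0], [4.0, 3.0]]))  # 2.0
```

Pivoting on the largest available entry keeps the elimination factors at most 1 in magnitude, which limits the growth of rounding error.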

See the full Intelligence & Performance Benchmark Report for detailed figures.

ASCII Visualization & Tools

You don't need heavy plotting libraries to inspect your data. CDS includes a built-in terminal visualization engine:

# Plot a sine wave or data series directly in your terminal
cds plot "1, 5, 3, 8, 4, 9" --title "My Data"

Outputs scale-aware ASCII line plots and bar charts.
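The core idea behind a scale-aware bar chart is to size every bar relative to the largest value. The following is a minimal sketch of that idea only; `ascii_bars` and its `width`/`char` parameters are hypothetical and do not reproduce the output format of `cds plot`.

```python
def ascii_bars(data: dict[str, float], width: int = 40, char: str = "#") -> str:
    """Render a horizontal ASCII bar chart, scaling bars to the largest value."""
    if not data:
        return ""
    peak = max(data.values())
    label_w = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # The largest value gets exactly `width` characters; others scale proportionally
        length = round(width * value / peak) if peak > 0 else 0
        lines.append(f"{label:<{label_w}} | {char * length} {value}")
    return "\n".join(lines)


print(ascii_bars({"A": 88, "B": 92}, width=20))
```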

Interactive Dashboard

CDS includes an Interactive Web Dashboard for scientific exploration. Launch it from your terminal:

pip install "cognitive-discovery-system[dashboard]"
cds dashboard

The dashboard includes a Hypothesis Engine, Quantum Circuit Simulator, Neural Network training visualizer, and Statistical testing lab.

Scientific Case Studies

Explore how CDS is used to solve real-world research problems:

  1. Hubble Tension Analysis: Generating and testing hypotheses for the expansion rate of the universe.
  2. Quantum-ML Integration: Using quantum circuit measurements as features for classical Neural Network training.

Examples

See the examples/ directory for runnable demos and docs/research-workflows.md for guidance on embedding CDS in research pipelines.

Hypothesis Generation (cognitive discovery)

# Basic demo
python examples/hypothesis_demo.py

# With stats / experiment sketch example
python examples/hypothesis_with_stats_demo.py

# Custom generator implementation (using the HypothesisGenerator Protocol)
python examples/hypothesis_custom_generator.py

# Or via CLI
cds hypothesis "What causes the Hubble tension?"
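The custom-generator example relies on structural typing via `typing.Protocol`: any class with the right method satisfies the protocol without inheriting from it. The sketch below illustrates that pattern under stated assumptions. The method name `generate`, the `Hypothesis` fields, and the class names are hypothetical stand-ins, so check `examples/hypothesis_custom_generator.py` for the real `HypothesisGenerator` signature.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class Hypothesis:
    """Illustrative container for the fields this README mentions (assumed, not CDS's class)."""
    statement: str
    assumptions: list[str] = field(default_factory=list)
    predictions: list[str] = field(default_factory=list)
    confidence: float = 0.5  # assumed to lie in [0, 1]


@runtime_checkable
class HypothesisGeneratorLike(Protocol):
    """Hypothetical stand-in for the real HypothesisGenerator Protocol."""
    def generate(self, question: str) -> list[Hypothesis]: ...


class TemplateGenerator:
    """Toy generator: matches the protocol structurally, with no inheritance."""
    def generate(self, question: str) -> list[Hypothesis]:
        return [
            Hypothesis(
                statement=f"A single unmodeled factor explains: {question}",
                assumptions=["Existing measurements are unbiased"],
                predictions=["Independent methods should converge on one value"],
                confidence=0.3,
            )
        ]


gen = TemplateGenerator()
print(isinstance(gen, HypothesisGeneratorLike))  # True
print(gen.generate("What causes the Hubble tension?")[0].predictions)
```

Note that `runtime_checkable` only checks that the method exists at runtime; it does not validate argument or return types.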

Quantum Circuit (single qubit)

from cds.quantum import QuantumCircuit, hadamard, pauli_x, simulate

circuit = QuantumCircuit().add(hadamard()).add(pauli_x())
result = circuit.run()
print(result.probabilities())

counts = simulate(circuit, shots=1000)
print(counts)  # {0: ~500, 1: ~500}
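Under the hood, a single-qubit circuit is a sequence of 2×2 matrix-vector products on the amplitude pair of |0⟩ and |1⟩. This standalone sketch (plain Python, not the `cds.quantum` API; `apply` is a hypothetical helper) reproduces the H-then-X circuit above and shows why both outcomes end up with probability 0.5.

```python
import math

# 2x2 gate matrices acting on a single-qubit state [amp0, amp1]
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
X = [[0, 1],
     [1, 0]]


def apply(gate, state):
    """Matrix-vector product: new_state[i] = sum_j gate[i][j] * state[j]."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]


state = [1 + 0j, 0j]        # start in |0>
state = apply(H, state)     # (|0> + |1>) / sqrt(2)
state = apply(X, state)     # X swaps the amplitudes; still an equal superposition
probs = [abs(a) ** 2 for a in state]
print(probs)  # approximately [0.5, 0.5]
```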

Multi-Qubit & Entanglement

from cds.quantum import (
    QuantumRegister, h_gate, cnot, bell_state,
    ghz_state, is_entangled,
)

# Bell state (|00⟩ + |11⟩) / √2
reg = bell_state(0)
print(is_entangled(reg))  # True
print(reg.measure_shots(shots=1000))  # {'00': ~500, '11': ~500}

# 4-qubit GHZ state
ghz = ghz_state(4)
counts = ghz.measure_shots(shots=1000)
print(counts)  # {'0000': ~500, '1111': ~500}

Optimization

from cds.optimization import gradient_descent, newton_method

# Find minimum of (x-3)²
result = gradient_descent(lambda x: (x - 3) ** 2, x0=10.0, lr=0.1)
print(f"x = {result.x:.6f}")  # ~3.0

# Find √2 using Newton's method
result = newton_method(lambda x: x ** 2 - 2, x0=1.5)
print(f"√2 = {result.x:.10f}")  # 1.4142135624
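Both routines above reduce to short update rules. The sketch below implements them with a central-difference derivative. `gd_minimize`, `newton_root`, and their parameters are hypothetical names chosen to avoid confusion with `cds.optimization`, and the result attributes (`result.x`) of the real API are not reproduced.

```python
def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gd_minimize(f, x0: float, lr: float = 0.1, tol: float = 1e-10, max_iter: int = 10_000) -> float:
    """Gradient descent: x <- x - lr * f'(x), stopping once the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = lr * numeric_grad(f, x)
        x -= step
        if abs(step) < tol:
            break
    return x


def newton_root(f, x0: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Newton's method for f(x) = 0: x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / numeric_grad(f, x)
        x -= step
        if abs(step) < tol:
            break
    return x


print(gd_minimize(lambda x: (x - 3) ** 2, x0=10.0))   # close to 3.0
print(newton_root(lambda x: x ** 2 - 2, x0=1.5))      # close to 1.41421356...
```

For (x - 3)² with lr = 0.1, each gradient step shrinks the error by a constant factor of 0.8 (linear convergence). Newton's method roughly doubles the number of correct digits per step (quadratic convergence).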

Signal Processing

from cds.signals import dft, fft_radix2, convolve, low_pass_filter

# FFT of a signal
signal = [complex(i) for i in range(8)]
spectrum = fft_radix2(signal)

# Convolution
result = convolve([1.0, 2.0, 3.0], [0.5, 0.5])
print(result)  # [0.5, 1.5, 2.5, 1.5]
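The O(N log N) radix-2 FFT is the Cooley-Tukey recursion: split the signal into even- and odd-indexed samples, transform each half, and combine them with twiddle factors. This minimal sketch (standalone, not `cds.signals`) checks itself against a direct O(N²) DFT on the same 8-sample signal used above.

```python
import cmath


def fft_radix2(x: list[complex]) -> list[complex]:
    """Recursive Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (zero-pad first)")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


signal = [complex(i) for i in range(8)]
fast, slow = fft_radix2(signal), dft(signal)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # tiny: floating-point rounding only
print(fast[0])  # DC term: the sum of the samples, 28
```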

Probability Distributions

from cds.probability import gaussian_pdf, binomial_pmf, poisson_pmf

# Gaussian PDF at x=0
print(gaussian_pdf(0.0, mu=0, sigma=1))  # 0.3989...

# Binomial: P(3 heads in 5 fair flips)
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# Poisson: P(k=2, λ=3)
print(poisson_pmf(2, 3.0))  # 0.2240...
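For reference, these are the standard closed-form formulas behind the three results above, written with the standard library only. They are independent re-implementations for cross-checking, not the CDS source, and the parameter names (`mu`, `sigma`, `lam`) are assumptions.

```python
import math


def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """N(mu, sigma^2) density: exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam^k e^(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


print(round(gaussian_pdf(0.0), 4))     # 0.3989
print(binomial_pmf(3, 5, 0.5))         # 0.3125
print(round(poisson_pmf(2, 3.0), 4))   # 0.224
```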

Statistics

from cds.stats import mean, stdev, correlation, linear_regression

data = [12.5, 14.3, 11.8, 15.1, 13.7]
print(f"mean={mean(data):.2f}, std={stdev(data):.2f}")

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
reg = linear_regression(x, y)
print(f"y = {reg.slope:.2f}x + {reg.intercept:.2f}, R²={reg.r_squared:.3f}")
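Simple linear regression has a closed form: the slope is S_xy / S_xx, the intercept makes the line pass through the means, and R² is S_xy² / (S_xx · S_yy). This sketch applies those formulas to the same data. It returns a plain tuple, whereas CDS exposes `.slope`, `.intercept`, and `.r_squared` attributes, so treat the function shape as illustrative.

```python
def linear_regression(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit y = slope*x + intercept; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # for one predictor, R^2 equals Pearson r squared
    return slope, intercept, r_squared


slope, intercept, r2 = linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"y = {slope:.2f}x + {intercept:.2f}, R²={r2:.3f}")  # y = 1.99x + 0.05, R²=0.997
```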

Machine Learning

from cds.ml import Layer, MLP

# Simple XOR-like Neural Network
net = MLP([
    Layer(2, 4, activation="relu"),
    Layer(4, 1, activation="sigmoid")
])
X, y = [[0, 0], [0, 1], [1, 0], [1, 1]], [[0], [1], [1], [0]]

# Train with built-in Adam optimizer
history = net.train(X, y, epochs=50, lr=0.1)
print(f"Final loss: {history['final_loss']:.4f}")
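XOR is the classic problem that a single layer cannot solve but one hidden layer can. To show what a trained network of dense layers computes, this sketch runs a forward pass through a 2-2-1 network with hand-picked (not trained) weights that solve XOR exactly. The `dense` helper is hypothetical and is not the `cds.ml.Layer` API. Adam-based training is meant to find weights that play a similar role.

```python
def relu(v: float) -> float:
    return max(0.0, v)


def dense(inputs, weights, biases, activation):
    """One dense layer: out[j] = activation(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked weights: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), output = h1 - 2*h2
W1, B1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
W2, B2 = [[1.0, -2.0]], [0.0]


def xor_net(x: list[float]) -> float:
    hidden = dense(x, W1, B1, relu)
    return dense(hidden, W2, B2, lambda v: v)[0]  # linear output layer


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, xor_net(x))  # outputs 0.0, 1.0, 1.0, 0.0
```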

Data Analysis & Visualization

from cds.data_analysis import DataSet, plot_bar

# Mini-Pandas DataSet for filtering and grouping
data = [{"name": "A", "score": 88}, {"name": "B", "score": 92}]
ds = DataSet(data)
filtered = ds.filter(lambda row: row["score"] > 90)
print(filtered.column("name"))  # ['B']

# Terminal Visualization
scores = {row["name"]: row["score"] for row in ds.to_list()}
print(plot_bar(scores, title="Scores"))
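The grouping half of a "Mini-Pandas" DataSet boils down to a split-apply-combine pass over a list of dicts. This standalone sketch computes a per-group mean; `group_mean` and the `team`/`score` fields are hypothetical and are not the `DataSet` grouping API.

```python
from collections import defaultdict


def group_mean(rows: list[dict], key: str, value: str) -> dict:
    """Group rows by row[key] and average row[value] within each group."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


rows = [
    {"team": "red", "score": 88},
    {"team": "blue", "score": 92},
    {"team": "red", "score": 76},
]
print(group_mean(rows, key="team", value="score"))  # {'red': 82.0, 'blue': 92.0}
```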

Scientific Computing

from cds.scientific import kinetic_energy, escape_velocity, get_constant

print(get_constant("c"))          # speed of light
print(kinetic_energy(10, 5))      # 125.0 J
print(escape_velocity(5.972e24, 6.371e6))  # ~11186 m/s
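The two formulas used above are KE = ½mv² and v_esc = √(2GM/r). This sketch re-derives the printed numbers. It assumes the argument order (mass, speed) and (mass, radius) implied by the example's outputs, and uses the CODATA 2018 value of G; it is not the `cds.scientific` source.

```python
import math

G = 6.67430e-11  # Newtonian constant of gravitation, m^3 kg^-1 s^-2 (CODATA 2018)


def kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    """KE = (1/2) m v^2, in joules."""
    return 0.5 * mass_kg * speed_m_s ** 2


def escape_velocity(mass_kg: float, radius_m: float) -> float:
    """v_esc = sqrt(2 G M / r), in m/s."""
    return math.sqrt(2 * G * mass_kg / radius_m)


print(kinetic_energy(10, 5))                       # 125.0
print(round(escape_velocity(5.972e24, 6.371e6)))   # 11186
```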

Graph Theory

from cds.graph import Graph, dijkstra, kruskal_mst, bfs

g = Graph(n_vertices=4, directed=False)
g.add_edge(0, 1, 1.0)
g.add_edge(1, 2, 2.0)
g.add_edge(2, 3, 3.0)
g.add_edge(0, 3, 10.0)

dist, prev = dijkstra(g, 0)
print(dist)  # {0: 0.0, 1: 1.0, 2: 3.0, 3: 6.0}

edges, total = kruskal_mst(g)
print(f"MST weight: {total}")  # 6.0
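Dijkstra's algorithm repeatedly settles the closest unsettled vertex using a priority queue. Here is a minimal `heapq` version on the same four-vertex graph. It uses a plain adjacency dict rather than the `cds.graph.Graph` class, returns only distances (no predecessor map), and assumes non-negative edge weights, as Dijkstra requires.

```python
import heapq


def dijkstra(adj: dict[int, list[tuple[int, float]]], source: int) -> dict[int, float]:
    """Shortest distances from source; adj maps vertex -> [(neighbor, weight), ...]."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0), (0, 3, 10.0)]
adj: dict[int, list[tuple[int, float]]] = {v: [] for v in range(4)}
for u, v, w in edges:  # undirected graph: add both directions
    adj[u].append((v, w))
    adj[v].append((u, w))
print(dijkstra(adj, 0))  # {0: 0.0, 1: 1.0, 2: 3.0, 3: 6.0}
```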

Mathematical Modeling

from cds.modeling import Variable, Sin, Exp, solve_equation

x = Variable("x")
expr = Sin(x) * Exp(x)        # sin(x) * e^x

# Symbolic derivative (chain + product rules)
print(expr.diff("x").to_str())

# Solve x^2 - 2 = 0  =>  x = sqrt(2)
root = solve_equation(Variable("x") ** 2 - 2, variable="x", x0=1.0)
print(root.x)                 # ~1.4142
print(root.converged)         # True
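By the product rule, the derivative of sin(x)·eˣ is eˣ(cos x + sin x), which is the expression `expr.diff("x")` should represent (its exact `to_str()` formatting is not reproduced here). A quick, library-independent way to sanity-check any symbolic derivative is to compare it with a central finite difference at a few points, as in this sketch.

```python
import math


def f(x: float) -> float:
    return math.sin(x) * math.exp(x)


def df_symbolic(x: float) -> float:
    # Product rule: d/dx[sin(x) e^x] = cos(x) e^x + sin(x) e^x
    return math.exp(x) * (math.cos(x) + math.sin(x))


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Second-order accurate numerical derivative."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (0.0, 0.5, 1.0):
    print(x, df_symbolic(x), central_difference(f, x))  # the two columns agree closely
```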

Knowledge Organization

from cds.knowledge import KnowledgeGraph, Notebook, search

kg = KnowledgeGraph(name="Cosmology")
kg.link_concepts("Dark Energy", "Hubble Constant", kind="affects")
kg.link_concepts("Hubble Constant", "CMB", kind="constrains")

# Shortest path across the (undirected) graph
print(kg.find_path("Dark Energy", "CMB"))
# ['Dark Energy', 'Hubble Constant', 'CMB']

nb = Notebook(name="Lab Book")
nb.add_note("n1", "Hubble Tension", "Local vs CMB H0 disagree.",
            tags=["experiment"], linked_concepts=["Hubble Constant"])

# Ranked retrieval across both concepts and notes
for hit in search(kg, nb, query="hubble"):
    print(hit.concept_name or hit.note_id, hit.score)

Monte Carlo Simulation

import math
from cds.montecarlo import estimate_pi, mc_integrate

if __name__ == "__main__":
    # Unit-circle method
    result = estimate_pi(n_samples=100_000, seed=42)
    print(f"PI approximation: {result.estimate:.4f}")

    # Integration
    area = mc_integrate(math.sin, 0, math.pi, n_samples=100_000)
    print(f"Integral of sin(x): {area.estimate:.4f}")

Differential Equations

from cds.diffeq import rk4, solve_system
import math

# dy/dt = -y, y(0)=1  =>  y(t) = e^(-t)
sol = rk4(lambda t, y: -y, t0=0, y0=1.0, t_end=2.0)
print(f"y(2) = {sol.y[-1]:.6f}")  # ~0.135335 (e^-2)

# Harmonic oscillator: x'' = -x
def harmonic(t, y):
    return [y[1], -y[0]]
t_vals, y_vals = solve_system(harmonic, 0, [1.0, 0.0], math.pi)
print(f"x(π) = {y_vals[-1][0]:.4f}")  # ~-1.0
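The classic fourth-order Runge-Kutta step averages four slope evaluations per step, giving global error O(h⁴). This scalar sketch solves the same dy/dt = −y problem. The `n_steps` parameter and the return of only the final value are assumptions; the CDS `rk4` returns a solution object with a `.y` list.

```python
import math


def rk4(f, t0: float, y0: float, t_end: float, n_steps: int = 200) -> float:
    """Classic fourth-order Runge-Kutta for dy/dt = f(t, y); returns y(t_end)."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


y2 = rk4(lambda t, y: -y, t0=0.0, y0=1.0, t_end=2.0)
print(f"{y2:.6f} vs exact {math.exp(-2):.6f}")  # both print 0.135335
```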

Numerical Integration

import math
from cds.numerical_integration import simpson, gaussian_quadrature, romberg

# ∫_0^π sin(x) dx = 2
print(simpson(math.sin, 0, math.pi, n=100))  # ~2.0, O(h⁴)

# Gauss-Legendre: exact for polynomials up to degree 2n-1
print(gaussian_quadrature(lambda x: x**7, 0, 1, n=4))  # 0.125 (exact)

# Romberg extrapolation converges very quickly on smooth integrands
result = romberg(math.exp, 0, 1, tol=1e-12)
print(f"∫e^x = {result.value:.10f}")  # ~1.7182818285
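Composite Simpson's 1/3 rule weights the interior sample points alternately by 4 and 2, which makes it exact for cubics and gives O(h⁴) error on smooth integrands. A minimal standalone version (not the `cds.numerical_integration` source) reproduces the ∫₀^π sin(x) dx = 2 result.

```python
import math


def simpson(f, a: float, b: float, n: int = 100) -> float:
    """Composite Simpson's 1/3 rule; n (number of subintervals) must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate weights: 4 for odd i, 2 for even i
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


print(simpson(math.sin, 0.0, math.pi, n=100))  # about 2.0
```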

Architecture

src/cds/
├── quantum/        # Quantum circuit simulation (single & multi-qubit)
├── optimization/   # Gradient descent, Newton, Adam, line search
├── ml/             # Neural Networks (MLP, Layers, Adam training)
├── signals/        # DFT, FFT, convolution, filtering
├── probability/    # Probability distributions & sampling
├── stats/          # Statistical analysis & regression
├── math_utils/     # Calculus, linear algebra, eigenvalues, Gram-Schmidt
├── data_analysis/  # Mini-Pandas DataSet, CSV loading, ASCII viz
├── scientific/     # Physical constants & formulas
├── graph/          # Graph algorithms (Dijkstra, BFS, DFS, Kruskal MST)
├── modeling/       # Symbolic math (expressions, MathModel, solvers)
├── knowledge/      # Knowledge graph, concepts, notes, structured retrieval
├── montecarlo/     # Monte Carlo methods (π, integration, random walks)
├── diffeq/         # ODE solvers (Euler, RK4, midpoint)
├── numerical_integration/  # Deterministic quadrature (trapezoid, Simpson, Romberg, Gauss-Legendre)
├── nlp/            # Educational NLP (BPE, embeddings, attention, autograd, MiniGPT)
├── hypothesis/     # Hypothesis generation
├── core/           # Shared models, config
└── cli.py          # Command-line interface

examples/           # Runnable demo scripts
tests/              # full test suite (see CI badge for the live count)
docs/               # MkDocs documentation, tutorials, benchmarks

.github/workflows/ # Automation for PRs (labels + checklist), releases, and dependency updates


## Vision

The long-term goal of CDS is to provide a lightweight, dependency-free platform for scientific exploration and discovery.

We aim to combine solid numerical foundations (quantum simulation, FFT, linear algebra, statistics, differential equations, etc.) with higher-level tools for hypothesis generation and research workflows.

A distinctive part of CDS is the `cds.hypothesis` module, which generates structured, falsifiable hypotheses with explicit assumptions and predictions. The `cds hypothesis` CLI command and `examples/hypothesis_demo.py` make this functionality immediately usable. Recent work has focused on making the CLI and docs more practical for day-to-day use while keeping everything in readable pure Python.

The project is still at an early stage but is actively developed, with a focus on code quality, test coverage, documentation, and usability for researchers and students.

Run `cds modules` after installation to explore the current modules.

## Recent improvements

Recent updates have aimed to make it simpler to generate and explore ideas within the platform:

- New CLI commands for browsing available modules and experimenting with hypothesis generation
- A dedicated example showing how to use the hypothesis features end-to-end
- Automation around pull requests, dependency management, and releases to free up time for core scientific work

The goal is to lower the barrier for using the discovery-oriented parts of the project and reduce time spent on routine tasks.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and guidelines.

Looking for:
- Researchers with domain expertise
- People interested in pure-Python scientific computing
- Contributors for new modules (ML basics, PDE solvers, etc.)
- People who want to help make scientific tools easier to maintain and use


## Automation and Maintenance Workflows

A few GitHub Actions workflows handle the repetitive parts of keeping the project running:

- Dependabot for regular updates to dependencies and GitHub Actions
- Automatic labeling and review checklists for pull requests
- An automated release pipeline: pushing a version tag builds, publishes to PyPI via a scoped API token, cuts a GitHub Release with artifacts, and attests build provenance (sigstore)

These help ensure that time spent on the project goes more toward developing new modules, improving hypothesis tools, and supporting research use cases rather than manual upkeep.

See `.github/workflows/` for the current setup.

## License

MIT — see [LICENSE](LICENSE).

## Contact

- Maintainer: [@Furox88](https://github.com/Furox88)
- Issues & Discussions: [GitHub](https://github.com/Furox88/cognitive-discovery-system/issues)


## Security

Found a vulnerability? **Please do not open a public issue.** Report it privately:

- GitHub private advisory: [Report a vulnerability](https://github.com/Furox88/cognitive-discovery-system/security/advisories/new)
- Or email the maintainer directly.

Acknowledgement target: **48 hours** · Fix SLA: **7 days**. Full threat model,
supported versions, and out-of-scope items are in [SECURITY.md](SECURITY.md).


## Why These Automations Exist

The project is maintained by a small team (often solo). The workflows above exist so that routine tasks (labeling PRs, running checks, cutting releases, keeping dependencies fresh) take as little time as possible. This frees hours for actual research work: improving the hypothesis tools, adding new scientific modules, writing better examples, and exploring new discovery workflows.

If you're a researcher or educator using CDS, these automations mean you can focus on the science instead of repo housekeeping.



Download files

Source distribution: cognitive_discovery_system-1.1.7.tar.gz (274.5 kB)

Built distribution (Python 3 wheel): cognitive_discovery_system-1.1.7-py3-none-any.whl (135.1 kB)

File hashes for cognitive_discovery_system-1.1.7.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5edaa72ed21dce8d42ee6425e84dd0b892c4e5097eb215e61fedb5a42f9ecbb1 |
| MD5 | 3e271cadcded18121ad8879cfc77de46 |
| BLAKE2b-256 | 75de7db3bf14f8c2e89c9c45ebf53b1fb73646d33722920bb34249a6be26255d |

File hashes for cognitive_discovery_system-1.1.7-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 615569c16c3c0307303d6371cb28b9760faa52cf49943fbb01bfe2a1158c650b |
| MD5 | 7121db991ca187f2fbabe0b1b574f0fe |
| BLAKE2b-256 | 2cadfd2f695f1b94cc9722935e093fbfd4903f2bdcc8f49d612c9aae8903b2c1 |
