Interdisciplinary GraphRAG -- find cross-domain bridges in scientific literature

mathesis-ai

Find cross-domain bridges in scientific literature. Give it a research question; it returns papers from unrelated fields that solved the same underlying problem -- expressed in different vocabulary.

Built on SPECTER embeddings, recursive semantic graph expansion, PDF RAG, and Groq LLM summaries. No server required.


Install

# Install CPU torch first (PyPI's default is a 4 GB CUDA build)
pip install torch --index-url https://download.pytorch.org/whl/cpu

pip install mathesis-ai

GPU (CUDA 12.1 build):

pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install mathesis-ai

Requires Python 3.11+. Needs ~4 GB RAM for the embedding model.
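Before choosing a value for the `device` option, it can help to check which torch build is actually installed. The following is a minimal sketch; `pick_device` is a hypothetical helper and not part of mathesis-ai. It relies only on `torch.cuda.is_available()` and falls back to "cpu" when torch is missing.

```python
def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled torch build sees a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also runs before torch is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can then be passed as `device=` when constructing `Mathesis`.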


Usage

from mathesis import Mathesis

m = Mathesis(groq_api_key="gsk_...")  # free key at console.groq.com
m.seed()                               # fetch & embed 250 papers, ~10 min, run once

results = m.query("vanishing gradients in deep neural network optimization")

for bridge in results.bridges:
    print(bridge.domain_jump)   # "q-bio.NC -> cs.LG"
    print(bridge.confidence)    # 0.796
    print(bridge.llm_summary)   # one-paragraph analogy explanation

No .env file, no server to start. Data is stored in ~/.mathesis by default.
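A query may return several bridges of varying strength, so a small filter can be useful. The sketch below assumes only that each bridge exposes `confidence` and `domain_jump` as shown above. `strongest_bridges`, its 0.7 threshold, and the sample values are hypothetical and are there only so the example runs without a seeded corpus.

```python
from types import SimpleNamespace


def strongest_bridges(bridges, min_confidence=0.7, limit=3):
    """Keep bridges at or above min_confidence, highest confidence first."""
    kept = [b for b in bridges if b.confidence >= min_confidence]
    kept.sort(key=lambda b: b.confidence, reverse=True)
    return kept[:limit]


# Made-up stand-in data; in real use pass results.bridges instead.
sample = [
    SimpleNamespace(domain_jump="q-bio.NC -> cs.LG", confidence=0.796),
    SimpleNamespace(domain_jump="physics.comp-ph -> cs.LG", confidence=0.41),
    SimpleNamespace(domain_jump="math.OC -> cs.LG", confidence=0.83),
]

for bridge in strongest_bridges(sample):
    print(f"{bridge.confidence:.3f}  {bridge.domain_jump}")
```

With a real instance, the call would be `strongest_bridges(results.bridges)`.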


Configuration

m = Mathesis(
    groq_api_key="gsk_...",          # required for LLM summaries
    data_dir="~/my-mathesis-data",   # default: ~/.mathesis
    domains=["cs.LG", "q-bio.NC"],   # default: 5 arXiv categories
    llm_enabled=True,                # default: True
    device="cpu",                    # "cpu" or "cuda"
)
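To keep the API key out of source files, the constructor arguments can be assembled from environment variables. This is a sketch under assumptions: the variable names `GROQ_API_KEY`, `MATHESIS_DATA_DIR`, and `MATHESIS_DEVICE` are chosen for illustration and are not read by mathesis-ai itself, and `mathesis_kwargs` is a hypothetical helper. Only the keyword names come from the configuration block above.

```python
import os


def mathesis_kwargs(env=None):
    """Build keyword arguments for Mathesis(...) from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "")  # assumed variable name
    kwargs = {
        # Summaries need a key, so disable them when none is set.
        "llm_enabled": bool(api_key),
        # Expand "~" here in case the library does not do so itself.
        "data_dir": os.path.expanduser(env.get("MATHESIS_DATA_DIR", "~/.mathesis")),
        "device": env.get("MATHESIS_DEVICE", "cpu"),
    }
    if api_key:
        kwargs["groq_api_key"] = api_key
    return kwargs


print(mathesis_kwargs({"GROQ_API_KEY": "gsk_example"}))
```

Usage would then look like `m = Mathesis(**mathesis_kwargs())`.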

Query response

results.bridges_found          # bool
results.bridges                # list of BridgeResult
results.top_nodes              # top 5 papers by graph rank
results.graph_stats            # {"nodes": 310, "edges": 1548, "domains": [...]}

bridge.domain_jump             # "q-bio.NC -> cs.LG"
bridge.confidence              # 0.0 - 1.0
bridge.similarity              # raw cosine similarity
bridge.rag_confirmed           # bool - methods text confirmed the bridge
bridge.rag_excerpt             # extracted methods section passage
bridge.llm_summary             # Groq-generated analogy explanation
bridge.source_paper.title
bridge.source_paper.id         # arXiv ID
bridge.target_paper.title
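For readers unfamiliar with the `similarity` field, cosine similarity measures the angle between two embedding vectors, regardless of their length. The sketch below is a plain-Python illustration of that definition, not the library's own implementation. How `confidence` is derived from `similarity` is not documented here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors; real SPECTER embeddings have 768 dimensions.
parallel = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(parallel, 6), round(orthogonal, 6))
```

Vectors pointing the same way score close to 1, and unrelated (orthogonal) vectors score 0.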

Async support

# call from inside a coroutine (async def)
results = await m.aquery("attention mechanisms in transformers")
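Because `aquery` is awaitable, several questions can be issued together with `asyncio.gather`. The sketch below assumes that concurrent `aquery` calls on one instance are safe, which this page does not state. `query_many` and `FakeMathesis` are hypothetical. The fake stands in for a real `Mathesis` object so the example runs without the package.

```python
import asyncio


async def query_many(m, questions):
    """Run several aquery calls concurrently; results come back in input order."""
    return await asyncio.gather(*(m.aquery(q) for q in questions))


class FakeMathesis:
    """Hypothetical stand-in exposing only an async aquery, for demonstration."""

    async def aquery(self, question):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return f"results for: {question}"


questions = ["attention mechanisms in transformers", "vanishing gradients"]
answers = asyncio.run(query_many(FakeMathesis(), questions))
print(answers)
```

With the real client, replace `FakeMathesis()` with a constructed `Mathesis` instance.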

Status

print(m)
# Mathesis(papers_indexed=581, domains=5, llm=enabled)

m.status()
# {"papers_indexed": 581, "domains": [...], "data_dir": "~/.mathesis"}

Getting more data

m.seed(papers_per_domain=50)    # quick start -- 250 papers total
m.ingest()                      # full corpus -- 500 papers/domain, ~25 min
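Since seeding takes several minutes and only needs to run once, a script can check `status()` first and skip the step when papers are already indexed. This is a minimal sketch. `ensure_seeded` and `FakeMathesis` are hypothetical, and the only assumption about the real object is that `status()` returns a dict with a `papers_indexed` key, as shown under Status.

```python
def ensure_seeded(m, papers_per_domain=50):
    """Seed only when nothing is indexed yet; return True if seeding ran."""
    if m.status()["papers_indexed"] > 0:
        return False
    m.seed(papers_per_domain=papers_per_domain)
    return True


class FakeMathesis:
    """Hypothetical stand-in that mimics status() and seed() for the demo."""

    def __init__(self):
        self.papers = 0

    def status(self):
        return {"papers_indexed": self.papers}

    def seed(self, papers_per_domain=50):
        self.papers = papers_per_domain * 5  # five default domains


m = FakeMathesis()
print(ensure_seeded(m), ensure_seeded(m))  # prints: True False
```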

Source & docs

https://github.com/nathanpi8/mathesis-ai

Download files

Source Distribution

mathesis_ai-0.1.4.tar.gz (2.6 MB)

Uploaded Source

Built Distribution

mathesis_ai-0.1.4-py3-none-any.whl (34.4 kB)

Uploaded Python 3

File details

Details for the file mathesis_ai-0.1.4.tar.gz.

File metadata

  • Size: 2.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for mathesis_ai-0.1.4.tar.gz
Algorithm Hash digest
SHA256 94ebe02593b5c75de768e1d0a5f823163a9e843f93586b75051936367050ad85
MD5 942a7c5ef66630cce7cfb4835295875d
BLAKE2b-256 5961b51a7ac3870d77e8998c2b32feee519c0cf4192245c0157a5440b7217dba

File details

Details for the file mathesis_ai-0.1.4-py3-none-any.whl.

File metadata

  • Size: 34.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for mathesis_ai-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ae255dbf59cfb38c39b6ab9d80dba3b6be5ac7732c878aa1d2cbbbbada54f5a9
MD5 2ae09eeb4ae049f97c0cb19f8460f79a
BLAKE2b-256 9679f37b3f75113f3c28c6b5c16eaeb1e982b071167192c33930ca2401bdefab
