Interdisciplinary GraphRAG -- find cross-domain bridges in scientific literature
mathesis-ai
Find cross-domain bridges in scientific literature. Give it a research question; it returns papers from unrelated fields that solved the same underlying problem -- expressed in different vocabulary.
Built on SPECTER embeddings, recursive semantic graph expansion, PDF RAG, and Groq LLM summaries. No server required.
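To make the idea of a "bridge" concrete, here is a minimal sketch: two papers count as a bridge when their embedding vectors are close but their domains differ. This is an illustration, not the package's actual pipeline. The function names, the 0.8 threshold, and the toy 3-dimensional vectors are all assumptions made for the example; real SPECTER embeddings have far more dimensions.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_bridges(papers, threshold=0.8):
    """Return (similarity, id_a, id_b) for similar papers in different domains."""
    bridges = []
    for a, b in combinations(papers, 2):
        if a["domain"] == b["domain"]:
            continue  # same field: not a cross-domain bridge
        sim = cosine(a["vec"], b["vec"])
        if sim >= threshold:
            bridges.append((round(sim, 3), a["id"], b["id"]))
    return sorted(bridges, reverse=True)  # most similar pair first

# Toy vectors standing in for real paper embeddings (hypothetical data).
papers = [
    {"id": "A", "domain": "cs.LG", "vec": [1.0, 0.0, 0.0]},
    {"id": "B", "domain": "q-bio.NC", "vec": [0.9, 0.1, 0.0]},
    {"id": "C", "domain": "cs.LG", "vec": [0.95, 0.05, 0.0]},
    {"id": "D", "domain": "q-bio.NC", "vec": [0.0, 0.0, 1.0]},
]
print(find_bridges(papers))
```

Pairs from the same domain are skipped even when they are nearly identical (A and C here), which is what separates a bridge from an ordinary nearest-neighbour match.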
Install
# Install CPU torch first (PyPI's default is a 4 GB CUDA build)
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install mathesis-ai
GPU (CUDA 12.1 wheels):
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install mathesis-ai
Requires Python 3.11+. Needs ~4 GB RAM for the embedding model.
Usage
from mathesis import Mathesis
m = Mathesis(groq_api_key="gsk_...") # free key at console.groq.com
m.seed() # fetch & embed 250 papers, ~10 min, run once
results = m.query("vanishing gradients in deep neural network optimization")
for bridge in results.bridges:
    print(bridge.domain_jump)  # "q-bio.NC -> cs.LG"
    print(bridge.confidence)   # 0.796
    print(bridge.llm_summary)  # one-paragraph analogy explanation
No .env file, no server to start. Data is stored in ~/.mathesis by default.
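If you would rather not hard-code the key in a script, one option is to read it from an environment variable and pass it in. A minimal sketch: the helper name `load_groq_key` and the variable name `GROQ_API_KEY` are choices made for this example, and nothing here implies that Mathesis reads the environment on its own.

```python
import os

def load_groq_key(env_var="GROQ_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the Mathesis client")
    return key

# Usage (requires mathesis-ai to be installed):
#   from mathesis import Mathesis
#   m = Mathesis(groq_api_key=load_groq_key())
```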
Configuration
m = Mathesis(
    groq_api_key="gsk_...",          # required for LLM summaries
    data_dir="~/my-mathesis-data",   # default: ~/.mathesis
    domains=["cs.LG", "q-bio.NC"],   # default: 5 arXiv categories
    llm_enabled=True,                # default: True
    device="cpu",                    # "cpu" or "cuda"
)
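The device setting can be chosen at runtime instead of being hard-coded. A small sketch, assuming you want "cuda" whenever PyTorch reports a usable GPU; `pick_device` is a hypothetical helper, not part of the package.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed: fall back to the CPU setting
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```

Pass the result as `device=pick_device()` when constructing the client. With the CPU-only torch wheel installed this returns "cpu".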
Query response
results.bridges_found # bool
results.bridges # list of BridgeResult
results.top_nodes # top 5 papers by graph rank
results.graph_stats # {"nodes": 310, "edges": 1548, "domains": [...]}
bridge.domain_jump # "q-bio.NC -> cs.LG"
bridge.confidence # 0.0 - 1.0
bridge.similarity # raw cosine similarity
bridge.rag_confirmed # bool - methods text confirmed the bridge
bridge.rag_excerpt # extracted methods section passage
bridge.llm_summary # Groq-generated analogy explanation
bridge.source_paper.title
bridge.source_paper.id # arXiv ID
bridge.target_paper.title
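The fields above are enough to post-filter results. The sketch below keeps only bridges that are RAG-confirmed and above a confidence floor, strongest first. To stay runnable without the package it uses a stand-in `Bridge` dataclass with just three of the listed fields; the sample values and the 0.7 floor are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Bridge:
    """Stand-in for a bridge result; only the fields used below."""
    domain_jump: str
    confidence: float
    rag_confirmed: bool

def best_bridges(bridges, min_confidence=0.7):
    """Keep RAG-confirmed bridges above a confidence floor, strongest first."""
    kept = [b for b in bridges if b.rag_confirmed and b.confidence >= min_confidence]
    return sorted(kept, key=lambda b: b.confidence, reverse=True)

# Hypothetical sample data, not real query output.
sample = [
    Bridge("q-bio.NC -> cs.LG", 0.796, True),
    Bridge("physics.comp-ph -> cs.LG", 0.91, False),
    Bridge("math.OC -> q-bio.NC", 0.84, True),
    Bridge("cs.LG -> math.OC", 0.55, True),
]
for b in best_bridges(sample):
    print(f"{b.confidence:.3f}  {b.domain_jump}")
```

With real results, the same function should work on `results.bridges` directly, since it only reads `confidence` and `rag_confirmed`.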
Async support
# call from inside a coroutine (for example, one started with asyncio.run)
results = await m.aquery("attention mechanisms in transformers")
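An async method is most useful when several queries run together. Here is a sketch of that pattern with `asyncio.gather`. `StubClient` is a stand-in so the example runs without the package or an API key; swap in a real `Mathesis` instance. Whether one client handles concurrent `aquery` calls safely is an assumption you should verify.

```python
import asyncio

class StubClient:
    """Stand-in with an async aquery(); replace with a real Mathesis instance."""
    async def aquery(self, question):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"results for: {question}"

async def run_queries(client, questions):
    """Run all queries concurrently and return results in input order."""
    return await asyncio.gather(*(client.aquery(q) for q in questions))

questions = [
    "attention mechanisms in transformers",
    "vanishing gradients in deep neural network optimization",
]
answers = asyncio.run(run_queries(StubClient(), questions))
print(answers)
```

`asyncio.gather` preserves input order, so `answers[i]` corresponds to `questions[i]`.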
Status
print(m)
# Mathesis(papers_indexed=581, domains=5, llm=enabled)
m.status()
# {"papers_indexed": 581, "domains": [...], "data_dir": "~/.mathesis"}
Getting more data
m.seed(papers_per_domain=50) # quick start -- 250 papers total
m.ingest() # full corpus -- 500 papers/domain, ~25 min
File details
Details for the file mathesis_ai-0.1.4.tar.gz.
File metadata
- Download URL: mathesis_ai-0.1.4.tar.gz
- Upload date:
- Size: 2.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94ebe02593b5c75de768e1d0a5f823163a9e843f93586b75051936367050ad85 |
| MD5 | 942a7c5ef66630cce7cfb4835295875d |
| BLAKE2b-256 | 5961b51a7ac3870d77e8998c2b32feee519c0cf4192245c0157a5440b7217dba |
File details
Details for the file mathesis_ai-0.1.4-py3-none-any.whl.
File metadata
- Download URL: mathesis_ai-0.1.4-py3-none-any.whl
- Upload date:
- Size: 34.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae255dbf59cfb38c39b6ab9d80dba3b6be5ac7732c878aa1d2cbbbbada54f5a9 |
| MD5 | 2ae09eeb4ae049f97c0cb19f8460f79a |
| BLAKE2b-256 | 9679f37b3f75113f3c28c6b5c16eaeb1e982b071167192c33930ca2401bdefab |