CrossRef Local
Local CrossRef database with 167M+ scholarly works, full-text search, and impact factor calculation.
Why CrossRef Local?
Built for the LLM era - features that matter for AI research assistants:
| Feature | Benefit |
|---|---|
| 📝 Abstracts | Full text for semantic understanding |
| 📊 Impact Factor | Filter by journal quality |
| 🔗 Citations | Prioritize influential papers |
| ⚡ Speed | Millisecond queries over 167M records, no rate limits |
Perfect for: RAG systems, research assistants, literature review automation.
Installation
pip install crossref-local
From source:
git clone https://github.com/ywatanabe1989/crossref-local
cd crossref-local && make install
Database setup (1.5 TB, ~2 weeks to build):
# 1. Download CrossRef data (~100GB compressed)
aria2c "https://academictorrents.com/details/..."
# 2. Build SQLite database (~days)
pip install dois2sqlite
dois2sqlite build /path/to/crossref-data ./data/crossref.db
# 3. Build FTS5 index (~60 hours) & citations table (~days)
make fts-build-screen
make citations-build-screen
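To illustrate what the FTS5 index in step 3 provides, here is a minimal sketch using Python's built-in `sqlite3` module on an in-memory database. The table name `works_fts`, its columns, the helper functions, and the three sample rows (including their DOIs) are made up for illustration; the real schema produced by `dois2sqlite` and the `make` targets may differ. It assumes your SQLite build includes FTS5, which most CPython distributions do.

```python
import sqlite3

# In-memory stand-in for the real database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE works_fts USING fts5(doi UNINDEXED, title, abstract)")
conn.executemany(
    "INSERT INTO works_fts (doi, title, abstract) VALUES (?, ?, ?)",
    [
        ("10.0000/example.1", "Sharp wave ripples in the hippocampus",
         "Ripples support memory consolidation."),
        ("10.0000/example.2", "CRISPR genome editing in plants",
         "Targeted editing with Cas9."),
        ("10.0000/example.3", "Machine learning for genome annotation",
         "Neural models label genes."),
    ],
)


def fts_search(query, limit=10):
    """Return (doi, title) pairs matching the FTS5 query, best match first."""
    sql = "SELECT doi, title FROM works_fts WHERE works_fts MATCH ? ORDER BY rank LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


def fts_count(query):
    """Return the number of rows matching the FTS5 query."""
    sql = "SELECT count(*) FROM works_fts WHERE works_fts MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


hits = fts_search("sharp wave ripples")  # all three terms must appear in a row
n_genome = fts_count("genome")
```

Because FTS5 keeps an inverted index of terms, a `MATCH` query does not scan every row, which is what makes searching a very large table practical.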
Python API
from crossref_local import search, get, count
# Full-text search (22ms for 541 matches across 167M records)
results = search("hippocampal sharp wave ripples")
for work in results:
    print(f"{work.title} ({work.year})")
# Get by DOI
work = get("10.1126/science.aax0758")
print(work.citation())
# Count matches
n = count("machine learning") # 477,922 matches
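For RAG-style use, search results can be folded into a prompt context. The sketch below is one possible way to do that, not part of the package: `build_context` is a hypothetical helper, and the `abstract` attribute is an assumption (only `title` and `year` appear in the examples above). `SimpleNamespace` objects stand in for real results so the snippet runs without the database.

```python
from types import SimpleNamespace


def build_context(works, max_chars=500):
    """Join title, year and abstract of each work into a prompt-ready text block.

    Stops before the first entry that would push the total past max_chars.
    """
    parts = []
    used = 0
    for i, work in enumerate(works, start=1):
        # `abstract` is an assumed attribute; fall back if it is missing or empty.
        abstract = getattr(work, "abstract", None) or "(no abstract)"
        entry = f"[{i}] {work.title} ({work.year})\n{abstract}"
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)


# Stand-in results; with the real package these would come from search(...).
fake_results = [
    SimpleNamespace(title="A", year=2020, abstract="x"),
    SimpleNamespace(title="B", year=2021, abstract=None),
]
context = build_context(fake_results)
```

Capping by characters is a crude proxy for a token budget; a tokenizer-based limit would be more precise.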
Async API:
import asyncio
from crossref_local import aio
async def main():
    counts = await aio.count_many(["CRISPR", "neural network", "climate"])
    results = await aio.search("machine learning")
asyncio.run(main())
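As a rough picture of what a batched call like `count_many` can do, the sketch below fans out several blocking count calls to worker threads with `asyncio.gather` and `asyncio.to_thread` (Python 3.9+). This is an assumption about the general pattern, not the package's implementation: `count_sync` is a fake stand-in that returns the query length, and returning a dict keyed by query is a guess at a convenient shape.

```python
import asyncio


def count_sync(query):
    """Stand-in for a blocking count call; returns a fake number (the query length)."""
    return len(query)


async def count_many(queries):
    """Run the blocking counts concurrently in worker threads, keyed by query."""
    counts = await asyncio.gather(
        *(asyncio.to_thread(count_sync, q) for q in queries)
    )
    # gather preserves input order, so zip pairs each query with its own count.
    return dict(zip(queries, counts))


result = asyncio.run(count_many(["CRISPR", "neural network", "climate"]))
```

With a real SQLite backend, each worker thread would need its own connection, since `sqlite3` connections refuse cross-thread use by default.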
CLI
crossref-local search "CRISPR genome editing" -n 5
crossref-local get 10.1038/nature12373
crossref-local impact-factor Nature -y 2023 # IF: 54.067
With abstracts (-a flag):
$ crossref-local search "CRISPR" -n 1 -a
Found 87,473 matches in 18.2ms
1. RS-1 enhances CRISPR/Cas9- and TALEN-mediated knock-in efficiency (2016)
DOI: 10.1038/ncomms10548
Journal: Nature Communications
Abstract: Zinc-finger nuclease, transcription activator-like effector nuclease
and CRISPR/Cas9 are becoming major tools for genome editing. Importantly,
knock-in in several non-rodent species has been finally achieved...
Impact Factor
from crossref_local.impact_factor import ImpactFactorCalculator
with ImpactFactorCalculator() as calc:
    result = calc.calculate_impact_factor("Nature", target_year=2023)
    print(f"IF: {result['impact_factor']:.3f}") # 54.067
| Journal | IF 2023 |
|---|---|
| Nature | 54.07 |
| Science | 46.17 |
| Cell | 54.01 |
| PLOS ONE | 3.37 |
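For reference, the conventional two-year impact factor for year Y is the number of citations received in Y by items published in Y-1 and Y-2, divided by the number of citable items published in those two years. The sketch below shows that arithmetic with made-up numbers. Whether `ImpactFactorCalculator` uses exactly this window and counting rule is an assumption; `two_year_impact_factor` is a hypothetical helper, not part of the package.

```python
def two_year_impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Citations in year Y to items from Y-1 and Y-2, divided by items from Y-1 and Y-2."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item in the two-year window")
    return citations_to_prev_two_years / citable_items_prev_two_years


# Made-up example: 1500 citations in 2023 to 600 items published in 2021-2022.
if_2023 = two_year_impact_factor(1500, 600)
print(f"IF: {if_2023:.3f}")  # IF: 2.500
```

Values computed from CrossRef citation data can differ from officially published figures, because the underlying citation counts and the definition of a citable item may not match.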
Citation Network
from crossref_local import get_citing, get_cited, CitationNetwork
citing = get_citing("10.1038/nature12373") # 1539 papers
cited = get_cited("10.1038/nature12373")
# Build visualization (like Connected Papers)
network = CitationNetwork("10.1038/nature12373", depth=2)
network.save_html("citation_network.html") # requires: pip install crossref-local[viz]
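A depth-limited citation network can be understood as a breadth-first walk over "who cites this paper". The sketch below shows that idea on a toy graph. It is not the `CitationNetwork` implementation: `citation_neighborhood` is a hypothetical helper, and it assumes a lookup function that returns an iterable of DOI strings, which may not match what `get_citing` actually returns.

```python
from collections import deque


def citation_neighborhood(seed, get_citing, depth=2):
    """Breadth-first walk over citing papers, up to `depth` hops from the seed.

    Returns the set of visited nodes and a list of (citer, cited) edges.
    """
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        doi, level = queue.popleft()
        if level == depth:
            continue  # do not expand nodes at the depth limit
        for citer in get_citing(doi):
            edges.append((citer, doi))  # (citer, doi) means "citer cites doi"
            if citer not in seen:
                seen.add(citer)
                queue.append((citer, level + 1))
    return seen, edges


# Toy graph: each key maps to the papers that cite it (letters stand in for DOIs).
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
nodes, edges = citation_neighborhood("A", lambda d: toy.get(d, []), depth=2)
```

The `seen` set keeps shared citers such as `D` from being expanded twice, which matters for highly cited papers where the frontier grows quickly.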
Performance
| Query | Matches | Time |
|---|---|---|
| hippocampal sharp wave ripples | 541 | 22ms |
| machine learning | 477,922 | 113ms |
| CRISPR genome editing | 12,170 | 257ms |
Searching 167M records in milliseconds via FTS5.