Local OpenAlex database with 284M+ works, abstracts, and semantic search

OpenAlex Local (openalex-local)

SciTeX

Local OpenAlex database with 284M+ scholarly works, abstracts, and semantic search

Figure: SciTeX IF vs JCR Validation. SciTeX Impact Factor (OpenAlex) validated against JCR 2024 (r = 0.96, 17,042 journals).

Why OpenAlex Local?

Built for the LLM era, with features that matter for AI research assistants:

Feature                  Benefit
284M Works               More coverage than CrossRef
Abstracts                ~45-60% availability for semantic search
Concepts & Topics        Built-in classification
Author Disambiguation    Linked to institutions
Open Access Info         OA status and URLs

Perfect for: RAG systems, research assistants, literature review automation.

Installation
pip install openalex-local

From source:

git clone https://github.com/ywatanabe1989/openalex-local
cd openalex-local && make install

Database setup (~300 GB, ~1-2 days to build):

# Check system status
make status

# 1. Download OpenAlex Works snapshot (~300 GB)
make download-screen  # runs in background

# 2. Build SQLite database
make build-db

# 3. Build FTS5 index
make build-fts
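Step 3 builds an SQLite FTS5 full-text index. As a rough illustration of what such an index over titles and abstracts involves, here is a minimal sketch using Python's built-in sqlite3 module on an in-memory database. The table and column names (`works`, `works_fts`, `id`, `title`, `abstract`) are hypothetical and not necessarily the schema that `make build-fts` creates; the sketch also assumes your SQLite build has FTS5 enabled, as most Python distributions do.

```python
import sqlite3

# Hypothetical schema for illustration only; the real openalex-local schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (id TEXT PRIMARY KEY, title TEXT, abstract TEXT)")
conn.executemany(
    "INSERT INTO works (id, title, abstract) VALUES (?, ?, ?)",
    [
        ("W1", "CRISPR genome editing in plants", "We apply CRISPR-Cas9 to crops."),
        ("W2", "Deep learning for vision", "Convolutional neural networks for images."),
        ("W3", "Neural networks and CRISPR screens", None),  # no abstract available
    ],
)

# External-content FTS5 table: indexes title + abstract without storing a second copy.
# `id` is UNINDEXED, so it can be returned by queries but is not searched.
conn.execute(
    "CREATE VIRTUAL TABLE works_fts USING fts5("
    "id UNINDEXED, title, abstract, content='works', content_rowid='rowid')"
)
# Build the index from the rows already present in `works`.
conn.execute("INSERT INTO works_fts(works_fts) VALUES ('rebuild')")


def search_ids(query: str) -> list[str]:
    """Return matching work IDs, best BM25 match first (FTS5's default rank)."""
    rows = conn.execute(
        "SELECT id FROM works_fts WHERE works_fts MATCH ? ORDER BY rank", (query,)
    ).fetchall()
    return [row[0] for row in rows]


def count_matches(query: str) -> int:
    """Count works whose title or abstract matches the FTS5 query."""
    return conn.execute(
        "SELECT count(*) FROM works_fts WHERE works_fts MATCH ?", (query,)
    ).fetchone()[0]


print(search_ids("CRISPR"))  # both W1 and W3 match
print(count_matches("neural"))
```

An external-content table keeps the searchable text in one place (the `works` table) at the cost of having to rebuild or sync the index when that table changes, which suits a snapshot that is rebuilt monthly.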
Python API
from openalex_local import search, get, count

# Full-text search (title + abstract)
results = search("machine learning neural networks")
for work in results:
    print(f"{work.title} ({work.year})")
    # Abstracts exist for only ~45-60% of works, so guard against missing values
    if work.abstract:
        print(f"  Abstract: {work.abstract[:200]}...")
    print(f"  Concepts: {[c['name'] for c in work.concepts]}")

# Get by OpenAlex ID or DOI
work = get("W2741809807")
work = get("10.1038/nature12373")

# Count matches
n = count("CRISPR")
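`get()` accepts either an OpenAlex work ID or a DOI. If you need to route or validate identifiers yourself (for example, before sending them to the batch endpoint), a minimal sketch of telling the two apart might look like this. `classify_identifier` is a hypothetical helper, not part of `openalex_local`, and its patterns are a simplification: OpenAlex work IDs are `W` followed by digits, and DOIs start with `10.` plus a registrant code.

```python
import re

# Hypothetical helper for illustration; not the logic used inside openalex_local.get().
OPENALEX_WORK_ID = re.compile(r"^W\d+$", re.IGNORECASE)
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
URL_PREFIXES = ("https://openalex.org/", "https://doi.org/", "http://doi.org/", "doi:")


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return ("openalex", id) or ("doi", doi) after stripping common URL prefixes."""
    s = identifier.strip()
    for prefix in URL_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    if OPENALEX_WORK_ID.match(s):
        return "openalex", s.upper()  # normalize "w123" to "W123"
    if DOI_PATTERN.match(s):
        return "doi", s
    raise ValueError(f"Not an OpenAlex work ID or DOI: {identifier!r}")


print(classify_identifier("W2741809807"))
print(classify_identifier("https://doi.org/10.1038/nature12373"))
```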
CLI
openalex-local search "CRISPR genome editing" -n 5
openalex-local search-by-doi W2741809807
openalex-local search-by-doi 10.1038/nature12373
openalex-local status  # Configuration and database stats

With abstracts (-a flag):

$ openalex-local search "neural network" -n 1 -a

Found 1,523,847 matches in 45.2ms

1. Deep learning for neural networks (2015)
   OpenAlex ID: W2741809807
   Abstract: This paper presents a comprehensive overview of deep learning
   techniques for neural network architectures...
HTTP API

Start the FastAPI server:

openalex-local relay --host 0.0.0.0 --port 31292

Endpoints:

# Search works (FTS5)
curl "http://localhost:31292/works?q=CRISPR&limit=10"

# Get by ID or DOI
curl "http://localhost:31292/works/W2741809807"
curl "http://localhost:31292/works/10.1038/nature12373"

# Batch lookup
curl -X POST "http://localhost:31292/works/batch" \
  -H "Content-Type: application/json" \
  -d '{"ids": ["W2741809807", "10.1038/nature12373"]}'

# Database info
curl "http://localhost:31292/info"
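The same endpoints can be called from Python without extra dependencies. This is a minimal sketch using only the standard library's `urllib`; it assumes the server runs on the default port 31292 from the `relay` command and that responses are JSON, and the helper names are hypothetical. The shape of the returned JSON is not documented here, so the sketch returns it unparsed into any particular structure.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:31292"  # assumed: port from the relay example


def search_request(query: str, limit: int = 10, base_url: str = BASE_URL) -> Request:
    # GET /works?q=...&limit=...; urlencode escapes spaces and special characters.
    return Request(f"{base_url}/works?{urlencode({'q': query, 'limit': limit})}")


def work_request(identifier: str, base_url: str = BASE_URL) -> Request:
    # GET /works/{id}; keep "/" unescaped so DOIs look like the curl example.
    return Request(f"{base_url}/works/{quote(identifier, safe='/')}")


def batch_request(ids: list[str], base_url: str = BASE_URL) -> Request:
    # POST /works/batch with a JSON body {"ids": [...]}.
    body = json.dumps({"ids": ids}).encode("utf-8")
    return Request(
        f"{base_url}/works/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(req: Request, timeout: float = 30.0):
    # Sends the request; requires a running `openalex-local relay` server.
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (with the server running):
# hits = fetch_json(search_request("CRISPR", limit=10))
# works = fetch_json(batch_request(["W2741809807", "10.1038/nature12373"]))
```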

HTTP mode (connect to a running server):

# On your local machine (if the server is remote), forward the port over SSH
ssh -L 31292:127.0.0.1:31292 your-server

# Python client
from openalex_local import configure_http
configure_http("http://localhost:31292")

# Or via CLI
openalex-local --http search "CRISPR"
MCP Server

Run as an MCP (Model Context Protocol) server:

openalex-local mcp start

Local MCP client configuration:

{
  "mcpServers": {
    "openalex-local": {
      "command": "openalex-local",
      "args": ["mcp", "start"],
      "env": {
        "OPENALEX_LOCAL_DB": "/path/to/openalex.db"
      }
    }
  }
}
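If your MCP client config already lists other servers, you can merge this entry programmatically instead of editing JSON by hand. A minimal sketch, assuming the client reads a JSON file with a top-level `mcpServers` object as shown; the config file's location depends on your MCP client, and the helper names are hypothetical.

```python
import copy
import json
from pathlib import Path


def add_openalex_server(config: dict, db_path: str) -> dict:
    """Return a copy of an MCP client config with the openalex-local stdio entry added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers["openalex-local"] = {
        "command": "openalex-local",
        "args": ["mcp", "start"],
        "env": {"OPENALEX_LOCAL_DB": db_path},
    }
    return updated


def update_config_file(path: Path, db_path: str) -> None:
    # Read the existing config (or start empty), merge the entry, and write it back.
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_openalex_server(config, db_path), indent=2) + "\n")
```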

Remote MCP via HTTP:

# On the server: start a persistent MCP server
openalex-local mcp start -t http --host 0.0.0.0 --port 8083

Client configuration:
{
  "mcpServers": {
    "openalex-remote": {
      "url": "http://your-server:8083/mcp"
    }
  }
}

Diagnose setup:

openalex-local mcp doctor        # Check dependencies and database
openalex-local mcp list-tools    # Show available MCP tools
openalex-local mcp installation  # Show client config examples

Available tools:

  • search - Full-text search across 284M+ papers
  • search_by_id - Get paper by OpenAlex ID or DOI
  • enrich_ids - Batch lookup with metadata
  • status - Database statistics
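Under MCP, clients discover and invoke these tools with JSON-RPC 2.0 messages using the `tools/list` and `tools/call` methods. The sketch below only builds the message payloads to show their shape; a real client first performs the MCP `initialize` handshake over stdio or HTTP. The argument names passed to `search` are hypothetical, so check `openalex-local mcp list-tools` for the actual input schema.

```python
import itertools
import json

# Monotonically increasing JSON-RPC request IDs.
_request_ids = itertools.count(1)


def tools_list_request() -> str:
    """JSON-RPC 2.0 message asking an MCP server which tools it exposes."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/list"})


def tools_call_request(name: str, arguments: dict) -> str:
    """JSON-RPC 2.0 message invoking one MCP tool with the given arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical argument names; the real schema comes from the tools/list response.
print(tools_list_request())
print(tools_call_request("search", {"query": "CRISPR genome editing", "limit": 5}))
```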
SciTeX Impact Factor (OpenAlex)

We provide precomputed SciTeX Impact Factors calculated from OpenAlex citation data. These follow the JCR formula but use OpenAlex as the data source.

Validation against JCR 2024 (17,042 matched journals):

Metric        Value
Pearson r     0.96
Spearman ρ    0.93
p-value       < 1e-100

Export SciTeX IF:

# Export all SciTeX IF values
openalex-local export-if -o scitex_if.csv
openalex-local export-if -o scitex_if.json

# Top 1000
openalex-local export-if -o top1000.csv --limit 1000

Use in search results:

openalex-local search "machine learning" --with-if

Formula:

SciTeX IF(Year) = Citations received in Year by items published in Year-1 and Year-2
                  ───────────────────────────────────────────────────────────────────
                  Citable items published in Year-1 and Year-2
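The formula is a ratio of two counts over the same two-year publication window. A minimal sketch, assuming per-journal citation and citable-item counts have already been aggregated from OpenAlex; the `JournalYearCounts` container and its fields are hypothetical, and what counts as a "citable" item is not specified here, so the sketch takes those counts as given.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JournalYearCounts:
    """Hypothetical per-journal inputs; field names are illustrative only."""
    citations: dict[tuple[int, int], int]  # (citing_year, cited_pub_year) -> count
    citable_items: dict[int, int]          # pub_year -> number of citable items


def scitex_if(counts: JournalYearCounts, year: int) -> float | None:
    window = (year - 1, year - 2)
    # Numerator: citations made in `year` to items published in the window.
    numerator = sum(counts.citations.get((year, y), 0) for y in window)
    # Denominator: citable items published in the same window.
    denominator = sum(counts.citable_items.get(y, 0) for y in window)
    if denominator == 0:
        return None  # undefined when nothing citable was published in the window
    return numerator / denominator


journal = JournalYearCounts(
    citations={(2024, 2023): 300, (2024, 2022): 500, (2024, 2021): 999},
    citable_items={2023: 100, 2022: 150, 2021: 80},
)
print(scitex_if(journal, 2024))  # (300 + 500) / (100 + 150)
```

Citations to 2021 items are ignored for the 2024 value because they fall outside the two-year window.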

Note: "SciTeX IF" is our calculation using OpenAlex data. It is not the trademarked "Journal Impact Factor" from Clarivate/JCR.

Related Projects

crossref-local - Sister project with CrossRef data:

Feature              crossref-local          openalex-local
Works                167M                    284M
Abstracts            ~21%                    ~45-60%
Update frequency     Real-time               Monthly
DOI authority        Yes (source)            Uses CrossRef
Citations            Raw references          Linked works
Concepts/Topics      No                      Yes
Author IDs           No                      Yes
Best for             DOI lookup, raw refs    Semantic search

When to use CrossRef: real-time DOI updates, raw reference parsing, authoritative metadata.
When to use OpenAlex: semantic search, citation analysis, topic discovery.

Documentation

Full documentation is available at openalex-local.readthedocs.io.

Data Source

Data comes from OpenAlex, an open catalog of scholarly works, and is updated monthly from the OpenAlex snapshot.

Interfaces: Python ⭐⭐⭐ (primary) · CLI ⭐⭐ · MCP ⭐⭐ · Skills ⭐⭐ · Hook — · HTTP —

Problem and Solution

1. Problem: OpenAlex is the largest open bibliographic database, but large-scale use of its API needs caching; rate limits trip at hundreds of requests per second.
   Solution: Local SQLite + FTS5 (284M works) enables offline queries, including abstracts, author affiliations, and citation counts.

Part of SciTeX

OpenAlex Local is part of SciTeX. When used inside the SciTeX framework, literature search integrates with the scholar module:

import scitex

# Search local OpenAlex database via SciTeX
results = scitex.scholar.search("neural oscillations gamma band")

# Enrich BibTeX with OpenAlex metadata
scitex.scholar.enrich_bibtex("references.bib")

The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.


Download files

Source Distribution

openalex_local-0.7.5.tar.gz (69.8 kB view details)

Uploaded Source

Built Distribution

openalex_local-0.7.5-py3-none-any.whl (75.0 kB view details)

Uploaded Python 3

File details

Details for the file openalex_local-0.7.5.tar.gz.

File metadata

  • Download URL: openalex_local-0.7.5.tar.gz
  • Upload date:
  • Size: 69.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for openalex_local-0.7.5.tar.gz
Algorithm Hash digest
SHA256 f00f4dd3dd8cd9550fcbcc3b9ae3051d477775611036b8d85ec1e072a66f0221
MD5 c8eed9d51f74abfde29da87164bd4397
BLAKE2b-256 9a9e0d06db9f83139019323045a41c69b02b286b88af3936ae870211ea8697f8

Provenance

The following attestation bundles were made for openalex_local-0.7.5.tar.gz:

Publisher: publish-pypi.yml on ywatanabe1989/openalex-local

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file openalex_local-0.7.5-py3-none-any.whl.

File metadata

  • Download URL: openalex_local-0.7.5-py3-none-any.whl
  • Upload date:
  • Size: 75.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for openalex_local-0.7.5-py3-none-any.whl
Algorithm Hash digest
SHA256 09ecc295d840a36774da5a440de64b58bf225c16a28342342996208f454643e7
MD5 365914b6e641e5566e64a044c0561e0f
BLAKE2b-256 1ef8ed3021f4b9c8bde715e037b34ce2f782025271280b398e5cd82618b43809

Provenance

The following attestation bundles were made for openalex_local-0.7.5-py3-none-any.whl:

Publisher: publish-pypi.yml on ywatanabe1989/openalex-local

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
