Module to build citation networks with OpenAlex

OpenAlex Citation Network

This library extracts and analyzes citation networks from the OpenAlex repository, producing citation graphs for research on publication networks. OpenAlex contains more than 250 million publications, and this tool builds structured citation graphs from them, in which a directed edge from one publication to another represents a citation.
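As a minimal sketch of the edge convention described above, the snippet below derives directed (citing, cited) pairs from work records. It assumes each record carries the `id` and `referenced_works` fields of an OpenAlex work object. The `citation_edges` helper and the short placeholder IDs are hypothetical and are not part of this library.

```python
from typing import Dict, List, Tuple


def citation_edges(works: List[Dict]) -> List[Tuple[str, str]]:
    """Return (citing_id, cited_id) pairs among the given works (hypothetical helper)."""
    known_ids = {work["id"] for work in works}
    edges = []
    for work in works:
        for cited_id in work.get("referenced_works", []):
            # Keep only citations that stay inside the collected set.
            if cited_id in known_ids:
                edges.append((work["id"], cited_id))
    return edges


# Toy records with short placeholder IDs instead of real OpenAlex IDs.
works = [
    {"id": "W1", "referenced_works": ["W2", "W3"]},
    {"id": "W2", "referenced_works": ["W3", "W9"]},  # W9 is outside the set
    {"id": "W3", "referenced_works": []},
]
edges = citation_edges(works)
print(edges)
```

Dropping citations that point outside the collected set keeps the graph closed over the downloaded works. Whether the library does the same is not stated here.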

A key feature of this library is direct cascade extraction using breadth-first search (BFS). One or more publications are selected as root nodes, and their citations are expanded level by level. For example, if paper A cites papers B and C, paper B cites papers C and D, and paper C cites paper E, then starting from root A the library extracts A first, then B and C, then D and E.
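The A to E example can be traced with a small in-memory sketch of level-by-level BFS. The `bfs_levels` function and the `cites` dictionary are illustrative assumptions. The library's own crawler fetches citations from OpenAlex and may be implemented differently.

```python
from typing import Dict, List


def bfs_levels(cites: Dict[str, List[str]], roots: List[str], max_levels: int) -> List[List[str]]:
    """Group papers reachable from the roots by BFS level (illustrative sketch)."""
    seen = set(roots)
    levels: List[List[str]] = []
    frontier = list(roots)
    while frontier and len(levels) < max_levels:
        levels.append(frontier)
        next_frontier = []
        for paper in frontier:
            for cited in cites.get(paper, []):
                # Visit each paper once, at the first level where it appears.
                if cited not in seen:
                    seen.add(cited)
                    next_frontier.append(cited)
        frontier = next_frontier
    return levels


# Paper -> papers it cites, matching the example in the text.
cites = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E"]}
levels = bfs_levels(cites, roots=["A"], max_levels=5)
print(levels)  # [['A'], ['B', 'C'], ['D', 'E']]
```

Paper C is cited by both A and B but appears only once, at level 1, because it is first reached from the root.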

Features

  • General Research Network Extraction

    • Search: Query publications by title, abstract, or full text.
    • Filter: Refine results by attributes of the work object (JSON) using OpenAlex filters.
    • Sort: Order search results using the OpenAlex sort options.
  • Cascade Citation Network Extraction

    • Select one or more publications as root nodes.
    • Expand citations using BFS to construct a cascade-like citation tree.
  • Data Profiling and Error Handling

    • Use the built-in profiler to analyze dataset accuracy and view any errors that occurred during extraction.
  • Graph Construction with igraph

    • Build a citation network as an igraph Graph object.
    • Metadata includes: title, id, doi, publication_year, abstract_inverted_index, and other OpenAlex attributes.
  • Save and Load Graphs

    • Export citation networks as CSV to avoid repeated downloads.
    • Reload saved CSV files to recreate igraph objects.
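The `abstract_inverted_index` attribute listed above stores an abstract as a mapping from each word to the positions where it occurs. A minimal sketch for turning it back into plain text follows. The `abstract_from_inverted_index` helper is hypothetical and is not claimed to exist in this library.

```python
from typing import Dict, List, Optional


def abstract_from_inverted_index(inverted_index: Optional[Dict[str, List[int]]]) -> str:
    """Rebuild abstract text from a word -> positions mapping (hypothetical helper)."""
    if not inverted_index:
        return ""
    # Flatten to (position, word) pairs, then read the words in position order.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


example_index = {"Citation": [0], "networks": [1, 4], "are": [2], "directed": [3]}
abstract = abstract_from_inverted_index(example_index)
print(abstract)  # Citation networks are directed networks
```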

Installation

This project uses Poetry for dependency management. To install, run:

pip install poetry
poetry install

Usage

1. Creating a Citation Network

from citation_network.network import createCitationGraph
from citation_network.crawler import EntitiesCrawler

# Initialize crawler
crawler = EntitiesCrawler(email="your-email@example.com")

# Retrieve entities (Example: Filter by topic and year)
entities = crawler.getEntities(
    filter={"primary_topic": "machine learning", "publication_year": "2020"},
    maxEntities=5000,
)

# Build the citation graph
graph = createCitationGraph(entities)

# Save the graph
graph.write_csv("citation_network.csv")
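For context on what a filtered query can look like at the API level, here is a sketch that turns a filter dictionary into an OpenAlex works URL using the comma-separated `attribute:value` filter syntax. The `build_works_url` helper is hypothetical, and how `getEntities` translates its `filter` argument internally is an assumption not confirmed here. No network request is made.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_url(filters: dict, email: str, per_page: int = 200) -> str:
    """Build a works query URL from attribute filters (hypothetical helper)."""
    # OpenAlex combines filters as comma-separated attribute:value pairs.
    filter_value = ",".join(f"{key}:{value}" for key, value in filters.items())
    params = {"filter": filter_value, "per-page": per_page, "mailto": email}
    # Keep ':' and ',' readable in the filter; other characters are escaped.
    return f"{OPENALEX_WORKS_URL}?{urlencode(params, safe=':,')}"


url = build_works_url(
    {"publication_year": 2020, "type": "article"},
    email="your-email@example.com",
)
print(url)
```

The attribute names must match those accepted by OpenAlex, so a free-text topic name may need to be resolved to an ID before it can be used as a filter.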

2. Loading a Citation Network from CSV

from citation_network.utils import create_citation_graph_from_csv

graph = create_citation_graph_from_csv("citation_network.csv")
print(graph.summary())
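The save-and-reload idea can be illustrated with a plain edge list and the standard `csv` module. This is only a sketch: the `source` and `target` column names and both helper functions are assumptions, and the CSV layout written by this library is not described here.

```python
import csv
import io
from typing import List, Tuple


def edges_to_csv(edges: List[Tuple[str, str]]) -> str:
    """Serialize (citing, cited) pairs as CSV text (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buffer.getvalue()


def edges_from_csv(text: str) -> List[Tuple[str, str]]:
    """Parse CSV text produced by edges_to_csv back into edge pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"], row["target"]) for row in reader]


original_edges = [("W1", "W2"), ("W1", "W3"), ("W2", "W3")]
csv_text = edges_to_csv(original_edges)
restored_edges = edges_from_csv(csv_text)
print(restored_edges == original_edges)  # True
```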

3. Extracting a Citation Cascade using BFS

# Reuses the crawler created in step 1.
# Perform BFS on citations from the root work, up to a specified depth and node limit
bfs_results = crawler.citationBFS(root=["W1234567890"], maxLevels=5, maxNodes=10000)

Jupyter Notebook

For detailed examples and interactive exploration, refer to notebook.ipynb.

Logging

The library includes built-in logging, which can be configured at different levels:

import logging
from citation_network.logging_setup import setup_logging

setup_logging(logging.DEBUG)  # Set logging level to DEBUG

At the DEBUG level, the log shows detailed progress of the extraction process, which helps with debugging.
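For readers unfamiliar with Python's logging configuration, a helper of this kind could be written roughly as follows with the standard `logging` module. The function name `setup_logging_sketch`, the logger name, and the message format are hypothetical. The library's actual `setup_logging` may differ.

```python
import logging


def setup_logging_sketch(level: int = logging.INFO) -> logging.Logger:
    """Configure and return an example logger (not the library's implementation)."""
    logger = logging.getLogger("citation_network_example")
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = setup_logging_sketch(logging.DEBUG)
logger.debug("Expanding level %d with %d papers", 1, 2)
```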


This library provides an efficient way to extract and analyze publication networks, making it valuable for researchers in citation analysis, network science, and information diffusion modeling.

References

Special thanks to the team @OpenAlex for their great open-source repository and for providing us with a freemium subscription. Note that OpenAlex is completely free, and for 99% of this library's use cases you will not need a subscription.

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv. https://arxiv.org/abs/2205.01833
