Module to build citation networks with OpenAlex

OpenAlex Citation Network

This library extracts and analyzes citation networks from OpenAlex, a repository of more than 250 million publications. It builds structured citation graphs for research on publication networks, where a directed edge from one publication to another represents a citation.

A key feature of this library is direct cascade extraction using breadth-first search (BFS). You select one or more publications as root nodes, and the library expands their citations level by level. For example, if paper A cites papers B and C, paper B cites papers C and D, and paper C cites paper E, then starting from A the library extracts A first, then B and C, then D and E.
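The level-by-level order in the example above can be reproduced on a toy graph. This is a minimal sketch of plain BFS using only the standard library, not the library's own implementation; `bfs_levels` and the `references` dictionary are hypothetical names introduced for illustration.

```python
from collections import deque


def bfs_levels(references, roots):
    """Group papers by BFS level, starting from the root papers.

    `references` maps a paper to the list of papers it cites
    (hypothetical in-memory stand-in for OpenAlex lookups).
    """
    seen = set(roots)
    levels = [list(roots)]
    frontier = deque(roots)
    while frontier:
        next_level = []
        # Process exactly the papers that belong to the current level.
        for _ in range(len(frontier)):
            paper = frontier.popleft()
            for cited in references.get(paper, []):
                if cited not in seen:
                    seen.add(cited)
                    next_level.append(cited)
                    frontier.append(cited)
        if next_level:
            levels.append(next_level)
    return levels


# The example from the text: A cites B and C, B cites C and D, C cites E.
references = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E"]}
levels = bfs_levels(references, ["A"])
print(levels)  # [['A'], ['B', 'C'], ['D', 'E']]
```

Paper C is reached from both A and B, but the `seen` set ensures it is recorded only once, at the level where it is first discovered.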

Features

  • General Research Network Extraction

    • Search: Query publications by title, abstract, or full text.
    • Filter: Use OpenAlex filters (OpenAlex Filters) to refine attributes in the work object (JSON).
    • Sort: Order search results using various sorting options (OpenAlex Sort).
  • Cascade Citation Network Extraction

    • Select one or more publications as root nodes.
    • Expand citations using BFS to construct a cascade-like citation tree.
  • Data Profiling and Error Handling

    • Use the built-in profiler to analyze dataset accuracy and view any errors during extraction.
  • Graph Construction with igraph

    • Build a citation network as an igraph Graph object.
    • Metadata includes: title, id, doi, publication_year, abstract_inverted_index, and other OpenAlex attributes.
  • Save and Load Graphs

    • Export citation networks as CSV to avoid repeated downloads.
    • Reload saved CSV files to recreate igraph objects.
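One metadata field listed above, abstract_inverted_index, is not plain text: OpenAlex stores each abstract as a mapping from a word to the positions where it occurs. The sketch below shows one way to turn that mapping back into readable text. The helper name `abstract_from_inverted_index` is hypothetical and is not part of this library.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild abstract text from a word -> list-of-positions mapping."""
    if not inverted_index:
        return ""  # works without an abstract carry no index
    positioned = []
    for word, positions in inverted_index.items():
        for position in positions:
            positioned.append((position, word))
    positioned.sort()  # order words by their position in the abstract
    return " ".join(word for _, word in positioned)


# Small hand-made index for illustration (not real OpenAlex data).
example = {"Citation": [0], "networks": [1, 4], "are": [2], "directed": [3]}
text = abstract_from_inverted_index(example)
print(text)  # Citation networks are directed networks
```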

Installation

This project uses Poetry for dependency management. To install, run:

pip install poetry
poetry install

Usage

1. Creating a Citation Network

from citation_network.network import createCitationGraph
from citation_network.crawler import EntitiesCrawler

# Initialize crawler
crawler = EntitiesCrawler(email="your-email@example.com")

# Retrieve entities (Example: Filter by topic and year)
entities = crawler.getEntities(filter={"primary_topic": "machine learning", "publication_year": "2020"}, maxEntities=5000)

# Build the citation graph
graph = createCitationGraph(entities)

# Save the graph
graph.write_csv("citation_network.csv")
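For orientation, the OpenAlex REST API itself expects filters as comma-separated `attribute:value` pairs and accepts a `mailto` parameter for its polite pool. The sketch below builds such a request URL from a filter dictionary. Whether the crawler builds its requests this way internally is an assumption; `build_works_url` is a hypothetical helper, and the `type` filter is used only as an example.

```python
from urllib.parse import urlencode


def build_works_url(filters, email, per_page=200):
    """Build an OpenAlex /works URL from a dict of filters (illustrative only)."""
    # OpenAlex combines filters with commas: attribute:value,attribute:value
    filter_value = ",".join(f"{key}:{value}" for key, value in filters.items())
    query = urlencode(
        {"filter": filter_value, "per-page": per_page, "mailto": email},
        safe=":,@",  # keep the filter syntax and the email address readable
    )
    return f"https://api.openalex.org/works?{query}"


url = build_works_url(
    {"publication_year": "2020", "type": "article"},
    email="your-email@example.com",
)
print(url)
```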

2. Loading a Citation Network from CSV

from citation_network.utils import create_citation_graph_from_csv

graph = create_citation_graph_from_csv("citation_network.csv")
print(graph.summary())
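The idea behind saving and reloading can be sketched with a plain edge list. The column names `source` and `target` are assumptions made for this illustration; the CSV layout the library actually writes may differ. An in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Directed citation edges: (citing work, cited work). IDs are placeholders.
edges = [("W1", "W2"), ("W1", "W3"), ("W2", "W3")]

# Write the edge list as CSV with a header row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source", "target"])
writer.writerows(edges)

# Read it back and recover the same edge list.
buffer.seek(0)
loaded = [(row["source"], row["target"]) for row in csv.DictReader(buffer)]
print(loaded == edges)  # True
```

A list of tuples like `loaded` is the kind of input that igraph's `Graph.TupleList(loaded, directed=True)` accepts when rebuilding a directed graph.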

3. Extracting a Citation Cascade using BFS

# Perform BFS on citations up to a specified depth
# (reuses the crawler created in step 1)
bfs_results = crawler.citationBFS(root=["W1234567890"], maxLevels=5, maxNodes=10000)
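How the two limits might interact can be illustrated on a toy graph. The sketch below assumes that a depth limit stops expansion beyond a given level and that a node limit ends the search once enough works are collected; the exact semantics of maxLevels and maxNodes in citationBFS are not documented here, so treat this as an assumption. `bounded_bfs` and `get_references` are hypothetical names.

```python
from collections import deque


def bounded_bfs(get_references, roots, max_levels, max_nodes):
    """BFS that records each work's depth and respects both limits."""
    depth = {root: 0 for root in roots[:max_nodes]}
    queue = deque(depth)
    while queue:
        work = queue.popleft()
        if depth[work] >= max_levels:
            continue  # works at the deepest allowed level are not expanded
        for cited in get_references(work):
            if len(depth) >= max_nodes:
                return depth  # node budget exhausted
            if cited not in depth:
                depth[cited] = depth[work] + 1
                queue.append(cited)
    return depth


# Toy citation data standing in for OpenAlex lookups.
toy = {"W1": ["W2", "W3"], "W2": ["W4"], "W3": ["W4", "W5"], "W4": ["W6"]}
lookup = lambda work: toy.get(work, [])

result = bounded_bfs(lookup, ["W1"], max_levels=2, max_nodes=10)
print(result)  # {'W1': 0, 'W2': 1, 'W3': 1, 'W4': 2, 'W5': 2}
```

With `max_levels=2`, W6 is never reached because W4 sits at the deepest allowed level. Lowering `max_nodes` to 3 would stop the search after W1, W2, and W3.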

Jupyter Notebook

For detailed examples and interactive exploration, refer to notebook.ipynb.

Logging

The library includes built-in logging, which can be configured at different levels:

import logging
from citation_network.logging_setup import setup_logging

setup_logging(logging.DEBUG)  # Set logging level to DEBUG

At the DEBUG level, the log gives detailed insight into the extraction process, which helps with debugging.


This library provides an efficient way to extract and analyze publication networks, making it valuable for researchers in citation analysis, network science, and information diffusion modeling.

References

Special thanks to the OpenAlex team (@OpenAlex) for their great open-source repository and for providing us with a freemium subscription. Note that OpenAlex is completely free, and for 99% of this library's use cases you will not need a subscription.

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv. https://arxiv.org/abs/2205.01833

