Module to build citation networks with OpenAlex
OpenAlex Citation Network
This library allows you to easily extract and analyze various citation networks from the OpenAlex repository. The library enables the creation of citation graphs for research studies focused on publication networks. OpenAlex contains more than 250 million publications, and this tool helps build structured citation graphs where a directed edge from one publication to another represents a citation.
A key feature of this library is direct cascade extraction using breadth-first search (BFS). You select one or more publications as root nodes, and the library expands their citations level by level. For example, if paper A cites papers B and C, paper B cites papers C and D, and paper C cites paper E, then starting from A the library extracts A first, then B and C, and finally D and E.
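The level-by-level order in the example above can be reproduced with a few lines of plain Python. This is a minimal sketch on toy data, not the library's implementation: the `cites` dictionary and the `bfs_levels` helper are hypothetical names introduced only for illustration.

```python
from collections import deque

# Toy citation data from the example above: paper -> papers it cites.
cites = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def bfs_levels(roots, cites):
    """Return papers grouped by BFS level, starting from the root papers."""
    seen = set(roots)
    levels = [list(roots)]
    frontier = deque(roots)
    while frontier:
        next_level = []
        # Process exactly the papers that belong to the current level.
        for _ in range(len(frontier)):
            paper = frontier.popleft()
            for cited in cites.get(paper, []):
                if cited not in seen:  # each paper is extracted only once
                    seen.add(cited)
                    next_level.append(cited)
                    frontier.append(cited)
        if next_level:
            levels.append(next_level)
    return levels


levels = bfs_levels(["A"], cites)
print(levels)  # [['A'], ['B', 'C'], ['D', 'E']]
```

Paper C is cited by both A and B but appears only once, at the first level where it is reached.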
Features
- General Research Network Extraction
  - Search: Query publications by title, abstract, or full text.
  - Filter: Use OpenAlex filters to refine results by attributes of the work object (JSON).
  - Sort: Order search results using the OpenAlex sort options.
- Cascade Citation Network Extraction
  - Select one or more publications as root nodes.
  - Expand citations using BFS to construct a cascade-like citation tree.
- Data Profiling and Error Handling
  - Use the built-in profiler to analyze dataset accuracy and view any errors raised during extraction.
- Graph Construction with igraph
  - Build a citation network as an igraph Graph object.
  - Metadata includes title, id, doi, publication_year, abstract_inverted_index, and other OpenAlex attributes.
- Save and Load Graphs
  - Export citation networks as CSV to avoid repeated downloads.
  - Reload saved CSV files to recreate igraph objects.
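Of the metadata fields listed above, abstract_inverted_index is the least self-explanatory: OpenAlex stores an abstract as a mapping from each word to the positions where it occurs, not as plain text. The sketch below shows one way to turn that mapping back into a readable string. The `rebuild_abstract` helper and the sample data are hypothetical and are not part of this library.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style abstract_inverted_index."""
    if not inverted_index:
        return ""  # works without an abstract carry no index
    # Flatten {word: [positions]} into (position, word) pairs.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


sample = {"Citation": [0], "networks": [1, 4], "link": [2], "other": [3]}
abstract = rebuild_abstract(sample)
print(abstract)  # Citation networks link other networks
```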
Installation
This project uses Poetry for dependency management. To install, run:
pip install poetry
poetry install
Usage
1. Creating a Citation Network
from citation_network.network import createCitationGraph
from citation_network.crawler import EntitiesCrawler
# Initialize crawler
crawler = EntitiesCrawler(email="your-email@example.com")
# Retrieve entities (Example: Filter by topic and year)
entities = crawler.getEntities(filter={"primary_topic": "machine learning", "publication_year": "2020"}, maxEntities=5000)
# Build the citation graph
graph = createCitationGraph(entities)
# Save the graph
graph.write_csv("citation_network.csv")
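For orientation, the OpenAlex REST API itself takes filters as comma-separated `key:value` pairs in a `filter` query parameter, alongside `search`, `sort`, `per-page`, and `mailto`. How this library translates its `filter` dictionary into a request is not shown here, so the following is only a sketch of the kind of URL involved. The `build_works_url` helper is hypothetical, and the filter keys used are examples.

```python
from urllib.parse import urlencode


def build_works_url(filters, sort=None, search=None, email=None, per_page=25):
    """Build an OpenAlex /works query URL (illustrative helper, not library API)."""
    params = {}
    if filters:
        # OpenAlex expects filter=key1:value1,key2:value2
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if search:
        params["search"] = search
    if sort:
        params["sort"] = sort
    params["per-page"] = per_page
    if email:
        # Supplying an email address identifies the caller to OpenAlex.
        params["mailto"] = email
    # Keep ':' and ',' unescaped so the filter stays readable.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,")


url = build_works_url(
    {"publication_year": 2020, "type": "article"},
    sort="cited_by_count:desc",
    email="your-email@example.com",
)
print(url)
```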
2. Loading a Citation Network from CSV
from citation_network.utils import create_citation_graph_from_csv
graph = create_citation_graph_from_csv("citation_network.csv")
print(graph.summary())
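The idea behind saving and reloading is that the citation edges are written once and read back later without contacting OpenAlex again. The column layout this library uses in its CSV files is not documented here, so the sketch below assumes a simple two-column edge list. The `source` and `target` column names and both helper functions are hypothetical.

```python
import csv
import io

# Directed citation edges: (citing paper, cited paper).
edges = [("A", "B"), ("A", "C"), ("B", "C")]


def write_edges(edges, handle):
    """Write an edge list as CSV with a header row (assumed layout)."""
    writer = csv.writer(handle)
    writer.writerow(["source", "target"])
    writer.writerows(edges)


def read_edges(handle):
    """Read the edge list back as (source, target) tuples."""
    reader = csv.DictReader(handle)
    return [(row["source"], row["target"]) for row in reader]


# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_edges(edges, buffer)
buffer.seek(0)
restored = read_edges(buffer)
print(restored == edges)  # True
```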
3. Extracting a Citation Cascade using BFS
# Perform BFS on citations up to a specified depth (reuses the crawler from step 1)
bfs_results = crawler.citationBFS(root=["W1234567890"], maxLevels=5, maxNodes=10000)
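The two limits in the call above bound the crawl in different directions: a level cap restricts how far from the roots the expansion goes, and a node cap restricts the total size of the result. The exact semantics of `maxLevels` and `maxNodes` in this library are not documented here, so the following is a sketch of one plausible interpretation on toy data. `bounded_bfs` and `get_citations` are hypothetical names.

```python
def bounded_bfs(roots, get_citations, max_levels, max_nodes):
    """Expand citations level by level, stopping at a depth or size limit."""
    visited = list(dict.fromkeys(roots))[:max_nodes]  # de-duplicated roots
    seen = set(visited)
    frontier = list(visited)
    edges = []
    for _ in range(max_levels):
        next_frontier = []
        for paper in frontier:
            for cited in get_citations(paper):
                if cited not in seen:
                    if len(visited) >= max_nodes:
                        continue  # node budget spent: skip unseen papers
                    seen.add(cited)
                    visited.append(cited)
                    next_frontier.append(cited)
                # Record the edge only when both endpoints were kept.
                edges.append((paper, cited))
        if not next_frontier:
            break  # nothing new to expand
        frontier = next_frontier
    return visited, edges


# Toy data: paper -> papers it cites.
cites = {"A": ["B", "C"], "B": ["C", "D"], "C": ["E"]}
lookup = lambda paper: cites.get(paper, [])

shallow_nodes, shallow_edges = bounded_bfs(["A"], lookup, max_levels=1, max_nodes=10)
capped_nodes, capped_edges = bounded_bfs(["A"], lookup, max_levels=5, max_nodes=4)
print(shallow_nodes)  # ['A', 'B', 'C']
print(capped_nodes)   # ['A', 'B', 'C', 'D']
```

With one level, only the direct citations of A are collected. With a budget of four nodes, paper E is dropped even though the depth limit would have allowed it.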
Jupyter Notebook
For detailed examples and interactive exploration, refer to notebook.ipynb.
Logging
The library includes built-in logging, which can be configured at different levels:
import logging
from citation_network.logging_setup import setup_logging
setup_logging(logging.DEBUG)  # Set logging level to DEBUG
DEBUG-level output gives a detailed view of the extraction process, which is useful for debugging.
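The internals of `setup_logging` are not shown here. As a rough idea of what a level-based setup helper typically does, the sketch below uses only the standard `logging` module. `configure_logging` and the `citation_demo` logger name are hypothetical stand-ins, not part of this library.

```python
import logging


def configure_logging(level=logging.INFO):
    """Configure the root logger with a level and a timestamped format."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )


configure_logging(logging.DEBUG)
logger = logging.getLogger("citation_demo")
logger.debug("fetched page %d of results", 1)  # shown at DEBUG level
logger.info("extraction finished")             # shown at DEBUG and INFO
```

Passing `logging.INFO` or `logging.WARNING` instead suppresses the more verbose messages.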
This library provides an efficient way to extract and analyze publication networks, making it valuable for researchers in citation analysis, network science, and information diffusion modeling.
References
Special thanks to the OpenAlex team (@OpenAlex) for their great open-source repository and for providing us with a freemium subscription. Note that OpenAlex is completely free, and for 99% of this library's use cases you will not need a subscription.
Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv. https://arxiv.org/abs/2205.01833