Papers, Please
A comprehensive Python library for retrieving and analyzing researcher data from ORCID and converting Lattes XML files to BibTeX format.
Features
- ORCID Integration: Retrieve detailed researcher information including publications, education, employment history, and more
- Research Group Analysis: Analyze groups of researchers and their combined publication metrics
- Lattes XML Conversion: Convert Brazilian Lattes Platform XML files to BibTeX, which you can then import to update your works in ORCID
- API Clients: Built-in support for OpenAlex and Scopus APIs for publication metrics
- Data Processing: Clean and structured data output with pandas DataFrames
Installation
pip install papers-please
Quick Start
Individual Researcher Analysis
from papers_please import Researcher
# Initialize with ORCID ID
researcher = Researcher("0000-0003-1574-0784")
# Basic information
print(f"Name: {researcher.name}")
print(f"Biography: {researcher.biography}")
# Keywords and research areas
print(f"Keywords: {researcher.keywords}")
# Publications as pandas DataFrame
papers = researcher.papers
print(f"Total publications: {len(papers)}")
# Education and employment history
print("Education:", researcher.education)
print("Employment:", researcher.employments)
Research Group Analysis
from papers_please import Researcher, ResearchGroup
# Create multiple researchers
researchers = [
    Researcher("0000-0003-1574-0784"),
    Researcher("0000-0002-8715-2896"),
]
# Analyze as a group
group = ResearchGroup(researchers)
group_papers = group.papers
print(f"Total unique publications: {len(group_papers)}")
print(f"Publication types: {group_papers['type'].value_counts()}")
Lattes XML Conversion
from papers_please import XMLParser
# Convert Lattes XML to BibTeX
parser = XMLParser(xml_path="lattes_data.xml")
parser.generate_bibtex(output_path="publications.bib")
Publication Metrics
from papers_please import Metrics, Researcher
# Initialize with API keys (optional)
metrics = Metrics(
    scopus_api_key="your_scopus_key",  # Optional: only needed for Scopus metrics
    openalex_email="your_email@domain.com"  # Optional: joins the OpenAlex polite pool
)
# Calculate metrics for a researcher
researcher = Researcher("0000-0003-1574-0784")
researcher_metrics = metrics.get_metrics_for_entity(researcher)
print(f"Total citations: {researcher_metrics['total_citations']}")
print(f"H-index: {researcher_metrics['h_index']}")
print(f"Publications per year: {researcher_metrics['publications_per_year']}")
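The `h_index` value printed above follows the usual definition: the largest h such that h publications have at least h citations each. A minimal sketch of that calculation, assuming only a list of per-paper citation counts; the function name `h_index` and the sample numbers are illustrative and are not taken from the library's internals:

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


example_counts = [10, 8, 5, 4, 3]  # made-up citation counts
print(h_index(example_counts))
```

Sorting in descending order means the loop can stop at the first paper whose citation count falls below its rank.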
API Reference
Core Classes
Researcher
Represents an individual researcher with ORCID data.
Properties:
- name: Full name
- first_name: First name
- last_name: Last name
- biography: Researcher biography
- keywords: List of research keywords
- emails: List of email addresses
- papers: Publications as pandas DataFrame
- education: Education history
- employments: Employment history
- external_links: External profile links
ResearchGroup
Represents a group of researchers for collective analysis.
Properties:
- researchers: List of Researcher objects
- papers: Combined unique publications from all researchers
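One plausible way to obtain "combined unique publications" is to merge every researcher's records and keep the first record seen for each DOI, falling back to the title when no DOI is present. The sketch below illustrates that idea with plain dictionaries standing in for DataFrame rows; `merge_unique` and the matching rule are assumptions, not the library's documented behavior.

```python
def merge_unique(paper_lists: list[list[dict]]) -> list[dict]:
    """Merge several publication lists, keeping one record per DOI (or title)."""
    seen: set[str] = set()
    merged: list[dict] = []
    for papers in paper_lists:
        for paper in papers:
            # Prefer the DOI as identity; fall back to the normalized title.
            doi = (paper.get("doi") or "").strip().lower()
            title = (paper.get("title") or "").strip().lower()
            key = f"doi:{doi}" if doi else f"title:{title}"
            if key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged


# Made-up records for two researchers who share two publications.
first = [{"title": "Paper A", "doi": "10.1000/a"}, {"title": "Paper B", "doi": None}]
second = [
    {"title": "Paper A (preprint)", "doi": "10.1000/A"},
    {"title": "paper b", "doi": None},
    {"title": "Paper C", "doi": "10.1000/c"},
]
print([paper["title"] for paper in merge_unique([first, second])])
```

Comparing DOIs case-insensitively matters because DOIs are case-insensitive identifiers, while title matching is only a rough fallback.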
XMLParser
Converts Lattes platform XML files to BibTeX format.
Methods:
generate_bibtex(output_path): Generate BibTeX file from XML data
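For orientation, a BibTeX entry is a typed record with a citation key followed by `field = {value}` pairs. The sketch below formats one entry from a dictionary; `to_bibtex_entry`, the citation key, and the field values are hypothetical and only illustrate the output format, not how `XMLParser` is implemented.

```python
def to_bibtex_entry(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Format one BibTeX entry; fields with empty values are skipped."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if value:
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex_entry(
    "article",
    "silva2020",  # made-up citation key
    {"title": "An Example Title", "author": "Silva, Maria", "year": "2020", "doi": ""},
)
print(entry)
```

BibTeX tolerates the trailing comma after the last field, which keeps the formatting loop simple.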
Metrics
Calculate publication metrics using external APIs.
Methods:
- get_metrics_for_entity(entity): Calculate metrics for researcher or group
- get_metrics_for_works(entity): Get detailed per-publication metrics
API Clients
OrcidAPIClient
Client for ORCID API interactions.
OpenAlexAPIClient
Client for OpenAlex API with features like:
- Author metrics by ORCID
- Publication data by DOI
- Citation counts and open access information
ScopusAPIClient
Client for Scopus API with enhanced bibliometric data.
Configuration
API Keys (Optional)
For enhanced functionality, you can configure API keys:
# Scopus API (for advanced metrics)
metrics = Metrics(scopus_api_key="your_scopus_api_key")
# OpenAlex (email for polite pool - faster response)
metrics = Metrics(openalex_email="your_email@example.com")
Rate Limiting
The library implements automatic rate limiting for API calls to respect service limits.
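How the library throttles its requests is not described here. One common approach is to enforce a minimum interval between consecutive calls, and the sketch below shows that pattern. `MinIntervalLimiter` is a hypothetical class, and the half-second interval is arbitrary rather than a limit published by any of the services.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call: float | None = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


limiter = MinIntervalLimiter(min_interval=0.5)
for _ in range(3):
    limiter.wait()  # call this right before each API request
```

Using a monotonic clock avoids surprises when the system time is adjusted while the program runs.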
Data Structure
Publications DataFrame
The papers property returns a pandas DataFrame with columns:
- title: Publication title
- doi: Digital Object Identifier
- journal: Journal name
- publication_date: Publication date
- type: Publication type
- authors: List of authors
- url: Publication URL
Error Handling
The library includes comprehensive error handling for:
- Invalid ORCID IDs
- API rate limits
- Network connectivity issues
- Malformed data
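To make the first item concrete: an ORCID iD has the form `0000-0000-0000-000X`, and its last character is a check character computed with ISO 7064 MOD 11-2. The function below is a standalone sketch of such a check; `is_valid_orcid` is a hypothetical helper, and the library may validate identifiers differently (for example, by relying on the ORCID API's response).

```python
import re

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected


print(is_valid_orcid("0000-0003-1574-0784"))  # iD used in the Quick Start
print(is_valid_orcid("0000-0003-1574-0785"))  # same iD with an altered check digit
```

A check like this catches typos before any network request is made, but it cannot tell whether a well-formed iD is actually registered.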
Examples
Export to Different Formats
from papers_please import Researcher
researcher = Researcher("0000-0003-1574-0784")
papers = researcher.papers
# Export to CSV
papers.to_csv("publications.csv", index=False)
# Export to Excel (pandas needs an Excel writer such as openpyxl installed)
papers.to_excel("publications.xlsx", index=False)
# Filter by publication type
journal_articles = papers[papers['type'] == 'journal-article']
Research Collaboration Analysis
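import pandas as pd
from papers_please import Researcher, ResearchGroup
# Build the group from individual researchers (ORCID iDs reused from the Quick Start)
group = ResearchGroup([Researcher("0000-0003-1574-0784"), Researcher("0000-0002-8715-2896")])
papers = group.papers
# Count publications per distinct author list (assumes each cell is a list; lists are unhashable, so convert to tuples)
author_list_counts = papers['authors'].apply(tuple).value_counts()
# Analyze publication trends (assumes publication_date can be parsed as a date)
years = pd.to_datetime(papers['publication_date'], errors='coerce').dt.year
yearly_trends = papers.groupby(years).size()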
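A more direct view of collaboration is to count how many publications each pair of authors shares. The sketch below does this with the standard library only; `coauthor_pair_counts` is a hypothetical helper, the names are made up, and it assumes each publication's authors are available as a list of name strings, as the `authors` column is described.

```python
from collections import Counter
from itertools import combinations


def coauthor_pair_counts(author_lists: list[list[str]]) -> Counter:
    """Count how many publications each pair of authors shares."""
    pairs: Counter = Counter()
    for authors in author_lists:
        # Sort and deduplicate so ("A", "B") and ("B", "A") count as the same pair.
        for pair in combinations(sorted(set(authors)), 2):
            pairs[pair] += 1
    return pairs


sample = [["Ana", "Bruno"], ["Bruno", "Ana", "Carla"], ["Carla"]]
for pair, count in coauthor_pair_counts(sample).most_common():
    print(pair, count)
```

With a real group, the same function could be applied to `papers['authors']`. Name variants such as "A. Silva" and "Ana Silva" would be counted as different people unless they are normalized first.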
Requirements
- Python 3.10+
- pandas
- requests
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Authors
- Henrique Marques
- Gabriel Barbosa
- Renato Spessoto
- Henrique Gomes
- Eduardo Neves
Support
For support and questions, please open an issue on the project repository.