Papers, Please

A comprehensive Python library for retrieving and analyzing researcher data from ORCID and converting Lattes XML files to BibTeX format.

Features

  • ORCID Integration: Retrieve detailed researcher information including publications, education, employment history, and more
  • Research Group Analysis: Analyze groups of researchers and their combined publication metrics
  • Lattes XML Conversion: Convert Brazilian Lattes Platform XML files to BibTeX format, which you can then import into ORCID to update your works
  • API Clients: Built-in support for OpenAlex and Scopus APIs for publication metrics
  • Data Processing: Clean, structured output as pandas DataFrames

Installation

pip install papers-please

Quick Start

Individual Researcher Analysis

from papers_please import Researcher

# Initialize with ORCID ID
researcher = Researcher("0000-0003-1574-0784")

# Basic information
print(f"Name: {researcher.name}")
print(f"Biography: {researcher.biography}")

# Keywords and research areas
print(f"Keywords: {researcher.keywords}")

# Publications as pandas DataFrame
papers = researcher.papers
print(f"Total publications: {len(papers)}")

# Education and employment history
print("Education:", researcher.education)
print("Employment:", researcher.employments)

Research Group Analysis

from papers_please import Researcher, ResearchGroup

# Create multiple researchers
researchers = [
    Researcher("0000-0003-1574-0784"),
    Researcher("0000-0002-8715-2896")
]

# Analyze as a group
group = ResearchGroup(researchers)
group_papers = group.papers

print(f"Total unique publications: {len(group_papers)}")
print(f"Publication types: {group_papers['type'].value_counts()}")

Lattes XML Conversion

from papers_please import XMLParser

# Convert Lattes XML to BibTeX
parser = XMLParser(xml_path="lattes_data.xml")
parser.generate_bibtex(output_path="publications.bib")

Publication Metrics

from papers_please import Metrics, Researcher

# Initialize with API keys (optional)
metrics = Metrics(
    scopus_api_key="your_scopus_key",  # Optional: only needed for Scopus-based metrics
    openalex_email="your_email@domain.com"  # Optional: places OpenAlex requests in the polite pool
)

# Calculate metrics for a researcher
researcher = Researcher("0000-0003-1574-0784")
researcher_metrics = metrics.get_metrics_for_entity(researcher)

print(f"Total citations: {researcher_metrics['total_citations']}")
print(f"H-index: {researcher_metrics['h_index']}")
print(f"Publications per year: {researcher_metrics['publications_per_year']}")

API Reference

Core Classes

Researcher

Represents an individual researcher with ORCID data.

Properties:

  • name: Full name
  • first_name: First name
  • last_name: Last name
  • biography: Researcher biography
  • keywords: List of research keywords
  • emails: List of email addresses
  • papers: Publications as pandas DataFrame
  • education: Education history
  • employments: Employment history
  • external_links: External profile links

ResearchGroup

Represents a group of researchers for collective analysis.

Properties:

  • researchers: List of Researcher objects
  • papers: Combined unique publications from all researchers

XMLParser

Converts Lattes platform XML files to BibTeX format.

Methods:

  • generate_bibtex(output_path): Generate BibTeX file from XML data
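The sketch below illustrates what such a conversion involves. It reads journal articles from a Lattes CV with `xml.etree.ElementTree` and emits `@article` entries. The element and attribute names it uses (`ARTIGO-PUBLICADO`, `DADOS-BASICOS-DO-ARTIGO`, `TITULO-DO-ARTIGO`, `AUTORES`, and so on) are assumptions about the Lattes schema. `lattes_articles_to_bibtex` is a hypothetical name, and `XMLParser` may handle more publication types and fields.

```python
import xml.etree.ElementTree as ET


def _attr(element, name):
    """Return an attribute value, or '' when the element or attribute is missing."""
    return element.get(name, "") if element is not None else ""


def lattes_articles_to_bibtex(xml_text):
    """Convert ARTIGO-PUBLICADO elements (assumed Lattes schema) to BibTeX @article entries."""
    root = ET.fromstring(xml_text)
    entries = []
    for index, article in enumerate(root.iter("ARTIGO-PUBLICADO"), start=1):
        basic = article.find("DADOS-BASICOS-DO-ARTIGO")
        detail = article.find("DETALHAMENTO-DO-ARTIGO")
        # Order authors by their declared position in the author list.
        authors = sorted(
            article.findall("AUTORES"),
            key=lambda a: int(a.get("ORDEM-DE-AUTORIA") or 0),
        )
        names = [_attr(a, "NOME-COMPLETO-DO-AUTOR") for a in authors]
        year = _attr(basic, "ANO-DO-ARTIGO")
        fields = {
            "title": _attr(basic, "TITULO-DO-ARTIGO"),
            "author": " and ".join(n for n in names if n),
            "journal": _attr(detail, "TITULO-DO-PERIODICO-OU-REVISTA"),
            "year": year,
            "doi": _attr(basic, "DOI"),
        }
        # Citation key: first author's last name + year + position (a hypothetical scheme).
        surname = names[0].split()[-1].lower() if names and names[0] else "anon"
        key = f"{surname}{year}_{index}"
        # Empty fields are skipped; LaTeX special characters are not escaped here.
        body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
        entries.append(f"@article{{{key},\n{body}\n}}")
    return "\n\n".join(entries)


SAMPLE = """<CURRICULO-VITAE><PRODUCAO-BIBLIOGRAFICA><ARTIGOS-PUBLICADOS>
<ARTIGO-PUBLICADO SEQUENCIA-PRODUCAO="1">
<DADOS-BASICOS-DO-ARTIGO TITULO-DO-ARTIGO="An Example Article" ANO-DO-ARTIGO="2021" DOI="10.1000/xyz"/>
<DETALHAMENTO-DO-ARTIGO TITULO-DO-PERIODICO-OU-REVISTA="Example Journal"/>
<AUTORES NOME-COMPLETO-DO-AUTOR="Maria Souza" ORDEM-DE-AUTORIA="2"/>
<AUTORES NOME-COMPLETO-DO-AUTOR="Joao Silva" ORDEM-DE-AUTORIA="1"/>
</ARTIGO-PUBLICADO>
</ARTIGOS-PUBLICADOS></PRODUCAO-BIBLIOGRAFICA></CURRICULO-VITAE>"""

print(lattes_articles_to_bibtex(SAMPLE))
```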

Metrics

Calculate publication metrics using external APIs.

Methods:

  • get_metrics_for_entity(entity): Calculate metrics for a Researcher or a ResearchGroup
  • get_metrics_for_works(entity): Get detailed per-publication metrics
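The metric names used in Quick Start (`total_citations`, `h_index`, `publications_per_year`) have standard definitions. The sketch below computes them from per-work `(year, citations)` pairs, such as citation counts fetched from OpenAlex or Scopus. The function names are hypothetical, and this is not the actual implementation inside `Metrics`.

```python
from collections import Counter


def h_index(citations):
    """Largest h such that h works each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def summarize(works):
    """works: iterable of (year, citation_count) pairs for one researcher or group."""
    works = list(works)
    citations = [count for _, count in works]
    return {
        "total_citations": sum(citations),
        "h_index": h_index(citations),
        "publications_per_year": dict(sorted(Counter(year for year, _ in works).items())),
    }


print(summarize([(2020, 10), (2021, 3), (2021, 0)]))
# {'total_citations': 13, 'h_index': 2, 'publications_per_year': {2020: 1, 2021: 2}}
```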

API Clients

OrcidAPIClient

Client for ORCID API interactions.
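The ORCID public API (v3.0) exposes each record's sections, such as `works`, `educations`, and `employments`, under `https://pub.orcid.org/v3.0/{orcid}/{section}`. It returns JSON when the `Accept` header asks for it. Whether `OrcidAPIClient` calls these endpoints directly is an assumption. The hypothetical `build_orcid_request` below only builds the request and does not send it.

```python
import urllib.request

ORCID_PUBLIC_API = "https://pub.orcid.org/v3.0"


def build_orcid_request(orcid_id, section="works"):
    """Build (but do not send) a GET request for one section of a public ORCID record."""
    url = f"{ORCID_PUBLIC_API}/{orcid_id}/{section}"
    # Ask for JSON explicitly; the API's default representation is XML.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


request = build_orcid_request("0000-0003-1574-0784")
print(request.full_url)  # https://pub.orcid.org/v3.0/0000-0003-1574-0784/works
# Sending it would be: urllib.request.urlopen(request, timeout=30)
```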

OpenAlexAPIClient

Client for the OpenAlex API, with features such as:

  • Author metrics by ORCID
  • Publication data by DOI
  • Citation counts and open access information
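OpenAlex can look up authors by ORCID and works by DOI through external-ID paths. Adding a `mailto` parameter places requests in the polite pool. The helpers below are hypothetical sketches that only build the URLs. Author responses include a `summary_stats` object, and work responses include `cited_by_count` and `open_access`, which correspond to the features listed above.

```python
from urllib.parse import urlencode

OPENALEX_API = "https://api.openalex.org"


def _with_mailto(url, email):
    """Append a mailto parameter (polite pool) when an email address is given."""
    return f"{url}?{urlencode({'mailto': email})}" if email else url


def openalex_author_url(orcid_id, email=None):
    """Author lookup by ORCID, e.g. /authors/orcid:0000-0003-1574-0784."""
    return _with_mailto(f"{OPENALEX_API}/authors/orcid:{orcid_id}", email)


def openalex_work_url(doi, email=None):
    """Work lookup by DOI, e.g. /works/doi:10.1000/xyz."""
    return _with_mailto(f"{OPENALEX_API}/works/doi:{doi}", email)


print(openalex_author_url("0000-0003-1574-0784", email="you@example.com"))
# https://api.openalex.org/authors/orcid:0000-0003-1574-0784?mailto=you%40example.com
```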

ScopusAPIClient

Client for the Scopus API, which provides additional bibliometric data (requires a Scopus API key).

Configuration

API Keys (Optional)

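Both settings are optional. A Scopus API key enables Scopus-based metrics, and an email address identifies your requests to OpenAlex:

from papers_please import Metrics

# Scopus API (for advanced metrics)
metrics = Metrics(scopus_api_key="your_scopus_api_key")

# OpenAlex (an email address places requests in the polite pool, which responds faster)
metrics = Metrics(openalex_email="your_email@example.com")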

Rate Limiting

The library implements automatic rate limiting for API calls to respect service limits.
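This page does not describe the limiting strategy. The sketch below shows one common approach, which enforces a minimum interval between consecutive requests. `MinIntervalLimiter` is hypothetical. Its clock and sleep functions are injectable, so the demo runs without actually waiting.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum gap of 1 / calls_per_second seconds between calls."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / calls_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


# Demo with a fake clock so no real time passes.
fake_time = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds


limiter = MinIntervalLimiter(10, clock=lambda: fake_time[0], sleep=fake_sleep)
for _ in range(3):
    limiter.wait()  # a real client would call wait() before each HTTP request
print(slept)
```

This version is not thread-safe. A client shared across threads would need a lock around `wait()`.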

Data Structure

Publications DataFrame

The papers property returns a pandas DataFrame with columns:

  • title: Publication title
  • doi: Digital Object Identifier
  • journal: Journal name
  • publication_date: Publication date
  • type: Publication type
  • authors: List of authors
  • url: Publication URL
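ORCID returns these fields as nested JSON. The sketch below flattens one work summary from the ORCID v3.0 `/works` response (found under `group[*]['work-summary']`) into a row with the columns above. The key names follow the ORCID v3.0 JSON format as we understand it, and `work_summary_to_row` is hypothetical. Contributor lists are not read here, so `authors` is left empty.

```python
def _get(obj, *path):
    """Follow nested dict keys, returning None as soon as a level is missing."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def work_summary_to_row(summary):
    """Flatten one ORCID v3.0 work summary (parsed JSON) into the columns listed above."""
    doi = None
    for ext in _get(summary, "external-ids", "external-id") or []:
        if ext.get("external-id-type") == "doi":
            doi = ext.get("external-id-value")
            break

    # Dates may be partial (year only, or year and month); keep the parts that exist.
    parts = []
    for part in ("year", "month", "day"):
        value = _get(summary, "publication-date", part, "value")
        if not value:
            break
        parts.append(value)

    return {
        "title": _get(summary, "title", "title", "value"),
        "doi": doi,
        "journal": _get(summary, "journal-title", "value"),
        "publication_date": "-".join(parts) or None,
        "type": summary.get("type"),
        "authors": [],  # not read in this sketch
        "url": _get(summary, "url", "value"),
    }


example = {
    "title": {"title": {"value": "An Example Article"}},
    "external-ids": {"external-id": [
        {"external-id-type": "doi", "external-id-value": "10.1000/xyz"},
    ]},
    "journal-title": {"value": "Example Journal"},
    "publication-date": {"year": {"value": "2021"}, "month": {"value": "05"}, "day": None},
    "type": "journal-article",
    "url": None,
}
print(work_summary_to_row(example))
```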

Error Handling

The library handles errors caused by:

  • Invalid ORCID IDs
  • API rate limits
  • Network connectivity issues
  • Malformed data
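Invalid ORCID iDs can be caught before any request is sent. The last character of an ORCID iD is an ISO 7064 MOD 11-2 check character computed from the first 15 digits. The sketch below validates both the format and the checksum. This page does not say whether `Researcher` performs the same check, and the function names are hypothetical.

```python
import re

_ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id):
    """True if orcid_id has the hyphenated 16-character form and a correct check character."""
    if not _ORCID_PATTERN.match(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_checksum(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0003-1574-0784"))  # True
```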

Examples

Export to Different Formats

from papers_please import Researcher

researcher = Researcher("0000-0003-1574-0784")
papers = researcher.papers

# Export to CSV
papers.to_csv("publications.csv", index=False)

# Export to Excel (pandas needs an Excel engine such as openpyxl installed)
papers.to_excel("publications.xlsx", index=False)

# Filter by publication type (ORCID work types, e.g. 'journal-article')
journal_articles = papers[papers['type'] == 'journal-article']

Research Collaboration Analysis

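import pandas as pd
from papers_please import Researcher, ResearchGroup

group = ResearchGroup([Researcher("0000-0003-1574-0784"), Researcher("0000-0002-8715-2896")])
papers = group.papers

# Find collaborative publications (more than one listed author)
collaborative = papers[papers['authors'].str.len() > 1]

# Count publications per author ('authors' holds a list per row, so explode it first)
author_counts = papers.explode('authors')['authors'].value_counts()

# Analyze publication trends (convert dates first; unparseable values become NaT)
years = pd.to_datetime(papers['publication_date'], errors='coerce').dt.year
yearly_trends = papers.groupby(years).size()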

Requirements

  • Python 3.10+
  • pandas
  • requests

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Authors

  • Henrique Marques
  • Gabriel Barbosa
  • Renato Spessoto
  • Henrique Gomes
  • Eduardo Neves

Support

For support and questions, please open an issue on the project repository.


Download files

Source Distribution

papers_please-1.1.1.tar.gz (22.5 kB)

Uploaded Source

Built Distribution

papers_please-1.1.1-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file papers_please-1.1.1.tar.gz.

File metadata

  • Download URL: papers_please-1.1.1.tar.gz
  • Upload date:
  • Size: 22.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for papers_please-1.1.1.tar.gz
Algorithm Hash digest
SHA256 50f2c47d004368cddf4ee9ee9b9c1b58a20ea86a48dac76e7daf9cf2c5e7146b
MD5 6571c3224d19eb4a324f0fd1cdca32d9
BLAKE2b-256 7f32a7f99b848a0f25b629dff0f64559e384be158b26937986efacdc9d375902

Provenance

The following attestation bundles were made for papers_please-1.1.1.tar.gz:

Publisher: release.yml on EngSoft2025/orcid-project-papers-please

File details

Details for the file papers_please-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: papers_please-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for papers_please-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 be58a4e261b354389ea730d5af41fadf6e31b41a8bdb7fc572c658f40d8f6898
MD5 2954aef1bc4012afe803e119686a3dff
BLAKE2b-256 47f4d5591aeae0a37c6d12660e27cbb553c6c4cac370ba1467f79942e5687399

Provenance

The following attestation bundles were made for papers_please-1.1.1-py3-none-any.whl:

Publisher: release.yml on EngSoft2025/orcid-project-papers-please
