NanoHub Citation Manager
A Python library for interacting with the HubZero Citation Manager API, built on top of nanohubremote.
Features
- Full CRUD Operations: Create, read, update, and delete citations
- PDF Management: Upload, download, and manage PDF files associated with citations
- Complete Metadata Support: All DocumentExp fields including BibTeX, NanoHub-specific fields, and citation metrics
- LLM Integration: Example showing automated metadata extraction using local LLMs (Ollama)
- Type-safe: Full type hints for better IDE support
- Built on nanohubremote: Leverages the robust NanoHub API client
Installation
pip install nanohub-citmanager
For LLM features (the quotes keep shells such as zsh from expanding the brackets):
pip install "nanohub-citmanager[llm]"
For development:
pip install "nanohub-citmanager[dev]"
Quick Start
Basic Usage
from nanohubremote import Session
from nanohubcitmanager import CitationManagerClient, Citation
# Create session
credentials = {
    "grant_type": "personal_token",
    "token": "your-api-token"
}
session = Session(credentials)
# Create client
client = CitationManagerClient(session)
# Get a citation
citation = client.get(123)
print(f"Title: {citation.title}")
print(f"Authors: {len(citation.authors)}")
print(f"Keywords: {', '.join(citation.keywords)}")
# Update citation
citation.abstract = "Updated abstract text..."
citation.add_keyword("machine learning")
client.update(citation)
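To keep the token out of source files, the credentials dictionary can be built from an environment variable. The following is a minimal sketch, not part of the library: `build_credentials` is a hypothetical helper, and reusing the `NANOHUB_TOKEN` variable name from the LLM example is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def build_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a personal-token credentials dict from an environment mapping.

    Hypothetical helper: the variable name NANOHUB_TOKEN is an assumption
    borrowed from the LLM example in this README.
    """
    env = os.environ if env is None else env
    token = env.get("NANOHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError("NANOHUB_TOKEN is not set")
    # Same shape as the credentials dict passed to Session in Basic Usage.
    return {"grant_type": "personal_token", "token": token}


if __name__ == "__main__":
    try:
        creds = build_credentials()
        print("grant_type:", creds["grant_type"])
    except RuntimeError as exc:
        print(f"Cannot build credentials: {exc}")
```

The returned dictionary can then be passed to `Session(...)` exactly as in the snippet above.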
Creating a New Citation
# Create new citation
citation = Citation()
citation.title = "My Research Paper"
citation.abstract = "This paper presents..."
citation.year = 2024
citation.doi = "10.1234/example"
# Add authors
citation.add_author("John", "Doe", email="john@example.com")
citation.add_author("Jane", "Smith", orcid="0000-0001-2345-6789")
# Add keywords
citation.add_keyword("deep learning")
citation.add_keyword("neural networks")
# Create in Citation Manager
citation_id = client.create(citation)
print(f"Created citation ID: {citation_id}")
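Since `add_author` accepts an `orcid` keyword, it can help to check the identifier before sending it. The sketch below validates the ORCID format and its ISO 7064 MOD 11-2 check digit; `is_valid_orcid` is a hypothetical helper, and whether the library or the API performs any such validation itself is not stated here.

```python
import re

# Four groups of four characters; only the final character may be "X".
_ORCID_RE = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    if not _ORCID_RE.match(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0001-2345-6789", "0000-0001-2345-6788"):
        print(candidate, is_valid_orcid(candidate))
```

A caller could skip the `orcid=` argument, or warn the user, when `is_valid_orcid` returns False.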
PDF Management
# Upload PDF
client.upload_pdf(citation_id, "paper.pdf")
# Download PDF
client.download_pdf(citation_id, "downloaded_paper.pdf")
# Get PDF info
info = client.get_pdf_info(citation_id)
print(f"PDF: {info['filename']}, Size: {info['size']} bytes")
# Delete PDF
client.pdf_manager.delete(citation_id)
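Before calling `upload_pdf`, a quick local check can catch files that are not PDFs at all. This is a minimal sketch under the assumption that inspecting the `%PDF-` header is sufficient for the caller's purposes; `looks_like_pdf` is a hypothetical helper and not part of the library.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF header bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        # PDF files begin with "%PDF-" followed by the version number.
        return fh.read(5) == b"%PDF-"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "paper.pdf"
        good.write_bytes(b"%PDF-1.7\n")
        bad = Path(tmp) / "notes.pdf"
        bad.write_bytes(b"plain text")
        print(good.name, looks_like_pdf(str(good)))
        print(bad.name, looks_like_pdf(str(bad)))
```

The check only reads the first five bytes, so it does not detect truncated or otherwise damaged PDFs.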
Searching Citations
# Search by text
results = client.search("machine learning", limit=20)
for citation in results:
    print(f"{citation.year}: {citation.title}")
# List with filtering
documents = client.list(
    search="deep learning",
    status=100,  # Published
    limit=50,
    offset=0
)
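The `limit` and `offset` parameters suggest that larger result sets are fetched page by page. The sketch below shows one way to walk all pages; `iter_all` and `fake_fetch` are hypothetical names, and the stopping rule (a page shorter than `page_size` means no more results) is an assumption about how the API behaves.

```python
from typing import Callable, Dict, Iterator, List


def iter_all(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 50,
) -> Iterator[Dict]:
    """Yield every item by calling fetch_page(limit, offset) until a short page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # Assumption: a page with fewer items than requested is the last one.
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for the real API so the sketch runs without network access.
_FAKE_DOCUMENTS = [{"id": i, "title": f"Paper {i}"} for i in range(120)]


def fake_fetch(limit: int, offset: int) -> List[Dict]:
    return _FAKE_DOCUMENTS[offset:offset + limit]


if __name__ == "__main__":
    print(sum(1 for _ in iter_all(fake_fetch, page_size=50)))
```

With a real client, `fetch_page` could be a lambda such as `lambda limit, offset: client.list(search="deep learning", status=100, limit=limit, offset=offset)`.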
LLM-Powered Metadata Extraction
The library includes an example showing how to use a local LLM (via Ollama) to automatically extract and complete citation metadata from PDF files.
Prerequisites
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3
# Install additional dependencies
pip install pypdf2
Running the Example
# Set your API token
export NANOHUB_TOKEN="your-api-token"
# Run the extraction
python examples/llm_metadata_extraction.py 123 llama3
The script will:
- Load the citation from the Citation Manager
- Download the associated PDF
- Extract text from the PDF
- Use the LLM to extract metadata (title, authors, abstract, keywords, etc.)
- Show you the extracted data and ask for confirmation before updating
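The LLM step in the list above depends on turning a free-form model reply into structured data. The following sketch shows one defensive way to pull the first JSON object out of a reply that may contain surrounding prose. It is not taken from `examples/llm_metadata_extraction.py`; `extract_json_object` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object(reply: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object embedded in reply, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            pass
        else:
            if isinstance(obj, dict):
                return obj
        start = reply.find("{", start + 1)
    return None


if __name__ == "__main__":
    reply = 'Here is the metadata: {"title": "A Survey", "year": 2024} Hope this helps.'
    print(extract_json_object(reply))
```

Returning None instead of raising lets the caller decide whether to retry the model or fall back to manual entry.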
Example Output
Citation Manager LLM Metadata Extraction
=========================================
Citation ID: 123
LLM Model: llama3
Fetching citation 123...
Current title: Incomplete Title
Current authors: 0 author(s)
Current keywords: 0 keyword(s)
Downloading PDF...
PDF saved to: citation_123.pdf
Extracting text from PDF...
Extracted 12543 characters from PDF
Calling LLM (model: llama3)...
Extracted metadata:
{
"title": "Deep Learning for Scientific Computing: A Survey",
"abstract": "This paper presents a comprehensive survey of deep learning...",
"authors": [
{"firstname": "John", "lastname": "Doe"},
{"firstname": "Jane", "lastname": "Smith"}
],
"year": 2024,
"doi": "10.1234/dl.2024.001",
"journal": "Journal of Machine Learning Research",
"keywords": ["deep learning", "scientific computing", "neural networks"]
}
Save changes to Citation Manager? (yes/no): yes
Citation updated successfully!
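Before confirming an update like the one above, it can be useful to see exactly which fields would change. The sketch below computes a conservative change set that fills only empty fields and never overwrites existing values. This is a deliberately different policy from the example run, where the incomplete title was replaced; `fill_missing` is a hypothetical helper, and plain dictionaries stand in for the `Citation` object.

```python
from typing import Any, Dict


def _is_empty(value: Any) -> bool:
    """Treat None, empty strings, and empty lists as missing."""
    return value is None or value == "" or value == []


def fill_missing(current: Dict[str, Any], extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the extracted fields whose current value is missing."""
    changes: Dict[str, Any] = {}
    for key, new_value in extracted.items():
        if _is_empty(new_value):
            continue  # nothing useful was extracted for this field
        if _is_empty(current.get(key)):
            changes[key] = new_value
    return changes


if __name__ == "__main__":
    current = {"title": "Incomplete Title", "authors": [], "keywords": []}
    extracted = {
        "title": "Deep Learning for Scientific Computing: A Survey",
        "authors": [{"firstname": "John", "lastname": "Doe"}],
        "year": 2024,
    }
    for field, value in fill_missing(current, extracted).items():
        print(f"{field}: {value}")
```

Printing the change set and asking for confirmation mirrors the yes/no prompt shown in the example output.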
API Reference
CitationManagerClient
Main client for interacting with the Citation Manager API.
Methods:
- create(citation: Citation) -> int: Create a new citation
- get(citation_id: int) -> Citation: Retrieve a citation
- update(citation: Citation) -> bool: Update a citation
- delete(citation_id: int) -> bool: Delete a citation
- list(search, status, limit, offset) -> List[Dict]: List citations
- search(query: str, limit: int) -> List[Citation]: Search citations
- download_pdf(citation_id, output_path) -> bool: Download PDF
- upload_pdf(citation_id, pdf_path) -> bool: Upload PDF
- get_pdf_info(citation_id) -> Dict: Get PDF metadata
Citation
Represents a citation/document with all metadata fields.
Core Fields:
- id, title, abstract
- year, doi, isbn, url
- publisher, publication_name
- volume, issue, begin_page, end_page
BibTeX Fields:
- address, booktitle, chapter, edition
- editor, institution, school
- note, organization, series
NanoHub Fields:
- status, affiliated, fundedby
- software_use, res_edu
- date_submit, date_accept, date_publish
Citation Metrics:
- cnt_citations, url_citations
- date_citations
Related Data:
- authors: List of author dictionaries
- keywords: List of keywords
Methods:
- add_author(firstname, lastname, **kwargs): Add an author
- add_keyword(keyword): Add a keyword
- to_dict(): Convert to dictionary for API
- from_dict(data): Load from API response
PDFManager
Handles PDF file operations.
Methods:
- download(citation_id, output_path) -> bool: Download PDF
- upload(citation_id, pdf_path, filename) -> bool: Upload PDF
- get_info(citation_id) -> Dict: Get PDF metadata
- delete(citation_id) -> bool: Delete PDF
Complete Example
See examples/llm_metadata_extraction.py for a complete example showing:
- Citation retrieval
- PDF download
- Text extraction from PDF
- LLM-based metadata extraction
- Citation update
Supported Fields
The library supports all fields from the HubZero DocumentExp model:
Core Document Fields
- ID, title, abstract
- Publication ID/name
- Document genre ID/name
- Publication date
- Full text path (PDF)
- Timestamp
BibTeX Fields
All standard BibTeX fields including address, booktitle, chapter, edition, editor, eprint, howpublished, institution, key, month, note, organization, publisher, series, school, type
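To illustrate how these fields map onto a BibTeX entry, the sketch below renders a plain dictionary as BibTeX text. It is an illustration only: `to_bibtex` is a hypothetical function, the library is not stated to offer a BibTeX export, and no escaping of special characters is attempted.

```python
from typing import Dict


def to_bibtex(entry_type: str, key: str, fields: Dict[str, object]) -> str:
    """Render a BibTeX entry, skipping fields that are None or empty strings."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if value is None or value == "":
            continue
        lines.append(f"  {name} = {{{value}}},")
    if len(lines) > 1:
        # Drop the trailing comma after the last field.
        lines[-1] = lines[-1].rstrip(",")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_bibtex(
        "article",
        "doe2024",
        {"title": "My Research Paper", "year": 2024, "doi": "10.1234/example", "note": ""},
    ))
```

Field names are written as given, so a caller would pass names such as `booktitle` or `publisher` exactly as they should appear in the entry.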
Paper/Journal Fields
- Volume, issue
- Begin page, end page
NanoHub-Specific Fields
- URL, year, ISBN, cite
- Affiliated, funded by
- Created, DOI, reference type
- Status (workflow)
- Dates: submit, accept, publish
- Software use, research/education flags
- Experimental data fields
- Notes
Citation Metrics
- Citation URL, search URL
- Citation count
- Last citation check date
Related Data
- Authors (with full details: name, email, ORCID, etc.)
- Keywords
Workflow Status Codes
- 0: UNDEFINED
- 1: RELATED
- 2: PREVIEW
- 3: REVIEW
- 4: VOTING
- 5: PROCESS
- 6: POSTPROCESS
- 100: PUBLISHED
- -1 to -6: REJECTED
- -9: JUNK (deleted)
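When displaying results from `client.list`, the numeric status is easier to read as a label. The sketch below maps the codes listed above to their names; `status_label` is a hypothetical helper, and the "UNKNOWN" fallback for unlisted codes is an addition not defined by the source.

```python
# Codes and names copied from the Workflow Status Codes list.
_NAMED_STATUSES = {
    0: "UNDEFINED",
    1: "RELATED",
    2: "PREVIEW",
    3: "REVIEW",
    4: "VOTING",
    5: "PROCESS",
    6: "POSTPROCESS",
    100: "PUBLISHED",
    -9: "JUNK",
}


def status_label(code: int) -> str:
    """Return the workflow status name for a numeric status code."""
    if code in _NAMED_STATUSES:
        return _NAMED_STATUSES[code]
    if -6 <= code <= -1:
        return "REJECTED"
    # Assumption: codes outside the documented set are reported as UNKNOWN.
    return "UNKNOWN"


if __name__ == "__main__":
    for code in (100, 3, -2, -9, 42):
        print(code, status_label(code))
```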
Development
# Clone repository
git clone https://github.com/denphi/nanohub-citmanager.git
cd nanohub-citmanager
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black nanohubcitmanager/
Requirements
- Python >= 3.8
- nanohubremote >= 0.2.0
- requests >= 2.25.0
Optional:
- ollama >= 0.1.0 (for LLM features)
- pypdf2 (for PDF text extraction)
License
MIT License - see LICENSE file for details.
Authors
- Daniel Mejia (denphi@denphi.com)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Citation
If you use this library in your research, please cite:
@software{nanohub_citmanager,
author = {Mejia, Daniel},
title = {NanoHub Citation Manager Python Library},
year = {2025},
url = {https://github.com/denphi/nanohub-citmanager}
}