A Python library for interacting with HubZero Citation Manager API

NanoHub Citation Manager

A Python library for interacting with the HubZero Citation Manager API, built on top of nanohubremote.

Features

Full CRUD Operations: Create, read, update, and delete citations
PDF Management: Upload, download, and manage PDF files associated with citations
Complete Metadata Support: All DocumentExp fields including BibTeX, NanoHub-specific fields, and citation metrics
LLM Integration: Example showing automated metadata extraction using local LLMs (Ollama)
Type-safe: Full type hints for better IDE support
Built on nanohubremote: Uses the nanohubremote Session for authentication and API requests

Installation

pip install nanohub-citmanager

For LLM features:

pip install "nanohub-citmanager[llm]"

For development:

pip install "nanohub-citmanager[dev]"

Quick Start

Basic Usage

from nanohubremote import Session
from nanohubcitmanager import CitationManagerClient, Citation

# Create session
credentials = {
    "grant_type": "personal_token",
    "token": "your-api-token"
}
session = Session(credentials)

# Create client
client = CitationManagerClient(session)

# Get a citation
citation = client.get(123)
print(f"Title: {citation.title}")
print(f"Authors: {len(citation.authors)}")
print(f"Keywords: {', '.join(citation.keywords)}")

# Update citation
citation.abstract = "Updated abstract text..."
citation.add_keyword("machine learning")
client.update(citation)
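The snippet above hardcodes the token. A minimal sketch for reading it from the `NANOHUB_TOKEN` environment variable instead (the variable set in the LLM example's setup) follows. `build_credentials` is a hypothetical helper, not part of the library; it only builds the same credentials dict that the Basic Usage example passes to `Session`.

```python
import os
from typing import Dict


def build_credentials(env_var: str = "NANOHUB_TOKEN") -> Dict[str, str]:
    """Return the personal-token credentials dict, reading the token from env_var.

    Hypothetical helper: the dict layout copies the Basic Usage example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return {"grant_type": "personal_token", "token": token}


# With the library installed (not imported in this sketch):
#   session = Session(build_credentials())
#   client = CitationManagerClient(session)
```

Failing early with a clear message avoids sending an empty token to the API and keeps secrets out of source files.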

Creating a New Citation

# Create new citation
citation = Citation()
citation.title = "My Research Paper"
citation.abstract = "This paper presents..."
citation.year = 2024
citation.doi = "10.1234/example"

# Add authors
citation.add_author("John", "Doe", email="john@example.com")
citation.add_author("Jane", "Smith", orcid="0000-0001-2345-6789")

# Add keywords
citation.add_keyword("deep learning")
citation.add_keyword("neural networks")

# Create in Citation Manager
citation_id = client.create(citation)
print(f"Created citation ID: {citation_id}")
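The README does not say whether `add_author` validates the `orcid` keyword. If you want to catch typos before creating a citation, here is a sketch of a pre-check using the standard ORCID check digit (ISO 7064 MOD 11-2). `is_valid_orcid` is a hypothetical helper, independent of the library.

```python
import re

# Four hyphen-separated groups of four; the final character may be 'X' (value 10).
_ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_orcid(orcid: str) -> bool:
    """Check ORCID iD formatting and its ISO 7064 MOD 11-2 check digit."""
    if not _ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


# Usage (hypothetical): only pass the ORCID along when it checks out.
#   if is_valid_orcid(orcid):
#       citation.add_author("Jane", "Smith", orcid=orcid)
```

The ORCID used in the example above, `0000-0001-2345-6789`, happens to carry a valid check digit.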

PDF Management

# Upload PDF
client.upload_pdf(citation_id, "paper.pdf")

# Download PDF
client.download_pdf(citation_id, "downloaded_paper.pdf")

# Get PDF info
info = client.get_pdf_info(citation_id)
print(f"PDF: {info['filename']}, Size: {info['size']} bytes")

# Delete PDF
client.pdf_manager.delete(citation_id)
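Before calling `upload_pdf`, it can be worth confirming that the local file really is a PDF. A minimal sketch, assuming a strict check that the file starts with the `%PDF-` header is good enough; `looks_like_pdf` is a hypothetical helper, not a library function.

```python
from pathlib import Path
from typing import Union


def looks_like_pdf(path: Union[str, Path]) -> bool:
    """Return True if the file exists and begins with the PDF header b"%PDF-"."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


# Usage (hypothetical), guarding the upload call shown above:
#   if looks_like_pdf("paper.pdf"):
#       client.upload_pdf(citation_id, "paper.pdf")
```

Some PDF readers tolerate bytes before the header; this check deliberately does not, so it may reject a few unusual but readable files.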

Searching Citations

# Search by text
results = client.search("machine learning", limit=20)
for citation in results:
    print(f"{citation.year}: {citation.title}")

# List with filtering
documents = client.list(
    search="deep learning",
    status=100,  # Published
    limit=50,
    offset=0
)
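`list` takes `limit` and `offset`, so fetching every matching record means paging. The generator below is a sketch that assumes the listing function accepts those keyword arguments (as in the call above) and returns fewer than `limit` items only on the last page. `iter_all` is hypothetical and is exercised here with any callable of that shape.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_all(
    fetch: Callable[..., List[Dict[str, Any]]], page_size: int = 50, **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every record from a limit/offset-paginated listing function.

    Stops as soon as a page comes back shorter than page_size (including empty).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset, **filters)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Usage (hypothetical):
#   for doc in iter_all(client.list, search="deep learning", status=100):
#       print(doc)
```

If the server silently caps `limit` below `page_size`, the first page would look "short" and the loop would stop early, so keep `page_size` at or below whatever maximum the API enforces.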

LLM-Powered Metadata Extraction

The library includes an example showing how to use a local LLM (via Ollama) to automatically extract and complete citation metadata from PDF files.

Prerequisites

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3

# Install additional dependencies
pip install pypdf2

Running the Example

# Set your API token
export NANOHUB_TOKEN="your-api-token"

# Run the extraction
python examples/llm_metadata_extraction.py 123 llama3

The script will:

Load the citation from the Citation Manager
Download the associated PDF
Extract text from the PDF
Use the LLM to extract metadata (title, authors, abstract, keywords, etc.)
Show you the extracted data and ask for confirmation before updating
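The README does not describe how the script turns the model's reply into the JSON shown in the example output. One common approach, sketched here as an assumption rather than the script's actual code, is to scan the reply for the first decodable JSON object, since chat models often wrap JSON in prose or Markdown fences. `extract_json_object` is a hypothetical helper built only on the standard `json` module.

```python
import json
from typing import Any, Dict


def extract_json_object(text: str) -> Dict[str, Any]:
    """Return the first complete JSON object embedded in an LLM reply.

    Tries to decode starting at each '{' and skips candidates that fail.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            continue
        return obj
    raise ValueError("no JSON object found in LLM response")
```

Because decoding starts at a `{`, a successful result is always a dict; invalid fragments such as `{not json}` are skipped rather than aborting the whole extraction.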

Example Output

Citation Manager LLM Metadata Extraction
=========================================
Citation ID: 123
LLM Model: llama3

Fetching citation 123...
Current title: Incomplete Title
Current authors: 0 author(s)
Current keywords: 0 keyword(s)

Downloading PDF...
PDF saved to: citation_123.pdf

Extracting text from PDF...
Extracted 12543 characters from PDF

Calling LLM (model: llama3)...

Extracted metadata:
{
  "title": "Deep Learning for Scientific Computing: A Survey",
  "abstract": "This paper presents a comprehensive survey of deep learning...",
  "authors": [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Jane", "lastname": "Smith"}
  ],
  "year": 2024,
  "doi": "10.1234/dl.2024.001",
  "journal": "Journal of Machine Learning Research",
  "keywords": ["deep learning", "scientific computing", "neural networks"]
}

Save changes to Citation Manager? (yes/no): yes
Citation updated successfully!
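In the sample run the citation has a placeholder title and no authors or keywords. Before answering the confirmation prompt, it helps to see exactly which fields would change. The sketch below works on plain dicts such as the extracted metadata above; `plan_updates` is hypothetical, and its fill-only default is a design choice for illustration, not documented behavior of the example script.

```python
from typing import Any, Dict


def _is_empty(value: Any) -> bool:
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def plan_updates(
    current: Dict[str, Any], extracted: Dict[str, Any], overwrite: bool = False
) -> Dict[str, Any]:
    """Return {field: new_value} for fields that applying `extracted` would change.

    By default only empty fields in `current` are filled; with overwrite=True,
    any differing non-empty extracted value replaces the current one.
    """
    changes: Dict[str, Any] = {}
    for name, new in extracted.items():
        if _is_empty(new):
            continue  # never blank out existing data with an empty LLM answer
        old = current.get(name)
        if old == new:
            continue
        if overwrite or _is_empty(old):
            changes[name] = new
    return changes
```

Printing the returned dict gives a compact preview to show next to the yes/no prompt.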

API Reference

CitationManagerClient

Main client for interacting with the Citation Manager API.

Methods:

create(citation: Citation) -> int: Create a new citation and return its ID
get(citation_id: int) -> Citation: Retrieve a citation
update(citation: Citation) -> bool: Update a citation
delete(citation_id: int) -> bool: Delete a citation
list(search, status, limit, offset) -> List[Dict]: List citations
search(query: str, limit: int) -> List[Citation]: Search citations
download_pdf(citation_id, output_path) -> bool: Download PDF
upload_pdf(citation_id, pdf_path) -> bool: Upload PDF
get_pdf_info(citation_id) -> Dict: Get PDF metadata

Citation

Represents a citation/document with all metadata fields.

Core Fields:

id, title, abstract
year, doi, isbn, url
publisher, publication_name
volume, issue, begin_page, end_page

BibTeX Fields:

address, booktitle, chapter, edition
editor, institution, school
note, organization, series

NanoHub Fields:

status, affiliated, fundedby
software_use, res_edu
date_submit, date_accept, date_publish

Citation Metrics:

cnt_citations, url_citations
date_citations

Related Data:

authors: List of author dictionaries
keywords: List of keywords

Methods:

add_author(firstname, lastname, **kwargs): Add an author
add_keyword(keyword): Add a keyword
to_dict(): Convert to dictionary for API
from_dict(data): Load from API response
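For offline tests it can help to have a stand-in with the same shape as `Citation`. The dataclass below is a toy sketch, not the library's implementation: it mirrors only a few fields and the four methods listed above, stores authors as dicts with `firstname`/`lastname` keys (as in the example output), and its duplicate-keyword skipping is an assumption of this sketch.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class ToyCitation:
    """Toy stand-in mirroring a few Citation fields and methods (not the real class)."""

    title: str = ""
    year: Optional[int] = None
    doi: str = ""
    authors: List[Dict[str, Any]] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

    def add_author(self, firstname: str, lastname: str, **kwargs: Any) -> None:
        # Extra details such as email or orcid are stored alongside the name.
        self.authors.append({"firstname": firstname, "lastname": lastname, **kwargs})

    def add_keyword(self, keyword: str) -> None:
        # Toy behavior: skip exact duplicates so repeated calls are idempotent.
        if keyword not in self.keywords:
            self.keywords.append(keyword)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyCitation":
        # Ignore keys this toy model does not define.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

A round trip through `to_dict` and `from_dict` returns an equal object, which is a handy property to assert in tests of code that consumes citation dicts.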

PDFManager

Handles PDF file operations. The client exposes an instance as client.pdf_manager.

Methods:

download(citation_id, output_path) -> bool: Download PDF
upload(citation_id, pdf_path, filename) -> bool: Upload PDF
get_info(citation_id) -> Dict: Get PDF metadata
delete(citation_id) -> bool: Delete PDF

Complete Example

See examples/llm_metadata_extraction.py for a complete example showing:

Citation retrieval
PDF download
Text extraction from PDF
LLM-based metadata extraction
Citation update

Supported Fields

The library supports all fields from the HubZero DocumentExp model:

Core Document Fields

ID, title, abstract
Publication ID/name
Document genre ID/name
Publication date
Full text path (PDF)
Timestamp

BibTeX Fields

All standard BibTeX fields including address, booktitle, chapter, edition, editor, eprint, howpublished, institution, key, month, note, organization, publisher, series, school, type
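To show how these fields map onto an actual BibTeX entry, here is a sketch of a renderer in the same layout as the Citation block at the end of this README. `to_bibtex` is a hypothetical helper; it does not escape LaTeX special characters or unbalanced braces in values.

```python
from typing import Any, Dict


def to_bibtex(entry_type: str, key: str, fields: Dict[str, Any]) -> str:
    """Render one BibTeX entry, skipping None/empty values and keeping field order.

    Values are wrapped in braces as-is; no escaping is performed.
    """
    body = [
        f"  {name} = {{{value}}}"
        for name, value in fields.items()
        if value is not None and value != ""
    ]
    lines = [f"@{entry_type}{{{key},"]
    if body:
        lines.append(",\n".join(body))
    lines.append("}")
    return "\n".join(lines)


# Example: to_bibtex("article", "doe2024", {"title": "My Research Paper", "year": 2024})
```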

Paper/Journal Fields

Volume, issue
Begin page, end page

NanoHub-Specific Fields

URL, year, ISBN, cite
Affiliated, funded by
Created, DOI, reference type
Status (workflow)
Dates: submit, accept, publish
Software use, research/education flags
Experimental data fields
Notes

Citation Metrics

Citation URL, search URL
Citation count
Last citation check date

Related Data

Authors (with full details: name, email, ORCID, etc.)
Keywords

Workflow Status Codes

0: UNDEFINED
1: RELATED
2: PREVIEW
3: REVIEW
4: VOTING
5: PROCESS
6: POSTPROCESS
100: PUBLISHED
-1 to -6: REJECTED
-9: JUNK (deleted)
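The codes above can be wrapped in an `IntEnum` so scripts use names instead of magic numbers. This is a sketch; `WorkflowStatus` and `status_label` are hypothetical names, not constants exported by the library, and the whole -1 to -6 range collapses to one label.

```python
from enum import IntEnum


class WorkflowStatus(IntEnum):
    """Workflow status codes from the table above (illustrative names)."""

    JUNK = -9
    UNDEFINED = 0
    RELATED = 1
    PREVIEW = 2
    REVIEW = 3
    VOTING = 4
    PROCESS = 5
    POSTPROCESS = 6
    PUBLISHED = 100


def status_label(code: int) -> str:
    """Map a raw status code to its label; -1 through -6 all mean REJECTED."""
    if -6 <= code <= -1:
        return "REJECTED"
    try:
        return WorkflowStatus(code).name
    except ValueError:
        return f"UNKNOWN({code})"


# Usage (hypothetical): client.list(status=int(WorkflowStatus.PUBLISHED), limit=50)
```

The explicit `int(...)` matters on Python 3.8 to 3.10, where `str()` of an `IntEnum` member yields `"WorkflowStatus.PUBLISHED"` rather than `"100"`, which could leak into query parameters.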

Development

# Clone repository
git clone https://github.com/denphi/nanohub-citmanager.git
cd nanohub-citmanager

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black nanohubcitmanager/

Requirements

Python >= 3.8
nanohubremote >= 0.2.0
requests >= 2.25.0

Optional:

ollama >= 0.1.0 (for LLM features)
pypdf2 (for PDF text extraction)

License

MIT License - see LICENSE file for details.

Authors

Daniel Mejia (denphi@denphi.com)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use this library in your research, please cite:

@software{nanohub_citmanager,
  author = {Mejia, Daniel},
  title = {NanoHub Citation Manager Python Library},
  year = {2025},
  url = {https://github.com/denphi/nanohub-citmanager}
}

Version 0.1.0, released Dec 4, 2025.
