
Universal citation management and academic reference toolkit

OneCite

The Universal Citation & Academic Reference Toolkit

Effortlessly convert messy, unstructured references into perfectly formatted, standardized citations.

OneCite is a powerful command-line tool and Python library designed to automate the tedious process of citation management. Feed it anything (DOIs, paper titles, arXiv IDs, or even a mix) and get clean, accurate bibliographic entries in return.

🚀 OneCite for Web is coming.

Dropping soon at hezhiang.com/onecite.


✨ Features

OneCite is packed with features to streamline your entire academic workflow, from initial search to final formatting.

  • 🔍 Smart Recognition: Utilizes fuzzy matching against CrossRef and Google Scholar APIs to find the correct reference even from incomplete or slightly inaccurate information.
  • 📚 Universal Format Support: Accepts .txt and .bib inputs and can output to BibTeX, APA, and MLA formats, adapting to any project's requirements.
  • 🎯 High-Accuracy Refinement: A 4-stage processing pipeline cleans, queries, validates, and formats your entries to ensure the highest quality output.
  • 🤖 Intelligent Auto-Completion: Automatically discovers and fills in missing bibliographic data like journal, volume, pages, and author lists.
  • 🎛️ Interactive Mode: When multiple potential matches are found, an interactive prompt lets you choose the correct entry, giving you full control over ambiguous references.
  • ⚙️ Customizable Templates: A flexible YAML-based template system allows for complete control over the output fields and their priority.
  • 🎓 Broad Paper Type Support: Natively understands and processes journal articles, conference papers (NIPS, CVPR, ICML, etc.), and arXiv preprints with ease.
  • 📄 Seamless arXiv & URL Integration: Automatically fetches metadata for arXiv IDs and can extract identifiers directly from arxiv.org or doi.org URLs.
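
To make the Smart Recognition idea more concrete, here is a minimal sketch of fuzzy title scoring built on Python's standard `difflib`. It is not OneCite's actual matcher: the function names `title_similarity` and `best_match`, the normalization step, and the 0-100 scale are assumptions chosen for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hurt the score."""
    return " ".join(title.lower().split())


def title_similarity(query: str, candidate: str) -> float:
    """Return a similarity score between 0 and 100 (hypothetical scale)."""
    ratio = SequenceMatcher(None, normalize(query), normalize(candidate)).ratio()
    return round(ratio * 100, 1)


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Pick the candidate title with the highest similarity to the query."""
    scored = [(title, title_similarity(query, title)) for title in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # A slightly misspelled query still lands on the intended title.
    print(best_match("deep learnin", ["Deep learning", "Deep belief networks"]))
```

A real matcher would likely also weigh authors, year, and venue; this sketch compares titles only.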

🚀 Quick Start

Get up and running with OneCite in under a minute.

Installation

# Recommended: Install from PyPI
pip install onecite

# Or, install from source for the latest version
git clone https://github.com/HzaCode/OneCite.git
cd OneCite
pip install -e .

Basic Usage

  1. Create an input file (references.txt):

    10.1038/nature14539
    
    Attention is all you need
    Vaswani et al.
    NIPS 2017
    
  2. Run the command:

    onecite process references.txt -o results.bib --quiet
    
  3. Get perfectly formatted output (results.bib):

    @article{LeCun2015Deep,
      doi = "10.1038/nature14539",
      title = "Deep learning",
      author = "LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey",
      journal = "Nature",
      year = 2015,
      volume = 521,
      number = 7553,
      pages = "436-444",
      publisher = "Springer Science and Business Media LLC",
      url = "https://doi.org/10.1038/nature14539",
    }
    
    @inproceedings{Vaswani2017Attention,
      arxiv = "1706.03762",
      title = "Attention Is All You Need",
      author = "Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia",
      booktitle = "Advances in Neural Information Processing Systems",
      year = 2017,
      url = "https://arxiv.org/abs/1706.03762",
    }
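
The citation keys in the output above (`LeCun2015Deep`, `Vaswani2017Attention`) follow a first-author, year, first-title-word pattern. The sketch below reproduces that pattern for these two entries. The function name `make_bib_key` and the stop-word list are assumptions; OneCite's own key generator may handle edge cases differently.

```python
import re

# Hypothetical stop-word list; the words OneCite actually skips are not documented here.
STOP_WORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to"}


def make_bib_key(authors: str, year: int, title: str) -> str:
    """Build a FirstAuthorYearTitle key from BibTeX-style field values."""
    # BibTeX separates authors with " and "; each author is written "Last, First".
    first_author = authors.split(" and ")[0]
    last_name = re.sub(r"[^A-Za-z]", "", first_author.split(",")[0])

    # Use the first title word that is not a stop word.
    words = re.findall(r"[A-Za-z]+", title)
    significant = next((w for w in words if w.lower() not in STOP_WORDS), "")

    return f"{last_name}{year}{significant.capitalize()}"


if __name__ == "__main__":
    print(make_bib_key("LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey", 2015, "Deep learning"))
```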
    

📖 Advanced Usage

🎨 Multiple Output Formats (APA, MLA)

# Generate APA formatted citations
onecite process refs.txt --output-format apa
# → LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
# → Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems.

# Generate MLA formatted citations
onecite process refs.txt --output-format mla
# → LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep Learning." Nature 521.7553 (2015): 436-444.
# → Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems. 2017.
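
For readers curious how a BibTeX-style record maps onto the APA line shown above, here is a rough sketch. It covers journal articles with short author lists only and does not implement the ellipsis rule used for long author lists. The helper names (`apa_author`, `apa_author_list`, `format_apa_article`) are illustrative and are not part of OneCite's API.

```python
from __future__ import annotations


def apa_author(name: str) -> str:
    """Convert "Last, First Middle" into "Last, F. M."."""
    last, _, given = name.partition(",")
    initials = " ".join(f"{part[0]}." for part in given.split())
    return f"{last.strip()}, {initials}" if initials else last.strip()


def apa_author_list(authors: list[str]) -> str:
    """Join formatted authors with commas and an ampersand before the last one."""
    formatted = [apa_author(a) for a in authors]
    if len(formatted) == 1:
        return formatted[0]
    return ", ".join(formatted[:-1]) + ", & " + formatted[-1]


def format_apa_article(entry: dict) -> str:
    """Format a journal-article record (BibTeX-like field names) as one APA line."""
    authors = entry["author"].split(" and ")
    return (
        f"{apa_author_list(authors)} ({entry['year']}). {entry['title']}. "
        f"{entry['journal']}, {entry['volume']}({entry['number']}), {entry['pages']}."
    )


if __name__ == "__main__":
    lecun = {
        "author": "LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey",
        "year": 2015, "title": "Deep learning", "journal": "Nature",
        "volume": 521, "number": 7553, "pages": "436-444",
    }
    print(format_apa_article(lecun))
```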

🤖 Interactive Disambiguation

For ambiguous entries, use the --interactive flag to ensure accuracy.

Command:

onecite process ambiguous.txt --interactive

Example Interaction:

1. Deep learning
   Authors: LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey
   Journal: Nature
   Year: 2015
   Match Score: 92.5
   DOI: 10.1038/nature14539

2. Deep belief networks
   Authors: Hinton, Geoffrey E.
   Journal: Scholarpedia
   Year: 2009
   Match Score: 78.3
   DOI: 10.4249/scholarpedia.5947

Please select (1-2, 0=skip): 1
✅ Selected: Deep learning

🐍 Use as a Python Library

Integrate OneCite's processing power directly into your Python scripts.

from onecite import process_references

# Define a callback for non-interactive selection (e.g., always choose the best match)
def auto_select_callback(candidates):
    return 0

result = process_references(
    input_content="Deep learning review\nLeCun, Bengio, Hinton\nNature 2015",
    input_type="txt",
    output_format="bibtex",
    interactive_callback=auto_select_callback
)

print(result['output_content'])

📑 Supported Input Types

OneCite is designed to be flexible and understands various common academic identifiers.

  • DOI: 10.1038/nature14539
  • Conference Papers: Attention is all you need, Vaswani et al., NIPS 2017
  • arXiv ID: 1706.03762
  • URLs: https://arxiv.org/abs/1706.03762
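
The input types listed above can be told apart with a couple of regular expressions. The following is a simplified sketch, not OneCite's parser: the patterns, the `classify` function, and the returned labels are assumptions, and old-style arXiv identifiers (for example `hep-th/...`) are not handled.

```python
from __future__ import annotations

import re

# A DOI starts with "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/\S+)")
# New-style arXiv IDs look like YYMM.NNNNN, optionally followed by a version (v2).
ARXIV_RE = re.compile(r"(?<![\d.])(\d{4}\.\d{4,5})(?:v\d+)?(?!\d)")


def classify(entry: str) -> tuple[str, str]:
    """Return (kind, value), where kind is "doi", "arxiv", or "query"."""
    text = entry.strip()
    doi = DOI_RE.search(text)
    if doi:
        # Strip punctuation that often trails a DOI in running text.
        return ("doi", doi.group(1).rstrip(".,;"))
    arxiv = ARXIV_RE.search(text)
    if arxiv:
        return ("arxiv", arxiv.group(1))
    # Anything else becomes a free-text search query with whitespace collapsed.
    return ("query", " ".join(text.split()))


if __name__ == "__main__":
    for sample in ["10.1038/nature14539", "https://arxiv.org/abs/1706.03762",
                   "Attention is all you need, Vaswani et al., NIPS 2017"]:
        print(classify(sample))
```

Checking for a DOI first matters because a DOI suffix can itself contain digit runs that resemble an arXiv ID.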

🤖 AI Assistant Integration (MCP)

OneCite provides complete Model Context Protocol (MCP) support, enabling AI assistants to directly use all of OneCite's functionality for literature search, processing, and formatting.

✨ Available Functions

  • cite - Generate single academic citations
    • Supports DOI, paper titles, arXiv IDs, and other input types
    • Supports APA, MLA, BibTeX, and other output formats
  • batch_cite - Batch citation generation
    • Process multiple literature sources at once
    • Automatically handle different input types
  • search - Academic literature search
    • Search for relevant literature based on keywords
    • Return structured literature information

🚀 Quick Start

  1. Install OneCite (if not already installed):

    pip install onecite
    
  2. Test MCP server:

    onecite-mcp
    
  3. Configure your AI assistant by adding the following to settings.json in an MCP-supported editor:

    {
      "mcpServers": {
        "onecite": {
          "command": "onecite-mcp",
          "args": [],
          "env": {}
        }
      }
    }
    
  4. Restart your editor, and the AI assistant will have access to OneCite's complete functionality!
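
If the editor's settings.json already lists other MCP servers, the OneCite entry has to be merged in rather than pasted over them. The sketch below shows one way to do that with the standard `json` module. The helper names are hypothetical, the location of settings.json depends on the editor, and files that contain comments (JSONC) would need a more tolerant parser.

```python
import json
from pathlib import Path


def add_onecite_server(settings: dict) -> dict:
    """Return a copy of `settings` with the OneCite MCP server registered."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as in the configuration snippet above.
    servers["onecite"] = {"command": "onecite-mcp", "args": [], "env": {}}
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Read settings.json (if present), add the server entry, and write it back."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_onecite_server(current)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-mcp"}}}
    print(json.dumps(add_onecite_server(existing), indent=2))
```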

📊 Test Status

Server Startup - MCP server starts and responds normally
Citation Function - DOI parsing and formatting work correctly
Batch Processing - Multi-source batch processing works normally
Search Function - Literature search functionality works correctly
Command Line Tool - onecite-mcp command is available

💡 Usage Examples

After configuration, you can directly tell your AI assistant:

  • "Generate an APA format citation for this DOI: 10.1038/nature14539"
  • "Batch process these references and generate BibTeX format"
  • "Search for the latest papers on machine learning"

The AI assistant will automatically call OneCite's corresponding functions and return results.

⚙️ Configuration

📋 Command Line Options

| Option | Description | Default |
| --- | --- | --- |
| --input-type | Input format (txt, bib) | txt |
| --output-format | Output format (bibtex, apa, mla) | bibtex |
| --template | Specify a custom template YAML to use | journal_article_full |
| --interactive | Enable interactive mode for disambiguation | False |
| --quiet | Suppress verbose logging | False |
| --output, -o | Path to the output file | stdout |

🎨 Custom Templates

Define custom output formats using a simple YAML template.

Example my_template.yaml:

name: my_template
entry_type: "@article"
fields:
  - name: author
    required: true
  - name: title  
    required: true
  - name: journal
    required: true
  - name: year
    required: true
  - name: doi
    required: false
    source_priority: [crossref_api]

Usage:

onecite process refs.txt --template my_template.yaml
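
To illustrate what the `required` flags in a template could mean in practice, the sketch below checks an entry against the example template and keeps only the fields it names. The template is written out as the dict a YAML loader would produce, so the snippet has no third-party dependency. The functions `missing_required` and `select_fields` are hypothetical and say nothing about how OneCite applies templates internally.

```python
from __future__ import annotations

# my_template.yaml expressed as the equivalent Python dict.
TEMPLATE = {
    "name": "my_template",
    "entry_type": "@article",
    "fields": [
        {"name": "author", "required": True},
        {"name": "title", "required": True},
        {"name": "journal", "required": True},
        {"name": "year", "required": True},
        {"name": "doi", "required": False, "source_priority": ["crossref_api"]},
    ],
}


def missing_required(template: dict, entry: dict) -> list[str]:
    """List required template fields that are absent or empty in the entry."""
    return [
        field["name"]
        for field in template["fields"]
        if field.get("required") and not entry.get(field["name"])
    ]


def select_fields(template: dict, entry: dict) -> dict:
    """Keep only the fields named by the template, in template order."""
    names = [field["name"] for field in template["fields"]]
    return {name: entry[name] for name in names if name in entry}


if __name__ == "__main__":
    entry = {"author": "LeCun, Yann", "title": "Deep learning", "year": 2015, "publisher": "Springer"}
    print(missing_required(TEMPLATE, entry))
    print(select_fields(TEMPLATE, entry))
```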

🔄 Core Processing Pipeline

OneCite ensures high accuracy and quality through a sophisticated four-stage processing pipeline. The diagram below shows the complete workflow from raw input to final formatted output.

💡 MCP Integration: Through Model Context Protocol, AI assistants can directly invoke this complete processing pipeline without requiring users to manually operate the command line.

graph TD
    %% Input Layer - Multiple Entry Points
    A1["CLI Input<br/>onecite process"] --> A["Input Content"]
    A2["Python API<br/>process_references()"] --> A
    A3["MCP Server<br/>AI Assistant Integration"] --> A
    A4["Batch Processing<br/>Multiple Sources"] --> A
    
    A --> B["Stage 1: Parsing Module<br/>ParserModule"]
    
    B --> B1{"Input Type?"}
    B1 -->|TXT| B2["Parse Text<br/>- Split entries by double newlines<br/>- Extract DOIs and URLs<br/>- Generate query strings"]
    B1 -->|BIB| B3["Parse BibTeX<br/>- Parse existing entries<br/>- Extract metadata"]
    B2 --> C["Raw Entry List<br/>List[RawEntry]<br/>- id, raw_text, doi, url, query_string"]
    B3 --> C
    
    C --> D["Stage 2: Identification Module<br/>IdentifierModule"]
    D --> D0["Parallel Processing<br/>Each entry processed independently"]
    D0 --> D1{"DOI exists?"}
    
    D1 -->|Yes| D2["Validate DOI format<br/>Regex matching"]
    D2 --> D3["Verify DOI via CrossRef API"]
    D3 --> D4{"DOI exists and valid?"}
    
    D4 -->|Yes| D5["Get metadata from CrossRef<br/>Status: identified"]
    D4 -->|No| D6["DOI format valid but not found<br/>Continue fuzzy search"]
    
    D1 -->|No| D7["Check arXiv ID in URL"]
    D7 --> D8{"Found arXiv ID?"}
    D8 -->|Yes| D9["Extract arXiv ID<br/>Continue processing"]
    D8 -->|No| D10["Check well-known papers<br/>Built-in paper database"]
    
    D6 --> D11["Intelligent Search Strategy<br/>Auto-fallback mechanism"]
    D9 --> D11
    D10 --> D11
    
    D11 --> D11A["Primary: CrossRef Search<br/>Fast and accurate"]
    D11 --> D11B["Fallback: Google Scholar<br/>When CrossRef fails"]
    D11A --> D12["Score candidate results<br/>Fuzzy matching algorithm"]
    D11B --> D12
    
    D12 --> D13{"Match confidence?"}
    D13 -->|">80%"| D14["Auto-select best match<br/>Status: identified"]
    D13 -->|"70-80%"| D15["Interactive selection<br/>User/AI chooses from candidates"]
    D13 -->|"<70%"| D16["Mark as identification failed<br/>Status: identification_failed"]
    
    D15 --> D17["Selection result<br/>Status: identified"]
    D5 --> E["Identified Entry List<br/>List[IdentifiedEntry]<br/>- id, raw_text, doi, arxiv_id, metadata, status"]
    D14 --> E
    D16 --> E
    D17 --> E
    
    E --> F["Stage 3: Enrichment Module<br/>EnricherModule"]
    F --> F0["Parallel Enrichment<br/>Each entry processed independently"]
    F0 --> F1{"Entry status?"}
    F1 -->|identified| F2["Enrich metadata<br/>Template-driven completion"]
    F1 -->|failed| F3["Skip enrichment<br/>Status: enrichment_failed"]
    
    F2 --> F4{"Data source type?"}
    F4 -->|DOI| F5["Get complete metadata from CrossRef<br/>Full bibliographic data"]
    F4 -->|"arXiv ID"| F6["Get metadata from arXiv API<br/>Preprint information"]
    F4 -->|"Search result"| F7["Convert search metadata format<br/>Normalize data structure"]
    
    F5 --> F8["Generate BibTeX key<br/>FirstAuthorYearTitle format"]
    F6 --> F8
    F7 --> F8
    
    F8 --> F9["Complete missing fields<br/>Template priority rules"]
    F9 --> F10["Determine entry type<br/>@article vs @inproceedings"]
    F10 --> F11["Status: completed"]
    
    F3 --> G["Completed Entry List<br/>List[CompletedEntry]<br/>- id, doi, status, bib_key, bib_data"]
    F11 --> G
    
    G --> H["Stage 4: Formatting Module<br/>FormatterModule"]
    H --> H1{"Output format?"}
    
    H1 -->|BibTeX| H2["Format as BibTeX<br/>- Generate @entry format<br/>- Include all required fields"]
    H1 -->|APA| H3["Format as APA style<br/>- Author-date format<br/>- Standard punctuation"]
    H1 -->|MLA| H4["Format as MLA style<br/>- Author-page format<br/>- Specific citation rules"]
    
    H2 --> I["Final Output<br/>List[str] formatted citations"]
    H3 --> I
    H4 --> I
    
    I --> J["Processing Report<br/>- total: int<br/>- succeeded: int<br/>- failed_entries: List[Dict]"]
    
    %% MCP Integration Details
    MCP["MCP Functions"] --> MCP1["cite(source, style, format)<br/>Single citation generation"]
    MCP --> MCP2["batch_cite(sources, style)<br/>Batch processing"]
    MCP --> MCP3["search(query, limit)<br/>Literature search"]
    MCP1 --> A3
    MCP2 --> A3
    MCP3 --> A3
    
    %% Error handling and resilience
    D3 -.->|"API timeout/error"| D11
    F5 -.->|"API error"| F3
    F6 -.->|"API error"| F3
    H2 -.->|"Format error"| H5["Add to failed entries"]
    H3 -.->|"Format error"| H5
    H4 -.->|"Format error"| H5
    H5 --> J
    
    %% Template system
    T["Template System<br/>TemplateLoader"] --> F9
    T --> T1["journal_article_full.yaml<br/>Complete journal template"]
    T --> T2["conference_paper.yaml<br/>Conference template"]
    T1 --> T3["Field Configuration<br/>- Required/optional fields<br/>- Data source priority<br/>- Validation rules"]
    T2 --> T3
    
    %% External data sources with smart strategy
    DS["External Data Sources<br/>Smart API Management"] --> D3
    DS --> F5
    DS --> F6
    DS --> DS1["CrossRef API<br/>- DOI validation & metadata<br/>- Rate limiting & caching<br/>- Error handling"]
    DS --> DS2["arXiv API<br/>- Preprint metadata<br/>- PDF information<br/>- ID extraction"]
    DS --> DS3["Google Scholar<br/>- Fuzzy search fallback<br/>- Citation data<br/>- Timeout protection"]
    
    %% Performance optimizations
    PERF["Performance Features"] --> PERF1["Parallel Processing<br/>Independent entry handling"]
    PERF --> PERF2["Smart Caching<br/>Avoid duplicate API calls"]
    PERF --> PERF3["Graceful Degradation<br/>Fallback strategies"]
    PERF1 --> D0
    PERF1 --> F0
    
    %% Style definitions
    classDef stageBox fill:#e1f5fe,stroke:#01579b,stroke-width:3px
    classDef decisionBox fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef processBox fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef outputBox fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef systemBox fill:#fafafa,stroke:#424242,stroke-width:2px
    classDef mcpBox fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef inputBox fill:#fff8e1,stroke:#ff8f00,stroke-width:2px
    classDef perfBox fill:#f1f8e9,stroke:#689f38,stroke-width:2px
    
    class B,D,F,H stageBox
    class B1,D1,D4,D8,D13,F1,F4,H1 decisionBox
    class B2,B3,D2,D3,D5,D6,D7,D9,D10,D11,D11A,D11B,D12,D14,D15,D16,D17,F2,F3,F5,F6,F7,F8,F9,F10,F11,H2,H3,H4,H5,D0,F0 processBox
    class C,E,G,I,J outputBox
    class T,T1,T2,T3,DS,DS1,DS2,DS3 systemBox
    class A1,A2,A3,A4 inputBox
    class MCP,MCP1,MCP2,MCP3 mcpBox
    class PERF,PERF1,PERF2,PERF3 perfBox
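
Two steps of the diagram are simple enough to sketch directly: splitting TXT input on blank lines (Stage 1) and routing a candidate by its match score (Stage 2). The code below is an approximation under stated assumptions, not OneCite's implementation: the function names and return labels are illustrative, and because the diagram does not say which branch a score of exactly 80 takes, this sketch sends it to interactive selection.

```python
from __future__ import annotations

import re


def split_entries(text: str) -> list[str]:
    """Stage 1 (TXT input): split raw text into entries separated by blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


def route_by_confidence(score: float) -> str:
    """Stage 2: map a match score (0-100) to a branch of the diagram."""
    if score > 80:
        return "auto_select"            # Status: identified
    if score >= 70:
        return "interactive"            # user or AI picks from candidates
    return "identification_failed"      # Status: identification_failed


if __name__ == "__main__":
    raw = "10.1038/nature14539\n\nAttention is all you need\nVaswani et al.\nNIPS 2017\n"
    for entry in split_entries(raw):
        print(repr(entry))
    for score in (92.5, 78.3, 41.0):
        print(score, route_by_confidence(score))
```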

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for development guidelines and instructions on how to submit pull requests.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


OneCite - Simple, Accurate, and Powerful Citation Management ✨
