A comprehensive BibTeX bibliography management toolkit for academic research

These details have not been verified by PyPI

Project links

Project description

Awesome Citations

Awesome Citations is a comprehensive Python-based BibTeX bibliography management toolkit that automates the entire citation workflow. From completing incomplete entries by fetching from official sources (IEEE, ACM, arXiv, CrossRef, Semantic Scholar), to standardizing formatting, replacing arXiv preprints with published versions, and generating formatted PDF bibliographies - all in a single command.

Perfect for managing bibliographies for academic papers, theses, and research projects with support for multiple citation styles and bilingual (English/Chinese) journals.

Features
Installation
Quick Start
Usage
- Complete Workflow (Recommended)
- Individual Tools
Configuration
Data Sources
Project Structure
Documentation
Testing
License

Features

Core Capabilities

🚀 Complete Workflow Automation - One-stop BibTeX processing pipeline in a single command
- Sort and deduplicate entries by ID
- Complete missing fields from multiple sources
- Standardize formatting across all entries
- Replace arXiv preprints with published versions
- Generate detailed change logs
- Create formatted PDF bibliographies
📚 Multi-Source BibTeX Completion - Fetch missing fields from 5 official sources:
- IEEE Xplore - Primary source for IEEE publications (with Selenium fallback)
- ACM Digital Library - For ACM publications
- arXiv API - For preprints and arXiv papers
- CrossRef - Universal fallback for any DOI
- Semantic Scholar - For published versions of arXiv papers
- Intelligent fallback chains ensure maximum success rate
🔄 Smart arXiv Preprint Replacement
- Automatically detects arXiv preprints in your bibliography
- Searches for published versions using Semantic Scholar, DBLP, and CrossRef APIs
- Replaces preprint entries with complete journal/conference publication data
- Preserves original entry IDs for reference consistency
✨ Comprehensive Field Standardization
- Title formatting: Title Case, Sentence case, with protected acronyms (IoT, WiFi, etc.)
- Author formatting: First-last or Last-first name ordering
- Journal normalization: Full names, abbreviations, or preserve both (50+ journal mappings)
- Page formatting: LaTeX double-dash (100--110) or single-dash (100-110)
📊 PDF Bibliography Generation
- Support for 4 citation styles: IEEE, ACM, APA, GB/T 7714 (Chinese standard)
- Customizable templates for each style
- Automatic LaTeX compilation with biber
- Configurable document title, font size, paper size
📝 Detailed Change Tracking
- Every modification is logged with before/after values
- Source attribution (which API provided the data)
- Markdown-formatted change reports
- Summary statistics (entries processed, fields added, errors)
🔍 Analysis Tools
- Analyze bibliography by reference types (article, inproceedings, etc.)
- Publication year distribution (sorted newest first)
- Publication venue frequency (sorted by count)
- Formatted table output
🌏 Bilingual Support
- English and Chinese journal name mappings
- Chinese journal metadata database
- GB/T 7714 citation style for Chinese standards
- Bilingual document titles support
🛡️ Robust Error Handling
- DOI validation via HEAD requests to doi.org
- Title similarity checks (60% word overlap required)
- Year consistency validation (±1 year tolerance)
- Failed DOI tracking in /data/failed_dois.json
- Manual DOI correction database support
- Rate limiting to respect API limits (configurable delays)

Installation

Prerequisites

Python 3.8+ (tested on Python 3.12)
uv - Modern Python package manager (recommended) or pip
LaTeX distribution (optional, required for PDF generation)
- macOS: brew install --cask mactex
- Ubuntu/Debian: sudo apt-get install texlive-full
- Windows: Install MiKTeX or TeX Live

Install uv (if not already installed)

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv

Install Python Dependencies

Method 1: Using uv (Recommended - Fast & Modern)

# Install all dependencies
uv pip install -r requirements.txt

# Or create and activate a virtual environment with uv
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

Method 2: Using pip (Traditional)

pip install -r requirements.txt

Method 3: Manual installation with uv

uv pip install bibtexparser tabulate requests beautifulsoup4 lxml pyyaml habanero scholarly

Optional dependencies for enhanced features:

# For IEEE Selenium fallback (if API fails)
uv pip install selenium webdriver-manager

# For testing and development
uv pip install pytest pytest-cov pytest-timeout pytest-mock

Verify Installation

# Test the complete workflow
uv run python scripts/workflow_complete.py examples/sample_input.bib --output output/test_result.bib

# Or if using activated virtual environment
python3 scripts/workflow_complete.py examples/sample_input.bib --output output/test_result.bib

# Check if LaTeX is installed (for PDF generation)
pdflatex --version
biber --version

Quick Start

Process your BibTeX file in one command:

# Using uv (recommended)
uv run python scripts/workflow_complete.py refs.bib

# Or with activated virtual environment
python3 scripts/workflow_complete.py refs.bib

This will create:

refs_completed.bib - Your processed bibliography
refs_completed_changes.md - Detailed change log
refs_completed.pdf - Formatted PDF (if LaTeX is installed)

With custom configuration:

uv run python scripts/workflow_complete.py refs.bib --output output/my_refs.bib --config config.json

Usage

🚀 Complete Workflow (Recommended)

The workflow_complete.py script orchestrates the entire BibTeX processing pipeline in a single command. This is the recommended way to use Awesome Citations.

Basic usage:

# Using uv (recommended - handles dependencies automatically)
uv run python scripts/workflow_complete.py input.bib

# Or with activated virtual environment
python3 scripts/workflow_complete.py input.bib

What it does (7 automated steps):

✅ Sort and deduplicate entries by ID
✅ Complete missing fields from multiple sources (IEEE/ACM/arXiv/CrossRef/Semantic Scholar)
✅ Standardize formatting (titles, authors, journals, pages)
✅ Replace arXiv preprints with published versions
✅ Write output file with all changes
✅ Generate change summary (Markdown report with statistics)
✅ Generate PDF bibliography (IEEE style by default)

Output files:

input_completed.bib - Processed BibTeX file
input_completed_changes.md - Detailed change log
input_completed.pdf - Formatted PDF (if LaTeX is installed)

Custom output path:

uv run python scripts/workflow_complete.py refs.bib --output output/completed.bib --config config.json

Process multiple files:

# Process all .bib files in a directory
for file in *.bib; do
    uv run python scripts/workflow_complete.py "$file"
done

Key features:

Progress tracking with detailed console output
Error handling with fallback chains
Rate limiting (1.0s delay between requests by default)
Preserves original entry IDs
Logs all failures for manual review

📖 For detailed documentation, see docs/WORKFLOW_GUIDE.md

Individual Tools

For specific tasks, you can use individual scripts. Each script focuses on a single aspect of bibliography management.

1. Complete BibTeX Entries

Complete incomplete BibTeX entries by automatically fetching missing information from official sources using DOI.

File: scripts/complete_bibtex.py

Usage:

uv run python scripts/complete_bibtex.py

Edit the script to set input and output file paths, or modify to accept command-line arguments.

Features:

Extracts DOI from entry fields (doi, url, note) or URLs
Validates DOI existence via HEAD request to doi.org
Identifies publisher from DOI prefix (10.1109 = IEEE, 10.1145 = ACM, etc.)
Fetches from appropriate source with intelligent fallback chain:
- IEEE Xplore → Selenium fallback → CrossRef → Google Scholar
- ACM Digital Library → CrossRef → Google Scholar
- arXiv API → CrossRef → Google Scholar
Validates fetched data (title similarity, year consistency, DOI match)
Logs failed attempts to /data/failed_dois.json
Respects rate limits (0.5s default delay)

Note: This feature uses web scraping and may be affected by website changes.

2. Format BibTeX Fields

Standardize field formatting across all entries according to your preferences.

File: scripts/format_bibtex.py

Usage:

uv run python scripts/format_bibtex.py input.bib output.bib config.json

Formatting options:

Title formatting:
- titlecase: "A Survey on Machine Learning Techniques"
- sentencecase: "A survey on machine learning techniques"
- Protects acronyms: "{IoT}-Based {WiFi} System"
Author formatting:
- first_last: "John Smith and Jane Doe"
- last_first: "Smith, John and Doe, Jane"
Journal formatting:
- abbreviation: "IEEE Trans. Pattern Anal. Mach. Intell."
- full: "IEEE Transactions on Pattern Analysis and Machine Intelligence"
- both: Keep original format
Page formatting:
- double_dash: "100--110" (LaTeX format)
- single_dash: "100-110"

3. Sort BibTeX File

Sort entries alphabetically by their citation keys (entry IDs).

File: scripts/sort_bibtex.py

Usage:

uv run python scripts/sort_bibtex.py

Edit the script to set input and output file paths.

Features:

Alphabetical sorting by entry ID
Preserves entry formatting
Removes duplicate entries with the same ID

4. Analyze BibTeX File

Generate statistical analysis and tables for your bibliography.

File: scripts/analyze_bibtex.py

Usage:

uv run python scripts/analyze_bibtex.py

Edit the script to set the input file path.

Outputs three formatted tables:

Reference Types (article, inproceedings, book, etc.)

Type            Count
-------------  -------
article             45
inproceedings       32
book                 8

Publication Years (sorted newest first)

Year    Count
------  -------
2024         15
2023         28
2022         22

Publication Venues (sorted by frequency)

Publication                                    Count
-------------------------------------------  -------
IEEE Transactions on Neural Networks            12
ACM Computing Surveys                             8
Nature Machine Intelligence                       5

5. Generate PDF Bibliography

Create a formatted PDF bibliography using LaTeX templates.

File: scripts/generate_pdf.py

Usage:

uv run python scripts/generate_pdf.py input.bib output.pdf ieee config.json

Supported citation styles:

ieee - IEEE numeric citations (default)
acm - ACM author-year citations
apa - APA psychology standard
gb7714 - Chinese GB/T 7714 standard

Requirements:

LaTeX distribution (pdflatex + biber)
Templates in /templates/ directory

Customization options (via config.json):

{
  "pdf_output": {
    "enabled": true,
    "document_title": "参考文献列表 / References",
    "sort_by": "author",          // or "year", "title"
    "font_size": "11pt",           // or "10pt", "12pt"
    "paper_size": "a4paper"        // or "letterpaper"
  }
}

6. Replace arXiv Preprints

Detect arXiv preprints and replace them with published versions.

File: utils/arxiv_detector.py

Usage:

from utils.arxiv_detector import detect_and_replace_arxiv

# Automatically called in workflow_complete.py
entries_updated = detect_and_replace_arxiv(bib_database)

Detection methods:

Checks journal field for "arXiv"
Checks eprint field for arXiv ID
Checks DOI for arXiv format (10.48550/arXiv.*)

Search strategy:

Semantic Scholar API (most reliable for arXiv papers)
DBLP API (excellent for CS papers)
CrossRef API (comprehensive coverage)

Filters:

Only replaces with journal or conference publications
Ignores other preprints
Preserves original entry ID

Configuration

Awesome Citations uses a JSON configuration file to control all aspects of the processing workflow. The default configuration is in config.json.

Configuration File Structure

{
  "citation_style": "ieee",
  "custom_biblatex_style": null,
  "title_format": "titlecase",
  "journal_format": "both",
  "author_format": "first_last",
  "page_format": "double_dash",
  "arxiv_handling": "replace_with_published",
  "data_source_priority": ["doi_official", "dblp", "crossref"],
  "merge_multiple_sources": true,
  "parallel_processing": true,
  "max_workers": 5,
  "request_delay": 1.0,
  "pdf_output": {
    "enabled": true,
    "document_title": "参考文献列表 / References",
    "sort_by": "author",
    "font_size": "11pt",
    "paper_size": "a4paper"
  },
  "logging": {
    "enabled": true,
    "output_file": "changes_log.md",
    "verbose": true
  }
}

Configuration Options Explained

General Settings

citation_style (string): PDF citation style
- Options: "ieee", "acm", "apa", "gb7714"
- Default: "ieee"
custom_biblatex_style (string or null): Custom biblatex style file path
- Use if you have a custom .bbx/.cbx file
- Default: null

Formatting Options

title_format (string): How to format entry titles
- "titlecase": "A Survey on Machine Learning" (capitalize major words)
- "sentencecase": "A survey on machine learning" (only first word)
- "preserve": Keep original formatting
- Default: "titlecase"
journal_format (string): Journal name formatting
- "abbreviation": "IEEE Trans. Pattern Anal." (abbreviated)
- "full": "IEEE Transactions on Pattern Analysis..." (full name)
- "both": Preserve original format
- Default: "both"
author_format (string): Author name ordering
- "first_last": "John Smith and Jane Doe"
- "last_first": "Smith, John and Doe, Jane"
- Default: "first_last"
page_format (string): Page number dash style
- "double_dash": "100--110" (LaTeX standard)
- "single_dash": "100-110"
- Default: "double_dash"

Data Fetching Options

arxiv_handling (string): How to handle arXiv preprints
- "replace_with_published": Auto-replace with published versions
- "keep": Preserve arXiv entries as-is
- Default: "replace_with_published"
data_source_priority (array): Order of data sources to try
- Options: "doi_official" (IEEE/ACM/arXiv), "dblp", "crossref", "google_scholar"
- Default: ["doi_official", "dblp", "crossref"]
- First successful source is used (unless merge_multiple_sources is true)
merge_multiple_sources (boolean): Merge data from multiple sources
- true: Fetch from all sources and intelligently merge fields
- false: Use only first successful source
- Default: true

Performance Options

parallel_processing (boolean): Enable concurrent processing
- true: Process multiple entries simultaneously (faster)
- false: Process sequentially (safer for debugging)
- Default: true
max_workers (integer): Number of concurrent worker threads
- Range: 1-10 (recommended: 3-5)
- Default: 5
- Only applies if parallel_processing is true
request_delay (float): Delay between API requests (seconds)
- Prevents rate limiting from APIs
- Range: 0.2-5.0 seconds
- Default: 1.0
- Recommended: 0.5 for personal use, 1.0+ for large batches

PDF Output Settings

pdf_output.enabled (boolean): Generate PDF bibliography
- Default: true
pdf_output.document_title (string): Title on PDF document
- Default: "参考文献列表 / References" (bilingual)
pdf_output.sort_by (string): Bibliography sorting order
- Options: "author", "year", "title", "id"
- Default: "author"
pdf_output.font_size (string): Document font size
- Options: "10pt", "11pt", "12pt"
- Default: "11pt"
pdf_output.paper_size (string): Paper size
- Options: "a4paper", "letterpaper", "a5paper"
- Default: "a4paper"

Logging Settings

logging.enabled (boolean): Enable change logging
- Default: true
logging.output_file (string): Change log filename
- Default: "changes_log.md"
logging.verbose (boolean): Include detailed logs
- true: Log every field change
- false: Only log summary statistics
- Default: true

Example Configurations

Minimal processing (fast, conservative):

{
  "title_format": "preserve",
  "journal_format": "both",
  "arxiv_handling": "keep",
  "merge_multiple_sources": false,
  "parallel_processing": true,
  "max_workers": 10,
  "request_delay": 0.5
}

Maximum quality (slow, comprehensive):

{
  "title_format": "titlecase",
  "journal_format": "full",
  "arxiv_handling": "replace_with_published",
  "merge_multiple_sources": true,
  "parallel_processing": false,
  "request_delay": 2.0,
  "logging": {
    "verbose": true
  }
}

Chinese academic papers:

{
  "citation_style": "gb7714",
  "title_format": "preserve",
  "journal_format": "full",
  "pdf_output": {
    "document_title": "参考文献列表",
    "paper_size": "a4paper",
    "font_size": "12pt"
  }
}

Data Sources

Awesome Citations fetches bibliographic data from multiple authoritative sources:

Primary Sources (Publisher APIs)

IEEE Xplore - For IEEE publications (DOI: 10.1109/*)
- Primary: POST API to /xpl/downloadCitations
- Fallback: Selenium browser automation
- Coverage: IEEE journals, conferences, standards
ACM Digital Library - For ACM publications (DOI: 10.1145/*)
- Web scraping from /doi/{doi}/bibtex
- Coverage: ACM journals, conferences, magazines
arXiv API - For preprints (DOI: 10.48550/arXiv.*)
- Official arXiv public API
- Coverage: All arXiv papers

Fallback Sources

CrossRef API - Universal DOI resolver
- REST API for any DOI
- Coverage: 130+ million records across all publishers
- Very reliable for basic metadata
Semantic Scholar API - Academic search engine
- Best for finding published versions of arXiv papers
- Provides publication venue information
- Coverage: 200+ million papers
DBLP API - Computer Science bibliography
- Excellent for CS conference and journal papers
- Provides canonical venue names
- Coverage: 5+ million CS publications
Google Scholar - Final fallback (title-based search)
- Used when DOI lookup fails
- Searches by title
- Rate-limited, used sparingly

Fallback Strategy

Each entry follows an intelligent fallback chain:

1. Extract DOI from entry
   ↓
2. Validate DOI exists (HEAD request to doi.org)
   ↓
3. Identify publisher from DOI prefix
   ↓
4. Try primary source (IEEE/ACM/arXiv API)
   ↓
5. If fails → Try CrossRef
   ↓
6. If fails → Try Google Scholar
   ↓
7. If fails → Log to failed_dois.json

For arXiv papers specifically:

1. Check if entry is arXiv preprint
   ↓
2. Try Semantic Scholar API (find published version)
   ↓
3. If found → Fetch complete BibTeX from new DOI
   ↓
4. If not found → Try DBLP API
   ↓
5. If not found → Try CrossRef API
   ↓
6. If no published version → Keep arXiv entry

Data Validation

All fetched data undergoes validation:

Title similarity: At least 60% word overlap with original title
Year consistency: Within ±1 year of original (if present)
DOI match: Exact match with expected DOI
Field completeness: Must have at minimum: title, author, year

Failed validations are logged and the entry is skipped.

Manual Corrections

For persistent DOI fetch failures, you can add manual corrections to /data/doi_corrections.json:

{
  "10.1109/EXAMPLE.2024.1234567": {
    "title": "Correct Title",
    "author": "Smith, John and Doe, Jane",
    "journal": "IEEE Transactions on Example",
    "year": "2024",
    "volume": "15",
    "number": "3",
    "pages": "100--110",
    "doi": "10.1109/EXAMPLE.2024.1234567"
  }
}

These corrections are checked before attempting API fetches.

Project Structure

Awesome-Citations/
├── README.md                   # This file
├── LICENSE                     # GNU GPL v3.0
├── requirements.txt            # Python dependencies
├── config.json                 # Default configuration
├── refs.bib                    # Example BibTeX file
│
├── scripts/                    # Main executable scripts
│   ├── workflow_complete.py   # 🚀 Complete workflow (recommended)
│   ├── complete_bibtex.py     # BibTeX completion from APIs
│   ├── format_bibtex.py       # Field standardization
│   ├── sort_bibtex.py         # Alphabetical sorting
│   ├── analyze_bibtex.py      # Statistical analysis
│   ├── generate_pdf.py        # PDF generation
│   ├── utilities.py           # Core utility functions
│   └── enhanced_complete.py   # Enhanced multi-source completion
│
├── utils/                      # Utility modules
│   ├── change_logger.py       # Change tracking and logging
│   ├── arxiv_detector.py      # arXiv preprint detection
│   ├── title_formatter.py     # Title formatting utilities
│   └── multi_source_merger.py # Multi-source data merging
│
├── data/                       # Data files and databases
│   ├── journal_abbr.json      # Journal abbreviation mappings (50+)
│   ├── protected_words.json   # Acronyms to protect in titles
│   ├── small_words.json       # Articles/prepositions for Title Case
│   ├── chinese_journals.json  # Chinese journal metadata
│   ├── doi_corrections.json   # Manual DOI corrections
│   └── failed_dois.json       # Failed DOI fetch log (auto-generated)
│
├── templates/                  # LaTeX templates for PDF generation
│   ├── ieee_template.tex      # IEEE citation style
│   ├── acm_template.tex       # ACM citation style
│   ├── apa_template.tex       # APA citation style
│   └── gb7714_template.tex    # Chinese GB/T 7714 standard
│
├── docs/                       # Documentation (16 files)
│   ├── WORKFLOW_GUIDE.md      # Complete workflow documentation
│   ├── USAGE_GUIDE.md         # Detailed usage instructions
│   ├── TEST_REPORT.md         # Testing results and coverage
│   ├── IEEE_FAILURE_ANALYSIS.md # IEEE API troubleshooting
│   └── ...                    # Implementation and optimization docs
│
├── examples/                   # Example files
│   ├── sample_input.bib       # Sample BibTeX entries
│   └── sample_config.json     # Example configuration
│
├── tests/                      # Comprehensive test suite (2,579 lines)
│   ├── conftest.py            # Pytest fixtures and configuration
│   ├── test_complete_bibtex.py # BibTeX completion tests (1,057 lines)
│   ├── test_format_bibtex.py  # Formatting tests (485 lines)
│   ├── test_ieee_integration.py # IEEE-specific tests (401 lines)
│   ├── test_integration.py    # End-to-end workflow tests (171 lines)
│   ├── test_utils.py          # Utility function tests (161 lines)
│   └── test_data/             # Test BibTeX files (14+ files)
│       ├── ieee_papers.bib
│       ├── acm_papers.bib
│       ├── arxiv_papers.bib
│       ├── mixed_entries.bib
│       └── ...
│
├── research/                   # Research and exploration scripts
│   └── ieee_api_research.py   # IEEE API endpoint exploration
│
├── output/                     # Generated outputs (auto-created)
│   ├── *.bib                  # Processed BibTeX files
│   ├── *_changes.md           # Change logs
│   └── *.pdf                  # Generated PDFs
│
└── .cache/                     # API response cache (auto-created)
    └── *.json                 # Cached responses (30-day expiry)

Key Directories Explained

scripts/ - All main executable scripts. Start here for any task.
utils/ - Reusable Python modules imported by scripts.
data/ - JSON databases for journal names, acronyms, corrections, and failure logs.
templates/ - LaTeX templates for generating PDFs in different citation styles.
docs/ - Comprehensive documentation covering implementation, testing, and usage.
examples/ - Sample files to test the tool and learn usage patterns.
tests/ - Full test suite with fixtures, test data, and comprehensive coverage.
output/ - Auto-created directory for all generated files.
.cache/ - Auto-created cache for API responses to reduce redundant requests.

Documentation

Comprehensive documentation is available in the /docs directory:

User Documentation

WORKFLOW_GUIDE.md - Complete workflow documentation
- Step-by-step walkthrough of the complete workflow
- Configuration options explained
- Troubleshooting common issues
USAGE_GUIDE.md - Detailed usage instructions
- Individual script usage examples
- Advanced configuration scenarios
- Best practices for different use cases

Developer Documentation

IMPLEMENTATION_SUMMARY.md - Implementation details
- Architecture overview
- Core algorithms and data flows
- API integration details
TEST_REPORT.md - Testing results and coverage
- Test suite overview
- Coverage statistics
- Known issues and limitations
IEEE_FAILURE_ANALYSIS.md - IEEE API troubleshooting
- Common IEEE API errors
- Selenium fallback strategies
- Debugging tips

Project History

COMPLETION_SUMMARY.md - BibTeX completion feature development
OPTIMIZATION_RESULTS.md - Performance optimization results
FILE_REORGANIZATION_SUMMARY.md - Project restructuring notes
WORKFLOW_IMPLEMENTATION_SUMMARY.md - Complete workflow development

All documentation files are in Markdown format and contain detailed explanations with code examples.

Testing

Awesome Citations includes a comprehensive test suite to ensure reliability and correctness.

Test Suite Overview

Total test code: 2,579 lines
Test files: 6 main test files
Test data: 14+ specialized BibTeX files
Framework: pytest with coverage, timeout, and mock support

Running Tests

Run all tests:

pytest tests/

Run with coverage report:

pytest --cov=scripts --cov=utils --cov-report=html tests/

Run specific test file:

pytest tests/test_complete_bibtex.py -v

Run tests matching a pattern:

pytest tests/ -k "test_ieee" -v

Test Categories

Unit Tests (test_utils.py)
- Core utility functions
- Deduplication logic
- Sorting algorithms
BibTeX Completion Tests (test_complete_bibtex.py)
- DOI extraction and validation
- Publisher identification
- API fetching and fallback chains
- Data validation
- Error handling
Formatting Tests (test_format_bibtex.py)
- Title formatting (Title Case, Sentence case)
- Author name formatting
- Journal normalization
- Page formatting
IEEE Integration Tests (test_ieee_integration.py)
- IEEE API integration
- Selenium fallback testing
- Real IEEE DOI fetching
- Error scenarios
End-to-End Tests (test_integration.py)
- Complete workflow execution
- Configuration handling
- Output file generation
- Change log creation

Test Data Files

Located in /tests/test_data/:

ieee_papers.bib - IEEE journal and conference papers
acm_papers.bib - ACM publications
arxiv_papers.bib - arXiv preprints
crossref_papers.bib - CrossRef-sourced entries
duplicate_entries.bib - Duplicate detection tests
malformed_entries.bib - Error handling tests
mixed_entries.bib - Mixed publisher entries
chinese_journals.bib - Chinese publications
And more...

Continuous Testing

The test suite is designed to:

Catch regressions early
Validate API integrations
Ensure data quality
Test error handling paths
Verify configuration options

For detailed test results, see docs/TEST_REPORT.md

Troubleshooting

Common Issues

1. IEEE API fails frequently

The IEEE Xplore API can be unstable
Selenium fallback is automatically used
See docs/IEEE_FAILURE_ANALYSIS.md for details
Consider increasing request_delay in config

2. LaTeX not found

PDF generation requires LaTeX (pdflatex + biber)
Install TeX Live (Linux), MacTeX (macOS), or MiKTeX (Windows)
Verify: pdflatex --version and biber --version

3. Rate limiting errors

Increase request_delay in config.json (try 2.0 or higher)
Reduce max_workers (try 3 or less)
Some APIs have strict rate limits (especially Google Scholar)

4. Title similarity validation fails

Fetched data must match original title (60% word overlap)
Check if original title is correct in your .bib file
Add manual correction to /data/doi_corrections.json

5. DOI not found

Check /data/failed_dois.json for error details
Verify DOI is correct and exists at https://doi.org/
Some DOIs may not be indexed by all APIs

6. Module import errors

Ensure all dependencies are installed: pip install -r requirements.txt
Check Python version: 3.8+ required (tested on 3.12)

Getting Help

If you encounter issues not covered here:

Check the documentation in /docs
Review test files for usage examples
Check GitHub issues (if applicable)
Review error logs in console output and /data/failed_dois.json

Contributing

Contributions are welcome! If you'd like to contribute to Awesome Citations:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run the test suite to ensure nothing breaks (pytest tests/)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Areas for Contribution

Adding support for new citation styles
Improving API reliability and fallback strategies
Adding new data sources (e.g., Springer, Elsevier APIs)
Enhancing journal abbreviation database
Writing additional tests
Improving documentation
Fixing bugs and issues

Acknowledgment

Special thanks to:

Claude Code for providing valuable assistance in the development of this project
All contributors to the open-source libraries used in this project (bibtexparser, requests, BeautifulSoup, etc.)
The maintainers of IEEE Xplore, ACM Digital Library, arXiv, CrossRef, Semantic Scholar, and DBLP for their excellent APIs

License

GNU General Public License v3.0

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Built with ❤️ for researchers, academics, and anyone managing BibTeX bibliographies

For issues, questions, or suggestions, please check the documentation or open an issue on GitHub.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Jan 9, 2026

0.1.0

Jan 8, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

awesome_citations-0.2.0.tar.gz (99.7 kB view details)

Uploaded Jan 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

awesome_citations-0.2.0-py3-none-any.whl (74.8 kB view details)

Uploaded Jan 9, 2026 Python 3

File details

Details for the file awesome_citations-0.2.0.tar.gz.

File metadata

Download URL: awesome_citations-0.2.0.tar.gz
Upload date: Jan 9, 2026
Size: 99.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for awesome_citations-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3a1b95aa306f342a26ebcc35f3902047553639d09b1f2209ac8a0eade5f88e46`
MD5	`36e6fd122d3e48463ae7839696eabe42`
BLAKE2b-256	`0fc80f3e11b31137afc9b8c429a04f00e655121aae7685500479609873767ad9`

See more details on using hashes here.

File details

Details for the file awesome_citations-0.2.0-py3-none-any.whl.

File metadata

Download URL: awesome_citations-0.2.0-py3-none-any.whl
Upload date: Jan 9, 2026
Size: 74.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for awesome_citations-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5caa080934ec1e7af317b7bd8fdc184180b3ac4e325260b48680ca5856e05739`
MD5	`52895b11f02a368179803cbaa1cb78cb`
BLAKE2b-256	`cd77e13ece19bb4ff26488031ced6fcfb39e42286af3818062e646fa39e7ee57`

See more details on using hashes here.

awesome-citations 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Awesome Citations

Table of Contents

Features

Core Capabilities

Installation

Prerequisites

Install uv (if not already installed)

Install Python Dependencies

Verify Installation

Quick Start

Usage

🚀 Complete Workflow (Recommended)

Individual Tools

1. Complete BibTeX Entries

2. Format BibTeX Fields

3. Sort BibTeX File

4. Analyze BibTeX File

5. Generate PDF Bibliography

6. Replace arXiv Preprints

Configuration

Configuration File Structure

Configuration Options Explained

General Settings

Formatting Options

Data Fetching Options

Performance Options

PDF Output Settings

Logging Settings

Example Configurations

Data Sources

Primary Sources (Publisher APIs)

Fallback Sources

Fallback Strategy

Data Validation

Manual Corrections

Project Structure

Key Directories Explained

Documentation

User Documentation

Developer Documentation

Project History

Testing

Test Suite Overview

Running Tests

Test Categories

Test Data Files

Continuous Testing

Troubleshooting

Common Issues

Getting Help

Contributing

Areas for Contribution

Acknowledgment

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes