
A tool for fixing BibTeX reference lists with the DBLP API.


reffix: Fixing BibTeX reference lists with the DBLP API 🔧


➡️ Reffix is a simple tool for improving the BibTeX list of references in your paper. It can fix several common errors such as incorrect capitalization, missing URLs, or citing arXiv pre-prints instead of the published versions.

➡️ Reffix queries the DBLP API, so it does not require any local database of papers.

➡️ Reffix uses a conservative approach to keep your bibliography valid.

➡️ The tool is developed with NLP papers in mind, but it can be used on any BibTeX list of references containing computer science papers present on DBLP.

Quickstart

👉️ You can now install reffix from PyPI:

pip install -U reffix
reffix [BIB_FILE]

See the Installation and Usage sections below for more details.

Example

Before the update (Google Scholar):

  • ❎ arXiv version
  • ❎ no URL
  • ❎ capitalization lost
{
    'ENTRYTYPE': 'article',
    'ID': 'duvsek2020evaluating',
    'author': 'Du{\\v{s}}ek, Ond{\\v{r}}ej and Kasner, Zden{\\v{e}}k',
    'journal': 'arXiv preprint arXiv:2011.10819',
    'title': 'Evaluating semantic accuracy of data-to-text generation with '
             'natural language inference',
    'year': '2020'
}

After the update (DBLP + preserving capitalization):

  • ✔️ ACL version
  • ✔️ URL included
  • ✔️ capitalization preserved
{
    'ENTRYTYPE': 'inproceedings',
    'ID': 'duvsek2020evaluating',
    'author': 'Ondrej Dusek and\nZdenek Kasner',
    'bibsource': 'dblp computer science bibliography, https://dblp.org',
    'biburl': 'https://dblp.org/rec/conf/inlg/DusekK20.bib',
    'booktitle': 'Proceedings of the 13th International Conference on Natural '
                 'Language\n'
                 'Generation, {INLG} 2020, Dublin, Ireland, December 15-18, '
                 '2020',
    'editor': 'Brian Davis and\n'
              'Yvette Graham and\n'
              'John D. Kelleher and\n'
              'Yaji Sripada',
    'pages': '131--137',
    'publisher': 'Association for Computational Linguistics',
    'timestamp': 'Mon, 03 Jan 2022 00:00:00 +0100',
    'title': '{Evaluating} {Semantic} {Accuracy} of {Data-to-Text} '
             '{Generation} with {Natural} {Language} {Inference}',
    'url': 'https://aclanthology.org/2020.inlg-1.19/',
    'year': '2020'
}
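The three problems flagged in the "before" entry can be detected mechanically. The following is a minimal sketch, not reffix's own code. It assumes entries are bibtexparser-style dicts like the ones above. The helper names `is_arxiv` and `diagnose` are hypothetical. The arXiv check is a heuristic: Google Scholar writes "arXiv preprint ..." into the journal field, and DBLP lists arXiv papers under the journal name CoRR.

```python
def is_arxiv(entry: dict) -> bool:
    """Heuristic check for an arXiv pre-print entry (hypothetical helper)."""
    fields = ("journal", "eprinttype", "archiveprefix", "url")
    text = " ".join(entry.get(f, "") for f in fields).lower()
    # Google Scholar: "arXiv preprint arXiv:..."; DBLP: journal = CoRR
    return "arxiv" in text or entry.get("journal", "").strip() == "CoRR"


def diagnose(entry: dict) -> list:
    """Return the issues listed for the Google Scholar entry above."""
    issues = []
    if is_arxiv(entry):
        issues.append("arXiv version")
    if not entry.get("url"):
        issues.append("no URL")
    # Without curly brackets, BibTeX styles are free to lowercase the title.
    if "{" not in entry.get("title", ""):
        issues.append("capitalization lost")
    return issues
```

Applied to the "before" dict, this sketch reports all three issues. Applied to the "after" dict, it reports none.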

Main features

  • Completing references – reffix queries the DBLP API with the paper title and the first author's name to find a complete reference for each entry in the BibTeX file.
  • Replacing arXiv preprints – reffix can try to replace arXiv pre-prints with the version published at a conference or in a journal whenever possible.
  • Preserving titlecase – in order to preserve correct casing, reffix wraps individual capitalized words in the paper title in curly brackets.
  • Conservative approach:
    • the original .bib file is preserved
    • no references are deleted
    • papers are updated only if the title and at least one of the authors match
    • the version of the paper corresponding to the original entry is preferred
  • Interactive mode – you can confirm every change manually.
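The titlecase-preserving step can be illustrated with a small function. This is a minimal sketch that reproduces the title transformation in the example above. It is not reffix's implementation, and `protect_capitals` is a hypothetical name. The sketch handles words as follows:

- A word is wrapped in curly brackets if its first character is an uppercase letter.
- Words that are already braced (such as {INLG}) are left alone, so applying the function twice does not change the result.
- Words starting with punctuation are not wrapped.

```python
def protect_capitals(title: str) -> str:
    """Wrap each word starting with an uppercase letter in curly brackets."""
    words = []
    for word in title.split():  # note: also collapses newlines and repeated spaces
        if word.startswith("{") or not word[:1].isupper():
            words.append(word)  # lowercase, numeric or already protected
        else:
            words.append("{" + word + "}")
    return " ".join(words)
```

For example, "Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference" becomes the braced title shown in the "after" entry.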

The package uses bibtexparser for parsing the BibTeX files, the DBLP API for updating the references, and the titlecase package for optional extra titlecasing.
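The lookup-and-match step can be sketched as two parts: querying DBLP with the title and first author, then accepting a hit only if the title and at least one author match. This is a minimal, network-free sketch under several assumptions, not reffix's actual code:

- It assumes DBLP's public search endpoint https://dblp.org/search/publ/api with the parameters q, format and h (maximum number of hits).
- It assumes a JSON response laid out as result → hits → hit → info.
- It assumes a single author comes back as a dict rather than a list.

Please verify these assumptions against the DBLP API documentation. All function names are hypothetical.

```python
import re
import unicodedata
from urllib.parse import urlencode

# Assumed endpoint: DBLP's publication search API.
DBLP_SEARCH = "https://dblp.org/search/publ/api"


def search_url(title: str, first_author: str, max_hits: int = 10) -> str:
    """Build a DBLP search URL from the paper title and the first author's name."""
    params = {"q": f"{title} {first_author}", "format": "json", "h": max_hits}
    return f"{DBLP_SEARCH}?{urlencode(params)}"


def _plain(text: str) -> str:
    """Lowercase text and strip LaTeX accent commands, braces and Unicode diacritics."""
    text = re.sub(r"\\[a-zA-Z]+|\\.", "", text)  # Du{\v{s}}ek -> Du{{s}}ek
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return text.replace("{", "").replace("}", "").lower()


def normalize_title(title: str) -> str:
    """Keep only letters and digits, so casing, braces and trailing periods are ignored."""
    return re.sub(r"[^a-z0-9]", "", _plain(title))


def last_names(authors: str) -> set:
    """Extract lowercased last names from an 'A and B' author string."""
    names = set()
    for name in re.split(r"\s+and\s+", _plain(authors)):
        name = name.strip()
        if not name:
            continue
        if "," in name:  # BibTeX "Last, First" form
            last = name.split(",")[0]
        else:  # "First Last" form; DBLP may append a homonym number such as 0001
            tokens = [t for t in name.split() if not t.isdigit()]
            last = tokens[-1] if tokens else ""
        last = re.sub(r"[^a-z]", "", last)
        if last:
            names.add(last)
    return names


def is_match(entry: dict, hit_info: dict) -> bool:
    """Accept a DBLP hit only if the title matches and at least one last name matches."""
    if normalize_title(entry.get("title", "")) != normalize_title(hit_info.get("title", "")):
        return False
    authors = hit_info.get("authors", {}).get("author", [])
    if isinstance(authors, dict):  # assumed shape for a single author
        authors = [authors]
    dblp_authors = " and ".join(a.get("text", "") for a in authors)
    return bool(last_names(entry.get("author", "")) & last_names(dblp_authors))
```

Normalizing both sides before comparing keeps the check conservative without being defeated by LaTeX accents (Du{\v{s}}ek) versus Unicode (Dušek), or by different capitalization.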

Installation

You can install reffix from PyPI:

pip install reffix

The core package does not require spaCy. If you want to use --process-conf-loc, install the optional extra with pip, or with uv as a tool:

pip install 'reffix[conf-loc]'
uv tool install 'reffix[conf-loc]'

The first time --process-conf-loc is used, reffix will download the en_core_web_sm model automatically into the active environment.

For development, you can install the package in editable mode:

pip install -e '.[dev]'

Usage

Run the script with the .bib file as the first argument:

reffix [IN_BIB_FILE]

By default, reffix runs in batch mode, saves the output to <original_name>.fixed.bib, and keeps the arXiv versions.

The following command will run reffix in interactive mode, save the outputs to a custom file, and replace arXiv versions:

reffix [IN_BIB_FILE] -o [OUT_BIB_FILE] -i -a

If you want to control which DBLP BibTeX form is imported, use --dblp-bibtex-format condensed, --dblp-bibtex-format standard, or --dblp-bibtex-format crossref.
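If you call reffix from a build script, the documented flags map directly onto an argument list. Below is a minimal sketch of such a wrapper. `reffix_command` and `run_reffix` are hypothetical helpers, not part of reffix. Only the flags listed in this README are used.

```python
import shlex
import subprocess

# The three values documented for --dblp-bibtex-format.
DBLP_FORMATS = {"condensed", "standard", "crossref"}


def reffix_command(in_bib, out_bib=None, interactive=False,
                   replace_arxiv=False, dblp_format=None):
    """Assemble a reffix command line from the documented flags (hypothetical helper)."""
    cmd = ["reffix", in_bib]
    if out_bib:  # otherwise reffix writes <original_name>.fixed.bib
        cmd += ["-o", out_bib]
    if interactive:
        cmd.append("-i")
    if replace_arxiv:
        cmd.append("-a")
    if dblp_format is not None:
        if dblp_format not in DBLP_FORMATS:
            raise ValueError(f"unknown DBLP BibTeX format: {dblp_format!r}")
        cmd += ["--dblp-bibtex-format", dblp_format]
    return cmd


def run_reffix(*args, **kwargs):
    """Run reffix as a subprocess; needs reffix on PATH, raises CalledProcessError on failure."""
    subprocess.run(reffix_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    cmd = reffix_command("refs.bib", "refs.out.bib", interactive=True, replace_arxiv=True)
    print(shlex.join(cmd))  # reffix refs.bib -o refs.out.bib -i -a
```

Validating the format value before spawning the process gives a clearer error than a failed subprocess call would.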

Flags

  • -o, --out – Output filename. If not specified, the default filename <original_name>.fixed.bib is used.
  • -i, --interact – Interactive mode. Every replacement of an entry with a DBLP result has to be confirmed manually.
  • -a, --replace-arxiv – Replace arXiv versions. If a non-arXiv version (e.g., published at a conference or in a journal) is found on DBLP, it is preferred over the arXiv version.
  • --dblp-bibtex-format – Choose which DBLP BibTeX export form to fetch for matching records: condensed, standard, or crossref.
  • -t, --force-titlecase – Force titlecase for all entries. The titlecase package is used to fix the casing of titles that are not titlecased. (Note that the capitalization rules used by the package may differ slightly.)
  • -s, --sort-by – Multiple sort conditions compatible with bibtexparser.BibTexWriter, applied in the provided order. Example: -s ENTRYTYPE year sorts the list by entry type as the primary key and year as the secondary key. ID can be used to refer to the BibTeX key. The default value (None) keeps the original order of the BibTeX entries.
  • --no-publisher – Suppress publishers for conference papers and journal articles (publishers are still kept for books).
  • --process-conf-loc – Parse conference dates and locations, remove them from proceedings names, and store the locations under address. Requires the conf-loc extra.
  • --no-formatting – Disable automatic BibTeX formatting.
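The multi-key ordering of the -s/--sort-by flag can be sketched as sorting on a tuple of field values. This is a minimal sketch of those semantics, not the code reffix or bibtexparser actually runs. `sort_entries` is a hypothetical name. Missing fields sort as empty strings, and values are compared as plain strings, which works for four-digit years.

```python
def sort_entries(entries, keys=None):
    """Sort BibTeX entry dicts by several fields, in the given priority order.

    keys=None (or empty) keeps the original order, like the flag's default.
    """
    if not keys:
        return list(entries)
    # Tuples compare element by element: first key is primary, second secondary, ...
    return sorted(entries, key=lambda e: tuple(e.get(k, "") for k in keys))
```

With this sketch, -s ENTRYTYPE year corresponds to keys=["ENTRYTYPE", "year"]. Because Python's sort is stable, entries that tie on every key keep their original relative order.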

Notes

To reduce the number of requests to the DBLP API, you can use the bibexport tool to generate a file compact.bib containing only the references used in the paper. As input, use the <myarticle>.aux file created during compilation.

bibexport -o compact.bib <myarticle>.aux
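The idea behind this step is that BibTeX-style .aux files record each citation as a \citation{key1,key2} line. Collecting those keys tells you which entries are actually used. The following is a minimal sketch of that idea, not a replacement for bibexport. `cited_keys` and `keep_cited` are hypothetical names. The sketch assumes a plain BibTeX workflow:

- biblatex writes different .aux commands, so it is not handled.
- \nocite{*} shows up as the key "*" and is not expanded.

```python
import re

# Matches \citation{...} lines written by LaTeX into the .aux file.
CITATION = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text: str) -> set:
    """Collect the citation keys from the \\citation{...} lines of an .aux file."""
    keys = set()
    for group in CITATION.findall(aux_text):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def keep_cited(entries, keys):
    """Keep only entries whose BibTeX key (bibtexparser's "ID") was cited."""
    return [e for e in entries if e.get("ID") in keys]
```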

Although reffix uses a conservative approach, it provides no guarantees that the output references are actually correct.

If you want to make sure that reffix does not introduce any unwanted changes, please use the interactive mode (flag -i).
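Even after a batch run, you can review what changed by comparing each original entry with its fixed counterpart. Entries can be paired by their key, since the example above keeps the same ID before and after the update. The following is a minimal sketch, assuming both files have already been parsed into bibtexparser-style dicts. `entry_diff` and `show_diff` are hypothetical helpers, not reffix features.

```python
def entry_diff(old: dict, new: dict) -> dict:
    """Group field names by how they differ between the original and updated entry."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def show_diff(old: dict, new: dict) -> None:
    """Print a compact, line-per-field summary of the changes."""
    d = entry_diff(old, new)
    for field in d["changed"]:
        print(f"~ {field}: {old[field]!r} -> {new[field]!r}")
    for field in d["added"]:
        print(f"+ {field}: {new[field]!r}")
    for field in d["removed"]:
        print(f"- {field}: {old[field]!r}")
```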

The tool depends on the DBLP API, which may change at any time. I will try to update the script if necessary, but it may still occasionally break. I welcome any pull requests with improvements.

Please be considerate regarding the DBLP API and do not generate high traffic for their servers :-)
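If you script your own lookups against DBLP, a simple way to keep the traffic low is to enforce a minimum delay between requests. The following is a minimal sketch of such a throttle. `Throttle` is a hypothetical class. The interval you choose is your own assumption, not a limit published by DBLP. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay (in seconds) between consecutive calls to wait()."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Usage: create `throttle = Throttle(1.0)` once, then call `throttle.wait()` right before each request to the API.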

Contact

For any questions or suggestions, send an e-mail to kasner@ufal.mff.cuni.cz.

Download files

Source Distribution

reffix-1.3.1.tar.gz (25.7 kB)

Uploaded Source

Built Distribution

reffix-1.3.1-py3-none-any.whl (20.4 kB)

Uploaded Python 3

File details

Details for the file reffix-1.3.1.tar.gz.

File metadata

  • Download URL: reffix-1.3.1.tar.gz
  • Upload date:
  • Size: 25.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for reffix-1.3.1.tar.gz
Algorithm Hash digest
SHA256 b2753062462a6b6d7aa439a46d9de511f237f45b2063bbb7698749c86a7bb288
MD5 d7d39e42b72203c0e1bfba856ba0c218
BLAKE2b-256 02001b776fe968241cf6fe06560132af7b4c4b6a0f1c93b16ab6f990979522cb

File details

Details for the file reffix-1.3.1-py3-none-any.whl.

File metadata

  • Download URL: reffix-1.3.1-py3-none-any.whl
  • Upload date:
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for reffix-1.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 27810927da72460c1a472143ceacb36b8d0125969d7f44a0c3634c7da873c22f
MD5 ed753a8820ec8f6fc0b0e1efaee144ed
BLAKE2b-256 30d427ee004c1e447ff652736f4548257d9efe142bbda1b4e14ccca901e8256e
