
Fixes and standardizes BibTeX using LLM + web search


A Python tool that fixes and standardizes your BibTeX. It not only completes entries with accurate metadata via LLM + web search capabilities, but also enforces a consistent style based on your preferences (e.g., venue naming, title casing, author format, page ranges). This removes the tedious manual work of hunting down sources and cleaning messy entries (like those copied from Google Scholar), producing a clean, uniform bib file. A consistent style improves readability and leaves a stronger impression on readers and reviewers.

Examples

Example (1) Original bib entry from Google Scholar. Additional authors are omitted and indicated by "and others", and "ai" is not capitalized.

@article{bai2022constitutional,
 author = {Bai, Yuntao and Kadavath, Saurav and Kundu, Sandipan and Askell, Amanda and Kernion, Jackson and Jones, Andy and Chen, Anna and Goldie, Anna and Mirhoseini, Azalia and McKinnon, Cameron and others},
 journal = {arXiv preprint arXiv:2212.08073},
 title = {Constitutional ai: Harmlessness from ai feedback},
 year = {2022}
}

With bibfixer, the missing authors are added and the title is capitalized properly:

@article{bai2022constitutional,
  author = {Bai, Yuntao and Kadavath, Saurav and Kundu, Sandipan and Askell, Amanda and Kernion, Jackson and Jones, Andy and Chen, Anna and Goldie, Anna and Mirhoseini, Azalia and McKinnon, Cameron and Chen, Carol and Olsson, Catherine and Olah, Christopher and Hernandez, Danny and Drain, Dawn and Ganguli, Deep and Li, Dustin and Tran-Johnson, Eli and Perez, Ethan and Kerr, Jamie and Mueller, Jared and Ladish, Jeffrey and Landau, Joshua and Ndousse, Kamal and Lukosuite, Kamile and Lovitt, Liane and Sellitto, Michael and Elhage, Nelson and Schiefer, Nicholas and Mercado, Noemi and DasSarma, Nova and Lasenby, Robert and Larson, Robin and Ringer, Sam and Johnston, Scott and Kravec, Shauna and El Showk, Sheer and Fort, Stanislav and Lanham, Tamera and Telleen-Lawton, Timothy and Conerly, Tom and Henighan, Tom and Hume, Tristan and Bowman, Samuel R. and Hatfield-Dodds, Zac and Mann, Ben and Amodei, Dario and Joseph, Nicholas and McCandlish, Sam and Brown, Tom and Kaplan, Jared},
  title = {Constitutional {AI}: {H}armlessness from {AI} Feedback},
  journal = {arXiv preprint arXiv:2212.08073},
  year = {2022}
}

Example (2) Original bib entry from Google Scholar. This shows the arXiv version, but the paper was published at ICML. "llm" needs to be capitalized.

@article{khan2024debating,
 author = {Khan, Akbir and Hughes, John and Valentine, Dan and Ruis, Laura and Sachan, Kshitij and Radhakrishnan, Ansh and Grefenstette, Edward and Bowman, Samuel R and Rockt{\"a}schel, Tim and Perez, Ethan},
 journal = {arXiv preprint arXiv:2402.06782},
 title = {Debating with more persuasive llms leads to more truthful answers},
 year = {2024}
}

With bibfixer, the arXiv reference is replaced with the conference information and the title is capitalized appropriately:

@inproceedings{khan2024debating,
  author = {Khan, Akbir and Hughes, John and Valentine, Dan and Ruis, Laura and Sachan, Kshitij and Radhakrishnan, Ansh and Grefenstette, Edward and Bowman, Samuel R. and Rockt{\"a}schel, Tim and Perez, Ethan},
  title = {Debating with More Persuasive {LLMs} Leads to More Truthful Answers},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  year = {2024},
  volume = {235},
  pages = {23662--23733}
}

Example (3) Original bib entry from Google Scholar. The last author is missing due to a system issue at the distributor, Penguin Random House. The subtitle and publisher need to be capitalized appropriately.

@book{sugiyama2022machine,
 title = {Machine learning from weak supervision: An empirical risk minimization approach},
 author = {Sugiyama, Masashi and Bao, Han and Ishida, Takashi and Lu, Nan and Sakai, Tomoya},
 year = {2022},
 publisher = {MIT Press}
}

With bibfixer, we have all authors and appropriate capitalization:

@book{sugiyama2022machine,
  author = {Sugiyama, Masashi and Bao, Han and Ishida, Takashi and Lu, Nan and Sakai, Tomoya and Niu, Gang},
  title = {Machine Learning from Weak Supervision: {A}n Empirical Risk Minimization Approach},
  publisher = {{MIT} Press},
  year = {2022},
  pages = {320}
}
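
Before running the tool, it can help to know which entries show the symptoms from the examples above. The following is a minimal sketch, not part of bibfixer: the helper name `flag_entries` and its two heuristics (an "and others" author list, an "arXiv preprint" journal field) are assumptions chosen for illustration, and the regex-based parsing is deliberately naive.

```python
import re

# Hypothetical helper, not part of bibfixer. It matches the start of an
# entry such as "@article{bai2022constitutional," and captures the key.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def flag_entries(bib_text: str) -> dict:
    """Return {citation_key: [reasons]} for entries that look incomplete."""
    flags = {}
    matches = list(ENTRY_RE.finditer(bib_text))
    for i, match in enumerate(matches):
        key = match.group(2)
        # An entry body runs until the next entry starts (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bib_text)
        body = bib_text[match.end():end]
        reasons = []
        if re.search(r"\band\s+others\b", body, re.IGNORECASE):
            reasons.append("truncated author list")
        if re.search(r"arxiv\s+preprint", body, re.IGNORECASE):
            reasons.append("arXiv preprint; may have a published version")
        if reasons:
            flags[key] = reasons
    return flags
```

An entry that is not flagged may still need fixing (Example 3 has neither symptom), so this is only a rough first pass.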

Installation

  1. Install (from PyPI):
pip install bibfixer
  2. Set up your OpenAI API key:
export OPENAI_API_KEY='your-api-key-here'

Usage

Basic usage (input is required via -i/--input):

bibfixer -i sample_input.bib

With output file:

bibfixer -i sample_input.bib -o corrected.bib

With additional formatting preferences (-p):

bibfixer -i sample_input.bib -p "Use NeurIPS instead of NIPS"

Use a custom prompt file (defaults to bundled prompts/default.md):

bibfixer -i sample_input.bib --prompt-file prompts/default.md

The complete revision instructions are in prompts/default.md. You can edit this file to match your style or point to another file using --prompt-file.
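
If you have several .bib files, the commands above can be scripted. The sketch below uses only the flags documented here (-i, -o, -p, --prompt-file). It assumes these flags can be combined in one invocation, which the usage examples do not show explicitly. The function names `build_command` and `run_batch` and the `_fixed.bib` output naming are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(input_bib: str,
                  output_bib: Optional[str] = None,
                  preferences: Optional[str] = None,
                  prompt_file: Optional[str] = None) -> List[str]:
    """Assemble a bibfixer command line from the documented flags."""
    cmd = ["bibfixer", "-i", input_bib]
    if output_bib:
        cmd += ["-o", output_bib]
    if preferences:
        cmd += ["-p", preferences]
    if prompt_file:
        cmd += ["--prompt-file", prompt_file]
    return cmd


def run_batch(directory: str, preferences: Optional[str] = None) -> None:
    """Run bibfixer on every .bib file in a directory (assumed workflow)."""
    for bib in sorted(Path(directory).glob("*.bib")):
        # Hypothetical naming scheme: refs.bib -> refs_fixed.bib
        out = bib.with_name(bib.stem + "_fixed.bib")
        subprocess.run(build_command(str(bib), str(out), preferences),
                       check=True)
```

For example, `run_batch("papers", "Use NeurIPS instead of NIPS")` would process each file in a `papers` directory in turn, stopping at the first failing run because of `check=True`.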

Streamlit app

In addition to the dependencies in pyproject.toml, install streamlit>=1.30.0.

From the repo root, run:

streamlit run app.py

Warning: This tool uses LLM + web search and may occasionally produce incomplete or inaccurate metadata or formatting. Always review the final .bib before submission. To quickly compare input and output, you can run:

diff -y --suppress-common-lines input.bib output.bib | less -R
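
Where `diff` is not available, a similar comparison can be sketched with Python's standard `difflib` module. This is an illustrative alternative, not something bibfixer provides. The function name `bib_diff` is hypothetical, and the file names are only labels in the diff header.

```python
import difflib
from typing import List


def bib_diff(before: str, after: str) -> List[str]:
    """Return unified-diff lines for two BibTeX strings, changed lines only."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="input.bib",   # label only; no file is read here
        tofile="output.bib",    # label only
        lineterm="",
        n=0,                    # no context lines, like --suppress-common-lines
    ))
```

To compare files on disk, read them first, for example `print("\n".join(bib_diff(open("input.bib").read(), open("output.bib").read())))`.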
