Skip to main content

A tool for normalizing bibtex with official info.

Project description

Rebiber: A tool for normalizing bibtex with official info.

We often cite papers using their arXiv versions without noting that they are already PUBLISHED in some conferences. These unofficial bib entries might violate rules about submissions or camera-ready versions for some conferences. We introduce Rebiber, a simple tool in Python to fix them automatically. It is based on the official conference information from the DBLP or the ACL anthology (for NLP confernces)! You can check the list of supported conferences here. Apart from handling outdated arXiv citations, Rebiber also normalizes citations in a unified way (DBLP-style), supporting abbreviation and value selection.

You can use this google colab notebook as a simple web demo.

Installation

pip install rebiber -U

OR

git clone https://github.com/yuchenlin/rebiber.git
cd rebiber/
pip install -e .

If you would like to use the latest github version with more bug fixes, please use the second installation method.

Usage (v1.1)

Normalize your bibtex file with the official converence information:

rebiber -i /path/to/input.bib -o /path/to/output.bib

You can find a pair of example input and output files in rebiber/example_input.bib and rebiber/example_output.bib.

argument usage
-i or --input_bib. The path to the input bib file that you want to update
-o or --output_bib. The path to the output bib file that you want to save. If you don't specify any -o then it will be the same as the -i.
-r or --remove. A comma-seperated list of value names that you want to remove, such as -r "url,biburl,address,publisher,pages". Empty by default.
-s or --shorten. A bool argument that is False by default, used for replacing booktitle with abbreviation in -a.
-d or --deduplicate. A bool argument that is True by default, used for removing the duplicate bib entries sharing the same key.
-l or --bib_list. The path to the list of the bib json files to be loaded. Check rebiber/bib_list.txt for the default file. Usually you don't need to set this argument.
-a or --abbr_tsv. The list of conference abbreviation data. Check rebiber/abbr.tsv for the default file. Usually you don't need to set this argument.

Example Input and Output

An example input entry with the arXiv information (from Google Scholar or somewhere):

@article{lin2020birds,
	title={Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models},
	author={Lin, Bill Yuchen and Lee, Seyeon and Khanna, Rahul and Ren, Xiang},
	journal={arXiv preprint arXiv:2005.00683},
	year={2020}
}

An example normalized output entry with the official information:

@inproceedings{lin2020birds,
    title = "{B}irds have four legs?! {N}umer{S}ense: {P}robing {N}umerical {C}ommonsense {K}nowledge of {P}re-{T}rained {L}anguage {M}odels",
    author = "Lin, Bill Yuchen  and
      Lee, Seyeon  and
      Khanna, Rahul  and
      Ren, Xiang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.557",
    doi = "10.18653/v1/2020.emnlp-main.557",
    pages = "6862--6868",
}

Supported Conferences

The bib_list.txt contains a list of converted json files of the official bib data. In this repo, we now support the full ACL anthology, i.e., all papers that are published at *CL conferences (ACL, EMNLP, NAACL, etc.) as well as workshops. Also, we support any conference proceedings that can be downloaded from DBLP, for example, ICLR2020.

The following conferences are supported and their bib/json files are in our data folder. You can turn each item on/off in bib_list.txt. Please feel free to create PR for adding new conferences following this!

Name Years
ACL Anthology (until 2021-01)
AAAI 2010 -- 2020
AISTATS 2013 -- 2020
ALENEX 2010 -- 2020
ASONAM 2010 -- 2019
BigDataConf 2013 -- 2019
BMVC 2010 -- 2020
CHI 2010 -- 2020
CIDR 2009 -- 2020
CIKM 2010 -- 2020
COLT 2000 -- 2020
CVPR 2000 -- 2020
ICASSP 2015 -- 2020
ICCV 2003 -- 2019
ICLR 2013 -- 2020
ICML 2000 -- 2020
IJCAI 2011 -- 2020
KDD 2010 -- 2020
MLSys 2019 -- 2020
NeurIPS 2000 -- 2020
RECSYS 2010 -- 2020
SDM 2010 -- 2020
SIGIR 2010 -- 2020
SIGMOD 2010 -- 2020
SODA 2010 -- 2020
STOC 2010 -- 2020
UAI 2010 -- 2020
WSDM 2008 -- 2020
WWW (The Web Conf) 2001 -- 2020

Thanks for Anton Tsitsulin's great work on collecting such a complete set bib files!

Adding a new conference

You can manually add any conferences from DBLP by downloading its bib file to our data folder, then convert the conference bib file to the json format, and finally add its path to the bib_list.txt.

Take ICLR2020 as an example:

python bib2json.py -i data/iclr2020.bib -o data/iclr2020.json
  • Step 4: Add its path to bib_list.txt.

Contact

Please email yuchen.lin@usc.edu or create Github issues here if you have any questions or suggestions.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rebiber-1.1.0.tar.gz (18.6 MB view details)

Uploaded Source

Built Distribution

rebiber-1.1.0-py2.py3-none-any.whl (19.1 MB view details)

Uploaded Python 2 Python 3

File details

Details for the file rebiber-1.1.0.tar.gz.

File metadata

  • Download URL: rebiber-1.1.0.tar.gz
  • Upload date:
  • Size: 18.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for rebiber-1.1.0.tar.gz
Algorithm Hash digest
SHA256 b27bb46edf1c429d0cff7bb3a5f27bbb52f969ee725413232ca9bad98f17ac23
MD5 3f7fa8fa29fe187d92e668b603dc4b45
BLAKE2b-256 a789167eefc2266c7726f21abb3f6b2166339c0b0d42a2ea87ef166884d862f0

See more details on using hashes here.

File details

Details for the file rebiber-1.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: rebiber-1.1.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 19.1 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for rebiber-1.1.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b184ab8f59fc3b4bd434cfd4374c880ed1ef85079992d61f10f8f52eeac11b3d
MD5 d10a830d5d55730bd116b73c9ccef2c5
BLAKE2b-256 044622e040c89d42f379e29d431995bdd54c51373021e112b806689d5346e92d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page