bib_lookup

A tool for looking up Bib entries by DOI, PubMed ID (or URL), or arXiv ID (or URL).

It is an updated version of https://github.com/wenh06/utils/blob/master/utils_universal/utils_bib.py

NOTE: an internet connection is required to use bib_lookup.

Installation

Run

python -m pip install bib-lookup

or install the latest version from GitHub using

python -m pip install git+https://github.com/DeepPSP/bib_lookup.git

or git clone this repository and install locally via

cd bib_lookup
python -m pip install .

Dependencies

  • requests
  • feedparser
  • pandas

Basic Usage Examples

>>> from bib_lookup import BibLookup
>>> bl = BibLookup(align="middle")
>>> print(bl("1707.07183"))
@article{wen2017_1707.07183v2,
   author = {Hao Wen and Chunhui Liu},
    title = {Counting Multiplicities in a Hypersurface over a Number Field},
  journal = {arXiv preprint arXiv:1707.07183v2},
     year = {2017},
    month = {7},
}
>>> print(bl("10.1109/CVPR.2016.90"))
@inproceedings{He_2016,
     author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
      title = {Deep Residual Learning for Image Recognition},
  booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
        doi = {10.1109/cvpr.2016.90},
       year = {2016},
      month = {6},
  publisher = {{IEEE}},
}
>>> print(bl("10.23919/cinc53138.2021.9662801", align="left-middle"))
@inproceedings{Wen_2021,
  author    = {Hao Wen and Jingsu Kang},
  title     = {Hybrid Arrhythmia Detection on Varying-Dimensional Electrocardiography: Combining Deep Neural Networks and Clinical Rules},
  booktitle = {2021 Computing in Cardiology ({CinC})},
  doi       = {10.23919/cinc53138.2021.9662801},
  year      = {2021},
  month     = {9},
  publisher = {{IEEE}},
}
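
The two `align` values used above differ only in how field names are padded. The following is a minimal sketch of that layout rule, inferred from the printed output; `format_bib_entry` is a hypothetical helper written for illustration and is not part of the bib_lookup API.

```python
def format_bib_entry(entry_type, key, fields, align="middle", indent=2):
    """Render a BibTeX entry with aligned "=" signs.

    Hypothetical helper (not bib_lookup's implementation):
    "middle" right-justifies field names, "left-middle" left-justifies them.
    """
    width = max(len(name) for name in fields)
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if align == "middle":
            padded = name.rjust(width)
        elif align == "left-middle":
            padded = name.ljust(width)
        else:
            raise ValueError(f"unsupported align: {align!r}")
        lines.append(f"{' ' * indent}{padded} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


fields = {"author": "Hao Wen and Jingsu Kang", "year": "2021"}
print(format_bib_entry("inproceedings", "Wen_2021", fields, align="middle"))
print(format_bib_entry("inproceedings", "Wen_2021", fields, align="left-middle"))
```

In both modes the `=` signs line up in one column; only the side on which the padding is placed changes.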

Command-line Usage

After installation, one can use bib-lookup from the command line:

bib-lookup 10.1109/CVPR.2016.90 10.23919/cinc53138.2021.9662801 --ignore-fields url doi -i path/to/input.txt -o path/to/output.bib

Output (Append) to a .bib File

Each time a bib item is successfully found, it is cached. One can call the save function to write cached bib items to a .bib file in append mode. As the example below shows, saved items are removed from the cache.

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl(["10.1109/CVPR.2016.90", "10.23919/cinc53138.2021.9662801", "DOI: 10.1142/S1005386718000305"]);
>>> len(bl)
3
>>> bl[0]
'10.1109/CVPR.2016.90'
>>> bl.save([0, 2], "path/to/some/file.bib")  # save the bib items corresponding to "10.1109/CVPR.2016.90" and "DOI: 10.1142/S1005386718000305"
>>> len(bl)
1
>>> bl.pop(0)  # remove the bib item corresponding to "10.23919/cinc53138.2021.9662801", equivalent to `bl.pop("10.23919/cinc53138.2021.9662801")`
>>> len(bl)
0
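
Appending to an existing .bib file raises the question of duplicates. The sketch below shows one way to append entries while skipping those whose citation key is already in the file. It is an illustration only: `append_bib_items` and `KEY_PATTERN` are hypothetical names, and matching on the citation key is an assumption, not a description of how bib_lookup decides that an item already exists.

```python
import re
from pathlib import Path

# Captures the citation key in "@type{key," (assumed entry layout).
KEY_PATTERN = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def append_bib_items(path, items):
    """Append BibTeX entries to `path`, skipping keys already in the file.

    Hypothetical helper; returns the number of entries actually written.
    """
    path = Path(path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    seen = set(KEY_PATTERN.findall(existing))
    written = 0
    with path.open("a", encoding="utf-8") as f:
        for item in items:
            match = KEY_PATTERN.search(item)
            if match is None or match.group(1) in seen:
                continue  # not an entry, or its key is already saved
            f.write(item.rstrip() + "\n\n")
            seen.add(match.group(1))
            written += 1
    return written
```

Calling it twice with the same entry writes the entry only once, so repeated saves to the same file stay idempotent.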

arXiv to DOI

Since February 17, 2022, new arXiv articles are automatically assigned DOIs (assignment for older articles is in progress). If one prefers DOI citations to arXiv citations, then

>>> from bib_lookup import BibLookup
>>> bl = BibLookup(arxiv2doi=True)  # the default for `arxiv2doi` is False
>>> print(bl("https://arxiv.org/abs/2204.04420"))
@misc{https://doi.org/10.48550/arxiv.2204.04420,
     author = {Hao, Wen and Jingsu, Kang},
      title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
        doi = {10.48550/ARXIV.2204.04420},
   keywords = {Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  publisher = {arXiv},
       year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

while with bl = BibLookup(), one would get

@article{hao2022_2204.04420v1,
   author = {Wen Hao and Kang Jingsu},
    title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
  journal = {arXiv preprint arXiv:2204.04420v1},
     year = {2022},
    month = {4}
}
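
The DOI in the first result follows arXiv's fixed scheme: the prefix `10.48550/arXiv.` followed by the identifier without its version suffix (DOIs are case-insensitive, which is why the record above shows `ARXIV`). A minimal sketch of that mapping for new-style identifiers follows; `arxiv_to_doi` is a hypothetical helper, not a bib_lookup function, and old-style identifiers such as `math/0309136` are deliberately not handled.

```python
import re

# New-style arXiv identifier (YYMM.NNNNN) with an optional version suffix,
# anchored at the end so that abs URLs and "arXiv:" prefixes also work.
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")


def arxiv_to_doi(identifier):
    """Map an arXiv ID or abs URL to its versionless arXiv DOI.

    Hypothetical helper for illustration only.
    """
    match = ARXIV_ID.search(identifier.strip())
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {identifier!r}")
    return f"10.48550/arXiv.{match.group(1)}"


print(arxiv_to_doi("https://arxiv.org/abs/2204.04420"))
```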

Bib Items Checking

One can use BibLookup to check the validity (required fields, duplicate labels, etc.) of bib items in a Bib file. The following example uses a Bib file that contains incomplete and duplicate bib items.

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl.check_bib_file("./test/invalid_items.bib")
Bib item "He_2016"
    starting from line 3 is not valid.
    Bib item of entry type "inproceedings" should have the following fields:
    ['author', 'title', 'booktitle', 'year']
Bib item "Wen_2018"
    starting from line 16 is not valid.
    Bib item of entry type "article" should have the following fields:
    ['author', 'title', 'journal', 'year']
Bib items "He_2016" starting from line 3
      and "He_2016" starting from line 45 is duplicate.
[3, 16, 45]

or from the command line

bib-lookup -c ./test/invalid_items.bib
bib-lookup --ignore-fields url doi -i ./test/sample_input.txt -o ./tmp/a.bib -c true
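
To make the two kinds of checks concrete, here is a rough sketch of a required-field and duplicate-label check over a Bib string. It is not the code behind `check_bib_file`: `check_bib_text` and its regular expressions are hypothetical, the required-field table contains only the two entry types shown in the output above, and the parser assumes one field per line.

```python
import re

# Required fields for the two entry types that appear in the output above.
REQUIRED_FIELDS = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
}
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_NAME = re.compile(r"^\s*(\w+)\s*=")


def check_bib_text(text):
    """Return sorted 1-based start lines of invalid or duplicate entries.

    Hypothetical helper; assumes each field starts on its own line.
    """
    entries = []  # (start line, entry type, label, field names)
    for line_no, line in enumerate(text.splitlines(), start=1):
        start = ENTRY_START.match(line)
        if start:
            entries.append((line_no, start.group(1).lower(), start.group(2), set()))
        elif entries:
            field = FIELD_NAME.match(line)
            if field:
                entries[-1][3].add(field.group(1).lower())

    bad_lines = set()
    first_seen = {}  # label -> start line of its first occurrence
    for line_no, entry_type, label, fields in entries:
        required = REQUIRED_FIELDS.get(entry_type, [])
        if any(name not in fields for name in required):
            bad_lines.add(line_no)
        if label in first_seen:
            bad_lines.update({first_seen[label], line_no})
        else:
            first_seen[label] = line_no
    return sorted(bad_lines)
```

Like the output above, the sketch reports both occurrences of a duplicated label, not only the second one.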

Simplify a .bib File

Sometimes one wants a clean .bib file without the bib items that are not cited. The static method simplify_bib_file generates a new .bib file that contains only the cited bib items from an old .bib file.

>>> from bib_lookup import BibLookup
>>> new_bib_file_path = BibLookup.simplify_bib_file("path/to/tex/source/file", "path/to/old/bib/file")
>>> # or use the following if one has multiple source files
>>> new_bib_file_path = BibLookup.simplify_bib_file(list_of_tex_source_files_or_folders, "path/to/old/bib/file")
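
Conceptually, simplifying a .bib file means collecting the citation keys used in the TeX sources and keeping only the matching entries. The sketch below illustrates that idea on strings. It is not the implementation of `simplify_bib_file`: `cited_keys`, `simplify_bib_text`, and both patterns are hypothetical, and the entry pattern assumes each entry ends with `}` at the start of a line.

```python
import re

# \cite, \citep, \citet*, ... with optional [..] arguments before the keys.
CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# One whole entry, from "@type{key," to a closing brace at line start.
ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,.*?\n\}", re.DOTALL)


def cited_keys(tex_source):
    """Collect every citation key used in a TeX source string."""
    keys = set()
    for group in CITE.findall(tex_source):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def simplify_bib_text(tex_source, bib_text):
    """Keep only the entries of `bib_text` that are cited in `tex_source`."""
    keep = cited_keys(tex_source)
    kept = [m.group(0) for m in ENTRY.finditer(bib_text) if m.group(1) in keep]
    return "\n\n".join(kept)


tex = r"See \cite{He_2016, Wen_2021} and \citep[p.~3]{Krizhevsky_2017}."
print(sorted(cited_keys(tex)))
```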

TODO

  1. (:heavy_check_mark:) add CLI support;
  2. use eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi for PubMed, as in [3];
  3. try using the Google Scholar API described in [4] (unfortunately [4] is a paid service);
  4. use Flask to write a simple browser-based UI;
  5. (:heavy_check_mark:) check if the bib item already exists in the output file, and skip saving it if so;
  6. since arXiv articles are now automatically assigned DOIs (ref. this blog), consider converting arXiv identifiers to DOI identifiers, and requesting from DOI. Currently, the request results are different: at least the entry type changes from article to misc.

WARNING

Many journals have specific requirements for Bib entries, for example, that the title and/or journal (and/or booktitle), etc. be capitalized. This cannot be done automatically because

  • some abbreviations in title should have characters all in the upper case, for example

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  • some should have characters all in the lower case,

mixup: Beyond Empirical Risk Minimization

  • and some others should have mixed cases,

KeMRE: Knowledge-enhanced Medical Relation Extraction for Chinese Medicine Instructions

Users should correct such fields themselves when necessary (which is rare), and should remember to enclose the corrected fields in double curly braces.

For example, the lookup result for the AlexNet paper is

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("https://doi.org/10.1145/3065386"))
@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet} classification with deep convolutional neural networks},
    journal = {Communications of the {ACM}},
        doi = {10.1145/3065386},
       year = {2017},
      month = {5},
  publisher = {Association for Computing Machinery ({ACM})},
     volume = {60},
     number = {6},
      pages = {84--90}
}

This result (the title) should be adjusted to

@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet Classification with Deep Convolutional Neural Networks}},
    journal = {Communications of the {ACM}},
        doi = {10.1145/3065386},
       year = {2017},
      month = {5},
  publisher = {Association for Computing Machinery ({ACM})},
     volume = {60},
     number = {6},
      pages = {84--90}
}
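
When many entries need this kind of fix, the replacement can be scripted. The following is a small sketch under the assumption that the title field sits on a single line, as in the output above; `set_protected_title` is a hypothetical helper and not part of bib_lookup, and the corrected title still has to be supplied by the user.

```python
import re

# A single-line "title = {...}" field; "booktitle" does not match because
# only spaces or tabs may precede "title" on the line.
TITLE_FIELD = re.compile(r"^([ \t]*title[ \t]*=[ \t]*)\{.*\},?[ \t]*$", re.MULTILINE)


def set_protected_title(bib_entry, title):
    """Replace the title field with `title` wrapped in double curly braces.

    Hypothetical helper; raises ValueError if no title field is found.
    """
    def replace(match):
        # Keep the original indentation and "=", swap only the value.
        return f"{match.group(1)}{{{{{title}}}}},"

    new_entry, count = TITLE_FIELD.subn(replace, bib_entry, count=1)
    if count == 0:
        raise ValueError("no single-line title field found")
    return new_entry
```

The double braces make BibTeX styles keep the capitalization exactly as typed.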

A more severe example that needs manual correction is as follows

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("10.1093/acprof:oso/9780195058239.001.0001"))
@book{Malmivuo_1995,
     author = {Jaakko Malmivuo and Robert Plonsey},
      title = {{BioelectromagnetismPrinciples} and Applications of Bioelectric and Biomagnetic Fields},
        doi = {10.1093/acprof:oso/9780195058239.001.0001},
       year = {1995},
      month = {10},
  publisher = {Oxford University Press}
}

Adjust it to

@book{Malmivuo_1995,
     author = {Jaakko Malmivuo and Robert Plonsey},
      title = {{Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields}},
        doi = {10.1093/acprof:oso/9780195058239.001.0001},
       year = {1995},
      month = {10},
  publisher = {Oxford University Press}
}

This shows that the data in the DOI database is NOT always correct.

Biblatex Cheatsheet

This file, downloaded from [6], is a comprehensive reference for bib entries.

Citation

@misc{https://doi.org/10.5281/zenodo.6435017,
     author = {WEN, Hao},
      title = {bib\_lookup: A Useful Tool for Looking Up Bib Entries},
        doi = {10.5281/ZENODO.6435017},
        url = {https://zenodo.org/record/6435017},
  publisher = {Zenodo},
       year = {2022},
  copyright = {MIT License}
}

The above citation can be obtained via

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("DOI: 10.5281/zenodo.6435017"))

References

  1. https://github.com/davidagraf/doi2bib2
  2. https://arxiv.org/help/api
  3. https://github.com/mfcovington/pubmed-lookup/
  4. https://serpapi.com/google-scholar-cite-api
  5. https://www.bibtex.com/
  6. http://tug.ctan.org/info/biblatex-cheatsheet/biblatex-cheatsheet.pdf
