bib_lookup

A useful tool for looking up Bib entries using DOI, PubMed ID (URL), or arXiv ID (URL).

:rocket: NEW :rocket: Streamlit support! See here for an app deployed on Streamlit Community Cloud.

It is an updated version of https://github.com/wenh06/utils/blob/master/utils_universal/utils_bib.py

Note that an internet connection is required to use bib_lookup.

Installation

Run

python -m pip install bib-lookup

or install the latest version from GitHub using

python -m pip install git+https://github.com/DeepPSP/bib_lookup.git

or git clone this repository and install locally via

cd bib_lookup
python -m pip install .

Dependencies

  • requests
  • feedparser
  • pandas

Basic Usage Examples

>>> from bib_lookup import BibLookup
>>> bl = BibLookup(align="middle")
>>> print(bl("1707.07183"))
@article{wen2017_1707.07183v2,
   author = {Hao Wen and Chunhui Liu},
    title = {Counting Multiplicities in a Hypersurface over a Number Field},
  journal = {arXiv preprint arXiv:1707.07183v2},
     year = {2017},
    month = {7}
}
>>> print(bl("10.1109/CVPR.2016.90"))
@inproceedings{He_2016,
     author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
      title = {Deep Residual Learning for Image Recognition},
  booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
        doi = {10.1109/cvpr.2016.90},
       year = {2016},
      month = {6},
  publisher = {{IEEE}}
}
>>> print(bl("10.23919/cinc53138.2021.9662801", align="left-middle"))
@inproceedings{Wen_2021,
  author    = {Hao Wen and Jingsu Kang},
  title     = {Hybrid Arrhythmia Detection on Varying-Dimensional Electrocardiography: Combining Deep Neural Networks and Clinical Rules},
  booktitle = {2021 Computing in Cardiology ({CinC})},
  doi       = {10.23919/cinc53138.2021.9662801},
  publisher = {{IEEE}},
  year      = {2021},
  month     = {9},
  pages     = {14}
}

Command-line Usage

After installation, one can use bib-lookup in the command line:

bib-lookup 10.1109/CVPR.2016.90 10.23919/cinc53138.2021.9662801 --ignore-fields url doi -i path/to/input.txt -o path/to/output.bib

View current version:

bib-lookup --version

View current configuration:

bib-lookup --config show

Remove current configuration:

bib-lookup --config reset

Set specific configuration:

bib-lookup --config "timeout=2.0;print_result=true;ignore_fields=['url','pdf']"

or from a json file or yaml file:

bib-lookup --config /path/to/config.json
bib-lookup --config /path/to/config.yaml

Note that unrecognized fields will be ignored and warning messages will be printed. The following table lists all the available configuration options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| align | str | middle | Alignment of the bib item. |
| email | str | None | Email address to be used in the request. |
| ignore_fields | list | ['url', 'pdf'] | Fields to be ignored in the output. |
| ignore_errors | bool | False | Whether to ignore errors. |
| timeout | float | 6.0 | Timeout in seconds for each request. |
| arxiv2doi | bool | True | Whether to convert arXiv ID to DOI. |
| format | str | bibtex | Output format. |
| style | str | apa | Citation style. Valid only when format is text. |
| verbose | int | 0 | Verbosity level. |
| print_result | bool | False | Whether to print the result. |
| ordering | list | ['title', 'author', 'journal', 'booktitle'] | Ordering of the fields. |
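
As a small illustration, a JSON configuration file can be generated from Python with the standard library alone. This is only a sketch: the file name `bib_lookup_config.json` is arbitrary, the option names come from the table above, and it assumes that the JSON file is a flat object mapping option names to values.

```python
import json
from pathlib import Path

# Option names and types follow the table above; the values are illustrative.
config = {
    "align": "left-middle",
    "ignore_fields": ["url", "pdf"],
    "timeout": 2.0,
    "print_result": True,
}

# Assumption: the config file is a flat JSON object of option/value pairs.
config_path = Path("bib_lookup_config.json")  # hypothetical file name
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")

# Then, from the shell: bib-lookup --config bib_lookup_config.json
```

Keeping the options in a file makes it easier to share the same settings across machines than retyping the `--config "key=value;..."` string.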

Output (Append) to a .bib File

Each time a bib item is successfully found, it will be cached. One can call the save function to write the cached bib items to a .bib file, in the append mode.

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl(["10.1109/CVPR.2016.90", "10.23919/cinc53138.2021.9662801", "DOI: 10.1142/S1005386718000305"]);
>>> len(bl)
3
>>> bl[0]
'10.1109/CVPR.2016.90'
>>> bl.save([0, 2], "path/to/some/file.bib")  # save the bib items corresponding to "10.1109/CVPR.2016.90" and "DOI: 10.1142/S1005386718000305"
>>> len(bl)
1
>>> bl.pop(0)  # remove the bib item corresponding to "10.23919/cinc53138.2021.9662801", equivalent to `bl.pop("10.23919/cinc53138.2021.9662801")`
>>> len(bl)
0

arXiv to DOI

Since 2022-02-17, new arXiv articles are automatically assigned DOIs (older ones are still being processed). If one prefers a DOI citation to an arXiv citation, use

>>> from bib_lookup import BibLookup
>>> bl = BibLookup(arxiv2doi=True)  # the default for `arxiv2doi` is False
>>> print(bl("https://arxiv.org/abs/2204.04420"))
@misc{https://doi.org/10.48550/arxiv.2204.04420,
     author = {Hao, Wen and Jingsu, Kang},
      title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
        doi = {10.48550/ARXIV.2204.04420},
   keywords = {Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  publisher = {arXiv},
       year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

while with bl = BibLookup(arxiv2doi=False), one would get

@article{hao2022_2204.04420v1,
   author = {Wen Hao and Kang Jingsu},
    title = {Investigating Deep Learning Benchmarks for Electrocardiography Signal Processing},
  journal = {arXiv preprint arXiv:2204.04420v1},
     year = {2022},
    month = {4}
}
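
The DOI in the first result follows the pattern `10.48550/arXiv.<id>`. The mapping can be sketched as below; `arxiv_to_doi` is a hypothetical helper written for illustration, not part of the bib_lookup API, and it only handles new-style arXiv identifiers.

```python
import re

ARXIV_DOI_PREFIX = "10.48550/arXiv."


def arxiv_to_doi(identifier: str) -> str:
    """Hypothetical helper: map a new-style arXiv ID or URL to its DOI."""
    # Find a new-style arXiv ID (YYMM.NNNNN); a version suffix such as "v1" is dropped.
    match = re.search(r"(\d{4}\.\d{4,5})(v\d+)?", identifier)
    if match is None:
        raise ValueError(f"no arXiv ID found in {identifier!r}")
    return ARXIV_DOI_PREFIX + match.group(1)


print(arxiv_to_doi("https://arxiv.org/abs/2204.04420"))  # 10.48550/arXiv.2204.04420
print(arxiv_to_doi("2204.04420v1"))  # 10.48550/arXiv.2204.04420
```

The version suffix is dropped because the DOI shown above identifies the article rather than a specific version.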

Bib Items Checking

One can use BibLookup to check the validity (required fields, duplicate labels, etc.) of bib items in a Bib file. The following example uses a Bib file that contains incomplete and duplicate bib items.

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl.check_bib_file("./test/invalid_items.bib")
Bib item "He_2016"
    starting from line 3 is not valid.
    Bib item of entry type "inproceedings" should have the following fields:
    ['author', 'title', 'booktitle', 'year']
Bib item "Wen_2018"
    starting from line 16 is not valid.
    Bib item of entry type "article" should have the following fields:
    ['author', 'title', 'journal', 'year']
Bib items "He_2016" starting from line 3
      and "He_2016" starting from line 45 is duplicate.
[3, 16, 45]

or from the command line

bib-lookup -c ./test/invalid_items.bib
bib-lookup --ignore-fields url doi -i ./test/sample_input.txt -o ./tmp/a.bib -c true
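
To make the two kinds of checks concrete, here is a minimal sketch of a required-field and duplicate-label check. It is not how bib_lookup implements `check_bib_file`: `check_bib_text` is a hypothetical function, the required fields are copied from the messages above (only two entry types), and the regex parsing assumes every entry starts with `@type{label,` at the beginning of a line.

```python
import re
from collections import Counter

# Required fields as reported in the messages above (only two entry types).
REQUIRED = {
    "inproceedings": {"author", "title", "booktitle", "year"},
    "article": {"author", "title", "journal", "year"},
}


def check_bib_text(text: str) -> list:
    """Return human-readable problems found in a BibTeX string (simplified)."""
    problems = []
    # Each match is (entry type, label, body up to the next "@" at line start).
    entries = re.findall(
        r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)(?=^@|\Z)", text, flags=re.S | re.M
    )
    for entry_type, label, body in entries:
        fields = {name.lower() for name in re.findall(r"^\s*(\w+)\s*=", body, flags=re.M)}
        missing = REQUIRED.get(entry_type.lower(), set()) - fields
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
    counts = Counter(label for _, label, _ in entries)
    problems += [f"{label}: duplicate label" for label, n in counts.items() if n > 1]
    return problems


sample = """
@inproceedings{He_2016,
  author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
  title = {Deep Residual Learning for Image Recognition},
  year = {2016}
}
@inproceedings{He_2016,
  author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
  title = {Deep Residual Learning for Image Recognition},
  booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year = {2016}
}
"""
for problem in check_bib_text(sample):
    print(problem)
# He_2016: missing ['booktitle']
# He_2016: duplicate label
```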

Simplify a .bib File

Sometimes one wants a clean .bib file without the bib items that are never cited. In that case, use the static method simplify_bib_file to generate a new .bib file that contains only the cited bib items from an old .bib file.

>>> from bib_lookup import BibLookup
>>> new_bib_file_path = BibLookup.simplify_bib_file("path/to/tex/source/file", "path/to/old/bib/file")
>>> # or use the following if one has multiple source files
>>> new_bib_file_path = BibLookup.simplify_bib_file(list_of_tex_source_files_or_folders, "path/to/old/bib/file")
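
The idea behind this kind of simplification can be sketched with the standard library: collect the keys cited in the TeX source, then keep only the matching entries. The helpers `cited_keys` and `filter_bib` below are hypothetical and are not the implementation of `simplify_bib_file`; they assume citations use `\cite`-style commands and that every bib entry starts with `@` at the beginning of a line.

```python
import re


def cited_keys(tex: str) -> set:
    """Collect keys from cite-like commands (cite, citep, citet, ...)."""
    keys = set()
    # Optional arguments such as \cite[p. 3]{key} are skipped.
    for group in re.findall(r"\\cite\w*\*?(?:\[[^\]]*\])*\{([^}]*)\}", tex):
        keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


def filter_bib(bib: str, keys: set) -> str:
    """Keep only the entries whose label is in `keys` (simplified parsing)."""
    # Each match runs from an "@" at line start up to the next such "@".
    entries = re.findall(r"^@.*?(?=^@|\Z)", bib, flags=re.S | re.M)
    kept = []
    for entry in entries:
        match = re.match(r"@\w+\s*\{\s*([^,\s]+)\s*,", entry)
        if match and match.group(1) in keys:
            kept.append(entry.strip())
    return "\n\n".join(kept) + "\n"


tex = r"Residual networks \cite{He_2016} build on \citep[Sec. 2]{Krizhevsky_2017, He_2016}."
bib = """@inproceedings{He_2016,
  title = {Deep Residual Learning for Image Recognition}
}
@article{Krizhevsky_2017,
  title = {{ImageNet} classification with deep convolutional neural networks}
}
@book{Malmivuo_1995,
  publisher = {Oxford University Press}
}
"""
print(sorted(cited_keys(tex)))  # ['He_2016', 'Krizhevsky_2017']
print(filter_bib(bib, cited_keys(tex)))  # the Malmivuo_1995 entry is dropped
```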

CitationMixin class

Any class can inherit from CitationMixin to gain the method get_citation; the class only needs to provide a doi attribute (self.doi). For example:

from bib_lookup import CitationMixin

class SomeClass(CitationMixin):

    doi = "10.23919/cinc53138.2021.9662801"  # can also be a list

TODO

  1. :heavy_check_mark: add CLI support;
  2. :x: use eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi for PubMed, as in [3];
  3. :x: try using the Google Scholar API described in [4] (unfortunately, [4] is a paid service);
  4. :heavy_check_mark: use Flask to write a simple browser-based UI;
  5. :heavy_check_mark: check whether the bib item already exists in the output file, and skip saving it if so;
  6. :heavy_check_mark: since arXiv articles are now automatically assigned DOIs (ref. this blog), consider converting arXiv identifiers to DOI identifiers and requesting from DOI. Currently, the request results differ; at least the entry type changes from article to misc;
  7. make the __call__ method asynchronous using asyncio and aiohttp or httpx.

WARNING

Many journals have specific requirements for Bib entries; for example, the title and/or journal (and/or booktitle) fields should be capitalized. This cannot be done automatically because

  • some abbreviations in title should have characters all in the upper case, for example

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  • some should have characters all in the lower case,

mixup: Beyond Empirical Risk Minimization

  • and some others should have mixed cases,

KeMRE: Knowledge-enhanced Medical Relation Extraction for Chinese Medicine Instructions

Users should correct such fields manually when necessary (this is rarely needed), and remember to enclose them in double curly braces.

For example, the lookup result for the AlexNet paper is

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("https://doi.org/10.1145/3065386"))
@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet} classification with deep convolutional neural networks},
    journal = {Communications of the {ACM}},
        doi = {10.1145/3065386},
       year = {2017},
      month = {5},
  publisher = {Association for Computing Machinery ({ACM})},
     volume = {60},
     number = {6},
      pages = {84--90}
}

This result (the title) should be adjusted to

@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet Classification with Deep Convolutional Neural Networks}},
    journal = {Communications of the {ACM}},
        doi = {10.1145/3065386},
       year = {2017},
      month = {5},
  publisher = {Association for Computing Machinery ({ACM})},
     volume = {60},
     number = {6},
      pages = {84--90}
}

A more severe example that needs manual correction is as follows

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("10.1093/acprof:oso/9780195058239.001.0001"))
@book{Malmivuo_1995,
     author = {Jaakko Malmivuo and Robert Plonsey},
      title = {{BioelectromagnetismPrinciples} and Applications of Bioelectric and Biomagnetic Fields},
        doi = {10.1093/acprof:oso/9780195058239.001.0001},
       year = {1995},
      month = {10},
  publisher = {Oxford University Press}
}

Adjust it to

@book{Malmivuo_1995,
     author = {Jaakko Malmivuo and Robert Plonsey},
      title = {{Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields}},
        doi = {10.1093/acprof:oso/9780195058239.001.0001},
       year = {1995},
      month = {10},
  publisher = {Oxford University Press}
}

This shows that the data in the DOI database is NOT always correct.
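
Manual corrections like the two above can also be scripted. The following is a minimal sketch, assuming the entry has balanced braces and one field per line; `set_protected_field` is a hypothetical helper, not part of bib_lookup. It replaces a field value with a corrected one wrapped in double curly braces.

```python
import re


def set_protected_field(entry: str, field: str, value: str) -> str:
    """Replace `field` in a BibTeX entry with `value` wrapped in double braces."""
    match = re.search(rf"^\s*{re.escape(field)}\s*=\s*\{{", entry, flags=re.M)
    if match is None:
        raise KeyError(field)
    # Walk forward to the brace that closes the field value (braces may nest).
    depth, end = 1, match.end()
    while depth > 0:
        char = entry[end]
        depth += (char == "{") - (char == "}")
        end += 1
    # Keep the original outer braces and add an inner pair around the new value.
    return entry[: match.end()] + "{" + value + "}" + entry[end - 1 :]


entry = """@article{Krizhevsky_2017,
     author = {Alex Krizhevsky and Ilya Sutskever and Geoffrey E. Hinton},
      title = {{ImageNet} classification with deep convolutional neural networks},
    journal = {Communications of the {ACM}}
}"""
fixed = set_protected_field(
    entry, "title", "ImageNet Classification with Deep Convolutional Neural Networks"
)
print(fixed)
```

The brace counting is needed because field values such as `{{ImageNet} classification ...}` contain nested braces, which a simple pattern like `\{[^}]*\}` would cut short.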

Biblatex Cheatsheet

This file, downloaded from [6], is a comprehensive reference for bib entries.

Citation

@misc{https://doi.org/10.5281/zenodo.6435017,
     author = {WEN, Hao},
      title = {bib\_lookup: A Useful Tool for Looking Up Bib Entries},
        doi = {10.5281/ZENODO.6435017},
        url = {https://zenodo.org/record/6435017},
  publisher = {Zenodo},
       year = {2022},
  copyright = {MIT License}
}

The above citation can be obtained via

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> print(bl("DOI: 10.5281/zenodo.6435017"))

References

  1. https://github.com/davidagraf/doi2bib2
  2. https://arxiv.org/help/api
  3. https://github.com/mfcovington/pubmed-lookup/
  4. https://serpapi.com/google-scholar-cite-api
  5. https://www.bibtex.com/
  6. http://tug.ctan.org/info/biblatex-cheatsheet/biblatex-cheatsheet.pdf
