
A useful tool for looking up Bib entries using DOI, or pubmed ID (or URL), or arXiv ID (or URL).


bib_lookup


It is an updated version of https://github.com/wenh06/utils/blob/master/utils_universal/utils_bib.py

NOTE that an internet connection is required to use bib_lookup.

Installation

Run

python -m pip install bib-lookup

or install the latest version from GitHub using

python -m pip install git+https://github.com/DeepPSP/bib_lookup.git

or clone this repository and install it locally via

git clone https://github.com/DeepPSP/bib_lookup.git
cd bib_lookup
python -m pip install .

Requirements

  • requests
  • feedparser
  • pandas

Usage Examples

>>> from bib_lookup import BibLookup
>>> bl = BibLookup(align="middle")
>>> res = bl("1707.07183")
@article{wen2017_1707.07183v2,
   author = {Hao Wen and Chunhui Liu},
    title = {Counting Multiplicities in a Hypersurface over a Number Field},
  journal = {arXiv preprint arXiv:1707.07183v2},
     year = {2017},
    month = {7},
}
>>> bl("10.1109/CVPR.2016.90")
@inproceedings{He_2016,
     author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
      title = {Deep Residual Learning for Image Recognition},
  booktitle = {2016 {IEEE} Conference on Computer Vision and Pattern Recognition ({CVPR})},
        doi = {10.1109/cvpr.2016.90},
       year = {2016},
      month = {6},
  publisher = {{IEEE}},
}
>>> bl("10.23919/cinc53138.2021.9662801", align="left-middle")
@inproceedings{Wen_2021,
  author    = {Hao Wen and Jingsu Kang},
  title     = {Hybrid Arrhythmia Detection on Varying-Dimensional Electrocardiography: Combining Deep Neural Networks and Clinical Rules},
  booktitle = {2021 Computing in Cardiology ({CinC})},
  doi       = {10.23919/cinc53138.2021.9662801},
  year      = {2021},
  month     = {9},
  publisher = {{IEEE}},
}
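The align argument controls how the fields are laid out. Judging from the outputs above, "middle" right-aligns the field names so that the = signs line up, while "left-middle" left-aligns the names and pads the space before =. The following is a minimal sketch that reproduces only those two layouts. format_bib is a hypothetical helper and not part of bib_lookup's API. Any other align values the library may accept are not modelled.

```python
from typing import Dict


def format_bib(entry_type: str, label: str, fields: Dict[str, str], align: str = "middle") -> str:
    """Hypothetical formatter mimicking the two layouts shown above (not bib_lookup's code)."""
    width = max(len(key) for key in fields)  # longest field name sets the column
    lines = [f"@{entry_type}{{{label},"]
    for key, value in fields.items():
        if align == "middle":
            # right-align names so every "=" sits in the same column
            name = key.rjust(width)
        elif align == "left-middle":
            # left-align names and pad after them, so "=" still lines up
            name = key.ljust(width)
        else:
            raise ValueError(f"unsupported align value in this sketch: {align!r}")
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {"author": "Hao Wen and Jingsu Kang", "year": "2021", "publisher": "{IEEE}"}
    print(format_bib("inproceedings", "Wen_2021", demo, align="left-middle"))
```

Every field line ends with a comma, matching the outputs above. BibTeX accepts a trailing comma after the last field.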

Command-line Usage

After installation, one can use bib-lookup from the command line, for example:

bib-lookup 10.1109/CVPR.2016.90 10.23919/cinc53138.2021.9662801 --ignore-fields url doi -i path/to/input.txt -o path/to/output.bib
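To drive the CLI from a Python script, one option is to build the argument list and pass it to subprocess. The sketch below is minimal and uses only the flags shown in the command above (positional identifiers, --ignore-fields, -i, -o). build_bib_lookup_cmd and run_bib_lookup are hypothetical helper names. It also assumes that the installed bib-lookup executable is on PATH.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_bib_lookup_cmd(
    identifiers: Sequence[str],
    ignore_fields: Sequence[str] = (),
    input_file: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble a bib-lookup command line using only the flags shown above."""
    cmd = ["bib-lookup", *identifiers]  # identifiers come first, as in the example
    if ignore_fields:
        cmd += ["--ignore-fields", *ignore_fields]
    if input_file:
        cmd += ["-i", input_file]
    if output_file:
        cmd += ["-o", output_file]
    return cmd


def run_bib_lookup(cmd: List[str]) -> str:
    """Run the command and return its stdout (requires bib-lookup on PATH)."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("bib-lookup not found on PATH; install the package first")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```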

Output (Append) to a .bib File

Each time a bib item is successfully found, it is cached. One can call the save method to write cached bib items to a .bib file in append mode. As the example below shows, saved items are then removed from the cache.

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl(["10.1109/CVPR.2016.90", "10.23919/cinc53138.2021.9662801", "DOI: 10.1142/S1005386718000305"]);
>>> len(bl)
3
>>> bl[0]
'10.1109/CVPR.2016.90'
>>> bl.save([0, 2], "path/to/some/file.bib")  # save the bib items corresponding to "10.1109/CVPR.2016.90" and "DOI: 10.1142/S1005386718000305"
>>> len(bl)
1
>>> bl.pop(0)  # remove the bib item corresponding to "10.23919/cinc53138.2021.9662801", equivalent to `bl.pop("10.23919/cinc53138.2021.9662801")`
>>> len(bl)
0
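The session above shows three behaviours. Items can be addressed by position or by identifier, save appends to the file, and saved items leave the cache. The following is a minimal model of that behaviour, built on an ordered dict keyed by identifier. BibCache is a hypothetical class and not bib_lookup's implementation. One design point is worth noting. save resolves all positions before removing anything, because each removal shifts the positions of the items after it.

```python
import os
import tempfile
from collections import OrderedDict
from typing import Iterable, Union

Key = Union[int, str]


class BibCache:
    """Hypothetical model of the cache behaviour shown above (not bib_lookup's code)."""

    def __init__(self) -> None:
        # identifier -> bib text, kept in lookup order
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def add(self, identifier: str, bib: str) -> None:
        self._items[identifier] = bib

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> str:
        # like bl[0] above: return the identifier at that position
        return list(self._items)[index]

    def _resolve(self, key: Key) -> str:
        return list(self._items)[key] if isinstance(key, int) else key

    def pop(self, key: Key) -> str:
        return self._items.pop(self._resolve(key))

    def save(self, keys: Iterable[Key], path: str) -> None:
        # Resolve every key before popping: removals shift later positions.
        identifiers = [self._resolve(k) for k in keys]
        with open(path, "a", encoding="utf-8") as f:  # append mode
            for ident in identifiers:
                f.write(self._items.pop(ident) + "\n\n")


if __name__ == "__main__":
    cache = BibCache()
    for ident in ["id-0", "id-1", "id-2"]:
        cache.add(ident, f"@misc{{{ident},}}")
    with tempfile.TemporaryDirectory() as tmp:
        cache.save([0, 2], os.path.join(tmp, "out.bib"))
    print(len(cache), cache[0])  # prints: 1 id-1
```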

Bib Items Checking

One can use BibLookup to check the validity (required fields, duplicate labels) of bib items in a .bib file:

>>> from bib_lookup import BibLookup
>>> bl = BibLookup()
>>> bl.check_bib_file("./test/invalid_items.bib")
Bib item "He_2016"
    starting from line 3 is not valid.
    Bib item of entry type "inproceedings" should have the following fields:
    ['author', 'title', 'booktitle', 'year']
Bib item "Wen_2018"
    starting from line 16 is not valid.
    Bib item of entry type "article" should have the following fields:
    ['author', 'title', 'journal', 'year']
Bib items "He_2016" starting from line 3
      and "He_2016" starting from line 45 is duplicate.
[3, 16, 45]

or from the command line:

bib-lookup -c ./test/invalid_items.bib
bib-lookup --ignore-fields url doi -i ./test/sample_input.txt -o ./tmp/a.bib -c true
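The two checks above look for missing required fields and repeated labels, and the call returns the starting lines of the offending items. The following is a rough sketch of both checks. It assumes a simplified layout in which each entry starts on its own line with @type{label, and each field starts on its own line with name =. The required-field table covers only the two entry types listed in the example output. check_bib_text is a hypothetical function and not how bib_lookup parses files.

```python
import re
from typing import Dict, List, Set, Tuple

# Required fields for the two entry types shown in the example output above;
# a real checker would cover many more entry types.
REQUIRED: Dict[str, List[str]] = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
}

ENTRY_START = re.compile(r"^\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")  # @type{label,
FIELD = re.compile(r"^\s*(\w+)\s*=")  # name = ...


def check_bib_text(text: str) -> List[int]:
    """Return sorted 1-based starting lines of invalid or duplicated entries."""
    entries: List[Tuple[int, str, str, Set[str]]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        start = ENTRY_START.match(line)
        if start:
            entries.append((lineno, start.group(1).lower(), start.group(2), set()))
            continue
        field = FIELD.match(line)
        if field and entries:
            entries[-1][3].add(field.group(1).lower())

    bad: Set[int] = set()
    first_seen: Dict[str, int] = {}
    for lineno, entry_type, label, fields in entries:
        if any(name not in fields for name in REQUIRED.get(entry_type, [])):
            bad.add(lineno)  # missing at least one required field
        if label in first_seen:
            bad.update({first_seen[label], lineno})  # both copies are reported
        else:
            first_seen[label] = lineno
    return sorted(bad)
```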

TODO

  1. (done) add CLI support;
  2. use eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi for PubMed, as in [3];
  3. try using the Google Scholar API described in [4] (unfortunately, [4] is a paid service);
  4. use Flask to write a simple browser-based UI;
  5. (done) check whether the bib item already exists in the output file, and skip saving it if so.
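Item 2 refers to NCBI's E-utilities esummary endpoint. The following sketch shows how such a lookup might be issued. It assumes the db=pubmed, id and retmode=json query parameters. It also assumes that the JSON response stores the record under "result", keyed by the PubMed ID, which matches our understanding of the service. esummary_url and fetch_summary are hypothetical helpers and not bib_lookup functions. Only the URL builder works offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmid: str) -> str:
    # db=pubmed selects the PubMed database; retmode=json asks for JSON output
    return ESUMMARY + "?" + urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})


def fetch_summary(pmid: str, timeout: float = 10.0) -> dict:
    """Fetch the esummary record for one PubMed ID (needs internet access)."""
    with urlopen(esummary_url(pmid), timeout=timeout) as resp:
        data = json.load(resp)
    # assumed layout: {"result": {"uids": [...], "<pmid>": {...}}}
    return data["result"][pmid]
```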

WARNING

Many journals have specific requirements for bib entries. For example, the title and/or journal (and/or booktitle) fields should be capitalized. This cannot be done automatically, since

  • some abbreviations in a title should be written entirely in upper case, for example

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  • some words should be written entirely in lower case, for example

mixup: Beyond Empirical Risk Minimization

  • and others should have mixed case, for example

KeMRE: Knowledge-enhanced Medical Relation Extraction for Chinese Medicine Instructions

Users should correct such fields themselves if necessary (although this is rarely needed). Remember to enclose those fields in double curly braces so that their capitalization is preserved.

Biblatex Cheatsheet

The cheatsheet available from [6] gives a comprehensive overview of bib entries.

References

  1. https://github.com/davidagraf/doi2bib2
  2. https://arxiv.org/help/api
  3. https://github.com/mfcovington/pubmed-lookup/
  4. https://serpapi.com/google-scholar-cite-api
  5. https://www.bibtex.com/
  6. http://tug.ctan.org/info/biblatex-cheatsheet/biblatex-cheatsheet.pdf


Download files

Source distribution: bib_lookup-0.0.5.tar.gz (19.6 kB)

Built distribution: bib_lookup-0.0.5-py3-none-any.whl (20.0 kB, Python 3)
