Module to complete bibtex files by polling online databases

Bibtex Autocomplete

bibtexautocomplete or btac is a Python package to autocomplete BibTeX bibliographies. It is inspired by, and expands on, the solution provided by thando in this TeX stack exchange post.

It attempts to complete a BibTeX file by querying the following domains: crossref, arxiv, semantic scholar, dblp, researchr and unpaywall.

Big thanks to all of them for allowing open, easy and well-documented access to their databases.

Contents:

Quick overview
Installation
- Dependencies
Usage
Command line arguments

Quick overview

How does it find matches?

btac queries the websites using the entry's DOI if known, otherwise its title. Entries that have neither of these two fields will not be completed.

Titles should be complete: they are compared ignoring case and punctuation, but a title with missing words will not match.
If one or more authors are present, entries with no common authors will not match. Authors are compared using lower case last names only. Be sure to use one of the correct BibTeX formats for the author field:
```
author = {First Last and Last, First and First von Last}
```
(see https://www.bibtex.com/f/author-field/ for full details)
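The matching rules above can be illustrated with a minimal sketch. This is not btac's actual code: the function names (`normalize_title`, `last_names`, `is_match`) and the use of plain dictionaries for entries are assumptions made for illustration, and the choice to skip the author check when either side lists no authors is also an assumption.

```python
import string

def normalize_title(title):
    """Lower-case the title, turn punctuation into spaces, collapse whitespace."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(title.lower().translate(table).split())

def last_names(author_field):
    """Return the set of lower-case last names in a BibTeX author field."""
    names = set()
    for author in author_field.split(" and "):
        # "Last, First" keeps the part before the comma,
        # "First Last" and "First von Last" keep the final word.
        part = author.split(",")[0] if "," in author else author
        words = part.split()
        if words:
            names.add(words[-1].lower())
    return names

def is_match(entry, candidate):
    """Hypothetical match test: same normalized title and a shared last name."""
    if normalize_title(entry["title"]) != normalize_title(candidate["title"]):
        return False
    mine = last_names(entry.get("author", ""))
    theirs = last_names(candidate.get("author", ""))
    # Assumption: only reject on authors when both sides list some.
    if mine and theirs and not (mine & theirs):
        return False
    return True
```

With this sketch, `{"title": "A Great Paper.", "author": "Jane Doe"}` matches a candidate titled `"a great paper"` by `"Doe, Jane and Smith, John"`, but not a candidate titled `"A Great"`.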

Disclaimers

There is no guarantee that the script will find matches for your entries, or that the websites will have any data to add to your entries (or even that the website data is correct, but that's not for me to say...).
The script is designed to minimize the chance of false positives, that is, adding data from another similar-ish entry to your entry. If you find any such false positive, please report it using the issue tracker.

How are entries completed?

Once responses from all websites have been received, the script adds fields from the websites with the following priority:

crossref > arxiv > semantic scholar > dblp > researchr > unpaywall.

So if both crossref's and dblp's responses contain a publisher, the one from crossref will be used. This order can be changed using the -q/--only-query option (see query filtering).

The script will not overwrite any user given non-empty fields, unless the -f/--force-overwrite flag is given. If you want to check what fields are added, you can use -v/--verbose to have them printed to stdout (with source information), or -p/--prefix to have the new fields be prefixed with BTAC in the output file.
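As a rough illustration of the priority and overwrite rules described above, here is a minimal sketch. It is not the package's implementation: `merge_fields`, its parameters, and the dictionary layout of `responses` (site name mapped to a dict of fields, or `None` when no match was found) are hypothetical.

```python
DEFAULT_PRIORITY = ("crossref", "arxiv", "s2", "dblp", "researchr", "unpaywall")

def merge_fields(entry, responses, priority=DEFAULT_PRIORITY, force=False, prefix=False):
    """Return a completed copy of entry, honoring site priority."""
    # Walk sites from lowest to highest priority, so that values from
    # higher-priority sites replace those from lower-priority ones.
    found = {}
    for site in reversed(priority):
        for name, value in (responses.get(site) or {}).items():
            if value:
                found[name] = value

    result = dict(entry)
    for name, value in found.items():
        # Without force, only empty or absent fields are completed.
        if force or not entry.get(name):
            key = "BTAC" + name if prefix else name
            result[key] = value
    return result

entry = {"title": "My title", "publisher": ""}
responses = {
    "crossref": {"publisher": "Publisher A", "title": "Other title"},
    "arxiv": None,
    "dblp": {"publisher": "Publisher B", "year": "2020"},
}
completed = merge_fields(entry, responses)
```

Here `completed` keeps the user's title, takes the publisher from crossref rather than dblp, and gains the year that only dblp provided.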

The script checks that any DOI or URL found corresponds to (or redirects to) a valid webpage before adding it to an entry.

Installation

btac can be installed with pip:

pip install bibtexautocomplete

You should now be able to run the script using either command:

btac --version
python3 -m bibtexautocomplete --version

Dependencies

This package has two dependencies (automatically installed by pip):

bibtexparser
alive_progress (>= 3.0.0) for the fancy progress bar

Usage

The command line tool can be used as follows:

btac [--flags] <input_files>

Examples:

btac my/db.bib : reads from ./my/db.bib, writes to ./my/db.btac.bib
btac -i db.bib : reads from db.bib and overwrites it (inplace flag)
btac db1.bib db2.bib db3.bib -o out1.bib -o out2.bib : reads db1.bib, db2.bib and db3.bib, and writes their outputs to out1.bib, out2.bib and db3.btac.bib respectively.
btac folder : reads from all files ending with .bib in folder. Excludes .btac.bib files unless they are the only .bib files present. Writes to folder/file.btac.bib unless inplace flag is set.
btac with no inputs is the same as btac .
btac -v ... : verbose mode, pretty prints all new fields when done
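The way inputs are paired with outputs in the examples above can be sketched as follows. This only mirrors the documented behavior; `map_outputs` and `default_output` are hypothetical helpers, not functions exported by the package.

```python
from pathlib import Path

def default_output(path):
    """my/db.bib -> my/db.btac.bib"""
    return path.with_name(path.stem + ".btac.bib")

def map_outputs(inputs, outputs=(), inplace=False):
    """Pair each input file with the file its result is written to."""
    pairs = []
    for index, name in enumerate(inputs):
        src = Path(name)
        if inplace:
            dst = src                       # -i: overwrite the input
        elif index < len(outputs):
            dst = Path(outputs[index])      # -o values are used in order
        else:
            dst = default_output(src)       # extra inputs get the default name
        pairs.append((src, dst))
    return pairs

pairs = map_outputs(["db1.bib", "db2.bib", "db3.bib"], ["out1.bib", "out2.bib"])
```

In this sketch the third input has no matching -o value, so it falls back to db3.btac.bib.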

Note: the parser doesn't preserve format information, so this script will reformat your files. Some formatting options (see below) are provided.

Slow responses: I found that crossref responds significantly slower than the other websites. It often takes longer than the 20s timeout.

You can increase timeout with btac ... -t 60 (60s) or btac ... -t -1 (no timeout)
You can disable crossref queries with btac ... -Q crossref

Command line arguments

-o --output <file.bib>

Write output to given file. Can be used multiple times when also giving multiple inputs. Maps inputs to outputs in order. If there are extra inputs, uses default name (old_name.btac.bib). Ignored in inplace (-i) mode.

Query filtering

-q --only-query <site> or -Q --dont-query <site>

Restrict which websites to query. <site> must be one of: crossref, arxiv, s2, dblp, researchr, unpaywall. These arguments can be used multiple times. For example, to only query crossref and dblp, use -q crossref -q dblp or -Q researchr -Q unpaywall -Q arxiv -Q s2.

Additionally, you can use -q to change the completion priority. So -q unpaywall -q researchr -q dblp -q s2 -q arxiv -q crossref reverses the default order.
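A minimal sketch of how these two options could resolve into an ordered list of sites is given below. `select_sites` is a hypothetical helper, and the way it combines -q and -Q when both are given is an assumption, not documented behavior.

```python
DEFAULT_ORDER = ("crossref", "arxiv", "s2", "dblp", "researchr", "unpaywall")

def select_sites(only=(), dont=()):
    """Return the sites to query, in completion-priority order."""
    unknown = [site for site in (*only, *dont) if site not in DEFAULT_ORDER]
    if unknown:
        raise ValueError(f"unknown site(s): {', '.join(unknown)}")
    # The order of -q values defines the priority; without -q the
    # default order applies. dict.fromkeys drops repeated values.
    order = list(dict.fromkeys(only)) if only else list(DEFAULT_ORDER)
    # Assumption: -Q removes sites from whichever list was selected.
    return [site for site in order if site not in dont]
```

For instance, `select_sites(only=["crossref", "dblp"])` and `select_sites(dont=["researchr", "unpaywall", "arxiv", "s2"])` both yield crossref followed by dblp.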

-e --only-entry <id> or -E --exclude-entry <id>

Restrict which entries should be autocompleted. <id> is the entry ID used in your BibTeX file (e.g. @inproceedings{<id> ... }). These arguments can also be used multiple times to select or exclude multiple entries.

-c --only-complete <field> or -C --dont-complete <field>

Restrict which fields you wish to autocomplete. <field> is a BibTeX field (e.g. author, doi, ...). So if you only wish to add missing DOIs, use -c doi.

-m --mark and -M --ignore-mark

This is useful to avoid repeated queries if you want to run btac many times on the same (large) file.

By default, btac ignores any entry with a BTACqueried field. --ignore-mark overrides this behavior.

When --mark is set, btac adds a BTACqueried = {yyyy-mm-dd} field to each entry it queries.
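The marking behavior can be pictured with a small sketch. Entries are modeled as plain dictionaries and the helper names `should_query` and `mark_entry` are hypothetical; only the BTACqueried field name and the yyyy-mm-dd format come from the description above.

```python
from datetime import date

MARK_FIELD = "BTACqueried"

def should_query(entry, ignore_mark=False):
    """Marked entries are skipped unless --ignore-mark is given."""
    return ignore_mark or MARK_FIELD not in entry

def mark_entry(entry, today=None):
    """Return a copy of entry with BTACqueried = {yyyy-mm-dd} added."""
    marked = dict(entry)
    marked[MARK_FIELD] = (today or date.today()).isoformat()
    return marked

entry = {"ID": "example2023", "title": "Some title"}
marked = mark_entry(entry, date(2023, 4, 14))
```

After marking, `should_query(marked)` is false, while `should_query(marked, ignore_mark=True)` is still true.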

Output formatting

Unfortunately bibtexparser doesn't preserve format information, so this script will reformat your BibTeX file. Here are a few options you can use to control the output format:

--fa --align-values pad field names to align all values

@article{Example,
  author = {Someone},
  doi    = {10.xxxx/yyyyy},
}

--fc --comma-first use comma first syntax

@article{Example
  , author = {Someone}
  , doi = {10.xxxx/yyyyy}
  ,
}

--fl --no-trailing-comma don't add the last trailing comma
--fi --indent <space> space used for indentation, default is a tab. Can be specified as a number (number of spaces) or as a string made of spaces and the characters _, t and n, which stand for a space, a tab and a newline respectively.
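One possible reading of the --indent value format is sketched below. `parse_indent` is a hypothetical function, and rejecting any other character with an error is an assumption.

```python
def parse_indent(value):
    """Turn an --indent argument into the actual indentation string."""
    # A plain number means that many spaces.
    if value.isascii() and value.isdigit():
        return " " * int(value)
    # Otherwise each character is translated individually.
    mapping = {" ": " ", "_": " ", "t": "\t", "n": "\n"}
    try:
        return "".join(mapping[char] for char in value)
    except KeyError as err:
        raise ValueError(f"invalid indent character: {err.args[0]!r}") from None
```

Under this reading, `parse_indent("4")` gives four spaces and `parse_indent("t")` gives a single tab.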

Optional flags

-i --inplace Modify input files inplace, ignores any specified output files
-p --prefix Write new fields with a prefix. The script will add BTACtitle = ... instead of title = ... in the bib file. This can be combined with -f to safely show info for already present fields.

Note that this can overwrite existing fields starting with BTAC, even without the -f option.

-f --force-overwrite Overwrite already present fields. By default, a field is only written if it is empty or absent.
-t --timeout <float> Set the timeout on requests, in seconds (default: 20.0). Increase this if you are getting a lot of timeouts. Set it to -1 for no timeout.
-S --ignore-ssl bypass SSL verification. Use this if you encounter the error:
```
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)
```
Another (better) fix for this is to run pip install --upgrade certifi to update python's certificates.

-d --dump-data <file.json> writes matching entries to the given JSON file.

This lets you see duplicate fields from different sources that are otherwise overwritten when merged into a single entry.

The JSON file will have the following structure:

[
  {
    "entry": "<entry_id>",
    "new-fields": 8,
    "crossref": {
      "query-url": "https://api.crossref.org/...",
      "query-response-time": 0.556,
      "query-response-status": 200,
      "author" : "Lastname, Firstnames and Lastname, Firstnames ...",
      "title" : "super interesting article!",
      "..." : "..."
    },
    "arxiv": null, // null when no match found
    "dblp": ...,
    "researchr": ...,
    "unpaywall": ...
  },
  ...
]
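A dump with this shape can be inspected with a few lines of Python. The sketch below assumes the structure shown above; `fields_by_source` is a hypothetical helper and the sample data is invented for illustration. With a real dump, the list would come from `json.load` on the file passed to --dump-data.

```python
import json

# Keys that describe the query itself rather than BibTeX fields.
QUERY_KEYS = {"query-url", "query-response-time", "query-response-status"}

def fields_by_source(dump):
    """Map entry id -> field name -> {site: value}."""
    result = {}
    for record in dump:
        per_field = {}
        for site, data in record.items():
            if not isinstance(data, dict):
                continue  # skips "entry", "new-fields" and sites with null
            for name, value in data.items():
                if name not in QUERY_KEYS:
                    per_field.setdefault(name, {})[site] = value
        result[record["entry"]] = per_field
    return result

# Hypothetical dump content, shaped like the structure above.
sample = json.loads("""
[
  {
    "entry": "example2023",
    "new-fields": 2,
    "crossref": {
      "query-url": "https://api.crossref.org/...",
      "query-response-time": 0.556,
      "query-response-status": 200,
      "title": "super interesting article!",
      "publisher": "Publisher A"
    },
    "arxiv": null,
    "dblp": {"publisher": "Publisher B"}
  }
]
""")

# Print the fields on which sources disagree.
for entry_id, fields in fields_by_source(sample).items():
    for name, sources in fields.items():
        if len(set(sources.values())) > 1:
            print(entry_id, name, sources)
```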

-O --no-output don't write any output files (except the one specified by --dump-data)
-v --verbose verbose mode shows more info. It details entries as they are being processed and shows a summary of new fields and their source at the end. Repeat the flag (up to four times) to also print debug info.
-s --silent hide info and progress bar. Keep showing warnings and errors. Use twice to also hide warnings, thrice to also hide errors and four times to also hide critical errors, effectively killing all output.
-n --no-color don't use ANSI codes to color and stylize output
--version show version number
-h --help show help
