pygallica-autobib
Automatically match bibliographies against gallica.bnf.fr!
Overview
pygallica-autobib will match your bibliographies against the French National Library (BnF) and download articles as PDFs where possible, optionally post-processing them. Whilst it obviously cannot download articles which Gallica does not hold, it strives to achieve a 100% match rate for the articles Gallica does hold. If you find an article on Gallica that it fails to match, please report a bug.
Features:
- Input in RIS or BibTeX format
- Output report generated with a Jinja template (templates for org-mode, HTML and plain text supplied)
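To give an idea of what RIS input looks like, here is a minimal, standalone sketch that reads one record into a dictionary of tags. It is not gallica-autobib's own parser: it assumes the common "TAG  - value" line layout and only the tags TY, AU, TI, JO, VL, PY, SP and EP, whereas real RIS files have many more tags and edge cases.

```python
# Standalone sketch, not gallica-autobib's parser: collect the values of each
# RIS tag in a single record. Assumes the "TAG  - value" layout (two-letter
# tag, two spaces, hyphen, space).
from collections import defaultdict
from typing import Dict, List


def parse_ris_record(text: str) -> Dict[str, List[str]]:
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 6 or line[2:6] != "  - ":
            continue  # skip blank or malformed lines in this sketch
        tag, value = line[:2], line[6:].strip()
        if tag == "ER":
            break  # "ER" marks the end of the record
        fields[tag].append(value)
    return dict(fields)


record = """TY  - JOUR
AU  - Chenu, M.-D.
TI  - Ascèse et péché originel
JO  - La Vie spirituelle
VL  - 7
PY  - 1923
SP  - 547
EP  - 551
ER  - 
"""

fields = parse_ris_record(record)
print(fields["TI"][0])  # Ascèse et péché originel
print(fields["SP"][0], fields["EP"][0])  # 547 551
```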
Online Demo
There is an online demo with a very basic interface. It is not yet production-ready. Note that Gallica applies very aggressive rate limiting, so if the same requests are sent too often, the demo will simply stop responding.
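Because Gallica rate-limits aggressively, anything that queries it repeatedly benefits from answering identical requests from a cache and spacing out new ones. The following is a generic sketch of that idea, not gallica-autobib's actual implementation; the class name PoliteFetcher, the 2-second default interval and the injected fetch callable are assumptions chosen for illustration.

```python
# Generic sketch (not gallica-autobib code): cache identical requests and
# keep new ones at least `min_interval` seconds apart. `fetch` is any
# callable you supply; clock and sleep are injectable so it can be tested.
import time
from typing import Callable, Dict, List, Optional


class PoliteFetcher:
    def __init__(
        self,
        fetch: Callable[[str], str],
        min_interval: float = 2.0,  # assumed spacing; tune to what the server tolerates
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> str:
        if url in self._cache:
            return self._cache[url]  # identical request: answer locally, send nothing
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)  # space out requests to the server
        self._last_request = self._clock()
        result = self._fetch(url)
        self._cache[url] = result
        return result


# Usage with a fake clock and fetcher, so nothing touches the network.
now = [0.0]
sent: List[str] = []
slept: List[float] = []


def fake_fetch(url: str) -> str:
    sent.append(url)
    return f"response for {url}"


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    now[0] += seconds


fetcher = PoliteFetcher(fake_fetch, clock=lambda: now[0], sleep=fake_sleep)
fetcher.get("query-a")
fetcher.get("query-a")  # served from the cache
fetcher.get("query-b")  # waits 2.0 s (simulated) before sending
print(sent, slept)  # ['query-a', 'query-b'] [2.0]
```

Injecting the clock and sleep functions keeps the throttling logic testable without real delays or network access.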
Installation
You need Python >= 3.9 (if you need to use this with an older Python, open an issue and I will refactor the few incompatible statements). Then install as usual:
pipx install gallica-autobib  # preferred, if you have pipx; or
python -m pip install gallica-autobib
Standalone Usage
gallica-autobib my-bibliography.bib pdfs # match my-bibliography and put files in ./pdfs
gallica-autobib --help
As a library
from pathlib import Path
from gallica_autobib.models import Article
from gallica_autobib.query import Query, GallicaResource
target = Article(
    journaltitle="La Vie spirituelle",
    author="M.-D. Chenu",
    pages=list(range(547, 552)),
    volume=7,
    year=1923,
    title="Ascèse et péché originel",
)
query = Query(target)
candidate = query.run().candidate  # get candidate journal
gallica_resource = GallicaResource(target, candidate)  # target article within the candidate journal
ark = gallica_resource.ark # match candidate article
gallica_resource.download_pdf(Path("article.pdf"))
Or, if you just want to do what the CLI does:
from pathlib import Path
from gallica_autobib.pipeline import BibtexParser
parser = BibtexParser(Path("outdir"))  # directory for downloaded files
with Path("articles.bib").open() as f:
    parser.read(f)
parser.run()
for result in parser.results:
    print(result)
For more advanced usage, see the documentation and the test suite.
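In the Article example above, pages is a list of page numbers, and range(547, 552) stops before 552, so it covers pages 547 to 551. If your own records store pages as a BibTeX-style string such as "547--551", a small hypothetical helper like this one (not part of gallica-autobib) produces the same list form:

```python
# Hypothetical helper, not part of gallica-autobib: expand a BibTeX-style
# page range into the list form passed to Article(pages=...).
import re
from typing import List


def pages_from_bibtex(pages: str) -> List[int]:
    """Expand "547--551", "547-551" (or an en dash) or a single page "547"."""
    parts = [p for p in re.split(r"[-\u2013]+", pages.strip()) if p.strip()]
    if not parts:
        raise ValueError(f"no page numbers in {pages!r}")
    if len(parts) == 1:
        return [int(parts[0])]
    start, end = int(parts[0]), int(parts[1])
    if end < start:
        raise ValueError(f"page range ends before it starts: {pages!r}")
    # range() excludes its stop value, hence end + 1
    return list(range(start, end + 1))


print(pages_from_bibtex("547--551"))  # [547, 548, 549, 550, 551]
print(pages_from_bibtex("547--551") == list(range(547, 552)))  # True
```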
Developing
# ensure you have Poetry installed
pip install --user poetry
# install all dependencies (including dev)
poetry install
When your feature is ready, open a PR.
Testing
We use pytest and mypy. You may want to focus on getting unit tests passing first:
poetry run pytest --cov=gallica_autobib --cov-report html --cov-branch tests
If you have started a shell with "poetry shell", you can drop the "poetry run" prefix.
When the unit tests are passing, you can run the whole suite with:
poetry run scripts/test.sh
Note that the tests will only pass fully if /tmp/ exists and is writable, and if Poppler and ImageMagick are installed. This is due to the use of diff-pdf-visually and the rather hackish template tests.
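Before running the full suite, you could check these prerequisites with a short script like the one below. It is a sketch under assumptions, not part of the project's tooling: it treats Poppler as installed if pdftoppm is on the PATH, and ImageMagick as installed if compare (ImageMagick 6) or magick (ImageMagick 7) is.

```python
# Hypothetical pre-flight check before running the full test suite.
# Assumption: Poppler is detected via `pdftoppm` and ImageMagick via
# `compare` (v6) or `magick` (v7); adjust the names to your installation.
import os
import shutil
from typing import List, Sequence


def missing_prerequisites(
    tmp_dir: str = "/tmp",
    tool_groups: Sequence[Sequence[str]] = (("pdftoppm",), ("compare", "magick")),
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    if not os.path.isdir(tmp_dir):
        problems.append(f"{tmp_dir} does not exist")
    elif not os.access(tmp_dir, os.W_OK):
        problems.append(f"{tmp_dir} is not writable")
    for group in tool_groups:
        # a group is satisfied if any one of its executables is on PATH
        if not any(shutil.which(tool) for tool in group):
            problems.append("none of " + ", ".join(group) + " found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```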
You may wish to check your formatting first with
poetry run scripts/format.sh
Alternatively, just open a premature PR to run the tests in CI. Note that this is rather slow.
Plausibly Askable Questions
Why don't you also check xyz.com?
Because I don't know about it. Open an issue and I'll look into it.
Why don't you use library xyz for image processing?
Probably because I don't know about it. This is a quick tool written to help me research. Submit a PR and I'll happily update it.
Why is the code so verbose?
It is rather object-oriented, and it might be over-engineered. It was written in a hurry and the design evolved along the way. On the other hand, it should be easier to extend.
Why not just extend pygallica?
I intend to submit the SRU code as an extension to pygallica.
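For context, SRU (Search/Retrieve via URL) is a standard protocol in which a search is sent as a CQL query in URL parameters. The sketch below only builds such a URL for a journal title; it does not reflect how gallica-autobib or pygallica construct their queries, and the endpoint, SRU version 1.2 and the dc.title index are assumptions to verify against Gallica's API documentation.

```python
# Sketch only: build a Gallica SRU searchRetrieve URL for a journal title.
# Assumptions: endpoint https://gallica.bnf.fr/SRU, SRU version 1.2 and the
# CQL index dc.title. This is not how gallica-autobib builds its queries.
from urllib.parse import parse_qs, urlencode, urlsplit

SRU_ENDPOINT = "https://gallica.bnf.fr/SRU"


def build_sru_url(title: str, max_records: int = 10) -> str:
    # Quote the title as a CQL phrase, escaping any embedded double quotes.
    escaped = title.replace('"', '\\"')
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": f'dc.title all "{escaped}"',
        "maximumRecords": str(max_records),
    }
    return f"{SRU_ENDPOINT}?{urlencode(params)}"


url = build_sru_url("La Vie spirituelle")
print(url)
# Decoding the query parameter shows the CQL that would be sent:
print(parse_qs(urlsplit(url).query)["query"][0])  # dc.title all "La Vie spirituelle"
```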
File details
Details for the file gallica-autobib-0.1.4.tar.gz.
File metadata
- Download URL: gallica-autobib-0.1.4.tar.gz
- Upload date:
- Size: 30.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.6 CPython/3.9.5 Linux/5.12.1-arch1-1
File hashes
Algorithm | Hash digest
---|---
SHA256 | bc9149971d541c826c82daf07e34eb95922e68d5a1974bfd911139e777173fa7
MD5 | 237411c8e87ac14a6ebe52358d1b4bd8
BLAKE2b-256 | 1f5f7cfb10971b71204ef1be3588e71f8eba28b7ec4f8fe8b118623e72f506a2
File details
Details for the file gallica_autobib-0.1.4-py3-none-any.whl.
File metadata
- Download URL: gallica_autobib-0.1.4-py3-none-any.whl
- Upload date:
- Size: 35.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.6 CPython/3.9.5 Linux/5.12.1-arch1-1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 376360bb17de341ab6689fa06978abf0f1ca03bd21c283cb8b2f068b8dfd7788
MD5 | 9b5c698177587f96ec1fca92b3081cba
BLAKE2b-256 | e4673588c8f7b38fbf19b74cbacb7cf96e3419d8755e90e72b77c19dad82b32b