pygallica-autobib

Automatically match bibliographies against gallica.bnf.fr!

Overview

pygallica-autobib matches your bibliographies against Gallica, the digital library of the French National Library, and downloads articles as PDFs where possible, optionally post-processing them. It obviously cannot download articles which Gallica does not hold, but it strives for a 100% match rate on those Gallica does hold. If you find an article it fails to match, please report a bug.

Features:

  • Input in RIS or BibTeX format
  • Output report generated with a Jinja template (templates for Org-mode, HTML and plain text supplied)

Online Demo

There is an online demo with a very basic interface. It is not currently production ready. Note that Gallica uses very aggressive rate limiting, so if you hit the demo with the same requests too often it will simply go down.

Installation

You need Python >= 3.9 (if you need to use this with an older Python, open an issue and I will refactor the few incompatible statements). Then install as usual:

pipx install gallica-autobib # preferred, if you have pipx, or
python -m pip install gallica-autobib

Standalone Usage

gallica-autobib my-bibliography.bib pdfs # match my-bibliography and put files in ./pdfs
gallica-autobib --help
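
If you just want something to feed the command above, here is a minimal sketch that writes a one-entry BibTeX file using only the standard library. The citation key `chenu1923` and the helper `bibtex_entry` are made up for illustration, and the choice of fields simply mirrors the `Article` example in the library section; which fields the matcher actually needs is an assumption.

```python
from pathlib import Path


def bibtex_entry(key: str, fields: dict[str, str]) -> str:
    """Render a single @article entry, wrapping each value in braces."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}\n"


entry = bibtex_entry(
    "chenu1923",  # hypothetical citation key
    {
        "author": "M.-D. Chenu",
        "title": "Ascèse et péché originel",
        "journaltitle": "La Vie spirituelle",
        "year": "1923",
        "volume": "7",
        "pages": "547--551",
    },
)

if __name__ == "__main__":
    # Write the file that the command line example expects.
    Path("my-bibliography.bib").write_text(entry, encoding="utf-8")
```

After running it, `gallica-autobib my-bibliography.bib pdfs` has a file to work on.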

As a library

from pathlib import Path
from gallica_autobib.models import Article
from gallica_autobib.query import Query, GallicaResource

target = Article(
    journaltitle="La Vie spirituelle",
    author="M.-D. Chenu",
    pages=list(range(547, 552)),
    volume=7,
    year=1923,
    title="Ascèse et péché originel",
)
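query = Query(target)
candidate = query.run().candidate # get candidate journal (run() is called on the instance)
# The original passed an undefined name `source` here; assuming the resource
# takes the target article followed by the journal it was matched against.
gallica_resource = GallicaResource(target, candidate)
ark = gallica_resource.ark # match candidate article
gallica_resource.download_pdf(Path("article.pdf"))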

Or, if you just want to do what the CLI does:

from pathlib import Path
from gallica_autobib.pipeline import BibtexParser

parser = BibtexParser(Path("outdir"))

with Path("articles.bib").open() as f:
    parser.read(f)

parser.run()
for result in parser.results:
    print(result)

For more advanced usage, see the documentation and the test suite.

Developing

# ensure you have Poetry installed
pip install --user poetry

# install all dependencies (including dev)
poetry install

When your feature is ready, open a PR.

Testing

We use pytest and mypy. You may want to focus on getting unit tests passing first:

poetry run pytest --cov=gallica_autobib --cov-report html --cov-branch tests

If you have started a shell with poetry shell, you can drop the poetry run.

When the unit tests are passing, you can run the whole suite with:

poetry run scripts/test.sh

Note that tests will only pass fully if /tmp/ exists and is writeable, and if poppler and imagemagick are installed. This is due to the use of pdf-diff-visually and the rather hackish template tests.
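
To check those prerequisites before a full run, something like the following sketch may help. It is not part of the project: the function name is made up, and it assumes poppler is detectable through its `pdftoppm` binary and ImageMagick through `magick` or `convert` on your PATH.

```python
import os
import shutil


def check_test_prerequisites() -> dict[str, bool]:
    """Report whether each prerequisite for the full test suite looks available."""
    return {
        # /tmp must exist and be writable by the current user
        "tmp_writable": os.path.isdir("/tmp") and os.access("/tmp", os.W_OK),
        # poppler ships pdftoppm (assumed here as the marker binary)
        "poppler": shutil.which("pdftoppm") is not None,
        # ImageMagick 7 installs `magick`; version 6 installs `convert`
        "imagemagick": any(shutil.which(name) for name in ("magick", "convert")),
    }


if __name__ == "__main__":
    for name, ok in check_test_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```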

You may wish to check your formatting first with:

poetry run scripts/format.sh

Alternatively, just open a premature PR to run the tests in CI. Note that this is rather slow.

Plausibly Askable Questions

Why don't you also check xyz.com?

Because I don't know about it. Open an issue and I'll look into it.

Why don't you use library xyz for image processing?

Probably because I don't know about it. This is a quick tool written to help me research. Submit a PR and I'll happily update it.

Why is the code so verbose?

It is rather object-oriented, and it may be somewhat over-engineered. It was written in a hurry and the design evolved as it went along. On the other hand, it should be easier to extend.

Why not just extend pygallica?

I mean to submit the SRU stuff as an extension to pygallica.
