pygallica-autobib

Automatically match Bibliographies against bnf.gallica.fr!

Overview

pygallica-autobib matches your bibliographies against the French National Library's Gallica collection and downloads the articles as PDFs where possible, optionally post-processing them. Whilst it obviously cannot download articles which Gallica does not hold, it strives to achieve a 100% match rate. If you find an article it does not match, please report a bug.

Features:

  • Input in RIS or BibTeX format
  • Output report generated with a Jinja template (templates for org-mode, HTML and plaintext supplied)
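
For reference, a RIS record is a sequence of lines, each holding a two-character tag, the separator `  - ` and a value, and it ends with an `ER` line. The sketch below reads such records using only the standard library. It illustrates the input format and is not the parser gallica-autobib itself uses; the name `parse_ris` and the sample record are hypothetical.

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen, then an optional value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Return a list of records; each maps a RIS tag to its list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if match is None:
            continue  # skip blank or malformed lines
        tag, value = match.groups()
        if tag == "ER":  # end-of-record marker
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    return records


# Hypothetical sample record, for illustration only.
SAMPLE = """\
TY  - JOUR
AU  - Chenu, M.-D.
TI  - Ascèse et péché originel
JO  - La Vie spirituelle
VL  - 7
PY  - 1923
SP  - 547
EP  - 551
ER  -
"""

records = parse_ris(SAMPLE)
print(records[0]["TI"][0])
```

Tags map to lists because some tags, such as `AU`, may legitimately repeat within one record.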

Online Demo

There is an online demo with a very basic interface, allowing

Installation

You need Python >= 3.9 (if you need to use this with an older Python, open an issue and I will refactor the few incompatible statements). Then install as usual:

pipx install gallica-autobib # preferred, if you have pipx, or
python -m pip install gallica-autobib

Standalone Usage

gallica-autobib my-bibliography.bib pdfs # match my-bibliography and put files in ./pdfs
gallica-autobib --help

As a library

from pathlib import Path
from gallica_autobib.models import Article
from gallica_autobib.query import Query, GallicaResource

target = Article(
    journaltitle="La Vie spirituelle",
    author="M.-D. Chenu",
    pages=list(range(547, 552)),
    volume=7,
    year=1923,
    title="Ascèse et péché originel",
)
query = Query(target)
candidate = query.run().candidate # get candidate journal
# assumption: the article sought comes first, the matched journal second
gallica_resource = GallicaResource(target, candidate)
ark = gallica_resource.ark # match candidate article
gallica_resource.download_pdf(Path("article.pdf"))

Or, if you just want to do what the CLI does:

from pathlib import Path
from gallica_autobib.pipeline import BibtexParser

parser = BibtexParser(Path("outdir"))

with Path("articles.bib").open() as f:
    parser.read(f)

parser.run()
for result in parser.results:
    print(result)

For more advanced usage, see the documentation and the test suite.
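
In the `Article` example above, `pages=list(range(547, 552))` covers pages 547 to 551 inclusive, because `range()` stops before its second argument. If your notes record pages as a string such as "547-551", a small helper can build the list. This is only a sketch: `pages_from_range` is a hypothetical name and not part of gallica-autobib.

```python
def pages_from_range(spec):
    """Turn an inclusive page range like '547-551' (or a single page '547') into a list of ints."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last.strip() else start
    if end < start:
        raise ValueError(f"page range runs backwards: {spec!r}")
    return list(range(start, end + 1))  # +1 because range() excludes its stop value


print(pages_from_range("547-551"))
```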

Developing

# ensure you have Poetry installed
pip install --user poetry

# install all dependencies (including dev)
poetry install

When your feature is ready, open a PR.

Testing

We use pytest and mypy. You may want to focus on getting unit tests passing first:

poetry run pytest --cov=gallica_autobib --cov-report html --cov-branch tests

If you have started a shell with poetry shell you can drop the poetry run.

When the unit tests are passing you can run the whole suite with:

poetry run scripts/test.sh

Note that tests will only pass fully if /tmp/ exists and is writeable, and if poppler and imagemagick are installed. This is due to the use of pdf-diff-visually and the rather hackish template tests.
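
A quick way to check these prerequisites before running the whole suite is sketched below. It assumes that poppler provides the `pdftoppm` executable and that ImageMagick provides either `magick` or `convert`; the function name `check_test_prerequisites` is hypothetical and not part of the project's scripts.

```python
import os
import shutil


def check_test_prerequisites(which=shutil.which, tmp_dir="/tmp"):
    """Report whether each prerequisite of the full test suite appears to be met."""
    return {
        "tmp_writable": os.path.isdir(tmp_dir) and os.access(tmp_dir, os.W_OK),
        # Assumption: poppler's command-line tools include pdftoppm.
        "poppler": which("pdftoppm") is not None,
        # Assumption: ImageMagick 7 installs `magick`, older versions `convert`.
        "imagemagick": any(which(name) is not None for name in ("magick", "convert")),
    }


for name, ok in check_test_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

The `which` and `tmp_dir` parameters exist only so the check can be exercised without touching the real system.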

You may wish to check your formatting first with

poetry run scripts/format.sh

Alternatively, just open a premature PR to run the tests in CI. Note that this is rather slow.

Plausibly Askable Questions

Why don't you also check xyz.com?

Because I don't know about it. Open an issue and I'll look into it.

Why don't you use library xyz for image processing?

Probably because I don't know about it. This is a quick tool written to help me research. Submit a PR and I'll happily update it.

Why is the code so verbose?

It is rather object-oriented, and possibly over-engineered. It was written in a hurry and the design evolved as it went along. On the other hand, it should be easier to extend.

Why not just extend pygallica?

I mean to submit the SRU stuff as an extension to pygallica.
