Automatically match bibliographies against bnf.gallica.fr

pygallica-autobib

Overview

pygallica-autobib matches your bibliographies against Gallica, the digital library of the French National Library, and downloads the articles as PDFs where possible, optionally post-processing them. It obviously cannot download articles which Gallica does not hold, but for those it does hold it strives for a 100% match rate. If you find an article it fails to match, please report a bug.

Features:

  • Input in RIS or BibTeX format
  • Output report generated with a Jinja template (templates for org-mode, HTML and plaintext supplied)
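
To make the RIS input format concrete, here is a minimal sketch of how a RIS record is laid out and how it could be read with the standard library alone. It is an illustration only and is not gallica-autobib's own parser: the function name `parse_ris` and the sample record are assumptions made for this example (the sample reuses the article from the library example in this README).

```python
import re
from typing import Dict, List

# A RIS line is a two-character tag, two spaces, a hyphen, a space, then the value,
# e.g. "TY  - JOUR". A record is closed by an "ER" line.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Split RIS text into records, each mapping a tag to its list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # this sketch ignores blank and continuation lines
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            # tags such as AU may repeat, so every tag keeps a list of values
            current.setdefault(tag, []).append(value.strip())
    return records


# Hypothetical sample record for illustration
SAMPLE = """TY  - JOUR
AU  - Chenu, M.-D.
TI  - Ascèse et péché originel
JO  - La Vie spirituelle
VL  - 7
PY  - 1923
SP  - 547
EP  - 551
ER  - 
"""

records = parse_ris(SAMPLE)
print(records[0]["TI"][0])
```

Every tag maps to a list because RIS allows repeated tags (several `AU` lines for several authors, for instance).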

Online Demo

There is an online demo with a very basic interface.

Installation

You need Python >= 3.9 (if you need to use this with an older Python, open an issue and I will refactor the few incompatible statements). Then install as usual:

pipx install gallica-autobib # preferred, if you have pipx, or
python -m pip install gallica-autobib

Standalone Usage

gallica-autobib my-bibliography.bib pdfs # match my-bibliography and put files in ./pdfs
gallica-autobib --help

As a library

from pathlib import Path
from gallica_autobib.models import Article
from gallica_autobib.query import Query, GallicaResource

target = Article(
    journaltitle="La Vie spirituelle",
    author="M.-D. Chenu",
    pages=list(range(547, 552)),
    volume=7,
    year=1923,
    title="Ascèse et péché originel",
)
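query = Query(target)
candidate = query.run().candidate # get candidate journal (run() is called on the instance)
# NOTE: `source` is not defined in this snippet; it presumably refers to the
# journal the article is fetched from, and must be set before this line runs
gallica_resource = GallicaResource(candidate, source)
ark = gallica_resource.ark # match candidate article
gallica_resource.download_pdf(Path("article.pdf"))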

Or, if you just want to do what the CLI does:

from pathlib import Path
from gallica_autobib.pipeline import BibtexParser

parser = BibtexParser(Path("outdir"))

with Path("articles.bib").open() as f:
    parser.read(f)

parser.run()
for result in parser.results:
    print(result)

For more advanced usage, see the documentation and the test suite.
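
The `Article` example above passes `pages` as an explicit list of page numbers, and `list(range(547, 552))` covers pages 547 to 551 inclusive. Bibliographies usually store pages as a string such as `547--551`, so a small helper can avoid off-by-one mistakes when building that list by hand. This is only a sketch: `parse_page_range` is a hypothetical helper written for this example and is not part of gallica-autobib.

```python
import re
from typing import List

# Accept "547-551", "547--551" (BibTeX style) or "547–551" (en dash).
_RANGE = re.compile(r"\s*(\d+)\s*(?:-{1,2}|–)\s*(\d+)\s*")
_SINGLE = re.compile(r"\s*(\d+)\s*")


def parse_page_range(text: str) -> List[int]:
    """Turn a page string into an inclusive list of page numbers."""
    match = _RANGE.fullmatch(text)
    if match:
        first, last = int(match.group(1)), int(match.group(2))
        if last < first:
            raise ValueError(f"last page precedes first page: {text!r}")
        # range() excludes its end point, hence the + 1
        return list(range(first, last + 1))
    match = _SINGLE.fullmatch(text)
    if match:
        return [int(match.group(1))]
    raise ValueError(f"unrecognised page specification: {text!r}")


pages = parse_page_range("547--551")
print(pages)
```

The resulting list could then be passed as the `pages` argument when constructing an `Article` by hand.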

Developing

# ensure you have Poetry installed
pip install --user poetry

# install all dependencies (including dev)
poetry install

When your feature is ready, open a PR.

Testing

We use pytest and mypy. You may want to focus on getting unit tests passing first:

poetry run pytest --cov=gallica_autobib --cov-report html --cov-branch tests

If you have started a shell with poetry shell you can drop the poetry run.

When the unit tests are passing you can run the whole suite with:

poetry run scripts/test.sh

Note that tests will only pass fully if /tmp/ exists and is writeable, and if poppler and imagemagick are installed. This is due to the use of pdf-diff-visually and the rather hackish template tests.
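
If you want to check these prerequisites before running the full suite, a quick preflight script along the following lines may help. It is a sketch under stated assumptions, not part of the project: the function name is made up for this example, and it assumes that poppler is detectable through its `pdftoppm` binary and ImageMagick through either `magick` or `convert` on your `PATH`.

```python
import os
import shutil
from typing import Dict


def check_test_prerequisites(tmp_dir: str = "/tmp") -> Dict[str, bool]:
    """Report whether the things the full test suite relies on appear to be present."""
    return {
        # the directory must exist and be writable by the current user
        "tmp_writable": os.path.isdir(tmp_dir) and os.access(tmp_dir, os.W_OK),
        # assumption: poppler's command-line tools include pdftoppm
        "poppler": shutil.which("pdftoppm") is not None,
        # assumption: ImageMagick installs `magick` (v7) or `convert` (v6)
        "imagemagick": any(
            shutil.which(name) is not None for name in ("magick", "convert")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_test_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Note that finding a binary on the `PATH` does not guarantee that it works, so treat a passing check as a hint and not as proof.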

You may wish to check your formatting first with

poetry run scripts/format.sh

Alternatively, just open a premature PR to run the tests in CI. Note that this is rather slow.

Plausibly Askable Questions

Why don't you also check xyz.com?

Because I don't know about it. Open an issue and I'll look into it.

Why don't you use library xyz for image processing?

Probably because I don't know about it. This is a quick tool written to help me research. Submit a PR and I'll happily update it.

Why is the code so verbose?

It is rather object-oriented, and may well be over-engineered. It was written in a hurry and the design evolved as it went along. On the other hand, it should be easier to extend.

Why not just extend pygallica?

I mean to submit the SRU stuff as an extension to pygallica.
