Automatically match bibliographies against gallica.bnf.fr

pygallica-autobib

Overview

pygallica-autobib matches your bibliographies against Gallica, the digital library of the French National Library, and downloads articles as PDFs where possible, optionally post-processing them. It cannot download articles that Gallica does not hold, but it aims for a 100% match rate on those it does. If you find an article that it fails to match, please report a bug.

Features:

  • Input in RIS or BibTeX format
  • Output report generated with a Jinja template (templates for org-mode, HTML and plain text supplied)

Online Demo

There is an online demo with a very basic interface, allowing

Installation

You need Python >= 3.9 (if you need to use this with an older Python, open an issue and I will refactor the few incompatible statements). Then install as usual:

pipx install gallica-autobib # preferred, if you have pipx, or
python -m pip install gallica-autobib

Standalone Usage

gallica-autobib my-bibliography.bib pdfs # match my-bibliography and put files in ./pdfs
gallica-autobib --help
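
If you would rather drive the command-line tool from a script, a minimal sketch might look like the following. It assumes only the two positional arguments shown above (the bibliography file and the output directory). The helper names `build_command` and `run_gallica_autobib` are hypothetical, and creating the output directory beforehand is a precaution rather than a documented requirement.

```python
import subprocess
from pathlib import Path


def build_command(bibliography: Path, outdir: Path) -> list[str]:
    """Build the argument list for the invocation shown above."""
    return ["gallica-autobib", str(bibliography), str(outdir)]


def run_gallica_autobib(bibliography: Path, outdir: Path) -> int:
    """Run the CLI and return its exit code.

    Requires gallica-autobib to be installed and on PATH.
    """
    # Precaution (assumption): make sure the output directory exists.
    outdir.mkdir(parents=True, exist_ok=True)
    completed = subprocess.run(build_command(bibliography, outdir), check=False)
    return completed.returncode


# Example call, left commented out so the sketch has no side effects:
# run_gallica_autobib(Path("my-bibliography.bib"), Path("pdfs"))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.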

As a library

from pathlib import Path
from gallica_autobib.models import Article
from gallica_autobib.query import Query, GallicaResource

target = Article(
    journaltitle="La Vie spirituelle",
    author="M.-D. Chenu",
    pages=list(range(547, 552)),
    volume=7,
    year=1923,
    title="Ascèse et péché originel",
)
query = Query(target)
candidate = query.run().candidate  # get candidate journal
gallica_resource = GallicaResource(candidate, source)  # NOTE: source must be defined by the caller; it is not set in this snippet
ark = gallica_resource.ark # match candidate article
gallica_resource.download_pdf(Path("article.pdf"))
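
Note that `range` excludes its end point, so `list(range(547, 552))` above covers pages 547 to 551 inclusive. A small helper makes the inclusive intent explicit; `page_list` is hypothetical and not part of gallica_autobib.

```python
def page_list(first: int, last: int) -> list[int]:
    """Return every page number from first to last, inclusive."""
    if last < first:
        raise ValueError("last page must not precede first page")
    # range() stops before its end argument, hence the + 1
    return list(range(first, last + 1))


pages = page_list(547, 551)
print(pages)  # [547, 548, 549, 550, 551]
```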

Or, if you just want to do what the CLI does:

from pathlib import Path
from gallica_autobib.pipeline import BibtexParser

parser = BibtexParser(Path("outdir"))

with Path("articles.bib").open() as f:
    parser.read(f)

parser.run()
for result in parser.results:
    print(result)
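
To try the pipeline above without an existing bibliography, you could first write a one-entry sample file. The entry below mirrors the `Article` fields used earlier. That the parser accepts exactly these biblatex-style field names (notably `journaltitle`) is an assumption, and `write_sample_bib` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumption: biblatex-style field names mirroring the Article example above.
SAMPLE_ENTRY = """\
@article{chenu1923,
  author       = {M.-D. Chenu},
  title        = {Ascèse et péché originel},
  journaltitle = {La Vie spirituelle},
  volume       = {7},
  year         = {1923},
  pages        = {547--551},
}
"""


def write_sample_bib(path: Path) -> Path:
    """Write the sample entry as UTF-8 so the accented title survives."""
    path.write_text(SAMPLE_ENTRY, encoding="utf-8")
    return path


# Written to a temporary directory here; use Path("articles.bib") to keep it.
with tempfile.TemporaryDirectory() as tmp:
    bib = write_sample_bib(Path(tmp) / "articles.bib")
    print(bib.read_text(encoding="utf-8").splitlines()[0])
```

Writing and reading with an explicit `encoding="utf-8"` avoids platform-dependent defaults mangling titles such as "Ascèse et péché originel".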

For more advanced usage, see the documentation and the test suite.

Developing

# ensure you have Poetry installed
pip install --user poetry

# install all dependencies (including dev)
poetry install

When your feature is ready, open a PR.

Testing

We use pytest and mypy. You may want to focus on getting unit tests passing first:

poetry run pytest --cov=gallica_autobib --cov-report html --cov-branch tests

If you have started a shell with poetry shell you can drop the poetry run.

When the unit tests are passing, you can run the whole suite with:

poetry run scripts/test.sh

Note that tests will only pass fully if /tmp/ exists and is writeable, and if poppler and imagemagick are installed. This is due to the use of pdf-diff-visually and the rather hackish template tests.
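
Before running the full suite, a quick preflight check along these lines may save a slow failure. It is only a sketch: the binary names are assumptions about typical installs (`pdftoppm` ships with poppler's command-line utilities; ImageMagick provides `magick` in version 7 and `convert` in version 6), and `check_test_prerequisites` is a hypothetical helper, not part of the project.

```python
import os
import shutil


def check_test_prerequisites() -> dict[str, bool]:
    """Report whether the prerequisites named above appear to be available."""
    return {
        # /tmp must exist and be writable by the current user
        "tmp_writable": os.path.isdir("/tmp") and os.access("/tmp", os.W_OK),
        # Assumption: pdftoppm on PATH indicates poppler is installed
        "poppler": shutil.which("pdftoppm") is not None,
        # Assumption: either ImageMagick entry point on PATH is sufficient
        "imagemagick": any(
            shutil.which(name) is not None for name in ("magick", "convert")
        ),
    }


for name, ok in check_test_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```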

You may wish to check your formatting first with:

poetry run scripts/format.sh

Alternatively, just open a premature PR to run the tests in CI. Note that this is rather slow.

Plausibly Askable Questions

Why don't you also check xyz.com?

Because I don't know about it. Open an issue and I'll look into it.

Why don't you use library xyz for image processing?

Probably because I don't know about it. This is a quick tool written to help me research. Submit a PR and I'll happily update it.

Why is the code so verbose?

It is rather object-oriented, and it may be over-engineered. It was written in a hurry and the design evolved as it went along. On the other hand, it should be easier to extend.

Why not just extend pygallica?

I mean to submit the SRU stuff as an extension to pygallica.

Download files

Source Distribution

gallica-autobib-0.1.3.tar.gz (30.7 kB)

Uploaded Source

Built Distribution

gallica_autobib-0.1.3-py3-none-any.whl (35.1 kB)

Uploaded Python 3
