A tool for pulling word occurrence ('n-gram') data from the Gallica periodical archive.
Project description
gallicaGetter
This tool wraps several endpoints of the Gallica API to support multithreaded data retrieval, with optional generator output. More documentation is coming soon; I just wanted to get this out there. Pull requests welcome!
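Multithreaded retrieval with optional generator output can be sketched with the standard library's `concurrent.futures`. This is not gallicaGetter's own code: `fetch` stands in for a hypothetical function that performs one request, and the `generate` flag here only mirrors the one used in the examples by analogy.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_parallel(
    fetch: Callable[[T], R],
    tasks: Iterable[T],
    num_workers: int = 10,
) -> Iterator[R]:
    """Run fetch(task) for every task on a thread pool, yielding results as they finish.

    Results arrive in completion order, not submission order.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fetch, task) for task in tasks]
        for future in as_completed(futures):
            yield future.result()


def get(fetch, tasks, num_workers=10, generate=False):
    """Return a lazy generator if generate=True, otherwise a fully built list."""
    results = run_parallel(fetch, tasks, num_workers)
    return results if generate else list(results)
```

Because `run_parallel` is itself a generator, no task is submitted until the first result is requested; with `generate=False` the list call drains it immediately.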
Current endpoints are:
- 'sru' -- word occurrences
- 'content' -- occurrence context and page numbers
- 'papers' -- paper metadata
- 'issues' -- years published for a given paper
The tool's functionality has evolved around my application's needs, but it should be easy to extend.
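The 'sru' endpoint corresponds to Gallica's SRU search service. As a rough illustration of the kind of request such a wrapper sends, the sketch below builds a `searchRetrieve` URL with the standard library. The base URL and the SRU parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) follow the SRU convention; the CQL index names in the query (`gallica`, `gallicapublication_date`) are assumptions to check against Gallica's API documentation, not necessarily what gallicaGetter sends.

```python
from urllib.parse import urlencode

# Assumed base URL of Gallica's SRU service.
SRU_BASE = "https://gallica.bnf.fr/SRU"


def build_sru_url(term: str, start_year: str, end_year: str,
                  start_record: int = 1, maximum_records: int = 50) -> str:
    """Build a searchRetrieve URL for a term search within a year range.

    The CQL index names below are illustrative assumptions.
    """
    cql = (
        f'gallica all "{term}" '
        f'and gallicapublication_date>="{start_year}" '
        f'and gallicapublication_date<="{end_year}"'
    )
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes the CQL (spaces become '+', quotes become %22).
    return f"{SRU_BASE}?{urlencode(params)}"
```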
Examples
I want to retrieve all issues that mention "Brazza" from 1890 to 1900.
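```python
import gallicaGetter

# Connect to the 'sru' (word occurrence) endpoint.
sruWrapper = gallicaGetter.connect('sru')
records = sruWrapper.get(
    terms="Brazza",
    startDate="1890",
    endDate="1900",
    grouping="all"  # every matching issue, not aggregated counts
)
for record in records:
    print(record.getRow())
```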
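With `grouping="all"`, every matching record has to be fetched. SRU services return results in pages, with the 1-based `startRecord` and `maximumRecords` controlling each page, so a wrapper that knows the total hit count can compute every page offset up front and request the pages in parallel. The page size of 50 below is an assumed per-request cap, not a confirmed Gallica limit.

```python
from typing import List


def page_starts(total_records: int, page_size: int = 50) -> List[int]:
    """Return the 1-based startRecord value of each page needed to cover total_records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(1, total_records + 1, page_size))
```

For 120 hits this yields `[1, 51, 101]`: three requests that could be handed to a thread pool.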
I want to retrieve all occurrences of "Brazza" within 10 words of "Congo" in the paper "Le Temps" from 1890 to 1900.
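```python
import gallicaGetter

sruWrapper = gallicaGetter.connect('sru')
records = sruWrapper.get(
    terms="Brazza",
    startDate="1890",
    endDate="1900",
    linkTerm="Congo",     # second term that must appear nearby
    linkDistance=10,      # maximum distance, in words, between the two terms
    grouping="all",
    codes="cb34431794k"   # Gallica code for Le Temps
)
for record in records:
    print(record.getRow())
```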
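The `linkTerm` and `linkDistance` parameters describe a proximity search. In CQL, the query language used by SRU, proximity is expressed with the `prox` boolean and its `unit` and `distance` modifiers. The sketch below builds such a clause; the `text` index name and the use of `adj` are assumptions about Gallica's CQL dialect, not confirmed gallicaGetter behavior.

```python
def cql_quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and embedded quotes for CQL."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def proximity_clause(term: str, link_term: str, distance: int) -> str:
    """Build a CQL clause matching term within `distance` words of link_term.

    Assumption: Gallica accepts prox/unit=word/distance=N on an index named `text`.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return (
        f"text adj {cql_quote(term)} "
        f"prox/unit=word/distance={distance} {cql_quote(link_term)}"
    )
```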
Retrieve the number of occurrences of "Victor Hugo", by year, across the Gallica archive from 1800 to 1900, running 30 requests in parallel.
```python
import gallicaGetter

# numWorkers sets how many requests run in parallel.
sruWrapper = gallicaGetter.connect('sru', numWorkers=30)
records = sruWrapper.get(
    terms="Victor Hugo",
    startDate="1800",
    endDate="1900",
    grouping="year"  # one occurrence count per year
)
for record in records:
    print(record.getRow())
```
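With `grouping="year"`, one count per year is needed. A plausible approach, assumed here rather than taken from gallicaGetter's source, is to issue one request per year and let a worker limit bound how many run at once. `ThreadPoolExecutor.map` returns results in input order, which keeps each year aligned with its count; `count_for_year` is a hypothetical stand-in for a request that reads the hit count from an SRU response.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def counts_by_year(
    count_for_year: Callable[[int], int],
    start_year: int,
    end_year: int,
    num_workers: int = 30,
) -> Dict[int, int]:
    """Query each year in [start_year, end_year] concurrently and map year -> count."""
    years = range(start_year, end_year + 1)  # inclusive of end_year
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in the same order as `years`.
        counts = pool.map(count_for_year, years)
        return dict(zip(years, counts))
```

For 1800 to 1900 this makes 101 requests, at most `num_workers` of them in flight at a time.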
Retrieve all issues mentioning "Paris" in the papers "Le Temps" and "Le Figaro" from 1890 to 1900, using a generator.
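```python
import itertools

import gallicaGetter

sruWrapper = gallicaGetter.connect('sru')
recordGenerator = sruWrapper.get(
    terms="Paris",
    startDate="1890",
    endDate="1900",
    grouping="all",
    codes=["cb34431794k", "cb34355551z"],  # Le Temps, Le Figaro
    generate=True  # yield records lazily
)
# Print the first 10 records; islice stops cleanly if fewer are available.
for record in itertools.islice(recordGenerator, 10):
    print(record.getRow())
```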
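A generator pays off when the result set is large: the caller can stop early without every page having been downloaded. A minimal lazy pager, assuming a hypothetical `fetch_page(start_record, page_size)` that returns one page of records as a list, illustrates the pattern; it is not gallicaGetter's implementation.

```python
from itertools import islice
from typing import Callable, Iterator, List, TypeVar

Record = TypeVar("Record")


def iter_records(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 50,
) -> Iterator[Record]:
    """Yield records one at a time, requesting the next page only when needed.

    Stops after the first page shorter than page_size (including an empty page).
    """
    start = 1  # SRU startRecord is 1-based
    while True:
        page = fetch_page(start, page_size)
        yield from page
        if len(page) < page_size:
            return
        start += page_size


if __name__ == "__main__":
    # Demo with an in-memory stand-in for 120 records: only the first page is fetched.
    data = list(range(1, 121))
    print(list(islice(iter_records(lambda s, n: data[s - 1:s - 1 + n]), 10)))
```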
Download files
File details
Details for the file gallicagetter-0.0.2.tar.gz.
File metadata
- Download URL: gallicagetter-0.0.2.tar.gz
- Upload date:
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e5a0793d269eea3e8d7447661a8fa92833ac4b3a7e83827d52110add0e2512f
MD5 | 77fe3a11752b5d4d13ac16d734e4b3e4
BLAKE2b-256 | 3859db15abf5bcb51375d68c9b632433d99749d33c07d6bb501c6bbc5eb4013f
File details
Details for the file gallicagetter-0.0.2-py3-none-any.whl.
File metadata
- Download URL: gallicagetter-0.0.2-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | b70e07b5cfbb216de95a1c77e4f2a0d9261e3bb8c8180e636fab70573e8cfeff
MD5 | 503e336d4ea9ad4b8bc5852d34a29371
BLAKE2b-256 | 0d0f52bc89c0a1c330d2ac41bf40ddaa61ecfcefcf7a4e63b59db42c8992c175
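The SHA256 digests listed for each file can be used to check a download before installing it. A minimal sketch with Python's standard `hashlib`, reading the file in chunks so large archives need not fit in memory:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 against an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("gallicagetter-0.0.2.tar.gz"), "3e5a0793...")` with the full SHA256 digest from the table should return `True` for an intact source distribution. pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file, enforced with `--require-hashes`) performs the same check at install time.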