search MARC files for regex matches
marcgrep 
A CLI for searching MARC files, like MARCgrep.pl but written in Python and with a slightly different syntax.
marcli is a similar project that is faster but a little less flexible.
Installation
Python 3.10 or later.
pipx install marcgrep # install globally with pipx
pip install marcgrep # or use pip/pip3
Usage
# general command format - pass one or more files or pipe stdin
marcgrep OPTIONS FILE1.mrc FILE2.mrc
cat FILE.mrc | marcgrep OPTIONS
# full usage information
Usage: marcgrep [OPTIONS] [FILES]...
Find MARC records matching patterns in a file.
Options:
-h, --help Show this message and exit.
-c, --count Count matching records
-i, --include TEXT Include matching records (repeatable)
-e, --exclude TEXT Exclude matching records (repeatable)
-f, --fields TEXT Comma-separated list of fields to print
-l, --limit INTEGER Limit number of records to process
--color Colorize mnemonic MARC output
--invert Invert color scheme (for light terminal backgrounds)
--version Show the version and exit.
The --include and --exclude flags can be used multiple times to specify multiple criteria. They accept a pattern which is a sort of comma-separated filter expression for matching MARC fields. Examples:
# records with a 780 field
marcgrep -i 780 FILE.mrc
# records with Ulysses in the 245 field
marcgrep -i '245,Ulysses' FILE.mrc
# titles _without_ "Collected Poems" in the 245 ‡a subfield
marcgrep -e '245,a,Collected Poems' FILE.mrc
# titles with second indicator = 4 that do not start with "The "
marcgrep -i '245,,4,,^(?!The )' FILE.mrc
The meaning of the filter expression's components depends on how many there are:
- 1: field, e.g. 910 -> the record has a 910 field
- 2: field and value (regular expression), e.g. 100,Lorde -> 100 contains the string "Lorde"
- 3: field, subfield, and value, e.g. 506,a,Open Access -> 506‡a contains the string "Open Access"
- 4: field, first indicator, subfield, and value, e.g. 856,0,u,@lcsh\.gov -> 856‡u with first indicator 0 contains the string "@lcsh.gov"
- 5: field, first and second indicators, subfield, and value, e.g. 245,0,4,a,The Communist Manifesto
The intention of this syntax is to make searching subfields and field values easier than in MARCgrep.pl, since we care about them more often than indicators. To ignore a component but use one of lesser priority, leave the component empty. For instance, 856,s, refers to records with an 856 field that has an s subfield; the trailing comma means we don't care about the subfield's value. The pattern 245,,4,, refers to records with a 245 field whose second indicator is 4, regardless of its subfields or value.
To use a literal comma in a value pattern, include all the other components. For instance, to search for "Morrison, Toni" anywhere in a 100 field, use 100,,,,Morrison, Toni.
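The rules above can be sketched in a few lines of Python. This is only an illustration, not marcgrep's actual parser: the `Criterion` type and `parse_pattern` function are hypothetical names, and the component order (field, first indicator, second indicator, subfield, value) is an assumption inferred from the examples in this README.

```python
from typing import NamedTuple, Optional


class Criterion(NamedTuple):
    """Hypothetical container for one parsed filter expression."""

    field: str
    ind1: Optional[str] = None
    ind2: Optional[str] = None
    subfield: Optional[str] = None
    value: Optional[str] = None


def parse_pattern(pattern: str) -> Criterion:
    """Parse a comma-separated filter expression (assumed component order)."""
    # Split on at most four commas so that any further commas stay in the value.
    raw = pattern.split(",", 4)
    # An empty component means "don't care", represented here as None.
    parts = [p if p != "" else None for p in raw]
    field = parts[0]
    if field is None:
        raise ValueError("a filter expression must start with a field tag")
    if len(parts) == 1:
        return Criterion(field)
    if len(parts) == 2:
        return Criterion(field, value=parts[1])
    if len(parts) == 3:
        return Criterion(field, subfield=parts[1], value=parts[2])
    if len(parts) == 4:
        return Criterion(field, ind1=parts[1], subfield=parts[2], value=parts[3])
    return Criterion(field, parts[1], parts[2], parts[3], parts[4])
```

Because the split stops after the fourth comma, `parse_pattern("100,,,,Morrison, Toni")` keeps the comma inside the value, whereas `100,Morrison, Toni` would be read as three components. That is why all the other components must be present when the value contains a comma.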
Multiple criteria are combined with logical AND, so every additional --include or --exclude flag narrows the set of matching records.
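A minimal sketch of that AND logic follows. It does not use marcgrep's real code or a MARC library: a record is modeled as a plain list of (tag, text) pairs, and `field_matches` and `keep` are hypothetical helper names chosen for illustration.

```python
import re
from typing import Callable, Iterable

# Hypothetical stand-in for a MARC record: a list of (tag, text) pairs.
Record = list[tuple[str, str]]
Predicate = Callable[[Record], bool]


def field_matches(tag: str, regex: str = "") -> Predicate:
    """Build a predicate: does any field with this tag match the regex?"""
    compiled = re.compile(regex)  # an empty regex matches any text

    def predicate(record: Record) -> bool:
        return any(t == tag and compiled.search(text) for t, text in record)

    return predicate


def keep(
    record: Record,
    includes: Iterable[Predicate],
    excludes: Iterable[Predicate],
) -> bool:
    """A record is kept only if it passes every include and no exclude."""
    return all(p(record) for p in includes) and not any(
        p(record) for p in excludes
    )
```

With this model, a record survives only when every include criterion matches and no exclude criterion does, so adding a criterion of either kind can only shrink the result set.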
Color & Formatting
The --color flag lets you pick colors for various parts of a MARC record using environment variables. You can pick from the available termcolor colors. The defaults are:
| Component | Color | Var |
|---|---|---|
| Tag | cyan | MARC_TAG_COLOR |
| Indicator | light_yellow | MARC_INDICATOR_COLOR |
| Subfield code | green | MARC_SUBFIELD_COLOR |
| Data | white | MARC_DATA_COLOR |
There is an inverted color scheme available with the --invert flag for use with light (e.g. white) terminal backgrounds.
You can also configure the subfield delimiter character and the symbol for an empty indicator. Those defaults are:
| Symbol | Var |
|---|---|
| ‡ | MARC_SUBFIELD_DELIMITER |
| _ | MARC_EMPTY_INDICATOR |
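As an illustration of how these settings could be resolved, the sketch below looks up each variable and falls back to the defaults listed in the two tables. The `DEFAULTS` dictionary and `load_settings` function are hypothetical names, not part of marcgrep, and the sketch does not validate color names against termcolor.

```python
import os
from typing import Mapping

# Defaults copied from the tables above.
DEFAULTS = {
    "MARC_TAG_COLOR": "cyan",
    "MARC_INDICATOR_COLOR": "light_yellow",
    "MARC_SUBFIELD_COLOR": "green",
    "MARC_DATA_COLOR": "white",
    "MARC_SUBFIELD_DELIMITER": "‡",
    "MARC_EMPTY_INDICATOR": "_",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each setting from the environment, or its default if unset."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to try out: `load_settings({"MARC_TAG_COLOR": "magenta"})` overrides only the tag color and leaves the other five settings at their defaults.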
Development
uv is used for development.
uv sync # install dependencies
uv run pytest # run tests
uv build # build package, used in CI
Any tag triggers a release to Test PyPI. Any tag beginning with the letter v requires manual approval to be released to PyPI and GitHub. There are protection rules on the pypi and testpypi environments to this effect, too.
License
MIT © Eric Phetteplace 2024.