search MARC files for regex matches

A CLI for searching MARC files, like MARCgrep.pl but written in Python and with a slightly different syntax.

marcli is a similar project that is faster but a little less flexible.

Installation

Requires Python 3.9 or later.

pipx install marcgrep # install globally with pipx
pip install marcgrep # or use pip/pip3

Usage

# general command format
$ marcgrep OPTIONS FILE.mrc
$ cat FILE.mrc | marcgrep OPTIONS
# full usage information
$ marcgrep -h
Usage: marcgrep [OPTIONS] [FILE]

  Find MARC records matching patterns in a file.

Options:
  -h, --help           Show this message and exit.
  -c, --count          Count matching records
  -i, --include TEXT   Include matching records (repeatable)
  -e, --exclude TEXT   Exclude matching records (repeatable)
  -f, --fields TEXT    Comma-separated list of fields to print
  -l, --limit INTEGER  Limit number of records to process
  --color              Colorize mnemonic MARC output
  --version            Show the version and exit.

The --include and --exclude flags can each be used multiple times to specify multiple criteria. Each accepts a pattern: a comma-separated filter expression for matching MARC fields. Examples:

# records with a 780 field
$ marcgrep -i 780 FILE.mrc
# records with Ulysses in the 245 field
$ marcgrep -i '245,Ulysses' FILE.mrc
# titles _without_ "Collected Poems" in the 245 $a subfield
$ marcgrep -e '245,a,Collected Poems' FILE.mrc
# titles with second indicator = 4 that do not start with "The "
$ marcgrep -i '245,,4,,^(?!The )' FILE.mrc
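The last example relies on a regular-expression negative lookahead. The following is a minimal Python sketch of that regex behavior only, not of marcgrep's internals. It assumes values are tested with an unanchored search, which the "contains" wording in this README suggests but does not confirm, and the sample titles are invented for illustration.

```python
import re

# Negative lookahead: matches at the start of the text only when
# the text does not begin with "The ".
pattern = re.compile(r"^(?!The )")

# Invented sample titles, for illustration only.
titles = ["The Waves", "Ulysses", "Theory of Colours"]

# Assumption: an unanchored search, as the "contains" wording implies.
kept = [title for title in titles if pattern.search(title)]
print(kept)
```

Note that "Theory of Colours" is kept because the lookahead includes the trailing space in "The ".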

The meaning of the pattern's components depends upon their number:

  • 1: field, 910 -> 910 is in record
  • 2: field and value (regular expression), 100,Lorde -> 100 contains string "Lorde"
  • 3: field, subfield, and value, 506,a,Open Access -> 506$a contains string "Open Access"
  • 4: field, first indicator, subfield, and value, 856,0,u,@lcsh\.gov -> 856$u with 1st indicator 0 contains string "@lcsh.gov"
  • 5: field, first and second indicators, subfield, and value, 245,0,4,a,The Communist Manifesto -> 245$a with indicators 0 and 4 contains string "The Communist Manifesto"

This syntax is intended to make searching subfields and field values easier than in MARCgrep.pl, since we care about them more often than indicators. To skip a component while still specifying a later one, leave it empty. For instance, 856,s, refers to records with an 856 field that has a $s subfield; the trailing comma means we don't care about the subfield's value. The pattern 245,,4,, refers to records with a 245 field whose second indicator is 4, regardless of its subfields or value.
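The count-dependent meaning of a pattern can be sketched in Python as follows. This is an illustration, not marcgrep's actual parser: the Criterion type and parse_pattern function are hypothetical names, the positional order (field, indicators, subfield, value) is inferred from the examples in this README, and splitting at most four times so that commas survive in a five-part value is an assumption.

```python
from typing import NamedTuple, Optional


class Criterion(NamedTuple):
    """Hypothetical container for one parsed pattern."""
    field: str
    ind1: Optional[str] = None
    ind2: Optional[str] = None
    subfield: Optional[str] = None
    value: Optional[str] = None


def parse_pattern(pattern: str) -> Criterion:
    """Map a comma-separated pattern onto named components by count."""
    # Assumption: split at most four times; empty components become None.
    parts = [part if part else None for part in pattern.split(",", 4)]
    field = parts[0]
    if field is None:
        raise ValueError("a pattern needs at least a field tag")
    if len(parts) == 1:
        return Criterion(field)
    if len(parts) == 2:
        return Criterion(field, value=parts[1])
    if len(parts) == 3:
        return Criterion(field, subfield=parts[1], value=parts[2])
    if len(parts) == 4:
        return Criterion(field, ind1=parts[1], subfield=parts[2], value=parts[3])
    return Criterion(field, parts[1], parts[2], parts[3], parts[4])


print(parse_pattern("856,s,"))
print(parse_pattern("245,,4,,"))
```

Here 856,s, yields a subfield of "s" with no value, and 245,,4,, yields only a second indicator of "4", matching the two cases described above.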

Multiple criteria are combined with logical AND. Multiple --include flags are narrower than one, as is an --include combined with an --exclude.
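The AND logic can be sketched in Python. This is a toy model under stated assumptions, not marcgrep's code: records are plain dicts mapping a tag to a list of field values, and has_field and keep are hypothetical helpers. A record is kept only if every include criterion matches and no exclude criterion matches.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, List[str]]  # toy model: tag -> list of field values
Predicate = Callable[[Record], bool]


def has_field(tag: str, value: str = "") -> Predicate:
    """Predicate: some `tag` field matches the `value` regex (empty matches any)."""
    regex = re.compile(value)
    return lambda record: any(regex.search(v) for v in record.get(tag, []))


def keep(record: Record, includes: List[Predicate], excludes: List[Predicate]) -> bool:
    """Every include must match AND no exclude may match."""
    return all(p(record) for p in includes) and not any(p(record) for p in excludes)


# Invented sample records, for illustration only.
records = [
    {"245": ["Ulysses"], "780": ["Earlier title"]},
    {"245": ["Ulysses"]},
    {"245": ["Collected Poems"], "780": ["Earlier title"]},
]
includes = [has_field("780")]
excludes = [has_field("245", "Collected Poems")]
matches = [r for r in records if keep(r, includes, excludes)]
print(matches)
```

Only the first record survives: the second lacks a 780 field and the third is excluded by its title, so each added criterion can only shrink the result set.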

Color & Formatting

The --color flag lets you pick colors for various parts of a MARC record using environment variables. You can pick from the available termcolor colors. The defaults are:

Component      Color         Var
Tag            cyan          MARC_TAG_COLOR
Indicator      light_yellow  MARC_INDICATOR_COLOR
Subfield code  green         MARC_SUBFIELD_COLOR
Data           white         MARC_DATA_COLOR
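The fallback from environment variable to default can be sketched in Python. The variable names and default colors come from the table above; the resolve_colors function is a hypothetical helper, and how marcgrep itself reads these variables is an assumption.

```python
import os
from typing import Dict, Mapping

# Defaults as listed in the table above; values are termcolor color names.
DEFAULT_COLORS = {
    "MARC_TAG_COLOR": "cyan",
    "MARC_INDICATOR_COLOR": "light_yellow",
    "MARC_SUBFIELD_COLOR": "green",
    "MARC_DATA_COLOR": "white",
}


def resolve_colors(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each variable's value, falling back to the documented default."""
    return {name: environ.get(name, default) for name, default in DEFAULT_COLORS.items()}


# Simulate a shell where only the tag color was overridden.
print(resolve_colors({"MARC_TAG_COLOR": "red"}))
```

Passing a mapping instead of reading os.environ directly keeps the sketch easy to try without changing the real environment.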

You can also configure the subfield delimiter character and the symbol for an empty indicator. Those defaults are:

Symbol  Var
        MARC_SUBFIELD_DELIMITER
_       MARC_EMPTY_INDICATOR

Development

Poetry is used for development.

  • -c count
  • -v version
  • -l limit (number of records to process)
  • -i include criteria (multiple)
  • -e exclude criteria (multiple)
  • -f fields to print
  • colorize output?
  • work with MARC leader
  • regex for all components? e.g. 24.,text in any 240-249 field
  • relatedly, specify not to treat value as a regex?

poetry install # install dependencies
poetry run pytest # run tests

Any tag triggers a release to Test PyPI. Any tag beginning with the letter v requires manual approval to be released to PyPI and GitHub. There are protection rules on the pypi and testpypi environments to this effect, too.

License

MIT © Eric Phetteplace 2024.


