Search pymarc.Record using a string expression

pymarcspec

Summary

An implementation of MarcSpec on top of pymarc for searching MARC records.

Usage

The idea is to search MARC records with short string expressions, without writing complicated code to pull out fields and subfields:

import sys
from pymarcspec import MarcSearchParser
from pymarc import MARCReader

parser = MarcSearchParser()
spec = parser.parse('650$a$0')
with open(sys.argv[1], 'rb') as f:
    for record in MARCReader(f):
        subjects = spec.search(record, field_delimiter=':', subfield_delimiter=',')
        print(subjects)
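
To make the two delimiter arguments concrete, here is a minimal sketch that does not use pymarcspec at all. It assumes, going only by the parameter names, that subfield_delimiter joins the subfields selected within one field and field_delimiter joins the results from repeated fields. The toy record layout, the toy_search function, and the sample values are all made up for illustration.

```python
# Toy stand-in for a MARC record: (tag, [(subfield code, value), ...]).
# This is NOT pymarc's data model, just enough structure for the sketch.
RECORD = [
    ("245", [("a", "Moby Dick"), ("c", "Herman Melville")]),
    ("650", [("a", "Whaling"), ("0", "ex1")]),
    ("650", [("a", "Sea stories"), ("0", "ex2")]),
]


def toy_search(record, tag, codes, field_delimiter=":", subfield_delimiter=","):
    """Join the requested subfields of every field carrying `tag`."""
    per_field = []
    for field_tag, subfields in record:
        if field_tag != tag:
            continue
        # Keep only the subfields asked for, in the order they occur.
        values = [value for code, value in subfields if code in codes]
        if values:
            per_field.append(subfield_delimiter.join(values))
    return field_delimiter.join(per_field)


# Roughly what '650$a$0' asks for: subfields a and 0 of every 650 field.
subjects = toy_search(RECORD, "650", {"a", "0"})
print(subjects)  # Whaling,ex1:Sea stories,ex2
```

Whether the real search() orders and joins values exactly this way is something to check against the library itself.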

There is also a MarcSearch object that memoizes each search expression, so that you can conveniently run a number of different searches without creating several parsed specs. For example:

import csv
import sys
from pymarcspec import MarcSearch
from pymarc import MARCReader

writer = csv.writer(sys.stdout, dialect='unix', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['id', 'title', 'subjects'])

marcsearch = MarcSearch()
with open(sys.argv[1], 'rb') as f:
    for record in MARCReader(f):
        control_id = marcsearch.search('001', record)
        title = marcsearch.search('245[0]$a-c', record)
        subjects = marcsearch.search('650$a', record, field_delimiter=', ')
        writer.writerow([control_id, title, subjects])
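
The memoization idea can be sketched in a few lines. This is a hypothetical illustration of the technique, not the source of MarcSearch: CachingSearch, toy_parse, and the parse_calls counter are names invented here, and the "records" are plain dictionaries.

```python
class CachingSearch:
    """Hypothetical illustration of memoizing parsed expressions by string."""

    def __init__(self, parse):
        self._parse = parse    # callable: expression string -> search function
        self._specs = {}       # expression string -> parsed search function
        self.parse_calls = 0   # only here to make the caching visible

    def search(self, expression, record, **kwargs):
        spec = self._specs.get(expression)
        if spec is None:
            # First time this expression is seen: parse it once and keep it.
            self.parse_calls += 1
            spec = self._specs[expression] = self._parse(expression)
        return spec(record, **kwargs)


def toy_parse(expression):
    """Stand-in parser: treat the expression as a plain dictionary key."""
    return lambda record: record.get(expression, "")


searcher = CachingSearch(toy_parse)
records = [{"001": "a1", "245": "First"}, {"001": "b2", "245": "Second"}]
rows = [(searcher.search("001", r), searcher.search("245", r)) for r in records]
print(rows)                  # [('a1', 'First'), ('b2', 'Second')]
print(searcher.parse_calls)  # 2
```

Four searches are run, but each of the two expressions is parsed only once, which is the convenience the loop above relies on.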

Development

Building the Parser

To build the parser, run:

python -m tatsu -o marcparser/parser.py marcparser/marcparser.ebnf

Note that this builds a class MarcSpecParser, which implements the full MarcSpec specification. MarcSearchParser is a subclass that builds an instance of MarcSpec; building this structure imposes some restrictions, which reflect what I needed when I wrote it.

Testing for freshness

The test in test/test_ebnf.py compiles the parser from the EBNF into a temporary path, which makes sure that coffee-driven programmers like me remember to compile the parser and check in the changes.
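
The general shape of such a freshness check might look like the sketch below. It assumes the test compares the regenerated output with the checked-in file, which the description implies but does not spell out. The is_fresh helper and the fake_generate stand-in are hypothetical; the real test would invoke the parser generator instead.

```python
import tempfile
from pathlib import Path


def is_fresh(generate, checked_in):
    """Regenerate into a temporary directory and compare with the checked-in file.

    `generate` is a hypothetical callable that writes generated output to the
    path it is given.
    """
    with tempfile.TemporaryDirectory() as tmp:
        regenerated = Path(tmp) / "parser.py"
        generate(regenerated)
        return regenerated.read_text() == Path(checked_in).read_text()


def fake_generate(path):
    """Stand-in for the generator: pretend the grammar is now at 'v2'."""
    Path(path).write_text("# generated from grammar v2\n")


with tempfile.TemporaryDirectory() as repo:
    checked_in = Path(repo) / "parser.py"
    checked_in.write_text("# generated from grammar v1\n")  # forgot to rebuild
    stale_result = is_fresh(fake_generate, checked_in)
    fake_generate(checked_in)                               # rebuild and "check in"
    fresh_result = is_fresh(fake_generate, checked_in)

print(stale_result, fresh_result)
```

A byte-for-byte comparison like this only works if the generator output is deterministic, for instance with no timestamps in the header.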

Performance

It is not obvious that this library is needed; it may be fine, for instance, to use XPath expressions. Suppose we are going to do a lot of these conversions: if XPath is fast enough, the work of converting from a pymarc.Record to MARCXML will be amortized over many searches. Jupyter Notebooks have a %timeit magic that allows us to check this.

Let us check the performance of the simplest such XPath expression:

In [34]: %timeit ''.join(doc.xpath('./controlfield[@tag="001"]/text()'))
19.4 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And compare it to parsing a spec and searching:

In [37]: from pymarcspec import MarcSearchParser

In [38]: parser = MarcSearchParser()

In [39]: spec = parser.parse('001')

In [40]: spec.search(record)
Out[40]: '1589530'

In [41]: %timeit spec.search(record)
7.89 µs ± 253 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

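So, from a performance perspective, this is clearly a win: about 7.9 µs per search against 19.4 µs for the XPath expression, roughly 2.5 times faster, before counting the cost of converting each record to MARCXML. The expression is also much closer to the notation that library IT staff already use.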
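
The amortization argument can be checked with a little arithmetic. The two per-search timings below are the ones measured above; the MARCXML conversion cost was not measured here, so conversion_us is a purely hypothetical input, and the function names are mine.

```python
XPATH_US = 19.4  # measured above: XPath lookup of controlfield 001
SPEC_US = 7.89   # measured above: spec.search(record)


def total_us(per_search_us, searches, setup_us=0.0):
    """Total time for `searches` lookups on one record, plus one-off setup."""
    return setup_us + per_search_us * searches


def speedup(searches, conversion_us):
    """How many times faster the spec search is than convert-then-XPath."""
    xpath = total_us(XPATH_US, searches, setup_us=conversion_us)
    spec = total_us(SPEC_US, searches)
    return xpath / spec


# With a free conversion, the ratio is just the ratio of the two timings.
print(round(speedup(1000, 0.0), 2))
# A hypothetical 100 µs conversion hurts most when few searches share it.
print(round(speedup(10, 100.0), 2), round(speedup(1000, 100.0), 2))
```

Since the XPath lookup is slower per search to begin with, amortizing the conversion over more searches only moves the ratio toward about 2.5; it never reverses it for this expression.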
