datasette-rure
Datasette plugin that adds a custom SQL function for executing matches using the Rust regular expression engine
Install this plugin in the same environment as Datasette to enable the regexp(), regexp_match() and regexp_matches() SQL functions.
$ pip install datasette-rure
The plugin is built on top of the rure-python library by David Blewett.
regexp() to test regular expressions
You can test if a value matches a regular expression like this:
select regexp('hi.*there', 'hi there')
-- returns 1
select regexp('not.*there', 'hi there')
-- returns 0
You can also use SQLite's special REGEXP operator syntax to run matches:
select 'hi there' REGEXP 'hi.*there'
-- returns 1
This means you can filter rows using regular expression matches. For example, to select every article whose title begins with an E or an F:
select * from articles where title REGEXP '^[EF]'
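To illustrate the mechanism, here is a minimal sketch that registers a regexp() function on a plain Python sqlite3 connection. It is not the plugin's implementation: it swaps in Python's built-in re module for rure (the two engines differ in some syntax details), and the sample article titles are made up. In a Datasette plugin, registration of this kind would normally happen inside the prepare_connection plugin hook.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Stand-in for regexp(): 1 if `pattern` matches anywhere in `value`, else 0.

    Uses Python's re module rather than rure; NULL inputs produce NULL.
    """
    if pattern is None or value is None:
        return None
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
# Two-argument SQL function; SQLite also routes the REGEXP operator here.
conn.create_function("regexp", 2, regexp)

# Function-call syntax: the pattern comes first, then the value to test.
print(conn.execute("select regexp('hi.*there', 'hi there')").fetchone()[0])   # 1
print(conn.execute("select regexp('not.*there', 'hi there')").fetchone()[0])  # 0

# Operator syntax: SQLite evaluates  X REGEXP Y  as  regexp(Y, X).
print(conn.execute("select 'hi there' REGEXP 'hi.*there'").fetchone()[0])     # 1

# Row filtering against a hypothetical articles table.
conn.execute("create table articles (title text)")
conn.executemany(
    "insert into articles (title) values (?)",
    [("Elephants",), ("Foxes",), ("Geese",)],
)
rows = conn.execute(
    "select title from articles where title REGEXP '^[EF]' order by title"
).fetchall()
print(rows)  # [('Elephants',), ('Foxes',)]
```

Note the argument order: because SQLite rewrites the operator form as regexp(pattern, value), a function registered under the name regexp serves both syntaxes.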
Try this out: REGEXP interactive demo
regexp_match() to extract groups
You can extract a captured group from a pattern using regexp_match().
select regexp_match('.*( and .*)', title) as n from articles where n is not null
-- Returns the ' and X' component of any matching titles, e.g.
-- and Recognition
-- and Transitions Their Place
-- etc
When called with two arguments, this returns the first capture group. You can pass a third argument to choose which capture group to extract; groups are numbered from 1, so this example returns the text captured by (.*):
select regexp_match('.*(and)(.*)', title, 2) as n from articles where n is not null
The function returns null for invalid inputs, e.g. a pattern without capture groups.
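The sketch below approximates regexp_match() with Python's re module on a plain sqlite3 connection, again as an illustration rather than the plugin's code. The null results for invalid input follow the description above; the exact set of cases that yield null (bad pattern, no match, missing group) and the sample titles are assumptions chosen for illustration.

```python
import re
import sqlite3


def regexp_match(pattern, value, index=1):
    """Stand-in for regexp_match(): capture group `index` of the first match.

    Groups are numbered from 1 (group 0 would be the whole match). Returns
    None (SQL NULL) for NULL inputs, an invalid pattern, no match, or a
    pattern that has no group with that number.
    """
    if pattern is None or value is None:
        return None
    try:
        match = re.search(pattern, value)
    except re.error:
        return None  # report a bad pattern as NULL rather than an SQL error
    if match is None or not 1 <= index <= match.re.groups:
        return None
    return match.group(index)


conn = sqlite3.connect(":memory:")
# narg=-1 accepts any argument count, so both 2- and 3-argument calls work.
conn.create_function("regexp_match", -1, regexp_match)

conn.execute("create table articles (title text)")
conn.executemany(
    "insert into articles (title) values (?)",
    [("Awards and Recognition",), ("Roads",)],
)

# Two arguments: the first capture group.
first = conn.execute(
    "select regexp_match('.*( and .*)', title) as n from articles where n is not null"
).fetchall()
print(first)  # [(' and Recognition',)]

# Three arguments: the second capture group, i.e. the text after "and".
second = conn.execute(
    "select regexp_match('.*(and)(.*)', title, 2) as n from articles where n is not null"
).fetchall()
print(second)  # [(' Recognition',)]

# A pattern without capture groups yields NULL.
print(conn.execute("select regexp_match('hi', 'hi there')").fetchone()[0])  # None
```

Returning NULL instead of raising means a single odd row or pattern does not abort the whole query, which is what lets the `where n is not null` filter work.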
Try this out: regexp_match() interactive demo
regexp_matches() to extract multiple matches at once
The regexp_matches() function can be used to extract multiple patterns from a single string. The result is returned as a JSON array, which can then be further processed using SQLite's JSON functions.
The first argument is a regular expression with named capture groups. The second argument is the string to be matched.
select regexp_matches(
'hello (?P<name>\w+) the (?P<species>\w+)',
'hello bob the dog, hello maggie the cat, hello tarquin the otter'
)
This returns a JSON array containing one object per match, each holding the named captures from the regular expression:
[
{"name": "bob", "species": "dog"},
{"name": "maggie", "species": "cat"},
{"name": "tarquin", "species": "otter"}
]
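As a rough model of that behaviour, the following sketch builds the same JSON array with Python's re.finditer() and Match.groupdict(), registered on a plain sqlite3 connection. It is an assumption-laden stand-in for the rure-based function, not its source; Python's re happens to accept the same (?P<name>...) named-group syntax used in the example.

```python
import json
import re
import sqlite3


def regexp_matches(pattern, value):
    """Stand-in for regexp_matches(): a JSON array of named-capture objects.

    One object per non-overlapping match, mapping each named group to the
    text it captured (or null). Returns None for NULL inputs or a bad pattern.
    """
    if pattern is None or value is None:
        return None
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None
    return json.dumps([match.groupdict() for match in compiled.finditer(value)])


conn = sqlite3.connect(":memory:")
conn.create_function("regexp_matches", 2, regexp_matches)

# Raw string so the \w escapes reach SQLite (and then re) unchanged.
result = conn.execute(
    r"""select regexp_matches(
        'hello (?P<name>\w+) the (?P<species>\w+)',
        'hello bob the dog, hello maggie the cat, hello tarquin the otter'
    )"""
).fetchone()[0]

for capture in json.loads(result):
    print(capture["name"], capture["species"])
# bob dog
# maggie cat
# tarquin otter
```

Because the result is ordinary JSON text, it could then be unpacked one match per row with SQLite's json_each() table-valued function, provided your SQLite build includes the JSON functions.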
Try this out: regexp_matches() interactive demo