Package modeling structured strings with regex.
regexmodel
Regexmodel is a Python package that uses a graph model to fit and synthesize structured strings. Structured strings are strings such as license plates, credit card numbers, IP addresses, and phone numbers. Regexmodel can infer a regex-like structure from a series of positive examples and use it to create new samples (for example, new phone numbers).
Features:
- Draw new synthetic values
- Only depends on the numpy and polars libraries (faker for benchmarks).
- Fast (on average < 1 second for about 500 positive examples).
- Can provide statistics on how well the regexmodel fits your values, using log likelihood.
- Can be serialized and can be modified by hand.
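To make the ideas of fitting, drawing, and scoring with log likelihood concrete, here is a toy sketch using only the standard library. It is not regexmodel's graph model: it assumes fixed-length strings and simply counts characters per position. The function names `fit_positions`, `draw`, and `log_likelihood` are made up for this illustration and are not part of the regexmodel API.

```python
import math
import random
from collections import Counter


def fit_positions(values):
    """Count which characters occur at each position (toy model, fixed length)."""
    length = len(values[0])
    if any(len(v) != length for v in values):
        raise ValueError("this toy model only handles fixed-length strings")
    return [Counter(v[i] for v in values) for i in range(length)]


def draw(counters, rng):
    """Sample one character per position, proportional to the observed counts."""
    return "".join(
        rng.choices(list(c.keys()), weights=list(c.values()))[0] for c in counters
    )


def log_likelihood(counters, value):
    """Sum the per-position log probabilities; -inf for unseen characters."""
    if len(value) != len(counters):
        return -math.inf
    total = 0.0
    for counter, char in zip(counters, value):
        count = counter.get(char, 0)
        if count == 0:
            return -math.inf
        total += math.log(count / sum(counter.values()))
    return total


counters = fit_positions(["A1", "A2", "B1", "A1"])
sample = draw(counters, random.Random(0))
score = log_likelihood(counters, "A1")
```

A log likelihood closer to zero means the value is more typical of the fitted examples, while a value containing a character never seen at that position scores negative infinity in this toy version.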
Installation
You can install regexmodel using pip:
pip install git+https://github.com/sodascience/regexmodel.git
If you want to run the benchmarks, you should also install the faker package:
pip install faker
Using regexmodel
Fitting the regexmodel is as simple as:
from regexmodel import RegexModel
model = RegexModel.fit(your_values_to_fit, count_thres=10)
The count_thres parameter controls how detailed and time-consuming the fit is. A higher threshold means a shorter fitting time, but also a worse fit.
Then synthesizing a new value is done with:
model.draw()
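The trade-off behind count_thres can be explored with a small experiment. The sketch below is a hedged illustration: `make_plates` and `compare_thresholds` are hypothetical helpers, the plate format is made up, and it assumes that `RegexModel.fit` accepts a plain list of strings. Only `RegexModel.fit(..., count_thres=...)` and `model.draw()` come from the usage shown above.

```python
import random
import string
import time


def make_plates(n, seed=0):
    """Generate n fake plates shaped like 'AB-123-C' (a made-up format)."""
    rng = random.Random(seed)
    plates = []
    for _ in range(n):
        letters = "".join(rng.choices(string.ascii_uppercase, k=2))
        digits = "".join(rng.choices(string.digits, k=3))
        suffix = rng.choice(string.ascii_uppercase)
        plates.append(f"{letters}-{digits}-{suffix}")
    return plates


def compare_thresholds(values, thresholds=(5, 10, 50)):
    """Fit once per threshold; return fit time and a few drawn samples."""
    # Imported lazily so the generator above works without regexmodel installed.
    from regexmodel import RegexModel

    results = {}
    for thres in thresholds:
        start = time.perf_counter()
        model = RegexModel.fit(values, count_thres=thres)
        elapsed = time.perf_counter() - start
        results[thres] = (elapsed, [model.draw() for _ in range(3)])
    return results
```

With regexmodel installed, calling `compare_thresholds(make_plates(500))` returns, for each threshold, the time the fit took and three synthetic values, which makes it easy to see where the samples start to drift from the original format.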
Serialization
The regex model can be serialized so that it can be stored in, for example, a JSON file:
import json
with open(some_file, "w") as handle:
    json.dump(model.serialize(), handle)
And deserialized:
with open(some_file, "r") as handle:
    model = RegexModel(json.load(handle))
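Since the serialized model can also be modified by hand, it may help to write the JSON with indentation so that it is easier to read and edit. The helpers below are a minimal sketch: `save_serialized` and `load_serialized` are hypothetical names, and they only assume that `model.serialize()` returns JSON-compatible data, as the `json.dump` call above implies.

```python
import json
from pathlib import Path


def save_serialized(serialized, path):
    """Write serialized model data as indented JSON for easy hand-editing."""
    Path(path).write_text(json.dumps(serialized, indent=2), encoding="utf-8")


def load_serialized(path):
    """Read the JSON data back; pass the result to RegexModel(...)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Following the serialization calls shown above, this would be used as `save_serialized(model.serialize(), some_file)` and later `model = RegexModel(load_serialized(some_file))`.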
Hashes for regexmodel-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | db6703973ed2841eb6f66dc70c41cbaa7c47ef20fa0bba46439ef03ebba4c53f
MD5 | 59e8460c25c7839165c9402093de9fc0
BLAKE2b-256 | 1ddbbfb5a51190fb7798c5b689e835bc3dce5dbf77851fd52db41cc149295499