Generate unique strings from regular expressions.
IntXeger
IntXeger (pronounced "integer") is a Python library for generating strings from regular expressions. It was inspired by the xeger library but offers several advantages:
- Faster generation than both xeger and exrex.
- Array-like indexing for mapping integers to strings which match the regex.
- Sampling without replacement for generating a set of unique strings which match the regex.
These features make IntXeger well suited to applications such as generating unique identifiers and producing matching strings sequentially.
Installation
You can install the latest stable release of IntXeger by running:
pip install intxeger
Quick Start
Let's start with a simple example where our regex specifies a two-character string that only contains lowercase letters.
import intxeger
x = intxeger.build("[a-z]{2}")
You can check the number of strings that this regex can generate using the length attribute, and generate the i-th matching string using the get(i) method.
assert x.length == 26**2  # 676 unique strings match this regex
assert x.get(15) == 'ap'  # the string at index 15 is 'ap'
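To build intuition for the integer-to-string mapping, here is a minimal pure-Python sketch for the specific regex [a-z]{2}. It treats the index as a base-26 number, which is one ordering consistent with the get(15) == 'ap' example above. The function name index_to_string is hypothetical, and this is not IntXeger's actual implementation.

```python
import string

ALPHABET = string.ascii_lowercase  # the 26 choices for each position of [a-z]


def index_to_string(i: int, width: int = 2) -> str:
    """Map an integer to a string of [a-z]{width} by reading i in base 26.

    Hypothetical helper for illustration only; IntXeger's internals may differ.
    """
    total = len(ALPHABET) ** width
    if not 0 <= i < total:
        raise IndexError(f"index {i} is out of range for {total} strings")
    chars = []
    for _ in range(width):
        i, digit = divmod(i, len(ALPHABET))
        chars.append(ALPHABET[digit])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))


print(index_to_string(15))
```

Because every index in range(26**2) maps to a different string, the mapping is a bijection between integers and matching strings, which is what makes array-like indexing possible.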
Furthermore, you can generate N unique strings which match this regex using the sample(N) method. Note that N must be less than or equal to the length.
print(x.sample(N=10))
# ['xt', 'rd', 'jm', 'pj', 'jy', 'sp', 'cm', 'ag', 'cb', 'yt']
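One way to picture sampling without replacement is to draw distinct integers and then map each one to its string. The sketch below does this for [a-z]{2} using only the standard library. The names sample_unique and index_to_string are hypothetical, and the approach is an assumption about how such a feature could work rather than a description of IntXeger's code.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def index_to_string(i: int, width: int = 2) -> str:
    """Read i as a base-26 number and spell it with lowercase letters."""
    chars = []
    for _ in range(width):
        i, digit = divmod(i, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def sample_unique(n: int, width: int = 2, seed=None) -> list:
    """Return n distinct strings of [a-z]{width} (illustrative sketch)."""
    total = len(ALPHABET) ** width
    if n > total:
        raise ValueError(f"cannot draw {n} unique strings from only {total}")
    rng = random.Random(seed)
    # Distinct indices guarantee distinct strings because the mapping is one-to-one.
    indices = rng.sample(range(total), n)
    return [index_to_string(i, width) for i in indices]


print(sample_unique(10, seed=0))
```

Sampling indices from a range keeps memory use small even when the number of matching strings is large, since the strings themselves are never all materialized.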
Here's a more complicated regex which specifies a timestamp.
x = intxeger.build(r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M")
print(x.sample(N=2))
# ['11:57:12 AM', '01:16:01 AM']
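As a worked check on how many strings the timestamp regex can produce, the sketch below multiplies the number of choices in each part and confirms the count by brute force with the standard re module. It assumes \d means the ASCII digits 0-9 (Python's re also accepts other Unicode digits). The variable names are illustrative.

```python
import re

TIMESTAMP = re.compile(r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M")

hours = 3 + 9        # 1[0-2] gives 10..12, 0[1-9] gives 01..09
two_digits = 6 * 10  # [0-5]\d gives 00..59, assuming ASCII digits only
meridiem = 2         # (A|P)M gives AM or PM

# The group (:[0-5]\d) repeats twice, so its count is squared.
total = hours * two_digits ** 2 * meridiem

# Brute-force check: build every candidate and keep the ones that match.
candidates = {
    f"{h:02d}:{m:02d}:{s:02d} {ap}M"
    for h in range(1, 13)
    for m in range(60)
    for s in range(60)
    for ap in "AP"
}
matching = [c for c in candidates if TIMESTAMP.fullmatch(c)]

print(total, len(matching))
```

Under these assumptions the product is 12 * 3600 * 2 = 86400, so N passed to sample(N) for this regex could be at most that value.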
To learn more about the functionality provided by IntXeger, check out our documentation!
Benchmark
This table, generated by benchmark.py, shows the time in milliseconds required to generate N examples of each regular expression using xeger, exrex, and intxeger.
regex | N | xeger | exrex | intxeger |
---|---|---|---|---|
[a-zA-Z]+ | 100 | 7.36 | 3.17 | 1.09 |
[0-9]{3}-[0-9]{3}-[0-9]{4} | 100 | 11.59 | 6.25 | 0.8 |
[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4} | 1000 | 208.62 | 91.3 | 18.28 |
/json/([0-9]{4})/([a-z]{4}) | 1000 | 133.36 | 107.01 | 12.18 |
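To make the relative differences easier to read, the following sketch computes speedup ratios from the timings in the table. The numbers are copied from the table as published; the names BENCHMARK and speedups are illustrative and are not part of benchmark.py.

```python
# Timings in milliseconds, copied from the benchmark table:
# (regex, N, xeger, exrex, intxeger)
BENCHMARK = [
    ("[a-zA-Z]+", 100, 7.36, 3.17, 1.09),
    ("[0-9]{3}-[0-9]{3}-[0-9]{4}", 100, 11.59, 6.25, 0.8),
    ("[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}", 1000, 208.62, 91.3, 18.28),
    ("/json/([0-9]{4})/([a-z]{4})", 1000, 133.36, 107.01, 12.18),
]


def speedups(rows):
    """Return (regex, xeger/intxeger, exrex/intxeger) for each table row."""
    return [
        (regex, xeger / intx, exrex / intx)
        for regex, _n, xeger, exrex, intx in rows
    ]


for regex, vs_xeger, vs_exrex in speedups(BENCHMARK):
    print(f"{regex}: {vs_xeger:.1f}x vs xeger, {vs_exrex:.1f}x vs exrex")
```

These ratios only summarize the published measurements; timings on other hardware or library versions may differ.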
Have a regular expression that isn't represented here? Check out our Contributing Guide and submit a pull request!