Generate unique strings from regular expressions.

IntXeger

IntXeger (pronounced "integer") is a Python library for generating strings from regular expressions. It was inspired by the xeger library but provides additional features such as:

  • Better performance than both xeger and exrex.
  • Array-like indexing for mapping integers to strings which match the regex.
  • Sampling-without-replacement for generating a set of unique strings which match the regex.
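The array-like indexing idea can be illustrated with a small, self-contained sketch. It is not IntXeger's actual implementation: `CharSet`, `Concat`, and `Choice` are hypothetical stand-ins for parsed regex nodes. Each node reports how many strings it matches (`length`) and maps an index to one of them (`get`). A concatenation decodes the index as a mixed-radix number, and an alternation picks the option whose index range contains it. The sketch assumes alternation options never match the same string; otherwise two indices could yield the same output.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class CharSet:
    """Hypothetical node matching one character from `chars`, e.g. [a-z]."""
    chars: str

    @property
    def length(self) -> int:
        return len(self.chars)

    def get(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.chars[i]


@dataclass
class Concat:
    """Hypothetical node matching its parts in sequence, e.g. [a-z][0-9]."""
    parts: Sequence["Node"]

    @property
    def length(self) -> int:
        total = 1
        for part in self.parts:
            total *= part.length
        return total

    def get(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Decode i as a mixed-radix number; the last part varies fastest.
        pieces: List[str] = []
        for part in reversed(self.parts):
            i, r = divmod(i, part.length)
            pieces.append(part.get(r))
        return "".join(reversed(pieces))


@dataclass
class Choice:
    """Hypothetical node matching any one of its options, e.g. (A|P)."""
    options: Sequence["Node"]

    @property
    def length(self) -> int:
        return sum(opt.length for opt in self.options)

    def get(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Skip whole options until i falls inside one of them.
        for opt in self.options:
            if i < opt.length:
                return opt.get(i)
            i -= opt.length
        raise AssertionError("unreachable: i was range-checked above")


Node = Union[CharSet, Concat, Choice]

# Usage: (A|P)M has exactly two matches.
ampm = Concat([Choice([CharSet("A"), CharSet("P")]), CharSet("M")])
print(ampm.length)                                # 2
print([ampm.get(i) for i in range(ampm.length)])  # ['AM', 'PM']
```

Because every node knows its own size, the total count of a regex like this is computed without enumerating any strings, and any single match can be produced directly from its index.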

These features make IntXeger perfect for applications such as generating unique identifiers, producing matching strings sequentially, and more!

Installation

You can install the latest stable release of IntXeger by running:

pip install intxeger

Quick Start

Let's start with a simple example where our regex specifies a two-character string that only contains lowercase letters.

import intxeger
x = intxeger.build("[a-z]{2}")

You can check the number of strings that match this regex using the length attribute, and generate the string at index i using the get(i) method.

assert x.length == 26**2 # there are 676 unique strings which match this regex
assert x.get(15) == 'ap' # the string at index 15 (zero-based) is 'ap'
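Why index 15 maps to 'ap': with zero-based indexing, each string matching [a-z]{2} can be read as a two-digit base-26 number whose digits are letter positions (a = 0, ..., z = 25), so index = 26 × first + second. The sketch below decodes an index with divmod. It assumes IntXeger orders these strings this way, which the documented get(15) == 'ap' is consistent with; `decode_two_letters` is a hypothetical helper, not part of the library.

```python
import string

LETTERS = string.ascii_lowercase  # 'a'..'z' at positions 0..25


def decode_two_letters(i: int) -> str:
    """Hypothetical: map an index in [0, 26**2) to a string matching [a-z]{2}."""
    if not 0 <= i < len(LETTERS) ** 2:
        raise IndexError(i)
    first, second = divmod(i, len(LETTERS))  # i == 26 * first + second
    return LETTERS[first] + LETTERS[second]


print(decode_two_letters(15))   # 15 == 26 * 0 + 15, prints ap
print(decode_two_letters(675))  # 675 == 26 * 25 + 25, prints zz
```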

Furthermore, you can generate N unique strings which match this regex using the sample(N) method. Note that N must be less than or equal to the length.

print(x.sample(N=10))
# ['xt', 'rd', 'jm', 'pj', 'jy', 'sp', 'cm', 'ag', 'cb', 'yt']
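One way to build sampling without replacement on top of get(i) is to draw N distinct indices from range(length) and map each one through get. The sketch below does this with the standard library's random.sample, which accepts a range object. `sample_unique` and `get_pair` are hypothetical helpers, and this is not necessarily how IntXeger's sample(N) works internally. Assuming get maps distinct indices to distinct strings, the results are guaranteed to be unique, and asking for more than length strings is rejected, mirroring the N ≤ length requirement above.

```python
import random
import string
from typing import Callable, List, Optional

LETTERS = string.ascii_lowercase


def sample_unique(get: Callable[[int], str], length: int, n: int,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical: return n distinct matches by sampling n distinct indices."""
    if not 0 <= n <= length:
        raise ValueError(f"n must be between 0 and {length}, got {n}")
    if rng is None:
        rng = random.Random()
    # random.sample draws without replacement, so no index appears twice.
    indices = rng.sample(range(length), n)
    return [get(i) for i in indices]


def get_pair(i: int) -> str:
    """Hypothetical get(i) for [a-z]{2}: first letter i // 26, second i % 26."""
    return LETTERS[i // 26] + LETTERS[i % 26]


# Ten unique two-letter strings; the exact output depends on the random state.
print(sample_unique(get_pair, 26 ** 2, 10))
```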

Here's a more complicated regex which specifies a timestamp.

x = intxeger.build(r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M")
print(x.sample(N=2))
# ['11:57:12 AM', '01:16:01 AM']
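As a sanity check on the timestamp regex, its number of matches is the product of the choices in each part: (1[0-2]|0[1-9]) gives 3 + 9 = 12 hours, each (:[0-5]\d) group gives 6 × 10 = 60 values and appears twice, the space and M are fixed, and (A|P) gives 2. The arithmetic below assumes \d stands for the ten ASCII digits; under that assumption, x.length for this regex should be 86,400, one string per second of a 24-hour day.

```python
# Count the matches of (1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M part by part,
# assuming \d means the ten ASCII digits 0-9.
hour_strings = [f"1{d}" for d in "012"] + [f"0{d}" for d in "123456789"]
hours = len(hour_strings)  # 3 + 9 = 12
per_group = 6 * 10         # [0-5] followed by \d
meridiem = 2               # (A|P); the space and "M" are fixed

total = hours * per_group ** 2 * meridiem  # the (:[0-5]\d) group repeats twice
print(total)  # 86400
assert total == 24 * 60 * 60  # one string per second of a 24-hour day
```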

To learn more about the functionality provided by IntXeger, check out our documentation!

Benchmark

This table, generated by benchmark.py, shows the time in milliseconds required to generate N examples of each regular expression using xeger, exrex, and intxeger.

| regex | N | xeger (ms) | exrex (ms) | intxeger (ms) |
|---|---|---|---|---|
| [a-zA-Z]+ | 100 | 7.36 | 3.17 | 1.09 |
| [0-9]{3}-[0-9]{3}-[0-9]{4} | 100 | 11.59 | 6.25 | 0.8 |
| [0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4} | 1000 | 208.62 | 91.3 | 18.28 |
| /json/([0-9]{4})/([a-z]{4}) | 1000 | 133.36 | 107.01 | 12.18 |

Have a regular expression that isn't represented here? Check out our Contributing Guide and submit a pull request!


Download files

Source Distribution: intxeger-0.1.1.tar.gz (14.2 kB)

Built Distribution: intxeger-0.1.1-py2.py3-none-any.whl (8.9 kB, Python 2 and Python 3)