Validation and secure evaluation of untrusted python expressions

Evalidate

Evalidate is a simple Python module for safe and very fast evaluation of user-supplied (possibly malicious) Python expressions.

Evalidate comes with a very convenient and fast CLI tool, jg, for filtering (grep‑ing) items in JSON lists:

jg 'attributes["storage"] == "512GB" and price<300 and rating>4'  products.json

Purpose

It was originally developed for filtering complex data structures, for example:

Find cheap smartphones available for sale:

category=="smartphones" and price<300 and stock>0

It can also be used for other kinds of expressions, such as arithmetic:

a+b-100

Evalidate is the fastest of all secure eval Python modules known to me.

Install

pip3 install evalidate

Security

Built-in Python features such as compile() and eval() are powerful enough to run any user-supplied code, which makes them insecure when that code is malicious, like os.system("rm -rf /"). Evalidate works on a whitelist principle: code is allowed only if it consists entirely of safe operations. What counts as safe is based on the author's views, so your mileage may vary, but you can supply your own list of safe operations.
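
The whitelist principle can be illustrated with the standard library alone. The following is a minimal sketch, not evalidate's actual implementation: the ALLOWED_NODES set and the validate() helper are hypothetical names chosen for this example, and the whitelist is deliberately tiny.

```python
import ast

# Hypothetical whitelist for this sketch only; not evalidate's built-in list.
ALLOWED_NODES = {
    "Expression", "BoolOp", "And", "Or", "Compare", "Lt", "Gt", "Eq",
    "BinOp", "Add", "Sub", "Name", "Load", "Constant",
}

def validate(src):
    """Parse src and raise ValueError on the first node type outside the whitelist."""
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        name = type(node).__name__
        if name not in ALLOWED_NODES:
            raise ValueError(f"Operation type {name} is not allowed")
    # Only a fully whitelisted tree is compiled to a code object.
    return compile(tree, "<expr>", "eval")

code = validate("a + 40 > b")
print(eval(code, {"__builtins__": {}}, {"a": 10, "b": 42}))  # True

try:
    validate("__import__('os').system('clear')")
except ValueError as e:
    print(e)  # Operation type Call is not allowed
```

Because every node type must be listed explicitly, anything the whitelist does not mention (calls, attribute access, lambdas) is rejected before the code is ever compiled or run.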

TL;DR. Just give me safe eval!

from evalidate import Expr, EvalException

src = 'a + 40 > b'
# src = "__import__('os').system('clear')"

try:
    print(Expr(src).eval({'a':10, 'b':42}))
except EvalException as e:
    print(e)

Gives output: True

With dangerous code (uncomment the second src line to test), the output will be: ERR: Operation type Call is not allowed

Exceptions

Evalidate raises CompilationException, ValidationException and ExecutionException. All of them inherit from the base exception class EvalException.

Configure validation

Evalidate is very flexible: depending on the security model, the same code can either pass validation or raise an exception.

EvalModel is the security model class for eval. It holds lists of allowed AST nodes, function calls and attributes, plus a dict of imported functions. The built-in model base_eval_model allows basic operations (which are safe from the author's point of view).

You can create a custom empty model (and extend it later):

import evalidate
my_model = evalidate.EvalModel()

(Nothing is allowed by default; even 1+2 will not be considered safe.)

Or you can start from base_eval_model and extend it:

from evalidate import Expr, base_eval_model

my_model = base_eval_model.clone()
my_model.nodes.append('Mult')

Expr('2*2', model=my_model).eval()

To enable the int() function, allow the 'Call' node and add the function to the list of allowed functions:

my_model.nodes.append('Call')
my_model.allowed_functions.append('int')

Expr('int(36.6)', model=my_model).eval()

Or, to allow attribute (method) calls:

from evalidate import Expr, base_eval_model

m = base_eval_model.clone()
m.nodes.extend(['Call', 'Attribute'])
m.attributes.append('startswith')

src = '"abcdef".startswith("abc")'
r = Expr(src, model=m).eval()

But even with these settings, an exploit attempt with an expression like __builtins__["eval"](1) will fail (good!).

Exporting my functions to eval code

from evalidate import Expr, base_eval_model

def one():
    return 1

m = base_eval_model.clone()
m.nodes.append('Call')
m.imported_functions["one"] = one
Expr('one()', model=m).eval()

CLI tools

genfakeproducts

genfakeproducts generates a JSON list of N fake products for testing. For example:

# generates 1 million fake JSON records for testing (takes about 1 minute on my computer)
genfakeproducts -n 1000000 -o products.json

By default, N is 10000 and the output file is products.json (takes 1 second).

Example record:

  {
    "id": "d5af9f93-c164-4872-a535-4fb56cf62e6e",
    "sku": "SKU-D5AF9F93",
    "title": "Google ET-3843 Pro",
    "brand": "Google",
    "category": "phones",
    "description": "Foot trial city avoid he wish real college investment tend include draw window either over somebody history detail risk support glass. Google phones designed for modern users.",
    "price": 340.94,
    "currency": "USD",
    "stock": 24,
    "rating": 4.23,
    "reviews_count": 222,
    "attributes": {
      "storage": "512GB",
      "color": "red"
    },
    "tags": [
      "limited",
      "new"
    ]
  }

To use genfakeproducts, you need to install evalidate with the "generate" extra: pip3 install evalidate[generate]

jg (grep)

jg is a very fast JSON grep CLI tool. It accepts a Python expression and a filename, or reads from stdin:

jg 'category=="phones"' products.json

jg implements only a small subset of jq's functionality, but it is twice as fast and much simpler to use.

Options

  • -b - benchmark. Will not print output JSON. Measures filtering time.
  • -v - verbose mode.
  • -l - reads input as JSONL (one JSON object per line) instead of a JSON array
  • -k - keypath to list e.g. "shop::products::onstock" if the input data is a nested dictionary
  • -f - custom output format for non-JSON output e.g. '{sku} {price} ({stock}) {title!r}'
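
To make the -k and -f options more concrete, here is a rough Python sketch of what they describe. It rests on two assumptions that the text above only suggests: that the keypath is split on "::" and followed through nested dictionaries, and that the -f template uses str.format() syntax (the {title!r} conversion hints at this). The helper resolve_keypath() is a hypothetical name, not part of jg.

```python
from functools import reduce

def resolve_keypath(data, keypath, sep="::"):
    """Walk nested dictionaries following the keys joined by sep."""
    return reduce(lambda node, key: node[key], keypath.split(sep), data)

# Nested input where the product list is not at the top level.
data = {"shop": {"products": {"onstock": [
    {"sku": "SKU-D5AF9F93", "price": 340.94, "stock": 24,
     "title": "Google ET-3843 Pro"},
]}}}

items = resolve_keypath(data, "shop::products::onstock")

# Assumed str.format-style template, as in the -f example above.
fmt = "{sku} {price} ({stock}) {title!r}"
lines = [fmt.format(**item) for item in items]
print(lines[0])  # SKU-D5AF9F93 340.94 (24) 'Google ET-3843 Pro'
```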

Advanced topics

Improve speed by using native eval() with validated code

Evalidate is very fast, but it still takes CPU cycles. For maximum speed, you can run the validated code with Python's native eval():

from evalidate import Expr

d = dict(a=1, b=2)
expr = Expr('a+b')
eval(expr.code, None, d) # <-- native python eval, will run at eval() speed

This is as secure as expr.eval(), because expr.code has already been validated.

The difference is small: executing expr.code can raise any exception, while expr.eval() raises only ExecutionException. Also, if you want to export your functions to the evaluated code, you must do this manually.
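
The two differences (arbitrary exceptions and manual function export) can be handled with a small wrapper around native eval(). This is only a sketch under stated assumptions: EvalFailed and run_validated() are hypothetical names, and a plain compile() result stands in for expr.code so the example runs without evalidate. The stand-in is NOT validated, so in real use you would pass expr.code instead.

```python
class EvalFailed(Exception):
    """Hypothetical wrapper exception used only in this sketch."""

def run_validated(code, ctx, functions=None):
    """Evaluate an already validated code object with native eval().

    functions holds callables exported to the expression by hand.
    """
    scope = dict(functions or {})
    scope.update(ctx)
    try:
        # Empty builtins is a choice made for this sketch.
        return eval(code, {"__builtins__": {}}, scope)
    except Exception as e:
        # Native eval() can raise anything; fold it into one exception type.
        raise EvalFailed(f"{type(e).__name__}: {e}") from e

# Stand-in for expr.code; nothing is validated here.
code = compile("a / b + one()", "<expr>", "eval")

def one():
    return 1

print(run_validated(code, {"a": 4, "b": 2}, {"one": one}))  # 3.0

try:
    run_validated(code, {"a": 4, "b": 0}, {"one": one})
except EvalFailed as e:
    print("failed:", e)
```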

Custom mapping classes

You can use custom mapping classes instead of dict. As with native Python eval(), a custom mapping class can be used only as the locals context.

Example:

from collections import UserDict
from evalidate import Expr

class LazyDict(UserDict):
    def __missing__(self, key):
        return 42

ctx = LazyDict(a=100)
expr = Expr("a+b")

res = expr.eval(ctx_locals=ctx)
print(res) # 142
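
The "locals only" restriction comes from Python's own eval(), which can be checked without evalidate. The sketch below reuses the LazyDict class from the example above and shows that native eval() accepts the mapping as locals but rejects it as globals.

```python
from collections import UserDict

class LazyDict(UserDict):
    def __missing__(self, key):
        # Any unknown name resolves to 42.
        return 42

ctx = LazyDict(a=100)

# As locals: works, the missing name b is supplied by __missing__.
print(eval("a+b", {}, ctx))  # 142

# As globals: native eval() requires a real dict and raises TypeError.
try:
    eval("a+b", ctx)
except TypeError as e:
    print("rejected as globals:", e)
```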

Limitations

evalidate uses ast.parse() to obtain the AST, which it then validates.

Warning

It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.

In my test, 200 nested int() calls, int(int(... int(1) ...)), work well, but 201 do not. The source code is 1000+ characters. Even if evalidate receives such code, it will just raise CompilationException.
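
You can probe the nesting limit of your own interpreter with ast.parse() directly. This is a sketch, assuming a reasonably recent CPython where overly deep nesting is reported as an exception; the exact limit and the exception type vary between Python versions, so the helper parses() (a hypothetical name) catches several of them and no specific output is promised.

```python
import ast

def parses(depth):
    """Return True if int(int(...int(1)...)) nested depth times can be parsed."""
    src = "int(" * depth + "1" + ")" * depth
    try:
        ast.parse(src, mode="eval")
        return True
    except (SyntaxError, RecursionError, MemoryError):
        # Which exception appears depends on the Python version.
        return False

for depth in (10, 100, 1000):
    print(depth, parses(depth))
```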

evalidate.security.test_security()

Evalidate is very flexible, and it is possible to shoot yourself in the foot if you try hard. test_security() checks your configuration (nodes, funcs, attrs) against a given list of attack code, or against the built-in list of attacks. test_security() returns True if everything is OK (all attacks raised ValidationException) or False if something passed.

This code will never print (I hope).

from evalidate.security import test_security

test_security() or print("default rules are vulnerable!")

But this will fail, because the added nodes/funcs let the attack pass validation (suppose you do not want anyone to call int()):

from evalidate.security import test_security

attacks = ['int(1)']

test_security(attacks, addnodes=['Call'], funcs=['int'], verbose=True)

It will print:

Testing attack code:
int(1)
Problem! Attack passed validation without exception!
Code:
int(1)

Example

Filtering by user-supplied condition

This is the code of examples/products.py. The expression is validated and compiled once, then executed (as bytecode, very fast) many times, so filtering is both fast and secure.

#!/usr/bin/env python3

import requests
from evalidate import Expr, ValidationException, CompilationException, ExecutionException
import json
import sys

data = requests.get('https://dummyjson.com/products?limit=100').json()

try:
    src = sys.argv[1]
except IndexError:
    src = 'True'

try:
    expr = Expr(src)
except (ValidationException, CompilationException) as e:
    print(e)
    sys.exit(1)

c=0
for p in data['products']:
    # print(p)
    try:
        r = expr.eval(p)
        if r:
            print(json.dumps(p, indent=2))
            c+=1
    except ExecutionException as e:
        print("Runtime exception:", e)
print("# {} products matches".format(c))

# print all 100 products
./products.py

# Only cheap products, 8 matches
./products.py 'price<20'

# smartphones (5)
./products.py 'category=="smartphones"'

# good smartphones
./products.py 'category=="smartphones" and rating>4.5'

# cheap smartphones
./products.py 'category=="smartphones" and price<300'

Similar projects and benchmark

asteval

While asteval can compute much more complex code (define functions, use Python math libraries), it has drawbacks:

  • asteval is much slower (evalidate can run at the speed of eval() on Python bytecode)
  • a user can provide source code that runs for a very long time and consumes many resources

simpleeval

A very similar project that also uses the AST approach and is optimized to re-evaluate pre-parsed expressions. However, parsed expressions are stored as the higher-level ast.Expr type, and this approach is a few times slower, while evalidate uses Python's native code type, so evaluation itself runs at the speed of Python eval().

evalidate is a good fit for running the same expression against different data.

Benchmarking

We use benchmark/benchmark.py in this repository. We prepare a list of 1 million products (actually a sample of just 100 products repeated 10 000 times), then filter it with an "untrusted user-supplied expression" (price < 20 in this case).

Products: 1000000 items
evalidate_raw_eval(): 0.266s
evalidate_eval(): 0.326s
test_simpleeval(): 1.824s
test_asteval(): 26.106s

As you can see, evalidate is a few times faster than simpleeval, and both are much faster than asteval.

Maybe my test is not perfectly optimized (I'm not an expert in simpleeval/asteval). If you can suggest better filtering sample code that produces faster results, I will include it. (Benchmark code must treat the expression as unknown in advance and untrusted.)
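
The shape of the benchmark described above (a small sample repeated many times, one expression compiled once, then a timed filter pass) can be sketched with the standard library. This is not benchmark/benchmark.py: the data is synthetic, the list is much smaller, and a plain compile() result stands in for a validated expression, so the timings are not comparable with the numbers above.

```python
import timeit

# Synthetic sample of 100 products, repeated to build a larger list.
sample = [{"title": f"item {i}", "price": float(i)} for i in range(100)]
products = sample * 100  # 10,000 items for this sketch

# Compiled once, evaluated many times. Stand-in only: nothing is validated.
code = compile("price < 20", "<expr>", "eval")

def filter_products():
    """One filtering pass: each product dict is used as the locals context."""
    return [p for p in products if eval(code, {"__builtins__": {}}, p)]

matches = filter_products()
elapsed = timeit.timeit(filter_products, number=3) / 3
print(f"Products: {len(products)} items, matches: {len(matches)}, "
      f"{elapsed:.3f}s per pass")
```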

Read about eval() risks

Note: the Real Python article shows a nice short method of validating source (using code.co_names), but it is vulnerable: it passes the "bomb" from Ned Batchelder's article (the bomb has an empty co_names tuple), which crashes the interpreter. Evalidate can block this code and similar bombs (unless you intentionally configure evalidate to pass specific bomb code. Yes, with evalidate it is hard to shoot yourself in the foot, but it is possible if you try hard).
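
Why an empty co_names tuple defeats a name-based check can be shown with a harmless expression (this is not the bomb itself). In the sketch below, the names used inside a lambda live in the nested code object, so the top-level co_names is empty, while walking the AST still reveals every node type involved.

```python
import ast

# Harmless stand-in: attribute access hidden inside a lambda.
src = "(lambda: ().__class__)()"

code = compile(src, "<expr>", "eval")
# The top-level code object references no names at all.
print(code.co_names)  # ()

# An AST walk sees the Lambda, Call and Attribute nodes regardless.
node_types = sorted({type(n).__name__
                     for n in ast.walk(ast.parse(src, mode="eval"))})
print(node_types)
```

A check based on co_names would accept this expression, whereas a node whitelist that omits Lambda, Call or Attribute would reject it.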

More info

Want more info? Check the module's source code. It is very short, simple and easy to modify.

Contact

Write me: yaroslaff at gmail.com
