Skip to main content

Parse and compile Excel formulas and workbooks in python code.

Project description

What is formulas?

formulas implements an interpreter for Excel formulas, which parses and compile Excel formulas expressions.

Moreover, it compiles Excel workbooks to python and executes without using the Excel COM server. Hence, Excel is not needed.

Installation

To install it use (with root privileges):

$ pip install formulas

Or download the last git version and use (with root privileges):

$ python setup.py install

Install extras

Some additional functionality is enabled installing the following extras:

  • excel: enables to compile Excel workbooks to python and execute using: ExcelModel.

  • plot: enables to plot the formula ast and the Excel model.

To install formulas and all extras, do:

$ pip install formulas[all]

Development version

To help with the testing and the development of formulas, you can install the development version:

$ pip install https://github.com/vinci1it2000/formulas/archive/dev.zip

CLI Quickstart

The formulas command-line interface works with spreadsheet models and accepts .xlsx, .ods, and .json inputs.

A typical workflow starts by calculating a workbook. You can override input values directly from the command line and request specific cells to be rendered in the output.

$ formulas calc test/test_files/excel.xlsx \
    --overwrite "'[excel.xlsx]'!INPUT_A=3" \
    --overwrite "'[excel.xlsx]DATA'!B3=1" \
    --render "'[excel.xlsx]DATA'!C2=result" \
    --output-format json

Spreadsheet models can also be converted into a portable JSON representation. This is useful when the model needs to be versioned, inspected, or executed without the original workbook.

$ formulas build test/test_files/excel.xlsx \
    --output-file model.json

For validation purposes, a workbook can be tested directly from the CLI. The following command runs the tests and prints a short summary.

$ formulas test test/test_files/excel.xlsx --summary

Finally, a model can be exposed as a lightweight HTTP API, allowing other applications to execute it remotely. The serve command requires the optional web dependencies (pip install formulas[web]).

$ formulas serve test/test_files/excel.xlsx \
    --host 127.0.0.1 \
    --port 5000

Each command provides additional options and examples through the built-in help system:

$ formulas COMMAND --help

Basic Examples

The following sections will show how to:

  • parse a Excel formulas;

  • load, compile, and execute a Excel workbook;

  • extract a sub-model from a Excel workbook;

  • add a custom function.

Parsing formula

An example how to parse and execute an Excel formula is the following:

>>> import formulas
>>> func = formulas.Parser().ast('=(1 + 1) + B3 / A2')[1].compile()

To visualize formula model and get the input order you can do the following:

>>> list(func.inputs)
['A2', 'B3']
>>> func.plot(view=False)  # Set view=True to plot in the default browser.
SiteMap({=((1 + 1) + (B3 / A2)): SiteMap({})})

[graph]

Finally to execute the formula and plot the workflow:

>>> func(1, 5)
Array(7.0, dtype=object)
>>> func.plot(workflow=True, view=False)  # Set view=True to plot in the default browser.
SiteMap({=((1 + 1) + (B3 / A2)): SiteMap({})})

[graph]

Excel workbook

An example how to load, calculate, and write an Excel workbook is the following:

>>> import formulas
>>> fpath, dir_output = 'excel.xlsx', 'output'
>>> xl_model = formulas.ExcelModel().loads(fpath).finish()
>>> xl_model.calculate()
Solution(...)
>>> xl_model.write(dirpath=dir_output)
{'EXCEL.XLSX': {Book: <openpyxl.workbook.workbook.Workbook ...>}}
Tip: If you have or could have circular references, add

circular=True to finish method.

To plot the dependency graph that depict relationships between Excel cells:

>>> dsp = xl_model.dsp
>>> dsp.plot(view=False)  # Set view=True to plot in the default browser.
SiteMap({ExcelModel: SiteMap(...)})

[graph]

To overwrite the default inputs that are defined by the excel file or to impose some value to a specific cell:

>>> xl_model.calculate(
...     inputs={
...         "'[excel.xlsx]'!INPUT_A": 3,  # To overwrite the default value.
...         "'[excel.xlsx]DATA'!B3": 1  # To impose a value to B3 cell.
...     },
...     outputs=[
...        "'[excel.xlsx]DATA'!C2", "'[excel.xlsx]DATA'!C4"
...     ] # To define the outputs that you want to calculate.
... )
 Solution({"'[excel.xlsx]'!INPUT_A": <Ranges>('[excel.xlsx]DATA'!A2)=[[3]],
           "'[excel.xlsx]DATA'!B3": <Ranges>('[excel.xlsx]DATA'!B3)=[[1]],
           "'[excel.xlsx]DATA'!A2": <Ranges>('[excel.xlsx]DATA'!A2)=[[3]],
           "'[excel.xlsx]DATA'!A3": <Ranges>('[excel.xlsx]DATA'!A3)=[[6]],
           "'[excel.xlsx]DATA'!A4": <Ranges>('[excel.xlsx]DATA'!A4)=[[5]],
           "'[excel.xlsx]DATA'!D2": <Ranges>('[excel.xlsx]DATA'!D2)=[[1]],
           "'[excel.xlsx]'!INPUT_B": <Ranges>('[excel.xlsx]DATA'!A3)=[[6]],
           "'[excel.xlsx]'!INPUT_C": <Ranges>('[excel.xlsx]DATA'!A4)=[[5]],
           "'[excel.xlsx]DATA'!A3:A4": <Ranges>('[excel.xlsx]DATA'!A3:A4)=[[6] [5]],
           "'[excel.xlsx]DATA'!B2": <Ranges>('[excel.xlsx]DATA'!B2)=[[9.0]],
           "'[excel.xlsx]DATA'!D3": <Ranges>('[excel.xlsx]DATA'!D3)=[[2.0]],
           "'[excel.xlsx]DATA'!C2": <Ranges>('[excel.xlsx]DATA'!C2)=[[10.0]],
           "'[excel.xlsx]DATA'!D4": <Ranges>('[excel.xlsx]DATA'!D4)=[[3.0]],
           "'[excel.xlsx]DATA'!C4": <Ranges>('[excel.xlsx]DATA'!C4)=[[4.0]]})

To build a single function out of an excel model with fixed inputs and outputs, you can use the compile method of the ExcelModel that returns a DispatchPipe. This is a function where the inputs and outputs are defined by the data node ids (i.e., cell references).

>>> func = xl_model.compile(
...     inputs=[
...         "'[excel.xlsx]'!INPUT_A",  # First argument of the function.
...         "'[excel.xlsx]DATA'!B3"   # Second argument of the function.
...     ], # To define function inputs.
...     outputs=[
...         "'[excel.xlsx]DATA'!C2", "'[excel.xlsx]DATA'!C4"
...     ] # To define function outputs.
... )
>>> func
<schedula.utils.dsp.DispatchPipe object at ...>
>>> [v.value[0, 0] for v in func(3, 1)]  # To retrieve the data.
[10.0, 4.0]
>>> func.plot(view=False)  # Set view=True to plot in the default browser.
SiteMap({ExcelModel: SiteMap(...)})

[graph]

Alternatively, to load a partial excel model from the output cells, you can use the from_ranges method of the ExcelModel:

>>> xl = formulas.ExcelModel().from_ranges(
...     "'[%s]DATA'!C2:D2" % fpath,  # Output range.
...     "'[%s]DATA'!B4" % fpath,  # Output cell.
... )
>>> dsp = xl.dsp
>>> sorted(dsp.data_nodes)
["'[excel.xlsx]'!INPUT_A",
 "'[excel.xlsx]'!INPUT_B",
 "'[excel.xlsx]'!INPUT_C",
 "'[excel.xlsx]DATA'!A2",
 "'[excel.xlsx]DATA'!A3",
 "'[excel.xlsx]DATA'!A3:A4",
 "'[excel.xlsx]DATA'!A4",
 "'[excel.xlsx]DATA'!B2",
 "'[excel.xlsx]DATA'!B3",
 "'[excel.xlsx]DATA'!B4",
 "'[excel.xlsx]DATA'!C2",
 "'[excel.xlsx]DATA'!D2"]

[graph]

JSON export/import

The ExcelModel can be exported/imported to/from a readable JSON format. The reason of this functionality is to have format that can be easily maintained (e.g. using version control programs like git). Follows an example on how to export/import to/from JSON an ExcelModel:

>>> import json
>>> xl_dict = xl_model.to_dict()  # To JSON-able dict.
>>> xl_dict  # Exported format.
{
 "'[excel.xlsx]DATA'!A1": "inputs",
 "'[excel.xlsx]DATA'!B1": "Intermediate",
 "'[excel.xlsx]DATA'!C1": "outputs",
 "'[excel.xlsx]DATA'!D1": "defaults",
 "'[excel.xlsx]DATA'!A2": 2,
 "'[excel.xlsx]DATA'!D2": 1,
 "'[excel.xlsx]DATA'!A3": 6,
 "'[excel.xlsx]DATA'!A4": 5,
 "'[excel.xlsx]DATA'!B2": "=('[excel.xlsx]DATA'!A2 + '[excel.xlsx]DATA'!A3)",
 "'[excel.xlsx]DATA'!C2": "=(('[excel.xlsx]DATA'!B2 / '[excel.xlsx]DATA'!B3) + '[excel.xlsx]DATA'!D2)",
 "'[excel.xlsx]DATA'!B3": "=('[excel.xlsx]DATA'!B2 - '[excel.xlsx]DATA'!A3)",
 "'[excel.xlsx]DATA'!C3": "=(('[excel.xlsx]DATA'!C2 * '[excel.xlsx]DATA'!A2) + '[excel.xlsx]DATA'!D3)",
 "'[excel.xlsx]DATA'!D3": "=(1 + '[excel.xlsx]DATA'!D2)",
 "'[excel.xlsx]DATA'!B4": "=MAX('[excel.xlsx]DATA'!A3:A4, '[excel.xlsx]DATA'!B2)",
 "'[excel.xlsx]DATA'!C4": "=(('[excel.xlsx]DATA'!B3 ^ '[excel.xlsx]DATA'!C2) + '[excel.xlsx]DATA'!D4)",
 "'[excel.xlsx]DATA'!D4": "=(1 + '[excel.xlsx]DATA'!D3)"
}
>>> xl_json = json.dumps(xl_dict, indent=True)  # To JSON.
>>> xl_model = formulas.ExcelModel().from_dict(json.loads(xl_json))  # From JSON.

Custom functions

An example how to add a custom function to the formula parser is the following:

>>> import formulas
>>> FUNCTIONS = formulas.get_functions()
>>> FUNCTIONS['MYFUNC'] = lambda x, y: 1 + y + x
>>> func = formulas.Parser().ast('=MYFUNC(1, 2)')[1].compile()
>>> func()
4

Advanced Examples

Formulas can also be embedded as a calculation engine inside lightweight applications and automated workflows, without requiring Excel or another spreadsheet GUI.

Minimal Flask integration

This example loads a workbook once, exposes it through a Flask application, and calls the JSON API through a test client.

from formulas.app import create_app

app = create_app(files=('test/test_files/excel.xlsx',), circular=False)
client = app.test_client()
response = client.post('/api/calculate', json={
    'inputs': {
        "'[excel.xlsx]'!INPUT_A": 3,
        "'[excel.xlsx]DATA'!B3": 1,
    },
    'renders': ["'[excel.xlsx]DATA'!C2=result"],
})

assert response.status_code == 200
assert response.get_json()['outputs'] == {'result': 10.0}

Batch automation

This example creates a temporary batch file and runs formulas calc over two scenarios.

import json
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp) / 'batch.json'
    batch.write_text(json.dumps([
        {
            'name': 'base',
            'overwrite': {
                "'[excel.xlsx]'!INPUT_A": 3,
                "'[excel.xlsx]DATA'!B3": 1,
            },
            'renders': ["'[excel.xlsx]DATA'!C2=result"],
        },
        {
            'name': 'stress',
            'overwrite': {
                "'[excel.xlsx]'!INPUT_A": 4,
                "'[excel.xlsx]DATA'!B3": 1,
            },
            'renders': ["'[excel.xlsx]DATA'!C2=result"],
        },
    ], indent=2))

    result = subprocess.run([
        sys.executable, '-m', 'formulas.cli', 'calc',
        'test/test_files/excel.xlsx',
        '--batch', str(batch),
        '--processes', '2',
        '--output-format', 'json',
        '--output-dir', tmp,
    ], capture_output=True, text=True, check=False)

    assert result.returncode == 0, result.stderr
    summary = json.loads(result.stdout)
    assert [item['name'] for item in summary] == ['base', 'stress']

ETL transformer

This example treats a workbook as a transformation step over structured input records.

import formulas

model = formulas.ExcelModel().loads('test/test_files/excel.xlsx').finish()
func = model.compile(
    inputs=["'[excel.xlsx]'!INPUT_A", "'[excel.xlsx]DATA'!B3"],
    outputs=["'[excel.xlsx]DATA'!C2"],
)
records = [
    {'id': 'row-1', 'input_a': 3, 'b3': 1},
    {'id': 'row-2', 'input_a': 4, 'b3': 1},
]
results = []

for record in records:
    result, = func(record['input_a'], record['b3'])
    results.append({
        'id': record['id'],
        'result': result.value[0, 0],
    })

assert results == [
    {'id': 'row-1', 'result': 10.0},
    {'id': 'row-2', 'result': 11.0},
]

Excel Function Coverage

The current Excel function coverage is tracked in the test workbook test/test_files/test.xlsx, sheet COVERAGE. The table below summarizes the current implementation status by category.

Category

Implemented

Total

Coverage

AUTOMATION

0

3

0.0%

COMPATIBILITY

40

40

100.0%

CUBE

0

7

0.0%

DATABASE

0

12

0.0%

DATE & TIME

25

25

100.0%

ENGINEERING

54

54

100.0%

FINANCIAL

55

55

100.0%

INFORMATION

16

22

72.7%

LOGICAL

19

19

100.0%

LOOKUP

33

40

82.5%

MATH & TRIG

71

80

88.8%

STATISTICAL

111

111

100.0%

TEXT

44

50

88.0%

WEB

0

3

0.0%

OPERATORS

15

15

100.0%

TOTAL

483

536

90.1%

Overall coverage is currently 483 out of 536 functions (90.1%).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

formulas-1.3.4.tar.gz (149.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

formulas-1.3.4-py2.py3-none-any.whl (126.5 kB view details)

Uploaded Python 2Python 3

File details

Details for the file formulas-1.3.4.tar.gz.

File metadata

  • Download URL: formulas-1.3.4.tar.gz
  • Upload date:
  • Size: 149.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for formulas-1.3.4.tar.gz
Algorithm Hash digest
SHA256 018ee4c43056d2ea2497243e83bb61d92c0f1039ca8eb82a8f0e969562e3e8e9
MD5 4722cf4382389b0d9ef3a615c10a2b46
BLAKE2b-256 a10387e2931f7e134cfffb6c6199003ce1a59feeea5160abd774dfad1c19e1ef

See more details on using hashes here.

File details

Details for the file formulas-1.3.4-py2.py3-none-any.whl.

File metadata

  • Download URL: formulas-1.3.4-py2.py3-none-any.whl
  • Upload date:
  • Size: 126.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for formulas-1.3.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 ee2c9795217885158d9ce8d68a78ea0fb0a96e0814d491aa83759ef96e5645a0
MD5 09ea5ab68a66527aa88468f18c56e57e
BLAKE2b-256 96a04ddef4c1e4ccfa320f8780070fd6364b63a094b0ca6afd5d6fcf8c440669

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page