
Rainbow Query Language

Project description

RBQL (RainBow Query Language) Description

RBQL provides an SQL-like language that supports SELECT and UPDATE queries with Python expressions.

Official Site

Installation:

$ pip install rbql

Usage example:

$ rbql-py --query "select a1, a2 order by a1" < input.tsv

Main Features

  • Use Python expressions inside SELECT, UPDATE, WHERE and ORDER BY statements
  • The result set of any query immediately becomes a first-class table on its own.
  • Output entries appear in the same order as in the input unless ORDER BY is provided.
  • The input CSV/TSV spreadsheet may contain a varying number of entries per line (but the SELECT query must be written in a way that prevents output of missing values)
  • Works out of the box, no external dependencies.

Supported SQL Keywords (Keywords are case insensitive)

  • SELECT [ TOP N ] [ DISTINCT [ COUNT ] ]
  • UPDATE [ SET ]
  • WHERE
  • ORDER BY ... [ DESC | ASC ]
  • [ [ STRICT ] LEFT | INNER ] JOIN
  • GROUP BY
  • LIMIT N

All keywords have the same meaning as in SQL queries. You can check them online.

RBQL-specific keywords, rules and limitations

  • JOIN statements must have the following form: <JOIN_KEYWORD> (/path/to/table.tsv | table_name ) ON ai == bj
  • UPDATE SET is a synonym for UPDATE, because in RBQL there is no need to specify the source table.
  • UPDATE has the same meaning as in SQL, but it can also be considered a special type of SELECT query.
  • TOP and LIMIT have identical meaning. Use whichever you prefer.
  • DISTINCT COUNT is like DISTINCT, but adds a new column to the "distinct" result set: the number of occurrences of the entry, similar to the Unix command uniq -c.
  • STRICT LEFT JOIN is like LEFT JOIN, but generates an error if any key in the left table "A" doesn't have exactly one matching key in the right table "B".
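
The two RBQL-specific behaviours above, DISTINCT COUNT and STRICT LEFT JOIN, can be pictured with a short Python sketch. This is only an illustration of the described semantics, not RBQL's actual code: the function names, the position of the count column (first, as in uniq -c), and the error type are all assumptions.

```python
from collections import Counter


def distinct_count(rows):
    """Sketch of DISTINCT COUNT: one output row per distinct input row,
    with the number of occurrences added as an extra column.
    Placing the count first is an assumption borrowed from `uniq -c`."""
    counts = Counter(tuple(row) for row in rows)
    # Counter keeps first-seen order on Python 3.7+.
    return [(n,) + row for row, n in counts.items()]


def strict_left_join(table_a, table_b, key_a, key_b):
    """Sketch of STRICT LEFT JOIN: every key in A must match exactly
    one row in B, otherwise an error is raised.
    key_a and key_b are 0-based column indexes (hypothetical parameters)."""
    index = {}
    for row in table_b:
        index.setdefault(row[key_b], []).append(row)
    joined = []
    for row in table_a:
        matches = index.get(row[key_a], [])
        if len(matches) != 1:
            raise ValueError(
                "key %r has %d matches in B, expected exactly 1"
                % (row[key_a], len(matches))
            )
        joined.append(row + matches[0])
    return joined
```

For example, `strict_left_join([["1", "US"]], [["US", "Washington"]], 1, 0)` joins on the second column of A and the first column of B, which corresponds to the condition `a2 == b1`.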

Special variables

Variable Name       Variable Type   Variable Description
a1, a2,..., a{N}    string          Value of i-th column
b1, b2,..., b{N}    string          Value of i-th column in join table B
NR                  integer         Line number (1-based)
NF                  integer         Number of fields in line
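
To make the table concrete, here is a minimal sketch of how these variables could be derived from one tab-separated input line. The helper name `bind_variables` and the dictionary representation are hypothetical; RBQL itself may bind the variables differently.

```python
def bind_variables(line, line_number):
    """Return the special variables for one tab-separated input line.
    `line_number` is the 1-based position of the line in the input."""
    fields = line.rstrip("\n").split("\t")
    variables = {"NR": line_number, "NF": len(fields)}
    # a1 is the first column, a2 the second, and so on; all values stay strings.
    for i, value in enumerate(fields, start=1):
        variables["a%d" % i] = value
    return variables
```

Because every a{i} is a string, numeric comparisons in a query need an explicit conversion such as int(a2) or float(a1).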

Aggregate functions and queries

RBQL supports the following aggregate functions, which can also be used with GROUP BY keyword:
COUNT(), MIN(), MAX(), SUM(), AVG(), VARIANCE(), MEDIAN()

Limitations

  • Aggregate functions are CASE SENSITIVE and must be CAPITALIZED.
  • It is illegal to use aggregate functions inside Python expressions, although you can use expressions inside aggregate functions. E.g. MAX(float(a1) / 1000) is legal; MAX(a1) / 1000 is illegal.
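
The legal form can be read as "evaluate the inner expression for every row, then aggregate". The sketch below emulates a query like select a2, MAX(float(a1) / 1000) group by a2 in plain Python; the function name and the dictionary result are assumptions made for illustration and do not reflect how RBQL implements aggregation.

```python
def max_by_group(rows):
    """Emulate: select a2, MAX(float(a1) / 1000) group by a2.
    `rows` is a list of field lists; a[0] is a1 and a[1] is a2."""
    best = {}
    for a in rows:
        key = a[1]
        # The expression inside MAX(...) is evaluated once per row.
        value = float(a[0]) / 1000
        if key not in best or value > best[key]:
            best[key] = value
    return best
```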

Examples of RBQL queries

With Python expressions

  • select top 100 a1, int(a2) * 10, len(a4) where a1 == "Buy" order by int(a2)
  • select * order by random.random() - random sort, this is an equivalent of bash command sort -R
  • select top 20 len(a1) / 10, a2 where a2 in ["car", "plane", "boat"] - use Python's "in" to emulate SQL's "in"
  • select len(a1) / 10, a2 where a2 in ["car", "plane", "boat"] limit 20
  • update set a3 = 'US' where a3.find('of America') != -1
  • select * where NR <= 10 - this is an equivalent of bash command "head -n 10", NR is 1-based
  • select a1, a4 - this is an equivalent of bash command "cut -f 1,4"
  • select * order by int(a2) desc - this is an equivalent of bash command "sort -k2,2 -r -n"
  • select NR, * - enumerate lines, NR is 1-based
  • select * where re.match(".*ab.*", a1) is not None - select entries where first column has "ab" pattern
  • select a1, b1, b2 inner join ./countries.txt on a2 == b1 order by a1, a3 - an example of join query
  • select distinct count len(a1) where a2 != 'US'
  • select MAX(a1), MIN(a1) where a2 != 'US' group by a2, a3
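
For readers more familiar with Python than SQL, the first query in the list above corresponds roughly to the function below. It is a hand-written equivalent, not RBQL output, and it assumes that TOP is applied after ORDER BY, as in SQL; the function name `top_buys` is hypothetical.

```python
def top_buys(rows, limit=100):
    """Hand-written equivalent of:
    select top 100 a1, int(a2) * 10, len(a4) where a1 == "Buy" order by int(a2)
    `rows` is a list of field lists, so a[0] is a1, a[1] is a2, a[3] is a4."""
    selected = [a for a in rows if a[0] == "Buy"]      # WHERE
    selected.sort(key=lambda a: int(a[1]))             # ORDER BY
    return [[a[0], int(a[1]) * 10, len(a[3])]          # SELECT
            for a in selected[:limit]]                 # TOP
```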

FAQ

How does RBQL work?

RBQL parses the SQL-like user query, creates a new Python worker module, then imports and executes it.

The following example explains a simplified Python version of the RBQL algorithm.

  1. The user enters the following query, which is stored as a string Q:
    SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL'
  2. RBQL replaces all a{i} substrings in the query string Q with a[{i - 1}] substrings. The result is the following string:
    Q = "SELECT a[2], int(a[3]) + 100, len(a[1]) WHERE a[0] != 'SELL'"
  3. RBQL searches for the "SELECT" and "WHERE" keywords in the query string Q, throws the keywords away, and puts the text that follows each keyword into two variables, S (the select part) and W (the where part), so we get:
    S = "a[2], int(a[3]) + 100, len(a[1])"
    W = "a[0] != 'SELL'"
  4. RBQL has a static template script which looks like this:
    import sys
    for line in sys.stdin:
        # split the tab-separated line into a list of fields
        a = line.rstrip('\n').split('\t')
        if %%%W_Expression%%%:
            out_fields = [%%%S_Expression%%%]
            print('\t'.join([str(v) for v in out_fields]))
  5. RBQL replaces %%%W_Expression%%% with W and %%%S_Expression%%% with S, so we get the following script:
    import sys
    for line in sys.stdin:
        # split the tab-separated line into a list of fields
        a = line.rstrip('\n').split('\t')
        if a[0] != 'SELL':
            out_fields = [a[2], int(a[3]) + 100, len(a[1])]
            print('\t'.join([str(v) for v in out_fields]))
  6. RBQL runs the patched script against the user's data file:
    python tmp_script.py < data.tsv > result.tsv

The result set of the original query (SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL') is written to the "result.tsv" file. Clearly, this simplified version can only work with tab-separated files.
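
The steps above can be condensed into one runnable sketch. It follows the same simplified algorithm, with a few assumptions that are not part of RBQL: the names `TEMPLATE` and `compile_query` are hypothetical, the generated code is wrapped in a function and run with exec() instead of being written to a temporary script, and lines come from any iterable rather than sys.stdin. The naive regular expression also rewrites a{i} inside string literals, which real query parsing would have to avoid.

```python
import re

# Static template; the two placeholders are filled in per query.
# A raw string keeps '\n' and '\t' as escapes for the generated code.
TEMPLATE = r'''
def run(lines):
    out = []
    for line in lines:
        a = line.rstrip('\n').split('\t')
        if %%%W_Expression%%%:
            out_fields = [%%%S_Expression%%%]
            out.append('\t'.join([str(v) for v in out_fields]))
    return out
'''


def compile_query(query):
    """Turn a 'SELECT ... WHERE ...' query into a Python function."""
    # Step 2: replace a{i} with a[{i - 1}].
    q = re.sub(r'\ba([1-9][0-9]*)\b',
               lambda m: 'a[%d]' % (int(m.group(1)) - 1), query)
    # Step 3: split the query into the select part S and the where part W.
    match = re.match(r'\s*SELECT\s+(.*?)\s+WHERE\s+(.*)$', q, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("only 'SELECT ... WHERE ...' queries are supported")
    s_part, w_part = match.groups()
    # Step 5: patch the template.
    code = (TEMPLATE.replace('%%%W_Expression%%%', w_part)
                    .replace('%%%S_Expression%%%', s_part))
    # Step 6: execute the patched code and hand back the worker function.
    namespace = {}
    exec(code, namespace)
    return namespace['run']
```

Usage: `compile_query("SELECT a3, int(a4) + 100, len(a2) WHERE a1 != 'SELL'")` returns a function that takes an iterable of tab-separated lines and returns the output lines as a list of strings.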

Is this technology reliable?

It should be: the RBQL scripts total only 1000 to 2000 lines (depending on how you count them) and have no external dependencies. There is no complex logic; even the query parsing functions are very simple. If something goes wrong, RBQL shows an error instead of producing incorrect output. There are also currently 5 different warning types.

References

  • rbql-js CLI App for Node.js - npm
  • rbql-py CLI App in Python
  • Rainbow CSV extension with integrated RBQL in Visual Studio Code
  • Rainbow CSV extension with integrated RBQL in Vim
  • Rainbow CSV extension with integrated RBQL in Sublime Text 3

Download files

Source Distribution

rbql-0.3.0.tar.gz (19.5 kB)

Built Distribution

rbql-0.3.0-py3-none-any.whl (18.6 kB)
