RWTH Aachen Computer Science i5/dbis assets for Lecture Datenbanken und Informationssysteme
Project description
DBIS Relational Algebra
This library provides a Python implementation of the relational algebra.
Features
- Create relational algebra expressions in Python.
- Load data from SQLite or directly in memory.
- Evaluate expressions an get results as relations.
- Convert expressions to LaTeX math mode.
- Render relations as Markdown tables.
Installation
Install via pip:
pip install dbis-relational-algebra
Loading Data
From SQLite (recommended)
To load data, an SQLite connection can be used. This connection must be passed to the relational algebra expression for the evaluation.
import sqlite3
from relational_algebra import *
connection = sqlite3.connect("example.db")
relation = Relation("R") # uses table R from the database
Manually
It is also possible to load a relation with data by hand
relation = Relation("R")
relation.add_attributes(["a", "b", "c"])
relation.add_rows([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
])
Manual data loading is useful for quick examples but suffers from missings SQLite's optimization which results in higher memory usage and slower performance.
Building and Evaluating Expressions
Operators
- Cross Product / Cartesian Product (
*) - Difference (
-) - Division (
/) - Intersection (
&) - Left/Right Semijoin
- Full/Left/Right Outer Join
- Natural Join
- Projection
- Rename
- Selection
- Theta Join
- Union (
|)
The set operators Union, Intersection, and Difference require the relations to be union-compatible.
Formulas
For the Theta Join and Selection, a formula is used to specify the join or selection condition. These formulas can be created using the following operators:
- And
- Or
- Not
- Equals
- GreaterEquals
- GreaterThan
- LessEquals
- LessThan
In the comparators, two values have to be specified. At least one of these values must be a Python str, which references a column of the relation.
Examples
Example using SQLite to load data
import sqlite3
from relational_algebra import *
connection = sqlite3.connect("example.db")
expr = Projection(
Selection(
Relation("R") * Relation("S"),
Equals("R.b", "S.d")
),
["R.a", "S.c"]
)
result = expr.evaluate(sql_con=connection)
print(result.attributes) # column names
print(result.rows) # data rows
"""
Alternative for displaying if using jupyter notebooks:
display(Markdown(result.tabulate()))
"""
Example that manually loads data
from relational_algebra import *
relation = Relation("R")
relation.add_attributes(["a", "b"])
relation.add_rows([
[1, 2],
[4, 5],
])
relation_2 = Relation("S")
relation_2.add_attributes(["c", "d"])
relation_2.add_rows([
[6, 3],
[4, 2],
])
expr = Projection(
Selection(
Relation("R") * Relation("S"),
Equals("R.b", "S.d")
),
["R.a", "S.c"]
)
result = expr.evaluate(sql_con=connection)
print(result.attributes) # column names
print(result.rows) # data rows
"""
Alternative for displaying if using jupyter notebooks:
display(Markdown(result.tabulate()))
"""
Best Practice
- Before joining two relations or the cross product of two relations, you should always give column names that appear in both relations a new distinct name.
- After joining two relations, the cross product of two relations, or some set operation on two relations, you should always give the resulting relation a new distinct name.
- When referencing a column in a comparator, it is recommended that this column should be referred to using a detailed description, i.e. refer to column
aof relationRas"R.a"instead of"a".
Developer Notes
- Internally, the data is stored in a pandas DataFrame. This accelerates the relational algebra operators greatly.
- In relational algebra, a column
afrom a relationRcan be referred to asaandR.a. Internally, the column name is always stored using the full name, i.e.R.a. This is done to avoid ambiguities when a columnais present in multiple relations. - When joining two relations (or also cross product), the relational algebra provides no guidelines on how the resulting relation should be named. Thus, if
ais a column of relationR, joining relationsRandSresults in a relation, whereR.aandS.amight refer to this columna(depending on ifaalso references a column inS). Thus, generally speaking, joining two relationsRandSwill internally result in a relation namedRS, and the columnR.awill now be namedRS.a(if there is no columnS.a).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dbis_relational_algebra-1.1.9.tar.gz.
File metadata
- Download URL: dbis_relational_algebra-1.1.9.tar.gz
- Upload date:
- Size: 26.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f928dddce15ff76098b700e1d1d3d355754a11acc16f70d84496f3eb229dbbaf
|
|
| MD5 |
c562de4d610fadbe5fa534b377589972
|
|
| BLAKE2b-256 |
5e14c5caea134653cfe446bccf9e5765c95ee5fbcd7335d744855db3e34535b5
|
File details
Details for the file dbis_relational_algebra-1.1.9-py3-none-any.whl.
File metadata
- Download URL: dbis_relational_algebra-1.1.9-py3-none-any.whl
- Upload date:
- Size: 36.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e22dbac291e340a8a83eab9dab1d4b68e822cf2a8fdf2990218568da4cb1a128
|
|
| MD5 |
e580b8e4eb1f400e8ddcd75482b9f5d8
|
|
| BLAKE2b-256 |
593ce75857ace8056394844942a8d200d6cbba1f2bf05af413871c5f6e734d39
|