Skip to main content

RWTH Aachen Computer Science i5/dbis assets for Lecture Datenbanken und Informationssysteme

Project description

DBIS Relational Algebra

pypi PyPI Status

This library provides a Python implementation of the relational algebra.

Features

  • Create relational algebra expressions in Python.
  • Load data from SQLite or directly in memory.
  • Evaluate expressions an get results as relations.
  • Convert expressions to LaTeX math mode.
  • Render relations as Markdown tables.

Installation

Install via pip:

pip install dbis-relational-algebra

Loading Data

From SQLite (recommended)

To load data, an SQLite connection can be used. This connection must be passed to the relational algebra expression for the evaluation.

import sqlite3
from relational_algebra import *

connection = sqlite3.connect("example.db")
relation = Relation("R") # uses table R from the database

Manually

It is also possible to load a relation with data by hand

relation = Relation("R")
relation.add_attributes(["a", "b", "c"])
relation.add_rows([
	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9],
])

Manual data loading is useful for quick examples but suffers from missings SQLite's optimization which results in higher memory usage and slower performance.

Building and Evaluating Expressions

Operators

  • Cross Product / Cartesian Product (*)
  • Difference (-)
  • Division (/)
  • Intersection (&)
  • Left/Right Semijoin
  • Full/Left/Right Outer Join
  • Natural Join
  • Projection
  • Rename
  • Selection
  • Theta Join
  • Union (|)

The set operators Union, Intersection, and Difference require the relations to be union-compatible.

Formulas

For the Theta Join and Selection, a formula is used to specify the join or selection condition. These formulas can be created using the following operators:

  • And
  • Or
  • Not
  • Equals
  • GreaterEquals
  • GreaterThan
  • LessEquals
  • LessThan

In the comparators, two values have to be specified. At least one of these values must be a Python str, which references a column of the relation.

Examples

Example using SQLite to load data

import sqlite3
from relational_algebra import *

connection = sqlite3.connect("example.db")

expr = Projection(
	Selection(
		Relation("R") * Relation("S"),
		Equals("R.b", "S.d")
	),
	["R.a", "S.c"]
)

result = expr.evaluate(sql_con=connection)

print(result.attributes)  # column names
print(result.rows)        # data rows

"""
Alternative for displaying if using jupyter notebooks:
display(Markdown(result.tabulate()))
"""

Example that manually loads data

from relational_algebra import *

relation = Relation("R")
relation.add_attributes(["a", "b"])
relation.add_rows([
	[1, 2],
	[4, 5],
])
relation_2 = Relation("S")
relation_2.add_attributes(["c", "d"])
relation_2.add_rows([
	[6, 3],
	[4, 2],
])

expr = Projection(
	Selection(
		Relation("R") * Relation("S"),
		Equals("R.b", "S.d")
	),
	["R.a", "S.c"]
)

result = expr.evaluate(sql_con=connection)

print(result.attributes)  # column names
print(result.rows)        # data rows

"""
Alternative for displaying if using jupyter notebooks:
display(Markdown(result.tabulate()))
"""

Best Practice

  • Before joining two relations or the cross product of two relations, you should always give column names that appear in both relations a new distinct name.
  • After joining two relations, the cross product of two relations, or some set operation on two relations, you should always give the resulting relation a new distinct name.
  • When referencing a column in a comparator, it is recommended that this column should be referred to using a detailed description, i.e. refer to column a of relation R as "R.a" instead of "a".

Developer Notes

  • Internally, the data is stored in a pandas DataFrame. This accelerates the relational algebra operators greatly.
  • In relational algebra, a column a from a relation R can be referred to as a and R.a. Internally, the column name is always stored using the full name, i.e. R.a. This is done to avoid ambiguities when a column a is present in multiple relations.
  • When joining two relations (or also cross product), the relational algebra provides no guidelines on how the resulting relation should be named. Thus, if a is a column of relation R, joining relations R and S results in a relation, where R.a and S.a might refer to this column a (depending on if a also references a column in S). Thus, generally speaking, joining two relations R and S will internally result in a relation named RS, and the column R.a will now be named RS.a (if there is no column S.a).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbis_relational_algebra-1.1.9.tar.gz (26.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbis_relational_algebra-1.1.9-py3-none-any.whl (36.9 kB view details)

Uploaded Python 3

File details

Details for the file dbis_relational_algebra-1.1.9.tar.gz.

File metadata

  • Download URL: dbis_relational_algebra-1.1.9.tar.gz
  • Upload date:
  • Size: 26.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for dbis_relational_algebra-1.1.9.tar.gz
Algorithm Hash digest
SHA256 f928dddce15ff76098b700e1d1d3d355754a11acc16f70d84496f3eb229dbbaf
MD5 c562de4d610fadbe5fa534b377589972
BLAKE2b-256 5e14c5caea134653cfe446bccf9e5765c95ee5fbcd7335d744855db3e34535b5

See more details on using hashes here.

File details

Details for the file dbis_relational_algebra-1.1.9-py3-none-any.whl.

File metadata

File hashes

Hashes for dbis_relational_algebra-1.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 e22dbac291e340a8a83eab9dab1d4b68e822cf2a8fdf2990218568da4cb1a128
MD5 e580b8e4eb1f400e8ddcd75482b9f5d8
BLAKE2b-256 593ce75857ace8056394844942a8d200d6cbba1f2bf05af413871c5f6e734d39

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page