
RWTH Aachen Computer Science i5/dbis assets for Lecture Datenbanken und Informationssysteme


DBIS Relational Algebra


This library provides a Python implementation of the relational algebra.

Features

  • Create expressions of the relational algebra in Python.
  • Load data from SQLite tables.
  • Evaluate expressions on the data.
  • Convert these expressions to text in LaTeX math mode.
  • Convert a relation / the result of an expression to a Markdown table.

Installation

Install via pip:

pip install dbis-relational-algebra

Usage

Overview of supported operators

  • Cross Product / Cartesian Product (*)
  • Difference (-)
  • Division (/)
  • Intersection (&)
  • Left Semijoin
  • Natural Join
  • Projection
  • Rename
  • Right Semijoin
  • Selection
  • Theta Join
  • Union (|)

The set operators Union, Intersection, and Difference require the relations to be union-compatible.
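To illustrate what union-compatibility means, here is a minimal plain-Python sketch that does not use this library. The helper names `union_compatible` and `union` are hypothetical, and the check assumes that two relations are compatible when their unqualified column names match in the same order; the library's actual check may differ.

```python
def union_compatible(attrs_r, attrs_s):
    """Assumption: compatible means same unqualified column names, same order."""
    def unqualified(name):
        return name.split(".")[-1]
    return [unqualified(a) for a in attrs_r] == [unqualified(a) for a in attrs_s]


def union(attrs_r, rows_r, attrs_s, rows_s):
    """Set union of two relations given as (attribute list, set of row tuples)."""
    if not union_compatible(attrs_r, attrs_s):
        raise ValueError("relations are not union-compatible")
    return rows_r | rows_s


r_rows = {(1, 2), (3, 4)}
s_rows = {(3, 4), (5, 6)}
combined = union(["R.a", "R.b"], r_rows, ["S.a", "S.b"], s_rows)
```

Because relations are sets, the row `(3, 4)` that occurs in both inputs appears only once in `combined`.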

Formulas

For the Theta Join and the Selection, a formula is used to specify the join or selection condition. These formulas can be created using the following operators:

  • And
  • Or
  • Not
  • Equals
  • GreaterEquals
  • GreaterThan
  • LessEquals
  • LessThan

Each comparator takes two values. At least one of these values must be a Python str, which references a column of the relation.
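The way such a formula could be evaluated against a single row can be sketched in plain Python. This is only an illustration, not the library's implementation: the helpers `resolve`, `compare`, `negate`, and `both` are hypothetical, and the sketch assumes that every str names a column while any other value is a constant.

```python
import operator


def resolve(value, row):
    # Assumption: a str is a column reference, anything else is a constant.
    return row[value] if isinstance(value, str) else value


def compare(op, left, right):
    """Build a predicate that compares two resolved values with `op`."""
    return lambda row: op(resolve(left, row), resolve(right, row))


def negate(predicate):
    return lambda row: not predicate(row)


def both(p, q):
    return lambda row: p(row) and q(row)


row = {"R.a": 1, "S.b": 2}

# NOT (R.a = S.b) AND (S.b >= 2)
condition = both(
    negate(compare(operator.eq, "R.a", "S.b")),
    compare(operator.ge, "S.b", 2),
)
matches = condition(row)
```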

Loading data & Evaluating an expression

To load data, an SQLite connection can be used (recommended). This connection must be passed to the relational algebra expression for the evaluation.
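As a sketch of how such a connection might be prepared, the following uses only Python's standard `sqlite3` module to build an in-memory database. The table names `R` and `S` and their columns are made up for illustration; the assumption is that the tables are named after the relations used in the expressions.

```python
import sqlite3

# In-memory database; a file path works the same way.
connection = sqlite3.connect(":memory:")

connection.execute("CREATE TABLE R (a INTEGER, b INTEGER, c INTEGER)")
connection.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
)

connection.execute("CREATE TABLE S (b INTEGER, d INTEGER)")
connection.executemany("INSERT INTO S VALUES (?, ?)", [(2, 10), (8, 20)])
connection.commit()

# Quick check that the data is in place.
r_rows = connection.execute("SELECT a, b, c FROM R ORDER BY a").fetchall()
```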

It is also possible to load a relation with data by hand (not recommended):

relation = Relation(name="R")
relation.add_attributes(["a", "b", "c"])
relation.add_rows([
	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9],
])

An expression can be created by using the operators and formulas listed above. The expression can then be evaluated on the data:

# connection: the SQLite connection that holds the data
# Cross product R x S
expression = Relation("R") * Relation("S")
result = expression.evaluate(sql_con=connection)
# Theta join of R and S on the condition NOT (R.a = S.b)
expression = ThetaJoin("R", "S", Not(Equals("R.a", "S.b")))
result = expression.evaluate(sql_con=connection)

The rows and column names of a relation (result) are accessible using the following attributes:

result.attributes # list of column names (str)
result.rows # set of rows (tuple)
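Since the rows form a set, they have no defined order. A minimal sketch of turning such a result into a deterministic list of records follows; the values of `attributes` and `rows` are made-up stand-ins for `result.attributes` and `result.rows`.

```python
# Stand-ins for result.attributes and result.rows (illustrative values).
attributes = ["R.a", "S.b"]
rows = {(4, 8), (1, 2)}

# Sort for a stable order, then pair each value with its column name.
records = [dict(zip(attributes, row)) for row in sorted(rows)]
```

Sorting only works when the values in each column are mutually comparable, which holds for the integers used here.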

Best practices:

  • After a join or a cross product of two relations, rename every column whose name appears in both relations so that each column has a distinct name.
  • After a join, a cross product, or a set operation on two relations, give the resulting relation a new, distinct name.
  • When referencing a column in a comparator, use its qualified name, i.e. refer to column a of relation R as "R.a" instead of "a".

Developer Notes

A few design choices were made:

  • Internally, the data is stored in a pandas DataFrame, which considerably speeds up the relational algebra operators.
  • In relational algebra, a column a of a relation R can be referred to as either a or R.a. Internally, the column name is always stored using the full name, i.e. R.a. This avoids ambiguities when a column a is present in multiple relations.
  • Relational algebra does not prescribe a name for the relation that results from a join (or a cross product). If a is a column of relation R, then after joining R and S, both R.a and S.a might refer to this column a (depending on whether a is also a column of S). Therefore, joining two relations R and S generally results internally in a relation named R+S, and the column R.a is renamed to R+S.a (provided there is no column S.a).
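The naming convention described in the last point can be sketched as follows. The function `joined_column_names` is hypothetical and not part of the library. The renaming of unshared columns to the R+S prefix follows the description above; keeping the original qualifier for columns that exist in both relations is an assumption, since the notes do not spell out that case.

```python
def joined_column_names(name_r, cols_r, name_s, cols_s):
    """Return the qualified column names after joining relations R and S."""
    joined = f"{name_r}+{name_s}"
    shared = set(cols_r) & set(cols_s)
    result = []
    for relation, cols in ((name_r, cols_r), (name_s, cols_s)):
        for col in cols:
            # Assumption: shared columns keep their original qualifier.
            prefix = relation if col in shared else joined
            result.append(f"{prefix}.{col}")
    return result


names = joined_column_names("R", ["a", "b"], "S", ["b", "c"])
# names == ["R+S.a", "R.b", "S.b", "R+S.c"]
```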
