TenSQL

Relational Database Management Systems (RDBMS) have been the most prominent form of database in the world for several decades. While relational databases are often used in high-frequency/low-volume transactional applications such as website backends, their poor performance on low-frequency/high-volume queries often precludes their use in big data analysis fields like graph analytics. This work explores the construction of an RDBMS that uses the GraphBLAS API to execute Structured Query Language (SQL) queries, with the goal of improving performance on high-volume queries. Tables are redefined as collections of sparse scalars, vectors, matrices, and, more generally, sparse tensors. The explicit values (nonzeros) in these sparse tensors determine which rows exist in a table and which of its values are NULL. A prototype database called TenSQL was built and evaluated against several SQL implementations, including PostgreSQL. Preliminary results on queries common in graph analysis applications show speedups of up to 1,400x over PostgreSQL for moderately sized datasets when results are returned in a columnar format.
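As a rough illustration of the table model described above, the following pure-Python sketch stores each column as a sparse vector (a dict from row index to value). It is not TenSQL's API or storage format, and it does not use GraphBLAS. The class `SparseTable`, its methods, and the rule that a row exists whenever any column holds an explicit value at that index are hypothetical assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List, Optional


class SparseTable:
    """Hypothetical table whose columns are sparse vectors keyed by row index."""

    def __init__(self, columns: Iterable[str]) -> None:
        # One sparse vector (dict: row index -> value) per column.
        self.columns: Dict[str, Dict[int, Any]] = {name: {} for name in columns}

    def insert(self, row: int, **values: Any) -> None:
        # None stands for NULL and is simply not stored.
        for name, value in values.items():
            if value is not None:
                self.columns[name][row] = value

    def rows(self) -> List[int]:
        # In this sketch, a row exists if any column holds an explicit value there.
        present = set()
        for vector in self.columns.values():
            present.update(vector)
        return sorted(present)

    def get(self, row: int, name: str) -> Optional[Any]:
        # An absent entry reads back as None (NULL).
        return self.columns[name].get(row)

    def where_not_null(self, name: str) -> List[int]:
        # "WHERE name IS NOT NULL" is just the sparsity pattern of one column.
        return sorted(self.columns[name])


t = SparseTable(["name", "age"])
t.insert(0, name="alice", age=30)
t.insert(1, name="bob", age=None)
t.insert(5, name=None, age=41)
print(t.rows())                 # [0, 1, 5]
print(t.get(1, "age"))          # None
print(t.where_not_null("age"))  # [0, 5]
```

Under these assumptions, a predicate such as `WHERE age IS NOT NULL` reduces to reading the stored indices of a single column, without touching any other column.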

Authors

TenSQL was created by Sandia National Laboratories, with assistance from the University of Utah.

Installation

TenSQL has only been tested with Python 3.9. Python 3.10 is too new for the version of NumPy supported by pygraphblas.
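If you script your setup, a small guard like the one below can warn before installing on an untested interpreter. This is a hypothetical helper, not part of TenSQL; it treats every version other than 3.9 as untested because that is the only version listed as tested.

```python
import sys

# Assumption: Python 3.9 is the only tested version, so any other
# major.minor pair is reported as untested.
TESTED_VERSION = (3, 9)


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True on the tested interpreter; warn on stderr and return False otherwise."""
    major_minor = tuple(version_info[:2])
    if major_minor != TESTED_VERSION:
        print(
            f"warning: TenSQL has only been tested with Python "
            f"{TESTED_VERSION[0]}.{TESTED_VERSION[1]}; found "
            f"{major_minor[0]}.{major_minor[1]}",
            file=sys.stderr,
        )
        return False
    return True
```

Calling `check_python_version()` with no arguments checks the running interpreter; passing a tuple such as `(3, 10, 0)` lets you test the logic without switching Python versions.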

To install from PyPI:

pip install tensql

To install from source:

git clone 'https://github.com/sandialabs/TenSQL.git'
cd TenSQL
pip install .
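To confirm that the package landed in the active environment, you can query its installed metadata with the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical; it assumes the distribution is named `tensql`, as in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "tensql") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # Not installed in the current environment.
        return None
```

After a successful installation, `installed_version()` should return a version string rather than `None`.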

Testing

To run the tests, first clone the source code from GitHub, then build the extensions and install the testing dependencies:

git clone 'https://github.com/sandialabs/TenSQL.git'
cd TenSQL
python setup.py build_ext --inplace
pip install -e ".[test]"

The tests can then be run either with the run_tests.py script, which also reports code coverage:

python3 run_tests.py

Or with Python's built-in unittest module:

python3 -m unittest -v tensql.test

Specific tests can be run via the unittest module:

python3 -m unittest -v tensql.test.test_queries.xAy.TestQuery_xAy
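The same dotted names also work from Python through the standard `unittest` loader, which can be convenient in a notebook or a custom harness. The helper `run_named_tests` is hypothetical and not part of TenSQL; it simply wraps `unittest.defaultTestLoader.loadTestsFromName` and `unittest.TextTestRunner`.

```python
import unittest


def run_named_tests(name: str, verbosity: int = 2) -> unittest.TestResult:
    """Load tests by dotted name (module, class, or method) and run them."""
    # loadTestsFromName resolves a dotted name to a module, TestCase class, or test method.
    suite = unittest.defaultTestLoader.loadTestsFromName(name)
    # TextTestRunner reports progress on stderr and returns the result object.
    return unittest.TextTestRunner(verbosity=verbosity).run(suite)
```

For example, `run_named_tests("tensql.test.test_queries.xAy.TestQuery_xAy")` runs the class from the command above, assuming the extensions have been built in place. Check `wasSuccessful()` on the returned result to gate a script on the outcome.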

Note: Certain tests for memory leaks can take about a minute to execute.

Running Benchmarks

To run the benchmarks, first clone the source code from GitHub, then build the extensions and install the testing and benchmarking dependencies:

git clone 'https://github.com/sandialabs/TenSQL.git'
cd TenSQL
python setup.py build_ext --inplace
pip install -e ".[test,benchmark]"

You must also install PostgreSQL 15 to run the PostgreSQL benchmarks.

Once everything is installed, you can run the benchmarks via Slurm with:

bash download_benchmark_data.sh
bash benchmark_twohop.sh
bash benchmark_ingest_and_named_edges.sh
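This README does not define the two-hop query itself. Assuming, for illustration only, that it is a self-join over an edge table `edges(src, dst)` like the one in the docstring below, a GraphBLAS-backed engine can evaluate it as the sparse matrix product A·A of the adjacency matrix A. The sketch computes that product with plain dictionaries rather than GraphBLAS; the table name, column names, and functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]


def adjacency(edges: Iterable[Edge]) -> Dict[int, Dict[int, int]]:
    """Build a sparse adjacency matrix {row: {col: 1}}; duplicate edges collapse to one entry."""
    A: Dict[int, Dict[int, int]] = defaultdict(dict)
    for src, dst in edges:
        A[src][dst] = 1
    return A


def two_hop_counts(edges: Iterable[Edge]) -> Dict[Edge, int]:
    """Sparse A @ A over the (+, *) semiring.

    Entry (i, k) counts the paths i -> j -> k, matching (for unique edges):
      SELECT e1.src, e2.dst, COUNT(*)
      FROM edges e1 JOIN edges e2 ON e1.dst = e2.src
      GROUP BY e1.src, e2.dst
    """
    A = adjacency(edges)
    result: Dict[Edge, int] = defaultdict(int)
    for i, row in A.items():
        for j, a_ij in row.items():
            # Only stored (nonzero) entries are visited, as in a sparse product.
            # dict.get does not insert missing keys into the defaultdict.
            for k, a_jk in A.get(j, {}).items():
                result[(i, k)] += a_ij * a_jk
    return dict(result)
```

Because the loops visit only stored entries, the work grows with the number of edges and two-hop paths rather than with the square of the number of vertices.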

Alternatively, you can run single benchmarks without Slurm:

bash download_benchmark_data.sh
bash single_twohop.sh "`pwd`/tmp" "`pwd`/results" all
bash single_ingest_and_named_edges.sh "`pwd`/tmp" "`pwd`/results" all
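If you drive the benchmarks from Python instead of the shell, the commands above can be assembled as argument lists and passed to `subprocess.run`. The helpers below are hypothetical, and the meaning of the positional arguments (a scratch directory, a results directory, and a benchmark selector) is inferred from the example; `all` is the only selector shown. `workdir` is assumed to be the root of the TenSQL checkout.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def single_benchmark_cmd(script: str, workdir: Optional[PathLike] = None,
                         selection: str = "all") -> List[str]:
    """Build the argv for: bash SCRIPT "$PWD/tmp" "$PWD/results" SELECTION."""
    base = Path(workdir) if workdir is not None else Path.cwd()
    return ["bash", script, str(base / "tmp"), str(base / "results"), selection]


def run_single_benchmark(script: str, workdir: Optional[PathLike] = None,
                         selection: str = "all") -> subprocess.CompletedProcess:
    """Run one benchmark script from the checkout root (hypothetical wrapper)."""
    base = Path(workdir) if workdir is not None else Path.cwd()
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(single_benchmark_cmd(script, base, selection),
                          cwd=base, check=True)
```

For example, `run_single_benchmark("single_twohop.sh")` from the repository root mirrors the first command above, assuming the benchmark data has already been downloaded.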

Note: You will likely need to tune the settings in postgresql.conf if your system has less memory than our benchmarking system.

Note: Running some benchmarks requires a very large amount of memory (hundreds of gigabytes).

Citing TenSQL

TenSQL was described in the paper "TenSQL: An SQL Database Built on GraphBLAS", which was accepted by the IEEE High Performance Extreme Computing Virtual Conference in September 2023. It has not yet been published in IEEE Xplore.

Roose, J. P., Vaidya, M., Sadayappan, P., & Rajamanickam, S. (2023). TenSQL: An SQL Database Built on GraphBLAS. IEEE High Performance Extreme Computing Virtual Conference, forthcoming.

BSD 3-Clause License

Copyright 2023 National Technology & Engineering Solutions of Sandia, LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

tensql-1.0.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl (427.3 kB)

Uploaded for CPython 3.9; manylinux: glibc 2.5+ and glibc 2.28+; x86-64
