PyStellarDB

Python interface to StellarDB

Project description

PyStellarDB is a Python API for executing Transwarp Extended OpenCypher (TEoC) and Hive queries. It can also generate an RDD object for use in PySpark. It is based on PyHive (https://github.com/dropbox/PyHive) and PySpark (https://github.com/apache/spark/).

PySpark RDD

The RDD object is built the same way sc.parallelize(data) builds one: from data that is already held in local memory. A query that returns a large amount of data can therefore run out of memory.

If you do need a large result set, use this workaround:

1. If you are querying a graph, refer to Chapter 4.4.5 of the StellarDB manual to save the query data into a temporary table.
2. If you are querying a SQL table, save your query result into a temporary table.
3. Find the HDFS path of the temporary table generated in Step 1 or Step 2.
4. Use an API such as sc.newAPIHadoopFile() to generate the RDD.
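
Step 4 of the workaround could look roughly like the sketch below. It is not part of PyStellarDB: the helper name `temp_table_rdd` is hypothetical, and it assumes the temporary table is stored as plain text with Hive's default `\x01` field delimiter. A table stored in another format (ORC, Parquet, and so on) would need a different input format class.

```python
# Hypothetical helper, not part of PyStellarDB.
# Assumption: the temporary table is a text-format table using Hive's
# default field separator (Ctrl-A, "\x01").
HIVE_DEFAULT_DELIMITER = "\x01"


def temp_table_rdd(sc, hdfs_path, delimiter=HIVE_DEFAULT_DELIMITER):
    """Read a text-format table directory from HDFS as an RDD of column lists.

    `sc` is an existing SparkContext; `hdfs_path` is the table location
    found in Step 3.
    """
    lines = sc.newAPIHadoopFile(
        hdfs_path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",  # key: byte offset of the line
        "org.apache.hadoop.io.Text",          # value: the line itself
    )
    # Drop the offset key and split each line into its columns.
    return lines.map(lambda kv: kv[1].split(delimiter))
```

A call such as `temp_table_rdd(sc, hdfs_path)` leaves the data distributed across the cluster instead of collecting it in the Python process first, which is the point of the workaround.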

Usage

PLAIN Mode (No security is configured)

from pystellardb import stellar_hive

conn = stellar_hive.StellarConnection(host="localhost", port=10000, graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 1')

print(cur.fetchall())
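
The same three statements (set the query language, select the graph, run the query) recur in every graph example in this document. A small wrapper, sketched here under the assumption that the connection behaves like the one above, keeps that pattern in one place. The name `run_cypher` is hypothetical and not part of PyStellarDB.

```python
# Hypothetical helper, not part of PyStellarDB.
def run_cypher(conn, graph_name, query):
    """Switch the session to Cypher, select the graph, run the query.

    Returns every row of the result, so it shares the memory caveat
    of cur.fetchall().
    """
    cur = conn.cursor()
    cur.execute('config query.lang cypher')
    # graph_name is concatenated as-is: pass trusted values only.
    cur.execute('use graph ' + graph_name)
    cur.execute(query)
    return cur.fetchall()
```

With the connection from the example above, `run_cypher(conn, 'pokemon', 'match p = (a)-[f]->(b) return a,f,b limit 1')` would issue the same statements in the same order.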

LDAP Mode

from pystellardb import stellar_hive

conn = stellar_hive.StellarConnection(host="localhost", port=10000, username='hive', password='123456', auth='LDAP', graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 1')

print(cur.fetchall())
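
Hard-coding an LDAP password in a script is rarely desirable. One possible alternative, sketched below, reads the credentials from environment variables and builds the keyword arguments used in the example above. The helper name and the `STELLARDB_*` variable names are assumptions made for illustration; PyStellarDB does not define them.

```python
import os


# Hypothetical helper; the STELLARDB_* variable names are illustrative only.
def ldap_connection_kwargs(graph_name, env=os.environ):
    """Build StellarConnection keyword arguments for LDAP mode."""
    return {
        "host": env.get("STELLARDB_HOST", "localhost"),
        "port": int(env.get("STELLARDB_PORT", "10000")),
        "username": env["STELLARDB_USER"],      # KeyError if unset: fail early
        "password": env["STELLARDB_PASSWORD"],  # KeyError if unset: fail early
        "auth": "LDAP",
        "graph_name": graph_name,
    }
```

The result can be passed on as `stellar_hive.StellarConnection(**ldap_connection_kwargs('pokemon'))`.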

Kerberos Mode

# Make sure /etc/krb5.conf contains the correct realm information for the KDC server
# Make sure you have the correct keytab file in your environment
# Run the kinit command:
# On Linux: kinit -kt FILE_PATH_OF_KEYTAB PRINCIPAL_NAME
# On Mac: kinit -t FILE_PATH_OF_KEYTAB -f PRINCIPAL_NAME

from pystellardb import stellar_hive

conn = stellar_hive.StellarConnection(host="localhost", port=10000, kerberos_service_name='hive', auth='KERBEROS', graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 1')

print(cur.fetchall())
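
A Kerberos connection fails if no valid ticket is present, so a script may want to check for one before connecting. The sketch below is one way to do that on Python 3.5+. It assumes the `klist` command is available and supports the silent `-s` check as MIT Kerberos does; other Kerberos distributions may need a different flag. The helper name `has_kerberos_ticket` is hypothetical.

```python
import subprocess


# Hypothetical helper, not part of PyStellarDB.
# Assumption: `klist -s` exits with status 0 only when the credential
# cache is readable and not expired (MIT Kerberos behaviour).
def has_kerberos_ticket(command=("klist", "-s")):
    """Return True if the ticket check command exits successfully."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The command is not installed or not on PATH.
        return False
    return result.returncode == 0
```

A script could call `has_kerberos_ticket()` first and ask the user to run kinit when it returns False.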

Execute Hive Query

from pystellardb import stellar_hive

# If the `graph_name` parameter is None, the connection executes Hive queries and returns data just as PyHive does
conn = stellar_hive.StellarConnection(host="localhost", port=10000, database='default')
cur = conn.cursor()
cur.execute('SELECT * FROM default.abc limit 10')
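
The example above stops after `execute`. To read the rows with their column names, a helper along these lines may be convenient. It is a sketch that assumes the cursor exposes the DB-API `description` attribute the way PyHive cursors do; the name `rows_as_dicts` is hypothetical.

```python
# Hypothetical helper, not part of PyStellarDB.
# Assumption: the cursor follows the Python DB-API, where each entry of
# cursor.description starts with the column name.
def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]
```

Called as `rows_as_dicts(cur)` after `cur.execute(...)`, it returns one dictionary per row. Like `fetchall()`, it loads the whole result into memory.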

Execute Graph Query and Convert the Result to a PySpark RDD Object

from pyspark import SparkContext
from pystellardb import stellar_hive

sc = SparkContext("local", "Demo App")

conn = stellar_hive.StellarConnection(host="localhost", port=10000, graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 10')

rdd = cur.toRDD(sc)

def f(x): print(x)

rdd.map(lambda x: (x[0].toJSON(), x[1].toJSON(), x[2].toJSON())).foreach(f)

# Each row of this query result is a tuple of (Vertex, Edge, Vertex) objects
# Vertex and Edge objects have a toJSON() method that returns the object in JSON format

Execute Hive Query and Convert the Result to a PySpark RDD Object

from pyspark import SparkContext
from pystellardb import stellar_hive

sc = SparkContext("local", "Demo App")

conn = stellar_hive.StellarConnection(host="localhost", port=10000)
cur = conn.cursor()
cur.execute('select * from default_db.default_table limit 10')

rdd = cur.toRDD(sc)

def f(x): print(x)

rdd.foreach(f)

# Each row of this query result is a tuple of column values, for example (Column, Column, Column)

Dependencies

Required:

Python 2.7+ / Python 3

System SASL

Different systems require different packages to be installed to enable SASL support. Some examples of how to install the packages on different distributions follow.

Ubuntu:

apt-get install libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit
apt-get install python-dev gcc              #Update python and gcc if needed

RHEL/CentOS:

yum install cyrus-sasl-md5 cyrus-sasl-plain cyrus-sasl-gssapi cyrus-sasl-devel
yum install gcc-c++ python-devel.x86_64     #Update python and gcc if needed

Requirements

Install using

pip install 'pystellardb[hive]' for the Hive interface.
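
After installation, a quick check that the expected modules can be found may save some debugging. The sketch below uses only the standard library (Python 3). The helper name is hypothetical, and which module names to check is an assumption: `pystellardb` is the package used throughout this document, while `sasl` and `thrift` are the modules PyHive-style Hive clients typically rely on.

```python
import importlib.util


# Hypothetical helper, not part of PyStellarDB.
def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["pystellardb", "sasl", "thrift"])` returns an empty list when all three can be found.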

PyHive works with

For Hive: HiveServer2 daemon

Testing

Tests are on the way.

Release history

0.13.0 (Oct 12, 2023)
0.12.1 (Oct 12, 2023)
0.12.0 (Sep 7, 2023)
0.11.0 (Mar 22, 2022)
0.10.0 (May 27, 2021)
0.0.9 (Dec 23, 2020)
0.0.8 (Dec 19, 2020), this version
0.0.7 (Jul 20, 2020)
0.0.6 (Jul 13, 2020)
0.0.5 (Jul 13, 2020)
0.0.4 (Jul 13, 2020)
0.0.3 (Jun 15, 2020)
0.0.2 (Jun 15, 2020)

Download files

Source Distribution

PyStellarDB-0.0.8.tar.gz (26.4 kB), uploaded Dec 19, 2020

Built Distribution

PyStellarDB-0.0.8-py2.py3-none-any.whl (11.0 kB, Python 2 and Python 3), uploaded Dec 19, 2020

Hashes for PyStellarDB-0.0.8.tar.gz

Algorithm	Hash digest
SHA256	`c412c8ebeafd02e99d5e8aa7887e25f9c04fef64fae01c1da4f9c5f1a183ad39`
MD5	`98672b9b1700e25ce584ca6a3540f668`
BLAKE2b-256	`92b279b754fbe8e8db9186f8319eb2c22ffedd8877d56c15e56c63b13c165b20`

Hashes for PyStellarDB-0.0.8-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`8ad1892af01c4f8cc793fa8df85568981cb5fe69248c86debb05ef0246d42cfc`
MD5	`32e2c50778953e9dfaa8bc66d73b42b8`
BLAKE2b-256	`abc4a2ef1aa0513de679f344e5a52d3ad5d326dcb57585792150c4a16d734dd6`