PyStellarDB·PyPI

Python interface to StellarDB

Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Topic
- Database :: Front-Ends

Project description

PyStellarDB

PyStellarDB is a Python API for executing Transwarp Extended OpenCypher (TEoC) and Hive queries. It can also generate an RDD object for use in PySpark. It is based on PyHive (https://github.com/dropbox/PyHive) and PySpark (https://github.com/apache/spark/).

PySpark RDD

We generate the RDD object using the same approach as sc.parallelize(data), which means the whole query result is held in the client's memory first. A query that returns a large amount of data can therefore run out of memory.

If you do need to load a large result set, use the following workaround:

1. If you are querying a graph, save the query result into a temporary table as described in Chapter 4.4.5 of the StellarDB manual.
2. If you are querying a SQL table, save the query result into a temporary table.
3. Find the HDFS path of the temporary table created in Step 1 or Step 2.
4. Use an API such as sc.newAPIHadoopFile() to generate the RDD from that path.
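
The last step of the workaround might look roughly like the sketch below. It is an illustration under assumptions, not the project's own code: the temporary table is assumed to be stored as plain text with Hive's default field delimiter (\x01), and the helper names and the HDFS path argument are hypothetical.

```python
# Sketch only: assumes the temporary table is stored as TEXTFILE with Hive's
# default field delimiter (\x01). Helper names are hypothetical.

HIVE_DEFAULT_DELIMITER = "\x01"


def parse_hive_text_line(line, delimiter=HIVE_DEFAULT_DELIMITER):
    """Split one text line of a Hive table into a tuple of column strings."""
    return tuple(line.split(delimiter))


def load_table_as_rdd(sc, hdfs_path):
    """Read the table files under hdfs_path into an RDD of column tuples.

    `sc` is an existing pyspark.SparkContext. The files are read by Spark
    itself, so the rows are not collected in the client process first.
    """
    lines = sc.newAPIHadoopFile(
        hdfs_path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )
    # Each record is (byte offset, line text); keep only the text.
    return lines.map(lambda kv: parse_hive_text_line(kv[1]))
```

If the temporary table uses another storage format (ORC, Parquet, a custom delimiter), the input format class and the parsing step have to change accordingly.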

Usage

PLAIN Mode (No security is configured)

"""version < 1.0 """
from pystellardb import stellar_hive

conn = stellar_hive.StellarConnection(host="localhost", port=10000, graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 1')

print(cur.fetchall())

"""version >= 1.0 """
from pystellardb import Connection, Graph
conn = Connection(host="localhost", port=10000)
graph = Graph('pokemon', conn)
query_result = graph.execute('match p = (a)-[f]->(b) return a,f,b limit 1')
for row in query_result:
    print(row[0].toJSON(), row[1].toJSON(), row[2].toJSON())
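
None of the snippets here close the connection. For the cursor-based usage (version < 1.0), a minimal sketch of releasing the cursor and connection even when a statement fails is given below. It assumes the connection and cursor expose close() in the DB-API style, as PyHive's do; fetch_all is a hypothetical helper, not part of PyStellarDB.

```python
from contextlib import closing


def fetch_all(conn, statements):
    """Run each statement in order and return the rows of the last one.

    Assumes `conn` follows the Python DB-API (cursor(), close()) the way a
    PyHive connection does. Both the cursor and the connection are closed
    when the call returns or raises.
    """
    with closing(conn), closing(conn.cursor()) as cur:
        for statement in statements:
            cur.execute(statement)
        return cur.fetchall()

# Usage with a connection created as in the version < 1.0 example:
# rows = fetch_all(conn, ['config query.lang cypher',
#                         'use graph pokemon',
#                         'match p = (a)-[f]->(b) return a,f,b limit 1'])
```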

LDAP Mode

"""version < 1.0 """
from pystellardb import stellar_hive

conn = stellar_hive.StellarConnection(host="localhost", port=10000, username='hive', password='123456', auth='LDAP', graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 1')

print(cur.fetchall())

"""version >= 1.0 """
from pystellardb import Connection, Graph
conn = Connection(host="localhost", port=10000, username='hive', password='123456', auth='LDAP')
graph = Graph('pokemon', conn)
query_result = graph.execute('match p = (a)-[f]->(b) return a,f,b limit 1')
for row in query_result:
    print(row[0].toJSON(), row[1].toJSON(), row[2].toJSON())
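
The LDAP examples hard-code the password for brevity. One way to keep credentials out of source files is to read them from the environment, as sketched here. The variable names (STELLARDB_HOST and so on) and the helper are hypothetical and chosen only for this illustration; the keyword names match the connection arguments used above.

```python
import os


def ldap_connection_kwargs(environ=None):
    """Build keyword arguments for an LDAP connection from the environment.

    STELLARDB_HOST, STELLARDB_PORT, STELLARDB_USER and STELLARDB_PASSWORD are
    hypothetical variable names used only in this sketch.
    """
    environ = os.environ if environ is None else environ
    return {
        "host": environ.get("STELLARDB_HOST", "localhost"),
        "port": int(environ.get("STELLARDB_PORT", "10000")),
        "username": environ["STELLARDB_USER"],      # KeyError if unset
        "password": environ["STELLARDB_PASSWORD"],  # KeyError if unset
        "auth": "LDAP",
    }

# Usage: conn = stellar_hive.StellarConnection(**ldap_connection_kwargs())
```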

Kerberos Mode

# Make sure /etc/krb5.conf contains the correct realm information for the KDC server
# Make sure the correct keytab file is available in your environment
# Run the kinit command:
# On Linux: kinit -kt FILE_PATH_OF_KEYTAB PRINCIPAL_NAME
# On Mac: kinit -t FILE_PATH_OF_KEYTAB -f PRINCIPAL_NAME

# Run with Kerberos path environment variables
# ENV KRB5_CONFIG=/etc/krb5.conf
# ENV KRB5_CLIENT_KTNAME=/etc/krb5.keytab
# ENV KRB5_KTNAME=/etc/krb5.keytab

"""version < 1.0 """
from pystellardb import stellar_hive

conn = stellar_hive.StellarConnection(host="localhost", port=10000, kerberos_service_name='hive', auth='KERBEROS', graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 1')

print(cur.fetchall())

"""version >= 1.0 """
from pystellardb import Connection, Graph
conn = Connection(host="localhost", port=10000, kerberos_service_name='hive', auth='KERBEROS')
graph = Graph('pokemon', conn)
query_result = graph.execute('match p = (a)-[f]->(b) return a,f,b limit 1')
for row in query_result:
    print(row[0].toJSON(), row[1].toJSON(), row[2].toJSON())
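
The kinit call and the Kerberos environment variables listed at the top of this section can also be prepared from Python before connecting. The helpers below are a hypothetical sketch of that (they are not part of PyStellarDB) and use the Linux form of kinit.

```python
import os
import subprocess


def kerberos_env(krb5_conf="/etc/krb5.conf", keytab="/etc/krb5.keytab"):
    """Return a copy of the environment with the Kerberos path variables set."""
    env = dict(os.environ)
    env["KRB5_CONFIG"] = krb5_conf
    env["KRB5_CLIENT_KTNAME"] = keytab
    env["KRB5_KTNAME"] = keytab
    return env


def kinit_command(keytab, principal):
    """Build the Linux form of the kinit call: kinit -kt KEYTAB PRINCIPAL."""
    return ["kinit", "-kt", keytab, principal]


def obtain_ticket(keytab, principal, krb5_conf="/etc/krb5.conf"):
    """Run kinit; raises CalledProcessError if no ticket could be obtained."""
    subprocess.run(
        kinit_command(keytab, principal),
        env=kerberos_env(krb5_conf, keytab),
        check=True,
    )
```

Note that kerberos_env() only affects the kinit subprocess. To make the same variables visible to the connecting Python process, apply them with os.environ.update(...) before creating the connection.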

Execute Hive Query

from pystellardb import stellar_hive

# If `graph_name` parameter is None, it will execute a Hive query and return data just as PyHive does
conn = stellar_hive.StellarConnection(host="localhost", port=10000, database='default')
cur = conn.cursor()
cur.execute('SELECT * FROM default.abc limit 10')
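
The snippet above stops after execute(). Reading the result with fetchall() loads every row at once; for larger results, rows can be pulled in batches instead. The generator below is a sketch that assumes the cursor implements the DB-API fetchmany() method, as PyHive cursors do; iter_rows is a hypothetical helper name.

```python
def iter_rows(cursor, batch_size=1000):
    """Yield rows from an executed cursor in batches instead of all at once.

    Assumes the cursor implements DB-API fetchmany(), as PyHive cursors do.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row

# Usage after cur.execute(...):
# for row in iter_rows(cur, batch_size=500):
#     print(row)
```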

Execute Graph Query and convert the result to a PySpark RDD object

"""version < 1.0 """
from pyspark import SparkContext
from pystellardb import stellar_hive

sc = SparkContext("local", "Demo App")

conn = stellar_hive.StellarConnection(host="localhost", port=10000, graph_name='pokemon')
cur = conn.cursor()
cur.execute('config query.lang cypher')
cur.execute('use graph pokemon')
cur.execute('match p = (a)-[f]->(b) return a,f,b limit 10')

rdd = cur.toRDD(sc)

def f(x): print(x)

rdd.map(lambda x: (x[0].toJSON(), x[1].toJSON(), x[2].toJSON())).foreach(f)

"""version >= 1.0 """
from pyspark import SparkContext
from pystellardb import Connection, Graph

sc = SparkContext("local", "Demo App")


conn = Connection(host="localhost", port=10000)
graph = Graph('pokemon', conn)
query_result = graph.execute('match p = (a)-[f]->(b) return a,f,b limit 1')
rdd = query_result.toRDD(sc)

def f(x): print(x)

rdd.map(lambda x: (x[0].toJSON(), x[1].toJSON(), x[2].toJSON())).foreach(f)

# Each row of this query result is a tuple (Vertex, Edge, Vertex)
# Vertex and Edge objects provide a toJSON() method that renders the object in JSON format

Execute Hive Query and convert the result to a PySpark RDD object

from pyspark import SparkContext
from pystellardb import stellar_hive

sc = SparkContext("local", "Demo App")

conn = stellar_hive.StellarConnection(host="localhost", port=10000)
cur = conn.cursor()
cur.execute('select * from default_db.default_table limit 10')

rdd = cur.toRDD(sc)

def f(x): print(x)

rdd.foreach(f)

# Each row of this query result is a tuple of column values, e.g. (Column, Column, Column)
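
Since each row is a plain tuple, it can be convenient to attach column names. The sketch below assumes the cursor exposes the DB-API description attribute (a sequence whose first item per column is the name), as PyHive cursors do; both helper names are hypothetical.

```python
def column_names(description):
    """Extract column names from a DB-API cursor.description sequence."""
    return [column[0] for column in description]


def row_to_dict(names, row):
    """Pair each column name with the value at the same position in the row."""
    return dict(zip(names, row))

# With the RDD from the example above:
# names = column_names(cur.description)
# dict_rdd = rdd.map(lambda row: row_to_dict(names, row))
```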

Dependencies

Required:

Python 3.6+

System SASL (Deprecated since 1.0):

Ubuntu:

apt-get install libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit
apt-get install python3-dev gcc              #Update python and gcc if needed

RHEL/CentOS:

yum install cyrus-sasl-md5 cyrus-sasl-plain cyrus-sasl-gssapi cyrus-sasl-devel
yum install gcc-c++ python3-devel.x86_64     #Update python and gcc if needed

# if pip3 install fails with a message like 'Can't connect to HTTPS URL because the SSL module is not available'
# you may need to update ssl & reinstall python

# 1. Download a higher version of openssl, e.g: https://www.openssl.org/source/openssl-1.1.1k.tar.gz
# 2. Install openssl: ./config && make && make install
# 3. Link openssl: echo /usr/local/lib64/ > /etc/ld.so.conf.d/openssl-1.1.1.conf
# 4. Update dynamic lib: ldconfig -v
# 5. Uninstall Python & Download a new Python source package
# 6. vim Modules/Setup, search '_socket socketmodule.c', uncomment
#    _socket socketmodule.c
#    SSL=/usr/local/ssl
#    _ssl _ssl.c \
#            -DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
#            -L$(SSL)/lib -lssl -lcrypto
#
# 7. Install Python: ./configure && make && make install

Windows:

# There are 3 ways of installing sasl for python on windows
# 1. (recommended) Download a .whl version of sasl from https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl
# 2. (recommended) If using anaconda, use conda install sasl.
# 3. Install Microsoft Visual C++ 9.0/14.0 buildtools for python2.7/3.x, then pip install sasl.

Notices

PyStellarDB >= 0.9 installs beeline to /usr/local/bin/beeline.

Requirements

Install using

pip install 'pystellardb[hive]' for the Hive interface.

PyHive works with

For Hive: HiveServer2 daemon

Windows Kerberos Configuration

Kerberos configuration on Windows can be tricky, so a few instructions follow. First, install and configure Kerberos for Windows, available from http://web.mit.edu/kerberos/dist/

After installation, configure the environment variables. Make sure the Kerberos entry comes before the JDK entry, so that the kinit command located in the JDK path is not used.

Find /etc/krb5.conf on your KDC and copy it to krb5.ini on Windows with some modifications. For example, given this krb5.conf on the KDC:

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
default_realm = DEFAULT
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
allow_weak_crypto = true
udp_preference_limit = 32700
default_ccache_name = FILE:/tmp/krb5cc_%{uid}

[realms]
DEFAULT = {
kdc = host1:1088
kdc = host2:1088
}

Modify it by deleting the [logging] section and the default_ccache_name entry in [libdefaults]:

[libdefaults]
default_realm = DEFAULT
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
allow_weak_crypto = true
udp_preference_limit = 32700

[realms]
DEFAULT = {
kdc = host1:1088
kdc = host2:1088
}

The result above is your krb5.ini for Kerberos on Windows. Put it in these 3 places:

C:\ProgramData\MIT\Kerberos5\krb5.ini

C:\Program Files\MIT\Kerberos\krb5.ini

C:\Windows\krb5.ini
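
The manual edit described above (dropping the [logging] section and the default_ccache_name entry) can also be scripted. The function below is a small sketch of that transformation on the file's text; the function name is hypothetical and it handles only the simple layout shown in the example.

```python
def krb5_conf_to_ini(text):
    """Drop the [logging] section and default_ccache_name from krb5.conf text."""
    kept = []
    skipping = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts; skip it only if it is [logging].
            skipping = stripped.lower() == "[logging]"
        if skipping or stripped.startswith("default_ccache_name"):
            continue
        kept.append(line)
    return "\n".join(kept).strip() + "\n"

# Usage (paths are examples):
# with open("krb5.conf") as src:
#     ini_text = krb5_conf_to_ini(src.read())
# with open("krb5.ini", "w") as dst:
#     dst.write(ini_text)
```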

Finally, edit the hosts file at C:/Windows/System32/drivers/etc/hosts and add IP mappings for host1 and host2 from the previous example, e.g.

10.6.6.96     host1
10.6.6.97     host2

Now, you can try running kinit in your command line!

Testing

Coming soon.

Release history

- 1.0.1 (this version): May 9, 2025
- 1.0: May 6, 2025
- 0.13.5: Dec 17, 2024
- 0.13.4: Oct 30, 2024
- 0.13.3: Sep 11, 2024
- 0.13.2: Sep 5, 2024
- 0.13.1: Aug 20, 2024
- 0.13.0: Oct 12, 2023
- 0.12.1: Oct 12, 2023
- 0.12.0: Sep 7, 2023
- 0.11.0: Mar 22, 2022
- 0.10.0: May 27, 2021
- 0.0.9: Dec 23, 2020
- 0.0.8: Dec 19, 2020
- 0.0.7: Jul 20, 2020
- 0.0.6: Jul 13, 2020
- 0.0.5: Jul 13, 2020
- 0.0.4: Jul 13, 2020
- 0.0.3: Jun 15, 2020
- 0.0.2: Jun 15, 2020

Download files

Source Distribution

pystellardb-1.0.1.tar.gz (31.3 kB)

Uploaded May 9, 2025 Source

Built Distribution

PyStellarDB-1.0.1-py2.py3-none-any.whl (19.7 kB)

Uploaded May 9, 2025 Python 2, Python 3

File details

Details for the file pystellardb-1.0.1.tar.gz.

File metadata

Download URL: pystellardb-1.0.1.tar.gz
Upload date: May 9, 2025
Size: 31.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for pystellardb-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`e16297c8fc20ca7b4f2782cd856629179211eff731df61ed586a1f200c844eb5`
MD5	`4204fe5571f97e3f3673308874e3d008`
BLAKE2b-256	`36a140e689024a69020b04457fdce5a502bea79a4bd674f8627527cb0fe841fb`

File details

Details for the file PyStellarDB-1.0.1-py2.py3-none-any.whl.

File metadata

Download URL: PyStellarDB-1.0.1-py2.py3-none-any.whl
Upload date: May 9, 2025
Size: 19.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for PyStellarDB-1.0.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`85a53e1af06d887da39068f241f6ae4792221eaf4c7f70af0dee18b1f4134a2a`
MD5	`1386df8032c4e5f109c671e9b856802b`
BLAKE2b-256	`26eb15f432d862e9cd78707d01eeb6c0b16b37bfd2c22b06b1c7ab4496ad3880`
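
A downloaded file can be checked against the SHA256 values listed above with Python's standard hashlib module. The helper names below are hypothetical; this is only a sketch of the comparison.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Usage:
# matches_published_hash(
#     "pystellardb-1.0.1.tar.gz",
#     "e16297c8fc20ca7b4f2782cd856629179211eff731df61ed586a1f200c844eb5",
# )
```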
