
Useful functions for working with databases in PySpark (PostgreSQL, ClickHouse)

# pyspark_db_utils

Helper functions for working with PostgreSQL and ClickHouse databases from PySpark.

## Documentation

http://pyspark-db-utils.readthedocs.io/en/latest/

## Usage example

You need JDBC drivers to use this library!
Download the PostgreSQL driver from
https://jdbc.postgresql.org/download.html
and the ClickHouse driver from
https://github.com/yandex/clickhouse-jdbc
and put them in the jars/ directory of your project.
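
If you create the SparkSession yourself, the standard `spark.jars` option is one way to put the downloaded driver on the classpath. A minimal sketch (the application name is arbitrary; the jar filename matches the settings below):

```
from pyspark.sql import SparkSession

# Put the downloaded JDBC driver on the Spark classpath.
spark = (
    SparkSession.builder
    .appName("pyspark_db_utils_example")  # arbitrary name
    .config("spark.jars", "jars/postgresql-42.1.4.jar")
    .getOrCreate()
)
```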

### Example settings:
```
settings = {
    "PG_PROPERTIES": {
        "user": "user",
        "password": "pass",
        "driver": "org.postgresql.Driver"
    },
    "PG_DRIVER_PATH": "jars/postgresql-42.1.4.jar",
    "PG_URL": "jdbc:postgresql://db.olabs.com/dbname",
}
```
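
For orientation, these keys map onto Spark's built-in JDBC reader: `PG_URL` is the JDBC URL and `PG_PROPERTIES` are the connection properties handed to the driver. A minimal sketch in plain PySpark, assuming the `spark` session from the snippet above (the table name `my_table` is hypothetical):

```
# Plain-PySpark equivalent of a read with these settings;
# "my_table" is a hypothetical table name.
df = spark.read.jdbc(
    url=settings["PG_URL"],
    table="my_table",
    properties=settings["PG_PROPERTIES"],
)
```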

### Example code

See example.py.
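
If you don't have the repository at hand, the run below suggests roughly what example.py does: build a DataFrame with monotonically_increasing_id(), write it to PostgreSQL, and read it back. This is a hedged sketch; write_to_pg and read_from_pg appear in the log below, but the import path and call signatures here are assumptions, so check the documentation linked above:

```
from pyspark.sql.functions import monotonically_increasing_id

# Assumed import path -- see the docs for the real one.
from pyspark_db_utils import read_from_pg, write_to_pg

# ids 1..19, as in the run below; the jumps at multiples of 2**33
# (8589934592, ...) come from the partition id that
# monotonically_increasing_id() encodes in the upper bits.
df = spark.range(1, 20).withColumn("mono_id", monotonically_increasing_id())
df.show()

# Hypothetical signatures -- adjust to the documented API.
write_to_pg(df, settings, "test_table")
df2 = read_from_pg(settings, "test_table")
df2.show()
```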

### Example run
```
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ mkdir jars
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ cp /var/bigdata/spark-2.2.0-bin-hadoop2.7/jars/postgresql-42.1.4.jar ./jars/
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ python3 pyspark_db_utils/example.py
host: ***SECRET***
db: ***SECRET***
user: ***SECRET***
password: ***SECRET***

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/05 11:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/05 11:43:29 WARN Utils: Your hostname, vsmelov resolves to a loopback address: 127.0.1.1; using 192.168.43.26 instead (on interface wlp2s0)
18/03/05 11:43:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
TRY: create df
OK: create df
+---+-----------+
| id| mono_id|
+---+-----------+
| 1| 0|
| 2| 1|
| 3| 2|
| 4| 3|
| 5| 8589934592|
| 6| 8589934593|
| 7| 8589934594|
| 8| 8589934595|
| 9| 8589934596|
| 10|17179869184|
| 11|17179869185|
| 12|17179869186|
| 13|17179869187|
| 14|17179869188|
| 15|25769803776|
| 16|25769803777|
| 17|25769803778|
| 18|25769803779|
| 19|25769803780|
+---+-----------+


TRY: write_to_pg
OK: write_to_pg

TRY: read_from_pg
OK: read_from_pg
+---+-----------+
| id| mono_id|
+---+-----------+
| 10|17179869184|
| 11|17179869185|
| 12|17179869186|
| 13|17179869187|
| 14|17179869188|
| 1| 0|
| 2| 1|
| 3| 2|
| 4| 3|
| 5| 8589934592|
| 6| 8589934593|
| 7| 8589934594|
| 8| 8589934595|
| 9| 8589934596|
| 15|25769803776|
| 16|25769803777|
| 17|25769803778|
| 18|25769803779|
| 19|25769803780|
| 1| 0|
+---+-----------+
only showing top 20 rows

```
