
Useful functions for working with databases in PySpark (PostgreSQL, ClickHouse)


# pyspark_db_utils

Helper functions for working with databases in Spark.

## Documentation

http://pyspark-db-utils.readthedocs.io/en/latest/

## Usage example

You need JDBC drivers to use this library!
Download the drivers from
https://jdbc.postgresql.org/download.html
https://github.com/yandex/clickhouse-jdbc
and put them in the jars/ directory of your project.

### Example settings
```
settings = {
    "PG_PROPERTIES": {
        "user": "user",
        "password": "pass",
        "driver": "org.postgresql.Driver"
    },
    "PG_DRIVER_PATH": "jars/postgresql-42.1.4.jar",
    "PG_URL": "jdbc:postgresql://db.olabs.com/dbname",
}
```
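For illustration, here is a minimal sketch of how a settings dict like this maps onto the option names that Spark's built-in JDBC reader expects (`url`, `dbtable`, `user`, `password`, `driver`). The `jdbc_options` helper below is hypothetical, not part of this library:

```python
def jdbc_options(settings: dict, table: str) -> dict:
    """Flatten the settings dict into the options that
    spark.read.format("jdbc") expects (url, dbtable, user, ...)."""
    opts = {"url": settings["PG_URL"], "dbtable": table}
    opts.update(settings["PG_PROPERTIES"])  # user, password, driver
    return opts

settings = {
    "PG_PROPERTIES": {
        "user": "user",
        "password": "pass",
        "driver": "org.postgresql.Driver",
    },
    "PG_URL": "jdbc:postgresql://db.olabs.com/dbname",
}

print(jdbc_options(settings, "my_table"))
```

With such a dict you could read a table via plain Spark as `spark.read.format("jdbc").options(**jdbc_options(settings, "my_table")).load()`, assuming the driver jar is on the classpath.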

### Code example

See `example.py`.

### Example run
```
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ mkdir jars
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ cp /var/bigdata/spark-2.2.0-bin-hadoop2.7/jars/postgresql-42.1.4.jar ./jars/
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ python3 pyspark_db_utils/example.py
host: ***SECRET***
db: ***SECRET***
user: ***SECRET***
password: ***SECRET***

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/05 11:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/05 11:43:29 WARN Utils: Your hostname, vsmelov resolves to a loopback address: 127.0.1.1; using 192.168.43.26 instead (on interface wlp2s0)
18/03/05 11:43:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
TRY: create df
OK: create df
+---+-----------+
| id| mono_id|
+---+-----------+
| 1| 0|
| 2| 1|
| 3| 2|
| 4| 3|
| 5| 8589934592|
| 6| 8589934593|
| 7| 8589934594|
| 8| 8589934595|
| 9| 8589934596|
| 10|17179869184|
| 11|17179869185|
| 12|17179869186|
| 13|17179869187|
| 14|17179869188|
| 15|25769803776|
| 16|25769803777|
| 17|25769803778|
| 18|25769803779|
| 19|25769803780|
+---+-----------+


TRY: write_to_pg
OK: write_to_pg

TRY: read_from_pg
OK: read_from_pg
+---+-----------+
| id| mono_id|
+---+-----------+
| 10|17179869184|
| 11|17179869185|
| 12|17179869186|
| 13|17179869187|
| 14|17179869188|
| 1| 0|
| 2| 1|
| 3| 2|
| 4| 3|
| 5| 8589934592|
| 6| 8589934593|
| 7| 8589934594|
| 8| 8589934595|
| 9| 8589934596|
| 15|25769803776|
| 16|25769803777|
| 17|25769803778|
| 18|25769803779|
| 19|25769803780|
| 1| 0|
+---+-----------+
only showing top 20 rows

```
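The `mono_id` column in the output above comes from Spark's `monotonically_increasing_id()`: the partition index is stored in the upper bits and a per-partition row counter in the lower 33 bits, which explains the large jumps (partition 1 starts at 2^33 = 8589934592). A quick check of that encoding:

```python
# monotonically_increasing_id() packs the value as
# (partition_index << 33) + row_in_partition
def mono_id(partition_index: int, row_in_partition: int) -> int:
    return (partition_index << 33) + row_in_partition

# reproduces the jumps seen in the example output
print(mono_id(0, 0))  # -> 0            (first row of partition 0)
print(mono_id(1, 0))  # -> 8589934592   (first row of partition 1)
print(mono_id(2, 4))  # -> 17179869188  (fifth row of partition 2)
```

The IDs are therefore unique and monotonically increasing within each partition, but not consecutive across partitions.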
