Useful functions for working with databases in PySpark (PostgreSQL, ClickHouse)
# pyspark_db_utils
It helps you work with databases in Spark.
## Documentation
http://pyspark-db-utils.readthedocs.io/en/latest/
## Usage example
You need JDBC drivers to use this library! Download the drivers from
https://jdbc.postgresql.org/download.html
https://github.com/yandex/clickhouse-jdbc
and put them in the jars/ directory of your project.
### Example settings
```python
settings = {
    "PG_PROPERTIES": {
        "user": "user",
        "password": "pass",
        "driver": "org.postgresql.Driver"
    },
    "PG_DRIVER_PATH": "jars/postgresql-42.1.4.jar",
    "PG_URL": "jdbc:postgresql://db.olabs.com/dbname",
}
```
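These settings are ordinary keyword material for Spark's JDBC reader and writer. As a rough illustration (the helper name and the exact mapping are assumptions for this sketch, not part of this library's API), the dict can be unpacked into the arguments that `spark.read.jdbc` / `DataFrameWriter.jdbc` expect:

```python
def jdbc_kwargs(settings: dict, table: str) -> dict:
    """Map the settings dict above onto the url/table/properties
    keyword arguments of spark.read.jdbc.
    Hypothetical helper for illustration only."""
    return {
        "url": settings["PG_URL"],
        "table": table,
        "properties": settings["PG_PROPERTIES"],
    }

settings = {
    "PG_PROPERTIES": {
        "user": "user",
        "password": "pass",
        "driver": "org.postgresql.Driver",
    },
    "PG_DRIVER_PATH": "jars/postgresql-42.1.4.jar",
    "PG_URL": "jdbc:postgresql://db.olabs.com/dbname",
}

kwargs = jdbc_kwargs(settings, "some_table")
# With a running SparkSession this would become:
#   df = spark.read.jdbc(**kwargs)
```

Note that `PG_DRIVER_PATH` is not passed to the reader itself; the jar has to be on Spark's classpath (e.g. via `spark.jars`) so that `org.postgresql.Driver` can be loaded.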
### Code example
See example.py.
### Example run
```
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ mkdir jars
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ cp /var/bigdata/spark-2.2.0-bin-hadoop2.7/jars/postgresql-42.1.4.jar ./jars/
vsmelov@vsmelov:~/PycharmProjects/pyspark_db_utils$ python3 pyspark_db_utils/example.py
host: ***SECRET***
db: ***SECRET***
user: ***SECRET***
password: ***SECRET***
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/05 11:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/05 11:43:29 WARN Utils: Your hostname, vsmelov resolves to a loopback address: 127.0.1.1; using 192.168.43.26 instead (on interface wlp2s0)
18/03/05 11:43:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
TRY: create df
OK: create df
+---+-----------+
| id| mono_id|
+---+-----------+
| 1| 0|
| 2| 1|
| 3| 2|
| 4| 3|
| 5| 8589934592|
| 6| 8589934593|
| 7| 8589934594|
| 8| 8589934595|
| 9| 8589934596|
| 10|17179869184|
| 11|17179869185|
| 12|17179869186|
| 13|17179869187|
| 14|17179869188|
| 15|25769803776|
| 16|25769803777|
| 17|25769803778|
| 18|25769803779|
| 19|25769803780|
+---+-----------+
TRY: write_to_pg
OK: write_to_pg
TRY: read_from_pg
OK: read_from_pg
+---+-----------+
| id| mono_id|
+---+-----------+
| 10|17179869184|
| 11|17179869185|
| 12|17179869186|
| 13|17179869187|
| 14|17179869188|
| 1| 0|
| 2| 1|
| 3| 2|
| 4| 3|
| 5| 8589934592|
| 6| 8589934593|
| 7| 8589934594|
| 8| 8589934595|
| 9| 8589934596|
| 15|25769803776|
| 16|25769803777|
| 17|25769803778|
| 18|25769803779|
| 19|25769803780|
| 1| 0|
+---+-----------+
only showing top 20 rows
```
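The large gaps in `mono_id` are expected: Spark's `monotonically_increasing_id()` puts the partition index in the upper bits and the record number within the partition in the lower 33 bits, so the IDs are unique and increasing but not consecutive. A quick pure-Python check against the values in the run above:

```python
def mono_id(partition_id: int, row_in_partition: int) -> int:
    """Reproduce the value layout of Spark's monotonically_increasing_id():
    partition index in the upper bits, record number within the
    partition in the lower 33 bits."""
    return (partition_id << 33) + row_in_partition

# Values from the output above: the 19 rows were spread over 4 partitions.
print(mono_id(1, 0))  # 8589934592  (first row of partition 1)
print(mono_id(2, 4))  # 17179869188 (fifth row of partition 2)
print(mono_id(3, 0))  # 25769803776 (first row of partition 3)
```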