Databricks read and write with SQL connection
Project description
Read data from and write data to Databricks tables, either by running SQL insert statements directly or by converting pandas or Spark dataframes into insert statements.
Requirements
Python 3.7 or above is required.
Prerequisites:
- Java
- Python
- Pyspark
- Pandas
- Numpy
Although installing this package also installs pyspark, pandas, and numpy, the Spark environment isn't set up automatically. The machine must be able to create a Spark session and to create Spark and pandas dataframes.
To confirm that pyspark is running as expected, run the following Python script:
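As a first check that the Python side of the prerequisites is in place, something like the sketch below can be run. It is not part of the package: the helper names `missing_prerequisites` and `python_version_ok` are hypothetical, and the check only confirms that the modules can be found, not that a Spark session can start.

```python
import importlib.util
import sys


def missing_prerequisites(modules=("pyspark", "pandas", "numpy")):
    """Return the names of the listed modules that cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 7)):
    """True when the running interpreter meets the stated minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version ok:", python_version_ok())
    print("Missing modules:", missing_prerequisites() or "none")
```

An empty list of missing modules only means the imports should succeed; Java still has to be available for Spark itself.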
from pyspark.sql import SparkSession
import pandas as pd

# Create (or reuse) a Spark session.
spark = SparkSession.builder.appName("Databricks Bridge Test").enableHiveSupport().getOrCreate()

dict_data = [
    {"name": "Tom", "age": 20, "dob": "2000-10-31"},
    {"name": "Dick", "age": 21, "dob": "1999-10-30"},
    {"name": "Harry", "age": 22, "dob": "1998-10-29"}
]

# Build and print a Spark dataframe.
spark_df = spark.createDataFrame(dict_data)
spark_df.show()

# Build and print a pandas dataframe from the same records.
pd_df = pd.DataFrame(dict_data)
print(pd_df)
It should print:
+---+----------+-----+
|age|       dob| name|
+---+----------+-----+
| 20|2000-10-31|  Tom|
| 21|1999-10-30| Dick|
| 22|1998-10-29|Harry|
+---+----------+-----+
    name  age         dob
0    Tom   20  2000-10-31
1   Dick   21  1999-10-30
2  Harry   22  1998-10-29
If the script runs without errors and both dataframes are printed to the console, then pyspark and pandas are set up properly.
If not, install OpenJDK, since pyspark needs a Java runtime, and run the script again.
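To see whether a Java runtime is reachable at all, a small check along these lines may help. This is a sketch rather than anything the package provides: `java_version_text` is a hypothetical helper, and it only looks for a `java` executable on the PATH, so it does not consult `JAVA_HOME`.

```python
import shutil
import subprocess


def java_version_text(executable="java"):
    """Return the output of `java -version`, or None if Java cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        # Some systems ship a placeholder `java` that exits with an error.
        return None
    # `java -version` conventionally writes to stderr, so check both streams.
    text = (result.stderr or result.stdout).strip()
    return text or None


if __name__ == "__main__":
    print(java_version_text() or "No usable Java runtime found on PATH")
```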
Usage
- Initialization
from databricks_bridge import Bridge
bridge = Bridge(hostname="<host_id>.cloud.databricks.com", token="<token>")
- Run queries that do not return data
bridge.execute_query("create database if not exists bridge_test_db;")
bridge.execute_query("""
    create table if not exists bridge_test_db.students (
        name string,
        age int,
        dob date,
        last_active timestamp,
        reg_date date
    );""")
- Write into tables with a SQL insert statement
bridge.execute_query("""
    insert into bridge_test_db.students (age, name, dob, last_active, reg_date)
    values
        (18, 'Rachel', '1999-11-01', '2023-11-01 20:36:31.365375', '2023-11-01'),
        (19, 'Harriet', '1999-11-02', '2023-11-01 20:36:31.365375', '2022-11-01');
""")
- Write pandas or Spark dataframes into Databricks tables
from datetime import datetime
import pandas as pd

new_data = [
    {"name": "Tom", "age": 20, "dob": "1999-10-31", "last_active": datetime.now(), "reg_date": datetime.today().date()},
    {"name": "Dick", "age": 21, "dob": "1999-10-30", "last_active": datetime.now(), "reg_date": datetime.today().date()},
    {"name": "Harry", "age": 22, "dob": "1999-10-29", "last_active": datetime.now(), "reg_date": datetime.today().date()}
]
# Write the records as a pandas dataframe.
new_pd_df = pd.DataFrame(new_data)
bridge.write_df_to_table(df=new_pd_df, target_table="bridge_test_db.students")
# Write the same records as a Spark dataframe.
new_spark_df = bridge.spark.createDataFrame(new_data)
bridge.write_df_to_table(df=new_spark_df, target_table="bridge_test_db.students")
- Run queries that return dataframes
pd_df, spark_schema = bridge.execute_query("select * from bridge_test_db.students")
- Convert the pandas dataframe returned by default to a Spark dataframe with an exact schema match
spark_df = bridge.to_spark_df(pd_df, spark_schema)
- Convert the pandas dataframe returned by default to a Spark dataframe without an exact schema match
spark_df = bridge.to_spark_df(pd_df)
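The project description says dataframe writes go through a conversion to insert statements. The sketch below shows roughly what such a conversion could look like for a list of row dictionaries. It is an illustration under stated assumptions, not the package's actual implementation: `sql_literal` and `rows_to_insert_statement` are hypothetical helpers, and the backslash escaping assumes Spark SQL's default string literal parsing.

```python
from datetime import date, datetime


def sql_literal(value):
    """Render one Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # bool must be tested before int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (datetime, date)):
        # str() yields 'YYYY-MM-DD' for dates and
        # 'YYYY-MM-DD HH:MM:SS[.ffffff]' for datetimes.
        return f"'{value}'"
    # Assumption: backslash escaping, as in Spark SQL's default parsing.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def rows_to_insert_statement(table, rows):
    """Build one multi-row insert statement from a list of dicts."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Column order is taken from the first row.
    columns = list(rows[0])
    values = ",\n".join(
        "(" + ", ".join(sql_literal(row[col]) for col in columns) + ")"
        for row in rows
    )
    return f"insert into {table} ({', '.join(columns)}) values\n{values};"


if __name__ == "__main__":
    print(rows_to_insert_statement(
        "bridge_test_db.students",
        [{"name": "Tom", "age": 20, "dob": date(1999, 10, 31)}],
    ))
```

A string built this way could then be passed to `bridge.execute_query`, as in the insert statement example in the Usage list. Building SQL from values is only reasonable for trusted data.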