
Databricks read and write with SQL connection

Project description

Read and write data from and to Databricks tables: execute SQL insert statements directly, or have pandas or Spark dataframes converted into insert statements and written for you.

Requirements

Python 3.7 or above is required.
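Assuming the package is published to PyPI under the distribution name of this release (`databricks-bridge`), installation would be a single pip command:

```shell
# Install the package; this also pulls in pyspark, pandas, and numpy as dependencies
pip install databricks-bridge
```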

Prerequisites:

  • Java
  • Python
  • PySpark
  • pandas
  • NumPy

Although installing this package also installs pyspark, pandas, and numpy, the Spark environment isn't set up automatically. The machine must be able to create a Spark session and to build Spark and pandas dataframes.

To confirm that PySpark is working as expected, run the following Python script:

from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession.builder.appName("Databricks Bridge Test").enableHiveSupport().getOrCreate()
dict_data = [
    {"name": "Tom", "age": 20, "dob": "2000-10-31"},
    {"name": "Dick", "age": 21, "dob": "1999-10-30"},
    {"name": "Harry", "age": 22, "dob": "1998-10-29"}
]
spark_df = spark.createDataFrame(dict_data)
spark_df.show()

pd_df = pd.DataFrame(dict_data)
print(pd_df)

It should print:

+---+----------+-----+
|age|       dob| name|
+---+----------+-----+
| 20|2000-10-31|  Tom|
| 21|1999-10-30| Dick|
| 22|1998-10-29|Harry|
+---+----------+-----+
    name  age         dob
0    Tom   20  2000-10-31
1   Dick   21  1999-10-30
2  Harry   22  1998-10-29

If this runs without errors and both dataframes are printed to the console, then PySpark and pandas are set up properly.

If not, install OpenJDK (PySpark requires a Java runtime).
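Before (re)installing anything, a quick way to check from Python whether a Java runtime is already reachable is to look for `java` on the PATH. `java_available` is an illustrative helper, not part of this package:

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable is found on the PATH."""
    return shutil.which("java") is not None


print("java found" if java_available() else "java missing - install OpenJDK")
```

Note that a `java` binary on the PATH is necessary but not always sufficient; some setups also require `JAVA_HOME` to point at the JDK installation.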

Usage

  • Initialization
    • from databricks_bridge import Bridge
      bridge = Bridge(hostname="<host_id>.cloud.databricks.com", token="<token>")
      
  • Run queries that return no data
    • bridge.execute_query("create database if not exists bridge_test_db;")
      bridge.execute_query("""
          create table if not exists bridge_test_db.students (
              name string,
              age int,
              dob date,
              last_active timestamp,
              reg_date date
          );""")
      
  • Write into tables with a SQL insert statement
    • bridge.execute_query("""
          insert into bridge_test_db.students (age, name, dob, last_active, reg_date)
          values
              (18, 'Rachel', '1999-11-01', '2023-11-01 20:36:31.365375', '2023-11-01'),
              (19, 'Harriet', '1999-11-02', '2023-11-01 20:36:31.365375', '2022-11-01');
      """)
      
  • Write pandas or Spark dataframes into Databricks tables
    • from datetime import datetime
      import pandas as pd

      new_data = [
          {"name": "Tom", "age": 20, "dob": "1999-10-31", "last_active": datetime.now(), "reg_date": datetime.today().date()},
          {"name": "Dick", "age": 21, "dob": "1999-10-30", "last_active": datetime.now(), "reg_date": datetime.today().date()},
          {"name": "Harry", "age": 22, "dob": "1999-10-29", "last_active": datetime.now(), "reg_date": datetime.today().date()}
      ]
      new_pd_df = pd.DataFrame(new_data)
      bridge.write_df_to_table(df=new_pd_df, target_table="bridge_test_db.students")
      
      new_spark_df = bridge.spark.createDataFrame(new_data)
      bridge.write_df_to_table(df=new_spark_df, target_table="bridge_test_db.students")
      
  • Run queries that return dataframes
    • pd_df, spark_schema = bridge.execute_query("select * from bridge_test_db.students")
      
  • Convert the returned pandas dataframe to a Spark dataframe with an exact schema match
    • spark_df = bridge.to_spark_df(pd_df, spark_schema)
      
  • Convert the returned pandas dataframe to a Spark dataframe without an exact schema match
    • spark_df = bridge.to_spark_df(pd_df)
      
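The dataframe write path described above converts a dataframe into an insert statement before executing it. The sketch below is a rough mental model of that conversion; `df_to_insert_sql` is a hypothetical helper, not this library's actual implementation, and its only escaping is doubling single quotes:

```python
import numbers

import pandas as pd


def df_to_insert_sql(df: pd.DataFrame, target_table: str) -> str:
    """Illustrative only: render a pandas dataframe as one SQL insert
    statement. Real code would handle dialect-specific quoting, dates,
    timestamps, and batching of large dataframes."""
    def render(value):
        if pd.isna(value):
            return "NULL"
        if isinstance(value, numbers.Number) and not isinstance(value, bool):
            return str(value)  # numeric literals go in unquoted
        # Everything else is quoted, with embedded single quotes doubled
        return "'" + str(value).replace("'", "''") + "'"

    columns = ", ".join(df.columns)
    rows = ",\n    ".join(
        "(" + ", ".join(render(v) for v in row) + ")"
        for row in df.itertuples(index=False)
    )
    return f"insert into {target_table} ({columns})\nvalues\n    {rows};"


df = pd.DataFrame([{"name": "Tom", "age": 20, "dob": "2000-10-31"}])
print(df_to_insert_sql(df, "bridge_test_db.students"))
# insert into bridge_test_db.students (name, age, dob)
# values
#     ('Tom', 20, '2000-10-31');
```

The actual `write_df_to_table` presumably does more (type mapping against the target table's schema, accepting both pandas and Spark dataframes), so treat this purely as a sketch of the insert-statement conversion idea.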

