Databricks read and write with SQL connection

Project description

Read data from and write data to Databricks tables over a SQL connection: run insert statements directly, or convert pandas or Spark dataframes into insert statements and write them.

Requirements

Python 3.7 or above is required.

Prerequisites:

  • Java
  • Python
  • PySpark
  • pandas
  • NumPy

Although installing this package also installs PySpark, pandas, and NumPy, the Spark environment is not set up automatically. The machine must be able to create a Spark session and build both Spark and pandas dataframes.

To confirm that PySpark is working as expected, run the following Python script:

from pyspark.sql import SparkSession
import pandas as pd

# Start a local Spark session to verify the environment
spark = SparkSession.builder.appName("Databricks Bridge Test").enableHiveSupport().getOrCreate()

dict_data = [
    {"name": "Tom", "age": 20, "dob": "2000-10-31"},
    {"name": "Dick", "age": 21, "dob": "1999-10-30"},
    {"name": "Harry", "age": 22, "dob": "1998-10-29"}
]

# Build and display a Spark dataframe from the sample records
spark_df = spark.createDataFrame(dict_data)
spark_df.show()

# Build and display the equivalent pandas dataframe
pd_df = pd.DataFrame(dict_data)
print(pd_df)

Should return:

+---+----------+-----+
|age|       dob| name|
+---+----------+-----+
| 20|2000-10-31|  Tom|
| 21|1999-10-30| Dick|
| 22|1998-10-29|Harry|
+---+----------+-----+
    name  age         dob
0    Tom   20  2000-10-31
1   Dick   21  1999-10-30
2  Harry   22  1998-10-29

If this runs without errors and both dataframes are printed to the console, then PySpark and pandas are set up properly.

If not, install OpenJDK and make sure the java executable is on your PATH (or that JAVA_HOME points to your JDK installation).
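
As a quick sanity check before running the script above, you can confirm from Python that a Java runtime is reachable. This snippet is only an illustration (it is not part of databricks-bridge) and assumes the java executable should be discoverable on the PATH:

# Quick sanity check (not part of databricks-bridge): confirm that a Java
# runtime is visible to Python, since PySpark needs one to start a JVM.
import shutil
import subprocess

java_path = shutil.which("java")
if java_path is None:
    print("No 'java' executable found on PATH -- install OpenJDK first.")
else:
    # Most JDKs print the version banner to stderr rather than stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print(f"Found Java at {java_path}")
    print(result.stderr.strip() or result.stdout.strip())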

Usage

  • Initialization
    • from databricks_bridge import Bridge
      bridge = Bridge(hostname="<host_id>.cloud.databricks.com", token="<token>")
      
  • Run queries that return no data
    • bridge.execute_query("create database if not exists bridge_test_db;")
      bridge.execute_query("""
          create table if not exists bridge_test_db.students (
              name string,
              age int,
              dob date,
              last_active timestamp,
              reg_date date
          );""")
      
  • Write into tables with a SQL insert statement
    • bridge.execute_query("""
          insert into bridge_test_db.students (age, name, dob, last_active, reg_date)
          values
              (18, 'Rachel', '1999-11-01', '2023-11-01 20:36:31.365375', '2023-11-01'),
              (19, 'Harriet', '1999-11-02', '2023-11-01 20:36:31.365375', '2022-11-01');
      """)
      
  • Write pandas or Spark dataframes into Databricks tables
    • from datetime import datetime
      import pandas as pd

      new_data = [
          {"name": "Tom", "age": 20, "dob": "1999-10-31", "last_active": datetime.now(), "reg_date": datetime.today().date()},
          {"name": "Dick", "age": 21, "dob": "1999-10-30", "last_active": datetime.now(), "reg_date": datetime.today().date()},
          {"name": "Harry", "age": 22, "dob": "1999-10-29", "last_active": datetime.now(), "reg_date": datetime.today().date()}
      ]
      new_pd_df = pd.DataFrame(new_data)
      bridge.write_df_to_table(df=new_pd_df, target_table="bridge_test_db.students")
      
      new_spark_df = bridge.spark.createDataFrame(new_data)
      bridge.write_df_to_table(df=new_spark_df, target_table="bridge_test_db.students")
      
  • Run queries that return dataframes
    • pd_df, spark_schema = bridge.execute_query("select * from bridge_test_db.students")
      
  • Convert the returned pandas dataframe to a Spark dataframe with an exact schema match
    • spark_df = bridge.to_spark_df(pd_df, spark_schema)
      
  • Convert the returned pandas dataframe to a Spark dataframe without an exact schema match
    • spark_df = bridge.to_spark_df(pd_df)
      

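The snippets in the list above can be strung together into a single script. The sketch below combines the documented Bridge calls end to end; the hostname and token are placeholders, and the table and sample record mirror the examples above:

from datetime import datetime

import pandas as pd
from databricks_bridge import Bridge

# Placeholders -- replace with your workspace hostname and personal access token
bridge = Bridge(hostname="<host_id>.cloud.databricks.com", token="<token>")

# Create the target database and table if they do not exist yet
bridge.execute_query("create database if not exists bridge_test_db;")
bridge.execute_query("""
    create table if not exists bridge_test_db.students (
        name string,
        age int,
        dob date,
        last_active timestamp,
        reg_date date
    );""")

# Write a pandas dataframe into the table
new_pd_df = pd.DataFrame([
    {"name": "Tom", "age": 20, "dob": "1999-10-31",
     "last_active": datetime.now(), "reg_date": datetime.today().date()},
])
bridge.write_df_to_table(df=new_pd_df, target_table="bridge_test_db.students")

# Read the table back as a pandas dataframe plus its Spark schema,
# then convert to a Spark dataframe with an exact schema match
pd_df, spark_schema = bridge.execute_query("select * from bridge_test_db.students")
spark_df = bridge.to_spark_df(pd_df, spark_schema)
spark_df.show()
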
Download files

Download the file for your platform.

Source Distribution

databricks-bridge-0.0.3.tar.gz (5.6 kB)

Built Distribution

databricks_bridge-0.0.3-py3-none-any.whl (5.7 kB)
