Skip to main content

Effortlessly connect to SQL Server to import data into Polars DataFrames and export data back to SQL Server.

Project description

polars_mssql

polars_mssql is a Python package designed to simplify working with Microsoft SQL Server databases using the high-performance polars DataFrame library. It provides an intuitive and efficient interface for running SQL queries, reading tables, and writing data to SQL Server.

Features

  • Seamless SQL Server Integration: Easily connect to SQL Server with options for Windows Authentication or SQL Authentication.
  • Query Execution: Execute SQL queries and retrieve results as polars.DataFrame objects. Use read_query for simple query execution or polars.read_database for advanced functionality like batch processing and schema customization.
  • Parameterization Support: Securely execute parameterized queries to prevent accidental SQL injection.
  • Table Operations: Read and write tables with flexibility and performance.
  • Context Management: Supports Python's context manager for automatic connection handling.

Installation

Install the package using pip:

pip install polars_mssql

Ensure the following dependencies are installed:

  • polars for high-performance DataFrame operations.
  • sqlalchemy for database connectivity.
  • An appropriate ODBC driver for SQL Server (e.g., ODBC Driver 17 or 18).

Usage

Here is an example of how to use polars_mssql to connect to SQL Server and perform various operations:

1. Connecting to SQL Server

from polars_mssql import Connection

# Initialize a connection
conn = Connection(
  server="my_server",
    database="my_database",
    # If not specified, driver defaults to "SQL Server"
    # driver = 'ODBC Driver 17 for SQL Server'
)

Driver Defaults By default, the driver parameter is set to "SQL Server", which often comes preinstalled on Windows. If you don't have "SQL Server" installed or prefer a more recent driver, specify any compatible driver you have installed (e.g., "ODBC Driver 17 for SQL Server") for the database you are trying to connect to.

2. Read Data from SQL Server

Execute a SQL Query and Get Results as a DataFrame

query = "SELECT * FROM my_table WHERE col1 = 'a'"
df = conn.read_query(query)

For advanced functionality (e.g., batch processing or schema customization), use the polars.read_database function with the engine:

import polars as pl

df = pl.read_database(
    query="SELECT * FROM users",
    connection=conn.engine,
    iter_batches=True,
    batch_size=1000
)

for batch in df:
    print(batch)

Read an Entire Table

df = conn.read_table("my_table")

3. Save DataFrame to SQL Server

Write a Polars DataFrame to a Table

import polars as pl

# Example DataFrame
data = pl.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
conn.write_table(data, name="my_table", if_exists="replace")

4. Execute Queries

The execute_query method allows you to run any SQL query on your database. It supports parameterized queries to prevent accidental SQL injection and can be used for both retrieval and modification operations such as INSERT, DELETE, and DROP.

Example: Run a Simple Query

query = "DELETE FROM users WHERE id = 1"
conn.execute_query(query)

Example: Insert Data Securely

query = "INSERT INTO users (id, name, email) VALUES (:id, :name, :email)"
params = {"id": 1, "name": "John Doe", "email": "john.doe@example.com"}
conn.execute_query(query, params)

Example: Drop a Table

query = "DROP TABLE users"
conn.execute_query(query)

Example: Prevent SQL Injection

query = "SELECT * FROM users WHERE name = :name"
params = {"name": "John'; DROP TABLE users; --"}
conn.execute_query(query, params)

This safely executes the query without executing malicious SQL commands.

5. Using Context Management

with Connection(server="my_server", database="my_database") as conn:
    df = conn.read_query("SELECT * FROM my_table")
    print(df)

6. Closing the Connection

conn.close()

API Reference

Connection Class

Constructor

Connection(server: Optional[str] = None, database: Optional[str] = None, driver: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None)

Attributes

  • server (str): The name or address of the SQL Server instance.

    • database (str): The name of the connected database.
  • driver (str): The ODBC driver being used for the connection (e.g., "ODBC Driver 17 for SQL Server").

  • connection_string (str): The full SQLAlchemy connection string used to create the engine. This can be useful for debugging or passing to other tools.

  • engine (sqlalchemy.engine.base.Engine): The SQLAlchemy engine used for database interactions. Advanced users can use this attribute for custom SQLAlchemy operations or to pass it to functions like polars.read_database.

Methods

  • read_query(query: str) -> pl.DataFrame: Execute a query and return results as a Polars DataFrame.

    • Parameters:

      • query (str): The SQL query to execute.
    • Returns: pl.DataFrame: The result of the query as a Polars DataFrame.

    • Example:

      query = "SELECT * FROM my_table WHERE col1 = 'a'"
      df = conn.read_query(query)
      print(df)
      
  • read_table(name: str) -> pl.DataFrame: Read all rows from a table.

    • Parameters:

      • name (str): The name of the table to read from.
    • Returns: pl.DataFrame: All rows from the specified table as a Polars DataFrame.

    • Example:

      df = conn.read_table('my_table')
      print(df)
      
  • write_table(df: pl.DataFrame, name: str, if_exists: str = "fail") -> None: Save a Polars DataFrame to a specified table in SQL Server.

    • Parameters:
      • df (pl.DataFrame): The Polars DataFrame to be written.
      • name (str): The name of the target table in the database.
      • if_exists (str): What to do if the target table already exists. Options:
        • 'fail' (default): Raise an error.
        • 'append': Append the data to the existing table.
        • 'replace': Drop the existing table, recreate it, and insert the data.
    • Raises:
      • ValueError: If if_exists is not one of 'fail', 'append', or 'replace'.
      • RuntimeError: If the write operation fails.
    • Examples:
      import polars as pl
      
      # Create a Polars DataFrame
      df = pl.DataFrame({
          "id": [1, 2, 3],
          "name": ["Alice", "Bob", "Charlie"]
      })
      
      # Write the DataFrame to the database
      conn.write_table(df, name="users", if_exists="replace")
      
  • execute_query(query: str, params: Optional[Dict[str, Any]] = None) -> None: Execute any SQL query. Supports parameterized queries to prevent SQL injection.

    • Parameters:

      • query (str): The SQL query to execute. Can include placeholders for parameterized queries (e.g., :param_name).
      • params (dict, optional): A dictionary of parameters to bind to the query.
    • Examples:

      query = "DELETE FROM users WHERE id = 1"
      conn.execute_query(query)
      
      query = "INSERT INTO users (id, name) VALUES (:id, :name)"
      params = {"id": 1, "name": "Jane"}
      conn.execute_query(query, params)
      
  • close() -> None: Dispose of the SQLAlchemy engine and close the connection.

    • Example:
      conn.close()
      

Requirements

  • Python 3.7 or higher
  • polars
  • sqlalchemy
  • ODBC Driver for SQL Server (17 or 18 recommended)

Installing the ODBC Driver

Windows

Download and install the ODBC Driver from Microsoft's website.

macOS

Install via Homebrew:

brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install --no-sandbox msodbcsql18

Linux

Install using the following commands:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
sudo apt-get update
sudo apt-get install -y mssql-tools unixodbc-dev

Contributing

Contributions are welcome! If you encounter issues or have feature requests, please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

This package integrates the efficiency of polars with the versatility of SQL Server, inspired by real-world data engineering needs. As a data engineer, I often need to pull data from SQL Server into polars and export data from polars back to SQL Server. I created this package to streamline these workflows and make the process more efficient.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polars_mssql-0.3.1.tar.gz (12.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

polars_mssql-0.3.1-py3-none-any.whl (10.2 kB view details)

Uploaded Python 3

File details

Details for the file polars_mssql-0.3.1.tar.gz.

File metadata

  • Download URL: polars_mssql-0.3.1.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.7

File hashes

Hashes for polars_mssql-0.3.1.tar.gz
Algorithm Hash digest
SHA256 49021b5c659ec3ce1127089bf8513773dea7bdbae9c2049d35edb300221e9395
MD5 6e944d9852434b3ad2c06b937ca32069
BLAKE2b-256 3721cbaa1e51483fc25d67c4345f1a155090119d474be575ac7c12980ad73636

See more details on using hashes here.

File details

Details for the file polars_mssql-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: polars_mssql-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.7

File hashes

Hashes for polars_mssql-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 02fdfb194d13bd9d8bfbe2c3f149fe455ad42a897534822e62fd286661f50178
MD5 3493ca1f41935413f2fa5238f8ba1eb7
BLAKE2b-256 8a67eefe71190110766d1a2e50d43f8e52ff9dc43cc5b7efe27e868d15737e86

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page