dagster-mssql-bcp
ODBC is slow 🐢 bcp is fast! 🐰
This is a custom Dagster I/O manager for loading data into SQL Server using the bcp utility.
What you need to run it
PyPI
pip install dagster-mssql-bcp
BCP Utility
The bcp utility must be installed on the machine that runs the Dagster pipeline.
See Microsoft's documentation for more information.
Ideally, place bcp on your PATH. Otherwise, set its location with the bcp_path option of the I/O manager, as in the examples below.
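To check which bcp executable a machine would use before configuring the I/O manager, you can look it up on PATH first and fall back to an explicit location. The following is a minimal sketch with a hypothetical helper, `resolve_bcp_path`. The fallback path is the one used in the examples below and is an assumption about your install location. The package may resolve bcp differently.

```python
import os
import shutil
from typing import Optional

# Hypothetical fallback: the install location used in this README's examples
DEFAULT_BCP_PATH = "/opt/mssql-tools18/bin/bcp"


def resolve_bcp_path(executable: str = "bcp", fallback: str = DEFAULT_BCP_PATH) -> Optional[str]:
    """Return the bcp executable to use, preferring one found on PATH."""
    # shutil.which searches PATH the way a shell would
    found = shutil.which(executable)
    if found is not None:
        return found
    # Otherwise use the explicit location if it exists and is executable
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None


if __name__ == "__main__":
    print(resolve_bcp_path() or "bcp not found; install it or set bcp_path")
```

If this prints a path, you can pass that value as bcp_path or leave bcp_path unset when bcp is already on PATH.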
ODBC Drivers
The ODBC drivers for SQL Server must be installed on the machine that runs the Dagster pipeline.
See Microsoft's documentation for more information.
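One way to confirm a SQL Server ODBC driver is installed is to list the drivers the ODBC driver manager knows about. This sketch assumes the third-party pyodbc package is available (its `pyodbc.drivers()` function returns installed driver names). Whether dagster-mssql-bcp itself uses pyodbc is not stated here. The filtering helper `sql_server_drivers` is hypothetical.

```python
from typing import Iterable, List


def sql_server_drivers(installed: Iterable[str]) -> List[str]:
    """Filter ODBC driver names down to SQL Server drivers."""
    # Microsoft's drivers are named like "ODBC Driver 18 for SQL Server"
    return [name for name in installed if "SQL Server" in name]


if __name__ == "__main__":
    try:
        import pyodbc  # assumption: pyodbc is installed in your environment
    except ImportError:
        print("pyodbc is not installed; cannot list ODBC drivers")
    else:
        found = sql_server_drivers(pyodbc.drivers())
        print(found or "No SQL Server ODBC driver found")
```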
Permissions
The SQL Server user the Dagster pipeline connects as must have the permissions needed to load data into the database, including:
CREATE SCHEMA
CREATE/ALTER TABLES
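A DBA might grant the permissions listed above with T-SQL statements like the ones generated below. This is a sketch under stated assumptions:

* The helper `permission_grants` is hypothetical.
* The `ALTER ON SCHEMA` grant only works once the schema exists.
* Loading rows with bcp generally also needs INSERT on the target tables, which this sketch does not grant.

Check Microsoft's documentation and your own security policy before running anything like this.

```python
from typing import List


def quote(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def permission_grants(principal: str, schema: str) -> List[str]:
    """Build GRANT statements for creating schemas and creating/altering tables."""
    p, s = quote(principal), quote(schema)
    return [
        f"GRANT CREATE SCHEMA TO {p};",         # create the target schema
        f"GRANT CREATE TABLE TO {p};",          # create tables in the database
        f"GRANT ALTER ON SCHEMA::{s} TO {p};",  # create/alter tables in that schema
    ]


for stmt in permission_grants("username", "my_schema"):
    print(stmt)
```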
Basic Usage
Polars
Polars data is processed as a LazyFrame. Your asset can return either a DataFrame or a LazyFrame; a DataFrame is cast to a LazyFrame automatically.
```python
from dagster import asset, Definitions
from dagster_mssql_bcp import PolarsBCPIOManager
import polars as pl

io_manager = PolarsBCPIOManager(
    host="my_mssql_server",
    database="my_database",
    user="username",
    password="password",
    # Additional connection properties
    query_props={
        "TrustServerCertificate": "yes",
    },
    # Extra bcp flags; -u tells bcp 18 to trust the server certificate
    bcp_arguments={"-u": ""},
    # Only needed if bcp is not on your PATH
    bcp_path="/opt/mssql-tools18/bin/bcp",
)


@asset(
    metadata={
        # Column definitions for the target table
        "asset_schema": [
            {"name": "id", "type": "INT"},
        ],
        # Target SQL Server schema
        "schema": "my_schema",
    }
)
def my_polars_asset(context):
    # An eager DataFrame is cast to a LazyFrame automatically
    return pl.DataFrame({"id": [1, 2, 3]})


@asset(
    metadata={
        "asset_schema": [
            {"name": "id", "type": "INT"},
        ],
        "schema": "my_schema",
    }
)
def my_polars_asset_lazy(context):
    # A LazyFrame is used as-is
    return pl.LazyFrame({"id": [1, 2, 3]})


defs = Definitions(
    assets=[my_polars_asset, my_polars_asset_lazy],
    io_managers={
        "io_manager": io_manager,
    },
)
```
Pandas
```python
from dagster import asset, Definitions
from dagster_mssql_bcp import PandasBCPIOManager
import pandas as pd

# Same connection settings as the Polars example
io_manager = PandasBCPIOManager(
    host="my_mssql_server",
    database="my_database",
    user="username",
    password="password",
    query_props={
        "TrustServerCertificate": "yes",
    },
    bcp_arguments={"-u": ""},
    bcp_path="/opt/mssql-tools18/bin/bcp",
)


@asset(
    metadata={
        # Column definitions for the target table
        "asset_schema": [
            {"name": "id", "type": "INT"},
        ],
        # Target SQL Server schema
        "schema": "my_schema",
    }
)
def my_pandas_asset(context):
    # The returned DataFrame is loaded into SQL Server with bcp
    return pd.DataFrame({"id": [1, 2, 3]})


defs = Definitions(
    assets=[my_pandas_asset],
    io_managers={
        "io_manager": io_manager,
    },
)
```
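Both examples pass `bcp_arguments={"-u": ""}`, but this README does not document how that dictionary becomes a bcp command line. This sketch assumes a plausible reading: each key is a bcp flag, and an empty string means the flag takes no value.

* The helper `build_bcp_command`, the table name `my_schema.my_table`, and the data file name are all hypothetical.
* The base flags are standard bcp switches: `-S` server, `-d` database, `-U` login, `-P` password, and `-c` character mode.
* The package may build its command differently.

```python
from typing import Dict, List


def build_bcp_command(
    bcp_path: str,
    table: str,
    data_file: str,
    host: str,
    database: str,
    user: str,
    password: str,
    bcp_arguments: Dict[str, str],
) -> List[str]:
    """Assemble a hypothetical `bcp <table> in <file>` command as an argument list."""
    cmd = [
        bcp_path, table, "in", data_file,
        "-S", host,      # server
        "-d", database,  # database
        "-U", user,      # SQL Server login
        "-P", password,  # password
        "-c",            # character data format
    ]
    # Extra flags: empty value means flag only (e.g. -u), otherwise flag then value
    for flag, value in bcp_arguments.items():
        cmd.append(flag)
        if value != "":
            cmd.append(value)
    return cmd


cmd = build_bcp_command(
    "/opt/mssql-tools18/bin/bcp", "my_schema.my_table", "data.csv",
    "my_mssql_server", "my_database", "username", "password", {"-u": ""},
)
print(cmd[-1])  # -u
```

Building an argument list rather than a single shell string avoids quoting problems when it is passed to something like `subprocess.run`.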
The asset_schema metadata defines your table's columns, the schema metadata sets the target SQL Server schema, and the value your asset returns is the data to load.
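Each asset_schema entry pairs a column name with a SQL Server type, so the list reads like a table definition. This sketch shows how such a list could map onto a T-SQL `CREATE TABLE` statement. It is illustrative only: the helper `create_table_sql` and the table name `my_table` are hypothetical. The real I/O manager may name tables, handle existing tables, or quote identifiers differently.

```python
from typing import Dict, List


def quote(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_table_sql(schema: str, table: str, asset_schema: List[Dict[str, str]]) -> str:
    """Render a CREATE TABLE statement from asset_schema-style column dicts."""
    # Each entry has a column "name" and a SQL Server "type", as in the examples
    columns = ",\n    ".join(f"{quote(col['name'])} {col['type']}" for col in asset_schema)
    return f"CREATE TABLE {quote(schema)}.{quote(table)} (\n    {columns}\n);"


print(create_table_sql("my_schema", "my_table", [{"name": "id", "type": "INT"}]))
# CREATE TABLE [my_schema].[my_table] (
#     [id] INT
# );
```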
Docs
For more details, see the assets doc and the I/O manager doc; for how it's implemented, see the dev doc.
Hashes for dagster_mssql_bcp-0.0.10-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 0756940efdc6aae6a562131a7c2ace634fe599c38e8f2278ee2bfe6db7620d61
MD5 | cb759d9f064dc43372d98e8be0204803
BLAKE2b-256 | e29a5dc8a8528bbb432921b7bf8110c04c723b0b0ca46302161fcedc9f2c9cb7