Dagster extra utilities for data processing

Dxtr Dagster Library

A Python library that provides utilities and components for data engineering workflows using Dagster. The library focuses on data processing: downloading data from SharePoint, loading it into PostgreSQL, and performing data transformations.

Project Structure

The library is organized into the following components:

dxtr/
├── dxtr/                 # Main library package
│   ├── dagster/          # Dagster-specific components and resources
│   └── utils/            # Utility functions
├── pyproject.toml        # Project configuration and dependencies
└── README.md             # This file

Features

  • SharePoint data file downloading
  • SQLAlchemy data loading
  • Data transformation capabilities
  • Integration with Dagster for workflow orchestration

Dependencies

The library requires Python 3.11.8 or higher and includes key dependencies such as:

  • polars
  • google-cloud-storage
  • requests
  • msal
  • pandas
  • sqlalchemy
  • psycopg2-binary
  • and more (see pyproject.toml for complete list)
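
As a quick sanity check, the requirements above can be verified from a plain Python session. The sketch below uses only the standard library. The import names are assumptions derived from the distribution names listed above (for example, psycopg2-binary is usually imported as psycopg2 and google-cloud-storage as google.cloud.storage), and the helper names are hypothetical, not part of this library.

```python
import importlib.util
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 11, 8)

# Assumed import names for the distributions listed above.
IMPORT_NAMES = {
    "polars": "polars",
    "google-cloud-storage": "google.cloud.storage",
    "requests": "requests",
    "msal": "msal",
    "pandas": "pandas",
    "sqlalchemy": "sqlalchemy",
    "psycopg2-binary": "psycopg2",
}


def python_is_supported(version=None):
    """Return True if the interpreter version meets MIN_PYTHON."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= MIN_PYTHON


def find_missing(import_names=IMPORT_NAMES):
    """Return the distributions whose import name cannot be located."""
    missing = []
    for dist, module in import_names.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(dist)
    return missing


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing:", find_missing() or "none")
```

This check only confirms that modules can be located, not that their versions satisfy the constraints in pyproject.toml.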

Development

Installation

For development purposes, install the package in editable mode:

pip install -e ".[dev]" --config-settings editable_mode=compat

Please refer to the Wiki for how to use ./dxtrx.sh to set up the environment and start the Dagster code server, which is a more convenient way of working with this code.

The library requires several environment variables to be set:

  • SharePoint credentials
  • Database credentials
  • Other configuration variables

Please refer to the Wiki for detailed setup instructions using ./dxtrx.sh to configure the environment and start the Dagster code server.
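
Because missing credentials tend to surface late (for example, in the middle of a run), it can help to check the environment up front. The following is a minimal sketch of such a check, not the library's own mechanism. The variable names are hypothetical placeholders, since this README does not list them; the real names are documented in the Wiki.

```python
import os

# Hypothetical variable names for illustration only; the real names
# are documented in the project Wiki.
REQUIRED_VARS = (
    "SHAREPOINT_CLIENT_ID",
    "SHAREPOINT_CLIENT_SECRET",
    "DATABASE_URL",
)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env_vars(required=REQUIRED_VARS, environ=None):
    """Raise a single error listing every missing variable."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
```

Calling require_env_vars() at startup reports every missing variable in one error instead of failing on the first one encountered.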

Contributing Guidelines

When contributing to this library:

  1. Follow the existing code structure and naming conventions
  2. Add new components in the appropriate directories
  3. Update documentation as needed
  4. Test changes locally
  5. Submit PRs with evidence of testing and team review

Running tests

To run the tests, use the following command:

pytest

Alternatively, run them in watch mode, which reruns the tests when files change:

ptw
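
For contributors adding tests: by default, pytest collects files named test_*.py and functions whose names start with test_. The sketch below shows the shape of such a test. The file name and the normalize_column_names helper are hypothetical, defined inline for illustration, and are not functions from this library.

```python
# tests/test_example.py (hypothetical file name)


def normalize_column_names(columns):
    """Hypothetical transformation: trim, lowercase, and snake_case names."""
    return [c.strip().lower().replace(" ", "_") for c in columns]


def test_normalize_column_names():
    raw = [" Order ID", "Customer Name ", "total"]
    assert normalize_column_names(raw) == ["order_id", "customer_name", "total"]


def test_normalize_column_names_empty():
    assert normalize_column_names([]) == []
```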
