
Dagster extra utilities for data processing


Dxtr Dagster Library

A Python library that provides utilities and components for data engineering workflows built with Dagster. It focuses on data processing: downloading data files from SharePoint, loading data into PostgreSQL, and performing data transformations.
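This README does not document the library's own download API. The sketch below illustrates what downloading a SharePoint file typically involves. It uses the msal and requests packages, both listed under Dependencies, to obtain an app-only token and then fetches the file through the Microsoft Graph path-based content endpoint. The function names, the client-credentials flow, and the site ID and file path parameters are assumptions for illustration, not dxtrx's interface.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def graph_file_content_url(site_id: str, file_path: str) -> str:
    """Build the Microsoft Graph URL that returns the raw bytes of a file
    stored in a SharePoint site's default document library (hypothetical helper)."""
    # Path-based addressing: /sites/{site-id}/drive/root:/{path}:/content
    encoded_path = quote(file_path.strip("/"))  # keeps "/" separators, escapes spaces etc.
    return f"{GRAPH_BASE}/sites/{site_id}/drive/root:/{encoded_path}:/content"


def download_sharepoint_file(
    tenant_id: str, client_id: str, client_secret: str, site_id: str, file_path: str
) -> bytes:
    """Hypothetical downloader: app-only token via msal, then a Graph GET."""
    # Imported here so the URL helper above has no third-party requirements.
    import msal
    import requests

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        client_credential=client_secret,
    )
    # Client-credentials (app-only) token scoped to Microsoft Graph.
    token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
    if "access_token" not in token:
        raise RuntimeError(f"Token request failed: {token.get('error_description')}")

    # Graph answers with a redirect to a pre-authenticated download URL;
    # requests follows it automatically.
    response = requests.get(
        graph_file_content_url(site_id, file_path),
        headers={"Authorization": f"Bearer {token['access_token']}"},
        timeout=60,
    )
    response.raise_for_status()
    return response.content
```

Only the URL helper runs without network access. Calling download_sharepoint_file requires a Microsoft Entra ID (Azure AD) app registration that has been granted Microsoft Graph access to the site.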

Project Structure

The library is organized into the following components:

dxtr/
├── dxtr/                 # Main library package
│   ├── dagster/          # Dagster-specific components and resources
│   └── utils/            # Utility functions
├── pyproject.toml        # Project configuration and dependencies
└── README.md             # This file

Features

  • SharePoint data file downloading
  • SQLAlchemy data loading
  • Data transformation capabilities
  • Integration with Dagster for workflow orchestration
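The transformation and loading features can be pictured as a transform step followed by a transactional load. The sketch below is only an approximation. It substitutes the standard library's csv and sqlite3 modules for the polars, SQLAlchemy and PostgreSQL stack the library depends on, and the sales table, its columns, and both function names are hypothetical. In a Dagster project each step would typically live in its own asset or op; that wiring is omitted here.

```python
import csv
import io
import sqlite3


def transform_rows(raw_csv: str) -> list[tuple[str, float]]:
    """Parse CSV text, normalise names and drop rows without an amount."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        amount = record["amount"].strip()
        if not amount:
            continue  # skip incomplete rows
        rows.append((record["name"].strip().title(), float(amount)))
    return rows


def load_rows(conn: sqlite3.Connection, rows: list[tuple[str, float]]) -> int:
    """Create the target table if needed and insert rows in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (name, amount) VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
raw = "name,amount\n alice ,10.5\nbob,\nCAROL,3\n"
print(load_rows(conn, transform_rows(raw)))  # → 2
```

Keeping the transform as a pure function makes it easy to unit-test separately from the database connection.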

Dependencies

The library requires Python 3.11.8 or higher. Key dependencies include:

  • polars
  • google-cloud-storage
  • requests
  • msal
  • pandas
  • sqlalchemy
  • psycopg2-binary
  • and more (see pyproject.toml for the complete list)
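To confirm that an interpreter meets these requirements, a small standard-library check along the following lines can help. It is a sketch, not part of dxtrx. The package list mirrors the distribution names above and is not the complete list from pyproject.toml.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 11, 8)
# Distribution names from the dependency list above (not exhaustive).
REQUIRED = (
    "polars",
    "google-cloud-storage",
    "requests",
    "msal",
    "pandas",
    "sqlalchemy",
    "psycopg2-binary",
)


def check_environment(packages=REQUIRED, min_python=MIN_PYTHON):
    """Return (python_ok, missing_packages) for the running interpreter."""
    python_ok = sys.version_info[:3] >= min_python
    missing = []
    for name in packages:
        try:
            version(name)  # raises if the distribution is not installed
        except PackageNotFoundError:
            missing.append(name)
    return python_ok, missing
```

Calling check_environment() returns a flag for the interpreter version and the names of any listed distributions that are not installed.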

Development

Installation

For development purposes, install the package in editable mode:

pip install -e ".[dev]" --config-settings editable_mode=compat


The library requires several environment variables to be set:

  • Sharepoint credentials
  • Database credentials
  • Other configuration variables
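This README does not list the actual variable names. The sketch below therefore uses hypothetical names to show one way of reading such settings so that a misconfigured environment fails fast and reports every missing variable at once, instead of failing midway through a run.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names; the real ones are not listed in this README.
REQUIRED_VARS = (
    "SHAREPOINT_TENANT_ID",
    "SHAREPOINT_CLIENT_ID",
    "SHAREPOINT_CLIENT_SECRET",
    "DATABASE_URL",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read required settings, raising one error that names every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Accepting the mapping as a parameter (defaulting to os.environ) lets tests pass a plain dict instead of modifying the process environment.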

Please refer to the Wiki for detailed instructions on using ./dxtrx.sh to configure the environment and start the Dagster code server, which is a more convenient way of working with this code.

Contributing Guidelines

When contributing to this library:

  1. Follow the existing code structure and naming conventions
  2. Add new components in the appropriate directories
  3. Update documentation as needed
  4. Test changes locally
  5. Submit PRs with evidence of testing and team review

Running tests

To run the tests, use the following command:

pytest

Alternatively, run them in watch mode, which re-runs the tests whenever files change:

ptw
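By default, pytest collects files named test_*.py or *_test.py and runs the functions in them whose names start with test. A new test can therefore be as small as the one below. normalize_columns is a hypothetical helper used only for illustration, and the location of test files in this repository is not shown in the project structure.

```python
def normalize_columns(columns: list[str]) -> list[str]:
    """Hypothetical helper: trim, lowercase and snake_case column names."""
    return ["_".join(col.strip().lower().split()) for col in columns]


# pytest reports a failure if any plain assert in a test_* function fails.
def test_normalize_columns():
    assert normalize_columns(["Order ID", " Amount "]) == ["order_id", "amount"]
    assert normalize_columns([]) == []
```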

Download files


Source Distribution

dxtrx-0.0.6.tar.gz (39.9 kB)

Built Distribution

dxtrx-0.0.6-py3-none-any.whl (47.6 kB)

File details

Details for the file dxtrx-0.0.6.tar.gz.

File metadata

  • Download URL: dxtrx-0.0.6.tar.gz
  • Size: 39.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for dxtrx-0.0.6.tar.gz
Algorithm     Hash digest
SHA256        f220ef2de2e3c3921bc646c65cfc9a56ce75002336aa9938d24153deae35b491
MD5           f442aa566b7399e73123a415233c4d00
BLAKE2b-256   1c0af996d2683dc8f207f783d754ba846911b29ffad4bc48b354a81d6cce62fb

File details

Details for the file dxtrx-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: dxtrx-0.0.6-py3-none-any.whl
  • Size: 47.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for dxtrx-0.0.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        187b077e2b93e6bcc42d58208e5158d1f9c3210b606b72294c98ad2e50f71323
MD5           8530dc3043fd4da38c297a9f6ec32d21
BLAKE2b-256   0225e53d75487c85ce21d22f6ef3ea96e3e6826b45d3629e2e84c2827f029937
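The published SHA256 digests can be used to check a downloaded release file before installing it. The sketch below does this with the standard library's hashlib, reading the file in chunks so large files are never loaded into memory at once. The function names are illustrative and not part of dxtrx.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the 0.0.6 release files.
EXPECTED_SHA256 = {
    "dxtrx-0.0.6.tar.gz": "f220ef2de2e3c3921bc646c65cfc9a56ce75002336aa9938d24153deae35b491",
    "dxtrx-0.0.6-py3-none-any.whl": "187b077e2b93e6bcc42d58208e5158d1f9c3210b606b72294c98ad2e50f71323",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```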
