Dagster extra utilities for data processing

Project description

Dxtr Dagster Library

A Python library of utilities and components for data engineering workflows built on Dagster. It focuses on data processing: downloading files from SharePoint, loading data into PostgreSQL, and performing data transformations.

Project Structure

The library is organized into the following components:

dxtr/
├── dxtr/                 # Main library package
│   ├── dagster/          # Dagster-specific components and resources
│   └── utils/            # Utility functions
├── pyproject.toml        # Project configuration and dependencies
└── README.md             # This file

Features

  • SharePoint data file downloading
  • SQLAlchemy data loading
  • Data transformation capabilities
  • Integration with Dagster for workflow orchestration

Dependencies

The library requires Python 3.11.8 or higher and includes key dependencies such as:

  • polars
  • google-cloud-storage
  • requests
  • msal
  • pandas
  • sqlalchemy
  • psycopg2-binary
  • and more (see pyproject.toml for complete list)
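
The minimum interpreter version stated above can be checked before importing anything heavier. The following is a minimal sketch, not part of the library: the names `MIN_PYTHON`, `meets_minimum`, and `require_minimum` are hypothetical and exist only for illustration.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 11, 8)


def meets_minimum(version: tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Return True when `version` (major, minor, micro) is at least MIN_PYTHON."""
    return tuple(version[:3]) >= MIN_PYTHON


def require_minimum(version: tuple[int, ...] = tuple(sys.version_info[:3])) -> None:
    """Raise RuntimeError with a readable message when the interpreter is too old."""
    if not meets_minimum(version):
        wanted = ".".join(str(part) for part in MIN_PYTHON)
        found = ".".join(str(part) for part in version[:3])
        raise RuntimeError(f"Python {wanted} or higher is required, found {found}")
```

Calling `require_minimum()` at the top of an entry point would fail fast with a clear message instead of a later import or syntax error.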

Development

Installation

For development purposes, install the package in editable mode:

pip install -e ".[dev]" --config-settings editable_mode=compat

For a more convenient way of working with this code, refer to the Wiki for how to use ./dxtrx.sh to set up the environment and start the Dagster code server.

The library requires several environment variables to be set:

  • SharePoint credentials
  • Database credentials
  • Other configuration variables
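
Since this README does not name the variables, the sketch below uses hypothetical names (`SHAREPOINT_CLIENT_ID`, `SHAREPOINT_CLIENT_SECRET`, `DATABASE_URL`) to show one way to fail fast when configuration is missing. The real names are documented in the Wiki, and this helper is an illustration rather than part of the library.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names for illustration only; the real names are
# documented in the project Wiki, not in this README.
REQUIRED_VARS = (
    "SHAREPOINT_CLIENT_ID",
    "SHAREPOINT_CLIENT_SECRET",
    "DATABASE_URL",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def check_environment(env: Mapping[str, str] = os.environ) -> None:
    """Raise one error that lists every missing variable at once."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Reporting all missing variables in a single error avoids the fix-one-rerun-fix-another loop when setting up a new environment.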

The Wiki's setup instructions explain how to configure these variables with ./dxtrx.sh.

Contributing Guidelines

When contributing to this library:

  1. Follow the existing code structure and naming conventions
  2. Add new components in the appropriate directories
  3. Update documentation as needed
  4. Test changes locally
  5. Submit PRs with evidence of testing and team review

Running tests

To run the tests, use the following command:

pytest

Alternatively, run them in watch mode:

ptw


Download files

Source Distribution

dxtrx-0.0.3.tar.gz (37.6 kB)

Uploaded Source

Built Distribution

dxtrx-0.0.3-py3-none-any.whl (44.5 kB)

Uploaded Python 3

File details

Details for the file dxtrx-0.0.3.tar.gz.

File metadata

  • Download URL: dxtrx-0.0.3.tar.gz
  • Upload date:
  • Size: 37.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for dxtrx-0.0.3.tar.gz

Algorithm    Hash digest
SHA256       0b4b5a05c5cec6012e2436a2067c0ebe0be3f9a149648c86dedd53786f7aedfe
MD5          16ccea64bbe7dbba18fa83bdd9786116
BLAKE2b-256  9becfdb0a5568fde7232c1166263787b69cf0db3076e1e9ae53abe6eff1986b3
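
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the helper names `sha256_of` and `matches` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dxtrx-0.0.3.tar.gz.
EXPECTED_SHA256 = "0b4b5a05c5cec6012e2436a2067c0ebe0be3f9a149648c86dedd53786f7aedfe"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `matches(Path("dxtrx-0.0.3.tar.gz"))` should return True for an intact copy of the source distribution.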

File details

Details for the file dxtrx-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: dxtrx-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 44.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for dxtrx-0.0.3-py3-none-any.whl

Algorithm    Hash digest
SHA256       f8f0c41a5e7252234d24eac2c421c5c446182a785b476924856775b7f7f89024
MD5          08ed9c1c7b48c2e082e6ed44a7ffff3d
BLAKE2b-256  c71db378ed85daf9f7aacd67e45cd711a765586486ba379247df554170ebd735
