Dagster extra utilities for data processing

Project description

Dxtr Dagster Library

A Python library that provides utilities and components for data engineering workflows using Dagster. The library focuses on data processing: downloading data from SharePoint, loading it into PostgreSQL, and performing data transformations.

Project Structure

The library is organized into the following components:

dxtr/
├── dxtr/     # Main library package
│   ├── dagster/          # Dagster-specific components and resources
│   └── utils/            # Utility functions
├── pyproject.toml        # Project configuration and dependencies
└── README.md            # This file

Features

  • SharePoint data file downloading
  • SQLAlchemy data loading
  • Data transformation capabilities
  • Integration with Dagster for workflow orchestration

Dependencies

The library requires Python 3.11.8 or higher and includes key dependencies such as:

  • polars
  • google-cloud-storage
  • requests
  • msal
  • pandas
  • sqlalchemy
  • psycopg2-binary
  • and more (see pyproject.toml for complete list)
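
The Python version requirement stated above can be checked at runtime before importing the library. This is a minimal sketch, not part of the library: the names MIN_PYTHON and python_is_supported are hypothetical, and it assumes a plain tuple comparison is enough.

```python
import sys

# Minimum version taken from the requirement stated above (3.11.8).
MIN_PYTHON = (3, 11, 8)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (major, minor, micro) meets the minimum."""
    if version is None:
        # Default to the interpreter running this code.
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)
```

Calling python_is_supported() with no arguments checks the running interpreter; passing a tuple such as (3, 11, 7) checks an arbitrary version.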

Development

Installation

For development purposes, install the package in editable mode:

pip install -e ".[dev]" --config-settings editable_mode=compat

For a more convenient workflow, see the Wiki for how to use ./dxtrx.sh to set up the environment and start the Dagster code server.

The library requires several environment variables to be set:

  • SharePoint credentials
  • Database credentials
  • Other configuration variables

The Wiki's detailed setup instructions explain how to configure these variables with ./dxtrx.sh.
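
Because the library depends on several environment variables, it can help to fail fast when one is missing. The following is a minimal sketch of such a pre-flight check. The variable names are hypothetical placeholders (the real names are documented in the Wiki), and missing_env_vars is an illustrative helper, not part of the library.

```python
import os

# Hypothetical variable names, for illustration only. The actual names
# required by the library are documented in the project Wiki.
REQUIRED_VARS = (
    "SHAREPOINT_CLIENT_ID",
    "SHAREPOINT_CLIENT_SECRET",
    "DATABASE_URL",
)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

A caller could run missing_env_vars() at startup and stop with a clear message if the returned list is not empty.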

Contributing Guidelines

When contributing to this library:

  1. Follow the existing code structure and naming conventions
  2. Add new components in the appropriate directories
  3. Update documentation as needed
  4. Test changes locally
  5. Submit PRs with evidence of testing and team review

Running tests

To run the tests, use the following command:

pytest

Alternatively, run them in watch mode:

ptw

File details

Details for the file dxtrx-0.0.7.tar.gz.

File metadata

  • Download URL: dxtrx-0.0.7.tar.gz
  • Upload date:
  • Size: 47.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for dxtrx-0.0.7.tar.gz
Algorithm Hash digest
SHA256 562e5037e114e76a2da559601691a5a1014484968e2776c9ca8778ad4a99cfdd
MD5 f18c79b2b28e699dcfa2335d815904b8
BLAKE2b-256 f38e825d78fc68ac7671bf15ba279e05354a2d6894308eaeeadf6cc5eb59ce1c
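
A downloaded archive can be checked against the SHA256 digest listed above before it is installed. This is a minimal sketch using only the standard library. The helper names sha256_of_file and matches_published_digest are hypothetical, and the path of the downloaded file is whatever the caller supplies.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dxtrx-0.0.7.tar.gz.
EXPECTED_SHA256 = "562e5037e114e76a2da559601691a5a1014484968e2776c9ca8778ad4a99cfdd"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_digest("dxtrx-0.0.7.tar.gz") returns True only when the local file matches the listed digest byte for byte.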


File details

Details for the file dxtrx-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: dxtrx-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 58.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for dxtrx-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 d1a0a70c673f0a87c12dc16dfbcafb1dd0d2d01d42a04ed2581b21ff8556a403
MD5 7ba37c341900563bf4a3d73a3443a0cd
BLAKE2b-256 b50463ac437f0c0f0b34369d23dbb8d4299745c3b027349c626ebfac223d0c3d
