Dagster extra utilities for data processing

Dxtr Dagster Library

A Python library that provides utilities and components for data engineering workflows built on Dagster. It focuses on data processing: downloading files from SharePoint, loading data into PostgreSQL, and performing data transformations.

Project Structure

The library is organized into the following components:

dxtr/
├── dxtr/     # Main library package
│   ├── dagster/          # Dagster-specific components and resources
│   └── utils/            # Utility functions
├── pyproject.toml        # Project configuration and dependencies
└── README.md            # This file

Features

  • SharePoint data file downloading
  • SQLAlchemy data loading
  • Data transformation capabilities
  • Integration with Dagster for workflow orchestration

Dependencies

The library requires Python 3.11.8 or higher and includes key dependencies such as:

  • polars
  • google-cloud-storage
  • requests
  • msal
  • pandas
  • sqlalchemy
  • psycopg2-binary
  • and more (see pyproject.toml for the complete list)
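The Python version requirement above can be checked before installing. The following is a minimal sketch, not part of the library; the function name `python_is_supported` is a hypothetical helper introduced only for illustration.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 11, 8)


def python_is_supported(version=None):
    """Return True if the given (or running) interpreter meets the minimum.

    `version` is a (major, minor, micro) tuple; when omitted, the running
    interpreter's version is used.
    """
    current = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return current >= MIN_PYTHON


# Report whether the interpreter running this snippet is new enough.
print("supported:", python_is_supported())
```

Tuple comparison handles the ordering, so 3.12.0 passes even though its micro version is lower than 8.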

Development

Installation

For development purposes, install the package in editable mode:

pip install -e ".[dev]" --config-settings editable_mode=compat
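To confirm that the editable install succeeded, the installed version can be read from the package metadata. This is a small sketch using only the standard library. It assumes the distribution name is `dxtrx`; adjust the name if pyproject.toml declares a different one. The helper `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name="dxtrx"):
    """Return the installed version of a distribution, or None if absent.

    The default name "dxtrx" is an assumption about how the package is
    published; check pyproject.toml for the actual name.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("installed version:", version if version else "not installed")
```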

For a more convenient workflow, see the Wiki for how to use ./dxtrx.sh to set up the environment and start the Dagster code server.

The library requires several environment variables to be set:

  • SharePoint credentials
  • Database credentials
  • Other configuration variables

The Wiki setup instructions for ./dxtrx.sh describe how to configure these variables.
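A missing variable is easier to diagnose at startup than partway through a Dagster run. The sketch below shows one way to check for required variables up front. The variable names are hypothetical placeholders, since this README does not list the real ones; take the actual names from the Wiki.

```python
import os

# Hypothetical names for illustration only; the real variable names are
# documented in the Wiki, not in this README.
REQUIRED_VARS = (
    "SHAREPOINT_CLIENT_ID",
    "SHAREPOINT_CLIENT_SECRET",
    "DATABASE_URL",
)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Returning a list rather than raising on the first gap lets the caller report every missing variable at once.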

Contributing Guidelines

When contributing to this library:

  1. Follow the existing code structure and naming conventions
  2. Add new components in the appropriate directories
  3. Update documentation as needed
  4. Test changes locally
  5. Submit PRs with evidence of testing and team review

Running tests

To run the tests, use the following command:

pytest

Alternatively, run them in watch mode, which reruns the tests whenever a file changes:

ptw
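For contributors adding tests, pytest collects functions named `test_*` from files named `test_*.py` and treats a failed `assert` as a test failure. The example below is a sketch of that shape. The helper `normalize_column_name` is hypothetical and defined inline so the example runs on its own; it is not part of this library.

```python
def normalize_column_name(name: str) -> str:
    """Hypothetical helper: trim, lowercase, and join words with underscores."""
    return "_".join(name.strip().lower().split())


def test_normalize_column_name():
    # Leading and trailing whitespace is dropped; inner spaces become "_".
    assert normalize_column_name("  Order ID ") == "order_id"
    # Runs of whitespace collapse to a single underscore.
    assert normalize_column_name("Total   Amount") == "total_amount"
```

Saved as, for example, `tests/test_columns.py`, this file would be picked up by both `pytest` and `ptw`.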
