Simple, tiny and ridiculous ETL made with Python

Project description

DasLaden is a simple, tiny and ridiculous ETL made with Python

Dasladen is a general-purpose Python package for automating ETL (extracting, transforming and loading data) through one or more .json files that describe tasks. It is based on petl. It can perform tasks such as:

  • load a .csv file into a database table
  • run a database query and write the result to a .csv file
  • run a database query and write the result to a database table
  • convert a .csv file into another .csv file
  • convert a .xls file into a .csv file
  • load a .xml file into a database table

These tasks can be configured to apply some of the basic transformations offered by petl, and you can write your own transformations in a Python module or class to be called by Dasladen during the loading process.
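To make the idea of "load a .csv file into a table, applying a custom transformation on the way" concrete, here is a minimal sketch that uses only the Python standard library (csv and sqlite3). It is not how Dasladen is implemented, since Dasladen builds on petl and is driven by .json task files. The function names `strip_and_upper` and `load_csv_to_table`, the table name and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3


def strip_and_upper(row):
    """Hypothetical custom transformation: tidy up the 'name' field."""
    row = dict(row)
    row["name"] = row["name"].strip().upper()
    return row


def load_csv_to_table(csv_file, connection, table, transform=None):
    """Read rows from an open CSV file and insert them into `table`.

    Returns the number of rows inserted.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    # Identifiers cannot be bound as SQL parameters, so they are quoted here.
    # Only use trusted table and column names with this sketch.
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(
        'CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, column_list)
    )
    insert_sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table, column_list, placeholders
    )
    count = 0
    for row in reader:
        if transform is not None:
            row = transform(row)
        connection.execute(insert_sql, [row[c] for c in columns])
        count += 1
    connection.commit()
    return count


# Made-up sample data standing in for a file in the input folder.
sample = io.StringIO("id,name\n1, ana \n2,bob\n")
conn = sqlite3.connect(":memory:")
loaded = load_csv_to_table(sample, conn, "people", transform=strip_and_upper)
rows = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
print(loaded, rows)
```

The transformation is just a function from one row to another, which is the general shape of a row-level transformation. How Dasladen expects such a function or class to be declared is not covered here.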

There are other types of tasks to do things like:

  • Compress files into a .zip file
  • Extract files from a .zip file
  • Upload a file
  • Download a file
  • Execute a Python script
  • Execute a SQL command

The tasks are configured in a .json file that holds a sequence of tasks, which are executed in the configured order. Details on how to configure tasks will be provided in the Wiki pages.

The basic steps to use DasLaden are:

  • Install the dasladen package with pip install dasladen, either in your environment or in a virtualenv.
  • Install a database driver package if you want to run database tasks. Dasladen is prepared to run with the following drivers: MySQL via PyMySQL, MS SQL Server via pyodbc, Oracle via cx_Oracle and PostgreSQL via psycopg2. Please check the limitations of the driver package you choose.
  • Create a folder for your project.
  • Prepare a folder structure in the project folder with the following names:
    • input: the default folder for input files, such as .csv, .xml, .xls and .sql files
    • output: the default folder where tasks write target files
    • module: the folder for Python scripts if you can't put them in the project folder
    • capture: the default folder in which to drop task files (.json or .zip)
    • log: the folder where Dasladen writes task logs
    • tasks: a folder where you can keep task files. It is only a suggestion.
  • Create a .json file with your tasks in the tasks folder.
  • Start DasLaden from the project folder by calling python -m dasladen.
  • If you want to see the log in the console window, pass --verbose as an argument on the call.
  • Copy the .json tasks file from the tasks folder to the capture folder.
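The folder structure from the steps above can be created by hand, or with a few lines of Python. The following is a minimal sketch using pathlib. The helper name `create_project_layout` and the project name `my_etl_project` are made up for illustration, and the demo writes into a temporary directory so that it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Folder names taken from the list above.
FOLDERS = ("input", "output", "module", "capture", "log", "tasks")


def create_project_layout(project_dir):
    """Create the project folder and its sub-folders if they are missing."""
    project = Path(project_dir)
    created = []
    for name in FOLDERS:
        folder = project / name
        # parents=True also creates the project folder itself;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demo in a temporary location; point this at your real project folder instead.
demo_root = Path(tempfile.mkdtemp())
folders = create_project_layout(demo_root / "my_etl_project")
for folder in folders:
    print(folder)
```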

The watcher will open the tasks file and process it. To see the result, open the log folder and look for watcher_DD_TT.log, where DD_TT is the date and time at which the log was generated. The log folder also contains the logs of the individual tasks.
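When the log folder fills up, it can be handy to locate the newest watcher log programmatically. Here is a minimal sketch: it matches files by the watcher_ prefix and the .log suffix and picks the most recently modified one, so it does not depend on the exact DD_TT format (which is not spelled out here). The helper name `latest_watcher_log` and the file names in the demo are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def latest_watcher_log(log_dir):
    """Return the most recently modified watcher_*.log file, or None."""
    candidates = Path(log_dir).glob("watcher_*.log")
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Demo with two fake log files whose names are invented for illustration.
log_dir = Path(tempfile.mkdtemp())
older = log_dir / "watcher_older_example.log"
newer = log_dir / "watcher_newer_example.log"
for path, mtime in ((older, 1000000000), (newer, 1000000600)):
    path.write_text("example log line\n")
    # Set the modification time explicitly so the demo is deterministic.
    os.utime(str(path), (mtime, mtime))

latest = latest_watcher_log(log_dir)
print(latest.name)
```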

It is important to copy the task file rather than move it, because it will be deleted when processing finishes.
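A small helper makes it harder to move the file by accident. The sketch below copies a task file into the capture folder with shutil and leaves the original untouched. The function name `submit_task_file`, the file name `daily_load.json` and its placeholder content are assumptions for illustration only, since the real task schema is documented elsewhere.

```python
import shutil
import tempfile
from pathlib import Path


def submit_task_file(task_file, capture_dir):
    """Copy (never move) a task file into the capture folder."""
    destination = Path(capture_dir) / Path(task_file).name
    shutil.copy2(str(task_file), str(destination))
    return destination


# Demo in a temporary project folder.
project = Path(tempfile.mkdtemp())
(project / "tasks").mkdir()
(project / "capture").mkdir()

original = project / "tasks" / "daily_load.json"
original.write_text("{}")  # placeholder, not a real Dasladen task definition

copied = submit_task_file(original, project / "capture")
print(original.exists(), copied.exists())
```

Because the original stays in the tasks folder, the same task file can be submitted again later.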

If you drop a file other than .zip in the capture folder, that file will be moved to the input folder.

You can also zip the .json file together with all of its dependent files (.csv, .xls, etc.) and copy that .zip into the capture folder. The watcher will unzip them into a temporary folder, copy the input files (everything other than .json files) to the input folder and execute the .json file.
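Such a bundle can be built with the standard zipfile module. The following is a minimal sketch, assuming the files should sit flat at the root of the archive (an assumption, since the expected archive layout is not described here). The helper name `build_task_bundle` and all file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def build_task_bundle(bundle_path, task_file, dependencies):
    """Zip a .json task file together with the input files it needs."""
    with zipfile.ZipFile(str(bundle_path), "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in [task_file] + list(dependencies):
            path = Path(path)
            # arcname drops the directory part, so files are stored flat.
            bundle.write(str(path), arcname=path.name)
    return Path(bundle_path)


# Demo with placeholder files in a temporary folder.
work = Path(tempfile.mkdtemp())
task = work / "daily_load.json"
task.write_text("{}")  # placeholder, not a real task definition
data = work / "customers.csv"
data.write_text("id,name\n1,ana\n")

bundle_file = build_task_bundle(work / "daily_load.zip", task, [data])
with zipfile.ZipFile(str(bundle_file)) as archive:
    names = sorted(archive.namelist())
print(names)
```

Building the archive outside the capture folder and then copying it in matches the "copy that .zip into the capture folder" step.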

In the .json file you can configure a scheduler to run the tasks. With it you can delay an execution or configure its recurrence.

Data drivers via PyPI packages:

  • MySQL via the PyMySQL package, version >= 0.7.5
  • MS SQL Server via the pyodbc package, version >= 3.0.10
  • Oracle via the cx_Oracle package, version >= 5.2.1
  • PostgreSQL via the psycopg2 package, version >= 2.8.3
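Before running database tasks, it may help to check which of these driver packages are importable in the current environment. The sketch below does that with importlib and does not verify the minimum versions listed above. The helper name `available_drivers` is made up, the module names are the import names of the packages listed above, and the snippet assumes Python 3.

```python
import importlib.util

# Database name -> import name of the driver package listed above.
DRIVER_MODULES = {
    "MySQL": "pymysql",
    "MS SQL Server": "pyodbc",
    "Oracle": "cx_Oracle",
    "PostgreSQL": "psycopg2",
}


def available_drivers():
    """Return a dict mapping each database to True if its driver is installed."""
    return {
        database: importlib.util.find_spec(module) is not None
        for database, module in DRIVER_MODULES.items()
    }


status = available_drivers()
for database, installed in sorted(status.items()):
    print("{}: {}".format(database, "installed" if installed else "missing"))
```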

The current version works with Python 2 and 3.

Download files

Source Distribution

dasladen-0.2.1.tar.gz (15.5 kB view details)

Uploaded Source

Built Distributions

dasladen-0.2.1-py3-none-any.whl (17.4 kB view details)

Uploaded Python 3

dasladen-0.2.1-py2-none-any.whl (17.4 kB view details)

Uploaded Python 2

File details

Details for the file dasladen-0.2.1.tar.gz.

File metadata

  • Download URL: dasladen-0.2.1.tar.gz
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.1

File hashes

Hashes for dasladen-0.2.1.tar.gz
Algorithm Hash digest
SHA256 6d28143f9a3c18ecd9f1c8bd8fe97299785faad07cae9766e0b2e5114b1b0cc8
MD5 c62e8197e88b4fceef5054f9bb596e2c
BLAKE2b-256 1d02756df7a3d068b6cd7d33aeae1321d24881d9ddb30e67143169f4cdfa04a9

File details

Details for the file dasladen-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: dasladen-0.2.1-py3-none-any.whl
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.1

File hashes

Hashes for dasladen-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 725d2b5002a38fe134707de114054ee7a2f92782d382e107a4f4b23c51239444
MD5 877762118396c5cad3050c7a2c612a5a
BLAKE2b-256 a6b2d2ed537c00760ad3b55d820676b22c93c9647535afa9bd7d3ce4f401b8b0

File details

Details for the file dasladen-0.2.1-py2-none-any.whl.

File metadata

  • Download URL: dasladen-0.2.1-py2-none-any.whl
  • Size: 17.4 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.1

File hashes

Hashes for dasladen-0.2.1-py2-none-any.whl
Algorithm Hash digest
SHA256 91cc0382b1d202037808af3ee9ddaf6372c28bdccfdd9297e67909458d913303
MD5 7e04f45ee6f6376f7bfd14d4b064c982
BLAKE2b-256 ceed829e5c767c6ec60702c8170f5d009563a0ff2d2ea3f388bfa59ca410072b
