ETL processes with easy config

Project description

Overview

mETL is an ETL tool designed specifically to load the elective data needed by CEU. The program can also be used more generally, to load practically any kind of data. It is written in Python and was designed with memory usage as a primary consideration, after assessing the capabilities of the Brewery tool.

Capabilities

The current version supports the most widespread file formats and data migration packages. These include:

Source types:

  • CSV, TSV, XLS, Google SpreadSheet, Fixed width file
  • PostgreSQL, MySQL, Oracle, SQLite, Microsoft SQL Server
  • JSON, XML, YAML

Target types:

  • CSV, TSV, XLS (appending to an existing file is also supported)
  • Fixed width file
  • PostgreSQL, MySQL, Oracle, SQLite, Microsoft SQL Server (modifying existing data is also supported)
  • JSON, XML, YAML
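
To make the source and target idea concrete, here is a minimal sketch in plain Python that reads records from a CSV source and writes them to a JSON target. It does not use mETL itself: the function names `read_csv_source` and `write_json_target` and the sample data are hypothetical, and the code targets Python 3 rather than the Python 2.7 this release was built for.

```python
import csv
import io
import json


def read_csv_source(stream):
    """Yield one record (a dict) per CSV row, reading the source lazily."""
    for row in csv.DictReader(stream):
        yield dict(row)


def write_json_target(records, stream):
    """Write all records to the target stream as a single JSON array."""
    # json.dump needs the full list, so the records are collected here.
    json.dump(list(records), stream, ensure_ascii=False)


# In-memory streams stand in for real files (illustrative data only).
source = io.StringIO("name,city\nÁdám,Budapest\nEva,Wien\n")
target = io.StringIO()

write_json_target(read_csv_source(source), target)
print(target.getvalue())
```

The reader is a generator, so the source side holds only one row at a time; a target that needs the whole data set, such as a JSON array, still has to collect the records before writing.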

During development we tried to cover the whole course of processing with the most widespread transformation steps, program structures and manipulation steps. Accordingly, the program provides the following transformations by default:

  • Add: Adds an arbitrary number to a value.
  • Clean: Removes punctuation marks (dots, commas, etc.).
  • ConvertType: Converts the field to another type.
  • Homogenize: Converts accented letters to unaccented ones (NFKD format).
  • LowerCase: Converts to lower case.
  • Map: Changes the value of a field to another value.
  • RemoveWordsBySource: Removes certain words, using another source.
  • ReplaceByRegexp: Replaces parts of the value using a regular expression.
  • ReplaceWordsBySource: Replaces words, using another source.
  • Set: Sets a certain value.
  • Split: Splits the value into words at spaces and keeps a given range of them.
  • Stem: Reduces words to their stem (root).
  • Strip: Removes unnecessary spaces and/or other characters from the beginning and end of the value.
  • Sub: Subtracts a given number from a given value.
  • Title: Capitalizes the first letter of every word.
  • UpperCase: Converts to upper case.
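
As an illustration of how such field transformations can be chained, the sketch below reimplements a few of them (Strip, Homogenize, Clean, Title, ReplaceByRegexp) with the Python standard library. This is not mETL's own code or configuration syntax: the function names and the `apply_chain` helper are hypothetical, and the code assumes Python 3.

```python
import re
import string
import unicodedata


def homogenize(value):
    """Replace accented letters with unaccented ones via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks that NFKD splits off from the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean(value):
    """Remove ASCII punctuation marks such as dots and commas."""
    return value.translate(str.maketrans("", "", string.punctuation))


def replace_by_regexp(value, pattern, replacement):
    """Replace every match of the regular expression in the value."""
    return re.sub(pattern, replacement, value)


def apply_chain(value, steps):
    """Apply the transformation steps to the value, one after another."""
    for step in steps:
        value = step(value)
    return value


# Strip, Homogenize, Clean and Title applied in that order.
steps = [str.strip, homogenize, clean, str.title]
print(apply_chain("  árvíztűrő, tükörfúrógép.  ", steps))
# prints "Arvizturo Tukorfurogep"
```

The order of the steps matters: here the punctuation is removed only after the accents have been stripped, and the capitalization is applied last.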

Manipulations fall into four groups:

  1. Modifier

    Modifiers are objects that receive a whole line (record) and return a whole line. While processing, they change values using the related values of other fields.

    • JoinByKey: Merges and joins two different records.
    • Order: Orders lines according to the given conditions.
    • Set: Sets a value using a fixed value scheme, a function or another source.
    • SetWithMap: Sets a value of a complex type using a given map.
    • TransformField: Applies a regular field transformation during manipulation.
  2. Filter

    Filters are used primarily for filtering: to evaluate or discard records that are incomplete or faulty as a result of an earlier transformation.

    • DropByCondition: Drops the record if a condition is met.
    • DropBySource: Keeps or drops the record depending on whether it appears in another file.
    • DropField: Deletes a field; the number of records does not decrease.
    • KeepByCondition: Keeps the record only if a condition is met.
  3. Expand

    Expanders are used when we would like to add more values to the current source.

    • Append: Appends a new source file, identical in type to the one in use, after the current one.
    • AppendBySource: Appends another source after the original one.
    • Field: Takes columns as parameters and collects them, together with their values, into another column.
    • BaseExpander: Class used for expansion, primarily when a record needs to be multiplied.
    • ListExpander: Splits list-type elements and puts them into separate lines.
    • Melt: Keeps the given columns fixed and turns the remaining columns into key-value pairs.
  4. Aggregator

    Aggregators are used to combine records and compute summary values.

    • Avg: Calculates the arithmetic mean.
    • Count: Counts items.
    • Sum: Calculates sums.
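
The four groups can be pictured as steps that each take a stream of records and return a new one. The following is a minimal sketch in plain Python 3, not mETL's actual classes or configuration format: the function names (`order`, `keep_by_condition`, `list_expander`, `aggregate`) and the sample records are hypothetical, chosen only to mirror one member of each group.

```python
from collections import defaultdict
from operator import itemgetter


def order(records, key):
    """Modifier-style step: return the records sorted by one field."""
    return sorted(records, key=itemgetter(key))


def keep_by_condition(records, predicate):
    """Filter-style step: keep only records for which the predicate holds."""
    return (r for r in records if predicate(r))


def list_expander(records, field):
    """Expand-style step: emit one record per element of a list-valued field."""
    for r in records:
        for item in r[field]:
            yield {**r, field: item}


def aggregate(records, group_field, value_field):
    """Aggregator-style step: count, sum and average a field per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_field]].append(r[value_field])
    return {
        g: {"count": len(v), "sum": sum(v), "avg": sum(v) / len(v)}
        for g, v in groups.items()
    }


# Illustrative data only: students with a list of credit values each.
records = [
    {"student": "b", "credits": [4, 2]},
    {"student": "a", "credits": [3]},
    {"student": "c", "credits": []},
]

kept = keep_by_condition(records, lambda r: r["credits"])  # drops student "c"
expanded = list(list_expander(order(kept, "student"), "credits"))
summary = aggregate(expanded, "student", "credits")
print(expanded)
print(summary)
```

Each step only depends on the records it receives, so the steps can be combined in any order that makes sense for the data.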

Release history

  • 1.0.6
  • 1.0.5
  • 1.0.4
  • 1.0.3
  • 1.0.3dev
  • 1.0.2dev
  • 1.0.1dev
  • 1.0.0dev
  • 0.1.8.7dev
  • 0.1.8.6dev
  • 0.1.8.5dev
  • 0.1.8.4dev
  • 0.1.8.1dev
  • 0.1.8dev
  • 0.1.7.1dev
  • 0.1.7.0dev (this version)
Download files

  • mETL-0.1.7.0dev-py2.7.egg (377.7 kB): Egg, Python 2.7, uploaded Oct 12, 2013
  • mETL-0.1.7.0dev.tar.gz (53.9 kB): Source, uploaded Oct 12, 2013
