ETL processes with easy config

Project description

Overview

mETL is an ETL tool designed specifically to load the elective data required by CEu. The programme can also be used more generally, to load practically any kind of data. It was written in Python with optimal memory usage as a primary consideration, after assessing the capabilities of the Brewery tool.

Capabilities

The current version supports the most widespread file formats and database engines used in data migration. These include:

Source types:

  • CSV, TSV, XLS, Google SpreadSheet, Fixed width file

  • PostgreSQL, MySQL, Oracle, SQLite, Microsoft SQL Server

  • JSON, XML, YAML

Target types:

  • CSV, TSV, XLS (appending to an existing file is also supported)

  • Fixed width file

  • PostgreSQL, MySQL, Oracle, SQLite, Microsoft SQL Server (modifying existing data is also supported)

  • JSON, XML, YAML

During the development of the programme we tried to cover the whole course of processing with the most widespread transformation steps, programme structures and manipulation steps. Accordingly, the programme provides the following transformations by default:

  • Add: Adds an arbitrary number to a value.

  • Clean: Removes punctuation marks (dots, commas, etc.).

  • ConvertType: Converts the field to another type.

  • Homogenize: Converts accented letters to unaccented ones (NFKD normalization).

  • LowerCase: Converts to lower case.

  • Map: Changes the value of a field to another value.

  • RemoveWordsBySource: Removes certain words, using another source as the word list.

  • ReplaceByRegexp: Replaces text matched by a regular expression.

  • ReplaceWordsBySource: Replaces words using another source.

  • Set: Sets a certain value.

  • Split: Splits the value into words on spaces and keeps a given range of them.

  • Stem: Reduces words to their stem (root).

  • Strip: Removes unnecessary spaces and/or other characters from the beginning and end of the value.

  • Sub: Subtracts a given number from a given value.

  • Title: Capitalizes the first letter of every word.

  • UpperCase: Converts to upper case.
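
To make a few of the transformations above concrete, here is a minimal sketch in plain Python 3. It does not use mETL's own classes or configuration format; the function names (`homogenize`, `clean`, `replace_by_regexp`, `apply_chain`) are hypothetical and only illustrate the kind of per-value processing that Homogenize, Clean, Strip, LowerCase and ReplaceByRegexp describe.

```python
import re
import string
import unicodedata


def homogenize(value):
    """Replace accented letters with unaccented ones via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks that NFKD split off from the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean(value):
    """Remove ASCII punctuation marks (dots, commas, etc.)."""
    return value.translate(str.maketrans("", "", string.punctuation))


def replace_by_regexp(value, pattern, replacement):
    """Replace every match of a regular expression."""
    return re.sub(pattern, replacement, value)


def apply_chain(value, steps):
    """Apply the transformation steps to a single value, in order."""
    for step in steps:
        value = step(value)
    return value


# Hypothetical chain: homogenize, clean, strip, lower-case, collapse whitespace.
steps = [
    homogenize,
    clean,
    str.strip,
    str.lower,
    lambda v: replace_by_regexp(v, r"\s+", " "),
]

result = apply_chain("  Árvíztűrő,   Tükörfúrógép!  ", steps)
print(result)  # arvizturo tukorfurogep
```

The order of the steps matters: stripping before cleaning, for example, would leave the spaces that surrounded the removed punctuation.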

Manipulations fall into four groups:

  1. Modifier

    Modifiers are objects that receive a whole line (record) and return a whole line. While processing, they can change values using the related values of other fields.

    • JoinByKey: Merges and joins two different records.

    • Order: Orders lines according to the given conditions.

    • Set: Sets a value using a fixed value scheme, a function or another source.

    • SetWithMap: Sets a value of a complex type using a given map.

    • TransformField: Applies regular field transformations during manipulation.

  2. Filter

    Their function is primarily filtering. They are used when we would like to evaluate or get rid of incomplete or faulty records resulting from an earlier transformation.

    • DropByCondition: Drops the record if a condition is met.

    • DropBySource: Whether the record is dropped depends on whether it appears in another file.

    • DropField: Does not decrease the number of records, but deletes a field.

    • KeepByCondition: Keeps the record only if a condition is met.

  3. Expand

    Used for expansion, when we would like to add more values to the given source.

    • Append: Appends a new source file, identical in format to the current one, after the one in use.

    • AppendBySource: Appends a new file source after the original one.

    • Field: Takes columns as parameters and puts them into another column together with the columns’ values.

    • BaseExpander: Base class for expansion, primarily used when we would like to multiply a record.

    • ListExpander: Splits list-type elements and puts them into separate lines.

    • Melt: Fixes given columns and shows the rest of the columns as key-value pairs.

  4. Aggregator

    Aggregators are used to connect and arrange data.

    • Avg: Used to determine the arithmetic mean.

    • Count: Used to count records.

    • Sum: Used to determine sums.
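
The four groups can be illustrated with a small sketch in plain Python 3 that works on records represented as dictionaries. This is not mETL's API: the function names, the sample records, and the `key`/`value` column names produced by `melt` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def set_field(record, field, func):
    """Modifier: takes a whole record and returns a whole record."""
    updated = dict(record)
    updated[field] = func(record)
    return updated


def keep_by_condition(records, condition):
    """Filter: keep only the records for which the condition holds."""
    return [r for r in records if condition(r)]


def list_expander(records, field):
    """Expand: emit one record per element of a list-valued field."""
    expanded = []
    for r in records:
        for item in r[field]:
            row = dict(r)
            row[field] = item
            expanded.append(row)
    return expanded


def melt(records, fixed):
    """Expand: keep the fixed columns, turn every other column into a key-value pair."""
    melted = []
    for r in records:
        base = {k: r[k] for k in fixed}
        for key, value in r.items():
            if key not in fixed:
                melted.append({**base, "key": key, "value": value})
    return melted


def aggregate_sum(records, group_field, value_field):
    """Aggregator: sum value_field for each distinct value of group_field."""
    totals = defaultdict(int)
    for r in records:
        totals[r[group_field]] += r[value_field]
    return dict(totals)


# Hypothetical sample data.
records = [
    {"name": "anna", "tags": ["a", "b"], "price": 10, "qty": 2},
    {"name": "bela", "tags": [], "price": 5, "qty": 0},
    {"name": "anna", "tags": ["c"], "price": 3, "qty": 1},
]

# Modifier: compute a new field from other fields of the same record.
with_total = [set_field(r, "total", lambda rec: rec["price"] * rec["qty"]) for r in records]
# Filter: discard records with nothing sold.
non_empty = keep_by_condition(with_total, lambda rec: rec["total"] > 0)
# Expand: one line per tag.
per_tag = list_expander(non_empty, "tags")
# Aggregator: total per name.
totals = aggregate_sum(non_empty, "name", "total")
melted = melt([{"id": 1, "x": 10, "y": 20}], fixed=["id"])

print(totals)
print(melted)
```

Each function returns new records instead of changing its input, which keeps the steps independent and lets them be chained in any order.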
