Transform records using an Avro schema and custom map functions.

RecordMapper

Read, transform and write records using an Avro schema and custom map functions.

Installing the project

To install the project, run the following command from the root directory:

$ pip install .

It is highly recommended to use a virtual environment when installing the project dependencies in order to avoid version conflicts.

Updating PyPI version

To build the package and publish the new version to PyPI with Poetry, run:

$ poetry publish --build

RecordMapper elements

Appliers

Appliers are the elements that carry out the record transformations. They are applied in sequence, each one performing a specific transformation on a record, its schema, or both.

Four appliers are currently defined:

  • The selector applier, which modifies the base schema when there are nested schemas to consider.
  • The rename applier, which renames fields using the aliases included in the schema.
  • The transform applier, which applies the given transform functions to the record.
  • The clean applier, which filters the output fields, keeping only those given in the output schema.

An Applier is a class that implements the apply method, which receives a single record and its schema and returns the transformed versions of both. Appliers are located in the appliers directory.
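The applier contract described above can be illustrated with a minimal sketch. It is not the package's actual code: the class names (SketchApplier, SketchRenameApplier, SketchCleanApplier), the run_appliers helper and the exact signature of apply are assumptions made for illustration, and the schema is represented as a plain dict in Avro's JSON form.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Schema = Dict[str, Any]


class SketchApplier:
    """Hypothetical base class: one transformation step over (record, schema)."""

    def apply(self, record: Record, schema: Schema) -> Tuple[Record, Schema]:
        raise NotImplementedError


class SketchRenameApplier(SketchApplier):
    """Renames input keys to schema field names using each field's aliases."""

    def apply(self, record: Record, schema: Schema) -> Tuple[Record, Schema]:
        renamed = dict(record)  # copy, so the caller's record is not mutated
        for field in schema["fields"]:
            for alias in field.get("aliases", []):
                if alias in renamed:
                    renamed[field["name"]] = renamed.pop(alias)
        return renamed, schema


class SketchCleanApplier(SketchApplier):
    """Keeps only the fields that are named in the schema."""

    def apply(self, record: Record, schema: Schema) -> Tuple[Record, Schema]:
        names = {field["name"] for field in schema["fields"]}
        cleaned = {key: value for key, value in record.items() if key in names}
        return cleaned, schema


def run_appliers(
    appliers: List[SketchApplier], record: Record, schema: Schema
) -> Tuple[Record, Schema]:
    """Applies each applier in order, feeding its output to the next one."""
    for applier in appliers:
        record, schema = applier.apply(record, schema)
    return record, schema


# Example schema: "user_id" can be populated from an input key called "id".
schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "string", "aliases": ["id"]},
        {"name": "email", "type": "string"},
    ],
}
raw = {"id": "42", "email": "a@example.com", "debug": True}
result, _ = run_appliers([SketchRenameApplier(), SketchCleanApplier()], raw, schema)
```

In this sketch the rename step runs before the clean step, so the aliased key "id" is moved to "user_id" before the field "debug", which is not in the schema, is dropped.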

Readers

To transform records, the Record Mapper must first read them from the file that contains the data. It supports reading files in several formats, including csv, xml and avro.

Reading is done by Reader objects. The Reader class implements methods to read different kinds of files and is extended by sub-classes, each specialized in reading a specific format. For example, the XMLReader class reads XML files and the CSVReader class extracts data from CSV files.

A Reader sub-class implements the read_records_from_input method, which returns the content of the file record by record. Each Reader sub-class is located in the directory for its format, alongside the corresponding Writer sub-class.
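As a rough illustration of reading record by record, the sketch below uses a generator built on the standard csv module. The class names SketchReader and SketchCSVReader are stand-ins rather than the package's own classes, and the assumption that read_records_from_input takes an open text stream is ours; the real method may take a path or other arguments.

```python
import csv
import io
from typing import Any, Dict, Iterator, TextIO


class SketchReader:
    """Hypothetical base class for format-specific readers."""

    def read_records_from_input(self, source: TextIO) -> Iterator[Dict[str, Any]]:
        raise NotImplementedError


class SketchCSVReader(SketchReader):
    """Yields one dict per CSV row, keyed by the header line."""

    def read_records_from_input(self, source: TextIO) -> Iterator[Dict[str, Any]]:
        for row in csv.DictReader(source):
            # Yielding keeps only one record in memory at a time.
            yield dict(row)


# An in-memory stream stands in for a CSV file on disk.
sample = io.StringIO("id,email\n1,a@example.com\n2,b@example.com\n")
records = list(SketchCSVReader().read_records_from_input(sample))
```

Because the method is a generator, a caller can start transforming the first record before the rest of the file has been read. All values come back as strings, since CSV carries no type information.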

Writers

After applying the transformations, the Record Mapper must write the transformed records to a file. It supports writing files in several formats, including csv and avro.

Important! The Record Mapper can return several output files, one for each format, but writing the avro file is mandatory. An avro file is therefore always returned.

Writing is done by Writer objects. The Writer class implements methods to write to different formats and is extended by sub-classes, each specialized in writing a specific format. For example, the CSVWriter class writes CSV files and the AvroWriter class writes Avro files.

A Writer sub-class implements the write_records_to_output method, which writes a given iterable of records to an output file. Each Writer sub-class is located in the directory for its format, alongside the corresponding Reader sub-class. The method also accepts output options as parameters, including:

  • Flattening nested schemas when writing csv files.
  • Merging schemas when writing avro files.
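The csv flattening option can be sketched as follows. This is only an illustration: the SketchCSVWriter class, the flatten helper, the flatten_nested parameter name and the dotted column naming are all assumptions, and the package's real option names and flattening rules may differ.

```python
import csv
import io
from typing import Any, Dict, Iterable, TextIO


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Turns nested dicts into a single level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


class SketchCSVWriter:
    """Hypothetical writer: writes an iterable of records as CSV."""

    def write_records_to_output(
        self,
        records: Iterable[Dict[str, Any]],
        output: TextIO,
        flatten_nested: bool = False,
    ) -> None:
        rows = [flatten(r) if flatten_nested else r for r in records]
        if not rows:
            return
        # Columns are taken from the first row; csv.DictWriter raises
        # ValueError if a later row contains a key that is not listed here.
        writer = csv.DictWriter(
            output, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


# An in-memory stream stands in for the output file.
buffer = io.StringIO()
SketchCSVWriter().write_records_to_output(
    [{"id": 1, "address": {"city": "Madrid", "zip": "28001"}}],
    buffer,
    flatten_nested=True,
)
```

With flattening enabled, the nested "address" record becomes the columns "address.city" and "address.zip". CSV has no notion of nested fields, which is why such an option is useful for this format.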
