Utility for generating Avro files from Postgres.

Development Status
- 3 - Alpha
Environment
- Plugins
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent

Project description

pg2avro

Postgres to Avro generator.

Features

Generate an Avro schema from column definitions.
Generate row data in a format consumable by Avro serialization.

Usage

Generating schema

Method: pg2avro.get_avro_schema

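from pg2avro import get_avro_schema
# Assumed imports for the SQLAlchemy column used in this example
from sqlalchemy import Column as SqlAlchemyColumn
from sqlalchemy.dialects.postgresql import ARRAY, TEXT

get_avro_schema(
    "mytable",
    "public",
    [
        # Dictionary mode
        {
            "name": "column_name_1",
            "type": "int2",
            "nullable": False,
        },
        # SQLAlchemy mode
        SqlAlchemyColumn(ARRAY(TEXT), name="column_name_2"),
        # ...
    ]
)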

The schema generator needs the following information:

table name
namespace (schema in SQL, dataset in BigQuery, etc.)
columns - iterable of columns, each element with:
- name
- type - a _ prefix indicates an array type (e.g. _text)
- nullable (optional, True assumed if not provided)
column mapping - optional ColumnMapping object with column mappings (see below for more info).
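
To make the column fields above concrete, here is a minimal sketch of how a column's name, type and nullable flag could translate into an Avro field. It is not pg2avro's implementation: the helper sketch_avro_field and the small PG_TO_AVRO table are hypothetical and cover only a handful of types. The sketch relies on two Avro conventions: a nullable field is a union with "null", and an array is written as {"type": "array", "items": ...}.

```python
# Hypothetical, deliberately tiny type table; pg2avro's real mapping is
# larger and may differ.
PG_TO_AVRO = {
    "int2": "int",
    "int4": "int",
    "int8": "long",
    "float4": "float",
    "float8": "double",
    "bool": "boolean",
    "text": "string",
    "varchar": "string",
}


def sketch_avro_field(name, pg_type, nullable=True):
    """Build an Avro field dict for one column (illustration only)."""
    if pg_type.startswith("_"):
        # A leading underscore marks an array of the remaining type name.
        avro_type = {"type": "array", "items": PG_TO_AVRO[pg_type[1:]]}
    else:
        avro_type = PG_TO_AVRO[pg_type]
    if nullable:
        # Avro expresses nullability as a union with "null".
        avro_type = ["null", avro_type]
    return {"name": name, "type": avro_type}


required_int = sketch_avro_field("column_name_1", "int2", nullable=False)
nullable_array = sketch_avro_field("column_name_2", "_text")
print(required_int)
print(nullable_array)
```

Omitting nullable in the second call mirrors the rule above that True is assumed when the value is not provided.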

Column data can be passed in multiple formats.

Supported column formats

Dictionary with required keys and data
SQLAlchemy Column object
Any object with compatible attributes and required data
Dictionary or object with required data but without compatible attributes/keys, supplied together with a ColumnMapping

Note: the ColumnMapping mode supports generating a schema from raw Postgres data, e.g. udt_name can be used as the column type.

columns = [
    CustomColumn(name="column_name", udt_name="int2", is_nullable=False),
]

get_avro_schema(
    table_name,
    namespace,
    columns,
    ColumnMapping(name="name", type="udt_name", nullable="is_nullable"),
)
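
CustomColumn is not defined in the example above. The following is a minimal sketch of one way to build such objects from information_schema.columns rows. The namedtuple definition, the columns_from_information_schema helper, the query placeholders and the sample rows are all assumptions made for illustration. The one fact relied on is that Postgres reports is_nullable as the string 'YES' or 'NO', which the helper converts to a boolean to match the example.

```python
from collections import namedtuple

# Hypothetical stand-in for the CustomColumn used above.
CustomColumn = namedtuple("CustomColumn", ["name", "udt_name", "is_nullable"])

# Query whose result rows the helper below expects (not executed here).
# The %s placeholders assume a psycopg2-style driver.
COLUMNS_QUERY = """
    SELECT column_name, udt_name, is_nullable
    FROM information_schema.columns
    WHERE table_schema = %s AND table_name = %s
    ORDER BY ordinal_position
"""


def columns_from_information_schema(rows):
    """Convert (column_name, udt_name, is_nullable) rows into CustomColumn objects."""
    return [
        CustomColumn(name=name, udt_name=udt_name, is_nullable=(is_nullable == "YES"))
        for name, udt_name, is_nullable in rows
    ]


# Made-up rows shaped like the query result, for illustration only.
sample_rows = [("id", "int2", "NO"), ("tags", "_text", "YES")]
columns = columns_from_information_schema(sample_rows)
print(columns)
```

The resulting list could then be passed to get_avro_schema together with ColumnMapping(name="name", type="udt_name", nullable="is_nullable"), as in the example above.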

Generating rows data

Method: pg2avro.get_avro_row_dict

This method takes a row and the schema to generate the row with.

Supported row formats

Dictionary with keys corresponding to schema field names
Object with attributes corresponding to schema field names (works the same as a dictionary with corresponding keys)
Tuple with data in the same order as fields specified in schema

columns = [
    {"name": "name", "type": "varchar", "nullable": False},
    {"name": "number", "type": "float4", "nullable": False},
]
schema = get_avro_schema(table_name, namespace, columns)
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
data = [get_avro_row_dict(row, schema) for row in rows]
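
The three row formats can be thought of as different views of the same record. The sketch below normalizes each of them into a dict keyed by field name. It is illustrative only and does not reproduce pg2avro's typecasting; row_to_dict, this RowObject class and the field_names argument are hypothetical.

```python
class RowObject:
    """Hypothetical row object exposing fields as attributes."""

    def __init__(self, **fields):
        for key, value in fields.items():
            setattr(self, key, value)


def row_to_dict(row, field_names):
    """Normalize a dict, tuple, or attribute-bearing object into a dict."""
    if isinstance(row, dict):
        return {name: row[name] for name in field_names}
    if isinstance(row, (tuple, list)):
        # Positional rows rely on matching the schema's field order.
        return dict(zip(field_names, row))
    return {name: getattr(row, name) for name in field_names}


field_names = ["name", "number"]
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
normalized = [row_to_dict(row, field_names) for row in rows]
print(normalized)
```

The tuple branch is why positional rows must list their values in the same order as the schema fields.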

Overriding mappings

Some cases might require overriding the standard mapping. One example is moving Postgres data into Google BigQuery, where numeric types are handled differently and cannot accept arbitrary scale, so we may want to map them to float instead.

To do so, pass your mapping overrides to the get_avro_schema function as a dict keyed by column name:

columns = [
    {"name": "some_special_field", "type": "int"},
    {"name": "numeric_with_high_scale", "type": "numeric(20, 15)"},
]
overrides = {
    "some_special_field": {"pg_type": "string", "python_type": str},
    "numeric_with_high_scale": {"pg_type": "float8", "python_type": float},
}

schema = get_avro_schema(table_name, namespace, columns, mapping_overrides=overrides)

pg_type - the type pg2avro should treat the column as, instead of the type retrieved from pg/SQLAlchemy etc.
python_type - built-in Python type to use for typecasting. Use str, float, int, tuple, list, set or dict here.

With the overrides above, some_special_field is mapped to a string instead of an int.
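
As a rough illustration of what python_type implies for row values, the sketch below casts overridden columns with the given built-in type. cast_with_overrides is a hypothetical helper, not part of pg2avro, and how the library applies the cast internally is not described here.

```python
from decimal import Decimal

overrides = {
    "some_special_field": {"pg_type": "string", "python_type": str},
    "numeric_with_high_scale": {"pg_type": "float8", "python_type": float},
}


def cast_with_overrides(row, overrides):
    """Return a copy of row with overridden columns cast to their python_type."""
    result = dict(row)
    for column, override in overrides.items():
        # Leave missing columns and NULLs (None) untouched.
        if result.get(column) is not None:
            result[column] = override["python_type"](result[column])
    return result


# Hypothetical input row: an int and a high-scale numeric read as Decimal.
row = {"some_special_field": 42, "numeric_with_high_scale": Decimal("1.5")}
casted = cast_with_overrides(row, overrides)
print(casted)
```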

Release history

- 0.2.5 (this version): May 17, 2022
- 0.2.4: Apr 23, 2020
- 0.2.3: Apr 16, 2020
- 0.2.2: Jan 9, 2020
- 0.2.1: Dec 20, 2019
- 0.2: Dec 16, 2019
- 0.1.2: Nov 29, 2019
- 0.1.1: Nov 29, 2019
- 0.1: Jul 30, 2019

Download files

Source Distribution

pg2avro-0.2.5.tar.gz (7.9 kB)

Uploaded May 17, 2022 Source

Built Distribution

pg2avro-0.2.5-py3-none-any.whl (7.8 kB)

Uploaded May 17, 2022 Python 3

File details

Details for the file pg2avro-0.2.5.tar.gz.

File metadata

Download URL: pg2avro-0.2.5.tar.gz
Upload date: May 17, 2022
Size: 7.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.9

File hashes

Hashes for pg2avro-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`6a8ced85487bdd5afadebfeb3434e040f78200b01132e3633ccf7c3c2e998505`
MD5	`48ccc1e25b88058a5ae52c4abb1baa0e`
BLAKE2b-256	`201310e6b897120f247ebb51d667bec066be6e4fcbe9fcdf0c80bff60eb8904f`

File details

Details for the file pg2avro-0.2.5-py3-none-any.whl.

File metadata

Download URL: pg2avro-0.2.5-py3-none-any.whl
Upload date: May 17, 2022
Size: 7.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.9

File hashes

Hashes for pg2avro-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`440b48e984e5b0c8bc389377407297bf204ae4656d9b66f8d14279f7ddb647ba`
MD5	`eb6770762d0525f998a0307de8d5afb5`
BLAKE2b-256	`cc6a826025948f2743ff6624d7ef6bf059354f8e515b7d18f3216a85bfa14152`
