
pg2avro

Postgres to Avro generator.

Features

  • Generate an Avro schema from column definitions.
  • Generate row data in a format consumable by Avro serialization.

Usage

Generating schema

Method: pg2avro.get_avro_schema

from pg2avro import get_avro_schema
# SqlAlchemy mode additionally assumes the SQLAlchemy imports, e.g.:
# from sqlalchemy import Column as SqlAlchemyColumn, ARRAY, TEXT

get_avro_schema(
    "mytable",
    "public",
    [
        # Dictionary mode
        {
            "name": "column_name_1",
            "type": "int2",
            "nullable": False,
        },
        # SqlAlchemy mode
        SqlAlchemyColumn(ARRAY(TEXT), name="column_name_2"),
        ...
    ]
)

Schema generator needs the following information:

  • table name
  • namespace (schema in SQL, dataset in Big Query etc.)
  • columns - iterable of columns, each element with:
    • name
    • type - _ prefix is used to indicate array types
    • nullable (optional, True assumed if not provided)
  • column mapping - optional ColumnMapping object with column mappings (see below for more info).
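To illustrate the shape of the result, here is a minimal standalone sketch of how such column definitions translate into an Avro record schema. This is not pg2avro's implementation: `make_schema` and the `PG_TO_AVRO` type table are simplified assumptions covering only a handful of Postgres types.

```python
# Simplified sketch of column-definition -> Avro schema mapping.
# NOT pg2avro's implementation; the type table is deliberately tiny.
PG_TO_AVRO = {
    "int2": "int", "int4": "int", "int8": "long",
    "float4": "float", "float8": "double",
    "varchar": "string", "text": "string",
}

def make_schema(table, namespace, columns):
    fields = []
    for col in columns:
        pg_type = col["type"]
        if pg_type.startswith("_"):  # "_" prefix marks array types
            avro_type = {"type": "array", "items": PG_TO_AVRO[pg_type[1:]]}
        else:
            avro_type = PG_TO_AVRO[pg_type]
        if col.get("nullable", True):  # nullable -> union with "null"
            avro_type = ["null", avro_type]
        fields.append({"name": col["name"], "type": avro_type})
    return {"type": "record", "name": table, "namespace": namespace, "fields": fields}

schema = make_schema("mytable", "public", [
    {"name": "column_name_1", "type": "int2", "nullable": False},
    {"name": "tags", "type": "_text"},  # nullable defaults to True
])
```

Note how nullability becomes an Avro union with "null" and the `_` prefix becomes an Avro array type.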

Column data can be passed in multiple formats.

Supported column formats

  • Dictionary with required keys and data
  • SqlAlchemy Column object
  • Any object with compatible attributes and required data
  • Dictionary or object with required data, but without compatible attributes/keys, supplied with ColumnMapping.

Note: this mode supports generating a schema from raw Postgres metadata - for example, udt_name can be used as the type source.

from pg2avro import ColumnMapping, get_avro_schema

# CustomColumn is any user-defined object exposing the mapped attributes.
columns = [
    CustomColumn(name="column_name", udt_name="int2", is_nullable=False),
]

get_avro_schema(
    table_name,
    namespace,
    columns,
    ColumnMapping(name="name", type="udt_name", nullable="is_nullable"),
)
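Conceptually, the mapping just tells the generator which attribute to read for each canonical key. The sketch below is illustrative only: the class names mirror the pg2avro API, but the code is not its implementation.

```python
# Hypothetical sketch of ColumnMapping-style attribute remapping.
class ColumnMapping:
    def __init__(self, name, type, nullable):
        self.name, self.type, self.nullable = name, type, nullable

class CustomColumn:
    def __init__(self, name, udt_name, is_nullable):
        self.name, self.udt_name, self.is_nullable = name, udt_name, is_nullable

def normalize(column, mapping):
    """Read a column through the mapping, yielding the canonical keys."""
    return {
        "name": getattr(column, mapping.name),
        "type": getattr(column, mapping.type),
        "nullable": getattr(column, mapping.nullable),
    }

col = CustomColumn(name="column_name", udt_name="int2", is_nullable=False)
mapping = ColumnMapping(name="name", type="udt_name", nullable="is_nullable")
normalized = normalize(col, mapping)
# -> {"name": "column_name", "type": "int2", "nullable": False}
```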

Generating rows data

Method: pg2avro.get_avro_row_dict

This method takes a row and the schema and produces a dictionary ready for Avro serialization.

Supported row formats

  • Dictionary with keys corresponding to schema field names
  • Object with keys corresponding to schema field names (works the same as dictionary with corresponding fields)
  • Tuple with data in the same order as fields specified in schema

columns = [
    {"name": "name", "type": "varchar", "nullable": False},
    {"name": "number", "type": "float4", "nullable": False},
]
schema = get_avro_schema(table_name, namespace, columns)
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
data = [get_avro_row_dict(row, schema) for row in rows]
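All three row formats reduce to one dictionary keyed by schema field names. The following standalone sketch (`row_to_dict` is a hypothetical helper, not pg2avro's implementation) shows the idea:

```python
# Sketch of reducing the three supported row formats (dict, object, tuple)
# to a single dict keyed by schema field names. Illustrative only.
def row_to_dict(row, field_names):
    if isinstance(row, dict):
        return {name: row[name] for name in field_names}
    if isinstance(row, tuple):  # positional: same order as schema fields
        return dict(zip(field_names, row))
    return {name: getattr(row, name) for name in field_names}  # plain object

class RowObject:
    def __init__(self, name, number):
        self.name, self.number = name, number

fields = ["name", "number"]
rows = [{"name": "John", "number": 1.0}, RowObject("Jack", 2.0), ("Jim", 3.0)]
data = [row_to_dict(r, fields) for r in rows]
# all three rows normalize to the same {"name": ..., "number": ...} shape
```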

Overriding mappings

Some cases require overriding the standard mapping. One example is moving Postgres data into Google BigQuery, where numeric types are handled differently and cannot accept arbitrary scale, so we may want to override them to float.

To do so, pass your mapping overrides to the get_avro_schema function as a dict keyed by column name:

columns = [
    {"name": "some_special_field", "type": "int"},
    {"name": "numeric_with_high_scale", "type": "numeric(20, 15)"},
]
overrides = {
    "some_special_field": {"pg_type": "string", "python_type": str},
    "numeric_with_high_scale": {"pg_type": "float8", "python_type": float},
}

schema = get_avro_schema(table_name, namespace, columns, mapping_overrides=overrides)

  • pg_type - the type you want pg2avro to treat the column as, instead of the type retrieved from pg/sqlalchemy etc.
  • python_type - built-in Python type to use for typecasting. Use str, float, int, tuple, list, set or dict here.

With these overrides, some_special_field is mapped to a string instead of an int.
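Conceptually, an override simply replaces the reported Postgres type before the normal type mapping runs. The sketch below (`apply_overrides` is a hypothetical helper, not pg2avro's code) illustrates that substitution step:

```python
# Illustrative sketch of per-column overrides: substitute the declared
# pg type before the standard type mapping runs. NOT pg2avro's implementation.
def apply_overrides(columns, overrides):
    result = []
    for col in columns:
        col = dict(col)  # copy so the caller's definitions stay untouched
        if col["name"] in overrides:
            col["type"] = overrides[col["name"]]["pg_type"]
        result.append(col)
    return result

columns = [
    {"name": "some_special_field", "type": "int"},
    {"name": "numeric_with_high_scale", "type": "numeric(20, 15)"},
]
overrides = {
    "some_special_field": {"pg_type": "string", "python_type": str},
    "numeric_with_high_scale": {"pg_type": "float8", "python_type": float},
}
overridden = apply_overrides(columns, overrides)
# -> some_special_field now reported as "string",
#    numeric_with_high_scale as "float8"
```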

