Utility for generating Avro files from Postgres.
pg2avro
Postgres to Avro generator.
Features
- Generate Avro schema from column definition.
- Generate data format consumable for Avro serialization.
Usage
Generating schema
Method: pg2avro.get_avro_schema
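from pg2avro import get_avro_schema
# SqlAlchemyColumn below is assumed to be SQLAlchemy's Column class.
from sqlalchemy import ARRAY, TEXT, Column as SqlAlchemyColumn

get_avro_schema(
    "mytable",
    "public",
    [
        # Dictionary mode
        {
            "name": "column_name_1",
            "type": "int2",
            "nullable": False,
        },
        # SqlAlchemy mode
        SqlAlchemyColumn(ARRAY(TEXT), name="column_name_2"),
        ...
    ]
)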
Schema generator needs the following information:
- table name
- namespace (schema in SQL, dataset in Big Query etc.)
- columns - iterable of columns, each element with:
  - name
  - type - _ prefix is used to indicate array types
  - nullable (optional, True assumed if not provided)
- column mapping - optional ColumnMapping object with column mappings (see below for more info).
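The `_` prefix and nullable rules listed above can be illustrated with a small standalone sketch. It does not call pg2avro and should not be read as its implementation: the `PG_TO_AVRO` table and the `to_avro_type` helper are hypothetical names, the type table covers only a handful of Postgres types, and representing a nullable column as a union with "null" is the usual Avro convention rather than something this README states.

```python
# Hypothetical, partial mapping from Postgres udt_name to Avro primitive types.
PG_TO_AVRO = {
    "int2": "int",
    "int4": "int",
    "int8": "long",
    "float4": "float",
    "float8": "double",
    "bool": "boolean",
    "text": "string",
    "varchar": "string",
}


def to_avro_type(udt_name, nullable=True):
    """Translate a Postgres udt_name into an Avro type (illustrative only)."""
    if udt_name.startswith("_"):
        # A leading underscore marks an array; the rest names the item type.
        avro_type = {"type": "array", "items": PG_TO_AVRO[udt_name[1:]]}
    else:
        avro_type = PG_TO_AVRO[udt_name]
    # Nullable columns become a union with "null"; True is the default,
    # mirroring the "True assumed if not provided" rule.
    return ["null", avro_type] if nullable else avro_type


print(to_avro_type("int2", nullable=False))
print(to_avro_type("_text"))
```

The real library may treat array items, defaults, or logical types differently; the sketch only shows how the two rules combine.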
Column data can be passed in multiple formats.
Supported column formats
- Dictionary with required keys and data
- SqlAlchemy Column object
- Any object with compatible attributes and required data
- Dictionary or object with required data, but without compatible attributes/keys, supplied with ColumnMapping.
Note: this mode supports generating schema from raw postgres data - udt_name can be used to generate the schema.
columns = [
    CustomColumn(name="column_name", udt_name="int2", is_nullable=False),
]
get_avro_schema(
    table_name,
    namespace,
    columns,
    ColumnMapping(name="name", type="udt_name", nullable="is_nullable"),
)
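To make the idea of a column mapping concrete, here is a minimal standalone sketch of how attribute names could be redirected when reading a column. `AttrMapping` and `read_column` are hypothetical stand-ins written for this illustration, not pg2avro's `ColumnMapping` or its internals, and `CustomColumn` is defined here only so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class AttrMapping:
    """Hypothetical mapping: which source key/attribute holds each value."""
    name: str = "name"
    type: str = "type"
    nullable: str = "nullable"


@dataclass
class CustomColumn:
    """Example column object whose attribute names differ from the defaults."""
    name: str
    udt_name: str
    is_nullable: bool


def read_column(column, mapping):
    """Return (name, type, nullable) from a dict or an object via the mapping."""
    def get(key, default=None):
        if isinstance(column, dict):
            return column.get(key, default)
        return getattr(column, key, default)

    # nullable is optional; True is assumed when the source does not provide it.
    return get(mapping.name), get(mapping.type), get(mapping.nullable, True)


mapping = AttrMapping(name="name", type="udt_name", nullable="is_nullable")
column = CustomColumn(name="column_name", udt_name="int2", is_nullable=False)
print(read_column(column, mapping))
```

The same helper handles dictionaries, which is one way the dictionary and object modes could share a single code path.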
Generating rows data
Method: pg2avro.get_avro_row_dict
This method takes a row and the schema, and returns the row as a dictionary matching that schema.
Supported row formats
- Dictionary with keys corresponding to schema field names
- Object with attributes corresponding to schema field names (works the same as dictionary with corresponding keys)
- Tuple with data in the same order as fields specified in schema
columns = [
    {"name": "name", "type": "varchar", "nullable": False},
    {"name": "number", "type": "float4", "nullable": False},
]
schema = get_avro_schema(table_name, namespace, columns)
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
data = [get_avro_row_dict(row, schema) for row in rows]
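The three supported row formats can be thought of as different views of the same data. The sketch below shows one way such rows could be normalized into dictionaries keyed by field name. It is a standalone illustration with a hypothetical `row_to_dict` helper, not the body of `get_avro_row_dict`, and `SimpleNamespace` stands in for the `RowObject` used above.

```python
from types import SimpleNamespace


def row_to_dict(row, field_names):
    """Normalize a dict, tuple, or attribute-bearing object into a plain dict."""
    if isinstance(row, dict):
        return {name: row[name] for name in field_names}
    if isinstance(row, tuple):
        # Tuples carry no names, so values are matched to fields by position.
        return dict(zip(field_names, row))
    return {name: getattr(row, name) for name in field_names}


field_names = ["name", "number"]
rows = [
    {"name": "John", "number": 1.0},
    SimpleNamespace(name="Jack", number=2.0),
    ("Jim", 3.0),
]
normalized = [row_to_dict(row, field_names) for row in rows]
print(normalized)
```

Because tuples are matched by position, their values must follow the order of the fields in the schema, as noted in the list of supported row formats.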