Utility for generating Avro files from Postgres.
pg2avro
Postgres to Avro generator.
Features
- Generate Avro schema from column definition.
- Generate row data in a format consumable by Avro serialization.
Usage
Generating schema
Method: pg2avro.get_avro_schema
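from pg2avro import get_avro_schema
from sqlalchemy import Column as SqlAlchemyColumn
from sqlalchemy.dialects.postgresql import ARRAY, TEXT

get_avro_schema(
    "mytable",
    "public",
    [
        # Dictionary mode
        {
            "name": "column_name_1",
            "type": "int2",
            "nullable": False,
        },
        # SqlAlchemy mode
        SqlAlchemyColumn(ARRAY(TEXT), name="column_name_2"),
        # ... further columns
    ]
)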
The schema generator needs the following information:
- table name
- namespace (schema in SQL, dataset in BigQuery, etc.)
- columns: an iterable of columns, each element with:
  - name
  - type (a _ prefix indicates an array type)
  - nullable (optional, True is assumed if not provided)
- column mapping: an optional ColumnMapping object with column mappings (see below for more info).
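The rules above can be illustrated with a simplified sketch in plain Python. It assumes that a nullable column becomes an Avro union with "null" and that a _-prefixed type becomes an Avro array of the element type, which is the usual Avro convention. PG_TO_AVRO, to_avro_field and to_avro_schema are hypothetical names covering only a handful of types; this is not pg2avro's actual mapping.

```python
# Hypothetical, simplified mapping from a few Postgres type names to Avro types.
# pg2avro ships its own, more complete mapping; this is only an illustration.
PG_TO_AVRO = {
    "int2": "int",
    "int4": "int",
    "int8": "long",
    "float4": "float",
    "float8": "double",
    "bool": "boolean",
    "text": "string",
    "varchar": "string",
}


def to_avro_field(column: dict) -> dict:
    """Build an Avro field from a column dict with name, type and optional nullable."""
    pg_type = column["type"]
    # A leading underscore marks an array type, e.g. "_text" is an array of text.
    if pg_type.startswith("_"):
        avro_type = {"type": "array", "items": PG_TO_AVRO[pg_type[1:]]}
    else:
        avro_type = PG_TO_AVRO[pg_type]
    # Columns are treated as nullable unless explicitly marked otherwise.
    if column.get("nullable", True):
        avro_type = ["null", avro_type]
    return {"name": column["name"], "type": avro_type}


def to_avro_schema(table_name: str, namespace: str, columns: list) -> dict:
    """Assemble an Avro record schema from the table name, namespace and columns."""
    return {
        "type": "record",
        "name": table_name,
        "namespace": namespace,
        "fields": [to_avro_field(column) for column in columns],
    }
```

For example, to_avro_field({"name": "tags", "type": "_text"}) yields a nullable array of strings, because nullable defaults to True when it is omitted.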
Column data can be passed in multiple formats.
Supported column formats
- Dictionary with required keys and data
- SqlAlchemy Column object
- Any object with compatible attributes and required data
- Dictionary or object with the required data but without compatible attributes/keys, supplied together with a ColumnMapping.
Note: this mode supports generating a schema from raw Postgres catalog data; for example, the udt_name column can be used as the column type.
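from dataclasses import dataclass
from pg2avro import get_avro_schema, ColumnMapping

# Hypothetical column class whose attribute names differ from pg2avro's defaults
@dataclass
class CustomColumn:
    name: str
    udt_name: str
    is_nullable: bool

columns = [
    CustomColumn(name="column_name", udt_name="int2", is_nullable=False),
]
get_avro_schema(
    "mytable",
    "public",
    columns,
    ColumnMapping(name="name", type="udt_name", nullable="is_nullable"),
)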
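Conceptually, a column mapping tells the generator which key or attribute holds each piece of column information. The sketch below illustrates that idea under stated assumptions: read_field and read_column are hypothetical helpers, and a plain dict stands in for ColumnMapping. It is not pg2avro's implementation.

```python
from types import SimpleNamespace
from typing import Any, Mapping, Optional


def read_field(column: Any, key: str) -> Any:
    """Read `key` from a dict-like column, or from an attribute otherwise."""
    if isinstance(column, Mapping):
        return column[key]
    return getattr(column, key)


def read_column(column: Any, mapping: Mapping[str, str]) -> dict:
    """Normalize a column to {"name", "type", "nullable"} via a key/attribute mapping.

    `mapping` is a hypothetical plain-dict stand-in for ColumnMapping; a missing
    "nullable" entry means the column is treated as nullable.
    """
    nullable_key: Optional[str] = mapping.get("nullable")
    return {
        "name": read_field(column, mapping["name"]),
        "type": read_field(column, mapping["type"]),
        "nullable": True if nullable_key is None else read_field(column, nullable_key),
    }


# Example: a catalog-style column whose attribute names differ from the defaults.
raw_column = SimpleNamespace(name="column_name", udt_name="int2", is_nullable=False)
normalized = read_column(
    raw_column, {"name": "name", "type": "udt_name", "nullable": "is_nullable"}
)
```

The same helper works for dictionaries and arbitrary objects, which is why a single mapping can cover both of the "without compatible attributes/keys" cases.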
Generating rows data
Method: pg2avro.get_avro_row_dict
This method takes a single row and the schema generated by get_avro_schema, and returns the row's data in a form ready for Avro serialization.
Supported row formats
- Dictionary with keys corresponding to schema field names
- Object with attributes corresponding to schema field names (handled the same way as a dictionary with corresponding keys)
- Tuple with data in the same order as fields specified in schema
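from dataclasses import dataclass
from pg2avro import get_avro_schema, get_avro_row_dict

# Hypothetical row class; any object with matching attributes works
@dataclass
class RowObject:
    name: str
    number: float

columns = [
    {"name": "name", "type": "varchar", "nullable": False},
    {"name": "number", "type": "float4", "nullable": False},
]
schema = get_avro_schema("mytable", "public", columns)
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
data = [get_avro_row_dict(row, schema) for row in rows]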
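The three supported row formats can be reduced to one shape, a dict keyed by field name. A minimal sketch of that normalization, assuming the field names are available in schema order: normalize_row is a hypothetical helper, not pg2avro's internal code, and it rejects tuples whose length does not match the schema rather than silently truncating them.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


def normalize_row(row: Any, field_names: Sequence[str]) -> dict:
    """Turn a dict, object or tuple row into a dict keyed by schema field names."""
    if isinstance(row, Mapping):
        # Dictionary: look values up by field name.
        return {name: row[name] for name in field_names}
    if isinstance(row, tuple):
        # Tuple: values must follow the order of the fields in the schema.
        if len(row) != len(field_names):
            raise ValueError(f"expected {len(field_names)} values, got {len(row)}")
        return dict(zip(field_names, row))
    # Any other object: read the attribute named after each field.
    return {name: getattr(row, name) for name in field_names}


@dataclass
class RowObject:  # hypothetical row class with attributes matching the fields
    name: str
    number: float


# Field names in schema order, e.g. taken from the "fields" list of a record schema.
field_names = ["name", "number"]
rows = [
    {"name": "John", "number": 1.0},
    RowObject(name="Jack", number=2.0),
    ("Jim", 3.0),
]
normalized = [normalize_row(row, field_names) for row in rows]
```

Tuples depend entirely on field order, so they are the most fragile of the three formats if the column list changes.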
Overriding mappings
Some cases require overriding the standard mapping. One example is moving Postgres data into Google BigQuery, where numeric types are handled differently and cannot accept an arbitrary scale, so we may want to map them to float instead.
To do so, pass your mapping overrides to get_avro_schema as a dict keyed by column name, using the mapping_overrides argument:
from pg2avro import get_avro_schema

columns = [
    {"name": "some_special_field", "type": "int"},
    {"name": "numeric_with_high_scale", "type": "numeric(20, 15)"},
]
overrides = {
    "some_special_field": {"pg_type": "string", "python_type": str},
    "numeric_with_high_scale": {"pg_type": "float8", "python_type": float},
}
schema = get_avro_schema("mytable", "public", columns, mapping_overrides=overrides)
- pg_type: the type pg2avro should treat the column as, instead of the type retrieved from Postgres, SQLAlchemy, etc.
- python_type: the built-in Python type used for typecasting. Use str, float, int, tuple, list, set or dict here.
With these overrides, some_special_field is mapped to a string instead of an int, and numeric_with_high_scale to a float.
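A simplified sketch of what the two override keys do, assuming that pg_type replaces the column's type before the normal type mapping runs and that python_type is called on each value. apply_overrides and cast_value are hypothetical helpers, not pg2avro functions, and passing None through unchanged is a choice made here so nullable columns stay null.

```python
from decimal import Decimal
from typing import Any, List, Mapping


def apply_overrides(columns: List[dict], overrides: Mapping[str, dict]) -> List[dict]:
    """Return copies of the column dicts with `type` replaced by each override's pg_type."""
    result = []
    for column in columns:
        override = overrides.get(column["name"])
        if override is not None:
            # Copy rather than mutate, so the caller's column list is unchanged.
            column = {**column, "type": override["pg_type"]}
        result.append(column)
    return result


def cast_value(column_name: str, value: Any, overrides: Mapping[str, dict]) -> Any:
    """Typecast a value with the override's python_type; None is passed through."""
    override = overrides.get(column_name)
    if override is None or value is None:
        return value
    return override["python_type"](value)


columns = [
    {"name": "some_special_field", "type": "int"},
    {"name": "numeric_with_high_scale", "type": "numeric(20, 15)"},
]
overrides = {
    "some_special_field": {"pg_type": "string", "python_type": str},
    "numeric_with_high_scale": {"pg_type": "float8", "python_type": float},
}
effective_columns = apply_overrides(columns, overrides)
special = cast_value("some_special_field", 42, overrides)
high_scale = cast_value("numeric_with_high_scale", Decimal("0.5"), overrides)
```

Casting a Decimal through float, as in the last line, gives up exact decimal precision in exchange for a type BigQuery accepts, which is the tradeoff the override above is meant to make.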
Download files
File details
Details for the file pg2avro-0.2.5.tar.gz.
File metadata
- Download URL: pg2avro-0.2.5.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a8ced85487bdd5afadebfeb3434e040f78200b01132e3633ccf7c3c2e998505 |
| MD5 | 48ccc1e25b88058a5ae52c4abb1baa0e |
| BLAKE2b-256 | 201310e6b897120f247ebb51d667bec066be6e4fcbe9fcdf0c80bff60eb8904f |
File details
Details for the file pg2avro-0.2.5-py3-none-any.whl.
File metadata
- Download URL: pg2avro-0.2.5-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 440b48e984e5b0c8bc389377407297bf204ae4656d9b66f8d14279f7ddb647ba |
| MD5 | eb6770762d0525f998a0307de8d5afb5 |
| BLAKE2b-256 | cc6a826025948f2743ff6624d7ef6bf059354f8e515b7d18f3216a85bfa14152 |