A DRY multi-database normalizer.

Syngenta Digital DTA (Database Adapter)

Features

Use the same package with multiple database engines
Validate your data against a predefined schema in code (useful for schemaless databases)
Build an easy pub-sub architecture based on model changes
Local development support

Philosophy

The DTA philosophy is to use one pattern across multiple databases.

DTA encourages a pub-sub architecture by automatically publishing data changes over SNS.
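One way to picture "one pattern with multiple databases" is a shared interface that application code depends on, with one implementation per engine. The sketch below uses `typing.Protocol` and a toy in-memory backend. `Adapter`, `InMemoryAdapter`, and `rename_user` are hypothetical names, not part of DTA, and DTA's real method signatures differ by engine (for example, DynamoDB reads take a `query` dict).

```python
from typing import Any, Dict, Optional, Protocol


class Adapter(Protocol):
    """Hypothetical shared interface; method names mirror the examples in this README."""

    def create(self, data: Dict[str, Any]) -> Dict[str, Any]: ...
    def read(self, identifier: str) -> Optional[Dict[str, Any]]: ...
    def update(self, data: Dict[str, Any]) -> Dict[str, Any]: ...
    def delete(self, identifier: str) -> None: ...


class InMemoryAdapter:
    """Toy backend that satisfies the Adapter protocol, keyed on model_identifier."""

    def __init__(self, model_identifier: str) -> None:
        self.model_identifier = model_identifier
        self.rows: Dict[str, Dict[str, Any]] = {}

    def create(self, data: Dict[str, Any]) -> Dict[str, Any]:
        self.rows[data[self.model_identifier]] = dict(data)
        return data

    def read(self, identifier: str) -> Optional[Dict[str, Any]]:
        return self.rows.get(identifier)

    def update(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Merge the incoming fields into the stored row
        row = self.rows[data[self.model_identifier]]
        row.update(data)
        return row

    def delete(self, identifier: str) -> None:
        self.rows.pop(identifier, None)


def rename_user(adapter: Adapter, user_id: str, first: str) -> Dict[str, Any]:
    # Application code depends only on the shared interface, not on the engine
    return adapter.update({"user_id": user_id, "first": first})
```

Swapping the in-memory backend for a real engine would leave `rename_user` unchanged, which is the point of the pattern.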

Installation

This is a Python module available through the PyPI registry.

Before installing, download and install Python. Python 3 or higher is required.

Install the package with pip or pipenv:

$ pip install syngenta_digital_dta

$ pipenv install syngenta_digital_dta

Common Usage: DynamoDB

import os
import syngenta_digital_dta

adapter = syngenta_digital_dta.adapter(
    engine='dynamodb',
    table=os.getenv('DYNAMODB_TABLE'),
    endpoint='http://localhost:4000',
    model_schema='v1-table-model',
    model_schema_file='application/openapi.yml',
    model_identifier='test_id',
    model_version_key='modified'
)

Initialize Options

Option Name	Required	Type	Description
`engine`	true	string	name of supported db engine (dynamodb)
`table`	true	string	name of dynamodb table
`endpoint`	false	string	url of the dynamodb table (useful for local development)
`model_schema`	true	string	key of openapi schema this is being set against
`model_schema_file`	true	string	path where your schema file can be found (accepts JSON as well)
`model_identifier`	true	string	unique identifier key on the model
`model_version_key`	true	string	key that can be used as a version key (modified timestamps often suffice)
`author_identifier`	false	string	unique identifier of the author who made the change (optional)
`sns_arn`	false	string	sns topic arn you want to broadcast the changes to
`sns_attributes`	false	dict	custom attributes in dict format; values should only be strings or numbers
`sns_default_attributes`	false	boolean	determines if default sns attributes are included in sns message (model_identifier, model_version_key, model_schema, author_identifier)
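Because `sns_attributes` values must be strings or numbers, they map naturally onto SNS message attributes, which are typed as `String` or `Number` and always carry their value as a string. The sketch below shows that conversion. `to_sns_attributes` is a hypothetical helper, and whether DTA builds attributes exactly this way (including how default attributes merge with custom ones) is an assumption.

```python
from numbers import Number
from typing import Any, Dict, Optional


def to_sns_attributes(
    custom: Dict[str, Any], defaults: Optional[Dict[str, Any]] = None
) -> Dict[str, Dict[str, str]]:
    """Convert flat attributes into the SNS MessageAttributes shape (hypothetical helper)."""
    merged = {**(defaults or {}), **custom}  # custom values win on key clashes (assumption)
    attributes = {}
    for name, value in merged.items():
        if value is None:
            continue  # e.g. an unset author_identifier is simply omitted
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (str, Number)):
            raise TypeError(f"{name}: SNS attribute values must be strings or numbers")
        data_type = "String" if isinstance(value, str) else "Number"
        # SNS transmits Number attributes as strings too
        attributes[name] = {"DataType": data_type, "StringValue": str(value)}
    return attributes
```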

Examples

DynamoDB Create

result = adapter.create(
    operation='insert', # or overwrite (optional); defaults to insert
    data=some_dict_to_insert_into_the_table,
)

result = adapter.insert(data=some_dict_to_insert_into_the_table) # alias
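DynamoDB has no separate insert call: `PutItem` overwrites an existing item unless the request carries a condition. One common way to get the insert vs overwrite behavior described above is a `ConditionExpression` using `attribute_not_exists` on the key. The sketch below builds boto3 `Table.put_item` keyword arguments that way. `put_item_kwargs` is hypothetical, and whether DTA implements `operation='insert'` like this is an assumption.

```python
from typing import Any, Dict


def put_item_kwargs(
    data: Dict[str, Any], model_identifier: str, operation: str = "insert"
) -> Dict[str, Any]:
    """Build boto3 Table.put_item keyword arguments for insert or overwrite semantics."""
    if operation not in ("insert", "overwrite"):
        raise ValueError(f"unknown operation: {operation!r}")
    kwargs: Dict[str, Any] = {"Item": data}
    if operation == "insert":
        # DynamoDB rejects the write (ConditionalCheckFailedException) if the key already exists
        kwargs["ConditionExpression"] = "attribute_not_exists(#id)"
        kwargs["ExpressionAttributeNames"] = {"#id": model_identifier}
    return kwargs
```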

DynamoDB Read

result = adapter.read(
    operation='get', # or query or scan (optional); defaults to get
    query={
        'Key': {
            'example_id': '3'
        }
    }
)

result = adapter.get(
    query={
        'Key': {
            'example_id': '3'
        }
    }
)

results = adapter.read(
    operation='query',
    query={
        'IndexName': 'test_query_id',
        'Limit': 1,
        'KeyConditionExpression': 'test_query_id = :test_query_id',
        'ExpressionAttributeValues': {
            ':test_query_id': 'def345'
        }
    }
)
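The `query` dict passed to `read(operation='query')` uses the same keys as a DynamoDB Query request. A small helper can build it for the common single-key index lookup. `index_query` is hypothetical, and it assumes DTA passes extra keys such as `ExpressionAttributeNames` through unchanged.

```python
from typing import Any, Dict, Optional


def index_query(
    index_name: str, key: str, value: Any, limit: Optional[int] = None
) -> Dict[str, Any]:
    """Build a query dict in the same shape as the read(operation='query') example."""
    placeholder = f":{key}"
    query: Dict[str, Any] = {
        "IndexName": index_name,
        # #k aliases the attribute name so reserved words cannot break the expression
        "KeyConditionExpression": f"#k = {placeholder}",
        "ExpressionAttributeNames": {"#k": key},
        "ExpressionAttributeValues": {placeholder: value},
    }
    if limit is not None:
        query["Limit"] = limit
    return query
```

Aliasing the key as `#k` sidesteps DynamoDB reserved words (for example `name` or `status`) when they are used as attribute names.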

DynamoDB Update

result = adapter.update(
    data=some_dict_to_update_the_model,
    operation='get',
    query={
        'Key': {
            'example_id': '3'
        }
    }
)

DynamoDB Delete

result = adapter.delete(
    query={
        'Key': {
            'example_id': '3'
        }
    }
)

Common Usage: Postgres & Redshift

import os
import syngenta_digital_dta

adapter = syngenta_digital_dta.adapter(
    engine='postgres', # or redshift
    table='users',
    endpoint='localhost',
    database='dta',
    port=5432, # 5439 for redshift
    user=os.getenv('POSTGRES_USER'),
    password=os.getenv('POSTGRES_PASSWORD'),
    model_schema='test-postgres-user-model',
    model_schema_file='tests/openapi.yml',
    model_identifier='user_id',
    model_version_key='modified',
    relationships={
        'addresses': 'user_id'
    }
)

Initialize Options

Option Name	Required	Type	Description
`engine`	true	string	name of supported db engine (postgres or redshift)
`table`	true	string	name of postgres table to work as primary query point
`endpoint`	true	string	host of the postgres/redshift cluster
`database`	true	string	name of the database to connect to
`port`	true	int	port of database (defaults to 5439)
`user`	true	string	username for database access
`password`	true	string	password for database access
`model_schema`	true	string	key of openapi schema this is being set against
`model_schema_file`	true	string	path where your schema file can be found (accepts JSON as well)
`model_identifier`	true	string	unique identifier key on the model
`model_version_key`	true	string	key that can be used as a version key (modified timestamps often suffice)
`autocommit`	false	boolean	commits transactions automatically, without an explicit commit call
`relationships`	false	dict	key is the related table and value is the foreign key on that table (assumes your primary key has the same name as that table's foreign key)
`author_identifier`	false	string	unique identifier of the author who made the change (optional)
`sns_arn`	false	string	sns topic arn you want to broadcast the changes to
`sns_attributes`	false	dict	custom attributes in dict format; values should only be strings or numbers
`sns_default_attributes`	false	boolean	determines if default sns attributes are included in sns message (model_identifier, model_version_key, model_schema, author_identifier)[default: true]
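The `relationships` option maps a related table to the foreign key column on that table. Under that reading, `get_relationship('addresses', ...)` corresponds to a query like the one built below. `relationship_query` is a hypothetical helper, not DTA's implementation; it uses the same `%(name)s` parameter style as the `query` example in the Read section.

```python
from typing import Dict, Tuple


def relationship_query(
    relationships: Dict[str, str], table: str, value: str
) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized SELECT for one related table (hypothetical helper)."""
    if table not in relationships:
        raise KeyError(f"no relationship configured for {table!r}")
    foreign_key = relationships[table]
    # Identifiers come from trusted adapter config; only the value is a bound parameter
    sql = f"SELECT * FROM {table} WHERE {foreign_key} = %(value)s"
    return sql, {"value": value}
```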

Examples

Postgres/Redshift Connect

# connections are always pooled and shared
adapter.connect()

Postgres/Redshift Create

import uuid

data = {
    'user_id': str(uuid.uuid4()),
    'email': 'somen.user@some-email.com',
    'first': 'Some',
    'last': 'User'
}
result = adapter.create(data=data, commit=True)
result = adapter.insert(data=data, commit=True) # alias

Postgres/Redshift Update

data = {
    'user_id': 'some-update-guid',
    'email': 'somen.user@some-email.com',
    'first': 'Some',
    'last': 'User'
}
result = adapter.update(data=data, commit=True)

Postgres/Redshift Upsert

data = {
    'user_id': 'some-update-guid',
    'email': 'somen.user@some-email.com',
    'first': 'Some',
    'last': 'User'
}
result = adapter.upsert(data=data, commit=True)
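In Postgres, an upsert is usually written as `INSERT ... ON CONFLICT ... DO UPDATE`. The sketch below builds such a statement keyed on `model_identifier`. `upsert_sql` is hypothetical, targets Postgres only (Redshift's SQL dialect differs), and is not necessarily how DTA's `upsert` works. It also assumes a primary key or unique constraint on the identifier column, which `ON CONFLICT` requires.

```python
from typing import Any, Dict, Tuple


def upsert_sql(
    table: str, model_identifier: str, data: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Build a Postgres INSERT ... ON CONFLICT upsert (illustrative, not the library's code)."""
    columns = list(data)
    placeholders = ", ".join(f"%({c})s" for c in columns)
    # On a key clash, overwrite every non-key column with the incoming value
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != model_identifier)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
        f"ON CONFLICT ({model_identifier}) {action} RETURNING *"
    )
    return sql, data
```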

Postgres/Redshift Delete

adapter.delete('some-delete-guid', commit=True)

Postgres/Redshift Read

# returns only 1 row or None
result = adapter.read('some-read-guid')
result = adapter.get('some-read-guid') # alias

# all arguments are optional (defaults to SELECT * FROM {table} ORDER BY {model_identifier} ASC LIMIT 1000)
results = adapter.read_all(
    where={
        'first': 'first',
        'last': 'last',
    },
    limit=2,
    offset=1,
    orderby_column='first',
    orderby_sort='DESC'
)

# retrieves only one relationship per call
results = adapter.get_relationship('addresses', where={'user_id': 'some-user-relationship-guid'})

# only read-only operations are allowed
# query and params are required; params can be an empty dict
results = adapter.query(
    query='SELECT * FROM users WHERE user_id = %(identifier_value)s',
    params={
        'identifier_value': 'some-query-relationship-guid'
    }
)
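The comment on `read_all` documents its defaults. The sketch below shows how those arguments could translate into a parameterized query; `read_all_sql` is hypothetical, not DTA's code. Values are bound as parameters (prefixed with `w_` so a column named `limit` cannot collide), while table and column names are interpolated, so they must come from trusted code, never from user input.

```python
from typing import Any, Dict, Optional, Tuple


def read_all_sql(
    table: str,
    model_identifier: str,
    where: Optional[Dict[str, Any]] = None,
    limit: int = 1000,
    offset: int = 0,
    orderby_column: Optional[str] = None,
    orderby_sort: str = "ASC",
) -> Tuple[str, Dict[str, Any]]:
    """Build the SELECT that read_all's documented defaults describe (illustrative)."""
    sort = orderby_sort.upper()
    if sort not in ("ASC", "DESC"):
        raise ValueError("orderby_sort must be ASC or DESC")
    where = where or {}
    sql = f"SELECT * FROM {table}"
    params: Dict[str, Any] = {}
    if where:
        # Column names must be trusted; values are bound parameters
        sql += " WHERE " + " AND ".join(f"{col} = %(w_{col})s" for col in where)
        params.update({f"w_{col}": value for col, value in where.items()})
    sql += f" ORDER BY {orderby_column or model_identifier} {sort}"
    sql += " LIMIT %(limit)s OFFSET %(offset)s"
    params.update(limit=limit, offset=offset)
    return sql, params
```

With no arguments this yields the documented default query; the extra `OFFSET 0` does not change the result.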

Common Usage: Elasticsearch

import os
import syngenta_digital_dta

# localhost connection
adapter = syngenta_digital_dta.adapter(
    engine='elasticsearch',
    index='users',
    endpoint='localhost',
    model_schema='test-elasticsearch-user-model',
    model_schema_file='tests/openapi.yml',
    model_identifier='user_id',
    model_version_key='modified'
)

# lambda connection (assumes lambda role has access)
adapter = syngenta_digital_dta.adapter(
    engine='elasticsearch',
    index='users',
    endpoint='localhost',
    model_schema='test-elasticsearch-user-model',
    model_schema_file='tests/openapi.yml',
    model_identifier='user_id',
    model_version_key='modified',
    authentication='lambda'
)

# traditional user password connection
adapter = syngenta_digital_dta.adapter(
    engine='elasticsearch',
    index='users',
    endpoint='localhost',
    model_schema='test-elasticsearch-user-model',
    model_schema_file='tests/openapi.yml',
    model_identifier='user_id',
    model_version_key='modified',
    authentication='user-password',
    user='root',
    password='root'
)

Initialize Options

Option Name	Required	Type	Description
`engine`	true	string	name of supported db engine (elasticsearch)
`index`	true	string	name of the elasticsearch index to work as primary query point
`endpoint`	true	string	url of the elasticsearch cluster
`model_schema`	true	string	key of openapi schema this is being set against
`model_schema_file`	true	string	path where your schema file can be found (accepts JSON as well)
`model_identifier`	true	string	unique identifier key on the model
`model_version_key`	true	string	key that can be used as a version key (modified timestamps often suffice)
`port`	false	int	port of database (defaults to 9200 if localhost or 443 if not)
`author_identifier`	false	string	unique identifier of the author who made the change (optional)
`authentication`	false	string	either 'lambda' or 'user-password'
`user`	false	string	only needed if authentication is user-password
`password`	false	string	only needed if authentication is user-password
`sns_arn`	false	string	sns topic arn you want to broadcast the changes to
`sns_attributes`	false	dict	custom attributes in dict format; values should only be strings or numbers
`sns_default_attributes`	false	boolean	determines if default sns attributes are included in sns message (model_identifier, model_version_key, model_schema, author_identifier) [default: true]
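The documented port default (9200 for localhost, 443 otherwise) and the authentication rules can be expressed as a small resolver. `connection_settings` and the keys of the dict it returns are hypothetical; how DTA actually constructs its Elasticsearch client is not shown here.

```python
from typing import Any, Dict, Optional


def connection_settings(
    endpoint: str,
    port: Optional[int] = None,
    authentication: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, Any]:
    """Resolve documented connection defaults into a plain settings dict (hypothetical)."""
    if port is None:
        # Documented default: 9200 for localhost, 443 otherwise
        port = 9200 if endpoint == "localhost" else 443
    settings: Dict[str, Any] = {"host": endpoint, "port": port}
    if authentication == "user-password":
        if not (user and password):
            raise ValueError("user and password are required for user-password authentication")
        settings["auth"] = (user, password)
    elif authentication not in (None, "lambda"):
        raise ValueError(f"unsupported authentication: {authentication!r}")
    return settings
```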

Elasticsearch Connection

# elasticsearch auto-connects to a shared connection; use this to test that connection
adapter.connection.ping()

Elasticsearch Set-up

# converts the openapi schema defined in init to a mapping
adapter.create_template(
    name='users',
    index_patterns='users-*',
    special={'phone': 'keyword'} # (optional) map fields to special types; otherwise types default based on the schema
)

# uses sensible defaults, or pass custom index settings via the settings kwarg
adapter.create_index(settings=some_optional_settings)

OpenAPI Default Conversion

OpenAPI Type	Elasticsearch Mapping
array	none (not needed to be included)
array of objects	nested
boolean	boolean
integer	integer
number	long
object (with properties)	object
object (without properties)	flattened
string (no format)	text
string (format date)	date (with iso date format acceptance)
string (format date-time)	date (with iso date format acceptance)
string (format email)	text (with url email analyzer)
string (format ip)	ip
string (format hostname)	text (with url email analyzer)
string (format iri)	text (with url email analyzer)
string (format url)	text (with url email analyzer)
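The conversion table above can be read as a recursive function over the OpenAPI schema. In the sketch below, `to_mapping` and `convert_property` are hypothetical, and the analyzer name and date format are placeholders: the table only says "url email analyzer" and "iso date format acceptance", so those exact values are assumptions. Arrays of scalars map to their item type because Elasticsearch fields accept arrays without a dedicated mapping, and `special` overrides win, mirroring the `special` argument of `create_template`.

```python
from typing import Any, Dict, Optional

# Placeholder values: the real analyzer name and date format are assumptions
URL_EMAIL = {"type": "text", "analyzer": "url_email_analyzer"}
ISO_DATE = {"type": "date", "format": "strict_date_optional_time"}

STRING_FORMATS = {
    "date": ISO_DATE,
    "date-time": ISO_DATE,
    "email": URL_EMAIL,
    "hostname": URL_EMAIL,
    "iri": URL_EMAIL,
    "url": URL_EMAIL,
    "ip": {"type": "ip"},
}
SCALARS = {"boolean": "boolean", "integer": "integer", "number": "long"}


def to_mapping(schema: Dict[str, Any], special: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Convert an OpenAPI object schema's properties into an Elasticsearch mapping."""
    special = special or {}
    properties = {}
    for name, prop in schema.get("properties", {}).items():
        if name in special:
            properties[name] = {"type": special[name]}
        else:
            properties[name] = convert_property(prop)
    return {"properties": properties}


def convert_property(prop: Dict[str, Any]) -> Dict[str, Any]:
    kind = prop.get("type")
    if kind == "array":
        items = prop.get("items", {})
        if items.get("type") == "object":
            return {"type": "nested", **to_mapping(items)}
        # Any Elasticsearch field can hold an array, so map the item type
        return convert_property(items)
    if kind == "object":
        if "properties" in prop:
            return {"type": "object", **to_mapping(prop)}
        return {"type": "flattened"}
    if kind in SCALARS:
        return {"type": SCALARS[kind]}
    if kind == "string":
        # Copy so callers cannot mutate the shared templates
        return dict(STRING_FORMATS.get(prop.get("format"), {"type": "text"}))
    raise ValueError(f"unsupported OpenAPI type: {kind!r}")
```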

Elasticsearch Create

import uuid

data = {
    'user_id': uuid.uuid4().hex,
    'email': 'somen.user@some-email.com',
    'first': 'Some',
    'last': 'User',
    'phone': 1112224444
}
adapter.create(data=data, refresh=True) # (optional) refresh defaults to True

Elasticsearch Update

updated_data = {
    'user_id': user_id,
    'email': 'peter.cruse@some-email.com',
    'first': 'Peter'
}
adapter.update(data=updated_data, refresh=True) # (optional) refresh defaults to True

Elasticsearch Upsert

data = {
    'user_id': upsert_id,
    'email': 'somen.user-upsert@some-email.com',
    'first': 'Some',
    'last': 'User',
    'phone': 1112224444
}
adapter.upsert(data=data, refresh=True) # (optional) refresh defaults to True

Elasticsearch Delete

adapter.delete(delete_id, refresh=True) # (optional) refresh defaults to True

Elasticsearch Read

response = adapter.get(get_id)

# returns a single dictionary mapped to the openapi model defined in init (or an empty dict)
dict_response = adapter.get(get_id, normalize=True) # (optional) normalize defaults to False

# returns a list of dictionaries mapped to the openapi model defined in init (or an empty list)
list_response = adapter.query(
    normalize=True, # (optional) normalize defaults to False
    query={
        'match': {
            'first': 'Normal'
        }
    }
)
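With `normalize=True`, results are mapped to the OpenAPI model instead of returning the raw response. A minimal sketch of that projection, assuming it keeps only the schema's property names from each document's `_source` (the exact rules DTA applies are not documented here). `normalize_get` and `normalize_search` are hypothetical; the `found`, `_source`, and `hits.hits` keys follow Elasticsearch's get and search response formats.

```python
from typing import Any, Dict, List


def normalize_get(response: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Return only the model's fields from a get response, or {} when nothing was found."""
    if not response.get("found"):
        return {}
    source = response.get("_source", {})
    return {key: source[key] for key in fields if key in source}


def normalize_search(response: Dict[str, Any], fields: List[str]) -> List[Dict[str, Any]]:
    """Apply the same projection to every hit of a search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [normalize_get({"found": True, **hit}, fields) for hit in hits]
```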

Common Usage: S3

adapter = syngenta_digital_dta.adapter(
    engine='s3',
    endpoint=self.endpoint,
    bucket=self.bucket
)

Initialize Options

Option Name	Required	Type	Description
`engine`	true	string	name of supported db engine (s3)
`bucket`	true	string	name of bucket you are interfacing with
`endpoint`	true	string	url of the s3 endpoint (useful for local development)
`sns_arn`	false	string	sns topic arn you want to broadcast the changes to
`sns_attributes`	false	dict	custom attributes in dict format; values should only be strings or numbers

NOTE: If you use the SNS functionality, SNS messages contain a presigned URL for the S3 object rather than the data itself, because of SNS message size limits. Below is an example payload:

{
    "presigned_url": "https://some-s3-url"
}
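A subscriber receives the payload above as the SNS message body. For an AWS Lambda subscriber, each record's `Sns.Message` field is a JSON string, so extracting the URL and downloading the object could look like the sketch below. `presigned_urls` and `fetch` are hypothetical helpers. Presigned URLs expire, so consumers should fetch promptly.

```python
import json
import urllib.request
from typing import Any, Dict, List


def presigned_urls(event: Dict[str, Any]) -> List[str]:
    """Extract presigned_url values from the SNS records of a Lambda event."""
    return [
        json.loads(record["Sns"]["Message"])["presigned_url"]
        for record in event.get("Records", [])
    ]


def fetch(url: str, timeout: float = 10.0) -> bytes:
    # A presigned URL carries its own credentials, so a plain HTTP GET is enough
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```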

S3 Create (Single)

# with json=True, dicts are automatically converted to JSON
adapter.create(
    s3_path='test/test-create.json',
    data={'test': True},
    json=True
)

S3 Create (Multipart)

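# read the file in ~6 MB chunks; S3 requires every part except the last to be at least 5 MiB
chunks = []
with open('./tests/mock/example.json') as file:
    # iter() needs a callable here; it stops when read() returns '' at end of file
    for piece in iter(lambda: file.read(6000000), ''):
        chunks.append(piece)
adapter.multipart_upload(chunks=chunks, s3_path='test/test-create.json')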
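`iter` with a sentinel is the idiom the multipart example relies on: `iter(callable, sentinel)` keeps calling the callable until it returns the sentinel. Wrapping it in a generator avoids building the chunk list by hand. `iter_chunks` is a hypothetical helper, and it opens the file in binary mode so chunk sizes are measured in bytes rather than characters. Whether `multipart_upload` accepts a generator is not documented, so pass `list(iter_chunks(file))` if it expects a list.

```python
from functools import partial
from typing import BinaryIO, Iterator


def iter_chunks(fileobj: BinaryIO, size: int = 6_000_000) -> Iterator[bytes]:
    """Yield fixed-size chunks until EOF; the default stays above S3's 5 MiB part minimum."""
    # iter(callable, sentinel) calls read(size) until it returns b"" at end of file
    yield from iter(partial(fileobj.read, size), b"")
```

Usage, under the same assumptions: open the file with `open('./tests/mock/example.json', 'rb')` and pass `chunks=list(iter_chunks(file))` to `multipart_upload`.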

S3 Create (Stream)

import requests

url = 'https://github.com/syngenta-digital/package-python-dta/archive/refs/heads/master.zip'
response = requests.get(url, stream=True)
adapter.upload_stream(data=response.content, s3_path='test/code-clone.zip')

S3 Create (Pre-Signed UPLOAD URLs)

presigned_upload_url = adapter.create_presigned_post_url(s3_path='test/test-create.json', expiration=3600)

S3 Read (In Memory)

# with json=True, JSON content is automatically converted to a dict
result = adapter.read(
    s3_path='test/test-create.json',
    json=True
)

S3 Read (Download to Disk)

# automatically creates the target directory and any missing parent directories
file_path = adapter.download(s3_path='test/test-create.json', download_path='/tmp/unit-test-download/test.json')
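Creating the download directory and its missing parents is what `os.makedirs(..., exist_ok=True)` does. The sketch below shows that step for any readable binary stream; `save_to_disk` is hypothetical and is not DTA's download code, which reads from S3.

```python
import os
import shutil
from typing import BinaryIO


def save_to_disk(source: BinaryIO, download_path: str) -> str:
    """Copy a readable binary stream to download_path, creating parent directories first."""
    parent = os.path.dirname(download_path)
    if parent:
        # Creates the directory and any missing ancestors; no error if it already exists
        os.makedirs(parent, exist_ok=True)
    with open(download_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return download_path
```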

S3 Read (Pre-Signed URLs)

presigned_url = adapter.create_presigned_read_url(s3_path='test/test-create.json', expiration=3600)

S3 Delete

adapter.delete(s3_path='test/test-create.json')

Contributing

If you would like to contribute, please follow the established patterns and unit test your code:

Local Unit Testing

In one tab, run pipenv run local
In a second tab, run RUN_MODE=unittest python -m unittest discover

Release history

0.1.14

Jun 4, 2024

0.1.13

May 28, 2024

0.1.12

Aug 22, 2023

0.1.11

Aug 14, 2023

0.1.10

Jun 19, 2023

0.1.9

Jun 15, 2023

0.1.8

Jun 14, 2023

0.1.7

Mar 17, 2023

0.1.6

Mar 6, 2023

0.1.5

Jan 31, 2023

0.1.4

Jan 30, 2023

0.1.3

Jan 20, 2023

0.1.2

Jan 20, 2023

0.1.1

Jan 20, 2023

0.1.0

Jan 20, 2023

0.0.61

Nov 8, 2022

0.0.61b0 pre-release

Jan 20, 2023

0.0.60

Sep 6, 2022

0.0.59

Sep 1, 2022

0.0.58

Aug 31, 2022

0.0.57

Aug 29, 2022

0.0.56

Jul 5, 2022

0.0.55

Jun 1, 2022

0.0.54

Apr 18, 2022

0.0.53

Apr 1, 2022

0.0.52

Apr 1, 2022

0.0.51

Feb 3, 2022

0.0.50

Feb 3, 2022

0.0.49

Feb 2, 2022

0.0.48

Jan 27, 2022

0.0.47

Jan 27, 2022

0.0.46

Jan 27, 2022

This version

0.0.45

Jan 10, 2022

0.0.44

Dec 22, 2021

0.0.43

Oct 29, 2021

0.0.42

Oct 27, 2021

0.0.41

Oct 25, 2021

0.0.40

Oct 20, 2021

0.0.39

Oct 11, 2021

0.0.38

Oct 8, 2021

0.0.37

Sep 28, 2021

0.0.36

Sep 23, 2021

0.0.35

Sep 15, 2021

0.0.34

Sep 15, 2021

0.0.33

Sep 7, 2021

0.0.32

Sep 3, 2021

0.0.31

Aug 24, 2021

0.0.30

Aug 5, 2021

0.0.29

Aug 2, 2021

0.0.28

Jul 28, 2021

0.0.27

Jul 28, 2021

0.0.26

Jun 4, 2021

0.0.25

May 29, 2021

0.0.24

May 25, 2021

0.0.23

May 25, 2021

0.0.22

May 17, 2021

0.0.21

May 16, 2021

0.0.20

May 15, 2021

0.0.19

May 13, 2021

0.0.18

May 13, 2021

0.0.17

Apr 6, 2021

0.0.16

Apr 2, 2021

0.0.9

Mar 29, 2021

0.0.5

Mar 19, 2021

0.0.4

Mar 16, 2021

0.0.3

Dec 8, 2020

0.0.2

Dec 2, 2020

0.0.1

Dec 2, 2020

Download files

Source Distribution

syngenta_digital_dta-0.0.45.tar.gz (34.0 kB)

Uploaded Jan 10, 2022 Source

Built Distribution

syngenta_digital_dta-0.0.45-py3-none-any.whl (44.5 kB)

Uploaded Jan 10, 2022 Python 3

Hashes for syngenta_digital_dta-0.0.45.tar.gz
Algorithm	Hash digest
SHA256	`582a3964f27965474e8ee9fa993f11ffe513e3073da824b82d227b6130162f85`
MD5	`eccb2e8289f15fc9bdee74afa100ab3f`
BLAKE2b-256	`4f8cdb0cbf1a93bf2017424af108e60ea8e3561c4cf1c90cd5fe43721058fce6`

Hashes for syngenta_digital_dta-0.0.45-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9e0e1e866b9c5dc12ee0df8db3a9bc107988f6abd023c7b798578a799370decd`
MD5	`c13064d5f3f19aa0d8a076470535e465`
BLAKE2b-256	`bc25cc6f2cf479d8b5f7b07ea9e0719e5b49eea2f7067b825837fd00c47b9d85`
