Examples of how to use the Informatica EDC samples. Loosely based on Informatica's EDC-REST-API-Samples.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Informatica EDC REST-API samples using python

Contains examples for connecting to EDC and generating attribute-level lineage via python.

Requirements

python 3.6+
Python editors (ide/environments)
- VS Code - good support for python - free and runs on all platforms https://code.visualstudio.com/
- pycharm - https://www.jetbrains.com/pycharm/
- anaconda - for JupyterLab/Notebooks https://www.anaconda.com/ (includes vscode)
- Eclipse - ide for java/python (python using/installing pydev)
  - Download: http://www.eclipse.org/downloads/eclipse-packages/
other useful tools
- rest api clients - for testing syntax/api calls + the good ones generate code for many languages
  - postman - https://www.getpostman.com/
  - insomnia - https://insomnia.rest/

Getting Started

verify that python is installed - v3.6+
Create a new VSCode/pycharm/Eclipse Project and import/use the files in the python folder (not the java folder)
Ensure EDC is running while executing the samples - the try/except code will catch a failed connection & exit immediately. The property suppress_edc_calls can be set in the config.json to bypass any calls to EDC
Use a virtual environment with python
- python3 -m venv venv
- and then
  - source venv/bin/activate (for linux/macos)
  - venv/Scripts/activate.ps1 (for windows powershell) Note: you may need to execute Set-ExecutionPolicy unrestricted for powershell (run powershell as administrator to do this)
  - venv/Scripts/activate.bat (windows cmd)
- after activating
  - execute the following to get the latest version for test purposes pip3 install --extra-index-url https://test.pypi.org/simple/ informatica-edc-rest-api-samples
  - execute the following to get a tested version (see coverage overview in htmlcoverage): pip3 install informatica-edc-rest-api-samples
- Run the code (remember to have an activated venv): python3 run_edc_lineage.p
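
The suppress_edc_calls property mentioned above could be checked along these lines. This is only a sketch: the layout of config.json is not documented here, so the code assumes the property is a top-level JSON boolean, and the function name edc_calls_suppressed is hypothetical.

```python
import json
from pathlib import Path


def edc_calls_suppressed(config_path="config.json"):
    """Return True when the config file asks to bypass all calls to EDC."""
    path = Path(config_path)
    if not path.is_file():
        # No config file found: treat calls to EDC as enabled.
        return False
    with path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # Assumption: suppress_edc_calls is a top-level key holding a JSON boolean.
    return bool(config.get("suppress_edc_calls", False))
```

A script could call edc_calls_suppressed() once at startup and skip its REST calls when it returns True.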

REST API Authentication

the EDC rest api supports Basic Authentication only - see https://yourcatalogserver:port/access for details
we use the python requests module for all http(s) rest calls (very easy to use)
when making a rest api call - you can pass either the id/password - or a http header with an encoded password
for all examples here, we initially used the id/pwd method - but have switched to use http headers
if you are using LDAP authentication - the user id must be prefixed with the security domain and a '\' character
- e.g. COMPANY_LDAP\user_a
use the encodeUser.py script - to create the basic auth encoding for your user, and store in a variable named INFA_EDC_AUTH
- you can set the variable for each session, so it is not stored anywhere
- if using docker - you can add this variable to an .env file to pass to docker at runtime
- if using VS Code - you can add an "env" setting for individual environment variables used in the debugger (launch.json)
  - e.g. "env" : {"INFA_EDC_AUTH" : "Basic dXNlcjE6YUNvbXBsIWNAdGVkUGEkM3cwcmQ="}, (works but prefer .env file)
- Note: any files inside of .vscode (e.g. launch.json) will be excluded from the git repo (each user has their own local version)
In this fork setupConnection.py and the functionality to use an .env has been removed
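
The encoding described above is standard HTTP Basic authentication: base64 of id:password, prefixed with "Basic ". The following is a minimal sketch of that step, not the code of encodeUser.py; the function name make_basic_auth and its parameters are hypothetical.

```python
import base64


def make_basic_auth(user, password, security_domain=None):
    """Build the value of an HTTP Basic Authorization header."""
    if security_domain:
        # LDAP users: prefix the user id with the security domain and a backslash.
        user = f"{security_domain}\\{user}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# The result can be stored in INFA_EDC_AUTH or passed directly as a header.
headers = {"Authorization": make_basic_auth("user_a", "secret", "COMPANY_LDAP")}
```

Base64 is an encoding, not encryption, so the resulting string should be treated like the password itself.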

HTTPS/TLS/SSL Connections and certificates

assuming your catalog service is https enabled (it should be; otherwise your passwords are sent in clear text, and you set verify=False)
you will either need to download/copy the certificate (.pem format, not .jks) locally
- or set flags to disable certificate verification (not recommended, but possible)
if your ssl certificate is self-signed (also not recommended), an additional warning will need to be suppressed
- more information about SSL certificate verification can be found at https://3.python-requests.org/user/advanced/#ssl-cert-verification
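
With the requests module, these choices map onto the verify option of a session. The sketch below is one possible arrangement, not the project's own code: resolve_verify and build_session are hypothetical names, and treating the text "None" as "disable verification" is an assumption.

```python
def resolve_verify(setting):
    """Translate a certificate setting into a value for requests' verify option."""
    if setting is None or setting.strip() == "":
        # Nothing configured: keep the default verification against trusted CAs.
        return True
    if setting.strip().lower() == "none":
        # Assumption: the text "None" means "disable certificate verification".
        return False
    # Otherwise treat the setting as the path to a .pem certificate file.
    return setting


def build_session(auth_header, cert_setting=None):
    """Create a requests session that sends the auth header on every call."""
    # Imported here so resolve_verify stays usable without requests installed.
    import requests
    import urllib3

    session = requests.Session()
    session.headers.update({"Authorization": auth_header, "Accept": "application/json"})
    session.verify = resolve_verify(cert_setting)
    if session.verify is False:
        # Suppress the warning urllib3 emits for each unverified https request.
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return session
```

Passing the path to the .pem file keeps verification on, which is the safer of the two options.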

Sample Programs in the Project

The following samples may no longer work in this fork of the original project. If you want to use them, please use a clone of the original project, not this fork.

encodeUser.py: simple program to prompt for a userid/pwd and optionally a security domain and create a base64 encoded string that can be used for authentication in the http header. e.g. "Basic dXNlcjE6YUNvbXBsIWNAdGVkUGEkM3cwcmQ="
- use this script before you use the other scripts, to get the right format for authenticating & to avoid storing passwords in the .py files
  - an alternate is to prompt for a password within your script & encode the id:password
- use encodeUser27.py for legacy python
EDCQuery_template.py: a template/skeleton that shows how to connect to the catalog and execute a search using python. the result-set processing includes handling the paging model. It also uses the get_fact_value method in edcutils.py to extract the item name from the facts array
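
The paging and fact lookup described for EDCQuery_template.py might look roughly like the sketch below. It is not the code from the template or from edcutils.py. The response shape (an items list, a metadata.totalCount value, and facts made of attributeId/value pairs) is an assumption, and fetch_page is a hypothetical callable that stands in for the actual REST call.

```python
def get_fact_value(item, attr_name):
    """Return the value of the first fact matching attr_name, or None."""
    for fact in item.get("facts", []):
        if fact.get("attributeId") == attr_name:
            return fact.get("value")
    return None


def iter_items(fetch_page, page_size=100):
    """Yield every item of a paged result set, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("items", [])
        yield from items
        offset += len(items)
        total = page.get("metadata", {}).get("totalCount", 0)
        if not items or offset >= total:
            break


def fake_fetch_page(offset, page_size):
    """Stand-in for a catalog search call (assumed response shape)."""
    names = ["customer", "orders", "products"]
    chunk = names[offset:offset + page_size]
    return {
        "metadata": {"totalCount": len(names)},
        "items": [
            {"id": f"demo://{name}", "facts": [{"attributeId": "core.name", "value": name}]}
            for name in chunk
        ],
    }


names = [get_fact_value(item, "core.name") for item in iter_items(fake_fetch_page, page_size=2)]
print(names)  # ['customer', 'orders', 'products']
```

In a real script, fetch_page would issue the search request with the current offset and page size and return the parsed JSON.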
Utility/Helper Scripts
- edcutils.py: utility/helper methods for common tasks - like get an attribute value get_fact_value(item, attrName)
- edcSessionHelper.py: the EDCSession class helps you configure a requests.session object and also provides command-line args for connecting to the catalog: -c/-edcurl (EDC URL), -a/--auth (auth credentials, see encodeUser.py), -u (username; will prompt for pwd, recommend using -a), -s/--sslcert (SSLCERT).
  - this class also supports using the following environment vars:
    - INFA_EDC_URL - e.g. http://yourcatalogserver:9085 or https://yourcatalogserver:9085
    - INFA_EDC_AUTH - e.g. "Basic dXNlcl9hOnJlYWxseXNlY3VyZXBhc3N3b3Jk" - see encodeUser.py
    - INFA_EDC_SSL_PEM - certificate to use to connect (or set to None - to disable ssl verification)
  - for an example of usage - see listAndCountCustomAttributes.py
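
One way to combine command-line args with the environment variables listed above is to use the variables as defaults for the args. The sketch below shows that idea only; it is not the EDCSession implementation. parse_edc_args is a hypothetical name, the long option --edcurl is an assumed spelling, and the -u option is left out because it prompts for a password.

```python
import argparse
import os


def parse_edc_args(argv=None, env=None):
    """Parse connection settings; environment variables supply the defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="EDC connection settings (sketch)")
    parser.add_argument("-c", "--edcurl", default=env.get("INFA_EDC_URL"),
                        help="EDC URL, e.g. https://yourcatalogserver:9085")
    parser.add_argument("-a", "--auth", default=env.get("INFA_EDC_AUTH"),
                        help="basic auth string, see encodeUser.py")
    parser.add_argument("-s", "--sslcert", default=env.get("INFA_EDC_SSL_PEM"),
                        help="certificate (.pem) used to connect")
    return parser.parse_args(argv)


# A value given on the command line wins over the environment variable.
demo_env = {"INFA_EDC_URL": "https://yourcatalogserver:9085"}
args = parse_edc_args(["-s", "catalog.pem"], env=demo_env)
print(args.edcurl, args.sslcert)
```

Because the environment is only read for defaults, a script can be run with no args at all once the three variables are set.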
listAndCountCustomAttributes.py: find all custom attributes (normal and classification) and count the # of times the attribute is used. writes results to csv file (output folder can be configured)
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
listCustomAttributes.py: simple script to print all custom attributes (name, id, type, sortable, facetable)
- this script will list both regular custom attributes /2/catalog/models/attributes and reference 'classification' attributes /2/catalog/models/referenceAttributes
similarityReport.py: v10.2.1+ utility to find & export all columns/fields with similar links
- note: this script will attempt to query all dataelements, even if similarity profiling was not run. for a better implementation, use similarityByResource.py
similarityByResource.py: utility to find and export column similarity for all resources for which similarity profiling was configured.
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
dbSchemaReplicationLineage.py: provides the ability to link tables/columns in a database schema that are replicated to other schemas/databases when no scanner exists to automatically document these relationships (e.g. sqoop, scripts/code, goldengate ...)
- see dbSchemaReplicationLineage.md for more
externalDBLinker.py: script to generate custom lineage for any tables/columns created within an ExternalDatabase/ExternalSchema (often happens with Oracle (dblink) and SQLServer databases (references to databases in views))
- see externalDBLinker.md for more
domainSummary.py - queries the catalog to find all instances where data domains are used & counts the # of All, Accepted, Inferred, Rejected for all resources and per resource
- output is an excel workbook (domain_summary.xlxs) with a worksheet for counts across all resources, and a worksheet per resource with individual counts per resource. optional output to .csv files (per resource) is also possible
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
xdocAnalyzer.py - use this script to download xdocs for a resource and analyze the contents (counts # of objects by type, # of attributes) and will analyze all links + connection assignments. can be useful for troubleshooting (especially for resources that do not yet support reference objects)
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
setParentFilterValues.py - use this script to bulk-update relational objects with a custom attribute containing the name of the schema the object belongs to. This enables faceting by schema name in search results, as well as creating custom tabs that point to a specific database schema within a resource. See setParentFilterValues.md for more info

Release history

- 0.3.84 - Jan 13, 2021
- 0.3.83 - Jan 13, 2021
- 0.3.81 - Jan 3, 2021
- 0.3.80 - Dec 30, 2020
- 0.3.72 - Dec 28, 2020
- 0.3.71 - Dec 27, 2020
- 0.3.70 - Dec 27, 2020
- 0.3.69 - Dec 27, 2020
- 0.3.68 - Dec 27, 2020
- 0.3.67 - Dec 20, 2020
- 0.3.40 - Dec 19, 2020
- 0.3.18 - Dec 14, 2020
- 0.3.13 - Dec 14, 2020
- 0.3.12 - Dec 10, 2020
- 0.3.10 - Dec 9, 2020
- 0.3.9 - Dec 8, 2020
- 0.3.8 - Dec 7, 2020
- 0.3.7 - Dec 7, 2020
- 0.3.6 - Dec 7, 2020
- 0.3.4 (yanked) - Dec 6, 2020
  - Reason this release was yanked: config changes needed
- 0.2.11 - Nov 30, 2020

Download files

Source Distribution

informatica-edc-rest-api-samples-0.3.84.tar.gz (49.9 kB)

Uploaded Jan 13, 2021 Source

Built Distribution

informatica_edc_rest_api_samples-0.3.84-py3-none-any.whl (58.6 kB)

Uploaded Jan 13, 2021 Python 3

File details

Details for the file informatica-edc-rest-api-samples-0.3.84.tar.gz.

File metadata

Download URL: informatica-edc-rest-api-samples-0.3.84.tar.gz
Upload date: Jan 13, 2021
Size: 49.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.9.0

File hashes

Hashes for informatica-edc-rest-api-samples-0.3.84.tar.gz
Algorithm	Hash digest
SHA256	`8cce374f569c934d2593100902514f558edd02771a815560de3f114da2a60214`
MD5	`c3a0ea7c5277a48e60266475bc9f8123`
BLAKE2b-256	`a998cfe41b66f80eee37db36ab36255063b54528f36afefa24033c9ca668c946`

File details

Details for the file informatica_edc_rest_api_samples-0.3.84-py3-none-any.whl.

File metadata

Download URL: informatica_edc_rest_api_samples-0.3.84-py3-none-any.whl
Upload date: Jan 13, 2021
Size: 58.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.9.0

File hashes

Hashes for informatica_edc_rest_api_samples-0.3.84-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7f9c41aaa5eb2184f0823e349fd8aa42f460682ab8444d6ceb9c68daf4a47b13`
MD5	`e8054dacfb84f248dc50b80290db8231`
BLAKE2b-256	`eb82f265a6290e77982a9bb67deb588115963ca2af7fb0e64e46370c89d1eda0`
