Examples of how to use the Informatica EDC REST API from python. Loosely based on Informatica's EDC-REST-API-Samples
EDC REST API samples/utilities using python
contains examples for connecting to and querying EDC via python
Requirements
- python 3.6+ - legacy python will not be actively tested
- why? support for legacy python ended on 1/1/2020; python 3 adds f-strings, better unicode handling and async; and many 3rd party libraries such as pandas, numpy and django no longer support legacy python
- Note: some scripts will work with legacy python (v2.7) but these are not maintained or heavily tested
- any script with a suffix of 27 should work on legacy python systems
- if you can't easily install python 3.x (e.g. 3.7), you could use docker to run the code in a python3 container, or send a request for us to create a compiled binary executable (e.g. using pyinstaller)
- Python editors (ide/environments)
- VS Code - good support for python - free and runs on all platforms https://code.visualstudio.com/
- pycharm - https://www.jetbrains.com/pycharm/
- anaconda - for JupyterLab/Notebooks https://www.anaconda.com/ (includes vscode)
- Eclipse - ide for java/python (python using/installing pydev)
- other useful tools
- rest api clients - for testing syntax/api calls + the good ones generate code for many languages
- postman - https://www.getpostman.com/
- insomnia - https://insomnia.rest/
Getting Started
- verify that python is installed - v3.6+
- Create a new VSCode/pycharm/Eclipse Project and import/use the files in the python folder (not the java folder)
- Ensure EDC is running when you execute the samples - the try/except code will catch connection errors & exit immediately
- best practice is to use a virtual environment with python
- e.g. python -m venv .edcvenv
- and then
- source .edcvenv/bin/activate (for linux/macOS)
- .edcvenv/Scripts/activate.ps1 (for windows powershell)
- Note: you may need to execute Set-ExecutionPolicy unrestricted for powershell (run powershell as administrator to do this)
- .edcvenv/Scripts/activate.bat (for windows cmd)
- after activating (or using your base python3.6+), execute
pip install -r requirements.txt
(will install any packages referenced in the requirements.txt file - including requests, openpyxl and python-dotenv)
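To check the python version requirement from inside a script, something like the following could be used. This is a minimal sketch, not part of the sample scripts; the names MIN_VERSION and python_is_supported are made up for illustration.

```python
import sys

# minimum version stated in the requirements above (3.6+)
MIN_VERSION = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if not python_is_supported():
        sys.exit("python %d.%d+ is required, found %s" % (MIN_VERSION + (found,)))
    print("python version ok:", found)
```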
REST API Authentication
- the EDC rest api supports Basic Authentication only - see https://yourcatalogserver:port/access for details
- we use the python requests module for all http(s) rest calls (very easy to use)
- when making a rest api call - you can pass either the id/password - or an http header with the encoded id and password
- for all examples here, we initially used the id/pwd method - but have switched to use http headers
- if you are using LDAP authentication - the user id must be prefixed with the security domain and a '\' character
- e.g.
COMPANY_LDAP\user_a
- use the encodeUser.py script - to create the basic auth encoding for your user, and store in a variable named
INFA_EDC_AUTH
- you can set the variable for each session, so it is not stored anywhere
- if using docker - you can add this variable to an .env file to pass to docker at runtime
- if using VS Code - you can add an "env" setting for individual environment variables used in the debugger (launch.json)
- e.g "envFile": "${workspaceFolder}/.env", // and add any settings to .env (preferred - also works with docker)
- e.g. "env" : {"INFA_EDC_AUTH" : "Basic dXNlcjE6YUNvbXBsIWNAdGVkUGEkM3cwcmQ="}, (works but prefer .env file)
- Note: any files inside of .vscode (e.g. launch.json) will be excluded from the git repo (each user has their own local version)
- you can also use setupConnection.py to create a .env file that stores the catalog url and the encoded user credentials
- TODO: create a separate document & recording discussing authorization techniques (http header, .netrc, auth=)
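The encoding described above is standard HTTP Basic authentication: base64 of id:password, prefixed with "Basic ". The following is a minimal sketch of that encoding, not necessarily how encodeUser.py is written; the function name build_basic_auth_header is hypothetical.

```python
import base64


def build_basic_auth_header(user, password, security_domain=None):
    """Return a value for the Authorization header / INFA_EDC_AUTH variable.

    When a security domain is given (e.g. for LDAP users), the user id is
    prefixed with the domain and a backslash, as described above.
    """
    if security_domain:
        user = security_domain + "\\" + user
    raw = (user + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# example: reproduces the sample value shown in this document
header_value = build_basic_auth_header("user1", "aCompl!c@tedPa$3w0rd")
headers = {"Authorization": header_value}
```

Note that base64 is an encoding, not encryption: anyone who can read the value can recover the password, so treat INFA_EDC_AUTH and any .env file like a password.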
HTTPS/TLS/SSL Connections and certificates
- assuming your catalog service is https enabled (it should be; if it is not, your passwords are sent in clear text & you set verify=False)
- you will either need to download/copy the certificate (.pem format, not .jks) locally
- or set flags to disable certificate verification (not recommended, but possible)
- if your ssl certificate is self signed (also not recommended), an additional warning will need to be suppressed
- more information about SSL certificate verification can be found at https://3.python-requests.org/user/advanced/#ssl-cert-verification
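The options above map onto the verify setting of the requests module, which accepts True, False or a path to a .pem file. The sketch below shows one way to wire that up; it is an illustration under assumptions, not the code in edcSessionHelper.py. The function names are hypothetical, and treating the literal text "None" (any case) as "disable verification" is an assumption based on the INFA_EDC_SSL_PEM description further down.

```python
import os


def resolve_verify(ssl_pem=None):
    """Translate a certificate setting into a value for requests' verify.

    Returns True (use the default CA bundle), False (verification
    disabled) or the path to a .pem file.
    """
    if ssl_pem is None:
        ssl_pem = os.environ.get("INFA_EDC_SSL_PEM", "")
    ssl_pem = ssl_pem.strip()
    if ssl_pem == "":
        return True
    if ssl_pem.lower() == "none":  # assumption: "None" disables verification
        return False
    return ssl_pem


def make_session(ssl_pem=None):
    """Create a requests session configured with the resolved setting."""
    import requests  # third-party, installed via requirements.txt
    import urllib3   # installed as a dependency of requests

    session = requests.Session()
    session.verify = resolve_verify(ssl_pem)
    if session.verify is False:
        # suppress the warning emitted for every unverified https request
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return session
```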
Sample Programs in the Project
encodeUser.py
: simple program to prompt for a userid/pwd and optionally a security domain and create a base64 encoded string that can be used for authentication in the http header. e.g. "Basic dXNlcjE6YUNvbXBsIWNAdGVkUGEkM3cwcmQ="
- use this script before you use the other scripts, to get the right format for authenticating & to avoid storing passwords in the .py files
- an alternative is to prompt for a password within your script & encode the id:password
- use
encodeUser27.py
for legacy python
EDCQuery_template.py
: a template/skeleton that shows how to connect to the catalog and execute a search using python. the result-set processing includes handling the paging model. It also uses the get_fact_value method in edcutils.py to extract the item name from the facts array
- Utility/Helper Scripts
edcutils.py
: utility/helper methods for common tasks - like getting an attribute value with get_fact_value(item, attrName)
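The paging model and the fact lookup mentioned above can be sketched as follows. This is an illustration, not the code in EDCQuery_template.py or edcutils.py. It assumes a search response shaped like {"metadata": {"totalCount": n}, "items": [...]}, where each item carries a "facts" list of {"attributeId": ..., "value": ...} entries. fetch_page is a hypothetical callable that you would implement with a requests call passing offset and pageSize parameters, and returning "" for a missing fact is also an assumption.

```python
def get_fact_value(item, attr_name):
    """Return the value of the first fact matching attr_name, or ""."""
    for fact in item.get("facts", []):
        if fact.get("attributeId") == attr_name:
            return fact.get("value", "")
    return ""


def iter_items(fetch_page, page_size=500):
    """Yield every item of a result set, requesting one page at a time.

    fetch_page(offset, page_size) must return the parsed json of one page.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("items", [])
        for item in items:
            yield item
        offset += page_size
        total = page.get("metadata", {}).get("totalCount", 0)
        # stop on an empty page or once the offset passes the total count
        if not items or offset >= total:
            break
```

A caller would then loop with something like: for item in iter_items(fetch_page): print(get_fact_value(item, "core.name")).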
edcSessionHelper.py
: EDCSession class helps you configure a requests.session object and also provides command-line args for connecting to the catalog (-c/--edcurl EDC URL, -a/--auth auth credentials (see encodeUser.py), -u username (will prompt for pwd - recommend using -a), -s/--sslcert SSLCERT).
- this class also supports using the following environment vars:
- INFA_EDC_URL - e.g. http://yourcatalogserver:9085 or https://yourcatalogserver:9085
- INFA_EDC_AUTH - e.g. "Basic dXNlcl9hOnJlYWxseXNlY3VyZXBhc3N3b3Jk" - see encodeUser.py
- INFA_EDC_SSL_PEM - certificate to use to connect (or set to None - to disable ssl verification)
- for an example of usage - see listAndCountCustomAttributes.py
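The combination of command-line args and environment vars described above can be sketched with argparse. This is not the actual EDCSession implementation: build_parser is a hypothetical name, the rule that a command-line value overrides the environment variable is an assumption, and only the short form -u is used because no long form is documented here.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from the INFA_EDC_* variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="connect to the catalog (sketch)")
    parser.add_argument("-c", "--edcurl", default=env.get("INFA_EDC_URL"),
                        help="EDC url, e.g. https://yourcatalogserver:9085")
    parser.add_argument("-a", "--auth", default=env.get("INFA_EDC_AUTH"),
                        help="encoded credentials, see encodeUser.py")
    parser.add_argument("-u", dest="user", default=None,
                        help="user name (the password would be prompted for)")
    parser.add_argument("-s", "--sslcert", default=env.get("INFA_EDC_SSL_PEM"),
                        help="path to a .pem certificate")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("catalog url:", args.edcurl)
```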
listAndCountCustomAttributes.py
: find all custom attributes (normal and classification) and count the # of times each attribute is used. writes results to a csv file (output folder can be configured)
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
listCustomAttributes.py
: simple script to print all custom attributes (name, id, type, sortable, facetable)
- this script will list both regular custom attributes (/2/catalog/models/attributes) and reference 'classification' attributes (/2/catalog/models/referenceAttributes)
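A rough sketch of querying the two endpoints named above is shown below. It is not the code in listCustomAttributes.py. It assumes the endpoints live under /access on the catalog server, that each response has an "items" list, and that each entry has name, id, dataTypeId, sortable and facetable keys. It reads a single page only and does not filter system attributes from custom ones. session is expected to be a requests session (or anything with a compatible get method).

```python
ATTRIBUTE_PATHS = (
    "/access/2/catalog/models/attributes",
    "/access/2/catalog/models/referenceAttributes",
)


def list_attributes(session, base_url, page_size=500):
    """Return (name, id, type, sortable, facetable) tuples from both endpoints."""
    rows = []
    for path in ATTRIBUTE_PATHS:
        response = session.get(
            base_url.rstrip("/") + path,
            params={"offset": 0, "pageSize": page_size},
            timeout=30,
        )
        response.raise_for_status()  # stop on http errors (e.g. 401)
        for attr in response.json().get("items", []):
            rows.append((
                attr.get("name"),
                attr.get("id"),
                attr.get("dataTypeId"),
                attr.get("sortable"),
                attr.get("facetable"),
            ))
    return rows
```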
similarityReport.py
: v10.2.1+ utility to find & export all columns/fields with similar links
- note: this script will attempt to query all data elements, even if similarity profiling was not run. for a better implementation, use similarityByResource.py
similarityByResource.py
: utility to find and export column similarity for all resources where similarity profiling was configured.
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
dbSchemaReplicationLineage.py
: provides the ability to link tables/columns in a database schema that are replicated to other schemas/databases when no scanner exists to automatically document these relationships (e.g. sqoop, scripts/code, goldengate ...)
- see dbSchemaReplicationLineage.md for more
externalDBLinker.py
: script to generate custom lineage for any tables/columns created within an ExternalDatabase/ExternalSchema (often happens with Oracle (dblink) and SQLServer databases (references to databases in views))
- see externalDBLinker.md for more
domainSummary.py
- queries the catalog to find all instances where data domains are used & counts the # of All, Accepted, Inferred, Rejected for all resources and per resource
- output is an excel workbook (domain_summary.xlsx) with a worksheet for counts across all resources, and a worksheet per resource with the counts for that resource. optional output to .csv files (per resource) is also possible
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
xdocAnalyzer.py
- use this script to download xdocs for a resource and analyze the contents (counts # of objects by type, # of attributes); it also analyzes all links + connection assignments. can be useful for troubleshooting (especially for resources that do not yet support reference objects)
- supports command-line parameters and environment vars for accessing the catalog.
- uses edcSessionHelper.py to get a session reference to any rest queries
setParentFilterValues.py
- use this script to bulk update relational objects with a custom attribute containing the name of the schema the object belongs to. This enables faceting by schema name in search results, as well as creating a custom tab pointing to a specific database schema within a resource. see setParentFilterValues.md for more info