Skip to main content

Examples how to use the Informatica EDC samples. Loosely based on Informatica's EDC-REST-API-Samples

Project description

EDC rest api samples/utilities using python

contains examples for connecting to and querying EDC via python

Requirements

  • python 3.6+ - legacy python will not be actively tested
    • why? support for legacy python stops 1/1/2020, fstrings, unicode, async, many other 3rd party libraries like pandas, numpy, django are also not supporting legacy python
  • Note: some scipts will work with legacy python (v2.7) but these are not maintained or heavily tested
    • any script with a suffix of 27 should work on legacy python systems
    • if you can't easily install python 3.x (e.g 3.7) - you could use docker to run the code in a python3 container, or send a request for us to create a compiled binary executable (e.g. using pyinstaller)
  • Python editors (ide/environments)
  • other useful tools

Getting Started

  • verify that python is installed - v3.6+
  • Create a new VSCode/pycharm/Eclipse Project and import/use the files in the python folder (not the java folder)
  • Ensure EDC is running while executing the samples - try/except code will catch & immediately exit
  • best practice is to use a virtual environment with python
    • e.g. python -m venv .edcvenv
    • and then
      • source .edcvenv/bin/activate (for linux/macox)
      • .edcvenv/Scripts/activate.ps1 (for windows powershell) Note: you may need to execute Set-ExecutionPolicy unrestricted for powershell (run powershell as administrator to do this)
      • .edcvenv/Scripts/activate.bat (windows cmd)
    • after activating (or using your base python3.6+), execute pip install -r requirements.txt (will install any packages referecned in requirements .txt file - including requests, openpyxl and python-dotenv)

REST API Authentication

  • the EDC rest api supports Basic Authentication only - see https://yourcatalogserver:port/access for details
  • we use the python requests module for all http(s) rest calls (very easy to use)
  • when making a rest api call - you can pass either the id/password - or a http header with an encoded password
  • for all examples here, we initially used the id/pwd method - but have switched to use http headers
  • if you are using LDAP authentication - the user must have the security domain and a '' character prefixed to the user id
    • e.g. COMPANY_LDAP\user_a
  • use the encodeUser.py script - to create the basic auth encoding for your user, and store in a variable named INFA_EDC_AUTH
    • you can set the variable for each session, so it is not stored anywhere
    • if using docker - you can add this variable to an .env file to pass to docker at runtime
    • if using VS Code - you can add and "env" setting for individual environment variables used in the debugger (launch.json)
      • e.g "envFile": "${workspaceFolder}/.env", // and add any settings to .env (preferred - also works with docker)
      • e.g. "env" : {"INFA_EDC_AUTH" : "Basic dXNlcjE6YUNvbXBsIWNAdGVkUGEkM3cwcmQ="}, (works but prefer .env file)
    • Note: any files inside of .vscode (e.g. launch.json) will be excluded from the git repo (each user has their own local version)
  • you can also use setupConnection.py to create a .env file that stores the catalog url and the encoded user credentials
  • TODO: create a seperate document & recording disucssing authorization techniques (http header, .netrc, auth=)

HTTPS/TLS/SSL Connections and certificates

  • assuming your catalog service is https enabled (it should be, if not so your passwords are send in clear text & set verify=False)
  • you will either need to download/copy the certificate (.pem format, not .jks) locally
    • or set flags to disable certificate authentication (not recommended, but possible)
  • if your ssl certificate is self signed (also not recommended), an additional warning will need to be suppressed

Sample Programs in the Project

  • encodeUser.py: simple program to prompt for a userid/pwd and optionally a security domain and create a base64 encoded string that can be used for authentication in the http header. e.g. "Basic dXNlcjE6YUNvbXBsIWNAdGVkUGEkM3cwcmQ="
    • use this script before you call use the other scripts, to get the right format for authenticating & not storing passwords in the .py files
      • an alternate is to prompt for a password within your script & encode the id:password
    • use encodeUser27.py for legacy python
  • EDCQuery_template.py: a template/skeleton that shows how to connect to the catalog and execute a search using python. the result-set processing includes handling the paging model. It also uses the get_fact_value method in edcutils.py to extract the item name from the facts array
  • Utility/Heloer Scripts
    • edcutils.py: utility/helper methods for common tasks - like get an attribute value get_fact_value(item, attrName)
    • edcSessionHelper.py: EDCSession class helps you configure a requests.session object and also provides command-line args for connecting to the catalog (-c/-edcurl EDC URL, -a/--auth auth credentials (see encodeUser.py), -u username (will prompt for pwd - recommend using -a, -s/--sslcert SSLCERT).
      • this class also supports using the following environment vars:
      • for an example of usage - see listAndCountCustomAttributes.py
  • listAndCountCustomAttributes.py: find all custom attributes (normal and classification) and count the # of times the attribute is used. writes results to csv file (output folder can be configured)
    • supports command-line parameters and environment vars for accessing the catalog.
    • uses edcSessionHelper.py to get a session reference to any rest queries
  • listCustomAttributes.py: simple script to print all custom attributes (name, id, type, sortable, facetable)
    • this script will list both regular custom attributes /2/catalog/models/attributes and reference 'classification' attributes /2/catalog/models/referenceAttributes
  • similarityReport.py: v10.2.1+ utility to find & export all columns/fields with similar links
    • note: this script will attempt to query all dataelements, even if similarity profiling was not run. for a better implementation, use similarityByResource.py
  • similarityByResource.py: utility to find and export column similarity for all resources that similarity profiling was configured.
    • supports command-line parameters and environment vars for accessing the catalog.
    • uses edcSessionHelper.py to get a session reference to any rest queries
  • dbSchemaReplicationLineage.py: provides the ability to link tables/columns in a database schema that are replicated to other schemas/databases & no scanner exists to automatcially document these relationships. (e.g. sqoop, scripts/code, goldengate ...)
  • externalDBLinker.py: script to generate custom lineage for any tables/columns created within an ExternalDatabase/ExternalSchema (often happens with Oracle (dblink) and SQLServer databases (references to databases in views)
  • domainSummary.py - queries the catalog to find all instances where data domains are used & counts the # of All, Accepted, Inferred, Rejected for all resources and per resource
    • output is an excel workbook (domain_summary.xlxs) with a worksheet for counts across all resources, and a worksheet per resource with individual counts per resource. optional output to .csv files (per resource) is also possible
    • supports command-line parameters and environment vars for accessing the catalog.
    • uses edcSessionHelper.py to get a session reference to any rest queries
  • xdocAnalyzer.py - use this script to download xdocs for a resource and analyze the contents (counts # of objects by type, # of attributes) and will analyze all links + connection assignments. can be useful for troubleshooting (especially for resources that do not yet support reference objects)
    • supports command-line parameters and environment vars for accessing the catalog.
    • uses edcSessionHelper.py to get a session reference to any rest queries
  • setParentFilterValues.py - use this script to update in bulk relational objects with a custom attribute containing the value of the schema the object belongs to. This will faceting by schema name in search results, as well as creating custom tab pointing to specific database schema within a resource, see setParentFilterValues.md for more info

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

informatica-edc-rest-api-samples-0.2.11.tar.gz (32.0 kB view hashes)

Uploaded Source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page