Python library which is extensively used for all AI projects

DXC Industrialized AI Starter

DXC Industrialized AI Starter makes it easy to deploy (industrialize) your AI algorithms. If you are a data scientist working on an algorithm that you would like to deploy across the enterprise, DXC Industrialized AI Starter makes it easier for you to:

  • Access, clean, and explore raw data
  • Build data pipelines
  • Run AI experiments
  • Publish microservices

Installation

To install the DXC AI Starter library, run the following command:

pip install DXC-Industrialized-AI-Starter

Then import it in Python:

from dxc import ai

Getting Started

Access, Clean, and Explore Raw Data

Use the library to access, clean, and explore your raw data.

#Access raw data
df = ai.read_data_frame_from_remote_json(json_url)
df = ai.read_data_frame_from_remote_csv(csv_url)
df = ai.read_data_frame_from_local_json()
df = ai.read_data_frame_from_local_csv()
df = ai.read_data_frame_from_local_excel_file()

#Clean data: Imputes missing data, removes empty rows and columns, anonymizes text.
raw_data = ai.clean_dataframe(df)

#Explore raw data: 
ai.visualize_missing_data(raw_data) #creates a visual display of missing data.
ai.explore_features(raw_data) #visualizes relationships between all features in data.
ai.plot_distributions(raw_data) #creates a distribution graph for each column.

Click here for details about accessing, cleaning, and exploring raw data.
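
To make the cleaning step concrete, here is a minimal standard-library sketch of two of the operations that the comment above attributes to ai.clean_dataframe: removing empty rows and columns, and imputing missing values. It is not the library's implementation. The helpers drop_empty and impute_mean are hypothetical, rows are plain dictionaries rather than a pandas DataFrame, and mean imputation is an assumed strategy chosen only for illustration.

```python
from statistics import mean


def drop_empty(rows):
    """Remove rows and columns in which every value is None."""
    # Keep only rows that have at least one non-missing value.
    rows = [r for r in rows if any(v is not None for v in r.values())]
    # Keep only columns that have at least one non-missing value.
    columns = [c for c in (rows[0] if rows else {})
               if any(r.get(c) is not None for r in rows)]
    return [{c: r.get(c) for c in columns} for r in rows]


def impute_mean(rows, column):
    """Fill missing values in a numeric column with the column mean (assumed strategy)."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = mean(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


sample = [  # made-up data
    {"cost": 10.0, "make": "Ford", "notes": None},
    {"cost": None, "make": "Toyota", "notes": None},
    {"cost": None, "make": None, "notes": None},
    {"cost": 30.0, "make": "Ford", "notes": None},
]

cleaned = impute_mean(drop_empty(sample), "cost")
for row in cleaned:
    print(row)
```

In this sample the all-empty third row and the all-empty notes column are dropped, and the missing cost is filled with the mean of the remaining values.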

Build Data Pipelines

Pipelines are a standard way to process your data for modeling and interpretation. By default, the DXC AI Starter library uses the free tier of MongoDB Atlas to store raw data and execute pipelines. To get started, you first need a MongoDB account, which you can sign up for free. Then create a database and specify its "connection_string" and the related details in the data_layer below. The following code connects to MongoDB and stores raw data for processing.

#Insert data into MongoDB:
data_layer = {
    "connection_string": "<your connection_string>",
    "collection_name": "<your collection_name>",
    "database_name": "<your database_name>"
}
wrt_raw_data = ai.write_raw_data(data_layer, raw_data, date_fields = [])
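
Connection strings usually contain credentials, so you may prefer not to hard-code them in a notebook. The sketch below builds the same data_layer dictionary from environment variables using only the standard library. The variable names (MONGO_CONNECTION_STRING and so on) and the build_data_layer helper are illustrative assumptions, not part of the DXC AI Starter API.

```python
import os

REQUIRED_KEYS = ("connection_string", "collection_name", "database_name")


def build_data_layer(env=os.environ):
    """Build the data_layer dict from environment variables (hypothetical names)."""
    data_layer = {
        "connection_string": env.get("MONGO_CONNECTION_STRING", ""),
        "collection_name": env.get("MONGO_COLLECTION_NAME", ""),
        "database_name": env.get("MONGO_DATABASE_NAME", ""),
    }
    # Fail early with a clear message instead of at connection time.
    missing = [k for k in REQUIRED_KEYS if not data_layer[k]]
    if missing:
        raise ValueError("missing data_layer settings: " + ", ".join(missing))
    return data_layer


# Example with an explicit mapping instead of the real environment (made-up values).
example_env = {
    "MONGO_CONNECTION_STRING": "mongodb+srv://user:password@cluster.example.net",
    "MONGO_COLLECTION_NAME": "raw_requests",
    "MONGO_DATABASE_NAME": "fleet",
}
data_layer = build_data_layer(example_env)
print(sorted(data_layer))
```

The resulting dictionary has the same three keys as the data_layer shown above, so it can be passed to ai.write_raw_data in the same way.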

Once raw data is stored, you can run pipelines to transform the data. A pipeline instructs the data store how to refine the raw data into something that can be used to train a machine-learning model. Refer to the MongoDB pipeline syntax for details on how to write a pipeline. Below is an example of creating and executing a pipeline.

pipeline = [
        {
            '$group':{
                '_id': {
                    "funding_source":"$funding_source",
                    "request_type":"$request_type",
                    "department_name":"$department_name",
                    "replacement_body_style":"$replacement_body_style",
                    "equipment_class":"$equipment_class",
                    "replacement_make":"$replacement_make",
                    "replacement_model":"$replacement_model",
                    "procurement_plan":"$procurement_plan"
                    },
                "avg_est_unit_cost":{"$avg":"$est_unit_cost"},
                "avg_est_unit_cost_error":{"$avg":{ "$subtract": [ "$est_unit_cost", "$actual_unit_cost" ] }}
            }
        }
]

df = ai.access_data_from_pipeline(wrt_raw_data, pipeline) #refined data will be stored in pandas dataframe.

Click here for details about building data pipelines.
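
If the $group stage above is unfamiliar, the following pure-Python sketch emulates what it computes: documents are bucketed by the grouping fields, and each bucket yields the average estimated cost and the average of (estimated cost minus actual cost). This is an illustration, not how MongoDB or the library executes the pipeline. It groups on only two of the eight fields for brevity, the sample documents are made up, and MongoDB's handling of missing or non-numeric values is not reproduced.

```python
from collections import defaultdict

GROUP_KEYS = ("funding_source", "department_name")  # subset of the keys above


def group_average(docs):
    """Emulate the $group stage: average cost and average estimation error per group."""
    buckets = defaultdict(list)
    for doc in docs:
        key = tuple(doc[k] for k in GROUP_KEYS)
        buckets[key].append(doc)

    results = []
    for key, members in buckets.items():
        n = len(members)
        results.append({
            "_id": dict(zip(GROUP_KEYS, key)),
            "avg_est_unit_cost": sum(m["est_unit_cost"] for m in members) / n,
            "avg_est_unit_cost_error": sum(
                m["est_unit_cost"] - m["actual_unit_cost"] for m in members
            ) / n,
        })
    return results


docs = [  # made-up sample documents
    {"funding_source": "grant", "department_name": "parks",
     "est_unit_cost": 100, "actual_unit_cost": 90},
    {"funding_source": "grant", "department_name": "parks",
     "est_unit_cost": 200, "actual_unit_cost": 220},
    {"funding_source": "bond", "department_name": "fire",
     "est_unit_cost": 50, "actual_unit_cost": 50},
]

grouped = group_average(docs)
for row in grouped:
    print(row)
```

Note that the grouping fields end up nested under _id in each result, which is the same shape the pipeline above produces.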

Run AI Experiments

Use the DXC AI Starter to build and test algorithms. This code executes an experiment by running run_experiment() on an experiment design.

experiment_design = {
    #model options include ['regression()', 'classification()']
    "model": ai.regression(),
    "labels": df.avg_est_unit_cost_error,
    "data": df,
    #Tell the model which column is 'output'
    #Also note columns that aren't purely numerical
    #Examples include ['nlp', 'date', 'categorical', 'ignore']
    "meta_data": {
      "avg_est_unit_cost_error": "output",
      "_id.funding_source": "categorical",
      "_id.department_name": "categorical",
      "_id.replacement_body_style": "categorical",
      "_id.replacement_make": "categorical",
      "_id.replacement_model": "categorical",
      "_id.procurement_plan": "categorical"
  }
}

trained_model = ai.run_experiment(experiment_design)

Click here for details about running AI experiments.
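
The meta_data keys above use dotted names such as _id.funding_source because the $group stage nests the grouping fields under _id. The sketch below shows how such column names arise when nested records are flattened, and checks that every meta_data key refers to an existing column. It is an illustration only: flatten and unknown_columns are hypothetical helpers, and the assumption that nested fields are flattened with a "." separator is inferred from the key names above rather than taken from the library's documentation.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def unknown_columns(meta_data, columns):
    """Return meta_data keys that do not match any column name."""
    return sorted(set(meta_data) - set(columns))


record = {  # one made-up output document of the $group stage
    "_id": {"funding_source": "grant", "department_name": "parks"},
    "avg_est_unit_cost": 150.0,
    "avg_est_unit_cost_error": -5.0,
}
columns = list(flatten(record))
print(columns)

meta_data = {
    "avg_est_unit_cost_error": "output",
    "_id.funding_source": "categorical",
    "_id.departmnt_name": "categorical",  # deliberate typo to show the check
}
print(unknown_columns(meta_data, columns))
```

A check like this catches misspelled column names before an experiment is run.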

Publish Microservice

The DXC AI Starter library makes it easy to publish your models as working microservices. By default, the DXC AI Starter library uses the free tier of Algorithmia to publish models as microservices. You must create an Algorithmia account to use this feature. Below is an example of publishing a microservice.

#trained_model is the output of run_experiment() function
microservice_design = {
    "microservice_name": "<Name of your microservice>",
    "microservice_description": "<Brief description about your microservice>",
    "execution_environment_username": "<Algorithmia username>",
    "api_key": "<your api_key>",
    "api_namespace": "<your api namespace>",   
    "model_path":"<your model_path>"
}

#publish the micro service and display the url of the api
api_url = ai.publish_microservice(microservice_design, trained_model)
print("api url: " + api_url)

Click here for details about publishing a microservice.
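
Because microservice_design is filled in by hand, it is easy to leave one of the "<...>" placeholders in place. As a small sketch, the hypothetical helper below lists the fields that are empty or still look like a placeholder, so that they can be fixed before calling ai.publish_microservice. The helper and all sample values are assumptions made for illustration and are not part of the library.

```python
import re

PLACEHOLDER = re.compile(r"^<.*>$")


def unfilled_fields(design):
    """Return the keys whose values are empty or still look like '<placeholder>'."""
    return sorted(
        key for key, value in design.items()
        if not str(value).strip() or PLACEHOLDER.match(str(value).strip())
    )


microservice_design = {  # made-up values
    "microservice_name": "costerror",
    "microservice_description": "Predicts the estimation error of unit cost",
    "execution_environment_username": "<Algorithmia username>",
    "api_key": "",
    "api_namespace": "<your api namespace>",
    "model_path": "models/costerror",
}

print(unfilled_fields(microservice_design))
```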

Docs

For detailed and complete documentation, please click here

Example Colab Notebook

Here is a detailed, in-depth example of DXC Industrialized AI Starter library usage.

Below is a screenshot of the Colab notebook.

[Screenshot: AI Starter Colab notebook]

Example Colab Notebooks for Each Model

Here are example notebooks for individual models. These sample notebooks show how to use each function in a model, which parameters each function expects, and what each function returns.

Contributing Guide

To learn more about contributing and the contribution guidelines, please click here.

Reporting Issues

If you find any issues, feel free to report them here with a clear description of the problem.
