
A data-centric transformation of monoliths into microservices

Codenet Minerva Cargo

Cargo is part of the Minerva project working on refactoring monoliths to microservices. It leverages Data Gravity Insights from the Konveyor.io project and provides recommendations for partitioning code taking into account code relationships, data relationships, and database transaction scope.

CARGO: AI-Guided Dependency Analysis for Migrating Monolithic Applications to Microservices Architecture

Paper: arXiv preprint

Abstract

CARGO (short for Context-sensitive lAbel pRopaGatiOn) is a novel un-/semi-supervised partition refinement technique. It builds a comprehensive system dependence graph from a context- and flow-sensitive static analysis of a monolithic application, and uses that graph to refine, and thereby improve, the partitions produced by current state-of-the-art algorithms.
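To make the idea of label propagation concrete, here is a minimal sketch of the plain, context-insensitive variant on a tiny undirected graph. It is not CARGO's algorithm: CARGO works on a system dependence graph and is context-sensitive, which this toy omits. The class names, the `propagate_labels` function, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def propagate_labels(edges, seed_labels, max_rounds=20):
    """Plain label propagation; NOT the context-sensitive variant used by CARGO."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labels = dict(seed_labels)
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps the run deterministic
            if node in seed_labels:
                continue  # seeds keep their label (the semi-supervised part)
            votes = Counter(labels[n] for n in neighbours[node] if n in labels)
            if not votes:
                continue  # no labelled neighbour yet
            # Most frequent neighbour label wins; ties go to the smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break  # converged
    return labels


# Hypothetical classes; an edge stands for any dependency between two classes.
edges = [
    ("OrderService", "OrderDAO"),
    ("OrderDAO", "OrderTable"),
    ("QuoteService", "QuoteDAO"),
    ("QuoteDAO", "QuoteTable"),
]
seeds = {"OrderService": 0, "QuoteService": 1}  # initial partition labels
result = propagate_labels(edges, seeds)
print(result)
```

Each unlabelled node adopts the label most common among its neighbours, so the seed labels spread along dependency edges until nothing changes.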

Figure 1. Overview of CARGO

Kick-the-tires Instructions (~15 minutes)

These instructions reproduce the key results in Figure 6 (RQ1), Figure 7 (RQ2), and Table 1 (RQ3) of the paper.

Pre-requisites

  • A Linux/Mac system with Docker.
  • Python >= 3.8, and Pip. Tested with Python 3.9.
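The pre-requisites above can be checked with a short script before starting. This is a minimal sketch: the `check_prerequisites` helper is hypothetical and not part of this repository. It only confirms that the tools are on the PATH, not that the Docker daemon is running.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), tools=("docker", "pip")):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {wanted} is required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Missing pre-requisite:", problem)
```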

Step 0: Clone this repository

  1. We'll clone this repository and save its location for the next steps:
git clone https://github.com/IBM/codenet-minerva-cargo && cd codenet-minerva-cargo

export REPO_ROOT=$PWD

Step 1: Set up Data Gravity Insights CLI

We will use Data Gravity Insights (DGI) to first build a system dependency graph and persist it in a Neo4j database.

1.1 Install DGI

DGI is available as a PyPI package. You can also install dgi directly from its Git repository as follows:

pip install -U git+https://github.com/rahlk/tackle-data-gravity-insights 

This will install the dgi command locally under your home folder, in a hidden folder called ~/.local/bin. If this folder is not already on your PATH, add it with:

export PATH=$HOME/.local/bin:$PATH
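To confirm whether ~/.local/bin is already on the PATH before exporting it, a check along these lines can be used. This is a sketch only; `is_on_path` is a hypothetical helper, and it compares directory names without resolving symlinks.

```python
import os
from pathlib import Path


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(directory).expanduser()
    # Empty entries (e.g. from "a::b") are skipped.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(Path(entry).expanduser() == target for entry in entries)


print("~/.local/bin on PATH:", is_on_path("~/.local/bin"))
```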

1.2 Creating a Neo4j Docker container

Make sure that your Docker daemon is running, either by starting the service (on Linux) or by opening the desktop application (on Mac).

We will need an instance of Neo4j to store the graphs that dgi creates. We will start one in a Docker container and set an environment variable to tell dgi where to find it.

docker run -d --name neo4j \
    -p 7474:7474 \
    -p 7687:7687 \
    -e NEO4J_AUTH="neo4j/konveyor" \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4JLABS_PLUGINS=\["apoc"\] \
    neo4j:4.4.17

export NEO4J_BOLT_URL="neo4j://neo4j:konveyor@localhost:7687"
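The value of NEO4J_BOLT_URL packs several settings into one string: the credentials must match NEO4J_AUTH, and the port must match the Bolt port published with -p 7687:7687. The sketch below splits the URL with Python's standard library to show the parts. How dgi itself parses the URL is not shown in this README, so treat `describe_bolt_url` as a hypothetical illustration.

```python
from urllib.parse import urlsplit


def describe_bolt_url(url):
    """Split a Bolt-style URL into scheme, credentials, host, and port."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


info = describe_bolt_url("neo4j://neo4j:konveyor@localhost:7687")
for key, value in info.items():
    print(f"{key}: {value}")
```

If the container is started with a different password or port mapping, the URL has to change accordingly.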

Installation complete

We can now use the dgi command to load information about an application into a graph database. We start with dgi --help. This should produce:

$ dgi --help

 Usage: dgi [OPTIONS] COMMAND [ARGS]...

 Tackle Data Gravity Insights

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --neo4j-bolt  -n  TEXT  Neo4j Bolt URL                                                                          │
│ --quiet       -q        Be more quiet                                                                           │
│ --validate    -v        Validate but don't populate graph                                                       │
│ --clear       -c        Clear graph before loading                                                              │
│ --help                  Show this message and exit.                                                             │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ c2g            Code2Graph add various program dependencies (i.e., call return, heap, and data) into the graph   │
│ partition      Partition is a command runs the CARGO algorithm to (re-)partition a monolith into microservices  │
│ s2g            Schema2Graph parses SQL schema (*.DDL file) into the graph                                       │
│ tx2g           Transaction2Graph add edges denoting CRUD operations to the graph.                               │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Step 2: Setting up a sample application

For the rest of this walkthrough, we'll work with DayTrader8.

Step 3: Build a Program Dependency Graph with DOOP

3.1 Prepare the application

Obtain the sample application WAR file. We'll save this in extras/demo/doop-in:

wget https://github.com/OpenLiberty/sample.daytrader8/releases/download/v1.2/io.openliberty.sample.daytrader8.war --directory-prefix=$REPO_ROOT/extras/demo/doop-in

3.2 Getting facts with DOOP

We first need to run DOOP. For ease of use, DOOP has been pre-compiled and hosted as a docker image at quay.io/rkrsn/doop-main. We'll use that for this demo.

docker run -it --rm \
  -v $REPO_ROOT/extras/demo/doop-in:/root/doop-data/input \
  -v $REPO_ROOT/extras/demo/doop-out:/root/doop-data/output/ \
  quay.io/rkrsn/doop-main:latest rundoop

Notes:

1. If the command above fails with an error, rerun the docker run ... command.

2. Running DOOP for the first time may take up to 15 minutes.

3.3 Run DGI code2graph

In this step, we'll run DGI code2graph to populate a Neo4j graph database with various static code interaction features pertaining to object/dataflow dependencies.

dgi -c c2g -a class -i $REPO_ROOT/extras/demo/doop-out

This will take 4-5 minutes. After successful completion, we should see something like this:

 dgi -c c2g -a class -i $REPO_ROOT/extras/demo/doop-out
[15:57:56] INFO     code2graph generator started.
           INFO     Verbose mode: ON
           INFO     Building Graph.
           INFO     Class level abstraction.
           WARNING  The option clear is turned ON. Deleting pre-existing nodes.
           INFO     Populating heap carried dependencies edges
   100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  Completed/Total: 1192/1192  Elapsed: 0:00:02  Remaining: 0:00:00
[15:57:58] INFO     Populating dataflow edges
   100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  Completed/Total: 991/991  Elapsed: 0:00:01  Remaining: 0:00:00
[15:58:00] INFO     Populating call-return dependencies edges
   100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  Completed/Total: 2404/2404  Elapsed: 0:00:04  Remaining: 0:00:00
[15:58:04] INFO     Populating entrypoints
           INFO     code2graph build complete

3.4 Extracting Database Transactions with Tackle-DiVA

Note that this step is only for applications with database transactions. We will run Tackle-DiVA to extract transactions from our application. DiVA is available as a docker image, so we just need to run DiVA by pointing to the source code directory of the application and the desired output directory.

  1. Let's first get the source code for DayTrader8:
wget -c https://github.com/OpenLiberty/sample.daytrader8/archive/refs/tags/v1.2.tar.gz -O - | tar -xvz -C $REPO_ROOT/extras/demo

  2. Then we run DiVA, mounting the extracted source directory as its input and extras/demo/txns as its output directory:
docker run --rm \
  -v $REPO_ROOT/extras/demo/sample.daytrader8-1.2:/app \
  -v $REPO_ROOT/extras/demo/txns:/diva-distribution/output \
  quay.io/konveyor/tackle-diva

This should generate a file transaction.json containing all discovered transactions. Finally, we run DGI to load these transaction edges into the program dependency graph.

dgi -c tx2g -a class -i $REPO_ROOT/extras/demo/txns/transaction.json

After successful completion, we should see something like this:

 dgi -c tx2g -a class -i $REPO_ROOT/extras/demo/txns/transaction.json

[16:05:36] INFO     Verbose mode: ON
           WARNING  The CLI argument clear is turned ON. Deleting pre-existing nodes.
           INFO     ClassTransactionLoader: Populating transactions
   100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  Completed/Total: 175/175  Elapsed: 0:00:01  Remaining: 0:00:00
[16:05:38] INFO     Transactions populated
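If the number of transactions reported looks off, a quick look at transaction.json can help confirm that DiVA produced usable output. The sketch below makes no assumption about DiVA's schema, which is not documented in this README; it only reports the top-level shape of the JSON. Both helper functions are hypothetical.

```python
import json
from pathlib import Path


def summarize_json_text(text):
    """Describe the top-level shape of a JSON document given as a string."""
    data = json.loads(text)
    if isinstance(data, list):
        return f"list with {len(data)} item(s)"
    if isinstance(data, dict):
        return f"object with keys: {', '.join(sorted(data))}"
    return f"{type(data).__name__} value"


def summarize_json_file(path):
    """Same as summarize_json_text, but reads the document from a file."""
    return summarize_json_text(Path(path).read_text(encoding="utf-8"))


# Stand-in input; replace with summarize_json_file(".../txns/transaction.json").
print(summarize_json_text('[{"example": 1}, {"example": 2}]'))
```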

Step 4: Running CARGO

Once we have created the Neo4j graphs by following the above steps, we can run CARGO as follows:

dgi partition --partitions=5
