GraphSense

GraphSense is a framework for training and using code suggestion models with minimal data preprocessing and resource consumption. It uses no transformers; the underlying algorithm is Node2Vec. FAISS serves as the vector index, and RocksDB stores the code-line-to-index and index-to-code-line mappings.

GraphSense is optimized for low-latency CPU inference and a small memory and disk footprint, as the benchmarks below show.
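The storage layout described above (a vector index plus two line/index mappings) can be pictured with a minimal sketch. This is an illustration only, not GraphSense's internals: plain dictionaries stand in for RocksDB, a brute-force cosine search stands in for FAISS, the 3-dimensional vectors are made up rather than learned by Node2Vec, and the helper names `add_line` and `nearest_lines` are hypothetical.

```python
import math

# Stand-ins for the two RocksDB mappings (assumption: plain dicts).
line_to_index = {}
index_to_line = {}

# Stand-in for the FAISS index: one vector per stored line (assumption: a list).
vectors = []


def add_line(line, vector):
    """Register a code line under the next free index and store its vector."""
    idx = len(vectors)
    line_to_index[line] = idx
    index_to_line[idx] = line
    vectors.append(vector)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_lines(query_line, k=2):
    """Return the k stored lines whose vectors are most similar to the query's."""
    q_idx = line_to_index[query_line]
    q_vec = vectors[q_idx]
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors) if i != q_idx]
    scored.sort(reverse=True)
    return [index_to_line[i] for _, i in scored[:k]]


# Made-up toy vectors; a real model would learn these from a graph of code lines.
add_line("def factorial(n):", [0.9, 0.1, 0.0])
add_line("    if n <= 1:", [0.8, 0.2, 0.1])
add_line("        return 1", [0.7, 0.3, 0.1])
add_line("import os", [0.0, 0.1, 0.9])

print(nearest_lines("def factorial(n):"))
```

Keeping the vectors in an index addressed by integer and the text in a separate key-value store means the similarity search only ever handles numbers, and code lines are looked up only for the few indices that are returned.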

Requirements

  • Python 3.8 or greater

Installation:

pip install graphsense

Training example:

from graphsense import GraphTrain

g = GraphTrain()
# train the model
g.line_completion(directory_path="code_files", language="Python")

Inference example:

from graphsense import GraphInfer

g = GraphInfer()

g.load_artifacts()  # load the artifacts to memory
suggestions = g.infer("def factorial(n):")
g.unload_artifacts()  # clean memory

print("top 10 suggestions: ", suggestions)
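Because the inference example pairs load_artifacts() with unload_artifacts(), one way to make sure memory is released even when infer() raises is a small context manager. The sketch below relies only on the two method names shown above; the helper name `loaded` is hypothetical and not part of GraphSense.

```python
from contextlib import contextmanager


@contextmanager
def loaded(model):
    """Load a model's artifacts, yield the model, and always unload afterwards."""
    model.load_artifacts()
    try:
        yield model
    finally:
        # Runs on normal exit and on exceptions alike.
        model.unload_artifacts()


# Usage with GraphSense (needs `pip install graphsense` and trained artifacts):
#
# from graphsense import GraphInfer
#
# with loaded(GraphInfer()) as g:
#     suggestions = g.infer("def factorial(n):")
# print("top 10 suggestions: ", suggestions)
```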

Performance comparison with a fine-tuned gpt2-medium model

Dataset used to train models: https://github.com/TheAlgorithms/Python

gpt2-medium model (fine-tuned on the Python Algorithms dataset)

artifacts size: 1.44 GB   
avg inference time (CPU): 8 seconds 
avg inference time (GPU): 2.2662 seconds
avg memory usage: 1800 MB 

GraphSense (trained on Python Algorithms dataset)

artifacts size: 13.9 MB
avg inference time (CPU): 0.0079 seconds 
avg memory usage: 277.8194 MB 
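To put the two sets of figures side by side, the ratios can be computed directly from the numbers reported above. This is plain arithmetic on the published values, not a new measurement; the only assumption is that 1 GB is taken as 1000 MB when converting the gpt2-medium artifact size.

```python
# Figures copied from the comparison above.
gpt2_medium = {
    "artifacts_mb": 1.44 * 1000,  # assumption: 1 GB = 1000 MB
    "cpu_seconds": 8.0,
    "gpu_seconds": 2.2662,
    "memory_mb": 1800.0,
}
graphsense = {
    "artifacts_mb": 13.9,
    "cpu_seconds": 0.0079,
    "memory_mb": 277.8194,
}

# How many times larger or slower the gpt2-medium figure is than GraphSense's.
ratios = {
    "artifacts size": gpt2_medium["artifacts_mb"] / graphsense["artifacts_mb"],
    "CPU inference time": gpt2_medium["cpu_seconds"] / graphsense["cpu_seconds"],
    "GPU vs GraphSense CPU time": gpt2_medium["gpu_seconds"] / graphsense["cpu_seconds"],
    "memory usage": gpt2_medium["memory_mb"] / graphsense["memory_mb"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")
```

On these figures, GraphSense's CPU inference is roughly a thousand times faster and its artifacts roughly a hundred times smaller. The comparison covers speed and size only and says nothing about the relative quality of the suggestions.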

Performance and Scalability

Accuracy of GraphSense (vector size: 128)

Dataset                  Top-1 Accuracy   Top-3 Accuracy   Top-10 Accuracy
TheAlgorithms (Python)   0.4718           0.8012           0.8958
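Top-k accuracy is conventionally the fraction of test cases in which the expected item appears among the first k ranked suggestions. The sketch below shows that conventional definition on made-up data; the README does not describe the evaluation procedure, so whether the figures above were computed exactly this way is an assumption, and the function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(ranked_suggestions, expected_lines, k):
    """Fraction of cases whose expected line is among the first k suggestions."""
    if not expected_lines:
        return 0.0
    hits = sum(
        1
        for ranked, expected in zip(ranked_suggestions, expected_lines)
        if expected in ranked[:k]
    )
    return hits / len(expected_lines)


# Made-up example: four queries, each with a ranked list of suggested next lines.
ranked = [
    ["return 1", "return n", "pass"],
    ["return n", "return 1", "pass"],
    ["pass", "return n", "return 1"],
    ["break", "continue", "pass"],
]
expected = ["return 1", "return 1", "return 1", "return 1"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, expected, k):.2f}")
```

Since a hit within the first k suggestions is also a hit within any larger k, the metric can only stay equal or increase as k increases, which matches the ordering of the three values in the table.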

Scalability of GraphSense (CPU) (vector size: 128)

vocabulary = 100,000
average memory usage: 273.777 MB
average execution time: 0.0113 seconds
artifacts size: 61.3 MB

vocabulary = 200,000
average memory usage: 325.8949 MB
average execution time: 0.0155 seconds
artifacts size: 122 MB

vocabulary = 300,000
average memory usage: 377.1085 MB
average execution time: 0.0185 seconds
artifacts size: 168 MB

vocabulary = 400,000
average memory usage: 428.3011 MB
average execution time: 0.0227 seconds
artifacts size: 224 MB

vocabulary = 500,000
average memory usage: 478.8532 MB
average execution time: 0.0273 seconds
artifacts size: 280 MB

vocabulary = 600,000
average memory usage: 531.0189 MB
average execution time: 0.0301 seconds
artifacts size: 368 MB

vocabulary = 700,000
average memory usage: 581.3494 MB
average execution time: 0.0333 seconds
artifacts size: 429 MB

vocabulary = 800,000
average memory usage: 633.226 MB
average execution time: 0.038 seconds
artifacts size: 448 MB

vocabulary = 900,000
average memory usage: 685.1932 MB
average execution time: 0.0439 seconds
artifacts size: 552 MB

vocabulary = 1,000,000
average memory usage: 734.5819 MB
average execution time: 0.0444 seconds
artifacts size: 561 MB
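The measurements above grow close to linearly with vocabulary size. The short calculation below estimates the growth rate from the first and last data points, using only the reported figures and assuming 1 MB = 10^6 bytes. The resulting memory cost of about 512 bytes per vocabulary entry would be consistent with one 128-dimensional 32-bit float vector per entry (128 × 4 bytes), although the README does not state how vectors are stored.

```python
# (vocabulary size, average memory in MB, average execution time in seconds)
measurements = [
    (100_000, 273.777, 0.0113),
    (200_000, 325.8949, 0.0155),
    (300_000, 377.1085, 0.0185),
    (400_000, 428.3011, 0.0227),
    (500_000, 478.8532, 0.0273),
    (600_000, 531.0189, 0.0301),
    (700_000, 581.3494, 0.0333),
    (800_000, 633.226, 0.038),
    (900_000, 685.1932, 0.0439),
    (1_000_000, 734.5819, 0.0444),
]

first_vocab, first_mem, first_time = measurements[0]
last_vocab, last_mem, last_time = measurements[-1]
vocab_span = last_vocab - first_vocab

# Average extra memory per vocabulary entry, assuming 1 MB = 10**6 bytes.
bytes_per_entry = (last_mem - first_mem) * 1_000_000 / vocab_span

# Average extra execution time per additional 100,000 entries, in milliseconds.
ms_per_100k = (last_time - first_time) * 1000 / (vocab_span / 100_000)

print(f"memory growth: {bytes_per_entry:.1f} bytes per entry")
print(f"time growth: {ms_per_100k:.2f} ms per 100,000 entries")
```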

Download files

Source Distribution

graphsense-0.0.3.tar.gz (9.1 kB)

Uploaded Source

Built Distribution

graphsense-0.0.3-py3-none-any.whl (8.5 kB)

Uploaded Python 3

File details

Details for the file graphsense-0.0.3.tar.gz.

File metadata

  • Download URL: graphsense-0.0.3.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.8

File hashes

Hashes for graphsense-0.0.3.tar.gz
Algorithm     Hash digest
SHA256        6ae103bbfb2d7cdca75475afd6f4f757f5953f5e82d013f7be7bccbdda97a65e
MD5           51be050592766d2a5dfd40b005d64ced
BLAKE2b-256   ed5ffbe75754ee8b95f33b53d13291ca93106377a4f21e9ff062191c9ed7bb84

File details

Details for the file graphsense-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: graphsense-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.8

File hashes

Hashes for graphsense-0.0.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        57a5678a6f40ade004f90b9e258e0f5e5e910f880f48af48fdd8b8ac1af6fa08
MD5           45e97482aa1098026d7d58dae8abbbc8
BLAKE2b-256   07372969379db998a3fac7877035fa19a47825ca50ac07381362c754fbeb8719
