GraphSense
GraphSense is a framework that makes it easy to train and use code suggestion models with minimal data preprocessing and resource consumption. It uses no transformers; the underlying algorithm is Node2Vec. FAISS serves as the vector index, and RocksDB stores the code-line-to-index and index-to-code-line mappings.
GraphSense is highly optimized for performance and efficiency.
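To make the architecture above more concrete, here is a minimal, standard-library-only sketch of two ingredients the description names: bidirectional line/index mappings and random walks over a graph of code lines. It is not GraphSense's implementation. The graph construction (linking each line to the line that follows it), the function names `build_line_graph` and `random_walks`, and the plain dicts standing in for RocksDB are all assumptions made for illustration. The walks are uniform, which corresponds to Node2Vec with p = q = 1; embedding training and FAISS indexing are omitted.

```python
import random
from collections import defaultdict


def build_line_graph(files):
    """Assign each distinct code line an integer id and link consecutive lines.

    Hypothetical helper: the two dicts stand in for the RocksDB mappings.
    """
    line_to_index = {}
    index_to_line = {}
    edges = defaultdict(set)  # node id -> ids of lines seen directly after it
    for source in files:
        previous = None
        for raw in source.splitlines():
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line not in line_to_index:
                idx = len(line_to_index)
                line_to_index[line] = idx
                index_to_line[idx] = line
            current = line_to_index[line]
            if previous is not None:
                edges[previous].add(current)
            previous = current
    return line_to_index, index_to_line, edges


def random_walks(edges, start_nodes, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks (Node2Vec with p = q = 1)."""
    rng = random.Random(seed)
    walks = []
    for start in start_nodes:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(edges.get(walk[-1], ()))
                if not neighbours:
                    break  # dead end: no line ever followed this one
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

In a Node2Vec pipeline, walks like these are the "sentences" fed to a skip-gram model to produce one vector per node; those vectors are what a vector index such as FAISS would then search.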
Requirements
- Python 3.8 or greater
Installation:
pip install graphsense
Training example:
from graphsense import GraphTrain
g = GraphTrain()
# train a line-completion model on the code files in the given directory
g.line_completion(directory_path="code_files", language="Python")
Inference example:
from graphsense import GraphInfer
g = GraphInfer()
g.load_artifacts() # load the artifacts to memory
suggestions = g.infer("def factorial(n):")
g.unload_artifacts() # release the artifacts from memory
print("top 10 suggestions: ", suggestions)
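Because the inference example pairs `load_artifacts()` with `unload_artifacts()`, it can be convenient to guarantee the unload step even when inference raises. The following is a small sketch of one way to do that; `loaded_artifacts` is a hypothetical helper, not part of the graphsense package, and it only assumes the two method names shown in the example above.

```python
from contextlib import contextmanager


@contextmanager
def loaded_artifacts(model):
    """Load a model's artifacts on entry and always unload them on exit.

    Hypothetical helper: works with any object that exposes
    load_artifacts() and unload_artifacts().
    """
    model.load_artifacts()
    try:
        yield model
    finally:
        model.unload_artifacts()


# Intended usage (requires graphsense and trained artifacts):
#
#     from graphsense import GraphInfer
#     with loaded_artifacts(GraphInfer()) as g:
#         suggestions = g.infer("def factorial(n):")
```

The `try`/`finally` ensures memory is released even if `infer` fails partway through.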
Performance comparison with a fine-tuned gpt2-medium model
Dataset used to train models: https://github.com/TheAlgorithms/Python
gpt2-medium model (fine-tuned on the Python Algorithms dataset)
- artifacts size: 1.44 GB
- avg inference time (CPU): 8 seconds
- avg inference time (GPU): 2.2662 seconds
- avg memory usage: 1800 MB

GraphSense (trained on the Python Algorithms dataset)
- artifacts size: 13.9 MB
- avg inference time (CPU): 0.0079 seconds
- avg memory usage: 277.8194 MB
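The comparison above is easier to read as ratios. The short script below only does arithmetic on the published figures; the dictionary layout and the `comparison_ratios` name are illustrative, and it assumes 1 GB = 1000 MB when converting the gpt2-medium artifacts size.

```python
# Figures copied from the comparison above.
GPT2_MEDIUM = {
    "artifacts_mb": 1.44 * 1000,  # 1.44 GB, assuming 1 GB = 1000 MB
    "cpu_seconds": 8.0,
    "gpu_seconds": 2.2662,
    "memory_mb": 1800.0,
}
GRAPHSENSE = {
    "artifacts_mb": 13.9,
    "cpu_seconds": 0.0079,
    "memory_mb": 277.8194,
}


def comparison_ratios():
    """Return how many times larger each gpt2-medium figure is."""
    ratios = {
        key: GPT2_MEDIUM[key] / GRAPHSENSE[key]
        for key in ("artifacts_mb", "cpu_seconds", "memory_mb")
    }
    # gpt2-medium on GPU compared with GraphSense on CPU.
    ratios["gpu_vs_cpu_seconds"] = (
        GPT2_MEDIUM["gpu_seconds"] / GRAPHSENSE["cpu_seconds"]
    )
    return ratios


if __name__ == "__main__":
    for name, value in comparison_ratios().items():
        print(f"{name}: {value:.1f}x")
```

On these figures, GraphSense is roughly 1,000 times faster than gpt2-medium on CPU, its artifacts are roughly 100 times smaller, and it uses roughly 6.5 times less memory.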
Performance and Scalability
Accuracy of GraphSense (vector size: 128)
| Dataset | Top-1 Accuracy | Top-3 Accuracy | Top-10 Accuracy |
|---|---|---|---|
| TheAlgorithms(Python) | 0.4718 | 0.8012 | 0.8958 |
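The source does not describe how these accuracy figures were computed. Under the usual definition, top-k accuracy is the fraction of test cases whose expected item appears among the first k ranked suggestions; the sketch below implements that definition. The function name and the toy data are illustrative assumptions, not the evaluation code behind the table.

```python
def top_k_accuracy(ranked_suggestions, expected, k):
    """Fraction of cases where the expected item is in the top k suggestions.

    ranked_suggestions: list of suggestion lists, best suggestion first.
    expected: list with the correct item for each case.
    """
    if len(ranked_suggestions) != len(expected):
        raise ValueError("inputs must have the same length")
    if not expected:
        raise ValueError("at least one test case is required")
    hits = sum(
        1
        for suggestions, target in zip(ranked_suggestions, expected)
        if target in suggestions[:k]
    )
    return hits / len(expected)


if __name__ == "__main__":
    # Toy data: three cases with three ranked suggestions each.
    ranked = [
        ["return 1", "return n", "pass"],
        ["i += 1", "i = 0", "break"],
        ["x = 0", "y = 0", "z = 0"],
    ]
    truth = ["return 1", "i = 0", "w = 0"]
    for k in (1, 2, 3):
        print(f"top-{k} accuracy: {top_k_accuracy(ranked, truth, k):.4f}")
```

By construction the value can only stay the same or grow as k increases, which matches the pattern in the table.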
Scalability of GraphSense (CPU) (vector size: 128)
| Vocabulary | Average memory usage (MB) | Average execution time (seconds) | Artifacts size (MB) |
|---|---|---|---|
| 100,000 | 273.777 | 0.0113 | 61.3 |
| 200,000 | 325.8949 | 0.0155 | 122 |
| 300,000 | 377.1085 | 0.0185 | 168 |
| 400,000 | 428.3011 | 0.0227 | 224 |
| 500,000 | 478.8532 | 0.0273 | 280 |
| 600,000 | 531.0189 | 0.0301 | 368 |
| 700,000 | 581.3494 | 0.0333 | 429 |
| 800,000 | 633.226 | 0.038 | 448 |
| 900,000 | 685.1932 | 0.0439 | 552 |
| 1,000,000 | 734.5819 | 0.0444 | 561 |
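The memory figures above grow almost linearly with vocabulary size. As a rough check, the script below fits a least-squares line to the published memory numbers using only the standard library. The `least_squares` helper is illustrative, and the conversion to bytes assumes 1 MB = 10^6 bytes.

```python
# Published scalability figures (vector size: 128).
VOCABULARY = [100_000 * i for i in range(1, 11)]
MEMORY_MB = [
    273.777, 325.8949, 377.1085, 428.3011, 478.8532,
    531.0189, 581.3494, 633.226, 685.1932, 734.5819,
]


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(VOCABULARY, MEMORY_MB)
mb_per_100k = slope * 100_000
bytes_per_entry = slope * 1_000_000  # assumes 1 MB = 10^6 bytes

if __name__ == "__main__":
    print(f"memory per 100,000 entries: {mb_per_100k:.1f} MB")
    print(f"base memory (intercept):    {intercept:.1f} MB")
    print(f"memory per entry:           {bytes_per_entry:.0f} bytes")
```

On these figures the fit gives roughly 51 MB of additional memory per 100,000 vocabulary entries on top of a base of roughly 223 MB. That slope is close to 512 bytes per entry, which is what a 128-dimensional float32 vector occupies (128 × 4 bytes); whether the vectors are in fact stored as float32 is not stated in the source.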