Shaman - Programming Language Detector
When you input code, Shaman detects its language.
Languages supported: ASP, Bash, C, C#, CSS, HTML, Java, JavaScript, Objective-C, PHP, Python, Ruby, SQL, Swift, and XML.
Shaman is based on Naïve Bayes classification combined with pattern matching. A pre-trained model is included in the library; the model is only 214KB.
The included model reaches 78% accuracy on the test set and 83% on the training set. See the Test accuracy section for details.
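To give a feel for the Naïve Bayes part, here is a minimal sketch of a token-based classifier. It is not Shaman's actual implementation: the `tokenize` function, the `TinyNaiveBayes` class, and the two toy training samples are all assumptions made for illustration, and the pattern-matching side of Shaman is left out entirely.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    # Hypothetical tokenizer: identifiers/keywords plus single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", code)


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes over code tokens (illustration only)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # language -> token -> count
        self.doc_counts = Counter()               # language -> number of samples
        self.vocab = set()

    def train(self, samples):
        for language, code in samples:
            tokens = tokenize(code)
            self.token_counts[language].update(tokens)
            self.doc_counts[language] += 1
            self.vocab.update(tokens)

    def detect(self, code):
        tokens = tokenize(code)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for language, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            # Log prior for the language.
            score = math.log(self.doc_counts[language] / total_docs)
            for tok in tokens:
                # Log likelihood with add-one (Laplace) smoothing.
                score += math.log((counts[tok] + 1) / (total_tokens + len(self.vocab)))
            scores[language] = score
        # Best candidate first, like the list returned by Shaman's detect().
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


samples = [
    ("c", '#include <stdio.h>\nint main() { printf("hi"); return 0; }'),
    ("python", 'def main():\n    print("hi")\n\nmain()'),
]
clf = TinyNaiveBayes()
clf.train(samples)
print(clf.detect('#include <stdlib.h>\nint main() { return 0; }')[0][0])
```

The scores in this sketch are log probabilities, so they are not comparable to the numbers Shaman itself returns.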
Getting Started
How to install
$ pip install shamanld
How to use
from shamanld import Shaman
code = """
#include <stdio.h>
int main() {
    printf("Hello world");
}
"""
r = Shaman.default().detect(code)
print(r)
# [('c', 42.60959840702781), ('objective-c', 8.535893087527496), ('java', 7.237626324587697), ...]
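The result is a list of `(language, score)` pairs with the best candidate first. A small helper for picking a single answer might look like the sketch below. The `top_language` function and its `min_score` threshold are hypothetical, not part of shamanld, and the sample list is copied (truncated) from the output shown above so the snippet runs without the library installed.

```python
def top_language(results, min_score=0.0):
    """Return the best-scoring language, or None if nothing reaches min_score."""
    if not results:
        return None
    language, score = max(results, key=lambda pair: pair[1])
    return language if score >= min_score else None


# Truncated copy of the sample output above.
sample = [
    ("c", 42.60959840702781),
    ("objective-c", 8.535893087527496),
    ("java", 7.237626324587697),
]

print(top_language(sample))                # c
print(top_language(sample, min_score=50))  # None
```

What counts as a sensible threshold depends on how the scores are scaled, which the README does not specify, so treat `min_score` as something to tune on your own data.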
Test and train with your custom dataset
Shaman supports training the model on your own dataset. All you need to prepare is a dataset in CSV format, where each row is a "language,code" pair.
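As a rough sketch, a dataset file of that shape could be produced with Python's `csv` module, which takes care of quoting code that contains commas, quotes, or newlines. The file name, the absence of a header row, and the `write_dataset` / `read_dataset` helpers are assumptions for illustration; check what `shaman-trainer` expects before relying on them.

```python
import csv
import os
import tempfile

# Each row is a (language, code) pair; the code may span several lines.
rows = [
    ("python", 'def greet(name):\n    print("Hello, " + name)\n'),
    ("c", '#include <stdio.h>\nint main() {\n    printf("Hello world");\n}\n'),
]


def write_dataset(path, rows):
    # newline="" lets the csv module control line endings and quoting itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_dataset(path):
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(row) for row in csv.reader(f)]


path = os.path.join(tempfile.gettempdir(), "training_set.csv")
write_dataset(path, rows)
print(len(read_dataset(path)), "samples written to", path)
```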
Test with custom dataset
$ shaman-tester path/to/test_set.csv
Training a new model with custom dataset
$ shaman-trainer path/to/training_set.csv --model-path path/to/your_model.json.gz
Testing custom model
$ shaman-tester path/to/test_set.csv --model-path path/to/your_model.json.gz
Using custom model on the code
from shamanld import Shaman
detector = Shaman('path/to/your_model.json.gz')
detector.detect('/* some code */')
Test accuracy
The included model was trained on 120K code samples and tested on 42K code samples. Only samples whose length exceeds 100 were used in both training and testing. Because the samples were collected without verification, some of them may carry wrong labels.
Language | Accuracy |
---|---|
Total | 78.40% (36428 / 46464) |
c | 70.41% (11479 / 16304) |
java | 90.24% (8094 / 8969) |
python | 92.85% (5230 / 5633) |
javascript | 63.08% (2782 / 4410) |
sql | 80.92% (2519 / 3113) |
html | 83.99% (2156 / 2567) |
c# | 84.08% (1753 / 2085) |
xml | 80.18% (635 / 792) |
bash | 83.58% (560 / 670) |
swift | 83.25% (522 / 627) |
php | 73.09% (315 / 431) |
css | 68.12% (203 / 298) |
objective-c | 32.88% (121 / 368) |
asp | 36.75% (43 / 117) |
ruby | 20.00% (16 / 80) |
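The Total row is the sum of correct predictions divided by the sum of samples, so it is weighted toward the languages with the most test samples (C, Java, Python). The snippet below recomputes it from the per-language counts in the table and, for comparison, also computes an unweighted mean of the per-language accuracies. The variable names are illustrative only.

```python
# (correct, total) per language, copied from the table above.
per_language = {
    "c": (11479, 16304),
    "java": (8094, 8969),
    "python": (5230, 5633),
    "javascript": (2782, 4410),
    "sql": (2519, 3113),
    "html": (2156, 2567),
    "c#": (1753, 2085),
    "xml": (635, 792),
    "bash": (560, 670),
    "swift": (522, 627),
    "php": (315, 431),
    "css": (203, 298),
    "objective-c": (121, 368),
    "asp": (43, 117),
    "ruby": (16, 80),
}

correct = sum(c for c, _ in per_language.values())
total = sum(t for _, t in per_language.values())

# Sample-weighted accuracy: this is what the Total row reports.
print(f"Total: {correct / total:.2%} ({correct} / {total})")  # Total: 78.40% (36428 / 46464)

# Unweighted mean over languages: every language counts equally.
unweighted = sum(c / t for c, t in per_language.values()) / len(per_language)
print(f"Unweighted mean over languages: {unweighted:.2%}")
```

The unweighted figure comes out lower than the Total because the languages with few test samples (Ruby, ASP, Objective-C) are also the ones with the lowest accuracy.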
JavaScript version
A JavaScript inference implementation is available at Prev/shamanjs.