Shaman - Programming Language Detector

Given a piece of code, Shaman detects its language.

Supported languages: ASP, Bash, C, C#, CSS, HTML, Java, JavaScript, Objective-C, PHP, Python, Ruby, SQL, Swift, and XML.

Shaman is based on Naïve Bayes classification combined with pattern matching. A pre-trained model is included in the library; it is only 214 KB in size.
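
To make the Naïve Bayes idea concrete, here is a minimal sketch of a token-based classifier with Laplace smoothing. It is not Shaman's actual implementation: the tokenizer, the class name TinyNaiveBayes, and the two training samples are all assumptions made for illustration, and the pattern-matching part is left out entirely.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(code):
    # Hypothetical tokenizer: identifiers/keywords plus single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", code)


class TinyNaiveBayes:
    """Illustrative multinomial Naive Bayes over code tokens (not Shaman's code)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # language -> token -> count
        self.sample_counts = Counter()            # language -> number of samples
        self.vocab = set()

    def train(self, samples):
        for language, code in samples:
            tokens = tokenize(code)
            self.token_counts[language].update(tokens)
            self.sample_counts[language] += 1
            self.vocab.update(tokens)

    def scores(self, code):
        """Return (language, log-probability) pairs, best match first."""
        total_samples = sum(self.sample_counts.values())
        tokens = tokenize(code)
        result = {}
        for language, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            # Prior: share of training samples written in this language.
            log_prob = math.log(self.sample_counts[language] / total_samples)
            for token in tokens:
                # Laplace (add-one) smoothing so unseen tokens do not zero out the score.
                log_prob += math.log(
                    (counts[token] + 1) / (total_tokens + len(self.vocab))
                )
            result[language] = log_prob
        return sorted(result.items(), key=lambda pair: pair[1], reverse=True)


model = TinyNaiveBayes()
model.train([
    ("c", '#include <stdio.h>\nint main() { printf("hi"); return 0; }'),
    ("python", 'def main():\n    print("hi")\n'),
])
print(model.scores("#include <stdio.h>")[0][0])
```

A real model would be trained on far more samples per language; two samples are only enough to show the mechanics.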

The included model reaches 78% accuracy on the test set and 83% on the training set. See the Test accuracy section for details.

Getting Started

How to install

$ pip install shamanld

How to use

from shamanld import Shaman

code = """
#include <stdio.h>
int main() {
	printf("Hello world");
}
"""

r = Shaman.default().detect(code)

print(r)
# [('c', 42.60959840702781), ('objective-c', 8.535893087527496), ('java', 7.237626324587697), ...]
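
The result is a list of (language, score) pairs. As a small sketch of how you might consume it, the helper below picks the best-scoring language and optionally rejects weak matches. The function name top_language and the min_score threshold are hypothetical, and the sketch assumes that a higher score means a better match, which the example output suggests but the documentation does not state. The scores are copied from the output above so the snippet runs without Shaman installed.

```python
def top_language(results, min_score=0.0):
    """Return the best-scoring language, or None if no score reaches min_score."""
    if not results:
        return None
    language, score = max(results, key=lambda pair: pair[1])
    return language if score >= min_score else None


# Scores copied from the example output above (the list there is truncated).
results = [
    ("c", 42.60959840702781),
    ("objective-c", 8.535893087527496),
    ("java", 7.237626324587697),
]

print(top_language(results))        # c
print(top_language(results, 50.0))  # None
```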

Test and train with your custom dataset

Shaman makes it easy to train a model on your own dataset. All you need to prepare is a dataset in CSV format; the CSV file should contain "language,code" pairs.
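
As a rough sketch of preparing such a file, the snippet below writes "language,code" rows with Python's csv module, which quotes fields containing commas, quotes, or newlines so that multi-line code survives a round trip. The helper name write_dataset and the sample rows are made up for illustration. Whether Shaman expects a header row is not stated here, so this sketch writes none; treat that as an assumption.

```python
import csv
import io


def write_dataset(rows, fileobj):
    """Write (language, code) pairs as CSV rows, one sample per row."""
    writer = csv.writer(fileobj)
    for language, code in rows:
        writer.writerow([language, code])


rows = [
    ("c", '#include <stdio.h>\nint main() { return 0; }\n'),
    ("python", 'print("hello, world")\n'),
]

# An in-memory buffer stands in for a file here; for a real dataset use
# open("path/to/training_set.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO(newline="")
write_dataset(rows, buffer)

# Read the data back to confirm that the multi-line code fields are intact.
buffer.seek(0)
parsed = [tuple(row) for row in csv.reader(buffer)]
print(parsed == rows)
```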

Testing with a custom dataset

$ shaman-tester path/to/test_set.csv

Training a new model with a custom dataset

$ shaman-trainer path/to/training_set.csv --model-path path/to/your_model.json.gz

Testing a custom model

$ shaman-tester path/to/test_set.csv --model-path path/to/your_model.json.gz

Using a custom model on code

from shamanld import Shaman

detector = Shaman('path/to/your_model.json.gz')
detector.detect('/* some code */')

Test accuracy

The included model was trained on 120K code samples and tested on 42K code samples. Only samples whose length is greater than 100 were used for both training and testing. Because the samples were collected without verification, some of them may be mislabeled.

Language      Accuracy   (correct / total)
Total         78.40%     (36428 / 46464)
c             70.41%     (11479 / 16304)
java          90.24%     (8094 / 8969)
python        92.85%     (5230 / 5633)
javascript    63.08%     (2782 / 4410)
sql           80.92%     (2519 / 3113)
html          83.99%     (2156 / 2567)
c#            84.08%     (1753 / 2085)
xml           80.18%     (635 / 792)
bash          83.58%     (560 / 670)
swift         83.25%     (522 / 627)
php           73.09%     (315 / 431)
css           68.12%     (203 / 298)
objective-c   32.88%     (121 / 368)
asp           36.75%     (43 / 117)
ruby          20.00%     (16 / 80)
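
The Total row is the overall share of correctly classified samples (a micro average), so languages with many test samples, such as C, dominate it. The short sketch below recomputes that figure from the per-language counts in the table and also computes the unweighted mean of the per-language accuracies (a macro average), which is not reported above and is added here only for comparison.

```python
# (correct, total) per language, copied from the accuracy table above.
counts = {
    "c": (11479, 16304),
    "java": (8094, 8969),
    "python": (5230, 5633),
    "javascript": (2782, 4410),
    "sql": (2519, 3113),
    "html": (2156, 2567),
    "c#": (1753, 2085),
    "xml": (635, 792),
    "bash": (560, 670),
    "swift": (522, 627),
    "php": (315, 431),
    "css": (203, 298),
    "objective-c": (121, 368),
    "asp": (43, 117),
    "ruby": (16, 80),
}

total_correct = sum(correct for correct, _ in counts.values())
total_samples = sum(total for _, total in counts.values())

# Micro average: every sample counts equally (this is the Total row).
micro = total_correct / total_samples

# Macro average: every language counts equally, regardless of sample count.
macro = sum(correct / total for correct, total in counts.values()) / len(counts)

print(f"micro: {micro:.2%}")  # micro: 78.40%
print(f"macro: {macro:.2%}")
```

The macro average comes out lower than the Total row because the weakest languages (Ruby, Objective-C, ASP) have few test samples and therefore barely affect the overall figure.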

JavaScript version

A JavaScript implementation of the inference step is available at Prev/shamanjs.

