Programming Language Detector

Shaman - Programming Language Detector

Given a snippet of source code, Shaman detects which programming language it is written in.

Languages supported: ASP, Bash, C, C#, CSS, HTML, Java, JavaScript, JSP, Objective-C, PHP, Python, Ruby, SQL, Swift, and XML.

Shaman is based on Naïve Bayes classification combined with predefined pattern matching. A pre-trained model is bundled with the library, and it is only 167 KB in size.
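To illustrate the Naïve Bayes half of that approach, here is a minimal, self-contained sketch of a multinomial Naïve Bayes classifier over code tokens. It is not Shaman's actual implementation: the tokenizer regex, the add-one smoothing, and the class name `ToyNaiveBayes` are assumptions made for illustration, and the pattern-matching component is left out entirely.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer (not Shaman's): identifiers/keywords as whole words,
# every other non-space character as its own token.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[^\sA-Za-z0-9_]")


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


class ToyNaiveBayes:
    """Hypothetical multinomial Naive Bayes over code tokens, add-one smoothed."""

    def __init__(self) -> None:
        self.doc_counts: Counter[str] = Counter()  # training samples per language
        self.token_counts: dict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, samples: list[tuple[str, str]]) -> None:
        for language, code in samples:
            tokens = tokenize(code)
            self.doc_counts[language] += 1
            self.token_counts[language].update(tokens)
            self.vocab.update(tokens)

    def log_scores(self, code: str) -> dict[str, float]:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for language, n_docs in self.doc_counts.items():
            counts = self.token_counts[language]
            n_tokens = sum(counts.values())
            # log P(language) + sum over tokens of log P(token | language)
            score = math.log(n_docs / total_docs)
            for token in tokenize(code):
                score += math.log((counts[token] + 1) / (n_tokens + vocab_size))
            scores[language] = score
        return scores

    def predict(self, code: str) -> str:
        scores = self.log_scores(code)
        return max(scores, key=scores.get)


model = ToyNaiveBayes()
model.fit([
    ("c", '#include <stdio.h>\nint main() { printf("hi"); }'),
    ("python", "def main():\n    print('hi')\n"),
])
print(model.predict('printf("x");'))  # c
```

Working in log space avoids underflow when multiplying many small token probabilities, and smoothing keeps unseen tokens from zeroing out a language's score.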

The bundled model reaches about 75% accuracy on the test set and 80% on the training set. It was trained on 100K code samples and tested on 40K code samples.
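Assuming accuracy here means the fraction of samples whose top-ranked prediction matches the labeled language, about 75% on 40K test samples corresponds to roughly 30K correct predictions. A minimal sketch of that computation, assuming each prediction has already been reduced to a single language (the function name `accuracy` is hypothetical, not part of shamanld):

```python
def accuracy(labels: list, predictions: list) -> float:
    """Fraction of samples whose predicted language equals the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for label, pred in zip(labels, predictions) if label == pred)
    return correct / len(labels)


# Hypothetical labels and predictions: 3 of 4 match
labels = ["c", "java", "python", "ruby"]
predictions = ["c", "java", "python", "php"]
print(accuracy(labels, predictions))  # 0.75
```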

Getting Started

How to install

$ pip install shamanld

How to use

from shamanld import Shaman

code = """
#include <stdio.h>
int main() {
	printf("Hello world");
}
"""

r = Shaman.default().detect(code)

print(r)
# [('c', 38.27568605456699), ('objective-c', 8.802419110662512), ('java', 7.5835661834984585), ...]
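`detect()` returns a list of `(language, score)` pairs, with the most likely language first in the example above. If you only need a single answer, a small helper like the following can pick it. `top_language` and its `min_score` cutoff are hypothetical additions, not part of shamanld; the helper uses `max()` so it does not depend on the list being sorted.

```python
from __future__ import annotations

from typing import Optional

# Sample output copied from the example above (truncated there with "...").
results = [
    ("c", 38.27568605456699),
    ("objective-c", 8.802419110662512),
    ("java", 7.5835661834984585),
]


def top_language(pairs: list[tuple[str, float]], min_score: float = 0.0) -> Optional[str]:
    """Return the best-scoring language, or None if nothing clears min_score."""
    if not pairs:
        return None
    language, score = max(pairs, key=lambda item: item[1])
    return language if score >= min_score else None


print(top_language(results))  # c
```

A cutoff such as `min_score` is one way to report "unknown" for snippets where no language scores convincingly; a suitable threshold would have to be chosen empirically.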

Test and train with your custom dataset

Shaman makes it easy to train the model on your own dataset. All you need to prepare is the dataset in CSV format, where each row is a "language,code" pair.
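Code snippets usually contain commas, quotes, and newlines, so it is safer to build the CSV with Python's `csv` module than by hand, since it quotes such fields. A minimal sketch follows; `write_dataset` is a hypothetical helper, and whether `shaman-trainer` expects a `language,code` header row is an assumption, so the header is optional here.

```python
import csv
import io

samples = [
    ("python", "def greet(name):\n    print(f'Hello, {name}')\n"),
    ("c", '#include <stdio.h>\nint main() { printf("a, b"); }\n'),
]


def write_dataset(rows, fp, header=True):
    """Write (language, code) rows as CSV; header row is an assumption."""
    writer = csv.writer(fp)
    if header:
        writer.writerow(["language", "code"])
    writer.writerows(rows)


# In-memory buffer for demonstration; csv quotes the multi-line snippets
buffer = io.StringIO(newline="")
write_dataset(samples, buffer)

# Reading it back restores each snippet intact
buffer.seek(0)
parsed = list(csv.reader(buffer))
```

For a real file, open it with `open("path/to/training_set.csv", "w", newline="", encoding="utf-8")` and pass that file object instead of the buffer.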

Testing with a custom dataset

$ shaman-tester path/to/test_set.csv

Training a new model with a custom dataset

$ shaman-trainer path/to/training_set.csv path/to/your_model.json.gz
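If you start from a single labeled dataset, it has to be split into training and test sets first. The sketch below does a per-language (stratified) random split so every language appears in both sets. `stratified_split` is a hypothetical helper, and its default `test_fraction` of 40/140 simply mirrors the 100K/40K split described above.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=40 / 140, seed=0):
    """Split (language, code) rows into (train, test), separately per language."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    by_language = defaultdict(list)
    for language, code in rows:
        by_language[language].append((language, code))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for language in sorted(by_language):
        group = by_language[language]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


rows = [("c", f"int x{i};") for i in range(14)]
rows += [("python", f"x{i} = {i}") for i in range(14)]
train, test = stratified_split(rows)
print(len(train), len(test))  # 20 8
```

Each list can then be written to its own CSV file and passed to `shaman-trainer` and `shaman-tester` respectively.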

Testing a custom model

$ shaman-tester path/to/test_set.csv path/to/your_model.json.gz

Using a custom model for detection

from shamanld import Shaman

detector = Shaman('path/to/your_model.json.gz')
detector.detect('/* some code */')

JavaScript version

A JavaScript implementation for inference is available at Prev/shamanjs.

Download files

Source Distribution

shamanld-1.1.0.tar.gz (178.5 kB)

Built Distribution

shamanld-1.1.0-py3-none-any.whl (179.2 kB, Python 3)
