Guess the programming language of a string or file.

Project description

whats_that_code

This is a programming language detection library.

It detects the programming language of source code in pure Python, using an ensemble of classifiers. Use it when a quick and dirty first approximation is good enough. whats_that_code currently identifies 60%+ of samples without knowing the extension or tag.

I created this because I wanted:

  • a pure Python programming language detector
  • no machine learning dependencies

Tested on Python 3.6 through 3.9.

Usage

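from whats_that_code.election import guess_language_all_methods

# The snippet to classify; the file name gives the detector an extra hint.
code = "def yo():\n   print('hello')"
result = guess_language_all_methods(code, file_name="yo.py")
# The result is a list of language names.
assert result == ["python"]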

How it Works

  1. Inspects the file extension, if available.
  2. Inspects the shebang.
  3. Looks for keywords.
  4. Counts regex matches for common patterns.
  5. Attempts to parse the source as Python, JSON, or YAML.
  6. Inspects tags, if available.
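As an illustration of what steps 1, 2, and 5 might look like, here is a minimal sketch using only the standard library. The function names (`by_extension`, `by_shebang`, `by_parsing`) and the small lookup tables are hypothetical and are not taken from the whats_that_code source. YAML parsing is left out because it would need a third-party parser.

```python
import ast
import json
import posixpath
import re
from typing import Optional

# Hypothetical, deliberately tiny lookup tables for illustration only.
EXTENSIONS = {".py": "python", ".js": "javascript", ".json": "json", ".sh": "bash"}
INTERPRETERS = {"python": "python", "node": "javascript", "bash": "bash", "sh": "bash"}

SHEBANG = re.compile(r"^#!\s*(\S+)(?:\s+(\S+))?")


def by_extension(file_name: Optional[str]) -> Optional[str]:
    """Step 1: map the file extension to a language, if a name was given."""
    if not file_name:
        return None
    extension = posixpath.splitext(file_name)[1].lower()
    return EXTENSIONS.get(extension)


def by_shebang(code: str) -> Optional[str]:
    """Step 2: read the interpreter named on a leading `#!` line."""
    match = SHEBANG.match(code.split("\n", 1)[0])
    if not match:
        return None
    interpreter, argument = match.groups()
    name = posixpath.basename(interpreter)
    if name == "env" and argument:
        name = argument  # "#!/usr/bin/env python3" names the interpreter second
    name = re.sub(r"[\d.]+$", "", name)  # strip a version suffix: python3.9 -> python
    return INTERPRETERS.get(name)


def by_parsing(code: str) -> Optional[str]:
    """Step 5: see whether the text parses as JSON or as Python."""
    if not code.strip():
        return None
    try:
        # Bare scalars such as 42 are valid JSON but say little, so require a container.
        if isinstance(json.loads(code), (dict, list)):
            return "json"
    except ValueError:
        pass
    try:
        ast.parse(code)
        return "python"
    except (SyntaxError, ValueError):
        return None
```

Each function returns `None` when it has no opinion, which makes the results easy to combine later. Note that `by_parsing` will happily call a one-word snippet Python, which is the kind of false positive a single detector produces on its own.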

Each step is imperfect and can produce a wrong answer. The classifier then combines the results of all the steps using a voting algorithm.
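The voting idea can be sketched as a simple plurality count. This is an assumption about the general technique, not the library's actual algorithm, which may weight detectors differently; the name `elect` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional


def elect(votes: Iterable[Optional[str]]) -> List[str]:
    """Return the language(s) with the most votes; ties return every winner."""
    # Detectors that had no opinion vote None and are ignored.
    tally = Counter(vote for vote in votes if vote)
    if not tally:
        return []
    top = max(tally.values())
    # Sort so that a tie always comes back in the same order.
    return sorted(language for language, count in tally.items() if count == top)


# Two detectors say python, one says ruby, one abstains.
winners = elect(["python", None, "python", "ruby"])
```

Returning a list rather than a single name leaves room for ties, which matches the list-shaped result shown in the Usage section.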

This works best as a fallback, e.g. for classifying code that can't already be classified by extension or tag, or when the tag is ambiguous.

It started as part of so_pip, a StackOverflow code extraction tool I wrote, and outgrew it.
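One way to wire up the fallback pattern is sketched below. The helper `classify_with_fallback` and its parameters are hypothetical: `candidates` stands for whatever languages an extension or tag already suggests, and `guesser` is any callable that takes code and returns a list of language names, for example a small wrapper around `guess_language_all_methods`.

```python
from typing import Callable, Iterable, List


def classify_with_fallback(
    code: str,
    candidates: Iterable[str],
    guesser: Callable[[str], List[str]],
) -> List[str]:
    """Trust an unambiguous label; only guess when the label is missing or ambiguous."""
    known = list(candidates)
    if len(known) == 1:
        return known  # the extension or tag already settles it
    guesses = guesser(code)
    if not known:
        return guesses  # nothing to go on, so rely on the guess alone
    # Ambiguous label: keep only the guesses that agree with it.
    narrowed = [language for language in guesses if language in known]
    return narrowed or known


# Stand-in guesser for demonstration; a real one would inspect the code.
def fake_guesser(code: str) -> List[str]:
    return ["python"]


settled = classify_with_fallback("print(1)", ["ruby"], fake_guesser)
narrowed = classify_with_fallback("print(1)", ["python", "ruby"], fake_guesser)
guessed = classify_with_fallback("print(1)", [], fake_guesser)
```

With this arrangement the guesser is never consulted when a single trusted label exists, so its errors only affect the cases that had no reliable answer to begin with.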

Notable Similar Tools

  • Guesslang - a Python and TensorFlow driven solution. Reasonable results, but slow startup and not pure Python.
  • Pygments - pure Python, but sometimes poor identification rates.
