sctokenizer
A Source Code Tokenizer
Supports the following languages: C, C++, Java, Python, PHP.
How to install
```
pip install sctokenizer
```
How to use
Use sctokenizer:
```python
import sctokenizer

tokens = sctokenizer.tokenize_file(filepath='tests/data/hello_world.cpp', lang='cpp')
for token in tokens:
    print(token)
```
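The same entry point handles the other supported languages. For example, to tokenize a Java file (a minimal sketch: the path is a placeholder, and the 'java' language code is assumed to follow the same naming convention as 'cpp'):
```python
import sctokenizer

# Hypothetical file path; point it at a real Java source file.
tokens = sctokenizer.tokenize_file(filepath='tests/data/HelloWorld.java', lang='java')
for token in tokens:
    print(token)
```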
Or create a new CppTokenizer:
```python
from sctokenizer import CppTokenizer

tokenizer = CppTokenizer()  # this object can be used for multiple source files
with open('tests/data/hello_world.cpp') as f:
    source = f.read()
tokens = tokenizer.tokenize(source)
for token in tokens:
    print(token)
```
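Because the tokenizer object is reusable, a single instance can process many files. A minimal sketch (the second file path below is hypothetical):
```python
from sctokenizer import CppTokenizer

tokenizer = CppTokenizer()
tokens_by_file = {}
for path in ['tests/data/hello_world.cpp', 'tests/data/another_file.cpp']:
    with open(path) as f:
        tokens_by_file[path] = tokenizer.tokenize(f.read())

for path, tokens in tokens_by_file.items():
    print(path, len(tokens), 'tokens')
```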
Or, better, use a Source object:
```python
from sctokenizer import Source

src = Source.from_file('tests/data/hello_world.cpp', lang='cpp')
tokens = src.tokenize()
for token in tokens:
    print(token)
```
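All three approaches tokenize the same source, so for a given file they should yield token lists of the same length. A quick sanity check using only the calls shown above:
```python
import sctokenizer
from sctokenizer import CppTokenizer, Source

path = 'tests/data/hello_world.cpp'

tokens_a = sctokenizer.tokenize_file(filepath=path, lang='cpp')

with open(path) as f:
    tokens_b = CppTokenizer().tokenize(f.read())

tokens_c = Source.from_file(path, lang='cpp').tokenize()

# Compare lengths rather than the Token objects themselves,
# since Token equality semantics are not documented here.
assert len(tokens_a) == len(tokens_b) == len(tokens_c)
```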
The result is a list of Token objects. Each Token has four attributes: token_value, token_type, line, and column:
```
(#, TokenType.SPECIAL_SYMBOL, (1, 1))
(include, TokenType.KEYWORD, (1, 2))
(<, TokenType.OPERATOR, (1, 10))
(bits/stdc++.h, TokenType.IDENTIFIER, (1, 11))
(>, TokenType.OPERATOR, (1, 24))
(using, TokenType.KEYWORD, (3, 1))
(namespace, TokenType.KEYWORD, (3, 7))
(std, TokenType.IDENTIFIER, (3, 17))
(;, TokenType.SPECIAL_SYMBOL, (3, 20))
(int, TokenType.KEYWORD, (5, 1))
(main, TokenType.IDENTIFIER, (5, 5))
((, TokenType.SPECIAL_SYMBOL, (5, 9))
(), TokenType.SPECIAL_SYMBOL, (5, 10))
({, TokenType.SPECIAL_SYMBOL, (6, 1))
(cout, TokenType.IDENTIFIER, (7, 5))
(<<, TokenType.OPERATOR, (7, 11))
(", TokenType.SPECIAL_SYMBOL, (7, 13))
(Hello World, TokenType.STRING, (7, 14))
(", TokenType.SPECIAL_SYMBOL, (7, 25))
(;, TokenType.SPECIAL_SYMBOL, (7, 26))
(return, TokenType.KEYWORD, (8, 5))
(0, TokenType.CONSTANT, (8, 12))
(;, TokenType.SPECIAL_SYMBOL, (8, 13))
(}, TokenType.SPECIAL_SYMBOL, (9, 1))
```
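These attributes make it easy to filter or summarize tokens. For example, to pull out the identifiers with their positions and to count tokens per type (a small sketch; it assumes TokenType can be imported from the package top level, as the printed output above suggests):
```python
from collections import Counter

from sctokenizer import Source, TokenType  # top-level TokenType export is assumed

src = Source.from_file('tests/data/hello_world.cpp', lang='cpp')
tokens = src.tokenize()

# Identifiers with their (line, column) positions.
identifiers = [(t.token_value, t.line, t.column)
               for t in tokens if t.token_type == TokenType.IDENTIFIER]
print(identifiers)  # e.g. [('bits/stdc++.h', 1, 11), ('std', 3, 17), ...]

# Token counts per TokenType.
print(Counter(t.token_type for t in tokens))
```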
TODO
- Support other languages: Matlab, JavaScript, TypeScript, ...
- Auto-detect language (until then, a simple extension-based workaround is sketched below)
- Parse source into a tree of tokens???
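As a stopgap for language auto-detection, the choice of lang can be driven by the file extension. A sketch (the extension-to-lang mapping, and every lang code other than 'cpp', is an assumption):
```python
import os
import sctokenizer

# Extension-to-lang mapping; 'cpp' matches the examples above,
# the other codes are assumed.
EXT_TO_LANG = {
    '.c': 'c',
    '.cc': 'cpp', '.cpp': 'cpp', '.cxx': 'cpp', '.h': 'cpp', '.hpp': 'cpp',
    '.java': 'java',
    '.py': 'python',
    '.php': 'php',
}

def tokenize_any(filepath):
    ext = os.path.splitext(filepath)[1].lower()
    lang = EXT_TO_LANG.get(ext)
    if lang is None:
        raise ValueError(f'Unsupported file extension: {ext}')
    return sctokenizer.tokenize_file(filepath=filepath, lang=lang)

tokens = tokenize_any('tests/data/hello_world.cpp')
```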