Multilingual programming language parsers for extracting multi-level code-text pair data from raw source code
Project description
The Code-Text data toolkit contains multilingual programming language parsers that extract code-text pair data from raw source code at multiple levels (e.g., function level, class level, inline level).
Installation
Set up the environment and install the dependencies by running install_env.sh:
bash -i ./install_env.sh
Then activate the conda environment named "code-text-env":
conda activate code-text-env
Install the parser package:
pip install codetext
Getting started
Build your language
Automatically build the tree-sitter grammar into <language>.so, stored in /tree-sitter/:
from codetext.utils import build_language
language = 'rust'
build_language(language)
# INFO:utils:Not found tree-sitter-rust, attempt clone from github
# Cloning into 'tree-sitter-rust'...
# remote: Enumerating objects: 2835, done. ...
# INFO:utils:Attempt to build Tree-sitter Language for rust and store in .../tree-sitter/rust.so
Language Parser
We support 10 programming languages: Python, Java, JavaScript, Golang, Ruby, PHP, C#, C++, C, and Rust.
Setup
from codetext.utils import parse_code
raw_code = """
/**
* Sum of 2 number
* @param a int number
* @param b int number
*/
double sum2num(int a, int b) {
return a + b;
}
"""
root = parse_code(raw_code, 'cpp')
root_node = root.root_node
To get all function nodes inside a specific node, use:
from codetext.utils.parser import CppParser
function_list = CppParser.get_function_list(root_node)
print(function_list)
# [<Node type=function_definition, start_point=(6, 0), end_point=(8, 1)>]
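The printed node reports `start_point` and `end_point` as zero-indexed (row, column) pairs. As a minimal sketch of how those coordinates map back to the source text, the snippet below uses plain Python only. The helper `slice_source` is hypothetical (it is not part of codetext), the points are copied from the output above, and the source is assumed to be ASCII so that columns line up with character offsets.

```python
raw_code = """
/**
* Sum of 2 number
* @param a int number
* @param b int number
*/
double sum2num(int a, int b) {
return a + b;
}
"""


def slice_source(source, start_point, end_point):
    """Return the text between two (row, column) points; the end column is exclusive.

    Hypothetical helper for illustration, not a codetext API.
    """
    lines = source.split("\n")
    start_row, start_col = start_point
    end_row, end_col = end_point
    if start_row == end_row:
        return lines[start_row][start_col:end_col]
    selected = [lines[start_row][start_col:]]      # tail of the first line
    selected.extend(lines[start_row + 1:end_row])  # full lines in between
    selected.append(lines[end_row][:end_col])      # head of the last line
    return "\n".join(selected)


# Points taken from the printed node: start_point=(6, 0), end_point=(8, 1)
function_text = slice_source(raw_code, (6, 0), (8, 1))
print(function_text)
```

Because `raw_code` starts with a newline, row 0 is empty and the function definition begins on row 6, which matches the reported `start_point`.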
To get function metadata (e.g., the function's name, parameters, and optional return type):
function = function_list[0]
metadata = CppParser.get_function_metadata(function, raw_code)
# {'identifier': 'sum2num', 'parameters': {'a': 'int', 'b': 'int'}, 'type': 'double'}
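The metadata comes back as a plain dictionary, so it can be consumed without any further parser calls. As an illustration, the sketch below rebuilds a C++-style signature from the dictionary shown above. The helper `format_signature` is hypothetical (not part of codetext), and it assumes the `'type'` key may be missing when no return type is found.

```python
# Dictionary copied from the example output above
metadata = {
    'identifier': 'sum2num',
    'parameters': {'a': 'int', 'b': 'int'},
    'type': 'double',
}


def format_signature(meta):
    """Build a 'return_type name(type arg, ...)' string from a metadata dict.

    Hypothetical helper for illustration, not a codetext API.
    """
    # Dicts keep insertion order, so parameters stay in declaration order
    params = ", ".join(
        f"{ptype} {name}" for name, ptype in meta.get("parameters", {}).items()
    )
    return_type = meta.get("type")  # assumed optional
    prefix = f"{return_type} " if return_type else ""
    return f"{prefix}{meta['identifier']}({params})"


signature = format_signature(metadata)
print(signature)
```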
To get the docstring (documentation) of a function:
docstring = CppParser.get_docstring(function, raw_code)
# ['Sum of 2 number \n@param a int number \n@param b int number']
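The returned docstring still contains the `@param` tags as raw text. If the summary and the parameter descriptions are needed separately, they can be split with a few lines of plain Python. The sketch below works on the string shown above. The helper `split_docstring` is hypothetical (not part of codetext) and only handles `@param` tags.

```python
# List copied from the example output above
docstring = ['Sum of 2 number \n@param a int number \n@param b int number']


def split_docstring(text):
    """Split a docstring into a summary and a {param_name: description} dict.

    Hypothetical helper for illustration, not a codetext API.
    """
    summary_lines = []
    params = {}
    for line in text.split("\n"):
        line = line.strip()
        if line.startswith("@param"):
            # Split into at most three parts: '@param', name, description
            parts = line.split(None, 2)
            if len(parts) >= 2:
                params[parts[1]] = parts[2] if len(parts) == 3 else ""
        elif line:
            summary_lines.append(line)
    return {"summary": " ".join(summary_lines), "params": params}


parsed = split_docstring(docstring[0])
print(parsed)
```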
We also provide two methods for extracting class objects:
class_list = CppParser.get_class_list(root_node)
# and
metadata = CppParser.get_metadata_list(root_node)