Chunks code into a list made up of indexable dictionaries.
= data-chunker
A Python library that chunks source code into a list of dictionaries. This list can then be used to create a vector store, which can provide granular context for OpenAI queries.
Current list of languages that can be chunked:
* Java (packages, methods, and variables)
== Modules
=== parser.py
* Contains functions to read code lines from given files and file paths
* `get_file_list(code_path, file_extension)` - returns a list of files from the path passed in
* `get_code_lines(file)` - returns the code from the file name passed in
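
As a rough illustration of what these two helpers do, the sketch below re-implements them with the standard library only. It is not the package's own code: the recursive search, the sorted output, and returning the code as a list of lines are all assumptions, since the descriptions above only state the function names and parameters.

```python
from pathlib import Path
from typing import List


def get_file_list(code_path: str, file_extension: str) -> List[str]:
    """Return the files under code_path that end with file_extension.

    Assumption: the search is recursive and accepts the extension with or
    without a leading dot. The real library may behave differently.
    """
    suffix = file_extension if file_extension.startswith(".") else "." + file_extension
    return sorted(str(p) for p in Path(code_path).rglob("*" + suffix) if p.is_file())


def get_code_lines(file: str) -> List[str]:
    """Return the contents of the named file as a list of lines.

    Assumption: lines are returned without trailing newline characters.
    """
    with open(file, encoding="utf-8") as handle:
        return handle.read().splitlines()
```

Under these assumptions, `get_file_list("src", "java")` would return every `.java` file below a hypothetical `src` directory, and each path could then be passed to `get_code_lines`.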
=== java_code.py
* Contains functions to split Java files into smaller chunks
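
To make the idea of "chunks" concrete, here is a minimal sketch of how Java source could be split into a list of dictionaries. It is an illustration only and not how `java_code.py` is implemented: the function name `chunk_java`, the dictionary keys, and the regex-based approach are all hypothetical. It covers packages and methods only (not variables or constructors) and does not handle braces inside strings or comments.

```python
import re
from typing import Dict, List, Optional

# Simplified, hypothetical patterns; a real chunker would need a proper parser.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)
METHOD_RE = re.compile(
    r"^[ \t]*(?:(?:public|protected|private|static|final|abstract|synchronized)\s+)+"
    r"[\w<>\[\], ]+?\s+(\w+)\s*\([^)]*\)\s*(?:throws\s+[\w., ]+)?\s*\{",
    re.MULTILINE,
)


def _match_brace(source: str, open_idx: int) -> int:
    """Return the index just past the brace that closes source[open_idx]."""
    depth = 0
    for i in range(open_idx, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return len(source)  # unbalanced input: fall back to the end of the text


def chunk_java(source: str) -> List[Dict[str, Optional[str]]]:
    """Split Java source into package and method chunks (illustrative only)."""
    chunks: List[Dict[str, Optional[str]]] = []
    pkg_match = PACKAGE_RE.search(source)
    package = pkg_match.group(1) if pkg_match else None
    if pkg_match:
        chunks.append({"type": "package", "name": package,
                       "package": package, "body": pkg_match.group(0).strip()})
    for match in METHOD_RE.finditer(source):
        # The pattern ends on the method's opening brace.
        end = _match_brace(source, match.end() - 1)
        chunks.append({"type": "method", "name": match.group(1),
                       "package": package,
                       "body": source[match.start():end].strip()})
    return chunks
```

Each dictionary keeps the chunk's text together with a little metadata, which is the kind of record that can be indexed individually when building a vector store.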
== Contributing
The GitHub repository for this package is https://github.com/break-free/data-chunker.
The `main` branch is protected, so every contribution requires its own branch. Branch names should be prepended with either `feat/` or `fix/` to indicate whether the branch adds new functionality or makes a refactor/fix (e.g., `fix/update-readme`). Once the work on the branch is complete, it can be merged back into `main`.
The repository also includes additional directories, such as `setup`, `info`, and `training`, and files, such as `main.py`, that provide additional resources for development and usage examples.
== Packaging
This Python package was produced using https://hatch.pypa.io/latest/config/build/[hatchling]. Refer to the `pyproject.toml` for specifics.
The following resources are recommended for getting familiar with Python packaging and uploading to https://pypi.org:
* https://packaging.python.org/en/latest/tutorials/packaging-projects/[Packaging Python Projects].
* https://hatch.pypa.io/latest/config/build/[Hatchling - Build Configuration].
* https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#uploading-your-project-to-pypi[Uploading your Project to PyPI].
* https://pypi.org/project/keyring/[keyring] (useful for keeping PyPI login safe).
== Notes
This code has been battle-tested with *one* application. If you encounter any issues then please https://github.com/break-free/java-code-chunker/issues[submit an issue ticket here on GitHub].