Chunks code into a list made up of indexable dictionaries.
= data-chunker
A Python library that chunks code into a list of dictionaries. This list can then be used to build a vector store, which can provide granular context for OpenAI queries.
Current list of languages that can be chunked:
* Java (packages, methods, and variables)
== Modules
=== parser.py
* Contains functions to read code lines from given files and file paths
* `get_file_list(code_path, file_extension)` - returns a list of files from the path passed in
* `get_code_lines(file)` - returns the code from the file name passed in
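To make the intent of these two functions concrete, here is a minimal stand-in sketch. It is not the code in `parser.py`: the `_sketch` names, the recursive search, the handling of the extension with or without a leading dot, and the list-of-lines return type are all assumptions made for illustration.

```python
from pathlib import Path


def get_file_list_sketch(code_path, file_extension):
    """Hypothetical stand-in: list files under code_path with the given extension.

    Assumptions: the search is recursive and the extension may be passed
    with or without a leading dot ("java" or ".java").
    """
    suffix = file_extension if file_extension.startswith(".") else "." + file_extension
    root = Path(code_path)
    return sorted(str(p) for p in root.rglob("*" + suffix) if p.is_file())


def get_code_lines_sketch(file):
    """Hypothetical stand-in: return the file's contents as a list of lines.

    Assumption: the real function may return a single string instead.
    """
    with open(file, encoding="utf-8") as handle:
        return handle.read().splitlines()
```

A typical flow under these assumptions would call `get_file_list_sketch("src", "java")` and then pass each returned path to `get_code_lines_sketch`.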
=== java_code.py
* Contains functions to split Java files into smaller chunks
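As an illustration of what "a list of dictionaries" might look like for Java source, here is a naive sketch. It is not the code in `java_code.py`: the function name `chunk_java_sketch`, the dictionary keys (`type`, `name`, `package`, `body`), and the regex-plus-brace-counting approach are all assumptions. It handles only package declarations and methods with an access modifier, skips variables, and does not account for braces inside strings or comments.

```python
import re

# Hypothetical patterns; real Java syntax is far richer than this.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;")
METHOD_RE = re.compile(
    r"^\s*(?:public|protected|private)\s+(?:static\s+)?[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)\s*\{"
)


def chunk_java_sketch(lines):
    """Split Java source lines into a list of chunk dictionaries (illustrative only)."""
    chunks = []
    package = None
    i = 0
    while i < len(lines):
        line = lines[i]
        pkg = PACKAGE_RE.match(line)
        if pkg:
            package = pkg.group(1)
            chunks.append({"type": "package", "name": package, "body": line.strip()})
            i += 1
            continue
        method = METHOD_RE.match(line)
        if method:
            # Collect lines until the braces opened by the method are balanced.
            start = i
            depth = 0
            while i < len(lines):
                depth += lines[i].count("{") - lines[i].count("}")
                i += 1
                if depth == 0:
                    break
            chunks.append({
                "type": "method",
                "name": method.group(1),
                "package": package,
                "body": "\n".join(lines[start:i]),
            })
            continue
        i += 1
    return chunks
```

Each dictionary can be indexed by key (for example `chunk["name"]`), which is the property that makes such a list convenient to load into a vector store.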
== Contributing
The GitHub repository for this package is https://github.com/break-free/data-chunker.
The `main` branch is protected, so any contribution requires a branch to be created. Branch names should be prepended with either `feat/` or `fix/` to indicate whether new functionality or a refactor/fix is being made (e.g., `fix/update-readme`). Once the branch is complete, it can be merged back into `main`.
The repository includes additional directories, such as `setup`, `info`, and `training`, and files such as `main.py`, that include additional resources for development and usage examples.
== Packaging
This Python package was produced using https://hatch.pypa.io/latest/config/build/[hatchling]. Refer to the `pyproject.toml` for specifics.
We recommend reading the following pages to get familiar with Python packaging and uploading to https://pypi.org:
* https://packaging.python.org/en/latest/tutorials/packaging-projects/[Packaging Python Projects].
* https://hatch.pypa.io/latest/config/build/[Hatchling - Build Configuration].
* https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#uploading-your-project-to-pypi[Uploading your Project to PyPI].
* https://pypi.org/project/keyring/[keyring] (useful for keeping PyPI login safe).
== Notes
This code has been battle-tested with *one* application. If you encounter any issues, please https://github.com/break-free/java-code-chunker/issues[submit an issue ticket here on GitHub].