Chunks code into a list made up of indexable dictionaries.
= data-chunker
A Python library that chunks source code into a list of dictionaries. The list can then be used to create a vector store, which can provide granular context for OpenAI queries.
Languages that can currently be chunked:
* Java (packages, methods, and variables)
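
To make the "list of dictionaries" idea concrete, the sketch below shows one way such a list could be shaped and prepared for an embedding step. The key names (`type`, `name`, `body`) and the `to_documents` helper are assumptions made for illustration; they are not taken from the library and may not match its actual output.

```python
# Hypothetical chunk layout: these key names are assumptions for
# illustration, not the library's documented schema.
chunks = [
    {"type": "package", "name": "com.example.app", "body": "package com.example.app;"},
    {"type": "method", "name": "add", "body": "int add(int a, int b) { return a + b; }"},
    {"type": "variable", "name": "total", "body": "private int total = 0;"},
]


def to_documents(chunks):
    """Split each chunk into (text, metadata) for a later embedding step."""
    documents = []
    for chunk in chunks:
        text = chunk["body"]
        # Everything except the code itself is kept as searchable metadata.
        metadata = {key: value for key, value in chunk.items() if key != "body"}
        documents.append((text, metadata))
    return documents


documents = to_documents(chunks)

# Because every chunk is a plain dictionary, filtering is a list comprehension.
methods = [chunk for chunk in chunks if chunk["type"] == "method"]
```

Keeping the code text separate from its metadata is a common pattern when loading a vector store, since most stores accept a text field plus a metadata mapping per entry.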
== Modules
=== parser.py
* Contains functions to read code lines from given files and file paths:
** `get_file_list(code_path, file_extension)`: returns a list of files from the path passed in
** `get_code_lines(file)`: returns the code from the file name passed in
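
The two functions above might behave roughly like the stand-ins below. These are not the library's implementations: only the names and parameters come from the descriptions above, while the recursive search, the sorted order, and the list-of-lines return value are assumptions.

```python
from pathlib import Path


def get_file_list(code_path, file_extension):
    """Stand-in: return sorted file paths under code_path with the given extension.

    Assumption: the search is recursive and accepts "java" or ".java".
    """
    suffix = file_extension if file_extension.startswith(".") else "." + file_extension
    return sorted(str(path) for path in Path(code_path).rglob("*" + suffix) if path.is_file())


def get_code_lines(file):
    """Stand-in: return the file's contents as a list of lines.

    Assumption: lines are returned without trailing newline characters.
    """
    with open(file, encoding="utf-8") as handle:
        return handle.read().splitlines()
```

With these stand-ins, `get_file_list("src", "java")` would list every `.java` file under a hypothetical `src` directory, and each path could then be passed to `get_code_lines` to read it.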
=== java_code.py
* Contains functions to split Java files into smaller chunks
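
As a rough illustration of what splitting a Java file into chunks can involve, the sketch below pulls out the package declaration and simple methods by counting braces. It is a simplified stand-in, not the code in `java_code.py`: the `chunk_java` name, the regular expressions, and the dictionary keys are all assumptions, and the approach ignores constructors, annotations, and braces inside strings or comments.

```python
import re

# Illustrative patterns only; they cover simple declarations and will
# miss many valid Java forms (constructors, generics-heavy signatures).
PACKAGE = re.compile(r"^\s*package\s+([\w.]+)\s*;")
METHOD_HEADER = re.compile(
    r"^\s*(?:public|private|protected)\s+(?:static\s+)?[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)\s*\{"
)


def chunk_java(lines):
    """Hypothetical helper: return package and method chunks as dictionaries."""
    chunks = []
    current = None  # method currently being collected, if any
    depth = 0       # brace depth relative to the method header
    for line in lines:
        if current is None:
            package = PACKAGE.match(line)
            if package:
                chunks.append({"type": "package", "name": package.group(1), "body": line.strip()})
                continue
            header = METHOD_HEADER.match(line)
            if header is None:
                continue
            current = {"type": "method", "name": header.group(1), "body": []}
            depth = 0
        current["body"].append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            # Braces are balanced again, so the method body is complete.
            current["body"] = "\n".join(current["body"])
            chunks.append(current)
            current = None
    return chunks
```

A production chunker would more likely rely on a real Java parser, since brace counting breaks as soon as a string literal or comment contains an unbalanced brace.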
== Contributing
The GitHub repository for this package is https://github.com/break-free/data-chunker.
The `main` branch is protected, so any contribution requires a new branch. Branch names should be prepended with either `feat/` or `fix/` to indicate whether new functionality or a refactor/fix is being made (e.g., `fix/update-readme`). Once the work on the branch is complete, it can be merged back into `main`.
The repository includes additional directories, such as `setup`, `info`, and `training`, and files such as `main.py`, that include additional resources for development and usage examples.
== Packaging
This Python package was produced using https://hatch.pypa.io/latest/config/build/[hatchling]. Refer to the `pyproject.toml` for specifics.
We recommend reading the following pages to become familiar with Python packaging and uploading to https://pypi.org:
* https://packaging.python.org/en/latest/tutorials/packaging-projects/[Packaging Python Projects].
* https://hatch.pypa.io/latest/config/build/[Hatchling - Build Configuration].
* https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#uploading-your-project-to-pypi[Uploading your Project to PyPI].
* https://pypi.org/project/keyring/[keyring] (useful for keeping PyPI login safe).
== Notes
This code has been battle-tested with *one* application. If you encounter any issues, please https://github.com/break-free/java-code-chunker/issues[submit an issue ticket here on GitHub].