gitsource
GitHub repository reader with document chunking for RAG/LLM applications.
Features
- Download repositories directly from GitHub using codeload.github.com (no git required)
- Filter files by extension and path patterns
- Parse frontmatter from markdown files
- Chunk documents using sliding windows (preserves metadata)
- Lightweight Jupyter notebook parser
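To make the frontmatter feature concrete, here is a minimal sketch of what splitting frontmatter from a markdown file can look like. It is not gitsource's implementation: the function name `split_frontmatter` is hypothetical, and it only understands flat `key: value` lines, whereas real frontmatter is YAML and needs a YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split markdown text into (metadata, body).

    Simplified assumption: the frontmatter block is delimited by '---'
    lines and contains only flat 'key: value' pairs.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all

    # Find the closing '---' delimiter.
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # unterminated block: treat everything as body

    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return metadata, body
```

All values come back as strings in this sketch; a YAML parser would also give you numbers, lists, and nested mappings.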
Installation
pip install gitsource
Usage
Read GitHub Repository
from gitsource import GithubRepositoryDataReader

reader = GithubRepositoryDataReader(
    repo_owner="evidentlyai",
    repo_name="docs",
    allowed_extensions={"md", "mdx"},
)
files = reader.read()
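For readers curious how a git-free download can work, the sketch below fetches a repository zip from codeload.github.com and keeps only files with allowed extensions. It is an illustration under assumptions, not the library's code: the helper names (`codeload_url`, `read_zip`, `download_repo`), the default branch `main`, and the `{"filename", "content"}` record shape are all hypothetical, and the value returned by `reader.read()` may look different.

```python
import io
import zipfile
from pathlib import PurePosixPath
from urllib.request import urlopen


def codeload_url(owner: str, repo: str, branch: str = "main") -> str:
    # Assumption: the repository has a branch with this name.
    return f"https://codeload.github.com/{owner}/{repo}/zip/refs/heads/{branch}"


def read_zip(data: bytes, allowed_extensions: set[str]) -> list[dict]:
    """Extract text files with an allowed extension from zip bytes."""
    files = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # GitHub archives nest everything under one top-level folder
            # ("<repo>-<branch>/"); drop it to get a repo-relative path.
            _, _, path = info.filename.partition("/")
            extension = PurePosixPath(path).suffix.lstrip(".").lower()
            if extension not in allowed_extensions:
                continue
            content = archive.read(info).decode("utf-8", errors="replace")
            files.append({"filename": path, "content": content})
    return files


def download_repo(owner, repo, allowed_extensions, branch="main"):
    # Network call: downloads the whole archive into memory.
    with urlopen(codeload_url(owner, repo, branch)) as response:
        return read_zip(response.read(), allowed_extensions)
```

Keeping the zip handling in `read_zip` separate from the network call makes the filtering logic easy to exercise on an in-memory archive.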
Chunk Documents
from gitsource import chunk_documents

documents = [
    {"content": "Long text here...", "filename": "doc.txt"}
]
chunks = chunk_documents(
    documents,
    size=2000,
    step=1000,
)
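The `size` and `step` arguments suggest a character-based sliding window: each chunk holds up to `size` characters and the window advances by `step`, so `size=2000, step=1000` would give roughly 50% overlap between neighbours. The sketch below shows that idea in plain Python. It is an assumption about the semantics rather than gitsource's actual code; the function name `sliding_window_chunks` and the `start` key are hypothetical.

```python
def sliding_window_chunks(documents, size, step):
    """Split each document's 'content' into overlapping windows.

    Every other key of the document (e.g. 'filename') is copied onto
    each chunk so the metadata is preserved.
    """
    chunks = []
    for doc in documents:
        content = doc["content"]
        metadata = {k: v for k, v in doc.items() if k != "content"}
        for start in range(0, len(content), step):
            chunks.append(
                {**metadata, "start": start, "content": content[start:start + size]}
            )
            # Stop once a window has reached the end of the text,
            # so no chunk is a pure suffix of the previous one.
            if start + size >= len(content):
                break
    return chunks
```

With a 10-character document, `size=4` and `step=2`, this produces windows starting at offsets 0, 2, 4 and 6.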
Parse Jupyter Notebooks
from gitsource import load_notebook
notebook = load_notebook("notebook.ipynb")
cells = notebook.cells # List of cell dictionaries
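A lightweight notebook parser is feasible because an `.ipynb` file is plain JSON with a top-level `cells` list. The sketch below reads that structure with the standard library only. It is not how `load_notebook` is implemented; the function name `parse_notebook` and the returned dictionary shape are assumptions for illustration.

```python
import json


def parse_notebook(text: str) -> list[dict]:
    """Return a list of {'cell_type', 'source'} dicts from notebook JSON."""
    raw = json.loads(text)
    cells = []
    for cell in raw.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows 'source' to be either a single
        # string or a list of line strings; normalise to one string.
        if isinstance(source, list):
            source = "".join(source)
        cells.append({"cell_type": cell.get("cell_type"), "source": source})
    return cells
```

To use it on a file, read the text first, for example `parse_notebook(open("notebook.ipynb", encoding="utf-8").read())`.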
License
WTFPL
Download files
Source Distribution
gitsource-0.0.1.tar.gz (105.3 kB)
Built Distribution
gitsource-0.0.1-py3-none-any.whl (8.1 kB)
File details
Details for the file gitsource-0.0.1.tar.gz.
File metadata
- Download URL: gitsource-0.0.1.tar.gz
- Upload date:
- Size: 105.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Hatch/1.16.3 cpython/3.13.5 HTTPX/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d57cc05d6140d1f734ad653c53cbb63c62dddf624c21df9defe9434b1ff2fa4f |
| MD5 | 8f7d73289698a07312ab12e273eee5c2 |
| BLAKE2b-256 | b936c54e2ba2ab89a4fb87f4d3507e8e45a3657b0476d05241b99ca40ea4b26b |
File details
Details for the file gitsource-0.0.1-py3-none-any.whl.
File metadata
- Download URL: gitsource-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: Hatch/1.16.3 cpython/3.13.5 HTTPX/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93d24fbd2b5cc82a5c0f4ad90a2128cb290298a360c00bfa50879fca24a1c1d6 |
| MD5 | fa3859e80a07cdd91fdff7162d519be3 |
| BLAKE2b-256 | d044586c67f65331f65cf7dad795d6382c85a21b272f67c85d3061a79a259e79 |
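If you download one of these files manually, you can compare it against the digests listed above. The sketch below computes the three digest types with Python's `hashlib`; the function name `file_digests` is hypothetical, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(data: bytes) -> dict[str, str]:
    """Compute hex digests for the three algorithms in the tables."""
    return {
        "SHA256": hashlib.sha256(data).hexdigest(),
        "MD5": hashlib.md5(data).hexdigest(),
        # BLAKE2b-256: BLAKE2b truncated to a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }
```

For example, read the downloaded file in binary mode, pass the bytes to `file_digests`, and check that the `"SHA256"` value equals the one in the table for that file.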