gitsource
GitHub repository reader with document chunking for RAG/LLM applications.
Features
- Download repositories directly from GitHub using codeload.github.com (no git required)
- Filter files by extension and path patterns
- Parse frontmatter from markdown files
- Chunk documents using sliding windows (preserves metadata)
- Lightweight Jupyter notebook parser
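To make the frontmatter feature concrete: frontmatter is the header block some markdown files carry between two `---` lines. The sketch below is a simplified illustration, assuming flat `key: value` pairs only. `split_frontmatter` is a hypothetical helper, not part of gitsource, and the library itself may well rely on a full YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a markdown string into (metadata, body).

    Hypothetical helper: handles only flat `key: value` lines,
    not nested YAML structures.
    """
    lines = text.splitlines()
    # No opening delimiter: treat the whole text as body.
    if not lines or lines[0].strip() != "---":
        return {}, text

    # Look for the closing delimiter.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, text

    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return meta, body


doc = "---\ntitle: Quickstart\norder: 1\n---\n# Hello\nBody text."
meta, body = split_frontmatter(doc)
```

All values stay strings in this sketch; a real YAML parser would also convert numbers, lists, and nested mappings.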
Installation
pip install gitsource
# or
uv add gitsource
Usage
Read GitHub Repository
from gitsource import GithubRepositoryDataReader

reader = GithubRepositoryDataReader(
    repo_owner="evidentlyai",
    repo_name="docs",
    allowed_extensions={"md", "mdx"},
)
files = reader.read()
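For intuition about what a git-free download involves, here is a rough sketch of the general approach: fetch a zip archive from codeload.github.com, then keep only the files whose extension is allowed. This is not gitsource's actual implementation. `codeload_url` and `read_archive` are hypothetical names, the URL pattern is the one GitHub commonly serves branch archives from, and the example builds a small archive in memory instead of touching the network.

```python
import io
import zipfile


def codeload_url(owner: str, repo: str, branch: str) -> str:
    # Assumed URL pattern for a branch archive on codeload.github.com.
    return f"https://codeload.github.com/{owner}/{repo}/zip/refs/heads/{branch}"


def read_archive(data: bytes, allowed_extensions: set[str]) -> dict[str, str]:
    """Return {path: text} for archive members with an allowed extension."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # GitHub archives wrap everything in one top-level folder;
            # drop it so paths are relative to the repository root.
            _, _, path = info.filename.partition("/")
            ext = path.rsplit(".", 1)[-1] if "." in path else ""
            if ext in allowed_extensions:
                files[path] = archive.read(info).decode("utf-8", errors="replace")
    return files


# Build a tiny stand-in archive in memory (no network access needed).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs-main/README.md", "# Docs")
    archive.writestr("docs-main/img/logo.png", "not really an image")
    archive.writestr("docs-main/guide/intro.mdx", "Intro")

files = read_archive(buffer.getvalue(), {"md", "mdx"})
```

In a real download the bytes would come from an HTTP GET on the URL returned by `codeload_url`; that step is left out to keep the sketch self-contained.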
Process Jupyter Notebooks
from gitsource import GithubRepositoryDataReader, notebook_processor

reader = GithubRepositoryDataReader(
    repo_owner="alexeygrigorev",
    repo_name="gitsource",
    branch="master",
    allowed_extensions={"md", "ipynb"},
    filename_filter=lambda fp: fp.startswith("fixtures/"),
    processors={"ipynb": notebook_processor},  # Convert .ipynb to text
)
files = reader.read()

for file in files:
    print(f"{file.filename}: {file.content[:50]}...")
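As background on what converting a notebook to text can look like: an `.ipynb` file is JSON with a `cells` list, and each cell has a `cell_type` and a `source`. The sketch below illustrates the idea using only the standard library. `notebook_to_text` is a hypothetical function, and the real `notebook_processor` may format its output differently.

```python
import json


def notebook_to_text(raw: str) -> str:
    """Flatten markdown and code cells of a notebook into plain text.

    Hypothetical helper: drops cell outputs and any other cell types.
    """
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") not in ("markdown", "code"):
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            parts.append(source)
    return "\n\n".join(parts)


raw = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some notes"]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"], "outputs": []},
        {"cell_type": "raw", "source": "ignored"},
    ],
})
text = notebook_to_text(raw)
```

Dropping outputs keeps the text small, which tends to matter when the result is later chunked for retrieval.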
Chunk Documents
from gitsource import chunk_documents

documents = [
    {"content": "Long text here...", "filename": "doc.txt"}
]

chunks = chunk_documents(
    documents,
    size=2000,
    step=1000,
)
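To show how `size` and `step` interact in a sliding window, here is a small sketch of the technique. It is an illustration and not the body of `chunk_documents`: `sliding_window_chunks` is a hypothetical function, the window is assumed to count characters, and the `start` key is an assumption added here to show where each chunk begins.

```python
def sliding_window_chunks(documents, size, step):
    """Cut each document's content into overlapping windows.

    Every chunk copies the document's other keys, so metadata such as
    the filename stays attached to each piece.
    """
    chunks = []
    for doc in documents:
        metadata = {k: v for k, v in doc.items() if k != "content"}
        content = doc["content"]
        for start in range(0, len(content), step):
            chunks.append({
                **metadata,
                "start": start,  # assumed field, for illustration only
                "content": content[start:start + size],
            })
            # Stop once a window has reached the end of the text.
            if start + size >= len(content):
                break
    return chunks


example = [{"content": "abcdefghij", "filename": "doc.txt"}]
pieces = sliding_window_chunks(example, size=4, step=2)
```

With `size=4` and `step=2`, neighbouring windows share two characters. By the same arithmetic, `size=2000` with `step=1000` would make each chunk overlap the next by half its length under these assumptions.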
License
WTFPL
Project details
Download files
Source Distribution
gitsource-0.0.3.tar.gz (72.2 kB)
Built Distribution
gitsource-0.0.3-py3-none-any.whl (8.6 kB)
File details
Details for the file gitsource-0.0.3.tar.gz.
File metadata
- Download URL: gitsource-0.0.3.tar.gz
- Upload date:
- Size: 72.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: Hatch/1.16.3 cpython/3.13.5 HTTPX/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9014eb21c0c0fdea184d53f74b999c87e13959b0699f4d8bbf32d8687c0afb87 |
| MD5 | e0af9ccbcebef67c4ed04590610b2c2e |
| BLAKE2b-256 | de83aff9136de7d0e544f29d970cd394521354f7e16d147d58ffbe79fa630cdd |
File details
Details for the file gitsource-0.0.3-py3-none-any.whl.
File metadata
- Download URL: gitsource-0.0.3-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: Hatch/1.16.3 cpython/3.13.5 HTTPX/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f2f67cb8a6d0ce382e1c55e27aacf281a9d9f7a644b76a5713ca5b8c9958591 |
| MD5 | f5c1b60b772ee3bb22527fbf7c335a63 |
| BLAKE2b-256 | e1cfa3c226421ff3f9170a06aed4fa50b6274e9377d48bc63cdc15dc8808e4a1 |