
GitHub repository reader with document chunking for RAG/LLM applications

Project description

gitsource

Features

  • Download repositories directly from GitHub using codeload.github.com (no git required)
  • Filter files by extension and path patterns
  • Parse frontmatter from markdown files
  • Chunk documents using sliding windows (preserves metadata)
  • Lightweight Jupyter notebook parser
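
Downloading from codeload.github.com and filtering by extension and path can be sketched with the standard library alone. This is not gitsource's code. The archive URL pattern and the helper names codeload_url, download_zip and iter_files are assumptions made for illustration. The demo builds a small archive in memory rather than calling the network.

```python
import io
import zipfile
from pathlib import PurePosixPath
from urllib.request import urlopen


def codeload_url(owner, repo, branch="main"):
    # Assumed archive URL layout for a branch on codeload.github.com
    return f"https://codeload.github.com/{owner}/{repo}/zip/refs/heads/{branch}"


def download_zip(owner, repo, branch="main"):
    # Fetch the whole branch as a single zip archive (no git needed)
    with urlopen(codeload_url(owner, repo, branch)) as resp:
        return resp.read()


def iter_files(zip_bytes, allowed_extensions, filename_filter=None):
    """Yield (path, text) for archive members that pass both filters."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # GitHub archives typically wrap everything in a "<repo>-<branch>/"
            # folder; drop it so paths are relative to the repository root
            _, _, path = info.filename.partition("/")
            ext = PurePosixPath(path).suffix.lstrip(".").lower()
            if ext not in allowed_extensions:
                continue
            if filename_filter is not None and not filename_filter(path):
                continue
            yield path, zf.read(info).decode("utf-8", errors="replace")


# Offline demo: build a tiny archive shaped like a GitHub download
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs-main/README.md", "# Hello")
    zf.writestr("docs-main/src/app.py", "print('hi')")
    zf.writestr("docs-main/guides/intro.mdx", "Intro")

files = dict(iter_files(buf.getvalue(), {"md", "mdx"},
                        filename_filter=lambda p: p.startswith("guides/")))
print(files)  # {'guides/intro.mdx': 'Intro'}
```

README.md has an allowed extension but fails the path filter. app.py is excluded by its extension.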

Installation

pip install gitsource
# or
uv add gitsource

Usage

Read GitHub Repository

from gitsource import GithubRepositoryDataReader

reader = GithubRepositoryDataReader(
    repo_owner="evidentlyai",
    repo_name="docs",
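    allowed_extensions={"md", "mdx"},  # extensions are given without the leading dot
)

files = reader.read()  # download the repository and collect the matching files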
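
Markdown and MDX files like the ones selected above often start with a frontmatter block. This page does not say which parser gitsource uses or how the parsed fields are exposed on each file. The function below is therefore only a minimal stand-in, and its name, parse_frontmatter, is hypothetical. It recognises a block delimited by --- lines and reads flat key: value pairs. Real frontmatter is YAML, so nested values, lists and multi-line strings need a proper YAML parser.

```python
from typing import Dict, Tuple


def parse_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a leading '---' ... '---' block off a Markdown document.

    Simplified: only flat "key: value" lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            meta = {}
            for line in lines[1:i]:
                key, sep, value = line.partition(":")
                if sep:
                    # drop surrounding whitespace and simple quotes
                    meta[key.strip()] = value.strip().strip("\"'")
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
    return {}, text  # opening '---' was never closed


doc = """---
title: "Quickstart"
description: Get started
---

# Quickstart
Install the package.
"""
meta, body = parse_frontmatter(doc)
print(meta)  # {'title': 'Quickstart', 'description': 'Get started'}
print(body)
```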

Process Jupyter Notebooks

from gitsource import GithubRepositoryDataReader, notebook_processor

reader = GithubRepositoryDataReader(
    repo_owner="alexeygrigorev",
    repo_name="gitsource",
    branch="master",
    allowed_extensions={"md", "ipynb"},
    filename_filter=lambda fp: fp.startswith("fixtures/"),  # keep only files under fixtures/
    processors={"ipynb": notebook_processor},  # Convert .ipynb to text
)

files = reader.read()
for file in files:
    print(f"{file.filename}: {file.content[:50]}...")
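
A notebook is a JSON document in which each cell carries a source field, which is what converting .ipynb to text has to extract. The sketch below is not notebook_processor, and its output format is an assumption. It keeps markdown and code cell sources, ignores outputs and raw cells, and joins the rest with blank lines. The helper name notebook_to_text is hypothetical.

```python
import json


def notebook_to_text(raw):
    """Return the markdown and code cell sources of an .ipynb file as plain text."""
    nb = json.loads(raw)
    parts = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") not in ("markdown", "code"):
            continue  # raw cells are skipped; outputs are never read
        source = cell.get("source", "")
        # nbformat 4 allows the source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            parts.append(source.strip())
    return "\n\n".join(parts)


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": "x = 1\nprint(x)", "outputs": []},
    ],
})
print(notebook_to_text(example))
```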

Chunk Documents

from gitsource import chunk_documents

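# Each document is a dict: "content" holds the text, and the other keys
# (here "filename") are metadata that the chunks keep
documents = [
    {"content": "Long text here...", "filename": "doc.txt"}
]

chunks = chunk_documents(
    documents,
    size=2000,  # length of each window
    step=1000,  # distance between window starts; windows overlap by size - step
)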
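
The size and step parameters describe a sliding window. A window of size units starts every step units, so with size=2000 and step=1000 consecutive chunks overlap by 1000. The function below is a hypothetical re-implementation for illustration, not chunk_documents itself. This page confirms none of the following: that the units are characters, that every non-content key is copied onto each chunk, or that the window offset is recorded under a start key. All three are assumptions of the sketch.

```python
def sliding_window_chunks(documents, size=2000, step=1000):
    """Split each document's "content" into overlapping character windows.

    Windows start every `step` characters and hold at most `size` characters,
    so neighbouring windows share size - step characters. Every other key of
    the source document is copied onto each chunk as metadata.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    chunks = []
    for doc in documents:
        text = doc["content"]
        metadata = {k: v for k, v in doc.items() if k != "content"}
        for start in range(0, len(text), step):
            chunk = dict(metadata, start=start, content=text[start:start + size])
            chunks.append(chunk)
            if start + size >= len(text):
                break  # this window already reaches the end of the text
    return chunks


docs = [{"content": "abcdefghij", "filename": "doc.txt"}]
chunks = sliding_window_chunks(docs, size=4, step=2)
for c in chunks:
    print(c)
# {'filename': 'doc.txt', 'start': 0, 'content': 'abcd'}
# {'filename': 'doc.txt', 'start': 2, 'content': 'cdef'}
# {'filename': 'doc.txt', 'start': 4, 'content': 'efgh'}
# {'filename': 'doc.txt', 'start': 6, 'content': 'ghij'}
```

Choosing a step smaller than size means text cut at one window boundary still appears whole in the neighbouring window. The cost is that every character is stored in roughly size/step chunks.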

License

WTFPL


Download files

Source Distribution

gitsource-0.0.3.tar.gz (72.2 kB)

Uploaded Source

Built Distribution

gitsource-0.0.3-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file gitsource-0.0.3.tar.gz.

File metadata

  • Download URL: gitsource-0.0.3.tar.gz
  • Upload date:
  • Size: 72.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.3 cpython/3.13.5 HTTPX/0.28.1

File hashes

Hashes for gitsource-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       9014eb21c0c0fdea184d53f74b999c87e13959b0699f4d8bbf32d8687c0afb87
MD5          e0af9ccbcebef67c4ed04590610b2c2e
BLAKE2b-256  de83aff9136de7d0e544f29d970cd394521354f7e16d147d58ffbe79fa630cdd
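
Hashing the downloaded file with the standard library is enough to check it against the SHA256 digest above. pip can also enforce this itself in hash-checking mode (--require-hashes). Below is a minimal sketch; sha256_of and verify are illustrative names, not part of gitsource.

```python
import hashlib

# SHA256 digest listed above for gitsource-0.0.3.tar.gz
EXPECTED_SDIST_SHA256 = "9014eb21c0c0fdea184d53f74b999c87e13959b0699f4d8bbf32d8687c0afb87"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in fixed-size blocks so it is never read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SDIST_SHA256):
    # hexdigest() is lowercase, so normalise the expected value before comparing
    return sha256_of(path) == expected.strip().lower()


# Example, assuming the sdist has been downloaded to the current directory:
# print(verify("gitsource-0.0.3.tar.gz"))
```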

File details

Details for the file gitsource-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: gitsource-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.3 cpython/3.13.5 HTTPX/0.28.1

File hashes

Hashes for gitsource-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       3f2f67cb8a6d0ce382e1c55e27aacf281a9d9f7a644b76a5713ca5b8c9958591
MD5          f5c1b60b772ee3bb22527fbf7c335a63
BLAKE2b-256  e1cfa3c226421ff3f9170a06aed4fa50b6274e9377d48bc63cdc15dc8808e4a1
