gitsource

GitHub repository reader with document chunking for RAG/LLM applications.

Features

  • Download repositories directly from GitHub using codeload.github.com (no git required)
  • Filter files by extension and path patterns
  • Parse frontmatter from Markdown files
  • Chunk documents using sliding windows (preserves metadata)
  • Lightweight Jupyter notebook parser
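
To make the frontmatter feature above concrete, here is a minimal sketch of what splitting a Markdown file into metadata and body can look like. It is not gitsource's implementation: `split_frontmatter` is a hypothetical helper, and it only understands flat `key: value` lines rather than full YAML.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown string into (metadata, body).

    Hypothetical helper for illustration. Simplification: only flat
    `key: value` lines are handled; a real parser would hand the
    block to a YAML library instead.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all

    # Find the closing "---" delimiter.
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # opening delimiter was never closed

    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Values are kept as strings; surrounding quotes are dropped.
            metadata[key.strip()] = value.strip().strip("\"'")

    body = "\n".join(lines[end + 1:])
    return metadata, body
```

Calling it on a file that starts with a `---` block returns the parsed keys and the remaining text; a file without frontmatter comes back unchanged with an empty dictionary.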

Installation

pip install gitsource

Usage

Read GitHub Repository

from gitsource import GithubRepositoryDataReader

reader = GithubRepositoryDataReader(
    repo_owner="evidentlyai",
    repo_name="docs",
    allowed_extensions={"md", "mdx"},
)

files = reader.read()
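
For readers curious how a download without git can work, the sketch below fetches nothing itself but shows the two pieces involved: building a codeload.github.com archive URL and filtering the files of a zip archive by extension. The function names, the `main` default branch, and the `filename`/`content` dictionary shape are assumptions for illustration, not the library's confirmed internals or return type.

```python
import io
import zipfile
from pathlib import PurePosixPath


def codeload_zip_url(owner: str, repo: str, branch: str = "main") -> str:
    """Build the zip archive URL for a branch (branch name is an assumption)."""
    return f"https://codeload.github.com/{owner}/{repo}/zip/refs/heads/{branch}"


def read_zip_files(zip_bytes: bytes, allowed_extensions: set[str]) -> list[dict]:
    """Return text files from a repository archive, filtered by extension."""
    files = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # GitHub archives wrap everything in one top-level folder,
            # so drop the first path component.
            _, _, filename = info.filename.partition("/")
            extension = PurePosixPath(filename).suffix.lstrip(".").lower()
            if extension not in allowed_extensions:
                continue
            content = archive.read(info).decode("utf-8", errors="replace")
            files.append({"filename": filename, "content": content})
    return files


# Usage (requires network access, so it is left as a comment):
# from urllib.request import urlopen
# with urlopen(codeload_zip_url("evidentlyai", "docs")) as response:
#     files = read_zip_files(response.read(), {"md", "mdx"})
```

Because the archive is processed in memory, nothing is cloned or written to disk, which is the practical benefit of going through the archive endpoint.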

Chunk Documents

from gitsource import chunk_documents

documents = [
    {"content": "Long text here...", "filename": "doc.txt"}
]

chunks = chunk_documents(
    documents,
    size=2000,
    step=1000
)
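
The `size` and `step` arguments describe a sliding window. The sketch below shows one plausible reading of that technique, assuming both values count characters and that every chunk carries over the non-content fields of its source document. `sliding_window_chunks` and the `start` key are hypothetical and may differ from what `chunk_documents` actually returns.

```python
def sliding_window_chunks(documents: list[dict], size: int, step: int) -> list[dict]:
    """Cut each document's content into overlapping windows.

    Assumption: size and step are measured in characters.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")

    chunks = []
    for doc in documents:
        # Everything except the text itself is treated as metadata.
        metadata = {key: value for key, value in doc.items() if key != "content"}
        text = doc["content"]
        for start in range(0, len(text), step):
            chunks.append({"start": start, "content": text[start:start + size], **metadata})
            if start + size >= len(text):
                break  # this window already reached the end of the text
    return chunks
```

Under these assumptions, `size=2000` with `step=1000` means each window starts 1000 characters after the previous one, so neighbouring chunks share half of their text.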

Parse Jupyter Notebooks

from gitsource import load_notebook

notebook = load_notebook("notebook.ipynb")
cells = notebook.cells  # List of cell dictionaries
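
A notebook file is plain JSON, which is why a lightweight parser is feasible. As a rough sketch using only the standard library, and not the code behind `load_notebook`, the hypothetical `read_notebook_cells` below extracts the type and source text of each cell.

```python
import json


def read_notebook_cells(raw_json: str) -> list[dict]:
    """Extract cell type and source text from .ipynb JSON (hypothetical helper)."""
    notebook = json.loads(raw_json)
    cells = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source either as one string
        # or as a list of line strings; normalize to a single string.
        if isinstance(source, list):
            source = "".join(source)
        cells.append({"cell_type": cell.get("cell_type"), "source": source})
    return cells
```

Given the contents of an `.ipynb` file as a string, it returns one dictionary per cell, which is enough for feeding notebook text into a chunking step.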

License

WTFPL
