Terence  🦅

Terence is a Python package that makes it easy to scan and analyze GitHub repositories. It simplifies the GitHub API and processes the repo contents into a simple flat dictionary that can be accessed by file path.

Installation

From PyPI

pip install terence

Quick Start

1. Get a GitHub Developer Token

Create a personal access token at: https://github.com/settings/tokens

  • New token (classic)
  • Only permission required: repo -> public_repo
  • Additional permissions are optional

2. Basic Usage

from terence import Terence

# Initialize a new Terence instance
terence = Terence()

# Authenticate Terence
terence.auth("ghp_your_token_here")

# Scan a repository
terence.scan_repository("https://github.com/user/repo_name")

# Access repo contents
print(f"Found {len(terence.results)} files")
for file_path, content in terence.results.items():
    print(f"{file_path}: {len(content)} characters")

Usage Guide

Authentication

You must authenticate Terence with your GitHub API token before scanning any repository.

terence = Terence()
terence.auth("ghp_your_token_here")
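
A token pasted directly into source code is easy to leak through version control. The sketch below reads it from an environment variable instead. Both the variable name GITHUB_TOKEN and the helper read_token are assumptions made for illustration; Terence itself does not require either.

```python
import os


def read_token(env_var: str = "GITHUB_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank.

    GITHUB_TOKEN is an assumed variable name; use whatever your setup defines.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token


# Usage (requires terence to be installed):
#   terence = Terence()
#   terence.auth(read_token())
```

Failing early with a clear message avoids passing an empty string to auth() and debugging an authentication error later.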

Scanning Repositories

# Scan entire repository
terence.scan_repository("https://github.com/user/repo_name")

To scan only specific file types, pass a list of extensions as the second argument.

The leading "." is optional, so "py" and ".py" are treated the same.

# Scan only Python files
terence.scan_repository("https://github.com/user/repo_name", ["py"])

# Scan multiple file types
terence.scan_repository("https://github.com/user/repo_name", ["py", "js", "html"])
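
Because both spellings are accepted, it can help to normalize extensions in your own code before comparing or logging them. The helper below is a hypothetical sketch of that normalization, not Terence's internal implementation.

```python
def normalize_extensions(extensions):
    """Return extensions with exactly one leading dot, dropping duplicates.

    Order of first appearance is preserved.
    """
    normalized = []
    for ext in extensions:
        cleaned = "." + ext.strip().lstrip(".")
        if cleaned != "." and cleaned not in normalized:
            normalized.append(cleaned)
    return normalized


print(normalize_extensions(["py", ".js", ".py"]))  # ['.py', '.js']
```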

Working with Branches

You can scan a specific branch, tag, or commit instead of the default branch (main/master).

# Scan a specific branch name
terence.branch("develop")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific tag
terence.branch("v2.0.0")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific commit (can chain methods)
terence.branch("abc123def456").scan_repository("https://github.com/user/repo_name")

To reset to the default branch, clear the results and scan again:

# Reset to default branch
terence.clear_results() 
terence.scan_repository("https://github.com/user/repo_name")
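
One use of branch() is comparing the same repository at two refs. The sketch below is a minimal illustration built only on the calls shown above (clear_results(), branch(), scan_repository(), and the results attribute). The helper names scan_refs and changed_files are hypothetical, and the code assumes that results holds only the most recent scan after clear_results(). Each ref costs a full scan's worth of API requests.

```python
def scan_refs(client, repo_url, refs):
    """Scan repo_url once per ref and return {ref: {file_path: content}}."""
    snapshots = {}
    for ref in refs:
        client.clear_results()                        # start from a clean state
        client.branch(ref).scan_repository(repo_url)  # chaining as shown above
        snapshots[ref] = dict(client.results)         # copy so later scans cannot alter it
    return snapshots


def changed_files(old, new):
    """Return sorted paths that were added, removed, or modified between two snapshots."""
    return sorted(path for path in old.keys() | new.keys() if old.get(path) != new.get(path))


# Usage (requires an authenticated Terence instance named terence):
#   snaps = scan_refs(terence, "https://github.com/user/repo_name", ["v2.0.0", "develop"])
#   print(changed_files(snaps["v2.0.0"], snaps["develop"]))
```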

Accessing Results

Once a scan is performed, the repository's file contents are stored in a flat dictionary in terence.results.

results = terence.results

# List all files:
for path in results.keys():
    print(f" - {path}")

# Print the first 200 characters of a specific file
if "frontend/app/page.tsx" in results:
    print(results["frontend/app/page.tsx"][:200])

# Search content across files
for file_path, content in results.items():
    if "def main" in content:
        print(f"Found 'def main' in: {file_path}")

Sample Results Output

The results are a flat dictionary: each key is the path to a file (including the file name), and each value is the raw contents of that file.

terence.results = {
    'frontend/app/index.html': '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head></html>...',
    'frontend/app/styles/globals.css': 'body {\n  font-family: Arial, sans-serif;\n...}\nh1 {\n  color: #333;\n}'
}
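
Because the results are an ordinary dictionary of strings, standard-library tools are enough for quick summaries. The sketch below counts files per extension and lines per file. The helper names and the sample data are illustrative only and are not part of Terence.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(results):
    """Count files per extension in a {file_path: content} dictionary."""
    return Counter(PurePosixPath(path).suffix or "(none)" for path in results)


def line_counts(results):
    """Return {file_path: number_of_lines} for every file."""
    return {path: len(content.splitlines()) for path, content in results.items()}


# Sample data shaped like terence.results
sample = {
    "frontend/app/index.html": "<!DOCTYPE html>\n<html>\n</html>",
    "frontend/app/styles/globals.css": "body {\n  color: #333;\n}",
    "backend/main.py": "def main():\n    pass\n",
}
print(count_by_extension(sample))
print(line_counts(sample))
```

PurePosixPath is used because the keys are repository paths with forward slashes, regardless of the operating system running the script.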

Repository Information

terence.scan_repository("https://github.com/user/repo_name")

repo_info = terence.get_repo_info()

# Example return value
repo_info = {
    'owner': 'user',
    'repo': 'repo_name',
    'url': 'https://github.com/user/repo_name'
}
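
To make the relationship between a repository URL and these three fields concrete, here is a rough standard-library sketch that derives them. The function parse_github_url is hypothetical and is not how Terence is known to parse URLs; it only illustrates the mapping and the kind of input that would reasonably be rejected with a ValueError.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a GitHub repository URL into owner, repo, and a canonical URL.

    Raises ValueError when the URL does not look like
    https://github.com/<owner>/<repo>.
    """
    parsed = urlparse(url.strip())
    parts = [part for part in parsed.path.split("/") if part]
    if parsed.scheme not in ("http", "https") or parsed.netloc.lower() != "github.com":
        raise ValueError(f"Not a GitHub URL: {url!r}")
    if len(parts) < 2:
        raise ValueError(f"URL is missing the owner or repository name: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git") and len(repo) > 4:
        repo = repo[:-4]  # tolerate clone-style URLs
    return {"owner": owner, "repo": repo, "url": f"https://github.com/{owner}/{repo}"}


print(parse_github_url("https://github.com/user/repo_name"))
```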

Rate Limit Management

The GitHub API allows 5,000 requests per hour for an authenticated token and 60 per hour for unauthenticated requests.

Terence raises a RateLimitException when the remaining quota is too low (fewer than 10 requests) to start a new repository scan.

rate = terence.get_rate_limit()

# Example return value
rate = {
    'remaining': 4102,
    'limit': 5000,  # GitHub limit
    # Timezone-aware UTC datetime at which the limit resets
    'reset': datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc)
}
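
The dictionary above is enough to decide whether to scan now or wait. The following sketch assumes the shape shown (an integer 'remaining' and a timezone-aware 'reset' datetime). The helper names are hypothetical, and the default threshold of 10 mirrors the limit documented under Error Types.

```python
import datetime


def seconds_until_reset(rate, now=None):
    """Return the seconds remaining until rate['reset'], never below zero."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(0.0, (rate["reset"] - now).total_seconds())


def has_headroom(rate, minimum=10):
    """True when at least `minimum` requests remain before the reset."""
    return rate["remaining"] >= minimum


rate = {
    "remaining": 4102,
    "limit": 5000,
    "reset": datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc),
}
now = datetime.datetime(2025, 12, 4, 18, 0, 0, tzinfo=datetime.timezone.utc)
print(seconds_until_reset(rate, now))  # 1800.0
print(has_headroom(rate))              # True
```

Passing now explicitly keeps the calculation deterministic in tests; in normal use it can be left out.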

Clearing Data

# Clear results but stay authenticated
terence.clear_results()

# Clear everything (deauthenticate)
terence.clear_all()

Sample Error Handling

from terence import Terence, RateLimitException

terence = Terence()
terence.auth("ghp_your_token_here")

try:
    terence.scan_repository("https://github.com/user/repo_name")
    print(f"Success! Found {len(terence.results)} files")

except RateLimitException as e:
    print(f"Rate limit reached: {e}")
    # Wait until reset time or use different token

except ValueError as e:
    print(f"Invalid input: {e}")
    # Check URL format or extension list

except Exception as e:
    print(f"Error: {e}")
    # Handle authentication, repo not found, etc.

File Filtering

Allowed Extensions

By default, Terence scans these file types:

  • Python: .py
  • JavaScript/TypeScript: .js, .jsx, .ts, .tsx
  • Web: .html, .htm, .css, .scss, .sass, .vue, .svelte
  • Java: .java
  • C/C++: .c, .cpp, .h, .hpp, .cc
  • Other: .go, .rs, .rb, .php, .swift, .kt, .cs

Excluded Directories

The following directories are automatically excluded:

  • node_modules/, .git/, venv/, env/, .venv/
  • __pycache__/, dist/, build/
  • .next/, .nuxt/, target/, bin/, obj/
  • test/, tests/, .pytest_cache/, coverage/
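
To predict which paths a default scan would keep, the two lists above can be combined into a single check. This is an illustrative sketch, not Terence's own filter: the function would_be_scanned is hypothetical, and it assumes an excluded directory name matches at any depth of the path, which the lists above do not state explicitly.

```python
from pathlib import PurePosixPath

# Copied from the "Excluded Directories" list above
EXCLUDED_DIRS = {
    "node_modules", ".git", "venv", "env", ".venv",
    "__pycache__", "dist", "build",
    ".next", ".nuxt", "target", "bin", "obj",
    "test", "tests", ".pytest_cache", "coverage",
}

# Copied from the "Allowed Extensions" list above
ALLOWED_EXTENSIONS = {
    ".py", ".js", ".jsx", ".ts", ".tsx",
    ".html", ".htm", ".css", ".scss", ".sass", ".vue", ".svelte",
    ".java", ".c", ".cpp", ".h", ".hpp", ".cc",
    ".go", ".rs", ".rb", ".php", ".swift", ".kt", ".cs",
}


def would_be_scanned(path):
    """Return True if path has an allowed extension and no excluded parent directory."""
    p = PurePosixPath(path)
    in_excluded_dir = any(part in EXCLUDED_DIRS for part in p.parts[:-1])
    return not in_excluded_dir and p.suffix in ALLOWED_EXTENSIONS


for candidate in ["src/app.py", "node_modules/lib/index.js", "README.md"]:
    print(candidate, would_be_scanned(candidate))
```

Only the parent directories are checked against the exclusion set, so a file named build.py is not mistaken for a build/ directory.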

Error Types

RateLimitException

Raised when GitHub API rate limit is too low (< 10 requests remaining).

from terence import RateLimitException

try:
    terence.scan_repository(url)
except RateLimitException as e:
    print(f"Rate limit reached: {e}")

ValueError

Raised when:

  • The GitHub URL format is invalid
  • An extension is not in the allowed extensions list

Exception

Raised for:

  • Not authenticated
  • Invalid GitHub token
  • Repository not found (or private)
  • Other GitHub API errors

Development

Setup

git clone https://github.com/yourusername/terence.git
cd terence
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest tests/test_client.py -v

# Run specific test
pytest tests/test_client.py::TestTerence::test_auth -v

# Run with coverage
pytest tests/test_client.py --cov=terence --cov-report=html

Requirements

  • Python 3.7+
  • PyGithub >= 2.1.1
  • python-dotenv >= 1.0.0

License

MIT License - see LICENSE file for details

Contributions & Support

Contributions are welcome! Feel free to fork and submit a pull request.

For any questions or concerns, please reach out to me at louieyin6@gmail.com

Author

Created by Louie Yin (GarfieldFluffJr)
