
Terence is a Python package that makes it easy to scan and analyze GitHub repositories. It simplifies the GitHub API and processes the repo contents into a simple flat dictionary that can be accessed by file path.


Terence  🦅



Installation

From PyPI

pip install terence

Quick Start

1. Get a GitHub Developer Token

Create a personal access token at: https://github.com/settings/tokens

  • New token (classic)
  • Only permission required: repo -> public_repo
  • Additional permissions are optional

2. Basic Usage

from terence import Terence

# Initialize a new Terence instance
terence = Terence()

# Authenticate Terence
terence.auth("ghp_your_token_here")

# Scan a repository
terence.scan_repository("https://github.com/pallets/flask")

# Access repo contents
print(f"Found {len(terence.results)} files")
for file_path, content in terence.results.items():
    print(f"{file_path}: {len(content)} characters")

Usage Guide

Authentication

You must authenticate Terence with your GitHub personal access token before scanning any repository.

terence = Terence()
terence.auth("ghp_your_token_here")

Scanning Repositories

# Scan entire repository
terence.scan_repository("https://github.com/user/repo_name")

You can also limit a scan to specific file types by passing a list of extensions as the second argument.

The leading "." is optional, so "py" and ".py" are equivalent.
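The extension rules above can be illustrated with a small sketch. It assumes Terence adds a missing leading "." and rejects anything outside its allowed list with a ValueError (as described under Error Types). `normalize_extensions` and the abbreviated `ALLOWED_EXTENSIONS` set are hypothetical names for illustration, not part of Terence's API.

```python
# Hypothetical sketch, not Terence's actual implementation.
# Abbreviated copy of the default extensions listed under "Allowed Extensions".
ALLOWED_EXTENSIONS = {".py", ".js", ".jsx", ".ts", ".tsx", ".html", ".css", ".java", ".go", ".rs"}


def normalize_extensions(extensions):
    """Add a missing leading "." and reject extensions outside the allowed set."""
    normalized = []
    for ext in extensions:
        # "py" and ".py" are treated as the same extension
        if not ext.startswith("."):
            ext = "." + ext
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"Extension not allowed: {ext}")
        normalized.append(ext)
    return normalized


print(normalize_extensions(["py", ".js", "html"]))  # ['.py', '.js', '.html']
```

Normalizing up front means the later filtering step only ever compares one canonical form.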

# Scan only Python files
terence.scan_repository("https://github.com/user/repo_name", ["py"])

# Scan multiple file types
terence.scan_repository("https://github.com/user/repo_name", ["py", "js", "html"])

Working with Branches

Call branch() before scanning to target a specific branch, tag, or commit instead of the repository's default branch (usually main or master).

# Scan a specific branch name
terence.branch("develop")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific tag
terence.branch("v2.0.0")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific commit (can chain methods)
terence.branch("abc123def456").scan_repository("https://github.com/user/repo_name")

To return to the default branch, clear the results and scan again.

# Reset to default branch
terence.clear_results() 
terence.scan_repository("https://github.com/user/repo_name")

Accessing Results

Once a scan is performed, the repository's file contents are stored in a flat dictionary in terence.results.

results = terence.results

# List all files:
for path in results.keys():
    print(f" - {path}")

# Print the first 200 characters of a specific file
if "frontend/app/page.tsx" in results:
    print(results["frontend/app/page.tsx"][:200])

# Search content across files
for file_path, content in results.items():
    if "def main" in content:
        print(f"Found 'def main' in: {file_path}")

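The search loop above can be extended to report line numbers as well. `find_in_results` is a hypothetical helper. It only assumes that `terence.results` is a plain dict mapping file paths to file contents, as described in this section. The sample dict stands in for real scan results.

```python
def find_in_results(results, needle):
    """Return (file_path, line_number) pairs for every line containing needle."""
    matches = []
    for file_path, content in results.items():
        # Count from 1 so line numbers match what an editor shows
        for line_number, line in enumerate(content.splitlines(), start=1):
            if needle in line:
                matches.append((file_path, line_number))
    return matches


# Stand-in for terence.results after a scan
sample_results = {
    "app/main.py": "import sys\n\ndef main():\n    print('hi')\n",
    "app/utils.py": "def helper():\n    return 42\n",
}

for path, line_no in find_in_results(sample_results, "def main"):
    print(f"{path}:{line_no}")  # app/main.py:3
```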
Sample Results Output

terence.results is a flat dictionary. Each key is a file's full path within the repository, including the file name, and each value is that file's raw contents.

terence.results = {
    'frontend/app/index.html': '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head></html>...',
    'frontend/app/styles/globals.css': 'body {\n  font-family: Arial, sans-serif;\n...}\nh1 {\n  color: #333;\n}'
}
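Because every key is a file path, standard path tools can summarize a scan. The sketch below counts files per extension using `collections.Counter` and `pathlib.PurePosixPath`. `summarize_by_extension` is a hypothetical helper, and the dict is the sample above with its contents shortened.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_by_extension(results):
    """Count scanned files per extension, e.g. Counter({'.html': 1, '.css': 1})."""
    # GitHub paths use "/" separators, so PurePosixPath parses them on any OS
    return Counter(PurePosixPath(path).suffix for path in results)


results = {
    "frontend/app/index.html": "<!DOCTYPE html>...",
    "frontend/app/styles/globals.css": "body { ... }",
}

for ext, count in summarize_by_extension(results).most_common():
    print(f"{ext}: {count} file(s)")
```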

Repository Information

terence.scan_repository("https://github.com/user/repo_name")

repo_info = terence.get_repo_info()

repo_info = {
    'owner': 'user',
    'repo': 'repo_name',
    'url': 'https://github.com/user/repo_name'
}
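The owner and repo fields above can be derived from the repository URL itself. This is a minimal sketch of that parsing. It assumes URLs of the form https://github.com/owner/repo, optionally followed by a trailing "/", and raises ValueError for anything else, matching the "Invalid GitHub URL format" case under Error Types. `parse_github_url` is a hypothetical name, and the accepted URL forms are assumptions rather than Terence's documented rules.

```python
import re

# Hypothetical pattern: https://github.com/<owner>/<repo> with an optional trailing "/"
_GITHUB_URL = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)/?")


def parse_github_url(url):
    """Return a dict shaped like the repo_info example, or raise ValueError."""
    match = _GITHUB_URL.fullmatch(url.strip())
    if match is None:
        raise ValueError(f"Invalid GitHub URL: {url}")
    owner, repo = match.groups()
    return {
        "owner": owner,
        "repo": repo,
        "url": f"https://github.com/{owner}/{repo}",
    }


print(parse_github_url("https://github.com/pallets/flask"))
```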

Rate Limit Management

The GitHub REST API allows 5,000 requests per hour per authenticated token, but only 60 per hour for unauthenticated requests.

Terence automatically raises a RateLimitException if the remaining rate limit is too low to start a new repository scan.

rate = terence.get_rate_limit()

rate = {
    'remaining': 4102,
    'limit': 5000, # GitHub limit
    # Reset time as a timezone-aware UTC datetime; str() gives yyyy-mm-dd hh:mm:ss+00:00
    'reset': datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc)
}
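The dictionary returned by get_rate_limit() contains enough information to decide whether to scan now or wait. The sketch below assumes the shape shown above, where `reset` is an aware UTC datetime. It also assumes the threshold of fewer than 10 remaining requests described under Error Types. `seconds_until_scan_allowed` and `MIN_REMAINING` are hypothetical names, not Terence APIs.

```python
import datetime

# Assumed threshold, taken from the RateLimitException description (< 10 remaining)
MIN_REMAINING = 10


def seconds_until_scan_allowed(rate, now=None):
    """Return 0.0 if a scan can start now, else the seconds until the limit resets."""
    if rate["remaining"] >= MIN_REMAINING:
        return 0.0
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Never return a negative wait if the reset time has already passed
    return max(0.0, (rate["reset"] - now).total_seconds())


rate = {
    "remaining": 3,
    "limit": 5000,
    "reset": datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc),
}
now = datetime.datetime(2025, 12, 4, 18, 20, 0, tzinfo=datetime.timezone.utc)
print(seconds_until_scan_allowed(rate, now))  # 600.0
```

Passing `now` explicitly makes the function easy to test. In real use you would omit it and, for example, sleep for the returned number of seconds before scanning.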

Clearing Data

# Clear results but stay authenticated
terence.clear_results()

# Clear everything (deauthenticate)
terence.clear_all()

Sample Error Handling

from terence import Terence, RateLimitException

terence = Terence()
terence.auth("ghp_your_token_here")

try:
    terence.scan_repository("https://github.com/owner/repo")
    print(f"Success! Found {len(terence.results)} files")

except RateLimitException as e:
    print(f"Rate limit reached: {e}")
    # Wait until reset time or use different token

except ValueError as e:
    print(f"Invalid input: {e}")
    # Check URL format or extension list

except Exception as e:
    print(f"Error: {e}")
    # Handle authentication, repo not found, etc.

File Filtering

Allowed Extensions

By default, Terence scans these file types:

  • Python: .py
  • JavaScript/TypeScript: .js, .jsx, .ts, .tsx
  • Web: .html, .htm, .css, .scss, .sass, .vue, .svelte
  • Java: .java
  • C/C++: .c, .cpp, .h, .hpp, .cc
  • Other: .go, .rs, .rb, .php, .swift, .kt, .cs

Excluded Directories

The following directories are automatically excluded:

  • node_modules/, .git/, venv/, env/, .venv/
  • __pycache__/, dist/, build/
  • .next/, .nuxt/, target/, bin/, obj/
  • test/, tests/, .pytest_cache/, coverage/
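The two rules above, allowed extensions and excluded directories, can be combined into a single path check. This is only a sketch, not Terence's code. `should_scan` is a hypothetical helper, and it assumes a directory is excluded wherever it appears in the path. The two sets copy the lists above.

```python
from pathlib import PurePosixPath

# Copied from the "Allowed Extensions" list above
ALLOWED_EXTENSIONS = {
    ".py", ".js", ".jsx", ".ts", ".tsx",
    ".html", ".htm", ".css", ".scss", ".sass", ".vue", ".svelte",
    ".java", ".c", ".cpp", ".h", ".hpp", ".cc",
    ".go", ".rs", ".rb", ".php", ".swift", ".kt", ".cs",
}

# Copied from the "Excluded Directories" list above
EXCLUDED_DIRS = {
    "node_modules", ".git", "venv", "env", ".venv",
    "__pycache__", "dist", "build",
    ".next", ".nuxt", "target", "bin", "obj",
    "test", "tests", ".pytest_cache", "coverage",
}


def should_scan(path):
    """Return True if a repository path passes both filters."""
    p = PurePosixPath(path)
    # Assumption: an excluded name anywhere among the parent directories disqualifies the file
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    return p.suffix in ALLOWED_EXTENSIONS


for path in ["src/app.py", "node_modules/react/index.js", "tests/test_app.py", "README.md"]:
    print(path, should_scan(path))
```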

Error Types

RateLimitException

Raised when the GitHub API rate limit is too low (fewer than 10 requests remaining).

from terence import RateLimitException

try:
    terence.scan_repository(url)
except RateLimitException as e:
    print(f"Rate limit reached: {e}")

ValueError

Raised when:

  • Invalid GitHub URL format
  • Extension not in allowed extensions list

Exception

Raised for:

  • Not authenticated
  • Invalid GitHub token
  • Repository not found (or private)
  • Other GitHub API errors

Development

Setup

git clone https://github.com/yourusername/terence.git
cd terence
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest tests/test_client.py -v

# Run specific test
pytest tests/test_client.py::TestTerence::test_auth -v

# Run with coverage
pytest tests/test_client.py --cov=terence --cov-report=html

Requirements

  • Python 3.7+
  • PyGithub >= 2.1.1
  • python-dotenv >= 1.0.0

License

MIT License - see LICENSE file for details

Contributions & Support

Contributions are welcome! Feel free to fork and submit a pull request.

For any questions or concerns, please reach out to me at louieyin6@gmail.com

Author

Created by Louie Yin (GarfieldFluffJr)

Download files


Source Distribution

terence-1.0.0.tar.gz (11.1 kB)

Uploaded Source

Built Distribution


terence-1.0.0-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file terence-1.0.0.tar.gz.

File metadata

  • Download URL: terence-1.0.0.tar.gz
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for terence-1.0.0.tar.gz

  • SHA256: 3e61d26d80441c71fef8abe06818dcd4500dc9c93e588c8a75e5d014d2a6b1a8
  • MD5: 20b1b61d220510323fbcc4c218b9b2f1
  • BLAKE2b-256: d6127f488de50a8ae0543cf512594ef2406c7a965d2865537afe1b8282ad931a


File details

Details for the file terence-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: terence-1.0.0-py3-none-any.whl
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for terence-1.0.0-py3-none-any.whl

  • SHA256: a4ceefefe088965ee555598da7527a95eb4d2397bb7aeb50a710f645d4a85716
  • MD5: 6d8b2be173d4529ef331bcb04f41b7ed
  • BLAKE2b-256: d26a270a66634e4472af71e3bde04d9ba0b89a44bab279b00fa77b6c40b7eb1f

