Terence  🦅

Terence is a Python package that makes it easy to scan and analyze GitHub repositories. It simplifies the GitHub API and processes the repo contents into a simple flat dictionary that can be accessed by file path.

Installation

From PyPI

pip install terence

Quick Start

1. Get a GitHub Personal Access Token

Create a personal access token at: https://github.com/settings/tokens

  • New token (classic)
  • Only permission required: repo -> public_repo
  • Additional permissions are optional

2. Basic Usage

from terence import Terence

# Initialize a new Terence instance
terence = Terence()

# Authenticate Terence
terence.auth("ghp_your_token_here")

# Scan a repository
terence.scan_repository("https://github.com/pallets/flask")

# Access repo contents
print(f"Found {len(terence.results)} files")
for file_path, content in terence.results.items():
    print(f"{file_path}: {len(content)} characters")

Usage Guide

Authentication

You must authenticate Terence with your GitHub API token before scanning any repository:

terence = Terence()
terence.auth("ghp_your_token_here")
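To keep the token out of source code, it can be read from an environment variable before calling auth. The sketch below is a caller-side helper, not part of Terence; the function name load_github_token and the variable name GITHUB_TOKEN are assumptions chosen for illustration.

```python
import os


def load_github_token(env=os.environ, var_name="GITHUB_TOKEN"):
    """Return the token stored under var_name, or raise if it is missing or blank.

    Hypothetical helper: neither this function nor the GITHUB_TOKEN name
    comes from Terence itself.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Intended use (requires the terence package and a real token):
# terence = Terence()
# terence.auth(load_github_token())
```

Passing the environment as an argument makes the helper easy to exercise with a plain dictionary in tests.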

Scanning Repositories

# Scan entire repository
terence.scan_repository("https://github.com/user/repo_name")

You can also scan only specific file types by passing a list of extensions as the second argument.

The leading "." is optional, so "py" and ".py" are both accepted.

# Scan only Python files
terence.scan_repository("https://github.com/user/repo_name", ["py"])

# Scan multiple file types
terence.scan_repository("https://github.com/user/repo_name", ["py", "js", "html"])
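If the extension list comes from user input, it can help to tidy it before passing it to scan_repository. This is a minimal caller-side sketch; normalize_extensions is a hypothetical helper, and whether Terence itself treats extensions case-insensitively is not stated here, so the lowercasing is an assumption.

```python
def normalize_extensions(extensions):
    """Strip whitespace, lowercase, drop leading dots, and remove duplicates.

    Hypothetical helper; the order of first appearance is preserved.
    """
    cleaned = []
    for ext in extensions:
        value = ext.strip().lower().lstrip(".")
        if value and value not in cleaned:
            cleaned.append(value)
    return cleaned


extensions = normalize_extensions([".py", "py", "JS", " .html "])
# Intended use (requires the terence package):
# terence.scan_repository("https://github.com/user/repo_name", extensions)
```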

Working with Branches

You can scan a specific branch, tag, or commit instead of the default main/master branch:

# Scan a specific branch name
terence.branch("develop")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific tag
terence.branch("v2.0.0")
terence.scan_repository("https://github.com/user/repo_name")

# Scan a specific commit (can chain methods)
terence.branch("abc123def456").scan_repository("https://github.com/user/repo_name")

To return to the default branch, clear the results and scan again:

# Reset to default branch
terence.clear_results() 
terence.scan_repository("https://github.com/user/repo_name")

Accessing Results

Once a scan is performed, the repository's file contents are stored in a flat dictionary in terence.results.

results = terence.results

# List all files:
for path in results.keys():
    print(f" - {path}")

# Print the first 200 characters of a specific file
if "frontend/app/page.tsx" in results:
    print(results["frontend/app/page.tsx"][:200])

# Search content across files
for file_path, content in results.items():
    if "def main" in content:
        print(f"Found 'def main' in: {file_path}")

Sample Results Output

The results attribute is a flat dictionary. Each key is the full path to a file, including the file name, and each value is the raw contents of that file.

terence.results = {
    'frontend/app/index.html': '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head></html>...',
    'frontend/app/styles/globals.css': 'body {\n  font-family: Arial, sans-serif;\n...}\nh1 {\n  color: #333;\n}'
}
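Because the results are a plain dictionary, ordinary Python tools work on them directly. The sketch below groups file paths by extension; it uses a hand-written sample dictionary in the same shape as terence.results, and group_by_extension is a hypothetical helper rather than part of the package.

```python
import posixpath
from collections import defaultdict


def group_by_extension(results):
    """Map each lowercase file extension to the list of paths that use it."""
    grouped = defaultdict(list)
    for path in results:
        _, ext = posixpath.splitext(path)
        grouped[ext.lower()].append(path)
    return dict(grouped)


# Sample data shaped like terence.results (paths and contents are made up)
sample_results = {
    "frontend/app/index.html": "<!DOCTYPE html>\n<html></html>",
    "frontend/app/styles/globals.css": "body {\n  font-family: Arial;\n}",
    "backend/app.py": "def main():\n    pass\n",
}

by_extension = group_by_extension(sample_results)
for ext, paths in sorted(by_extension.items()):
    print(f"{ext}: {len(paths)} file(s)")
```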

Repository Information

terence.scan_repository("https://github.com/user/repo_name")

repo_info = terence.get_repo_info()

repo_info = {
    'owner': 'user',
    'repo': 'repo_name',
    'url': 'https://github.com/user/repo_name'
}
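For illustration, a dictionary of the same shape can be derived from a repository URL with the standard library alone. This is a sketch of the idea, not Terence's actual parsing logic; parse_github_url and its handling of a trailing ".git" are assumptions.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a GitHub repository URL into owner, repo, and a canonical URL.

    Hypothetical helper; raises ValueError for anything that does not look
    like https://github.com/<owner>/<repo>.
    """
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    if parsed.netloc.lower() not in ("github.com", "www.github.com") or len(parts) < 2:
        raise ValueError(f"Not a GitHub repository URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return {
        "owner": owner,
        "repo": repo,
        "url": f"https://github.com/{owner}/{repo}",
    }


info = parse_github_url("https://github.com/user/repo_name")
```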

Rate Limit Management

The GitHub API allows 5,000 requests per hour for an authenticated token and 60 per hour for unauthenticated requests.

Terence raises a RateLimitException if the remaining rate limit is too low to start a new repository scan.

rate = terence.get_rate_limit()

rate = {
    'remaining': 4102,
    'limit': 5000, # GitHub limit
    # Date format yyyy-mm-dd hr:min:sec+00:00 timezone
    'reset': datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc)
}
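The dictionary above carries enough information to decide whether to scan now or wait. The sketch below works on a dictionary of that shape; seconds_until_reset and can_scan are hypothetical helpers, and the default threshold of 10 mirrors the "< 10 requests remaining" condition listed under RateLimitException.

```python
import datetime


def seconds_until_reset(rate, now=None):
    """Return how many seconds remain until the rate limit resets (never negative)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(0.0, (rate["reset"] - now).total_seconds())


def can_scan(rate, minimum=10):
    """Return True if at least `minimum` requests remain (assumed threshold)."""
    return rate["remaining"] >= minimum


# Sample data shaped like the output of terence.get_rate_limit()
sample_rate = {
    "remaining": 4102,
    "limit": 5000,
    "reset": datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc),
}
fixed_now = datetime.datetime(2025, 12, 4, 18, 0, 0, tzinfo=datetime.timezone.utc)
wait_seconds = seconds_until_reset(sample_rate, now=fixed_now)
```

Passing now explicitly keeps the calculation deterministic; in real use it can be omitted so the current UTC time is used.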

Clearing Data

# Clear results but stay authenticated
terence.clear_results()

# Clear everything (deauthenticate)
terence.clear_all()

Sample Error Handling

from terence import Terence, RateLimitException

terence = Terence().auth("ghp_your_token_here")

try:
    terence.scan_repository("https://github.com/owner/repo")
    print(f"Success! Found {len(terence.results)} files")

except RateLimitException as e:
    print(f"Rate limit reached: {e}")
    # Wait until reset time or use different token

except ValueError as e:
    print(f"Invalid input: {e}")
    # Check URL format or extension list

except Exception as e:
    print(f"Error: {e}")
    # Handle authentication, repo not found, etc.

File Filtering

Allowed Extensions

By default, Terence scans these file types:

  • Python: .py
  • JavaScript/TypeScript: .js, .jsx, .ts, .tsx
  • Web: .html, .htm, .css, .scss, .sass, .vue, .svelte
  • Java: .java
  • C/C++: .c, .cpp, .h, .hpp, .cc
  • Other: .go, .rs, .rb, .php, .swift, .kt, .cs

Excluded Directories

The following directories are automatically excluded:

  • node_modules/, .git/, venv/, env/, .venv/
  • __pycache__/, dist/, build/
  • .next/, .nuxt/, target/, bin/, obj/
  • test/, tests/, .pytest_cache/, coverage/
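The two lists above can be combined into a single path check. The following is an illustrative sketch only, not Terence's implementation: should_scan is a hypothetical function, and matching an excluded directory name at any depth of the path is an assumption.

```python
import posixpath

# Extensions and directories copied from the lists above
ALLOWED_EXTENSIONS = {
    ".py", ".js", ".jsx", ".ts", ".tsx",
    ".html", ".htm", ".css", ".scss", ".sass", ".vue", ".svelte",
    ".java", ".c", ".cpp", ".h", ".hpp", ".cc",
    ".go", ".rs", ".rb", ".php", ".swift", ".kt", ".cs",
}
EXCLUDED_DIRS = {
    "node_modules", ".git", "venv", "env", ".venv",
    "__pycache__", "dist", "build",
    ".next", ".nuxt", "target", "bin", "obj",
    "test", "tests", ".pytest_cache", "coverage",
}


def should_scan(path):
    """Return True if no parent directory is excluded and the extension is allowed."""
    parts = path.split("/")
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return False
    extension = posixpath.splitext(parts[-1])[1].lower()
    return extension in ALLOWED_EXTENSIONS


candidates = [
    "frontend/app/page.tsx",
    "node_modules/react/index.js",
    "tests/test_client.py",
    "README.md",
]
kept = [path for path in candidates if should_scan(path)]
```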

Error Types

RateLimitException

Raised when GitHub API rate limit is too low (< 10 requests remaining).

from terence import RateLimitException

try:
    terence.scan_repository(url)
except RateLimitException as e:
    print(f"Rate limit reached: {e}")

ValueError

Raised when:

  • Invalid GitHub URL format
  • Extension not in allowed extensions list

Exception

Raised for:

  • Not authenticated
  • Invalid GitHub token
  • Repository not found (or private)
  • Other GitHub API errors

Development

Setup

git clone https://github.com/yourusername/terence.git
cd terence
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest tests/test_client.py -v

# Run specific test
pytest tests/test_client.py::TestTerence::test_auth -v

# Run with coverage
pytest tests/test_client.py --cov=terence --cov-report=html

Requirements

  • Python 3.7+
  • PyGithub >= 2.1.1
  • python-dotenv >= 1.0.0

License

MIT License - see LICENSE file for details

Contributions & Support

Contributions are welcome! Feel free to fork and submit a pull request.

For any questions or concerns, please reach out to me at louieyin6@gmail.com

Author

Created by Louie Yin (GarfieldFluffJr)
