Terence 🦅
Terence is a Python package that makes it easy to scan and analyze GitHub repositories. It simplifies the GitHub API and processes the repo contents into a simple flat dictionary that can be accessed by file path.
Installation
From PyPI
pip install terence
Quick Start
1. Get a GitHub Developer Token
Create a personal access token at: https://github.com/settings/tokens
- New token (classic)
- Only permission required: repo -> public_repo
- Additional permissions are optional
2. Basic Usage
from terence import Terence
# Initialize a new Terence instance
terence = Terence()
# Authenticate Terence
terence.auth("ghp_your_token_here")
# Scan a repository
terence.scan_repository("https://github.com/pallets/flask")
# Access repo contents
print(f"Found {len(terence.results)} files")
for file_path, content in terence.results.items():
    print(f"{file_path}: {len(content)} characters")
Usage Guide
Authentication
You must authenticate Terence with your GitHub API token before scanning any repository.
terence = Terence()
terence.auth("ghp_your_token_here")
Scanning Repositories
# Scan entire repository
terence.scan_repository("https://github.com/user/repo_name")
To scan only specific file types, pass a list of extensions as the second argument.
The leading "." is optional, so "py" and ".py" are treated the same.
# Scan only Python files
terence.scan_repository("https://github.com/user/repo_name", ["py"])
# Scan multiple file types
terence.scan_repository("https://github.com/user/repo_name", ["py", "js", "html"])
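If you build the extension list from user input, it can help to normalize it before passing it to scan_repository. The helper below is a minimal sketch of one way to do that; `normalize_extensions` is a hypothetical name, not part of Terence, and the lowercasing and de-duplication are assumptions rather than documented Terence behavior.

```python
def normalize_extensions(extensions):
    """Return extensions in lowercase with exactly one leading dot, without duplicates.

    Hypothetical helper for illustration; Terence may normalize differently.
    """
    normalized = []
    for ext in extensions:
        cleaned = "." + ext.strip().lstrip(".").lower()
        if cleaned == ".":
            raise ValueError(f"Empty extension: {ext!r}")
        if cleaned not in normalized:
            normalized.append(cleaned)
    return normalized


# "py" and ".py" collapse to the same entry
print(normalize_extensions(["py", ".js", "HTML", ".py"]))
```

The normalized list can then be passed as the second argument to scan_repository.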
Working with Branches
You can scan a specific branch, tag, or commit instead of the repository's default branch (main/master).
# Scan a specific branch name
terence.branch("develop")
terence.scan_repository("https://github.com/user/repo_name")
# Scan a specific tag
terence.branch("v2.0.0")
terence.scan_repository("https://github.com/user/repo_name")
# Scan a specific commit (can chain methods)
terence.branch("abc123def456").scan_repository("https://github.com/user/repo_name")
To reset to the default branch, clear the results and scan again.
# Reset to default branch
terence.clear_results()
terence.scan_repository("https://github.com/user/repo_name")
Accessing Results
Once a scan is performed, the repository's file contents are stored in a flat dictionary in terence.results.
results = terence.results
# List all files
for path in results.keys():
    print(f" - {path}")
# Print the first 200 characters of a specific file
if "frontend/app/page.tsx" in results:
    print(results["frontend/app/page.tsx"][:200])
# Search content across files
for file_path, content in results.items():
    if "def main" in content:
        print(f"Found 'def main' in: {file_path}")
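Because the results are a plain dictionary of path to file text, the substring search above extends naturally to a regex search that also reports line numbers. The following is a sketch only: `find_matches` and `sample_results` are hypothetical names, and the sample dictionary stands in for terence.results so the example runs without a token.

```python
import re


def find_matches(results, pattern):
    """Yield (file_path, line_number, stripped_line) for each line matching the regex."""
    regex = re.compile(pattern)
    for file_path, content in results.items():
        for line_number, line in enumerate(content.splitlines(), start=1):
            if regex.search(line):
                yield file_path, line_number, line.strip()


# Stand-in for terence.results (same shape: path -> raw file contents)
sample_results = {
    "app/main.py": "import sys\n\ndef main():\n    return 0\n",
    "app/util.py": "def helper():\n    pass\n",
}

for file_path, line_number, line in find_matches(sample_results, r"^def main\b"):
    print(f"{file_path}:{line_number}: {line}")
```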
Sample Results Output
The results are a flat dictionary: each key is a file's full path (including the file name) and each value is the raw contents of that file.
terence.results = {
    'frontend/app/index.html': '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head></html>...',
    'frontend/app/styles/globals.css': 'body {\n font-family: Arial, sans-serif;\n...}\nh1 {\n color: #333;\n}'
}
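A flat path-keyed dictionary is easy to summarize with the standard library. As an illustrative sketch, the hypothetical helper `count_by_extension` below tallies files per extension; the sample dictionary is made up and simply mirrors the shape of terence.results.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(results):
    """Count files per extension in a flat {path: contents} dictionary."""
    counts = Counter()
    for file_path in results:
        # Repository paths use forward slashes, so PurePosixPath fits on any OS
        suffix = PurePosixPath(file_path).suffix or "(none)"
        counts[suffix] += 1
    return dict(counts)


# Stand-in for terence.results
sample_results = {
    "frontend/app/index.html": "<!DOCTYPE html>",
    "frontend/app/styles/globals.css": "body {}",
    "frontend/app/styles/reset.css": "* {}",
}

print(count_by_extension(sample_results))
```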
Repository Information
terence.scan_repository("https://github.com/user/repo_name")
repo_info = terence.get_repo_info()
# Example return value
repo_info = {
    'owner': 'user',
    'repo': 'repo_name',
    'url': 'https://github.com/user/repo_name'
}
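To make the relationship between the repository URL and these fields concrete, here is a rough sketch that derives the same three keys from a URL. `parse_github_url` is a hypothetical function written for this example; it is not Terence's parser, and the strict host check and ".git" handling are assumptions.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a GitHub repository URL into owner, repo, and a canonical URL.

    Illustrative only: raises ValueError for anything that is not
    http(s)://github.com/<owner>/<repo>.
    """
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    if (
        parsed.scheme not in ("http", "https")
        or parsed.netloc.lower() != "github.com"
        or len(parts) < 2
    ):
        raise ValueError(f"Invalid GitHub URL: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        raise ValueError(f"Invalid GitHub URL: {url!r}")
    return {
        "owner": owner,
        "repo": repo,
        "url": f"https://github.com/{owner}/{repo}",
    }


print(parse_github_url("https://github.com/pallets/flask"))
```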
Rate Limit Management
The GitHub API allows 5,000 requests per hour for an authenticated token and 60 per hour for unauthenticated requests.
Terence raises a RateLimitException when too few requests remain to start a new repository scan.
rate = terence.get_rate_limit()
# Example return value
rate = {
    'remaining': 4102,
    'limit': 5000,  # GitHub limit
    # Date format yyyy-mm-dd hr:min:sec+00:00 timezone
    'reset': datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc)
}
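Since 'reset' is a timezone-aware datetime, working out how long to wait is simple arithmetic. The sketch below uses two hypothetical helpers, `seconds_until_reset` and `has_budget`, on a dictionary shaped like the sample above; the default threshold of 10 mirrors the figure given under Error Types, but how Terence applies it internally is an assumption.

```python
import datetime


def seconds_until_reset(rate, now=None):
    """Return the seconds remaining until the rate limit resets (never negative)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(0.0, (rate["reset"] - now).total_seconds())


def has_budget(rate, minimum=10):
    """Return True if at least `minimum` requests remain."""
    return rate["remaining"] >= minimum


# Dictionary shaped like the value returned by terence.get_rate_limit()
rate = {
    "remaining": 4102,
    "limit": 5000,
    "reset": datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc),
}

# A fixed "now" keeps the example reproducible
now = datetime.datetime(2025, 12, 4, 18, 0, 0, tzinfo=datetime.timezone.utc)
print(has_budget(rate), seconds_until_reset(rate, now))
```

Passing `now` explicitly is only for reproducibility; in normal use you would omit it and let the helper read the current UTC time.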
Clearing Data
# Clear results but stay authenticated
terence.clear_results()
# Clear everything (deauthenticate)
terence.clear_all()
Sample Error Handling
from terence import Terence, RateLimitException
terence = Terence()
terence.auth("ghp_your_token_here")
try:
    terence.scan_repository("https://github.com/owner/repo")
    print(f"Success! Found {len(terence.results)} files")
except RateLimitException as e:
    print(f"Rate limit reached: {e}")
    # Wait until reset time or use different token
except ValueError as e:
    print(f"Invalid input: {e}")
    # Check URL format or extension list
except Exception as e:
    print(f"Error: {e}")
    # Handle authentication, repo not found, etc.
File Filtering
Allowed Extensions
By default, Terence scans these file types:
- Python: .py
- JavaScript/TypeScript: .js, .jsx, .ts, .tsx
- Web: .html, .htm, .css, .scss, .sass, .vue, .svelte
- Java: .java
- C/C++: .c, .cpp, .h, .hpp, .cc
- Other: .go, .rs, .rb, .php, .swift, .kt, .cs
Excluded Directories
The following directories are automatically excluded:
node_modules/, .git/, venv/, env/, .venv/, __pycache__/, dist/, build/, .next/, .nuxt/, target/, bin/, obj/, test/, tests/, .pytest_cache/, coverage/
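The two lists above amount to a simple per-path filter. As a minimal sketch of how such a filter could look, the hypothetical function `should_scan` below combines them; it is not Terence's actual implementation, and matching excluded directories at any depth and comparing extensions case-insensitively are assumptions.

```python
from pathlib import PurePosixPath

# Values copied from the lists in this section
ALLOWED_EXTENSIONS = {
    ".py", ".js", ".jsx", ".ts", ".tsx",
    ".html", ".htm", ".css", ".scss", ".sass", ".vue", ".svelte",
    ".java", ".c", ".cpp", ".h", ".hpp", ".cc",
    ".go", ".rs", ".rb", ".php", ".swift", ".kt", ".cs",
}
EXCLUDED_DIRS = {
    "node_modules", ".git", "venv", "env", ".venv", "__pycache__",
    "dist", "build", ".next", ".nuxt", "target", "bin", "obj",
    "test", "tests", ".pytest_cache", "coverage",
}


def should_scan(file_path):
    """Return True if the path has an allowed extension and no excluded parent directory."""
    path = PurePosixPath(file_path)
    # Check every directory component, not the file name itself
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix.lower() in ALLOWED_EXTENSIONS


for candidate in ["src/app.py", "node_modules/lib/index.js", "README.md"]:
    print(candidate, should_scan(candidate))
```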
Error Types
RateLimitException
Raised when GitHub API rate limit is too low (< 10 requests remaining).
from terence import RateLimitException
try:
    terence.scan_repository(url)
except RateLimitException as e:
    print(f"Rate limit reached: {e}")
ValueError
Raised when:
- The GitHub URL format is invalid
- An extension is not in the allowed extensions list
Exception
Raised for:
- Not authenticated
- Invalid GitHub token
- Repository not found (or private)
- Other GitHub API errors
Development
Setup
git clone https://github.com/yourusername/terence.git
cd terence
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest tests/test_client.py -v
# Run specific test
pytest tests/test_client.py::TestTerence::test_auth -v
# Run with coverage
pytest tests/test_client.py --cov=terence --cov-report=html
Requirements
- Python 3.7+
- PyGithub >= 2.1.1
- python-dotenv >= 1.0.0
License
MIT License - see LICENSE file for details
Contributions & Support
Contributions are welcome! Feel free to fork and submit a pull request.
For any questions or concerns, please reach out to me at louieyin6@gmail.com
Author
Created by Louie Yin (GarfieldFluffJr)
File details
Details for the file terence-1.0.2.tar.gz.
File metadata
- Download URL: terence-1.0.2.tar.gz
- Upload date:
- Size: 11.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d98b811c03f72571a2ed332462f21fbd1d5aa3508d9a119dd28198971296539 |
| MD5 | cc5e15ed13e8915455efa6b6b3d9d372 |
| BLAKE2b-256 | 0ea766474b01bfc869f840edd639db07e7ccaa98bc2097cd1d80f7ed20f5b7ae |
File details
Details for the file terence-1.0.2-py3-none-any.whl.
File metadata
- Download URL: terence-1.0.2-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2bb6b42df97c0e4900f9cdaf2ebd1fd460a01eaf983319a8fa3ffa24d8f01994 |
| MD5 | c1f82ad07db6a2706b46db7d8f121d39 |
| BLAKE2b-256 | 5bba600c9f0001fdb8fbc674e8bec3ac99f04cdb1d9936c31731f4c074480f4f |