
An OSINT utility for downloading, analyzing, and detecting potentially suspicious activity patterns in GitHub profiles

Project description

GitHub Profile Analyzer

A powerful OSINT tool for analyzing GitHub profiles and detecting suspicious activity patterns. This tool helps identify potential bot accounts, scammers, and fake developer profiles by analyzing various aspects of GitHub activity. It also comes with a set of handy tools for scanning and extracting multiple types of metadata from a GitHub profile, organization, or repository.

NOTE: For a comprehensive solution for monitoring your GitHub organization, analyzing contributors, and actively alerting on potential impersonation or other GitHub-related threats, contact SEAL911. SEAL operates the project-wide version of this software. This package is not optimized for speed; its main goal is to support individual security researchers.

NOTE: The project was made possible by a contribution from the Ethereum Ecosystem Support Program. All investigations conducted by the Ketman Project were made with the help of gh-fake-analyzer.

Features

  • Profile Analysis: Download and analyze complete GitHub profile data
  • Commit Analysis: Detect copied commits and suspicious commit patterns
  • Identity Detection: Track email/name variations and potential identity rotation
  • Organization Scanning: Analyze contributors across entire organizations and repositories
  • Activity Monitoring: Real-time monitoring of profile changes and activities
  • Advanced Tools:
    • Commit author lookup
    • Activity checking
    • Search result dumping
    • Organization scanning
    • Repository scanning
    • Finding interesting files in repositories
    • Automatically flagging accounts against a list of your own IOCs

Installation

NOTE: If you cloned the repository before the 1.0.0 release, re-download the whole package. We rewrote the commit history to make the repository more lightweight.

pip install gh-fake-analyzer

Requirements

  • Python 3.7 or higher
  • Git installed on your system (e.g., sudo apt install git on Debian/Ubuntu)

GitHub Token Setup

You need a GitHub API token for full functionality. Set it up in one of these ways:

  1. Create a .env file with GH_TOKEN=<your_token>
  2. Use --token <your_token> flag when running commands
  3. Set environment variable: export GH_TOKEN=<your_token>
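A common resolution order for options like this is flag first, then environment variable, then .env file; a minimal sketch of that precedence (the resolve_token helper is illustrative, not the package's actual implementation):

```python
import os
from pathlib import Path

def resolve_token(cli_token=None, env_file=".env"):
    """Resolve a GitHub token: --token flag first, then the GH_TOKEN
    environment variable, then a GH_TOKEN=... line in a local .env file."""
    if cli_token:
        return cli_token
    token = os.environ.get("GH_TOKEN")
    if token:
        return token
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "GH_TOKEN":
                return value.strip()
    return None
```

Whichever method you choose, the token raises the GitHub API rate limit from 60 to 5,000 requests per hour, which matters for large profiles.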

Local Installation

# Clone the repository
git clone https://github.com/shortdoom/gh-fake-analyzer.git
cd gh-fake-analyzer

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate

# Install the package in development mode
pip install -e .

Configuration for Development

  1. Create a local config.ini file in your working directory:
[LIMITS]
MAX_FOLLOWING = 1000
MAX_FOLLOWERS = 1000
MAX_REPOSITORIES = 1000
CLONE_DEPTH = 100
CLONE_BARE = True
MONITOR_SLEEP = 10
REMOVE_REPO = True
  2. Set up your GitHub token in .env:
echo "GH_TOKEN=your_token_here" > .env
  3. Test the installation:
gh-analyze --help

Usage

Quick Start Recipe

The most common flow for using gh-fake-analyzer in CTI-related tasks is:

gh-analyze <username>

# or

gh-analyze --targets <path/to/newlinefile/targets>

# then, for a quick view (supply the full path in place of <username> if the report is not in the standard out/ path)

gh-analyze --parse <username> --summary

# another frequently used command extracts full contributor information from organizations; it also works with a list of organizations via --org-targets

gh-analyze --tool scan_organization --scan-org <org_name>

# optionally, append --full-analysis to immediately perform full scan on each contributor

gh-analyze --tool scan_organization --scan-org <org_name> --full-analysis

# you can also scan an individual repository

gh-analyze --tool scan_repository --scan-repo owner/repo_name --full-analysis

# scan_organization and scan_repository run the `check_activity` tool in the background against all found contributors. for best results, supply lists of usernames and organizations in the target_list/ files to flag any that appear in the contributor data.

gh-analyze --tool check_activity --targets <file>

It is good practice to update the target_list/ files with your own indicators (usernames and names of organizations). Avoid committing changes to those files in public.

  • USERNAMES: usernames you consider suspicious and want to be alerted about if a scanned account has any relation to them.
  • ORGANIZATIONS: organizations you consider suspicious and want to be alerted about if a scanned account has any relation to them.
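A targets file is plain text with one username per line, the same format accepted by --targets elsewhere. For example (the usernames here are placeholders):

```shell
# Create a targets file (one username per line)
cat > targets.txt <<'EOF'
example-user-1
example-user-2
example-org-maintainer
EOF

# Then feed it to the analyzer:
# gh-analyze --targets targets.txt
```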

All Commands

Basic Profile Analysis

# Analyze a single user
gh-analyze <username>

# Analyze multiple users from a file (one username per line)
gh-analyze --targets <file>

# Custom output directory
gh-analyze <username> --out_path /path/to/dir

# Include forked repositories in analysis (default: off)
gh-analyze <username> --forks

# Only fetch basic profile data (no commits, followers, etc.)
gh-analyze <username> --only_profile

# Regenerate report from existing data without fetching from GitHub
gh-analyze <username> --regenerate

Advanced Analysis

# Search for copied commits in a specific repository
gh-analyze <username> --commit_search <repo_name>

# Search for copied commits across all repositories
gh-analyze <username> --commit_search

# Monitor user activity in real-time
gh-analyze <username> --monitor

# Monitor multiple users from a file
gh-analyze --targets <file> --monitor

# Parse and display specific data from an existing report (<username> must be in the default out/ directory; otherwise, supply the full path)
gh-analyze --parse <username> --key <output_key>

# Display a summary of a profile (<username> must be in the default out/ directory; otherwise, supply the full path)
gh-analyze --parse <username> --summary

# Quick-dump specific data to a file
gh-analyze --parse <username> --key unique_emails >> dump.txt
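Since reports are plain JSON, the --parse output can also be reproduced directly in a script; a minimal sketch, assuming a report.json laid out as documented in the Output section below (load_report_key is an illustrative helper, not part of the package):

```python
import json
from pathlib import Path

def load_report_key(report_path, key):
    """Read a gh-fake-analyzer report.json and return one top-level key,
    mirroring what `gh-analyze --parse <username> --key <key>` prints."""
    report = json.loads(Path(report_path).read_text())
    return report.get(key, [])
```

For example, `load_report_key("out/<username>/report.json", "unique_emails")` returns the same list that the quick-dump command above appends to dump.txt.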

Organization Analysis

# Scan a single organization
gh-analyze --tool scan_organization --scan-org <org_name>

# Scan multiple organizations from a file
gh-analyze --tool scan_organization --org-targets <file>

# Perform full analysis for each contributor (generates report.json file for each contributor)
gh-analyze --tool scan_organization --scan-org <org_name> --full-analysis

# Scan an individual repository
gh-analyze --tool scan_repository --scan-repo owner/repo_name

Advanced Tools

# Get detailed commit author information
gh-analyze --tool get_commit_author --commit-author <sha>

# Search GitHub users
gh-analyze --tool dump_search_results --search "<query>" --endpoint users

# Search GitHub code
gh-analyze --tool dump_search_results --search "<query>" --endpoint code

# Check activity patterns of multiple users; requires a targets/connections_filter/usernames file listing the target usernames to check activity against
gh-analyze --tool check_activity --targets <file>

# Find interesting files in user's repositories (.txt, .pdf, binary files etc.)
gh-analyze --tool find_interesting_files <username>

# Find interesting files for multiple users from a file (.txt, .pdf, binary files etc.)
gh-analyze --tool find_interesting_files --targets <file>

# Custom output directory
gh-analyze --tool find_interesting_files <username> --out_path /path/to/dir

# Disable logging to script.log
gh-analyze --logoff

It's possible to develop your own tools by reusing the methods available in modules/analyze.py. Inspect the existing tools' code for examples and inspiration.
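Without touching the internals, you can also script batch runs against the documented CLI; a sketch that assembles gh-analyze invocations from the flags listed above (the helper names are illustrative):

```python
import subprocess

def build_analyze_command(username, out_path=None, forks=False, only_profile=False):
    """Assemble a gh-analyze invocation from the documented CLI flags."""
    cmd = ["gh-analyze", username]
    if out_path:
        cmd += ["--out_path", out_path]
    if forks:
        cmd.append("--forks")
    if only_profile:
        cmd.append("--only_profile")
    return cmd

def analyze_all(usernames, **kwargs):
    """Run gh-analyze sequentially for each username."""
    for user in usernames:
        subprocess.run(build_analyze_command(user, **kwargs), check=True)
```

Note that --targets already covers the common batch case; a wrapper like this is only useful when each target needs different flags.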

Configuration

The tool uses a configuration file at ~/.gh_fake_analyzer/config.ini. You can create a local config.ini to override settings:

[LIMITS]
MAX_FOLLOWING = 1000
MAX_FOLLOWERS = 1000
MAX_REPOSITORIES = 1000
CLONE_DEPTH = 100
CLONE_BARE = True # False if you want to save the source code
MONITOR_SLEEP = 10
REMOVE_REPO = True # False if you want to save the source code
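The file is standard INI, so the effective values can be inspected with Python's configparser; a sketch assuming the [LIMITS] section shown above (read_limits is an illustrative helper, not part of the package):

```python
import configparser

def read_limits(path):
    """Parse the [LIMITS] section of a gh-fake-analyzer config.ini.
    inline_comment_prefixes strips trailing `# ...` comments like the
    ones on CLONE_BARE and REMOVE_REPO above."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read(path)
    limits = parser["LIMITS"]
    return {
        "max_repositories": limits.getint("MAX_REPOSITORIES"),
        "clone_depth": limits.getint("CLONE_DEPTH"),
        "clone_bare": limits.getboolean("CLONE_BARE"),
        "remove_repo": limits.getboolean("REMOVE_REPO"),
    }
```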

Output

Analysis results are saved in the out directory with the following structure:

report.json Structure

The report.json file contains comprehensive data about the analyzed GitHub profile:

Profile Information

  • profile_info: Basic GitHub user profile data
    • login: GitHub username
    • name: Display name
    • location: User's location
    • bio: Profile bio
    • company: Company/organization
    • blog: Website/blog URL
    • email: Public email
    • created_at: Account creation date
    • updated_at: Last profile update
    • followers: Number of followers
    • following: Number of following

Repository Statistics

  • original_repos_count: Number of original repositories
  • forked_repos_count: Number of forked repositories
  • repo_list: Names of all non-forked repositories
  • forked_repo_list: Names of all forked repositories
  • repos: Full repository data for every user repository (includes metadata, languages, stars, etc.)

Social Network Analysis

  • mutual_followers: List of accounts that follow and are followed by the user
  • following: List of accounts the user follows
  • followers: List of accounts following the user

Contribution Analysis

  • unique_emails: Emails and associated names extracted from commit data
  • contributors: User's repositories and their contributors
  • pull_requests_to_other_repos: List of PRs made to other repositories
  • commits_to_other_repos: List of commits made to repositories not owned by the user
  • duplicate_hashes_found: List of repositories containing commits attributed to the owner whose hashes duplicate commits that do not belong to the owner
  • commits: Full commit data for every user repository
  • issues: List of issues opened by the user
  • comments: List of comments made by the user

Activity Tracking

  • recent_events: List of recent events on the analyzed account (last 90 days)
    • Stars
    • Pushes
    • Forks
    • Issues
    • Pull requests
    • Profile updates

Error Tracking

  • errors: List of repositories that failed to retrieve data
    • Network errors
    • DMCA takedowns
    • Access denied
    • Repository not found

Suspicious Activity Indicators

  • potential_copy: List of repositories with first commit date earlier than account creation
  • commit_filter: List of repositories with similar/duplicated commit messages
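When triaging many reports, these indicator keys can be checked programmatically; a sketch assuming the report.json layout documented above (flag_report is an illustrative helper, not part of the package):

```python
import json
from pathlib import Path

def flag_report(report_path):
    """Return a list of reasons a profile deserves a closer look,
    based on the documented report.json indicator keys."""
    report = json.loads(Path(report_path).read_text())
    reasons = []
    if report.get("potential_copy"):
        reasons.append("commits predating account creation")
    if report.get("commit_filter"):
        reasons.append("similar/duplicated commit messages")
    if report.get("duplicate_hashes_found"):
        reasons.append("commit hashes duplicated from other repositories")
    return reasons
```

As the Disclaimer below stresses, treat these as leads for manual review, not verdicts.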

Additional Output Files

  • User avatar downloaded to the output directory
  • script.log: Detailed logging of the analysis process
  • monitoring.log: Activity monitoring logs (when using --monitor)
  • github_cache.sqlite: Request cache created on the first run to speed up re-downloading from the same endpoints within a 1-hour window

Disclaimer

This tool is for reconnaissance purposes only. The confidence in detecting "malicious" GitHub profiles varies, and many regular user accounts may appear in analysis files. Do not make baseless accusations based on this content. All information is sourced from publicly available third-party sources.

Download files


Source Distribution

gh_fake_analyzer-1.0.2.tar.gz (53.6 kB)


Built Distribution


gh_fake_analyzer-1.0.2-py3-none-any.whl (58.3 kB)


File details

Details for the file gh_fake_analyzer-1.0.2.tar.gz.

File metadata

  • Download URL: gh_fake_analyzer-1.0.2.tar.gz
  • Size: 53.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.7

File hashes

Hashes for gh_fake_analyzer-1.0.2.tar.gz
Algorithm Hash digest
SHA256 ace7764aa6a4f84ecc90f0b8619a9a8b2d0c3ff9f02f40d6f6c4dcbdcb224591
MD5 f114000c8b206ad211061ad5ff9136cb
BLAKE2b-256 356c97cedb492733f11459c86b1f2fcccdbd1d382477f5263032b3648617edc6


File details

Details for the file gh_fake_analyzer-1.0.2-py3-none-any.whl.

File hashes

Hashes for gh_fake_analyzer-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 03baa396fe59def7bf184f90016b30071a9757b2e666bc5c6812f6f9accf87d6
MD5 63d1f4b76f24957c2ca20e55f955fc6e
BLAKE2b-256 8566d7e69fed0b843606a940f89c4aaf977057ecfa4ac03e5de6b9a2e6c670a7

