
repo-people



repo-people is a Python package that collects and exports the full GitHub profile for every person associated with a repository — contributors, maintainers, stargazers, watchers, issue/PR authors, fork owners, commit authors and dependents.

Introduction

repo-people provides a single-call pipeline to collect every GitHub user associated with a repository across 9 role categories, fetch 30+ profile fields for each person from the GitHub API, and export the results to JSON, CSV, or Markdown. It is designed for research, open-source community analysis, and developer intelligence workflows.

Key capabilities:

  • Collects users from 9 role categories in a single call
  • Fetches 30+ profile fields per user (bio, location, company, followers, orgs, languages, …)
  • Computes derived metrics: account age, followers/following ratio, repos/year, recently-active flag, bot detection
  • Incremental fetch with save_each_iteration and resume — safe to interrupt and restart on large repos
  • Flexible filtering: roles, exclude, exclude_bots, limit, fields
  • Concurrent fetching via workers — uses ThreadPoolExecutor to fetch multiple profiles in parallel
  • Async fetching via get_users_async() — uses asyncio + aiohttp for high-concurrency scenarios
  • Opt-in social accounts via include_social_accounts — fetches linked LinkedIn, Mastodon, npm, and other accounts
  • Export to JSON, CSV and Markdown table
  • Analysis helpers: summarise() and top_users()
  • Token validated on startup — invalid or expired tokens raise ConnectionError immediately
  • Rate-limit progress printed every 50 users with remaining request count and reset time
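
For illustration, the derived metrics above could be computed from raw profile fields roughly like this. This is a minimal sketch, not the package's actual implementation; derive_metrics is a hypothetical helper, and the output keys follow the Output Fields table below:

```python
from datetime import datetime, timezone

def derive_metrics(profile, now=None):
    """Compute derived signals from raw GitHub profile fields."""
    now = now or datetime.now(timezone.utc)
    created = datetime.fromisoformat(profile["created_at"].replace("Z", "+00:00"))
    age_days = (now - created).days
    years = max(age_days / 365.25, 1e-9)
    return {
        "account_age_days": age_days,
        # Guard against division by zero for users who follow nobody
        "followers_following_ratio": profile.get("followers", 0)
        / max(profile.get("following", 0), 1),
        "repos_per_year": profile.get("public_repos", 0) / years,
        # GitHub marks bot accounts with type "Bot" and a "[bot]" login suffix
        "is_bot": profile.get("type") == "Bot" or profile["login"].endswith("[bot]"),
    }
```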

Background

Understanding who contributes to, uses, and maintains an open-source project is valuable for community health analysis, academic research, and competitive intelligence. GitHub exposes this information across many endpoints (contributors, stargazers, watchers, forks, issues, pull requests, CODEOWNERS, commit history), but collecting and joining it requires many paginated API calls.

repo-people automates that collection, deduplicates users across all roles, enriches each record with the full GitHub profile, and computes additional signals (account age, activity recency, bot detection) in a single pipeline call.
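
The dedup-and-merge step can be sketched as follows. merge_roles is an illustrative helper, not the package's internal function; it collapses per-role username lists into one record per user:

```python
from collections import defaultdict

def merge_roles(role_lists):
    """Deduplicate usernames across role categories.

    role_lists maps a role name to the usernames found for that role;
    the result maps each username to the sorted list of roles it holds.
    """
    roles_by_user = defaultdict(set)
    for role, usernames in role_lists.items():
        for username in usernames:
            roles_by_user[username].add(role)
    return {user: sorted(roles) for user, roles in roles_by_user.items()}
```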


Requirements

  • Python ^3.9
  • PyGithub ^2.0.0 — GitHub API client
  • requests ^2.31.0 — HTTP requests for REST endpoints
  • beautifulsoup4 ^4.12.0 — HTML scraping for dependents
  • aiohttp ^3.9 — async HTTP client for get_users_async()

A GitHub personal access token is strongly recommended. Unauthenticated requests are limited to 60/hour; authenticated requests allow 5,000/hour.


Installation

Install the latest version of repo-people via PyPI using pip:

pip3 install repo-people --upgrade

Installation from source:

git clone https://github.com/amckenna41/repo-people.git
cd repo-people
pip3 install .

Documentation

  • Read the Docs — full package documentation
  • FIELDS.md — full reference table of all 48 output fields with descriptions
  • CHANGELOG.md — version history and release notes

Usage

Quick Start

from repo_people import RepoPeople

rp = RepoPeople("owner", "repo", token="ghp_...")
user_data = rp.get_users(export=True)
# Returns a dict keyed by username, with 30+ profile fields per user

Authentication

import os
rp = RepoPeople("owner", "repo", token=os.environ["GITHUB_TOKEN"])

The token is validated immediately on construction — an invalid or expired token raises ConnectionError before any collection begins.
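
A fail-fast check of this kind can be sketched as follows. validate_token is a hypothetical stand-in for what the constructor does (a single call to GET /user); the HTTP getter is injectable so the sketch can run offline:

```python
def validate_token(token, get=None):
    """Validate a GitHub token up front by calling GET /user.

    Returns the authenticated login; raises ConnectionError on a 401
    so that no collection work starts with a bad token.
    """
    if get is None:  # default to requests.get; injectable for testing
        import requests
        get = requests.get
    resp = get(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    if resp.status_code == 401:
        raise ConnectionError("GitHub token is invalid or expired")
    resp.raise_for_status()
    return resp.json()["login"]
```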

RepoPeople() Constructor

RepoPeople(owner, repo, token=None, outdir=None, skip_codeowners=False, skip_collaborators=False)
  • owner (str, required): GitHub username or organisation that owns the repo.
  • repo (str, required): Repository name.
  • token (str | None, default None): Personal access token. Strongly recommended; validated immediately on init and raises ConnectionError for invalid tokens.
  • outdir (str | None, default "{owner}_{repo}"): Leaf directory inside outputs/. All output files are written under outputs/{outdir}/.
  • skip_codeowners (bool, default False): Skip the CODEOWNERS file when collecting maintainers.
  • skip_collaborators (bool, default False): Skip repo collaborators when collecting maintainers.

get_users() Parameters

  • export (bool, default False): Write results to a JSON file.
  • export_csv (bool, default False): Write results to a CSV file.
  • save_each_iteration (bool, default False): Save after every single user fetch.
  • limit (int | None, default None): Cap the number of profiles to fetch.
  • roles (list[str] | None, default None = all 9): Restrict which roles to collect.
  • exclude (list[str] | None, default None): Usernames to skip.
  • exclude_bots (bool, default False): Skip bot accounts automatically.
  • resume (bool, default False): Skip users already in the output file.
  • verbose (bool, default True): Print progress to stdout.
  • fields (list[str] | str | None, default None = all): Restrict which fields appear in the output. Invalid names raise ValueError before any fetch.
  • include_social_accounts (bool, default False): Fetch each user's linked social accounts (LinkedIn, Mastodon, npm, …). Costs one extra API call per user.
  • workers (int, default 1): Number of concurrent fetch threads. Increase for faster collection on large repos.

Valid roles values: contributors, maintainers, stargazers, watchers, issue_authors, pr_authors, fork_owners, commit_authors, dependents.
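
The fail-fast validation described for roles and fields can be sketched like so. check_roles is an illustrative helper, not the package's internal API; the point is that unknown names fail before any API call is made:

```python
VALID_ROLES = {
    "contributors", "maintainers", "stargazers", "watchers", "issue_authors",
    "pr_authors", "fork_owners", "commit_authors", "dependents",
}

def check_roles(roles):
    """Return the roles to collect, failing fast on unknown names."""
    if roles is None:
        return sorted(VALID_ROLES)  # None means all nine categories
    unknown = set(roles) - VALID_ROLES
    if unknown:
        raise ValueError(f"Unknown roles: {sorted(unknown)}")
    return list(roles)
```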

Examples

Filter by role

# Only gather contributors and stargazers
user_data = rp.get_users(roles=["contributors", "stargazers"])

Limit, exclude, and skip bots

user_data = rp.get_users(
    limit=100,
    exclude=["dependabot", "github-actions[bot]"],
    exclude_bots=True,
)

Export to JSON and CSV

user_data = rp.get_users(export=True, export_csv=True)

Export to Markdown table

rp.export_to_markdown(user_data, fields=["login", "name", "location", "followers"])
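
Under the hood, a Markdown export amounts to joining field values into pipe-delimited rows. A minimal sketch of that idea (to_markdown_table is a hypothetical helper, not the package's implementation):

```python
def to_markdown_table(user_data, fields):
    """Render selected fields of each user record as a Markdown table."""
    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    rows = [
        "| " + " | ".join(str(record.get(f, "")) for f in fields) + " |"
        for record in user_data.values()
    ]
    return "\n".join([header, divider, *rows])
```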

Resume an interrupted run

# First run
rp.get_users(save_each_iteration=True, export=True)

# Resume after interruption
rp.get_users(save_each_iteration=True, export=True, resume=True)
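
The resume behaviour amounts to loading the previous output file and skipping usernames already present in it. A sketch of that logic (pending_users is an illustrative helper, assuming a JSON file keyed by username as in the Quick Start):

```python
import json
from pathlib import Path

def pending_users(all_usernames, output_path):
    """Return the usernames not yet present in a previous run's JSON output."""
    path = Path(output_path)
    # set(dict) yields the keys, i.e. the usernames already fetched
    done = set(json.loads(path.read_text())) if path.exists() else set()
    return [u for u in all_usernames if u not in done]
```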

Concurrent fetching

# Speed up large repos by fetching profiles in parallel
user_data = rp.get_users(workers=4)
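
The workers option maps naturally onto ThreadPoolExecutor. A minimal sketch of that pattern, with the per-user fetch passed in as a plain callable so the example runs without network access (fetch_all is an illustrative helper, not the package's code):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(usernames, fetch_profile, workers=4):
    """Fetch profiles concurrently.

    fetch_profile is any callable taking a username and returning a
    profile dict; pool.map preserves the input order, so zipping back
    against usernames is safe.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        profiles = pool.map(fetch_profile, usernames)
        return dict(zip(usernames, profiles))
```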

Async fetching

import asyncio

user_data = asyncio.run(rp.get_users_async(concurrency=10))
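
High-concurrency async fetching typically bounds in-flight requests with a semaphore. A sketch of that pattern using only asyncio; the package's method uses aiohttp for the HTTP calls, while here fetch_profile is any async callable standing in for the fetcher:

```python
import asyncio

async def fetch_all_async(usernames, fetch_profile, concurrency=10):
    """Fetch profiles concurrently, never exceeding `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(username):
        async with sem:  # blocks when `concurrency` fetches are already running
            return username, await fetch_profile(username)

    pairs = await asyncio.gather(*(bounded(u) for u in usernames))
    return dict(pairs)
```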

Include social accounts

user_data = rp.get_users(include_social_accounts=True)
# Each record gains a 'social_accounts' dict, e.g. {'linkedin': 'https://linkedin.com/in/...'}

Analysis helpers

stats = rp.summarise(user_data, top_n=5)
# {'total': 134, 'top_locations': [('San Francisco', 18), ...], ...}

leaders = rp.top_users(user_data, n=10, by="followers")
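
Ranking by a numeric field reduces to a sorted() call over the records. A sketch of what top_users plausibly does (illustrative, not the package's actual code):

```python
def top_users(user_data, n=10, by="followers"):
    """Return the top n (login, value) pairs ranked by a numeric field."""
    ranked = sorted(
        ((login, record.get(by, 0)) for login, record in user_data.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]
```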

Output Fields

Each user entry contains 30+ fields. See FIELDS.md for the full reference. A summary by category:

  • Identity: login, name, company, location, email_public, blog, twitter, bio
  • Timestamps: created_at, updated_at
  • Counters: followers, following, public_repos, public_gists
  • Flags: has_public_email, has_blog, has_twitter, is_bot, hireable
  • Computed: account_age_days, followers_following_ratio, repos_per_year, recently_active, last_public_event_at
  • Organisations: public_orgs, orgs_public_count
  • Sampled: top_languages, total_public_stars_sampled, total_public_forks_sampled, ssh_keys_count, gpg_keys_count, starred_repos_sampled
  • Social: social_accounts (opt-in via include_social_accounts)
  • Repo-specific: is_collaborator, permission_on_repo
  • Metadata: roles (populated by get_users())

Directories

repo-people/
├── repo_people/          # Package source
│   ├── __init__.py
│   ├── repo_people.py    # RepoPeople class — main pipeline
│   ├── export.py         # Role-specific username collectors (9 functions)
│   ├── users.py          # GitHubUserInfo wrapper and UserSnapshot dataclass
│   └── utils.py          # Shared helpers: paginate(), _headers(), write_csv()
├── tests/                # Unit and integration tests
│   ├── test_repo_people.py
│   ├── test_export.py
│   └── test_users.py
├── docs/                 # Sphinx documentation source
├── outputs/              # Default output directory (created at runtime)
├── FIELDS.md             # Full output field reference
├── CHANGELOG.md          # Version history
├── pyproject.toml        # Package metadata and dependencies
└── README.md


Issues

Bugs and feature requests are tracked on GitHub Issues.

When reporting a bug, please include:

  • Python version (python --version)
  • Package version (pip show repo-people)
  • A minimal code snippet that reproduces the issue
  • The full traceback if an exception is raised

License

Distributed under the MIT License. See the LICENSE file for more details.

Contact

AJ McKenna (amckenna41@qub.ac.uk)






Download files

Download the file for your platform.

Source Distribution

repo_people-0.1.0.tar.gz (26.2 kB)


Built Distribution


repo_people-0.1.0-py3-none-any.whl (27.5 kB)


File details

Details for the file repo_people-0.1.0.tar.gz.

File metadata

  • Download URL: repo_people-0.1.0.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for repo_people-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7cfe0be4c88d2d5e8def549e2bf2d37d020591cd2a1ea2b1d55cd92e4c6c94fe
MD5 5019e7a5f05232ab299ef39168db91e4
BLAKE2b-256 eae90f2cfdfce0de656939ca40f8786e92bbc25a0222e20418c09a0e39ff858a


File details

Details for the file repo_people-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: repo_people-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for repo_people-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b610e6c14a730a2e33892c2502325d32f97652b799fa292472f91cb5157dbed2
MD5 daeadcb2f117a0babcccf0e9e220826c
BLAKE2b-256 cd407d018955be37f9ec191b6fd664343cf224ee2203f2f8e217b4b8438e8006

