Collect and export full GitHub user profile data for everyone associated with a repository.

repo-people

repo-people is a Python package that collects and exports the full GitHub profile for every person associated with a repository — contributors, maintainers, stargazers, watchers, issue/PR authors, fork owners, commit authors and dependents.

Introduction
Background
Requirements
Installation
Documentation
Usage
Directories
Issues
Contact
License

Introduction

repo-people provides a single-call pipeline to collect every GitHub user associated with a repository across 9 role categories, fetch 30+ profile fields for each person from the GitHub API, and export the results to JSON, CSV, or Markdown. It is designed for research, open-source community analysis, and developer intelligence workflows.

Key capabilities:

- Collects users from 9 role categories in a single call
- Fetches 30+ profile fields per user (bio, location, company, followers, orgs, languages, …)
- Computes derived metrics: account age, followers/following ratio, repos/year, recently-active flag, bot detection
- Incremental fetch with save_each_iteration and resume: safe to interrupt and restart on large repos
- Flexible filtering: roles, exclude, exclude_bots, limit, fields
- Concurrent fetching via workers: uses ThreadPoolExecutor to fetch multiple profiles in parallel
- Async fetching via get_users_async(): uses asyncio + aiohttp for high-concurrency scenarios
- Opt-in social accounts via include_social_accounts: fetches linked LinkedIn, Mastodon, npm, and other accounts
- Export to JSON, CSV, and Markdown table
- Analysis helpers: summarise() and top_users()
- Token validated on startup: invalid or expired tokens raise ConnectionError immediately
- Rate-limit progress printed every 50 users with remaining request count and reset time
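
To make the derived metrics listed above concrete, here is a minimal sketch of how account age, followers/following ratio, and repos/year could be computed from raw profile values. The function name `derived_metrics` and the exact formulas (for example, a 365.25-day year) are assumptions for illustration; the package may compute these differently.

```python
from datetime import datetime, timezone


def derived_metrics(created_at, followers, following, public_repos, now=None):
    """Compute illustrative derived metrics for a single profile (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - created_at).days
    age_years = age_days / 365.25  # assumed year length
    return {
        "account_age_days": age_days,
        # Avoid dividing by zero when the user follows nobody.
        "followers_following_ratio": followers / following if following else None,
        # Avoid dividing by zero for accounts created today.
        "repos_per_year": public_repos / age_years if age_years > 0 else None,
    }


metrics = derived_metrics(
    created_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
    followers=120,
    following=40,
    public_repos=30,
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(metrics)
```

Passing `now` explicitly keeps the calculation reproducible, which is convenient when comparing snapshots taken on different days.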

Background

Understanding who contributes to, uses, and maintains an open-source project is valuable for community health analysis, academic research, and competitive intelligence. GitHub exposes this information across many endpoints (contributors, stargazers, watchers, forks, issues, pull requests, CODEOWNERS, commit history), but collecting and joining it requires many paginated API calls.

repo-people automates that collection, deduplicates users across all roles, enriches each record with the full GitHub profile, and computes additional signals (account age, activity recency, bot detection) in a single pipeline call.
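
The deduplication step can be pictured as inverting a role-to-usernames mapping so that each person appears once with every role attached. The sketch below assumes that shape of input; `merge_roles` is a hypothetical helper, not a function exported by repo-people.

```python
def merge_roles(role_to_users):
    """Map each username to the sorted list of roles it was collected under."""
    merged = {}
    for role, usernames in role_to_users.items():
        for username in usernames:
            # A set removes duplicates if a user is listed twice for one role.
            merged.setdefault(username, set()).add(role)
    return {user: sorted(roles) for user, roles in merged.items()}


collected = {
    "contributors": ["alice", "bob"],
    "stargazers": ["bob", "carol"],
    "issue_authors": ["alice"],
}
people = merge_roles(collected)
print(people)
```

Each unique username then needs only one profile fetch, however many roles that person holds.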

Requirements

- Python ^3.9
- PyGithub ^2.0.0: GitHub API client
- requests ^2.31.0: HTTP requests for REST endpoints
- beautifulsoup4 ^4.12.0: HTML scraping for dependents
- aiohttp ^3.9: async HTTP client for get_users_async()

A GitHub personal access token is strongly recommended. Unauthenticated requests are limited to 60/hour; authenticated requests allow 5,000/hour.
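
A quick back-of-the-envelope calculation shows why the token matters. The sketch below assumes one API request per profile, which is a lower bound (the pipeline also spends requests on collecting usernames), and the repo size is a made-up figure.

```python
import math


def hours_needed(num_requests, limit_per_hour):
    """Number of one-hour rate-limit windows needed to issue num_requests calls."""
    return math.ceil(num_requests / limit_per_hour)


profiles = 2_000  # hypothetical number of users to fetch

print(hours_needed(profiles, 60))     # unauthenticated: 34
print(hours_needed(profiles, 5_000))  # authenticated: 1
```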

Installation

Install the latest version of repo-people from PyPI using pip:

pip3 install repo-people --upgrade

Installation from source:

git clone -b main https://github.com/amckenna41/repo-people.git
cd repo-people
pip3 install .

Documentation

- Read the Docs: full package documentation
- FIELDS.md: full reference table of all 48 output fields with descriptions
- CHANGELOG.md: version history and release notes

Usage

Quick Start

How to get a GitHub Personal Access Token

1. Sign in to github.com and go to Settings → Developer settings → Personal access tokens → Tokens (classic).
2. Click Generate new token (classic).
3. Give the token a descriptive name and set an expiration date.
4. Select the following scopes:
   - repo: access to repository metadata, contributors, and collaborators (note that the classic repo scope grants full read and write access to repositories, so treat the token as sensitive)
   - read:user: read user profile data
   - read:org: read organisation membership (needed for public_orgs)
5. Click Generate token and copy it immediately; it won't be shown again.
6. Store it securely (e.g. in an environment variable or a secrets manager) and pass it via the token parameter:

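import os
from repo_people import RepoPeople

# Read the token from the environment rather than hard-coding it
rp = RepoPeople("owner", "repo", token=os.environ["GITHUB_TOKEN"])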
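
Indexing `os.environ` directly raises a bare `KeyError` when the variable is unset. A small wrapper can give a clearer message. This is only a sketch: `read_token` is a hypothetical helper, and the `environ` parameter exists purely to make it testable.

```python
import os


def read_token(var_name="GITHUB_TOKEN", environ=None):
    """Return the token stored in an environment variable, or fail with a clear message."""
    environ = os.environ if environ is None else environ
    token = environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to a GitHub personal access token."
        )
    return token
```

The returned string can then be passed as the token argument in place of `os.environ["GITHUB_TOKEN"]`.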

Tip: Unauthenticated requests are limited to 60/hour. Authenticated requests allow 5,000/hour, making a token essential for any non-trivial repo.

from repo_people import RepoPeople

rp = RepoPeople("owner", "repo", token="ghp_...")
user_data = rp.get_users(export=True)
# Returns a dict keyed by username, with 30+ profile fields per user

Authentication

import os
rp = RepoPeople("owner", "repo", token=os.environ["GITHUB_TOKEN"])

The token is validated immediately on construction — an invalid or expired token raises ConnectionError before any collection begins.

`RepoPeople()` Constructor

RepoPeople(owner, repo, token=None, outdir=None, skip_codeowners=False, skip_collaborators=False)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `owner` | `str` | (required) | GitHub username or organisation that owns the repo. |
| `repo` | `str` | (required) | Repository name. |
| `token` | `str \| None` | `None` | Personal access token. Strongly recommended. Validated immediately on init; raises `ConnectionError` for invalid tokens. |
| `outdir` | `str \| None` | `"{owner}_{repo}"` | Leaf directory inside `outputs/`. All output files are written under `outputs/{outdir}/`. |
| `skip_codeowners` | `bool` | `False` | Skip CODEOWNERS file when collecting maintainers. |
| `skip_collaborators` | `bool` | `False` | Skip repo collaborators when collecting maintainers. |
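
The `outdir` default described in the table can be illustrated with a few lines of `pathlib`. This is a sketch of the documented rule only; `resolve_outdir` is a hypothetical name, and the package's own path handling may differ in detail.

```python
from pathlib import Path


def resolve_outdir(owner, repo, outdir=None, root="outputs"):
    """Return the directory that output files would be written to."""
    # Fall back to "{owner}_{repo}" when no leaf directory is given.
    leaf = outdir if outdir is not None else f"{owner}_{repo}"
    return Path(root) / leaf


print(resolve_outdir("owner", "repo").as_posix())
print(resolve_outdir("owner", "repo", outdir="run1").as_posix())
```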

`get_users()` Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `export` | `bool` | `False` | Write results to a JSON file. |
| `export_csv` | `bool` | `False` | Write results to a CSV file. |
| `save_each_iteration` | `bool` | `False` | Save after every single user fetch. |
| `limit` | `int \| None` | `None` | Cap the number of profiles to fetch. |
| `roles` | `list[str] \| None` | `None` (all 9) | Restrict which roles to collect. |
| `exclude` | `list[str] \| None` | `None` | Usernames to skip. |
| `exclude_bots` | `bool` | `False` | Skip bot accounts automatically. |
| `resume` | `bool` | `False` | Skip users already in the output file. |
| `verbose` | `bool` | `True` | Print progress to stdout. |
| `fields` | `list[str] \| str \| None` | `None` (all) | Restrict which fields appear in output. Invalid names raise `ValueError` before any fetch. |
| `include_social_accounts` | `bool` | `False` | Fetch each user's linked social accounts (LinkedIn, Mastodon, npm, …). Costs one extra API call per user. |
| `workers` | `int` | `1` | Number of concurrent fetch threads. Increase for faster collection on large repos. |

Valid roles values: contributors, maintainers, stargazers, watchers, issue_authors, pr_authors, fork_owners, commit_authors, dependents.
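
When role names come from user input or a config file, it can help to check them before starting a long run. The sketch below validates against the nine names listed above. `validate_roles` is a hypothetical helper, and raising `ValueError` for an unknown role is an assumption modelled on the documented behaviour of `fields`.

```python
VALID_ROLES = (
    "contributors", "maintainers", "stargazers", "watchers", "issue_authors",
    "pr_authors", "fork_owners", "commit_authors", "dependents",
)


def validate_roles(roles=None):
    """Return the roles to collect, rejecting unknown names up front."""
    if roles is None:
        return list(VALID_ROLES)  # None means "all roles"
    unknown = [role for role in roles if role not in VALID_ROLES]
    if unknown:
        raise ValueError(f"Unknown roles {unknown}; valid roles: {list(VALID_ROLES)}")
    return list(roles)


print(validate_roles(["contributors", "stargazers"]))
```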

Examples

Filter by role

# Only gather contributors and stargazers
user_data = rp.get_users(roles=["contributors", "stargazers"])

Limit, exclude, and skip bots

user_data = rp.get_users(
    limit=100,
    exclude=["dependabot", "github-actions[bot]"],
    exclude_bots=True,
)
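
The README does not spell out how bot detection works. One common heuristic, shown here purely as an assumption, is to treat an account as a bot when the GitHub API reports its `type` as `"Bot"` or when its login ends in `[bot]`. `looks_like_bot` is a hypothetical helper.

```python
def looks_like_bot(login, account_type="User"):
    """Heuristic bot check based on the account type and the login suffix."""
    return account_type == "Bot" or login.lower().endswith("[bot]")


print(looks_like_bot("github-actions[bot]"))
print(looks_like_bot("dependabot"))
print(looks_like_bot("dependabot", account_type="Bot"))
```

Under this heuristic a plain login such as `dependabot` is only caught when the account type is known, which is one reason to combine `exclude_bots` with an explicit `exclude` list.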

Export to JSON and CSV

user_data = rp.get_users(export=True, export_csv=True)

Export to Markdown table

rp.export_to_markdown(user_data, fields=["login", "name", "location", "followers"])
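
For readers curious what a Markdown export involves, the sketch below renders a dict of user records as a pipe table restricted to chosen fields. It is not the package's implementation; `to_markdown_table` is a hypothetical function, and the escaping rule is an assumption.

```python
def to_markdown_table(user_data, fields):
    """Render {username: record} as a Markdown pipe table with the given columns."""
    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    rows = []
    for record in user_data.values():
        cells = []
        for field in fields:
            value = record.get(field)
            text = "" if value is None else str(value)
            cells.append(text.replace("|", "\\|"))  # escape pipes inside cell text
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])


sample = {"alice": {"login": "alice", "location": "Belfast", "followers": 10}}
table = to_markdown_table(sample, ["login", "followers"])
print(table)
```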

Resume an interrupted run

# First run
rp.get_users(save_each_iteration=True, export=True)

# Resume after interruption
rp.get_users(save_each_iteration=True, export=True, resume=True)
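
The save-and-resume pattern behind these two flags can be sketched as follows: load whatever was saved, skip usernames already present, and write the file after every fetch. This is an illustration under assumed names (`fetch_with_resume`, `fetch_profile`, a single JSON file); the package's file layout and logic may differ.

```python
import json
from pathlib import Path


def fetch_with_resume(usernames, fetch_profile, out_path):
    """Fetch profiles one by one, persisting progress so a rerun skips finished users."""
    out_path = Path(out_path)
    # Start from previously saved results, if any.
    results = json.loads(out_path.read_text()) if out_path.exists() else {}
    for username in usernames:
        if username in results:
            continue  # already fetched in an earlier run
        results[username] = fetch_profile(username)
        # Save after every user so an interruption loses at most one fetch.
        out_path.write_text(json.dumps(results, indent=2))
    return results
```

Writing after every fetch costs extra disk I/O, which is the trade-off for being able to interrupt a long run safely.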

Concurrent fetching

# Speed up large repos by fetching profiles in parallel
user_data = rp.get_users(workers=4)
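
As a rough picture of what thread-based fetching looks like, the sketch below maps a fetch function over usernames with `ThreadPoolExecutor`. `fetch_all` and `fake_fetch` are hypothetical stand-ins; a real fetch function would perform the HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(usernames, fetch_profile, workers=4):
    """Fetch profiles for a list of usernames using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input list.
        profiles = list(pool.map(fetch_profile, usernames))
    return dict(zip(usernames, profiles))


def fake_fetch(username):
    """Stand-in for a network call."""
    return {"login": username, "followers": len(username)}


result = fetch_all(["alice", "bob", "carol"], fake_fetch, workers=2)
print(result)
```

More workers shorten wall-clock time but spend the hourly rate limit just as quickly, so the total number of requests is unchanged.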

Async fetching

import asyncio

user_data = asyncio.run(rp.get_users_async(concurrency=10))
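
A `concurrency` setting of this kind is typically enforced with a semaphore that caps the number of in-flight requests. The sketch below shows that pattern with plain `asyncio`; `fetch_all_async` and `fake_fetch` are hypothetical names, and how repo-people applies the limit internally is an assumption.

```python
import asyncio


async def fetch_all_async(usernames, fetch_profile, concurrency=10):
    """Fetch profiles concurrently, with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(username):
        async with semaphore:  # wait here if the limit is reached
            return username, await fetch_profile(username)

    pairs = await asyncio.gather(*(bounded(u) for u in usernames))
    return dict(pairs)


async def fake_fetch(username):
    """Stand-in for an aiohttp request."""
    await asyncio.sleep(0)
    return {"login": username}


result = asyncio.run(fetch_all_async(["alice", "bob"], fake_fetch, concurrency=2))
print(result)
```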

Include social accounts

user_data = rp.get_users(include_social_accounts=True)
# Each record gains a 'social_accounts' dict, e.g. {'linkedin': 'https://linkedin.com/in/...'}

Dot-notation field access

get_users() returns a UserDataView — a plain dict subclass that additionally supports dot notation to extract a single field across every user at once:

user_data = rp.get_users()

# Extract one field for all users
emails    = user_data.email_public
# {"alice": {"email_public": "alice@example.com"}, "bob": {"email_public": ""}, ...}

locations = user_data.location
followers = user_data.followers
roles     = user_data.roles

All standard dict operations still work unchanged. Accessing an unrecognised field name raises AttributeError listing the valid field names.
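
The behaviour described above can be approximated with a dict subclass that implements `__getattr__`. The class below is a simplified sketch named `FieldView` to make clear it is not the package's `UserDataView`; it infers valid field names from the records themselves, which is an assumption.

```python
class FieldView(dict):
    """Dict of {username: record} that also exposes one field across all users."""

    def __getattr__(self, field):
        # __getattr__ runs only when normal attribute lookup fails,
        # so dict methods such as .items() are unaffected.
        known = {key for record in self.values() for key in record}
        if field not in known:
            raise AttributeError(
                f"Unknown field {field!r}; valid fields: {sorted(known)}"
            )
        return {user: {field: record.get(field)} for user, record in self.items()}


view = FieldView({
    "alice": {"location": "Belfast", "followers": 3},
    "bob": {"location": "", "followers": 1},
})
print(view.location)
print(view["alice"]["followers"])
```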

Analysis helpers

stats = rp.summarise(user_data, top_n=5)
# {'total': 134, 'top_locations': [('San Francisco', 18), ...], ...}

leaders = rp.top_users(user_data, n=10, by="followers")
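
To show the kind of aggregation these helpers perform, here is a minimal sketch using `collections.Counter` and `sorted`. The functions `top_locations` and `rank_users` are hypothetical and only approximate the idea; the return shapes of `summarise()` and `top_users()` may differ.

```python
from collections import Counter


def top_locations(user_data, top_n=5):
    """Return the most common non-empty locations as (location, count) pairs."""
    counts = Counter(
        record["location"] for record in user_data.values() if record.get("location")
    )
    return counts.most_common(top_n)


def rank_users(user_data, n=10, by="followers"):
    """Return the n usernames with the highest value for the given numeric field."""
    ranked = sorted(
        user_data.items(), key=lambda item: item[1].get(by) or 0, reverse=True
    )
    return [login for login, _ in ranked[:n]]


sample = {
    "alice": {"location": "Belfast", "followers": 30},
    "bob": {"location": "Belfast", "followers": 5},
    "carol": {"location": "", "followers": 12},
}
print(top_locations(sample, top_n=1))
print(rank_users(sample, n=2))
```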

Output Fields

Each user entry contains 30+ fields. See FIELDS.md for the full reference. A summary by category:

| Category | Fields |
| --- | --- |
| Identity | `login`, `name`, `company`, `location`, `email_public`, `blog`, `twitter`, `bio` |
| Timestamps | `created_at`, `updated_at` |
| Counters | `followers`, `following`, `public_repos`, `public_gists` |
| Flags | `has_public_email`, `has_blog`, `has_twitter`, `is_bot`, `hireable` |
| Computed | `account_age_days`, `followers_following_ratio`, `repos_per_year`, `recently_active`, `last_public_event_at` |
| Organisations | `public_orgs`, `orgs_public_count` |
| Sampled | `top_languages`, `total_public_stars_sampled`, `total_public_forks_sampled`, `ssh_keys_count`, `gpg_keys_count`, `starred_repos_sampled` |
| Social | `social_accounts` (opt-in via `include_social_accounts`) |
| Repo-specific | `is_collaborator`, `permission_on_repo` |
| Metadata | `roles` (populated by `get_users()`) |

Directories

repo-people/
├── repo_people/          # Package source
│   ├── __init__.py
│   ├── repo_people.py    # RepoPeople class — main pipeline
│   ├── export.py         # Role-specific username collectors (9 functions)
│   ├── users.py          # GitHubUserInfo wrapper and UserSnapshot dataclass
│   └── utils.py          # Shared helpers: paginate(), _headers(), write_csv()
├── tests/                # Unit and integration tests
│   ├── test_repo_people.py
│   ├── test_export.py
│   └── test_users.py
├── docs/                 # Sphinx documentation source
├── outputs/              # Default output directory (created at runtime)
├── FIELDS.md             # Full output field reference
├── CHANGELOG.md          # Version history
├── pyproject.toml        # Package metadata and dependencies
└── README.md

Issues

Any issues, errors or bugs can be raised via the Issues tab in the repository.

Contact

If you have any questions or comments, please contact amckenna41@qub.ac.uk or raise an issue on the Issues tab.

License

Distributed under the MIT License. See LICENSE for more details.

Release history

- 0.2.0 (this version): Apr 29, 2026
- 0.1.0: Apr 28, 2026

