Social Account Detection for Python
socials
Python library and CLI to turn URLs into structured social media profiles.
You have a list of URLs from a scrape, a CSV export, or email signatures. Some of them are social media profiles. Socials finds them and gives you structured data to work with.
| Use case | Description |
|---|---|
| :mag: Extract | Pull social profiles from scraped pages or contact lists |
| :white_check_mark: Validate | Check if URLs are recognized social profiles |
| :arrows_counterclockwise: Normalize | Get consistent usernames from messy URL variations |
| :card_file_box: Categorize | Group URLs by platform or entity type |
| :robot: Automate | Batch process URL files via CLI |
Installation
Note: This README documents the upcoming 1.0 release. To try it, install with pre-release support:
```bash
pip install --pre socials
# or
uv add --pre socials
```
Feedback welcome at GitHub Issues.
For the current stable version (0.3.x), use `pip install socials` and see the v0.3.0 documentation.
Quick Example
```python
import socials

# Parse a single URL
repo = socials.parse("https://github.com/lorey/socials")
print(repo)
# GitHubRepoURL(owner='lorey', repo='socials')
print(repo.platform)
# 'github'
print(repo.owner)
# 'lorey'

# Parse multiple URLs at once
urls = ["https://github.com/lorey", "https://twitter.com/karllorey", "https://example.com"]
result = socials.parse_all(urls)
print(result.all())
# [GitHubProfileURL(username='lorey'), TwitterProfileURL(username='karllorey')]
print(result.by_platform())
# {'github': [...], 'twitter': [...]}
```
Why socials?
- Structured data, not strings. You get typed Python objects with extracted fields like `username`, `repo`, or `company_name`, not just a matched URL string.
- Handles the edge cases. With or without `www.`, with or without trailing slashes, old URL formats, mobile URLs: socials normalizes them all.
- Comprehensive platform coverage. 8 platforms with multiple entity types each: profiles, repos, companies, channels. Continuously updated as platforms change their URL formats.
- Extensible. Need to support an internal tool or a platform we don't cover? Register your own parser and it works with the same API.
- Built for messy real-world data. Lenient by default: unknown URLs return `None` instead of crashing. Strict mode is available when you need validation.
- Type-safe with IDE support. Full type hints, so autocomplete works and you catch bugs before runtime.
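The lenient, extensible behavior described above follows a common pattern: a registry of parsers, each tolerant of URL variations, with `None` as the fallback. Below is a minimal sketch of that pattern, assuming plain regular expressions. It does not import socials, and `ProfileURL`, `register`, `parse`, and the `wiki.example.com` intranet parser are hypothetical names for illustration, not socials' API or internals.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ProfileURL:
    """Hypothetical result type: a platform name plus a normalized username."""
    platform: str
    username: str


Parser = Callable[[str], Optional[ProfileURL]]

# Hypothetical registry of parsers, tried in registration order
PARSERS: List[Parser] = []


def register(parser: Parser) -> Parser:
    """Add a parser to the registry (usable as a decorator)."""
    PARSERS.append(parser)
    return parser


# Optional scheme, optional www./mobile. prefix, optional trailing slash
_GITHUB = re.compile(r"^(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]+)/?$", re.IGNORECASE)
_TWITTER = re.compile(r"^(?:https?://)?(?:www\.|mobile\.)?(?:twitter|x)\.com/(\w+)/?$", re.IGNORECASE)


@register
def parse_github(url: str) -> Optional[ProfileURL]:
    match = _GITHUB.match(url.strip())
    # GitHub usernames are case-insensitive, so lowercase them for consistency
    return ProfileURL("github", match.group(1).lower()) if match else None


@register
def parse_twitter(url: str) -> Optional[ProfileURL]:
    match = _TWITTER.match(url.strip())
    # Twitter/X handles are case-insensitive as well
    return ProfileURL("twitter", match.group(1).lower()) if match else None


# A custom parser for a hypothetical internal tool, registered the same way
_INTRANET = re.compile(r"^https?://wiki\.example\.com/people/([a-z0-9.]+)/?$")


@register
def parse_intranet(url: str) -> Optional[ProfileURL]:
    match = _INTRANET.match(url.strip())
    return ProfileURL("intranet", match.group(1)) if match else None


def parse(url: str) -> Optional[ProfileURL]:
    """Lenient lookup: the first parser that matches wins; otherwise None."""
    for parser in PARSERS:
        result = parser(url)
        if result is not None:
            return result
    return None


print(parse("https://www.github.com/Lorey/"))  # ProfileURL(platform='github', username='lorey')
print(parse("mobile.twitter.com/karllorey"))   # ProfileURL(platform='twitter', username='karllorey')
print(parse("https://example.com"))            # None
```

Returning `None` instead of raising lets a batch job skip non-social URLs without wrapping every call in `try`/`except`; a strict mode would raise instead.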
Features
Typed URL Objects
Each parsed URL is a typed object with platform-specific fields:
```python
import socials

company = socials.parse("https://linkedin.com/company/acme-corp")
print(company)
# LinkedInCompanyURL(company_name='acme-corp')
print(company.platform)
# 'linkedin'
print(company.entity_type)
# 'company'
```
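If you are curious how an object like `LinkedInCompanyURL` can carry both per-URL fields and fixed platform metadata, one common Python approach is a frozen dataclass with `ClassVar` constants. The class below is a hypothetical stand-in written for illustration; it does not import socials and is not its actual class definition.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class LinkedInCompanyURL:
    # Fixed per class: ClassVar attributes are excluded from __init__ and repr
    platform: ClassVar[str] = "linkedin"
    entity_type: ClassVar[str] = "company"
    # Per-URL field extracted from the path
    company_name: str


company = LinkedInCompanyURL(company_name="acme-corp")
print(company)              # LinkedInCompanyURL(company_name='acme-corp')
print(company.platform)     # linkedin
print(company.entity_type)  # company

# frozen=True makes instances hashable, so duplicates collapse in a set
print(len({company, LinkedInCompanyURL("acme-corp")}))  # 1
```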
Hierarchy Navigation
Navigate from a repo to its owner, or from any URL to its root:
```python
import socials

repo = socials.parse("https://github.com/lorey/socials")
print(repo.get_parent())
# GitHubProfileURL(username='lorey')
```
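One way to model this kind of hierarchy is to give every URL type a `get_parent()` method that returns the next level up, or `None` at the top. The sketch below uses hypothetical stand-in classes rather than socials' own, and `get_root` is a hypothetical helper that simply follows parents until none is left.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class GitHubProfileURL:
    username: str

    def get_parent(self) -> Optional[GitHubURL]:
        # In this sketch a profile is the top of the hierarchy
        return None


@dataclass(frozen=True)
class GitHubRepoURL:
    owner: str
    repo: str

    def get_parent(self) -> Optional[GitHubURL]:
        # A repo's parent is its owner's profile
        return GitHubProfileURL(username=self.owner)


GitHubURL = Union[GitHubProfileURL, GitHubRepoURL]


def get_root(url: GitHubURL) -> GitHubURL:
    """Follow get_parent() until there is no parent left."""
    while (parent := url.get_parent()) is not None:
        url = parent
    return url


repo = GitHubRepoURL(owner="lorey", repo="socials")
print(repo.get_parent())  # GitHubProfileURL(username='lorey')
print(get_root(repo))     # GitHubProfileURL(username='lorey')
```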
Batch Extraction
Parse many URLs at once and group the results:
```python
import socials

urls = ["https://github.com/lorey", "https://twitter.com/karllorey"]
result = socials.parse_all(urls)

print(result.all())
# list of all parsed URLs
print(result.by_platform())
# {'github': [...], 'twitter': [...]}
print(result.by_type())
# {'profile': [...]}
```
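Conceptually, grouping by platform or by entity type is a group-by on one attribute of each parsed object. A minimal stand-alone sketch of that idea, using a hypothetical `ParsedURL` record and a hypothetical `group_by` helper in place of socials' own classes and methods:

```python
from collections import defaultdict
from typing import NamedTuple


class ParsedURL(NamedTuple):
    # Hypothetical stand-in for a parsed URL object
    platform: str
    entity_type: str
    value: str


def group_by(items: list[ParsedURL], key: str) -> dict[str, list[ParsedURL]]:
    """Group items by the named attribute, keeping input order within each group."""
    groups: dict[str, list[ParsedURL]] = defaultdict(list)
    for item in items:
        groups[getattr(item, key)].append(item)
    return dict(groups)


parsed = [
    ParsedURL("github", "profile", "lorey"),
    ParsedURL("twitter", "profile", "karllorey"),
    ParsedURL("github", "repo", "lorey/socials"),
]
by_platform = group_by(parsed, "platform")
by_type = group_by(parsed, "entity_type")
print({k: len(v) for k, v in by_platform.items()})  # {'github': 2, 'twitter': 1}
print({k: len(v) for k, v in by_type.items()})      # {'profile': 2, 'repo': 1}
```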
Platform Filtering
Only extract what you need:
```python
import socials

extractor = socials.Extractor(platforms=["github", "linkedin"])
print(extractor.parse("https://twitter.com/someone"))
# None
```
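One way to picture platform filtering is an allowlist check applied to whatever platform a URL resolves to. The sketch below is self-contained and hypothetical: `HOSTS` and `detect_platform` are a deliberately crude host lookup, and `make_filtered_parser` is not part of socials.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical host-to-platform table, for this sketch only
HOSTS = {
    "github.com": "github",
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
}


def detect_platform(url: str) -> Optional[str]:
    # urlsplit lowercases the hostname; strip a leading "www." before lookup
    host = (urlsplit(url).hostname or "").removeprefix("www.")
    return HOSTS.get(host)


def make_filtered_parser(platforms: Iterable[str]) -> Callable[[str], Optional[str]]:
    allowed = frozenset(platforms)

    def parse(url: str) -> Optional[str]:
        platform = detect_platform(url)
        # Unknown hosts and platforms outside the allowlist both yield None
        return platform if platform in allowed else None

    return parse


parse = make_filtered_parser(["github", "linkedin"])
print(parse("https://twitter.com/someone"))                 # None
print(parse("https://www.linkedin.com/company/acme-corp"))  # linkedin
```

Because filtered-out and unrecognized URLs both come back as `None`, callers can treat them the same way.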
Supported Platforms
| Platform | Entity Types | Example Fields |
|---|---|---|
| GitHub | profile, repo | username, owner, repo |
| Twitter/X | profile | username |
| LinkedIn | profile, company | username, company_name |
|  | profile | username |
|  | profile | username |
| YouTube | channel | channel_id, username |
| Phone | phone | phone |
Missing a platform? Open an issue or submit a PR!
CLI
The CLI lets you process URLs directly from the command line. Run it with `uvx` (no install needed) or install it globally with `pip install socials`.
```console
$ uvx socials --help
Usage: socials [OPTIONS] COMMAND [ARGS]...
Extract social media profile URLs from a list of URLs.
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ extract Extract social media URLs from input.                                │
│ check   Check which platform a URL belongs to.                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Examples:
# Find all social links on a webpage
$ curl -s https://karllorey.com | grep -oE 'https?://[^"]+' | socials extract
linkedin https://www.linkedin.com/in/karllorey
github https://github.com/lorey
instagram https://www.instagram.com/karllorey
# Check what platform a URL belongs to
$ socials check https://github.com/lorey
github
```
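When the page content is already in Python rather than in a shell pipe, the `grep -oE` step can be reproduced with the same regular expression; the resulting list is what you would pass to `socials.parse_all`. The helper name `extract_urls` is hypothetical, and unlike `grep -o` it also drops duplicate URLs.

```python
import re

# Same pattern as the grep -oE step: an http(s) URL up to the next double quote
URL_PATTERN = re.compile(r'https?://[^"]+')


def extract_urls(html: str) -> list[str]:
    """Return unique URLs in first-seen order."""
    return list(dict.fromkeys(URL_PATTERN.findall(html)))


html = (
    '<a href="https://github.com/lorey">GitHub</a>'
    '<a href="https://example.com/about">About</a>'
    '<a href="https://github.com/lorey">again</a>'
)
print(extract_urls(html))  # ['https://github.com/lorey', 'https://example.com/about']
```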
Documentation
Full docs at socials.readthedocs.io
- Getting Started - Tutorial with examples
- CLI Reference - Command-line usage
- API Reference - Full API docs
- Architecture - How it works
Related
- Socials API - REST API wrapper. Free hosted version available.
- social-media-profiles-regexs - Regular expressions for social media URLs.
- flutter_url_recognizer - Similar implementation for Flutter.
License
GNU General Public License v3
File details
Details for the file socials-1.0.0a1.tar.gz.
File hashes
- SHA256: 63ba9b7222acdc4cbd3d17169866445cec1144e161c92b6edce2b6b9cd34fcbb
- MD5: 0410e0b9cae81118902aa45807a2251c
- BLAKE2b-256: 254485905d53d150bb7118d71185bb95beaafc156e13ba8c45c9c7abc1322322
Provenance
The following attestation bundles were made for socials-1.0.0a1.tar.gz:
Publisher: release.yml on lorey/socials
Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: socials-1.0.0a1.tar.gz
- Subject digest: 63ba9b7222acdc4cbd3d17169866445cec1144e161c92b6edce2b6b9cd34fcbb
- Sigstore transparency entry: 787097061
Source repository:
- Permalink: lorey/socials@a50ab6ab0193c0d0b5da7ade819518fa3f25590b
- Branch / Tag: refs/tags/v1.0.0a1
- Owner: https://github.com/lorey
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a50ab6ab0193c0d0b5da7ade819518fa3f25590b
- Trigger Event: push