Extract accounts' identifiers and metadata from personal pages on various platforms.

Maintainers

soxoj

Project description

socid_extractor

Turn any public profile page into a structured account record — usernames, display names, bios, avatars, locations, joined-at dates, follower counts, external links, and the stable internal identifiers that uniquely pin an account across renames, redesigns, and deletions.

socid_extractor parses HTML pages and API responses from 130+ platforms and returns a flat, machine-readable dictionary of account fields. No API keys required, no headless browser — just a single function call on response text.

Why it's useful

Stable cross-service IDs. Get GAIA ID (Google), Facebook UID, Yandex Public ID, Instagram pk, and dozens more — values that survive username changes and let you correlate accounts across leaks, archives, and search-engine indices.
One uniform interface. Same extract() call for Instagram, GitHub, VK, Reddit, Substack, Bluesky, TikTok — no per-platform glue code on your side.
Field ontology. Normalized field names across platforms (username, fullname, created_at, is_verified, …) so downstream pipelines don't need 130 mappings.
Battle-tested. Powers Maigret and a number of other OSINT tools.

Installation

Requires Python 3.10 or later.

pip install socid-extractor

For a clean CLI install on a workstation:

pipx install socid-extractor

The latest development version:

pip install -U git+https://github.com/soxoj/socid-extractor.git

Quick start

As a CLI:

$ socid_extractor --url https://www.deviantart.com/muse1908
country: France
created_at: 2005-06-16 18:17:41
gender: female
username: Muse1908
website: www.patreon.com/musemercier
links: ['https://www.facebook.com/musemercier', 'https://www.instagram.com/muse.mercier/', 'https://www.patreon.com/musemercier']
tagline: Nothing worth having is easy...

As a Python library:

import requests
import socid_extractor

r = requests.get('https://www.patreon.com/annetlovart')
print(socid_extractor.extract(r.text))
# {'patreon_id': '33913189', 'patreon_username': 'annetlovart',
#  'fullname': 'Annet Lovart',
#  'links': "['https://www.facebook.com/322598031832479', ...]"}
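
In the output above, links arrives as a string holding a Python list literal rather than as an actual list. A minimal sketch for turning such a value back into a list, assuming the string is a well-formed literal; parse_list_field is a hypothetical helper name and not part of socid_extractor, and the sample record is made up to mirror the shape shown above:

```python
import ast


def parse_list_field(value):
    """Return a list from a field that may be a stringified Python list."""
    if isinstance(value, list):
        return value
    try:
        # literal_eval only accepts literals, so it never executes code.
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a literal: treat the raw value as a single item.
        return [value]
    return list(parsed) if isinstance(parsed, (list, tuple)) else [value]


# Sample record shaped like the extract() output shown above.
record = {
    'patreon_username': 'annetlovart',
    'links': "['https://www.facebook.com/322598031832479']",
}
links = parse_list_field(record['links'])
print(links)
```

ast.literal_eval is used instead of eval because the field content comes from a remote page and should not be executed.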

Tip for batch runs: pass --skip-fetch-if-no-url-hint to skip the HTTP request when the URL doesn't match any known site hint. This is faster, but it may skip generic engines such as forum templates:

$ socid_extractor --url https://example.com/foo --skip-fetch-if-no-url-hint
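
The idea behind the flag can be pictured as a cheap pattern check that runs before any network call. The sketch below is illustrative only: the URL_HINTS table and the should_fetch helper are hypothetical and do not reflect socid_extractor's internal hint list or code.

```python
import re

# Hypothetical hint table: scheme name -> URL pattern (not the library's real one).
URL_HINTS = {
    'DeviantArt': re.compile(r'^https?://(www\.)?deviantart\.com/[^/]+/?$'),
    'Patreon': re.compile(r'^https?://(www\.)?patreon\.com/[^/]+/?$'),
}


def should_fetch(url, skip_if_no_hint=True):
    """Decide whether a URL is worth an HTTP request."""
    has_hint = any(p.match(url) for p in URL_HINTS.values())
    # Without the skip option every URL is fetched, so generic engines
    # (for example forum templates on arbitrary domains) still get a chance.
    return has_hint or not skip_if_no_hint


urls = [
    'https://www.deviantart.com/muse1908',
    'https://example.com/foo',
]
to_fetch = [u for u in urls if should_fetch(u)]
print(to_fetch)
```

The trade-off matches the note above: a URL on an unknown domain is dropped without a request, even if the page behind it runs a forum engine that a generic scheme could have parsed.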

Supported sites

130+ schemes — see METHODS.md for the full list.

A non-exhaustive sample:

Major networks: Facebook (user & group pages), Instagram, VK.com, OK.ru, Reddit, TikTok, Bluesky, Tumblr, Flickr
Google ecosystem: Google docs/maps contributions (cookies required), Google Play, YouTube
Mail.ru: my.mail.ru user mainpage, photo, video
Dev / writing platforms: GitHub, Stack Overflow (HTML + API), LeetCode, Hashnode, Medium, Substack, Paragraph, WordPress.org, Virgool
Forums (universal detectors): Discourse, MediaWiki / Fandom wikis, Mastodon
Niche / vertical: Chess.com, Roblox, MyAnimeList, Scratch, Wikipedia, DailyMotion, SlideShare, Weebly, Calendly, Amazon Author, Boosty, Warpcast (Farcaster), Fragment (TON/Telegram), Rarible, CSSBattle, lnk.bio, Spatial, TwitchTracker, Max (max.ru)

…and many others.

For data examples, see tests/test_e2e.py; for the parsing logic, see socid_extractor/schemes.py; for the field ontology, see FIELDS.md.

Use cases

Pivot from a profile to everything you can see. One call returns the visible info plus the hidden internal IDs the platform uses behind the scenes. Background reading: Week in OSINT — Getting a grasp on Google IDs.
Track accounts across renames, redesigns, and deletions. Stable IDs (GAIA, FB UID, Yandex Public ID, Instagram pk, …) let you re-identify the same person even when every visible field has changed. Background: Aware Online — User IDs in social-media investigations.
Search by cross-service UID. Once you have a stable identifier you can pivot into:
- SQL / leaked databases (forum dumps, breach data) where the UID is the join key,
- Google / Yandex / archive.org indices that captured URLs containing the UID.
Feed downstream OSINT tooling. A normalized record is much easier to ingest than per-site scrapers — used by Maigret and similar tools for enrichment.
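
The rename-tracking use case above can be sketched with plain dictionaries. The records below are made-up examples shaped like extract() output, and match_by_stable_id is a hypothetical helper, not part of the library; it assumes the name of the ID field (patreon_id here) is known in advance.

```python
def match_by_stable_id(old_records, new_records, id_field):
    """Pair records that share the same stable ID, whatever else changed."""
    old_by_id = {r[id_field]: r for r in old_records if id_field in r}
    pairs = []
    for new in new_records:
        old = old_by_id.get(new.get(id_field))
        if old is not None:
            pairs.append((old, new))
    return pairs


# Made-up snapshots of the same account before and after a rename.
snapshot_before = [{'patreon_id': '1001', 'patreon_username': 'old_name'}]
snapshot_after = [
    {'patreon_id': '1001', 'patreon_username': 'new_name'},
    {'patreon_id': '2002', 'patreon_username': 'someone_else'},
]

pairs = match_by_stable_id(snapshot_before, snapshot_after, 'patreon_id')
for old, new in pairs:
    print(old['patreon_username'], '->', new['patreon_username'])
```

The same join works against a leaked database or an archive index, as long as the stable ID appears there under a known column or URL component.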

Commercial Use

The open-source socid_extractor is MIT-licensed and free for commercial use without restriction — but page parsers break over time as platforms change their HTML and APIs, and they need active maintenance.

For serious commercial use — with a maintained private plugin pack of extra parsers or a hosted extraction API — reach out: 📧 socid@soxoj.com

Private parser plugin: 100+ additional checks on top of the public 130+ sites, kept up to date as platforms change (separate from the public open-source database)
Extraction API — integrate socid_extractor into your product

SOWEL classification

Maps to the following SOWEL techniques:

Tools using socid_extractor

Maigret — powerful namechecker that generates a report with all available info from accounts found across 3000+ sites.
TheScrapper — scrape emails, phone numbers, and social-media accounts from a website.
InfoHunter — open-source OSINT tool to search, collect, and analyze information online.
YaSeeker — gather all available information about a Yandex account by login/email.
Marple — scrape search-engine results for a given username.

Testing

Install the test extras from pyproject.toml, then run pytest:

pip install '.[test]'   # pytest, pytest-rerunfailures, pytest-xdist
python3 -m pytest tests/test_e2e.py -n 10 -k 'not cookies' -m 'not github_failed and not rate_limited'

Use pip install '.[dev]' instead if you also want flake8 / mypy / black (the full set used by CI).

Every new scheme must have an e2e test in tests/test_e2e.py hitting a real URL/API. Unit tests with inline fixtures (tests/test_socid_improvements.py) are also required but do not replace e2e coverage. See docs/testing-and-ci.md for details.

Developer documentation (architecture, modules, CI) lives in docs/.

Contributing

See the contributing guide if you want to add a new scheme or fix anything.


Release history

Version	Released
0.1.0 (this version)	May 26, 2026
0.0.28	Apr 9, 2026
0.0.27	Dec 12, 2024
0.0.26	Oct 11, 2023
0.0.25	Jul 24, 2023
0.0.24	Jul 7, 2023
0.0.23	Jan 2, 2022
0.0.22	Jun 23, 2021
0.0.21	May 31, 2021
0.0.20	May 15, 2021
0.0.19	May 12, 2021
0.0.18	May 4, 2021
0.0.17	Apr 18, 2021
0.0.16	Mar 29, 2021
0.0.15	Mar 21, 2021
0.0.14	Mar 18, 2021
0.0.13	Mar 14, 2021
0.0.12	Feb 21, 2021
0.0.11	Feb 18, 2021
0.0.10	Feb 15, 2021
0.0.9	Feb 6, 2021
0.0.8	Feb 3, 2021
0.0.7	Feb 1, 2021
0.0.6	Jan 31, 2021
0.0.5	Jan 31, 2021
0.0.4	Jan 16, 2021
0.0.3	Jan 15, 2021
0.0.2	Dec 4, 2020

Download files

Source Distribution

socid_extractor-0.1.0.tar.gz (85.4 kB)

Uploaded May 26, 2026 Source

Built Distribution

socid_extractor-0.1.0-py3-none-any.whl (45.6 kB)

Uploaded May 26, 2026 Python 3

File details

Details for the file socid_extractor-0.1.0.tar.gz.

File metadata

Download URL: socid_extractor-0.1.0.tar.gz
Upload date: May 26, 2026
Size: 85.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for socid_extractor-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`94e5e35be06fc3b281900122e12e8feb1a895b189417e311e07f219522d0789e`
MD5	`bc2f6a7ab3cdb1b580df149b527bb7a0`
BLAKE2b-256	`1f9299810d37c81a2ca36f55f0cbc4a0aa6fd7464238b2fa4cf0bab4c52b52eb`
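
A downloaded archive can be checked against the SHA256 digest listed above before it is installed. A minimal sketch using only the standard library; the local file path is an assumption about where the download was saved, and sha256_of is a hypothetical helper name:

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = '94e5e35be06fc3b281900122e12e8feb1a895b189417e311e07f219522d0789e'


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path('socid_extractor-0.1.0.tar.gz')  # assumed download location
if archive.exists():
    ok = sha256_of(archive) == EXPECTED_SHA256
    print('digest matches' if ok else 'digest MISMATCH, do not install')
```

SHA256 is the digest to rely on here; MD5 is listed for completeness but is not collision-resistant.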

Provenance

The following attestation bundles were made for socid_extractor-0.1.0.tar.gz:

Publisher: python-publish.yml on soxoj/socid-extractor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: socid_extractor-0.1.0.tar.gz
- Subject digest: 94e5e35be06fc3b281900122e12e8feb1a895b189417e311e07f219522d0789e
- Sigstore transparency entry: 1632897463
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: soxoj/socid-extractor@af708c86133cb70589de0bb776a4597451fb4acf
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/soxoj
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@af708c86133cb70589de0bb776a4597451fb4acf
- Trigger Event: release

File details

Details for the file socid_extractor-0.1.0-py3-none-any.whl.

File metadata

Download URL: socid_extractor-0.1.0-py3-none-any.whl
Upload date: May 26, 2026
Size: 45.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for socid_extractor-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d19303436f97d394a937ccab576e60fa107aeb2b2cb56a158dcdf62cd8953b05`
MD5	`414dc1aac94758c15a953c68a1acde3a`
BLAKE2b-256	`03213801eb16cf4540975ecbb6c53257c477784f97a1df12d08f51979fe88f1a`

Provenance

The following attestation bundles were made for socid_extractor-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on soxoj/socid-extractor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: socid_extractor-0.1.0-py3-none-any.whl
- Subject digest: d19303436f97d394a937ccab576e60fa107aeb2b2cb56a158dcdf62cd8953b05
- Sigstore transparency entry: 1632897546
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: soxoj/socid-extractor@af708c86133cb70589de0bb776a4597451fb4acf
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/soxoj
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@af708c86133cb70589de0bb776a4597451fb4acf
- Trigger Event: release
