phone-harvest

A fast CLI tool to extract, validate, and identify phone numbers from text, files, and web pages.

Features

Extract from anywhere — text strings, local files, URLs, or entire websites
International support — recognizes phone numbers from 200+ countries using Google's libphonenumber
Auto-identify — detects country, city/region, carrier, and number type (mobile/landline/toll-free/VoIP)
E.164 normalization — outputs all numbers in standardized +15551234567 format
Number deobfuscation — handles dot-separated (555.123.4567), fullwidth digits, unicode dashes, HTML entities
Smart validation — rejects fake/invalid numbers using Google's phone number database
Recursive crawling — follow links within a domain up to N levels deep
Polite crawling — rate limiting and robots.txt compliance built-in
Country/type filtering — include or exclude by country code or number type
Multiple output formats — plain text, CSV, JSON, JSONL with full metadata
Proxy support — route requests through HTTP/HTTPS proxies
Pipe-friendly — works with stdin/stdout for shell pipelines
Python API — use as a library in your own scripts

Installation

pip install phone-harvest

Requires Python 3.8+ and three dependencies: requests, beautifulsoup4, and phonenumbers.

Quick Start

Extract phone numbers from a web page:

phonex https://example.com/contact

With full details (country, carrier, type):

phonex https://example.com/contact --detail

Output:

+12125551234  US  FIXED_LINE_OR_MOBILE  New York, NY  https://example.com/contact
+442079460958  GB  FIXED_LINE  London  https://example.com/contact

Extract from a local file:

phonex contacts.html --format json --detail

[
  {
    "number": "+12125551234",
    "national": "(212) 555-1234",
    "international": "+1 212-555-1234",
    "country_code": "US",
    "country_name": "New York, NY",
    "carrier": "",
    "type": "FIXED_LINE_OR_MOBILE",
    "source": "contacts.html"
  }
]
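The JSON output can be post-processed with the standard library alone. The sketch below assumes the field names shown in the sample above (`number`, `country_code`); the `group_by_country` helper is hypothetical and not part of phone-harvest.

```python
import json
from collections import defaultdict

# Sample copied from the JSON output above. In practice this text would be
# read from a file written with `--format json --detail -o FILE`.
SAMPLE = """
[
  {
    "number": "+12125551234",
    "national": "(212) 555-1234",
    "international": "+1 212-555-1234",
    "country_code": "US",
    "country_name": "New York, NY",
    "carrier": "",
    "type": "FIXED_LINE_OR_MOBILE",
    "source": "contacts.html"
  }
]
"""


def group_by_country(payload: str) -> dict:
    """Map each country code to the list of E.164 numbers found for it."""
    grouped = defaultdict(list)
    for record in json.loads(payload):
        grouped[record["country_code"]].append(record["number"])
    return dict(grouped)


print(group_by_country(SAMPLE))  # {'US': ['+12125551234']}
```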

Pipe text through stdin:

echo "Call +44 20 7946 0958 or +1 212-555-1234" | phonex -

Crawl an entire website:

phonex https://example.com --depth 2 --max-pages 100 -v

CLI Reference

phonex [OPTIONS] SOURCE...

Region

Flag	Default	Description
`--region, -r CODE`	`US`	Default country for local numbers (ISO 3166-1)

Crawling Options

Flag	Default	Description
`--depth N`	`0`	Recursion depth for URLs (0 = single page)
`--max-pages N`	`50`	Maximum pages to crawl per URL
`--rate-limit SEC`	`1.0`	Seconds between requests
`--ignore-robots`	off	Ignore robots.txt restrictions
`--proxy URL`	—	HTTP/HTTPS proxy
`--timeout SEC`	`10`	Request timeout
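To illustrate what `--rate-limit` and robots.txt compliance mean in practice, here is a minimal standard-library sketch. It is not phone-harvest's implementation; the `PoliteGate` class and its method names are hypothetical, and the robots.txt text is passed in directly so the example runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Hypothetical helper: robots.txt check plus a minimum delay between requests."""

    def __init__(self, robots_txt: str, rate_limit: float = 1.0, user_agent: str = "*"):
        self._robots = RobotFileParser()
        self._robots.parse(robots_txt.splitlines())
        self._rate_limit = rate_limit
        self._user_agent = user_agent
        self._last_request = None  # monotonic timestamp of the previous request

    def allowed(self, url: str) -> bool:
        """Return True if robots.txt permits fetching this URL."""
        return self._robots.can_fetch(self._user_agent, url)

    def wait(self) -> float:
        """Sleep until `rate_limit` seconds have passed since the last call.

        Returns the number of seconds actually slept.
        """
        slept = 0.0
        if self._last_request is not None:
            remaining = self._rate_limit - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_request = time.monotonic()
        return slept


gate = PoliteGate("User-agent: *\nDisallow: /private/", rate_limit=0.05)
print(gate.allowed("https://example.com/contact"))       # True
print(gate.allowed("https://example.com/private/list"))  # False
```

A crawler would call `allowed(url)` before each fetch and `wait()` immediately before sending the request.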

Filtering Options

Flag	Description
`--include-country CODE`	Only keep numbers from this country (repeatable)
`--exclude-country CODE`	Exclude numbers from this country (repeatable)
`--type TYPE`	Only keep: MOBILE, FIXED_LINE, TOLL_FREE, VOIP (repeatable)
`--no-deobfuscate`	Disable number deobfuscation
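The filtering flags combine as include and exclude rules. The sketch below shows one plausible way to evaluate them, assuming that an empty include list means "keep everything" and that exclusions are applied after inclusions; the `Record` class and `keep` function are hypothetical stand-ins, not the tool's own code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for one extracted phone number."""
    number: str
    country_code: str
    number_type: str


def keep(record, include_countries=(), exclude_countries=(), include_types=()):
    """Return True if the record passes the country and type filters."""
    if include_countries and record.country_code not in include_countries:
        return False
    if record.country_code in exclude_countries:
        return False
    if include_types and record.number_type not in include_types:
        return False
    return True


records = [
    Record("+12125551234", "US", "FIXED_LINE_OR_MOBILE"),
    Record("+442079460958", "GB", "FIXED_LINE"),
]
kept = [r.number for r in records if keep(r, include_countries=["GB"])]
print(kept)  # ['+442079460958']
```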

Output Options

Flag	Default	Description
`--format, -f`	`plain`	Output format: `plain`, `csv`, `json`, `jsonl`
`--output, -o FILE`	stdout	Write to file
`--detail, -d`	off	Include country, carrier, type, source
`--national`	off	Output in national format instead of E.164
`--sort`	off	Sort alphabetically
`--count`	off	Print only the count
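By default numbers are written in E.164: a plus sign followed by at most 15 digits, with no leading zero. When consuming the output downstream, a quick shape check can catch lines that are not E.164 (for example, output produced with `--national`). This is a sketch of a format check only; it does not validate that a number exists, and `is_e164_shaped` is a hypothetical helper.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero.
_E164_SHAPE = re.compile(r"\+[1-9]\d{1,14}")


def is_e164_shaped(value: str) -> bool:
    """Return True if the string has the shape of an E.164 number."""
    return _E164_SHAPE.fullmatch(value) is not None


print(is_e164_shaped("+12125551234"))    # True
print(is_e164_shaped("(212) 555-1234"))  # False
```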

Python API

from phone_extractor import PhoneExtractor

ex = PhoneExtractor(default_region="US")

# From text
results = ex.from_text("Call +1 (212) 555-1234 or +44 20 7946 0958")
for r in results:
    print(f"{r.number} | {r.country_code} | {r.country_name} | {r.number_type}")
    # +12125551234 | US | New York, NY | FIXED_LINE_OR_MOBILE
    # +442079460958 | GB | London | FIXED_LINE

# Just the numbers
numbers = ex.extract_simple("Call +1 212-555-1234")
# ['+12125551234']

# From file
results = ex.from_file("contacts.html")

# From HTML with tel: link detection
results = ex.from_html("<a href='tel:+12125551234'>Call</a>")

# With country filtering
ex = PhoneExtractor(
    default_region="US",
    include_countries=["US", "GB"],
    exclude_countries=["RU"],
    include_types=["MOBILE"],
)
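For readers curious what `tel:` link detection involves, the following standard-library sketch collects the targets of `tel:` links from an HTML string. It illustrates the idea only and is not the code behind `from_html`; the `TelLinkParser` class is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class TelLinkParser(HTMLParser):
    """Hypothetical helper: collect the targets of <a href="tel:..."> links."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("tel:"):
            # Strip the scheme and decode percent-escapes such as %2B.
            self.numbers.append(unquote(href[4:]).strip())


parser = TelLinkParser()
parser.feed("<a href='tel:+12125551234'>Call</a> <a href='/about'>About</a>")
print(parser.numbers)  # ['+12125551234']
```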

Web Crawling API

from phone_extractor import PhoneExtractor
from phone_extractor.crawler import WebCrawler

ex = PhoneExtractor(default_region="US")
crawler = WebCrawler(ex, rate_limit=1.0, respect_robots=True)

# Single page
results = crawler.extract_url("https://example.com/contact")

# Recursive crawl
results = crawler.crawl("https://example.com", max_depth=2, max_pages=100)

for r in results:
    print(f"{r.number} ({r.country_name}) found on {r.source}")
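A recursive crawl that stays within one domain needs a way to pick out the links it may follow from each page. The sketch below shows that step with the standard library, assuming "same domain" means an identical host; it is not the `WebCrawler` implementation, and `same_domain_links` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect raw href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url: str, html_text: str) -> list:
    """Return absolute http(s) links on the page that share the page's host."""
    collector = _LinkCollector()
    collector.feed(html_text)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in links:  # keep order, drop duplicates
                links.append(absolute)
    return links


page = (
    '<a href="/about">About</a>'
    '<a href="https://other.com/x">Elsewhere</a>'
    '<a href="tel:+12125551234">Call</a>'
    '<a href="team.html">Team</a>'
)
print(same_domain_links("https://example.com/contact", page))
# ['https://example.com/about', 'https://example.com/team.html']
```

A depth-limited crawl would apply this function to each fetched page and queue the returned links with depth + 1 until `max_depth` or `max_pages` is reached.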

Phone Number Deobfuscation

phone-harvest automatically normalizes common obfuscation patterns:

Obfuscated	Normalized
`555.123.4567`	`555-123-4567`
`555 dash 123 dash 4567`	`555-123-4567`
HTML entities `&#43;1 212 555 1234`	`+1 212 555 1234`
`%2B1 212 555 1234`	`+1 212 555 1234`
Fullwidth digits `５５５`	`555`
Unicode dashes `555–123–4567`	`555-123-4567`
Non-breaking spaces	Regular spaces
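
The patterns in this table can be approximated with a few standard-library calls. The sketch below is an illustration under stated assumptions, not the tool's actual deobfuscation code: the `deobfuscate` function is hypothetical, the set of dash characters is a guess at a reasonable selection, and the dot rule is deliberately crude (it would also rewrite decimals such as 3.14).

```python
import html
import re
import unicodedata
from urllib.parse import unquote

# Unicode dash-like characters mapped to an ASCII hyphen (assumed selection).
_DASHES = dict.fromkeys(map(ord, "\u2010\u2011\u2012\u2013\u2014\u2212"), "-")


def deobfuscate(text: str) -> str:
    """Normalize common phone number obfuscations in a string."""
    text = html.unescape(text)                   # HTML entities, e.g. &#43; becomes +
    text = unquote(text)                         # URL escapes, e.g. %2B becomes +
    text = unicodedata.normalize("NFKC", text)   # fullwidth digits, non-breaking spaces
    text = text.translate(_DASHES)               # en dash, minus sign, etc.
    text = re.sub(r"\s+dash\s+", "-", text, flags=re.IGNORECASE)  # the word "dash"
    text = re.sub(r"(?<=\d)\.(?=\d)", "-", text)                  # dots between digits
    return text


print(deobfuscate("555.123.4567"))            # 555-123-4567
print(deobfuscate("555 dash 123 dash 4567"))  # 555-123-4567
print(deobfuscate("%2B1 212 555 1234"))       # +1 212 555 1234
```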

Comparison

Feature	phone-harvest	python-phonenumbers	PhoneInfoga	CommonRegex
CLI tool	Yes	No	Yes (Go)	No
Python API	Yes	Yes	No	Yes
Web page extraction	Yes	No	No	No
Recursive crawling	Yes	No	No	No
International support	200+ countries	200+ countries	Yes	US-only
Country detection	Yes	Yes	Yes	No
Carrier lookup	Yes	Yes	Yes	No
Number type (mobile/landline)	Yes	Yes	Yes	No
E.164 normalization	Yes	Yes	N/A	No
Number validation	Yes	Yes	Yes	No
Deobfuscation	Yes	No	No	No
robots.txt compliance	Yes	N/A	N/A	N/A
JSON/CSV output	Yes	No	JSON	No
stdin pipe support	Yes	No	No	No
pip install	Yes	Yes	No	Yes

Development

git clone https://github.com/thunderbit/phone-harvest.git
cd phone-harvest
pip install -e ".[dev]"
pytest

Also by Thunderbit

email-harvest — Extract email addresses from text, files, and web pages
image-harvest — Discover and batch download images from web pages
product-harvest — Extract structured product data using Schema.org

License

MIT

Built by Thunderbit — AI-powered web scraper and data extraction tools.

Project details

Version 0.1.0, released Mar 27, 2026

Download files

Source Distribution

extract_phone-0.1.0.tar.gz (16.2 kB)

Upload date: Mar 27, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

Hashes for extract_phone-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`be5deeda41e9ea63ca7070ce87c841c525a1ecc45d0eb557586fba0ae0d600f0`
MD5	`0a78592df7e0bf5ae0b6a543904578e2`
BLAKE2b-256	`2fa603f6a9022168c5cff11023a390cad884dcdc97c6755755508b66c0b53541`

Built Distribution

extract_phone-0.1.0-py3-none-any.whl (16.2 kB)

Upload date: Mar 27, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

Hashes for extract_phone-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`be7539f97e2888a327d4aac2b0183694f98777dded71019ca563ea93b741fd6e`
MD5	`b10c40d6ebc8d81d5050961941588076`
BLAKE2b-256	`d37a5b3ee9e4b4b9b2968914f512d306ffd9c1f157ec4c529a0c0025be04ac28`
