Library to find URLs and check their validity.

urlfinderlib

This is a Python (3.10+) library for finding URLs in documents and checking their validity.

Supported Documents

Extracts URLs from the following types of documents:

Binary files (finds URLs within strings)
CSV files
HTML files
iCalendar/vCalendar files
PDF files
Text files (ASCII or UTF-8)
XML files

Every extracted URL is validated to ensure that it contains a domain with a valid TLD (or a valid IP address) and contains no invalid characters.
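The validation rule above can be sketched with the standard library. This is not urlfinderlib's implementation: the function `is_plausible_url`, the deliberately tiny `KNOWN_TLDS` set, and the `ALLOWED` character set are assumptions made for illustration (a real check would load the full IANA TLD list).

```python
import ipaddress
import string
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny TLD list; a real check would load the full
# IANA root zone list.
KNOWN_TLDS = {"com", "net", "org", "io", "uk"}

# ASCII characters treated as valid in this sketch: RFC 3986 unreserved and
# reserved characters plus "%" for percent-encoding.
ALLOWED = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")


def is_plausible_url(url: str) -> bool:
    """Return True if url has a known TLD or an IP host and no disallowed ASCII characters."""
    # Non-ASCII characters are let through so Unicode domains can still pass.
    if any(ch.isascii() and ch not in ALLOWED for ch in url):
        return False
    try:
        host = urlsplit(url).hostname
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    if not host:
        return False
    try:
        ipaddress.ip_address(host)  # IPv4 or IPv6 literal
        return True
    except ValueError:
        pass
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and labels[-1] in KNOWN_TLDS
```

With this sketch, `http://192.168.1.10/login` and `http://bücher.com/` pass, while `http://localhost/` (no TLD) and `http://exa mple.com/` (contains a space) are rejected.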

URL Permutations

This library was originally written to find both valid URLs and the obfuscated or slightly malformed URLs used by malicious actors, so that they can be used as indicators of compromise (IOCs). As a result, the extracted URLs also include the following permutations:

URL with any Unicode characters in its domain
URL with any Unicode characters in its domain converted to their IDNA (Punycode) equivalent
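Python's built-in `idna` codec is enough to illustrate the two domain variations. The helpers `to_idna` and `to_unicode` are hypothetical names used only for this sketch; note that the built-in codec implements IDNA 2003, which may differ from whatever urlfinderlib uses internally.

```python
from urllib.parse import urlsplit, urlunsplit


def _swap_host(url: str, new_host: str) -> str:
    """Rebuild url with a different host, keeping the port (userinfo is dropped in this sketch)."""
    parts = urlsplit(url)
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def to_idna(url: str) -> str:
    """Hypothetical helper: convert a Unicode domain to its IDNA (Punycode) form."""
    host = urlsplit(url).hostname
    if host is None:
        return url
    return _swap_host(url, host.encode("idna").decode("ascii"))


def to_unicode(url: str) -> str:
    """Hypothetical helper: convert an IDNA (xn--) domain back to Unicode."""
    host = urlsplit(url).hostname
    if host is None or not host.isascii():
        return url
    return _swap_host(url, host.encode("ascii").decode("idna"))


print(to_idna("http://bücher.example/path?q=1"))
# http://xn--bcher-kva.example/path?q=1
```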

For both domain variations, the following permutations are also returned:

URL with its path %-encoded
URL with its path %-decoded
URL with encoded HTML entities in its path
URL with decoded HTML entities in its path
URL with its path %-decoded and HTML entities decoded
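One way to produce the path permutations listed above is to map each one onto a standard-library call. This is a sketch under assumptions, not the library's code: `path_permutations` is a hypothetical name, and treating "%-encoded" as `quote` with `%` kept safe (so existing escapes are not double-encoded) is a choice made here.

```python
import html
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def path_permutations(url: str) -> set[str]:
    """Hypothetical helper: return url plus the path variants listed above."""
    parts = urlsplit(url)
    path = parts.path
    variants = {
        path,
        quote(path, safe="/%"),       # %-encoded; "%" kept so %XX escapes are not double-encoded
        unquote(path),                # %-decoded
        html.escape(path),            # HTML entities encoded (& -> &amp;, < -> &lt;, ...)
        html.unescape(path),          # HTML entities decoded (&amp; -> &)
        html.unescape(unquote(path)), # %-decoded, then HTML entities decoded
    }
    # Only the path changes; scheme, host, query, and fragment are preserved.
    return {urlunsplit(parts._replace(path=p)) for p in variants}


for permutation in sorted(path_permutations("http://example.com/a%20b&amp;c")):
    print(permutation)
```

Because the result is a set, variants that happen to be identical (for example, a path with nothing to encode or decode) collapse into a single entry.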

Child URLs

This library also attempts to extract or decode child URLs found in the paths of URLs. The following formats are supported:

Barracuda protected URLs
Base64-encoded URLs found within the URL's path
Google redirect URLs
Mandrill/Mailchimp redirect URLs
Outlook Safe Links URLs
Proofpoint protected URLs
URLs found in the URL's query parameters
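Two of the generic cases above, URLs carried in query parameters (as in a Google `/url?q=...` redirect) and base64-encoded URLs in path segments, can be sketched as follows. `child_urls` and its 8-character minimum segment length are assumptions for illustration; the vendor-specific encodings (Barracuda, Mandrill/Mailchimp, Outlook Safe Links, Proofpoint) are not reproduced here.

```python
import base64
import re
from urllib.parse import parse_qsl, urlsplit

# Simple pattern for http(s) URLs embedded in decoded text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def child_urls(url: str) -> set[str]:
    """Hypothetical helper: find URLs hidden inside another URL."""
    parts = urlsplit(url)
    found: set[str] = set()

    # 1. Query parameter values that are themselves URLs (parse_qsl %-decodes them).
    for _key, value in parse_qsl(parts.query):
        if URL_RE.match(value):
            found.add(value)

    # 2. Path segments that decode from base64 (standard or URL-safe) to text containing a URL.
    for segment in parts.path.split("/"):
        if len(segment) < 8:  # skip short segments that are unlikely to hold a URL
            continue
        padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
        try:
            decoded = base64.urlsafe_b64decode(padded).decode("utf-8")
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
        found.update(URL_RE.findall(decoded))

    return found


print(child_urls("https://www.google.com/url?q=https://example.com/landing&sa=D"))
# {'https://example.com/landing'}
```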

Basic usage

from urlfinderlib import find_urls

with open('/path/to/file', 'rb') as f:
    print(find_urls(f.read()))

base_url Parameter

When finding URLs in an HTML file, many of its links are relative to the page's location on the server that hosts it. In that case, use the base_url parameter so that these "relative" URLs can be extracted as well.

from urlfinderlib import find_urls

with open('/path/to/file', 'rb') as f:
    print(find_urls(f.read(), base_url='http://example.com'))
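The idea behind base_url can be illustrated with the standard library: collect `href`/`src` values from the HTML and resolve each one against the base with `urllib.parse.urljoin`. This is only a sketch of the concept; `LinkCollector` is a hypothetical class, and it is an assumption that urlfinderlib resolves relative links the same way `urljoin` does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href/src values and resolve them against base_url."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Relative links are joined to the base; absolute links pass through unchanged.
                self.urls.append(urljoin(self.base_url, value))


html_doc = '<a href="/about">About</a> <img src="images/logo.png"> <a href="https://other.example/x">x</a>'
collector = LinkCollector("http://example.com")
collector.feed(html_doc)
collector.close()  # flush any buffered input
print(collector.urls)
# ['http://example.com/about', 'http://example.com/images/logo.png', 'https://other.example/x']
```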

Project details

Release history

0.23.0 (this version): Jul 8, 2026
0.22.0: May 5, 2026
0.21.0: Mar 25, 2026
0.20.1: Mar 2, 2026
0.19.0: Jan 8, 2026
0.18.6: Dec 1, 2022
0.18.5: Oct 23, 2021
0.18.4: Oct 23, 2021
0.18.3: Oct 14, 2021
0.18.2: Oct 13, 2021
0.18.0: Aug 26, 2021
0.17.8: Aug 12, 2021
0.17.7: Aug 12, 2021
0.17.6: Aug 6, 2021
0.17.5: Apr 17, 2021
0.17.4: Apr 9, 2021
0.17.3: Apr 9, 2021
0.17.2: Mar 31, 2021
0.17.1: Mar 31, 2021
0.17.0: Mar 31, 2021
0.16.1: Mar 9, 2021
0.16.0: Feb 6, 2021
0.15.6: Dec 4, 2020
0.15.5: Dec 1, 2020
0.15.4: Oct 6, 2020
0.15.3: Oct 6, 2020
0.15.2: Sep 24, 2020
0.15.1: Sep 7, 2020
0.15.0: Sep 7, 2020 (yanked; reason: "Need to fix functionality that was accidentally removed")
0.14.4: Sep 2, 2020
0.14.3: Aug 25, 2020
0.14.2: Aug 18, 2020
0.14.1: Aug 12, 2020
0.14.0: Aug 11, 2020
0.13.3: Aug 8, 2020
0.13.2: Aug 7, 2020
0.13.1: Aug 5, 2020
0.13.0: Aug 2, 2020
0.12.5: Aug 1, 2020
0.12.4: Jul 29, 2020
0.12.3: Jul 29, 2020
0.12.2: Jul 29, 2020
0.12.1: Jul 28, 2020
0.12.0: Jul 27, 2020
0.11.12: Jul 2, 2020
0.11.11: Dec 23, 2019
0.11.10: Dec 23, 2019
0.11.9: Dec 5, 2019
0.11.8: Dec 5, 2019
0.11.7: Dec 5, 2019
0.11.6: Dec 5, 2019
0.11.5: Dec 3, 2019
0.11.4: Dec 3, 2019
0.11.3: Dec 3, 2019
0.11.2: Aug 6, 2019
0.11.1: Aug 2, 2019
0.11.0: Jul 18, 2019
0.10.1: Jul 5, 2019
0.10.0: Jul 3, 2019
0.9.0: Jun 18, 2019
0.8.0: Jun 14, 2019
0.7.3: May 22, 2019
0.7.2: Feb 14, 2019
0.7.1: Jan 23, 2019
0.7.0: Dec 13, 2018
0.6.0: Nov 14, 2018
0.5.0: Nov 8, 2018
0.4.1: Sep 20, 2018
0.4.0: Sep 19, 2018
0.3.1: Sep 13, 2018
0.2.2: Sep 13, 2018
0.2.1: Aug 30, 2018
0.2.0: Aug 30, 2018
0.1.2: Aug 27, 2018
0.1.1: Aug 21, 2018
0.1.0: Aug 20, 2018

Download files

Source Distribution

urlfinderlib-0.23.0.tar.gz (14.5 kB)

Uploaded Jul 8, 2026 Source

Built Distribution

urlfinderlib-0.23.0-py3-none-any.whl (20.5 kB)

Uploaded Jul 8, 2026 Python 3

File details

Details for the file urlfinderlib-0.23.0.tar.gz.

File metadata

Download URL: urlfinderlib-0.23.0.tar.gz
Upload date: Jul 8, 2026
Size: 14.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for urlfinderlib-0.23.0.tar.gz
Algorithm	Hash digest
SHA256	`3a2a65a2b2390447e9afd460ca5e084a72f4e0a30e859542566db7b587cd7a74`
MD5	`ed885d01084030689a3781c5238852d4`
BLAKE2b-256	`5846f6cca104b9d03ed47de3f94cbb08909ba745c869fb75f1ec5bbb7121406d`
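A downloaded file can be checked against the SHA256 digest above with Python's `hashlib`. The helpers `sha256_of` and `verify` are hypothetical names for this sketch; the expected digest is the one published for the source distribution.

```python
import hashlib

# SHA256 digest published above for urlfinderlib-0.23.0.tar.gz
EXPECTED_SHA256 = "3a2a65a2b2390447e9afd460ca5e084a72f4e0a30e859542566db7b587cd7a74"


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the sdist was downloaded to the current directory:
# print(verify("urlfinderlib-0.23.0.tar.gz"))
```

Alternatively, pip can enforce the digest at install time: a requirements line such as `urlfinderlib==0.23.0 --hash=sha256:<digest>` switches pip into hash-checking mode. List the digests of both the sdist and the wheel, since pip may pick either file.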

Provenance

The following attestation bundles were made for urlfinderlib-0.23.0.tar.gz:

Publisher: pypi.yml on ACE-Collective/urlfinderlib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: urlfinderlib-0.23.0.tar.gz
- Subject digest: 3a2a65a2b2390447e9afd460ca5e084a72f4e0a30e859542566db7b587cd7a74
- Sigstore transparency entry: 2116605032
- Sigstore integration time: Jul 8, 2026
Source repository:
- Permalink: ACE-Collective/urlfinderlib@47b232f93b5500c0f9a63ffa1b05fa8042c60151
- Branch / Tag: refs/tags/v0.23.0
- Owner: https://github.com/ACE-Collective
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@47b232f93b5500c0f9a63ffa1b05fa8042c60151
- Trigger Event: push

File details

Details for the file urlfinderlib-0.23.0-py3-none-any.whl.

File metadata

Download URL: urlfinderlib-0.23.0-py3-none-any.whl
Upload date: Jul 8, 2026
Size: 20.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for urlfinderlib-0.23.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b0fd375cf8814bcd1229907638b25bba8e483a596be699bb2809bb1cd0752809`
MD5	`45491c6cb306d5feaf339160e39167ae`
BLAKE2b-256	`a9832f0e82220612d74f6f407c2490767c3f39aa440e38fc4e1cfed601f9ba93`

Provenance

The following attestation bundles were made for urlfinderlib-0.23.0-py3-none-any.whl:

Publisher: pypi.yml on ACE-Collective/urlfinderlib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: urlfinderlib-0.23.0-py3-none-any.whl
- Subject digest: b0fd375cf8814bcd1229907638b25bba8e483a596be699bb2809bb1cd0752809
- Sigstore transparency entry: 2116605084
- Sigstore integration time: Jul 8, 2026
Source repository:
- Permalink: ACE-Collective/urlfinderlib@47b232f93b5500c0f9a63ffa1b05fa8042c60151
- Branch / Tag: refs/tags/v0.23.0
- Owner: https://github.com/ACE-Collective
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@47b232f93b5500c0f9a63ffa1b05fa8042c60151
- Trigger Event: push