A utility to determine the content type of a URL.

URL Content Type Detector

A lightweight, efficient utility to determine the content type of any URL with minimal overhead.

Overview

URL Content Type Detector is a Python library that retrieves the content type of a URL by making efficient HTTP HEAD requests. It's designed to be lightweight, robust, and production-ready with comprehensive error handling.

Key Features

🚀 Fast & Efficient: Uses HTTP HEAD requests to minimize bandwidth
✅ Robust Error Handling: Custom exceptions and detailed error messages
🔒 URL Validation: Built-in URL validation using the validators package
⏱️ Configurable Timeout: Adjustable timeout settings with sensible defaults
🛡️ Security-First: Optional strict HTTP status code validation
📦 Lightweight: No dependencies beyond requests and validators
🧪 Well-Tested: Comprehensive test suite with pytest
🐍 Python 3.10+: Modern Python support

Installation

Using pip

pip install url-content-type-detector

Using uv (recommended for development)

uv pip install url-content-type-detector

Development Installation

Clone the repository and install in editable mode:

git clone https://github.com/krsahil8825/url_content_type_detector.git
cd url_content_type_detector
uv pip install -e .

Usage

Basic Example

from url_content_type_detector import get_content_type

# Get content type of a webpage
content_type = get_content_type("https://example.com")
print(content_type)  # Output: text/html; charset=UTF-8

Detecting Different Content Types

from url_content_type_detector import get_content_type

# HTML Page
html_type = get_content_type("https://example.com/page.html")
print(html_type)  # text/html; charset=UTF-8

# Image
image_type = get_content_type("https://example.com/image.png")
print(image_type)  # image/png

# PDF Document
pdf_type = get_content_type("https://example.com/document.pdf")
print(pdf_type)  # application/pdf

# JSON API
json_type = get_content_type("https://api.example.com/data")
print(json_type)  # application/json

Advanced Configuration

from url_content_type_detector import get_content_type, URLUtilsError

# Custom timeout (in seconds)
content_type = get_content_type("https://slow-server.com", timeout=30)

# Disable strict HTTP validation (allows 4xx/5xx responses)
try:
    content_type = get_content_type("https://example.com", is_secure=False)
except URLUtilsError as e:
    print(f"Error: {e}")

# No timeout (not recommended for production)
content_type = get_content_type("https://example.com", timeout=None)

API Reference

`get_content_type(url, timeout=10, is_secure=True)`

Fetches the content type of the resource at the given URL.

Parameters:

Parameter	Type	Default	Description
`url`	`str`	Required	The URL of the resource
`timeout`	`int | None`	`10`	Request timeout in seconds. Use `None` for no timeout (not recommended in production)
`is_secure`	`bool`	`True`	If `True`, raises an error for HTTP 4xx/5xx status codes

Returns:

str: The content type from the HTTP Content-Type header, or "Not Found" if missing
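
Because the return value is the raw header string (or the "Not Found" sentinel), callers often need to separate the media type from parameters such as charset. The helper below is a minimal sketch of that step; `split_content_type` is a hypothetical name and is not part of the library.

```python
from typing import Dict, Optional, Tuple


def split_content_type(value: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split a Content-Type string into (media_type, parameters)."""
    # The README documents "Not Found" as the value for a missing header.
    if value == "Not Found":
        return None, {}
    media_type, *raw_params = value.split(";")
    params: Dict[str, str] = {}
    for item in raw_params:
        key, sep, val = item.partition("=")
        if sep:  # skip fragments that are not key=value pairs
            params[key.strip().lower()] = val.strip().strip('"')
    # Media type names are case-insensitive, so normalize to lower case.
    return media_type.strip().lower(), params
```

For example, passing the string returned by `get_content_type("https://example.com")` would let you compare the media type with `==` instead of relying on substring checks.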

Raises:

ValueError: If the URL is invalid or timeout is negative
URLUtilsError: For network errors, timeouts, or (when is_secure=True) HTTP error responses
requests.RequestException: For underlying request failures

Example:

from url_content_type_detector import get_content_type, URLUtilsError

try:
    content_type = get_content_type("https://example.com", timeout=15)
    print(f"Content Type: {content_type}")
except ValueError as e:
    print(f"Invalid URL: {e}")
except URLUtilsError as e:
    print(f"Request failed: {e}")

`URLUtilsError`

Custom exception for URL content type detection errors.

Example:

from url_content_type_detector import URLUtilsError, get_content_type

try:
    content_type = get_content_type("https://example.com/nonexistent")
except URLUtilsError as e:
    print(f"URL Error: {e}")

`utils` convenience helpers

from url_content_type_detector import utils

if utils.is_pdf("https://example.com/report.pdf"):
    print("PDF detected")
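
Only `utils.is_pdf` is shown above. If you need a similar check for another media type, one possible approach is to build your own predicate on top of `get_content_type`. The sketch below is an assumption about how such a helper could look, not the library's implementation; `make_type_checker` is a hypothetical name, and the fetch function is passed in so the logic can be exercised without network access.

```python
from typing import Callable


def make_type_checker(expected: str, fetch: Callable[[str], str]) -> Callable[[str], bool]:
    """Return a predicate that is True when a URL's media type equals `expected`."""
    expected = expected.lower()

    def check(url: str) -> bool:
        # Drop parameters such as "; charset=UTF-8" before comparing.
        media_type = fetch(url).split(";", 1)[0].strip().lower()
        return media_type == expected

    return check
```

With the library installed, `is_png = make_type_checker("image/png", get_content_type)` would give a callable used the same way as `utils.is_pdf`. Exceptions raised by the fetch function propagate to the caller.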

Examples

Demo Script

Run the included demo to see the library in action:

python scripts/demo.py

Output:

✅ URL: https://www.example.com -> Content Type: text/html; charset=UTF-8
✅ URL: https://www.example.com/image.png -> Content Type: image/png
✅ URL: https://www.example.com/document.pdf -> Content Type: application/pdf

Use Cases

1. File Type Detection in Web Scrapers

from url_content_type_detector import get_content_type

def should_download(url):
    """Check if URL points to an image."""
    try:
        content_type = get_content_type(url)
        return content_type.startswith("image/")
    except Exception:
        return False

urls = ["https://example.com/pic.jpg", "https://example.com/page.html"]
for url in urls:
    if should_download(url):
        print(f"Download {url}")

2. Content-Based Routing

from url_content_type_detector import get_content_type

def route_by_content(url):
    """Route processing based on content type."""
    try:
        content_type = get_content_type(url)
        if content_type.startswith("image/"):
            return "image_processor"
        elif content_type.startswith("video/"):
            return "video_processor"
        elif "json" in content_type:
            return "data_processor"
        else:
            return "generic_processor"
    except Exception:
        return "error_handler"
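
The routing decision can also be kept separate from the network call, which makes it easy to unit-test. The following is a sketch under that assumption; `pick_processor` and `PREFIX_RULES` are hypothetical names, and the processor labels are the ones used in the example above. It also lower-cases the media type first, since media type names are case-insensitive.

```python
# (prefix, processor) pairs checked in order; extend as needed.
PREFIX_RULES = [
    ("image/", "image_processor"),
    ("video/", "video_processor"),
]


def pick_processor(content_type: str) -> str:
    """Map a Content-Type string to a processor name without any I/O."""
    # Ignore parameters and case before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for prefix, processor in PREFIX_RULES:
        if media_type.startswith(prefix):
            return processor
    if "json" in media_type:  # also matches types like application/problem+json
        return "data_processor"
    return "generic_processor"
```

`route_by_content` could then call `pick_processor(get_content_type(url))` inside its `try` block and keep the error handling unchanged.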

3. Link Health Checking

from url_content_type_detector import get_content_type, URLUtilsError

def check_link_health(url):
    """Check if a link is accessible and returns valid content."""
    try:
        content_type = get_content_type(url, is_secure=True)
        return {"url": url, "status": "OK", "content_type": content_type}
    except URLUtilsError as e:
        return {"url": url, "status": "ERROR", "error": str(e)}

links = ["https://example.com", "https://example.com/404"]
for link in links:
    print(check_link_health(link))
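
For longer link lists, the checks are network-bound and can run concurrently. The sketch below uses the standard library's thread pool; `check_links` is a hypothetical helper, and the `check` argument is assumed to be a function like `check_link_health` that returns a result instead of raising.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def check_links(urls: Iterable[str], check: Callable[[str], dict], max_workers: int = 8) -> List[dict]:
    """Run `check` on every URL in a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(check, urls))
```

Note that `check_link_health` as written catches only `URLUtilsError`; a `ValueError` for an invalid URL would surface when the results are collected, so you may want to catch it inside the check function as well.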

Requirements

Python: 3.10 or higher
requests: >= 2.32.5
validators: >= 0.35.0

Performance Considerations

HTTP HEAD Requests: The library uses HTTP HEAD requests instead of GET to minimize bandwidth usage
Timeout Defaults: The default 10-second timeout is suitable for most use cases. Adjust based on your network conditions
Redirect Handling: The library automatically follows HTTP redirects (up to 30 by default in requests)
Connection Pooling: For bulk URL processing, consider using a requests.Session for connection reuse (future feature)
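
To illustrate why a HEAD request is cheap, here is a standard-library sketch that asks for headers only and reads the `Content-Type` value. It is not the library's implementation (which is built on requests); `build_head_request` and `head_content_type` are hypothetical names, and the "Not Found" fallback simply mirrors the documented return value.

```python
from urllib.request import Request, urlopen


def build_head_request(url: str) -> Request:
    """Create a request that asks the server for headers only."""
    return Request(url, method="HEAD")


def head_content_type(url: str, timeout: float = 10) -> str:
    """Send a HEAD request and return the Content-Type header value."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urlopen(build_head_request(url), timeout=timeout) as response:
        return response.headers.get("Content-Type", "Not Found")
```

A HEAD response carries the same headers a GET would return but no body, so the cost per URL is roughly one round trip regardless of the resource size.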

Documentation

You can browse the full documentation at:

https://krsahil8825.github.io/url_content_type_detector/

To build the docs locally:

pip install -e ".[dev]"
cd docs
make html

On Windows:

pip install -e ".[dev]"
cd docs
make.bat html

Troubleshooting

Common Issues

`ValueError: Invalid URL provided`

Ensure the URL starts with http:// or https://
Check for typos or invalid characters
URLs with spaces are automatically converted to %20
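
If you want to catch these problems before calling the library, a small pre-check along the lines below can help. This is a sketch of the checks listed above, not the library's own validation (which uses the validators package); `precheck_url` is a hypothetical name.

```python
from urllib.parse import urlsplit


def precheck_url(url: str) -> str:
    """Return a cleaned URL, or raise ValueError if it is not an http(s) URL."""
    # Encode spaces the same way the README describes (space -> %20).
    cleaned = url.strip().replace(" ", "%20")
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Invalid URL provided: {url!r}")
    return cleaned
```

For instance, `precheck_url("example.com")` raises because the scheme is missing, which is the most common cause of this error.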

`URLUtilsError: The request timed out`

Increase the timeout parameter
Check your network connection
Verify the server is responsive
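
One way to apply the first suggestion is to retry with progressively longer timeouts. The helper below is a sketch only: `fetch_with_escalating_timeout` is a hypothetical name, the timeout values are arbitrary, and the fetch function and exception types are passed in so the logic does not depend on the library.

```python
from typing import Callable, Sequence, Tuple, Type


def fetch_with_escalating_timeout(
    fetch: Callable[..., str],
    url: str,
    timeouts: Sequence[float] = (5, 15, 30),
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> str:
    """Call fetch(url, timeout=t) with each timeout in turn until one succeeds."""
    if not timeouts:
        raise ValueError("timeouts must not be empty")
    last_error = None
    for timeout in timeouts:
        try:
            return fetch(url, timeout=timeout)
        except retry_on as exc:
            last_error = exc  # remember the failure and try a longer timeout
    raise last_error
```

With the library, this could be called as `fetch_with_escalating_timeout(get_content_type, url, retry_on=(URLUtilsError,))`. Since `URLUtilsError` is documented as covering network errors and HTTP error responses as well as timeouts, that call would retry those failures too.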

`URLUtilsError: Accessing Unsecure URL`

The server returned a 4xx or 5xx status code
Set is_secure=False to allow error responses
Verify the URL is correct and accessible

`URLUtilsError: Failed to fetch content type`

Check your internet connection
Verify the URL is accessible
Some servers may block HEAD requests; check server configuration
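
When a server rejects HEAD, a common workaround is to fall back to a GET request and read only the headers. The sketch below shows that pattern with the standard library; it is not something the library does for you as far as this README states. `content_type_with_fallback` and `FALLBACK_CODES` are hypothetical names, and the choice of 405 and 501 as fallback triggers is an assumption.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Status codes that typically mean "this method is not supported".
FALLBACK_CODES = (405, 501)


def _fetch_header(url, method, timeout, opener):
    """Issue one request and return its Content-Type header value."""
    with opener(Request(url, method=method), timeout=timeout) as response:
        # The body is never read explicitly; leaving the block closes the response.
        return response.headers.get("Content-Type", "Not Found")


def content_type_with_fallback(url, timeout=10, opener=urlopen):
    """Try HEAD first, then GET if the server refuses HEAD."""
    try:
        return _fetch_header(url, "HEAD", timeout, opener)
    except HTTPError as exc:
        if exc.code not in FALLBACK_CODES:
            raise  # a genuine error such as 404 is not masked
    return _fetch_header(url, "GET", timeout, opener)
```

The `opener` parameter exists only so the function can be tested with a stub; in normal use the default `urlopen` is enough.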

Contributing

Contributions are welcome! Here's how to get started:

Setup Development Environment

git clone https://github.com/krsahil8825/url_content_type_detector.git
cd url_content_type_detector
uv pip install -e ".[dev]"

Create a Feature Branch

git checkout -b feature/your-feature-name

Make Your Changes

Write clear, commented code
Add tests for new features
Ensure all tests pass: pytest

Submit a Pull Request

Push your branch to GitHub
Create a pull request with a clear description
Link any related issues

Code Style

Follow PEP 8 guidelines
Use meaningful variable and function names
Add docstrings to all public functions
Keep functions focused and modular

Goals for Future Development

Async support (async_get_content_type)
Testing of bulk URL processing with connection pooling

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Kumar Sahil

Acknowledgments

Built with requests for HTTP communication
URL validation powered by validators
Testing with pytest

Made with ❤️ and Python by Kumar Sahil

This version: 1.0.4 (May 5, 2026)

File details

Details for the file url_content_type_detector-1.0.4.tar.gz.

File metadata

Download URL: url_content_type_detector-1.0.4.tar.gz
Upload date: May 5, 2026
Size: 76.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.10

File hashes

Hashes for url_content_type_detector-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`26db389d268302e62d1a85d386f9151cef2db54330cd7298469f0a4bb6e300f1`
MD5	`a85a20099ad767abc87fe1583270723c`
BLAKE2b-256	`9d4c0290254f515a84f3285b4b61ac4bc0cb29d59f7139ba0bbcc756308a6f74`

File details

Details for the file url_content_type_detector-1.0.4-py3-none-any.whl.

File metadata

Download URL: url_content_type_detector-1.0.4-py3-none-any.whl
Upload date: May 5, 2026
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.10

File hashes

Hashes for url_content_type_detector-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b60d3d78a1c888603ffca6fd739ee0e181beee66f40d5c6293d341d4fff09abb`
MD5	`6d043c9a37b96b339c2154a866a1c87e`
BLAKE2b-256	`3d60bf177822cddb1bfaa454717b00355027a393e088f746a5ae7bf68cee4b39`
