Python 3 port of Matomo's Device Detector

A Python user-agent parser & device detector powered by Matomo’s regex database — accurate, updatable, and production-ready.

UA-Extract

UA-Extract is a precise and fast user agent parser and device detector written in Python, built on top of the largest and most up-to-date user agent database from the Matomo Device Detector project. It parses user agent strings to detect browsers, operating systems, devices (desktop, tablet, mobile, TV, cars, consoles, etc.), brands, and models, including rare and obscure ones.

UA-Extract is optimized for speed with in-memory caching and supports high-performance parsing. This project is a Python port of the Universal Device Detection library by thinkwelltwd, with Pythonic adaptations while maintaining compatibility with the original regex and fixture YAML files.

You can find the source code at https://github.com/pranavagrawal321/UA-Extract.

Disclaimer

This port is not an exact copy of the original code; it includes Python-specific adaptations. However, it uses the original regex and fixture YAML files to benefit from updates and pull requests to both the original and ported versions.

Installation

Install UA-Extract using pip:

pip install ua_extract

Dependencies

  • PyYAML: For parsing regex and fixture YAML files.
  • rich: For displaying progress bars during regex and fixture updates.
  • aiohttp: For asynchronous downloads when using the GitHub API method.
  • tenacity: For retry logic during API downloads.
  • Git: Required for the git update method.

Usage

Updating Regex and Fixture Files

The regex and fixture files can become outdated and may not accurately detect newly released devices or clients. It’s recommended to update them periodically from the Matomo Device Detector repository. Updates can be performed programmatically or via the command-line interface (CLI), with support for two methods: Git cloning (git) or GitHub API downloads (api).

Programmatic Update

Use the Regexes class to update regex and fixture files. Configure the update by passing arguments to the Regexes constructor, then call update_regexes() with the desired method ("git" or "api") and the optional dry_run and show_progress parameters. The github_token argument is optional but recommended for the api method, because the GitHub API allows only 60 requests/hour unauthenticated versus 5000/hour authenticated.

from ua_extract import Regexes

# Update using default settings (Git method, default paths, repo, branch, etc.)
Regexes().update_regexes()

# Update using GitHub API with custom paths and a GitHub token
Regexes(
    upstream_path="/custom/regexes",
    fixtures_upstream_path="/custom/fixtures",
    client_upstream_dir="/custom/client_fixtures",
    device_upstream_dir="/custom/device_fixtures",
    repo_url="https://github.com/matomo-org/device-detector.git",
    branch="dev",
    github_token="your_token_here"
).update_regexes(method="api", show_progress=True)

# Dry run to simulate update without modifying files
Regexes().update_regexes(method="api", dry_run=True)

Regexes Constructor Arguments

The Regexes class accepts the following arguments during initialization:

  • upstream_path (str, default: regexes/upstream in the project directory): Destination path for regex files.
  • repo_url (str, default: "https://github.com/matomo-org/device-detector.git"): URL of the Git repository.
  • branch (str, default: "master"): Git branch to fetch (e.g., master, dev).
  • sparse_dir (str, default: "regexes"): Directory in the repository for regex files.
  • sparse_fixtures_dir (str, default: "Tests/fixtures"): Directory in the repository for general fixtures.
  • fixtures_upstream_path (str, default: tests/fixtures/upstream): Destination path for general fixture files.
  • sparse_client_dir (str, default: "Tests/Parser/Client/fixtures"): Directory in the repository for client fixtures.
  • client_upstream_dir (str, default: tests/parser/fixtures/upstream/client): Destination path for client fixture files.
  • sparse_device_dir (str, default: "Tests/Parser/Device/fixtures"): Directory in the repository for device fixtures.
  • device_upstream_dir (str, default: tests/parser/fixtures/upstream/device): Destination path for device fixture files.
  • cleanup (bool, default: True): If True, deletes existing files in destination paths before updating.
  • github_token (Optional[str], default: None): GitHub personal access token for the API method.
  • message_callback (Optional[callable], default: None): Function to handle progress messages.
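
A minimal sketch of a message_callback, assuming the callback is invoked with a single message argument (the exact signature is not stated here). The names make_message_callback and on_message are hypothetical helpers, not part of UA-Extract.

```python
import logging

logger = logging.getLogger("ua_extract.update")


def make_message_callback(collected):
    """Return a callback that records each message and forwards it to logging."""

    def on_message(message):
        # Assumption: the library passes one positional argument per message.
        text = str(message)
        collected.append(text)
        logger.info(text)

    return on_message


messages = []
callback = make_message_callback(messages)

# Hypothetical wiring, using only the documented constructor argument:
#   Regexes(message_callback=callback).update_regexes(dry_run=True)
callback("example message")  # simulate one progress message
```

Collecting messages in a list like this may be convenient when the update runs inside a service that should not write to the terminal.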

update_regexes Method

The update_regexes method accepts the following arguments:

  • method (str, default: "git"): Update method ("git" for cloning via Git, "api" for downloading via GitHub API).
  • dry_run (bool, default: False): If True, simulates the update without modifying the filesystem.
  • show_progress (bool, default: True): If True, shows a progress bar while updating the regex and fixture files.

Update Methods and Use Cases

  • Git Method (method="git"):

    • Description: Clones the repository using Git, fetching only the specified directories by combining a shallow clone (--depth 1), a blobless partial clone (--filter=blob:none), and sparse checkout.
    • Use Case: Ideal for users with Git installed and no API rate limit concerns.
    • Requirements: Requires Git installed and accessible.
    • Process:
      • Clones the repository into a temporary directory.
      • Sets up sparse checkout for specified directories.
      • Copies files to destination paths.
      • Creates __init__.py files to make destinations Python packages.
    • Progress Feedback: Displays a progress bar using rich (cloning, sparse-checkout, copying, finalizing).
    • Error Handling: Logs Git command failures using a message callback.
    • Example:
      Regexes().update_regexes()  # Uses default settings with Git method
      
  • GitHub API Method (method="api"):

    • Description: Downloads files asynchronously from the GitHub API using aiohttp.
    • Use Case: Suitable for users without Git or with restricted Git access.
    • Requirements: Requires aiohttp and tenacity. A GitHub token is recommended.
    • GitHub Token:
      • When Needed: GitHub API rate limits are 60 requests/hour (unauthenticated) or 5000/hour (authenticated).
      • How to Provide: Pass via github_token in the Regexes constructor or --github-token CLI option.
      • How to Generate: Create a token in GitHub under Settings > Developer settings > Personal access tokens with repo scope.
    • Process:
      • Validates the repository URL format.
      • Fetches file metadata recursively.
      • Downloads files with retry logic (3 attempts, exponential backoff).
      • Saves files to destination paths.
      • Creates __init__.py files.
    • Progress Feedback: Displays a progress bar with download speed and elapsed time.
    • Error Handling: Logs errors (e.g., rate limits, network issues) and retries transient failures.
    • Example:
      Regexes(github_token="your_token_here").update_regexes(method="api")
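
The retry behaviour listed for the API method (3 attempts, exponential backoff) can be sketched with the standard library alone. This is an illustration of the technique, not the library's tenacity-based code; retry_with_backoff, the 1-second base delay, and retrying on OSError are all assumptions.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0,
                       sleep=time.sleep, retry_on=(OSError,)):
    """Call func(), retrying on transient errors with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the last error propagate to the caller.
                raise
            # Delay grows as base_delay * 1, 2, 4, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

With the defaults, a download that fails twice and then succeeds waits 1 second and then 2 seconds before the final attempt.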
      
Notes for Both Methods
  • Cleanup: If cleanup=True, destination directories are deleted before updating.
  • Progress: Progress bars provide visual feedback, and messages are printed to stderr.
  • Temporary Directory: The git method uses a temporary directory, cleaned up automatically.
  • URL Validation: The api method ensures the URL matches https://github.com/user/repo/tree/branch/path.
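
For readers curious what the git method amounts to, the following sketch builds a shallow, sparse clone inside a temporary directory. It is an approximation under stated assumptions: build_git_commands and fetch_sparse are hypothetical names, and the exact commands UA-Extract runs may differ.

```python
import subprocess
import tempfile


def build_git_commands(repo_url, branch, sparse_dirs, workdir):
    """Return the git invocations for a shallow, blobless, sparse clone."""
    clone = [
        "git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
        "--branch", branch, repo_url, workdir,
    ]
    # Restrict the working tree to the directories we actually need.
    sparse = ["git", "-C", workdir, "sparse-checkout", "set", *sparse_dirs]
    return [clone, sparse]


def fetch_sparse(repo_url, branch, sparse_dirs, dry_run=True):
    """Run (or, with dry_run=True, only build) the commands in a temp directory."""
    with tempfile.TemporaryDirectory() as workdir:
        commands = build_git_commands(repo_url, branch, sparse_dirs, workdir)
        if not dry_run:
            for command in commands:
                subprocess.run(command, check=True)
            # Files would be copied out of workdir here, before it is removed.
        return commands


commands = fetch_sparse(
    "https://github.com/matomo-org/device-detector.git",
    "master",
    ["regexes", "Tests/fixtures"],
)
```

The temporary directory is deleted when the with block exits, which mirrors the automatic cleanup noted above.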

CLI Update

Use the ua_extract CLI to update regex and fixture files:

ua_extract update_regexes

CLI Options

The update_regexes command supports the following options:

  • -p, --path (default: regexes/upstream): Destination path for regex files.
  • -r, --repo (default: https://github.com/matomo-org/device-detector.git): Git repository URL.
  • -b, --branch (default: master): Git branch name.
  • -d, --dir (default: regexes): Sparse directory for regex files.
  • --fixtures-dir (default: Tests/fixtures): Sparse directory for general fixtures.
  • --fixtures-path (default: tests/fixtures/upstream): Destination path for general fixtures.
  • --client-dir (default: Tests/Parser/Client/fixtures): Sparse directory for client fixtures.
  • --client-path (default: tests/parser/fixtures/upstream/client): Destination path for client fixtures.
  • --device-dir (default: Tests/Parser/Device/fixtures): Sparse directory for device fixtures.
  • --device-path (default: tests/parser/fixtures/upstream/device): Destination path for device fixtures.
  • -c, --cleanup (default: enabled): Delete existing files before updating; pass --no-cleanup to keep them.
  • -m, --method (default: git): Update method (git or api).
  • -g, --github-token (default: none): GitHub personal access token for API method.
  • --dry-run (default: disabled): Simulate update without modifying files.
  • --no-progress (default: 0): Disables the progress bar when set (e.g., --no-progress or --no-progress=1).

Example Commands

# Update with default settings (Git method)
ua_extract update_regexes

# Update with custom paths and cleanup disabled
ua_extract update_regexes --path /custom/regexes --fixtures-path /custom/fixtures --client-path /custom/client --device-path /custom/device --no-cleanup

# Update using GitHub API with a token
ua_extract update_regexes --method api --github-token your_token_here

# Dry run with GitHub API
ua_extract update_regexes --method api --dry-run

# Update from a specific branch
ua_extract update_regexes --branch dev

# Disable the progress bar
ua_extract update_regexes --no-progress
ua_extract update_regexes --no-progress=1

Note: The bundled regexes continue to work even if they are never updated; they may simply fail to recognize recently released devices.

View CLI Help

# List all commands
ua_extract help

# Detailed help for update_regexes
ua_extract help update_regexes

Notes
  • The git method requires Git installed.
  • The api method may hit rate limits without a token.
  • Progress bars and messages provide feedback during updates.
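
Since the git method needs Git on the PATH and the api method benefits from a token, a script could pick the method at runtime. The sketch below assumes the token is stored in an environment variable named GITHUB_TOKEN; that name and the helper choose_update_method are illustrative assumptions, not part of UA-Extract.

```python
import os
import shutil


def choose_update_method(env=None, which=shutil.which):
    """Prefer git when the executable is available, otherwise fall back to api."""
    env = os.environ if env is None else env
    # An empty or missing variable is treated as "no token".
    token = env.get("GITHUB_TOKEN") or None
    method = "git" if which("git") else "api"
    return {"method": method, "github_token": token}


# Possible usage with the documented arguments:
#   settings = choose_update_method()
#   Regexes(github_token=settings["github_token"]).update_regexes(
#       method=settings["method"]
#   )
```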

Parsing User Agents

Full Device Detection

To get comprehensive information about a user agent:

from ua_extract import DeviceDetector

ua = 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/16D57 EtsyInc/5.22 rv:52200.62.0'
device = DeviceDetector(ua).parse()

print(device.is_bot())              # >>> False
print(device.os_name())             # >>> iOS
print(device.os_version())          # >>> 12.1.4
print(device.engine())              # >>> {'default': 'WebKit'}
print(device.device_brand())        # >>> Apple
print(device.device_model())        # >>> iPhone
print(device.device_type())         # >>> smartphone
print(device.secondary_client_name())     # >>> EtsyInc
print(device.secondary_client_type())     # >>> generic
print(device.secondary_client_version())  # >>> 5.22
print(device.bot_name())

High-Performance Software Detection

For faster parsing, skipping bot and device hardware detection:

from ua_extract import SoftwareDetector

ua = 'Mozilla/5.0 (Linux; Android 6.0; 4Good Light A103 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.83 Mobile Safari/537.36'
device = SoftwareDetector(ua).parse()

print(device.client_name())         # >>> Chrome Mobile
print(device.client_type())         # >>> browser
print(device.client_version())      # >>> 58.0.3029.83
print(device.os_name())             # >>> Android
print(device.os_version())          # >>> 6.0
print(device.engine())              # >>> {'default': 'WebKit', 'versions': {28: 'Blink'}}
print(device.device_brand())        # >>> ''
print(device.device_model())        # >>> ''
print(device.device_type())         # >>> smartphone
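
The engine() value above is a mapping rather than a single name. Assuming it follows the upstream Matomo convention, where each key under 'versions' is the minimum client version from which that engine applies, a small helper could resolve it to one engine name. resolve_engine and version_tuple are hypothetical helpers, not part of UA-Extract.

```python
def version_tuple(version):
    """Turn '58.0.3029.83' or 28 into a tuple of ints for comparison."""
    parts = []
    for piece in str(version).split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def resolve_engine(engine, client_version):
    """Pick the engine whose minimum version is the highest one not above client_version."""
    if not isinstance(engine, dict):
        return engine or ""
    resolved = engine.get("default", "")
    current = version_tuple(client_version)
    thresholds = sorted(
        engine.get("versions", {}).items(),
        key=lambda item: version_tuple(item[0]),
    )
    for minimum, name in thresholds:
        if current >= version_tuple(minimum):
            resolved = name
    return resolved


engine_name = resolve_engine(
    {"default": "WebKit", "versions": {28: "Blink"}}, "58.0.3029.83"
)
```

Under that assumption, the Chrome Mobile 58 example resolves to Blink, while a client version below 28 would keep the default WebKit.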

App Information in Mobile Browser User Agents

Some mobile browser user agents include app information, as shown in the DeviceDetector example.

Testing

Run the entire test suite with:

python -m unittest

Run an individual test module with:

python -m unittest ua_extract.tests.parser.test_bot

Updating from Matomo Project

To update manually from the Matomo Device Detector project:

  1. Clone the Matomo repository:

    git clone https://github.com/matomo-org/device-detector
    
  2. Copy the updated files to your UA-Extract project:

    export upstream=/path/to/cloned/matomo/device-detector
    export pdd=/path/to/python/ported/ua_extract
    
    cp $upstream/regexes/device/*.yml $pdd/ua_extract/regexes/upstream/device/
    cp $upstream/regexes/client/*.yml $pdd/ua_extract/regexes/upstream/client/
    cp $upstream/regexes/client/hints/*.yml $pdd/ua_extract/regexes/upstream/client/hints/
    cp $upstream/regexes/*.yml $pdd/ua_extract/regexes/upstream/
    cp $upstream/Tests/fixtures/* $pdd/ua_extract/tests/fixtures/upstream/
    cp $upstream/Tests/Parser/Client/fixtures/* $pdd/ua_extract/tests/parser/fixtures/upstream/client/
    cp $upstream/Tests/Parser/Device/fixtures/* $pdd/ua_extract/tests/parser/fixtures/upstream/device/
    
  3. Review logic changes in the Matomo PHP files and update the Python code.

  4. Run tests and fix any failures.
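
On systems without a POSIX shell, the copy step could be approximated in Python. This is a sketch, not a shipped utility: copy_upstream and COPY_PLAN are hypothetical names, and package_dir is assumed to be the ua_extract package directory inside the project checkout.

```python
import shutil
from pathlib import Path

# (source directory in the Matomo clone, glob pattern, destination in the package)
COPY_PLAN = [
    ("regexes/device", "*.yml", "regexes/upstream/device"),
    ("regexes/client", "*.yml", "regexes/upstream/client"),
    ("regexes/client/hints", "*.yml", "regexes/upstream/client/hints"),
    ("regexes", "*.yml", "regexes/upstream"),
    ("Tests/fixtures", "*", "tests/fixtures/upstream"),
    ("Tests/Parser/Client/fixtures", "*", "tests/parser/fixtures/upstream/client"),
    ("Tests/Parser/Device/fixtures", "*", "tests/parser/fixtures/upstream/device"),
]


def copy_upstream(upstream, package_dir):
    """Copy regex and fixture files from a Matomo clone; return the copied paths."""
    copied = []
    for src_rel, pattern, dst_rel in COPY_PLAN:
        src_dir = Path(upstream) / src_rel
        if not src_dir.is_dir():
            continue  # skip directories missing from this clone
        dst_dir = Path(package_dir) / dst_rel
        dst_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(src_dir.glob(pattern)):
            if src.is_file():  # like plain cp, ignore subdirectories
                shutil.copy2(src, dst_dir / src.name)
                copied.append(dst_dir / src.name)
    return copied


# Possible usage:
#   copy_upstream("/path/to/cloned/matomo/device-detector",
#                 "/path/to/python/ported/ua_extract/ua_extract")
```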

Contributing

Contributions are welcome! Submit pull requests or issues to https://github.com/pranavagrawal321/UA-Extract.

License

This project is licensed under the MIT License, consistent with the original Device Detector project.
