Python 3 port of Matomo's Device Detector

Project description

UA-Extract

A Python user-agent parser and device detector powered by Matomo’s regex database — accurate, updatable, and production-ready.

Overview

UA-Extract is a fast and precise user-agent parser written in Python, built on top of the continuously maintained regex and fixture database from the Matomo Device Detector project.

It detects:

  • Browsers and applications
  • Operating systems and versions
  • Devices (desktop, smartphone, tablet, TV, console, car, camera, etc.)
  • Brands and models
  • Bots and crawlers
  • Secondary clients embedded in mobile user agents

UA-Extract is optimized for performance using in-memory parsing and caching, and is designed for long-running production services.

This project is a Python port of the Universal Device Detection library, adapted to Python while maintaining compatibility with Matomo’s original YAML regex and fixture files.

Source code: https://github.com/pranavagrawal321/UA-Extract


Disclaimer

This is not a line-by-line port of the original PHP implementation. The parsing logic is Pythonic, but the regex and fixture data are identical, ensuring compatibility with upstream updates.


Installation

Stable Release (PyPI)

pip install ua_extract

Nightly / Development Version

🔄 Regex update frequency: Matomo’s upstream regexes and fixtures change frequently, often daily. The nightly / development version of UA-Extract tracks these updates closely, which makes it the best choice if you want the freshest device and client detection.

If you want the latest regex updates and any unreleased fixes or minor improvements, you can install UA-Extract directly from the GitHub repository.

This version may include:

  • newer device and client regexes
  • updated fixture files
  • small internal changes not yet published to PyPI

⚠️ The nightly version is recommended for testing, experimentation, or environments that require the freshest device detection. For strict stability guarantees, prefer the PyPI release.

Install from GitHub

pip install git+https://github.com/pranavagrawal321/UA-Extract.git

Upgrade an existing GitHub install

pip install --upgrade git+https://github.com/pranavagrawal321/UA-Extract.git

CLI Usage

Install Shell Completion

ua_extract --install-completion

Update Regex & Fixture Files

Regex and fixture files should be updated periodically to recognize newly released devices and clients.

ua_extract update_regexes

By default, this updates files using Git sparse checkout from the Matomo repository.

CLI Options

Option      Description
--path      Destination directory for regex files
--repo      Git repository URL
--branch    Git branch (Git method only)
--method    Update method: git or api

--no-progress is still accepted for backward compatibility, but it has no effect and will be removed.


Programmatic Updates

Regex updates can also be triggered programmatically using the Regexes class.

Git Method (recommended)

Uses:

  • shallow clone
  • sparse checkout
  • atomic backup and rollback on failure

from ua_extract import Regexes

Regexes().update_regexes()
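The shallow clone and sparse checkout steps can be illustrated with the plain Git commands they correspond to. This is a sketch, not UA-Extract's code. The helper names are hypothetical, and so is the assumption that the needed upstream directories are `regexes` and `Tests/fixtures`. The commands require Git 2.25 or newer, which added `clone --sparse` and `git sparse-checkout`.

```python
import subprocess
from typing import List, Sequence


def sparse_checkout_commands(
    repo: str, branch: str, dest: str, paths: Sequence[str]
) -> List[List[str]]:
    """Build git argv lists for a shallow, sparse clone that materializes only `paths`."""
    return [
        # --depth 1: no history; --filter=blob:none: fetch file contents lazily;
        # --sparse: check out only top-level files until sparse-checkout is set.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         "--branch", branch, repo, dest],
        # Check out just the listed directories (cone mode).
        ["git", "-C", dest, "sparse-checkout", "set", *paths],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical usage (needs network access and git on PATH):
#   run_all(sparse_checkout_commands(
#       "https://github.com/matomo-org/device-detector.git", "master",
#       "/tmp/device-detector", ["regexes", "Tests/fixtures"]))
```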

GitHub API Method

Uses:

  • asynchronous downloads (aiohttp)
  • concurrency limiting
  • exponential retry logic
  • GitHub rate-limit detection

from ua_extract import Regexes

Regexes(github_token="your_token").update_regexes(method="api")
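Concurrency limiting and exponential retry can be sketched with plain asyncio. This is not UA-Extract's implementation. `fetch` stands in for an aiohttp request, and `download_all` and `with_retries` are hypothetical names.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def with_retries(
    op: Callable[[], Awaitable[T]], attempts: int = 4, base_delay: float = 0.5
) -> T:
    """Await op(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("attempts must be at least 1")


async def download_all(
    paths: Iterable[str],
    fetch: Callable[[str], Awaitable[bytes]],
    max_concurrency: int = 8,
    base_delay: float = 0.5,
) -> List[bytes]:
    """Fetch every path with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> bytes:
        async with semaphore:
            return await with_retries(lambda: fetch(path), base_delay=base_delay)

    # gather returns results in the same order as `paths`.
    return list(await asyncio.gather(*(guarded(p) for p in paths)))
```

Retries happen while the semaphore is held, so a struggling request never lets extra requests start in its place.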

Notes:

  • GitHub API limits: 60 requests/hour unauthenticated, 5000/hour authenticated
  • Token does not require special scopes (public repository access only)
  • API method always pulls from the master branch
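GitHub signals an exhausted rate limit through its documented `X-RateLimit-Remaining`, `X-RateLimit-Reset` (epoch seconds) and `Retry-After` response headers. The helper below is a hypothetical sketch of how a client might turn them into a wait time. UA-Extract's own detection logic may differ.

```python
import time
from typing import Mapping, Optional


def rate_limit_wait(
    status: int, headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Return seconds to wait if a GitHub response signals rate limiting, else None."""
    # GitHub answers 403 or 429 when a rate limit is hit.
    if status not in (403, 429):
        return None
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if lowered.get("x-ratelimit-remaining") != "0":
        # Secondary limits may send Retry-After instead; otherwise it is some other 403.
        retry_after = lowered.get("retry-after")
        return float(retry_after) if retry_after is not None else None
    reset = float(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0.0, reset - current)
```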

Dry Run

Regexes().update_regexes(method="api", dry_run=True)

Update Safety Guarantees

UA-Extract ensures update safety by:

  • creating backups of all destination directories
  • restoring the previous state automatically on failure
  • never leaving partially updated files behind

This makes it safe to run updates in CI or production environments.
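The backup-and-restore behavior described above can be sketched with the standard library. The steps are:

  • stage the new files in a temporary sibling directory
  • move the old directory aside as a backup
  • swap the staged directory into place
  • restore the backup if the swap fails

This is an illustrative pattern, not UA-Extract's implementation. `update_with_rollback` and the `.bak` naming are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Union


def update_with_rollback(dest: Union[str, Path], write_new_files: Callable[[Path], None]) -> None:
    """Replace dest with freshly written files, restoring the old contents on failure."""
    dest = Path(dest)
    parent = dest.parent
    parent.mkdir(parents=True, exist_ok=True)
    # Stage in a sibling directory so the final renames stay on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=dest.name + ".new-", dir=parent))
    backup = parent / (dest.name + ".bak")
    try:
        write_new_files(staging)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)  # discard partial files
        raise  # dest was never touched
    if backup.exists():
        shutil.rmtree(backup)
    had_old = dest.exists()
    if had_old:
        dest.rename(backup)  # keep the previous state
    try:
        staging.rename(dest)  # swap the new files into place
    except Exception:
        if had_old:
            backup.rename(dest)  # roll back to the previous state
        shutil.rmtree(staging, ignore_errors=True)
        raise
    if had_old:
        shutil.rmtree(backup)  # success: the backup is no longer needed
```

Between the two renames the directory briefly does not exist. A reader running concurrently could notice that gap, so a fully atomic swap would need another technique, such as flipping a symlink.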


Parsing User Agents

CLI Parsing

You can parse a user-agent directly from the command line using the parse command. The CLI outputs a JSON object containing all detected fields.

ua_extract parse \
  --ua "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/16D57 EtsyInc/5.22" \
  --headers '{"Accept":"*/*","X-Requested-With":"com.example.app"}'

Sample CLI JSON Output

{
  "is_bot": false,
  "os_name": "iOS",
  "os_version": "12.1.4",
  "engine": {
    "default": "WebKit"
  },
  "device_brand": "Apple",
  "device_model": "iPhone",
  "device_type": "smartphone",
  "secondary_client_name": "EtsyInc",
  "secondary_client_type": "generic",
  "secondary_client_version": "5.22",
  "bot_name": null,
  "client_name": "Mobile Safari",
  "client_type": "browser",
  "client_application_id": null,
  "is_television": false,
  "uses_mobile_browser": true,
  "is_mobile": true,
  "is_desktop": false,
  "is_feature_phone": false,
  "preferred_client_name": "Mobile Safari",
  "preferred_client_version": "605.1.15",
  "preferred_client_type": "browser"
}

Notes:

  • --headers must be a valid JSON object
  • headers are optional; omit --headers if not needed
  • the output JSON mirrors the fields shown in the Python API example below
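When calling the CLI from Python, passing arguments as a list avoids shell quoting problems with the parentheses and semicolons in user agents. `json.dumps` of a dict always produces the JSON object that `--headers` requires. This is a sketch: `build_parse_command` and `run_parse` are hypothetical helpers, and `run_parse` assumes the JSON is written to stdout.

```python
import json
import shlex
import subprocess
from typing import Any, Dict, List, Mapping, Optional


def build_parse_command(ua: str, headers: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build argv for `ua_extract parse`, serializing headers as a JSON object."""
    cmd = ["ua_extract", "parse", "--ua", ua]
    if headers is not None:
        if not isinstance(headers, Mapping):
            raise TypeError("headers must be a mapping of header name to value")
        cmd += ["--headers", json.dumps(dict(headers))]
    return cmd


def run_parse(cmd: List[str]) -> Dict[str, Any]:
    """Run the CLI and decode its JSON output (assumes the JSON goes to stdout)."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Print a correctly quoted shell command for copy-pasting.
    print(shlex.join(build_parse_command("Mozilla/5.0 ...", {"Accept": "*/*"})))
```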

Field Reference

The following table describes every field returned by the CLI and Python API.

Field                     Type               Description
is_bot                    bool               Whether the user agent is identified as a bot
bot_name                  str | null         Name of the bot, if detected
os_name                   str | null         Operating system name
os_version                str | null         Operating system version
engine                    dict | str | null  Rendering engine information
device_brand              str | null         Device manufacturer
device_model              str | null         Device model
device_type               str | null         Device category (smartphone, tablet, TV, etc.)
client_name               str | null         Primary client (browser or app) name
client_type               str | null         Client type (browser, app, library, etc.)
client_version            str | null         Client version
client_application_id     str | null         Application identifier, if available
secondary_client_name     str | null         Embedded or wrapper client name
secondary_client_type     str | null         Embedded client type
secondary_client_version  str | null         Embedded client version
is_mobile                 bool | null        Whether the device is mobile
is_desktop                bool | null        Whether the device is a desktop
is_television             bool | null        Whether the device is a TV
uses_mobile_browser       bool | null        Whether a mobile browser is used
is_feature_phone          bool | null        Whether the device is a feature phone
preferred_client_name     str | null         Best client choice when multiple clients exist
preferred_client_version  str | null         Preferred client version
preferred_client_type     str | null         Preferred client type
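Most fields are nullable, and `engine` may be a dict, a string, or null, so code that consumes the output should guard each access. The sketch below assumes the result has already been decoded into a Python dict, for example from the CLI's JSON. `describe` and `engine_name` are hypothetical helpers. Treating the `"default"` key as the engine name is based only on the sample output above.

```python
from typing import Any, Dict, Optional, Union


def engine_name(engine: Union[Dict[str, Any], str, None]) -> Optional[str]:
    """Normalize the dict | str | null engine field to a name or None."""
    if isinstance(engine, dict):
        return engine.get("default")  # key taken from the sample output
    return engine


def describe(result: Dict[str, Any]) -> str:
    """Summarize a parse result in one line, tolerating null fields."""
    if result.get("is_bot"):
        return f"bot: {result.get('bot_name') or 'unknown'}"
    client = result.get("preferred_client_name") or result.get("client_name") or "unknown client"
    # filter(None, ...) drops null parts so we never print "None".
    os_part = " ".join(filter(None, [result.get("os_name"), result.get("os_version")])) or "unknown OS"
    device = result.get("device_type") or "unknown device"
    brand_model = " ".join(filter(None, [result.get("device_brand"), result.get("device_model")]))
    if brand_model:
        device = f"{device} ({brand_model})"
    return f"{client} on {os_part}, {device}"
```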

Python API

Full Device Detection

from ua_extract import DeviceDetector

ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_4 like Mac OS X)..."

headers = {
    "User-Agent": ua,
    "Accept": "*/*",
    "X-Requested-With": "com.example.app",
}

device = DeviceDetector(ua, headers=headers).parse()

# Bot & classification
device.is_bot()
device.bot_name()

# Operating system
device.os_name()
device.os_version()

# Rendering engine
device.engine()

# Device information
device.device_brand()
device.device_model()
device.device_type()

# Client (primary)
device.client_name()
device.client_type()
device.client_version()
device.client_application_id()

# Secondary client
device.secondary_client_name()
device.secondary_client_type()
device.secondary_client_version()

# Device characteristics
device.is_television()
device.uses_mobile_browser()
device.is_mobile()
device.is_desktop()
device.is_feature_phone()

# Preferred client
device.preferred_client_name()
device.preferred_client_version()
device.preferred_client_type()
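You may want the Python API results as a plain dict, for logging or for comparison with the CLI output. One option is to call each accessor listed above by name. `to_dict` and `FIELDS` are hypothetical helpers, not part of UA-Extract. The names mirror the accessors in the example.

```python
from typing import Any, Dict, Tuple

# Accessor names copied from the example above; each is called with no arguments.
FIELDS: Tuple[str, ...] = (
    "is_bot", "bot_name",
    "os_name", "os_version",
    "engine",
    "device_brand", "device_model", "device_type",
    "client_name", "client_type", "client_version", "client_application_id",
    "secondary_client_name", "secondary_client_type", "secondary_client_version",
    "is_television", "uses_mobile_browser", "is_mobile", "is_desktop", "is_feature_phone",
    "preferred_client_name", "preferred_client_version", "preferred_client_type",
)


def to_dict(device: Any) -> Dict[str, Any]:
    """Collect every accessor's return value into a dict keyed by field name."""
    return {name: getattr(device, name)() for name in FIELDS}


# Usage with a parsed detector (requires ua_extract):
#   import json
#   print(json.dumps(to_dict(device), indent=2, default=str))
```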

High-Performance Software Detection

SoftwareDetector skips hardware (device) detection, which makes parsing faster when you only need client and OS information:

from ua_extract import SoftwareDetector

device = SoftwareDetector(ua).parse()

device.client_name()
device.client_version()
device.os_name()
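Long-running services often see the same user-agent strings repeatedly. An application-level LRU cache in front of the parser skips re-running the regexes for strings already seen. The sketch below uses `functools.lru_cache`. `make_cached_parser` is a hypothetical helper, separate from any caching UA-Extract performs internally.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_parser(
    parse: Callable[[str], Any], maxsize: int = 10_000
) -> Callable[[str], Any]:
    """Memoize parse(ua) in an LRU cache keyed on the exact user-agent string."""

    @lru_cache(maxsize=maxsize)
    def cached(ua: str) -> Any:
        return parse(ua)

    return cached


# Usage with the software-only detector shown above (requires ua_extract):
#   from ua_extract import SoftwareDetector
#   parse_software = make_cached_parser(lambda ua: SoftwareDetector(ua).parse())
#   parse_software(ua).client_name()
```

The cache key is only the UA string, so this pattern does not fit DeviceDetector calls whose result also depends on headers. Cached results are shared objects and should be treated as read-only.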

Testing

python -m unittest

Run a single test module:

python -m ua_extract.tests.parser.test_bot

Contributing

Contributions and bug reports are welcome: https://github.com/pranavagrawal321/UA-Extract


License

MIT License — compatible with the original Device Detector project.

Download files

Download the file for your platform.

Source Distribution

ua_extract-1.3.3.tar.gz (3.6 MB)

Uploaded Source

Built Distribution

ua_extract-1.3.3-py3-none-any.whl (1.6 MB)

Uploaded Python 3

File details

Details for the file ua_extract-1.3.3.tar.gz.

File metadata

  • Download URL: ua_extract-1.3.3.tar.gz
  • Upload date:
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ua_extract-1.3.3.tar.gz
Algorithm Hash digest
SHA256 bd2bfc46cf89a3fdf36e82b4992d7a0f663d894a93235925708db61a807e7c67
MD5 cf19b1799f90f979a7e293b068655777
BLAKE2b-256 6500fc409ec865830b096cdf490a789cbdcc4c3134c7513f360fb5f37225d7ee

File details

Details for the file ua_extract-1.3.3-py3-none-any.whl.

File metadata

  • Download URL: ua_extract-1.3.3-py3-none-any.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ua_extract-1.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f8b6372b41d54316c8237a34545937b1ab3af1ddc7982515a2deaa86ed6de50f
MD5 e511e6074dac73a88b8755a4d6124a74
BLAKE2b-256 063d122dc0c8ae0385c218c357c284234c9fdf9b81d25835006af6508d799dfc
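To check a downloaded file against the SHA256 digests listed above, hash it in chunks with `hashlib`. `sha256_file` and `matches_digest` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops when read() hits end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex digest (case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()


# Example with the sdist digest from the table above:
#   matches_digest("ua_extract-1.3.3.tar.gz",
#                  "bd2bfc46cf89a3fdf36e82b4992d7a0f663d894a93235925708db61a807e7c67")
```

pip can also enforce this check itself in hash-checking mode. To enable it, add `--hash=sha256:<digest>` to the requirement line in a requirements file.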
