Python 3 port of Matomo's Device Detector

UA-Extract

A Python user-agent parser and device detector powered by Matomo’s regex database — accurate, updatable, and production-ready.


Overview

UA-Extract is a fast and precise user-agent parser written in Python, built on top of the continuously maintained regex and fixture database from the Matomo Device Detector project.

It detects:

  • Browsers and applications
  • Operating systems and versions
  • Devices (desktop, smartphone, tablet, TV, console, car, camera, etc.)
  • Brands and models
  • Bots and crawlers
  • Secondary clients embedded in mobile user agents

UA-Extract is optimized for performance using in-memory parsing and caching, and is designed for long-running production services.

This project is a Python port of the Universal Device Detection library, adapted to Python while maintaining compatibility with Matomo’s original YAML regex and fixture files.

Source code: https://github.com/pranavagrawal321/UA-Extract


Disclaimer

This is not a line-by-line port of the original PHP implementation. The parsing logic is Pythonic, but the regex and fixture data are identical, ensuring compatibility with upstream updates.



Installation

Stable Release (PyPI)

pip install ua_extract

Nightly / Development Version

🔄 Regex Update Frequency: Matomo’s upstream regexes and fixtures change frequently, often daily. The nightly / development version of UA-Extract tracks these updates closely, making it the best choice if you need the freshest device and client detection.

If you want the latest regex updates and any unreleased fixes or minor improvements, you can install UA-Extract directly from the GitHub repository.

This version may include:

  • newer device and client regexes
  • updated fixture files
  • small internal changes not yet published to PyPI

⚠️ The nightly version is recommended for testing, experimentation, or environments that require the freshest device detection. For strict stability guarantees, prefer the PyPI release.

Install from GitHub

pip install git+https://github.com/pranavagrawal321/UA-Extract.git

Upgrade an existing GitHub install

pip install --upgrade git+https://github.com/pranavagrawal321/UA-Extract.git

CLI Usage

Install Shell Completion

ua_extract --install-completion

Update Regex & Fixture Files

Regex and fixture files should be updated periodically to recognize newly released devices and clients.

ua_extract update_regexes

By default, this updates files using Git sparse checkout from the Matomo repository.

CLI Options

Option     Description
--path     Destination directory for regex files
--repo     Git repository URL
--branch   Git branch (Git method only)
--method   Update method: git or api

--no-progress exists for backward compatibility but has no effect and will be removed.
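
When the update is driven from a Python script (for example a scheduled job), it can help to assemble the command line in one place. The following is a minimal sketch, not part of UA-Extract: `build_update_command` is a hypothetical helper, and it assumes each option takes its value as the next argument (`--path ./regexes`). It uses only the options listed above.

```python
import shlex
from typing import List, Optional

VALID_METHODS = {"git", "api"}


def build_update_command(
    path: Optional[str] = None,
    repo: Optional[str] = None,
    branch: Optional[str] = None,
    method: str = "git",
) -> List[str]:
    """Assemble argv for `ua_extract update_regexes` (hypothetical helper)."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be 'git' or 'api', got {method!r}")
    if branch is not None and method != "git":
        # The options table marks --branch as applying to the Git method only.
        raise ValueError("--branch applies to the git method only")

    argv = ["ua_extract", "update_regexes", "--method", method]
    for flag, value in (("--path", path), ("--repo", repo), ("--branch", branch)):
        if value is not None:
            argv += [flag, value]
    return argv


cmd = build_update_command(path="./regexes", branch="master")
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the command can be passed to `subprocess.run` without shell quoting concerns.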


Programmatic Updates

Regex updates can also be triggered programmatically using the Regexes class.

Git Method (recommended)

Uses:

  • shallow clone
  • sparse checkout
  • atomic backup and rollback on failure

from ua_extract import Regexes

Regexes().update_regexes()

GitHub API Method

Uses:

  • asynchronous downloads (aiohttp)
  • concurrency limiting
  • exponential retry logic
  • GitHub rate-limit detection

from ua_extract import Regexes

Regexes(github_token="your_token").update_regexes(method="api")

Notes:

  • GitHub API limits: 60 requests/hour unauthenticated, 5000/hour authenticated
  • Token does not require special scopes (public repository access only)
  • API method always pulls from the master branch
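
For readers unfamiliar with the concurrency limiting and exponential retry listed for the API method, here is a general illustration of the two techniques using only the standard library. It is a sketch of the pattern, not UA-Extract's actual code: `retry_with_backoff` and `gather_limited` are hypothetical names, and the real implementation (which uses aiohttp) may differ in its delays, limits, and error handling.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    task: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await task(); on failure, retry with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return await task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            # A little jitter keeps simultaneous retries from lining up.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable when retries >= 0")


async def gather_limited(
    tasks: Sequence[Callable[[], Awaitable[T]]],
    limit: int = 5,
    retries: int = 3,
    base_delay: float = 0.5,
) -> List[T]:
    """Run all tasks, with at most `limit` in flight at any moment."""
    semaphore = asyncio.Semaphore(limit)

    async def run(task: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await retry_with_backoff(task, retries, base_delay)

    return list(await asyncio.gather(*(run(t) for t in tasks)))
```

Each task is a zero-argument callable returning a fresh awaitable, so a failed attempt can be started again from scratch.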

Dry Run

from ua_extract import Regexes

Regexes().update_regexes(method="api", dry_run=True)

Update Safety Guarantees

UA-Extract ensures update safety by:

  • creating backups of all destination directories
  • restoring the previous state automatically on failure
  • never leaving partially updated files behind

This makes it safe to run updates in CI or production environments.
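
The backup-and-restore idea behind these guarantees can be sketched generically as follows. This is an illustration of the pattern only, not UA-Extract's implementation: `update_with_rollback` and the `updater` callback are hypothetical, and the sketch assumes the destination directory already exists.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def update_with_rollback(dest: Path, updater: Callable[[Path], None]) -> None:
    """Run updater(dest); if it raises, restore dest to its previous contents."""
    backup_root = Path(tempfile.mkdtemp(prefix="regex-backup-"))
    snapshot = backup_root / "snapshot"
    try:
        shutil.copytree(dest, snapshot)  # back up before touching anything
        try:
            updater(dest)
        except Exception:
            shutil.rmtree(dest, ignore_errors=True)  # discard the partial update
            shutil.copytree(snapshot, dest)          # put the old files back
            raise
    finally:
        shutil.rmtree(backup_root, ignore_errors=True)  # always clean up the backup
```

The original exception is re-raised after the restore, so a calling CI job still fails visibly while the directory is left in its previous state.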


Parsing User Agents

CLI Parsing

You can parse a user-agent directly from the command line using the parse command. The CLI outputs a JSON object containing all detected fields.

ua_extract parse \
  --ua "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/16D57 EtsyInc/5.22" \
  --headers '{"Accept":"*/*","X-Requested-With":"com.example.app"}'

Sample CLI JSON Output

{
  "is_bot": false,
  "os_name": "iOS",
  "os_version": "12.1.4",
  "engine": {
    "default": "WebKit"
  },
  "device_brand": "Apple",
  "device_model": "iPhone",
  "device_type": "smartphone",
  "secondary_client_name": "EtsyInc",
  "secondary_client_type": "generic",
  "secondary_client_version": "5.22",
  "bot_name": null,
  "client_name": "Mobile Safari",
  "client_type": "browser",
  "client_application_id": null,
  "is_television": false,
  "uses_mobile_browser": true,
  "is_mobile": true,
  "is_desktop": false,
  "is_feature_phone": false,
  "preferred_client_name": "Mobile Safari",
  "preferred_client_version": "605.1.15",
  "preferred_client_type": "browser"
}

Notes:

  • --headers must be a valid JSON object
  • headers are optional; omit --headers if not needed
  • the output JSON mirrors the fields shown in the Python API example below
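
Because --headers must be a valid JSON object, a script that calls the CLI may prefer to serialize the headers with `json.dumps` instead of writing the JSON by hand. A minimal sketch, assuming only the `parse` command and the two flags shown above; `build_parse_command` is a hypothetical helper, not part of UA-Extract.

```python
import json
import shlex
from typing import List, Mapping, Optional


def build_parse_command(
    ua: str, headers: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Assemble argv for `ua_extract parse` (hypothetical helper)."""
    argv = ["ua_extract", "parse", "--ua", ua]
    if headers:
        # json.dumps of a dict always yields a valid JSON object.
        argv += ["--headers", json.dumps(dict(headers))]
    return argv


cmd = build_parse_command("Mozilla/5.0 (X11; Linux x86_64)", {"Accept": "*/*"})
print(shlex.join(cmd))

# To run it and read the result (assumes the CLI prints only JSON to stdout):
#   import subprocess
#   out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
#   result = json.loads(out)
```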

Field Reference

The following table describes every field returned by the CLI and Python API.

Field                     Type               Description
is_bot                    bool               Whether the user agent is identified as a bot
bot_name                  str | null         Name of the bot, if detected
os_name                   str | null         Operating system name
os_version                str | null         Operating system version
engine                    dict | str | null  Rendering engine information
device_brand              str | null         Device manufacturer
device_model              str | null         Device model
device_type               str | null         Device category (smartphone, tablet, TV, etc.)
client_name               str | null         Primary client (browser or app) name
client_type               str | null         Client type (browser, app, library, etc.)
client_version            str | null         Client version
client_application_id     str | null         Application identifier, if available
secondary_client_name     str | null         Embedded or wrapper client name
secondary_client_type     str | null         Embedded client type
secondary_client_version  str | null         Embedded client version
is_mobile                 bool | null        Whether the device is mobile
is_desktop                bool | null        Whether the device is desktop
is_television             bool | null        Whether the device is a TV
uses_mobile_browser       bool | null        Whether a mobile browser is used
is_feature_phone          bool | null        Whether the device is a feature phone
preferred_client_name     str | null         Best client choice when multiple clients exist
preferred_client_version  str | null         Preferred client version
preferred_client_type     str | null         Preferred client type
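
Since most fields can be null, code that consumes the JSON should allow for missing values. The sketch below turns a result into a one-line summary, with a fallback for each part. `summarize` is a hypothetical helper, and the sample is a subset of the CLI output shown earlier.

```python
import json
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Build a short human-readable summary, tolerating null or absent fields."""
    if result.get("is_bot"):
        return f"bot: {result.get('bot_name') or 'unknown'}"

    client_parts = (
        result.get("preferred_client_name"),
        result.get("preferred_client_version"),
    )
    os_parts = (result.get("os_name"), result.get("os_version"))

    client = " ".join(p for p in client_parts if p) or "unknown client"
    os_label = " ".join(p for p in os_parts if p) or "unknown OS"
    device = result.get("device_type") or "unknown device"
    return f"{client} on {os_label} ({device})"


sample = json.loads(
    """
    {
      "is_bot": false,
      "os_name": "iOS",
      "os_version": "12.1.4",
      "device_type": "smartphone",
      "preferred_client_name": "Mobile Safari",
      "preferred_client_version": "605.1.15"
    }
    """
)
print(summarize(sample))  # Mobile Safari 605.1.15 on iOS 12.1.4 (smartphone)
```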

Python API

Full Device Detection

from ua_extract import DeviceDetector

ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_4 like Mac OS X)..."

headers = {
    "User-Agent": ua,
    "Accept": "*/*",
    "X-Requested-With": "com.example.app",
}

device = DeviceDetector(ua, headers=headers).parse()

# Bot & classification
device.is_bot()
device.bot_name()

# Operating system
device.os_name()
device.os_version()

# Rendering engine
device.engine()

# Device information
device.device_brand()
device.device_model()
device.device_type()

# Client (primary)
device.client_name()
device.client_type()
device.client_version()
device.client_application_id()

# Secondary client
device.secondary_client_name()
device.secondary_client_type()
device.secondary_client_version()

# Device characteristics
device.is_television()
device.uses_mobile_browser()
device.is_mobile()
device.is_desktop()
device.is_feature_phone()

# Preferred client
device.preferred_client_name()
device.preferred_client_version()
device.preferred_client_type()
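
To collect all of these accessors into a single dictionary (for logging or JSON serialization, say), one option is to loop over the method names. A minimal sketch: `FIELDS` and `to_dict` are hypothetical names, and the helper relies only on the zero-argument accessor methods listed above.

```python
from typing import Any, Dict

# Accessor names taken from the example above, in field-reference order.
FIELDS = (
    "is_bot", "bot_name",
    "os_name", "os_version",
    "engine",
    "device_brand", "device_model", "device_type",
    "client_name", "client_type", "client_version", "client_application_id",
    "secondary_client_name", "secondary_client_type", "secondary_client_version",
    "is_mobile", "is_desktop", "is_television",
    "uses_mobile_browser", "is_feature_phone",
    "preferred_client_name", "preferred_client_version", "preferred_client_type",
)


def to_dict(device: Any) -> Dict[str, Any]:
    """Call each accessor on a parsed device and collect the results."""
    return {name: getattr(device, name)() for name in FIELDS}


# Usage with the library (not executed here):
#   from ua_extract import DeviceDetector
#   info = to_dict(DeviceDetector(ua, headers=headers).parse())
```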

High-Performance Software Detection

Skips hardware detection for faster parsing:

from ua_extract import SoftwareDetector

# ua is the user-agent string defined in the previous example
device = SoftwareDetector(ua).parse()

device.client_name()
device.client_version()
device.os_name()
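
In a long-running service the same user-agent strings tend to recur, so an application may choose to memoize parse results itself. The following is a sketch of that idea using `functools.lru_cache`. `cached` is a hypothetical wrapper, separate from whatever caching UA-Extract performs internally, and it keys on the user-agent string only, so it is not suitable when results also depend on headers.

```python
from functools import lru_cache
from typing import Callable, TypeVar

R = TypeVar("R")


def cached(parse: Callable[[str], R], maxsize: int = 4096) -> Callable[[str], R]:
    """Wrap a one-argument parse function in an LRU cache keyed by the UA string."""
    return lru_cache(maxsize=maxsize)(parse)


# Usage with the library (not executed here):
#   from ua_extract import SoftwareDetector
#   parse_ua = cached(lambda ua: SoftwareDetector(ua).parse())
#   parse_ua(ua).client_name()
```

Repeated calls with the same string return the same parsed object, so callers should treat it as read-only.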

Testing

python -m unittest

Run a single test module:

python -m ua_extract.tests.parser.test_bot

Contributing

Contributions and bug reports are welcome: https://github.com/pranavagrawal321/UA-Extract


License

MIT License — compatible with the original Device Detector project.
