
InnoDB & MyISAM Table Parser & Usage Estimator

Project description

MariaDB Table Usage

This tool reports usage statistics for InnoDB and MyISAM tables.

  • For InnoDB, it parses table pages and segment nodes to calculate usage accurately.
  • For MyISAM, it reads file stats.
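For InnoDB, the idea can be illustrated with a minimal sketch: page 0 of a table's .ibd file carries the FSP header, whose FSP_SIZE and FSP_FREE_LIMIT fields record the total and initialized page counts. The offsets below follow the standard InnoDB on-disk layout (a 38-byte FIL header at the start of every page, followed by the FSP header on page 0); this is an illustration under those assumptions, not the library's actual code.

```python
import struct

PAGE_SIZE = 16 * 1024                         # default InnoDB page size
FIL_HEADER_SIZE = 38                          # fixed FIL header on every page
FSP_SIZE_OFFSET = FIL_HEADER_SIZE + 8         # FSP_SIZE: total pages in tablespace
FSP_FREE_LIMIT_OFFSET = FIL_HEADER_SIZE + 12  # FSP_FREE_LIMIT: pages initialized so far

def parse_fsp_header(page: bytes) -> dict:
    """Extract tablespace size fields from page 0 of an .ibd file."""
    size_pages = struct.unpack_from(">I", page, FSP_SIZE_OFFSET)[0]
    free_limit = struct.unpack_from(">I", page, FSP_FREE_LIMIT_OFFSET)[0]
    return {
        "size_bytes": size_pages * PAGE_SIZE,
        "uninitialized_bytes": (size_pages - free_limit) * PAGE_SIZE,
    }

def read_fsp_page(ibd_path: str) -> dict:
    """Read only the first 16 KB of the file — one disk operation."""
    with open(ibd_path, "rb") as f:
        return parse_fsp_header(f.read(PAGE_SIZE))
```

The MyISAM side needs no parsing at all: the sizes of the .MYD and .MYI files can be taken directly from file stats.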

The tool reports the following information per table:

  • data_length: size of the table data (for InnoDB, this includes the primary index as well)
  • index_length: size of secondary indexes (for MyISAM, the size of the MYI file)
  • data_free: deleted or uninitialized space (InnoDB only)

Usage

from mariadb_table_usage import usage

data = usage("test_db")

Why not rely on the database?

To get the same information from the database, we usually run:

SELECT data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'test_db';

This query can cause disk I/O issues on shared database servers, especially when many small databases exist on the same host.

Although MySQL and MariaDB have improved this over time, there are still some fundamental problems.

Problems with information_schema approach

When MariaDB calculates table usage using information_schema.tables, it needs to open each table and inspect its metadata.

If a database has many tables (for example, hundreds or thousands), this creates a few issues:

  • The database has a limited table_open_cache. Opening many tables for metadata queries can evict entries from the cache, which affects both subsequent metadata queries and normal production queries.

  • The database engine treats information_schema queries like regular queries and does not consider system-level disk I/O limits.

  • Unlike normal queries, information_schema queries involving dynamically computed columns do not benefit much from the InnoDB buffer pool and often result in direct disk reads.

  • MariaDB may use multiple threads to calculate usage, which can further increase disk I/O pressure.

For each InnoDB table, the engine typically needs to read at least:

  • The FSP header page
  • The clustered index root page
  • A non-leaf segment node page

Each page is 16 KB, so even a single table requires multiple disk reads. When this is repeated across many tables, disk I/O can spike quickly.
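To put a rough number on it, consider a cold sweep (no pages cached) across a shared host. The three-page minimum comes from the list above; the database and table counts below are purely hypothetical:

```python
PAGE_SIZE = 16 * 1024
PAGES_PER_TABLE = 3  # FSP header, clustered index root, segment node page

def cold_read_bytes(n_tables: int) -> int:
    """Bytes read from disk when none of the required pages are cached."""
    return n_tables * PAGES_PER_TABLE * PAGE_SIZE

# Hypothetical host: 200 databases of 500 tables each.
total = cold_read_bytes(200 * 500)
print(total / (1024 ** 2), "MiB")  # 4687.5 MiB of random, uncached reads
```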

Sometimes the query appears fast because the required pages are already cached. However, this cache can be cleared:

  • periodically
  • after DDL operations
  • when more than 10% of a table is modified

When the cache is cleared, the same query can suddenly cause heavy disk I/O and impact the entire server.

Frappe Cloud Scenario

In Frappe Cloud, shared servers host a large number of very small sites. At minimum, site migrations happen once a week, and these migrations usually modify a few tables in each site database.

After such operations, metadata caches are often invalidated, at least in theory. If usage information is then queried sequentially for each site's database, information_schema queries can trigger disk reads for every table across many databases. This can lead to a sudden, significant disk I/O spike and, in some cases, a temporary I/O freeze on the server.

Approach of this tool

This tool reads table files directly instead of querying the database engine.

It limits disk reads to a configurable rate (default: 200 ops/sec), which keeps disk I/O pressure consistent regardless of how fast the files could otherwise be read.
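The throttle can be pictured as a minimal interval-based rate limiter; the sketch below assumes one disk operation per acquire call. The tool's real implementation may differ, and only the 200 ops/sec default comes from this document:

```python
import time

class ReadRateLimiter:
    """Caps disk read operations at a fixed rate by sleeping between ops."""

    def __init__(self, ops_per_sec: float = 200.0):
        self.interval = 1.0 / ops_per_sec
        self.next_allowed = time.monotonic()

    def acquire(self) -> None:
        """Block until the next read operation is allowed."""
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.interval

# Sketch of use (read_page and pages_to_read are placeholders):
# limiter = ReadRateLimiter(200)
# for page_no in pages_to_read:
#     limiter.acquire()
#     read_page(page_no)
```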

It also monitors CPU iowait and pauses or stops its work if a configured threshold is reached. This helps keep the database server responsive.
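On Linux, iowait can be sampled from the aggregate cpu line of /proc/stat and compared between two readings. The sketch below is one plausible way to do this, not the tool's actual code, and the 20% threshold is a made-up placeholder:

```python
import time

IOWAIT_THRESHOLD = 0.20  # hypothetical threshold; the tool's default may differ

def read_cpu_counters() -> list[int]:
    """Counters from the aggregate 'cpu' line of /proc/stat (Linux only)."""
    with open("/proc/stat") as f:
        return [int(v) for v in f.readline().split()[1:]]

def iowait_fraction(prev: list[int], cur: list[int]) -> float:
    """iowait share of CPU time between two samples.

    /proc/stat field order: user nice system idle iowait irq softirq steal ...
    """
    delta = [c - p for c, p in zip(cur, prev)]
    total = sum(delta)
    return delta[4] / total if total else 0.0

def wait_until_calm(poll_interval: float = 1.0) -> None:
    """Pause scanning while iowait stays above the threshold."""
    prev = read_cpu_counters()
    while True:
        time.sleep(poll_interval)
        cur = read_cpu_counters()
        if iowait_fraction(prev, cur) < IOWAIT_THRESHOLD:
            return
        prev = cur
```

Sampling the delta between two readings matters: the cumulative counters cover all time since boot, so a single snapshot would barely move during a short I/O spike.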

Concurrency can be controlled to limit parallel table scans. For InnoDB tables, each table scan typically reads a few pages (FSP header, clustered index root, segment nodes). Each page is 16 KB.
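Bounded parallelism of this kind can be sketched with a fixed-size worker pool; scan_table below is a stand-in placeholder, not the library's API, and the worker count is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_SCANS = 4  # hypothetical cap on concurrent table scans

def scan_table(table_file: str) -> tuple[str, int]:
    """Stand-in for the per-table scan (really a few 16 KB page reads)."""
    return table_file, len(table_file)

def scan_all(table_files: list[str]) -> dict[str, int]:
    """Scan tables with at most MAX_PARALLEL_SCANS running at once."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_SCANS) as pool:
        return dict(pool.map(scan_table, table_files))
```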

By combining rate limiting, iowait monitoring, and concurrency control, the tool focuses on stability rather than speed and avoids putting excessive load on the disk.

Resources to Understand

The parser and library were written based on the following resources.

The codebase also includes comments explaining the implementation and references to the corresponding C source code.

Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


File details

Details for the file mariadb_table_usage-0.0.2-py3-none-manylinux2014_x86_64.whl.

File hashes

Hashes for mariadb_table_usage-0.0.2-py3-none-manylinux2014_x86_64.whl:

  • SHA256: a1907c57c52f90a503d635a2d8e311a6860c4523005570fd9dfa44e928a6c6fe
  • MD5: a88b0ed55eecadfc7c8827291df9ea69
  • BLAKE2b-256: db44f470f422669c206a33128bec53c79a119f7db3a9c421c35e5042dd5c490f


File details

Details for the file mariadb_table_usage-0.0.2-py3-none-manylinux2014_aarch64.whl.

File hashes

Hashes for mariadb_table_usage-0.0.2-py3-none-manylinux2014_aarch64.whl:

  • SHA256: b205b0e3ea2681d748aa9e4bb7b35446a72635e92749354ee03b6ed43cd1db1b
  • MD5: 35adcf2bd8194306dd9764f05c5ce2ea
  • BLAKE2b-256: c11dfaed6c2e41bb7555e4c709a4e154c4e39c1ef35dec7b68f66c69d6271bfd

