
InnoDB & MyISAM Table Parser & Usage Estimator

Project description

MariaDB Table Usage

This tool reports usage statistics for InnoDB and MyISAM tables.

  • For InnoDB, it parses table pages and segment nodes to calculate usage accurately.
  • For MyISAM, it reads file stats.

The tool reports the following information per table:

  • data_length: size of table data (for InnoDB, this includes the primary/clustered index as well)
  • index_length: usage of secondary indexes (for MyISAM, the size of the .MYI file)
  • data_free: deleted or uninitialized space (InnoDB only)

Usage

from mariadb_table_usage import usage

data = usage("test_db")
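The exact shape of the return value isn't documented here. Assuming it maps table names to the three fields above (an assumption for illustration only), per-database totals could be computed like this:

```python
def totals(stats):
    """Sum data_length, index_length and data_free across all tables.

    `stats` is assumed to map table name -> field dict; this is a guess
    at what usage() returns, not a documented contract.
    """
    out = {"data_length": 0, "index_length": 0, "data_free": 0}
    for fields in stats.values():
        for key in out:
            out[key] += fields.get(key, 0)
    return out

# Illustrative data only, not real output of usage()
sample = {
    "tabUser": {"data_length": 114688, "index_length": 49152, "data_free": 0},
    "tabToDo": {"data_length": 81920, "index_length": 32768, "data_free": 16384},
}
grand = totals(sample)
```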

Why not rely on the database?

To get the same information from the database, we usually run:

SELECT data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'test_db'

This query can cause disk I/O issues on shared database servers, especially when many small databases exist on the same host.

Although MySQL and MariaDB have improved this over time, there are still some fundamental problems.

Problems with information_schema approach

When MariaDB calculates table usage using information_schema.tables, it needs to open each table and inspect its metadata.

If a database has many tables (for example, hundreds or thousands), this creates a few issues:

  • The database has a limited table_open_cache. Opening many tables for metadata queries can evict entries from the cache, which affects both subsequent metadata queries and normal production queries.

  • The database engine treats information_schema queries like regular queries and does not consider system-level disk I/O limits.

  • Unlike normal queries, information_schema queries involving dynamically computed columns do not benefit much from the InnoDB buffer pool and often result in direct disk reads.

  • MariaDB may use multiple threads to calculate usage, which can further increase disk I/O pressure.

For each InnoDB table, the engine typically needs to read at least:

  • The FSP header page
  • The clustered index root page
  • A non-leaf segment node page

Each page is 16 KB, so even a single table requires multiple disk reads. When this is repeated across many tables, disk I/O can spike quickly.
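To put rough numbers on this, a back-of-the-envelope calculation (assuming the three cold pages per table listed above; page count per table can vary in practice):

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes

def min_metadata_bytes(table_count, pages_per_table=3):
    """Lower bound on bytes read from disk when every needed page
    (FSP header, clustered index root, segment node) is cold."""
    return table_count * pages_per_table * PAGE_SIZE

# 2,000 tables -> roughly 94 MB of scattered 16 KB reads
mb = min_metadata_bytes(2000) / (1024 * 1024)
```

Because these are small random reads rather than one sequential scan, the impact on a busy disk is worse than the raw byte count suggests.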

Sometimes the query appears fast because the required pages are already cached. However, this cache can be cleared:

  • periodically
  • after DDL operations
  • when more than 10% of a table is modified

When the cache is cleared, the same query can suddenly cause heavy disk I/O and impact the entire server.

Frappe Cloud Scenario

In Frappe Cloud, shared servers host a large number of very small sites. At minimum, site migrations happen once a week, and these migrations usually modify a few tables in each site database.

After such operations, metadata caches are often invalidated (at least in theory). If usage information is then queried sequentially for each site's database, information_schema queries can trigger disk reads for every table across many databases. This can lead to a sudden, significant disk I/O spike and, in some cases, a temporary I/O freeze on the server.

Approach of this tool

This tool reads table files directly instead of querying the database engine.

It limits disk reads to a configurable rate (default: 200 ops/sec). This keeps I/O pressure consistent regardless of how fast the files could otherwise be read.
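The idea behind such rate limiting can be sketched as a minimum interval between operations. This is a hypothetical illustration, not the tool's actual implementation:

```python
import time

class RateLimiter:
    """Cap operations at `rate` per second by enforcing a minimum
    interval between calls to wait(). Illustrative sketch only."""

    def __init__(self, rate=200):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self):
        # Sleep just long enough that successive calls are at least
        # min_interval apart, then record when we were released.
        now = time.monotonic()
        sleep_for = self.min_interval - (now - self.last)
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()

limiter = RateLimiter(rate=200)
start = time.monotonic()
for _ in range(20):
    limiter.wait()  # e.g. called before each 16 KB page read
elapsed = time.monotonic() - start
```

With 20 operations at 200 ops/sec, the loop takes at least ~95 ms no matter how fast the underlying reads are.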

It also monitors CPU iowait and pauses or stops its work if a configured threshold is reached. This helps keep the database server responsive.
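On Linux, an iowait check like this is typically based on the aggregate `cpu` line of /proc/stat (field order per proc(5)). The parsing below is a generic sketch; the threshold and pausing logic of the actual tool are not shown here:

```python
def iowait_fraction(cpu_line):
    """Return iowait as a fraction of total CPU time from a /proc/stat
    'cpu' line. Field order: user nice system idle iowait irq softirq ...
    """
    fields = [int(x) for x in cpu_line.split()[1:]]
    total = sum(fields)
    return fields[4] / total if total else 0.0

# Sample aggregate line in /proc/stat format (values are illustrative)
sample = "cpu  4705 150 1120 16250 520 30 45 0 0 0"
share = iowait_fraction(sample)  # iowait share of total CPU time
```

In practice the counters are cumulative since boot, so a monitor would diff two samples taken a few seconds apart and pause when the delta's iowait share crosses the configured threshold.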

Concurrency can be controlled to limit parallel table scans. For InnoDB tables, each scan typically reads only a few 16 KB pages (the FSP header, the clustered index root page, and segment node pages).
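Bounded concurrency of this kind can be sketched with a fixed-size worker pool; `scan_table` and the table names below are stand-ins, not the tool's real API:

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024

def scan_table(name):
    # Stand-in for the real per-table work: reading the FSP header,
    # clustered index root, and segment node pages from the .ibd file.
    pages_read = 3
    return name, pages_read * PAGE_SIZE

tables = ["tabUser", "tabDocType", "tabToDo"]
# max_workers caps how many tables are scanned in parallel,
# bounding the number of in-flight disk reads at any moment
with ThreadPoolExecutor(max_workers=2) as pool:
    bytes_read = dict(pool.map(scan_table, tables))
```

Combined with the per-read rate limit, this bounds both the total read rate and the burstiness of concurrent reads.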

By combining rate limiting, iowait monitoring, and concurrency control, the tool focuses on stability rather than speed and avoids putting excessive load on the disk.

Resources to Understand

The parser and library were written based on the following resources:

The codebase also includes comments explaining the implementation and references to the corresponding C source code.
