
InnoDB & MyISAM Table Parser & Usage Estimator

Project description

MariaDB Table Usage

This tool reports usage statistics for InnoDB and MyISAM tables.

  • For InnoDB, it parses table pages and segment nodes to calculate usage accurately.
  • For MyISAM, it reads file stats.

The tool reports the following information per table:

  • data_length: size of the table data (for InnoDB, this includes the primary/clustered index)
  • index_length: size of the secondary indexes (for MyISAM, the size of the .MYI file)
  • data_free: deleted or uninitialized space (InnoDB only)

Usage

from mariadb_table_usage import usage

data = usage("test_db")

Why not rely on the database?

To get the same information from the database, we usually run:

SELECT data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = 'test_db'

This query can cause disk I/O issues on shared database servers, especially when many small databases exist on the same host.

Although MySQL and MariaDB have improved this over time, there are still some fundamental problems.

Problems with information_schema approach

When MariaDB calculates table usage using information_schema.tables, it needs to open each table and inspect its metadata.

If a database has many tables (for example, hundreds or thousands), this creates a few issues:

  • The database has a limited table_open_cache. Opening many tables for metadata queries can evict entries from the cache, which affects both subsequent metadata queries and normal production queries.

  • The database engine treats information_schema queries like regular queries and does not consider system-level disk I/O limits.

  • Unlike normal queries, information_schema queries for these dynamically computed columns do not benefit much from the InnoDB buffer pool and often result in direct disk reads.

  • MariaDB may use multiple threads to calculate usage, which can further increase disk I/O pressure.

For each InnoDB table, the engine typically needs to read at least:

  • The FSP header page
  • The clustered index root page
  • The non-leaf segment node pages

Each page is 16 KB, so even a single table requires multiple disk reads. When this is repeated across many tables, disk I/O can spike quickly.
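The handful of reads listed above can be illustrated with a minimal sketch that pulls two fields out of the FSP header page (page 0) of an .ibd file. The offsets follow the standard InnoDB on-disk layout (a 38-byte FIL header followed by the FSP header, big-endian fields); this is an illustration of the page format, not this tool's actual parser.

```python
# Minimal sketch: read page 0 (the FSP header page) of an .ibd file and
# derive a coarse size estimate. Field names mirror the InnoDB sources;
# the function itself is illustrative.
import struct

PAGE_SIZE = 16 * 1024                 # default InnoDB page size
FIL_HEADER = 38                       # bytes before the FSP header on page 0
FSP_SIZE_OFF = FIL_HEADER + 8         # FSP_SIZE: tablespace size in pages
FSP_FREE_LIMIT_OFF = FIL_HEADER + 12  # FSP_FREE_LIMIT: lowest uninitialized page

def fsp_summary(ibd_path):
    """Return (allocated_bytes, initialized_bytes) from the FSP header page."""
    with open(ibd_path, "rb") as f:
        page0 = f.read(PAGE_SIZE)
    size_pages = struct.unpack_from(">I", page0, FSP_SIZE_OFF)[0]
    free_limit = struct.unpack_from(">I", page0, FSP_FREE_LIMIT_OFF)[0]
    return size_pages * PAGE_SIZE, free_limit * PAGE_SIZE
```

A real estimator would continue from here to the clustered index root page and the segment nodes; the point is only that each step is one more 16 KB page read.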

Sometimes the query appears fast because the required pages are already cached. However, this cache can be cleared:

  • periodically
  • after DDL operations
  • when more than 10% of a table is modified

When the cache is cleared, the same query can suddenly cause heavy disk I/O and impact the entire server.

Frappe Cloud Scenario

In Frappe Cloud, shared servers host a large number of very small sites. At minimum, site migrations happen once a week, and these migrations usually modify a few tables in each site database.

After such operations, metadata caches are often invalidated (theoretically). If usage information is then queried sequentially for each site’s database, information_schema queries can trigger disk reads for every table across many databases. This can lead to a sudden and significant disk I/O spike and, in some cases, a temporary I/O freeze on the server.

Approach of this tool

This tool reads table files directly instead of querying the database engine.

It limits disk reads to a configurable rate (default: 200 ops/sec). This keeps I/O pressure consistent regardless of how fast the files could otherwise be read.
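The rate limiting described above can be sketched as a simple token bucket; the class and parameter names here are illustrative assumptions, not this tool's API.

```python
# Token bucket sketch: admit at most `rate` disk operations per second.
import time

class ReadRateLimiter:
    def __init__(self, rate=200):
        self.rate = rate            # allowed ops per second
        self.tokens = float(rate)   # start with a full bucket
        self.last = time.monotonic()

    def acquire(self):
        """Block until one read operation may proceed."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the bucket size.
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)
```

Each 16 KB page read would call acquire() before touching the disk, so even a burst of many tables drains at the configured rate.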

It also monitors CPU iowait and pauses or stops its work if a configured threshold is reached. This helps keep the database server responsive.
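On Linux, the iowait check can be approximated by sampling the aggregate "cpu" line of /proc/stat twice and computing the fraction of CPU time spent waiting on I/O between the samples. The threshold value and function names below are illustrative assumptions, not this tool's API.

```python
# Sketch of an iowait monitor, assuming a Linux host.
import time

def _cpu_times(stat_path="/proc/stat"):
    # First line is "cpu user nice system idle iowait irq softirq ..."
    with open(stat_path) as f:
        fields = f.readline().split()
    values = [int(v) for v in fields[1:]]
    return sum(values), values[4]      # (total jiffies, iowait jiffies)

def iowait_fraction(interval=0.2):
    """Fraction of CPU time spent in iowait over `interval` seconds."""
    total0, wait0 = _cpu_times()
    time.sleep(interval)
    total1, wait1 = _cpu_times()
    dt = total1 - total0
    return (wait1 - wait0) / dt if dt else 0.0

def should_pause(threshold=0.2):
    # Pause or stop table scans when iowait exceeds the configured threshold.
    return iowait_fraction() > threshold
```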

Concurrency can be controlled to limit parallel table scans. For InnoDB tables, each table scan typically reads a few pages (FSP header, clustered index root, segment nodes). Each page is 16 KB.
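Bounding parallel table scans can be sketched with a fixed-size worker pool; `scan_one` and `max_workers` are hypothetical stand-ins for the tool's internals.

```python
# Cap the number of table scans in flight with a bounded thread pool.
from concurrent.futures import ThreadPoolExecutor

def scan_tables(paths, scan_one, max_workers=2):
    """Apply `scan_one` to each table file, at most `max_workers` at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scan_one, paths))
```

Because each scan only touches a few 16 KB pages, a small worker count keeps throughput reasonable while the rate limiter above bounds the aggregate read rate.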

By combining rate limiting, iowait monitoring, and concurrency control, the tool focuses on stability rather than speed and avoids putting excessive load on the disk.

Resources to Understand

The parser and library were written based on the following resources:

The codebase also includes comments explaining the implementation and references to the corresponding C source code.
