Skip to main content

ZODB Distributed Garbage Collection

Project description

ZODB Distributed GC

This package provides 2 scripts, for multi-database garbage collection and database validation.

The scripts require that the databases provided to them use 64-bit object ids. The garbage-collection script also assumes that the databases support efficient iteration from transactions near the end of the databases.

multi-zodb-gc

The multi-zodb-gc script takes one or 2 configuration files. If a single configuration file is given, garbage collection is performed on the databases specified by the configuration files. If garbage is found, then delete records are written to the databases. When the databases are subsequently packed to a time after the delete records are written, the garbage objects will be removed.

If a second configuration file is given, then the databases specified in the second configuration file will be used to find garbage. Deleted records are still written to the databases given in the first configuration file. When using replicated-database technology, analysis can be performed using secondary storages, which are usually lightly loaded. This is helpful because finding garbage places a significant load on the databases used to find garbage.

If your database uses file-storages, then rather than specifying a second configuration file, you can use the -f option to specify file-storage iterators for finding garbage. Using file storage iterators is much faster than using a ZEO connection and is faster and requires less memory than opening a read-only file storage on the files.

Some number of trailing days (1 by default) of database records are considered good, meaning the objects referenced by them are not garbage. This allows the garbage-collection algorithm to work more efficiently and avoids problems when applications (incorrectly) do things that cause objects to be temporarily unreferenced, such as moving objects in 2 transactions.

Options can be used to control the number of days of trailing data to be treated as non garbage and to specify the logging level. Use the --help option to get details.

multi-zodb-check-refs

The multi-zodb-check-refs script validates a collection of databases by starting with their roots and traversing the databases to make sure all referenced objects are reachable. Any unreachable objects are reported. If any databases are configured to disallow implicit cross-database references, then invalid references are reported as well. Blob records are checked to make sure their blob files can be loaded.

Optionally, a database of reference information can be generated. This database allows you to find objects referencing a given object id in a database. This can be very useful to debugging missing objects. Generation of the references database increases the analysis time substantially. The references database can become quite large, often a substantial percentage of the size of the databases being analyzed. Typically, you’ll perform an initial analysis without a references database and only create a references file in a subsequent run if problems are found.

You can run the script with the --help option to get usage information.

Change History

1.0.0 (2015-08-28)

  • Add support for PyPy, Python 2.7, and Python 3. This requires the addition of the zodbpickle dependency, even on Python 2.6.

  • Fixed the --days argument to multi-zodb-gc with recent versions of persistent.

  • The return values and arguments of the internal implementation functions gc and gc_command have changed for compatibility with Python 3. This will not impact users of the documented scripts and is noted only for developers.

0.6.1 (2012-10-08)

  • Fixed: GC could fail it special cases with a NameError.

0.6.0 (2010-05-27)

  • Added support for storages with transformed (e.g. compressed) data records.

0.5.0 (2009-11-10)

  • Fixed a bug in the delay throttle that made it delete objects way too slowly.

0.4.0 (2009-09-08)

  • The previous version deleted too many objects at a time, which could put too much load on a heavily loaded storage server.

    • Add a sleep or allow the storage to rest after a set of deletions. Sleep for twice the time taken to perform the deletions.

    • Adjust the deletion batch size to take about .5 seconds per batch of deletions, but do at least 10 at a time.

0.3.0 (2009-09-03)

  • Optimized garbage collection by using a temporary file to store object references rather than loading them from the analysis database when needed.

  • Added an -f option to specify file-storage files directly. It is wildly faster to iterate over a file storage than over a ZEO connection. Using this option uses a file iterator rather than opening a file storage in read-only mode, which avoids scanning the database to build an index and avoids the memory cost of a file-storage index.

0.2.0 (2009-06-15)

  • Added an option to ignore references to some databases.

  • Fixed a bug in handling of the logging level option.

0.1.0 (2009-06-11)

Initial release

Download

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zc.zodbdgc-1.0.0.tar.gz (18.8 kB view details)

Uploaded Source

Built Distribution

zc.zodbdgc-1.0.0-py2.py3-none-any.whl (22.5 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file zc.zodbdgc-1.0.0.tar.gz.

File metadata

  • Download URL: zc.zodbdgc-1.0.0.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for zc.zodbdgc-1.0.0.tar.gz
Algorithm Hash digest
SHA256 44afb89dbb166552bd54a9b7dbac8dad8e07fcfbd2424aecaa2073fb354087f4
MD5 0f51333529acf1b61ee326e90d21a07c
BLAKE2b-256 68b2384149d4ca94fdb6fe37fa70e97b00f287d6c5add82700de5118d3b3fcf9

See more details on using hashes here.

File details

Details for the file zc.zodbdgc-1.0.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for zc.zodbdgc-1.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f71d5744d720b1f99a387f1f3872bcf037ca3ff870ce5bf1bb6b070cb509b30c
MD5 c17a3a38b31acb5b56fdddac545703f2
BLAKE2b-256 79fe1dfb53f920607ce51584f1a24c5f2f093e53689078afe79394d45fe68f91

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page