File/directory size/duplicate scanning and reporting tool.

Project Description
Size reporter and Dupe Finder README

Joe Koberg 2008-04-14

This program may be distributed under the terms of the GNU General Public License, v3.
See the file "LICENSE.TXT" which should be included with this program.
Or find the license at .


This size reporter program will traverse a directory tree and
produce data files listing every file and directory and their
sizes. Additionally it creates a directory map in PDF.
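
The kind of traversal described above can be sketched in a few lines of
Python. This is only an illustration, not the program's actual code: the
function name dir_sizes and its return shape are hypothetical. It walks the
tree bottom-up so that each directory's total can be built from its children.

```python
import os

def dir_sizes(root):
    """Return {directory: (direct_bytes, total_bytes)} for every directory under root."""
    sizes = {}
    # topdown=False visits children before their parents, so each
    # subdirectory's total is already known when the parent is processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        direct = 0
        for name in filenames:
            try:
                direct += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # unreadable or vanished file: skip it
        total = direct
        for name in dirnames:
            child = os.path.join(dirpath, name)
            # Directories that were not walked (e.g. symlinks) count as 0.
            total += sizes.get(child, (0, 0))[1]
        sizes[dirpath] = (direct, total)
    return sizes
```

The first number of each pair covers only the files directly inside a
directory; the second also includes everything beneath it.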

It will then search for files with duplicate content, and
directories with duplicate content and structure.
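
One common way to find files with duplicate content is to group files by
size first and hash only the groups that share a size. The sketch below
shows that approach under stated assumptions: it is not taken from the
program's source, the names file_digest and find_duplicate_files are
hypothetical, and SHA-256 is simply a convenient choice of hash. It covers
duplicate files only, not duplicate directories.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Hash a file's content in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicate_files(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files are compared; symlinks are ignored.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files are never read at all, which
matters on a large tree.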

(These instructions assume you are using the Windows binary package.
If not, use EZ Install (easy_install) to install the script, and run it as "sizedupe".)

Simple usage instructions:

1. Unpack the distribution archive. There is no need to move
files around or install anything into Windows. (This
example assumes you unpacked to c:\sizedupe.)
Additionally, you can run the executable from a mapped or shared
drive, including via RDP (\\tsclient\...). It is
not sensitive to directory location, as long as the
executable remains in the folder with its DLLs and library.

2. Run the program on the directory you are interested in. Either
double-click the EXE, or open a command prompt and run:

C:\sizedupe> sizedupe.exe c:\

3. Three tab-separated files are generated in the current directory:

* sizereport_YYYYMMDD_HHMMSS_dirs.txt
  List of every directory. Columns:
    Parent Directory Name
    Directory Name
    Number of directly contained directories
    Number of all contained directories
    Number of directly contained files
    Number of all contained files
    Size of directly contained files
    Size of all contained files

* sizereport_YYYYMMDD_HHMMSS_extensions.txt
  List of extensions found in each directory. Columns:
    Size of directly contained files of this extension
    Size of all contained files of this extension

* sizereport_YYYYMMDD_HHMMSS_files.txt
  List of every file. Columns:
    File Name
    Date Created
    Date Modified
    Date Accessed
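
Because the reports are tab-separated, they can be loaded with Python's csv
module. The following is a minimal sketch for the directory report only. It
assumes the columns appear in the order listed above and that the count and
size columns hold plain integers; the field names and the function
read_dirs_report are hypothetical, chosen here for illustration.

```python
import csv

# Hypothetical field names, in the column order listed above.
DIR_FIELDS = [
    "parent", "name",
    "direct_dirs", "all_dirs",
    "direct_files", "all_files",
    "direct_size", "all_size",
]

def read_dirs_report(path):
    """Yield one dict per directory row of a *_dirs.txt report."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != len(DIR_FIELDS):
                continue  # skip blank or unexpected lines
            record = dict(zip(DIR_FIELDS, row))
            try:
                for key in DIR_FIELDS[2:]:
                    record[key] = int(record[key])
            except ValueError:
                continue  # non-numeric values, e.g. a header row
            yield record
```

For example, sorted(read_dirs_report(path), key=lambda r: r["all_size"],
reverse=True)[:10] would give the ten largest directories, if the
assumptions above hold for your report.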

5. The PDF file map is a graph of directories and files by size.
The top-level directories form the leftmost column of rectangles.
To the right of each of those directories are rectangles representing
the directories and files contained therein. The height of
each rectangle is proportional to its disk usage. Intense colors
represent recent files and pale colors represent "old" files. A label is
printed to the right of any file or directory big enough to fit it.
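
The size-based layout can be pictured with a small calculation: one column's
vertical span is divided among its entries according to their sizes. The
sketch below illustrates only that idea. The function layout_column and its
arguments are hypothetical, and the program's real drawing code may differ.

```python
def layout_column(entries, top, height):
    """Split a vertical span among entries in proportion to their sizes.

    entries: list of (name, size_in_bytes) pairs.
    Returns a list of (name, y, h) tuples, stacked from `top` downward.
    """
    total = sum(size for _name, size in entries)
    rects = []
    y = top
    for name, size in entries:
        # Each entry gets a share of the height equal to its share of bytes.
        h = height * size / total if total else 0.0
        rects.append((name, y, h))
        y += h
    return rects
```

Applying the same function to a directory's children, with that directory's
own rectangle as the span, would produce the next column to the right.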

6. If you specify -d on the command line, the program searches for duplicates
after the size report run. The results are written to two files in a readable
Python syntax format for ease of later parsing:
* sizereport_YYYYMMDD_HHMMSS_dupes.txt
* sizereport_YYYYMMDD_HHMMSS_dupedirs.txt
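
If a duplicates file consists of a single Python literal (for example a list
of lists of paths), it can be read back safely with ast.literal_eval, which
parses literals without executing code. That layout is an assumption, since
the exact structure of these files is not described here, and the helper
name load_python_literal is hypothetical.

```python
import ast

def load_python_literal(path):
    """Parse a file containing a single Python literal (list, dict, ...)."""
    with open(path) as f:
        # literal_eval accepts only literals, so nothing in the file is executed.
        return ast.literal_eval(f.read().strip())
```

If the file turns out to hold several statements rather than one literal,
this helper raises an error instead of guessing.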

Download Files

File Name                          Size     Python Version  File Type  Upload Date
SizeDupe_Reporter-0.9.1-py2.5.egg  35.9 kB  2.5             Egg        Jun 24, 2008
SizeDupe Reporter-0.9.1.tar.gz     10.6 kB  -               Source     Jun 24, 2008
