A Tool to Summarize Web Archive Holdings
MementoMap
A web archive profiling framework for expressing the holdings of an archive.

$ ./main.py
usage: main.py [-h] {generate,compact,lookup,batchlookup} ...

positional arguments:
  {generate,compact,lookup,batchlookup}
    generate            Generate a MementoMap from a sorted file with the
                        first columns as SURT (e.g., CDX/CDXJ)
    compact             Compact a large MementoMap file into a small one
    lookup              Look for a SURT into a MementoMap
    batchlookup         Look for a list of SURTs into a MementoMap

optional arguments:
  -h, --help            show this help message and exit

$ ./main.py generate -h
usage: main.py generate [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
                        [--hdepth] [--pdepth]
                        infile outfile

positional arguments:
  infile      Input SURT/CDX/CDXJ (plain or GZip) file path or '-' for STDIN
  outfile     Output MementoMap file path

optional arguments:
  -h, --help  show this help message and exit
  --hcf       Host compaction factor (default: Inf)
  --pcf       Path compaction factor (default: Inf)
  --ha        Power law alpha parameter for host (default: 16.329)
  --pa        Power law alpha parameter for path (default: 24.546)
  --hk        Power law k parameter for host (default: 0.714)
  --pk        Power law k parameter for path (default: 1.429)
  --hdepth    Max host depth (default: 8)
  --pdepth    Max path depth (default: 9)
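
The generate subcommand expects a sorted input whose first column is a SURT. The following is a minimal sketch of how one might pre-check that property on CDXJ-like lines before running the tool. The helper names (first_column, is_sorted_by_surt) and the sample records are hypothetical and are not part of MementoMap; the comparison uses Python string ordering, which matches byte order only for ASCII keys.

```python
def first_column(line):
    """Return the first whitespace-separated field (assumed to be the SURT key)."""
    return line.split(maxsplit=1)[0]


def is_sorted_by_surt(lines):
    """Check that the SURT keys appear in non-decreasing order."""
    previous = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key = first_column(line)
        if previous is not None and key < previous:
            return False
        previous = key
    return True


# Hypothetical CDXJ-like records: SURT, timestamp, JSON block
sample = [
    'com,example)/ 20190101000000 {"status": "200"}',
    'com,example)/about 20190102000000 {"status": "200"}',
    'org,example)/ 20190103000000 {"status": "200"}',
]
print(is_sorted_by_surt(sample))
```

If the check fails, sorting the file first (for example with a byte-order sort) would be the natural remedy before passing it as infile.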

$ ./main.py compact -h
usage: main.py compact [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
                       [--hdepth] [--pdepth]
                       infile outfile

positional arguments:
  infile      Input MementoMap (plain or GZip) file path or '-' for STDIN
  outfile     Output MementoMap file path

optional arguments:
  -h, --help  show this help message and exit
  --hcf       Host compaction factor (default: 1.0)
  --pcf       Path compaction factor (default: 1.0)
  --ha        Power law alpha parameter for host (default: 16.329)
  --pa        Power law alpha parameter for path (default: 24.546)
  --hk        Power law k parameter for host (default: 0.714)
  --pk        Power law k parameter for path (default: 1.429)
  --hdepth    Max host depth (default: 8)
  --pdepth    Max path depth (default: 9)

$ ./main.py lookup -h
usage: main.py lookup [-h] mmap surt

positional arguments:
  mmap        MementoMap file path to look into
  surt        SURT to look for

optional arguments:
  -h, --help  show this help message and exit
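
The lookup subcommand takes a SURT rather than a plain URL. As a rough illustration of what such a key looks like, the sketch below reverses the host labels and appends the path. It is a simplified approximation: the function name simple_surt is hypothetical, and the sketch performs none of the canonicalization that full SURT implementations apply (such as dropping a leading "www" or normalizing the query string), so its output may differ from the keys stored in a real MementoMap.

```python
from urllib.parse import urlsplit


def simple_surt(url):
    """Build a simplified SURT-like key: reversed host labels, ')' and the path."""
    # Assume http when no scheme is given so that urlsplit finds the host
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


print(simple_surt("https://www.example.com/path/page.html?x=1"))
# prints: com,example,www)/path/page.html?x=1
```

A key produced this way could then be passed as the surt argument, keeping in mind that it only matches if the MementoMap was built from keys formed the same way.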

$ ./main.py batchlookup -h
usage: main.py batchlookup [-h] mmap infile

positional arguments:
  mmap        MementoMap file path to look into
  infile      Input SURT (plain or GZip) file path or '-' for STDIN

optional arguments:
  -h, --help  show this help message and exit
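
The batchlookup input may be a plain file, a GZip file, or '-' for STDIN. The sketch below shows one way such an input could be opened uniformly by probing the GZip magic number. It is an illustration under stated assumptions, not the tool's actual implementation: open_text_input and read_surts are hypothetical helpers, and the demo files are created in a temporary directory.

```python
import gzip
import os
import sys
import tempfile


def open_text_input(path):
    """Open a plain or GZip file, or STDIN when path is '-', as a text stream."""
    if path == "-":
        return sys.stdin
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # GZip magic number
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def read_surts(path):
    """Read non-empty lines (one SURT per line) from the given input."""
    stream = open_text_input(path)
    try:
        return [line.strip() for line in stream if line.strip()]
    finally:
        if stream is not sys.stdin:
            stream.close()


# Demo: write the same SURT list as plain text and as GZip, then read both back
surts = ["com,example)/", "org,example)/about"]
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "surts.txt")
    gz_path = os.path.join(tmp, "surts.txt.gz")
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("\n".join(surts) + "\n")
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("\n".join(surts) + "\n")
    from_plain = read_surts(plain_path)
    from_gzip = read_surts(gz_path)

print(from_plain == from_gzip == surts)
```

Probing the first two bytes avoids relying on the file extension, which is one plausible reading of "plain or GZip" in the help text above.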
Citing Project
A publication related to this project appeared in the proceedings of JCDL 2019. Please cite it as follows:
Sawood Alam, Michele C. Weigle, Michael L. Nelson, Fernando Melo, Daniel Bicho, Daniel Gomes. MementoMap Framework for Flexible and Adaptive Web Archive Profiling. In Proceedings of the 19th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2019, pp. 172-181, Urbana-Champaign, Illinois, USA, June 2019.

@inproceedings{jcdl-2019:alam:mementomap,
  author    = {Sawood Alam and
               Michele C. Weigle and
               Michael L. Nelson and
               Fernando Melo and
               Daniel Bicho and
               Daniel Gomes},
  title     = {{MementoMap} Framework for Flexible and Adaptive Web Archive Profiling},
  booktitle = {Proceedings of the 19th {ACM/IEEE-CS} Joint Conference on Digital Libraries},
  series    = {JCDL '19},
  year      = {2019},
  month     = {jun},
  location  = {Urbana-Champaign, Illinois, USA},
  pages     = {172--181},
  numpages  = {10},
  url       = {https://doi.org/10.1109/JCDL.2019.00033},
  doi       = {10.1109/JCDL.2019.00033},
  isbn      = {978-1-7281-1547-4},
  publisher = {{IEEE}}
}