Python library to work with ARC and WARC files
warc3-wet: Python3 library to work with WARC and WET files
==========================================================
Note: This is a fork of the original (now dead) warc repository.
WARC (Web ARChive) is a file format for storing web crawls.
http://bibnum.bnf.fr/WARC/
This `warc` library makes it very easy to work with WARC files::

    import warc
    with warc.open("test.warc") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

The same interface reads WET files::

    import warc
    with warc.open("test.warc.wet") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

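The header names used in these examples, ``WARC-Target-URI`` and ``Content-Length``, are named fields that precede the payload of every WARC or WET record. The sketch below uses only the standard library to show where those fields sit in a record. It is not the ``warc`` library's implementation: the sample record and the ``parse_warc_record`` helper are hypothetical and exist only for illustration.

```python
import io

# Hypothetical hand-written WET-style record (illustration only, not real crawl data).
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: conversion\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 12\r\n"
    b"\r\n"
    b"Hello, WARC!"
    b"\r\n\r\n"
)


def parse_warc_record(stream):
    """Read one record: the version line, the header fields, then the payload."""
    version = stream.readline().strip().decode("utf-8")
    headers = {}
    while True:
        line = stream.readline()
        # A blank line (or end of input) ends the header block.
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the payload size in bytes.
    payload = stream.read(int(headers["Content-Length"]))
    return version, headers, payload


version, headers, payload = parse_warc_record(io.BytesIO(SAMPLE_RECORD))
print(headers["WARC-Target-URI"], headers["Content-Length"])
# prints: http://example.com/ 12
```

For real files, prefer ``warc.open`` as shown above; this sketch reads a single uncompressed record and does no validation.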
Documentation
-------------
The documentation of the original warc library is available at http://warc.readthedocs.org/.
The pip installation instructions given there do not work for this warc3 version;
apart from that, the interface described there is unchanged.
License
-------
This software is licensed under GPL v2. See the LICENSE_ file for details.

.. _LICENSE: http://github.com/internetarchive/warc/blob/master/LICENSE

Authors
-------
Original Python2 Versions:
* Anand Chitipothu
* Noufal Ibrahim
Python3 Port:
* Ryan Chartier
* Jan Pieter Bruins Slot
* Almer S. Tigelaar
* Willian Z