warc3-wet: Python3 library to work with WARC and WET files
==========================================================

Note: This is a fork of the original warc repository, which is no longer maintained.

WARC (Web ARChive) is a file format for storing web crawls. See
http://bibnum.bnf.fr/WARC/ for details.

This ``warc`` library makes it easy to work with WARC files::

    import warc

    with warc.open("test.warc") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

WET files are read the same way::

    import warc

    with warc.open("test.warc.wet") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])
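
To make the header names used in these examples concrete, here is a minimal standard-library sketch of how an uncompressed WARC record is laid out: a version line, ``Name: value`` header fields, a blank line, and then ``Content-Length`` bytes of payload. It is not this library's implementation. ``iter_records``, ``make_record`` and the sample data are hypothetical names used only for illustration, and real files (which are often gzip-compressed) should be read with the library itself.

```python
import io


def iter_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration only; not part of the warc API.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        # Header fields run until the first empty line.
        for raw in iter(stream.readline, b""):
            raw = raw.strip()
            if not raw:
                break
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(uri, body):
    """Build one sample 'conversion' record, the record type found in WET files."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: conversion\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two made-up records held in memory instead of a file on disk.
sample = make_record("http://example.com/", b"hello world") + make_record(
    "http://example.org/", b"second page"
)

records = list(iter_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], headers["Content-Length"], payload)
```

Because every record declares its own length, a reader can step from one record to the next without scanning the payload for delimiters.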

Documentation
-------------

The documentation of the original warc library is available at
http://warc.readthedocs.org/. The interface described there is unchanged in
this fork. Only the pip installation instructions given there do not apply to
this warc3 version.

License
-------

This software is licensed under GPL v2. See the LICENSE_ file for details.

.. _LICENSE: http://github.com/internetarchive/warc/blob/master/LICENSE

Authors
-------

Original Python2 Versions:

* Anand Chitipothu
* Noufal Ibrahim

Python3 Port:

* Ryan Chartier
* Jan Pieter Bruins Slot
* Almer S. Tigelaar

Modification:

* Willian Zhang

Change Log
----------

0.2.3
    Support seeking in WARC/WET files.

0.2.2
    Allow parsing of WET files.

Older
    See https://github.com/internetarchive/warc