os-spage

Read and write Spage.

Spage is a non-compact data structure for describing a fetched record. It contains four sub-blocks: url, inner_header, http_header, and data.

Spage:

  • url: the URL.
  • inner_header: key-value pairs that record fetch/processing info, such as fetch-time, data-digest, record-type, etc.
  • http_header: key-value pairs taken from the server's HTTP response headers.
  • data: the fetched data, either plain (uncompressed) or compressed HTML.

A Spage is implemented as a dict. A predefined schema can be used for validation.
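
As a rough illustration of the layout described above, a Spage record can be pictured as a plain dict with the four sub-blocks as keys. The key names, the sample header fields, and the missing_blocks helper below are assumptions made for illustration; they are not the library's schema or API.

```python
# Hypothetical Spage-like record. The keys mirror the four sub-blocks
# described above, but the library's actual schema may differ.
record = {
    "url": "http://www.example.com/",
    "inner_header": {"fetch-time": "2020-01-01 00:00:00", "record-type": "html"},
    "http_header": {"Content-Type": "text/html"},
    "data": b"<html>Hello world!</html>",
}

REQUIRED_KEYS = ("url", "inner_header", "http_header", "data")


def missing_blocks(rec):
    """Return the sub-block names absent from rec (illustrative check only)."""
    return [key for key in REQUIRED_KEYS if key not in rec]


print(missing_blocks(record))  # []
```

A check like this is far weaker than real schema validation; it only shows the shape a record is expected to have.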

Spage records are commonly written to size-rotated files, so os-rotatefile is used as the default back end.

Notice:

  1. os-spage should not be used for strict serialization/deserialization: type information is lost on write, and all values are read back as strings (unicode in Python 2).
  2. The data is usually stored in compressed form. You can decompress it with zlib.decompress.
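
The second note can be sketched as follows. This is a minimal example that assumes a record is a dict whose "data" entry holds bytes; the load_html helper is hypothetical, and the records are built by hand so the snippet runs without any Spage file.

```python
import zlib


def load_html(record):
    """Return the record's data as bytes, decompressing it when it is zlib data."""
    data = record["data"]
    try:
        return zlib.decompress(data)
    except zlib.error:
        # Not valid zlib data; assume it was stored uncompressed.
        return data


# Hand-built stand-ins for records read from a Spage file (assumed layout).
compressed = {"data": zlib.compress(b"<html>Hello world!</html>")}
plain = {"data": b"<html>Hello world!</html>"}

print(load_html(compressed))
print(load_html(plain))
```

Falling back on zlib.error is a convenience for mixed input. If a record carries an explicit marker of its encoding, checking that marker would be more robust than trial decompression.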

Offpage:

From v0.4, this library supports reading from offpage. Offpage is another data storage format that includes the url, headers, and series data. Use the read/open_file methods with page_type="offpage" to read from offpage.

From v0.5, Spage records can be transformed into offpage. Use the read/open_file methods with page_type="s2o" to read from Spage and transform each record into offpage format. (Not fully tested yet.)

Example:

from os_spage import read

# Read Spage records and transform each one into offpage format.
with open('your_spage', 'rb') as f:
    for offpage in read(f, page_type='s2o'):
        print(offpage)

Install

pip install os-spage

Usage

  • Write to size-rotate-file
  from os_spage import open_file

  url = 'http://www.google.com/'
  inner_header = {'User-Agent': 'Mozilla/5.0', 'batchID': 'test'}
  http_header = {'Content-Type': 'text/html'}
  data = b"Hello world!"

  f = open_file('file', 'w', roll_size='1G', compress=True)
  f.write(url, inner_header=inner_header, http_header=http_header, data=data, flush=True)
  f.close()
  • Read from size-rotate-file
  from os_spage import open_file

  f = open_file('file', 'r')

  for record in f.read():
      print(record)
  f.close()
  • Read/write with another file-like object
  from io import BytesIO
  from os_spage import read, write

  s = BytesIO()
  write(s, "http://www.google.com/")

  s.seek(0)
  for record in read(s):
      print(record)

Unit Tests

$ tox

License

MIT licensed.
