An interface to access Common Crawl data

Project description

PyCommonCrawl

A Python interface for Common Crawl.

INSTALL

pip3 install pycommoncrawl

USAGE

from pycommoncrawl.common_crawl_data_accessor import CommonCrawlDataAccessor

common_crawl_data_accessor = CommonCrawlDataAccessor()

# Iterate by line
for line in common_crawl_data_accessor.get_raw_resource_data("WAT"):
    print(line)

# Iterate by WARC record (here, printing each record's Content-Length header)
for warc in common_crawl_data_accessor.get_raw_resource_data_per_warc("WAT"):
    print(warc["Content-Length"])
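The per-record loop hands back one WARC record at a time, which is why a header such as "Content-Length" can be looked up on each item. As a rough illustration of what record-level iteration involves, the sketch below splits an uncompressed WARC/WAT byte stream into records using only the standard library. `iter_warc_records` is a hypothetical helper, not pycommoncrawl's implementation, and it assumes every record declares its payload size in a Content-Length header. Common Crawl publishes these files gzip-compressed, so real input would typically be opened with `gzip.open(path, "rb")` before being passed in.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each record in an uncompressed WARC stream.

    Hypothetical helper for illustration; not part of pycommoncrawl.
    Assumes each record declares its payload size in a Content-Length header.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # skip the blank lines that separate records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")

        # Header fields ("Name: value") run until the first empty line.
        headers: Dict[str, str] = {}
        for line in iter(stream.readline, b""):
            line = line.rstrip(b"\r\n")
            if not line:
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # Content-Length says exactly how many payload bytes follow, so a
        # payload line that happens to start with "WARC/" is not misread.
        length = int(headers.get("Content-Length", "0"))
        yield headers, stream.read(length)


# Usage with a tiny in-memory record (made-up sample data):
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: metadata\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello\r\n\r\n"
)
for headers, payload in iter_warc_records(io.BytesIO(sample)):
    print(headers["Content-Length"], payload)  # prints: 5 b'hello'
```

Reading the payload by byte count rather than scanning for the next version line is the design choice that keeps the parser correct when record bodies contain arbitrary text.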


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

pycommoncrawl-0.2-py3-none-any.whl (5.0 kB)

Uploaded for Python 3

File details

Details for the file pycommoncrawl-0.2-py3-none-any.whl.

File metadata

  • Download URL: pycommoncrawl-0.2-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.8

File hashes

Hashes for pycommoncrawl-0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        a6b9fd48bc7717281f78098ab91831751aa4527cedff093d90234b82574a9552
MD5           b363cc73a434f66611ad1128ded22f20
BLAKE2b-256   c5175362da86620e4a6c832f1614d69f271e4fc80b410706f1a195221e75f683
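The SHA256 digest listed for the wheel can be used to confirm that a downloaded copy is intact. A minimal sketch using Python's standard hashlib module follows; the helper names `sha256_of` and `verify_wheel` are hypothetical, and it assumes the wheel has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for pycommoncrawl-0.2-py3-none-any.whl.
EXPECTED_SHA256 = "a6b9fd48bc7717281f78098ab91831751aa4527cedff093d90234b82574a9552"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a b"" sentinel stops once read() reaches end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

After downloading, `verify_wheel(Path("pycommoncrawl-0.2-py3-none-any.whl"))` should return True for an unmodified file. pip can enforce the same check at install time in hash-checking mode, for example with a requirements line `pycommoncrawl==0.2 --hash=sha256:a6b9fd48bc7717281f78098ab91831751aa4527cedff093d90234b82574a9552` installed via `pip install --require-hashes -r requirements.txt`.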

