haralyzer

A Python framework for getting useful stuff out of HAR files

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: GNU General Public License (GPL)
Natural Language
- English

Project description

A Python Framework For Using HAR Files To Analyze Web Pages.

Overview

The haralyzer module contains two classes for analyzing web pages based on a HAR file. HarParser() represents a full file (which might have multiple pages), and HarPage() represents a single page from said file.

HarParser has a couple of helpful methods for analyzing single entries from a HAR file, but most of the pertinent functions are inside of the page object.

haralyzer was designed to be easy to use, but you can also access more powerful functions directly.

Quick Intro

HarParser

The HarParser takes a single argument: a dict representing the JSON of a full HAR file. It has the same properties as the HAR file, EXCEPT that each page in HarParser.pages is a HarPage object:

import json
from haralyzer import HarParser, HarPage

with open('har_data.har', 'r') as f:
    har_parser = HarParser(json.loads(f.read()))

print(har_parser.browser)
# {u'name': u'Firefox', u'version': u'25.0.1'}

for page in har_parser.pages:
    assert isinstance(page, HarPage)
    # passes for each page
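
For orientation, a HAR file is a JSON document whose log object holds a pages list and an entries list, and each entry points at its page through pageref. The sketch below uses only the standard library and a hand-written, heavily trimmed HAR dict (the page IDs and URLs are made up) to show how entries map onto pages. It illustrates the file layout and is not a description of how haralyzer is implemented:

```python
from collections import defaultdict

# A trimmed, hand-written HAR structure (real files carry many more fields).
har_data = {
    "log": {
        "browser": {"name": "Firefox", "version": "25.0.1"},
        "pages": [{"id": "page_1"}, {"id": "page_2"}],
        "entries": [
            {"pageref": "page_1", "request": {"url": "http://example.com/"}},
            {"pageref": "page_1", "request": {"url": "http://example.com/a.js"}},
            {"pageref": "page_2", "request": {"url": "http://example.com/next"}},
        ],
    }
}


def group_entries_by_page(har):
    """Return {page_id: [entry, ...]} using each entry's 'pageref'."""
    grouped = defaultdict(list)
    for entry in har["log"]["entries"]:
        grouped[entry["pageref"]].append(entry)
    return dict(grouped)


entries_by_page = group_entries_by_page(har_data)
for page in har_data["log"]["pages"]:
    # Print each page ID with the number of entries that reference it.
    print(page["id"], len(entries_by_page.get(page["id"], [])))
```

This is why a HarPage needs a page ID: the ID is the only link between a page and its entries.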

HarPage

The HarPage object contains most of the goods you need to easily analyze a page. It has helper methods that are accessible, but most of the data you need is in properties for easy access. You can create a HarPage object directly by giving it the page ID (yes, I know it is stupid, it’s just how HAR is organized), and either a HarParser with har_parser=parser, or a dict representing the JSON of a full HAR file (see example above) with har_data=har_data:

import json
from haralyzer import HarPage

with open('har_data.har', 'r') as f:
    har_page = HarPage('page_3', har_data=json.loads(f.read()))

### WORK WITH LOAD TIMES (all load times are in ms) ###

# Get image load time in milliseconds as rendered by the browser
har_page.image_load_time
# prints 713

# We could do this with 'css', 'js', 'html', 'audio', or 'video'

### WORK WITH SIZES (all sizes are in bytes) ###

# Get the total page size (with all assets)
har_page.page_size
# prints 2423765

# Get the total image size
har_page.image_size
# prints 733488
# We could do this with 'css', 'js', 'html', 'audio', or 'video'
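
To make the size properties concrete, here is a rough standard-library sketch of what "total image size" could mean in terms of raw HAR entries: add up response.bodySize for every entry whose response.content.mimeType starts with a given prefix. The helper names (make_entry, total_size) and the sample numbers are hypothetical, and haralyzer's own calculation may select entries differently:

```python
def make_entry(mime_type, size):
    """Build a minimal fake HAR entry for the demonstration."""
    return {"response": {"bodySize": size, "content": {"mimeType": mime_type}}}


def total_size(entries, mime_prefix=""):
    """Sum response body sizes (bytes) for entries with a matching MIME type.

    An empty prefix matches every entry, which gives the whole-page size.
    """
    return sum(
        entry["response"]["bodySize"]
        for entry in entries
        if entry["response"]["content"]["mimeType"].startswith(mime_prefix)
    )


sample_entries = [
    make_entry("image/png", 1200),
    make_entry("image/jpeg", 800),
    make_entry("text/css", 300),
]

image_bytes = total_size(sample_entries, "image/")
page_bytes = total_size(sample_entries)
print(image_bytes, page_bytes)
```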

MultiHarParser

The MultiHarParser takes a list of dicts, each of which represents the JSON of a full HAR file. The concept here is that you can provide multiple HAR files of the same page (representing multiple test runs) and the MultiHarParser will provide aggregate results for load times:

import json
from haralyzer import MultiHarParser

test_runs = []
with open('har_data1.har', 'r') as f1:
    test_runs.append(json.loads(f1.read()))
with open('har_data2.har', 'r') as f2:
    test_runs.append(json.loads(f2.read()))

multi_har_parser = MultiHarParser(har_data=test_runs)

# Get the mean for the time to first byte of all runs in MS
print(multi_har_parser.time_to_first_byte)
# 70

# Get the total page load time mean for all runs in MS
print(multi_har_parser.load_time)
# 150

# Get the javascript load time mean for all runs in MS
print(multi_har_parser.js_load_time)
# 50

# You can get the standard deviation for any of these as well
# Let's get the standard deviation for javascript load time
print(multi_har_parser.get_stdev('js'))
# 5
# We can also do that with 'page' or 'ttfb' (time to first byte)
print(multi_har_parser.get_stdev('page'))
# 11
print(multi_har_parser.get_stdev('ttfb'))
# 10

### DECIMAL PRECISION ###

# You will notice that all of the results above are whole numbers. That is
# because the default decimal precision for the multi parser is 0. However,
# you can pass whatever you want into the constructor to control this.

multi_har_parser = MultiHarParser(har_data=test_runs, decimal_precision=2)
print(multi_har_parser.time_to_first_byte)
# 70.15
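
The aggregation itself is ordinary statistics. As a minimal sketch, assuming the mean and the sample standard deviation from Python's statistics module and plain rounding to the requested precision, the numbers could be produced like this. The summarize helper and the sample timings are made up for illustration and are not taken from haralyzer:

```python
import statistics


def summarize(values, decimal_precision=0):
    """Return (mean, sample stdev) rounded to the given number of decimals."""
    mean = round(statistics.mean(values), decimal_precision)
    stdev = round(statistics.stdev(values), decimal_precision)
    if decimal_precision == 0:
        # round(x, 0) keeps a float such as 70.0; cast for whole-number output.
        return int(mean), int(stdev)
    return mean, stdev


# Made-up time-to-first-byte samples (ms) from three test runs.
ttfb_runs_ms = [60, 70, 80.45]

whole = summarize(ttfb_runs_ms)
precise = summarize(ttfb_runs_ms, decimal_precision=2)
print(whole, precise)
```

With precision 0 the mean of these samples collapses to 70, while precision 2 keeps it at 70.15.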

Advanced Usage

HarPage includes a lot of helpful properties, but they are all easily produced using the public methods of HarParser and HarPage:

import json
from haralyzer import HarPage

with open('har_data.har', 'r') as f:
    har_page = HarPage('page_3', har_data=json.loads(f.read()))

### ACCESSING FILES ###

# You can get a JSON representation of all assets using HarPage.entries #
for entry in har_page.entries:
    if entry['startedDateTime'] == 'whatever I expect':
        pass  # ... do stuff ...

# It also has methods for filtering assets #
# Get a collection of entries that were images in the 2XX status code range #
entries = har_page.filter_entries(content_type='image.*', status_code='2.*')

# Get the size of the collection we just made #
collection_size = har_page.get_total_size(entries)

# We can also access files by type with a property #
for js_file in har_page.js_files:
    pass  # ... do stuff ...

### GETTING LOAD TIMES ###

# Get the BROWSER load time for all images in the 2XX status code range #
load_time = har_page.get_load_time(content_type='image.*', status_code='2.*')

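# Get the TOTAL load time for all images in the 2XX status code range #
# Note: `async` became a reserved word in Python 3.7, so this call only parses
# on older interpreters. Later haralyzer releases spell the argument
# `asynchronous`; check the docs for the version you have installed.
load_time = har_page.get_load_time(content_type='image.*', status_code='2.*', async=False)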

This could potentially be out of date, so please check out the sphinx docs.
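
One way to read the two get_load_time calls above: the browser load time counts overlapping downloads once (wall-clock time from the first request starting to the last one finishing), while the total load time adds up each entry's own duration. The sketch below illustrates that distinction with the standard library (Python 3.7+ for datetime.fromisoformat) and made-up entries. The helper names are hypothetical, and this is an interpretation of the comments above rather than haralyzer's verified internals:

```python
from datetime import datetime, timedelta


def wall_clock_ms(entries):
    """Elapsed whole ms from the earliest start to the latest finish."""
    starts = [datetime.fromisoformat(e["startedDateTime"]) for e in entries]
    ends = [s + timedelta(milliseconds=e["time"]) for s, e in zip(starts, entries)]
    return (max(ends) - min(starts)) // timedelta(milliseconds=1)


def summed_ms(entries):
    """Sum of each entry's own duration, ignoring any overlap."""
    return sum(e["time"] for e in entries)


# Two made-up image requests that overlap by 100 ms.
images = [
    {"startedDateTime": "2015-02-21T19:15:40.000-08:00", "time": 300},
    {"startedDateTime": "2015-02-21T19:15:40.200-08:00", "time": 400},
]

browser_time = wall_clock_ms(images)
total_time = summed_ms(images)
print(browser_time, total_time)
```

The two figures only agree when requests never overlap, which is rare for a real page.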

More… Advanced Usage

All of the HarPage methods above leverage stuff from the HarParser, some of which can be useful for more complex operations. They either operate on a single entry (from a HarPage) or a list of entries:

import json
from haralyzer import HarParser

with open('har_data.har', 'r') as f:
    har_parser = HarParser(json.loads(f.read()))

for page in har_parser.pages:
    for entry in page.entries:
        ### MATCH HEADERS ###
        if har_parser.match_headers(entry, 'Content-Type', 'image.*'):
            print('This would appear to be an image')
        ### MATCH REQUEST TYPE ###
        if har_parser.match_request_type(entry, 'GET'):
            print('This is a GET request')
        ### MATCH STATUS CODE ###
        if har_parser.match_status_code(entry, '2.*'):
            print('Looks like all is well in the world')
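
The patterns in these calls ('image.*', '2.*') look like regular expressions. As a hedged illustration of that kind of matching, the sketch below applies re.match to fields of a hand-written entry. The function names and the sample entry are hypothetical, and haralyzer's own matching rules (case sensitivity, anchoring, request versus response headers) may differ:

```python
import re


def header_matches(entry, header_name, pattern):
    """True if a response header with this name has a value matching pattern."""
    for header in entry["response"]["headers"]:
        if header["name"].lower() == header_name.lower():
            if re.match(pattern, header["value"], re.IGNORECASE):
                return True
    return False


def status_matches(entry, pattern):
    """True if the status code, as text, matches pattern from its first digit."""
    return re.match(pattern, str(entry["response"]["status"])) is not None


def method_matches(entry, pattern):
    """True if the HTTP request method matches pattern."""
    return re.match(pattern, entry["request"]["method"], re.IGNORECASE) is not None


sample_entry = {
    "request": {"method": "GET"},
    "response": {
        "status": 200,
        "headers": [{"name": "Content-Type", "value": "image/png"}],
    },
}

is_image = header_matches(sample_entry, "Content-Type", "image.*")
is_ok = status_matches(sample_entry, "2.*")
is_get = method_matches(sample_entry, "GET")
print(is_image, is_ok, is_get)
```

Anchoring at the start of the string matters here: with an unanchored search, '2.*' would also accept a 302 because of its trailing 2.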

Asset Timelines

The last helper function of HarParser requires its own section, because it is odd, but can be helpful, especially for creating charts and reports.

It can create an asset timeline, which gives you back a dict where each key is a datetime object, and the value is a list of assets that were loading at that time. Each value of the list is a dict representing an entry from a page.

It takes a list of entries to analyze, so it assumes that you have already filtered the entries you want to know about:

import json
from haralyzer import HarParser

with open('har_data.har', 'r') as f:
    har_parser = HarParser(json.loads(f.read()))

### CREATE A TIMELINE OF ALL THE ENTRIES ###
entries = []
for page in har_parser.pages:
    for entry in page.entries:
        entries.append(entry)

timeline = har_parser.create_asset_timeline(entries)

for key, value in timeline.items():
    print(type(key))
    # <type 'datetime.datetime'>
    print(key)
    # 2015-02-21 19:15:41.450000-08:00
    print(type(value))
    # <type 'list'>
    print(value)
    # Each entry in the list is an asset from the page
    # [{u'serverIPAddress': u'157.166.249.67', u'cache': {}, u'startedDateTime': u'2015-02-21T19:15:40.351-08:00', u'pageref': u'page_3', u'request': {u'cookies':............................

With this, you can examine the timeline for any number of assets. Since the key is a datetime object, this is a heavy operation. We could always change this in the future, but for now, limit the assets you give this method to only what you need to examine.
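
To show why the cost grows with the number and duration of assets, here is a small standard-library sketch of a timeline of this shape: every time slot an asset spans becomes a datetime key, and the asset is appended to that key's list. The slot width (step_ms), the helper name, and the sample entries are assumptions made for the illustration, and the granularity haralyzer actually uses is not stated here. It needs Python 3.7+ for datetime.fromisoformat:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_timeline(entries, step_ms=100):
    """Map each time slot (a datetime) to the entries loading in that slot.

    Slots are counted from each entry's own start time, so entries only
    share keys when their start times fall on the same step grid.
    """
    timeline = defaultdict(list)
    step = timedelta(milliseconds=step_ms)
    for entry in entries:
        start = datetime.fromisoformat(entry["startedDateTime"])
        end = start + timedelta(milliseconds=entry["time"])
        slot = start
        while slot < end:
            timeline[slot].append(entry)
            slot += step
    return dict(timeline)


# Two made-up assets; the second starts while the first is still loading.
assets = [
    {"url": "a.js", "startedDateTime": "2015-02-21T19:15:40.000-08:00", "time": 250},
    {"url": "b.css", "startedDateTime": "2015-02-21T19:15:40.200-08:00", "time": 100},
]

timeline = build_timeline(assets)
busiest = max(timeline, key=lambda slot: len(timeline[slot]))
print(len(timeline), busiest, [a["url"] for a in timeline[busiest]])
```

Halving the slot width roughly doubles the number of keys, which is the trade-off behind the advice to pass in only the entries you need.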

Known Issues

The haralyzer module is currently not compatible with Python 3.2, because it relies on the statistics backport, which does not support Python 3.2. We will be working on a fix as soon as possible. Feel free to submit a pull request in the meantime.

Release history

- 2.4.0 (Jul 11, 2023)
- 2.3.0 (Jun 15, 2023)
- 2.2.0 (Jan 2, 2023)
- 2.1.1 (Dec 4, 2022)
- 2.1.0 (Jun 7, 2022)
- 2.0.0 (Jun 21, 2021)
- 1.9.0 (Dec 25, 2020)
- 1.8.0 (Oct 11, 2019)
- 1.7.1 (Oct 6, 2019)
- 1.7.0 (Oct 6, 2019)
- 1.6.0 (Jan 5, 2019)
- 1.5.0 (Jul 19, 2018)
- 1.4.11 (Nov 14, 2017)
- 1.4.10 (Apr 14, 2016)
- 1.4.9 (Apr 14, 2016)
- 1.4.8 (Apr 3, 2016)
- 1.4.7 (Apr 3, 2016)
- 1.4.6 (Apr 1, 2016)
- 1.4.5 (Mar 23, 2016)
- 1.4.3 (Jul 27, 2015)
- 1.4.2 (Jul 19, 2015)
- 1.4 (Jul 19, 2015)
- 1.3 (Jul 9, 2015)
- 1.2.2 (Jul 9, 2015), this version
- 1.2.1 (Jul 5, 2015)
- 1.1.1 (Apr 10, 2015)
- 1.1 (Mar 22, 2015)
- 1.0.9 (Mar 1, 2015)
- 1.0.7 (Mar 1, 2015)
- 1.0.6 (Mar 1, 2015)
- 1.0.5 (Mar 1, 2015)
- 1.0.4 (Mar 1, 2015)
- 1.0.3 (Mar 1, 2015)
- 1.0.2 (Mar 1, 2015)

Download files

Source Distribution

haralyzer-1.2.2.tar.gz (9.8 kB view details)

Uploaded Jul 9, 2015 Source

File details

Details for the file haralyzer-1.2.2.tar.gz.

File metadata

Download URL: haralyzer-1.2.2.tar.gz
Upload date: Jul 9, 2015
Size: 9.8 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for haralyzer-1.2.2.tar.gz
Algorithm	Hash digest
SHA256	`e534a1fa6871c5e987857e8a29f535ac1d28ec383101f6e0daaa693581a2bc5e`
MD5	`27a70860c336055c49e75b21f8264bc1`
BLAKE2b-256	`173dc4bae7548e0daf1d54c0aabc9fdd78a3a473f5ec3b72bc49fbc42989b53c`
