Functions required by the access-logs-local-driver

Project description

Load the content of gzipped Apache HTTP log files. Exclude bots, scrapers, etc., select URLs matching the provided regex(es), and generate a CSV of the relevant log entries.
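
The filtering step described above could be sketched roughly as follows. This is not the driver's actual code: the function names (filter_log, to_csv), the log-line pattern (Apache "combined" format is assumed), the bot markers, and the CSV columns are all illustrative assumptions.

```python
import csv
import gzip
import io
import re

# Assumed Apache "combined" log format; the real driver's pattern may differ.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)
# Hypothetical, deliberately short list of bot markers.
BOT_MARKERS = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)


def filter_log(gz_bytes, url_patterns):
    """Yield (timestamp, ip, url, user_agent) for non-bot hits on matching URLs."""
    url_regexes = [re.compile(pattern) for pattern in url_patterns]
    raw = io.BytesIO(gz_bytes)
    with gzip.open(raw, mode="rt", encoding="utf-8", errors="replace") as lines:
        for line in lines:
            match = LOG_LINE.match(line)
            if match is None:
                continue  # skip lines that do not parse
            if BOT_MARKERS.search(match["user_agent"]):
                continue  # skip bots, scrapers, etc.
            if not any(regex.search(match["url"]) for regex in url_regexes):
                continue  # skip URLs outside the provided regex(es)
            yield match["timestamp"], match["ip"], match["url"], match["user_agent"]


def to_csv(rows):
    """Serialise the filtered rows as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["timestamp", "ip", "url", "user_agent"])
    writer.writerows(rows)
    return out.getvalue()
```

For example, to_csv(filter_log(gz_bytes, [r"^/books/"])) would keep only non-bot requests whose URL starts with /books/, where gz_bytes is the raw content of one gzipped log file.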

Take the postprocessed logs, strip out repeated hits within a session, and resolve URLs to the chosen URI_SCHEME (e.g. info:doi).

We strip out entries where the same (IP address, user agent) pair has accessed a URL within the last SESSION_TIMEOUT (e.g. half an hour).
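
A minimal sketch of this session rule, assuming hits arrive sorted by time as (timestamp, ip, user_agent, url) tuples. The function name strip_session_repeats and the tuple layout are hypothetical, and the 30-minute value simply follows the example above. The sketch also assumes that every access, counted or not, restarts the timeout window.

```python
from datetime import datetime, timedelta

# Assumed value, following the "half an hour" example in the description.
SESSION_TIMEOUT = timedelta(minutes=30)


def strip_session_repeats(hits, timeout=SESSION_TIMEOUT):
    """Drop hits where the same (ip, user_agent) pair accessed the URL within `timeout`.

    `hits` must be an iterable of (timestamp, ip, user_agent, url) tuples
    sorted by timestamp, with `timestamp` a datetime.
    """
    last_seen = {}
    kept = []
    for timestamp, ip, user_agent, url in hits:
        key = (ip, user_agent, url)
        previous = last_seen.get(key)
        # Assumption: every access refreshes the window, even a stripped one.
        last_seen[key] = timestamp
        if previous is None or timestamp - previous >= timeout:
            kept.append((timestamp, ip, user_agent, url))
    return kept
```

Keying on the URL as well as the (IP address, user agent) pair means a visitor who opens two different titles within the same session is still counted once for each.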

Additionally, we convert the URLs to ISBNs and collate request data by date, outputting a CSV for ingest via the stats system.
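
The collation step might look something like the sketch below. It assumes the URL-to-ISBN conversion has already happened, so each hit is a (timestamp, identifier) pair; the function names and the CSV column layout are illustrative guesses, not the format the stats system actually expects.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def collate_by_date(hits):
    """Count hits per (ISO date, identifier) from (timestamp, identifier) pairs."""
    return Counter((timestamp.date().isoformat(), identifier)
                   for timestamp, identifier in hits)


def counts_to_csv(counts):
    """Write the counts as CSV text, one row per date and identifier."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "identifier", "hits"])  # assumed column names
    for (day, identifier), total in sorted(counts.items()):
        writer.writerow([day, identifier, total])
    return out.getvalue()
```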

Release Notes:

[0.0.6] - 2023-08-13

Changed:
  • Refactored driver logic

  • breaking | Changed parameters for the Request.__init__() method
    • Removed re_match_dict parameter

    • Added timestamp and user_agent parameters

  • Changed Request.timestamp from type time to datetime

  • Changed LogStream to use the new Request.__init__()

  • Expanded range for LogStream.logfile_names logic to include files within 1 day of the search_date

  • LogStream.lines() yields Request objects, not str values

  • LogStream.filter_in_line_request() only yields one line per measure
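
As a rough illustration of the widened file search in 0.0.6, the sketch below selects log files dated within one day of a search date. It is not the LogStream.logfile_names implementation: the helper names are hypothetical, and it assumes file names embed an ISO date (YYYY-MM-DD), which may not match the real naming scheme.

```python
from datetime import date, timedelta


def dates_near(search_date, days=1):
    """Return the dates from `days` before to `days` after search_date, inclusive."""
    return [search_date + timedelta(days=offset) for offset in range(-days, days + 1)]


def select_logfiles(names, search_date, days=1):
    """Keep file names containing an ISO date within `days` of search_date.

    Assumption: names look like "access.log-2023-08-13.gz".
    """
    stamps = {day.isoformat() for day in dates_near(search_date, days)}
    return [name for name in names if any(stamp in name for stamp in stamps)]
```

Looking one day either side is presumably useful because a rotated log file can hold entries that belong to the neighbouring day.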

[0.0.5] - 2023-07-03

Changed:
  • Added start_date and end_date for searching in the log files

  • Added the measure_uri to the result

[0.0.4] - 2023-07-31

Changed:
  • Updated the file structure and the name of the driver

[0.0.3] - 2023-07-25

Changed:
  • Updated requirements

  • Updated to use a pyproject.toml file and the new deployment structure

[0.0.2] - 2023-07-11

Added:
  • Unit tests

Changed:
  • Moved the files out of the package; the functions now receive the file data as parameters and return the filtered data.

  • Renamed the plugin to access-logs-local

Download files

Source Distribution

access_logs_local-0.0.6.tar.gz (10.5 kB view details)

Uploaded Source

Built Distribution

access_logs_local-0.0.6-py3-none-any.whl (7.2 kB view details)

Uploaded Python 3

File details

Details for the file access_logs_local-0.0.6.tar.gz.

File metadata

  • Download URL: access_logs_local-0.0.6.tar.gz
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for access_logs_local-0.0.6.tar.gz
Algorithm    Hash digest
SHA256       bc960acd37f0b2287161feac22d12c640985a32e263b267cd49042028da4a91c
MD5          94ba4bbed758fb3674545fde0c38cd03
BLAKE2b-256  53ce7fb8d33bedb8414ed293665ff9c803fbdbda25324ba4d86f7bb607536fed
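
A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The helper names below are illustrative, and the path passed in is wherever the file was saved locally.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, expected_hex would be the SHA256 value listed above, and path the local copy of access_logs_local-0.0.6.tar.gz.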

File details

Details for the file access_logs_local-0.0.6-py3-none-any.whl.

File hashes

Hashes for access_logs_local-0.0.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       d9fc4c16818379244d8d1846022754185ab7aa861977ef288c8a227f3f940ee8
MD5          a5d7dcf4ffb310d24f81f4577047dece
BLAKE2b-256  5236946e8d055b22c3063f134ab5202b199bc26902d94f2f6918e7dd584c9387
