LDLite

Lightweight analytics tool for FOLIO services

Copyright (C) 2021-2022 The Open Library Foundation. This software is distributed under the terms of the Apache License, Version 2.0. See the file LICENSE for more information.

LDLite is a lightweight, open source reporting tool for FOLIO services. It is part of the Library Data Platform project and provides basic LDP functions without requiring the server to be installed.

LDLite supports two modes of usage.

  • Server mode uses a persistent PostgreSQL server and a managed cron job to download large amounts of data for analytic processing and reporting.
  • Ad-hoc mode uses a local DuckDB database to download small amounts of data and query them with SQL.

Check out the migration guide for more information about major version upgrades.

Usage with a persistent postgres server

See the Five Colleges Setup for an example of automating overnight data loads.

Installing the psycopg[c] package is recommended for optimal reliability and performance in a server context.
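A nightly job in this mode typically connects once and then refreshes a fixed set of tables. A minimal sketch, assuming the `connect_db_postgres` entry point described in the LDLite API documentation, with hypothetical host names, credentials, and endpoint list:

```python
# Sketch of a scheduled (cron-driven) load into a persistent PostgreSQL
# database. Host names, credentials, and the endpoint list are hypothetical;
# connect_folio(), connect_db_postgres(), and query() are LDLite calls, so
# check the API documentation for the exact signatures in your version.

# Endpoints to refresh on each run, as (table prefix, FOLIO path) pairs.
LOADS = [
    ("g", "/groups"),
    ("u", "/users"),
]

def pg_dsn(host: str, dbname: str, user: str) -> str:
    """Build a libpq-style connection string for connect_db_postgres()."""
    return f"host={host} dbname={dbname} user={user}"

def run_loads() -> None:
    """One nightly run: connect to FOLIO and PostgreSQL, refresh each table."""
    import ldlite  # imported here so the sketch is readable without ldlite installed

    ld = ldlite.LDLite()
    ld.connect_folio(url="https://folio.example.edu",
                     tenant="diku",
                     user="diku_admin",
                     password="admin")
    ld.connect_db_postgres(dsn=pg_dsn("db.example.edu", "ldlite", "ldlite"))
    for table, path in LOADS:
        ld.query(table=table, path=path)
```

A cron entry would then invoke a script that calls `run_loads()` once per night.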

Usage for ad-hoc local querying

To install LDLite or upgrade to the latest version:

$ python -m pip install --upgrade psycopg[binary] ldlite

(On some systems it might be python3 rather than python.)

To extract and transform data:

$ python
>>> import ldlite
>>> ld = ldlite.LDLite()
>>> ld.connect_folio(url='https://folio-etesting-snapshot-kong.ci.folio.org',
                     tenant='diku',
                     user='diku_admin',
                     password='admin')
>>> db = ld.connect_db()
>>> _ = ld.query(table='g', path='/groups')
ldlite: querying: /groups
ldlite: created tables: g, g__t, g__tcatalog
>>> ld.select(table='g__t')
 __id |                  id                  |         desc          | expiration_offset_in_days |   group  
------+--------------------------------------+-----------------------+---------------------------+-----------
    1 | 3684a786-6671-4268-8ed0-9db82ebca60b | Staff Member          |                       730 | staff  
    2 | 503a81cd-6c26-400f-b620-14c08943697c | Faculty Member        |                       365 | faculty  
    3 | ad0bc554-d5bc-463c-85d1-5562127ae91b | Graduate Student      |                           | graduate  
    4 | bdc2b6d4-5ceb-4a12-ab46-249b9a68473e | Undergraduate Student |                           | undergrad
(4 rows)
>>> _ = ld.query(table='u', path='/users')
ldlite: querying: /users
ldlite: created tables: u, u__t, u__t__departments, u__t__personal__addresses, u__t__proxy_for, u__tcatalog
>>> cur = db.cursor()
>>> _ = cur.execute("""
        CREATE TABLE user_groups AS
        SELECT u__t.id, u__t.username, g__t.group
            FROM u__t
                JOIN g__t ON u__t.patron_group = g__t.id;
        """)
>>> ld.export_csv(table='user_groups', filename='groups.csv')
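The `user_groups` join above can be tried without a FOLIO connection. Here is a self-contained sketch that substitutes Python's built-in sqlite3 for DuckDB and uses two made-up sample rows; the table and column names mirror the transformed tables from the transcript:

```python
# Self-contained sketch of the user_groups join, using sqlite3 in place of
# DuckDB, with made-up sample rows shaped like the u__t and g__t tables.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# "group" is a reserved word in SQL, so it is quoted here.
cur.execute('CREATE TABLE g__t (id TEXT, "group" TEXT)')
cur.execute("CREATE TABLE u__t (id TEXT, username TEXT, patron_group TEXT)")
cur.execute("INSERT INTO g__t VALUES ('g1', 'staff')")
cur.execute("INSERT INTO u__t VALUES ('u1', 'jdoe', 'g1')")
cur.execute("""
    CREATE TABLE user_groups AS
    SELECT u__t.id, u__t.username, g__t."group"
        FROM u__t
            JOIN g__t ON u__t.patron_group = g__t.id;
""")
print(cur.execute("SELECT * FROM user_groups").fetchall())
# → [('u1', 'jdoe', 'staff')]
```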

Features

  • Queries FOLIO modules and transforms JSON data into tables for easier reporting
  • Full SQL query support and export to CSV
  • Compatible with DBeaver database tool
  • Compatible with DuckDB and PostgreSQL database systems
  • PostgreSQL support enables:
    • Sharing the data in a multiuser database
    • Access to the data using more database tools
    • Storing the data in an existing LDP database if available
  • Runs on Windows, macOS, and Linux

ldlite_system.load_history_v1

Starting with LDLite 4.0, useful information about each load is stored in this table during the loading process. The table can be exposed directly to end users, but its raw columns can be overwhelming; a friendlier view, such as the one below, can be exposed instead.

CREATE VIEW public.load_history_dashboard AS
SELECT
  table_prefix
  ,folio_path
  ,COALESCE(query_text, 'cql.allRecords=1') AS query_text
  ,final_rowcount AS rowcount
  ,pg_size_pretty(SUM(t.table_size)) AS total_size
  ,TO_CHAR(data_refresh_start AT TIME ZONE 'America/New_York', 'YYYY/MM/DD HH:MI AM') AS has_all_changes_before
  ,TO_CHAR(data_refresh_end AT TIME ZONE 'America/New_York', 'YYYY/MM/DD HH:MI AM') AS has_no_changes_after
  ,EXTRACT(EPOCH FROM data_refresh_end) AS refresh_sort
  ,SUM(t.table_size) AS size_sort
FROM ldlite_system.load_history_v1 h
CROSS JOIN LATERAL
(
  SELECT pg_total_relation_size(t."table_schema" || '.' || t."table_name") AS table_size
  FROM INFORMATION_SCHEMA.TABLES t
  WHERE
  (
    h.table_prefix LIKE '%.%' AND
    t.table_schema = SPLIT_PART(h.table_prefix, '.', 1) AND
    t.table_name LIKE (SPLIT_PART(h.table_prefix, '.', -1) || '%')
  ) OR
  (
    h.table_prefix NOT LIKE '%.%' AND
    t.table_name LIKE (h.table_prefix || '%')
  )
) t
WHERE final_rowcount IS NOT NULL
GROUP BY 1, 2, 3, 4, 6, 7, 8
ORDER BY 8 DESC;

When a load starts, the table_prefix, folio_path, query_text, and load_start columns are set. Any existing row with the same table_prefix has these values overwritten.

The download will transactionally replace the existing raw table and set the rowcount and download_complete columns.

The transformation will transactionally replace the expanded tables. If it fails the existing tables will be retained. At the end of transformation the final_rowcount and transform_complete columns are set in the same transaction.

The data_refresh_start and data_refresh_end times require special attention. These columns get updated when the transformation transaction is committed and represent when the download started and ended.

  • Any changes in FOLIO made before data_refresh_start will be reflected in the expanded tables.
  • Any changes in FOLIO made after data_refresh_end will not be reflected in the expanded tables.
  • Changes made to FOLIO in between the start and end may be reflected :smile_cat:/:scream_cat:.

Because of this transactional behavior, the raw table can hold newer data than the resulting expanded tables, either while the transformation stage is running or after it has failed. This state is indicated by the data_refresh_start and data_refresh_end columns not matching the load_start and download_complete columns.
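That comparison can be expressed as a small helper. This is an illustrative sketch (the function name is made up), assuming the four timestamps have been read from ldlite_system.load_history_v1:

```python
# Illustrative check for whether the expanded tables are in sync with the
# raw table, per the load_history_v1 column semantics described above.
from datetime import datetime

def expanded_tables_current(load_start: datetime,
                            download_complete: datetime,
                            data_refresh_start: datetime,
                            data_refresh_end: datetime) -> bool:
    """True when the expanded tables reflect the most recent download."""
    return (data_refresh_start == load_start
            and data_refresh_end == download_complete)

t0 = datetime(2024, 1, 1, 2, 0)   # download started
t1 = datetime(2024, 1, 1, 2, 30)  # download finished
print(expanded_tables_current(t0, t1, t0, t1))  # transform committed: True
print(expanded_tables_current(t1, t1, t0, t0))  # transform lagging:   False
```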


LDLite resources

  • LDLite API documentation

  • The LDP project runs a Slack workspace which is a good place to ask questions or to share your work. It also serves as a community space for working together on library data problems. To request an invitation, use the Contact page on the LDP website.

  • Report bugs on the project's issue tracker
