Generate insurance data for testing and demonstrations

Project description

general_insurance_data_model

Module to generate artificial data to test reserving engines

Overview

This module generates a simple data set representing a portfolio of insurance policies. The generated data can be used to demonstrate and share new actuarial processes, models and software. There is no real substitute for real data, but real data is often commercially sensitive, so synthetic data may be the next best way to share ideas and solutions within the community.

Example of use

Installation

pip install general-insurance-data-model

Generating an insurance portfolio (ultimate)

The first function generates an ultimate policy and claims DataFrame:

import general_insurance_data_model.generators as gt
import datetime as dt

data_ultimate_m = gt.generate_ultimate_portfolio(
    class_name='Motor', 
    uw_start_date=dt.datetime.strptime('01/01/2019', '%d/%m/%Y'),
    historic_years=12,
    )

Other parameters can be tuned to set claim reporting and paid delays.

Filtering by reporting date (reported)

A second function filters the data as at a reporting date:

Policies written after the reporting date are removed
Claims reported and paid after the reporting date are removed

# Assumes asat_filtering is exposed by the generators module imported above as gt
data_reported_m = gt.asat_filtering(
    data_ultimate_m,
    reporting_date=dt.datetime.strptime('31/01/2024', '%d/%m/%Y')
    )
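The two filtering rules above can be illustrated on a toy DataFrame. This is a minimal sketch and not the package's implementation: `asat_filter_sketch`, the `Policy_id` and `Claim_report_date` columns, and the choice to blank out (rather than drop) claim fields that are not yet known are all assumptions made for illustration. Only `Start_date`, `Claim_payment_date` and `Claim_value` appear elsewhere in this README.

```python
import pandas as pd


def asat_filter_sketch(df: pd.DataFrame, reporting_date: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical as-at filter: keep only what is known at reporting_date."""
    # Rule 1: drop policies written after the reporting date.
    out = df[df["Start_date"] <= reporting_date].copy()

    # Rule 2: hide claim information that is not yet known.
    reported = out["Claim_report_date"] <= reporting_date
    paid = out["Claim_payment_date"] <= reporting_date
    out["Claim_report_date"] = out["Claim_report_date"].where(reported)
    out["Claim_payment_date"] = out["Claim_payment_date"].where(paid)
    # Assumption: Claim_value is treated as a paid amount, so it is hidden until paid.
    out["Claim_value"] = out["Claim_value"].where(paid)
    return out


# Toy data: one row per policy, each with a single claim (column layout is assumed).
toy = pd.DataFrame({
    "Policy_id": ["P1", "P2", "P3", "P4"],
    "Start_date": pd.to_datetime(["2023-03-01", "2023-11-01", "2024-02-10", "2023-12-01"]),
    "Claim_report_date": pd.to_datetime(["2023-06-01", "2024-01-15", "2024-04-01", "2024-02-20"]),
    "Claim_payment_date": pd.to_datetime(["2023-09-01", "2024-03-01", "2024-05-01", "2024-04-01"]),
    "Claim_value": [500.0, 800.0, 300.0, 100.0],
})

filtered = asat_filter_sketch(toy, pd.Timestamp("2024-01-31"))
print(filtered)
```

In this toy example P3 is dropped because it was written after the reporting date, P2 is reported but not yet paid, and P4 has no visible claim yet.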

Optional: use with the chainladder reserving package

The policy and claims information can be imported directly into the python-chainladder package to test different reserving methods.

import chainladder as cl

# build a triangle object
tri_paid = cl.Triangle(data_reported_m, 
                origin='Start_date',
                index='Class_name',
                development='Claim_payment_date',
                columns='Claim_value',
                cumulative=False).incr_to_cum().grain('OYDQ')

# plot the triangle
tri_paid.T.plot()

Generator Options

generate_ultimate_portfolio has the following options:

class_name Default: 'Class A'. Name for the data you are generating; used in the Class Name column.
insured_limit Default: 3000. Limit of liability for the insurance contracts you are modelling.
insured_excess Default: 250. Excess amount of the insurance contracts you are modelling.
policy_premium Default: 150. Premium per policy.
n_policies Default: 1000. Number of policies to generate.
uw_start_date Default: datetime '01/01/2019'. First day of the underwriting year.
historic_years Default: 10. Number of years to generate.
historic_policy_growth Default: 0.03. When generating historic years, the number of policies is adjusted by this rate to simulate growth.
frequency Default: 0.15. Claims frequency.
severity_mean_gu Default: 1000. Mean claim amount (ground up).
severity_sd_gu Default: 800. Standard deviation of claim amount (ground up).
delay_reportdays_mean Default: 100. Mean number of days from start of contract until claim is reported.
delay_reportdays_sd Default: 200. Standard deviation of the above.
delay_paymentdays_mean Default: 200. Mean number of days from start of contract until claim is paid.
delay_paymentdays_sd Default: 200. Standard deviation of the above.

Assumptions: Policy generation

Policies are assumed to have a 12-month term
Policy start dates are uniformly distributed within the 12-month period (no seasonality)
Policy end dates are 365.25 days after the start date
Three generic risk factors are generated for each policy (standard normals) [to be used to influence claim parameters]
Option to set a policy excess and limit
Option to set a policy premium
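The policy assumptions above can be sketched with NumPy and pandas. This is an illustration under stated assumptions, not the package's code: the column names other than `Start_date` (`Policy_id`, `End_date`, `Risk_factor_1` to `Risk_factor_3`, `Insured_limit`, `Insured_excess`, `Premium`) are hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

n_policies = 1000
uw_start = pd.Timestamp("2019-01-01")
term = pd.Timedelta(days=365.25)  # 12-month policy term

# Start dates: uniform offsets (in days) within the underwriting year, no seasonality.
start_offsets = rng.uniform(0.0, 365.25, size=n_policies)
start_dates = uw_start + pd.to_timedelta(start_offsets, unit="D")

# Three generic risk factors per policy, drawn as independent standard normals.
risk_factors = rng.standard_normal((n_policies, 3))

policies = pd.DataFrame({
    "Policy_id": np.arange(n_policies),   # hypothetical identifier column
    "Start_date": start_dates,
    "End_date": start_dates + term,       # 365.25 days after the start date
    "Risk_factor_1": risk_factors[:, 0],
    "Risk_factor_2": risk_factors[:, 1],
    "Risk_factor_3": risk_factors[:, 2],
    "Insured_limit": 3000.0,              # README default for insured_limit
    "Insured_excess": 250.0,              # README default for insured_excess
    "Premium": 150.0,                     # README default for policy_premium
})

print(policies.head())
```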

Assumptions: Claim generation

Claims are assumed to arise from a single peril
Only a single claim is generated per policy (or representing the total of the policy's claims). The policy is not terminated
Frequency is set as ground-up frequency. If an excess frequency is used, set the policy excess to zero and the limit to the policy exposed value
Claim value is generated from a lognormal distribution with the given mean and SD
The case estimate is assumed to be perfect and does not change, but there is a delay to payment
Timings are calibrated using distributions, offset as follows:
-- Incident date is set from a uniform distribution within the policy year (no seasonality)
-- Reported date is the delay in days from the incident
-- Payment date is the delay in days from the reporting of the claim
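The claim assumptions above can be sketched as follows. A lognormal with mean m and standard deviation s has parameters sigma^2 = ln(1 + s^2/m^2) and mu = ln(m) - sigma^2/2, which is what `lognormal_params` computes. Everything else here is an assumption for illustration rather than the package's method: the helper name, the use of a lognormal for the two delays (the README only says "distributions"), and the way the excess and limit are applied to the ground-up amount.

```python
import numpy as np


def lognormal_params(mean: float, sd: float) -> tuple[float, float]:
    """Convert a target mean and SD into lognormal (mu, sigma) by moment matching."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return float(mu), float(np.sqrt(sigma2))


rng = np.random.default_rng(seed=1)
n_policies = 1000
frequency, insured_excess, insured_limit = 0.15, 250.0, 3000.0

# At most one claim per policy, occurring with the ground-up frequency.
has_claim = rng.random(n_policies) < frequency

# Ground-up severity from a lognormal matched to mean 1000 and SD 800.
mu, sigma = lognormal_params(1000.0, 800.0)
severity_gu = rng.lognormal(mu, sigma, size=n_policies)

# Assumption: the insurer pays the amount above the excess, capped at the limit.
claim_net = np.clip(severity_gu - insured_excess, 0.0, insured_limit)
claim_net = np.where(has_claim, claim_net, 0.0)

# Timings, in days from policy start: incident uniform within the policy year,
# then report and payment delays added in sequence (lognormal delays are assumed).
incident_day = rng.uniform(0.0, 365.25, size=n_policies)
report_mu, report_sigma = lognormal_params(100.0, 200.0)
report_day = incident_day + rng.lognormal(report_mu, report_sigma, size=n_policies)
pay_mu, pay_sigma = lognormal_params(200.0, 200.0)
payment_day = report_day + rng.lognormal(pay_mu, pay_sigma, size=n_policies)

print(f"claims: {has_claim.sum()}, mean net claim: {claim_net[has_claim].mean():.2f}")
```

Chaining the delays guarantees that each claim is reported after it occurs and paid after it is reported, which matches the offsets described above.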

Historic years

You can specify the number of historic years you would like in your data set.
You can specify a growth rate for the number of policies, but all other parameters are assumed to be the same for prior years.
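As a rough illustration of how a growth rate might translate into policy counts per underwriting year, the sketch below compounds the rate year on year. The helper `policy_counts_by_year` is hypothetical, and both the direction of growth (forward from `uw_start_date`) and the rounding are assumptions; the package may apply the adjustment differently.

```python
import datetime as dt


def policy_counts_by_year(n_policies, growth, historic_years, uw_start_date):
    """Hypothetical schedule of (underwriting year start, policy count) pairs."""
    schedule = []
    for k in range(historic_years):
        # Works for a 1 January start; a 29 February start would need special handling.
        year_start = uw_start_date.replace(year=uw_start_date.year + k)
        # Assumption: growth compounds annually from the first underwriting year.
        count = round(n_policies * (1.0 + growth) ** k)
        schedule.append((year_start, count))
    return schedule


schedule = policy_counts_by_year(1000, 0.03, 3, dt.date(2019, 1, 1))
for year_start, count in schedule:
    print(year_start, count)
```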

Project details

This version

0.0.6

Aug 30, 2024

Download files

Source Distribution

general_insurance_data_model-0.0.6.tar.gz (44.1 kB)

Uploaded Aug 30, 2024 Source

Built Distribution

general_insurance_data_model-0.0.6-py3-none-any.whl (31.0 kB)

Uploaded Aug 30, 2024 Python 3

Hashes for general_insurance_data_model-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`f54c4d4c5a5f6b61ff0ada6bfff867b76be728893d53eb9b2c62852b13ec07cd`
MD5	`3cf478b289f5cc3c663435329d329dca`
BLAKE2b-256	`b4a0d53fe82a00dd0614010484ea15982bb5facb4b54fcee92ed4bf55da564a4`

Hashes for general_insurance_data_model-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b311f3398534bedd1dd823549d9cd166efa1914b5f2a1f637f500a6a5d17826c`
MD5	`c18f637b4793e1b1a889bec27e98fd5b`
BLAKE2b-256	`f3d93f8126814dfed5bf98047895664b349cb20f9c484fc1a66b60f36a08670c`