A package for analyzing time series datasets from the Seshat databank.

Project description

SeshatDatasetAnalysis

SeshatDatasetAnalysis is a project for analyzing time series datasets from the Seshat databank. It builds on pandas, NumPy, and matplotlib to process, analyze, and plot historical datasets.

Installation
Usage
Documentation

Installation

To install the package and its dependencies from PyPI, run the following command:

pip install seshatdatasetanalysis

Usage

To use the TimeSeriesDataset class, you can import it as follows:

from seshatdatasetanalysis import TimeSeriesDataset as TSD

The "create dataset" notebook contains an example of how to use the class.

Documentation

Template Data Structure

The template data structure is a crucial component of the SeshatDatasetAnalysis project. It is designed to download information about polities from the SQL database and construct a template dataset. This template dataset is then used to derive analysis datasets as needed. The template does not make any assumptions about specific analysis timesteps.

Structure of the Template

The template dataset has a single row for each polity, where each column contains a data structure that records, in a uniform manner, the different kinds of variable data from the SQL database. The data structure captures all the changed values for a variable in an ordered time sequence between the start and end of the polity. The data is represented using a Python dictionary to capture the values.

For example, consider the following representation:

PolID	Start	End	var1	var2	var3	var4
	1700	1850	valds1	valds2	valds3	valds4

Each variable (var1, var2, etc.) is encoded as a dictionary. Here are some examples of how different types of data are encoded:

Single Value without Dates:

valds1 = {'t': [1700, 1850], 'val': [[val1, val1]]}

Multiple Entries with Dates:

valds2 = {'t': [1722, 1800, 1819], 'val': [[v21, (v22, v23), v24]]}

Single Value with an Explicit Date:

valds3 = {'t': [1750], 'val': [[val3]]}

Disputed Values without Dates:

valds4 = {'t': [1700, 1850], 'val': [[val41, val41], [val42, val42]]}

The t values are always ascending and within the start and end dates of the polity. The dictionary data structures encode the value and date assumptions from the SQL database in a uniform and time-ordered way.
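The conventions above can be checked mechanically. The following is a minimal sketch, not code from the package: the helper name check_var_dict and the numeric placeholder values are hypothetical, and it assumes that each row of 'val' holds one value per entry in 't' and that a tuple such as (v22, v23) denotes a range.

```python
def check_var_dict(var_dict, start, end):
    """Return True if var_dict follows the conventions described above."""
    times = var_dict["t"]
    rows = var_dict["val"]
    # t values must be ascending (assumed here to allow no decrease).
    ascending = all(a <= b for a, b in zip(times, times[1:]))
    # t values must lie within the polity's start and end dates.
    in_range = all(start <= t <= end for t in times)
    # Assumption: every row of 'val' has exactly one entry per t value.
    aligned = all(len(row) == len(times) for row in rows)
    return ascending and in_range and aligned


# Placeholder numbers stand in for val1, v21, ... from the examples.
valds1 = {"t": [1700, 1850], "val": [[5, 5]]}            # single value, no dates
valds2 = {"t": [1722, 1800, 1819], "val": [[1, (2, 3), 4]]}  # dated entries, one range
valds3 = {"t": [1750], "val": [[7]]}                     # single dated value
valds4 = {"t": [1700, 1850], "val": [[1, 1], [2, 2]]}    # two disputed values

all_ok = all(check_var_dict(v, 1700, 1850)
             for v in (valds1, valds2, valds3, valds4))
```

A dictionary whose t values run backwards, or fall outside the polity dates, fails the check.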

Sampling

Once all variables are constructed, a function is applied to 'sample' the variable dictionary at a specific time t. The function sample_var(var_dict, t, sampling_method_disputes, sampling_method_ranges, interpolation_method) performs the following steps:

Ensures that t is between the start and end of the polity.
Samples one of the entries in val using the sampling_method_disputes function.
Resolves any range uncertainties in that entry by applying the sampling_method_ranges function.
Interpolates the resolved values using the interpolation_method at time t.
Returns the value at time t.

Different sampling and interpolation methods can be chosen depending on the variable, allowing for flexibility in creating time series datasets.
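The steps above can be sketched as follows. This is an illustration under stated assumptions, not the package's sample_var: the function name sample_var_sketch, the start, end, and rng parameters, and the fixed choices (random row for disputes, uniform draw for ranges, previous-value lookup for interpolation) are all hypothetical stand-ins for the configurable methods.

```python
import bisect
import random


def sample_var_sketch(var_dict, t, start, end, rng=None):
    """Sample a variable dictionary at time t (illustrative only)."""
    rng = rng or random.Random()
    # Step 1: t must fall between the start and end of the polity.
    if not start <= t <= end:
        return None
    times = list(var_dict["t"])
    # Step 2: resolve disputes by picking one row of 'val' at random.
    row = rng.choice(var_dict["val"])
    # Step 3: resolve ranges (assumed to be tuples) with a uniform draw.
    values = [rng.uniform(*v) if isinstance(v, tuple) else v for v in row]
    # Extend the series to the polity start and end.
    if times[0] > start:
        times.insert(0, start)
        values.insert(0, values[0])
    if times[-1] < end:
        times.append(end)
        values.append(values[-1])
    # Step 4: take the value of the closest entry at or before t.
    idx = bisect.bisect_right(times, t) - 1
    return values[idx]


valds2 = {"t": [1722, 1800, 1819], "val": [[1, (2, 3), 4]]}
early = sample_var_sketch(valds2, 1710, 1700, 1850, random.Random(0))
middle = sample_var_sketch(valds2, 1805, 1700, 1850, random.Random(0))
late = sample_var_sketch(valds2, 1830, 1700, 1850, random.Random(0))
```

Passing a seeded random.Random instance makes the dispute and range draws reproducible, which helps when comparing resampled datasets.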

Creating the Final Database

Create a Template: The template is created as detailed above.
TimeSeriesDataset Module: The TimeSeriesDataset module creates a dataset based on a set of polities and years by sampling the template.
Construct Social Complexity Variables and Perform Imputations: The sampled data is used to construct social complexity variables and perform imputations.

The template serves as a snapshot of the database, allowing for resampling with different methods. The current sampling process involves resolving disputes by sampling one of the rows in values, sampling uniformly for each range variable, extending the data by adding the start and end of the polity, and taking the value for the closest time preceding the specified year.
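To illustrate how a dataset of polities and years can be derived from a template, here is a simplified sketch. The template layout, the helper names value_at and build_rows, and the column names PolID and Year are assumptions made for this example; the real TimeSeriesDataset class may organise this differently, and disputes and ranges are ignored here for brevity.

```python
import bisect

# Hypothetical template: one entry per polity, one dictionary per variable.
template = {
    "polity_a": {"start": 1700, "end": 1850,
                 "var1": {"t": [1700, 1850], "val": [[3, 3]]},
                 "var2": {"t": [1722, 1800], "val": [[10, 20]]}},
    "polity_b": {"start": 1500, "end": 1600,
                 "var1": {"t": [1500, 1600], "val": [[1, 1]]},
                 "var2": {"t": [1550], "val": [[5]]}},
}


def value_at(var_dict, t):
    """Use the first row; take the closest entry at or before t,
    falling back to the earliest entry when t precedes all dates."""
    times, row = var_dict["t"], var_dict["val"][0]
    idx = max(bisect.bisect_right(times, t) - 1, 0)
    return row[idx]


def build_rows(template, years, variables):
    """Create one row per (polity, year) pair inside the polity's lifetime."""
    rows = []
    for polity, entry in template.items():
        for year in years:
            if entry["start"] <= year <= entry["end"]:
                row = {"PolID": polity, "Year": year}
                for var in variables:
                    row[var] = value_at(entry[var], year)
                rows.append(row)
    return rows


rows = build_rows(template, years=range(1500, 1900, 100),
                  variables=["var1", "var2"])
```

The resulting list of dictionaries can be passed to pandas.DataFrame to obtain a tabular dataset.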

Code References

Template Data Structure: The template data structure is constructed in the Template class, which can be found in the src/Template.py file.
Sampling Function: The sample_var function is used to sample the variable dictionary at a specific time t. This function is part of the Template class.
TimeSeriesDataset Module: The TimeSeriesDataset class, which creates datasets based on the template, is located in the src/TimeSeriesDataset.py file.
PCA Computation: The compute_PCA method in the TimeSeriesDataset class performs Principal Component Analysis on specified columns. This method is used to construct social complexity variables.
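For readers unfamiliar with the technique, the following sketch shows Principal Component Analysis on a small table of columns using NumPy. It is not the compute_PCA method: the function name pca_scores is hypothetical, and standardising the columns before the decomposition is an assumption that the source does not confirm.

```python
import numpy as np


def pca_scores(data, n_components=1):
    """Project rows of `data` onto the leading principal components."""
    X = np.asarray(data, dtype=float)
    # Standardise each column so variables on different scales are comparable.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardised matrix; rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    # Fraction of total variance carried by each component.
    explained = s**2 / np.sum(s**2)
    return X @ vt[:n_components].T, explained[:n_components]


# Made-up data: three strongly correlated columns, four observations.
data = [[1.0, 2.1, 0.9], [2.0, 3.9, 2.1], [3.0, 6.2, 2.9], [4.0, 8.1, 4.2]]
scores, explained = pca_scores(data)
```

With strongly correlated columns such as these, the first component carries most of the variance, which is why a single component can serve as a composite variable.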

Plotting Module

This module provides specialized plotting functions for visualizing Seshat dataset analysis results. It includes functions for creating bubble plots, grid visualizations, coefficient plots, and band plots with error bars.

Functions Overview

1. `polity_bubble_plot()`

Creates a bubble plot where each bubble represents a polity, with bubble sizes based on the number of observations and colors determined by a specified variable.

Parameters:

tsd: TimeSeriesDataset or pandas DataFrame
col_x: Column name for x-axis
col_y: Column name for y-axis
col_color: Column name for color coding
show_background_data: Boolean to show background data points (default: False)
cmap: Colormap name (default: 'coolwarm')
size_scale: Scale factor for bubble sizes (default: 10)
vmin, vmax: Color scale limits (optional)

Example:

import seshatdatasetanalysis as sda
from seshatdatasetanalysis import plotting

# Load dataset
tsd = sda.TimeSeriesDataset(['sc'], file_path='test_dataset')

# Create polity bubble plot
fig, ax, scatter = plotting.polity_bubble_plot(
    tsd, 
    col_x='Pop', 
    col_y='Information', 
    col_color='Hierarchy',
    show_background_data=True,
    cmap='viridis',
    size_scale=15
)

2. `grid_bubble_plot()`

Creates a grid-based bubble plot that bins data spatially and shows averaged values within each grid cell.

Parameters:

tsd: TimeSeriesDataset or pandas DataFrame
col_x: Column name for x-axis
col_y: Column name for y-axis
col_color: Column name for color coding
cmap: Colormap name (default: 'coolwarm')
nbins: Number of bins (int or tuple, optional)
grid_size: Size of grid cells (default: 1)
scale_size: Scale factor for bubble sizes (default: 5)
vmin, vmax: Color scale limits (optional)

Example:

# Create grid bubble plot
fig, ax = plotting.grid_bubble_plot(
    tsd, 
    col_x='Hierarchy', 
    col_y='Pop', 
    col_color='Information',
    cmap='plasma',
    grid_size=0.75,
    scale_size=8
)
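The binning idea behind a grid bubble plot can be sketched without any plotting code. The helper grid_average below is hypothetical and only illustrates one plausible way to group points into square cells of width grid_size and average a third value per cell; it is not taken from the plotting module.

```python
import math
from collections import defaultdict


def grid_average(points, grid_size=1.0):
    """Group (x, y, value) triples into square cells and average each cell."""
    cells = defaultdict(list)
    for x, y, value in points:
        # Integer cell index along each axis.
        key = (math.floor(x / grid_size), math.floor(y / grid_size))
        cells[key].append(value)
    # Map each cell centre to (number of observations, mean value).
    return {
        ((i + 0.5) * grid_size, (j + 0.5) * grid_size): (len(v), sum(v) / len(v))
        for (i, j), v in cells.items()
    }


points = [(0.2, 0.3, 1.0), (0.8, 0.1, 3.0), (1.5, 0.4, 5.0)]
summary = grid_average(points, grid_size=1.0)
```

In a plot, the observation count would typically drive the bubble size and the mean value the bubble color.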

3. `plot_fit_coefficients()`

Visualizes regression coefficients with confidence intervals for multiple dependent and independent variables.

Parameters:

tsd: TimeSeriesDataset or pandas DataFrame
y_cols: List of dependent variable column names
x_cols: List of independent variable column names
regression_type: 'logit' or 'linear'
pval_max: Maximum p-value for inclusion (default: 0.05)
cmap: Colormap name (default: 'coolwarm')

Example:

# Plot regression coefficients
fig, ax = plotting.plot_fit_coefficients(
    tsd,
    y_cols=['Information', 'Infrastructure', 'Money'],
    x_cols=['Pop', 'Cap', 'Terr'],
    regression_type='linear',
    pval_max=0.05,
    cmap='RdBu'
)

4. `band_plot()`

Creates a band plot showing mean values with error bands across binned x-values, optionally color-coded by a third variable.

Parameters:

tsd: TimeSeriesDataset or pandas DataFrame
col_x: Column name for x-axis
col_y: Column name for y-axis
col_z: Optional column name for color coding
nbins: Number of bins (optional)
grid_size: Size of bins (default: 1)
cmap: Colormap name (default: 'coolwarm')
error: Error type - 'standard' or 'sem' (default: 'standard')

Example:

# Simple band plot without color coding
fig, ax = plotting.band_plot(
    tsd,
    col_x='Pop',
    col_y='Information',
    nbins=15,
    error='sem'
)

# Band plot with color coding
fig, ax = plotting.band_plot(
    tsd,
    col_x='Pop',
    col_y='Information',
    col_z='Cap',
    nbins=10,
    cmap='viridis',
    error='standard'
)
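The statistics behind a band plot can be sketched as below. The helper band_stats is hypothetical, and it assumes that error='standard' refers to the sample standard deviation and error='sem' to the standard error of the mean; the source does not define these options precisely.

```python
import math
import statistics
from collections import defaultdict


def band_stats(xs, ys, grid_size=1.0, error="standard"):
    """Return (bin centre, mean, error) for each occupied x bin."""
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        bins[math.floor(x / grid_size)].append(y)
    result = []
    for b in sorted(bins):
        values = bins[b]
        mean = statistics.fmean(values)
        # Sample standard deviation needs at least two points.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        if error == "sem":
            # Standard error of the mean: stdev / sqrt(n).
            spread /= math.sqrt(len(values))
        result.append(((b + 0.5) * grid_size, mean, spread))
    return result


xs = [0.1, 0.4, 0.9, 1.2, 1.8]
ys = [1.0, 2.0, 3.0, 4.0, 6.0]
bands = band_stats(xs, ys, grid_size=1.0, error="sem")
```

The band is then drawn between mean - error and mean + error at each bin centre.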

Complete Example

import seshatdatasetanalysis as sda
from seshatdatasetanalysis import plotting
import matplotlib.pyplot as plt

# Load your dataset
tsd = sda.TimeSeriesDataset(['sc'], file_path='your_dataset_path')

# Create multiple visualizations
plt.figure(figsize=(15, 10))

# 1. Polity bubble plot
plotting.polity_bubble_plot(
    tsd, 
    'Pop', 'Information', 'Hierarchy',
    show_background_data=True
)

# 2. Grid bubble plot  
plotting.grid_bubble_plot(
    tsd,
    'Hierarchy', 'Pop', 'Information',
    grid_size=0.5
)

# 3. Coefficient plot
plotting.plot_fit_coefficients(
    tsd,
    y_cols=['Information', 'Infrastructure'],
    x_cols=['Pop', 'Hierarchy'],
    regression_type='linear'
)

# 4. Band plot
plotting.band_plot(
    tsd,
    'Pop', 'Information', 'Cap',
    nbins=8,
    error='sem'
)

Return Values

All plotting functions return:

fig: matplotlib figure object
ax: matplotlib axes object
Some functions also return additional objects (e.g., scatter object from bubble plots)

Dependencies

matplotlib
numpy
pandas

Project details

Release history

Version	Release date
0.3.11	Mar 3, 2026
0.3.10	Mar 3, 2026
0.3.9	Mar 3, 2026
0.3.8	Mar 3, 2026
0.3.6	Nov 18, 2025
0.3.5	Nov 18, 2025
0.3.4	Nov 7, 2025
0.3.3	Nov 7, 2025
0.3.2 (this version)	Nov 4, 2025
0.3.1	Sep 24, 2025
0.3.0	Sep 24, 2025
0.2.8	Sep 15, 2025
0.2.7	Sep 9, 2025
0.2.6	Sep 1, 2025
0.2.5	Aug 20, 2025
0.2.4	Aug 20, 2025
0.2.3	Aug 11, 2025
0.2.1	Aug 11, 2025
0.2.0	Aug 6, 2025
0.1.7	Jul 28, 2025
0.1.6	Jul 28, 2025
0.1.5	Jul 25, 2025
0.1.4	Jul 25, 2025
0.1.3	Jul 25, 2025
0.1.2	Jun 23, 2025
0.1.1	Jun 20, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

seshatdatasetanalysis-0.3.2.tar.gz (38.5 kB view details)

Uploaded Nov 4, 2025 Source

Built Distribution

seshatdatasetanalysis-0.3.2-py3-none-any.whl (38.8 kB view details)

Uploaded Nov 4, 2025 Python 3

File details

Details for the file seshatdatasetanalysis-0.3.2.tar.gz.

File metadata

Download URL: seshatdatasetanalysis-0.3.2.tar.gz
Upload date: Nov 4, 2025
Size: 38.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.4 CPython/3.11.9 Windows/10

File hashes

Hashes for seshatdatasetanalysis-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`9c5884f6b862793a70b24baf49467a98fd8339dabf6e6c29d5aef6414519dc52`
MD5	`e2cf3ad4306890035cae4229916e5740`
BLAKE2b-256	`bdbe98259b68a3075f2becef54a2c481357b9c107b969e61ebd5ae14afc6e905`

File details

Details for the file seshatdatasetanalysis-0.3.2-py3-none-any.whl.

File metadata

Download URL: seshatdatasetanalysis-0.3.2-py3-none-any.whl
Upload date: Nov 4, 2025
Size: 38.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.4 CPython/3.11.9 Windows/10

File hashes

Hashes for seshatdatasetanalysis-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`91de3cb906b636f2837475169d5a447244fe7bfe4302adc4dd0510922d52b6f2`
MD5	`4a7e393391192957cc3f84bc42f39527`
BLAKE2b-256	`7ea6c691f3e1ddb08f7dfbfc9b18f5b4a04082b43156245ea6fb5a552afef9b9`
