A package for analyzing time series datasets from the Seshat databank.

SeshatDatasetAnalysis

SeshatDatasetAnalysis is a Python package for analyzing time series datasets from the Seshat databank. It uses common data science libraries to process and analyze historical datasets.

Installation

To install the dependencies, first install Poetry and then run the following command:

poetry install

Usage

To use the TimeSeriesDataset class, you can import it as follows:

from src.TimeSeriesDataset import TimeSeriesDataset as TSD

The create dataset notebook contains an example of how to use the class.

Documentation

Template Data Structure

The template data structure is a crucial component of the SeshatDatasetAnalysis project. It is designed to download information about polities from the SQL database and construct a template dataset. This template dataset is then used to derive analysis datasets as needed. The template does not make any assumptions about specific analysis timesteps.

Structure of the Template

The template dataset has a single row for each polity, where each column contains a data structure that records, in a uniform manner, the different kinds of variable data from the SQL database. The data structure captures all the changed values for a variable in an ordered time sequence between the start and end of the polity. The data is represented using a Python dictionary to capture the values.

For example, consider the following representation:

PolID  Start  End   var1    var2    var3    var4
       1700   1850  valds1  valds2  valds3  valds4

Each variable (var1, var2, etc.) is encoded as a dictionary. Here are some examples of how different types of data are encoded:

Single Value without Dates:

valds1 = {'t': [1700, 1850], 'val': [[val1, val1]]}

Multiple Entries with Dates:

valds2 = {'t': [1722, 1800, 1819], 'val': [[v21, (v22, v23), v24]]}

Single Value with an Explicit Date:

valds3 = {'t': [1750], 'val': [[val3]]}

Disputed Values without Dates:

valds4 = {'t': [1700, 1850], 'val': [[val41, val41], [val42, val42]]}

The t values are always ascending and within the start and end dates of the polity. The dictionary data structures encode the value and date assumptions from the SQL database in a uniform and time-ordered way.
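The invariants stated above can be checked mechanically. The following is a minimal sketch, not code from the package: check_var_dict is a hypothetical helper, and the numeric values stand in for the placeholders val1, v21 and so on. It also assumes that every row in val has one entry per time in t, which is what the examples suggest.

```python
def check_var_dict(var_dict, start, end):
    """Return a list of problems found in one template variable dictionary.

    Hypothetical helper for illustration; not part of SeshatDatasetAnalysis.
    """
    problems = []
    times = var_dict["t"]
    rows = var_dict["val"]
    # The t values must be ascending.
    if any(a > b for a, b in zip(times, times[1:])):
        problems.append("t is not ascending")
    # The t values must lie within the polity's start and end dates.
    if any(not (start <= t <= end) for t in times):
        problems.append("t falls outside the polity's start and end")
    # Assumption: each row of val (one per disputed alternative) has one entry per time.
    for i, row in enumerate(rows):
        if len(row) != len(times):
            problems.append(f"val[{i}] has {len(row)} entries, expected {len(times)}")
    return problems


# Illustrative values only; a tuple stands for a range such as (v22, v23).
valds2 = {"t": [1722, 1800, 1819], "val": [[1, (2, 3), 4]]}
valds4 = {"t": [1700, 1850], "val": [[10, 10], [20, 20]]}
bad = {"t": [1800, 1722], "val": [[1, 2]]}

print(check_var_dict(valds2, 1700, 1850))
print(check_var_dict(valds4, 1700, 1850))
print(check_var_dict(bad, 1700, 1850))
```

The first two calls report no problems, while the third reports that t is not ascending.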

Sampling

Once all variables are constructed, a function is applied to 'sample' the variable dictionary at a specific time t. The function sample_var(var_dict, t, sampling_method_disputes, sampling_method_ranges, interpolation_method) performs the following steps:

1. Ensures that t is between the start and end of the polity.
2. Samples one of the entries in val using the sampling_method_disputes function.
3. Resolves any range uncertainties in that entry by applying the sampling_method_ranges function.
4. Interpolates the resolved values using the interpolation_method at time t.
5. Returns the value at time t.

Different sampling and interpolation methods can be chosen depending on the variable, allowing for flexibility in creating time series datasets.
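The steps above can be sketched as a small function with pluggable strategies. This is an illustration under stated assumptions rather than the package's implementation: sample_var_sketch and previous_value are hypothetical names, the polity's start and end are passed in explicitly, ranges are assumed to be stored as tuples, and returning None for a time outside the polity is a choice made here, not documented behavior.

```python
import bisect
import random


def sample_var_sketch(var_dict, t, start, end, pick_row, resolve_range, interpolate):
    """Sample one variable dictionary at time t (hypothetical sketch)."""
    # Step 1: t must lie within the polity's lifetime (assumed: return None otherwise).
    if not (start <= t <= end):
        return None
    # Step 2: resolve disputes by picking one row of val.
    row = pick_row(var_dict["val"])
    # Step 3: resolve ranges, assumed here to be stored as (low, high) tuples.
    values = [resolve_range(v) if isinstance(v, tuple) else v for v in row]
    # Steps 4 and 5: interpolate at t and return the result.
    return interpolate(var_dict["t"], values, t)


def previous_value(times, values, t):
    """Step interpolation: the value at the closest time at or before t."""
    i = bisect.bisect_right(times, t) - 1
    return values[i] if i >= 0 else None


rng = random.Random(0)  # seeded so repeated runs give the same draws
valds2 = {"t": [1722, 1800, 1819], "val": [[1.0, (2.0, 3.0), 4.0]]}


def uniform_in_range(bounds):
    """Draw uniformly between the lower and upper bound of a range."""
    return rng.uniform(bounds[0], bounds[1])


x = sample_var_sketch(valds2, 1810, 1700, 1850, rng.choice, uniform_in_range, previous_value)
y = sample_var_sketch(valds2, 1830, 1700, 1850, rng.choice, uniform_in_range, previous_value)
print(x, y)  # x is drawn from the range recorded for 1800; y is the value recorded for 1819
```

Passing the three strategies as plain callables keeps the sampling logic independent of any one method, so a different dispute or range rule can be swapped in per variable.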

Creating the Final Database

1. Create a Template: The template is created as detailed above.
2. TimeSeriesDataset Module: The TimeSeriesDataset module creates a dataset based on a set of polities and years by sampling the template.
3. Construct Social Complexity Variables and Perform Imputations: The sampled data is used to construct social complexity variables and perform imputations.

The template serves as a snapshot of the database, allowing for resampling with different methods. The current sampling process involves resolving disputes by sampling one of the rows in values, sampling uniformly for each range variable, extending the data by adding the start and end of the polity, and taking the value for the closest time preceding the specified year.
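To make the extension and closest-preceding-time steps concrete, here is a minimal sketch of building one row per polity and year. Everything in it is an assumption for illustration: the template is modeled as a list of plain dictionaries, the names extend_to_polity and build_rows are hypothetical, the polity ID "p1" and all values are made up, and padding is assumed to repeat the first and last recorded values. Disputes and ranges are left out to keep the example short, so only the first row of val is used.

```python
import bisect


def extend_to_polity(times, values, start, end):
    """Pad a series so that it covers the polity's start and end.

    Assumption: the first and last recorded values are repeated at the edges.
    """
    times, values = list(times), list(values)
    if times[0] > start:
        times.insert(0, start)
        values.insert(0, values[0])
    if times[-1] < end:
        times.append(end)
        values.append(values[-1])
    return times, values


def build_rows(template, years):
    """Return one row per (polity, year) for years inside the polity's lifetime."""
    rows = []
    for polity in template:
        for year in years:
            if not (polity["Start"] <= year <= polity["End"]):
                continue
            row = {"PolID": polity["PolID"], "Year": year}
            for name, var in polity["vars"].items():
                times, values = extend_to_polity(
                    var["t"], var["val"][0], polity["Start"], polity["End"]
                )
                # Take the value for the closest time at or before the year.
                i = bisect.bisect_right(times, year) - 1
                row[name] = values[i]
            rows.append(row)
    return rows


template = [
    {
        "PolID": "p1",
        "Start": 1700,
        "End": 1850,
        "vars": {
            "var2": {"t": [1722, 1800, 1819], "val": [[1, 2, 4]]},
            "var3": {"t": [1750], "val": [[7]]},
        },
    }
]

rows = build_rows(template, [1700, 1800, 1900])
for row in rows:
    print(row)
```

The year 1900 falls outside the polity and produces no row. For 1700, var3 takes the padded value 7 even though its only dated record is from 1750.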

Code References

Template Data Structure: The template data structure is constructed in the Template class, which can be found in the src/Template.py file.
Sampling Function: The sample_var function is used to sample the variable dictionary at a specific time t. This function is part of the Template class.
TimeSeriesDataset Module: The TimeSeriesDataset class, which creates datasets based on the template, is located in the src/TimeSeriesDataset.py file.
PCA Computation: The compute_PCA method in the TimeSeriesDataset class performs Principal Component Analysis on specified columns. This method is used to construct social complexity variables.
