Skip to main content

Thot data analysis and management.

Project description

Thot

python -m pip install thot-data

Data management and analysis software.

Thinking About Thot

Thot is based on top-down organization and bottom-up analysis, or, congruently, outside-in organization and inside-out analysis.

[images]

There are three components of a Thot project Containers, Assets, and Scripts.

Containers

Containers are the organizational building blocks of your project. They allow you to structure your projects and analysis in a logical way. Following the top-down organizational approach, Containers can contain both other Containers as children, and Assets. They can also have descriptors and metadata attached to them. Child containers inheret all the properties of their parents. Containers are also associated with Scripts, which analyze its Assets and produce new Assets.

Assets

An Asset is anything that is consumed or created in your analysis. This includes, raw data, calculated data, and images. Each Asset can have descriptors and metadata attached to it as well.

Scripts

A Script is a multi-input, multi-output function where the inputs and outputs are Assets. The input to a script is consumed and the output is produced. The produced Assets can then be consumed by other Scripts in the future.

Descriptors

Descriptors are human-readable pieces of data that describe what they are attached to. These properties can be used to identify classes of Objects (through its type or tags), or individual objects (by its name).

  • Name
  • Type
  • Tags
  • Description

Metadata

Metadata is data about data. Children inheret metadata from their parents.

Setting up a Project [Organization]

There are two versions of Thot, Local and Hosted. Local projects are run on your computer -- no internet connection or registration required. Hosted projects are run from the Thot website and provide additional functionality. If you have a Hosted account, you can sync your Local Projects with it, so data is automatically uploaded to the Thot servers and analyzed.

Local Projects

Local projects are just a set of folders and files on you computer. To tell Thot what a folder or file is you use an Object File. Object Files are just JSON files that provide information to your project. There are three types of Object Files -- one for each component of a Thot projects.

A folder can be either a Container or an Asset, not both.

_container.json

By adding a _container.json file to a folder you mark it as a Container. A Container file has the following properties:

  • name: The name of the Container. Can be used for retrieval in a script. If this is not provided, the base name of the folder is used.

  • type: Represents the class of the container. This is most useful to designate what level of the organizational structure the Container is at.

  • description: A description of the Container.

  • tags: A list of tags used for retrieving the container in a script.

  • metadata: A set of key-value pairs representing metadata about the Container and its children.

{
	"name": "",
	"type": "",
	"description": "",
	"tags": [],
	"metadata": {}
}

_asset.json

By adding an _asset.json file to a folder you mark it as an Asset. In addition to the basic properties of the Container, an Asset also has:

  • file: Absolute or relative path to the Asset file. It is best to put the file in the Asset folder, so a relative path is most convenient.

  • creator: The creator of the Asset. If the Asset was created by a specific machine, this is a good place to mark that. If the Asset was produced by a script, this will be set to the path of the script, allowing you to trace back its origin.

  • creator_type: This indicates whether the Asset was created by a user or a script. If a script produced it, this will automatically be set.

{
	"name": "",
	"type": "",
	"description": "",
	"tags": [],
	"metadata": {},

	"file": "path/to/asset.csv",
	"creator": "",
	"creator_type": "user"
}

_scripts.json

Scripts files are a bit different than those for Containers and Assets. These files create an association between a script and a Container. This file tells Thot which scripts to run, and in which order.

Only Containers can contain a _scripts.json file.

A Scripts file contains a list of Script Associations:

  • script: Relative or absolute path to the script.
  • priority: The order in which to run the script. Lower priorities go first.
  • autorun: Whether to automatically run the scrpt when evaluating a project. If false you will have to manually run the script.
[
	{
		"script": "path/to/script.py",
		"priority": 0,
		"autorun": true
	}
]

Notes

A _notes folder can also be included in a Container or Asset. Text files containing notes about the object can be stored in this folder. Each notes has the properties

  • created: The date of creation interpreted form the time the note was last modified.
  • title: The title of the note, interpreted from the name of the file.
  • content: The note itself, read from the contents of the file.

Hosted Projects

To create a Hosted project go to thot-data.com and create an account or log in.

Hosted projects have additional features such as user friendly interfaces for project creation, sharing projects and scripts, and more.

Thot Scripts [Analysis]

Thot is founded on the idea that the same analysis needs to be run on different data sets. Often this is done manually, taking additional time and effort, and is prone to mistakes. By separating the analysis process from the data, Thot allows your data to be automatically analyzed.

Thot Projects

Because Thot separates the analysis from the data, you need a way to pull your data in to the script in a Container relative manner. This is done using a Thot Project.

Because analysis is bottom-up, a script only has access to Containers and Assets below it.

Thot Interface

Each Thot Project implements a standard interface. This makes converting between Local and Hosted projects easy. A Thot Interface consists of the following structure.

Properties

  • root: Current Container being analyzed.

Methods

  • find_container( search = {} ): Returns a Container matching the search criteria.

  • find_containers( search = {} ): Returns a list of Containers matching the search criteria.

  • find_asset( search = {} ): Returns an Asset matching the search criteria.

  • find_assets( search = {} ): Returns a list of Assets matching the search criteria.

  • add_asset( asset [, id = None, overwrite = True] ): Creates a new asset in the currently active Container. Returns the id of the new Asset.

Local Project

A Local Project is a Thot Interface that uses your local file system as its database. During the analysis everything is performed relative to the active Container.

A simple python script for a Local Project may look something like

import pandas as pd
from thot.thot import LocalProject

thot = LocalProject() # set up local project

# retrieve data
sample = thot.find_container( { 'type': 'sample'  } )
data = thot.find_asset( { 'type': 'times' } )

# analyze data
df = pd.read_csv( data.file )
stats = df.mean()

# produce new Asset for future consumption
stats_props = {
	'file': 'stats.csv',
	'type': 'stats',
	'name': '{} Stats'.format( sample.name )
}

asset_path = thot.add_asset( stats_props, 'stats', overwrite = True )
stats.to_csv( asset_path )

Once your project is set up you use the Runner to evaluate it.

python -m thot.runner [--root <path/to/project>]

Hosted Project

A Hosted Project uses the Thot servers as its database. Anytime a change is made to a project, the relevant analysis are automatically run, unless the scripts are set to run manually.

There are only two changes you need to make to convert a local analysis script into a hosted analysis script:

  1. from thot.thot import LocalProjectfrom thot.thot import ThotProject
  2. thot = LocalProject()thot = ThotProject()

JSON

JSON is a file format that allows data to be stored in a human-readable form. You can find a nice introduction at W3Schools, and full documentation at json.org.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

thot-data-0.0.2.tar.gz (15.2 kB view hashes)

Uploaded Source

Built Distribution

thot_data-0.0.2-py3-none-any.whl (22.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page