Thot data analysis and management.
Thot
python -m pip install thot-data
Data management and analysis software.
For full documentation, go to Read the Docs.
Thinking About Thot
Thot is based on top-down organization and bottom-up analysis or, equivalently, outside-in organization and inside-out analysis.
There are three components of a Thot project: Containers, Assets, and Scripts.
Containers
Containers are the organizational building blocks of your project. They allow you to structure your project and analysis in a logical way. Following the top-down organizational approach, a Container can hold both Assets and other Containers as children. Containers can also have descriptors and metadata attached to them. Child Containers inherit all the properties of their parents. Containers are also associated with Scripts, which analyze their Assets and produce new Assets.
Assets
An Asset is anything that is consumed or created in your analysis. This includes raw data, calculated data, and images. Each Asset can have descriptors and metadata attached to it as well.
Scripts
A Script is a multi-input, multi-output function whose inputs and outputs are Assets. A Script consumes its input Assets and produces output Assets, which can in turn be consumed by other Scripts.
Descriptors
Descriptors are human-readable pieces of data that describe the object they are attached to. They can be used to identify classes of objects (through their type or tags) or individual objects (through their name).
- Name
- Type
- Tags
- Description
Metadata
Metadata is data about data. Children inherit metadata from their parents.
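To make inheritance concrete, here is a minimal sketch of resolving an object's effective metadata by walking from the root Container down to the object. The helper `resolve_metadata` is hypothetical, and the rule that a child's value overrides an ancestor's value for the same key is an assumption; the source only states that children inherit.

```python
from typing import Any, Dict, List


def resolve_metadata(chain: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge metadata along a path from the root Container to one object.

    `chain` is ordered root-first. Assumption: when keys collide, the
    deeper (more specific) object's value overrides its ancestors'.
    """
    resolved: Dict[str, Any] = {}
    for metadata in chain:
        resolved.update(metadata)  # later entries win on shared keys
    return resolved


project_meta = {"operator": "alice", "temperature": 25}
sample_meta = {"temperature": 30}
asset_meta = {"units": "s"}
print(resolve_metadata([project_meta, sample_meta, asset_meta]))
# {'operator': 'alice', 'temperature': 30, 'units': 's'}
```

Building a fresh dictionary keeps each object's own metadata untouched, so the same parent metadata can be reused for every child.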
Setting up a Project [Organization]
There are two versions of Thot, Local and Hosted. Local projects run on your computer; no internet connection or registration is required. Hosted projects run from the Thot website and provide additional functionality. If you have a Hosted account, you can sync your Local projects with it so data is automatically uploaded to the Thot servers and analyzed.
Local Projects
Local projects are just a set of folders and files on your computer. To tell Thot what a folder or file is, you use an Object File. Object Files are JSON files that provide information about your project. There are three types of Object Files, one for each component of a Thot project.
A folder can be either a Container or an Asset, not both.
_container.json
By adding a _container.json file to a folder you mark it as a Container. A Container file has the following properties:
- name: The name of the Container. Can be used for retrieval in a script. If this is not provided, the base name of the folder is used.
- type: Represents the class of the container. This is most useful to designate what level of the organizational structure the Container is at.
- description: A description of the Container.
- tags: A list of tags used for retrieving the container in a script.
- metadata: A set of key-value pairs representing metadata about the Container and its children.
{
"name": "",
"type": "",
"description": "",
"tags": [],
"metadata": {}
}
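Because an Object File is plain JSON, it can also be written from Python. The sketch below uses only the standard library; the helper `mark_container` and its parameter names are hypothetical, and Thot itself only requires that a _container.json with the properties above exists in the folder.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def mark_container(
    folder: str,
    name: Optional[str] = None,
    type_: str = "",
    description: str = "",
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Path:
    """Create `folder` if needed and write a _container.json into it."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    props = {
        # Mirror Thot's documented fallback: the folder's base name.
        "name": name if name is not None else path.name,
        "type": type_,
        "description": description,
        "tags": tags or [],
        "metadata": metadata or {},
    }
    container_file = path / "_container.json"
    container_file.write_text(json.dumps(props, indent=4))
    return container_file


with tempfile.TemporaryDirectory() as tmp:
    created = mark_container(f"{tmp}/sample_1", type_="sample", tags=["batch-a"])
    print(json.loads(created.read_text())["name"])  # sample_1
```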
_asset.json
By adding an _asset.json file to a folder you mark it as an Asset. In addition to the basic properties of the Container, an Asset also has:
- file: Absolute or relative path to the Asset file. It is best to put the file in the Asset folder, so a relative path is most convenient.
- creator: The creator of the Asset. If the Asset was created by a specific machine, this is a good place to mark that. If the Asset was produced by a script, this will be set to the path of the script, allowing you to trace back its origin.
- creator_type: This indicates whether the Asset was created by a user or a script. If a script produced it, this will automatically be set.
{
"name": "",
"type": "",
"description": "",
"tags": [],
"metadata": {},
"file": "path/to/asset.csv",
"creator": "",
"creator_type": "user"
}
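As a sketch of the relative-path convention, the hypothetical helper `asset_file_path` below reads an _asset.json and returns the data file it points to. It assumes a relative "file" value is resolved against the Asset folder that holds the _asset.json; it is an illustration, not part of Thot's API.

```python
import json
import tempfile
from pathlib import Path


def asset_file_path(asset_folder: str) -> Path:
    """Return the data file an _asset.json points to.

    Assumption: a relative "file" value is resolved against the Asset
    folder that holds the _asset.json; absolute paths are used as-is.
    """
    folder = Path(asset_folder)
    props = json.loads((folder / "_asset.json").read_text())
    file_path = Path(props["file"])
    return file_path if file_path.is_absolute() else folder / file_path


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_asset.json").write_text(
        json.dumps({"file": "times.csv", "creator_type": "user"})
    )
    print(asset_file_path(tmp).name)  # times.csv
```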
_scripts.json
Scripts files are a bit different from those for Containers and Assets. They create an association between a script and a Container, telling Thot which scripts to run and in which order.
Only Containers can contain a _scripts.json file.
A Scripts file contains a list of Script Associations:
- script: Relative or absolute path to the script.
- priority: The order in which to run the script. Lower priorities go first.
- autorun: Whether to automatically run the script when evaluating a project. If false, you have to run the script manually.
[
{
"script": "path/to/script.py",
"priority": 0,
"autorun": true
}
]
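The priority and autorun fields can be read as "keep the autorun scripts, then run them from lowest to highest priority". The hypothetical helper `autorun_order` sketches that reading; it is not the Runner's code, and keeping file order for equal priorities (Python's sorted() is stable) is an assumption.

```python
import json
from typing import List


def autorun_order(scripts_json: str) -> List[str]:
    """Scripts that would run automatically, lowest priority first.

    Equal priorities keep their file order because sorted() is stable;
    that tie-break is an assumption, not documented Thot behavior.
    """
    associations = json.loads(scripts_json)
    selected = [a for a in associations if a["autorun"]]
    return [a["script"] for a in sorted(selected, key=lambda a: a["priority"])]


example = """[
    {"script": "analysis/stats.py", "priority": 1, "autorun": true},
    {"script": "analysis/clean.py", "priority": 0, "autorun": true},
    {"script": "analysis/plot.py", "priority": 2, "autorun": false}
]"""
print(autorun_order(example))  # ['analysis/clean.py', 'analysis/stats.py']
```

The script paths in the example are made up for illustration.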
Notes
A _notes folder can also be included in a Container or Asset. Text files containing notes about the object can be stored in this folder. Each note has the following properties:
- created: The date of creation, interpreted from the time the note was last modified.
- title: The title of the note, interpreted from the name of the file.
- content: The note itself, read from the contents of the file.
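The mapping from files to note properties can be sketched with the standard library. The helper `read_notes` is hypothetical; treating every regular file as UTF-8 text and using the file name without its extension as the title are assumptions.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List


def read_notes(object_folder: str) -> List[Dict[str, Any]]:
    """Read every file in <object_folder>/_notes as a note.

    Assumptions: notes are UTF-8 text and the title is the file name
    without its extension.
    """
    notes_dir = Path(object_folder) / "_notes"
    if not notes_dir.is_dir():
        return []
    notes = []
    for note_file in sorted(notes_dir.iterdir()):
        if not note_file.is_file():
            continue
        notes.append({
            # Last-modified time stands in for the creation date.
            "created": datetime.fromtimestamp(note_file.stat().st_mtime),
            "title": note_file.stem,
            "content": note_file.read_text(encoding="utf-8"),
        })
    return notes


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_notes").mkdir()
    (Path(tmp) / "_notes" / "calibration.txt").write_text("Recalibrated sensor.", encoding="utf-8")
    print([note["title"] for note in read_notes(tmp)])  # ['calibration']
```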
Utilities
Thot comes with a utilities module to make building local projects easier. For full documentation, use python -m thot.utilities -h. All utility functions output the ids of the modified Containers.
Options
The utility functions share some generic options that can be applied to all of them.
--root, -r: Specifies the path to the root Container.
--overwrite, -w: If a conflict emerges, overwrite the original content with the provided content. Otherwise, leave the original content.
--search, -s: JSON object used to match Containers to apply the function to.
Make sure your JSON is properly quoted. You will likely have to place single quotes around the JSON string and double quotes around property keys and strings within the object, e.g.
'{ "string_property": "test string", "boolean_property": true, "number_property": 42 }'
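If you build the --search value from Python, the standard library can handle the quoting described above: json.dumps emits double-quoted keys and strings, and shlex.quote wraps the result in single quotes for a POSIX shell. This is a sketch; the property names are the arbitrary ones from the example.

```python
import json
import shlex

search = {"string_property": "test string", "boolean_property": True, "number_property": 42}

# json.dumps emits double-quoted keys and strings; shlex.quote adds the
# surrounding single quotes a POSIX shell needs.
search_arg = shlex.quote(json.dumps(search))
print(search_arg)
# '{"string_property": "test string", "boolean_property": true, "number_property": 42}'
```

When calling the utilities through subprocess with an argument list rather than a shell string, skip shlex.quote and pass json.dumps(search) directly.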
Scripts
You can automatically add scripts to a project using the add_scripts function.
python -m thot.utilities add_scripts --scripts <scripts_object>
Here, <scripts_object> mimics the format of the _scripts.json file. For convenience, if only one script is being added, it does not need to be enclosed in an array.
Scripts can also be automatically removed with the remove_scripts function.
python -m thot.utilities remove_scripts --scripts [script_1, script_2, ...]
For convenience, if only a single script is being removed, it does not need to be in an array. If a script does not exist on a selected Container, that Container is not modified. Scripts are matched based on the "script" field.
Finally, you can set the scripts automatically using the set_scripts function.
python -m thot.utilities set_scripts --scripts <scripts_object>
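A sketch of building these commands from Python as an argument list, which sidesteps shell quoting. The helper `scripts_command` is hypothetical; the module path, function names, and --scripts flag are the ones documented above, and JSON-encoding the value for remove_scripts is an assumption. The command is only built here, not executed.

```python
import json
import sys
from typing import Any, List

SCRIPT_FUNCTIONS = {"add_scripts", "remove_scripts", "set_scripts"}


def scripts_command(function: str, scripts: Any) -> List[str]:
    """Build the argument list for one of the scripts utilities.

    `scripts` is a single script association, a list of associations,
    or (for remove_scripts) script paths; it is JSON-encoded as-is.
    """
    if function not in SCRIPT_FUNCTIONS:
        raise ValueError(f"unknown utilities function: {function}")
    return [sys.executable, "-m", "thot.utilities", function, "--scripts", json.dumps(scripts)]


association = {"script": "analysis/stats.py", "priority": 0, "autorun": True}
argv = scripts_command("add_scripts", association)
print(argv[3:])
# ['add_scripts', '--scripts', '{"script": "analysis/stats.py", "priority": 0, "autorun": true}']
# To execute it: subprocess.run(argv, check=True)
```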
Hosted Projects
To create a Hosted project go to thot-data.com and create an account or log in.
Hosted projects have additional features such as user-friendly interfaces for project creation, sharing projects and scripts, and more.
Thot Scripts [Analysis]
Thot is founded on the idea that the same analysis needs to be run on different data sets. Often this is done manually, taking additional time and effort, and is prone to mistakes. By separating the analysis process from the data, Thot allows your data to be automatically analyzed.
Thot Projects
Because Thot separates the analysis from the data, you need a way to pull your data into the script in a Container-relative manner. This is done using a Thot Project.
Because analysis is bottom-up, a script only has access to Containers and Assets below it.
Thot Interface
Each Thot Project implements a standard interface. This makes converting between Local and Hosted projects easy. A Thot Interface consists of the following structure.
Properties
- root: Current Container being analyzed.
Methods
- find_container( search = {} ): Returns a Container matching the search criteria.
- find_containers( search = {} ): Returns a list of Containers matching the search criteria.
- find_asset( search = {} ): Returns an Asset matching the search criteria.
- find_assets( search = {} ): Returns a list of Assets matching the search criteria.
- add_asset( asset [, id = None, overwrite = True] ): Creates a new Asset in the currently active Container. Returns the id of the new Asset. For a Local project the id is the absolute path to the Asset.
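To make the search semantics concrete, here is a self-contained sketch of how find_containers, find_assets, and find_asset could work over an in-memory tree. The Node class is a hypothetical stand-in for Containers and Assets, and the matching rule (every search key must equal the object's property) is an assumption; Thot's own implementation may support richer queries and may treat multiple matches differently. The walk only visits objects below the root, mirroring bottom-up analysis.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Container (is_container=True) or an Asset."""
    props: Dict[str, Any]
    is_container: bool = True
    children: List["Node"] = field(default_factory=list)


def _below(root: Node) -> Iterator[Node]:
    # Depth-first walk that never yields `root` itself: a script only
    # sees objects below the Container it is analyzing.
    for child in root.children:
        yield child
        yield from _below(child)


def _matches(node: Node, search: Dict[str, Any]) -> bool:
    # Assumed rule: every search key must equal the node's property.
    return all(node.props.get(key) == value for key, value in search.items())


def find_containers(root: Node, search: Optional[Dict[str, Any]] = None) -> List[Node]:
    return [n for n in _below(root) if n.is_container and _matches(n, search or {})]


def find_assets(root: Node, search: Optional[Dict[str, Any]] = None) -> List[Node]:
    return [n for n in _below(root) if not n.is_container and _matches(n, search or {})]


def find_asset(root: Node, search: Optional[Dict[str, Any]] = None) -> Optional[Node]:
    # This sketch returns the first match; Thot's own method may differ.
    matches = find_assets(root, search)
    return matches[0] if matches else None


times = Node({"type": "times", "name": "run 1"}, is_container=False)
sample = Node({"type": "sample", "name": "S1"}, children=[times])
project = Node({"type": "project"}, children=[sample])
print([a.props["name"] for a in find_assets(project, {"type": "times"})])  # ['run 1']
```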
Local Project
A Local Project is a Thot Interface that uses your local file system as its database. During the analysis everything is performed relative to the active Container.
A simple Python script for a Local Project may look something like this:
import pandas as pd
from thot.thot import LocalProject
thot = LocalProject() # set up local project
# retrieve data
sample = thot.find_container( { 'type': 'sample' } )
data = thot.find_asset( { 'type': 'times' } )
# analyze data
df = pd.read_csv( data.file )
stats = df.mean()
# produce new Asset for future consumption
stats_props = {
'file': 'stats.csv',
'type': 'stats',
'name': '{} Stats'.format( sample.name )
}
asset_path = thot.add_asset( stats_props, 'stats', overwrite = True )
stats.to_csv( asset_path )
Testing Scripts
You can test your scripts using the LocalProject.dev_mode() function along with passing in a test Container to act as the temporary root of your project.
root_path = (
'relative/path/to/test/container'
if LocalProject.dev_mode() else
None
)
thot = LocalProject( root_path )
This allows you to run your scripts within the console or a Jupyter Notebook without analyzing the entire project tree. The LocalProject.dev_mode() method returns True if the script is being run manually, i.e. from the console or within a Jupyter Notebook, and False if it is being run by the Runner.
Runner
Once your project is set up, use the Runner to evaluate it.
python -m thot.runner [--root <path/to/tree>] [--scripts [ <script_1>, <script_2>, ... ] ]
--root: Specifies the root container whose tree should be run. This doesn't need to be the root of the project. If not included, the current directory is used as the root.
--scripts: A JSON array specifying which scripts to run. If not included, all scripts are run.
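A sketch of invoking the Runner from another Python program. The hypothetical helper `runner_command` builds the argument list from the two documented options and JSON-encodes the script list, since --scripts takes a JSON array; the paths in the example are made up. The command is only built here, not executed.

```python
import json
import sys
from typing import List, Optional


def runner_command(root: Optional[str] = None, scripts: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for python -m thot.runner.

    Omitted options fall back to the Runner's defaults: the current
    directory as root and all scripts.
    """
    argv = [sys.executable, "-m", "thot.runner"]
    if root is not None:
        argv += ["--root", root]
    if scripts is not None:
        argv += ["--scripts", json.dumps(scripts)]
    return argv


argv = runner_command(root="project/sample_1", scripts=["analysis/stats.py"])
print(argv[1:])
# ['-m', 'thot.runner', '--root', 'project/sample_1', '--scripts', '["analysis/stats.py"]']
# To execute it: subprocess.run(argv, check=True)
```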
Hosted Project
A Hosted Project uses the Thot servers as its database. Any time a change is made to a project, the relevant analyses are automatically run, unless the scripts are set to run manually.
There are only two changes you need to make to convert a local analysis script into a hosted analysis script:
from thot.thot import LocalProject → from thot.thot import ThotProject
thot = LocalProject() → thot = ThotProject()
JSON
JSON is a file format that allows data to be stored in a human-readable form. You can find a nice introduction at W3Schools, and full documentation at json.org.