An AI-driven framework for synthesizing adaptive taxonomies, enabling automated data categorization and classification within dynamic hierarchical structures.

taxonomy-synthesis

TL;DR: Copy this README into ChatGPT and it will figure things out for you. (A dedicated "GPT" is coming soon.)

Join our Discord Community for questions, discussions, and collaboration!

Check out our YouTube demo video to see Taxonomy Synthesis in action!

Explain Like I'm 5 🤔

Imagine you have a big box of different animals, but you're not sure how to group them. You know there are "Mammals" and "Reptiles," but you don't know the smaller groups they belong to, like which mammals are more similar or which reptiles go together. This tool uses smart AI helpers to figure out those smaller groups for you, like finding out there are "Rodents" and "Primates" among the mammals, and "Lizards" and "Snakes" among the reptiles. It then helps you sort all the animals into the right groups automatically, keeping everything neatly organized!

Features 🛠️

  • Manual and Automatic Taxonomy Generation: Flexibly create taxonomy trees manually or automatically from arbitrary items.
  • Recursive Tree Primitives: Utilize a tree structure that supports recursive operations, making it easy to manage hierarchical data.
  • AI-Generated Subcategories: Automatically generate subcategories using AI models based on the context and data provided.
  • AI Classification: Automatically classify items into appropriate categories using advanced AI models.
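To make the "Recursive Tree Primitives" idea concrete, here is a minimal sketch of a tree whose operations recurse into their children. `SketchNode`, `all_items`, and `render` are hypothetical names chosen for illustration; this is not the library's `TreeNode` implementation, only an example of the pattern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchNode:
    """Hypothetical stand-in for a taxonomy node (not the library's TreeNode)."""
    name: str
    items: List[str] = field(default_factory=list)
    children: List["SketchNode"] = field(default_factory=list)

    def add_child(self, child: "SketchNode") -> None:
        self.children.append(child)

    def all_items(self) -> List[str]:
        """Collect items from this node and, recursively, every descendant."""
        collected = list(self.items)
        for child in self.children:
            collected.extend(child.all_items())
        return collected

    def render(self, depth: int = 0) -> str:
        """Return an indented view of the subtree, one node per line."""
        lines = [f"{'  ' * depth}{self.name}: [{', '.join(self.items)}]"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


root = SketchNode("Animals")
root.add_child(SketchNode("Mammals", items=["Dog", "Cow"]))
root.add_child(SketchNode("Reptiles", items=["Snake"]))

print(root.render())
print(root.all_items())
```

Because each method calls itself on the children, the same code works for a tree of any depth.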

Quickstart Guide (colab) 🚀

In this quickstart, we'll walk you through the process of using taxonomy-synthesis to create a simplified phylogenetic tree for a list of animals. We'll demonstrate how to initialize the package, set up an OpenAI client, manually create a taxonomy tree, generate subcategories automatically, and classify items using AI.

1. Download and Install the Package

First, ensure you have the package installed. You can install taxonomy-synthesis directly using pip:

pip install taxonomy-synthesis

2. Set Up OpenAI Client

Before proceeding, make sure you have an OpenAI API key.

# Set up the OpenAI client
from openai import OpenAI

client = OpenAI(api_key='sk-...')
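If you would rather not paste the key into source code, one option is to read it from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`; `read_api_key` is a hypothetical helper, not part of taxonomy-synthesis or the OpenAI client.

```python
import os


def read_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the quickstart."
        )
    return key


# Usage (only works once the variable is set in your shell):
# client = OpenAI(api_key=read_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first API call.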

3. Prepare Your Data

We'll start with a list of 10 animal species, each represented with an arbitrary schema containing fields like name, fun fact, lifespan, and emoji. The only required field is id, which should be unique for each item.

# Prepare a list of items (animals) with various attributes
items = [
  {"id": "🦘", "name": "Kangaroo", "fun_fact": "Can hop at high speeds", "lifespan_years": 23, "emoji": "🦘"},
  {"id": "🐨", "name": "Koala", "fun_fact": "Sleeps up to 22 hours a day", "lifespan_years": 18, "emoji": "🐨"},
  {"id": "🐘", "name": "Elephant", "fun_fact": "Largest land animal", "lifespan_years": 60, "emoji": "🐘"},
  {"id": "🐕", "name": "Dog", "fun_fact": "Best friend of humans", "lifespan_years": 15, "emoji": "🐕"},
  {"id": "🐄", "name": "Cow", "fun_fact": "Gives milk", "lifespan_years": 20, "emoji": "🐄"},
  {"id": "🐁", "name": "Mouse", "fun_fact": "Can squeeze through tiny gaps", "lifespan_years": 2, "emoji": "🐁"},
  {"id": "🐊", "name": "Crocodile", "fun_fact": "Lives in water and land", "lifespan_years": 70, "emoji": "🐊"},
  {"id": "🐍", "name": "Snake", "fun_fact": "No legs", "lifespan_years": 9, "emoji": "🐍"},
  {"id": "🐢", "name": "Turtle", "fun_fact": "Can live over 100 years", "lifespan_years": 100, "emoji": "🐢"},
  {"id": "🦎", "name": "Gecko", "fun_fact": "Can climb walls", "lifespan_years": 5, "emoji": "🦎"}
]
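Since `id` is the only required field and must be unique, it can help to check a dataset before handing it to the classifier. The following is a small sketch in plain Python; `find_id_problems` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def find_id_problems(records: List[Dict[str, Any]]) -> List[str]:
    """Report records that lack an id and ids that occur more than once."""
    problems = []
    for index, record in enumerate(records):
        if not record.get("id"):
            problems.append(f"record {index} has no id")
    counts = Counter(record["id"] for record in records if record.get("id"))
    for item_id, count in counts.items():
        if count > 1:
            problems.append(f"id {item_id!r} appears {count} times")
    return problems


sample = [
    {"id": "kangaroo", "name": "Kangaroo"},
    {"id": "koala", "name": "Koala"},
    {"id": "koala", "name": "Koala (duplicate)"},
    {"name": "No id"},
]
print(find_id_problems(sample))
```

An empty list means every record has an id and no id is repeated.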

4. Initialize the Tree Structure

Create the root node for our taxonomy tree and initialize two subclasses: Mammals and Reptiles.

from taxonomy_synthesis.models import Category, Item
from taxonomy_synthesis.tree.tree_node import TreeNode

# Create root node and two primary subclasses
root_category = Category(name="Animals", description="All animals")
mammal_category = Category(name="Mammals", description="Mammal species")
reptile_category = Category(name="Reptiles", description="Reptile species")

root_node = TreeNode(value=root_category)
mammal_node = TreeNode(value=mammal_category)
reptile_node = TreeNode(value=reptile_category)

# Add subclasses to the root node
root_node.add_child(mammal_node)
root_node.add_child(reptile_node)

5. Classify Items in the Root Node

Classify all items under the root node into Mammals or Reptiles using the AI classifier.

from taxonomy_synthesis.tree.node_operator import NodeOperator
from taxonomy_synthesis.classifiers.gpt_classifier import GPTClassifier

# Initialize the GPT classifier and node operator
classifier = GPTClassifier(client=client)
generator = None  # We'll use manual generation for this part
operator = NodeOperator(classifier=classifier, generator=generator)

# Convert dictionary items to Item objects and classify
item_objects = [Item(**item) for item in items]
classified_items = operator.classify_items(root_node, item_objects)

print("After initial classification:")
print(root_node.print_tree())

Output:

After initial classification:
Animals: []
  Mammals: [🦘, 🐨, 🐘, 🐕, 🐄, 🐁]
  Reptiles: [🐊, 🐍, 🐢, 🦎]
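Conceptually, this step hands each item to a decision function and files the item's id under the child category that function picks. The sketch below illustrates that flow with a hard-coded lookup in place of the AI model; `distribute` and `rule_based_choice` are hypothetical names, and the real `GPTClassifier` may work differently internally.

```python
from typing import Callable, Dict, List


def distribute(
    items: List[dict],
    child_names: List[str],
    choose: Callable[[dict, List[str]], str],
) -> Dict[str, List[str]]:
    """Assign each item's id to exactly one child category picked by `choose`."""
    buckets: Dict[str, List[str]] = {name: [] for name in child_names}
    for item in items:
        label = choose(item, child_names)
        if label not in buckets:
            raise ValueError(f"classifier returned unknown category {label!r}")
        buckets[label].append(item["id"])
    return buckets


def rule_based_choice(item: dict, child_names: List[str]) -> str:
    """Hypothetical stand-in for a model call: a hard-coded lookup by name."""
    reptiles = {"Crocodile", "Snake", "Turtle", "Gecko"}
    return "Reptiles" if item["name"] in reptiles else "Mammals"


animals = [
    {"id": "dog", "name": "Dog"},
    {"id": "snake", "name": "Snake"},
    {"id": "cow", "name": "Cow"},
]
print(distribute(animals, ["Mammals", "Reptiles"], rule_based_choice))
```

Rejecting labels that are not among the child categories guards against a model answering with a category that does not exist in the tree.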

6. Generate Subcategories for Mammals

Use AI to automatically generate subcategories under Mammals based on the provided data.

from taxonomy_synthesis.generator.taxonomy_generator import TaxonomyGenerator

# Initialize the Taxonomy Generator
generator = TaxonomyGenerator(
    client=client,
    max_categories=2,
    generation_method="Create categories in accordance with the phylogenetic tree."
)
operator.generator = generator

# Generate subcategories under Mammals
new_categories = operator.generate_subcategories(mammal_node)

print("Generated subcategories under 'Mammals':")
print(mammal_node.print_tree())

Output:

Generated subcategories under 'Mammals':
Mammals: [🦘, 🐨, 🐘, 🐕, 🐄, 🐁]
  marsupials: []
  placentals: []

7. Reclassify Items under Mammals

Now classify the items specifically under the Mammals node into their newly generated subcategories.

# Reclassify items under Mammals based on the new subcategories
classified_items = operator.classify_items(mammal_node, mammal_node.get_all_items())

print("After reclassification under 'Mammals':")
print(mammal_node.print_tree())

Output:

After reclassification under 'Mammals':
Mammals: []
  marsupials: [🦘, 🐨]
  placentals: [🐘, 🐕, 🐄, 🐁]
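After a reclassification like this one, a quick sanity check is to confirm that every item that was in the parent ended up in exactly one subcategory. The sketch below works on plain lists of ids; `partition_problems` is a hypothetical helper and the sample data is invented for the example.

```python
from collections import Counter
from typing import Dict, List


def partition_problems(before: List[str], buckets: Dict[str, List[str]]) -> List[str]:
    """Describe ids that were dropped, duplicated, or not in the parent."""
    after = Counter(item_id for ids in buckets.values() for item_id in ids)
    problems = []
    for item_id in before:
        if after[item_id] == 0:
            problems.append(f"{item_id} was dropped")
        elif after[item_id] > 1:
            problems.append(f"{item_id} was placed {after[item_id]} times")
    for item_id in after:
        if item_id not in before:
            problems.append(f"{item_id} was not in the parent")
    return problems


parent_ids = ["kangaroo", "koala", "elephant", "dog"]
good = {"marsupials": ["kangaroo", "koala"], "placentals": ["elephant", "dog"]}
bad = {"marsupials": ["kangaroo"], "placentals": ["elephant", "dog", "dog", "cat"]}

print(partition_problems(parent_ids, good))
print(partition_problems(parent_ids, bad))
```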

8. Print the Final Tree Structure

Finally, print the entire tree to see the categorized structure.

# Print the final tree structure
print("Final taxonomy tree structure:")
print(root_node.print_tree())

Output:

Final taxonomy tree structure:
Animals: []
  Mammals: []
    marsupials: [🦘, 🐨]
    placentals: [🐘, 🐕, 🐄, 🐁]
  Reptiles: [🐊, 🐍, 🐢, 🦎]
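Once the tree is built, a common follow-up is to look up the full category path of each item. The sketch below mirrors the final structure above as nested dictionaries (with names in place of emoji ids) and flattens it recursively; the dictionary layout and `item_paths` are assumptions made for illustration, not a format the library exports.

```python
from typing import Dict

tree = {
    "name": "Animals", "items": [],
    "children": [
        {"name": "Mammals", "items": [], "children": [
            {"name": "marsupials", "items": ["Kangaroo", "Koala"], "children": []},
            {"name": "placentals", "items": ["Elephant", "Dog", "Cow", "Mouse"], "children": []},
        ]},
        {"name": "Reptiles", "items": ["Crocodile", "Snake", "Turtle", "Gecko"], "children": []},
    ],
}


def item_paths(node: dict, prefix: str = "") -> Dict[str, str]:
    """Map every item to the slash-separated path of the node that holds it."""
    path = f"{prefix}/{node['name']}" if prefix else node["name"]
    paths = {item: path for item in node["items"]}
    for child in node["children"]:
        paths.update(item_paths(child, path))
    return paths


paths = item_paths(tree)
print(paths["Koala"])
print(paths["Gecko"])
```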

System Diagram 🎨

For a visual representation of the system architecture and its components, refer to the following diagram:

[Image: v1 Class Diagram]

Contributing 🤗

Contributions are welcome! To get started, follow these steps to set up your development environment:

  1. Clone the Repository:

    git clone https://github.com/CakeCrusher/TaxonomySynthesis.git
    cd TaxonomySynthesis
    
  2. Install Poetry (if not already installed):

    curl -sSL https://install.python-poetry.org | python3 -
    
  3. Install Dependencies:

    Use Poetry to install all the dependencies in a virtual environment:

    poetry install
    
  4. Activate the Virtual Environment:

    To activate the virtual environment created by Poetry:

    poetry shell
    
  5. Run Pre-Commit Hooks:

    To maintain code quality, please run pre-commit hooks before submitting any pull requests:

    poetry run pre-commit install
    poetry run pre-commit run --all-files
    

We encourage you to open issues for any bugs you encounter or features you'd like to see added. Pull requests are also highly appreciated! Let's work together to improve and expand this project.
