
Create your private AI model with no training data or GPUs 🤖🚀.

Project description

Artifex

Artifex – Train task-specific LLMs without training data, for offline NLP and Text Classification

Documentation | Tutorial


🎯 Create Task-Specific LLMs • 📊 No training data needed • 🌱 No GPU needed • 🖥️ CPU Inference & Fine-Tuning


Artifex is a Python library for:

  1. Using small, pre-trained task-specific LLMs locally on CPU
  2. Fine-tuning them on CPU without any training data, using only your instructions for the task at hand.

At this time, we support 7 main tasks:

  • 🛡️ Guardrail: Flags unsafe, harmful, or off-topic messages.
  • 🗣️ Intent Classification: Classifies user messages into predefined intent categories.
  • 🔀 Reranker: Ranks a list of items or search results based on relevance to a query.
  • 🙂 Sentiment Analysis: Determines the sentiment (positive, negative, neutral) of a given text.
  • 😡 Emotion Detection: Identifies the emotion expressed in a given text.
  • 🏷️ Named Entity Recognition (NER): Detects and classifies named entities in text (e.g., persons, organizations, locations).
  • 🥸 Text Anonymization: Removes personally identifiable information (PII) from text.

For each task, Artifex provides three easy-to-use APIs:

  1. Inference API to use a default, pre-trained small LLM to perform that task out-of-the-box locally on CPU.
  2. Fine-tune API to fine-tune the default model based on your requirements, without any training data and on CPU. The fine-tuned model is generated on your machine and is yours to keep.
  3. Load API to load your fine-tuned model locally on CPU, and use it for inference or further fine-tuning.

We will be adding more tasks soon, based on user feedback. Want Artifex to perform a specific task? Suggest one or vote one up.

🔥 How does it work?

Problem

LLMs available on the market can be broadly classified into two categories:

  • General-purpose LLMs (GPT, Claude, Llama, etc.) have two main limitations:

    1. They are designed for open-ended tasks, which makes them overkill and often suboptimal for simpler, specific use cases.
    2. If open-source, they require expensive GPUs for training and inference; if closed-source, they incur high API usage costs and raise data-privacy concerns, since your data is sent to third-party servers.
  • Smaller LLMs (DistilBERT, TinyBERT, etc.) can sometimes be trained and run locally on CPU, but they need large amounts of labeled training data to perform well on specific tasks, and such data is often unavailable.

Solution

Artifex overcomes these limitations by enabling you to:

  • Use small (at most 500 MB), pre-trained task-specific LLMs locally on CPU, eliminating API costs and data-privacy concerns.

  • Fine-tune these models to your requirements without any training data, using only your instructions for the task at hand, to obtain higher accuracy on your specific use case.

    How is it possible? Artifex generates synthetic training data on-the-fly based on your instructions, and uses this data to fine-tune small LLMs for your specific task. This approach allows you to create effective models without the need for large labeled datasets.
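To make the idea concrete, here is a deliberately tiny toy sketch in plain Python; it is not Artifex's actual pipeline. It assumes a hypothetical `fake_generator` standing in for the LLM that would write synthetic examples from each class description, and it swaps fine-tuning a transformer for a from-scratch multinomial Naive Bayes classifier so the example runs anywhere.

```python
import math
from collections import Counter, defaultdict

# Task definition: natural-language class descriptions only, no labeled data.
CLASSES = {
    "order_status": "Inquiries about the status of an order.",
    "return_item": "Requests to return a purchased item.",
}


def fake_generator(label, description, n):
    """Hypothetical stand-in for an LLM prompted with `description`; returns canned text."""
    canned = {
        "order_status": ["where is my order", "has my order shipped yet",
                         "track my order status"],
        "return_item": ["i want to return this item", "how do i send back a product",
                        "return my purchase please"],
    }
    return canned[label][:n]


def generate_synthetic_data(classes, generator, n_per_class=3):
    """Build a labeled dataset purely from the class descriptions."""
    return [
        (text, label)
        for label, description in classes.items()
        for text in generator(label, description, n_per_class)
    ]


def train_naive_bayes(data):
    """Count words per label (multinomial Naive Bayes, standing in for fine-tuning)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in data:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)
        for token in text.lower().split():
            score += math.log((word_counts[label][token] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(generate_synthetic_data(CLASSES, fake_generator))
print(predict(model, "where is my order now"))     # order_status
print(predict(model, "i want to return my item"))  # return_item
```

The only task-specific input here is `CLASSES`, a set of plain-language descriptions; the labeled training set is derived from it rather than collected by hand.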

🚀 Quick Start

Install Artifex with:

pip install artifex

🛡️ Guardrail Model

Use the default Guardrail model (inference API)

Need a general-purpose guardrail model? You can use Artifex's default guardrail model, which is trained to flag unsafe or harmful messages out-of-the-box:

from artifex import Artifex

guardrail = Artifex().guardrail
print(guardrail("How do I make a bomb?"))

# >>> [{'label': 'unsafe', 'score': 0.9976}]

Learn more about the default guardrail model and what it considers safe vs unsafe on our Guardrail HF model page.
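The default guardrail returns a list of `{'label', 'score'}` dicts, as in the example above. A minimal sketch of turning that into an allow/block decision, assuming the unsafe label is the string `'unsafe'` (as shown) and that other predictions carry a different label such as `'safe'` (an assumption; the README does not show one). The helper `is_blocked` and its 0.9 default threshold are hypothetical choices, not part of Artifex.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def is_blocked(result: List[Prediction], threshold: float = 0.9) -> bool:
    """Block if any prediction is 'unsafe' with a score at or above `threshold`.

    Assumes the guardrail output format shown in the README:
    [{'label': 'unsafe', 'score': 0.9976}].
    """
    return any(
        pred["label"] == "unsafe" and pred["score"] >= threshold
        for pred in result
    )


print(is_blocked([{"label": "unsafe", "score": 0.9976}]))  # True
print(is_blocked([{"label": "unsafe", "score": 0.62}]))    # False (below threshold)
print(is_blocked([{"label": "safe", "score": 0.99}]))      # False ('safe' label is assumed)
```

A higher threshold lets more borderline messages through; a lower one blocks more aggressively, so the value is worth tuning on your own traffic.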

Create & use a custom Guardrail model (fine-tune & load APIs)

Need more control over what is considered safe vs unsafe? Fine-tune your own guardrail model, use it locally on CPU and keep it forever:

from artifex import Artifex

guardrail = Artifex().guardrail

model_output_path = "./output_model/"

guardrail.train(
    instructions=[
        "Discussing a competitor's products or services is not allowed.",
        "Sharing our employees' personal information is prohibited.",
        "Providing instructions for illegal activities is forbidden.",
        "Everything else is allowed.",
    ],
    output_path=model_output_path
)

guardrail.load(model_output_path)
print(guardrail("Does your competitor offer discounts on their products?"))

# >>> [{'label': 'unsafe', 'score': 0.9970}]
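Because the fine-tuned model is saved to `output_path`, later runs can skip retraining and call `load()` directly. A minimal sketch, assuming `train()` accepts `output_path` as a keyword and `load()` takes the same path (both as in the example above), and treating a non-empty output directory as "already trained" (a hypothetical heuristic; `load_or_train` is not an Artifex function):

```python
import os
from typing import Any, Dict


def load_or_train(model: Any, output_path: str, train_kwargs: Dict[str, Any]) -> Any:
    """Fine-tune once, then reuse the saved model on later runs.

    Assumes `model` exposes `train(..., output_path=...)` and `load(path)`,
    as in the README examples.
    """
    # Hypothetical heuristic: a non-empty output directory means a model was saved there.
    already_trained = os.path.isdir(output_path) and bool(os.listdir(output_path))
    if not already_trained:
        model.train(output_path=output_path, **train_kwargs)
    model.load(output_path)
    return model
```

With Artifex this might look like `guardrail = load_or_train(Artifex().guardrail, "./output_model/", {"instructions": [...]})`, where `[...]` stands for your own instruction list.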

🗣️ Intent Classification model

Use the default Intent Classification model (inference API)

Need a general-purpose intent classification model? You can use Artifex's default intent classification model, which is trained to recognize common intents out-of-the-box:

from artifex import Artifex

intent_classifier = Artifex().intent_classifier

print(intent_classifier("Hey there, how are you doing?"))

# >>> [{'label': 'greeting', 'score': 0.9955}]

Learn more about the default intent classification model and what intents it is trained to recognize on our Intent Classification HF model page.
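Intent predictions are typically used to route a message to the right handler. A minimal sketch, assuming the same `[{'label': ..., 'score': ...}]` output format shown above; `route`, the handler table, and the 0.5 cut-off are hypothetical and would need tuning for your traffic.

```python
from typing import Callable, Dict, List


def route(result: List[dict], handlers: Dict[str, Callable[[], str]],
          fallback: Callable[[], str], min_score: float = 0.5) -> str:
    """Dispatch on the top predicted intent; use `fallback` if it is unknown or uncertain."""
    if not result:
        return fallback()
    top = max(result, key=lambda pred: pred["score"])
    handler = handlers.get(top["label"])
    if handler is None or top["score"] < min_score:
        return fallback()
    return handler()


def greet() -> str:
    return "Hi! How can I help you today?"


def ask_to_rephrase() -> str:
    return "Sorry, could you rephrase that?"


print(route([{"label": "greeting", "score": 0.9955}], {"greeting": greet}, ask_to_rephrase))
# Hi! How can I help you today?
```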

Create & use a custom Intent Classification model (fine-tune & load APIs)

Need more control over the classes recognized, or do you want to tailor the model to your specific domain for better results? Fine-tune your own intent classification model, use it locally on CPU and keep it forever:

from artifex import Artifex

intent_classifier = Artifex().intent_classifier

model_output_path = "./output_model/"

intent_classifier.train(
    domain="e-commerce customer support",
    classes={
        "order_status": "Inquiries about the status of an order.",
        "return_item": "Requests to return a purchased item.",
        "product_info": "Questions about product details or specifications.",
        "greeting": "Friendly greetings or salutations.",
    },
    output_path=model_output_path
)

intent_classifier.load(model_output_path)
print(intent_classifier("I want to return an item I bought last week."))

# >>> [{'label': 'return_item', 'score': 0.9914}]
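Since a fine-tuned model is trained on synthetic data, it is worth checking it against a handful of real, hand-labeled messages from your domain before relying on it. A minimal sketch, assuming the classifier is a callable returning `[{'label': ..., 'score': ...}]` as above; `accuracy` and any example pairs are hypothetical:

```python
from typing import Callable, List, Tuple


def accuracy(classifier: Callable[[str], List[dict]],
             examples: List[Tuple[str, str]]) -> float:
    """Fraction of (text, expected_label) pairs whose top prediction matches."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = 0
    for text, expected in examples:
        predictions = classifier(text)
        top = max(predictions, key=lambda pred: pred["score"])
        correct += top["label"] == expected  # bool counts as 0 or 1
    return correct / len(examples)
```

For instance, `accuracy(intent_classifier, [("Where is my package?", "order_status"), ("Can I send these shoes back?", "return_item")])`; a low score may indicate that the class descriptions or `domain` need refining before retraining.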

🔀 Reranker model

Use the default Reranker model (inference API)

Need a general-purpose reranker model? You can use Artifex's default reranker model, which is trained to rank items based on relevance out-of-the-box:

from artifex import Artifex

reranker = Artifex().reranker

print(reranker(
    query="Best programming language for data science",
    documents=[
        "Java is a versatile language typically used for building large-scale applications.",
        "Python is widely used for data science due to its simplicity and extensive libraries.",
        "JavaScript is primarily used for web development.",
    ]
))

# >>> [('Python is widely used for data science due to its simplicity and extensive libraries.', 3.8346), ('Java is a versatile language typically used for building large-scale applications.', -0.8301), ('JavaScript is primarily used for web development.', -1.3784)]
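The reranker returns `(document, score)` pairs ordered from most to least relevant. The scores in the example are unbounded (two are negative), so they are probably best read as relative rankings rather than probabilities. A sketch of keeping the top results, where `top_k` and any `min_score` value are hypothetical and would need tuning on your own data:

```python
from typing import List, Optional, Tuple


def top_k(ranked: List[Tuple[str, float]], k: int,
          min_score: Optional[float] = None) -> List[str]:
    """Keep the k highest-scoring documents, optionally dropping those below `min_score`."""
    # Sort defensively in case the input is not already ordered.
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ordered = [pair for pair in ordered if pair[1] >= min_score]
    return [doc for doc, _ in ordered[:k]]
```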

Create & use a custom Reranker model (fine-tune & load APIs)

Want to fine-tune the Reranker model on a specific domain for better accuracy? Fine-tune your own reranker model, use it locally on CPU and keep it forever:

from artifex import Artifex

reranker = Artifex().reranker

model_output_path = "./output_model/"

reranker.train(
    domain="e-commerce product search",
    output_path=model_output_path
)

reranker.load(model_output_path)
print(reranker(
    query="Laptop with long battery life",
    documents=[
        "A powerful gaming laptop with high-end graphics and performance.",
        "An affordable laptop suitable for basic tasks and web browsing.",
        "This laptop features a battery life of up to 12 hours, perfect for all-day use.",
    ]
))

# >>> [('This laptop features a battery life of up to 12 hours, perfect for all-day use.', 4.7381), ('A powerful gaming laptop with high-end graphics and performance.', -1.8824), ('An affordable laptop suitable for basic tasks and web browsing.', -2.7585)]
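A cross-encoder reranker such as the default `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1` scores every (query, document) pair, so its cost grows with the number of documents; it is usually run on a short candidate list produced by a cheaper first stage. A minimal two-stage sketch, assuming a reranker callable with the `query=` / `documents=` keywords used above; the word-overlap `keyword_prefilter` is a hypothetical stand-in for a real retriever such as BM25 or vector search.

```python
from typing import Callable, List, Tuple

Reranker = Callable[..., List[Tuple[str, float]]]


def keyword_prefilter(query: str, corpus: List[str], limit: int) -> List[str]:
    """Cheap first stage: keep documents sharing at least one word with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]


def search(query: str, corpus: List[str], reranker: Reranker,
           limit: int = 50) -> List[Tuple[str, float]]:
    """Second stage: let the slower, more accurate reranker order the candidates."""
    candidates = keyword_prefilter(query, corpus, limit)
    if not candidates:
        return []
    return reranker(query=query, documents=candidates)
```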

🔣 Other Tasks

For more details and examples on how to use Artifex for the other available tasks, check out the Available Tasks section below and our Documentation.

🔧 Available Tasks & Examples

| Task | Default Model | Default & Fine-Tuned Model Size | CPU Inference | CPU Fine-Tuning | Code Examples |
|---|---|---|---|---|---|
| 🛡️ Guardrail | tanaos/tanaos-guardrail-v1 | 0.1B params, 500 MB | ✅ | ✅ | Examples |
| 🗣️ Intent Classification | tanaos/tanaos-intent-classifier-v1 | 0.1B params, 500 MB | ✅ | ✅ | Examples |
| 🔀 Reranker | cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | 0.1B params, 470 MB | ✅ | ✅ | Examples |
| 🙂 Sentiment Analysis | tanaos/tanaos-sentiment-analysis-v1 | 0.1B params, 470 MB | ✅ | ✅ | Examples |
| 😡 Emotion Detection | tanaos/tanaos-emotion-detection-v1 | 0.1B params, 470 MB | ✅ | ✅ | Examples |
| 🏷️ Named Entity Recognition | tanaos/tanaos-NER-v1 | 0.1B params, 500 MB | ✅ | ✅ | Examples |
| 🥸 Text Anonymization | tanaos/tanaos-text-anonymizer-v1 | 0.1B params, 500 MB | ✅ | ✅ | Examples |
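As a rough back-of-the-envelope check on the sizes in the table above: assuming the weights are stored as 32-bit floats (the README does not say which precision is used), 0.1B parameters alone take about 400 MB. Since "0.1B" is a rounded figure and saved models also include tokenizer and config files, the listed 470 to 500 MB is in the same ballpark rather than an exact match. The helper `weights_size_mb` is hypothetical.

```python
def weights_size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


print(weights_size_mb(100_000_000))     # 400.0 (32-bit floats, assumed)
print(weights_size_mb(100_000_000, 2))  # 200.0 (16-bit floats)
```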

🤝 Contributing

Contributions are welcome! Whether it's a new task module, an improvement, or a bug fix, we'd love your help. Not ready to contribute code? You can also help by suggesting a new task or upvoting an existing suggestion.

git clone https://github.com/tanaos/artifex.git
cd artifex
pip install -e .

Before making a contribution, please review the CONTRIBUTING.md and CLA.md, which include important guidelines for contributing to the project.

📚 Documentation & Support



Download files


Source Distribution

artifex-0.4.0.tar.gz (1.0 MB)

Uploaded Source

Built Distribution


artifex-0.4.0-py3-none-any.whl (858.1 kB)

Uploaded Python 3

File details

Details for the file artifex-0.4.0.tar.gz.

File metadata

  • Download URL: artifex-0.4.0.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for artifex-0.4.0.tar.gz
Algorithm Hash digest
SHA256 ed4bbfc5f7411efd8ccfc617585eb7873182d2dc60270028fb46d3e5bda6dd06
MD5 189ad825f0bffb1ca2cae40facd8426e
BLAKE2b-256 5315f3f0e4420da194ead98a764948953441d9f06eddb92d0637cbe4061f9746


File details

Details for the file artifex-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: artifex-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 858.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for artifex-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 22ecf540c098509715adc39125d018f9a56c4af3049026ede26c55e33c64055b
MD5 f6a64bd0f9fa4a4e5b6cafb113a8d2e7
BLAKE2b-256 c11eb93dc51d9ad009557321aaa2b9a428d1a2fd796698c748585f48f56651bb

