Skip to main content

Vietnamese NLP Toolkit

Project description




Open-source Agentic AI Toolkit

Underthesea is:

🌊 An Agentic AI Toolkit. Since v9.3.0, Underthesea is an open-source Agentic AI Toolkit with built-in Vietnamese NLP capabilities. It provides multi-provider AI Agent support and a suite of Python modules for Vietnamese Natural Language Processing.

🎁 Support Us! Every bit of support helps us achieve our goals. Thank you so much. 💝💝💝

Installation

$ pip install underthesea

Agent

Multi-provider AI Agent with zero external dependencies. Communicates with LLM APIs using only Python stdlib (urllib + json) — no openai, anthropic, or google-genai packages required.

Providers: OpenAI | Azure OpenAI | Anthropic Claude | Google Gemini

Quick Start

# Pick one provider:
$ export OPENAI_API_KEY=sk-...
# or Azure:
$ export AZURE_OPENAI_API_KEY=... && export AZURE_OPENAI_ENDPOINT=https://...
# or Anthropic:
$ export ANTHROPIC_API_KEY=sk-ant-...
# or Gemini:
$ export GOOGLE_API_KEY=...
from underthesea.agent import Agent, LLM

agent = Agent(name="assistant", provider=LLM())
agent("Hello!")

Providers

Each provider is its own class, following the Anthropic SDK pattern.

from underthesea.agent import Agent, OpenAI, AzureOpenAI, Anthropic, Gemini, LLM

# OpenAI
agent = Agent(name="bot", provider=OpenAI(api_key="sk-..."))

# Azure OpenAI
agent = Agent(name="bot", provider=AzureOpenAI(
    api_key="...",
    endpoint="https://my.openai.azure.com",
    deployment="gpt-4",
))

# Anthropic Claude
agent = Agent(name="bot", provider=Anthropic(api_key="sk-ant-..."))

# Google Gemini
agent = Agent(name="bot", provider=Gemini(api_key="..."))

# Auto-detect from environment variables
agent = Agent(name="bot", provider=LLM())

Streaming

for chunk in agent.stream("Explain what an AI agent is"):
    print(chunk, end="", flush=True)

Tool Calling

from underthesea.agent import Agent, Tool, OpenAI

def get_weather(location: str) -> dict:
    """Get current weather for a location."""
    return {"location": location, "temp": 25, "condition": "sunny"}

agent = Agent(
    name="assistant",
    provider=OpenAI(),
    tools=[Tool(get_weather)],
    instruction="You are a helpful assistant.",
)

agent("What's the weather in Hanoi?")
# 'The weather in Hanoi is 25°C and sunny.'

Default Tools

12 built-in tools: calculator, datetime, web search, wikipedia, file I/O, shell, python exec.

from underthesea.agent import Agent, default_tools, LLM

agent = Agent(name="assistant", provider=LLM(), tools=default_tools)
agent("Calculate sqrt(144) + 10")

Multi-Session

Long-running agents with context reset and structured handoff between sessions, following Anthropic harness patterns.

from underthesea.agent import Agent, Session, AzureOpenAI

agent = Agent(name="researcher", provider=AzureOpenAI(...))
session = Session(agent, progress_file="progress.json")
session.create_task("Analyze documents", [
    "Read and classify documents",
    "Summarize each group",
    "Write final report",
])
session.run_until_complete(max_sessions=5)

Tracing

Every agent call is automatically traced to ~/.underthesea/traces/. Disable with UNDERTHESEA_TRACE_DISABLED=1.

from underthesea.agent import Agent, LangfuseTracer, calculator_tool

# Auto local trace (default) — zero config
agent = Agent(name="bot", tools=[calculator_tool])
agent("What is 2+2?")
# >> Trace [a1b2c3] bot
#    |-- Generation: llm.chat #1 (gpt-4.1-mini) ... 1200ms | 100->18 tokens
#    |-- Tool: tool.calculator ... 0ms
#    |-- Generation: llm.chat #2 (gpt-4.1-mini) ... 800ms | 150->12 tokens
# << Trace [a1b2c3] [ok] 2000ms -> ~/.underthesea/traces/20260411_trace_a1b2c3.json

# Langfuse (pip install langfuse)
agent = Agent(name="bot", tools=[calculator_tool], tracer=LangfuseTracer())

# @trace decorator — nested functions become child spans
from underthesea.agent.trace import trace, LocalTracer

@trace(LocalTracer())
def pipeline(text):
    return Agent(name="bot")(text)  # auto-inherits trace context

Architecture

underthesea.agent
├── providers/
│   ├── OpenAI          # api.openai.com
│   ├── AzureOpenAI     # *.openai.azure.com
│   ├── Anthropic       # api.anthropic.com
│   └── Gemini          # generativelanguage.googleapis.com
├── trace/
│   ├── LocalTracer     # JSON files to ~/.underthesea/traces/
│   ├── LangfuseTracer  # Langfuse v4 observability
│   └── @trace          # Decorator with auto-nesting
├── Agent               # Tool calling loop + streaming
├── LLM                 # Auto-detect provider from env vars
├── Session             # Multi-session orchestration
├── Tool                # Function → tool wrapper
└── default_tools       # 12 built-in tools

Vietnamese NLP

See full documentation at NLP.md.

Pipeline Usage
Sentence Segmentation sent_tokenize(text)
Text Normalization text_normalize(text)
Word Segmentation word_tokenize(text)
POS Tagging pos_tag(text)
Chunking chunk(text)
Named Entity Recognition ner(text)
Text Classification classify(text)
Sentiment Analysis sentiment(text)
Language Detection lang_detect(text)
Dependency Parsing dependency_parse(text)
Translation translate(text)
Text-to-Speech tts(text)
from underthesea import word_tokenize, ner, sentiment

word_tokenize("Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò")
# ["Chàng trai", "9X", "Quảng Trị", "khởi nghiệp", "từ", "nấm", "sò"]

ner("Chưa tiết lộ lịch trình tới Việt Nam của Tổng thống Mỹ Donald Trump")
# [... ('Việt Nam', 'Np', 'B-NP', 'B-LOC'), ... ('Donald', 'Np', 'B-NP', 'B-PER'), ('Trump', 'Np', 'B-NP', 'I-PER')]

sentiment("Sản phẩm hơi nhỏ nhưng chất lượng tốt, đóng gói cẩn thận.")
# 'positive'

Contributing

Do you want to contribute with underthesea development? Great! Please read more details at Contributing Guide

💝 Support Us

If you found this project helpful and would like to support our work, you can just buy us a coffee ☕.

Your support is our biggest encouragement 🎁!

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

underthesea-9.4.0.tar.gz (7.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

underthesea-9.4.0-py3-none-any.whl (7.3 MB view details)

Uploaded Python 3

File details

Details for the file underthesea-9.4.0.tar.gz.

File metadata

  • Download URL: underthesea-9.4.0.tar.gz
  • Upload date:
  • Size: 7.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for underthesea-9.4.0.tar.gz
Algorithm Hash digest
SHA256 3cf76445fd038dff5a9b9cd9f89349b500aac84dcd63c5f0503f34e877f41f21
MD5 a9d9d5c9f0c2e2675cec6381bbb5d86b
BLAKE2b-256 f0526b2e31117a894dfb0bfd474785052e83cd18c50ec34f50be3647486d1500

See more details on using hashes here.

Provenance

The following attestation bundles were made for underthesea-9.4.0.tar.gz:

Publisher: release-pypi.yml on undertheseanlp/underthesea

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file underthesea-9.4.0-py3-none-any.whl.

File metadata

  • Download URL: underthesea-9.4.0-py3-none-any.whl
  • Upload date:
  • Size: 7.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for underthesea-9.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a91c33a48ec6309a5ef81ae032e9b6ea16e4d38015a66a36a789a45e6a44ab9c
MD5 7b5fdffc82a67a60a9c4574eed8c7ac8
BLAKE2b-256 e0b29411f21103138790417c2dc6336f91d429f601813ca66363d55e80bd348b

See more details on using hashes here.

Provenance

The following attestation bundles were made for underthesea-9.4.0-py3-none-any.whl:

Publisher: release-pypi.yml on undertheseanlp/underthesea

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page