Vietnamese NLP Toolkit
Project description
Open-source Agentic AI Toolkit
Underthesea is:
🌊 An Agentic AI Toolkit. Since v9.3.0, Underthesea is an open-source Agentic AI Toolkit with built-in Vietnamese NLP capabilities. It provides multi-provider AI Agent support and a suite of Python modules for Vietnamese Natural Language Processing.
🎁 Support Us! Every bit of support helps us achieve our goals. Thank you so much. 💝💝💝
Installation
$ pip install underthesea
Agent
Multi-provider AI Agent with zero external dependencies. Communicates with LLM APIs using only Python stdlib (urllib + json) — no openai, anthropic, or google-genai packages required.
Providers: OpenAI | Azure OpenAI | Anthropic Claude | Google Gemini
Quick Start
# Pick one provider:
$ export OPENAI_API_KEY=sk-...
# or Azure:
$ export AZURE_OPENAI_API_KEY=... && export AZURE_OPENAI_ENDPOINT=https://...
# or Anthropic:
$ export ANTHROPIC_API_KEY=sk-ant-...
# or Gemini:
$ export GOOGLE_API_KEY=...
from underthesea.agent import Agent, LLM
agent = Agent(name="assistant", provider=LLM())
agent("Hello!")
Providers
Each provider is its own class, following the Anthropic SDK pattern.
from underthesea.agent import Agent, OpenAI, AzureOpenAI, Anthropic, Gemini, LLM
# OpenAI
agent = Agent(name="bot", provider=OpenAI(api_key="sk-..."))
# Azure OpenAI
agent = Agent(name="bot", provider=AzureOpenAI(
api_key="...",
endpoint="https://my.openai.azure.com",
deployment="gpt-4",
))
# Anthropic Claude
agent = Agent(name="bot", provider=Anthropic(api_key="sk-ant-..."))
# Google Gemini
agent = Agent(name="bot", provider=Gemini(api_key="..."))
# Auto-detect from environment variables
agent = Agent(name="bot", provider=LLM())
Streaming
for chunk in agent.stream("Explain what an AI agent is"):
print(chunk, end="", flush=True)
Tool Calling
from underthesea.agent import Agent, Tool, OpenAI
def get_weather(location: str) -> dict:
"""Get current weather for a location."""
return {"location": location, "temp": 25, "condition": "sunny"}
agent = Agent(
name="assistant",
provider=OpenAI(),
tools=[Tool(get_weather)],
instruction="You are a helpful assistant.",
)
agent("What's the weather in Hanoi?")
# 'The weather in Hanoi is 25°C and sunny.'
Default Tools
12 built-in tools: calculator, datetime, web search, wikipedia, file I/O, shell, python exec.
from underthesea.agent import Agent, default_tools, LLM
agent = Agent(name="assistant", provider=LLM(), tools=default_tools)
agent("Calculate sqrt(144) + 10")
Multi-Session
Long-running agents with context reset and structured handoff between sessions, following Anthropic harness patterns.
from underthesea.agent import Agent, Session, AzureOpenAI
agent = Agent(name="researcher", provider=AzureOpenAI(...))
session = Session(agent, progress_file="progress.json")
session.create_task("Analyze documents", [
"Read and classify documents",
"Summarize each group",
"Write final report",
])
session.run_until_complete(max_sessions=5)
Tracing
Every agent call is automatically traced to ~/.underthesea/traces/. Disable with UNDERTHESEA_TRACE_DISABLED=1.
from underthesea.agent import Agent, LangfuseTracer, calculator_tool
# Auto local trace (default) — zero config
agent = Agent(name="bot", tools=[calculator_tool])
agent("What is 2+2?")
# >> Trace [a1b2c3] bot
# |-- Generation: llm.chat #1 (gpt-4.1-mini) ... 1200ms | 100->18 tokens
# |-- Tool: tool.calculator ... 0ms
# |-- Generation: llm.chat #2 (gpt-4.1-mini) ... 800ms | 150->12 tokens
# << Trace [a1b2c3] [ok] 2000ms -> ~/.underthesea/traces/20260411_trace_a1b2c3.json
# Langfuse (pip install langfuse)
agent = Agent(name="bot", tools=[calculator_tool], tracer=LangfuseTracer())
# @trace decorator — nested functions become child spans
from underthesea.agent.trace import trace, LocalTracer
@trace(LocalTracer())
def pipeline(text):
return Agent(name="bot")(text) # auto-inherits trace context
Architecture
underthesea.agent
├── providers/
│ ├── OpenAI # api.openai.com
│ ├── AzureOpenAI # *.openai.azure.com
│ ├── Anthropic # api.anthropic.com
│ └── Gemini # generativelanguage.googleapis.com
├── trace/
│ ├── LocalTracer # JSON files to ~/.underthesea/traces/
│ ├── LangfuseTracer # Langfuse v4 observability
│ └── @trace # Decorator with auto-nesting
├── Agent # Tool calling loop + streaming
├── LLM # Auto-detect provider from env vars
├── Session # Multi-session orchestration
├── Tool # Function → tool wrapper
└── default_tools # 12 built-in tools
Vietnamese NLP
See full documentation at NLP.md.
| Pipeline | Usage |
|---|---|
| Sentence Segmentation | sent_tokenize(text) |
| Text Normalization | text_normalize(text) |
| Word Segmentation | word_tokenize(text) |
| POS Tagging | pos_tag(text) |
| Chunking | chunk(text) |
| Named Entity Recognition | ner(text) |
| Text Classification | classify(text) |
| Sentiment Analysis | sentiment(text) |
| Language Detection | lang_detect(text) |
| Dependency Parsing | dependency_parse(text) |
| Translation | translate(text) |
| Text-to-Speech | tts(text) |
from underthesea import word_tokenize, ner, sentiment
word_tokenize("Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò")
# ["Chàng trai", "9X", "Quảng Trị", "khởi nghiệp", "từ", "nấm", "sò"]
ner("Chưa tiết lộ lịch trình tới Việt Nam của Tổng thống Mỹ Donald Trump")
# [... ('Việt Nam', 'Np', 'B-NP', 'B-LOC'), ... ('Donald', 'Np', 'B-NP', 'B-PER'), ('Trump', 'Np', 'B-NP', 'I-PER')]
sentiment("Sản phẩm hơi nhỏ nhưng chất lượng tốt, đóng gói cẩn thận.")
# 'positive'
Contributing
Do you want to contribute with underthesea development? Great! Please read more details at Contributing Guide
💝 Support Us
If you found this project helpful and would like to support our work, you can just buy us a coffee ☕.
Your support is our biggest encouragement 🎁!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file underthesea-9.4.0.tar.gz.
File metadata
- Download URL: underthesea-9.4.0.tar.gz
- Upload date:
- Size: 7.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3cf76445fd038dff5a9b9cd9f89349b500aac84dcd63c5f0503f34e877f41f21
|
|
| MD5 |
a9d9d5c9f0c2e2675cec6381bbb5d86b
|
|
| BLAKE2b-256 |
f0526b2e31117a894dfb0bfd474785052e83cd18c50ec34f50be3647486d1500
|
Provenance
The following attestation bundles were made for underthesea-9.4.0.tar.gz:
Publisher:
release-pypi.yml on undertheseanlp/underthesea
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
underthesea-9.4.0.tar.gz -
Subject digest:
3cf76445fd038dff5a9b9cd9f89349b500aac84dcd63c5f0503f34e877f41f21 - Sigstore transparency entry: 1277063631
- Sigstore integration time:
-
Permalink:
undertheseanlp/underthesea@25128e0f15bfe4a3bbc790da20f49aeb6dea2bdf -
Branch / Tag:
refs/tags/underthesea-v9.4.0 - Owner: https://github.com/undertheseanlp
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release-pypi.yml@25128e0f15bfe4a3bbc790da20f49aeb6dea2bdf -
Trigger Event:
push
-
Statement type:
File details
Details for the file underthesea-9.4.0-py3-none-any.whl.
File metadata
- Download URL: underthesea-9.4.0-py3-none-any.whl
- Upload date:
- Size: 7.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a91c33a48ec6309a5ef81ae032e9b6ea16e4d38015a66a36a789a45e6a44ab9c
|
|
| MD5 |
7b5fdffc82a67a60a9c4574eed8c7ac8
|
|
| BLAKE2b-256 |
e0b29411f21103138790417c2dc6336f91d429f601813ca66363d55e80bd348b
|
Provenance
The following attestation bundles were made for underthesea-9.4.0-py3-none-any.whl:
Publisher:
release-pypi.yml on undertheseanlp/underthesea
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
underthesea-9.4.0-py3-none-any.whl -
Subject digest:
a91c33a48ec6309a5ef81ae032e9b6ea16e4d38015a66a36a789a45e6a44ab9c - Sigstore transparency entry: 1277063669
- Sigstore integration time:
-
Permalink:
undertheseanlp/underthesea@25128e0f15bfe4a3bbc790da20f49aeb6dea2bdf -
Branch / Tag:
refs/tags/underthesea-v9.4.0 - Owner: https://github.com/undertheseanlp
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release-pypi.yml@25128e0f15bfe4a3bbc790da20f49aeb6dea2bdf -
Trigger Event:
push
-
Statement type: