Skip to main content

Vietnamese NLP Toolkit

Project description




Open-source Vietnamese Natural Language Process Toolkit

Underthesea is:

🌊 A Vietnamese NLP toolkit with AI Agent capabilities. Underthesea is a suite of open source Python modules supporting research and development in Vietnamese Natural Language Processing and Agentic AI.

🎁 Support Us! Every bit of support helps us achieve our goals. Thank you so much. 💝💝💝

Installation

$ pip install underthesea

Agent

Multi-provider AI Agent with zero external dependencies. Communicates with LLM APIs using only Python stdlib (urllib + json) — no openai, anthropic, or google-genai packages required.

Providers: OpenAI | Azure OpenAI | Anthropic Claude | Google Gemini

Quick Start

# Pick one provider:
$ export OPENAI_API_KEY=sk-...
# or Azure:
$ export AZURE_OPENAI_API_KEY=... && export AZURE_OPENAI_ENDPOINT=https://...
# or Anthropic:
$ export ANTHROPIC_API_KEY=sk-ant-...
# or Gemini:
$ export GOOGLE_API_KEY=...
from underthesea.agent import Agent, LLM

agent = Agent(name="assistant", provider=LLM())
agent("Hello!")

Providers

Each provider is its own class, following the Anthropic SDK pattern.

from underthesea.agent import Agent, OpenAI, AzureOpenAI, Anthropic, Gemini, LLM

# OpenAI
agent = Agent(name="bot", provider=OpenAI(api_key="sk-..."))

# Azure OpenAI
agent = Agent(name="bot", provider=AzureOpenAI(
    api_key="...",
    endpoint="https://my.openai.azure.com",
    deployment="gpt-4",
))

# Anthropic Claude
agent = Agent(name="bot", provider=Anthropic(api_key="sk-ant-..."))

# Google Gemini
agent = Agent(name="bot", provider=Gemini(api_key="..."))

# Auto-detect from environment variables
agent = Agent(name="bot", provider=LLM())

Streaming

for chunk in agent.stream("Explain what an AI agent is"):
    print(chunk, end="", flush=True)

Tool Calling

from underthesea.agent import Agent, Tool, OpenAI

def get_weather(location: str) -> dict:
    """Get current weather for a location."""
    return {"location": location, "temp": 25, "condition": "sunny"}

agent = Agent(
    name="assistant",
    provider=OpenAI(),
    tools=[Tool(get_weather)],
    instruction="You are a helpful assistant.",
)

agent("What's the weather in Hanoi?")
# 'The weather in Hanoi is 25°C and sunny.'

Default Tools

12 built-in tools: calculator, datetime, web search, wikipedia, file I/O, shell, python exec.

from underthesea.agent import Agent, default_tools, LLM

agent = Agent(name="assistant", provider=LLM(), tools=default_tools)
agent("Calculate sqrt(144) + 10")

Multi-Session

Long-running agents with context reset and structured handoff between sessions, following Anthropic harness patterns.

from underthesea.agent import Agent, Session, AzureOpenAI

agent = Agent(name="researcher", provider=AzureOpenAI(...))
session = Session(agent, progress_file="progress.json")
session.create_task("Analyze documents", [
    "Read and classify documents",
    "Summarize each group",
    "Write final report",
])
session.run_until_complete(max_sessions=5)

Architecture

underthesea.agent
├── providers/
│   ├── OpenAI          # api.openai.com
│   ├── AzureOpenAI     # *.openai.azure.com
│   ├── Anthropic       # api.anthropic.com
│   └── Gemini          # generativelanguage.googleapis.com
├── Agent               # Tool calling loop + streaming
├── LLM                 # Auto-detect provider from env vars
├── Session             # Multi-session orchestration
├── Tool                # Function → tool wrapper
└── default_tools       # 12 built-in tools

Vietnamese NLP

See full documentation at NLP.md.

Pipeline Usage
Sentence Segmentation sent_tokenize(text)
Text Normalization text_normalize(text)
Word Segmentation word_tokenize(text)
POS Tagging pos_tag(text)
Chunking chunk(text)
Named Entity Recognition ner(text)
Text Classification classify(text)
Sentiment Analysis sentiment(text)
Language Detection lang_detect(text)
Dependency Parsing dependency_parse(text)
Translation translate(text)
Text-to-Speech tts(text)
from underthesea import word_tokenize, ner, sentiment

word_tokenize("Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò")
# ["Chàng trai", "9X", "Quảng Trị", "khởi nghiệp", "từ", "nấm", "sò"]

ner("Chưa tiết lộ lịch trình tới Việt Nam của Tổng thống Mỹ Donald Trump")
# [... ('Việt Nam', 'Np', 'B-NP', 'B-LOC'), ... ('Donald', 'Np', 'B-NP', 'B-PER'), ('Trump', 'Np', 'B-NP', 'I-PER')]

sentiment("Sản phẩm hơi nhỏ nhưng chất lượng tốt, đóng gói cẩn thận.")
# 'positive'

Contributing

Do you want to contribute with underthesea development? Great! Please read more details at Contributing Guide

💝 Support Us

If you found this project helpful and would like to support our work, you can just buy us a coffee ☕.

Your support is our biggest encouragement 🎁!

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

underthesea-9.3.0.tar.gz (7.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

underthesea-9.3.0-py3-none-any.whl (7.2 MB view details)

Uploaded Python 3

File details

Details for the file underthesea-9.3.0.tar.gz.

File metadata

  • Download URL: underthesea-9.3.0.tar.gz
  • Upload date:
  • Size: 7.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for underthesea-9.3.0.tar.gz
Algorithm Hash digest
SHA256 202a29ae454fbf015db20ebd584877370ce8d1435737c37c5cc88ff3efda8b27
MD5 de5f8f7e3715e8afb15e28bc98bb6b94
BLAKE2b-256 40fef2e675fe64008613e7fd9de60ff0a0f0859b6f9fd4cb37b087a36fe66038

See more details on using hashes here.

Provenance

The following attestation bundles were made for underthesea-9.3.0.tar.gz:

Publisher: release-pypi.yml on undertheseanlp/underthesea

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file underthesea-9.3.0-py3-none-any.whl.

File metadata

  • Download URL: underthesea-9.3.0-py3-none-any.whl
  • Upload date:
  • Size: 7.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for underthesea-9.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6166d961cc8ba43d591f9316a7c35df59db794523575ef0577617ebdbb3e6d62
MD5 b550195d21d903b484ef7d0d94a4e84f
BLAKE2b-256 896efb9748d4391b8edbaea8fd4e9c0e300ffce1dc7d7197aed086396e131dce

See more details on using hashes here.

Provenance

The following attestation bundles were made for underthesea-9.3.0-py3-none-any.whl:

Publisher: release-pypi.yml on undertheseanlp/underthesea

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page