Baybin Sentinel: OpenSearch writer
Baybin Sentinel
baybin_sentinel is a Python utility package for the Baybin Sentiment Analysis System. It provides platform-specific writers that ingest crawled social media, news, and search-trend data into OpenSearch.
Currently supported platforms: Facebook, Threads, PTT, News (RSS / Scrapy), Google Trends.
Installation
(For Crawler Developers) Install Package
pip install -U baybin_sentinel
(For Package Developers) Create Virtual Environment
conda update -n base -c conda-forge conda
conda create -n sentinel python=3.13 pip -y
conda activate sentinel
cd baybin_sentinel
pip install -r requirements.txt
pip install -e .
Configuration
Each writer accepts credentials either as direct parameters or via a config file.
Option A — direct parameters:
writer = PttWriter(
    host="192.168.x.x",
    port=9200,
    user="your_username",
    password="your_password",
    verify_certs=False,
)
Option B — config file (recommended for development):
writer = PttWriter(config_path="/absolute/path/to/config.yaml")
# config.yaml
opensearch:
  host: "your_opensearch_ip"
  port: 9200
  user: "your_username"
  password: "your_password"
  verify_certs: false
Option C — environment variable (recommended for Celery workers / containers):
Set BAYBIN_SENTINEL_CONFIG to the absolute path of your config file. When set, it takes priority over the config_path argument.
export BAYBIN_SENTINEL_CONFIG=/absolute/path/to/config.yaml
Config resolution order: direct params → BAYBIN_SENTINEL_CONFIG env var → config_path argument → default "config.yaml" (relative to CWD).
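The resolution order above can be sketched as a small function. This is an illustration only, not the package's actual code: resolve_config_source, its return labels, and the rule that a non-None host counts as "direct parameters" are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "BAYBIN_SENTINEL_CONFIG"
DEFAULT_CONFIG = "config.yaml"


def resolve_config_source(
    host: Optional[str] = None,
    config_path: Optional[str] = None,
    environ: Optional[dict] = None,
) -> tuple[str, Optional[Path]]:
    """Return (source, path) following the documented resolution order.

    Hypothetical helper: the real writers may detect direct parameters
    differently; here a non-None host is treated as "direct params given".
    """
    env = os.environ if environ is None else environ
    # 1. Direct parameters win, so no config file is consulted.
    if host is not None:
        return "direct", None
    # 2. The environment variable overrides any config_path argument.
    env_path = env.get(ENV_VAR)
    if env_path:
        return "env", Path(env_path)
    # 3. Explicit config_path argument.
    if config_path:
        return "config_path", Path(config_path)
    # 4. Fall back to config.yaml relative to the current working directory.
    return "default", Path.cwd() / DEFAULT_CONFIG
```

Passing environ explicitly keeps the function easy to test without touching the real process environment.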
Index naming convention
Each writer targets a dedicated OpenSearch index following the pattern raw_{platform}_{content_type}s:
| Writer | Post index | Comment index |
|---|---|---|
| FacebookWriter | raw_facebook_posts | raw_facebook_comments |
| ThreadsWriter | raw_threads_posts | raw_threads_comments |
| PttWriter | raw_ptt_posts | (none) |
| NewsWriter | raw_news_posts | (none) |
| GoogleTrendsWriter | raw_google_trends_posts | (none) |
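As a rough illustration of the naming pattern, the helper below builds an index name from a platform and a content type. The function index_name and the COMMENT_PLATFORMS set are hypothetical names for this sketch; the set assumes that a blank comment-index cell in the table means the writer has no comment index.

```python
# Platforms that have a comment index, per the table above (assumption:
# a blank cell means comments are not supported for that platform).
COMMENT_PLATFORMS = frozenset({"facebook", "threads"})


def index_name(platform: str, content_type: str) -> str:
    """Build an index name following raw_{platform}_{content_type}s."""
    if content_type not in ("post", "comment"):
        raise ValueError(f"unknown content type: {content_type!r}")
    if content_type == "comment" and platform not in COMMENT_PLATFORMS:
        raise ValueError(f"{platform!r} has no comment index")
    return f"raw_{platform}_{content_type}s"
```

For example, index_name("google_trends", "post") yields raw_google_trends_posts, matching the table.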
Field normalization
Each writer accepts pre-normalized data and, before writing to OpenSearch, routes every field either to the document root or to a nested metadata object.
Canonical root-level fields (posts):
post_id, platform, client_id, source_name, url, content, author_name, language, timestamp, crawled_at, s3_path
Canonical root-level fields (comments):
comment_id, legacy_comment_id, post_id, post_url, platform, client_id, author_id, author_name, content, content_hash, timestamp, crawled_at, created_at, depth, s3_path
Any field not in the canonical set is automatically moved into a nested metadata object.
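A minimal sketch of this routing for posts, assuming a plain dict in and a plain dict out. The function name split_root_and_metadata is hypothetical, and the real writers may treat an existing metadata key differently; here it is handled like any other non-canonical field.

```python
from typing import Any, Mapping

# Canonical root-level post fields, copied from the list above.
POST_ROOT_FIELDS = frozenset({
    "post_id", "platform", "client_id", "source_name", "url", "content",
    "author_name", "language", "timestamp", "crawled_at", "s3_path",
})


def split_root_and_metadata(
    doc: Mapping[str, Any], root_fields: frozenset = POST_ROOT_FIELDS
) -> dict:
    """Keep canonical fields at the root; nest everything else under metadata."""
    routed = {k: v for k, v in doc.items() if k in root_fields}
    extras = {k: v for k, v in doc.items() if k not in root_fields}
    if extras:
        # Only add the nested object when there is something to put in it.
        routed["metadata"] = extras
    return routed
```

With this sketch, a PTT post carrying a non-canonical board field would end up with board under metadata while post_id and platform stay at the root.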
Validation
Every writer validates the document before writing to OpenSearch. A ValueError is raised immediately if any required field is missing or empty — no silent bad writes.
Required post fields: post_id, platform, client_id, timestamp, crawled_at
Required comment fields: comment_id (or legacy_comment_id), post_id, platform, client_id, content, timestamp, crawled_at
This means:
- You must call normalize_post() / normalize_comment() before passing data to the writer; passing a raw API response directly will raise a ValueError.
- client_id must always be present, which enforces multi-tenancy at the write layer.
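The required-field checks described above could look roughly like the following. This is a sketch under stated assumptions, not the package's implementation: validate_post, validate_comment, and the definition of "empty" (None or a blank string) are illustrative choices.

```python
from typing import Any, Mapping

REQUIRED_POST_FIELDS = ("post_id", "platform", "client_id", "timestamp", "crawled_at")
REQUIRED_COMMENT_FIELDS = (
    "post_id", "platform", "client_id", "content", "timestamp", "crawled_at",
)


def _is_empty(value: Any) -> bool:
    # Assumption: "empty" means None or a string with only whitespace.
    return value is None or (isinstance(value, str) and not value.strip())


def _missing(doc: Mapping[str, Any], required: tuple) -> list:
    return [name for name in required if _is_empty(doc.get(name))]


def validate_post(doc: Mapping[str, Any]) -> None:
    missing = _missing(doc, REQUIRED_POST_FIELDS)
    if missing:
        raise ValueError(f"post is missing required fields: {', '.join(missing)}")


def validate_comment(doc: Mapping[str, Any]) -> None:
    missing = _missing(doc, REQUIRED_COMMENT_FIELDS)
    # Either comment_id or legacy_comment_id satisfies the ID requirement.
    if _is_empty(doc.get("comment_id")) and _is_empty(doc.get("legacy_comment_id")):
        missing.insert(0, "comment_id (or legacy_comment_id)")
    if missing:
        raise ValueError(f"comment is missing required fields: {', '.join(missing)}")
```

Running a check like this in the crawler before calling the writer surfaces missing client_id or timestamp values early, with the field names listed in the error message.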
Platform field maps
ThreadsWriter — accepts raw output from the internal Threads scraper:
| Raw field | Canonical field |
|---|---|
| text | content (posts and comments) |
| post_url | url (posts) |
| author | author_name (posts) |
| reply_author | author_name (comments) |
| reply_author_id | author_id (comments) |
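The Threads mapping above amounts to a key rename. The sketch below shows one way to apply it; the dictionaries mirror the table, while rename_fields is a hypothetical helper and not part of the package's documented API.

```python
from typing import Any, Mapping

# Raw-to-canonical field maps, taken from the Threads table above.
THREADS_POST_MAP = {"text": "content", "post_url": "url", "author": "author_name"}
THREADS_COMMENT_MAP = {
    "text": "content",
    "reply_author": "author_name",
    "reply_author_id": "author_id",
}


def rename_fields(raw: Mapping[str, Any], field_map: Mapping[str, str]) -> dict:
    """Rename mapped keys and pass every other key through unchanged.

    Simplification: if a raw record holds both a mapped key and its
    canonical target, whichever appears later in the record wins.
    """
    return {field_map.get(key, key): value for key, value in raw.items()}
```

Note that post_url is renamed to url only for posts; for comments it is already a canonical root-level field and passes through untouched.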
FacebookWriter — expects pre-normalized post data (output of normalize_post()). Comment field map:
| Raw field | Canonical field |
|---|---|
| reply_author | author_name |
| reply_author_id | author_id |
PttWriter, NewsWriter, GoogleTrendsWriter — expect pre-normalized data with canonical field names already set.
Example (Facebook)
from baybin_sentinel.platforms.facebook import FacebookWriter
writer = FacebookWriter(
    host="192.168.x.x",
    port=9200,
    user="your_username",
    password="your_password",
    verify_certs=False,
)
# Single post with its comments
writer.save(post, comments)
# Bulk posts only
writer.save_bulk_posts(posts)
# Bulk comments only
writer.save_bulk_comments(comments)
Example (Threads)
from baybin_sentinel.platforms.threads import ThreadsWriter
writer = ThreadsWriter(config_path="/path/to/config.yaml")
# Single post with its replies (extracted from post["replies_detail"])
writer.save(post)
# Single post with explicit comments
writer.save(post, comments)
# Bulk posts only
writer.save_bulk_posts(posts)
# Bulk comments for one post
writer.save_bulk_comments(replies, post_url="https://threads.net/...")
Example (PTT)
from baybin_sentinel.platforms.ptt import PttWriter
writer = PttWriter(config_path="/path/to/config.yaml")
writer.save_post(post)
writer.save_bulk_posts(posts)
Example (News)
from baybin_sentinel.platforms.news import NewsWriter
writer = NewsWriter(config_path="/path/to/config.yaml")
writer.save_post(post)
writer.save_bulk_posts(posts)
Example (Google Trends)
from baybin_sentinel.platforms.google_trends import GoogleTrendsWriter
writer = GoogleTrendsWriter(config_path="/path/to/config.yaml")
writer.save_post(post)
writer.save_bulk_posts(posts)
Publishing to PyPI
If you are the maintainer, follow these steps to publish a new version:
- Update version in pyproject.toml (e.g., 0.2.0).
- Install build tools:
pip install build twine
- Build the package:
rmdir /s /q dist build 2>nul
python -m build
- Upload to PyPI:
python -m twine upload dist/*
- Authentication:
  - Username: __token__
  - Password: pypi-your-api-token-here (including the pypi- prefix)
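The cleanup command in the build step uses Windows cmd syntax (rmdir /s /q and 2>nul). On other platforms, a small Python helper can do the same job before running python -m build. The function clean_build_dirs is a hypothetical name for this sketch and is not shipped with the package.

```python
import shutil
from pathlib import Path


def clean_build_dirs(project_root: str = ".", names: tuple = ("dist", "build")) -> list:
    """Delete stale build output directories and return the names removed."""
    removed = []
    for name in names:
        target = Path(project_root) / name
        # Skip silently when the directory does not exist, like 2>nul does.
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


if __name__ == "__main__":
    print("removed:", clean_build_dirs())
```

Removing old artifacts first keeps twine from uploading wheels left over from a previous version in dist/.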