Vinews is an open-source library which provides modules for searching and scraping news data from Vietnamese news websites.
VINEWS
An open-source library for searching and scraping Vietnamese news websites. Its goal is to provide tools that improve the news-search capabilities of AI agents.
Note: This library is still under active development. Expect bugs and incomplete features. Contributions are welcome and appreciated!
Disclaimer & Terms of Use ‼️
This library is provided for educational and research purposes only. You are solely responsible for how you use it. Before scraping any website, you must ensure that your actions comply with all applicable laws and the website’s own policies — including their Terms of Service and robots.txt directives. Many websites explicitly prohibit automated access. The authors and contributors are not responsible for any misuse or legal issues arising from the use of this tool. Always scrape ethically, respectfully, and within legal boundaries.
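One way to check robots.txt directives before scraping is Python's standard `urllib.robotparser` module. The sketch below is not part of Vinews; the helper name `is_allowed` and the example rules are made up for illustration, and a robots.txt check does not replace reading a site's Terms of Service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules, invented for illustration (not taken from any real site).
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_RULES, "https://example.com/news/article.html"))  # True
print(is_allowed(EXAMPLE_RULES, "https://example.com/private/page.html"))  # False
```

To check a live site, `RobotFileParser` can also download the file itself via `set_url(...)` followed by `read()`.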
Responsible Scraping ‼️
Please be respectful of the websites you interact with. Always use appropriate rate limiting and avoid sending excessive requests. Scraping should never disrupt or degrade the performance of a website. Generating unreasonable traffic may not only lead to IP bans but could also violate legal or ethical standards. Respect the site's resources, policies, and the efforts of its creators.
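A simple form of rate limiting is to enforce a minimum delay between consecutive requests. The sketch below uses only the standard library; the class name `MinIntervalLimiter` and the chosen interval are assumptions for illustration and are not part of the Vinews API.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval: float):
        self._min_interval = min_interval
        self._last = None  # time of the previous call, None before the first call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                now = time.monotonic()
        self._last = now


# Usage: call wait() before every request (the request itself is only simulated here).
limiter = MinIntervalLimiter(min_interval=0.2)
for query in ["Bitcoin", "VN-Index", "USD"]:
    limiter.wait()
    print(f"would send a request for {query!r} now")
```

An appropriate interval depends on the target site; when in doubt, choose a longer delay.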
Supported Websites
- VnExpress
- more coming soon...
Installation
pip install vinews==0.1.0b5
Quick Start
import asyncio
import json
import os

from vinews.modules.vnexpress.search import VinewsVnExpressSearch

search_engine = VinewsVnExpressSearch()
query = "Bitcoin"

# Synchronous search
results = search_engine.search(query=query, date_range="day", category="kinhdoanh", limit=5, advanced=True)
print(results)

# Synchronous homepage fetch
homepage = search_engine.search_homepage()
print(homepage)


def vinews_async():
    # Asynchronous search
    async def async_test():
        async_results = await search_engine.async_search(query=query, date_range="day", category="kinhdoanh", limit=5, advanced=True)
        async_homepage = await search_engine.async_search_homepage()

        # Optional: save the results as JSON (create the output directory first)
        os.makedirs("tests/output", exist_ok=True)
        with open("tests/output/vnexpress_search.json", "w", encoding="utf-8") as f:
            json.dump(async_results.model_dump(), f, indent=2, ensure_ascii=False)
        with open("tests/output/vnexpress_homepage.json", "w", encoding="utf-8") as f:
            json.dump(async_homepage.model_dump(), f, indent=2, ensure_ascii=False)

    asyncio.run(async_test())


if __name__ == "__main__":
    vinews_async()
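When several queries are needed, the asynchronous API can be combined with a concurrency cap so that only a few requests are in flight at once. The sketch below is a generic pattern, not part of Vinews: `search_many` and `fake_search` are hypothetical names, and `fake_search` stands in for a real search call so the example runs without network access. Whether a single `VinewsVnExpressSearch` instance is safe to use concurrently is not stated in this README, so treat that as an assumption to verify.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def search_many(
    search_fn: Callable[[str], Awaitable[Any]],
    queries: List[str],
    max_concurrent: int = 2,
) -> Dict[str, Any]:
    """Run search_fn for every query, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query: str) -> Any:
        async with semaphore:
            return await search_fn(query)

    # gather() returns results in the same order as the queries.
    results = await asyncio.gather(*(run_one(q) for q in queries))
    return dict(zip(queries, results))


async def fake_search(query: str) -> str:
    """Stand-in for a real asynchronous search call."""
    await asyncio.sleep(0.01)
    return f"results for {query}"


print(asyncio.run(search_many(fake_search, ["Bitcoin", "VN-Index"])))
```

With Vinews, `search_fn` could be a small wrapper such as `lambda q: search_engine.async_search(query=q, date_range="day", category="kinhdoanh", limit=5, advanced=True)`, reusing the arguments from the Quick Start.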
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the Apache License 2.0; see the LICENSE file for details.
Download files
File details
Details for the file vinews-0.1.0b5.tar.gz.
File metadata
- Download URL: vinews-0.1.0b5.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1191c8fc367baeb2117534252a7c0e41c91bd823160b6bf52dfd473f600ad84d |
| MD5 | 0ae651386fc6428d167683eb688458b5 |
| BLAKE2b-256 | f4c79164d339499617be993f5ccd11326fd14d98c43853a28357980485623c91 |
File details
Details for the file vinews-0.1.0b5-py3-none-any.whl.
File metadata
- Download URL: vinews-0.1.0b5-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba2547f5c9ca3f8a14443be9cf1d447512cec07462b7941f913bb075a9946121 |
| MD5 | c2125464614a7cb3e6502b528892ab87 |
| BLAKE2b-256 | 2dd99ac650647aee9888714a33e0648cd81ac978930e7e2eb73aef8b250a64be |
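A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_of_file` and `verify_download` are illustrative, and the demonstration runs on a throwaway file rather than on an actual Vinews archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Demonstration on a temporary file containing the bytes b"abc".
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"abc")
    expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    print(verify_download(sample, expected))  # True
```

For a real download, pass the path of the `.tar.gz` or `.whl` file together with the SHA256 digest listed for that file.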