A Python package for web scraping Korean news articles

Project description

Korean News Scraper

Hello World! Korean News Scraper aims to be a Korean-language data collection tool for LLMs.

The package is usable from version 0.1.1 onward. Please do not use earlier versions.

Build

$ python3 setup.py build

Required Libraries

  • beautifulsoup4
  • selenium
  • pandas
  • requests
  • tqdm
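Before running the scraper, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch and not part of korean_news_scraper itself: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here for illustration. Note that beautifulsoup4 is imported under the module name `bs4`.

```python
import importlib.util

# Distribution name -> import name (hypothetical helper table, not part of the package).
# beautifulsoup4 installs the module "bs4"; the others share their distribution name.
REQUIRED = {
    "beautifulsoup4": "bs4",
    "selenium": "selenium",
    "pandas": "pandas",
    "requests": "requests",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import module cannot be found."""
    return [
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Suggest an install command for whatever is absent.
        print("Missing libraries; try: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```

The check only looks up module specs, so it does not import Selenium or pandas and runs quickly.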

Quick Start

import korean_news_scraper

keywords = ["news", "happy", "environment"]
korean_news_scraper.save_article_links(keywords, "data", lang="en-EN")
korean_news_scraper.extract_article_content("data")

Download files

Source Distribution

korean_news_scraper-0.1.4.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

korean_news_scraper-0.1.4-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file korean_news_scraper-0.1.4.tar.gz.

File metadata

  • Download URL: korean_news_scraper-0.1.4.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for korean_news_scraper-0.1.4.tar.gz
Algorithm Hash digest
SHA256 ef1dc4118324850cf172bbfe3172c3d54838c2d19c75b1aa76230ca864d36255
MD5 5edaba52a6475707ea3e6f54a6e0ef95
BLAKE2b-256 c36fb6bedc1963461e269637f130e168ee044c3f4b4e3f7d9fbeb9e12e683d1c
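The SHA256 digest listed above can be used to check a downloaded copy of the source distribution. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the script assumes the archive was saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for korean_news_scraper-0.1.4.tar.gz (copied from the listing above).
EXPECTED_SHA256 = "ef1dc4118324850cf172bbfe3172c3d54838c2d19c75b1aa76230ca864d36255"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive; adjust as needed.
    archive = Path("korean_news_scraper-0.1.4.tar.gz")
    if archive.exists():
        print("digest OK" if matches_expected(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

pip can perform an equivalent check at install time when a requirements file pins hashes and is installed with `--require-hashes`.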

File details

Details for the file korean_news_scraper-0.1.4-py3-none-any.whl.

File hashes

Hashes for korean_news_scraper-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9a222fcf29b8b35850539f0f74583a531d47851a650cd60f74e72ffaeadb29a7
MD5 dbc29e086bae973da7459662cdf2a1b5
BLAKE2b-256 4f58ca104f01be1ea6cbc223740cc1e7697407d907e0e5a56bb2b770c28a1aaf

