A Python package for web scraping Korean news articles

Korean News Scraper

Hello World! Korean News Scraper aims to be a Korean-language data collection tool for LLMs.

The package is usable after version 0.1.1. Please do not use earlier releases.

Build

$ python3 setup.py sdist bdist_wheel

Required Libraries

  • beautifulsoup4
  • selenium
  • pandas
  • requests
  • tqdm
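
Before running the scraper, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of the package: the helper `missing_packages` and the `REQUIRED` mapping are hypothetical names introduced here. The only non-obvious detail is that `beautifulsoup4` is imported under the module name `bs4`.

```python
from importlib.util import find_spec

# Distribution name -> import name. Only beautifulsoup4 differs
# (it is imported as bs4); the rest share the same name.
REQUIRED = {
    "beautifulsoup4": "bs4",
    "selenium": "selenium",
    "pandas": "pandas",
    "requests": "requests",
    "tqdm": "tqdm",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose module cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```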

Quick Start

import korean_news_scraper

keywords = ["news", "happy", "environment"]
korean_news_scraper.save_article_links(keywords, "data", lang="en-EN")
korean_news_scraper.extract_article_content("data")

Download files

Source Distribution

korean_news_scraper-0.1.5.tar.gz (4.5 kB view details)

Uploaded Source

Built Distribution

korean_news_scraper-0.1.5-py3-none-any.whl (5.6 kB view details)

Uploaded Python 3

File details

Details for the file korean_news_scraper-0.1.5.tar.gz.

File metadata

  • Download URL: korean_news_scraper-0.1.5.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for korean_news_scraper-0.1.5.tar.gz
Algorithm Hash digest
SHA256 25ea2e4d4965f809e4eba89909f628167287dcef1eced6e2808140874c737387
MD5 a7152897b2cf6f0b8a97901a343f2560
BLAKE2b-256 0b40eaa46802ddfb7217eaceec59e0fd2f94cbbf9ef6152bb0830a4cf9c0b3a8
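
The SHA256 digest above can be used to check a downloaded archive. The snippet below is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches_expected` are hypothetical, and the script assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# Digest copied from the hash table for korean_news_scraper-0.1.5.tar.gz.
EXPECTED_SHA256 = "25ea2e4d4965f809e4eba89909f628167287dcef1eced6e2808140874c737387"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected one."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location: the current working directory.
    archive = Path("korean_news_scraper-0.1.5.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```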

File details

Details for the file korean_news_scraper-0.1.5-py3-none-any.whl.

File hashes

Hashes for korean_news_scraper-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b25f728e48a0e6bf79c06cad30fde4a7cb81b1d0743136d79f06e026d9a766b7
MD5 3b60b2b4e051acebacd1fc7ed12caf60
BLAKE2b-256 aa724b7c7624209e0316c074711f91e77b0ba83bcaa7327ea4ae9ec3935d2c74
