Python Library for Crawling Top 10 Korean News and Providing Synonym Dictionary

Project description

Korean_News_Crawler

A Python library for crawling articles from Korea's top 10 daily newspapers and providing a synonym dictionary. It is still a beta version and has not yet been officially released on PyPI.

The copyright of the articles belongs to the original media companies. We take no legal responsibility for how they are used, and we assume you have agreed to this.

This is an open-source project, and we welcome contributors and collaborators at any time. Please feel free to contact us.

Supported News Sites

Contributors

Indigo_Coder

Installation

pip install korean_news_crawler

BeautifulSoup, Selenium, and Requests are required.
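
These three packages are imported under the module names `bs4`, `selenium`, and `requests`. As a quick pre-flight check, a minimal sketch (the helper `missing_dependencies` is hypothetical and not part of this library) lists whichever of them cannot be imported:

```python
from importlib.util import find_spec

# Import names of the packages this README lists as required.
# BeautifulSoup is imported as "bs4".
REQUIRED_MODULES = ("bs4", "selenium", "requests")


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on the import path."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```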

Quick Usage

from korean_news_crawler import Chosun

chosun = Chosun()
print(chosun.dynamic_crawl("https://www.chosun.com/..."))

chosun_url_list = []  # fill with Chosun Ilbo article URLs
print(chosun.dynamic_crawl(chosun_url_list))

`korean_news_crawler.Chosun(delay_time=None, saving_html=False)`

Crawls articles from Chosun Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.
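
The README does not say how the tuple form of `delay_time` is interpreted or what unit the delay uses. A minimal sketch, assuming a float is a fixed delay in seconds and a tuple is a `(low, high)` range sampled uniformly; `resolve_delay` is a hypothetical helper, not the library's code:

```python
import random
from typing import Optional, Tuple, Union

# Hypothetical alias for the documented `delay_time` argument.
DelaySpec = Optional[Union[float, Tuple[float, float]]]


def resolve_delay(delay_time: DelaySpec, rng: Optional[random.Random] = None) -> float:
    """Return how many seconds to wait before the next request.

    Assumed semantics (not confirmed by the library):
      None         -> no delay
      float        -> fixed delay in seconds
      (low, high)  -> random delay drawn uniformly from [low, high]
    """
    if delay_time is None:
        return 0.0
    if isinstance(delay_time, tuple):
        low, high = delay_time
        source = rng if rng is not None else random
        return source.uniform(low, high)
    return float(delay_time)
```

Passing a seeded `random.Random(0)` as `rng` makes the random delays reproducible, which is convenient in tests.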

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.
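
Because `url` may be a single string or a list while the return value is always a list, the method presumably normalizes its input first. A minimal sketch of that normalization; `as_url_list` and `crawl_urls` are hypothetical names, and `fetch_article` stands in for the Selenium-based fetching:

```python
from typing import Callable, List, Union


def as_url_list(url: Union[str, List[str]]) -> List[str]:
    """Turn a single URL or a list of URLs into a list of URLs."""
    if isinstance(url, str):
        return [url]
    if isinstance(url, list):
        return list(url)
    raise TypeError(f"url must be str or list, not {type(url).__name__}")


def crawl_urls(url: Union[str, List[str]],
               fetch_article: Callable[[str], str]) -> List[str]:
    """Apply `fetch_article` to each URL and collect the article texts in order."""
    return [fetch_article(u) for u in as_url_list(url)]
```

With this shape, passing a single URL string yields a one-element list, matching the documented return type.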

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.
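
`static_crawl` parses pages with BeautifulSoup, but the README does not document which elements it reads for each site. To show the general idea without third-party packages, the sketch below swaps in the standard library's `html.parser.HTMLParser` and collects the text inside `<p>` tags. Treating `<p>` as the article body is an assumption, and `ParagraphCollector` and `extract_article_text` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import List


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element (stand-in for a BeautifulSoup selector)."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0                  # > 0 while inside a <p> element
        self._buffer: List[str] = []     # text pieces of the current paragraph
        self.paragraphs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_article_text(html: str) -> str:
    """Return the non-empty <p> texts of `html`, joined by newlines."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

In BeautifulSoup, the same selection would typically start from `soup.find_all("p")`; real news pages usually need a site-specific container instead.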

`korean_news_crawler.Donga(delay_time=None, saving_html=False)`

Crawls articles from Dong-a Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.
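
With `saving_html=True`, each URL is requested only once and later calls reuse the saved HTML. The README does not say where the HTML is stored, so the following is a minimal in-memory sketch; the class name `HtmlStore` is hypothetical and `fetch` is an injected function standing in for the real HTTP request:

```python
from typing import Callable, Dict


class HtmlStore:
    """Fetch HTML by URL, optionally saving it so each URL is requested once."""

    def __init__(self, fetch: Callable[[str], str], saving_html: bool = False) -> None:
        self._fetch = fetch
        self.saving_html = saving_html
        self._saved: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if not self.saving_html:
            # Default behaviour: request the URL on every call.
            return self._fetch(url)
        if url not in self._saved:
            # First call for this URL: request it and keep the HTML.
            self._saved[url] = self._fetch(url)
        return self._saved[url]
```

Counting calls to `fetch` is an easy way to confirm that a second `get` for the same URL does not hit the server again.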

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.
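
When `url` is a list and `delay_time` is set, the crawler visits the URLs one by one with a pause in between. A minimal sketch of such a loop, assuming the pause falls between consecutive requests (not before the first) and that a tuple means a uniform `(low, high)` range in seconds; `crawl_politely` is hypothetical, and `fetch_article` and `sleep` are injected so the loop runs without a browser:

```python
import random
import time
from typing import Callable, List, Optional, Tuple, Union


def crawl_politely(urls: List[str],
                   fetch_article: Callable[[str], str],
                   delay_time: Optional[Union[float, Tuple[float, float]]] = None,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in order, pausing between requests according to delay_time."""
    articles = []
    for i, url in enumerate(urls):
        if i > 0 and delay_time is not None:
            if isinstance(delay_time, tuple):
                pause = random.uniform(*delay_time)  # random delay within the range
            else:
                pause = float(delay_time)            # fixed delay
            sleep(pause)
        articles.append(fetch_article(url))
    return articles
```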

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Hankook(delay_time=None, saving_html=False)`

Crawls articles from Hankook Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Hankyoreh(delay_time=None, saving_html=False)`

Crawls articles from Hankyoreh.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Joongang(delay_time=None, saving_html=False)`

Crawls articles from Joongang Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Kukmin(delay_time=None, saving_html=False)`

Crawls articles from Kukmin Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Kyunghyang(delay_time=None, saving_html=False)`

Crawls articles from Kyunghyang Shinmun.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Munhwa(delay_time=None, saving_html=False)`

Crawls articles from Munhwa Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Naeil(delay_time=None, saving_html=False)`

Crawls articles from Naeil News.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Segye(delay_time=None, saving_html=False)`

Crawls articles from Segye Ilbo.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`korean_news_crawler.Seoul(delay_time=None, saving_html=False)`

Crawls articles from Seoul Shinmun.

Parameters

Parameters	Type	Description
delay_time	float or tuple	Optional; defaults to None. If a float, pages are crawled with a fixed delay. If a tuple, pages are crawled with a random delay.
saving_html	bool	Optional; defaults to False. If False, the URL is requested on every method call. If True, the HTML is requested only on the first call and saved; later calls reuse the saved HTML, which reduces load on the news server.

Attributes

Attributes	Type	Description
delay_time	float or tuple
saving_html	bool

Methods

Methods	Description
dynamic_crawl(url)	Return article text using Selenium.
static_crawl(url)	Return article text using BeautifulSoup.

`dynamic_crawl(url)`

Return article text using Selenium.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.

`static_crawl(url)`

Return article text using BeautifulSoup.

Parameters	Type	Description
url	str or list	If a str, only that URL is crawled. If a list, each URL in the list is crawled in turn.

Return type	Description
list	List of article texts.
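
All crawler classes above share the same constructor and methods, so a mixed list of article URLs can be routed to the right class by host name. A minimal sketch; the domain-to-class mapping is an assumption based on each newspaper's main website, not something the library provides, and `crawler_name_for` is hypothetical:

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed main domains of each newspaper, mapped to the class names documented above.
DOMAIN_TO_CLASS = {
    "chosun.com": "Chosun",
    "donga.com": "Donga",
    "hankookilbo.com": "Hankook",
    "hani.co.kr": "Hankyoreh",
    "joongang.co.kr": "Joongang",
    "kmib.co.kr": "Kukmin",
    "khan.co.kr": "Kyunghyang",
    "munhwa.com": "Munhwa",
    "naeil.com": "Naeil",
    "segye.com": "Segye",
    "seoul.co.kr": "Seoul",
}


def crawler_name_for(url: str) -> Optional[str]:
    """Return the crawler class name for `url`, or None if the host is not recognised."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in DOMAIN_TO_CLASS.items():
        # Match the domain itself or any subdomain such as www. or news.
        if host == domain or host.endswith("." + domain):
            return name
    return None
```

The returned name could then be resolved with `getattr(korean_news_crawler, name)` to construct the crawler.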

Project details

Release history

Version	Release date
1.0.5 (this version)	May 6, 2024
1.0.4	May 6, 2024
1.0.3	May 6, 2024
1.0.2	May 6, 2024

Download files

Source Distribution

korean_news_crawler-1.0.5.tar.gz (10.5 kB)

Uploaded May 6, 2024 Source

Built Distribution

korean_news_crawler-1.0.5-py3-none-any.whl (19.4 kB)

Uploaded May 6, 2024 Python 3

File details

Details for the file korean_news_crawler-1.0.5.tar.gz.

File metadata

Download URL: korean_news_crawler-1.0.5.tar.gz
Upload date: May 6, 2024
Size: 10.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.11

File hashes

Hashes for korean_news_crawler-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`8753bff944d7ffc81144e635bc2c0295f63eb653fae5c75af9fc252f7d69225b`
MD5	`00ed358e88557161a2bf2df0e086f753`
BLAKE2b-256	`3941d7f4fbb646d30684aca1f0cf9556bf5c1ac58d9f246fc9c69f62aa5e9eb6`

File details

Details for the file korean_news_crawler-1.0.5-py3-none-any.whl.

File metadata

Download URL: korean_news_crawler-1.0.5-py3-none-any.whl
Upload date: May 6, 2024
Size: 19.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.11

File hashes

Hashes for korean_news_crawler-1.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`19eb687cb9d6303a4c29cbf5c0530c0aba1f57e407799679de019c12daab673b`
MD5	`829965a5ae591c5c7d99be82e9f54ae8`
BLAKE2b-256	`b44d8f9d3f6a6f44b4f52c548c542ccca44dcc399f8b86f885ad0d7b19de3847`
