Helper pip package for analyzing presidential speech records (basic data included)
president-speech
- Speeches by the presidents of the Republic of Korea
- Data provided as a Parquet file and an SQLite db file
- Comes with a simple CLI
SUMMARY OF PROVIDED DATA
- The number of speeches and the date range per president are listed below.
- Some records have an empty date value or only a year.
| president | size | min(date) | max(date) |
|:----------|-----:|:-----------|:-----------|
| 이승만 | 998 | 1948.07.24 | 1959.03.10 |
| 윤보선 | 3 | 1960.08.13 | 1960.09.15 |
| 박정희 | 1270 | 1963.12.17 | 1979.10.26 |
| 최규하 | 58 | 1979.10.27 | 1980.08.16 |
| 전두환 | 602 | 1980.06.05 | 1987.02.16 |
| 노태우 | 601 | 1988.02.25 | 1992.10.05 |
| 김영삼 | 728 | 1993.01.09 | 1998.01.23 |
| 김대중 | 822 | 1998.02.25 | 2003.02.17 |
| 노무현 | 780 | 2003.02.25 | 2008.01.28 |
| 이명박 | 1027 | 2008.02.25 | 2013.02.07 |
| 박근혜 | 493 | 2013.02.24 | 2016.10.26 |
| 문재인 | 1389 | 2017.05.10 | 2022.03.30 |

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 8771 entries, 5368 to 102591
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   division_number  8771 non-null   int64
 1   president        8771 non-null   object
 2   title            8771 non-null   object
 3   date             8771 non-null   object
 4   location         8771 non-null   object
 5   kind             8771 non-null   object
 6   speech_text      8771 non-null   object
dtypes: int64(1), object(6)
memory usage: 548.2+ KB
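Because the `date` column is stored as text and some records carry an empty value or only a year, it can help to classify each value before sorting or filtering on it. The following is a minimal sketch, not part of the package: the helper names `classify_date` and `year_of` are hypothetical, and the accepted formats (`YYYY.MM.DD`, a bare `YYYY`, or an empty string) are assumptions based on the samples shown above.

```python
import re
from typing import Optional

# Assumed formats, inferred from the sample data: "YYYY.MM.DD" or a bare "YYYY".
FULL_DATE = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")
YEAR_ONLY = re.compile(r"^\d{4}$")


def classify_date(value: str) -> str:
    """Return 'full', 'year', 'empty', or 'other' for a raw date string."""
    text = value.strip()
    if not text:
        return "empty"
    if FULL_DATE.match(text):
        return "full"
    if YEAR_ONLY.match(text):
        return "year"
    return "other"


def year_of(value: str) -> Optional[int]:
    """Extract the leading four-digit year if there is one, else None."""
    match = re.match(r"\d{4}", value.strip())
    return int(match.group(0)) if match else None
```

With a loaded DataFrame, something like `df["date"].map(classify_date).value_counts()` would show how many records fall into each group.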
Usage
$ pip install president-speech
>>> from president_speech.db.parquet_interpreter import read_parquet, get_parquet_full_path
>>> get_parquet_full_path()
'/Users/f16/code/edu/president-speech/.venv/lib/python3.8/site-packages/president_speech/db/parquet/president_speech_ko.parquet'
>>> read_parquet().head(3)
division_number president title date location kind speech_text
5368 1305368 박정희 제5대 대통령 취임식 대통령 취임사 1963.12.17 국내 취임사 \n\n\n단군성조가 천혜의 이 강토 위에 국기를 닦으신지 반만년, 연면히 이어온 ...
5369 1305369 박정희 국회 개회식 치사 1963.12.17 국내 기념사 존경하는 국회의장, 의원제위 그리고 내외귀빈 여러분! 오늘 이 뜻깊은 제3공화국의...
5370 1305370 박정희 신년 메시지 1964.01.01 국내 신년사 친애하는 국내외의 동포 여러분! 혁명의 고된 시련을 겪고 민정이양으로 매듭을 지은...
>>>
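The data is also described as shipping in SQLite form. The package's own database helpers are not documented here, so the following is only a sketch using the standard `sqlite3` module against an in-memory database. The table name `speeches` and the function `find_speeches` are hypothetical, the column names follow the `df.info()` listing, and the rows are placeholders rather than real speech text.

```python
import sqlite3

# Hypothetical table name; column names follow the df.info() listing.
SCHEMA = """
CREATE TABLE speeches (
    division_number INTEGER PRIMARY KEY,
    president TEXT, title TEXT, date TEXT,
    location TEXT, kind TEXT, speech_text TEXT
)
"""


def find_speeches(conn, word):
    """Return (president, title, date) rows whose speech_text contains word."""
    cursor = conn.execute(
        "SELECT president, title, date FROM speeches "
        "WHERE instr(speech_text, ?) > 0 ORDER BY date",
        (word,),
    )
    return cursor.fetchall()


# Placeholder rows for illustration only, not taken from the real data set.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO speeches VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "title one", "1964.01.01", "국내", "신년사", "placeholder 독립 text"),
        (2, "B", "title two", "1963.12.17", "국내", "취임사", "placeholder text"),
        (3, "B", "title three", "1965.01.01", "국내", "기념사", "독립 placeholder"),
    ],
)
```

Passing the search word as a bound parameter avoids building SQL from user input, and `instr` sidesteps the wildcard escaping that `LIKE` would need.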
CLI usage
$ ps-word-count -h
usage: ps-word-count [-h] [-t | -p] word
Word frequency output from previous presidential speeches
positional arguments:
word Search word
optional arguments:
-h, --help show this help message and exit
-t, --table Table Format Output
-p, --plot Format Output
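The usage line `[-h] [-t | -p] word` suggests that the two output flags are mutually exclusive. The following is a sketch of how such an interface could be declared with `argparse`; it is not the package's actual source, and the function name `build_parser` is hypothetical. The help strings are copied from the output above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the help output shown above."""
    parser = argparse.ArgumentParser(
        prog="ps-word-count",
        description="Word frequency output from previous presidential speeches",
    )
    parser.add_argument("word", help="Search word")
    # -t and -p cannot be combined, which argparse renders as [-t | -p].
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-t", "--table", action="store_true",
                       help="Table Format Output")
    group.add_argument("-p", "--plot", action="store_true",
                       help="Format Output")
    return parser


args = build_parser().parse_args(["-p", "독립"])
```

Passing both `-t` and `-p` makes `argparse` print an error and exit, so the calling code never has to check for the conflict itself.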
$ ps-word-count -p 독립
문재인 [954] ****************************************
이승만 [430] ******************
박정희 [361] ****************
이명박 [176] ********
김대중 [171] ********
전두환 [169] ********
노무현 [167] *******
노태우 [131] ******
김영삼 [114] *****
박근혜 [ 71] ***
최규하 [ 4] *
윤보선 [ 0]
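A text bar chart like the one above can be produced with a few lines of standard Python. The sketch below is one possible approach, not the tool's actual implementation: the function name `render_bars`, the default width of 40, and the round-up scaling rule are all assumptions.

```python
import math


def render_bars(counts, width=40):
    """Render {name: count} as text bars, largest count first."""
    if not counts:
        return []
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    largest = ordered[0][1]
    # Number of mentions represented by one asterisk (at least 1).
    step = max(1, math.ceil(largest / width))
    digits = len(str(largest))
    lines = []
    for name, count in ordered:
        # Round up so that any non-zero count shows at least one asterisk.
        bar = "*" * math.ceil(count / step)
        lines.append(f"{name} [{count:>{digits}}] {bar}".rstrip())
    return lines


for line in render_bars({"문재인": 954, "박근혜": 71, "윤보선": 0}):
    print(line)
```

Under this rule a count of 954 yields 40 asterisks and a count of 71 yields 3, while a count of 0 yields none.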
$ ps-word-count -t 독립
| | president | mention |
|---:|:------------|----------:|
| 0 | 문재인 | 954 |
| 1 | 이승만 | 430 |
| 2 | 박정희 | 361 |
| 3 | 이명박 | 176 |
| 4 | 김대중 | 171 |
| 5 | 전두환 | 169 |
| 6 | 노무현 | 167 |
| 7 | 노태우 | 131 |
| 8 | 김영삼 | 114 |
| 9 | 박근혜 | 71 |
| 10 | 최규하 | 4 |
| 11 | 윤보선 | 0 |
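The per-president totals behind both output formats can be gathered with `collections.Counter`. This is a minimal sketch under two assumptions: that a mention is a plain substring occurrence in the speech text, and that the input is a sequence of `(president, speech_text)` pairs. The function name `count_mentions` is hypothetical, and the real tool may tokenize or count differently.

```python
from collections import Counter


def count_mentions(rows, word):
    """Sum substring occurrences of word per president, highest first."""
    totals = Counter()
    for president, text in rows:
        # Adding 0 still registers the president, so zero rows are kept.
        totals[president] += text.count(word)
    return totals.most_common()


# Placeholder rows for illustration only.
rows = [
    ("A", "독립 and 독립"),
    ("B", "nothing relevant"),
    ("A", "독립"),
    ("C", "독립"),
]
ranking = count_mentions(rows, "독립")
```

Presidents with no matches stay in the result with a count of 0, which mirrors the last row of the table above.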
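References
- Presidential Archives (대통령기록관) speech records
- Presidential Archives, Ministry of the Interior and Safety (행정안전부 대통령기록관): presidential speech records, speech text API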
- https://stackoverflow.com/questions/45470964/python-extracting-text-from-webpage-pdf
- https://pypdf.readthedocs.io/en/latest/user/extract-text.html
- https://setuptools.pypa.io/en/latest/userguide/datafiles.html
- https://frhyme.github.io/python-basic/py_no_break_space/
- https://pypi.org/project/pandas-downcast/
- https://realpython.com/python-project-documentation-with-mkdocs/
- https://publivate.tistory.com/245 modin, dask, vaex => pandas
- https://www.fileformat.info/info/emoji/list.htm
- https://discuss.streamlit.io/t/version-1-26-0/50056
- https://github.com/ai-yash/st-chat
- https://konlpy.org/ko/latest/index.html
- https://liveyourit.tistory.com/57
- https://dongdongfather.tistory.com/70
- https://docs.python.org/3/library/collections.html#collections.Counter
Development environment setup
$ git clone ...
$ cd president-speech
$ pdm venv create
$ source .venv/bin/activate
$ pdm install
$ pdm add -dG test pytest pytest-cov
$ pdm test
$ pdm ptest
$ pdm ctest
---------- coverage: platform darwin, python 3.9.18-final-0 ----------
Name Stmts Miss Cover
--------------------------------------------------------------------
src/president_speech/__init__.py 0 0 100%
src/president_speech/db/__init__.py 0 0 100%
src/president_speech/db/connection_manager.py 17 3 82%
src/president_speech/db/parquet_interpreter.py 25 1 96%
src/president_speech/db/search.py 15 1 93%
tests/__init__.py 0 0 100%
tests/test_parquet_interpreter.py 11 0 100%
tests/test_search.py 5 0 100%
--------------------------------------------------------------------
TOTAL 73 5 93%
Deploy to fly.io with Docker
$ docker build -t president-speech-webapp .
$ docker run -it --rm -p 7942:8051 president-speech-webapp
$ fly deploy
Visit your newly deployed app at https://president-speech.fly.dev/
Give it a try. Feedback is always welcome, and so are pull requests.