This module aims to simplify Python package management, script execution, file handling, web scraping, multimedia downloads, data cleaning, and word list generation for students of language and literature, making these tasks more accessible and convenient.
Project description
Purpose: This module aims to assist Python beginners, particularly instructors and students of foreign languages and literature, by providing a convenient way to manage Python packages, run Python scripts, and perform operations on various file types such as txt, xlsx, json, tsv, and docx. It also includes functionality for scraping data, cleaning data, and generating word lists.
Function 1: Enables efficient data retrieval and storage in files with a single line of code.
Function 2: Retrieves all absolute file paths and file names in any folder (including sub-folders) with a single line of code, using the "FilePath" and "FileName" functions.
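The kind of recursive listing described here can be sketched with the standard library alone. This is an illustration, not PgsFile's own implementation: the helper names `list_file_paths` and `list_file_names` are hypothetical, and the signatures of "FilePath" and "FileName" are not shown in this description.

```python
from pathlib import Path


def list_file_paths(folder):
    """Return the sorted absolute paths of every file under folder, sub-folders included."""
    root = Path(folder).resolve()
    # rglob("*") walks the whole tree; directories are filtered out.
    return sorted(str(path) for path in root.rglob("*") if path.is_file())


def list_file_names(folder):
    """Return the bare file names (no directory part) of every file under folder."""
    return [Path(path).name for path in list_file_paths(folder)]
```

Calling `list_file_paths("corpus")` on a hypothetical folder named `corpus` would return one absolute path per file, which can then be passed to any function that reads a single file.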
Function 3: Simplifies the creation of frequency-sorted word lists from a single file or a batch of files, using the "word_list" and "batch_word_list" functions in PgsFile.
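As a rough illustration of what a frequency-sorted word list involves, the sketch below counts words with `collections.Counter`. It does not reproduce PgsFile's "word_list" or "batch_word_list": the function names, the regular-expression tokenizer, and the tie-breaking order are all assumptions, and the tokenizer only handles space-delimited alphabetic text (Chinese would need a word segmenter first).

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase alphabetic words, optionally with one apostrophe part.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD_PATTERN.findall(text.lower())


def sort_by_frequency(counts):
    """Order (word, count) pairs by descending count, then alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def build_word_list(text):
    """Return a frequency-sorted word list for one text."""
    return sort_by_frequency(Counter(tokenize(text)))


def build_batch_word_list(paths, encoding="utf-8"):
    """Return one combined frequency-sorted word list for several text files."""
    total = Counter()
    for path in paths:
        with open(path, encoding=encoding) as handle:
            total.update(tokenize(handle.read()))
    return sort_by_frequency(total)
```

Sorting ties alphabetically keeps the output stable between runs, which makes word lists from different corpora easier to compare.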
Function 4: Pgs-Corpora is a comprehensive language resource included in this library, featuring a monolingual corpus of native and translational Chinese and native and non-native English, as well as a bi-directional parallel corpus of Chinese and English texts covering financial, legal, political, academic, and sports news topics. Additionally, the library includes a collection of 8774 English idioms, stopwords for 28 languages, and a termbank of Chinese thought and culture.
Function 5: This library provides support for common text cleaning tasks, such as removing empty text, empty lines, and folders containing empty text. It also offers functions for converting full-width characters to half-width characters and vice versa, as well as standardizing the format of Chinese and English punctuation. These features can help improve the quality and consistency of text data used in natural language processing tasks.
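Two of these cleaning steps, width conversion and empty-line removal, can be sketched with plain string operations. The helper names below are hypothetical and are not PgsFile's API. The mapping assumes the standard Unicode layout, where the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts and the ideographic space U+3000 corresponds to an ordinary space.

```python
# Full-width forms U+FF01..U+FF5E map to ASCII U+0021..U+007E by a fixed offset.
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x20  # ideographic space -> ASCII space
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}


def to_half_width(text):
    """Convert full-width letters, digits, and punctuation to half-width."""
    return text.translate(FULL_TO_HALF)


def to_full_width(text):
    """Convert half-width ASCII characters to their full-width forms."""
    return text.translate(HALF_TO_FULL)


def drop_empty_lines(text):
    """Remove lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.splitlines() if line.strip())
```

Characters outside that block, such as the Chinese full stop "。" (U+3002), are left unchanged by this mapping, so standardizing Chinese and English punctuation would need an additional, explicit table.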
Function 6: Manages Python package installation and uninstallation, and allows scripts and commands to be run from the Python interactive command line instead of the Windows command prompt.
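One common way to install or remove a package from inside a Python session is to call pip through the running interpreter with `subprocess`. The sketch below shows that general technique. Whether PgsFile works this way is not stated here, and the helper names `pip_command` and `run_pip` are hypothetical.

```python
import subprocess
import sys


def pip_command(action, package):
    """Build the pip command line for installing or uninstalling one package."""
    if action not in {"install", "uninstall"}:
        raise ValueError(f"unsupported pip action: {action!r}")
    # sys.executable makes sure pip runs under the interpreter that is in use.
    command = [sys.executable, "-m", "pip", action]
    if action == "uninstall":
        command.append("-y")  # skip pip's interactive confirmation prompt
    command.append(package)
    return command


def run_pip(action, package):
    """Run pip and return its exit code (0 means success)."""
    return subprocess.run(pip_command(action, package), check=False).returncode
```

For example, `run_pip("install", "requests")` would install the package into the current environment without leaving the Python prompt.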
Function 7: Downloads audiovisual files such as videos, images, and audio with audiovisual_downloader, and scrapes newspaper data with PGScraper.
Table 1: The directory and size of Pgs-Corpora
├── Idioms (1, 171.78 KB)
├── Monolingual (2197, 63.65 MB)
│   ├── Chinese (456, 15.27 MB)
│   │   ├── People's Daily 20130605 (396, 1.38 MB)
│   │   │   ├── Raw (132, 261.73 KB)
│   │   │   ├── Seg_only (132, 471.47 KB)
│   │   │   └── Tagged (132, 675.30 KB)
│   │   └── Translational Fictions (60, 13.89 MB)
│   └── English (1741, 48.38 MB)
│       ├── Native (65, 44.14 MB)
│       │   ├── A Short Collection of British Fiction (27, 33.90 MB)
│       │   └── Preschoolers- and Teenagers-oriented Texts in English (36, 10.24 MB)
│       ├── Non-native (1675, 3.63 MB)
│       │   └── Shanghai Daily (1675, 3.63 MB)
│       │       └── Business_2019 (1675, 3.63 MB)
│       │           ├── 2019-01-01 (1, 3.35 KB)
│       │           ├── 2019-01-02 (1, 3.65 KB)
│       │           ├── 2019-01-03 (7, 10.90 KB)
│       │           ├── 2019-01-04 (5, 9.63 KB)
│       │           └── 2019-01-07 (4, 9.50 KB)
│       │           └── ... (and 245 more directories)
│       └── Translational (1, 622.57 KB)
├── Parallel (371, 24.67 MB)
│   ├── HK Financial and Legal EC Parallel Corpora (5, 19.17 MB)
│   ├── New Year Address_CE_2006-2021 (15, 147.49 KB)
│   ├── Sports News_CE_2010 (20, 66.42 KB)
│   ├── TED_EC_2017-2020 (330, 5.24 MB)
│   └── Xi's Speech_CE_2021 (1, 53.01 KB)
├── Stopwords (28, 88.09 KB)
└── Terminology (1, 2.20 MB)
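The pair in parentheses after each entry in Table 1 appears to be a file count and a total size (for instance, the three 132-file sub-folders of People's Daily add up to 396). Under that assumption, figures of this kind can be computed for any folder with a short sketch like the one below. The function names are hypothetical, and the 1024-byte step between units is also an assumption.

```python
import os


def summarize_folder(folder):
    """Return (file_count, total_bytes) for folder, sub-folders included."""
    file_count = 0
    total_bytes = 0
    for current_dir, _subdirs, file_names in os.walk(folder):
        for name in file_names:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(current_dir, name))
    return file_count, total_bytes


def format_size(num_bytes):
    """Format a byte count with two decimals, assuming 1 KB = 1024 bytes."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.2f} {unit}"
        size /= 1024
```

Running `summarize_folder` on each sub-folder in turn and passing the byte total to `format_size` would give entries in the same "(count, size)" shape as the table.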
...
Author: Pan Guisheng, a PhD student at the Graduate Institute of Interpretation and Translation of Shanghai International Studies University
E-mail: 895284504@qq.com
File details
Details for the file PgsFile-0.1.5-py3-none-any.whl.
File metadata
- Download URL: PgsFile-0.1.5-py3-none-any.whl
- Upload date:
- Size: 46.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3c9a252b460a7fe47149298c5a9dd909f984c1d7872cff3259da026c8fcb5ac4
MD5 | c5549fd24b5802c357d8db92aec7c01c
BLAKE2b-256 | db1938e5605e718e01745c58f1a316b78f4585dc37d37924ad9c030c282ad152