Skip to main content

This module aims to simplify Python package management, script execution, file handling, web scraping, multimedia download, data cleaning, and word list generation for literary students, making it more accessible and convenient to use.

Project description

Purpose: This module aims to assist Python beginners, particularly instructors and students of foreign languages and literature, by providing a convenient way to manage Python packages, run Python scripts, and perform operations on various file types such as txt, xlsx, json, tsv, and docx. It also includes functionality for data scraping, cleaning and generating word lists.

Function 1: Enables efficient data retrieval and storage in files with a single line of code.

Function 2: Facilitates retrieval of all absolute file paths and file names in any folder (including sub-folders) with a single line of code using "FilePath" and "FileName" functions.

Function 3: Simplifies creation of word lists and frequency sorting from a file or batch of files using "word_list" and "batch_word_list" functions in PgsFile.

Function 4: Pgs-Corpora is a comprehensive language resource included in this library, featuring a monolingual corpus of native and translational Chinese and native and non-native English, as well as a bi-directional parallel corpus of Chinese and English texts covering financial, legal, political, academic, and sports news topics. Additionally, the library includes a collection of 8774 English idioms, stopwords for 28 languages, and a termbank of Chinese thought and culture.

Function 5: This library provides support for common text cleaning tasks, such as removing empty text, empty lines, and folders containing empty text. It also offers functions for converting full-width characters to half-width characters and vice versa, as well as standardizing the format of Chinese and English punctuation. These features can help improve the quality and consistency of text data used in natural language processing tasks.

Function 6: It also manages Python package installations and uninstallations, and allows running scripts and commands in Python interactive command lines instead of Windows command prompt.

Function 7: Download audiovisual files like videos, images, and audio using audiovisual_downloader, which is extremely useful and efficient. Additionally, scrape newspaper data with PGScraper, a highly efficient tool for this purpose.

Table 1: The directory and size of Pgs-Corpora ├── Idioms (1, 171.78 KB) ├── Monolingual (2197, 63.65 MB) │ ├── Chinese (456, 15.27 MB) │ │ ├── People's Daily 20130605 (396, 1.38 MB) │ │ │ ├── Raw (132, 261.73 KB) │ │ │ ├── Seg_only (132, 471.47 KB) │ │ │ └── Tagged (132, 675.30 KB) │ │ └── Translational Fictions (60, 13.89 MB) │ └── English (1741, 48.38 MB) │ ├── Native (65, 44.14 MB) │ │ ├── A Short Collection of British Fiction (27, 33.90 MB) │ │ └── Preschoolers- and Teenagers-oriented Texts in English (36, 10.24 MB) │ ├── Non-native (1675, 3.63 MB) │ │ └── Shanghai Daily (1675, 3.63 MB) │ │ └── Business_2019 (1675, 3.63 MB) │ │ ├── 2019-01-01 (1, 3.35 KB) │ │ ├── 2019-01-02 (1, 3.65 KB) │ │ ├── 2019-01-03 (7, 10.90 KB) │ │ ├── 2019-01-04 (5, 9.63 KB) │ │ └── 2019-01-07 (4, 9.50 KB) │ │ └── ... (and 245 more directories) │ └── Translational (1, 622.57 KB) ├── Parallel (371, 24.67 MB) │ ├── HK Financial and Legal EC Parallel Corpora (5, 19.17 MB) │ ├── New Year Address_CE_2006-2021 (15, 147.49 KB) │ ├── Sports News_CE_2010 (20, 66.42 KB) │ ├── TED_EC_2017-2020 (330, 5.24 MB) │ └── Xi's Speech_CE_2021 (1, 53.01 KB) ├── Stopwords (28, 88.09 KB) └── Terminology (1, 2.20 MB)

...

Author: Pan Guisheng, a PhD student at the Graduate Institute of Interpretation and Translation of Shanghai International Studies University E-mail: 895284504@qq.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

PgsFile-0.1.5-py3-none-any.whl (46.2 MB view details)

Uploaded Python 3

File details

Details for the file PgsFile-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: PgsFile-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 46.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for PgsFile-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 3c9a252b460a7fe47149298c5a9dd909f984c1d7872cff3259da026c8fcb5ac4
MD5 c5549fd24b5802c357d8db92aec7c01c
BLAKE2b-256 db1938e5605e718e01745c58f1a316b78f4585dc37d37924ad9c030c282ad152

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page