swegram

CLI library for Swegram

Project description

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# Swegram

## Introduction

Swegram is a tool for annotating and analysing Swedish and English texts. You can upload one or more texts and have them linguistically analysed with morphological and syntactic features. These linguistically annotated texts can then be used for quantitative linguistic text analysis; for example, the tool provides statistics about sentence lengths, total number of words, various readability metrics, part-of-speech (PoS) distribution, and the frequency of lemmas, PoS tags, or misspelled words. The tool also visualizes the syntactic relations between words in sentences and gives detailed information about the distribution of various syntactic functions and relations in the text.

## Set up environment variables

```bash
export SWEGRAM_WORKSPACE=$(pwd)
```

## Install the swegram command line interface

Before installation, it is strongly recommended to create a virtual environment:

```bash
# Create virtual environment (Highly recommended)
python3 -m venv venv
source venv/bin/activate

# Install swegram package
pip install swegram --upgrade

# Build dependencies
swegram-build

# Export pythonpath
export PYTHONPATH="$PYTHONPATH:$(pwd):$(pwd)/tools/efselab"
```

Check the usage of the swegram CLI:

```bash
swegram -h
```

```console
usage: SWGRAM 1.0 [-h] -l {en,sv} -i INPUT_PATH [-o OUTPUT_DIR]
                  [--output-format {txt,xlsx,json,csv}]
                  {annotate,statistic} ...

Swegram command line interface description

positional arguments:
  {annotate,statistic}  Swegram subparser
    annotate            Annotation parser help
    statistic           Statistic parser help

optional arguments:
  -h, --help            show this help message and exit
  -l {en,sv}, --language {en,sv}
                        choose the language for annotation
  -i INPUT_PATH, --input-path INPUT_PATH
                        The input path to files/directory where working files are stored
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The output directory where working files are stored
  --save-as {txt,xlsx,json}
                        The output format
```

```bash
swegram annotate -h
```

```console
  --normalize   Process spelling checker after tokenization and normalized
                tokens will be used for upcoming annotation actions.
  --tokenize    Process sentence segmentation and tokenization.
  --tag         Process part-of-speech tagging.
  --parse       Process syntactic dependency parsing.
  --aggregate   Aggregate all annotated texts into one file.
```

```bash
swegram statistic -h
```

```console
  --include-metadata   Include certain texts by selecting metadata.
                       For instance, "--include-metadata key1 key2:value2"
                       only selects the texts that contain key1 or
                       key2:value2 in the metadata
  --exclude-metadata   Exclude certain texts by deselecting metadata
  -u --units           Checking statistics of features given certain
                       linguistic unit(s). The following units are valid
                       to be chosen: corpus, text, paragraph, sentence
  --aspects            Checking statistics on the basis of selection of
                       certain aspect(s). The following aspects are valid
                       to be chosen: general, readability, morph, lexical,
                       syntactic
  --include-features   Only certain features will be included
  --exclude-features   Certain features will be excluded
  --print              Flag to print the result on console
```
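The `--include-metadata` help text describes an "any selector matches" rule: a text is kept if it contains `key1` or `key2:value2`. The following is a minimal Python sketch of that rule only. It assumes each text's metadata can be represented as a flat dictionary of strings; the function name `matches_metadata` and the sample file names are hypothetical and are not part of Swegram's API.

```python
from typing import Dict, Iterable


def matches_metadata(metadata: Dict[str, str], selectors: Iterable[str]) -> bool:
    """Return True if at least one selector matches the metadata.

    A selector is either "key" (the key must be present) or
    "key:value" (the key must be present with exactly that value).
    """
    for selector in selectors:
        key, sep, value = selector.partition(":")
        if key not in metadata:
            continue
        # A bare key matches on presence; "key:value" also compares the value.
        if not sep or metadata[key] == value:
            return True
    return False


# Hypothetical texts with their metadata (assumed representation).
texts = {
    "a.txt": {"key1": "x"},
    "b.txt": {"key2": "value2"},
    "c.txt": {"key2": "other"},
}
selected = [name for name, meta in texts.items()
            if matches_metadata(meta, ["key1", "key2:value2"])]
print(selected)
```

Here `a.txt` is selected because it has `key1`, `b.txt` because it has `key2:value2`, and `c.txt` is skipped because its `key2` carries a different value.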

## Run annotate and statistic actions with swegram

For example, to annotate a text file called “10-sv.txt” in the existing resource folder “resources/corpus/raw” and write the final CoNLL file to a folder called output-folder, run the following command:

```bash
swegram --language sv --input-path resources/corpus/raw/10-sv.txt --output-dir output-folder annotate
```
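When the annotate command needs to be issued from a script, it can help to assemble the argument list programmatically. The sketch below only builds and prints the command; it uses the flags listed in the help output above, while the helper name `build_annotate_command` and its `steps` parameter are hypothetical conveniences, not part of Swegram.

```python
import shlex
from typing import List, Sequence

# Annotation flags taken from the `swegram annotate -h` output.
ANNOTATE_STEPS = ("normalize", "tokenize", "tag", "parse", "aggregate")


def build_annotate_command(language: str, input_path: str, output_dir: str,
                           steps: Sequence[str] = ()) -> List[str]:
    """Build the argv list for a `swegram ... annotate` call."""
    if language not in ("en", "sv"):
        raise ValueError(f"unsupported language: {language!r}")
    unknown = [step for step in steps if step not in ANNOTATE_STEPS]
    if unknown:
        raise ValueError(f"unknown annotation steps: {unknown}")
    command = [
        "swegram",
        "--language", language,
        "--input-path", input_path,
        "--output-dir", output_dir,
        "annotate",
    ]
    # Optional step flags go after the subcommand, e.g. --tokenize --tag.
    command += [f"--{step}" for step in steps]
    return command


command = build_annotate_command(
    "sv", "resources/corpus/raw/10-sv.txt", "output-folder"
)
print(shlex.join(command))
```

With swegram installed, the resulting list could be passed to `subprocess.run(command, check=True)`; the sketch stops short of executing anything.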

If you have executed the command above and have the annotated file in the folder named output-folder, you can use the following command to analyze the annotated text(s) and get statistics.

Tip: first remove all metafiles from the output folder, that is, all files that do not end with ".conll":

```bash
rm output-folder/*.tok output-folder/*.tag output-folder/*.txt
```

Now run the following command:

```bash
swegram --language sv --input-path output-folder statistic
```
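The cleanup step above (keeping only the files that end with ".conll") can also be sketched in Python, which avoids listing every metafile extension by hand. This is a minimal sketch, not part of Swegram; the function name `remove_non_conll` and its `dry_run` parameter are hypothetical.

```python
from pathlib import Path
from typing import List


def remove_non_conll(folder: str, dry_run: bool = True) -> List[Path]:
    """Find regular files in `folder` whose suffix is not ".conll".

    With dry_run=True (the default) the files are only reported;
    with dry_run=False they are deleted as well.
    """
    leftovers = sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix != ".conll"
    )
    if not dry_run:
        for path in leftovers:
            path.unlink()
    return leftovers
```

Calling `remove_non_conll(folder)` on the output folder first lists what would be removed; passing `dry_run=False` then deletes those files before the statistic command is run.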

## Dependencies

[udpipe](https://ufal.mff.cuni.cz/udpipe/1/install)

```
g++ 4.7 or newer, clang 3.2 or newer, Visual C++ 2015 or newer
make
SWIG 3.0.8 or newer for language bindings other than C++
```

[efselab](https://github.com/robertostling/efselab)

[pandoc](https://pandoc.org)


Project details

Release history

| Version | Released | Notes |
| --- | --- | --- |
| 2.1.4 | Mar 18, 2024 | |
| 2.1.3 | Mar 17, 2024 | |
| 2.1.2 | Mar 17, 2024 | |
| 2.1.0 | Mar 12, 2024 | |
| 2.0.1 | Feb 20, 2024 | |
| 2.0.0 | Feb 19, 2024 | |
| 1.0.7 | Feb 17, 2024 | |
| 1.0.6 | Nov 24, 2023 | |
| 1.0.6.dev1 | Nov 23, 2023 | pre-release |
| 1.0.6.dev0 | Nov 23, 2023 | pre-release |
| 1.0.5 | Nov 15, 2023 | |
| 1.0.4 | Nov 15, 2023 | This version |
| 1.0.3 | Nov 13, 2023 | |
| 1.0.2 | Jun 3, 2023 | |
| 1.0.1 | Jun 3, 2023 | |
| 1.0.0 | Jun 3, 2023 | |
| 1.0.0.dev12 | Jun 2, 2023 | pre-release |
| 1.0.0.dev11 | Jun 2, 2023 | pre-release |
| 1.0.0.dev10 | Jun 2, 2023 | pre-release |
| 1.0.0.dev9 | Jun 2, 2023 | pre-release |
| 1.0.0.dev8 | Jun 2, 2023 | pre-release |
| 1.0.0.dev7 | Jun 2, 2023 | pre-release |
| 1.0.0.dev6 | Jun 2, 2023 | pre-release |
| 1.0.0.dev5 | Jun 1, 2023 | pre-release |
| 1.0.0.dev4 | May 14, 2023 | pre-release |
| 1.0.0.dev3 | May 14, 2023 | pre-release |
| 1.0.0.dev2 | May 14, 2023 | pre-release |
| 1.0.0.dev1 | May 14, 2023 | pre-release |
| 1.0.0.dev0 | May 1, 2023 | pre-release |

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

swegram-1.0.4-py3-none-any.whl (58.8 MB)

Uploaded Nov 15, 2023 Python 3

File details

Details for the file swegram-1.0.4-py3-none-any.whl.

File metadata

Download URL: swegram-1.0.4-py3-none-any.whl
Upload date: Nov 15, 2023
Size: 58.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for swegram-1.0.4-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `f4a279d8efc93fb0f1333818ac97b1451e5078c68570057d0c84d1591e8498de` |
| MD5 | `7ed54a2ed54f9e9f36c32a48be22b022` |
| BLAKE2b-256 | `a80ed7b2279c615ae51a39626b08e7c2f2ec9bb9ec900011d59f53ab19442cf6` |