A Sparv plugin for classifying text using the Superlim baseline models

Sparv-Superlim: A plugin for classifying text using the models trained on tasks in Superlim

Sparv-Superlim is a Sparv plugin for classifying text using the Superlim baseline models. Superlim is a multi-task benchmark for Swedish, which includes baseline models.

How to use?

Install Sparv-Superlim by injecting it into the Sparv Pipeline:

pipx inject sparv-pipeline git@github.com:spraakbanken/sparv-sbx-superlim.git

See the Sparv documentation for more details on how to install plugins.

Then make a config file and choose the relevant annotations:

metadata:
  id: corpora
  name:
    eng: corpora
    swe: korpora
  language: swe
  description:
    eng: Swedish political manifestos with Superlim annotations
    swe: Svenska valmanifest med Superlimannoteringar
import:
  source_dir: source_small
  importer: text_import:parse
export:
  default:
    - xml_export:pretty
    - sbx_superlim:predictions
  annotations:
    - <token>
    - <sentence>
    - <sentence>:sbx_superlim.migration_stance
    - <sentence>:sbx_superlim.nuclear_stance
sbx_superlim:
  hf_model_path:
    absabank-imm: 'sbx/bert-base-swedish-cased_absabank-imm'
    argumentation: 'sbx/bert-base-swedish-cased-argumentation_sent'
  hf_inference_args:
    batch_size: 32

Plugin-specific variables that start with hf are HuggingFace parameters. The most important one is hf_model_path, which specifies the fine-tuned model to use for each task.
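As a rough illustration of how the sbx_superlim section is structured, the sketch below mirrors it as a plain Python dict and runs a few sanity checks on it. The helper check_superlim_config is hypothetical and not part of the plugin; the checks (non-empty model paths, a positive integer batch_size) are assumptions about what a sensible config looks like, not rules enforced by Sparv-Superlim.

```python
# Plain-dict mirror of the sbx_superlim section from the config above.
SUPERLIM_CONFIG = {
    "hf_model_path": {
        "absabank-imm": "sbx/bert-base-swedish-cased_absabank-imm",
        "argumentation": "sbx/bert-base-swedish-cased-argumentation_sent",
    },
    "hf_inference_args": {"batch_size": 32},
}


def check_superlim_config(section):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    # hf_model_path maps each task name to a fine-tuned model path.
    paths = section.get("hf_model_path", {})
    if not paths:
        problems.append("hf_model_path is missing or empty")
    for task, path in paths.items():
        if not isinstance(path, str) or not path.strip():
            problems.append(f"{task}: model path must be a non-empty string")

    # Assumed constraint: batch_size, if given, is a positive integer.
    batch_size = section.get("hf_inference_args", {}).get("batch_size")
    if batch_size is not None:
        is_int = isinstance(batch_size, int) and not isinstance(batch_size, bool)
        if not is_int or batch_size < 1:
            problems.append("batch_size must be a positive integer")

    return problems


print(check_superlim_config(SUPERLIM_CONFIG))  # prints []
```

A check like this can be run before starting a long annotation job, so that a missing or empty model path is caught early.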

Full working examples can be found in the examples folder.

Available annotations

So far, Sparv-Superlim provides 10 different annotations. These are summarized in the table below:

| Superlim task | Sparv-Superlim Annotation | Annotation | Label | Segment |
| --- | --- | --- | --- | --- |
| absabank-imm | migration_stance | Attitude towards immigration | float between 1-5 | sentence |
| argumentation-sentences | [topic]_stance | Stance to a given topic | pro, con or neutral | sentence |
| dalaj-ged | correct_swedish | Correct Swedish | correct or incorrect | sentence |
| swenli | previous_entailment | The logical relationship of two sentences | entailment, contradiction or neutral | sentence pair |
| sweparaphrase | similarity | Similarity between two sentences | float between 1-5 | sentence pair |
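The [topic]_stance row stands for one annotation per topic, as in the nuclear_stance entry of the config example. The sketch below encodes the table as data and builds export entries from it. The Annotation class and the export_name helper are hypothetical, and the `<sentence>:sbx_superlim.` prefix is copied from the config example; whether sentence-pair annotations use the same prefix is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    task: str      # Superlim task
    name: str      # Sparv-Superlim annotation name
    label: str     # kind of label produced
    segment: str   # text unit the label applies to


# Rows copied from the table above.
ANNOTATIONS = [
    Annotation("absabank-imm", "migration_stance", "float between 1-5", "sentence"),
    Annotation("argumentation-sentences", "[topic]_stance", "pro, con or neutral", "sentence"),
    Annotation("dalaj-ged", "correct_swedish", "correct or incorrect", "sentence"),
    Annotation("swenli", "previous_entailment", "entailment, contradiction or neutral", "sentence pair"),
    Annotation("sweparaphrase", "similarity", "float between 1-5", "sentence pair"),
]


def export_name(annotation, topic=None):
    """Hypothetical helper: build an entry for the export annotations list."""
    name = annotation.name
    if "[topic]" in name:
        if not topic:
            raise ValueError(f"{annotation.task} needs a topic, e.g. 'nuclear'")
        name = name.replace("[topic]", topic)
    # Prefix taken from the config example; assumed to hold for every row.
    return f"<sentence>:sbx_superlim.{name}"


print(export_name(ANNOTATIONS[0]))             # <sentence>:sbx_superlim.migration_stance
print(export_name(ANNOTATIONS[1], "nuclear"))  # <sentence>:sbx_superlim.nuclear_stance
```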

Wish to contribute?

Do you have new, innovative ways of incorporating models trained on Superlim into Sparv-Superlim? Make a feature request or, even better, a pull request!

How to cite?

Please cite the following technical report: Felix Morger. 2024. When Sparv met Superlim…A Sparv plugin for natural language understanding analysis of Swedish. Tech. rep. University of Gothenburg. You can also use the BibTeX entry below.

@techreport{sparv-superlim,
  title =	 {When {S}parv met {S}uperlim\ldots {A} {S}parv Plugin for Natural Language Understanding Analysis of {S}wedish},
  author =	 {Morger, Felix},
  url = {https://hdl.handle.net/2077/83664},
  year =	 {2024},
  publisher = {Språkbanken Text},
  institution =	 {University of Gothenburg},
}
