
Low-resource sampler of contexts with relations, for fact-checking and fine-tuning your LLMs, powered by AREkit

Project description

arekit-ss 0.25.0

📜 List of supported sources

arekit-ss (AREkit with a double "s") is an object-pair context sampler for data sources, powered by AREkit.

NOTE: For sampling custom texts, please use the ARElight project.

Installation

Install dependencies:

pip install git+https://github.com/nicolay-r/arekit-ss.git@0.25.0

Download resources:

python -m arekit_ss.download_data

Usage

Example of composing prompts:

python -m arekit_ss.sample --writer csv --source rusentrel --sampler prompt \
  --prompt "For text: '{text}', the attitude between '{s_val}' and '{t_val}' is: '{label_val}'" \
  --dest_lang en --docs_limit 1
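The `--prompt` value above is a template with four named placeholders. Assuming they are filled the way Python's `str.format` fills named fields (an assumption; the sampler's internal rendering is not shown on this page), a single sample would come out roughly as in this sketch. The helper `render_prompt` and the sample values are hypothetical.

```python
# Template copied from the --prompt argument in the command above.
PROMPT = ("For text: '{text}', the attitude between '{s_val}' and '{t_val}' "
          "is: '{label_val}'")


def render_prompt(template: str, text: str, s_val: str, t_val: str, label_val: str) -> str:
    """Fill the four documented placeholders, assuming str.format-style substitution."""
    return template.format(text=text, s_val=s_val, t_val=t_val, label_val=label_val)


# Hypothetical sample; real values come from the chosen source (e.g. rusentrel).
print(render_prompt(PROMPT, "The ministry praised the company.",
                    "ministry", "company", "positive"))
```

Substituted values are not re-parsed by `str.format`, so braces inside the sample text itself are harmless.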

Note (issue #18): translating the text into another language may change the amount of extracted data, because the terms_per_context parameter crops each context to a fixed, predefined number of words.
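To see why translation can reduce the output, consider a simplified model of the window check. The sketch below assumes a pair is kept only when the inclusive span from the source term to the target term fits into `terms_per_context`; this is an illustration, not AREkit's actual cropping code, and `pair_fits` as well as the example sentence are hypothetical. The English translation needs one extra word, which pushes the target object out of a 4-term window.

```python
from typing import List


def pair_fits(terms: List[str], source: str, target: str, terms_per_context: int) -> bool:
    """Simplified assumption, not AREkit's implementation: a pair survives
    cropping only if the inclusive span from source to target fits the window."""
    s_ind, t_ind = terms.index(source), terms.index(target)
    return abs(t_ind - s_ind) + 1 <= terms_per_context


# Hypothetical sentence before and after translation (ru -> en).
ru_terms = "Москва резко раскритиковала Вашингтон".split()
en_terms = "Moscow has sharply criticized Washington".split()

print(pair_fits(ru_terms, "Москва", "Вашингтон", terms_per_context=4))   # True
print(pair_fits(en_terms, "Moscow", "Washington", terms_per_context=4))  # False
```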

Parameters

  • source -- name of the source from the list of supported sources.
    • terms_per_context -- number of words (terms) between the SOURCE and TARGET objects.
    • object-source-types -- filter specific source object types.
    • object-target-types -- filter specific target object types.
    • relation_types -- list of relation types separated by the | character; all types by default.
    • splits -- manual selection of the data splits to use for sampling; separate split types with ':', for example 'train:test'.
  • sampler -- List of the supported samplers:
    • nn -- for CNN/LSTM architectures, including frame annotations from RuSentiFrames.
      • no-vectorize -- applies only to nn; skips generating embeddings for features.
    • bert -- BERT-based, single-input sequence.
    • prompt -- prompt-based sampler for LLM systems [prompt engineering guide]
      • prompt -- the prompt template, which may include the following placeholders:
        • {text} -- the original text of the sample
        • {s_val} and {t_val} -- the values of the source and target of the pair, respectively
        • {label_val} -- the label value
  • writer -- the output format of samples:
    • csv -- for the AREnets framework;
    • jsonl -- for the OpenNRE framework;
    • sqlite -- SQLite 3 database.
  • mask_entities -- mask entity mode.
  • Text translation parameters:
    • src_lang -- original language of the text.
    • dest_lang -- target language of the text.
  • output_dir -- target directory for storing samples.
  • Limiting the number of documents taken from the source:
    • docs_limit -- maximum number of documents from the source to consider for sampling.
    • doc_ids -- list of document IDs.
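The list-valued parameters use two different separators. The hypothetical helpers below only mirror the documented conventions (`|` for relation_types, `:` for splits) and are not the parser used by arekit-ss; the relation type names in the example are made up.

```python
from typing import List, Optional


def parse_relation_types(value: Optional[str]) -> Optional[List[str]]:
    """Split a '|'-separated list; None stands for 'all types' (the documented default)."""
    if not value:
        return None
    return [item.strip() for item in value.split("|") if item.strip()]


def parse_splits(value: str) -> List[str]:
    """Split a ':'-separated list of data splits, e.g. 'train:test'."""
    return [item.strip() for item in value.split(":") if item.strip()]


print(parse_splits("train:test"))       # ['train', 'test']
print(parse_relation_types("pos|neg"))  # ['pos', 'neg'] (hypothetical type names)
print(parse_relation_types(None))       # None, i.e. all relation types
```

When passing a |-separated value on the command line, quote it, since an unquoted | is interpreted by the shell as a pipe.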


Powered by AREkit



Download files


Source Distribution

arekit_ss-0.25.0.tar.gz (70.5 kB)

Uploaded Source

Built Distribution


arekit_ss-0.25.0-py3-none-any.whl (116.0 kB)

Uploaded Python 3

File details

Details for the file arekit_ss-0.25.0.tar.gz.

File metadata

  • Download URL: arekit_ss-0.25.0.tar.gz
  • Upload date:
  • Size: 70.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.5

File hashes

Hashes for arekit_ss-0.25.0.tar.gz
  • SHA256: 5b39768c7dfc7a9d50a487e6ddf96a78db83b71c9bf295eb6b1fe695445d6e12
  • MD5: 46b7b53e7705783193fe8638ceda6f4b
  • BLAKE2b-256: 0dd545a9b844d77fbbaf48afb0ad882090d7e235dd6024eef7bc7cfd0a426ebf

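To check a downloaded archive against the published digest, hash it locally and compare. This is a minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and the expected value is the SHA256 digest listed above for arekit_ss-0.25.0.tar.gz.

```python
import hashlib

# SHA256 digest published for arekit_ss-0.25.0.tar.gz.
EXPECTED_SHA256 = "5b39768c7dfc7a9d50a487e6ddf96a78db83b71c9bf295eb6b1fe695445d6e12"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("arekit_ss-0.25.0.tar.gz")` should return True for an untampered download of the source distribution.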

File details

Details for the file arekit_ss-0.25.0-py3-none-any.whl.

File metadata

  • Download URL: arekit_ss-0.25.0-py3-none-any.whl
  • Upload date:
  • Size: 116.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.5

File hashes

Hashes for arekit_ss-0.25.0-py3-none-any.whl
  • SHA256: aff0011765f57533268ecf6100e668edf612ef6251f6d06025511b8029a997fb
  • MD5: 68b1a29421c2673e3ef0169ff461b8b9
  • BLAKE2b-256: bd3173d531e09c11f35be0e72619d21029c589cf2f986688cd2032d1cb1544c9

