The NTCIR Math Density Estimator package uses datasets and judgements in the NTCIR-11 Math-2 and NTCIR-12 MathIR XHTML5 format to compute density and probability estimates.

NTCIR Math Density Estimator – Estimates relevance of documents based on data from NTCIR Math tasks

NTCIR Math Density Estimator is a Python 3 command-line utility that uses datasets and judgements in the NTCIR-11 Math-2 and NTCIR-12 MathIR XHTML5 format to compute density and probability estimates. Most importantly, the package estimates the probability P(relevant | position), where position is the position of a paragraph within a document.
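
The program output shown under "Extracting estimates" names two fitted densities, a prior p(position) and a conditional p(position | relevant). One way to relate them to the target probability is Bayes' theorem, written out below. This is a sketch of the relation only: the description does not say how the prior P(relevant) is obtained, so that term should be read as an assumption.

```latex
P(\mathrm{relevant} \mid \mathrm{position})
  = \frac{p(\mathrm{position} \mid \mathrm{relevant}) \; P(\mathrm{relevant})}
         {p(\mathrm{position})}
```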

Usage

Installing

The package can be installed by executing the following command:

$ pip install ntcir-math-density

Displaying the usage

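Usage information for the package can be displayed by executing the command below. The help text calls the separator used with --judgements a semicolon, but its own example ("A:/some/path/judgement.dat") and the commands in this document use a colon.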

$ ntcir-math-density --help
usage: ntcir-math-density [-h] [--datasets DATASETS [DATASETS ...]]
                          [--judgements JUDGEMENTS [JUDGEMENTS ...]]
                          [--plots PLOTS [PLOTS ...]] [--positions POSITIONS]
                          [--estimates ESTIMATES] [--num-workers NUM_WORKERS]

Use datasets, and judgements in NTCIR-11 Math-2, and NTCIR-12 MathIR XHTML5
format to compute density, and probability estimates.

optional arguments:
-h, --help            show this help message and exit
--datasets DATASETS [DATASETS ...]
                        Paths to the directories containing the datasets. Each
                        path must be prefixed with a unique single-letter
                        label followed by an equals sign (e.g. "A=/some/path").
--judgements JUDGEMENTS [JUDGEMENTS ...]
                        Paths to the files containing relevance judgements.
                        Each path must be prefixed with a single-letter label
                        corresponding to the judged dataset followed by a
                        semicolon (e.g. "A:/some/path/judgement.dat").
--plots PLOTS [PLOTS ...]
                        The path to the files, where the probability
                        estimates will plotted. When no datasets are
                        specified, the estimates file will be loaded.
--positions POSITIONS
                        The path to the file, where the estimated positions of
                        all paragraph identifiers from all datasets will be
                        stored. Defaults to positions.pkl.gz.
--estimates ESTIMATES
                        The path to the file, where the density, and
                        probability estimates will be stored. When no
                        datasets are specified, this file will be loaded to
                        provide the estimates for plotting. Defaults to
                        estimates.pkl.gz.
--num-workers NUM_WORKERS
                        The number of processes that will be used for
                        processing the datasets, and for computing the
                        density, and probability estimates. Defaults to 1.
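
The labelled-path convention described in the help text ("A=/some/path" for datasets, "A:/some/path/judgement.dat" for judgements) can be illustrated with a small parser. The sketch below is not the package's own code: the function names `parse_labelled_path`, `parse_datasets`, and `parse_judgements` are hypothetical, and the validation rules are only what the help text implies (single-letter labels, unique per dataset, judgement labels referring to a known dataset).

```python
def parse_labelled_path(argument, separator):
    """Split 'A<separator>path' into a single-letter label and a path."""
    label, found, path = argument.partition(separator)
    if not found or len(label) != 1 or not path:
        raise ValueError("expected LABEL%sPATH, got %r" % (separator, argument))
    return label, path


def parse_datasets(arguments):
    """Map each unique dataset label to its directory (e.g. 'A=/some/path')."""
    datasets = {}
    for argument in arguments:
        label, path = parse_labelled_path(argument, "=")
        if label in datasets:
            raise ValueError("duplicate dataset label %r" % label)
        datasets[label] = path
    return datasets


def parse_judgements(arguments, datasets):
    """Group judgement files by the label of the dataset they judge."""
    judgements = {label: [] for label in datasets}
    for argument in arguments:
        label, path = parse_labelled_path(argument, ":")
        if label not in judgements:
            raise ValueError("judgement refers to unknown dataset %r" % label)
        judgements[label].append(path)
    return judgements


datasets = parse_datasets(["A=ntcir-10-converted", "B=ntcir-11-12"])
judgements = parse_judgements(
    ["A:NTCIR_10_Math-qrels_fs-converted.dat", "B:NTCIR11_Math-qrels.dat"],
    datasets,
)
```

Because the split happens at the first separator only, a path that itself contains a colon or an equals sign is kept intact.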

Extracting estimates

The following command extracts density and probability estimates and plots them using 64 worker processes:

$ ntcir-math-density --num-workers 64 \
>     --datasets A=ntcir-10-converted B=ntcir-11-12 \
>     --judgements A:NTCIR_10_Math-qrels_fs-converted.dat A:NTCIR_10_Math-qrels_ft-converted.dat \
>                  B:NTCIR11_Math-qrels.dat B:NTCIR12_Math-qrels_agg.dat \
>                  B:NTCIR12_Math_simto-qrels_agg.dat \
>     --estimates estimates.pkl.gz --positions positions.pkl.gz \
>     --plots plot.pdf plot.svg
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_fs-converted.dat
100%|█████████████████████████████████████████████████████| 2129/2129 [00:00<00:00, 334959.05it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_ft-converted.dat
100%|█████████████████████████████████████████████████████| 1425/1425 [00:00<00:00, 353201.94it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR11_Math-qrels.dat
100%|█████████████████████████████████████████████████████| 2500/2500 [00:00<00:00, 343345.12it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 4251/4251 [00:00<00:00, 342252.50it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math_simto-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 654/654 [00:00<00:00, 314428.57it/s]
Retrieving all paragraph identifiers, and positions from ntcir-10-converted
get_all_identifiers(ntcir-10-converted): 5405167it [04:30, 19946.57it/s]
get_all_positions(ntcir-10-converted): 100%|█████████| 5405167/5405167 [08:44<00:00, 10306.72it/s]
Retrieving all paragraph identifiers, and positions from ntcir-11-12
get_all_identifiers(ntcir-11-12): 8301578it [08:08, 16985.19it/s]
get_all_positions(ntcir-11-12): 100%|█████████████████| 8301578/8301578 [44:30<00:00, 3108.88it/s]
1043 / 3146 / 5405167 relevant / judged / total identifiers in dataset ntcir-10-converted
1742 / 7059 / 8301578 relevant / judged / total identifiers in dataset ntcir-11-12
Pickling positions.pkl.gz
Fitting density, and probability estimators
Fitting prior p(position) density estimator
Fitting conditional p(position | relevant) density estimator
Computing density, and probability estimates
p(position): 100%|████████████████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
p(position | relevant): 100%|█████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
Pickling estimates.pkl.gz
Plotting plot.svg
Plotting plot.pdf
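
The fitting steps reported in the output above (a prior density p(position) and a conditional density p(position | relevant)) can be illustrated with a toy kernel density estimate. The sketch below is an assumption-laden illustration, not the package's implementation: the Gaussian kernel, the bandwidth, the range of positions (0 to 1), the toy data, and the prior P(relevant) of 0.25 are all hypothetical choices made here, and the helper names are invented for this example.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density


def make_relevance_estimator(all_positions, relevant_positions,
                             prior_relevant, bandwidth=0.1):
    """Combine two density estimates into P(relevant | position)."""
    p_position = gaussian_kde(all_positions, bandwidth)
    p_position_given_relevant = gaussian_kde(relevant_positions, bandwidth)

    def p_relevant_given_position(x):
        # Bayes' theorem: p(x | relevant) * P(relevant) / p(x)
        return p_position_given_relevant(x) * prior_relevant / p_position(x)

    return p_relevant_given_position


# Hypothetical toy data: paragraphs spread evenly over a document,
# with the relevant ones concentrated near the beginning.
all_positions = [i / 20 for i in range(21)]
relevant_positions = [0.0, 0.05, 0.1, 0.15, 0.5]

estimate = make_relevance_estimator(
    all_positions, relevant_positions, prior_relevant=0.25)
early, late = estimate(0.05), estimate(0.95)
```

With this toy data the estimate is higher near the start of a document than near the end. Since the two densities are fitted independently, the ratio is not guaranteed to stay below 1, so a real implementation would need to handle that case.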

The following command extracts density and probability estimates using 64 worker processes, without plotting them:

$ ntcir-math-density --num-workers 64 \
>     --datasets A=ntcir-10-converted B=ntcir-11-12 \
>     --judgements A:NTCIR_10_Math-qrels_fs-converted.dat A:NTCIR_10_Math-qrels_ft-converted.dat \
>                  B:NTCIR11_Math-qrels.dat B:NTCIR12_Math-qrels_agg.dat \
>                  B:NTCIR12_Math_simto-qrels_agg.dat \
>     --estimates estimates.pkl.gz --positions positions.pkl.gz
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_fs-converted.dat
100%|█████████████████████████████████████████████████████| 2129/2129 [00:00<00:00, 334959.05it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_ft-converted.dat
100%|█████████████████████████████████████████████████████| 1425/1425 [00:00<00:00, 353201.94it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR11_Math-qrels.dat
100%|█████████████████████████████████████████████████████| 2500/2500 [00:00<00:00, 343345.12it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 4251/4251 [00:00<00:00, 342252.50it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math_simto-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 654/654 [00:00<00:00, 314428.57it/s]
Retrieving all paragraph identifiers, and positions from ntcir-10-converted
get_all_identifiers(ntcir-10-converted): 5405167it [04:30, 19946.57it/s]
get_all_positions(ntcir-10-converted): 100%|█████████| 5405167/5405167 [08:44<00:00, 10306.72it/s]
Retrieving all paragraph identifiers, and positions from ntcir-11-12
get_all_identifiers(ntcir-11-12): 8301578it [08:08, 16985.19it/s]
get_all_positions(ntcir-11-12): 100%|█████████████████| 8301578/8301578 [44:30<00:00, 3108.88it/s]
1043 / 3146 / 5405167 relevant / judged / total identifiers in dataset ntcir-10-converted
1742 / 7059 / 8301578 relevant / judged / total identifiers in dataset ntcir-11-12
Pickling positions.pkl.gz
Fitting density, and probability estimators
Fitting prior p(position) density estimator
Fitting conditional p(position | relevant) density estimator
Computing density, and probability estimates
p(position): 100%|████████████████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
p(position | relevant): 100%|█████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
Pickling estimates.pkl.gz
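
The file names and the "Pickling" messages suggest that positions.pkl.gz and estimates.pkl.gz are gzip-compressed pickle files. Under that assumption, such a file could be read from Python roughly as follows. The structure of the stored objects is not documented here, so the sketch round-trips a stand-in dictionary instead of the real files; `save_pickled` and `load_pickled` are hypothetical helper names.

```python
import gzip
import os
import pickle
import tempfile


def save_pickled(obj, path):
    """Write an object to a gzip-compressed pickle file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickled(path):
    """Read an object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Round trip with a stand-in object; the real files would be loaded with
# load_pickled("estimates.pkl.gz") or load_pickled("positions.pkl.gz").
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.pkl.gz")
    save_pickled({"position": 0.25}, path)
    restored = load_pickled(path)
```

Unpickling can execute arbitrary code, so this should only be done with files from a trusted source.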

The following command plots previously computed estimates loaded from estimates.pkl.gz, using 64 worker processes:

$ ntcir-math-density --num-workers 64 \
>     --estimates estimates.pkl.gz --plots plot.pdf plot.svg
Unpickling estimates.pkl.gz
Plotting plot.svg
Plotting plot.pdf

Release history

0.2.1 (this version): Jun 21, 2018
0.2.0: Jun 8, 2018
0.1.3: Jun 6, 2018
0.1.2: Jun 6, 2018
0.1.1: Jun 6, 2018

Download files

Source Distribution

ntcir_math_density-0.2.1.tar.gz (8.8 kB)

Uploaded Jun 21, 2018 Source

Built Distribution

ntcir_math_density-0.2.1-py2.py3-none-any.whl (10.6 kB)

Uploaded Jun 21, 2018 Python 2, Python 3

File details

Details for the file ntcir_math_density-0.2.1.tar.gz.

File metadata

Download URL: ntcir_math_density-0.2.1.tar.gz
Upload date: Jun 21, 2018
Size: 8.8 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for ntcir_math_density-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`5de8575681e1a2e262e64999c2ee7538e9b2a104acc1b9378e7fa6adb92bae7e`
MD5	`6413b92a46eba592b16abc8cd9a78fed`
BLAKE2b-256	`89e190d88979697571374916d3bd4e0e2e07a8180918813ff3f4ba308ff402d1`

File details

Details for the file ntcir_math_density-0.2.1-py2.py3-none-any.whl.

File metadata

Download URL: ntcir_math_density-0.2.1-py2.py3-none-any.whl
Upload date: Jun 21, 2018
Size: 10.6 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for ntcir_math_density-0.2.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`6a66439d57a7a08c16cc9c0bf3b79d5104693a4001aaf06306291836bfffb8b3`
MD5	`e9755e5c250541e590a14d0c8cdc5bda`
BLAKE2b-256	`18e02b833f6fe87f0edd88e31eab4bb246f3ef3699b808ed9ae6cc86da6b2b49`
