Skip to main content

A python package for processing grobid output.

Project description

Python Static Badge PyPI Github Workflow Tests Status GitHub commit activity PyPI Downloads Github Created At GitHub License PRs Welcome

The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs.

@mastersthesis{lu2024unsupervised,
  title={Unsupervised Paper2Slides Generation},
  author={Lu, Zehao},
  year={2024}
}

grobidmonkey is a light weight python package built to handle TEI XML files generated by GROBID. It provides a reader class that converts these files into Python dictionaries, making them simple to read and work with. The grobidmonkey reader is capable of reading the entire essay as a dictionary, where each key represents section titles and the corresponding values are lists of section contents in paragraphs. Also the reader provides a method for reading the outline of essay as a tree.

Installation

Currently grobidmonkey is only available in PyPI, and can be installed with

pip install grobidmonkey

Quick Start

from grobidmonkey import reader
monkeyReader = reader.MonkeyReader('monkey') # or 'lxml' or 'x2d'

# read paper outline
outline = monkeyReader.readOutline('/path/to/your/paper.pdf.tei.xml')

# read paper content
essay = monkeyReader.readEssay('/path/to/your/paper.pdf.tei.xml')

For detailed explanantion and tutorial, please check the Document page.

Contirbution

We welcome all contributions, whether they involve code, documentation, or testing, feel free to reach out to me via email at com3dian@outlook.com.

Icon

Gorbidmonkey's icon is a walking monkey.

                  $$                                                                   
           $$$$$$$$$$$$$$$$$$                              $$$$$$
       $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$                    $$$$$$$$$$                       
    $$$$$$$$                $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$             
   $$$$$$                          $$$$$$$$$$$$$$$$$$$$$$$$$$$           
  $$$$$$                                   $$$$$$$$$$$$                             
 $$$$$$                                                              
 $$$$$$                                                                   
 $$$$$$                                                                             
 $$$$$$                                                                    GROBIDMONKEY
 $$$$$$$                           $$$$$$$$$$$$$$$                     $$$$$$$$$$$$$$$$$$$$
 $$$$$$$                   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$          $$$$$$$$$$$$$$$$$$$$$$$
  $$$$$$$$          $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$    $$$$$   $$$$$$$$     $$
    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$  $$$$$$      $$  $
      $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ $$$$$$$$          $$
          $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     $$$$
              $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
               $$$$$$$$$$$$$$$$$$ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$    $$$$$$$$$$$$$$$
                 $$$$$$$$$$$$$$$$    $$$$$$$$$$$$$  $$$$$$$$$$$$$$$$$$$         $$$$$$$
                  $$$$$$$$$$$$$$$       $$$$$$$$$  $$$$$$$$$$$$$$$$$  
            $$$$$  $$$$$$$$$$$$$$                 $$$$$$$$$$$$$$$$   $$$$
        $$$$$$$$$$$  $$$$$$$$$$$$               $$$$$$$$$$$$$$$$  $$$$$$$
    $$$$$$$$$$$$$$$$ $$$$$$$$$$$$             $$$$$$$$$$$$$$   $$$$$$$$$$
 $$$$$$$$$$$$$$$$$  $$$$$$$$$$$             $$$$$$$$$$$$$    $$$$$$$$$$$
$$$$$$$$$$$        $$$$$$$$$$              $$$$$$$$$$        $$$$$$$$$$$
$$$$$$$            $$$$$$$$$               $$$$$$$$            $$$$$$$$$$
 $$$$$            $$$$$$$$$                $$$$$$$              $$$$$$$$$
 $$$$$$          $$$$$$$$                 $$$$$$$$                $$$$$$$$$
 $$$$$$          $$$$$$$$                $$$$$$$$                 $$$$$$$$$
 $$$$$$          $$$$$$$$              $$$$$$$$$                    $$$$$$$$
 $$$$$           $$$$$$$$$            $$$$$$$$$                       $$$$$$$$
                  $$$$$$$$$$        $$$$$$$$$                           $$$$$$$$
                    $$$$$$$$$$$$$$$$$$$$$$                                $$$$$$$$$$$$$$$$$$$
                         $$$$$$$$$$$                                           $$$$$$$$$$$$$$

About GROBID

GROBID means GeneRation Of BIbliographic Data.

GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured XML/TEI encoded documents with a particular focus on technical and scientific publications.

You can also try the GROBID web app with your paper.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grobidmonkey-0.0.4.tar.gz (987.6 kB view details)

Uploaded Source

Built Distribution

grobidmonkey-0.0.4-py3-none-any.whl (10.1 kB view details)

Uploaded Python 3

File details

Details for the file grobidmonkey-0.0.4.tar.gz.

File metadata

  • Download URL: grobidmonkey-0.0.4.tar.gz
  • Upload date:
  • Size: 987.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for grobidmonkey-0.0.4.tar.gz
Algorithm Hash digest
SHA256 807ab18939620740ac28f21520645502706f34febf551f600208b44f9f880dc5
MD5 b27507bbea6cf7939d408c33c48dd95a
BLAKE2b-256 6f8d821f2603ebd3f13ddaf1b7584901c10f18c71a84c1f47968ca2b3675f745

See more details on using hashes here.

File details

Details for the file grobidmonkey-0.0.4-py3-none-any.whl.

File metadata

File hashes

Hashes for grobidmonkey-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 34aae63ad45ffad99f7644103ac2dbc0c4698581344d3bceb0a56943ae5cb992
MD5 23ab65c7c430ac695cbc67548c0f447d
BLAKE2b-256 110bbb072d38f239bfc62177ff2cb35607b68b74b05a609ce604a7be205955f6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page