ufal.parsito

ufal.parsito is a Python binding to the Parsito library <http://ufal.mff.cuni.cz/parsito>.

The bindings are a straightforward conversion of the C++ bindings API. In Python 2, input strings may be either unicode or UTF-8 encoded str, and the library always returns unicode. In Python 3, all strings must be str.
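
The string rule above can be illustrated with a small normalization helper. This is only a sketch and not part of ufal.parsito: `to_text` and `text_type` are hypothetical names, and the helper works purely on the Python side, before a value is handed to the bindings. It passes text through unchanged and decodes byte strings as UTF-8, so calling code can treat every value as text under either Python version.

```python
import sys

# Text type of the running interpreter: `unicode` on Python 2, `str` on Python 3.
# The conditional only evaluates the name `unicode` when running on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding byte strings with `encoding`.

    Hypothetical helper for illustration; it is not provided by ufal.parsito.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)
```

A caller could apply `to_text` to anything read from files or sockets before passing it on, for example to a `setText` call.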

Wrapped C++ API

The C++ API being wrapped follows. For an API reference of the original C++ API, see <http://ufal.mff.cuni.cz/parsito/api-reference>.

Helper Structures
-----------------

  typedef vector<int> Children;

  class Node {
   public:
    int id;         // 0 is root, >0 is sentence node, <0 is undefined
    string form;    // form
    string lemma;   // lemma
    string upostag; // universal part-of-speech tag
    string xpostag; // language-specific part-of-speech tag
    string feats;   // list of morphological features
    int head;       // head, 0 is root, <0 is without parent
    string deprel;  // dependency relation to the head
    string deps;    // secondary dependencies
    string misc;    // miscellaneous information

    Children children;

    Node(int id = -1, string form = string());
  };
  typedef vector<Node> Nodes;
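
To show how these fields fit together, here is a minimal sketch that renders a dependency tree as indented text. `render_tree` is a hypothetical helper, not part of the bindings. It assumes that the node sequence is indexed by node id (so the root sits at index 0) and that `children` holds the ids of a node's dependents; both are readings of the declarations above rather than documented guarantees.

```python
def render_tree(nodes, node_id=0, depth=0):
    """Return indented lines describing the subtree rooted at `node_id`.

    `nodes` is any sequence whose items expose the `id`, `form`, `deprel`
    and `children` fields listed above. Assumption: nodes[i].id == i.
    """
    node = nodes[node_id]
    lines = ["%s%d %s (%s)" % ("  " * depth, node.id, node.form, node.deprel)]
    # Recurse into the dependents, one indentation level deeper.
    for child_id in node.children:
        lines.extend(render_tree(nodes, child_id, depth + 1))
    return lines
```

With the bindings installed, this would presumably be called as `print("\n".join(render_tree(tree.nodes)))` on a tree that already has heads assigned.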


Main Classes
------------

  class Tree {
   public:
    Tree();

    Nodes nodes;

    bool empty();
    void clear();
    Node& addNode(string form);
    void setHead(int id, int head, string deprel);
    void unlinkAllNodes();

    static const string root_form;
  };

  class TreeInputFormat {
   public:
    virtual void setText(string text);
    virtual bool nextTree(Tree& t) = 0;
    string lastError() const;

    // Static factory methods
    static TreeInputFormat* newInputFormat(string name);
    static TreeInputFormat* newConlluInputFormat();
  };

  class TreeOutputFormat {
   public:
    virtual string writeTree(const Tree& t, const TreeInputFormat* additional_info = nullptr);

    // Static factory methods
    static TreeOutputFormat* newOutputFormat(string name);
    static TreeOutputFormat* newConlluOutputFormat();
  };

  class Parser {
   public:
    virtual void parse(Tree& t, unsigned beam_size = 0) const;

    enum { NO_CACHE = 0, FULL_CACHE = 2147483647 };
    static Parser* load(string file, unsigned cache = 1000);
  };

  class Version {
   public:
    unsigned major;
    unsigned minor;
    unsigned patch;
    string prerelease;

    static Version current();
  };
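
The Tree and Version declarations above can be exercised with two small helpers. This is a sketch under stated assumptions: `build_tree` and `format_version` are hypothetical names, the functions rely only on the members declared above (`addNode`, `setHead`, `major`, `minor`, `patch`, `prerelease`), and the `-` separator before the prerelease part is a formatting choice made here, not something the library prescribes.

```python
def build_tree(tree, forms, heads, deprels):
    """Add one node per form to `tree`, then attach every node to its head.

    `tree` must provide addNode(form) and setHead(id, head, deprel) as
    declared for Tree above. `heads` holds node ids, with 0 meaning the root.
    """
    if not (len(forms) == len(heads) == len(deprels)):
        raise ValueError("forms, heads and deprels must have equal length")
    # Read each id right away instead of keeping the returned node objects.
    ids = [tree.addNode(form).id for form in forms]
    # Heads are set only after all nodes exist, so a head may follow its dependent.
    for node_id, head, deprel in zip(ids, heads, deprels):
        tree.setHead(node_id, head, deprel)
    return tree


def format_version(version):
    """Format a Version-like object as 'major.minor.patch[-prerelease]'."""
    text = "%d.%d.%d" % (version.major, version.minor, version.patch)
    if version.prerelease:
        text += "-" + version.prerelease
    return text
```

With the bindings installed, the calls would look like `build_tree(Tree(), ["Dogs", "bark"], [2, 0], ["nsubj", "root"])` and `format_version(Version.current())`.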

Examples

run_parsito

Simple parsing example:

import sys

from ufal.parsito import *

# In Python 2, wrap sys.stdin and sys.stdout to work with unicode.
if sys.version_info[0] < 3:
  import codecs
  import locale
  encoding = locale.getpreferredencoding()
  sys.stdin = codecs.getreader(encoding)(sys.stdin)
  sys.stdout = codecs.getwriter(encoding)(sys.stdout)

if len(sys.argv) == 1:
  sys.stderr.write('Usage: %s parser_file\n' % sys.argv[0])
  sys.exit(1)

sys.stderr.write('Loading parser: ')
parser = Parser.load(sys.argv[1])
if not parser:
  sys.stderr.write("Cannot load parser from file '%s'\n" % sys.argv[1])
  sys.exit(1)
sys.stderr.write('done\n')

conlluInput = TreeInputFormat.newInputFormat("conllu")
conlluOutput = TreeOutputFormat.newOutputFormat("conllu")
tree = Tree()

not_eof = True
while not_eof:
  text = ''

  # Read block
  while True:
    line = sys.stdin.readline()
    not_eof = bool(line)
    if not not_eof: break
    line = line.rstrip('\r\n')
    text += line
    text += '\n'
    if not line: break


  # Parse
  conlluInput.setText(text)
  while conlluInput.nextTree(tree):
    parser.parse(tree)

    output = conlluOutput.writeTree(tree, conlluInput)
    sys.stdout.write(output)
  if conlluInput.lastError():
    sys.stderr.write("Cannot read input CoNLL-U: ")
    sys.stderr.write(conlluInput.lastError())
    sys.stderr.write("\n")
    sys.exit(1)
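
The block-reading part of the loop above can also be factored into a generator. The following is a sketch of that idea, not code from the Parsito distribution: `read_blocks` is a hypothetical name, and it accepts any iterable of lines, such as `sys.stdin` or an open file. Unlike the loop above, it never yields an empty block at end of input.

```python
def read_blocks(stream):
    """Yield blank-line-terminated blocks of text from an iterable of lines.

    Each yielded block includes the blank line that closed it, and every
    line ending is normalized to a single newline character.
    """
    lines = []
    for line in stream:
        line = line.rstrip("\r\n")
        lines.append(line + "\n")
        if not line:  # a blank line closes the current block
            yield "".join(lines)
            lines = []
    if lines:  # final block that was not closed by a blank line
        yield "".join(lines)
```

Each yielded block could then be passed to `conlluInput.setText(block)` and processed with the same `nextTree` loop as in the example.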

AUTHORS

Milan Straka <straka@ufal.mff.cuni.cz>

Release History

1.1.0.1 (this version)
1.0.0.1
Download Files

ufal.parsito-1.1.0.1.tar.gz (114.7 kB), source distribution, uploaded Jan 4, 2016
