spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: https://spacy.io/

spaCy is built on the very latest research, but it isn’t researchware. It was designed from day 1 to be used in real products. It’s commercial open-source software, released under the MIT license.

Features

  • Labelled dependency parsing (91.8% accuracy on OntoNotes 5)

  • Named entity recognition (82.6% accuracy on OntoNotes 5)

  • Part-of-speech tagging (97.1% accuracy on OntoNotes 5)

  • Easy-to-use word vectors

  • All strings mapped to integer IDs

  • Export to numpy data arrays

  • Alignment maintained to the original string, which makes it easy to calculate mark-up

  • Range of easy-to-use orthographic features

  • No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
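
To illustrate why keeping tokens aligned to the original string makes mark-up easy, here is a minimal sketch in plain Python. It does not use spaCy and is not spaCy's implementation: `tokenize_with_offsets` and `mark_up` are hypothetical helpers, and splitting on whitespace stands in for a real tokenizer. The point is only that when each token carries its character offsets, the raw text (double spaces, newlines and all) never has to be reconstructed.

```python
import re


def tokenize_with_offsets(text):
    """Return (start, end) character offsets for each whitespace-delimited token.

    Assumption: whitespace splitting is a stand-in for a real tokenizer.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def mark_up(text, spans, open_tag="<b>", close_tag="</b>"):
    """Wrap each (start, end) span in tags, leaving all other characters untouched.

    Assumption: the spans do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(open_tag + text[start:end] + close_tag)
        cursor = end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)


text = "Hello  world,\nthis is raw text."
offsets = tokenize_with_offsets(text)
tokens = [text[start:end] for start, end in offsets]

# Highlight the second token; the double space and the newline survive as-is.
highlighted = mark_up(text, [offsets[1]])
```

Because the offsets index into the original string, marking up zero spans returns the input unchanged, and no whitespace is lost or normalised along the way.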

Top Performance

  • Fastest in the world: <50ms per document. No faster system has ever been announced.

  • Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Supports

  • CPython 2.6, 2.7, 3.3, 3.4, 3.5 (64-bit only)

  • OS X

  • Linux

  • Windows (Cygwin, MinGW, Visual Studio)

2016-05-0 0.101.0: Fixed German model

  • Fixed bug that prevented German parses from being deprojectivised.

  • Bug fixes to sentence boundary detection.

  • Add rich comparison methods to the Lexeme class.

  • Add missing Doc.has_vector and Span.has_vector properties.

  • Add missing Span.sent property.

2016-05-05 v0.100.7: German!

spaCy finally supports another language, in addition to English. We’re lucky to have Wolfgang Seeker on the team, and the new German model is just the beginning. Now that there are multiple languages, you should consider loading spaCy via the load() function. This function also makes it easier to load extra word vector data for English:

import spacy
en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
de_nlp = spacy.load('de')

To support use of the load function, there are also two new helper functions: spacy.get_lang_class and spacy.set_lang_class. Once the German model is loaded, you can use it just like the English model:

doc = de_nlp(u'''Wikipedia ist ein Projekt zum Aufbau einer Enzyklopädie aus freien Inhalten, zu dem du mit deinem Wissen beitragen kannst. Seit Mai 2001 sind 1.936.257 Artikel in deutscher Sprache entstanden.''')

for sent in doc.sents:
    print(sent.root.text, sent.root.n_lefts, sent.root.n_rights)

# (u'ist', 1, 2)
# (u'sind', 1, 3)

The German model provides tokenization, POS tagging, sentence boundary detection, syntactic dependency parsing, recognition of organisation, location and person entities, and word vector representations trained on a mix of open subtitles and Wikipedia data. It doesn’t yet provide lemmatisation or morphological analysis, and it doesn’t yet recognise numeric entities such as numbers and dates.
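
As a rough illustration of how a load() function can dispatch on a language code with the help of functions like get_lang_class and set_lang_class, here is a toy registry in plain Python. This is a sketch under stated assumptions, not spaCy's source: the function bodies, their signatures, the `_LANGUAGES` dict, and the `ToyEnglish` and `ToyGerman` classes are all hypothetical stand-ins.

```python
# Hypothetical registry mapping language codes to pipeline classes.
_LANGUAGES = {}


def set_lang_class(name, cls):
    """Register `cls` as the pipeline class for language code `name`."""
    _LANGUAGES[name] = cls


def get_lang_class(name):
    """Look up the class registered for `name`, or raise ValueError."""
    try:
        return _LANGUAGES[name]
    except KeyError:
        raise ValueError("No language class registered for %r" % name)


def load(name, **overrides):
    """Instantiate the registered class, passing keyword overrides through."""
    cls = get_lang_class(name)
    return cls(**overrides)


class ToyEnglish(object):
    """Stand-in for an English pipeline; only stores the vectors name."""
    lang = "en"

    def __init__(self, vectors=None):
        self.vectors = vectors


class ToyGerman(object):
    """Stand-in for a German pipeline."""
    lang = "de"

    def __init__(self, vectors=None):
        self.vectors = vectors


set_lang_class(ToyEnglish.lang, ToyEnglish)
set_lang_class(ToyGerman.lang, ToyGerman)

en_nlp = load("en", vectors="en_glove_cc_300_1m_vectors")
de_nlp = load("de")
```

With a registry like this, supporting a new language amounts to registering one more class; callers keep using the same load() call with a different code.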

Bugfixes

  • spaCy < 0.100.7 had a bug in the semantics of the Token.__str__ and Token.__unicode__ built-ins: they included a trailing space.

  • Improve handling of “infixed” hyphens. Previously the tokenizer struggled with multiple hyphens, such as “well-to-do”.

  • Improve handling of periods after mixed-case tokens.

  • Improve lemmatization for English special-case tokens.

  • Fix bug that allowed spaces to be treated as heads in the syntactic parse.

  • Fix bug that led to inconsistent sentence boundaries before and after serialisation.

  • Fix bug from deserialising untagged documents.
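
To make the “infixed” hyphen case above concrete, here is a minimal sketch of one way to split a word on hyphens that sit between letters, so that “well-to-do” is handled no matter how many hyphens it contains. It is a regular-expression illustration only, not spaCy's tokenizer rules; `split_infix_hyphens` is a hypothetical helper, and treating only ASCII letters as hyphen neighbours is a simplifying assumption.

```python
import re

# Assumption: an "infix" hyphen is one with an ASCII letter on both sides.
# The capturing group keeps each hyphen as its own token in the output.
_INFIX_HYPHEN = re.compile(r"(?<=[A-Za-z])(-)(?=[A-Za-z])")


def split_infix_hyphens(word):
    """Split `word` on infix hyphens, keeping the hyphens as separate tokens."""
    return _INFIX_HYPHEN.split(word)


print(split_infix_hyphens("well-to-do"))  # ['well', '-', 'to', '-', 'do']
```

A leading hyphen, as in "-5", has no letter before it and is therefore left attached to the token.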
