spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: https://spacy.io/

spaCy is built on the very latest research, but it isn’t researchware. It was designed from day 1 to be used in real products. It’s commercial open-source software, released under the MIT license.

2016-04-05 v0.100.7: German!

spaCy finally supports another language, in addition to English. We’re lucky to have Wolfgang Seeker on the team, and the new German model is just the beginning. Now that there are multiple languages, you should consider loading spaCy via the load() function. This function also makes it easier to load extra word vector data for English:

import spacy
en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
de_nlp = spacy.load('de')

To support use of the load function, there are also two new helper functions: spacy.get_lang_class and spacy.set_lang_class. Once the German model is loaded, you can use it just like the English model:

doc = de_nlp(u'''Wikipedia ist ein Projekt zum Aufbau einer Enzyklopädie aus freien Inhalten, zu dem du mit deinem Wissen beitragen kannst. Seit Mai 2001 sind 1.936.257 Artikel in deutscher Sprache entstanden.''')

for sent in doc.sents:
    print(sent.root.text, sent.root.n_lefts, sent.root.n_rights)

# (u'ist', 1, 2)
# (u'sind', 1, 3)
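
The helper names mentioned above, get_lang_class and set_lang_class, suggest a registry that maps a short language code to a class. The following is a minimal sketch of that pattern only, not spaCy's actual code: the `_LANG_CLASSES` dict, the stand-in `German` class and this simplified `load` are all assumptions made for illustration.

```python
# Hypothetical registry sketch; spaCy's real implementation may differ.
_LANG_CLASSES = {}


def set_lang_class(name, cls):
    """Register a language class under a short code such as 'de'."""
    _LANG_CLASSES[name] = cls


def get_lang_class(name):
    """Return the class registered under `name`, or fail with a clear error."""
    try:
        return _LANG_CLASSES[name]
    except KeyError:
        raise RuntimeError("Language not supported: %s" % name)


def load(name, **overrides):
    """Look up the class for `name` and instantiate it with any overrides."""
    return get_lang_class(name)(**overrides)


class German(object):
    """Stand-in for a real language class; it only records its settings."""
    lang = 'de'

    def __init__(self, vectors=None):
        self.vectors = vectors


set_lang_class(German.lang, German)
de_nlp = load('de')
```

With a registry like this, supporting another language is a matter of one extra set_lang_class call, and load() never needs to know the concrete classes in advance.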

The German model provides tokenization, POS tagging, sentence boundary detection, syntactic dependency parsing, recognition of organisation, location and person entities, and word vector representations trained on a mix of open subtitles and Wikipedia data. It doesn’t yet provide lemmatisation or morphological analysis, and it doesn’t yet recognise numeric entities such as numbers and dates.

Bugfixes

  • Fix a bug in the Token.__str__ and Token.__unicode__ built-ins: in spaCy < 0.100.7 they included a trailing space.

  • Improve handling of “infixed” hyphens. Previously the tokenizer struggled with multiple hyphens, such as “well-to-do”.

  • Improve handling of periods after mixed-case tokens.

  • Improve lemmatization for English special-case tokens.

  • Fix a bug that allowed spaces to be treated as heads in the syntactic parse.

  • Fix a bug that led to inconsistent sentence boundaries before and after serialisation.

  • Fix a bug when deserialising untagged documents.
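
To make the “infixed” hyphen case above concrete, here is a small standalone sketch using only Python's re module. It is not spaCy's tokenizer: the `split_infix_hyphens` function and its rule (split on a hyphen only when it sits between two letters, keeping the hyphen as its own piece) are assumptions chosen to illustrate how a word with several hyphens can be handled in one pass.

```python
import re

# A hyphen counts as an infix only when a letter precedes and follows it.
# [^\W\d_] matches any Unicode letter (a word character that is not a digit
# or an underscore).
_INFIX_HYPHEN = re.compile(r'(?<=[^\W\d_])-(?=[^\W\d_])')


def split_infix_hyphens(word):
    """Split `word` at every infixed hyphen, keeping the hyphens as pieces."""
    pieces = []
    start = 0
    for match in _INFIX_HYPHEN.finditer(word):
        pieces.append(word[start:match.start()])  # text before the hyphen
        pieces.append(match.group())              # the hyphen itself
        start = match.end()
    pieces.append(word[start:])                   # remaining tail
    return pieces


pieces = split_infix_hyphens('well-to-do')
```

Because every piece is a slice of the input, joining the pieces always gives back the original word, and leading or trailing hyphens (as in "-5" or "re-") are left attached.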

Features

  • Labelled dependency parsing (91.8% accuracy on OntoNotes 5)

  • Named entity recognition (82.6% accuracy on OntoNotes 5)

  • Part-of-speech tagging (97.1% accuracy on OntoNotes 5)

  • Easy-to-use word vectors

  • All strings mapped to integer IDs

  • Export to numpy data arrays

  • Alignment to the original string is maintained, which makes mark-up calculation easy

  • Range of easy-to-use orthographic features.

  • No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
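
Two of the features listed above, integer string IDs and alignment to the original string, can be illustrated with a toy example. This is a sketch under simplifying assumptions, not spaCy's data structures: `StringTable`, `tokenize` and `rebuild` are hypothetical names, tokens are split on whitespace only, and leading whitespace in the input is not handled.

```python
import re


class StringTable(object):
    """Toy interning table: each distinct string gets one integer ID."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        """Return the ID for `string`, assigning a new one if needed."""
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def __getitem__(self, string_id):
        """Map an integer ID back to its string."""
        return self._strings[string_id]


def tokenize(text, table):
    """Return (string_id, start_offset, trailing_whitespace) triples."""
    tokens = []
    for match in re.finditer(r'(\S+)(\s*)', text):
        word, space = match.group(1), match.group(2)
        tokens.append((table.add(word), match.start(1), space))
    return tokens


def rebuild(tokens, table):
    """Recover the original text from the IDs and stored whitespace."""
    return ''.join(table[string_id] + space for string_id, _, space in tokens)


table = StringTable()
text = 'Hello  world\nHello again'
tokens = tokenize(text, table)
```

Each token keeps its character offset and the exact whitespace that followed it, so a span of tokens can be mapped back to a slice of the raw input when inserting mark-up, and repeated words share a single integer ID.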

Top Performance

  • Fastest in the world: <50ms per document. No faster system has ever been announced.

  • Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Supports

  • CPython 2.6, 2.7, 3.3, 3.4, 3.5 (only 64 bit)

  • OS X

  • Linux

  • Windows (Cygwin, MinGW, Visual Studio)

Download files

Source Distribution

  • spacy-0.100.7.tar.gz (2.3 MB; source)

Built Distributions

  • spacy-0.100.7-cp35-none-win_amd64.whl (1.3 MB; CPython 3.5, Windows x86-64)
  • spacy-0.100.7-cp35-cp35m-manylinux1_x86_64.whl (6.0 MB; CPython 3.5m)
  • spacy-0.100.7-cp35-cp35m-macosx_10_6_intel.whl (1.5 MB; CPython 3.5m, macOS 10.6+ Intel (x86-64, i386))
  • spacy-0.100.7-cp34-none-win_amd64.whl (1.2 MB; CPython 3.4, Windows x86-64)
  • spacy-0.100.7-cp34-cp34m-manylinux1_x86_64.whl (6.1 MB; CPython 3.4m)
  • spacy-0.100.7-cp34-cp34m-macosx_10_6_intel.whl (1.6 MB; CPython 3.4m, macOS 10.6+ Intel (x86-64, i386))
  • spacy-0.100.7-cp27-none-win_amd64.whl (1.3 MB; CPython 2.7, Windows x86-64)
  • spacy-0.100.7-cp27-cp27mu-manylinux1_x86_64.whl (5.7 MB; CPython 2.7mu)
  • spacy-0.100.7-cp27-cp27m-manylinux1_x86_64.whl (5.7 MB; CPython 2.7m)
  • spacy-0.100.7-cp27-cp27m-macosx_10_6_intel.whl (1.6 MB; CPython 2.7m, macOS 10.6+ Intel (x86-64, i386))

File details

Details for the file spacy-0.100.7.tar.gz.

File metadata

  • Download URL: spacy-0.100.7.tar.gz
  • Upload date:
  • Size: 2.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for spacy-0.100.7.tar.gz
Algorithm Hash digest
SHA256 0248f59ebc5cf78fdd9d1389cd43bdbcd89a2402a7aff1a0cdec71bd529e2255
MD5 8b3983b5fd4a5a5835e8ac7ff6ed44fe
BLAKE2b-256 688615deff00c5ad5fbca331a48ca004ac582f04aaec36c94e0574abb841a681

See more details on using hashes here.
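
As a rough illustration of how a published digest can be used, the sketch below computes the SHA256 of a downloaded file with Python's standard hashlib module and compares it with an expected value. The function names `sha256_of` and `matches_published_digest` are made up for this example, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps reading until read() returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, a call such as `matches_published_digest('spacy-0.100.7.tar.gz', '0248f59ebc5cf78fdd9d1389cd43bdbcd89a2402a7aff1a0cdec71bd529e2255')` should return True only if the downloaded archive is byte-for-byte identical to the uploaded one.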

Hashes for spacy-0.100.7-cp35-none-win_amd64.whl
Algorithm Hash digest
SHA256 ff67ec79bb6aaeb14eecba12278cdc43fa270a3826c10584c716376f04a5074c
MD5 9d86c51a14e4902cf5a421b05d8fea49
BLAKE2b-256 375a4f0582480ccc5a529aa4e86db99f4122de1a3eaf43ed80f4a2f7301b4028

Hashes for spacy-0.100.7-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 15a658df99de51d4aaa4a4787a12b9b5a1f1eba55e67ccac7010db153851295f
MD5 6a09483fd05c4762af738e2ed76dcb8a
BLAKE2b-256 5e84c0f44ef245fd280c22efe48f6673775d85f581106f54d32797c6922a8129

Hashes for spacy-0.100.7-cp35-cp35m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 6c9eaf6c1d0ec7b008f65b18a5cf2967f98df1062d9765f9c87032e0f3f0e224
MD5 6f0aeb749c054aea5ffcc5d91fb929c6
BLAKE2b-256 2d0c1dbc1337b87f43a8d02a6dbfb92f5bf90160b67e8437ae26d2f2b9cd52e5

Hashes for spacy-0.100.7-cp34-none-win_amd64.whl
Algorithm Hash digest
SHA256 39f8be45fc45e5e62be950dc598da8f2213115c04f9f5f71fbf170690b483180
MD5 3d25d63e2bee5532cfcb186cca0bdc91
BLAKE2b-256 0f7b52af06227888d44bc032aa7542b498b8b00be1537f5426d50a721ffa7742

Hashes for spacy-0.100.7-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 c3c4ddacab405229c0c2f1a25a4b585da7952fc5fadc7c42401271a6a835297d
MD5 9f4cdb1f37c8adc2d5044c07703c43d0
BLAKE2b-256 18258400c6acfbb704162f07464cf6042e4a5788321380e5c0d920f798790247

Hashes for spacy-0.100.7-cp34-cp34m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 882cc786ee7ae0f704f2f4c03151d413e432ec9b346c1581ce5f560d1a4fb628
MD5 26edaddaf20a27b4fd5a53c6743b9a08
BLAKE2b-256 1a9c4de1b66a6fabe88cdfe26f5b2f497ac39af2d19424d72e273e7ecc4d45de

Hashes for spacy-0.100.7-cp27-none-win_amd64.whl
Algorithm Hash digest
SHA256 9a5ce6869fa94b33b80e3980dfabd7c7eebcc5e5dcfe95f9937389fac4e9d24f
MD5 816895d70908c4b4a56adca7af2ba184
BLAKE2b-256 16cbd0aadeff5e187710cc60e801eae7b95ed6e881656e327ede6820ddf6a848

Hashes for spacy-0.100.7-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f366ab6bae3ef0c04d480ada233d9392cafd7737cbfc56ea1ecc56002958d39a
MD5 0f33f88f57dede677296c233423e560d
BLAKE2b-256 ae031aed553731f0a908626ce1318e17bcfb2d5d5c0a983d10ef1f687d6e7f00

Hashes for spacy-0.100.7-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 dfe2b5977bf5cf3af2922be27446d9dfafb0f592fa31d0be8a6f6943e8d964ec
MD5 1573725b8bc10b1f0424b5b0f10edc48
BLAKE2b-256 41fa9e206897a04ffeda7b3a11ad0be45971b3a5513e2895344bd17df690523a

Hashes for spacy-0.100.7-cp27-cp27m-macosx_10_6_intel.whl
Algorithm Hash digest
SHA256 af48c15c1ece0a6dbc270616cacd900cc5ed6d0512fd23a10b8ac62cec57f4ea
MD5 f77a6f08fc371e65bed7a76df2c8ca8c
BLAKE2b-256 034904f7bc4499ef00b38e8f6fb3d1d5e4909e5e9647b6ab48df037774cd2de5
