A full SpaCy pipeline and models for scientific/biomedical documents.

<p align="center"><img width="50%" src="docs/scispacy-logo.png" /></p>


This repository contains custom pipes and models related to using spaCy for scientific documents.

In particular, there is a custom tokenizer that adds tokenization rules on top of spaCy's
rule-based tokenizer, a POS tagger and syntactic parser trained on biomedical data and
an entity span detection model. Separately, there are also NER models for more specific tasks.


## Installation
Installing scispacy requires two steps: installing the library and installing the models. To install the library, run:
```bash
pip install scispacy
```

To install a model, run:

```bash
pip install <model url>
```

Note: We strongly recommend that you use an isolated Python environment (such as virtualenv or conda) to install scispacy.
Take a look below in the "Setting up a virtual environment" section if you need some help with this.
Additionally, scispacy uses modern features of Python and as such is only available for **Python 3.5 or greater**.
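
If you are unsure whether an environment meets the Python 3.5 minimum stated above, a version check along these lines can help. This is only a sketch: the helper name `meets_minimum` is hypothetical and not part of scispacy.

```python
import sys

# Minimum version taken from the note above; adjust it if your release differs.
MINIMUM_PYTHON = (3, 5)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print("Python {} is new enough for scispacy.".format(current))
    else:
        print("Python {} is too old; scispacy needs 3.5 or greater.".format(current))
```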



#### Setting up a virtual environment

[Conda](https://conda.io/) can be used to set up a virtual environment with the
version of Python required for scispaCy. If you already have a Python 3.6 or 3.7
environment you want to use, you can skip this section and follow the Installation steps above.

1. [Download and install Conda](https://conda.io/docs/download.html).

2. Create a Conda environment called "scispacy" with Python 3.6:

```bash
conda create -n scispacy python=3.6
```

3. Activate the Conda environment. You will need to activate the Conda environment in each terminal in which you want to use scispaCy.

```bash
source activate scispacy
```

Now you can install `scispacy` and one of the models using the steps above.


Once you have completed the above steps and installed one of the models listed below, you can load a scispaCy model as you would any other spaCy model. For example:
```python
import spacy
nlp = spacy.load("en_core_sci_sm")
doc = nlp("Alterations in the hypocretin receptor 2 and preprohypocretin genes produce narcolepsy in some animals.")
```
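
After a model is loaded, the resulting `doc` can be inspected through the usual spaCy `Doc` attributes such as `doc.ents` and `doc.sents`. The sketch below assumes `en_core_sci_sm` is already installed. The helper name `summarize` is hypothetical, and the entity labels you see depend on the model you load.

```python
def summarize(doc):
    """Collect entity spans and sentence texts from a processed spaCy Doc."""
    return {
        # Each entity is reported as (surface text, label string).
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # Sentence boundaries come from the pipeline's parser.
        "sentences": [sent.text for sent in doc.sents],
    }


if __name__ == "__main__":
    try:
        import spacy

        nlp = spacy.load("en_core_sci_sm")
    except (ImportError, OSError) as error:
        # spaCy raises OSError when the named model is not installed.
        print("Could not load en_core_sci_sm:", error)
    else:
        doc = nlp(
            "Alterations in the hypocretin receptor 2 and preprohypocretin "
            "genes produce narcolepsy in some animals."
        )
        summary = summarize(doc)
        for text, label in summary["entities"]:
            print(text, label)
        for sentence in summary["sentences"]:
            print(sentence)
```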

## Available Models


<table>
<tr>
<td><b> en_core_sci_sm </b></td>
<td> A full spaCy pipeline for biomedical data. </td>
</tr>
<tr>
<td><b> en_core_sci_md </b></td>
<td> A full spaCy pipeline for biomedical data with a larger vocabulary and word vectors. </td>
</tr>
<tr>
<td><b> en_ner_craft_md </b></td>
<td> A spaCy NER model trained on the CRAFT corpus. </td>
</tr>
<tr>
<td><b> en_ner_jnlpba_md </b></td>
<td> A spaCy NER model trained on the JNLPBA corpus. </td>
</tr>
<tr>
<td><b> en_ner_bc5cdr_md </b></td>
<td> A spaCy NER model trained on the BC5CDR corpus. </td>
</tr>
<tr>
<td><b> en_ner_bionlp13cg_md </b></td>
<td> A spaCy NER model trained on the BIONLP13CG corpus. </td>
</tr>
</table>
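
Because each model is installed with pip as an ordinary Python package named after the model, you can check which of the models in the table are present in the current environment without loading them. The following is a minimal sketch using only the standard library; `MODEL_NAMES` and `installed_models` are hypothetical names chosen for illustration.

```python
import importlib.util

# Model names copied from the table above.
MODEL_NAMES = [
    "en_core_sci_sm",
    "en_core_sci_md",
    "en_ner_craft_md",
    "en_ner_jnlpba_md",
    "en_ner_bc5cdr_md",
    "en_ner_bionlp13cg_md",
]


def installed_models(names=MODEL_NAMES):
    """Map each package name to True if it can be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in installed_models().items():
        print("{:<22} {}".format(name, "installed" if present else "not installed"))
```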
