A spaCy library for Named Entity Recognition with Elastic Weight Consolidation.
EWC-Enhanced spaCy NER Training
Overview
This project, spacy-ewc, integrates Elastic Weight Consolidation (EWC) into spaCy's Named Entity Recognition (NER) pipeline to mitigate catastrophic forgetting during sequential learning tasks. By applying EWC, the model retains important information from previous tasks while learning new ones, leading to improved performance in continual learning scenarios.
Motivation
In sequential or continual learning, neural networks often suffer from catastrophic forgetting, where the model forgets previously learned information upon learning new tasks. EWC addresses this issue by penalizing changes to important parameters identified during earlier training phases. Integrating EWC into spaCy's NER component allows us to build more robust NLP models capable of learning incrementally without significant performance degradation on earlier tasks.
Table of Contents
- Installation
- Usage
- Detailed Explanation
- Code Structure
- Extending the Project
- Troubleshooting
- Contributing
- Limitations
- References
- License
- Contact
Installation
Prerequisites
- Python 3.8 or higher
- spaCy (compatible version with your Python installation)
- Thinc (spaCy's machine learning library)
- Other dependencies as listed in pyproject.toml
Setup
- Clone the repository:
  git clone https://github.com/darkrockmountain/spacy-ewc.git
- Navigate to the project directory:
  cd spacy-ewc
- Install required packages:
  - Core dependencies only:
    pip install .
  - Development dependencies (recommended for contributors):
    pip install .[dev]
    After installing the development dependencies, you'll also need to manually install the spaCy language model used in tests:
    python -m spacy download en_core_web_sm
    This ensures that all dependencies and the necessary language model are available for development and testing.
- Download the spaCy English model (optional):
  Since en_core_web_sm is listed as a development dependency, it will be installed if you used pip install .[dev]. Otherwise, install it manually:
  python -m spacy download en_core_web_sm
Usage
Running the Example Script
The example script demonstrates how to train a spaCy NER model with EWC applied:
python examples/ewc_ner_training_example.py
Script Workflow
The script performs the following steps:
- Load the pre-trained spaCy English model.
- Add new entity labels (BUDDY, COMPANY) to the NER component.
- Prepare training and test data.
- Initialize the EWC wrapper with the NER pipe and original spaCy labels.
create_ewc_pipe(
    ner,
    [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in original_spacy_labels
    ],
)
- Train the NER model using EWC over multiple epochs.
- Evaluate the model on a test sentence and display recognized entities.
"Elon Musk founded SpaceX in 2002 as the CEO and lead engineer, investing approximately $100 million of his own money into the company, which was initially based in El Segundo, California."
Expected Output
- Training Loss: Displays the loss after training.
- Entities in Test Sentence: Lists the entities recognized in the test sentence after training.
Example output:
Training loss: 3.1743565
Entities in test sentence:
Elon Musk: BUDDY
SpaceX: COMPANY
2002: DATE
approximately $100 million: MONEY
El Segundo: GPE
California: GPE
Integrating the EWC Class for NER Training with create_ewc_pipe
You can integrate the EWC class into your spaCy training scripts to enhance NER training with Elastic Weight Consolidation (EWC). Below is a sample setup:
import spacy
from spacy.training import Example

from spacy_ewc import create_ewc_pipe
from spacy_ewc.utils.extract_labels import extract_labels
from spacy_ewc.utils.generate_spacy_entities import generate_spacy_entities

# Load a pre-trained spaCy model
nlp = spacy.load("en_core_web_sm")

# Prepare initial training data with sample texts
sample_texts = [
    "Apple is looking at buying U.K. startup for $1 billion",
    # Add more examples as needed...
]

# Generate entity annotations using the pre-trained NER model
# (before any further training).
# Example output:
# [
#     ('Apple is looking at buying U.K. startup for $1 billion',
#      {'entities': [(0, 5, 'ORG'), (27, 31, 'GPE'), (44, 54, 'MONEY')]}),
#     ...
# ]
original_spacy_labels = generate_spacy_entities(sample_texts, nlp)

# Initialize the EWC wrapper for the NER component using the original labels.
# This setup preserves knowledge of initial training data, helping prevent
# catastrophic forgetting as new data is added.
# `create_ewc_pipe` steps:
# - Captures a snapshot of the current model parameters.
# - Calculates the Fisher Information Matrix (FIM) to identify key parameters.
# - Applies an EWC penalty to protect these parameters during further training.
create_ewc_pipe(
    pipe=nlp.get_pipe("ner"),  # Specify the NER component
    examples=[
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in original_spacy_labels
    ],
)

# Set up custom training data with new entity labels
training_data = [
    (
        "John Doe works at OpenAI.",
        {"entities": [(0, 8, "BUDDY"), (18, 24, "COMPANY")]},
    ),
]

# Extract custom labels and add them to the NER component in the pipeline
training_labels = extract_labels(training_data)
for label in training_labels:
    nlp.get_pipe("ner").add_label(label)

# Convert training data into spaCy Example objects
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in training_data
]

# Run the training loop
for epoch in range(10):
    losses = {}
    nlp.update(examples, losses=losses)
    print(f"Epoch {epoch}, Losses: {losses}")
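The sample above relies on `extract_labels` to collect the new entity labels from the training data. As a rough illustration of what such a helper needs to do, here is a minimal pure-Python sketch. The name `extract_labels_sketch` is hypothetical, and the packaged `spacy_ewc.utils.extract_labels` may be implemented differently (for example, it may return a list rather than a set).

```python
def extract_labels_sketch(training_data):
    """Collect the distinct entity labels from (text, annotations) pairs.

    Hypothetical stand-in for illustration only; not the library's code.
    """
    labels = set()
    for _text, annotations in training_data:
        # Each entity is a (start_char, end_char, label) tuple.
        for _start, _end, label in annotations.get("entities", []):
            labels.add(label)
    return labels


training_data = [
    (
        "John Doe works at OpenAI.",
        {"entities": [(0, 8, "BUDDY"), (18, 24, "COMPANY")]},
    ),
]

print(sorted(extract_labels_sketch(training_data)))  # ['BUDDY', 'COMPANY']
```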
Detailed Explanation
EWC Theory
Catastrophic Forgetting
In machine learning, catastrophic forgetting refers to the abrupt and complete forgetting of previously learned information upon learning new information. Neural networks, when trained sequentially on multiple tasks without access to data from previous tasks, often overwrite the weights important for the old tasks with weights relevant to the new task.
Elastic Weight Consolidation
Elastic Weight Consolidation (EWC) is a regularization technique proposed to overcome catastrophic forgetting. It allows the model to learn new tasks while preserving performance on previously learned tasks by slowing down learning on important weights for old tasks.
Mathematical Formulation
The key idea behind EWC is to add a penalty term to the loss function that discourages significant changes to parameters that are important for previous tasks.
The total loss function for the current task becomes:
$$ L_{\text{total}}(\theta) = L_{\text{task}}(\theta) + \Omega(\theta) $$
- $L_{\text{task}}(\theta)$: The loss function for the current task.
- $\Omega(\theta)$: The EWC penalty term.
Fisher Information Matrix
The EWC penalty term is based on the Fisher Information Matrix (FIM), which measures the amount of information that an observable random variable carries about an unknown parameter upon which the probability depends.
For each parameter $\theta_i$, the importance is estimated using the diagonal of the FIM, denoted as $F_i$.
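To make the diagonal estimate concrete, the sketch below computes it for a tiny logistic-regression model in pure Python: per-example gradients of the log-likelihood are squared and averaged. This is an illustration of the general idea only; the model, the data, and the function names are assumptions, and the library computes its estimate from the spaCy component's own gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher(weights, examples):
    """Average of squared per-example log-likelihood gradients (diagonal only)."""
    fisher = [0.0] * len(weights)
    for features, label in examples:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)))
        # Gradient of log p(label | features) with respect to each weight.
        grad = [(label - p) * x for x in features]
        for i, g in enumerate(grad):
            fisher[i] += g * g
    return [f / len(examples) for f in fisher]


weights = [0.0, 0.0]
# The second feature is always zero, so the data says nothing about weights[1].
examples = [([1.0, 0.0], 1), ([2.0, 0.0], 0)]

fisher = diagonal_fisher(weights, examples)
print(fisher)  # [0.625, 0.0]
```

The zero entry for the second weight means a penalty built from this estimate would leave that weight free to change, while the first weight would be anchored.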
EWC Penalty Term
The EWC penalty term is defined as:
$$ \Omega(\theta) = \frac{\lambda}{2} \sum_i F_i (\theta_i - \theta_i^*)^2 $$
- $\theta$: Current model parameters.
- $\theta^*$: Optimal parameters learned from previous tasks.
- $F_i$: Diagonal elements of the Fisher Information Matrix for parameter $\theta_i$.
- $\lambda$: Regularization strength.
This term penalizes deviations of the current parameters $\theta$ from the previous optimal parameters $\theta^*$, scaled by the importance weights $F_i$.
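As a small worked example with made-up numbers, take two parameters with $\lambda = 2$, $F = (0.5,\ 0)$, $\theta^* = (0,\ 1)$, and $\theta = (1,\ 3)$:

$$ \Omega(\theta) = \frac{2}{2} \left[ 0.5\,(1 - 0)^2 + 0\,(3 - 1)^2 \right] = 0.5 $$

The second parameter has moved further from its stored value, but because $F_2 = 0$ it contributes nothing to the penalty.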
Gradient Adjustment
During training, the gradient of the total loss function with respect to each parameter $\theta_i$ is:
$$ \frac{\partial L_{\text{total}}}{\partial \theta_i} = \frac{\partial L_{\text{task}}}{\partial \theta_i} + \lambda F_i (\theta_i - \theta_i^*) $$
This means the gradient update is adjusted to consider both the task-specific loss and the EWC penalty, preventing significant changes to important parameters.
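The penalty gradient can be checked numerically. The sketch below implements the penalty and its analytic gradient in pure Python and compares the result against a central finite difference. All function names and values are illustrative assumptions, not the library's API.

```python
def ewc_penalty(theta, theta_star, fisher, lam):
    """Omega(theta) = (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


def ewc_penalty_gradient(theta, theta_star, fisher, lam):
    """d Omega / d theta_i = lam * F_i * (theta_i - theta_star_i)."""
    return [lam * f * (t - s) for t, s, f in zip(theta, theta_star, fisher)]


def finite_difference_gradient(theta, theta_star, fisher, lam, eps=1e-6):
    """Central-difference approximation of the penalty gradient."""
    grads = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        diff = ewc_penalty(up, theta_star, fisher, lam) - ewc_penalty(
            down, theta_star, fisher, lam
        )
        grads.append(diff / (2 * eps))
    return grads


theta, theta_star, fisher, lam = [1.0, 3.0], [0.0, 1.0], [0.5, 2.0], 2.0
task_gradient = [0.1, -0.2]  # made-up task gradient for illustration

analytic = ewc_penalty_gradient(theta, theta_star, fisher, lam)
numeric = finite_difference_gradient(theta, theta_star, fisher, lam)
total_gradient = [g + p for g, p in zip(task_gradient, analytic)]

print(analytic)  # [1.0, 8.0]
```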
Integration with spaCy
EWC Class Workflow
The EWC class encapsulates the implementation of the EWC algorithm within the spaCy framework. The workflow involves:
- Initialization:
  - Capture Initial Parameters ($\theta^*$):
    - After training the initial task, capture and store the model's parameters.
  - Compute Fisher Information Matrix (FIM):
    - Use the initial task data to compute gradients.
    - Square and average these gradients to estimate the FIM.
- Training on New Task:
  - Compute EWC Penalty:
    - During training on a new task, compute the EWC penalty using the stored $\theta^*$ and $F_i$.
  - Adjust Gradients:
    - Modify the gradients by adding $\lambda F_i (\theta_i - \theta_i^*)$ before updating the parameters.
EWC Class Methods
- __init__(self, pipe, data, lambda_=1000.0, pipe_name=None):
  - Initializes the EWC instance.
  - Parameters:
    - pipe: The spaCy pipeline component (e.g., ner).
    - data: Training examples used to compute the FIM.
      - Note: Data is essential for computing the FIM, which estimates parameter importance. Initial parameters alone are insufficient because they do not contain gradient information.
    - lambda_: Regularization strength.
  - Operations:
    - Validates the pipe.
    - Captures initial parameters ($\theta^*$).
    - Computes the FIM.
- _capture_current_parameters(self, copy=False):
  - Retrieves the current model parameters.
  - If copy is True, returns a deep copy to prevent modifications.
- _compute_fisher_matrix(self, examples):
  - Computes the Fisher Information Matrix.
  - For each parameter:
    - Accumulates the squared gradients over the dataset.
    - Averages the accumulated values to estimate $F_i$.
- compute_ewc_penalty(self):
  - Calculates the EWC penalty $\Omega(\theta)$.
  - Uses the stored $\theta^*$ and computed $F_i$.
- compute_gradient_penalty(self):
  - Computes the gradient of the EWC penalty with respect to $\theta$.
  - For each parameter:
    - Calculates $\lambda F_i (\theta_i - \theta_i^*)$.
- apply_ewc_penalty_to_gradients(self):
  - Adjusts the model's gradients in-place by adding the EWC gradient penalty.
  - Ensures that the penalty is applied before the optimizer updates the parameters.
Model Wrapping with EWCModelWrapper
- The EWCModelWrapper class wraps the spaCy model's finish_update method.
- It ensures that the EWC penalty is applied to the gradients before the optimizer step.
- By overriding finish_update, it seamlessly integrates the EWC adjustments into the standard spaCy training loop.
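The wrapping pattern can be illustrated without spaCy. In the sketch below, `ToyModel` and `ToyEWCWrapper` are hypothetical stand-ins: the toy `finish_update` takes a learning rate rather than an optimizer, and none of this is the actual `EWCModelWrapper` code. It only shows the idea of adding the penalty gradient just before the original update runs.

```python
class ToyModel:
    """Stand-in for a trainable model holding parameters and gradients."""

    def __init__(self, params):
        self.params = dict(params)
        self.grads = {name: 0.0 for name in params}

    def finish_update(self, learning_rate):
        # Plain gradient-descent step, then clear the gradients.
        for name in self.params:
            self.params[name] -= learning_rate * self.grads[name]
            self.grads[name] = 0.0


class ToyEWCWrapper:
    """Intercepts finish_update to add the EWC gradient penalty first."""

    def __init__(self, model, fisher, lam):
        self.model = model
        self.theta_star = dict(model.params)  # snapshot of current parameters
        self.fisher = dict(fisher)
        self.lam = lam
        self._original_finish_update = model.finish_update
        model.finish_update = self.finish_update  # shadow the bound method

    def finish_update(self, learning_rate):
        for name, theta in self.model.params.items():
            self.model.grads[name] += (
                self.lam * self.fisher[name] * (theta - self.theta_star[name])
            )
        self._original_finish_update(learning_rate)


model = ToyModel({"w": 1.0})
ToyEWCWrapper(model, fisher={"w": 1.0}, lam=1.0)

model.params["w"] = 2.0   # pretend earlier steps moved the weight
model.grads["w"] = 0.5    # made-up task gradient
model.finish_update(0.1)  # total gradient is 0.5 + 1.0 * 1.0 * (2.0 - 1.0)
print(model.params["w"])
```

Because the wrapper replaces `finish_update` on the model instance, the calling training loop does not need to change, which is the property the list above describes.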
Training Workflow with EWC
- Initialize EWC:
  - Use create_ewc_pipe to wrap the spaCy component with EWC.
  - This captures $\theta^*$ and computes the FIM.
- Training Loop:
  - For each training batch:
    - Compute task-specific loss and gradients.
    - Apply EWC Penalty:
      - Adjust gradients using apply_ewc_penalty_to_gradients.
    - Update Parameters:
      - Use the optimizer to update parameters with the adjusted gradients.
- Evaluation:
  - After training, evaluate the model on the test data.
  - The model should retain performance on previous tasks while learning the new task.
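The workflow above can be mimicked on a toy problem to see the intended effect. The sketch below is an assumption-laden illustration in pure Python, not the library's training loop: two parameters were "learned" on an earlier task, only the first is marked important by a made-up Fisher estimate, and a new quadratic task pulls both toward 3.0.

```python
def train_on_new_task(theta, target, theta_star, fisher, lam, lr=0.05, steps=500):
    """Gradient descent on 0.5 * (theta - target)^2 plus the EWC penalty."""
    theta = list(theta)
    for _ in range(steps):
        for i in range(len(theta)):
            task_grad = theta[i] - target[i]
            ewc_grad = lam * fisher[i] * (theta[i] - theta_star[i])
            theta[i] -= lr * (task_grad + ewc_grad)
    return theta


theta_star = [1.0, 0.0]  # parameters after the earlier task (made up)
fisher = [1.0, 0.0]      # only the first parameter matters for that task
target = [3.0, 3.0]      # optimum of the new task

without_ewc = train_on_new_task(theta_star, target, theta_star, fisher, lam=0.0)
with_ewc = train_on_new_task(theta_star, target, theta_star, fisher, lam=10.0)

print("without EWC:", [round(v, 3) for v in without_ewc])
print("with EWC:   ", [round(v, 3) for v in with_ewc])
```

Without the penalty both parameters converge to the new optimum. With it, the important parameter settles near 13/11 (about 1.18), close to its stored value, while the unimportant one still reaches 3.0.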
Code Structure
- examples/ewc_ner_training_example.py: Example script demonstrating EWC-enhanced NER training.
- data_examples/
  - training_data.py: Contains custom training data with new entity labels.
  - original_spacy_labels.py: Contains original spaCy NER labels for EWC reference.
- src/spacy_ewc/
  - ewc.py: Implements the EWC class for calculating EWC penalties and adjusting gradients.
  - vector_dict.py: Defines VectorDict, a specialized dictionary for model parameters and gradients.
  - spacy_wrapper/ewc_spacy_wrapper.py: Provides a wrapper to integrate EWC into spaCy's pipeline components.
  - ner_trainer/ewc_ner_trainer.py: Contains functions to train NER models with EWC applied to gradients.
  - utils/
    - extract_labels.py: Utility function to extract labels from training data.
    - generate_spacy_entities.py: Generates spaCy-formatted entity annotations from sentences.
Extending the Project
Adding New Components
To extend EWC to other spaCy pipeline components (e.g., textcat, parser):
- Modify the EWC Class:
  - Ensure the class captures and computes parameters relevant to the new component.
  - Adjust methods to handle different types of model architectures.
- Adjust FIM Computation:
  - Use appropriate loss functions and data for computing the Fisher Information Matrix for the new component.
- Wrap the Component:
  - Use create_ewc_pipe to wrap the new component with EWC functionality.
Customizing EWC Parameters
- Adjusting $\lambda$ (lambda):
  - Controls the balance between learning new information and retaining old knowledge.
  - Experiment with different values to find the optimal balance for your use case.
- Modifying FIM Calculation:
  - Consider alternative methods for estimating parameter importance.
  - For example, use empirical Fisher Information or other approximations.
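The effect of $\lambda$ is easiest to see on a one-parameter problem where the optimum has a closed form. For a task loss of $\frac{1}{2}(\theta - b)^2$ plus the EWC penalty, setting the gradient to zero gives $\theta = (b + \lambda F \theta^*) / (1 + \lambda F)$. The sketch below sweeps a few values; the function name and all numbers are illustrative assumptions.

```python
def best_theta(lam, fisher, theta_star, target):
    """Minimizer of 0.5 * (theta - target)^2 + (lam / 2) * fisher * (theta - theta_star)^2."""
    return (target + lam * fisher * theta_star) / (1.0 + lam * fisher)


theta_star, target, fisher = 0.0, 1.0, 1.0

for lam in (0.0, 1.0, 9.0, 1000.0):
    theta = best_theta(lam, fisher, theta_star, target)
    drift = abs(theta - theta_star)              # movement away from the old solution
    new_task_loss = 0.5 * (theta - target) ** 2  # what the new task still pays
    print(f"lambda={lam:>6}: theta={theta:.4f} drift={drift:.4f} new_loss={new_task_loss:.4f}")
```

Small values let the parameter move almost freely to the new optimum, while a large value such as the default `lambda_=1000.0` in the `EWC` signature keeps an important parameter nearly fixed. How strong the effect is in practice also depends on the scale of the Fisher values.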
Experimentation
- Different Datasets: Test the model on various datasets to evaluate the effectiveness of EWC in different scenarios.
- Sequential Tasks: Simulate continual learning by training on multiple tasks sequentially and observing performance retention.
- Parameter Sensitivity: Analyze how changes in $\lambda$ and other hyperparameters affect the model's performance.
Troubleshooting
- Gradient Shape Mismatch:
  - If you encounter shape mismatches when applying the EWC penalty, ensure that the model's parameters have not changed since initializing EWC.
  - Adding new layers or changing the architecture after initializing EWC can cause mismatches.
- Zero or Negative Loss Values:
  - Ensure that your training data is sufficient and correctly formatted.
  - Skipped batches due to zero loss can lead to issues in FIM computation.
- Memory Consumption:
  - Computing and storing the FIM can be memory-intensive for large models.
  - Consider reducing model size or using a subset of data for FIM estimation.
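When debugging a shape mismatch, it can help to compare the parameter shapes recorded at EWC initialization with the current ones. The helper below is a hypothetical sketch that works on plain dictionaries mapping parameter names to shape tuples. How you obtain those dictionaries from your model is left open, and the parameter names in the example are made up.

```python
def find_shape_mismatches(stored_shapes, current_shapes):
    """Return human-readable differences between two name -> shape mappings."""
    problems = []
    for name, shape in current_shapes.items():
        if name not in stored_shapes:
            problems.append(f"{name}: new parameter, not covered by the stored snapshot")
        elif stored_shapes[name] != shape:
            problems.append(f"{name}: shape changed from {stored_shapes[name]} to {shape}")
    for name in stored_shapes:
        if name not in current_shapes:
            problems.append(f"{name}: missing from the current model")
    return problems


stored = {"upper_W": (74, 64), "upper_b": (74,)}
current = {"upper_W": (82, 64), "upper_b": (82,), "extra_W": (8, 8)}

for problem in find_shape_mismatches(stored, current):
    print(problem)
```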
Contributing
We welcome contributions to enhance the functionality and usability of this project. To contribute:
- Fork the repository on GitHub.
- Create a new branch for your feature or bugfix:
  git checkout -b feature/your-feature-name
- Make your changes and commit them with clear messages.
- Push to your fork:
  git push origin feature/your-feature-name
- Submit a pull request detailing your changes.
Please ensure that your code adheres to the existing style and includes appropriate tests.
Limitations
- Diagonal Approximation: The implementation uses a diagonal approximation of the FIM, which assumes parameter independence and may not capture all parameter interactions.
- Computational Overhead: Calculating the FIM and adjusting gradients adds computational complexity and may increase training time.
- Memory Requirements: Storing $\theta^*$ and $F_i$ for all parameters can be memory-intensive, especially for large models.
- Limited to Known Parameters: EWC is effective for parameters seen during initial training. New parameters introduced in later tasks are not accounted for in the penalty term.
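For a rough sense of the memory requirement, note that $\theta^*$ and the diagonal $F_i$ each hold one value per model parameter, so the extra storage is about twice the size of the parameters themselves. The estimate below assumes 4-byte (float32) values and a made-up parameter count; the actual dtype and size depend on your model.

```python
def ewc_extra_bytes(num_parameters, bytes_per_value=4):
    """Approximate extra storage for theta* and the diagonal Fisher estimate."""
    copies = 2  # one array for theta*, one for F
    return copies * num_parameters * bytes_per_value


extra = ewc_extra_bytes(1_000_000)
print(f"{extra / 1e6:.1f} MB")  # 8.0 MB
```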
References
- Kirkpatrick, J., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521-3526. arXiv:1612.00796
- spaCy Documentation: https://spacy.io/
- Thinc Documentation: https://thinc.ai/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
For questions or further information, please contact the NLP Team at dev@darkrockmountain.com.
This README is intended to assist team members and contributors in understanding and utilizing the EWC-enhanced spaCy NER training framework.