Python implementation of Andrej Kaparthy's recurrentjs library
Project description
# RecurrentJS in Python
Following in the footsteps of [Andrej Kaparthy](http://cs.stanford.edu/people/karpathy/), here is a Python implementation of [recurrentJS](http://cs.stanford.edu/people/karpathy/recurrentjs/) ([Github](https://github.com/karpathy/recurrentjs)).
### Why ?
While Python has great automatic differentiation libraries, a nocompile version is lacking. In particular recurrentJS makes great use of callbacks and garbage collection to enable backprop through time. In this implementation the goal is to reduce reliance on these abstractions and have a simple backprop step class. Finally if this is easily feasible, then ultimately we will implement a C++ version that will make the majority of the computation steps fast except for some intermediary allocations of backprop steps, but that do not make up the bulk of the computation, and keep the API and the syntax clear.
### Planned extensions to Javascript version
In this implementation the goal is simple:
* Move away from callbacks
* Enable batch processing (through masked losses, tensors, and advanced indexing when plucking rows from matrices)
* Form a baseline for implementation in nonscripting languages
### Usage
Below we follow the same steps as in the character generation demo, and we import the same text for character model learning. Perplexity drops quickly to around 78, (mirroring the behavior found in the Javascript version).
from recurrentjs import *
input_size = 1
output_size = 1
epoch_size = 1
letter_size = 5
letterToIndex = {}
indexToLetter = {}
hidden_sizes = [20,20]
generator = "lstm"
vocab = []
regc = 0.000001 # L2 regularization strength
learning_rate = 0.01 # learning rate
clipval = 5.0
solver = Solver()
def initVocab(sents, count_threshold):
global input_size
global output_size
global epoch_size
global vocab
global letterToIndex
global indexToLetter
# count up all characters
d = {}
for sent in sents:
for c in sent:
if c in d:
d[c] += 1
else:
d[c] = 1
# filter by count threshold and create pointers
letterToIndex = {}
indexToLetter = {}
vocab = []
# NOTE: start at one because we will have START and END tokens!
# that is, START token will be index 0 in model letter vectors
# and END token will be index 0 in the next character softmax
q = 1
for ch in d.keys():
if d[ch] >= count_threshold:
# add character to vocab
letterToIndex[ch] = q
indexToLetter[q] = ch
vocab.append(ch)
q += 1
# globals written: indexToLetter, letterToIndex, vocab (list), and:
input_size = len(vocab) + 1;
output_size = len(vocab) + 1;
epoch_size = len(sents)
def forwardIndex(G, model, ix, prev):
x = G.row_pluck(model['Wil'], ix)
# forward prop the sequence learner
if generator == "rnn":
out_struct = forwardRNN(G, model, hidden_sizes, x, prev)
else:
out_struct = forwardLSTM(G, model, hidden_sizes, x, prev)
return out_struct
def initModel():
model = {}
lstm = initLSTM(letter_size, hidden_sizes, output_size) if generator == "lstm" else initRNN(letter_size, hidden_sizes, output_size)
model['Wil'] = RandMat(input_size, letter_size , 0.08)
model.update(lstm)
return model
def costfun(model, sent):
# takes a model and a sentence and
# calculates the loss. Also returns the Graph
# object which can be used to do backprop
n = len(sent)
G = Graph()
log2ppl = 0.0;
cost = 0.0;
prev = None
for i in range(1, n):
# start and end tokens are zeros
ix_source = 0 if i == 1 else letterToIndex[sent[i]] # first step: start with START token
ix_target = 0 if i == n1 else letterToIndex[sent[i+1]] # last step: end with END token
lh = forwardIndex(G, model, ix_source, prev)
prev = lh
# set gradients into logprobabilities
logprobs = lh.output # interpret output as logprobs
probs = softmax(logprobs) # compute the softmax probabilities
log2ppl += np.log(probs.w[ix_target,0]) # accumulate base 2 log prob and do smoothing
cost += np.log(probs.w[ix_target,0])
# write gradients into log probabilities
logprobs.dw = probs.w
logprobs.dw[ix_target] = 1
ppl = np.power(2, log2ppl / (n  1))
return G, ppl, cost
text_data = open("paulgraham_text.txt", "rt").readlines()
initVocab(text_data, 1)
model = initModel()
ppl_list = []
median_ppl = []
tick_iter = 0
def tick():
global tick_iter
global ppl_list
global median_ppl
sentix = np.random.randint(0, len(text_data))
sent = text_data[sentix]
G, ppl, cost = costfun(model, sent)
G.backward()
solver.step(model, learning_rate, regc, clipval)
ppl_list.append(ppl)
tick_iter += 1
if tick_iter % 100 == 0:
median = np.median(ppl_list)
ppl_list = []
median_ppl.append(median)
And the training loop (no fancy prediction and sampling implemented here, but fairly straightforward conversion from the javascript code)
for i in range(1000):
tick()
Following in the footsteps of [Andrej Kaparthy](http://cs.stanford.edu/people/karpathy/), here is a Python implementation of [recurrentJS](http://cs.stanford.edu/people/karpathy/recurrentjs/) ([Github](https://github.com/karpathy/recurrentjs)).
### Why ?
While Python has great automatic differentiation libraries, a nocompile version is lacking. In particular recurrentJS makes great use of callbacks and garbage collection to enable backprop through time. In this implementation the goal is to reduce reliance on these abstractions and have a simple backprop step class. Finally if this is easily feasible, then ultimately we will implement a C++ version that will make the majority of the computation steps fast except for some intermediary allocations of backprop steps, but that do not make up the bulk of the computation, and keep the API and the syntax clear.
### Planned extensions to Javascript version
In this implementation the goal is simple:
* Move away from callbacks
* Enable batch processing (through masked losses, tensors, and advanced indexing when plucking rows from matrices)
* Form a baseline for implementation in nonscripting languages
### Usage
Below we follow the same steps as in the character generation demo, and we import the same text for character model learning. Perplexity drops quickly to around 78, (mirroring the behavior found in the Javascript version).
from recurrentjs import *
input_size = 1
output_size = 1
epoch_size = 1
letter_size = 5
letterToIndex = {}
indexToLetter = {}
hidden_sizes = [20,20]
generator = "lstm"
vocab = []
regc = 0.000001 # L2 regularization strength
learning_rate = 0.01 # learning rate
clipval = 5.0
solver = Solver()
def initVocab(sents, count_threshold):
global input_size
global output_size
global epoch_size
global vocab
global letterToIndex
global indexToLetter
# count up all characters
d = {}
for sent in sents:
for c in sent:
if c in d:
d[c] += 1
else:
d[c] = 1
# filter by count threshold and create pointers
letterToIndex = {}
indexToLetter = {}
vocab = []
# NOTE: start at one because we will have START and END tokens!
# that is, START token will be index 0 in model letter vectors
# and END token will be index 0 in the next character softmax
q = 1
for ch in d.keys():
if d[ch] >= count_threshold:
# add character to vocab
letterToIndex[ch] = q
indexToLetter[q] = ch
vocab.append(ch)
q += 1
# globals written: indexToLetter, letterToIndex, vocab (list), and:
input_size = len(vocab) + 1;
output_size = len(vocab) + 1;
epoch_size = len(sents)
def forwardIndex(G, model, ix, prev):
x = G.row_pluck(model['Wil'], ix)
# forward prop the sequence learner
if generator == "rnn":
out_struct = forwardRNN(G, model, hidden_sizes, x, prev)
else:
out_struct = forwardLSTM(G, model, hidden_sizes, x, prev)
return out_struct
def initModel():
model = {}
lstm = initLSTM(letter_size, hidden_sizes, output_size) if generator == "lstm" else initRNN(letter_size, hidden_sizes, output_size)
model['Wil'] = RandMat(input_size, letter_size , 0.08)
model.update(lstm)
return model
def costfun(model, sent):
# takes a model and a sentence and
# calculates the loss. Also returns the Graph
# object which can be used to do backprop
n = len(sent)
G = Graph()
log2ppl = 0.0;
cost = 0.0;
prev = None
for i in range(1, n):
# start and end tokens are zeros
ix_source = 0 if i == 1 else letterToIndex[sent[i]] # first step: start with START token
ix_target = 0 if i == n1 else letterToIndex[sent[i+1]] # last step: end with END token
lh = forwardIndex(G, model, ix_source, prev)
prev = lh
# set gradients into logprobabilities
logprobs = lh.output # interpret output as logprobs
probs = softmax(logprobs) # compute the softmax probabilities
log2ppl += np.log(probs.w[ix_target,0]) # accumulate base 2 log prob and do smoothing
cost += np.log(probs.w[ix_target,0])
# write gradients into log probabilities
logprobs.dw = probs.w
logprobs.dw[ix_target] = 1
ppl = np.power(2, log2ppl / (n  1))
return G, ppl, cost
text_data = open("paulgraham_text.txt", "rt").readlines()
initVocab(text_data, 1)
model = initModel()
ppl_list = []
median_ppl = []
tick_iter = 0
def tick():
global tick_iter
global ppl_list
global median_ppl
sentix = np.random.randint(0, len(text_data))
sent = text_data[sentix]
G, ppl, cost = costfun(model, sent)
G.backward()
solver.step(model, learning_rate, regc, clipval)
ppl_list.append(ppl)
tick_iter += 1
if tick_iter % 100 == 0:
median = np.median(ppl_list)
ppl_list = []
median_ppl.append(median)
And the training loop (no fancy prediction and sampling implemented here, but fairly straightforward conversion from the javascript code)
for i in range(1000):
tick()
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size  File type  Python version  Upload date  Hashes 

Filename, size recurrentjspython0.0.1.tar.gz (39.5 kB)  File type Source  Python version None  Upload date  Hashes View 
Close
Hashes for recurrentjspython0.0.1.tar.gz
Algorithm  Hash digest  

SHA256  982390e084fa44672f1f9c0117f132203b58d76ff146a88531c31046339004db 

MD5  34f503818e7835aaeb622bc40d56883f 

BLAKE2256  69c8e2c971cd06d4ffa982932c4641ad5270c6bb6f32c5782be03c8ddadc867e 