# Novel Text Generation: Training an RNN of LSTM Cells on Anna Karenina

Posted 2018-12-19

## 1. Goal

- Train a Recurrent Neural Network (RNN) of Long Short-Term Memory (LSTM) cells on a corpus of text in order to generate new text on a character-by-character basis.
- What makes RNNs interesting:
  - They can find patterns in ordered sequences / time series (a dimension of time / memory)
  - Each timestep depends on what came before it
  - They are much more flexible with inputs
  - Even if the data is not a sequence, the network can learn to treat it as such

- Concept:
  - Data is fed in as an ordered sequence (a sliding window over the entire dataset)
  - The loss to minimize is the cross-entropy between the predicted output at each timestep and the actual next character, i.e. predict the 2nd character given the 1st character as input
  - Note this means the target data is just the training data shifted 1 timestep, so no labeled data is needed per se
  - Memory comes into play when predicting the 3rd character given the first 2 ordered characters
  - Beyond prediction, the network can also be used for generation. A sampling hyperparameter can be tuned to produce more diverse or more conservative results

- This is a replication (with a few of my own expansions) of the Udacity AIND Intro to RNNs project on GitHub.
- The training corpus is the full text of Anna Karenina from Project Gutenberg.
- Great resources for learning about RNNs - building a visual intuition of what's going on at each level of the network was central to finally understanding what happens under the hood:
  - https://r2rt.com/written-memories-understanding-deriving-and-extending-the-lstm.html - by far the most helpful in understanding the conceptual progression from RNNs to LSTM cells at both a high level and a detailed level
  - http://karpathy.github.io/2015/05/21/rnn-effectiveness/
  - https://www.tensorflow.org/tutorials/sequences/recurrent
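The shifted-target setup can be sketched in a few lines of plain Python (a toy example with a made-up string, separate from the actual pipeline below):

```python
# Toy illustration: targets are just the inputs shifted one timestep.
text = "anna"
inputs = list(text[:-1])   # characters the network sees
targets = list(text[1:])   # characters it must predict next
# At timestep t the network sees inputs[t] (plus its memory of
# inputs[:t]) and is trained to predict targets[t].
pairs = list(zip(inputs, targets))
```

No separate labels are needed: the corpus itself supplies both inputs and targets.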

In [1]:

```
import time
from collections import namedtuple
import keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers.embeddings import Embedding
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
# from tqdm import tqdm_notebook
%matplotlib inline
plt.style.use('fivethirtyeight')
```

## 2. Load and Preprocess Data

### 2.1 Keras Tokenizer

- This is a very simple character-level tokenization, so the high-level Keras text preprocessing is fine: https://keras.io/preprocessing/text/
- In reality it would be better to tokenize by word, at which point a lot more text preprocessing could be done (word stemming, stop words, parts of speech, etc.), for which NLTK is much better suited: http://www.nltk.org/
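As a rough sketch of what the Keras `Tokenizer` does with `char_level=True` (a minimal stand-in, not Keras's actual implementation - I'm assuming its documented behavior of assigning indices by descending frequency, starting at 1 with 0 reserved):

```python
from collections import Counter

text = "anna karenina"
# Index characters by descending frequency, starting at 1 (0 is reserved)
counts = Counter(text)
char_to_index = {c: i + 1 for i, (c, _) in enumerate(counts.most_common())}
index_to_char = {i: c for c, i in char_to_index.items()}
encoded = [char_to_index[c] for c in text]
decoded = ''.join(index_to_char[i] for i in encoded)
```

The round trip through `index_to_char` is exactly what `sequence_to_str` below does with the fitted tokenizer.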

In [2]:

```
class TokenizedData:
    """Data class to convert a text file to integer sequences.

    Parameters
    ----------
    filename : str
        Relative path and filename of the text

    Attributes
    ----------
    training_text : str
        Full text of training data
    t : Tokenizer
        Tokenizer that's already fitted on training_text
    index_to_char : dict
        Dict mapping index to character
    encoded_text : array, int
        Training text converted to a sequence
    """

    def __init__(self, filename):
        # Read data
        with open(filename) as f:
            self.training_text = f.read()
        # Tokenizer
        self.t = Tokenizer(filters='', lower=False, char_level=True)
        self.t.fit_on_texts(self.training_text)
        self.index_to_char = dict(map(reversed, self.t.word_index.items()))
        self.encoded_text = np.squeeze(self.t.texts_to_sequences(self.training_text))

    def sequence_to_str(self, sequence):
        """Map a sequence of indices back to a text string.

        Parameters
        ----------
        sequence : list, int
            Sequence to convert

        Returns
        -------
        mapped_text : str
            Sequence converted to text
        """
        mapped_text = ''.join([self.index_to_char[c] for c in sequence])
        return mapped_text

    def get_batches(self, batch_size, timesteps):
        """Yield data in batches.

        Parameters
        ----------
        batch_size : int
            Number of input sequences per batch
        timesteps : int
            Number of time steps per sequence (also width of sequence)

        Yields
        ------
        Batches of size batch_size X timesteps at a time.
        """
        # Get rid of extra characters that are not enough to fill a sequence
        # in the last batch
        text = self.encoded_text
        chars_leftover = len(text) % (batch_size * timesteps)
        if chars_leftover > 0:
            text = text[:-chars_leftover]
        text = text.reshape(batch_size, -1)
        for cursor in range(0, text.shape[1], timesteps):
            x = text[:batch_size, cursor:cursor+timesteps]
            # Since y is x shifted over by 1, the last batch needs to be
            # padded with a column of 0's
            y_padded = np.zeros(x.shape)
            y = text[:batch_size, cursor+1:cursor+timesteps+1]
            if y_padded.shape == y.shape:
                yield x, y
            else:
                y_padded[:, :-1] = y
                yield x, y_padded


tokenized_data = TokenizedData('anna.txt')
```

In [3]:

```
# Set of all characters in the text
tokenized_data.t.word_index
```

Out[3]:

In [4]:

```
# Counts of each character in the text
tokenized_data.t.word_counts
```

Out[4]:

In [5]:

```
# First 100 characters of text
tokenized_data.training_text[:100]
```

Out[5]:

In [6]:

```
# Encoded sequence
sequence = tokenized_data.encoded_text[:100]
sequence
```

Out[6]:

In [7]:

```
tokenized_data.sequence_to_str(sequence)
```

Out[7]:

### 2.2 Generate Batches

In [8]:

```
# Sample array manipulation to get a feel for what's going on when we get the
# batches
a = np.arange(20)
a = a.reshape(5, -1)
print(a)
for cursor in range(0, 4, 2):
    print(a[:5, cursor:cursor+2])
    print(a[:5, cursor+1:cursor+3])
```

In [9]:

```
batches = tokenized_data.get_batches(10, 50)
x, y = next(batches)
```

In [10]:

```
x[:10, :10]
```

Out[10]:

In [11]:

```
y[:10, :10]
```

Out[11]:

## 3. Model

- Part of the struggle in learning was building a mental model of the following:
  - The RNN vs. the input/output layers
  - 1 stack of LSTM cells vs. the unrolled RNN of LSTM cells, and how the states / outputs connect across timesteps
  - How states / outputs are calculated within 1 LSTM cell using the forget, ignore, and read gates
  - How the implementation of batches affects the matrix calculations

- These distinctions are important to keep in mind for the following subsections

### 3.1 Inputs to TF Graph

- Inputs to TF are shaped batch X timesteps. The timestep inputs are integers that will be one-hot encoded later within the TF graph
- Dropout regularization within the LSTM cell can be specified for input/output/state - https://www.tensorflow.org/api_docs/python/tf/nn/rnn_cell/DropoutWrapper

In [12]:

```
def model_inputs(batch_size, timesteps):
    """Build model inputs to the TF graph. Note that the target is just the
    input sequence shifted over one timestep.

    Parameters
    ----------
    batch_size : int
        Batch size (the placeholder shape uses None, so any batch size works)
    timesteps : int
        Number of steps per sequence

    Returns
    -------
    x : placeholder tensor, int
        Training data input
        shape (None, timesteps)
    y : placeholder tensor, int
        Target data input (training data shifted over by 1 step)
        shape (None, timesteps)
    keep_prob : placeholder scalar tensor, float
        Keep probability regularization within the lstm cell
    """
    x = tf.placeholder(tf.int32, shape=(None, timesteps), name='input_sequence')
    y = tf.placeholder(tf.int32, shape=(None, timesteps), name='target_sequence')
    keep_prob = tf.placeholder(tf.float32, shape=(), name='keep_prob')
    return x, y, keep_prob
```

### 3.2 LSTM Stack

- After we've specified 1 cell with `tf.nn.rnn_cell.LSTMCell`, the creation of the stack is taken care of by `tf.nn.rnn_cell.MultiRNNCell`
- We're building the stack of LSTM cells for 1 timestep. 'Unrolling', or the sequential handling of this LSTM stack as an RNN, is taken care of later
- The peephole LSTM cell is based on https://arxiv.org/abs/1402.1128 (Sak, Senior, Beaufays 2014)
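To make the cell internals concrete, here is a single peephole-LSTM timestep written out in NumPy - a from-scratch sketch of the gate equations in the Sak et al. paper, not the actual `tf.nn.rnn_cell.LSTMCell` code; all weight names and values here are made-up placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x, h_prev, c_prev, W, w_peep, b):
    """One timestep of a peephole LSTM for a single example.

    W : dict of weight matrices, shape (input_size + cell_size, cell_size)
    w_peep : dict of peephole vectors, shape (cell_size,)
    b : dict of bias vectors, shape (cell_size,)
    """
    z = np.concatenate([x, h_prev])                           # input + recurrent
    i = sigmoid(z @ W['i'] + w_peep['i'] * c_prev + b['i'])   # input (write) gate
    f = sigmoid(z @ W['f'] + w_peep['f'] * c_prev + b['f'])   # forget gate
    g = np.tanh(z @ W['g'] + b['g'])                          # candidate cell values
    c = f * c_prev + i * g                                    # new cell state (memory)
    o = sigmoid(z @ W['o'] + w_peep['o'] * c + b['o'])        # output gate peeks at new c
    h = o * np.tanh(c)                                        # new hidden output
    return h, c

rng = np.random.RandomState(0)
input_size, cell_size = 4, 8
W = {k: rng.randn(input_size + cell_size, cell_size) * 0.1 for k in 'ifgo'}
w_peep = {k: rng.randn(cell_size) * 0.1 for k in 'ifo'}
b = {k: np.zeros(cell_size) for k in 'ifgo'}
h, c = peephole_lstm_step(rng.randn(input_size), np.zeros(cell_size),
                          np.zeros(cell_size), W, w_peep, b)
```

The 'peephole' part is the `w_peep * c` terms: the gates get to look directly at the cell state, not just at `h_prev`.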

In [13]:

```
def model_lstm_stack(lstm_cell_size, num_layers, batch_size, keep_prob):
    """Build a stack of peephole lstm cells for 1 timestep.

    Parameters
    ----------
    lstm_cell_size : int
        Number of hidden neurons in each lstm cell
    num_layers : int
        Number of lstm layers in the stack
    batch_size : int
        Number of input sequences per batch
    keep_prob : placeholder scalar tensor
        Keep probability regularization within the lstm cell (note this is
        for the output gate, not input or state)

    Returns
    -------
    lstm_stack : MultiRNNCell
        RNN stack composed sequentially of a number of lstm cells
        shape (lstm_cell_size)
    init_state : tuple of LSTMStateTuple
        Initial state accommodating batch_size, with values initialized to 0
        shape (batch_size, lstm_cell_size)
    """
    def lstm_cell(lstm_cell_size, keep_prob):
        cell = tf.nn.rnn_cell.LSTMCell(lstm_cell_size, use_peepholes=True)
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)
        return cell

    lstm_stack = tf.nn.rnn_cell.MultiRNNCell(
        [lstm_cell(lstm_cell_size, keep_prob) for _ in range(num_layers)])
    init_state = lstm_stack.zero_state(batch_size, dtype=tf.float32)
    return lstm_stack, init_state
```

### 3.3 Output Layer

- The RNN outputs at all timesteps train the same dense output layer weights and biases
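The reshaping this implies can be seen with a toy NumPy array: the flattened output keeps batch-major order, so row k corresponds to batch k // timesteps, timestep k % timesteps (a standalone illustration, independent of the TF code below):

```python
import numpy as np

batch_size, timesteps, cell_size = 2, 3, 4
rnn_outputs = np.arange(batch_size * timesteps * cell_size).reshape(
    batch_size, timesteps, cell_size)
# Flatten to one timestep per row, so that a single dense
# (cell_size x num_classes) weight matrix applies to every timestep of
# every sequence in the batch
flat = rnn_outputs.reshape(-1, cell_size)
```

This is why one set of dense weights suffices: after the reshape, every timestep is just another row through the same matrix multiply.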

In [14]:

```
def model_output(rnn_outputs, in_size, out_size):
    """Manually build the softmax prediction layer using 1 set of weights
    for all timestep outputs from the RNN.

    Parameters
    ----------
    rnn_outputs : tensor
        shape (batch_size, timestep, lstm_cell_size)
    in_size : int
        Input size into the softmax layer - number of hidden neurons in each
        lstm cell
    out_size : int
        Output size of the softmax layer - number of prediction classes

    Returns
    -------
    logits : tensor
        Logit outputs
        shape ((batch_size * timestep), out_size)
    predictions : tensor
        Softmax of logits
        shape ((batch_size * timestep), out_size)
    """
    # Convert rnn_outputs from shape (batch_size, timestep, lstm_cell_size)
    # so that there is only 1 timestep per row. Rows are ordered by going
    # through all the timesteps in 1 sequence, then the next sequence in the
    # batch. Resulting shape ((batch_size * timestep), lstm_cell_size).
    # Concat so all timesteps in 1 sequence are concatenated with
    # shape (batch_size, (timestep * lstm_cell_size))
    input_seq = tf.concat(rnn_outputs, axis=1)
    # Order is now correct (by batch, then timestep). Reshape so there is
    # only 1 timestep input per row
    input_step = tf.reshape(input_seq, [-1, in_size])
    # Variable scope to avoid name collision with softmax within LSTM cells
    with tf.variable_scope('output_layer'):
        dense_w = tf.Variable(tf.truncated_normal((in_size, out_size),
                                                  dtype=tf.float32))
        dense_b = tf.Variable(tf.zeros((out_size)))
        logits = tf.add(tf.matmul(input_step, dense_w),
                        dense_b, name='logits')
        predictions = tf.nn.softmax(logits, name='predictions')
    return logits, predictions
```

### 3.4 Loss

- Calculate losses for the batch of timesteps with `tf.nn.softmax_cross_entropy_with_logits_v2`
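Numerically, the per-timestep loss is just the negative log probability assigned to the true next character. A NumPy sketch of the mean cross-entropy (illustrative only, not the TF op itself):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_cross_entropy(logits, y_one_hot):
    # -log p(true class), averaged over all (batch * timestep) rows
    p = softmax(logits)
    return -np.mean(np.sum(y_one_hot * np.log(p), axis=-1))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.2]])
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
loss = mean_cross_entropy(logits, y)
```

Each row here plays the role of one timestep of one sequence after the flattening done in `model_output`.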

In [15]:

```
def model_loss(logits, y_one_hot):
    """Calculate losses.

    Parameters
    ----------
    logits : tensor
        Output layer logits
        shape ((batch_size * timestep), num_classes)
    y_one_hot : tensor
        Target labels of the next token
        shape (batch_size, timestep, num_classes)

    Returns
    -------
    loss : scalar tensor
        Mean cross entropy loss for the batch of timesteps
    """
    # Reshape y to ((batch_size * timestep), num_classes) to match logits
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_reshaped,
                                                   logits=logits),
        name='loss')
    return loss
```

### 3.5 Optimizer

- LSTM cells are designed to deal with vanishing gradients (i.e. gradients becoming vanishingly small as the number of timesteps grows).
- Gradient clipping to a threshold is also employed to deal with exploding gradients - https://arxiv.org/pdf/1211.5063.pdf (Pascanu 2013)
- Thus this is built at a lower level, breaking the optimization into 2 distinct steps: calculating and clipping the gradients with `tf.clip_by_global_norm`, then applying them with `apply_gradients`
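What `tf.clip_by_global_norm` does can be sketched in NumPy: compute one norm over all gradient tensors together, and rescale them jointly only if that norm exceeds the threshold (a sketch of the documented behavior, not TF's implementation):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Global norm: the l2 norm over every value in every gradient tensor
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > clip_norm:
        # Rescale all gradients by the same factor so their joint norm
        # equals clip_norm, preserving the gradient direction
        scale = clip_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

grads = [np.array([3.0, 4.0]), np.array([[0.0], [12.0]])]  # global norm = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
```

Because all tensors are scaled by the same factor, clipping shrinks the step size without changing its direction.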

In [16]:

```
def model_optimizer(loss, learning_rate, grad_clip):
    """Low-level building of the optimizer using gradient clipping to help
    with exploding gradients. LSTM cells address vanishing gradients, but
    gradient clipping is still needed for exploding gradients. Global-norm
    clipping rescales multiple tensors by the ratio of the threshold to the
    global norm of all their values.

    Parameters
    ----------
    loss : scalar tensor
    learning_rate : float
    grad_clip : scalar tensor
        Clipping threshold on the global norm

    Returns
    -------
    optimizer_op : Optimizer Operation
        The Optimizer Operation that applies the clipped gradients
    """
    # Optimizer minimize = compute gradients and apply gradients.
    # Compute gradients of trainable vars and clip them if too big
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    # Use AdamOptimizer and apply the gradients
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    optimizer_op = optimizer.apply_gradients(zip(grads, tvars))
    return optimizer_op
```

### 3.6 CharRNN Class

- Because 'memory' is maintained in an RNN, in contrast to other deep learning models, we need to track the initial state (i.e. the state of the RNN before timestep 1 of the entire training corpus) and be able to reset it to 0 for every new epoch. Otherwise the RNN would retain memory from the end of the training corpus (i.e. have a memory of the future).
- Note the distinction between state and output. State is the memory of the RNN; the output at a timestep is based on the input and the memory. Thus state is the 'long term memory' while output is the 'short term memory' of the LSTM.
- `tf.nn.dynamic_rnn` unrolls the lstm stack across timesteps

In [17]:

```
class CharRNN:
    """A character based RNN using LSTM cells and gradient clipping.

    Parameters
    ----------
    num_classes : int
        Number of prediction classes
    batch_size : int
        Number of sequences in each batch [default: 64]
    timesteps : int
        Number of timesteps within each sequence [default: 50]
    lstm_cell_size : int
        Number of neurons within each lstm cell [default: 128]
    num_layers : int
        Number of lstm layers in the stack [default: 2]
    learning_rate : float
        Optimizer learning rate [default: 0.001]
    grad_clip : float
        Gradient clipping threshold [default: 5.]
    sampling : bool
        1 timestep calculation at a time if True [default: False]

    Attributes
    ----------
    x : placeholder tensor, int
        Training data inputs
    y : placeholder tensor, int
        Target data
    batch_size : int
        Number of sequences in each batch
    timesteps : int
        Number of timesteps within each sequence
    init_state : tuple of LSTMStateTuple
        Initial state of the RNN stack - in the context of the 'unrolled'
        RNN this is the beginning of the first timestep, i.e. the state
        before anything has been fed through the network
        shape (batch_size, lstm_cell_size)
    final_state : tuple of LSTMStateTuple
        Final state of the RNN stack - in the context of the 'unrolled' RNN
        this is the state at the end of the last timestep, i.e. the state
        after the sequence has been fed through the network
        shape (batch_size, lstm_cell_size)
    logits : tensor
        Logit outputs
        shape ((batch_size * timestep), out_size)
    predictions : tensor
        Softmax of logits
        shape ((batch_size * timestep), out_size)
    batch_loss : scalar tensor
        Loss for the batch
    optimizer_op : Optimizer Operation
        The Optimizer Operation that applies the clipped gradients
    """

    def __init__(self, num_classes, batch_size=64, timesteps=50,
                 lstm_cell_size=128, num_layers=2, learning_rate=0.001,
                 grad_clip=5., sampling=False):
        if sampling:
            self.batch_size, self.timesteps = 1, 1
        else:
            self.batch_size, self.timesteps = batch_size, timesteps
        tf.reset_default_graph()
        # Build inputs to TF graph
        self.x, self.y, self.keep_prob = model_inputs(self.batch_size,
                                                      self.timesteps)
        x_one_hot = tf.one_hot(self.x, num_classes)
        y_one_hot = tf.one_hot(self.y, num_classes)
        # Build 1 lstm stack
        lstm_stack, self.init_state = model_lstm_stack(lstm_cell_size,
                                                       num_layers,
                                                       self.batch_size,
                                                       self.keep_prob)
        # Build RNN by unrolling the lstm stack across timesteps
        rnn_outputs, self.final_state = tf.nn.dynamic_rnn(
            cell=lstm_stack, inputs=x_one_hot, initial_state=self.init_state)
        # Apply dense layer to RNN outputs for logits and softmax predictions
        self.logits, self.predictions = model_output(rnn_outputs,
                                                     lstm_cell_size,
                                                     num_classes)
        # Calculate loss and optimizer for the batch
        self.batch_loss = model_loss(self.logits, y_one_hot)
        self.optimizer_op = model_optimizer(self.batch_loss,
                                            learning_rate,
                                            grad_clip)
```

## 4. Training Loop

- Train on each batch to calculate the final state, loss, and optimization step
- Make sure to feed the final_state into the init_state at the beginning of each batch

In [18]:

```
def train(model, tokenized_data, epochs, keep_prob, batch_size, timesteps,
          print_every_n, save_every_n):
    """Training loop.

    Parameters
    ----------
    model : CharRNN
        CharRNN object initialized with model parameters
    tokenized_data : TokenizedData
        Tokenized data object
    epochs : int
        Number of epochs to train
    keep_prob : float
        Keep probability of layers in each lstm cell
    batch_size : int
        Batch size
    timesteps : int
        Number of timesteps for each sequence
    print_every_n : int
        Print every n epochs
    save_every_n : int
        Save every n epochs

    Returns
    -------
    epoch_losses : list, float
        List of epoch losses
    """
    saver = tf.train.Saver(max_to_keep=20, save_relative_paths=True)
    epoch_losses = []
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for e in range(epochs):
            # Initialize state of RNN to 0 at the start of each epoch
            new_state = sess.run(model.init_state)
            epoch_t1 = time.time()
            epoch_loss = 0
            # Progress bar
            # total_batches = int(len(tokenized_data.encoded_text) / (batch_size * timesteps))
            # pbar = tqdm_notebook(total=total_batches)
            for x, y in tokenized_data.get_batches(batch_size, timesteps):
                # pbar.update(1)
                batch_t1 = time.time()
                feed = {model.x: x,
                        model.y: y,
                        model.keep_prob: keep_prob,
                        model.init_state: new_state}
                batch_loss, new_state, _ = sess.run([model.batch_loss,
                                                     model.final_state,
                                                     model.optimizer_op],
                                                    feed_dict=feed)
                batch_t2 = time.time()
                epoch_loss += batch_loss
            epoch_t2 = time.time()
            epoch_losses.append(epoch_loss)
            # Print output
            if e % print_every_n == 0:
                print('Epoch {}/{}'.format(e+1, epochs))
                print('Epoch loss: {:.4f}'.format(epoch_loss))
                print('Epoch time taken: {:.3f}'.format(epoch_t2 - epoch_t1))
                print('Last batch loss: {:.4f}'.format(batch_loss))
                print('Last batch time taken: {:.3f}'.format(batch_t2 - batch_t1))
            # Save model weights to disk
            if e % save_every_n == 0:
                saver.save(sess, 'checkpoints/e{}.ckpt'.format(e))
        # Save model weights for the very last epoch
        saver.save(sess, 'checkpoints/e{}.ckpt'.format(e))
    return epoch_losses
```

## 5. Main Loop

### 5.1 Hyperparameters

- Some tips on hyperparameters: https://github.com/karpathy/char-rnn#tips-and-tricks
  - lstm cell size - number of hidden units
  - number of hidden layers - 2 or 3
  - timesteps - this governs how far the gradient propagates back, i.e. the patterns / relationships captured within 1 sequence
- Note I didn't do cross validation here, but the approach would be the same: split the data so that 95% is for training and 5% for validation / test.

In [19]:

```
# GLOBAL VARIABLES
# Model hyperparameters
LSTM_CELL_SIZE = 512
NUM_LAYERS = 2
LEARNING_RATE = 0.001
GRAD_CLIP = 5
# Training hyperparameters
EPOCHS = 25
KEEP_PROB = 0.5
BATCH_SIZE = 100
TIMESTEPS = 100
PRINT_EVERY_N = 1
SAVE_EVERY_N = 1
```

In [20]:

```
def main():
    model = CharRNN(len(tokenized_data.t.word_index),
                    batch_size=BATCH_SIZE,
                    timesteps=TIMESTEPS,
                    lstm_cell_size=LSTM_CELL_SIZE,
                    num_layers=NUM_LAYERS,
                    learning_rate=LEARNING_RATE,
                    grad_clip=GRAD_CLIP)
    epoch_losses = train(model,
                         tokenized_data,
                         epochs=EPOCHS,
                         keep_prob=KEEP_PROB,
                         batch_size=BATCH_SIZE,
                         timesteps=TIMESTEPS,
                         print_every_n=PRINT_EVERY_N,
                         save_every_n=SAVE_EVERY_N)
    plt.figure(figsize=(6, 6))
    plt.plot(epoch_losses)
    plt.xlabel('Epochs')
    plt.ylabel('Epoch losses')


main()
```

## 6. Predicting Characters and Generating Novel Text

- Now that the RNN is trained, we can use it to predict the next character given a string of previous characters.
- Note that if we feed the predicted character back in as the input for the next timestep, we've now generated new text of whatever arbitrary length we specify!
- A sampling hyperparameter (temperature, or here top_n) acts as a dial between more conservative predictions and more diverse but higher-error ones
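The code below implements the dial as top-n sampling; the temperature variant would instead divide the logits by a temperature T before the softmax - T near 0 approaches greedy (conservative) sampling, while large T flattens the distribution toward uniform (diverse). A NumPy sketch (illustrative only, not part of the trained graph):

```python
import numpy as np

def temperature_probs(logits, temperature):
    # Lower temperature sharpens the distribution; higher flattens it
    z = logits / temperature
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])
cold = temperature_probs(logits, 0.5)   # conservative: mass on the top char
hot = temperature_probs(logits, 2.0)    # diverse: closer to uniform
```

Both dials trade off the same thing: how much probability mass is allowed to reach less likely characters.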

In [21]:

```
def pick_char(prediction, top_n):
    """Sample from the top_n characters probabilistically.

    Parameters
    ----------
    prediction : array, float
        Prediction probabilities for each char class
    top_n : int
        Number of top candidates to sample from

    Returns
    -------
    prediction_index : int
        Index of the character selected
    """
    prediction = np.squeeze(prediction)
    # Set all the classes outside the top_n to 0 probability
    prediction[np.argsort(prediction)[:-top_n]] = 0
    # Normalize probabilities to sum to 1
    prediction = prediction / np.sum(prediction)
    # Randomly select, weighted by likelihood
    prediction_index = np.random.choice(len(prediction), 1, p=prediction)[0]
    return prediction_index
```

In [22]:

```
def infer(tokenized_data, checkpoint, model, n_samples, text_seed, top_n):
    """Generate new text based on a text seed and checkpoint weights.

    Parameters
    ----------
    tokenized_data : TokenizedData
        Pre-processed data
    checkpoint : str
        Checkpoint file
    model : CharRNN
        RNN model
    n_samples : int
        Number of characters to generate
    text_seed : str
        Initial string to prime the RNN state
    top_n : int
        Number of top candidate characters to sample from

    Returns
    -------
    generated_text : str
        Predicted text string
    """
    generated_text = text_seed
    generated_seq = np.squeeze(tokenized_data.t.texts_to_sequences(text_seed))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.init_state)
        # Prime the RNN state with the initial text seed
        for i, _ in enumerate(generated_text[:-1]):
            x = np.zeros((1, 1))
            x[0, 0] = generated_seq[i]
            feed = {model.x: x,
                    model.keep_prob: 1.,
                    model.init_state: new_state}
            _, new_state = sess.run([model.predictions,
                                     model.final_state],
                                    feed_dict=feed)
        # Start generating predictions after the RNN state has been primed
        for _ in range(n_samples):
            x = np.zeros((1, 1))
            x[0, 0] = generated_seq[-1]
            feed = {model.x: x,
                    model.keep_prob: 1.,
                    model.init_state: new_state}
            prediction, new_state = sess.run([model.predictions,
                                              model.final_state],
                                             feed_dict=feed)
            # Pick a character and append
            predicted_char_index = pick_char(prediction, top_n)
            predicted_char = tokenized_data.sequence_to_str([predicted_char_index])[0]
            generated_text = generated_text + str(predicted_char)
            generated_seq = np.append(generated_seq, predicted_char_index)
    return generated_text
```

In [23]:

```
def generate_text(tokenized_data, n_samples, text_seed, top_n, checkpoint=None):
    """Use the trained model to generate new text based on an initial string.

    Parameters
    ----------
    tokenized_data : TokenizedData
        Pre-processed data
    n_samples : int
        Number of characters to generate
    text_seed : str
        Initial string to prime the RNN state
    top_n : int
        Number of top candidate characters to sample from
    checkpoint : str
        Checkpoint file [default: None, which uses the latest checkpoint]
    """
    infer_model = CharRNN(len(tokenized_data.t.word_index),
                          lstm_cell_size=LSTM_CELL_SIZE,
                          num_layers=NUM_LAYERS,
                          sampling=True)
    if checkpoint is None:
        checkpoint = tf.train.latest_checkpoint('checkpoints')
    print(infer(tokenized_data, checkpoint, infer_model, n_samples, text_seed, top_n))
```

### Final Model - Pick from top 5 most likely characters

- With the same trained model and the same text seed, there is still a fair amount of variation due to the probabilistic selection among the top 5 characters.

In [24]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5)
```

In [25]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5)
```

### Final Model - Pick from the top 1 most likely character

- Not enough randomness - the model settles into a deterministic repeating pattern

In [26]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=1)
```

### Final Model - Pick from top 2 most likely characters

- Top 2 seemed better, but there are still lots of repeated words

In [27]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=2)
```

### Epoch 5, 10, 15, 20 Models - Pick from top 5 most likely characters

- Taking a look at how the model learns over the epochs

In [28]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e5.ckpt')
```

In [29]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e10.ckpt')
```

In [30]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e15.ckpt')
```

In [31]:

```
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e20.ckpt')
```
