# Novel Text Generation Training RNN LSTM on Anna Karenina

Posted 2018-12-19

## 1. Goal:¶

• Train a Recurrent Neural Network (RNN) of Long Short Term Memory (LSTM) cells on a corpus of text in order to generate new text on a character-by-character basis.
• What makes RNN's interesting:
• Able to find a pattern on ordered sequences / time series (dimension of time / memory)
• Each timestep is dependent on what came before it
• Much more flexible with inputs
• Even if data is not a sequence, can learn to treat it as such
• Concept:
• Data is fed in ordered sequence (sliding window over the entire dataset)
• Minimizing loss function is calculated on cross-entropy loss of predicted output at each timestep versus actual, ie. predict 2nd character given input of first character.
• Note this means target data is just training shifted 1 timestep, so no need for labeled data per se
• Memory comes into play when predicting the 3rd character given the first 2 prior ordered characters
• Beyond prediction, can also use for generation. A hyperparameter can be tuned to produce more diverse or conservative results
• This is based on a replication (with a few of my own expansions) from the Udacity AIND, specifically the Intro to RNN project on GitHub.
• Training corpus is Anna Karenina full text from Project Gutenberg.
• Great resources to learn about RNNs - building a visual intuition about what's going on at what level of the network was very central to finally understanding what's going on under the hood
In [1]:
import time
from collections import namedtuple

import keras
from keras.preprocessing.text import Tokenizer
from keras.layers.embeddings import Embedding
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
# from tqdm import tqdm_notebook

% matplotlib inline
plt.style.use('fivethirtyeight')

/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.


## 2. Load and Preprocess Data¶

### 2.1 Keras Tokenizer¶

• This was a very simple tokenization based on characters so using the high-level Keras text preprocessing was fine https://keras.io/preprocessing/text/
• In reality it would be better to tokenize by word at which point could do a lot more text pre-processing like word-stemming, stop words, parts of speech, etc. Of which NLTK is much more suited for http://www.nltk.org/
In [2]:
class TokenizedData:
"""Data class to convert file to sequences.

Parameters
----------
filename : string
Relative path and filename to text

Attributes
----------
training_text : str
Full text of training data
t : Tokenizer
Tokenizer that's already fitted on training_text
index_to_char : dict
Dict mapping index to character
encoded_text : array, int
Training text converted to sequence
"""
def __init__(self, filename):
with open(filename) as f:

# Tokenizer
self.t = Tokenizer(filters='', lower=False, char_level=True)
self.t.fit_on_texts(self.training_text)
self.index_to_char = dict(map(reversed, self.t.word_index.items()))
self.encoded_text = np.squeeze(self.t.texts_to_sequences(self.training_text))

def sequence_to_str(self, sequence):
"""Map a list of sequences back to text string.

Parameters
----------
sequence : list, int
Sequence to convert

Returns
-------
mapped_text : str
Converted sequence to text
"""
mapped_text = ''.join([self.index_to_char[c] for c in sequence])
return mapped_text

def get_batches(self, batch_size, timesteps):
"""Yield data in batches.

Parameters
---------
batch_size : int
Number of input sequences per batch
timesteps : int
Number of time steps per sequence (also width of sequence)

Yields
------
Batches of size batch_size X timesteps at a time.
"""
# Get rid of extra characters not enough to fill sequence in the last
# batch
text = self.encoded_text
chars_leftover = len(text) % (batch_size * timesteps)
if chars_leftover > 0:
text = text[:-chars_leftover]

text = text.reshape(batch_size, -1)

for cursor in range(0, text.shape[1], timesteps):
x = text[:batch_size, cursor:cursor+timesteps]
# Since y is x shifted over by 1, the last batch will need to be
# padded by a column of 0's
y = text[:batch_size, cursor+1:cursor+timesteps+1]
yield x, y
else:

tokenized_data = TokenizedData('anna.txt')

In [3]:
# Set of all characters in the text
tokenized_data.t.word_index

Out[3]:
{' ': 1,
'e': 2,
't': 3,
'a': 4,
'o': 5,
'n': 6,
'h': 7,
'i': 8,
's': 9,
'r': 10,
'd': 11,
'l': 12,
'\n': 13,
'u': 14,
'w': 15,
'c': 16,
'm': 17,
'g': 18,
'y': 19,
',': 20,
'f': 21,
'p': 22,
'b': 23,
'.': 24,
'v': 25,
'k': 26,
'"': 27,
"'": 28,
'I': 29,
'A': 30,
'x': 31,
'-': 32,
'T': 33,
'S': 34,
'?': 35,
'L': 36,
'H': 37,
'W': 38,
'!': 39,
';': 40,
'B': 41,
'V': 42,
'j': 43,
'q': 44,
'K': 45,
'Y': 46,
'z': 47,
'M': 48,
'O': 49,
'D': 50,
'N': 51,
'P': 52,
'C': 53,
'_': 54,
'G': 55,
'F': 56,
'E': 57,
':': 58,
'R': 59,
'(': 60,
')': 61,
'1': 62,
'2': 63,
'J': 64,
'U': 65,
'3': 66,
'*': 67,
'0': 68,
'5': 69,
'8': 70,
'4': 71,
'6': 72,
'9': 73,
'7': 74,
'/': 75,
'Q': 76,
'Z': 77,
'X': 78,
'@': 79,
'$': 80, '': 81, '&': 82, '%': 83} In [4]: # Counts of each character in the text tokenized_data.t.word_counts  Out[4]: OrderedDict([('C', 796), ('h', 104874), ('a', 119810), ('p', 23288), ('t', 139018), ('e', 186592), ('r', 80402), (' ', 321702), ('1', 179), ('\n', 40263), ('H', 2077), ('y', 31223), ('f', 30986), ('m', 33518), ('i', 103979), ('l', 58913), ('s', 95717), ('k', 14285), (';', 1684), ('v', 18625), ('u', 40052), ('n', 110374), ('o', 114197), ('w', 35484), ('.', 19895), ('E', 491), ('g', 33033), ('c', 33922), ('O', 971), ('b', 19908), ("'", 6721), ('T', 2948), ('d', 68060), ('F', 494), (',', 31140), ('q', 1399), ('-', 3364), ('j', 1416), ('P', 876), ('S', 2901), ('A', 5303), ('"', 14012), ('Y', 1133), ('?', 2362), ('N', 918), ('!', 1717), ('D', 934), ('_', 706), ('I', 6254), ('x', 3422), (':', 439), ('M', 1015), ('W', 1817), ('(', 219), (')', 219), ('B', 1596), ('2', 107), ('R', 361), ('z', 1032), ('G', 613), ('L', 2168), ('3', 68), ('K', 1254), ('4', 35), ('V', 1439), ('5', 38), ('Z', 12), ('6', 33), ('7', 31), ('8', 37), ('U', 77), ('9', 33), ('0', 42), ('J', 89), ('Q', 22), ('', 1), ('X', 3), ('*', 48), ('/', 31), ('&', 1), ('%', 1), ('@', 2), ('$', 2)])
In [5]:
# First 1000 characters of text
tokenized_data.training_text[:100]

Out[5]:
'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'
In [6]:
# Encoded sequence
sequence = tokenized_data.encoded_text[:100]
sequence

Out[6]:
array([53,  7,  4, 22,  3,  2, 10,  1, 62, 13, 13, 13, 37,  4, 22, 22, 19,
1, 21,  4, 17,  8, 12,  8,  2,  9,  1,  4, 10,  2,  1,  4, 12, 12,
1,  4, 12,  8, 26,  2, 40,  1,  2, 25,  2, 10, 19,  1, 14,  6,  7,
4, 22, 22, 19,  1, 21,  4, 17,  8, 12, 19,  1,  8,  9,  1, 14,  6,
7,  4, 22, 22, 19,  1,  8,  6,  1,  8,  3,  9,  1,  5, 15,  6, 13,
15,  4, 19, 24, 13, 13, 57, 25,  2, 10, 19,  3,  7,  8,  6])
In [7]:
tokenized_data.sequence_to_str(sequence)

Out[7]:
'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

### 2.2 Generate Batches¶

In [8]:
# Sample array manipulation to get a feel of what's going on when we get the
# batches
a = np.arange(20)
a = a.reshape(5,-1)
print(a)
for cursor in range(0, 4, 2):
print(a[:5, cursor:cursor+2])
print(a[:5, cursor+1:cursor+3])

[[ 0  1  2  3]
[ 4  5  6  7]
[ 8  9 10 11]
[12 13 14 15]
[16 17 18 19]]
[[ 0  1]
[ 4  5]
[ 8  9]
[12 13]
[16 17]]
[[ 1  2]
[ 5  6]
[ 9 10]
[13 14]
[17 18]]
[[ 2  3]
[ 6  7]
[10 11]
[14 15]
[18 19]]
[[ 3]
[ 7]
[11]
[15]
[19]]

In [9]:
batches = tokenized_data.get_batches(10, 50)
x, y = next(batches)

In [10]:
x[:10, :10]

Out[10]:
array([[53,  7,  4, 22,  3,  2, 10,  1, 62, 13],
[ 1,  4, 17,  1,  6,  5,  3,  1, 18,  5],
[25,  8,  6, 24, 13, 13, 27, 46,  2,  9],
[ 6,  1, 11, 14, 10,  8,  6, 18,  1,  7],
[ 1,  8,  3,  1,  8,  9, 20,  1,  9,  8],
[ 1, 29,  3,  1, 15,  4,  9, 13,  5,  6],
[ 7,  2,  6,  1, 16,  5, 17,  2,  1, 21],
[40,  1, 23, 14,  3,  1,  6,  5, 15,  1],
[ 3,  1,  8,  9,  6, 28,  3, 24,  1, 33],
[ 1,  9,  4,  8, 11,  1,  3,  5,  1,  7]])
In [11]:
y[:10, :10]

Out[11]:
array([[ 7,  4, 22,  3,  2, 10,  1, 62, 13, 13],
[ 4, 17,  1,  6,  5,  3,  1, 18,  5,  8],
[ 8,  6, 24, 13, 13, 27, 46,  2,  9, 20],
[ 1, 11, 14, 10,  8,  6, 18,  1,  7,  8],
[ 8,  3,  1,  8,  9, 20,  1,  9,  8, 10],
[29,  3,  1, 15,  4,  9, 13,  5,  6, 12],
[ 2,  6,  1, 16,  5, 17,  2,  1, 21,  5],
[ 1, 23, 14,  3,  1,  6,  5, 15,  1,  9],
[ 1,  8,  9,  6, 28,  3, 24,  1, 33,  7],
[ 9,  4,  8, 11,  1,  3,  5,  1,  7,  2]])

## 3. Model¶

• Part of the struggle in learning is was having a mental model of the following
• RNN vs input/output layer
• 1 stack of LSTM vs unrolled RNN of LSTM cells and how the states / outputs connected over timesteps
• How states / outputs were calculated within 1 LSTM cell using forget, ignore, read gates
• How implementation of batches affected the matrix calculations
• These distinctions are important to keep in mind for the following subsections

### 3.1 Inputs to TF Graph¶

In [12]:
def model_inputs(batch_size, timesteps):
"""
Build model inputs to the TF graph. Note that target is just the input
sequence shifted over one timestep.

Parameters
----------
batch_size : int
Batch size
timesteps : int
Number of steps per sequence

Returns
-------
x : placeholder tensor, int
Training data input
shape (None, timesteps)
y : placeholder tensor, int
Target data input (training data shifted over by 1 step)
shape (None, timesteps)
keep_prob : float
Keep probability regularization within lstm cell
"""
x = tf.placeholder(tf.int32, shape=(None, timesteps), name='input_sequence')
y = tf.placeholder(tf.int32, shape=(None, timesteps), name='target_sequence')
keep_prob = tf.placeholder(tf.float32, shape=(), name='keep_prob')

return x, y, keep_prob


### 3.2 LSTM Stack¶

• After we've specified 1 cell tf.nn.rnn_cell.LSTMCell, the creation is of the stack is taken care of by tf.nn.rnn_cell.MultiRNNCell
• We're building the stack of lstm cells for 1 timestep. 'Unrolling' or sequential handling of this lstm stack into a RNN is taken care of later
• Peephole lstm cell is based on https://arxiv.org/abs/1402.1128 (Sak, Senior, Beaufays 2014)
In [13]:
def model_lstm_stack(lstm_cell_size, num_layers, batch_size, keep_prob):
"""Build a stack of peephole lstm cells for 1 timestep.

Parameters
---------
lstm_cell_size : int
Number of hidden neurons in each lstm cell
num_layers : int
Number of layers within each lstm cell
keep_prob : placeholder scalar tensor
Keep probability regularization within lstm cell (note this is for
output gate, not inpute or state)

Returns
-------
lstm_stack : MultiRNNCell
RNN stack composed sequentially of a number of lstm cells
shape (lstm_cell_size)
init_state : MultiRNNCell
RNN stack accomodating batch_size and initialized values to 0
shape (batch_size, lstm_cell_size)
"""
def lstm_cell(lstm_cell_size, keep_prob):
cell = tf.nn.rnn_cell.LSTMCell(lstm_cell_size, use_peepholes=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)
return cell

lstm_stack = tf.nn.rnn_cell.MultiRNNCell(
[lstm_cell(lstm_cell_size, keep_prob) for _ in range(num_layers)])
init_state = lstm_stack.zero_state(batch_size, dtype=tf.float32)

return lstm_stack, init_state


### 3.3 Output Layer¶

• RNN outputs at all timesteps train the same dense output layer weights and biases
In [14]:
def model_output(rnn_outputs, in_size, out_size):
"""Manually build softmax prediction layer using 1 set of weights for all
timestep outputs from RNN.

Parameters
----------
rnn_outputs :
shape (batch_size, timestep, lstm_cell_size)
in_size : int
Input size into softmax layer - number of hidden neurons in each lstm
cell
out_size : int
Output size for softmax layer - number of prediction classes

Returns
-------
logits : tensor
Logit outputs
shape ((batch_size * timestep), out_size)
predictions : tensor
Softmax logits
shape ((batch_size * timestep), out_size)
"""
# Convert rnn_output shape (batch_size, timestep, lstm_cell_size) so that
# there is only 1 timestep per row. Ordering of the rows is by going thru
# all the timesteps in 1 sequence, then next sequence in batch.
# Resulting shape ((batch_step * timestep), lstm_cell_size)

# Concat so all timesteps in 1 sequence are concatenated with
# shape (batch_size, (timestep * lstm_cell_size))
input_seq = tf.concat(rnn_outputs, axis=1)
# Order is now correct with order by batch, timestep. Then reshape so
# there is only 1 timestep input per row
input_step = tf.reshape(input_seq, [-1, in_size])

# Variable scope to avoid name collision with softmax within LSTM cells
with tf.variable_scope('output_layer'):
dense_w = tf.Variable(tf.truncated_normal((in_size, out_size),
dtype=tf.float32))
dense_b = tf.Variable(tf.zeros((out_size)))

dense_b, name='logits')
predictions = tf.nn.softmax(logits, name='predictions')

return logits, predictions


### 3.4 Loss¶

• Calculate losses for the batch of timesteps tf.nn.softmax_cross_entropy_with_logits_v2
In [15]:
def model_loss(logits, y_one_hot):
"""Calculate losses.

Parameters
----------
logits : tensor
Output layer logits from softmax
shape ((batch_size, timestep), num_classes)
y_one_hot : tensor
Target labels of next token
shape (batch_size, timestep, num_classes)

Returns
-------
loss : scalar tensor
Cross entropy loss for the batch of timesteps
"""
# Reshape y to (batch_size, (timestep * num_classes)) to match logits
y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_reshaped,
logits=logits), name='loss')
return loss


### 3.5 Optimizer¶

• Due to vanishing gradients (ie. gradients become really small as timesteps get larger and larger), LSTM cells are designed to deal with this problem.
• Gradient clipping to a threshold is also employed to deal with exploding gradients.
• https://arxiv.org/pdf/1211.5063.pdf (Pascanu 2013)
• Thus, this is at a lower level due to breaking the optimization into 2 distnct steps (calculation of gradient & clipping tf.clip_by_global_norm, then applying gradient apply_gradients)
In [16]:
def model_optimizer(loss, learning_rate, grad_clip):
"""Low level building of optimizer using gradient clipping to help with
for overflow. Gradient clipping clips values of multiple tensors by the
ratio of sum of their norms.

Parameters
----------
loss : scalar tensor
learning_rate : float
Clipping ratio of the sum of their norms

Returns
-------
optimizer_op : Optimizer Operation
The Optimizer Operation that applies the specific gradients

"""
# Compute gradients of trainable vars, clip them if too big
tvars = tf.trainable_variables()

return optimizer_op


### 3.6 CharRNN Class¶

• Because 'memory' is maintained in RNN, in contract to other deep learning models, we need to track initial state (ie state of RNN before timestep 1 of the entire training corpus) and be able to set it to 0 for every new epoch. Otherwise the RNN would retain memory from the end of training corpus (ie. have a memory of the future).
• Note the distinction of state versus output. State is the memory of the RNN, output at timestep is based on input and memory. Thus state is the 'long term memory' while output is the 'short term memory' of LSTM.
• tf.nn.dynamic_rnn unrolls the lstm stack across timesteps
In [17]:
class CharRNN:
"""A character based RNN using LSTM cells and gradient clipping.

Parameters
----------
num_classes : int
Number of prediction classes
batch_size : int
Number in each batch [default: 64]
timesteps : int
Number of timesteps within each sequence [default: 50]
lstm_cell_size : int
Number of neurons within each lstm cell [default: 128]
num_layers : int
Number of layers within each lstm cell [default: 2]
learning_rate : float
Optmizer learning rate [default: 0.001]
sampling : bool
1 timestep calculation at a time if True [default: False]

Attributes
----------
x : ndarray, int
Training data inputs
y : ndarray, int
Target data
batch_size : int
Number in each batch
timesteps : int
Number of timesteps within each sequence
init_state : MultiRNNCell
Initial state of the RNN stack - in the context of 'unrolled' RNN this
is the beginning of the first timestep, ie. state before anything has
been fed thru the network
shape (batch_size, lstm_cell_size)
final_state : MultiRNNCell
Final state of the RNN stack - in the context of the 'unrolled' RNN
this is the state at the end of the last timestep, ie. state after the
sequence has been feed thru the network
shape (batch_size, lstm_cell_size)
logits : tensor
Logit outputs
shape ((batch_size * timestep), out_size)
predictions : tensor
Softmax logits
shape ((batch_size * timestep), out_size)
batch_loss : float
loss for the batch
optimizer_op : Optimizer Operation
The Optimizer Operation that applies the specific gradients
"""
def __init__(self, num_classes, batch_size=64, timesteps=50, lstm_cell_size=128,
sampling=False):
if sampling:
self.batch_size, self.timesteps = 1, 1
else:
self.batch_size, self.timesteps = batch_size, timesteps

tf.reset_default_graph()

# Build inputs to TF graph
self.x, self.y, self.keep_prob = model_inputs(self.batch_size, self.timesteps)
x_one_hot = tf.one_hot(self.x, num_classes)
y_one_hot = tf.one_hot(self.y, num_classes)

# Build 1 lstm stack
lstm_stack, self.init_state = model_lstm_stack(lstm_cell_size,
num_layers,
self.batch_size,
self.keep_prob)
# Build RNN by unrolling the lstm stack to timesteps
rnn_outputs, self.final_state = tf.nn.dynamic_rnn(cell=lstm_stack,
inputs=x_one_hot,
initial_state=self.init_state)
# Apply dense layer to RNN outputs to get logits and softmax prediction
self.logits, self.predictions = model_output(rnn_outputs,
lstm_cell_size,
num_classes)
# Calculate losses and optimizer for batch
self.batch_loss = model_loss(self.logits, y_one_hot)
self.optimizer_op = model_optimizer(self.batch_loss,
learning_rate,


## Training Loop¶

• Train each batch to calculate final state, losses, and optimization
• Make sure to pass the final_state to the init_state at the beginning of each batch
In [18]:
def train(model, tokenized_data, epochs, keep_prob, batch_size, timesteps,
print_every_n, save_every_n):
"""Training loop.

Parameters
----------
model : CharRNN
CharRNN object initialized with model paramters
tokenized_data : TokenizedData
Tokenized data object
epochs : int
Number of epochs to train
keep_prob : float
Keep probability of layers in each lstm cell
batch_size : int
Batch size
timesteps : int
Number of timesteps for each sequence
print_every_n : int
Print every n batches
save_every_n : int
Save every n batches

Returns
-------
epoch_losses : list, float
List of epoch losses

"""
saver = tf.train.Saver(max_to_keep=20, save_relative_paths=True)
epoch_losses = []

with tf.Session() as sess:
sess.run(tf.global_variables_initializer())

for e in range(epochs):
# Initialize state of RNN to 0 at start of epoch
new_state = sess.run(model.init_state)
epoch_t1 = time.time()
epoch_loss = 0

# Progress bar
#             total_batches = int(len(tokenized_data.encoded_text) / (batch_size * timesteps))
#             pbar = tqdm_notebook(total=total_batches)

for x, y in tokenized_data.get_batches(batch_size, timesteps):
#                 pbar.update(1)
batch_t1 = time.time()

feed = {model.x: x,
model.y: y,
model.keep_prob: keep_prob,
model.init_state: new_state}
batch_loss, new_state, optimizer_op = sess.run([model.batch_loss,
model.final_state,
model.optimizer_op],
feed_dict=feed)
batch_t2 = time.time()
epoch_loss += batch_loss
epoch_t2 = time.time()
epoch_losses.append(epoch_loss)

# Print output
if (e % print_every_n == 0):
print('Epoch {}/{}'.format(e+1, epochs))
print('Epoch loss: {:.4f}'.format(epoch_loss))
print('Epoch time taken: {:.3f}'.format(epoch_t2 - epoch_t1))
print('Last batch loss: {:.4f}'.format(batch_loss))
print('Last batch time taken: {:.3f}'.format(batch_t2 - batch_t1))

# Save model weights to disk
if (e % save_every_n == 0):
saver.save(sess, 'checkpoints/e{}.ckpt'.format(e))

# Save model weight for the very last iteration
saver.save(sess, 'checkpoints/e{}.ckpt'.format(e))

return epoch_losses


## Main Loop¶

### Hyperparameters¶

• Some tips on hyperparameters https://github.com/karpathy/char-rnn#tips-and-tricks
• lstm cell size - # of hidden units
• number of hidden layers - 2 or 3
• timestep - this will govern how far the gradient propagates back, ie. patterns / relationships in 1 sequence
• Note I didn't do cross validation for this but approach would be the same, splitting the data where 95% of it is training, 5% for validation / test.
In [19]:
# GLOBAL VARIABLES
# Model hyperparameters
LSTM_CELL_SIZE = 512
NUM_LAYERS = 2
LEARNING_RATE = 0.001

# Training hyperparameters
EPOCHS = 25
KEEP_PROB = 0.5
BATCH_SIZE = 100
TIMESTEPS = 100
PRINT_EVERY_N = 1
SAVE_EVERY_N = 1

In [20]:
def main():
model = CharRNN(len(tokenized_data.t.word_index),
batch_size=BATCH_SIZE,
timesteps=TIMESTEPS,
lstm_cell_size=LSTM_CELL_SIZE,
num_layers=NUM_LAYERS,
learning_rate=LEARNING_RATE,
epoch_losses = train(model,
tokenized_data,
epochs=EPOCHS,
keep_prob=KEEP_PROB,
batch_size=BATCH_SIZE,
timesteps=TIMESTEPS,
print_every_n=PRINT_EVERY_N,
save_every_n=SAVE_EVERY_N)

plt.figure(figsize=(6,6))
plt.plot(epoch_losses)
plt.xlabel('Epochs')
plt.ylabel('Epoch losses')

main()

Epoch 1/25
Epoch loss: 538.2426
Epoch time taken: 60.179
Last batch loss: 2.3385
Last batch time taken: 0.303
Epoch 2/25
Epoch loss: 423.2642
Epoch time taken: 59.991
Last batch loss: 2.0537
Last batch time taken: 0.301
Epoch 3/25
Epoch loss: 377.1644
Epoch time taken: 59.983
Last batch loss: 1.8777
Last batch time taken: 0.297
Epoch 4/25
Epoch loss: 348.0678
Epoch time taken: 60.075
Last batch loss: 1.7771
Last batch time taken: 0.307
Epoch 5/25
Epoch loss: 326.4816
Epoch time taken: 59.961
Last batch loss: 1.6803
Last batch time taken: 0.304
Epoch 6/25
Epoch loss: 310.3031
Epoch time taken: 59.904
Last batch loss: 1.6227
Last batch time taken: 0.305
Epoch 7/25
Epoch loss: 297.5667
Epoch time taken: 60.228
Last batch loss: 1.5557
Last batch time taken: 0.305
Epoch 8/25
Epoch loss: 287.5213
Epoch time taken: 60.257
Last batch loss: 1.5227
Last batch time taken: 0.302
Epoch 9/25
Epoch loss: 277.3047
Epoch time taken: 60.095
Last batch loss: 1.4846
Last batch time taken: 0.297
Epoch 10/25
Epoch loss: 271.0766
Epoch time taken: 60.299
Last batch loss: 1.4610
Last batch time taken: 0.305
Epoch 11/25
Epoch loss: 263.7930
Epoch time taken: 60.388
Last batch loss: 1.4219
Last batch time taken: 0.302
Epoch 12/25
Epoch loss: 258.3275
Epoch time taken: 61.221
Last batch loss: 1.3926
Last batch time taken: 0.306
Epoch 13/25
Epoch loss: 253.9536
Epoch time taken: 61.370
Last batch loss: 1.3828
Last batch time taken: 0.304
Epoch 14/25
Epoch loss: 250.0217
Epoch time taken: 61.558
Last batch loss: 1.3729
Last batch time taken: 0.336
Epoch 15/25
Epoch loss: 246.6374
Epoch time taken: 60.847
Last batch loss: 1.3408
Last batch time taken: 0.304
Epoch 16/25
Epoch loss: 243.6650
Epoch time taken: 61.318
Last batch loss: 1.3302
Last batch time taken: 0.306
Epoch 17/25
Epoch loss: 240.9595
Epoch time taken: 60.983
Last batch loss: 1.3247
Last batch time taken: 0.306
Epoch 18/25
Epoch loss: 238.3090
Epoch time taken: 61.152
Last batch loss: 1.3060
Last batch time taken: 0.307
Epoch 19/25
Epoch loss: 236.3545
Epoch time taken: 60.511
Last batch loss: 1.2986
Last batch time taken: 0.299
Epoch 20/25
Epoch loss: 233.8637
Epoch time taken: 60.366
Last batch loss: 1.2872
Last batch time taken: 0.304
Epoch 21/25
Epoch loss: 231.9072
Epoch time taken: 60.312
Last batch loss: 1.2798
Last batch time taken: 0.305
Epoch 22/25
Epoch loss: 230.0432
Epoch time taken: 60.494
Last batch loss: 1.2702
Last batch time taken: 0.303
Epoch 23/25
Epoch loss: 228.2421
Epoch time taken: 60.489
Last batch loss: 1.2690
Last batch time taken: 0.309
Epoch 24/25
Epoch loss: 226.5291
Epoch time taken: 60.308
Last batch loss: 1.2467
Last batch time taken: 0.299
Epoch 25/25
Epoch loss: 225.4623
Epoch time taken: 60.597
Last batch loss: 1.2338
Last batch time taken: 0.305


## Predicting Characters and Generating Novel Text¶

• Now that the RNN is trained, we can use it to predict the next character given a string of previous characters.
• Note that if we take the prediction character as input for the next timestep, we've now generated new text of some arbitrary length that we've specified!
• Temperature is a hyperparameter dial that can tune more conservative predictions versus more diverse but higher error
In [21]:
def pick_char(prediction, top_n):
"""Sample from the top_n characters probabilistically.

Parameters
----------
prediction : array, float
Prediction probabilities for each char class
top_n : int
Number of top candidates to sample from

Returns
-------
prediction_index : int
Index of the character selected
"""
prediction = np.squeeze(prediction)
# Set all the classes outside top_n to 0 probability
prediction[np.argsort(prediction)[:-top_n]] = 0
# Normalize sum probability to 1
prediction = prediction / np.sum(prediction)
# Randomly select but based on probabilities of likelihood
prediction_index = np.random.choice(len(prediction), 1, p=prediction)[0]

return prediction_index

In [22]:
def infer(tokenized_data, checkpoint, model, n_samples, text_seed, top_n):
"""Generate new text based on text seed and checkpoint weights.

Parameters
----------
tokenized_data : TokenizedData
Pre-processed data
checkpoint : Checkpoints
Checkpoint file [default: None]
model : CharRNN
RNN model
n_samples : int
Number of characters to generate
text_seed : str
Initial string to prime the RNN state
top_n : int
Number of top candidate characters to sample from

Returns
-------
generated_text : str
Predicted text string
"""
generated_text = text_seed
generated_seq = np.squeeze(tokenized_data.t.texts_to_sequences(text_seed))

saver = tf.train.Saver()

with tf.Session() as sess:
saver.restore(sess, checkpoint)
new_state = sess.run(model.init_state)

# Prime the RNN state with initial text seed
for i, _ in enumerate(generated_text[:-1]):
x = np.zeros((1, 1))
x[0,0] = generated_seq[i]
feed = {model.x: x,
model.keep_prob: 1.,
model.init_state: new_state}
_, new_state = sess.run([model.predictions,
model.final_state],
feed_dict=feed)

# Start generating predictions after RNN state has been primed
for _ in range(n_samples):
x = np.zeros((1, 1))
x[0,0] = generated_seq[-1]
feed = {model.x: x,
model.keep_prob: 1.,
model.init_state: new_state}
prediction, new_state = sess.run([model.predictions,
model.final_state],
feed_dict=feed)
# Pick character and append
predicted_char_index = pick_char(prediction, top_n)
predicted_char = tokenized_data.sequence_to_str([predicted_char_index])[0]
generated_text = generated_text + str(predicted_char)
generated_seq = np.append(generated_seq, predicted_char_index)

return generated_text

In [23]:
def generate_text(tokenized_data, n_samples, text_seed, top_n, checkpoint=None):
"""Use trained model to generate new text based on an initial string.

Parameters
----------
tokenized_data : TokenizedData
Pre-processed data
n_samples : int
Number of characters to generate
text_seed : str
Initial string to prime the RNN state
top_n : int
Number of top candidate characters to sample from
checkpoint : Checkpoints
Checkpoint file [default: None]
"""

infer_model = CharRNN(len(tokenized_data.t.word_index),
lstm_cell_size=LSTM_CELL_SIZE,
num_layers=NUM_LAYERS,
sampling=True)

if checkpoint is None:
checkpoint = tf.train.latest_checkpoint('checkpoints')

print(infer(tokenized_data, checkpoint, infer_model, n_samples, text_seed, top_n))


### Final Model - Pick from top 5 most likely characters¶

• With the same trained model and text string seed, there still seems to be a fair amount of variation due to probabilistic selection of the top 5 characters.
In [24]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5)

INFO:tensorflow:Restoring parameters from checkpoints/e24.ckpt
She would not have
coming in and so that they had seat to him then, between Anna
and he was already, she should show that his brother was satisfied with

"The preter story a story of the while to my heart, as I can't give my since
the seads of a misence about the children were not telling him of the
most instant what all man, why. It has are stopped, both is that I should
speak off; but I should be sent and so well, and what's if you have been
assidumed it and already at once it will be attached," said Vronsky, "I
have announced for them," said the sacrate and sharing suffering.

"I'll go and shall be some sort feeling at the carpecial, by her clearness
and ashem of that here we should shy live fin that will astitious over to to
her husband. Have your hands with the presents of material action?'

Chapter 4

Peter Surdons, the meaning of his words and all the strices he was
their, simply so all the things that to say,

In [25]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5)

INFO:tensorflow:Restoring parameters from checkpoints/e24.ckpt
been an iresting important above the sacrament the strange that
churged by the party, which they had not taken to stait to have so standing
and trinking all and sorred to ask her thinking."

"What? Then I don't know her to grasp the little old position."

"Why, was it that?" asked Kitty, thinking he was tea of her heart; and
he shaked her hair, and they all said: "Well, train then you suppose he's
to be teaching it, trancaition for those people. I suppose," he asked
words that they had stretched his way at the correct of the brilliant
worked, when he could never told her he could not see that he would be it
asked and thought of another couldn't say wanted to allow. The country
was not asked. And she stopped with harmly could be standing at their
character. All she would not come about that this, and this they all
waided, take to me that he wanted to say a sort; there was that she
meant towards the carriage.

His first distinctly answered he was the silence, and the plung had,


### Final Model - Pick from top 1 most likely characters¶

• Not enough randomness - somehow gets into some determinate pattern?
In [26]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=1)

INFO:tensorflow:Restoring parameters from checkpoints/e24.ckpt
She was a strange that had been all the same thing to her husband and her
husband and her husband and her husband and her husband was a strange that
had been said that he was a strange thing that he had been at the same
time that he was a strange thing that he had been and he was a strange
thing that he had not seen her husband, and the same thing was the same
thing to her husband and her husband and her husband and her husband's
hands and the same thing was the same thing that he had been at the
same time that he was a strange thing that he had been and the same
thing to her husband and her husband and her husband and her husband's
hands and the same thing was the same thing that he had been at the
same time that he was a strange thing that he had been and the same
thing to her husband and her husband and her husband and her husband's
hands and the same thing was the same thing that he had been at the
same time that he was a strange thing that he had been and the same
thing to her husband


### Final Model - Pick from top 2 most likely characters¶

• Top 2 seemed better but still lots of repeated words
In [27]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=2)

INFO:tensorflow:Restoring parameters from checkpoints/e24.ckpt
She was a long while and to see him, and he had no serving, and to the
same thing was to be attracting herself that the children had no seen
from their son, and the province and the same serving and heart that
he was all that he had been to be at a stand of anything, but to see his
heart with his wife, which he had not told him to the strange that he was
staying and starting at him and the same sort of his son when he had
said that he had been angrily to her thoughts, which he had not told her
her face, and her hand with his heart and the same, and there was not a stand
to the state of the same strain of an intimacy, and her sisting of
his hands, with the state of the strange on the state, with a chief strain
of her soul with a smile, and straight at the stands and the conversation
that he had been and that the children had not been and his soul any
of the consequence of the station was a sense of his heart, as he had
said to herself, and he was so standing at the same time there was a
sen


### Epoch 0, 5, 10 Model - Pick from top 5 most likely characters¶

• Taking a look at how it's learning over the epochs
In [28]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e5.ckpt')

INFO:tensorflow:Restoring parameters from ./checkpoints/e5.ckpt
both some
to did not call at the masters, bush though the counts of the same in a croan
tall his helpess, and the many as a cread interriagely to him thried in what said and thoughing the provent would be too and that it was have and all this were and
staking her hought to hour only take in shorl to anyway.

The sermed as to be to the contress of milent, but his his ofters and himself the potrer and the
bealond time of the plass of whom
his begard anyand woman to say a sitter into the departs of with has beginning. He desared off the
mors shooks, husband
that showing the plincess with so so treing, and was that the somether was through an him, and her husbouss with
surdly anywhicels of the mildran, and how thing her has tall the dore. Bet with the process and and his face, and always
he were to happored to to be to home to
angar astation. He did netr that the
cold, to happen though she was say all
her. She that

In [29]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e10.ckpt')

INFO:tensorflow:Restoring parameters from ./checkpoints/e10.ckpt
the sone, with the play of the soried somity to she had been all the
truck and with the morning. "Alexey Alexandrovitch, I should seaking
a standing free this about it all?.."

"No, I'm an idea, and should be the possible what has teartion it
wishing him in."

same to to seemente to her so and the princess of the sacrast of
the position.

"If it! All, the sens of attering, as they house," he said in what he saw
what he had said to this same a strung, but shaking once the that which
she was not something when which had not talked it all of his hands to him
thoughts the creater and with the proness, and with the manshart and
helppended with the same sound to him, had been stonly were tolling
about her to him, and with the secon all this with almost watch androom,
he was the fairs, that seemed of a sort of whether the man had and was
consincing the call of all her seat in the coller waith at
the conversation

In [30]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e15.ckpt')

INFO:tensorflow:Restoring parameters from ./checkpoints/e15.ckpt
She was and thanking of was that way they had bared thooget, and those
pretending and their sunders were trying on him, and when he had seen
some some of an inclight of still outside this and would be to go
at the clooks at her healthy. "I am get to the fartherion," he said, with
a smile. "And I don't want to say that that's not seeing a some
seen on this about to say a love what have been answer. But it must
be so a single telegram."

There was a soul of the meaning to the mother, though that seemed in
the doorway in the clancest with the plain with hards weet as he was
continually from his hand, without her, his face, and as all she was a same.
Taking that herself, she had succored to be delivered, taken on this
was her sounds. She was so telling the princess to see, and there was
nothing and any misery, that they were thinking of all to the
driving room to take of the country to take that the cretty of
tears was a gentleman, as his bather were his face she was and how he

In [31]:
generate_text(tokenized_data, n_samples=1000, text_seed='She ', top_n=5,
checkpoint='./checkpoints/e20.ckpt')

INFO:tensorflow:Restoring parameters from ./checkpoints/e20.ckpt
She would
be stating on the painting, and staying a mere, and so the tone of
the way of his birit, and was serious fearful. He did not care to get
away. And there was no striking of himself will have need of the
same to show the marshal were never been in his soul, and a conversation.
They set to be coming up for a love of what had so married in them,
and she was as it was about his face and the careful force. They were
saying of all the country and the money in her force to her.

He had not conscious of the same thing. Two change to the provises of
his complete winds of the memory of that service--which had not called
in the present of the treater. But, at the morning, and his
stream, well sat down, an attempt of her forcestable
position, a desire all the same and that in a letter the peasants was
tried to gave it, and the soulder of their characteristic to be
thought, but was that he had been saying at the man who sent at the
side of the side. The sacress had been saying, but washed off

In [ ]: