Neural style transfer
Objective: Implement a Neural Style Transfer (NST) model to generate novel artistic images.
NST is one of the most fun and interesting optimization techniques in deep learning. It merges two images, namely: a "content" image (C) and a "style" image (S), to create a "generated" image (G). The generated image G combines the "content" of the image C with the "style" of image S.
Import libraries
from keras import applications, optimizers, Model
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import tensorflow as tf
import numpy as np
from IPython.display import HTML
from tqdm import tqdm
Visualize the content and style images
%%bash
wget -q https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg -O ./content.jpg
wget -q https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg -O ./style.jpg
fig, axs = plt.subplots(1, 2, figsize=(10, 15))
content_image = Image.open("./content.jpg").resize((400, 400))
style_image = Image.open("./style.jpg").resize((400, 400))
axs[0].imshow(content_image)
axs[0].axis("off")
axs[0].set_title("Content image")
axs[1].imshow(style_image)
axs[1].axis("off")
axs[1].set_title("Style image")
plt.show()
Load the model
We'll use transfer learning: load a previously trained convolutional network and build on top of it. We'll use VGG-19, the 19-layer version of the VGG network developed by the Visual Geometry Group at the University of Oxford in 2014 and used in the original NST paper. This model has already been trained on the very large ImageNet database, and has learned to recognize a variety of low-level features, such as edges and simple textures, at the shallower layers, and high-level features, such as more complex textures and object classes, at the deeper layers.
%%bash
gdown -q https://drive.google.com/uc?id=1ysYUNOHkg0NmITtBefG1vzjMbUyzP_ta
gunzip ./vgg19_weights_no-top.h5.gz
model = applications.VGG19(include_top=False, input_shape=(400, 400, 3), weights='./vgg19_weights_no-top.h5')
model.trainable = False
model.summary()
Model: "vgg19"

Layer (type)                  Output Shape             Param #
input_layer (InputLayer)      (None, 400, 400, 3)            0
block1_conv1 (Conv2D)         (None, 400, 400, 64)       1,792
block1_conv2 (Conv2D)         (None, 400, 400, 64)      36,928
block1_pool (MaxPooling2D)    (None, 200, 200, 64)           0
block2_conv1 (Conv2D)         (None, 200, 200, 128)     73,856
block2_conv2 (Conv2D)         (None, 200, 200, 128)    147,584
block2_pool (MaxPooling2D)    (None, 100, 100, 128)          0
block3_conv1 (Conv2D)         (None, 100, 100, 256)    295,168
block3_conv2 (Conv2D)         (None, 100, 100, 256)    590,080
block3_conv3 (Conv2D)         (None, 100, 100, 256)    590,080
block3_conv4 (Conv2D)         (None, 100, 100, 256)    590,080
block3_pool (MaxPooling2D)    (None, 50, 50, 256)            0
block4_conv1 (Conv2D)         (None, 50, 50, 512)     1,180,160
block4_conv2 (Conv2D)         (None, 50, 50, 512)     2,359,808
block4_conv3 (Conv2D)         (None, 50, 50, 512)     2,359,808
block4_conv4 (Conv2D)         (None, 50, 50, 512)     2,359,808
block4_pool (MaxPooling2D)    (None, 25, 25, 512)            0
block5_conv1 (Conv2D)         (None, 25, 25, 512)     2,359,808
block5_conv2 (Conv2D)         (None, 25, 25, 512)     2,359,808
block5_conv3 (Conv2D)         (None, 25, 25, 512)     2,359,808
block5_conv4 (Conv2D)         (None, 25, 25, 512)     2,359,808
block5_pool (MaxPooling2D)    (None, 12, 12, 512)            0

Total params: 20,024,384 (76.39 MB)
Trainable params: 0 (0.00 B)
Non-trainable params: 20,024,384 (76.39 MB)
Compute the content cost
One goal of NST is for the content of the generated image G to match the content of image C. We achieve this by minimizing a content cost function.
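Writing a_C and a_G for the hidden-layer activations of the content and generated images, the content cost computed below is:

```latex
J_{content}(C, G) = \frac{1}{4 \, n_H \, n_W \, n_C} \sum_{\text{all entries}} \left( a^{(C)} - a^{(G)} \right)^2
```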
def compute_content_cost(content_output, generated_output):
    """
    Computes the content cost

    Arguments:
    content_output -- List of tensors; its last element a_C, of dimension (1, n_H, n_W, n_C),
                      holds the hidden-layer activations representing the content of image C
    generated_output -- List of tensors; its last element a_G, of dimension (1, n_H, n_W, n_C),
                        holds the hidden-layer activations representing the content of image G

    Returns:
    J_content -- Scalar content cost
    """
    a_C = content_output[-1]
    a_G = generated_output[-1]
    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Unroll a_C and a_G from (m, n_H, n_W, n_C) to (m, n_C, n_H*n_W)
    a_C_unrolled = tf.transpose(tf.reshape(a_C, shape=[m, -1, n_C]), perm=[0, 2, 1])
    a_G_unrolled = tf.transpose(tf.reshape(a_G, shape=[m, -1, n_C]), perm=[0, 2, 1])
    # Sum of squared differences, scaled by 1 / (4 * n_H * n_W * n_C)
    J_content = tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled))) / (4 * n_H * n_W * n_C)
    return J_content
Compute the style cost
The goal is to minimize the distance between the style matrix of the "style" image S and the style matrix of the "generated" image G. The style matrix is also called a Gram matrix.
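For a matrix A of shape (n_C, n_H*n_W) whose rows are unrolled feature maps, the Gram matrix and the per-layer style cost computed below are:

```latex
G_A = A A^{T}, \qquad
J^{[l]}_{style}(S, G) = \frac{1}{\left( 2 \, n_C \, n_H \, n_W \right)^2}
\sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left( G^{(S)}_{ij} - G^{(G)}_{ij} \right)^2
```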
def gram_matrix(A):
    """
    Argument:
    A -- Matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    GA = tf.matmul(A, A, transpose_b=True)
    return GA
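As a quick sanity check, here is the same computation on a tiny hand-made matrix (a minimal sketch using NumPy rather than TensorFlow):

```python
import numpy as np

# Two "feature maps" of two pixels each: shape (n_C, n_H*n_W) = (2, 2)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Gram matrix: entry (i, j) is the dot product of feature maps i and j.
# The diagonal measures how active each feature is; the off-diagonal
# measures how strongly two features co-occur -- this is what captures "style".
GA = A @ A.T
print(GA)  # [[ 5. 11.]
           #  [11. 25.]]
```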
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- Tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- Tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- Tensor representing a scalar value, style cost
    """
    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Reshape the activations to shape (n_C, n_H*n_W)
    a_S = tf.transpose(tf.reshape(a_S, shape=[-1, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, shape=[-1, n_C]))
    # Compute Gram matrices for both images S and G
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)
    # Compute the loss
    J_style_layer = tf.reduce_sum((GS - GG)**2) / (2 * n_C * n_H * n_W)**2
    return J_style_layer
We'll get better results if we "merge" style costs from several different layers. We'll choose the layers that represent the style of the image and assign each one a style cost. Each layer is given a weight that reflects how much it contributes to the overall style.
How do we choose the coefficients for each layer? The deeper layers capture higher-level concepts, and their features are less localized in the image. So if we want the generated image to follow the style image only softly, try larger weights for the deeper layers and smaller weights for the shallower ones. In contrast, if we want the generated image to follow the style image strongly, try smaller weights for the deeper layers and larger weights for the shallower ones.
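For example (a hypothetical alternative weighting, not the one used below), geometrically decaying weights would emphasize the shallower layers for a stronger style match:

```python
# Hypothetical weighting: geometrically decaying weights, normalized to sum to 1,
# so block1_conv1 dominates and the generated image follows the style more strongly.
layer_names = ['block1_conv1', 'block2_conv1', 'block3_conv1',
               'block4_conv1', 'block5_conv1']
raw = [0.5 ** i for i in range(len(layer_names))]   # [1, 0.5, 0.25, ...]
weights = [w / sum(raw) for w in raw]
STRONG_STYLE_LAYERS = list(zip(layer_names, weights))
print(STRONG_STYLE_LAYERS[0])  # ('block1_conv1', ~0.516)
```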
STYLE_LAYERS = [('block1_conv1', 0.2),
('block2_conv1', 0.2),
('block3_conv1', 0.2),
('block4_conv1', 0.2),
('block5_conv1', 0.2)]
def compute_style_cost(style_image_output, generated_image_output, STYLE_LAYERS=STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers

    Arguments:
    style_image_output -- List of tensors, the model's outputs on the style image S
    generated_image_output -- List of tensors, the model's outputs on the generated image G
    STYLE_LAYERS -- A python list containing:
                        - The names of the layers we would like to extract style from
                        - A coefficient for each of them

    Returns:
    J_style -- Tensor representing a scalar value, style cost
    """
    # Initialize the overall style cost
    J_style = 0
    # Set a_S to be the hidden layer activations from the layers we have selected.
    # The last element of the list contains the content layer output, which must not be used.
    a_S = style_image_output[:-1]
    # Set a_G to be the outputs of the chosen hidden layers.
    # The last element of the list contains the content layer output, which must not be used.
    a_G = generated_image_output[:-1]
    for i, weight in zip(range(len(a_S)), STYLE_LAYERS):
        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S[i], a_G[i])
        # Add weight * J_style_layer of this layer to overall style cost
        J_style += weight[1] * J_style_layer
    return J_style
Compute the total cost
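The total cost is simply a weighted sum of the content and style costs:

```latex
J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)
```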
@tf.function()
def total_cost(J_content, J_style, alpha=10, beta=40):
    """
    Computes the total cost function

    Arguments:
    J_content -- Content cost
    J_style -- Style cost
    alpha -- Hyperparameter weighting the importance of the content cost
    beta -- Hyperparameter weighting the importance of the style cost

    Returns:
    J -- Total cost
    """
    J = alpha * J_content + beta * J_style
    return J
Train the model
Load the content and style images.
content_image = np.array(content_image)
content_image = tf.constant(np.reshape(content_image, ((1,) + content_image.shape)))
style_image = np.array(style_image)
style_image = tf.constant(np.reshape(style_image, ((1,) + style_image.shape)))
Initialize the "generated" image as a noisy copy of the content image. Making the pixels of the generated image mostly noise but slightly correlated with the content image helps the content of the generated image match the content of the "content" image more rapidly.
generated_image = tf.Variable(tf.image.convert_image_dtype(content_image, tf.float32))
noise = tf.random.uniform(tf.shape(generated_image), 0, 0.1)
generated_image = tf.add(generated_image, noise)
generated_image = tf.clip_by_value(generated_image, clip_value_min=0.0, clip_value_max=1.0)
generated_image = tf.Variable(tf.image.convert_image_dtype(generated_image, tf.float32))
Define a function that builds a model on top of VGG-19 and returns a list of the outputs of the chosen intermediate layers.
def get_layer_outputs(vgg, layer_names):
    """Creates a model that returns a list of intermediate output values."""
    outputs = [vgg.get_layer(layer[0]).output for layer in layer_names]
    model = Model([vgg.input], outputs)
    return model
Define the content layer and build the model.
content_layer = [('block5_conv4', 1)]
model_outputs = get_layer_outputs(model, STYLE_LAYERS + content_layer)
Assign the content and style images to be the input of the VGG model.
preprocessed_style = tf.Variable(tf.image.convert_image_dtype(style_image, tf.float32))
a_S = model_outputs(preprocessed_style)
preprocessed_content = tf.Variable(tf.image.convert_image_dtype(content_image, tf.float32))
a_C = model_outputs(preprocessed_content)
optimizer = optimizers.Adam(learning_rate=0.001)
@tf.function()
def train_step(generated_image):
    with tf.GradientTape() as tape:
        # Compute a_G as the model_outputs for the current generated image
        a_G = model_outputs(generated_image)
        # Compute the style cost
        J_style = compute_style_cost(a_S, a_G)
        # Compute the content cost
        J_content = compute_content_cost(a_C, a_G)
        # Compute the total cost
        J = total_cost(J_content, J_style, 10, 40)
    grad = tape.gradient(J, generated_image)
    optimizer.apply_gradients([(grad, generated_image)])
    # Keep pixel values in the valid [0, 1] range
    generated_image.assign(tf.clip_by_value(generated_image, clip_value_min=0.0, clip_value_max=1.0))
    return J
Show the generated image at some epochs
Function to display the images generated by the style transfer model.
def tensor_to_image(tensor):
    """
    Converts the given tensor into a PIL image

    Arguments:
    tensor -- Tensor

    Returns:
    Image: A PIL image
    """
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return Image.fromarray(tensor)
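The scaling-and-squeezing step is easy to check in isolation (a NumPy-only sketch of the same conversion, stopping short of the final PIL step):

```python
import numpy as np

def to_uint8_array(tensor):
    """Scale a float tensor in [0, 1] to uint8 and drop a leading batch axis."""
    arr = np.array(tensor * 255, dtype=np.uint8)
    if arr.ndim > 3:
        assert arr.shape[0] == 1
        arr = arr[0]
    return arr

batch = np.full((1, 2, 2, 3), 0.5)   # a tiny (1, H, W, C) "generated image"
img = to_uint8_array(batch)
print(img.shape, img[0, 0, 0])       # (2, 2, 3) 127
```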
epochs = 10001
frames = []
fig, ax = plt.subplots()
for i in tqdm(range(epochs)):
    train_step(generated_image)
    if i % 100 == 0:
        image = tensor_to_image(generated_image)
        plt.tight_layout()
        ax.axis("off")
        text = ax.text(0, -10, "Epoch = {}".format(i), fontdict={"fontsize": "large"}, animated=True)
        frame = ax.imshow(image, animated=True)
        frames.append([frame, text])
anim = animation.ArtistAnimation(fig, frames, blit=True, repeat_delay=1000)
plt.close(fig)
HTML(anim.to_html5_video())
!rm ./vgg19_weights_no-top.h5 ./style.jpg ./content.jpg