Autoencoder For Image Reconstruction

Nowadays, we have huge amounts of data in almost every application we use – listening to music on Spotify, browsing friends’ images on Instagram, or maybe watching a new trailer on YouTube. There is always data being transmitted from the servers to you.
This wouldn’t be a problem for a single user. But imagine handling thousands, if not millions, of requests with large data at the same time. These streams of data have to be reduced somehow in order for us to be physically able to provide them to users – this is where data compression kicks in.
There are lots of compression techniques, and they vary in their usage and compatibility. For example, some compression techniques only work on audio files, like the famous MPEG-2 Audio Layer III (MP3) codec.
There are two main types of compression:
Lossless: Data integrity and accuracy is preferred, even if we don’t “shave off” much
Lossy: Data integrity and accuracy isn’t as important as how fast we can serve it – imagine a real-time video transfer, where it’s more important to be “live” than to have high quality video
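To make the distinction concrete, here’s a minimal sketch in Python contrasting the two – the zlib compression and the bit-dropping quantization are illustrative choices of mine, not techniques this article relies on:

import zlib
import numpy as np

data = np.arange(1000, dtype=np.int16).tobytes()

# Lossless: the original bytes come back exactly
compressed = zlib.compress(data)
assert zlib.decompress(compressed) == data

# Lossy (illustrative): quantize samples by dropping their low bits –
# smaller to store, but the original values are gone for good
samples = np.frombuffer(data, dtype=np.int16)
quantized = (samples >> 4).astype(np.int8)     # 2 bytes -> 1 byte per sample
restored = (quantized.astype(np.int16) << 4)   # an approximation only
assert not np.array_equal(samples, restored)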
For example, using Autoencoders, we’re able to decompose this image and represent it as the 32-vector code below. Using it, we can reconstruct the image. Of course, this is an example of lossy compression, as we’ve lost quite a bit of info.

Though, we can use the exact same technique to do this much more accurately, by allocating more space for the representation:

What are Autoencoders?
An autoencoder is, by definition, a technique to encode something automatically. By using a neural network, the autoencoder is able to learn how to decompose data (in our case, images) into fairly small bits of data, and then, using that representation, reconstruct the original data as closely as it can.
There are two key components in this task:
Encoder: Learns how to compress the original input into a small encoding
Decoder: Learns how to restore the original data from that encoding generated by the Encoder
These two are trained together in symbiosis to obtain the most efficient representation of the data that we can reconstruct the original data from, without losing too much of it.

Credit: ResearchGate
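Before we get to the neural version, here’s a tiny self-contained sketch of the same encode/decode idea in its simplest linear form – PCA computed via SVD, which happens to be the best possible linear encoder/decoder pair under mean squared error. This toy example, and all the names in it, are mine and purely illustrative:

import numpy as np

# Toy linear "autoencoder": project 64-dimensional data down to 8 dimensions
# with the top principal components, then project back up to reconstruct.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 64))            # 500 samples, 64 dimensions
X_centered = X_toy - X_toy.mean(axis=0)

_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt[:8].T                                   # 64 -> 8 "encoder" matrix

codes = X_centered @ W                         # the "encoder" step
reconstruction = codes @ W.T                   # the "decoder" step
print(np.mean((X_centered - reconstruction) ** 2))  # reconstruction error (MSE)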
Encoder
The Encoder is tasked with finding the smallest possible representation of data that it can store – extracting the most prominent features of the original data and representing it in a way the decoder can understand.
Think of it as if you were trying to memorize something, like memorizing a large number – you try to find a pattern in it that you can memorize and restore the whole sequence from, as it’s easier to remember a short pattern than the whole number.
Encoders in their simplest form are plain Artificial Neural Networks (ANNs). Though, there are certain encoders that utilize Convolutional Neural Networks (CNNs) instead, which are a specialized type of ANN.
The encoder takes the input data and generates an encoded version of it – the compressed data. We can then send that compressed data to the user, where it will be decoded and reconstructed. Let’s take a look at the encoding for an LFW dataset example:

The encoding here doesn’t make much sense for us, but it’s plenty for the decoder. Now, it’s valid to raise the question:
“But how did the encoder learn to compress images like this?”
This is where the symbiosis during training comes into play.
Decoder
The Decoder works in a similar way to the encoder, but the other way around. It learns to read these compressed code representations, instead of generating them, and to produce images based on that info. It aims to minimize the loss while reconstructing, obviously.
The output is evaluated by comparing the reconstructed image to the original one, using Mean Squared Error (MSE) – the more similar the reconstruction is to the original, the smaller the error.
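MSE is simple enough to write by hand. Here’s a small sketch – the mse helper name is mine, not something the code in this article defines:

import numpy as np

def mse(original, reconstructed):
    # Average of squared pixel-wise differences – 0 means a perfect reconstruction
    return np.mean((np.asarray(original, dtype=np.float32)
                    - np.asarray(reconstructed, dtype=np.float32)) ** 2)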
At this point, we propagate backwards and update all the parameters from the decoder to the encoder. Therefore, based on the differences between the input and output images, both the decoder and encoder get evaluated at their jobs and update their parameters to become better.
Building an Autoencoder
Keras is a Python framework that makes building neural networks simpler. It allows us to stack layers of different types to create a deep neural network – which we will do to build an autoencoder.
First, let’s install Keras using pip:
$ pip install keras
Preprocessing Data
Again, we’ll be using the LFW dataset. As usual, with projects like these, we’ll preprocess the data to make it easier for our autoencoder to do its job.
For this, we’ll first define a couple of paths which lead to the dataset we’re using:
# http://www.cs.columbia.edu/CAVE/databases/pubfig/download/lfw_attributes.txt
ATTRS_NAME = "lfw_attributes.txt"

# http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz
IMAGES_NAME = "lfw-deepfunneled.tgz"

# http://vis-www.cs.umass.edu/lfw/lfw.tgz
RAW_IMAGES_NAME = "lfw.tgz"
Then, we’ll employ two functions – one to convert the raw bytes into an image and change the color system to RGB:
import cv2
import numpy as np

def decode_image_from_raw_bytes(raw_bytes):
    img = cv2.imdecode(np.asarray(bytearray(raw_bytes), dtype=np.uint8), 1)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img
And the other one to actually load the dataset and adapt it to our needs:
import os
import tarfile
import tqdm
import pandas as pd

def load_lfw_dataset(
        use_raw=False,
        dx=80, dy=80,
        dimx=45, dimy=45):

    # Read attrs
    df_attrs = pd.read_csv(ATTRS_NAME, sep='\t', skiprows=1)
    df_attrs = pd.DataFrame(df_attrs.iloc[:, :-1].values, columns=df_attrs.columns[1:])
    imgs_with_attrs = set(map(tuple, df_attrs[["person", "imagenum"]].values))

    # Read photos
    all_photos = []
    photo_ids = []

    # tqdm is used to show a progress bar while reading the data in a notebook here;
    # you can swap tqdm_notebook for plain tqdm to use it outside a notebook
    with tarfile.open(RAW_IMAGES_NAME if use_raw else IMAGES_NAME) as f:
        for m in tqdm.tqdm_notebook(f.getmembers()):
            # Only process image files from the compressed data
            if m.isfile() and m.name.endswith(".jpg"):
                # Prepare image
                img = decode_image_from_raw_bytes(f.extractfile(m).read())

                # Crop only faces and resize them
                img = img[dy:-dy, dx:-dx]
                img = cv2.resize(img, (dimx, dimy))

                # Parse person and append it to the collected data
                fname = os.path.split(m.name)[-1]
                fname_splitted = fname[:-4].replace('_', ' ').split()
                person_id = ' '.join(fname_splitted[:-1])
                photo_number = int(fname_splitted[-1])
                if (person_id, photo_number) in imgs_with_attrs:
                    all_photos.append(img)
                    photo_ids.append({'person': person_id, 'imagenum': photo_number})

    photo_ids = pd.DataFrame(photo_ids)
    all_photos = np.stack(all_photos).astype('uint8')

    # Preserve photo_ids order!
    all_attrs = photo_ids.merge(df_attrs, on=('person', 'imagenum')).drop(["person", "imagenum"], axis=1)

    return all_photos, all_attrs
Implementing the Autoencoder
import numpy as np

X, attr = load_lfw_dataset(use_raw=True, dimx=32, dimy=32)
Our data is in the X matrix, in the form of a 3D matrix, which is the default representation for RGB images. By providing three matrices – red, green, and blue – the combination of these three generates the image color.

These images will have large values for each pixel, ranging from 0 to 255. Generally in machine learning we tend to make values small and centered around 0, as this helps our model train faster and get better results, so let’s normalize our images:
X = X.astype('float32') / 255.0 - 0.5
By now, if we test the X array for the min and max, the values will be -0.5 and 0.5, which you can verify:

print(X.max(), X.min())

0.5 -0.5
To be able to see the image, let’s create a show_image function. It will add 0.5 to the images, as pixel values can’t be negative:

import matplotlib.pyplot as plt

def show_image(x):
    plt.imshow(np.clip(x + 0.5, 0, 1))
Now let’s take a quick look at our data:
show_image(X[6])

Great, now let’s split our data into a training and test set:
from sklearn.model_selection import train_test_split

X_train, X_test = train_test_split(X, test_size=0.1, random_state=42)
The sklearn train_test_split() function splits the data when we give it the test ratio, and the rest is, of course, the training set. The random_state, which you are going to see a lot in machine learning, is used to produce the same results no matter how many times you run the code.

Now, time for the model:
from keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from keras.models import Sequential, Model

def build_autoencoder(img_shape, code_size):
    # The encoder
    encoder = Sequential()
    encoder.add(InputLayer(img_shape))
    encoder.add(Flatten())
    encoder.add(Dense(code_size))

    # The decoder
    decoder = Sequential()
    decoder.add(InputLayer((code_size,)))
    decoder.add(Dense(np.prod(img_shape)))  # np.prod(img_shape) is the same as 32*32*3, it's more generic than saying 3072
    decoder.add(Reshape(img_shape))

    return encoder, decoder
This function takes an img_shape (image dimensions) and code_size (the size of the output representation) as parameters. The image shape, in our case, will be (32, 32, 3), where 32 represents the width and height and 3 represents the color channel matrices. That being said, our image has 3072 dimensions.

Logically, the smaller the code_size is, the more the image will compress, but the fewer features will be saved, and the reproduced image will differ that much more from the original.

A Keras sequential model is basically used to sequentially add layers and deepen our network. Each layer feeds into the next one, and here, we’re simply starting off with the InputLayer (a placeholder for the input) with the size of the input vector – img_shape.
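As a quick sanity check, here’s one way to instantiate both halves and wire them into a single trainable model – a minimal sketch assuming the build_autoencoder function above and the X matrix from earlier; the adamax optimizer is my choice here, not something established so far:

from keras.layers import Input
from keras.models import Model

IMG_SHAPE = X.shape[1:]  # (32, 32, 3)
encoder, decoder = build_autoencoder(IMG_SHAPE, 32)

# Chain the two halves: the decoder reconstructs from the encoder's code
inp = Input(IMG_SHAPE)
autoencoder = Model(inp, decoder(encoder(inp)))
autoencoder.compile(optimizer='adamax', loss='mse')

encoder.summary()
decoder.summary()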