A Non Local Algorithm For Image Denoising

This paper is devoted to the problem of image denoising by nonlocal subspace learning. The presented algorithm, based on the minimization of a regularized trace functional and the Hamiltonian Monte Carlo method, yields a computationally efficient treatment of outliers in high-dimensional data sets. In particular, under a certain assumption on the underlying distribution of samples, it is proved that the proposed algorithm with initialization does not converge to a local minimum (e.g., Gaussian noise or skylines).

This post is essentially about how we see an image. While it may seem a trivial concept, the process of image denoising is not well-understood at its most basic level. This short piece attempts to unpack some of the lesser-understood aspects of human vision, and their implications for the world of machine vision. The goal is to help provide some context for modern approaches to image denoising used in computer vision and graphics applications.

Maximum a-posteriori (MAP) denoising is the task of removing additive white Gaussian noise from a given natural image while preserving visually meaningful structure where possible. For many years, this problem has garnered attention in the computer vision community, famously leading to the development of mean shift and its many variants. In recent work, we have demonstrated that under certain assumptions about the data, this problem can be cast into an image corrupted by non-Gaussian Poisson noise. Although MAP denoising for Poisson-noise images is classical in nature (and largely equivalent to MAP denoising for Gaussian noise), numerical computation remains challenging due to the fact that these models are computationally expensive on sparse graphs. To this end, we introduce two simple Markov Chain Monte Carlo (MCMC) algorithms — one deterministic and one stochastic — for solving this problem. Both of these algorithms leverage: (1) our novel non-local kernel deformation to preserve self-similarities, which minimizes Poisson noise; and (2) a locally linear smoothing prior to enforce smoothness in regions inferred by the kernel deformation. We demonstrate empirically that our stochastic algorithm is at least as good as existing.


There are hundreds of tutorials on the web that walk you through using Keras for image segmentation tasks. These are extremely helpful, and are often enough for your use case. However, for beginners, it might seem overwhelming to even get started with common deep learning tasks. There are mundane operations to be completed before one can even start the training process: preparing the data, creating the train, val, and test partitions, and preparing the model. In this tutorial [broken up into 3 parts], I attempt to create an accessible walkthrough of the entire image segmentation pipeline. This includes:

a) Creating and structuring the dataset

b) Generating train and val images

c) Model choice, loading and compilation, and training.

Hopefully, by the end of it, you’ll be comfortable with getting your feet wet on your own beginner project in image segmentation, or any such deep learning problem focused on images.

Let’s get started!


Problems in image segmentation are a little more involved (unlike, say, classification) since you have to keep track of both your images and their masks.

Typically, you would use PASCAL VOC, MS COCO, or Cityscapes, depending on what problem you want to solve.

If this is the case, then most of your job is done, since these repositories will already have the train, val, and test sets created for you. All you have to do is download them and separate them into the relevant directories [more details below].

However, if you’re looking to run image segmentation models on your own datasets, refer below:

[Image: Structure of your data]

Where mask_001.png corresponds to the mask of frame_001.png, and so on.

For the folks who’re already using the public datasets I’ve mentioned above, all you have to do is keep the directory structure as mentioned above.

For others, who are working with their own datasets, you will need to write a script that does this for you. I’ve written one for your reference:

Let’s walk through this code.

I’m assuming that you have all the images in the ‘frames’ directory, and the corresponding masks in the ‘masks’ directory, both in DATA_PATH.

A good way to randomise your partitions of train, test, and val is to list the files, sort them by their ids and shuffle them [be careful to use a constant random seed, since a different seed will produce a different order in the shuffle].

This is what lines 23–34 achieve.

Following this, we use a 70–20–10 ratio for our train, val, and test sets respectively. This is typically the split used, although 60–30–10 or 80–10–10 aren’t unheard of. This is a simple list indexing operation in Python.

We do this in lines 39–44.

The subsequent lines run a list comprehension to iterate through all the frames, and simply add the training frames to train_frames, validation frames to val_frames, and test frames to test_frames.
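The listing, shuffling, and slicing steps described above can be sketched as follows. This is a minimal illustration rather than the original script: the function name split_ids, the seed value, and the generated file names are all assumptions. Only the sort-then-shuffle idea and the 70-20-10 ratio come from the text.

```python
import random

def split_ids(file_names, train_pct=70, val_pct=20, seed=230):
    """Return (train, val, test) lists built from a list of file names."""
    # Sort first so the starting order never depends on the file system.
    ordered = sorted(file_names)
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(ordered)

    # Integer arithmetic avoids floating-point surprises at the boundaries.
    train_end = len(ordered) * train_pct // 100
    val_end = len(ordered) * (train_pct + val_pct) // 100

    train = ordered[:train_end]
    val = ordered[train_end:val_end]
    test = ordered[val_end:]  # whatever remains (10% here)
    return train, val, test

# Hypothetical file names standing in for the contents of the frames folder.
frames = ["frame_{:03d}.png".format(i) for i in range(1, 101)]
train_frames, val_frames, test_frames = split_ids(frames)
print(len(train_frames), len(val_frames), len(test_frames))
```

Since mask_001.png belongs to frame_001.png, the matching mask lists can be derived from the frame lists by renaming, which keeps every frame and its mask in the same partition.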

We now have our necessary lists containing image ids. However, we still need to save the images from these lists to their corresponding [correct] folders. Functions add_frames() and add_masks() aid in this.

As you might have guessed, there are multiple ways to do this. You could experiment to find the fastest way to achieve this, but I’ve found a reasonably efficient way:

  1. Start with two lists of tuples. The tuples constitute the list of images, and their corresponding directory names. Call these frame_folders and mask_folders, the former holding the details of all our frame lists — train, val, test — and the latter holding details of all our mask lists.
  2. Iterate through frame_folders and mask_folders [one by one] and use the map() function to map each image to the add_frames() and add_masks() functions respectively. I’ve used map() instead of a plain for loop because I expected Python’s built-in map() to be faster. I haven’t measured how much faster it is, if at all, and I would love it if one of the readers does this. Note that in Python 3, map() returns a lazy iterator, so it has to be consumed [for example with list()] before any files are actually copied.
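A rough sketch of the two steps above is shown here. It is not the original script: the helper names copy_files and populate_folders and the destination folder names are assumptions, and it uses a plain for loop instead of map() so that the copying happens eagerly.

```python
import os
import shutil

def copy_files(data_path, source_dir, dest_dir, file_names):
    """Copy each named file from data_path/source_dir to data_path/dest_dir."""
    os.makedirs(os.path.join(data_path, dest_dir), exist_ok=True)
    for name in file_names:
        shutil.copy(os.path.join(data_path, source_dir, name),
                    os.path.join(data_path, dest_dir, name))

def populate_folders(data_path, frame_folders, mask_folders):
    """Fill every destination folder listed in the two lists of tuples."""
    # Each tuple is (list of file names, destination folder name).
    for file_names, dest_dir in frame_folders:
        copy_files(data_path, "frames", dest_dir, file_names)
    for file_names, dest_dir in mask_folders:
        copy_files(data_path, "masks", dest_dir, file_names)

# Example call, with hypothetical folder names:
# populate_folders(DATA_PATH,
#                  [(train_frames, "train_frames"), (val_frames, "val_frames"),
#                   (test_frames, "test_frames")],
#                  [(train_masks, "train_masks"), (val_masks, "val_masks"),
#                   (test_masks, "test_masks")])
```

Copying rather than moving leaves the original frames and masks directories untouched, which makes it easy to redo the split with a different seed or ratio.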

For a very small dataset of 1000 images [+1000 masks], it takes less than a minute to set up your folders. One nice experiment would be to find even faster ways of doing this. I would love to hear your thoughts.

Thus, we have our dataset!

That concludes Part 1.

In Part 2, we will look at another crucial aspect of image segmentation pipelines — Generating batches of images for training.

In the first part of this tutorial, we learnt how to prepare and structure our data to be used in our image segmentation task. In this part, we take our task one step further — The generation of these images.


Keras ImageDataGenerator

In order to train your model, you will ideally need to generate batches of images to feed it. While you do this, you may want to perform common operations across all these images: rescaling, rotations, crops, shifts, and so on. This is called data augmentation. In fact, one very common practice is to resize all images to one shape, to make the training process uniform.

Note that data augmentation does not change your image — It simply creates another representation of the same image. Imagine if someone took a picture of you, and then rotated that picture by some angle. These are two different pictures, but the object of the picture [you] does not change.

To achieve this, we use Keras’s ImageDataGenerator.

According to the docs:

Generate batches of tensor image data with real-time data augmentation. The data will be looped over (in batches).

We create our training and validation generator objects respectively. You can see that the training images will be augmented through rescaling, horizontal flips, shear range and zoom range.
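A sketch of what creating the two generator objects might look like is given below. The numeric values are illustrative assumptions, as are the names TRAIN_AUGMENTATION, VAL_AUGMENTATION, and build_datagens. The keyword arguments rescale, shear_range, zoom_range, and horizontal_flip are real ImageDataGenerator parameters. The import assumes a TensorFlow version that still ships ImageDataGenerator.

```python
# Augmentation settings for the training set; the values are illustrative.
TRAIN_AUGMENTATION = {
    "rescale": 1.0 / 255,     # map pixel values from [0, 255] to [0, 1]
    "shear_range": 0.2,       # random shearing
    "zoom_range": 0.2,        # random zoom in or out by up to 20%
    "horizontal_flip": True,  # random left-right flips
}

# Validation images are only rescaled, never randomly transformed.
VAL_AUGMENTATION = {"rescale": 1.0 / 255}

def build_datagens():
    """Create the training and validation ImageDataGenerator objects."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    train_datagen = ImageDataGenerator(**TRAIN_AUGMENTATION)
    val_datagen = ImageDataGenerator(**VAL_AUGMENTATION)
    return train_datagen, val_datagen
```

Keeping the validation settings free of random transforms means validation scores are measured on the images as they are, so they stay comparable from epoch to epoch.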

For a description on what these operations mean, and more importantly, what they look like, go here.

There is no single correct answer when it comes to how one initialises the objects. It depends on who is designing them and what their objectives are.

Now that our generator objects are created, we initiate the generation process using the very helpful flow_from_directory():

All we need to provide to Keras are the directory paths, and the batch sizes. There are other options too, but for now, this is enough to get you started.

Finally, once we have the frame and mask generators for the training and validation sets respectively, we zip() them together to create:

a) train_generator : The generator for the training frames and masks.

b) val_generator : The generator for the validation frames and masks.
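The pairing step can be sketched like this. The function name paired_generator, the default batch size, and the seed are assumptions. The flow_from_directory() arguments used here (target_size, color_mode, class_mode, batch_size, seed) are existing Keras parameters.

```python
def paired_generator(frame_datagen, mask_datagen, frame_dir, mask_dir,
                     batch_size=4, target_size=(512, 512), seed=1):
    """Zip a frame iterator and a mask iterator into (frames, masks) batches."""
    # class_mode=None makes each iterator yield image batches without labels.
    # Passing the same seed to both calls keeps the shuffling order and the
    # random transforms in step, so each frame stays matched with its mask.
    frame_flow = frame_datagen.flow_from_directory(
        frame_dir, target_size=target_size, class_mode=None,
        batch_size=batch_size, seed=seed)
    mask_flow = mask_datagen.flow_from_directory(
        mask_dir, target_size=target_size, color_mode="grayscale",
        class_mode=None, batch_size=batch_size, seed=seed)
    # zip() is lazy in Python 3, so batches are only produced on demand.
    return zip(frame_flow, mask_flow)
```

Two caveats apply. For the transforms to line up, the frame and mask generator objects should be built with the same augmentation settings. Also, flow_from_directory() looks for images inside subdirectories of the path it is given, even when class_mode is None, so each of frame_dir and mask_dir should contain one subfolder holding the actual files.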

Creating your own data generator

By no means does the Keras ImageDataGenerator need to be the only choice when you’re designing generators. Custom generators are also frequently used. These provide greater flexibility of choice to the designer. See the example below:

We have decided to let the sizes of all images be (512 * 512 * n), where n = 3 if it’s a normal RGB image, and n = 1 for the corresponding mask of that image, which would obviously be grayscale.

We initialise two arrays to hold details of each image (and each mask), which would be 3 dimensional arrays themselves. So, img and masks are arrays of arrays.

We use yield for the simple purpose of generating batches of images lazily, rather than a return, which would generate all of them at once. For a clear explanation of when to use one over the other, see this.

Finally, we create our training and validation generators, by passing the training image, mask paths, and validation image, mask paths with the batch size, all at once, which wasn’t possible when we were using Keras’s generator.
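As a minimal sketch of such a custom generator, consider the function below. It is not the original code: the name batch_generator and the load_image argument are assumptions, and the image loading is left to the caller so that the batching logic itself stays visible. In a real pipeline, load_image would read a file and resize it to 512 x 512, and each batch would be stacked into a single array before being yielded.

```python
import random

def batch_generator(frame_paths, mask_paths, batch_size, load_image,
                    shuffle=True, seed=0):
    """Yield (frames, masks) batches forever, loading images lazily."""
    if batch_size > len(frame_paths):
        raise ValueError("batch_size is larger than the dataset")
    # Pair each frame with its mask so shuffling never separates them.
    pairs = list(zip(frame_paths, mask_paths))
    rng = random.Random(seed)
    while True:  # Keras expects training generators to loop indefinitely
        if shuffle:
            rng.shuffle(pairs)
        # Step through full batches; a leftover partial batch is dropped.
        for start in range(0, len(pairs) - batch_size + 1, batch_size):
            chunk = pairs[start:start + batch_size]
            frames = [load_image(frame) for frame, _ in chunk]
            masks = [load_image(mask) for _, mask in chunk]
            yield frames, masks

# Example call with hypothetical paths and a hypothetical loader function:
# train_gen = batch_generator(train_frame_paths, train_mask_paths, 4, load_fn)
```

Because of yield, only one batch of images is held in memory at a time, no matter how large the dataset is.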

However, in this case, we aren’t using random transformations on the fly. Imagine you are tackling an image segmentation problem where the location of the object you are segmenting is also important. Would you still use rotations, zooms, and shifts? Food for thought.

At the end of the day, it all boils down to individual choices. Both approaches work. One may find one approach to be more useful over the other in specific situations, and vice versa.

Great! Now we have our generator objects ready to go.

It’s training time!


In the previous two sections, we learnt how to prepare our data, and create image generators that aid training. In this final section, we will see how to use these generators to train our model.

This section will conclude our entire pipeline.

Before we can begin training, we need to decide which architecture to use. Fortunately, most of the popular ones have already been implemented and are freely available for public use. Some examples include:

  1. The Keras UNet implementation
  2. The Keras FCNet implementations.

To get started, you don’t have to worry much about the differences between these architectures, and where to use what. For now, you can simply place this model.py file in your working directory, and import it in train.py, which will be the file where the training code will exist.

Your working directory hopefully looks like this:

[Image: working directory structure]

Notice the new code files, in addition to the data directories we had seen before.

Assuming that you’re working with the FCNet_VGG16_32s, let’s take a look at the one-liners to load, compile, and run the model.

After the necessary imports, lines 8–13 initialise the variables that depend entirely on your dataset and your choice of inputs, for example the batch size you’ve decided upon, and the number of epochs for which your model will train.

Line 15 initialises the path where the weights [a .h5 file] after each epoch are going to be saved.

Lines 17–22 are the necessary steps to load and compile your model. Notice that I haven’t specified what metrics to use. For image segmentation tasks, one popular metric is the dice coefficient [and conversely, the dice loss]. A nice implementation can be found here.
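To make the dice coefficient concrete, here is a small pure-Python sketch that works on flattened masks given as lists of 0s and 1s. It is only an illustration of the formula, 2 x overlap / (size of truth + size of prediction), and not the linked implementation. A version used as a Keras metric would be written with tensor operations instead. The smooth term is a common addition, assumed here, that avoids division by zero when both masks are empty.

```python
def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice overlap between two flattened masks of equal length."""
    # For binary masks, the sum of products counts pixels set in both.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)

def dice_loss(y_true, y_pred, smooth=1.0):
    """Loss that falls as the overlap between the masks grows."""
    return 1.0 - dice_coefficient(y_true, y_pred, smooth)

truth = [1, 1, 0, 0]
prediction = [1, 0, 0, 0]
print(dice_coefficient(truth, prediction))  # (2*1 + 1) / (2 + 1 + 1) = 0.75
```

A perfect prediction gives a coefficient of 1 and a loss of 0, which is why the coefficient is something to maximise and the loss something to minimise.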

Lines 24–32 are also boilerplate Keras code, encapsulated under a series of operations called callbacks.

We use a ModelCheckpoint to save the weights only if the mode parameter is satisfied. To learn what the monitor and mode parameters are, read on.

We also make sure that our model doesn’t train for an unnecessarily long time. If the loss isn’t decreasing significantly over consecutive epochs, a patience parameter stops training automatically after a set number of such epochs. By ‘significantly’, I mean the min_delta parameter: in this case, we check whether our loss has decreased by at least 0.1. Our patience is 3, which is the number of consecutive epochs without a decrease of at least 0.1 after which training stops.
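The interplay between patience and min_delta is easier to see in a small simulation. The function below is a simplified illustration written for this purpose, not the Keras source code, and the loss values are made up.

```python
def epochs_until_stop(losses, min_delta=0.1, patience=3):
    """Return the index of the epoch at which training stops, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # The loss dropped by more than min_delta: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # patience never ran out

# The loss improves clearly for two epochs, then only creeps downwards.
print(epochs_until_stop([1.0, 0.8, 0.75, 0.72, 0.71, 0.70]))  # 4
```

In this run the best loss is 0.8 after the second epoch. The next three epochs each improve on it by less than 0.1, so the patience of 3 runs out at epoch index 4 even though the loss is still decreasing slightly.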

The monitor parameter defines the metric whose value you want to check; in our case, the dice loss. The mode parameter tells Keras which direction counts as an improvement: with ‘min’, training stops once the monitored quantity has stopped decreasing, and with ‘max’, once it has stopped increasing.

So, if you were monitoring accuracy, mode would be max. But if you were monitoring mean_squared_error, mode would be min.
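Put together, the two callbacks might be configured roughly as follows. The file name weights.h5 and the choice of val_loss as the monitored quantity are assumptions made for this sketch; min_delta and patience use the values discussed above. All keyword arguments shown are existing parameters of ModelCheckpoint and EarlyStopping.

```python
WEIGHTS_PATH = "weights.h5"  # hypothetical file name

CHECKPOINT_SETTINGS = {
    "filepath": WEIGHTS_PATH,
    "monitor": "val_loss",   # assumed here; use your own loss or metric name
    "mode": "min",           # lower is better for a loss
    "save_best_only": True,  # only overwrite the file when monitor improves
    "verbose": 1,
}

EARLY_STOPPING_SETTINGS = {
    "monitor": "val_loss",
    "mode": "min",
    "min_delta": 0.1,  # smallest change that counts as an improvement
    "patience": 3,     # epochs to wait without such an improvement
    "verbose": 1,
}

def build_callbacks():
    """Create the list of callbacks to hand to the training call."""
    # Imported lazily so the settings can be inspected without TensorFlow.
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
    return [ModelCheckpoint(**CHECKPOINT_SETTINGS),
            EarlyStopping(**EARLY_STOPPING_SETTINGS)]
```

If you monitored a score such as the dice coefficient instead of a loss, both mode values would become max.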

This entire phenomenon is called early stopping. Hopefully, by now, you understand why it is named so.

Line 34 is the training step. We pass all the inputs that are needed, which include:

a) The training and validation image generators, seen previously.

b) The number of epochs.

c) The number of steps per epoch, which depends on the total number of images and the batch size.

d) Finally, our list of callbacks, which include our conditions for model checkpoint and early stopping.

Finally, we call fit_generator to train on these generators.
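The training call and the steps-per-epoch arithmetic can be sketched as below. The function names steps_per_epoch and train are assumptions. The sketch calls model.fit() rather than fit_generator(): as far as I know, fit() accepts generators directly from TensorFlow 2.1 onwards, where fit_generator() is marked as deprecated. On older Keras versions the same keyword arguments can be passed to fit_generator() instead.

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    # Rounding up includes a final, smaller batch when the division is not
    # exact. Use num_images // batch_size if your generator drops that batch.
    return math.ceil(num_images / batch_size)

def train(model, train_generator, val_generator, num_train, num_val,
          batch_size, epochs, callbacks):
    """Run training on the generators and return the Keras History object."""
    return model.fit(
        train_generator,
        epochs=epochs,
        steps_per_epoch=steps_per_epoch(num_train, batch_size),
        validation_data=val_generator,
        validation_steps=steps_per_epoch(num_val, batch_size),
        callbacks=callbacks,
    )

print(steps_per_epoch(700, 4))  # 175
```

The step counts matter because the generators loop forever: without them, Keras has no way of knowing where one epoch ends and the next begins.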

And training has begun!

Once training finishes, you can save the checkpointed architecture with all its weights using the save function. You can name it whatever you like.


In this three-part series, we walked through the entire Keras pipeline for an image segmentation task. From structuring our data, to creating image generators, to finally training our model, we’ve covered enough for a beginner to get started. Of course, there’s so much more one could do.

You could experiment with different architectures, different hyper-parameters [like using an optimiser other than Adam], different stopping conditions [playing around with the patience parameter], etc.

One good idea is to plot the number of epochs before early stopping for different hyper-parameters, evaluate the metric values, and check whether any optimal combination of hyper-parameters, model, and epochs exists.

I hope this series was accessible, and if any parts were unclear, I would love to hear your questions on them.
