comp9517
Week1

Introduction and Image Formation

Lecture 1 Image Formation

  • Lab consultations: Wednesdays 6-7pm in weeks 2-5 (via BB Collab). Software demos and consultations with your assigned tutor (links in Moodle)

    Lab Work (4x) 10% Weeks 2, 3, 4, 5 Weeks 3, 4, 5, 7 Online

  • Project consultations: Wednesdays 6-7pm in weeks 6-10 (via BB Collab). All project consultations will be online with your assigned tutor (links in Moodle)

    Group Project 40% Week 5 Week 10 Online

Color Spaces

  • RGB:

    • Red, Green, Blue,
    • Additive color model,
    • 3 channels, 8 bits per channel, 24 bits per pixel, 16.7 million colors,
    • Drawback of RGB: channels are strongly correlated, so it is not an efficient way to model colour
  • HSV:

    • Hue, Saturation, Value,
    • cylindrical color model,
    • 3 channels, 8 bits per channel, 24 bits per pixel, 16.7 million colors,
    • Hue: 0-360 degrees, Saturation: 0-100%, Value: 0-100%,
    • Drawback of HSV: the channels are confounded, and the space is not perceptually uniform
  • YCbCr:

    • Luminance, Chrominance Blue, Chrominance Red,
    • 3 channels, 8 bits per channel, 24 bits per pixel, 16.7 million colors,
    • Used in JPEG compression, used in television and video compression,
  • Lab:

    • L* (lightness), a*, b*: a* is the green-red component and b* is the blue-yellow component,
    • 3 channels, 8 bits per channel, 24 bits per pixel, 16.7 million colors
    • Designed to be perceptually uniform: equal numerical changes in Lab space correspond to roughly equal perceived changes, which is why it is used in image processing
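
As a small illustration of moving between the colour spaces listed above, the sketch below converts 8-bit RGB to YCbCr. It assumes the full-range BT.601 coefficients used by JPEG (JFIF); other standards use slightly different constants, and the function name `rgb_to_ycbcr` is made up for this example.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) 8-bit RGB array to full-range YCbCr.

    Assumption: JFIF / BT.601 full-range coefficients.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b                 # luminance
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b    # blue chrominance
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b    # red chrominance
    ycbcr = np.stack([y, cb, cr], axis=-1)
    return np.clip(np.round(ycbcr), 0, 255).astype(np.uint8)

white = rgb_to_ycbcr([255, 255, 255])
black = rgb_to_ycbcr([0, 0, 0])
```

Both white and black end up with neutral chrominance (Cb = Cr = 128) and differ only in Y, which shows how YCbCr separates brightness from colour.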

Example exam question

Which one of the following statements about colour spaces is incorrect?

  • A. The R, G, and B channels of the RGB colour space are often correlated.
  • B. The H and the S channel of the HSV colour space are confounded.
  • C. The Y channel of the YCbCr colour space represents the brightness.
  • D. The a* channel of the L*a*b* colour space is the green-blue component.

The answer is D: the a* channel of the L*a*b* colour space is the green-red component.

Face Recognition

Spatial resolution is the number of pixels in an image; the higher the resolution, the more detail the image contains. Face recognition needs enough spatial resolution to capture the details of the face, usually about 64x64 pixels.

Lecture 2 Image Processing

What is image processing

  • image processing = image in -> image out
  • image analysis = image in -> features out
  • computer vision = image in -> interpretation out

Categories of image processing

Spatial domain: pixel-based operations

  • point operations: operate on individual pixels, such as contrast stretching, thresholding, inversion, log/power transformations
    • T operates on a single pixel
    • $T: \mathbb{R} \to \mathbb{R},\ g(x, y) = T(f(x, y))$
  • neighbourhood operations: operate on a group of pixels
    • T operates on a group of pixels
    • $T: \mathbb{R}^n \to \mathbb{R},\ g(x, y) = T(f(x, y), f(x+1, y), f(x-1, y), \ldots)$
Contrast stretching
  • Puts values below L in the input to the minimum (black) in the output
  • Puts values above H in the input to the maximum (white) in the output
  • Linearly scales values between L and H in the input to between the minimum and maximum in the output
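
The three rules above can be sketched in a few lines of NumPy. This is a minimal sketch, not the lecture's implementation: the function name `contrast_stretch` and the parameters `low`, `high`, `out_min`, and `out_max` are assumptions introduced here, and an 8-bit output range is assumed by default.

```python
import numpy as np

def contrast_stretch(image, low, high, out_min=0, out_max=255):
    """Map [low, high] linearly onto [out_min, out_max], clipping outside."""
    image = np.asarray(image, dtype=np.float64)
    scaled = (image - low) / (high - low)   # 0 at low, 1 at high
    scaled = np.clip(scaled, 0.0, 1.0)      # below low -> 0, above high -> 1
    return np.round(out_min + scaled * (out_max - out_min)).astype(np.uint8)

pixels = np.array([10, 50, 75, 150, 200], dtype=np.uint8)
stretched = contrast_stretch(pixels, low=50, high=150)
# stretched is [0, 0, 64, 255, 255]
```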

[Figure: contrast stretching]

Intensity thresholding
  • Binary thresholding: $g(x, y) = 1$ if $f(x, y) > T$, $g(x, y) = 0$ otherwise
  • If you reduce the threshold, you will get more white pixels
  • If you increase the threshold, you will get more black pixels
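
A minimal sketch of binary thresholding, showing that lowering the threshold produces more white (1) pixels. The helper name `binary_threshold` and the 3x3 test image are made up for illustration.

```python
import numpy as np

def binary_threshold(image, t):
    """Return 1 where the pixel is strictly above t, 0 otherwise."""
    return (np.asarray(image) > t).astype(np.uint8)

image = np.array([[ 20,  80, 140],
                  [ 60, 120, 200],
                  [100, 160, 240]], dtype=np.uint8)

white_at_150 = int(binary_threshold(image, 150).sum())  # 3 white pixels
white_at_90 = int(binary_threshold(image, 90).sum())    # 6 white pixels
```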

[Figure: intensity thresholding]

Automatic intensity thresholding

Otsu's method

Exhaustively searches for the threshold that minimizes the intra-class variance: $\sigma^2_w = p_0\sigma^2_0 + p_1\sigma^2_1$. Equivalently, it maximizes the inter-class variance: $\sigma^2_b = p_0 p_1(\mu_0 - \mu_1)^2$. Here $p_0$ is the probability of class 0, $p_1$ is the probability of class 1, $\sigma^2_0$ and $\sigma^2_1$ are the variances of the two classes, and $\mu_0$ and $\mu_1$ are the means of the two classes.
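
The exhaustive search can be sketched as follows. This is an illustrative version that maximizes the inter-class variance $\sigma^2_b$; it assumes integer grey levels in `[0, levels)` and the convention that class 0 holds pixels with value at most `t` and class 1 the rest. The function name `otsu_threshold` is an assumption of this sketch.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return the threshold t that maximizes p0 * p1 * (mu0 - mu1)^2."""
    hist = np.bincount(np.asarray(image).ravel(), minlength=levels)
    prob = hist.astype(np.float64) / hist.sum()   # normalised histogram
    grey = np.arange(levels)
    best_t, best_var = 0, -1.0
    for t in range(levels - 1):
        p0 = prob[: t + 1].sum()                  # class 0: values <= t
        p1 = 1.0 - p0                             # class 1: values > t
        if p0 == 0.0 or p1 == 0.0:
            continue                              # one class is empty
        mu0 = (grey[: t + 1] * prob[: t + 1]).sum() / p0
        mu1 = (grey[t + 1:] * prob[t + 1:]).sum() / p1
        var_between = p0 * p1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

bimodal = np.array([10] * 50 + [200] * 50, dtype=np.uint8)
t = otsu_threshold(bimodal)
```

For this two-valued image any threshold between the two grey levels separates the classes equally well, so the search returns the first such value.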

IsoData method

  • Start with an initial threshold value
  • Compute the mean of the pixels below the threshold and the mean of the pixels above the threshold
  • Update the threshold to the average of the two means: t = (mean1 + mean2) / 2
  • Repeat until the threshold converges
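
The four IsoData steps translate almost directly into code. In this sketch the initial threshold is the global mean and convergence is declared when the threshold moves by less than `tol`; both choices, and the name `isodata_threshold`, are assumptions made here rather than part of the method as stated.

```python
import numpy as np

def isodata_threshold(image, tol=0.5, max_iter=100):
    """Iterate t = (mean_below + mean_above) / 2 until t stops changing."""
    pixels = np.asarray(image, dtype=np.float64).ravel()
    t = pixels.mean()                       # assumed initial threshold
    for _ in range(max_iter):
        below = pixels[pixels <= t]
        above = pixels[pixels > t]
        if below.size == 0 or above.size == 0:
            break                           # cannot split any further
        new_t = 0.5 * (below.mean() + above.mean())
        if abs(new_t - t) < tol:
            return new_t                    # converged
        t = new_t
    return t

threshold = isodata_threshold([10, 20, 30, 200, 210, 220])
```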

Multilevel thresholding: Apply thresholding multiple times to segment an image into multiple regions

Intensity inversion
  • Apply thresholding to segment the image into two regions
  • Invert the intensity values of one of the regions
Log/Power transformation
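  • Logarithmic transformation: $s = c \log(1 + r)$. This non-linear mapping expands a relatively narrow range of low input intensities into a wider range of output values, and compresses the high intensities
  • Power-law (gamma) transformation: $s = c r^{\gamma}$. For $\gamma < 1$ it behaves like the log transformation (dark values are expanded); for $\gamma > 1$ it behaves like the inverse log transformation (a relatively wide range of dark input values is mapped to a narrow range of output values)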
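
A short sketch of both transformations for 8-bit data. The scaling constants are assumptions chosen so that the output stays in $[0, L-1]$: for the log transform $c = (L-1)/\log(L)$, and for the gamma transform the input is normalised to $[0, 1]$ and $c = L-1$. The function names are made up for this example.

```python
import numpy as np

def log_transform(image, levels=256):
    """s = c * log(1 + r), with c chosen so that r = L-1 maps to s = L-1."""
    r = np.asarray(image, dtype=np.float64)
    c = (levels - 1) / np.log(levels)
    return c * np.log1p(r)

def gamma_transform(image, gamma, levels=256):
    """s = (L-1) * (r / (L-1)) ** gamma, i.e. r normalised to [0, 1]."""
    r = np.asarray(image, dtype=np.float64) / (levels - 1)
    return (levels - 1) * r ** gamma

dark_end = log_transform(15)           # about 127.5: [0, 15] fills half the output range
brighter = gamma_transform(64, 0.5)    # gamma < 1 brightens dark values
darker = gamma_transform(64, 2.0)      # gamma > 1 darkens them
```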

[Figure: log and power transformations]

Piecewise linear transformations
  • Piecewise linear transformation: $s = T(r)$, where $T$ is a piecewise linear function
  • complementary to other transformations, such as contrast stretching
Grey-level slicing
  • High value for all grey levels in a range of interest and low values for all others
  • Useful for highlighting a specific range of intensities

Histogram of pixel values

  • The histogram can be used to determine the contrast of an image, brightness of an image, and the distribution of pixel values in an image

Histogram based thresholding

  • Find the histogram peak $(r_p, h_p)$ and the highest grey level point $(r_{max}, h_{max})$
  • Construct a line between the peak and the highest grey level point
  • Find the grey level $r$ for which the distance between the line and the histogram is maximized
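
The three steps above can be sketched as below. The sketch assumes the peak lies to the left of the highest occupied grey level, measures the perpendicular distance from each histogram point to the line, and does not normalise the two axes; practical implementations often do. The function name `triangle_threshold` and the toy histogram are made up for illustration.

```python
import numpy as np

def triangle_threshold(hist):
    """Grey level between the peak and the last non-empty bin that lies
    furthest from the straight line joining those two points."""
    hist = np.asarray(hist, dtype=np.float64)
    r_p = int(np.argmax(hist))                 # histogram peak
    h_p = hist[r_p]
    r_max = int(np.nonzero(hist)[0][-1])       # highest occupied grey level
    h_max = hist[r_max]
    if r_max == r_p:
        return r_p                             # degenerate case: no line to draw
    r = np.arange(r_p, r_max + 1)
    # Perpendicular distance from (r, hist[r]) to the line through
    # (r_p, h_p) and (r_max, h_max).
    numerator = np.abs((h_max - h_p) * r - (r_max - r_p) * hist[r]
                       + r_max * h_p - h_max * r_p)
    distance = numerator / np.hypot(h_max - h_p, r_max - r_p)
    return int(r[np.argmax(distance)])

toy_hist = [0, 100, 10, 5, 3, 2, 1]
t = triangle_threshold(toy_hist)
```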

[Figure: histogram based thresholding]

Histogram equalization

  • Enhances contrast for intensity values near histogram maxima and decreases contrast near histogram minima
  • The cumulative distribution function (CDF) is used to map the input intensity values to the output intensity values
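  • For a monotonic mapping $s = T(r)$, $p_s(s) = p_r(r) \left|\frac{dr}{ds}\right|$, where $p_s(s)$ is the probability density function of the output intensity values and $p_r(r)$ is the probability density function of the input intensity values. Choosing $T$ as the scaled CDF, $s = T(r) = (L-1)\int_{0}^{r} p_r(\zeta)\,d\zeta$, gives $\frac{ds}{dr} = (L-1)p_r(r)$, so the output distribution is uniform:

$$p_s(s) = \frac{p_r(r)}{(L-1)p_r(r)} = \frac{1}{L - 1}$$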

  • The discrete form of the CDF is used to map the input intensity values to the output intensity values
  • $p_r(r_k) = n_k / (M \times N)$, where $n_k$ is the number of pixels with intensity $r_k$, and $M \times N$ is the total number of pixels

$$s_k = (L-1) \sum_{j=0}^{k} p_r(r_j) = \frac{(L-1)}{M \times N} \sum_{j=0}^{k} n_j$$
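
The discrete mapping $s_k$ can be computed directly from a histogram. The sketch below uses an illustrative 3-bit histogram whose counts sum to 4096; these counts are an assumption for the example and are not taken from the lecture's figure. Rounding $s_k$ to the nearest integer level is also an assumption of this sketch.

```python
import numpy as np

def equalization_map(counts):
    """Return the lookup table s_k = round((L-1) * CDF(r_k))."""
    counts = np.asarray(counts, dtype=np.float64)
    levels = counts.size                        # L
    cdf = np.cumsum(counts) / counts.sum()      # sum of n_j / (M*N)
    return np.round((levels - 1) * cdf).astype(int)

# Illustrative histogram for L = 8 and M*N = 4096 (assumed values).
counts = [790, 1023, 850, 656, 329, 245, 122, 81]
mapping = equalization_map(counts)
# mapping is [1, 3, 5, 6, 6, 7, 7, 7]
```

An image stored as an integer array can then be equalized with `mapping[image]`.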

The difference between the continuous and the discrete CDF is that the continuous CDF is the integral of the probability density function, while the discrete CDF is the cumulative sum of the normalised histogram

We also have constrained histogram equalization, where we can specify the desired histogram to slow down the contrast enhancement (gradient growth rate) in some regions of the image

[Figure: histogram equalization calculation]

We can see that the blue histogram is smoothed out, and the orange histogram is the equalized histogram

Histogram matching

  • Match the histogram of an input image to the histogram of a reference image

Assume that $r$ and $s$ are continuous and $p_z(z)$ is the target distribution for the output image. We first apply the equalization transformation to the input:

$$s = T(r) = (L-1)\int_{0}^{r} p_r(\zeta)\,d\zeta$$

Now we can define another function $G(z)$ as:

$$G(z) = (L-1)\int_{0}^{z} p_z(\zeta)\,d\zeta = s$$

Therefore:

$$z = G^{-1}(s) = G^{-1}[T(r)]$$

For discrete image values we can write:

$$s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) = \frac{(L-1)}{M \times N} \sum_{j=0}^{k} n_j$$

And

$$G(z_q) = (L - 1)\sum_{i=0}^{q} p_z(z_i)$$

Therefore we can write:

$$z_q = G^{-1}(s_k)$$
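
A minimal sketch of the discrete procedure: compute $s_k$ from the source histogram, compute $G(z_q)$ from the target distribution, then for each $s_k$ pick the smallest $z_q$ whose $G(z_q)$ is closest. The nearest-value rule for inverting $G$, the rounding to integer levels, and both example distributions are assumptions of this sketch, not values from the lecture's tables.

```python
import numpy as np

def matching_map(source_counts, target_probs):
    """Return a lookup table r_k -> z_q that approximates z = G^{-1}(T(r))."""
    source_counts = np.asarray(source_counts, dtype=np.float64)
    target_probs = np.asarray(target_probs, dtype=np.float64)
    levels = source_counts.size
    # s_k = T(r_k): equalization of the source histogram
    s = np.round((levels - 1) * np.cumsum(source_counts) / source_counts.sum())
    # G(z_q): equalization of the target distribution
    g = np.round((levels - 1) * np.cumsum(target_probs) / target_probs.sum())
    # Invert G: smallest z_q with G(z_q) closest to s_k (argmin takes the first)
    return np.array([int(np.argmin(np.abs(g - s_k))) for s_k in s])

# Illustrative data for L = 8 (assumed values).
source_counts = [790, 1023, 850, 656, 329, 245, 122, 81]
target_probs = [0.0, 0.0, 0.0, 0.15, 0.20, 0.30, 0.20, 0.15]
mapping = matching_map(source_counts, target_probs)
# mapping is [3, 4, 5, 6, 6, 7, 7, 7]
```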

The difference between histogram equalization and histogram matching is the target: histogram equalization enhances contrast by aiming for a uniform output histogram, while histogram matching maps the histogram of an input image to the histogram of a reference image. Equalization is the special case of matching where the target histogram is uniform

Example:

Suppose that a 3-bit image ($L = 8$) of size 64x64 pixels ($MN = 4096$) has the intensity distribution shown in the table on the left. Derive the histogram transformation function and make the output image match the specified histogram listed in the table on the right.

[Figure: histogram matching calculation]

Arithmetic operations on images:

  • Addition: $g(x, y) = f_1(x, y) + f_2(x, y)$
  • Subtraction: $g(x, y) = f_1(x, y) - f_2(x, y)$
  • Multiplication: $g(x, y) = f_1(x, y) \times f_2(x, y)$
  • Division: $g(x, y) = f_1(x, y) / f_2(x, y)$
  • Logical operations: AND, OR, XOR, NOT
  • Applications: image blending, image subtraction, image division, image masking
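
A practical point when applying these operations to 8-bit images in NumPy is that `uint8` arithmetic wraps around modulo 256, so results are usually computed in a wider type and clipped. The sketch below is illustrative; the array values and variable names are made up.

```python
import numpy as np

f1 = np.array([[200, 50], [120, 255]], dtype=np.uint8)
f2 = np.array([[100, 30], [130,  10]], dtype=np.uint8)

wrapped = f1 + f2                                  # uint8 wraps: 200 + 100 -> 44
added = np.clip(f1.astype(np.int32) + f2, 0, 255).astype(np.uint8)   # saturating add
difference = np.abs(f1.astype(np.int32) - f2).astype(np.uint8)       # absolute difference
blended = np.round(0.5 * f1 + 0.5 * f2).astype(np.uint8)             # equal-weight blend

mask = np.array([[1, 0], [0, 1]], dtype=bool)
masked = np.where(mask, f1, 0).astype(np.uint8)    # keep f1 only where mask is set
```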

Averaging

Application: reduce noise in images

Assume the true noise-free image is $g(x, y)$ and the noisy images are $f_i(x, y) = g(x, y) + n_i(x, y)$ for $i = 1, \ldots, n$, where $n_i(x, y)$ is zero-mean, independent and identically distributed noise. With $E[f_i(x, y)] = g(x, y)$ and $VAR[n_i(x, y)] = \sigma^2(x, y)$, we can write the average of the noisy images as:

$$\overline{f}(x, y) = \frac{1}{n} \sum_{i=1}^{n} f_i(x, y) = g(x, y) + \frac{1}{n} \sum_{i=1}^{n} n_i(x, y)$$

Question: What are the variance and standard deviation of the average image?

Because the noise terms are independent, their variances add:

$$VAR[\overline{f}(x, y)] = \frac{1}{n^2} \sum_{i=1}^{n} VAR[n_i(x, y)] = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2(x, y) = \frac{1}{n} \sigma^2(x, y)$$

The standard deviation of the average image is $\frac{\sigma(x, y)}{\sqrt{n}}$
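
The $1/\sqrt{n}$ reduction can be checked numerically. This is a small simulation under assumed settings: Gaussian noise with $\sigma = 20$, $n = 16$ noisy copies of a constant 200x200 image, and a fixed random seed. All of these values are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 20.0, 16

clean = np.full((200, 200), 100.0)                           # noise-free image g
noisy = clean + rng.normal(0.0, sigma, size=(n, 200, 200))   # f_i = g + n_i
averaged = noisy.mean(axis=0)                                # average of the n images

std_single = (noisy[0] - clean).std()    # close to sigma
std_average = (averaged - clean).std()   # close to sigma / sqrt(n)
```

With these settings the residual noise in the averaged image should be roughly a quarter of the noise in a single image, since $\sqrt{16} = 4$.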

Example exam question

Which one of the following statements about intensity transformations is incorrect?

  • A. Contrast stretching linearly maps intensities between two values to the full output range
  • B. Logarithmic transformations map a narrow range of high intensities to a wider range of output values
  • C. Power transformation can map intensities similar to log and inverse log transformations
  • D. Piecewise linear transformations are complementary to other transformations

Explanation: The answer is B. A logarithmic transformation maps a narrow range of low-intensity input values to a wider range of output values; high intensities are compressed. A is correct by the definition of contrast stretching. C is correct because the power transformation behaves like a log transformation for $\gamma < 1$ and like an inverse log transformation for $\gamma > 1$. D is correct because piecewise linear transformations are complementary to the other transformations.