Spatial operations

These methods perform linear spatial filtering operations, such as smoothing and convolution, on greyscale and color images.

class machinevisiontoolbox.ImageSpatial.ImageSpatialMixin[source]
smooth(sigma, h=None, mode='same', border='reflect', bordervalue=0)[source]

Smooth image

Parameters:
  • sigma (float) – standard deviation of the Gaussian kernel

  • h (int) – half-width of the kernel

  • mode (str, optional) – option for convolution, see convolve, defaults to ‘same’

  • border (str, optional) – option for boundary handling, see convolve, defaults to ‘reflect’

  • bordervalue (scalar, optional) – padding value, see convolve, defaults to 0

Returns:

smoothed image

Return type:

Image

Smooth the image by convolving with a Gaussian kernel of standard deviation sigma. If h is not given the kernel half width is set to \(\mbox{ceil}(3 \sigma)\), giving a kernel side length of \(2 \mbox{ceil}(3 \sigma) + 1\).

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('monalisa.png')
>>> img.smooth(sigma=3).disp()
<matplotlib.image.AxesImage object at 0x7fdc5af7e970>
Note:
  • Smooths all planes of the input image.

  • The Gaussian kernel has a unit volume.

  • If input image is integer it is converted to float, convolved, then converted back to integer.

References:
  • Robotics, Vision & Control for Python, Section 11.5.1, P. Corke, Springer 2023.

Seealso:

machinevisiontoolbox.Kernel.Gauss convolve

convolve(K, mode='same', border='reflect', bordervalue=0)[source]

Image convolution

Parameters:
  • K (ndarray(N,M)) – kernel

  • mode (str, optional) – option for convolution, defaults to ‘same’

  • border (str, optional) – option for boundary handling, defaults to ‘reflect’

  • bordervalue (scalar, optional) – padding value, defaults to 0

Returns:

convolved image

Return type:

Image instance

Computes the convolution of image with the kernel K.

There are two options that control what happens at the edge of the image where the convolution window lies outside the image border. mode controls the size of the resulting image, while border controls how pixel values are extrapolated outside the image border.

mode      description
'same'    output image is same size as input image (default)
'full'    output image is larger than the input image, add border to input image
'valid'   output image is smaller than the input image and contains only valid pixels

border         description
'replicate'    replicate border pixels outwards
'pad'          outside pixels are set to bordervalue
'wrap'         borders are joined, left to right, top to bottom
'reflect'      outside pixels reflect inside pixels
'reflect101'   outside pixels reflect inside pixels except for edge
'none'         do not look outside of border
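
The border options can be illustrated with numpy.pad. This is a minimal sketch, not the toolbox's implementation: the mapping from border names to numpy.pad modes is an assumption based on the descriptions above, the helper names extend and output_width are hypothetical, and the output sizes follow the usual convolution convention.

```python
import numpy as np

row = np.array([[1, 2, 3, 4]])          # one image row: a b c d

# Assumed correspondence between border names and numpy.pad modes,
# shown for an extension of two pixels on each side
PAD_MODES = {
    "replicate": "edge",        # aa|abcd|dd
    "pad": "constant",          # vv|abcd|vv
    "wrap": "wrap",             # cd|abcd|ab
    "reflect": "symmetric",     # ba|abcd|dc
    "reflect101": "reflect",    # cb|abcd|cb
}

def extend(a, h, border, bordervalue=0):
    """Extend a 2D array by h pixels on the left and right."""
    mode = PAD_MODES[border]
    kwargs = {"constant_values": bordervalue} if mode == "constant" else {}
    return np.pad(a, ((0, 0), (h, h)), mode=mode, **kwargs)

for name in PAD_MODES:
    print(f"{name:11s}", extend(row, 2, name)[0])

def output_width(n, k, mode):
    """Output width for an input of width n and a kernel of width k."""
    return {"same": n, "full": n + k - 1, "valid": n - k + 1}[mode]
```

For an 11-pixel-wide kernel and a 100-pixel-wide image, this convention gives widths of 100, 110 and 90 for 'same', 'full' and 'valid' respectively.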

Example:

>>> from machinevisiontoolbox import Image
>>> import numpy as np
>>> img = Image.Read('monalisa.png')
>>> img.convolve(K=np.ones((11,11))).disp()
<matplotlib.image.AxesImage object at 0x7f14974ab970>
Note:
  • The kernel is typically square with an odd side length.

  • The result has the same datatype as the input image. For a kernel where the results could be negative (e.g. an edge detection kernel) this will cause issues such as value wraparound for unsigned integer images.

  • If the image is color (has multiple planes) the kernel is applied to each plane, resulting in an output image with the same number of planes.
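
The wraparound mentioned in the notes can be reproduced with plain NumPy arrays. This sketch only illustrates unsigned integer arithmetic; it does not call the toolbox. Reading the image as float, as in Image.Read(..., dtype="float") used elsewhere in this documentation, is one way to avoid the problem.

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# uint8 arithmetic is modulo 256, so 10 - 20 becomes 246
wrapped = a - b

# Converting to float first preserves the sign of the result
signed = a.astype(float) - b.astype(float)

print(wrapped.tolist())   # [246, 100]
print(signed.tolist())    # [-10.0, 100.0]
```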

References:
  • Robotics, Vision & Control for Python, Section 11.5.1, P. Corke, Springer 2023.

Seealso:

Kernel smooth opencv.filter2D opencv.copyMakeBorder

gradients(kernel=None, mode='same', border='reflect', bordervalue=0)[source]

Compute horizontal and vertical gradients

Parameters:
  • kernel (2D ndarray, optional) – derivative kernel, defaults to Sobel

  • mode (str, optional) – option for convolution, see convolve, defaults to ‘same’

  • border (str, optional) – option for boundary handling, see convolve, defaults to ‘reflect’

  • bordervalue (scalar, optional) – padding value, see convolve, defaults to 0

Returns:

gradient images

Return type:

Image, Image

Compute horizontal and vertical gradient images.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('monalisa.png', grey=True)
>>> Iu, Iv = img.gradients()
References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

Kernel

direction(vertical)[source]

Gradient direction

Parameters:

vertical (Image) – vertical gradient image

Returns:

gradient direction in radians

Return type:

Image

Compute the per-pixel gradient direction from two images comprising the horizontal and vertical gradient components.

\[\theta_{u,v} = \tan^{-1} \frac{\mathbf{I}_{v: u,v}}{\mathbf{I}_{u: u,v}}\]
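
As a sketch of the formula above, the direction can be computed per pixel with NumPy. The use of the four-quadrant arctangent np.arctan2 is an assumption here; the gradient values are made up for illustration.

```python
import numpy as np

# Hypothetical gradient components for four pixels
Iu = np.array([1.0, 0.0, -1.0, 1.0])    # horizontal gradient
Iv = np.array([0.0, 1.0, 0.0, -1.0])    # vertical gradient

# Four-quadrant arctangent, result in radians in the interval [-pi, pi]
theta = np.arctan2(Iv, Iu)
```

Unlike a plain arctangent of the ratio, arctan2 distinguishes opposite gradient directions and handles a zero horizontal component.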

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('monalisa.png', grey=True)
>>> Iu, Iv = img.gradients()
>>> direction = Iu.direction(Iv)
References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

gradients

Harris_corner_strength(k=0.04, h=2)[source]

Harris corner strength image

Parameters:
  • k (float, optional) – Harris parameter, defaults to 0.04

  • h (int, optional) – kernel half width, defaults to 2

Returns:

Harris corner strength image

Return type:

Image

Returns an image containing Harris corner strength values. This is positive for high gradient in orthogonal directions, and negative for high gradient in a single direction.
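
The sign behaviour described above follows from the standard Harris formula, sketched here for a single pixel. It is an assumption that the toolbox uses exactly this form. The function name harris_strength is hypothetical, and a, b and c stand for the windowed sums of the squared horizontal gradient, squared vertical gradient and their product.

```python
def harris_strength(a, b, c, k=0.04):
    """Harris response det(A) - k * trace(A)**2 for A = [[a, c], [c, b]]."""
    det = a * b - c * c
    trace = a + b
    return det - k * trace ** 2

corner = harris_strength(1.0, 1.0, 0.0)   # strong gradient in both directions
edge = harris_strength(1.0, 0.0, 0.0)     # strong gradient in one direction
flat = harris_strength(0.0, 0.0, 0.0)     # no gradient

print(corner, edge, flat)
```

With k = 0.04 the corner case is positive (about 0.84), the edge case is negative (about -0.04) and the flat case is zero.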

References:
  • Robotics, Vision & Control for Python, Section 12.3.1, P. Corke, Springer 2023.

Seealso:

gradients Harris

window(func, h=None, se=None, border='reflect', bordervalue=0, **kwargs)[source]

Generalized spatial operator

Parameters:
  • func (callable) – function applied to window

  • h (int, optional) – half width of structuring element

  • se (ndarray(N,M), optional) – structuring element

  • border (str, optional) – option for boundary handling, see convolve, defaults to ‘reflect’

  • bordervalue (scalar, optional) – padding value, defaults to 0

Raises:
  • ValueError – border is not a valid option

  • TypeError – func not callable

  • ValueError – single channel images only

Returns:

transformed image

Return type:

Image

Returns an image where each pixel is the result of applying the function func to a neighbourhood centred on the corresponding pixel in image. The return value of func becomes the corresponding pixel value.

The neighbourhood is defined in two ways:

  • If se is given then the neighbourhood has the size of the structuring element se, which should have odd side lengths. The elements in the neighbourhood corresponding to non-zero elements in se are packed into a vector (in column order from top left) and passed to the specified callable function func.

  • If se is None then h is the half width of a \(w \times w\) square structuring element of ones, where \(w =2h+1\).
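
The two ways of defining the neighbourhood can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the toolbox's code: window_sketch is a hypothetical name, and the 'reflect' border is assumed to correspond to numpy.pad's 'symmetric' mode.

```python
import numpy as np

def window_sketch(image, func, h=None, se=None):
    """Apply func to the neighbourhood of every pixel (reflected border)."""
    if se is None:
        se = np.ones((2 * h + 1, 2 * h + 1))     # square element of ones
    se = np.asarray(se)
    hr, hc = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(image, ((hr, hr), (hc, hc)), mode="symmetric")
    # Mask of non-zero elements, flattened in column order from top left
    mask = se.flatten(order="F") != 0
    out = np.empty(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            win = padded[r:r + se.shape[0], c:c + se.shape[1]]
            out[r, c] = func(win.flatten(order="F")[mask])
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
image[2, 2] = 100.0                               # an outlier pixel

median = window_sketch(image, np.median, h=1)     # 3x3 window of ones
cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
local_max = window_sketch(image, np.max, se=cross)
```

The Python-level double loop calls func once per output pixel, which is why this kind of operator is slow.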

Example:

>>> from machinevisiontoolbox import Image
>>> import numpy as np
>>> img = Image.Read('monalisa.png', grey=True)
>>> out = img.window(np.median, h=3)
Note:
  • The structuring element should have an odd side length.

  • This method is slow since the function func must be invoked once for every output pixel.

References:
  • Robotics, Vision & Control for Python, Section 11.5.3, P. Corke, Springer 2023.

Seealso:

scipy.ndimage.generic_filter

zerocross()[source]

Compute zero crossing

Returns:

boolean image

Return type:

Image instance

Compute a zero-crossing image, where pixels are true if they are adjacent to a change in sign.

Example:

>>> from machinevisiontoolbox import Image
>>> U, V = Image.meshgrid(None, 6, 6)
>>> img = Image(U - V - 2, dtype='float')
>>> img.print()
 -2.00 -1.00  0.00  1.00  2.00  3.00
 -3.00 -2.00 -1.00  0.00  1.00  2.00
 -4.00 -3.00 -2.00 -1.00  0.00  1.00
 -5.00 -4.00 -3.00 -2.00 -1.00  0.00
 -6.00 -5.00 -4.00 -3.00 -2.00 -1.00
 -7.00 -6.00 -5.00 -4.00 -3.00 -2.00
>>> img.zerocross().print()
 0 0 0 1 0 0
 0 0 1 0 1 0
 0 0 0 1 0 1
 0 0 0 0 1 0
 0 0 0 0 0 0
 0 0 0 0 0 0
Note:

Uses morphological filtering with a 3x3 structuring element, which can lead to erroneous values in border pixels.

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

Laplace LoG

scalespace(n, sigma=1)[source]

Compute image scalespace sequence

Parameters:
  • n (int) – number of steps

  • sigma (scalar, optional) – Gaussian filter width, defaults to 1

Returns:

Gaussian and difference of Gaussian sequences, scale factors

Return type:

list of Image, list of Image, list of float

Compute a scalespace image sequence by consecutively smoothing the input image with a Gaussian of width sigma. The difference between consecutive smoothings is the difference of Gaussian which is an approximation to the Laplacian of Gaussian.

Examples:

>>> from machinevisiontoolbox import Image
>>> mona = Image.Read("monalisa.png", dtype="float")
>>> G, L, scales = mona.scalespace(8, sigma=8)
Note:

The two image sequences have the same length; the original image is not included in the list of smoothed images.

References:
  • Robotics, Vision & Control for Python, Section 12.3.2, P. Corke, Springer 2023.

Seealso:

pyramid smooth Kernel.Gauss Kernel.LoG

pyramid(sigma=1, N=None, border='replicate', bordervalue=0)[source]

Pyramidal image decomposition

Parameters:
  • sigma (float) – standard deviation of Gaussian kernel

  • N (int, optional) – number of pyramid levels to be computed, defaults to all

  • border (str, optional) – option for boundary handling, see convolve, defaults to ‘replicate’

  • bordervalue (scalar, optional) – padding value, defaults to 0

Returns:

list of images at each pyramid level

Return type:

list of Image

Returns a pyramid decomposition of the input image using Gaussian smoothing with standard deviation sigma. The return value is a list of images, each having dimensions half those of the previous image. The pyramid is computed down to a non-halvable image size.
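
The sequence of level sizes can be sketched without any image data. The rounding-up rule used here is an assumption inferred from the sizes printed in the example for this method; pyramid_sizes is a hypothetical helper, not part of the toolbox.

```python
import math

def pyramid_sizes(width, height):
    """List the (width, height) of each level, halving and rounding up."""
    sizes = [(width, height)]
    while sizes[-1] != (1, 1):
        w, h = sizes[-1]
        sizes.append((math.ceil(w / 2), math.ceil(h / 2)))
    return sizes

sizes = pyramid_sizes(677, 700)
print(len(sizes))   # 11
print(sizes[:4])    # [(677, 700), (339, 350), (170, 175), (85, 88)]
```

For a 677 x 700 image this gives 11 levels ending at 1 x 1, the same sizes listed in the example.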

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('monalisa.png')
>>> pyramid = img.pyramid(4)
>>> len(pyramid)
11
>>> pyramid
[Image: 677 x 700 (uint8), Image: 339 x 350 (uint8), Image: 170 x 175 (uint8), Image: 85 x 88 (uint8), Image: 43 x 44 (uint8), Image: 22 x 22 (uint8), Image: 11 x 11 (uint8), Image: 6 x 6 (uint8), Image: 3 x 3 (uint8), Image: 2 x 2 (uint8), Image: 1 x 1 (uint8)]
Note:
  • Operates on greyscale images only; a color image is converted to greyscale first.

References:
  • Robotics, Vision & Control for Python, Section 12.3.2, P. Corke, Springer 2023.

Seealso:

smooth scalespace

canny(sigma=1, th0=None, th1=None)[source]

Canny edge detection

Parameters:
  • sigma (float, optional) – standard deviation for Gaussian kernel smoothing, defaults to 1

  • th0 (float) – lower threshold

  • th1 (float) – upper threshold

Returns:

edge image

Return type:

Image instance

Computes an edge image obtained using the Canny edge detector algorithm. Hysteresis filtering is applied to the gradient image: edge pixels > th1 are connected to adjacent pixels > th0, those below th0 are set to zero.
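
The hysteresis step alone can be sketched as below. This is not a Canny detector: smoothing, gradient computation and non-maximum suppression are omitted, and the use of 8-way adjacency is an assumption. The function name hysteresis and the sample values are hypothetical.

```python
import numpy as np
from collections import deque

def hysteresis(mag, th0, th1):
    """Keep pixels > th1, plus pixels > th0 that are 8-connected to them."""
    strong = mag > th1
    candidate = mag > th0
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    rows, cols = mag.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    if candidate[rr, cc] and not keep[rr, cc]:
                        keep[rr, cc] = True       # grow the edge
                        queue.append((rr, cc))
    return np.where(keep, mag, 0.0)

mag = np.array([[0.9, 0.5, 0.1, 0.5],
                [0.0, 0.0, 0.0, 0.0]])
edges = hysteresis(mag, th0=0.3, th1=0.8)
```

The first 0.5 survives because it touches the strong pixel; the second 0.5 is above th0 but isolated from any strong pixel, so it is set to zero.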

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('monalisa.png')
>>> edges = img.canny()
Note:
  • Produces a zero image with single pixel wide edges having non-zero values.

  • Larger values correspond to stronger edges.

  • If th1 is zero then no hysteresis filtering is performed.

  • A color image is automatically converted to greyscale first.

References:
  • “A Computational Approach To Edge Detection”, J. Canny, IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

rank(footprint=None, h=None, rank=-1, border='replicate', bordervalue=0)[source]

Rank filter

Parameters:
  • footprint (ndarray(N,M), optional) – filter footprint or structuring element

  • h (int, optional) – half width of structuring element

  • rank (int, str) – rank of filter

  • border (str, optional) – option for boundary handling, defaults to ‘replicate’

  • bordervalue (scalar, optional) – padding value, defaults to 0

Returns:

rank filtered image

Return type:

Image

Return a rank filtered version of the image. Only pixels corresponding to non-zero elements of the structuring element are ranked, and the value at position rank in the ranked list becomes the corresponding output pixel value. The highest rank, the maximum, is rank 0. The rank can also be given as a string: ‘min|imum’, ‘max|imum’ or ‘med|ian’; both long and short versions are supported.

The structuring element is given as:

  • footprint a 2D Numpy array containing zero or one values, or

  • h, the half width of a \(w \times w\) array of ones, where \(w=2h+1\)
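
The ranking rule can be sketched in NumPy as follows. This is an illustrative reimplementation, assuming that the 'replicate' border corresponds to numpy.pad's 'edge' mode; rank_sketch is a hypothetical name and is not the toolbox's implementation.

```python
import numpy as np

def rank_sketch(image, h=1, rank=0, footprint=None):
    """Rank filter with replicated borders; rank 0 selects the maximum."""
    if footprint is None:
        footprint = np.ones((2 * h + 1, 2 * h + 1))
    mask = np.asarray(footprint) != 0
    hr, hc = mask.shape[0] // 2, mask.shape[1] // 2
    padded = np.pad(image, ((hr, hr), (hc, hc)), mode="edge")
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            win = padded[r:r + mask.shape[0], c:c + mask.shape[1]]
            ranked = np.sort(win[mask])[::-1]     # descending order
            out[r, c] = ranked[rank]
    return out

image = np.arange(25).reshape(5, 5)
maximum = rank_sketch(image, h=1, rank=0)
median = rank_sketch(image, h=1, rank=4)
minimum = rank_sketch(image, h=1, rank=8)
```

For the 5 x 5 ramp image used in the example for this method, the sketch gives the same first and last rows as the printed maximum, median and minimum results.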

Example:

>>> from machinevisiontoolbox import Image
>>> import numpy as np
>>> img = Image(np.arange(25).reshape((5,5)))
>>> img.print()
  0  1  2  3  4
  5  6  7  8  9
 10 11 12 13 14
 15 16 17 18 19
 20 21 22 23 24
>>> img.rank(h=1, rank=0).print()  # maximum filter
  6  7  8  9  9
 11 12 13 14 14
 16 17 18 19 19
 21 22 23 24 24
 21 22 23 24 24
>>> img.rank(h=1, rank=8).print()  # minimum filter
  0  0  1  2  3
  0  0  1  2  3
  5  5  6  7  8
 10 10 11 12 13
 15 15 16 17 18
>>> img.rank(h=1, rank=4).print()  # median filter
  1  2  3  4  4
  5  6  7  8  9
 10 11 12 13 14
 15 16 17 18 19
 20 20 21 22 23
>>> img.rank(h=1, rank='median').print()  # median filter
  1  2  3  4  4
  5  6  7  8  9
 10 11 12 13 14
 15 16 17 18 19
 20 20 21 22 23
Note:
  • The footprint should have an odd side length.

  • The input can be logical, uint8, uint16, float or double, the output is always double.

References:
  • Robotics, Vision & Control for Python, Section 11.5.3, P. Corke, Springer 2023.

Seealso:

scipy.ndimage.rank_filter

medianfilter(h=1, **kwargs)[source]

Median filter

Parameters:
  • h (int, optional) – half width of structuring element, defaults to 1

  • kwargs – options passed to rank

Returns:

median filtered image

Return type:

Image instance

Return the median filtered image. For every \(w \times w, w=2h+1\) window take the median value as the output pixel value.
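
A minimal NumPy sketch of the same idea, independent of the toolbox, is shown below. It assumes replicated borders and requires NumPy 1.20 or later for sliding_window_view; median_sketch is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_sketch(image, h=1):
    """Median over each (2h+1) x (2h+1) window, replicated borders."""
    w = 2 * h + 1
    padded = np.pad(image, h, mode="edge")
    windows = sliding_window_view(padded, (w, w))   # shape (rows, cols, w, w)
    return np.median(windows, axis=(2, 3))

image = np.full((5, 5), 10.0)
image[2, 2] = 255.0               # a single impulse ("salt") pixel
cleaned = median_sketch(image, h=1)
```

Every 3 x 3 window contains at most one impulse among nine values, so the median is 10 everywhere and the impulse disappears.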

Note:

This filter is effective for removing impulse (aka salt and pepper) noise.

References:
  • Robotics, Vision & Control for Python, Section 11.5.3, P. Corke, Springer 2023.

Seealso:

rank

distance_transform(invert=False, norm='L2', h=1)[source]

Distance transform

Parameters:
  • invert (bool, optional) – consider inverted image, defaults to False

  • norm (str, optional) – distance metric: ‘L1’ or ‘L2’ [default]

  • h (int, optional) – half width of window, defaults to 1

Returns:

distance transform of image

Return type:

Image

Compute the distance transform. For each zero input pixel, compute its distance to the nearest non-zero input pixel.
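
For comparison, an exact brute-force distance transform can be sketched in NumPy. This is not the sliding-window method the toolbox uses; distance_sketch is a hypothetical name and the approach is only practical for small images.

```python
import numpy as np

def distance_sketch(pixels, norm="L2"):
    """Exact distance from every pixel to the nearest non-zero pixel."""
    rows, cols = np.indices(pixels.shape)
    targets = np.argwhere(pixels != 0)        # (row, col) of non-zero pixels
    out = np.full(pixels.shape, np.inf)
    for tr, tc in targets:
        dr, dc = np.abs(rows - tr), np.abs(cols - tc)
        d = dr + dc if norm == "L1" else np.hypot(dr, dc)
        out = np.minimum(out, d)
    return out

pixels = np.zeros((5, 5))
pixels[2, 1:3] = 1
d1 = distance_sketch(pixels, norm="L1")
d2 = distance_sketch(pixels, norm="L2")
```

For the image used in the example for this method, the L1 result agrees with the printed values. The exact L2 distance at the top-left corner is the square root of 5 (about 2.236), slightly smaller than the 2.324 printed there, which is consistent with the toolbox result being an approximation.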

Example:

>>> from machinevisiontoolbox import Image
>>> import numpy as np
>>> pixels = np.zeros((5,5))
>>> pixels[2, 1:3] = 1
>>> img = Image(pixels)
>>> img.distance_transform().print(precision=3)
 2.324 1.910 1.910 2.324 2.739
 1.369 0.955 0.955 1.369 2.324
 0.955 0.000 0.000 0.955 1.910
 1.369 0.955 0.955 1.369 2.324
 2.324 1.910 1.910 2.324 2.739
>>> img.distance_transform(norm="L1").print()
 3.00 2.00 2.00 3.00 4.00
 2.00 1.00 1.00 2.00 3.00
 1.00 0.00 0.00 1.00 2.00
 2.00 1.00 1.00 2.00 3.00
 3.00 2.00 2.00 3.00 4.00
Note:
  • The output image is the same size as the input image.

  • Distance is computed using a sliding window and is an approximation of true distance.

  • For non-zero input pixels the corresponding output pixels are set to zero.

  • The signed-distance function is image.distance_transform() - image.distance_transform(invert=True)

References:
  • Robotics, Vision & Control for Python, Section 11.6.4, P. Corke, Springer 2023.

Seealso:

opencv.distanceTransform

labels_binary(connectivity=4, ltype='int32')[source]

Blob labelling

Parameters:
  • connectivity (int, optional) – number of neighbours used for connectivity: 4 [default] or 8

  • ltype (string, optional) – output image type: ‘int32’ [default], ‘uint16’

Returns:

label image, number of regions

Return type:

Image, int

Compute labels of connected components in the input greyscale or binary image. Regions are sets of contiguous pixels with the same value.

The method returns the label image and the number of labels N, so labels lie in the range [0, N-1]. The value in the label image is an integer indicating which region the corresponding input pixel belongs to. The background has label 0.
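
Region labelling by flood fill can be sketched as below. This is an illustrative algorithm, not the toolbox's implementation (the Seealso entry suggests it relies on OpenCV). label_sketch is a hypothetical name, and labels are assigned in raster order, so the background receives label 0 only when the top-left pixel is background.

```python
import numpy as np
from collections import deque

def label_sketch(image, connectivity=4):
    """Label regions of contiguous equal-valued pixels in raster order."""
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    rows, cols = image.shape
    labels = np.full(image.shape, -1, dtype=np.int32)   # -1 means unvisited
    n = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0, c0] >= 0:
                continue
            labels[r0, c0] = n                          # start a new region
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in steps:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr, cc] < 0
                            and image[rr, cc] == image[r0, c0]):
                        labels[rr, cc] = n
                        queue.append((rr, cc))
            n += 1
    return labels, n

chequer = np.array([[1, 0], [0, 1]])
_, n4 = label_sketch(chequer, connectivity=4)
_, n8 = label_sketch(chequer, connectivity=8)
```

On the 2 x 2 chequerboard, 4-way connectivity gives four regions while 8-way connectivity gives two, because diagonal neighbours of equal value are joined.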

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Squares(2, 15)
>>> img.print()
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> labels, N = img.labels_binary()
>>> N
5
>>> labels.print()
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 1 1 1 1 1 0 2 2 2 2 2 0 0
 0 0 1 1 1 1 1 0 2 2 2 2 2 0 0
 0 0 1 1 1 1 1 0 2 2 2 2 2 0 0
 0 0 1 1 1 1 1 0 2 2 2 2 2 0 0
 0 0 1 1 1 1 1 0 2 2 2 2 2 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 3 3 3 3 3 0 4 4 4 4 4 0 0
 0 0 3 3 3 3 3 0 4 4 4 4 4 0 0
 0 0 3 3 3 3 3 0 4 4 4 4 4 0 0
 0 0 3 3 3 3 3 0 4 4 4 4 4 0 0
 0 0 3 3 3 3 3 0 4 4 4 4 4 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Note:
  • This algorithm is variously known as region labelling, connectivity analysis, region coloring, connected component analysis, blob labelling.

  • The output image is the same size as the input image.

  • The input image can be binary or greyscale.

  • Connectivity is performed using 4 nearest neighbours by default.

  • 8-way connectivity introduces ambiguities: for example, a chequerboard is two blobs.

References:
  • Robotics, Vision & Control for Python, Section 12.1.2.1, P. Corke, Springer 2023.

Seealso:

blobs cv2.connectedComponents labels_graphseg labels_MSER

labels_MSER(**kwargs)[source]

Blob labelling using MSER

Parameters:

kwargs – arguments passed to MSER_create

Returns:

label image, number of regions

Return type:

Image, int

Compute labels of connected components in the input greyscale image. Regions are sets of contiguous pixels that form stable regions across a range of threshold values.

The method returns the label image and the number of labels N, so labels lie in the range [0, N-1]. The value in the label image is an integer indicating which region the corresponding input pixel belongs to. The background has label 0.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Squares(2, 15)
>>> img.print()
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 1 1 1 1 1 0 1 1 1 1 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> labels, N = img.labels_MSER()
>>> N
0
>>> labels.print()
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
References:
  • Linear time maximally stable extremal regions, David Nistér and Henrik Stewénius, In Computer Vision–ECCV 2008, pages 183–196. Springer, 2008.

  • Robotics, Vision & Control for Python, Section 12.1.2.2, P. Corke, Springer 2023.

Seealso:

labels_binary labels_graphseg blobs opencv.MSER_create

labels_graphseg(sigma=0.5, k=2000, minsize=100)[source]

Blob labelling using graph-based segmentation

Parameters:

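  • sigma (float, optional) – segmentation parameter, see opencv.createGraphSegmentation, defaults to 0.5

  • k (int, optional) – segmentation parameter, see opencv.createGraphSegmentation, defaults to 2000

  • minsize (int, optional) – segmentation parameter, see opencv.createGraphSegmentation, defaults to 100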

Returns:

label image, number of regions

Return type:

Image, int

Compute labels of connected components in the input color image. Regions are sets of contiguous pixels that are similar with respect to their surrounds.

The method returns the label image and the number of labels N, so labels lie in the range [0, N-1]. The value in the label image is an integer indicating which region the corresponding input pixel belongs to. The background has label 0.

References:
  • Efficient graph-based image segmentation, Pedro F Felzenszwalb and Daniel P Huttenlocher, volume 59, pages 167–181. Springer, 2004.

  • Robotics, Vision & Control for Python, Section 12.1.2.2, P. Corke, Springer 2023.

Seealso:

labels_binary labels_MSER blobs opencv.createGraphSegmentation

sad(image2)[source]

Sum of absolute differences

Parameters:

image2 (Image) – second image

Raises:

ValueError – image2 shape is not equal to self

Returns:

sum of absolute differences

Return type:

scalar

Returns a simple image dissimilarity measure which is the sum of absolute differences between the image and image2. The result is a scalar: a value of 0 indicates identical pixel patterns, and the value is increasingly positive as image dissimilarity increases.

Example:

>>> from machinevisiontoolbox import Image
>>> img1 = Image([[10, 11], [12, 13]])
>>> img2 = Image([[10, 11], [10, 13]])
>>> img1.sad(img2)
2
>>> img1.sad(img2+10)
986
>>> img1.sad(img2*2)
982
Note:

Not invariant to pixel value scale or offset.

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

zsad ssd ncc

ssd(image2)[source]

Sum of squared differences

Parameters:

image2 (Image) – second image

Raises:

ValueError – image2 shape is not equal to self

Returns:

sum of squared differences

Return type:

scalar

Returns a simple image dissimilarity measure which is the sum of the squared differences between the image and image2. The result is a scalar: a value of 0 indicates identical pixel patterns, and the value is increasingly positive as image dissimilarity increases.
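
The sad and ssd measures can be written directly on NumPy arrays. This sketch converts to floating point before subtracting, which is a choice made here and not necessarily what the toolbox does; the function names simply mirror the method names.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences, computed in floating point."""
    return np.abs(a.astype(float) - b.astype(float)).sum()

def ssd(a, b):
    """Sum of squared differences, computed in floating point."""
    return ((a.astype(float) - b.astype(float)) ** 2).sum()

img1 = np.array([[10, 11], [12, 13]], dtype=np.uint8)
img2 = np.array([[10, 11], [10, 13]], dtype=np.uint8)

print(sad(img1, img2))   # 2.0
print(ssd(img1, img2))   # 4.0
```

The two images differ in a single pixel by 2, so the measures are 2 and 4 respectively.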

Example:

>>> from machinevisiontoolbox import Image
>>> img1 = Image([[10, 11], [12, 13]])
>>> img2 = Image([[10, 11], [10, 13]])
>>> img1.ssd(img2)
4
>>> img1.ssd(img2+10)
364
>>> img1.ssd(img2*2)
454
Note:

Not invariant to pixel value scale or offset.

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

zssd sad ncc

ncc(image2)[source]

Normalised cross correlation

Parameters:

image2 (Image) – second image

Raises:

ValueError – image2 shape is not equal to self

Returns:

normalised cross correlation

Return type:

scalar

Returns an image similarity measure which is the normalized cross-correlation between the image and image2. The result is a scalar in the interval -1 (non match) to 1 (perfect match) that indicates similarity.
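
A floating-point sketch of the measure on NumPy arrays is given below. The formula is the usual definition of normalised cross-correlation and is assumed, not confirmed, to match the toolbox.

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation of two equally sized arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

img1 = np.array([[10, 11], [12, 13]])
img2 = np.array([[10, 11], [10, 13]])

similarity = ncc(img1, img2)           # close to, but less than, 1
scaled = ncc(img1, 2.0 * img2)         # unchanged by an intensity scale factor
```

Because the scale factor appears in both the numerator and the denominator, it cancels, which is the scale invariance this measure provides.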

Example:

>>> from machinevisiontoolbox import Image
>>> img1 = Image([[10, 11], [12, 13]])
>>> img2 = Image([[10, 11], [10, 13]])
>>> img1.ncc(img2)
0.9970145759536657
>>> img1.ncc(img2+10)
1.395820406335132
>>> img1.ncc(img2*2)
1.2678511640312011
Note:
  • The ncc similarity measure is invariant to scale changes in image intensity.

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

zncc sad ssd

zsad(image2)[source]

Zero-mean sum of absolute differences

Parameters:

image2 (Image) – second image

Raises:

ValueError – image2 shape is not equal to self

Returns:

zero-mean sum of absolute differences

Return type:

scalar

Returns a simple image dissimilarity measure which is the zero-mean sum of absolute differences between the image and image2. The result is a scalar: a value of 0 indicates identical pixel patterns (relative to their mean values), and the value is increasingly positive as image dissimilarity increases.

Example:

>>> from machinevisiontoolbox import Image
>>> img1 = Image([[10, 11], [12, 13]])
>>> img2 = Image([[10, 11], [10, 13]])
>>> img1.zsad(img2)
3.0
>>> img1.zsad(img2+10)
3.0
>>> img1.zsad(img2*2)
6.0
Note:
  • The zsad similarity measure is invariant to changes in image brightness offset.

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

sad ssd ncc

zssd(image2)[source]

Zero-mean sum of squared differences

Parameters:

image2 (Image) – second image

Raises:

ValueError – image2 shape is not equal to self

Returns:

zero-mean sum of squared differences

Return type:

scalar

Returns a simple image dissimilarity measure which is the zero-mean sum of the squared differences between the image and image2. The result is a scalar: a value of 0 indicates identical pixel patterns (relative to their mean values), and the value is increasingly positive as image dissimilarity increases.

Example:

>>> from machinevisiontoolbox import Image
>>> img1 = Image([[10, 11], [12, 13]])
>>> img2 = Image([[10, 11], [10, 13]])
>>> img1.zssd(img2)
3.0
>>> img1.zssd(img2+10)
3.0
>>> img1.zssd(img2*2)
13.0
Note:
  • The zssd similarity measure is invariant to changes in image brightness offset.

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

ssd sad ncc

zncc(image2)[source]

Zero-mean normalized cross correlation

Parameters:

image2 (Image) – second image

Raises:

ValueError – image2 shape is not equal to self

Returns:

zero-mean normalised cross correlation

Return type:

scalar

Returns an image similarity measure which is the zero-mean normalized cross-correlation between the image and image2. The result is a scalar in the interval -1 (non match) to 1 (perfect match) that indicates similarity.
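
The zero-mean variants subtract each image's mean before comparing. The sketch below uses the usual definitions, which reproduce the values printed in the zsad, zssd and zncc examples; the function names mirror the method names but this is not the toolbox's code.

```python
import numpy as np

def zero_mean(a):
    """Return the array as float with its mean removed."""
    a = np.asarray(a, dtype=float)
    return a - a.mean()

def zsad(a, b):
    return np.abs(zero_mean(a) - zero_mean(b)).sum()

def zssd(a, b):
    return ((zero_mean(a) - zero_mean(b)) ** 2).sum()

def zncc(a, b):
    a, b = zero_mean(a), zero_mean(b)
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

img1 = np.array([[10, 11], [12, 13]])
img2 = np.array([[10, 11], [10, 13]])

print(zsad(img1, img2), zsad(img1, img2 + 10))      # 3.0 3.0
print(zssd(img1, img2), zssd(img1, img2 * 2))       # 3.0 13.0
print(zncc(img1, img2), zncc(img1, 2 * img2 + 10))  # equal values
```

An offset leaves all three measures unchanged, while a scale factor leaves only zncc unchanged.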

Example:

>>> from machinevisiontoolbox import Image
>>> img1 = Image([[10, 11], [12, 13]])
>>> img2 = Image([[10, 11], [10, 13]])
>>> img1.zncc(img2)
0.7302967433402214
>>> img1.zncc(img2+10)
0.7302967433402214
>>> img1.zncc(img2*2)
0.7302967433402214
Note:
  • The zncc similarity measure is invariant to affine changes (offset and scale factor) in image intensity (brightness offset and scale).

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

ncc sad ssd

similarity(T, metric='zncc')[source]

Locate template in image

Parameters:
  • T (ndarray(N,M)) – template image

  • metric (str) – similarity metric, one of: ‘ssd’, ‘zssd’, ‘ncc’, ‘zncc’ [default]

Raises:
  • ValueError – template T must have odd dimensions

  • ValueError – bad metric specified

Returns:

similarity image

Return type:

Image instance

Compute a similarity image where each output pixel is the similarity of the template T to the same-sized neighbourhood surrounding the corresponding input pixel in the image.

Example:

>>> from machinevisiontoolbox import Image
>>> crowd = Image.Read("wheres-wally.png", mono=True, dtype="float")
>>> T = Image.Read("wally.png", mono=True, dtype="float")
>>> sim = crowd.similarity(T, "zncc")
>>> sim.disp(colormap="signed", colorbar=True);
<matplotlib.image.AxesImage object at 0x7f0c24bdc370>
Note:
  • For NCC and ZNCC the maximum similarity value corresponds to the most likely template location. For SSD and ZSSD the minimum value corresponds to the most likely location.

  • Similarity is not computed for those pixels where the template crosses the image boundary, and these output pixels are set to NaN.
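
Template matching with ZNCC can be sketched with explicit loops. This is a slow illustrative version on random data, not the toolbox's implementation (the Seealso entry points to cv2.matchTemplate); similarity_sketch is a hypothetical name.

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalised cross-correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return (a * b).sum() / denom if denom > 0 else np.nan

def similarity_sketch(image, T):
    """ZNCC of template T against every fully contained neighbourhood."""
    hr, hc = T.shape[0] // 2, T.shape[1] // 2
    out = np.full(image.shape, np.nan)     # NaN where the template overhangs
    for r in range(hr, image.shape[0] - hr):
        for c in range(hc, image.shape[1] - hc):
            patch = image[r - hr:r + hr + 1, c - hc:c + hc + 1]
            out[r, c] = zncc(patch, T)
    return out

rng = np.random.default_rng(0)
image = rng.random((12, 12))
T = image[4:7, 6:9].copy()     # 3x3 template cut from the image, centre (5, 7)

sim = similarity_sketch(image, T)
best = np.unravel_index(np.nanargmax(sim), sim.shape)
```

Here best is the location of the maximum ZNCC value, which is where the template was cut from; border pixels of sim remain NaN.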

References:
  • Robotics, Vision & Control for Python, Section 11.5.2, P. Corke, Springer 2023.

Seealso:

cv2.matchTemplate

Image kernels

These static methods of the Kernel class define standard image kernels.

class machinevisiontoolbox.ImageSpatial.Kernel[source]

Image processing kernel operations on the Image class

static Gauss(sigma, h=None)[source]

Gaussian kernel

Parameters:
  • sigma (float) – standard deviation of Gaussian kernel

  • h (integer, optional) – half width of the kernel

Returns:

Gaussian kernel

Return type:

ndarray(2h+1, 2h+1)

Return the 2-dimensional Gaussian kernel of standard deviation sigma:

\[\mathbf{K} = \frac{1}{2\pi \sigma^2} e^{-(u^2 + v^2) / 2 \sigma^2}\]

The kernel is centred within a square array with side length given by:

  • \(2 \mbox{ceil}(3 \sigma) + 1\) if h is not given, or

  • \(2 \mathtt{h} + 1\) if h is given
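
The formula and the side-length rule can be sketched in NumPy as follows. The final normalisation to unit sum is an assumption, made because it brings the values into agreement with the example output for this method. gauss_sketch is a hypothetical name.

```python
import math
import numpy as np

def gauss_sketch(sigma, h=None):
    """Gaussian kernel on a (2h+1) x (2h+1) grid centred on zero."""
    if h is None:
        h = math.ceil(3 * sigma)           # default half width
    u, v = np.meshgrid(np.arange(-h, h + 1), np.arange(-h, h + 1))
    K = np.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return K / K.sum()                     # assumed normalisation to unit volume

K = gauss_sketch(sigma=1, h=2)
print(K.shape)                    # (5, 5)
print(gauss_sketch(2).shape)      # (13, 13)
```

With sigma=1 and h=2 the centre element is about 0.1621, slightly larger than the unnormalised value of 1/(2π) ≈ 0.1592 because the truncated kernel is rescaled.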

Example:

>>> from machinevisiontoolbox import Kernel
>>> K = Kernel.Gauss(sigma=1, h=2)
>>> K.shape
(5, 5)
>>> K
array([[0.003 , 0.0133, 0.0219, 0.0133, 0.003 ],
       [0.0133, 0.0596, 0.0983, 0.0596, 0.0133],
       [0.0219, 0.0983, 0.1621, 0.0983, 0.0219],
       [0.0133, 0.0596, 0.0983, 0.0596, 0.0133],
       [0.003 , 0.0133, 0.0219, 0.0133, 0.003 ]])
>>> K = Kernel.Gauss(sigma=2)
>>> K.shape
(13, 13)
Note:
  • The volume under the Gaussian kernel is one.

  • If the kernel is strongly truncated, i.e. it is non-zero at the edges of the window, then the volume will be less than one.

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.1, P. Corke, Springer 2023.

Seealso:

DGauss

static Laplace()[source]

Laplacian kernel

Returns:

Laplacian kernel

Return type:

ndarray(3,3)

Return the Laplacian kernel:

\[\begin{split}\mathbf{K} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}\end{split}\]

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.Laplace()
array([[ 0,  1,  0],
       [ 1, -4,  1],
       [ 0,  1,  0]])
Note:
  • This kernel has an isotropic response to image gradient.

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

LoG zerocross

static Sobel()[source]

Sobel edge detector

Returns:

Sobel kernel

Return type:

ndarray(3,3)

Return the Sobel kernel for horizontal gradient:

\[\begin{split}\mathbf{K} = \frac{1}{8} \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}\end{split}\]

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.Sobel()
array([[ 0.125,  0.   , -0.125],
       [ 0.25 ,  0.   , -0.25 ],
       [ 0.125,  0.   , -0.125]])
Note:
  • This kernel is an effective vertical-edge detector.

  • The y-derivative (horizontal-edge) kernel is K.T.
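
The behaviour described in the notes can be checked on a synthetic step edge. This sketch uses a small hand-written convolution so that it does not depend on the toolbox; convolve_valid is a hypothetical helper that returns only the 'valid' region.

```python
import numpy as np

K = np.array([[1, 0, -1],
              [2, 0, -2],
              [1, 0, -1]]) / 8

def convolve_valid(image, kernel):
    """True 2D convolution (kernel flipped), 'valid' region only."""
    kr, kc = kernel.shape
    flipped = kernel[::-1, ::-1]
    out = np.empty((image.shape[0] - kr + 1, image.shape[1] - kc + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (image[r:r + kr, c:c + kc] * flipped).sum()
    return out

# Dark on the left, bright on the right: a vertical edge
vertical_edge = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))

Iu = convolve_valid(vertical_edge, K)      # responds to the vertical edge
Iv = convolve_valid(vertical_edge, K.T)    # no response, no horizontal edge
```

The horizontal-gradient kernel K gives 0.5 across the edge, while the transposed kernel gives zero because the image does not change from row to row.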

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

DGauss

static DoG(sigma1, sigma2=None, h=None)[source]

Difference of Gaussians kernel

Parameters:
  • sigma1 (float) – standard deviation of first Gaussian kernel

  • sigma2 (float, optional) – standard deviation of second Gaussian kernel

  • h (int, optional) – half-width of Gaussian kernel

Returns:

difference of Gaussian kernel

Return type:

ndarray(2h+1, 2h+1)

Return the 2-dimensional difference of Gaussian kernel defined by two standard deviation values:

\[\mathbf{K} = G(\sigma_1) - G(\sigma_2)\]

where \(\sigma_1 > \sigma_2\). By default, \(\sigma_2 = 1.6 \sigma_1\).

The kernel is centred within a square array with side length given by:

  • \(2 \mbox{ceil}(3 \sigma) + 1\) if h is not given, or

  • \(2\mathtt{h} + 1\) if h is given

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.DoG(1)
array([[ 0.0019,  0.0049,  0.0082,  0.0095,  0.0082,  0.0049,  0.0019],
       [ 0.0049,  0.0108,  0.0116,  0.0085,  0.0116,  0.0108,  0.0049],
       [ 0.0082,  0.0116, -0.0142, -0.0427, -0.0142,  0.0116,  0.0082],
       [ 0.0095,  0.0085, -0.0427, -0.0937, -0.0427,  0.0085,  0.0095],
       [ 0.0082,  0.0116, -0.0142, -0.0427, -0.0142,  0.0116,  0.0082],
       [ 0.0049,  0.0108,  0.0116,  0.0085,  0.0116,  0.0108,  0.0049],
       [ 0.0019,  0.0049,  0.0082,  0.0095,  0.0082,  0.0049,  0.0019]])
Note:
  • This kernel is similar to the Laplacian of Gaussian and is often used as an efficient approximation.

  • This is a “Mexican hat” shaped kernel
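
The example output above can be reproduced with a short NumPy sketch. It assumes each Gaussian is sampled on the integer grid and normalized to unit sum, and that the result is the wider Gaussian minus the narrower one; both are inferences from the printed values, and the helper `gauss_unit_sum` is hypothetical rather than a toolbox function.

```python
import numpy as np

def gauss_unit_sum(sigma, h):
    """Sampled 2D Gaussian on a (2h+1) x (2h+1) grid, scaled to sum to one."""
    w = np.arange(-h, h + 1)
    x, y = np.meshgrid(w, w)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return g / g.sum()

sigma1 = 1.0
sigma2 = 1.6 * sigma1                 # default ratio stated above
h = int(np.ceil(3 * sigma1))          # half-width 3, so a 7 x 7 array

# Wider Gaussian minus narrower Gaussian: negative centre, positive surround
dog = gauss_unit_sum(sigma2, h) - gauss_unit_sum(sigma1, h)

print(np.round(dog, 4))
```

Because both Gaussians sum to one, the difference sums to zero, so the kernel gives no response over a region of constant intensity.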

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

LoG Gauss

static LoG(sigma, h=None)[source]

Laplacian of Gaussian kernel

Parameters:
  • sigma (float) – standard deviation of the Gaussian kernel

  • h (int, optional) – half-width of kernel

Returns:

kernel

Return type:

ndarray(2h+1, 2h+1)

Return a 2-dimensional Laplacian of Gaussian kernel with standard deviation sigma

\[\mathbf{K} = \frac{1}{\pi \sigma^4} \left(\frac{u^2 + v^2}{2 \sigma^2} -1\right) e^{-(u^2 + v^2) / 2 \sigma^2}\]

The kernel is centred within a square array with side length given by:

  • \(2 \mbox{ceil}(3 \sigma) + 1\) if h is not given, or

  • \(2\mathtt{h} + 1\) if h is given

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.LoG(1)
array([[ 0.0005,  0.0028,  0.0087,  0.0125,  0.0087,  0.0028,  0.0005],
       [ 0.0028,  0.0177,  0.0394,  0.0432,  0.0394,  0.0177,  0.0028],
       [ 0.0087,  0.0394,  0.0002, -0.0964,  0.0002,  0.0394,  0.0087],
       [ 0.0125,  0.0432, -0.0964, -0.3181, -0.0964,  0.0432,  0.0125],
       [ 0.0087,  0.0394,  0.0002, -0.0964,  0.0002,  0.0394,  0.0087],
       [ 0.0028,  0.0177,  0.0394,  0.0432,  0.0394,  0.0177,  0.0028],
       [ 0.0005,  0.0028,  0.0087,  0.0125,  0.0087,  0.0028,  0.0005]])
Note:
  • This is the classic “Mexican hat” shaped kernel.
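
The formula above can be evaluated directly on the integer grid, as in the following NumPy sketch. The raw formula gives \(-1/\pi \approx -0.3183\) at the centre for sigma = 1, whereas the printed example shows -0.3181; subtracting the mean of the sampled kernel accounts for that difference. The mean subtraction is an inference from the printed values, not something the text above states.

```python
import numpy as np

sigma = 1.0
h = int(np.ceil(3 * sigma))           # half-width 3, so a 7 x 7 array
w = np.arange(-h, h + 1)
u, v = np.meshgrid(w, w)
r2 = u**2 + v**2

# Direct evaluation of the Laplacian of Gaussian formula
log_raw = (r2 / (2 * sigma**2) - 1) * np.exp(-r2 / (2 * sigma**2)) / (np.pi * sigma**4)

# Assumption: remove the mean so the sampled kernel sums exactly to zero
log_zero_mean = log_raw - log_raw.mean()

print(np.round(log_zero_mean, 4))
```

A zero-sum kernel gives zero response over a region of constant intensity, which is the usual motivation for such a correction on a truncated grid.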

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

Laplace DoG Gauss zerocross

static DGauss(sigma, h=None)[source]

Derivative of Gaussian kernel

Parameters:
  • sigma (float) – standard deviation of the Gaussian kernel

  • h (int, optional) – half-width of kernel

Returns:

kernel

Return type:

ndarray(2h+1, 2h+1)

Returns a 2-dimensional derivative of Gaussian kernel with standard deviation sigma

\[\mathbf{K} = \frac{-x}{2\pi \sigma^2} e^{-(x^2 + y^2) / 2 \sigma^2}\]

The kernel is centred within a square array with side length given by:

  • \(2 \mbox{ceil}(3 \sigma) + 1\) if h is not given, or

  • \(2\mathtt{h} + 1\) if h is given

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.DGauss(1)
array([[ 0.0001,  0.0005,  0.0011, -0.    , -0.0011, -0.0005, -0.0001],
       [ 0.0007,  0.0058,  0.0131, -0.    , -0.0131, -0.0058, -0.0007],
       [ 0.0032,  0.0261,  0.0585, -0.    , -0.0585, -0.0261, -0.0032],
       [ 0.0053,  0.0431,  0.0965, -0.    , -0.0965, -0.0431, -0.0053],
       [ 0.0032,  0.0261,  0.0585, -0.    , -0.0585, -0.0261, -0.0032],
       [ 0.0007,  0.0058,  0.0131, -0.    , -0.0131, -0.0058, -0.0007],
       [ 0.0001,  0.0005,  0.0011, -0.    , -0.0011, -0.0005, -0.0001]])
Note:
  • This kernel is the horizontal derivative of the Gaussian, \(dG/dx\).

  • The vertical derivative, \(dG/dy\), is the transpose of this kernel.

  • This kernel is an effective edge detector.
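
The formula and the first two notes can be illustrated with a NumPy sketch that evaluates the expression exactly as written above on the integer grid. It assumes x increases along the columns and y along the rows, which matches the layout of the printed example.

```python
import numpy as np

sigma = 1.0
h = int(np.ceil(3 * sigma))           # half-width 3, so a 7 x 7 array
w = np.arange(-h, h + 1)
x, y = np.meshgrid(w, w)              # x varies along columns, y along rows

# Horizontal derivative kernel, evaluated from the formula as printed
dgx = -x / (2 * np.pi * sigma**2) * np.exp(-(x**2 + y**2) / (2 * sigma**2))

# Vertical derivative kernel is the transpose
dgy = dgx.T

print(np.round(dgx, 4))
```

The centre column of `dgx` is zero and the kernel is antisymmetric about it, so it responds to intensity changes from left to right and ignores constant regions.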

References:
  • Robotics, Vision & Control for Python, Section 11.5.1.3, P. Corke, Springer 2023.

Seealso:

Gauss Sobel

static Circle(radius, h=None, normalize=False, dtype='uint8')[source]

Circular structuring element

Parameters:
  • radius (scalar, array_like(2)) – radius of circular structuring element

  • h (int, optional) – half-width of kernel

  • normalize (bool, optional) – normalize volume of kernel to one, defaults to False

  • dtype (str or NumPy dtype, optional) – data type of the returned kernel, defaults to uint8

Returns:

circular kernel

Return type:

ndarray(2h+1, 2h+1)

Returns a circular kernel of radius radius pixels, sometimes referred to as a tophat kernel. Values inside the circle are set to one and values outside are set to zero.

If radius is a 2-element vector the result is an annulus of ones, and the two numbers are interpreted as inner and outer radii respectively.

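The kernel is centred within a square array with side length given by \(2\mathtt{h} + 1\). If h is not given, the array is just large enough to hold the circle, as in the examples below where outer radii of 2 and 3 give 5 × 5 and 7 × 7 arrays.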

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.Circle(2)
array([[0, 0, 1, 0, 0],
       [0, 1, 1, 1, 0],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 0, 0]], dtype=uint8)
>>> Kernel.Circle([2, 3])
array([[0, 0, 0, 1, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 0, 0, 0, 1, 0],
       [1, 1, 0, 0, 0, 1, 1],
       [0, 1, 0, 0, 0, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 1, 0, 0, 0]], dtype=uint8)
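
Both example outputs above are consistent with a simple distance test on the integer grid, sketched below with NumPy. The function `disc` is a hypothetical stand-in written for this sketch, not the toolbox implementation, and the choice of inclusive bounds on the radius is an assumption that happens to match the printed arrays.

```python
import numpy as np

def disc(radius, h=None):
    """Disc (scalar radius) or annulus (inner, outer) of ones; illustrative only."""
    rmin, rmax = (0, radius) if np.isscalar(radius) else radius
    if h is None:
        h = int(np.ceil(rmax))        # array just large enough for the outer radius
    w = np.arange(-h, h + 1)
    x, y = np.meshgrid(w, w)
    r2 = x**2 + y**2                  # squared distance from the centre pixel
    return ((r2 >= rmin**2) & (r2 <= rmax**2)).astype(np.uint8)

print(disc(2))
print(disc([2, 3]))
```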
References:
  • Robotics, Vision & Control for Python, Section 11.5.1.1, P. Corke, Springer 2023.

Seealso:

Box

static Box(h, normalize=True)[source]

Square structuring element

Parameters:
  • h (int) – half-width of kernel

  • normalize (bool, optional) – normalize volume of kernel to one, defaults to True

Returns:

kernel

Return type:

ndarray(2h+1, 2h+1)

Returns a square kernel of ones. If normalize is True (the default) the kernel is scaled to have unit volume.

The kernel is centred within a square array with side length given by \(2\mathtt{h} + 1\).

Example:

>>> from machinevisiontoolbox import Kernel
>>> Kernel.Box(2)
array([[0.04, 0.04, 0.04, 0.04, 0.04],
       [0.04, 0.04, 0.04, 0.04, 0.04],
       [0.04, 0.04, 0.04, 0.04, 0.04],
       [0.04, 0.04, 0.04, 0.04, 0.04],
       [0.04, 0.04, 0.04, 0.04, 0.04]])
>>> Kernel.Box(2, normalize=False)
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
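
The effect of normalization can be seen in a small NumPy sketch: a box kernel with unit volume computes the mean of the pixels it covers. The arrays below are built by hand to mirror the two printed outputs, and the 5 × 5 patch is made-up illustrative data.

```python
import numpy as np

h = 2
side = 2 * h + 1                      # 5 x 5 kernel

box = np.ones((side, side))           # corresponds to normalize=False
box_unit = box / box.sum()            # corresponds to normalize=True, each entry 1/25

# Illustrative image patch the same size as the kernel
patch = np.arange(25, dtype=float).reshape(5, 5)

# Weighted sum under the unit-volume kernel equals the patch mean
local_mean = np.sum(patch * box_unit)

print(box_unit[0, 0], local_mean, patch.mean())
```

With the unnormalized kernel the same weighted sum would instead be the total of the pixel values, 25 times larger here.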
References:
  • Robotics, Vision & Control for Python, Section 11.5.1.1, P. Corke, Springer 2023.

Seealso:

Circle