Introduction#
Rationale#
The Machine Vision Toolbox (MVTB) brings professional-grade vision algorithms to your fingertips with a Pythonic API. Built on a “best-of-breed” foundation: NumPy, SciPy, OpenCV, and Open3D — it bridges the gap between raw pixel manipulation and high-level spatial reasoning.
While the Python ecosystem offers powerful individual tools, using them in concert is often complex. Libraries like OpenCV, scikit-image and Open3D provide excellent algorithms, but they frequently differ in API style, coordinate conventions, and expected data types. MVTB unifies these capabilities into a consistent, object-oriented framework. By treating images and cameras as first-class objects rather than just raw arrays, the Toolbox allows you to focus on the geometry and logic of your vision system rather than the boilerplate of library integration.
For example, to read an image using OpenCV, smooth it, and display it is:
import cv2
import numpy
# read image
src = cv2.imread(".../flowers1.png", cv2.IMREAD_UNCHANGED)
# apply Gaussian blur on src image
dst = cv2.GaussianBlur(src, (5,5), cv2.BORDER_DEFAULT)
# display input and output image
cv2.imshow("Gaussian Smoothing",numpy.hstack((src, dst)))
cv2.waitKey(0) # waits until a key is pressed
cv2.destroyAllWindows() # destroys the window showing image
Using this Toolbox we would write instead:
from machinevisiontoolbox import Image
img = Image.Read("flowers1.png") # read the image
smooth = img.smooth(h=2) # apply a Gaussian blur
smooth.disp(block=True) # display and block until window dismissed
or even:
from machinevisiontoolbox import Image
img = Image.Read("flowers1.png").smooth(h=2).disp(block=True)
which exploits the power of Python’s method chaining – allowing a processing pipeline to be expressed in a single line of very readable code with functions applied in a logical left-to-right order.
It also supports operator overloading so we could display the difference between the original and smoothed image using:
(img - img.smooth(h=2)).disp(block=True)
While the merits (or demerits) of these different approaches is subjective, you get the idea that the Toolbox allows
succinct coding without the need for lots of OpenCV flags like cv2.IMREAD_UNCHANGED in the example above.
In summary, the Machine Vision Toolbox for Python (MVTB-P):
Pythonic OpenCV Wrapper: Treats images as first-class objects with intuitive operator overloading and transparent BGR/float32 conversion.
100+ Vision Tools: Comprehensive suite covering image acquisition, feature extraction (blobs, points, lines), morphology, and camera calibration.
Optimized Performance: Seamlessly integrates NumPy and OpenCV for maximum efficiency and portability.
Scientifically Grounded: Robust support for photometry, photogrammetry, and colorimetry; the official companion to *Robotics, Vision & Control*.
Engineered for Control: Purpose-built for vision-based control, including visual Jacobians, homographies, and visual servoing.
Modern Successor: The Python-native evolution of the long-standing Machine Vision Toolbox for MATLAB.
To get started and install the Toolbox, see Installing the Toolbox.
Image objects#
The key element of the Toolbox is the Image class and full details are given in Image object.
class.
Firstly, there are lots of ways to create an image. We can read an image from a file:
img = Image.Read("street.png")
or create it from code:
img = Image.Zeros(100, dtype="uint8")
Under the hood the Image object contains some image parameters, a lot
of methods, and a reference to a 2D or 3D NumPy ndarray containing the pixel data.
An image object has a lot of useful attributes that describe the image, including:
img.width, the width of the image in pixelsimg.height, the height of the image in pixelsimg.size, the size of the image (width, height) in pixelsimg.nplanes, the number of planes in the image
as well as a number of useful predicates including:
img.iscolor, is the image multichannel?img.ismono, is the image single channel?img.isfloat, does the image have floating point pixels?
Note
Image object methods generally consider pixel coordinates with the horizontal coordinate
first and the vertical coordinate second – consistent with the way we write about
algorithms but the opposite to the way that NumPy indexes an array.
We can slice the image using the same syntax as for a NumPy array:
img[10:20, 30:40]
but only for reading, not for assignment. The result is another Image object.
Accessing the pixel array#
We can access the array of pixel values by
the array attribute:
np.mean(img.array)
where array is a read-only view of the pixel data, so we cannot modify the pixel
values directly through this reference. If we need to modify the image, we should create
a copy of it first, and then modify the copy.
Multi-plane images#
Color images are handled a bit more sensibly than raw OpenCV. A multi-channel or multi-plane image is a NumPy ndarray with an arbitrary number of planes and a dictionary that maps channel names to an integer index. For instance, to create multi-plane images we can write any of the following:
img = Image.Zeros(100, colororder="RGB")
img = Image.Zeros(100, colororder="XYZ")
img = Image.Zeros(100, colororder="red:green:blue")
img = Image.Zeros(100, colororder="PQRST") # 5 channel image
which create 100x100 images with 3, 3, 3 and 5 planes respectively, with all pixel values set to zero. Rather than have the meaning of the plane implicit (ie. plane 0 is red), it is explicit, for example:
img.plane("R")
img.plane("Y")
img.plane("blue")
A more common example is to read a color image:
img = Image.Read("flowers1.png")
img.red().disp() # display the red plane of the image, whether RGB or BGR format
img.colorspace("hsv").plane("h").disp() # display the hue plane of an HSV image
Image iterators#
Frequently we want to use images that form a seqeuence – consecutive frames from a camera or a video file, a web camera, image files in a folder or zip file. Rather than build this capability into the Image object we provide a number of iterator objects:
for img in FileArchive("holidaypix.zip"):
# process the image
Iterators exist for folders of images, all manner of compressed archives, video files, local cameras, remote web cameras, ROS bag files, ROS topics, and more. See Image and PointCloud sources for details.
Examples#
Binary blobs#
We load a binary image of two sharks and find the blobs in the image. We then display the image with the blobs marked by bounding boxes and centroids.
import machinevisiontoolbox as mvtb
im = mvtb.Image("shark2.png") # read a binary image of two sharks
im.disp(); # display it with interactive viewing tool
blobs = im.blobs() # find all the white blobs
print(blobs)
which will display:
┌───┬────────┬──────────────┬──────────┬───────┬───────┬─────────────┬────────┬────────┐
│id │ parent │ centroid │ area │ touch │ perim │ circularity │ orient │ aspect │
├───┼────────┼──────────────┼──────────┼───────┼───────┼─────────────┼────────┼────────┤
│ 0 │ -1 │ 371.2, 355.2 │ 7.59e+03 │ False │ 557.6 │ 0.341 │ 82.9° │ 0.976 │
│ 1 │ -1 │ 171.2, 155.2 │ 7.59e+03 │ False │ 557.6 │ 0.341 │ 82.9° │ 0.976 │
└───┴────────┴──────────────┴──────────┴───────┴───────┴─────────────┴────────┴────────┘
blobs.plot_box(color='g') # put a green bounding box on each blob
blobs.plot_centroid(label=True) # put a circle+cross on the centroid of each blob
A huge number of blob features are computed, including: the centroid, area, perimeter, bounding box, equivalent ellipse (radii, aspect ratio, orientation, axis-aligned bounding box), circularity, minimal enclosing circle and rectangle, convex hull, etc.
Binary blob hierarchy#
The blobs are also organized into a hierarchy based on their nesting. For example, if we load a binary image with nested objects
im = mvtb.Image("multiblobs.png")
im.disp()
f = im.blobs()
print(f)
which will display:
┌───┬────────┬───────────────┬──────────┬───────┬────────┬─────────────┬────────┬────────┐
│id │ parent │ centroid │ area │ touch │ perim │ circularity │ orient │ aspect │
├───┼────────┼───────────────┼──────────┼───────┼────────┼─────────────┼────────┼────────┤
│ 0 │ 1 │ 898.8, 725.3 │ 1.65e+05 │ False │ 2220.0 │ 0.467 │ 86.7° │ 0.754 │
│ 1 │ 2 │ 1025.0, 813.7 │ 1.06e+05 │ False │ 1387.9 │ 0.769 │ -88.9° │ 0.739 │
│ 2 │ -1 │ 938.1, 855.2 │ 1.72e+04 │ False │ 490.7 │ 1.001 │ 88.7° │ 0.862 │
│ 3 │ -1 │ 988.1, 697.2 │ 1.21e+04 │ False │ 412.5 │ 0.994 │ -87.8° │ 0.809 │
│ 4 │ -1 │ 846.0, 511.7 │ 1.75e+04 │ False │ 496.9 │ 0.992 │ -90.0° │ 0.778 │
│ 5 │ 6 │ 291.7, 377.8 │ 1.7e+05 │ False │ 1712.6 │ 0.810 │ -85.3° │ 0.767 │
│ 6 │ -1 │ 312.7, 472.1 │ 1.75e+04 │ False │ 495.5 │ 0.997 │ -89.9° │ 0.777 │
│ 7 │ -1 │ 241.9, 245.0 │ 1.75e+04 │ False │ 496.9 │ 0.992 │ -90.0° │ 0.777 │
│ 8 │ 9 │ 1228.0, 254.3 │ 8.14e+04 │ False │ 1215.2 │ 0.771 │ -77.2° │ 0.713 │
│ 9 │ -1 │ 1225.2, 220.0 │ 1.75e+04 │ False │ 496.9 │ 0.992 │ -90.0° │ 0.777 │
└───┴────────┴───────────────┴──────────┴───────┴────────┴─────────────┴────────┴────────┘
We can display a label image, where the value of each pixel is the label of the blob that the pixel belongs to
out = f.labelImage(im)
out.stats
out.disp(block=True, colormap="jet", cbar=True, vrange=[0,len(f)-1])
and request the blob label image which we then display
Camera modelling#
Cameras are key to machine vision and vision-based robot control and navigation so it important to be able to model them.
We can model a standard central-perspective camera using its known parameters:
camera = mvtb.CentralCamera(f=0.015, rho=10e-6, imagesize=[1280, 1024], pp=[640, 512], name="mycamera")
print(camera)
Name: mycamera [CentralCamera]
focal length: (array([0.015]), array([0.015]))
pixel size: 1e-05 x 1e-05
principal pt: (640.0, 512.0)
image size: 1280.0 x 1024.0
focal length: (array([0.015]), array([0.015]))
pose: t = 0, 0, 0; rpy/zyx = 0°, 0°, 0°
and its intrinsic parameters are
print(cam.K)
[[1.50e+03 0.00e+00 6.40e+02]
[0.00e+00 1.50e+03 5.12e+02]
[0.00e+00 0.00e+00 1.00e+00]]
We can define an arbitrary point in the world
P = [0.3, 0.4, 3.0]
and then project it into the camera
p = cam.project(P)
[790. 712.]
which is the corresponding coordinate in pixels. If we shift the camera slightly the image plane coordinate will also change
p = cam.project(P, T=SE3(0.1, 0, 0) )
print(p)
[740. 712.]
where SE3 is a class from the SpatialMath package representing a 3D rigid body transformation, and the T argument specifies the pose of the camera relative to the world frame in which the point is defined. By default the camera is at the origin of the world frame, so if we do not specify T then it is assumed to be the identity transformation.
We can define an edge-based cube model and project it into the camera’s image plane
>>> X, Y, Z = mkcube(0.2, pose=SE3(0, 0, 1), edge=True)
>>> cam.mesh(X, Y, Z)
The Toolbox provides a number of other camera models, including fisheye cameras, catadioptric cameras, and ideal spherical camera. All camera classes inherit from a common base class, so they have a consistent interface for projecting points from the world into the image.
Color space#
Most real-world images are color images, and the Toolbox provides a number of methods for working with color images, including conversion between different color spaces, and plotting colors in a chromaticity space.
Plot the CIE chromaticity space
>>> showcolorspace("xy")
Load the spectrum of sunlight at the Earth’s surface and compute the CIE xy chromaticity coordinates
>>> nm = 1e-9
>>> lam = np.linspace(400, 701, 5) * nm # visible light
>>> sun_at_ground = loadspectrum(lam, 'solar')
>>> xy = lambda2xy(lam, sun_at_ground)
>>> print(xy)
[[0.33272798 0.3454013 ]]