Region features

These methods extract features such as homogeneous regions, text and fiducials from the image.

class machinevisiontoolbox.ImageRegionFeatures.ImageRegionFeaturesMixin[source]

MSER(**kwargs)[source]

Find MSER features in image

Parameters: kwargs – arguments passed to cv2.MSER_create
Returns: set of MSER features
Return type: MSERFeature

Find all the maximally stable extremal regions in the image and return an object that represents the MSERs found. The object behaves like a list and can be indexed, sliced and used as an iterator in for loops and comprehensions.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> mser = img.MSER()
>>> len(mser)  # number of features
899
>>> mser[:5].bbox
array([[   1,    4,  145,   95],
       [   1,  184,  182,  274],
       [1243,  179, 1279,  258],
       [1243,  179, 1279,  258],
       [1242,  178, 1279,  258]], dtype=int32)

References

Robotics, Vision & Control for Python, Section 12.1.1.2, P. Corke, Springer 2023.

Seealso

MSERFeature, cv2.MSER_create

ocr(minconf=50, plot=False)[source]

Optical character recognition

Parameters

minconf (int, optional) – minimum confidence value for text to be returned or plotted (percentage), defaults to 50
plot (bool, optional) – overlay detected text on the current plot, assumed to be the image, defaults to False

Returns

detected strings and metadata

Return type

list of OCRWord

Each recognized text string is described by an OCRWord instance that contains the string, confidence and bounding box within the image.
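
As an illustration of how the returned list might be post-processed, the sketch below keeps only the words at or above a confidence threshold and joins their text. `StubWord` is a hypothetical stand-in that models only the documented `text` and `conf` properties, and the sample values are made up; with the toolbox installed, the list returned by ocr() would take its place.

```python
from collections import namedtuple

# Hypothetical stand-in for OCRWord: only the documented `text` and `conf`
# properties are modelled, and the values below are invented for the demo.
StubWord = namedtuple("StubWord", ["text", "conf"])


def confident_text(words, minconf=50):
    """Join the text of all words whose confidence is at least minconf."""
    return " ".join(w.text for w in words if w.conf >= minconf)


words = [StubWord("STOP", 96), StubWord("s1gn", 31), StubWord("AHEAD", 88)]
print(confident_text(words, minconf=50))
```

Filtering after the call is independent of the minconf argument, which already applies a threshold inside ocr().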

Warning

PyTesseract must be installed.

References

Robotics, Vision & Control for Python, Section 12.4.1, P. Corke, Springer 2023.

Seealso

OCRWord

fiducial(dict='4x4_1000', K=None, side=None)[source]

Find fiducial markers in image

Parameters

dict (str, optional) – marker type, defaults to “4x4_1000”
K (ndarray(3,3), optional) – camera intrinsics, defaults to None
side (float, optional) – side length of the marker, defaults to None

Returns

markers found in image

Return type

list of Fiducial instances

Find ArUco or AprilTag markers in the scene and return a list of Fiducial objects, one per marker. If camera intrinsics are provided then also compute the marker pose with respect to the camera.

dict specifies the marker family or dictionary and describes the number of bits in the tag and the number of usable unique tags.

dict	tag type	marker size	number of unique tags
`4x4_50`	Aruco	4x4	50
`4x4_100`	Aruco	4x4	100
`4x4_250`	Aruco	4x4	250
`4x4_1000`	Aruco	4x4	1000
`5x5_50`	Aruco	5x5	50
`5x5_100`	Aruco	5x5	100
`5x5_250`	Aruco	5x5	250
`5x5_1000`	Aruco	5x5	1000
`6x6_50`	Aruco	6x6	50
`6x6_100`	Aruco	6x6	100
`6x6_250`	Aruco	6x6	250
`6x6_1000`	Aruco	6x6	1000
`7x7_50`	Aruco	7x7	50
`7x7_100`	Aruco	7x7	100
`7x7_250`	Aruco	7x7	250
`7x7_1000`	Aruco	7x7	1000
`original`	Aruco	?	?
`16h5`	AprilTag	4x4	30
`25h9`	AprilTag	5x5	35
`36h10`	AprilTag	6x6	?
`36h11`	AprilTag	6x6	587
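
The names in the table follow two patterns, which the sketch below decodes. `parse_dict_name` is a hypothetical helper, not part of the toolbox. It assumes that ArUco names read as grid size followed by tag count, and that AprilTag names read as bit count followed by minimum Hamming distance, so the grid side is the square root of the bit count. The number of unique AprilTags is not encoded in the name and is returned as None.

```python
import math
import re


def parse_dict_name(name):
    """Split a dictionary name into (tag type, grid size, tag count).

    Hypothetical helper for illustration. Fields that the name alone does
    not encode are returned as None; unrecognised names return None.
    """
    m = re.fullmatch(r"(\d+)x(\d+)_(\d+)", name)
    if m:  # ArUco style: "<rows>x<cols>_<count>"
        rows, cols, count = map(int, m.groups())
        return ("ArUco", (rows, cols), count)
    m = re.fullmatch(r"(\d+)h(\d+)", name)
    if m:  # AprilTag style: "<bits>h<minimum Hamming distance>"
        side = math.isqrt(int(m.group(1)))
        return ("AprilTag", (side, side), None)
    return None


for name in ("4x4_1000", "36h11", "original"):
    print(name, parse_dict_name(name))
```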

Note

side is the dimension of the square that contains the small white squares inside the black background.

References

Robotics, Vision & Control for Python, Section 13.6.1, P. Corke, Springer 2023.

Seealso

Fiducial

Region feature classes

class machinevisiontoolbox.ImageRegionFeatures.MSERFeature(image=None, **kwargs)[source]

Find MSERs

Parameters

image (Image) – input image
kwargs – parameters passed to cv2.MSER_create

Find all the maximally stable extremal regions in the image and return an object that represents the MSERs found. This class behaves like a list and each MSER is an element of the list.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('shark2.png')
>>> msers = img.MSER()
>>> len(msers)
2
>>> msers[0]
MSER features, 2 regions
>>> msers.bbox
array([[299, 300, 445, 408],
       [ 99, 100, 245, 208]], dtype=int32)

References

J. Matas, O. Chum, M. Urban, and T. Pajdla. “Robust wide baseline stereo from maximally stable extremal regions.” Proc. of British Machine Vision Conference, pages 384-396, 2002.
Robotics, Vision & Control for Python, Section 12.1.2.2, P. Corke, Springer 2023.

Seealso

bbox points

__len__()[source]

Number of MSER features

Returns: number of features
Return type: int

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> mser = img.MSER()
>>> len(mser)  # number of features
899

Seealso: __getitem__

__getitem__(i)[source]

Get MSERs from MSER feature object

Parameters: i (int or slice) – index
Raises: IndexError – index out of range
Returns: subset of MSER features
Return type: MSERFeature instance

This method allows an MSERFeature object to be indexed, sliced or iterated.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> mser = img.MSER()
>>> len(mser)  # number of features
899
>>> mser[:5]   # first 5 MSER features
MSER features, 5 regions
>>> mser[::50]  # every 50th MSER feature
MSER features, 18 regions

Seealso: __len__
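
The list-like behaviour described above can be sketched with a toy container. `RegionList` is a hypothetical stand-in, not the toolbox implementation. It shows how defining `__len__` and `__getitem__` is enough for indexing, slicing and iteration, and why slicing 899 regions with a step of 50 yields 18.

```python
class RegionList:
    """Toy list-like container; a hypothetical stand-in for MSERFeature."""

    def __init__(self, regions):
        self._regions = list(regions)

    def __len__(self):
        return len(self._regions)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # a slice yields a new container holding the selected regions
            return RegionList(self._regions[i])
        # an int yields a container holding one region; the underlying list
        # raises IndexError when i is out of range, which also ends iteration
        return RegionList([self._regions[i]])

    def __str__(self):
        return f"{len(self)} regions"


regions = RegionList(range(899))
print(regions[:5])                 # first 5 regions
print(regions[::50])               # indices 0, 50, ..., 850
print(sum(1 for _ in regions))     # iteration visits every region once
```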

__str__()[source]

String representation of MSER

Returns: Brief readable description of MSER
Return type: str

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> msers = img.MSER()
>>> str(msers)
'MSER features, 899 regions'
>>> str(msers[0])
'MSER features, 2 regions'

property points

Points belonging to MSERs

Returns: Coordinates of points in (u,v) format that belong to MSER
Return type: ndarray(2,N), list of ndarray(2,N)

If the object contains just one region the result is an array, otherwise it is a list of arrays (with different numbers of columns).

Example:

>>> from machinevisiontoolbox import Image
>>> import numpy as np
>>> img = Image.Read("castle.png")
>>> msers = img.MSER()
>>> np.set_printoptions(threshold=10)
>>> msers[0].points
array([[ 9, 10, 11, ...,  8,  9, 10],
       [ 5,  5,  5, ...,  5,  4,  4]], dtype=int32)
>>> msers[2:4].points
[array([[1249, 1249, 1249, ..., 1245, 1251, 1246],
       [ 221,  220,  222, ...,  232,  181,  242]], dtype=int32), array([[1249, 1249, 1249, ..., 1250, 1244, 1255],
       [ 221,  220,  222, ...,  181,  203,  257]], dtype=int32)]

Seealso: bbox
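
Because row 0 holds the u-coordinates and row 1 the v-coordinates, a point set unpacks directly into two sequences. The sketch below computes a region centroid this way. `centroid` is a hypothetical helper, and the sample data is just the first three columns of the msers[0].points output shown above.

```python
def centroid(points):
    """Mean (u, v) of a 2xN point set: row 0 holds u, row 1 holds v."""
    u, v = points  # works for nested lists and for a 2xN array alike
    n = len(u)
    return sum(u) / n, sum(v) / n


# first three columns of the msers[0].points example output
points = [[9, 10, 11],
          [5, 5, 5]]
print(centroid(points))
```

For a multi-region result, which is a list of such arrays, the helper would be applied to each element in turn.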

property bbox

Bounding boxes of MSERs

Returns: Bounding box of MSER in [umin, vmin, umax, vmax] format
Return type: ndarray(4) or ndarray(N,4)

If the object contains just one region the result is a 1D array, otherwise it is a 2D array with one row per bounding box.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> msers = img.MSER()
>>> msers[0].bbox
array([  1,   4, 145,  95], dtype=int32)
>>> msers[:4].bbox
array([[   1,    4,  145,   95],
       [   1,  184,  182,  274],
       [1243,  179, 1279,  258],
       [1243,  179, 1279,  258]], dtype=int32)

Seealso: points
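
Since the result is 1D for a single region and 2D for several, code that consumes bbox may want to normalise the shape first. The sketch below does that and derives each box's size from the [umin, vmin, umax, vmax] layout. Both helpers are hypothetical, and the values are taken from the example output above.

```python
def as_rows(bbox):
    """Return boxes as a list of 4-element rows, given one box or several."""
    first = bbox[0]
    # several boxes arrive as nested rows, a single box as a flat sequence
    if hasattr(first, "__len__"):
        return [list(row) for row in bbox]
    return [list(bbox)]


def bbox_size(box):
    """Return (width, height) of a [umin, vmin, umax, vmax] box.

    The extent is taken as max - min; adding 1 for an inclusive pixel
    count is a convention choice the documentation does not settle.
    """
    umin, vmin, umax, vmax = box
    return umax - umin, vmax - vmin


single = [1, 4, 145, 95]                         # like msers[0].bbox
several = [[1, 4, 145, 95], [1, 184, 182, 274]]  # like msers[:2].bbox

for box in as_rows(single) + as_rows(several):
    print(box, bbox_size(box))
```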

class machinevisiontoolbox.ImageRegionFeatures.OCRWord(ocr, i)[source]

OCR word and metadata

Parameters

ocr (dict of lists) – dict from Tesseract
i (int) – index of word

Returns

OCR data for word

Return type

OCRWord instance

Describes a word detected by OCR including its metadata which is available as a number of properties:

Property	Meaning
`text`	recognized text
`conf`	confidence in text recognition (percentage)
`l`	left coordinate (umin) of rectangle containing the text
`t`	top coordinate (vmin) of rectangle containing the text
`w`	width of rectangle containing the text
`h`	height of rectangle containing the text
`ltrb`	bounding box [left, top, right, bottom]

Seealso: ocr

__str__()[source]

String representation of OCR word

Returns: Brief readable description of OCR word
Return type: str

property l

Left side of word bounding box

Returns: left side coordinate of bounding box in pixels
Return type: int
Seealso: t ltrb

property t

Top side of word bounding box

Returns: top side coordinate of bounding box in pixels
Return type: int
Seealso: l ltrb

property w

Width of word bounding box

Returns: width of bounding box in pixels
Return type: int
Seealso: h ltrb

property h

Height of word bounding box

Returns: height of bounding box in pixels
Return type: int
Seealso: w ltrb

property ltrb

Word bounding box

Returns: bounding box [left top right bottom] in pixels
Return type: list
Seealso: l t w h
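
The two box descriptions are presumably related by simple addition. The sketch below assumes right = l + w and bottom = t + h, a relationship the documentation does not state explicitly. `ltrb_from_ltwh` is a hypothetical helper and the numbers are made up.

```python
def ltrb_from_ltwh(l, t, w, h):
    """Convert left/top/width/height to [left, top, right, bottom].

    Assumes right = l + w and bottom = t + h; the documentation does not
    spell out this relationship.
    """
    return [l, t, l + w, t + h]


print(ltrb_from_ltwh(120, 40, 85, 22))
```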

property conf

Word confidence

Returns: confidence of word (percentage)
Return type: int
Seealso: text

property text

Word as a string

Returns: word
Return type: str
Seealso: conf

plot()[source]

Plot word and bounding box

Plot a label box around the word in the image, and show the OCR string in the label field.

Seealso: plot_labelbox

class machinevisiontoolbox.ImageRegionFeatures.Fiducial(id, corners, K=None, rvec=None, tvec=None)[source]

Properties of a visual fiducial marker

Parameters

id (int) – identity of the marker
corners (ndarray(2, 4)) – image plane marker corners
K (ndarray(3,3), optional) – camera intrinsics
rvec (ndarray(3), optional) – rotation of marker with respect to camera, as an Euler vector
tvec (ndarray(3), optional) – translation of marker with respect to camera

Seealso

id pose draw fiducial

__str__()[source]

String representation of fiducial

Returns: Brief readable description of fiducial id and pose
Return type: str

property id

Fiducial id

Returns: fiducial marker identity
Return type: int

Returns the built-in identity code of the AprilTag or ArUco marker.

property pose

Fiducial pose

Returns: marker pose
Return type: SE3

Returns the pose of the tag with respect to the camera. The x- and y-axes are in the marker plane and the z-axis is out of the marker.

Note

Accurate camera intrinsics and dimension parameters are required for this value to be metric.
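
How the toolbox builds the pose from rvec and tvec is not shown here. As a rough sketch, an Euler vector (direction is the rotation axis, length is the angle in radians, as in OpenCV's rotation vectors) can be turned into a rotation matrix with Rodrigues' formula and combined with the translation into a 4x4 homogeneous transform. Both function names are hypothetical, and plain nested lists are used instead of an SE3 object.

```python
import math


def rotation_from_euler_vector(rvec):
    """3x3 rotation matrix (nested lists) from an angle-axis vector."""
    theta = math.sqrt(sum(x * x for x in rvec))
    if theta < 1e-12:  # no rotation: return the identity
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in rvec)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # Rodrigues' formula: R = c*I + s*[k]x + (1 - c)*k*k^T
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def homogeneous(rvec, tvec):
    """4x4 homogeneous transform from a rotation vector and a translation."""
    R = rotation_from_euler_vector(rvec)
    return [R[i] + [float(tvec[i])] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# marker rotated 90 degrees about the camera z-axis, half a unit away
T = homogeneous([0.0, 0.0, math.pi / 2], [0.1, 0.2, 0.5])
for row in T:
    print([round(x, 3) for x in row])
```

The translation is only metric when side is given in metric units, which is the point of the note above.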

draw(image, length=100, thick=2)[source]

Draw marker coordinate frame into image

Parameters

image (Image) – image with BGR color order
length (int, optional) – axis length in pixels, defaults to 100
thick (int, optional) – axis thickness in pixels, defaults to 2

Raises

ValueError – image must have BGR color order

Draws a coordinate frame into the image representing the pose of the marker. The x-, y- and z-axes are drawn as red, green and blue line segments.
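
The color-order requirement matters because the same triplet means different colors in BGR and RGB images. The sketch below lists the axis colors as 8-bit BGR triplets. The dictionary and helper names are hypothetical, and the exact values used by draw are an assumption.

```python
# 8-bit BGR triplets: the first element is blue, the last is red
AXIS_COLORS_BGR = {
    "x": (0, 0, 255),   # red
    "y": (0, 255, 0),   # green
    "z": (255, 0, 0),   # blue
}


def to_rgb(bgr):
    """Reverse a BGR triplet into RGB order."""
    return tuple(reversed(bgr))


for axis, bgr in AXIS_COLORS_BGR.items():
    print(axis, "BGR", bgr, "RGB", to_rgb(bgr))
```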