Region features

These methods extract features such as homogeneous regions, text and fiducials from the image.

class machinevisiontoolbox.ImageRegionFeatures.ImageRegionFeaturesMixin[source]
MSER(**kwargs)[source]

Find MSER features in image

Parameters

kwargs – arguments passed to cv2.MSER_create

Returns

set of MSER features

Return type

MSERFeature

Find all the maximally stable extremal regions in the image and return an object that represents the MSERs found. The object behaves like a list and can be indexed, sliced and used as an iterator in for loops and comprehensions.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> mser = img.MSER()
>>> len(mser)  # number of features
899
>>> mser[:5].bbox
array([[   1,    4,  145,   95],
       [   1,  184,  182,  274],
       [1243,  179, 1279,  258],
       [1243,  179, 1279,  258],
       [1242,  178, 1279,  258]], dtype=int32)

References
  • Robotics, Vision & Control for Python, Section 12.1.1.2, P. Corke, Springer 2023.

Seealso

MSERFeature, cv2.MSER_create

ocr(minconf=50, plot=False)[source]

Optical character recognition

Parameters
  • minconf (int, optional) – minimum confidence value for text to be returned or plotted (percentage), defaults to 50

  • plot (bool, optional) – overlay detected text on the current plot, assumed to be the image, defaults to False

Returns

detected strings and metadata

Return type

list of OCRWord

Each recognized text string is described by an OCRWord instance that contains the string, confidence and bounding box within the image.

Warning

pytesseract must be installed.

References
  • Robotics, Vision & Control for Python, Section 12.4.1, P. Corke, Springer 2023.

Seealso

OCRWord
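
The effect of minconf can be illustrated with a minimal sketch. It assumes a Tesseract-style dict of lists with the keys text, conf, left, top, width and height, which is the layout pytesseract's image_to_data produces in dict mode. The sample values, the helper name filter_words and the use of >= for the comparison are assumptions made for illustration, not the toolbox's implementation.

```python
# Hypothetical sample data shaped like a Tesseract result: one list per field,
# one entry per detected item.
sample = {
    "text":   ["STOP", "", "xq", "AHEAD"],
    "conf":   [96, -1, 31, 78],
    "left":   [40, 0, 210, 120],
    "top":    [25, 0, 300, 90],
    "width":  [180, 640, 35, 150],
    "height": [60, 480, 20, 45],
}


def filter_words(ocr, minconf=50):
    """Return (text, conf, [l, t, w, h]) for words at or above minconf."""
    words = []
    for i, text in enumerate(ocr["text"]):
        conf = float(ocr["conf"][i])
        # Skip empty strings (layout entries) and low-confidence detections.
        if text.strip() == "" or conf < minconf:
            continue
        box = [ocr[key][i] for key in ("left", "top", "width", "height")]
        words.append((text, conf, box))
    return words


words = filter_words(sample, minconf=50)
for text, conf, box in words:
    print(f"{text!r}: confidence {conf:.0f}%, box {box}")
```

Raising minconf trades recall for precision: fewer strings are returned, but those that remain are less likely to be spurious.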

fiducial(dict='4x4_1000', K=None, side=None)[source]

Find fiducial markers in image

Parameters
  • dict (str, optional) – marker type, defaults to “4x4_1000”

  • K (ndarray(3,3), optional) – camera intrinsics, defaults to None

  • side (float, optional) – side length of the marker, defaults to None

Returns

markers found in image

Return type

list of Fiducial instances

Find ArUco or AprilTag markers in the scene and return a list of Fiducial objects, one per marker. If camera intrinsics are provided then also compute the marker pose with respect to the camera.

dict specifies the marker family or dictionary and describes the number of bits in the tag and the number of usable unique tags.

=========  ========  ===========  =====================
dict       tag type  marker size  number of unique tags
=========  ========  ===========  =====================
4x4_50     ArUco     4x4          50
4x4_100    ArUco     4x4          100
4x4_250    ArUco     4x4          250
4x4_1000   ArUco     4x4          1000
5x5_50     ArUco     5x5          50
5x5_100    ArUco     5x5          100
5x5_250    ArUco     5x5          250
5x5_1000   ArUco     5x5          1000
6x6_50     ArUco     6x6          50
6x6_100    ArUco     6x6          100
6x6_250    ArUco     6x6          250
6x6_1000   ArUco     6x6          1000
7x7_50     ArUco     7x7          50
7x7_100    ArUco     7x7          100
7x7_250    ArUco     7x7          250
7x7_1000   ArUco     7x7          1000
original   ArUco     ?            ?
16h5       AprilTag  4x4          30
25h9       AprilTag  5x5          35
36h10      AprilTag  6x6          ?
36h11      AprilTag  6x6          587
=========  ========  ===========  =====================

Note

side is the dimension of the square that contains the small white squares inside the black background.

References
  • Robotics, Vision & Control for Python, Section 13.6.1, P. Corke, Springer 2023.

Seealso

Fiducial
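
The dict names in the table follow a regular pattern, which the sketch below decodes. It assumes the names correspond to OpenCV's predefined ArUco dictionary constants (for example DICT_4X4_1000 and DICT_APRILTAG_36h11); whether the toolbox performs its lookup this way is an assumption, and describe_dict is a hypothetical helper. Constant names are returned as strings so the sketch runs without OpenCV installed.

```python
import math
import re

ARUCO_SIZES = {4, 5, 6, 7}
ARUCO_COUNTS = {50, 100, 250, 1000}
APRILTAG_FAMILIES = {"16h5", "25h9", "36h10", "36h11"}


def describe_dict(name):
    """Return (tag type, grid side in bits, OpenCV-style constant name)."""
    if name == "original":
        return ("ArUco", None, "DICT_ARUCO_ORIGINAL")

    # ArUco names read "<rows>x<cols>_<number of unique tags>".
    m = re.fullmatch(r"(\d)x(\d)_(\d+)", name)
    if m:
        rows, cols, count = (int(g) for g in m.groups())
        if rows == cols and rows in ARUCO_SIZES and count in ARUCO_COUNTS:
            return ("ArUco", rows, f"DICT_{rows}X{cols}_{count}")

    # AprilTag names read "<data bits>h<minimum Hamming distance>".
    if name in APRILTAG_FAMILIES:
        bits = int(name.split("h")[0])
        return ("AprilTag", math.isqrt(bits), f"DICT_APRILTAG_{name}")

    raise ValueError(f"unknown marker family {name!r}")


for name in ("4x4_1000", "36h11", "original"):
    print(name, describe_dict(name))
```

Smaller dictionaries of the same marker size generally allow a larger separation between codes, so choosing the smallest dictionary that covers the number of markers in use is a reasonable default.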

Region feature classes

class machinevisiontoolbox.ImageRegionFeatures.MSERFeature(image=None, **kwargs)[source]

Find MSERs

Parameters
  • image (Image) – input image

  • kwargs – parameters passed to cv2.MSER_create

Find all the maximally stable extremal regions in the image and return an object that represents the MSERs found. This class behaves like a list and each MSER is an element of the list.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read('shark2.png')
>>> msers = img.MSER()
>>> len(msers)
2
>>> msers[0]
MSER features, 2 regions
>>> msers.bbox
array([[299, 300, 445, 408],
       [ 99, 100, 245, 208]], dtype=int32)

References
  • J. Matas, O. Chum, M. Urban, and T. Pajdla. “Robust wide baseline stereo from maximally stable extremal regions.” Proc. of British Machine Vision Conference, pages 384-396, 2002.

  • Robotics, Vision & Control for Python, Section 12.1.2.2, P. Corke, Springer 2023.

Seealso

bbox, points

__len__()[source]

Number of MSER features

Returns

number of features

Return type

int

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> mser = img.MSER()
>>> len(mser)  # number of features
899

Seealso

__getitem__

__getitem__(i)[source]

Get MSERs from MSER feature object

Parameters

i (int or slice) – index

Raises

IndexError – index out of range

Returns

subset of MSER features

Return type

MSERFeature instance

This method allows an MSERFeature object to be indexed, sliced or iterated.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> mser = img.MSER()
>>> len(mser)  # number of features
899
>>> mser[:5]   # first 5 MSER features
MSER features, 5 regions
>>> mser[::50]  # every 50th MSER feature
MSER features, 18 regions

Seealso

len
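
How a single __getitem__ can support indexing, slicing and iteration is sketched below. RegionSet is a hypothetical stand-in written for illustration, not the toolbox's MSERFeature; it assumes that both integer and slice indices return a new container of the same type.

```python
class RegionSet:
    """Hypothetical list-like container of regions."""

    def __init__(self, regions):
        self._regions = list(regions)

    def __len__(self):
        return len(self._regions)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return RegionSet(self._regions[i])
        # List indexing raises IndexError for an out-of-range integer, which
        # is also what ends a for loop that relies on __getitem__.
        return RegionSet([self._regions[i]])

    def __str__(self):
        return f"{len(self)} regions"


regions = RegionSet(range(899))
print(regions[:5])        # first 5 regions
print(regions[::50])      # every 50th region
print(sum(1 for _ in regions))  # iteration visits every region once
```

Because no __iter__ is defined, Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised, so the one method serves all three uses.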

__str__()[source]

String representation of MSER

Returns

Brief readable description of MSER

Return type

str

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> msers = img.MSER()
>>> str(msers)
'MSER features, 899 regions'
>>> str(msers[0])
'MSER features, 2 regions'

property points

Points belonging to MSERs

Returns

Coordinates of points in (u,v) format that belong to MSER

Return type

ndarray(2,N), list of ndarray(2,N)

If the object contains just one region the result is an array, otherwise it is a list of arrays (with different numbers of columns).

Example:

>>> from machinevisiontoolbox import Image
>>> import numpy as np
>>> img = Image.Read("castle.png")
>>> msers = img.MSER()
>>> np.set_printoptions(threshold=10)
>>> msers[0].points
array([[ 9, 10, 11, ...,  8,  9, 10],
       [ 5,  5,  5, ...,  5,  4,  4]], dtype=int32)
>>> msers[2:4].points
[array([[1249, 1249, 1249, ..., 1245, 1251, 1246],
       [ 221,  220,  222, ...,  232,  181,  242]], dtype=int32), array([[1249, 1249, 1249, ..., 1250, 1244, 1255],
       [ 221,  220,  222, ...,  181,  203,  257]], dtype=int32)]

Seealso

bbox

property bbox

Bounding boxes of MSERs

Returns

Bounding box of MSER in [umin, vmin, umax, vmax] format

Return type

ndarray(4) or ndarray(N,4)

If the object contains just one region the result is a 1D array, otherwise it is a 2D array with one row per bounding box.

Example:

>>> from machinevisiontoolbox import Image
>>> img = Image.Read("castle.png")
>>> msers = img.MSER()
>>> msers[0].bbox
array([  1,   4, 145,  95], dtype=int32)
>>> msers[:4].bbox
array([[   1,    4,  145,   95],
       [   1,  184,  182,  274],
       [1243,  179, 1279,  258],
       [1243,  179, 1279,  258]], dtype=int32)

Seealso

points
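
The relationship between points and bbox can be sketched as follows. This assumes the bounding box is the tight axis-aligned extent of the region's points; the toolbox may instead use the boxes reported by the OpenCV detector, which could differ by a pixel. Plain lists stand in for the 2xN arrays, and the helper names and sample coordinates are hypothetical.

```python
def bbox_from_points(points):
    """Tight [umin, vmin, umax, vmax] box for a 2xN point set.

    points is a pair of equal-length sequences: row 0 holds the u (horizontal)
    coordinates and row 1 holds the v (vertical) coordinates.
    """
    u, v = points
    return [min(u), min(v), max(u), max(v)]


def bboxes(regions):
    """One box per region; regions is a list of 2xN point sets."""
    return [bbox_from_points(p) for p in regions]


# Two hypothetical regions with different numbers of points.
region_a = [[9, 10, 11, 8], [5, 5, 5, 4]]
region_b = [[20, 21], [7, 9]]

print(bbox_from_points(region_a))    # single region: one box
print(bboxes([region_a, region_b]))  # several regions: one box per row
```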

class machinevisiontoolbox.ImageRegionFeatures.OCRWord(ocr, i)[source]

OCR word and metadata

Parameters
  • ocr (dict of lists) – dict from Tesseract

  • i (int) – index of word

Returns

OCR data for word

Return type

OCRWord instance

Describes a word detected by OCR including its metadata which is available as a number of properties:

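========  =======================================================
Property  Meaning
========  =======================================================
text      recognized text
conf      confidence in text recognition (percentage)
l         left coordinate (umin) of rectangle containing the text
t         top coordinate (vmin) of rectangle containing the text
w         width of rectangle containing the text
h         height of rectangle containing the text
ltrb      bounding box [left, top, right, bottom]
========  =======================================================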

Seealso

ocr
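
The geometric properties are related in a simple way, sketched below with a hypothetical WordBox class. It assumes right = left + width and bottom = top + height; whether the toolbox treats the right and bottom edges as inclusive is not stated here, so treat that convention as an assumption.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical stand-in for the geometric part of an OCR word."""

    l: int  # left edge (umin), pixels
    t: int  # top edge (vmin), pixels
    w: int  # width, pixels
    h: int  # height, pixels

    @property
    def ltrb(self):
        # Assumed convention: right = left + width, bottom = top + height.
        return [self.l, self.t, self.l + self.w, self.t + self.h]


box = WordBox(l=40, t=25, w=180, h=60)
print(box.ltrb)
```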

__str__()[source]

String representation of OCR word

Returns

Brief readable description of OCR word

Return type

str

property l

Left side of word bounding box

Returns

left side coordinate of bounding box in pixels

Return type

int

Seealso

t, ltrb

property t

Top side of word bounding box

Returns

top side coordinate of bounding box in pixels

Return type

int

Seealso

l, ltrb

property w

Width of word bounding box

Returns

width of bounding box in pixels

Return type

int

Seealso

h, ltrb

property h

Height of word bounding box

Returns

height of bounding box in pixels

Return type

int

Seealso

w, ltrb

property ltrb

Word bounding box

Returns

bounding box [left, top, right, bottom] in pixels

Return type

list

Seealso

l, t, w, h

property conf

Word confidence

Returns

confidence of word (percentage)

Return type

int

Seealso

text

property text

Word as a string

Returns

word

Return type

str

Seealso

conf

plot()[source]

Plot word and bounding box

Plot a label box around the word in the image, and show the OCR string in the label field.

Seealso

plot_labelbox

class machinevisiontoolbox.ImageRegionFeatures.Fiducial(id, corners, K=None, rvec=None, tvec=None)[source]

Properties of a visual fiducial marker

Parameters
  • id (int) – identity of the marker

  • corners (ndarray(2, 4)) – image plane marker corners

  • K (ndarray(3,3), optional) – camera intrinsics

  • rvec (ndarray(3), optional) – rotation of marker with respect to camera, as an Euler vector

  • tvec (ndarray(3), optional) – translation of marker with respect to camera

Seealso

id, pose, draw, fiducial

__str__()[source]

String representation of fiducial

Returns

Brief readable description of fiducial id and pose

Return type

str

property id

Fiducial id

Returns

fiducial marker identity

Return type

int

Returns the built-in identity code of the AprilTag or ArUco marker.

property pose

Fiducial pose

Returns

marker pose

Return type

SE3

Returns the pose of the tag with respect to the camera. The x- and y-axes are in the marker plane and the z-axis is out of the marker.

Note

Accurate camera intrinsics and dimension parameters are required for this value to be metric.
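
How a rotation (Euler) vector and a translation vector combine into a pose can be sketched with Rodrigues' formula. The sketch assumes rvec follows the OpenCV convention, where the vector's direction is the rotation axis and its length is the angle in radians. It builds a plain 4x4 homogeneous matrix from nested lists rather than the SE3 object the property returns; the helper names and sample values are hypothetical.

```python
import math


def rotation_from_rvec(rvec):
    """3x3 rotation matrix from a rotation vector via Rodrigues' formula."""
    theta = math.sqrt(sum(x * x for x in rvec))  # rotation angle, radians
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in rvec)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [kx * kx * v + c, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, ky * ky * v + c, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, kz * kz * v + c],
    ]


def pose_matrix(rvec, tvec):
    """4x4 homogeneous transform of the marker with respect to the camera."""
    R = rotation_from_rvec(rvec)
    T = [R[i] + [float(tvec[i])] for i in range(3)]
    T.append([0.0, 0.0, 0.0, 1.0])
    return T


# Hypothetical marker: rotated 90 degrees about the camera z-axis and
# translated by (0.1, 0, 0.5) in the units of the marker side length.
T = pose_matrix([0.0, 0.0, math.pi / 2], [0.1, 0.0, 0.5])
for row in T:
    print(["%.3f" % x for x in row])
```

The translation column is only metric if the marker side length and camera intrinsics supplied to the detector were accurate, as the note above states.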

draw(image, length=100, thick=2)[source]

Draw marker coordinate frame into image

Parameters
  • image (Image) – image with BGR color order

  • length (int, optional) – axis length in pixels, defaults to 100

  • thick (int, optional) – axis thickness in pixels, defaults to 2

Raises

ValueError – image must have BGR color order

Draws a coordinate frame into the image representing the pose of the marker. The x-, y- and z-axes are drawn as red, green and blue line segments.