Command line tools

The Toolbox ships with a number of command line tools that provide convenient access to some of the functionality of the toolbox without needing to write a script.

All tools accept image file names as command line arguments. These can be:

  • the name of a local file. If the file is not found locally, it is searched for in the accompanying image data folder, for example street.png

  • a URL, for example https://petercorke.com/files/images/monalisa.png
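The lookup order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Toolbox's actual implementation: `resolve_image_arg` and its `data_dir` parameter are hypothetical names, and `data_dir` stands in for the accompanying image data folder, whose real location is not given here.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_image_arg(name, data_dir):
    """Classify an image argument as a URL, a local file, or a bundled image.

    Hypothetical helper: data_dir stands in for the Toolbox's image data folder.
    """
    # A URL is passed through unchanged for the loader to download
    if urlparse(name).scheme in ("http", "https"):
        return ("url", name)

    # A file that exists locally takes priority
    local = Path(name)
    if local.is_file():
        return ("local", local)

    # Otherwise fall back to the accompanying image data folder
    bundled = Path(data_dir) / name
    if bundled.is_file():
        return ("bundled", bundled)

    raise FileNotFoundError(f"cannot find image {name!r}")
```

Returning a tag alongside the path keeps the three cases explicit, so a caller can decide separately how to download, read or report each one.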

MVTB tool

mvtbtool starts an interactive IPython session with the Machine Vision Toolbox (MVTB), NumPy and Matplotlib already imported. Compared to the regular Python REPL, it offers command history, tab completion and inline help. For example:

$ mvtbtool
_  _ ____ ____ _  _ _ _  _ ____    _  _ _ ____ _ ____ _  _
|\/| |__| |    |__| | |\ | |___    |  | | [__  | |  | |\ |
|  | |  | |___ |  | | | \| |___     \/  | ___] | |__| | \|

___ ____ ____ _    ___  ____ _  _
|  |  | |  | |    |__] |  |  \/
|  |__| |__| |___ |__] |__| _/\_

for Python

You're running: MVTB==0.9.7, SMTB==1.1.13, NumPy==1.26.4, SciPy==1.14.1,
                                Matplotlib==3.10.0, OpenCV==4.10.0, Open3D==0.18.0
 .
 .
 .
>>> im = Image.Read("monalisa.png")
>>> im.disp()
Out[2]: <matplotlib.image.AxesImage at 0x1690e9720>

Images can also be loaded by listing them as command line arguments, either as a filename or a URL:

$ mvtbtool street.png

The images appear in the IPython session as img, which is a single Image instance if one image is given, or a list of Image instances, in command line order, if several are given. For example:

$ mvtbtool street.png https://petercorke.com/files/images/monalisa.png

A script can be run at startup using the --run option. For example, given this script:

myscript.py
img.disp()

the script can be run at startup, with an image already loaded, by:

$ mvtbtool street.png --run=myscript.py

The image is displayed in an interactive Matplotlib window, and the IPython session is left open for further experimentation.
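Because img is a single Image when one file is given and a list when several are given, a startup script that should work in both cases can normalise it first. The following is a minimal sketch: `as_list` is a hypothetical helper, not part of the Toolbox, and treating a missing image as None is an assumption based on the default shown in the help text.

```python
def as_list(img):
    """Return img as a list, whether it is None, one image or a list of images."""
    if img is None:
        return []
    if isinstance(img, (list, tuple)):
        return list(img)
    return [img]


# In a script passed to --run, img is defined by mvtbtool before the script runs.
# globals().get() keeps this sketch runnable on its own, where img is undefined.
for im in as_list(globals().get("img")):
    im.disp()
```

With this in myscript.py, the same script displays one window per image regardless of how many images are listed on the command line.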

IPython has many configuration options and mechanisms including command line arguments, configuration files and startup scripts. mvtbtool’s command line arguments are processed before IPython’s command line options.

$ mvtbtool --help
usage: mvtbtool [-h] [-r RUN] [-B BACKEND] [-t THEME] [-x] [-P PROMPT] [-a]
                [-R RESULTPREFIX] [--reload] [--torch]
                [images ...]

Machine Vision Toolbox shell

positional arguments:
  images                images to load on startup. These appear in the variable img; or img[0], img[1],
                        ... if multiple are specified (default: None)

options:
  -h, --help            show this help message and exit
  -r RUN, --run RUN     script to run at startup, but not displayed. Same as IPython's builtin -i option
                        (default: None)
  -B BACKEND, --backend BACKEND
                        specify BACKEND as the Matplotlib graphics backend (e.g. 'TkAgg', 'Qt5Agg',
                        'WebAgg', etc). By default, the backend is chosen automatically by Matplotlib.
                        (default: None)
  -t THEME, --theme THEME
                        specify terminal color theme (neutral, lightbg, nocolor, linux), linux is for
                        dark mode (default: neutral)
  -x, --confirmexit     confirm exit (default: False)
  -P PROMPT, --prompt PROMPT
                        input prompt string (default: >>> )
  -a, --showassign      automatically display the result of assignments, use ';' to suppress output
                        (default: False)
  -R RESULTPREFIX, --resultprefix RESULTPREFIX
                        execution result prefix, include {} for execution count number (default: None)
  --reload              enable autoreload of any imported modules, same as IPython's builtin %autoreload
                        2 (default: False)
  --torch               import torch and torchvision if installed (default: False)

options can be set via the environment variable MVTB_OPTIONS, for example:

    $ export MVTB_OPTIONS="--backend TkAgg --prompt 'mvtb> ' --reload --torch --showassign"
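One plausible way for a tool to combine MVTB_OPTIONS with its command line is sketched below. This is not mvtbtool's actual code: `parse_with_env` is a hypothetical function, only a few of the options are reproduced, and the precedence (command line overrides environment) is an assumption.

```python
import argparse
import shlex


def parse_with_env(argv, environ):
    """Parse options, taking defaults from MVTB_OPTIONS and overrides from argv."""
    parser = argparse.ArgumentParser(prog="mvtbtool-sketch")
    # Only a few of the options from the help text are reproduced here
    parser.add_argument("-B", "--backend", default=None)
    parser.add_argument("-P", "--prompt", default=">>> ")
    parser.add_argument("-a", "--showassign", action="store_true")
    parser.add_argument("--reload", action="store_true")
    parser.add_argument("--torch", action="store_true")
    parser.add_argument("images", nargs="*")

    # shlex.split honours shell-style quoting, so 'mvtb> ' stays one token
    env_args = shlex.split(environ.get("MVTB_OPTIONS", ""))
    # Assumption: command line arguments come last so they override the environment
    return parser.parse_args(env_args + list(argv))


env = {"MVTB_OPTIONS": "--backend TkAgg --prompt 'mvtb> ' --reload --torch --showassign"}
args = parse_with_env(["--prompt", "cli> ", "street.png"], env)
print(args.backend, repr(args.prompt), args.images)
```

The quoting matters: without the single quotes around the prompt string, the trailing space would be lost when the variable is split into arguments.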

Image tool

imtool is a command line tool that opens a window for each of the images specified on the command line. For example:

$ imtool street.png https://petercorke.com/files/images/monalisa.png

Essentially, it is just another image browser, but images are displayed using idisp, which has a number of useful features such as showing pixel values on hover, and zooming and panning the image.

The --points option allows the user to click on the image to select a series of coordinates. For example:

$ imtool street.png --points

Each selected point is marked by a red cross and its coordinates are printed to the terminal. Left-click adds a new point, right-click removes the last added point, and Enter ends picking, after which the coordinates of the selected points are printed to the terminal. The coordinates are in pixel units, with the origin at the top-left corner of the image:

$ imtool street.png --points
   u       v       Δu      Δv      |Δ|
146.6   91.1
302.7   136.2   156.1    45.2    162.5
301.4   645.9   -1.3     509.7   509.7
142.7   682.0   -158.7   36.1    162.8
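Reading the table above, the Δu and Δv columns appear to be the offsets from the previous point and |Δ| the Euclidean distance between consecutive points. The sketch below recomputes those columns from the printed coordinates; `point_deltas` is a hypothetical helper written for this illustration, not a Toolbox function.

```python
import math


def point_deltas(points):
    """Return rows of (u, v, du, dv, dist) for a sequence of (u, v) points.

    The first row has None for the deltas, since it has no predecessor.
    """
    rows = []
    prev = None
    for u, v in points:
        if prev is None:
            rows.append((u, v, None, None, None))
        else:
            du, dv = u - prev[0], v - prev[1]
            rows.append((u, v, du, dv, math.hypot(du, dv)))
        prev = (u, v)
    return rows


# Coordinates copied from the table above
picked = [(146.6, 91.1), (302.7, 136.2), (301.4, 645.9), (142.7, 682.0)]
for u, v, du, dv, dist in point_deltas(picked):
    if du is None:
        print(f"{u:7.1f} {v:7.1f}")
    else:
        print(f"{u:7.1f} {v:7.1f} {du:8.1f} {dv:8.1f} {dist:8.1f}")
```

Recomputing from the rounded coordinates can differ from the tool's output in the last digit (a Δv of 45.1 here against 45.2 in the table), presumably because the tool works from the unrounded click positions.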

It is important to select the window (click the title bar) before clicking on the image, otherwise the first click just selects the window and is lost. The user can zoom in using the magnifier button at the bottom of the window.

$ imtool --help
usage: imtool [-h] [--block] [--metadata] [--points] [--csv] [--grid]
              [--verbose]
              files [files ...]

Display an image using Machine Vision Toolbox for Python.

positional arguments:
  files           list of image files to view, files can also include those
                  distributed with machinevision toolbox, eg. 'monalisa.png'

options:
  -h, --help      show this help message and exit
  --block, -b     block after each image (default: False)
  --metadata, -m  Print image metadata to stdout (default: False)
  --points, -p    Pick points (default: False)
  --csv, -c       Output picked points as CSV to stdout (default: False)
  --grid, -g      Overlay grid on images (default: False)
  --verbose, -v   Show image details (default: False)

Tag tool

tagtool is a command line tool that highlights the AR markers (ArUco or AprilTag) in the specified image. For example:

$ tagtool lab-scene.png
tag IDs: 0, 1, 2, 3, 4, 5

[Image: Binary image showing bounding boxes and centroids]

$ tagtool --help
usage: tagtool [-h] [-b] [-g] [-v] [-d DICT] [-s SIDE] [-f FOCALLENGTH]
               [-p PRINCIPALPOINT] [-a]
               files [files ...]

Display AR tags in image using Machine Vision Toolbox for Python.AR tags are
highlighted with their IDs and the canonic top-left corner is marked.

positional arguments:
  files                 list of image files to view, files can also include
                        those distributed with machinevision toolbox, eg.
                        'lab-scene.png'

options:
  -h, --help            show this help message and exit
  -b, --block           block after each image (default: False)
  -g, --grid            Overlay grid on images (default: False)
  -v, --verbose         Show image details (default: False)
  -d DICT, --dict DICT  Aruco dictionary to use, default is 4x4_50
  -s SIDE, --side SIDE  Tag side length, default is 25
  -f FOCALLENGTH, --focallength FOCALLENGTH
                        Focal length in units of pixels: f | fu,fv (default:
                        None)
  -p PRINCIPALPOINT, --principalpoint PRINCIPALPOINT
                        Principal point coordinate in units of pixels: pu,pv.
                        If not specified use image centre (default: None)
  -a, --axes            Show axes on the image (default: False)

A camera model is required to determine poses, this requires that focal length
is specified.
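The camera model implied by --focallength and --principalpoint can be sketched as a pinhole intrinsic matrix. This is an illustration only: `parse_focal_length` and `intrinsic_matrix` are hypothetical helpers, and taking the image centre as (width/2, height/2) is an assumption about how the default principal point is chosen.

```python
def parse_focal_length(text):
    """Parse 'f' or 'fu,fv' into a (fu, fv) pair of floats."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("focal length must be 'f' or 'fu,fv'")


def intrinsic_matrix(focal, image_size, principal_point=None):
    """Build a 3x3 pinhole intrinsic matrix as nested lists.

    image_size is (width, height) in pixels. If principal_point is None the
    image centre is used; taking it as (width/2, height/2) is an assumption.
    """
    fu, fv = parse_focal_length(focal)
    if principal_point is None:
        pu, pv = image_size[0] / 2, image_size[1] / 2
    else:
        pu, pv = (float(p) for p in principal_point.split(","))
    return [[fu, 0.0, pu], [0.0, fv, pv], [0.0, 0.0, 1.0]]
```

For example, `intrinsic_matrix("800", (1280, 960))` places the principal point at (640, 480), while `intrinsic_matrix("800,820", (1280, 960), "600,400")` uses separate focal lengths and an explicit principal point.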

OCR tool

ocrtool is a command line tool that performs optical character recognition (OCR) on the specified image. For example:

$ ocrtool penguins.png -l
pytesseract is required for OCR functionality. Install it with: pip install pytesseract or pip install machinevision-toolbox-python[ocr]
Install the tesseract OCR engine from https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract

The tool uses Tesseract OCR to identify words in the image and their bounding boxes. The results are printed to the terminal as a table of word, confidence, left, top, right, bottom, width and height. If the --view option is specified, the bounding boxes are overlaid on the image. For example:

$ ocrtool penguins.png --view

[Image: Binary image showing bounding boxes and centroids]

$ ocrtool --help
usage: ocrtool [-h] [-L | -D] [-c CONFIDENCE] [-l] [-j FILE] [-v] [-b] [-g]
               files [files ...]

Display text words found in image using Machine Vision Toolbox for Python.
Words are written to stdout or a JSON file, but can also be highlighted in the
image.

positional arguments:
  files                 list of image files to view, files can also include
                        those distributed with machinevision toolbox, eg.
                        '.png'

options:
  -h, --help            show this help message and exit
  -L, --lightbg         Look for light background with dark text (default)
                        (default: False)
  -D, --darkbg          Look for dark background with light text (default:
                        False)
  -c CONFIDENCE, --confidence CONFIDENCE
                        Minimum confidence for OCR text to be displayed (%)
                        (default: 50.0)
  -l, --long            Long listing (include bounding box coordinates and
                        confidence in output) (default: False)
  -j FILE, --json FILE  Output results in JSON format to FILE: word,
                        confidence, LTRB bounding box coordinates, and
                        dimensions (default: None)
  -v, --view            Overlay recognised word boxes on image (default:
                        False)
  -b, --block           block after each image (default: False)
  -g, --grid            Show grid (default: False)
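The effect of the --confidence threshold and the width and height columns can be sketched on plain dictionaries. Everything here is an assumption made for illustration: the key names, the `filter_words` and `confidence_summary` helpers, and the sample records, which are made up and are not real ocrtool output.

```python
from statistics import mean


def filter_words(words, min_confidence=50.0):
    """Keep words at or above min_confidence and add width and height."""
    kept = []
    for w in words:
        if w["confidence"] >= min_confidence:
            kept.append({**w, "width": w["right"] - w["left"],
                         "height": w["bottom"] - w["top"]})
    return kept


def confidence_summary(words):
    """Return (min, max, mean) confidence, or None when no words were found."""
    if not words:
        return None  # min() and max() raise ValueError on an empty sequence
    conf = [w["confidence"] for w in words]
    return min(conf), max(conf), mean(conf)


# Made-up records for illustration only; not real output from ocrtool
words = [
    {"word": "PENGUIN", "confidence": 96.0, "left": 10, "top": 20, "right": 110, "bottom": 50},
    {"word": "x", "confidence": 12.0, "left": 5, "top": 5, "right": 9, "bottom": 12},
]
kept = filter_words(words)
print([w["word"] for w in kept], confidence_summary(kept))
```

Checking for an empty result before summarising matters in practice, since an image with no recognised words yields nothing to take a minimum or maximum of.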

ROS bag tool

bagtool is a command line tool that reads images and point clouds from a ROS bag file and displays them. To show what a bag file contains, the tool can print a table listing each topic, its message type, the number of messages on it, and whether it is allowed to be displayed. For example:

$ bagtool race_1.bag
RosBag('bags/race_1.bag')
┌────────────────────────────┬───────────────────────┬───────┬─────────┐
│           topic            │        msgtype        │ count │ allowed │
├────────────────────────────┼───────────────────────┼───────┼─────────┤
│ /camera/fisheye2/image_raw │ sensor_msgs/msg/Image │   855 │    ✓    │
│ /camera/odom/sample        │ nav_msgs/msg/Odometry │  5679 │    ✗    │
│ /camera/imu                │ sensor_msgs/msg/Imu   │  5679 │    ✗    │
└────────────────────────────┴───────────────────────┴───────┴─────────┘

There is a topic with an image message type, and the tool can display it as an animation:

$ bagtool --animate race_1.bag

and various keystrokes can pause/resume the animation, and change the playback speed. Alternatively, the tool can display one frame at a time, with keystrokes to jump forward in various step sizes:

$ bagtool --view race_1.bag

The image is displayed using disp and has the ability to display pixel values on hover, zoom and pan the image. The current topic is displayed in the title bar of the window.

If multiple topics contain images, select the one to display using the --topic option, which specifies a substring that must be present in the topic name:

$ bagtool --view --topic=fisheye2 race_1.bag
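The combination of the allowed column and the --topic substring filter can be sketched as below. This is a hedged illustration rather than the tool's implementation: `is_displayable` and `select_topics` are hypothetical helpers, and the set of displayable types is inferred from the message types named in the help text.

```python
# Message types the help text names as displayable (an inference, not a spec)
DISPLAYABLE = ("Image", "CompressedImage", "PointCloud2")


def is_displayable(msgtype):
    """True if the last component of the message type is a displayable type."""
    return msgtype.split("/")[-1] in DISPLAYABLE


def select_topics(topics, topic_filter=None):
    """Return displayable topic names, optionally restricted by a substring."""
    return [
        name
        for name, msgtype in topics.items()
        if is_displayable(msgtype)
        and (topic_filter is None or topic_filter in name)
    ]


# Topics and message types copied from the race_1.bag table shown earlier
topics = {
    "/camera/fisheye2/image_raw": "sensor_msgs/msg/Image",
    "/camera/odom/sample": "nav_msgs/msg/Odometry",
    "/camera/imu": "sensor_msgs/msg/Imu",
}
print(select_topics(topics, "fisheye2"))
```

Because the filter is a plain substring match, a short string such as fisheye2 is enough to pick one camera out of several without typing the full topic name.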

The tool goes to some effort to convert the ROS image message into the correct data type and color order. NaNs within floating point images are displayed as red.

The tool also supports displaying point clouds (uncolored and colored) if the bag file contains them.

If a bag file is given as a URL it will be downloaded and cached locally in a temporary file. If the --keep option is given it will be saved in the current directory. Some sources of ROS bag files include:

$ bagtool --help
usage: bagtool [-h] [-i | -p] [-t FILTER] [-m FILTER] [-v] [-l] [-b] [-a] [-g]
               [--colororder COLORORDER] [--dtype DTYPE] [-k] [--no-progress]
               files [files ...]

Display images or pointclouds from a ROS bag file using Machine Vision Toolbox
for Python.

positional arguments:
  files                 list of ROS bag files to view.  URLs (http:// or
                        https://) are also supported and will be downloaded
                        before viewing, see --keep option below.

options:
  -h, --help            show this help message and exit
  -i, --image           only display image messages (Image / CompressedImage),
                        same as --msgfilter=Image (default: False)
  -p, --pointcloud      only display point cloud messages (PointCloud2), same
                        as --msgfilter=PointCloud2 (default: False)
  -t FILTER, --topic FILTER
                        Only display messages from topics containing FILTER
                        (default: None)
  -m FILTER, --message FILTER
                        Only display messages of type containing FILTER
                        (default: None)
  -v, --view            Display images in bag file (default: False)
  -l, --list            List topics in bag file (default: False)
  -b, --block           block after each image (default: False)
  -a, --animate         Animate images in bag file (default: False)
  -g, --grid            Overlay grid on images (default: False)
  --colororder COLORORDER
                        Override the default color order for the image
                        messages (default: None)
  --dtype DTYPE         Override the default data type for the image messages
                        (default: None)
  -k, --keep            when a file argument is a URL, save the downloaded bag
                        in the current directory (default: False)
  --no-progress         disable the tqdm progress bar when scanning bag
                        metadata (default: False)