Command line tools#
The Toolbox ships with a number of command line tools that provide convenient access to some of the functionality of the toolbox without needing to write a script.
All tools accept image file names as command line arguments. These can be:
the name of a local file. If the file is not found locally, it is searched for in the accompanying image data folder, for example
street.png
a URL, for example
https://petercorke.com/files/images/monalisa.png
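The lookup order described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the Toolbox's own code: the function name `resolve_image_arg` and the `data_dir` parameter (standing in for the image data folder) are hypothetical, and checking for a URL first is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_image_arg(name, data_dir):
    """Classify an image argument as a URL, a local file, or a bundled image.

    `data_dir` stands in for the toolbox's image data folder (hypothetical).
    """
    # URLs are recognised by their scheme and passed through unchanged
    if urlparse(name).scheme in ("http", "https"):
        return "url", name

    # a file that exists locally takes precedence
    local = Path(name)
    if local.is_file():
        return "local", str(local)

    # otherwise fall back to the bundled image data folder
    bundled = Path(data_dir) / name
    if bundled.is_file():
        return "bundled", str(bundled)

    raise FileNotFoundError(f"cannot find image {name!r}")
```

With this sketch, `resolve_image_arg("street.png", data_dir)` reports a bundled image only when no file of that name exists in the current directory.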
MVTB tool#
mvtbtool starts an interactive IPython session with the Machine Vision Toolbox (MVTB), NumPy and Matplotlib already imported. Compared to the regular Python REPL it has the advantage of command history, tab completion, and inline help. For example:
$ mvtbtool
_ _ ____ ____ _ _ _ _ _ ____ _ _ _ ____ _ ____ _ _
|\/| |__| | |__| | |\ | |___ | | | [__ | | | |\ |
| | | | |___ | | | | \| |___ \/ | ___] | |__| | \|
___ ____ ____ _ ___ ____ _ _
| | | | | | |__] | | \/
| |__| |__| |___ |__] |__| _/\_
for Python
You're running: MVTB==0.9.7, SMTB==1.1.13, NumPy==1.26.4, SciPy==1.14.1,
Matplotlib==3.10.0, OpenCV==4.10.0, Open3D==0.18.0
.
.
.
>>> im = Image.Read("monalisa.png")
>>> im.disp()
Out[2]: <matplotlib.image.AxesImage at 0x1690e9720>
Images can also be loaded by listing them as command line arguments, either as a filename or a URL:
$ mvtbtool street.png
and the images appear in the IPython session as the variable img, which is a single Image instance, or a list of Image instances in the order
they are listed on the command line. For example:
$ mvtbtool street.png https://petercorke.com/files/images/monalisa.png
A script can be run at startup using the --run option. For example, if the file myscript.py contains:
img.disp()
then we can run the script at startup with an image file by:
$ mvtbtool street.png --run=myscript.py
and the result is a display of the image in an interactive Matplotlib window and the IPython session is left open for further experimentation.
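Because img is a single object when one image is given and a list when several are given, a startup script may want to handle both cases. The following is a minimal sketch under that assumption. The helper `as_image_list` and the `StandInImage` class are hypothetical and exist only so the example runs on its own. In a real startup script, img would be supplied by mvtbtool.

```python
def as_image_list(img):
    """Return a list whether `img` is None, a single object, or already a list."""
    if img is None:
        return []
    if isinstance(img, (list, tuple)):
        return list(img)
    return [img]


class StandInImage:
    """Stand-in for the toolbox Image class, only so this sketch is runnable."""

    def __init__(self, name):
        self.name = name

    def disp(self):
        print(f"displaying {self.name}")


# in a real script this variable is provided by mvtbtool
img = [StandInImage("street.png"), StandInImage("monalisa.png")]

for image in as_image_list(img):
    image.disp()
```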
IPython has many configuration options and mechanisms including command line arguments,
configuration files and startup scripts. mvtbtool’s command line arguments are processed before IPython’s command line options.
$ mvtbtool --help
usage: mvtbtool [-h] [-r RUN] [-B BACKEND] [-t THEME] [-x] [-P PROMPT] [-a]
[-R RESULTPREFIX] [--reload] [--torch]
[images ...]
Machine Vision Toolbox shell
positional arguments:
images images to load on startup. These appear in the variable img; or img[0], img[1],
... if multiple are specified (default: None)
options:
-h, --help show this help message and exit
-r RUN, --run RUN script to run at startup, but not displayed. Same as IPython's builtin -i option
(default: None)
-B BACKEND, --backend BACKEND
specify BACKEND as the Matplotlib graphics backend (e.g. 'TkAgg', 'Qt5Agg',
'WebAgg', etc). By default, the backend is chosen automatically by Matplotlib.
(default: None)
-t THEME, --theme THEME
specify terminal color theme (neutral, lightbg, nocolor, linux), linux is for
dark mode (default: neutral)
-x, --confirmexit confirm exit (default: False)
-P PROMPT, --prompt PROMPT
input prompt string (default: >>> )
-a, --showassign automatically display the result of assignments, use ';' to suppress output
(default: False)
-R RESULTPREFIX, --resultprefix RESULTPREFIX
execution result prefix, include {} for execution count number (default: None)
--reload enable autoreload of any imported modules, same as IPython's builtin %autoreload
2 (default: False)
--torch import torch and torchvision if installed (default: False)
Options can also be set via the environment variable MVTB_OPTIONS, for example:
$ export MVTB_OPTIONS="--backend TkAgg --prompt 'mvtb> ' --reload --torch --showassign"
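The quoting in the MVTB_OPTIONS string matters because the prompt contains a space. The sketch below shows how such a string splits into argument tokens with shell-style rules. It assumes the options are tokenised like a shell command line, which may not match how mvtbtool actually processes the variable. The parser is a cut-down stand-in that covers only the options used in the example.

```python
import argparse
import shlex


def options_from_env(environ):
    """Split the MVTB_OPTIONS string into argv-style tokens, honouring quotes."""
    return shlex.split(environ.get("MVTB_OPTIONS", ""))


# cut-down parser covering only the options used in the example (a stand-in)
parser = argparse.ArgumentParser(prog="mvtbtool-sketch")
parser.add_argument("-B", "--backend", default=None)
parser.add_argument("-P", "--prompt", default=">>> ")
parser.add_argument("-a", "--showassign", action="store_true")
parser.add_argument("--reload", action="store_true")
parser.add_argument("--torch", action="store_true")

# a plain dict is used here; pass os.environ to read the real environment
environ = {
    "MVTB_OPTIONS": "--backend TkAgg --prompt 'mvtb> ' --reload --torch --showassign"
}
tokens = options_from_env(environ)
args = parser.parse_args(tokens)

print(tokens)
print(args)
```

The single quotes keep `mvtb> ` together as one token, including its trailing space.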
Image tool#
imtool is a command line tool that opens a window for each of the images specified
on the command line. For example:
$ imtool street.png https://petercorke.com/files/images/monalisa.png
Essentially, it is just another image browser, but images are displayed using idisp
which has a number of useful features such as the ability to display pixel values on
hover, zoom and pan the image.
The --points option allows the user to click on the image to select a series of coordinates. For example:
$ imtool street.png --points
Each selected point is indicated by a red cross and its coordinates are printed to the terminal. Left-click adds a new point, right-click removes the last added point, and Enter ends picking. The coordinates are in pixel units, with the origin at the top-left corner of the image:
$ imtool street.png --points
u v Δu Δv |Δ|
146.6 91.1
302.7 136.2 156.1 45.2 162.5
301.4 645.9 -1.3 509.7 509.7
142.7 682.0 -158.7 36.1 162.8
It is important to select the window (click the title bar) before clicking on the image, otherwise the first click will just select the window and be lost. The user can zoom in using the magnifier button at the bottom of the window.
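The Δu, Δv and |Δ| columns in the point table appear to be the offset of each point from the previous one and the Euclidean length of that offset. The sketch below recomputes them from the printed coordinates. The function name `point_deltas` is hypothetical, and values recomputed from the rounded coordinates can differ from the table in the last digit.

```python
import math


def point_deltas(points):
    """Return rows of (u, v, du, dv, distance) relative to the previous point."""
    rows = []
    previous = None
    for u, v in points:
        if previous is None:
            # the first point has no predecessor, so no deltas
            rows.append((u, v, None, None, None))
        else:
            du, dv = u - previous[0], v - previous[1]
            rows.append((u, v, du, dv, math.hypot(du, dv)))
        previous = (u, v)
    return rows


# coordinates copied from the example output
points = [(146.6, 91.1), (302.7, 136.2), (301.4, 645.9), (142.7, 682.0)]

for u, v, du, dv, dist in point_deltas(points):
    if du is None:
        print(f"{u:8.1f} {v:8.1f}")
    else:
        print(f"{u:8.1f} {v:8.1f} {du:8.1f} {dv:8.1f} {dist:8.1f}")
```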
$ imtool --help
usage: imtool [-h] [--block] [--metadata] [--points] [--csv] [--grid]
[--verbose]
files [files ...]
Display an image using Machine Vision Toolbox for Python.
positional arguments:
files list of image files to view, files can also include those
distributed with machinevision toolbox, eg. 'monalisa.png'
options:
-h, --help show this help message and exit
--block, -b block after each image (default: False)
--metadata, -m Print image metadata to stdout (default: False)
--points, -p Pick points (default: False)
--csv, -c Output picked points as CSV to stdout (default: False)
--grid, -g Overlay grid on images (default: False)
--verbose, -v Show image details (default: False)
Tag tool#
tagtool is a command line tool that highlights the AR markers (ArUco or AprilTag) in
the specified image. For example:
$ tagtool lab-scene.png
tag IDs: 0, 1, 2, 3, 4, 5
$ tagtool --help
usage: tagtool [-h] [-b] [-g] [-v] [-d DICT] [-s SIDE] [-f FOCALLENGTH]
[-p PRINCIPALPOINT] [-a]
files [files ...]
Display AR tags in image using Machine Vision Toolbox for Python.AR tags are
highlighted with their IDs and the canonic top-left corner is marked.
positional arguments:
files list of image files to view, files can also include
those distributed with machinevision toolbox, eg.
'lab-scene.png'
options:
-h, --help show this help message and exit
-b, --block block after each image (default: False)
-g, --grid Overlay grid on images (default: False)
-v, --verbose Show image details (default: False)
-d DICT, --dict DICT Aruco dictionary to use, default is 4x4_50
-s SIDE, --side SIDE Tag side length, default is 25
-f FOCALLENGTH, --focallength FOCALLENGTH
Focal length in units of pixels: f | fu,fv (default:
None)
-p PRINCIPALPOINT, --principalpoint PRINCIPALPOINT
Principal point coordinate in units of pixels: pu,pv.
If not specified use image centre (default: None)
-a, --axes Show axes on the image (default: False)
A camera model is required to determine poses, this requires that focal length
is specified.
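To illustrate what the --focallength and --principalpoint options describe, the sketch below builds a standard pinhole intrinsic matrix from option strings in the formats given in the help text (f or fu,fv, and pu,pv). It is not tagtool's own code. The function names are hypothetical, and taking the image centre as (width/2, height/2) is an assumption.

```python
def parse_values(text, allow_single=True):
    """Parse 'a' or 'a,b' into a pair of floats."""
    values = [float(x) for x in text.split(",")]
    if len(values) == 2:
        return values[0], values[1]
    if len(values) == 1 and allow_single:
        # a single focal length applies to both axes
        return values[0], values[0]
    raise ValueError(f"cannot parse {text!r}")


def intrinsic_matrix(focallength, image_size, principalpoint=None):
    """Build a 3x3 pinhole intrinsic matrix as nested lists."""
    fu, fv = parse_values(focallength)
    width, height = image_size
    if principalpoint is None:
        # default to the image centre (assumed to be width/2, height/2)
        pu, pv = width / 2, height / 2
    else:
        pu, pv = parse_values(principalpoint, allow_single=False)
    return [
        [fu, 0.0, pu],
        [0.0, fv, pv],
        [0.0, 0.0, 1.0],
    ]


for row in intrinsic_matrix("800", (1280, 960)):
    print(row)
```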
OCR tool#
ocrtool is a command line tool that performs optical character recognition (OCR) on the specified image. For example:
$ ocrtool penguins.png -l
ocrtool requires the pytesseract package and the Tesseract OCR engine. Install pytesseract with pip install pytesseract or pip install machinevision-toolbox-python[ocr], and install the engine by following the instructions at https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract
If they are missing, the tool prints this installation hint and then fails with a ValueError, because there are no recognised words to summarise.
The tool uses Tesseract OCR to identify words in the image and their bounding boxes. The results are printed to the terminal as a table of word, confidence, left, top, right, bottom, width and height. The bounding box is shown in the image if the --view option is specified. For example:
$ ocrtool penguins.png --view
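The effect of the --confidence threshold and the shape of the per-word results can be sketched without Tesseract. The example below filters a list of word records by confidence and prints a summary, guarding against the case where no words remain. The record layout, the function name `summarise` and the sample words are invented for illustration and are not real ocrtool output.

```python
def summarise(words, min_confidence=50.0):
    """Keep words at or above the threshold and describe their confidence."""
    kept = [w for w in words if w["confidence"] >= min_confidence]
    if not kept:
        # avoid taking min/max of an empty list
        return kept, "no words found"
    conf = [w["confidence"] for w in kept]
    summary = (
        f"{len(kept)} words; confidence: {min(conf):.1f} - {max(conf):.1f}%, "
        f"mean {sum(conf) / len(conf):.1f}%"
    )
    return kept, summary


def make_record(word, confidence, left, top, right, bottom):
    """Build one record; width and height follow from the bounding box."""
    return {
        "word": word,
        "confidence": confidence,
        "ltrb": (left, top, right, bottom),
        "width": right - left,
        "height": bottom - top,
    }


# invented sample records, not real OCR results
sample = [
    make_record("hello", 91.0, 10, 20, 110, 50),
    make_record("wor1d", 42.5, 120, 20, 220, 50),
    make_record("again", 77.0, 10, 60, 100, 90),
]

kept, summary = summarise(sample)
print(summary)
for record in kept:
    print(record)
```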
$ ocrtool --help
usage: ocrtool [-h] [-L | -D] [-c CONFIDENCE] [-l] [-j FILE] [-v] [-b] [-g]
files [files ...]
Display text words found in image using Machine Vision Toolbox for Python.
Words are written to stdout or a JSON file, but can also be highlighted in the
image.
positional arguments:
files list of image files to view, files can also include
those distributed with machinevision toolbox, eg.
'.png'
options:
-h, --help show this help message and exit
-L, --lightbg Look for light background with dark text (default)
(default: False)
-D, --darkbg Look for dark background with light text (default:
False)
-c CONFIDENCE, --confidence CONFIDENCE
Minimum confidence for OCR text to be displayed (%)
(default: 50.0)
-l, --long Long listing (include bounding box coordinates and
confidence in output) (default: False)
-j FILE, --json FILE Output results in JSON format to FILE: word,
confidence, LTRB bounding box coordinates, and
dimensions (default: None)
-v, --view Overlay recognised word boxes on image (default:
False)
-b, --block block after each image (default: False)
-g, --grid Show grid (default: False)
ROS bag tool#
rosbagtool is a command line tool that reads images and point clouds from a ROS bag file and displays
them. To scope out what’s in the bag file, the tool can print a table of the topics in
the bag file, the message type of each topic, the number of messages on each topic, and
whether the topic is allowed to be displayed. For example:
$ rosbagtool race_1.bag
RosBag('bags/race_1.bag')
┌────────────────────────────┬───────────────────────┬───────┬─────────┐
│ topic │ msgtype │ count │ allowed │
├────────────────────────────┼───────────────────────┼───────┼─────────┤
│ /camera/fisheye2/image_raw │ sensor_msgs/msg/Image │ 855 │ ✓ │
│ /camera/odom/sample │ nav_msgs/msg/Odometry │ 5679 │ ✗ │
│ /camera/imu │ sensor_msgs/msg/Imu │ 5679 │ ✗ │
└────────────────────────────┴───────────────────────┴───────┴─────────┘
There is a topic with an image message type, and the tool can display it as an animation:
$ rosbagtool --animate race_1.bag
and various keystrokes can pause/resume the animation, and change the playback speed. Alternatively, the tool can display one frame at a time, with keystrokes to jump forward in various step sizes:
$ bagtool --view race_1.bag
The image is displayed using disp and has the ability to display pixel values on hover, zoom and pan the image. The current
topic is displayed in the title bar of the window.
If multiple topics contain images, select the one to display using the --topic option, which specifies a substring that must be present in the topic name:
$ bagtool --view --topic=fisheye2
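The substring matching used by --topic, and by --message for message types, can be illustrated with the topics from the table shown earlier. This is a sketch, not the tool's implementation. The function name `select_topics` and the `DISPLAYABLE` tuple are hypothetical, with the displayable types taken from the help text.

```python
# topics and message types copied from the example listing
topics = {
    "/camera/fisheye2/image_raw": "sensor_msgs/msg/Image",
    "/camera/odom/sample": "nav_msgs/msg/Odometry",
    "/camera/imu": "sensor_msgs/msg/Imu",
}

# message kinds the help text describes as displayable
DISPLAYABLE = ("Image", "CompressedImage", "PointCloud2")


def select_topics(topics, topic_filter=None, message_filter=None):
    """Return displayable topics whose name and type contain the given substrings."""
    selected = []
    for topic, msgtype in topics.items():
        kind = msgtype.rsplit("/", 1)[-1]
        if kind not in DISPLAYABLE:
            continue
        if topic_filter is not None and topic_filter not in topic:
            continue
        if message_filter is not None and message_filter not in msgtype:
            continue
        selected.append(topic)
    return selected


print(select_topics(topics, topic_filter="fisheye2"))
```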
The tool goes to some effort to convert the ROS image message into the correct data type and color order. NaNs within floating point images are displayed as red.
The tool also supports displaying point clouds (uncolored and colored) if the bag file contains them.
If a bag file is given as a URL it will be downloaded and cached locally in a temporary file. If
the --keep option is given it will be saved in the current directory. Some sources of ROS bag files
include:
CSIRO Forest Dataset, e.g. forestI.bag
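The difference between the default behaviour and --keep comes down to where the downloaded file ends up. The sketch below picks a destination path for a bag URL, either in a fresh temporary directory or in the current directory. The function name `bag_destination` and the example URL are hypothetical, and the tool's real caching scheme may differ.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def bag_destination(url, keep=False):
    """Choose where a downloaded bag file would be written."""
    # take the file name from the last component of the URL path
    filename = Path(urlparse(url).path).name
    if keep:
        # --keep: save alongside the user's work
        return Path.cwd() / filename
    # default: a throwaway location under the system temporary directory
    return Path(tempfile.mkdtemp()) / filename


# hypothetical URL, used only to show the file name handling
url = "https://example.com/bags/forestI.bag"
print(bag_destination(url, keep=True))
print(bag_destination(url))
```

The actual download could then be done with `urllib.request.urlretrieve(url, destination)`.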
$ bagtool --help
usage: bagtool [-h] [-i | -p] [-t FILTER] [-m FILTER] [-v] [-l] [-b] [-a] [-g]
[--colororder COLORORDER] [--dtype DTYPE] [-k] [--no-progress]
files [files ...]
Display images or pointclouds from a ROS bag file using Machine Vision Toolbox
for Python.
positional arguments:
files list of ROS bag files to view. URLs (http:// or
https://) are also supported and will be downloaded
before viewing, see --keep option below.
options:
-h, --help show this help message and exit
-i, --image only display image messages (Image / CompressedImage),
same as --msgfilter=Image (default: False)
-p, --pointcloud only display point cloud messages (PointCloud2), same
as --msgfilter=PointCloud2 (default: False)
-t FILTER, --topic FILTER
Only display messages from topics containing FILTER
(default: None)
-m FILTER, --message FILTER
Only display messages of type containing FILTER
(default: None)
-v, --view Display images in bag file (default: False)
-l, --list List topics in bag file (default: False)
-b, --block block after each image (default: False)
-a, --animate Animate images in bag file (default: False)
-g, --grid Overlay grid on images (default: False)
--colororder COLORORDER
Override the default color order for the image
messages (default: None)
--dtype DTYPE Override the default data type for the image messages
(default: None)
-k, --keep when a file argument is a URL, save the downloaded bag
in the current directory (default: False)
--no-progress disable the tqdm progress bar when scanning bag
metadata (default: False)