PyTorch tensor interface ======================== .. code-block:: python import machinevisiontoolbox print(machinevisiontoolbox.__version__) The toolbox provides interfaces to PyTorch -- an important machine learning framework. The fundamental datatype in PyTorch is the tensor which is a multidimensional array, with shape (N, H, W, C) where N is the batch size, H and W are the image dimensions and C is the number of channels. For a single image the batch size is 1, so the shape is (1, H, W, C) or sometimes in its _squeezed_ form (H, W, C). Image → tensor -------------- Use the :class:`Image` :meth:`tensor` method to convert an :class:`Image` to a tensor. For example: .. runblock:: pycon >>> from machinevisiontoolbox import Image >>> im = Image.Read("eiffel-1.png") >>> tensor = im.tensor() >>> print(tensor.shape) tensor → Image -------------- Use the :class:`Image` constructor :meth:`Tensor` method to create an :class:`Image` from a tensor. For example: .. runblock:: pycon >>> from machinevisiontoolbox import Image >>> from torch import rand >>> tensor = rand(3, 480, 640) # random 3-channel image >>> img = Image.Tensor(tensor) >>> print(img) An exception is thrown if the tensor has a batch dimension greater than 1. Image source → batch tensor --------------------------- An image source (an instance of a class that yields images) represents a set of images and can be converted to a batch tensor where N>1. All sources have a :meth:`tensor` method that creates a batch tensor containing all the images in the source. For example, a video file is a set of images and a tensor can be created that contains all its frames: .. code-block:: python from machinevisiontoolbox import VideoFile with VideoFile("traffic_sequence.mp4") as video: tensor = video.tensor() print(tensor.shape) Note the use of the context manager to ensure that the video file is properly closed after reading. The resulting tensor has shape (N, H, W, C) where N is the number of frames in the video. batch tensor → Image iterator ----------------------------- A batch tensor can be converted to a set of images. This is done using an :class:`Image` iterator: .. code-block:: python from machinevisiontoolbox import TensorStack from torch import rand tensor = rand(16, 480, 640, 3) # random batch of 16 RGB images for img in TensorStack(tensor): img.disp(fps=4) This particular example could be achieved a little more concisely by using the :meth:`ImageSource.disp` method inherited by all image sources, including :class:`TensorStack`. A single line of code iterates over a tensor and displays the frames as a video: .. code-block:: python from machinevisiontoolbox import TensorStack from torch import rand tensor = rand(16, 480, 640, 3) # random batch of 16 RGB images TensorStack(tensor).disp(fps=4)