BagOfWords

class BagOfWords(images, k: int = 2000, nstopwords: int = 0, attempts: int = 1, seed: int | None = None)

Bag of words class

Parameters:

images (Image iterable, BaseFeature2D) – a sequence of images or set of image features
k (int, optional) – number of visual words, defaults to 2000
nstopwords (int, optional) – number of stop words, defaults to 0
attempts (int, optional) – number of k-means attempts, defaults to 1
seed (int or None, optional) – random number seed, defaults to None

Bag of words is a powerful feature-based method for matching images from widely different viewpoints.

This class creates a bag of words from a sequence of images or a set of point features. In the former case, the features will have an .id equal to the index of the image in the sequence. For the latter case, features must have a valid .id attribute indicating which image in the bag they belong to.

k-means clustering is performed to assign a word label to every feature. The cluster centroids are retained in .centroids, a \(k \times N\) array with one row per visual word. Each row is a feature descriptor of length \(N\), for example 128 elements in the case of SIFT.
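The clustering step can be illustrated with a minimal sketch of Lloyd's k-means algorithm in plain Python. This is not the toolbox implementation (the Seealso entry suggests the class relies on cv2.kmeans); the function names `sqdist`, `nearest` and `kmeans`, the `iterations` parameter and the toy data are assumptions made for illustration only.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, d):
    """Index (word label) of the centroid closest to descriptor d."""
    return min(range(len(centroids)), key=lambda j: sqdist(centroids[j], d))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm (illustrative); returns (centroids, words)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct descriptors chosen at random
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: label every descriptor with its nearest centroid
        words = [nearest(centroids, d) for d in descriptors]
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [d for d, w in zip(descriptors, words) if w == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final labels, consistent with the final centroids
    words = [nearest(centroids, d) for d in descriptors]
    return centroids, words


# Toy data: six descriptors with N = 2 elements, in two well separated groups
descriptors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
               [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
centroids, words = kmeans(descriptors, k=2)
```

Here `centroids` plays the role of the \(k \times N\) array (2 rows of 2 elements) and `words` holds one label per descriptor. Real descriptors such as SIFT would be far longer, and an array library would normally replace the nested lists.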

.words is an array of word labels that corresponds to the array of image features .features. The word labels are integers, initially in the range [0, k).
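Because the word labels run parallel to the features, and each feature carries the index of its image, the two can be combined to count how often each word occurs in a given image. The sketch below assumes plain Python lists standing in for the word labels and feature ids; the names `words`, `ids` and `word_histogram` and the toy values are hypothetical and not part of the class API.

```python
from collections import Counter

# Hypothetical toy data, one entry per feature
words = [0, 2, 2, 1, 0, 2]  # word label of each feature, in the range [0, k)
ids = [0, 0, 0, 1, 1, 1]    # index of the image each feature belongs to
k = 3                       # number of visual words


def word_histogram(words, ids, image, k):
    """Count occurrences of each of the k words among one image's features."""
    counts = Counter(w for w, i in zip(words, ids) if i == image)
    return [counts[w] for w in range(k)]


hist0 = word_histogram(words, ids, 0, k)  # [1, 0, 2]
hist1 = word_histogram(words, ids, 1, k)  # [1, 1, 1]
```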

Stop words are the visual words that occur most often, and nstopwords of them can be removed. The centroids are reordered so that the last nstopwords rows correspond to the stop words. When a new set of image features is assigned labels from .centroids, any feature whose label falls in those last nstopwords rows is a stop word and can be discarded.
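The reordering can be sketched as follows, assuming labels start in the range [0, k) so that after reordering the stop words are exactly the labels from k - nstopwords upward. The helper `move_stopwords_last` and the toy data are hypothetical; the class may perform this step differently.

```python
from collections import Counter


def move_stopwords_last(centroids, words):
    """Reorder centroids from least to most frequent word and relabel words."""
    k = len(centroids)
    freq = Counter(words)
    # Old labels sorted by ascending frequency; ties keep their label order
    order = sorted(range(k), key=lambda w: freq[w])
    relabel = {old: new for new, old in enumerate(order)}
    new_centroids = [centroids[old] for old in order]
    new_words = [relabel[w] for w in words]
    return new_centroids, new_words


# Toy bag with k = 4 words and 1-element descriptors
centroids = [[0.0], [1.0], [2.0], [3.0]]
words = [0, 1, 1, 1, 2, 2, 3, 1, 2]  # word 1 is the most frequent
nstopwords = 1

new_centroids, new_words = move_stopwords_last(centroids, words)

# Labels at or above this value refer to the last nstopwords rows
first_stop = len(new_centroids) - nstopwords
kept = [w for w in new_words if w < first_stop]
```

In this toy case the most frequent word, originally label 1, ends up in the last row, and the four features carrying it are dropped from `kept`.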

Reference:

Video Google: a text retrieval approach to object matching in videos, J. Sivic and A. Zisserman, in Proc. Ninth IEEE Int. Conf. on Computer Vision, pp. 1470-1477, Oct. 2003.
Robotics, Vision & Control for Python, Section 12.4.2, P. Corke, Springer 2023.

Seealso:

recall BaseFeature2D SIFT cv2.kmeans