Bag of words#

Simple bag-of-words image matching class

class BagOfWords(images, k: int = 2000, nstopwords: int = 0, attempts: int = 1, seed: int | None = None)[source]#

Bag of words class

Parameters:
  • images (Image iterable, BaseFeature2D) – a sequence of images or set of image features

  • k (int, optional) – number of visual words, defaults to 2000

  • nstopwords (int, optional) – number of stop words, defaults to 0

  • attempts (int, optional) – number of k-means attempts, defaults to 1

  • seed (int, optional) – random seed for k-means initialization, defaults to None
Bag of words is a powerful feature-based method for matching images from widely different viewpoints.

This class creates a bag of words from a sequence of images or a set of point features. In the former case, the features will have an .id equal to the index of the image in the sequence. For the latter case, features must have a valid .id attribute indicating which image in the bag they belong to.

k-means clustering is performed to assign a word label to every feature. The cluster centroids are retained as a \(k \times N\) array .centroids with one row per word centroid and each row is a feature descriptor, 128 elements long in the case of SIFT.

.words is an array of word labels that corresponds to the array of image features .features. The word labels are integers, initially in the range [0, k).

Stop words are the visual words that occur most often, and nstopwords of them can be removed. The centroids are reordered so that the last nstopwords rows correspond to the stop words. When a new set of image features is assigned labels from .centroids, any feature with a label greater than or equal to k - nstopwords is a stop word and can be discarded.
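The clustering step described above can be sketched in plain numpy. This is an illustrative sketch, not the toolbox implementation (which delegates to cv2.kmeans); the descriptor data and all variable names here are hypothetical:

```python
import numpy as np

# Hypothetical stand-in for a bag of SIFT descriptors: 500 features x 128 dims
rng = np.random.default_rng(0)
descriptors = rng.random((500, 128)).astype(np.float32)

k = 10  # number of visual words (the class default is 2000)

# a few iterations of Lloyd's algorithm in place of cv2.kmeans
centroids = descriptors[rng.choice(len(descriptors), k, replace=False)]
for _ in range(10):
    # assign each descriptor to its nearest centroid -> word label
    d = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    words = d.argmin(axis=1)
    # move each centroid to the mean of its assigned descriptors
    for j in range(k):
        if np.any(words == j):
            centroids[j] = descriptors[words == j].mean(axis=0)

# centroids has one descriptor-length row per visual word, cf. .centroids
# words holds one integer label per feature in [0, k), cf. .words
```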

Reference:
  • Video Google: a text retrieval approach to object matching in videos, J. Sivic and A. Zisserman, Proc. Ninth IEEE Int. Conf. on Computer Vision, pp. 1470-1477, Oct. 2003.

  • Robotics, Vision & Control for Python, Section 12.4.2, P. Corke, Springer 2023.

Seealso:

recall BaseFeature2D SIFT cv2.kmeans

wwfv(i: int | None = None) ndarray[source]#

Weighted word frequency vector for image

Parameters:

i (int, optional) – image within bag, defaults to all images

Returns:

word frequency vector or vectors

Return type:

ndarray(K), ndarray(N,K)

This is the word-frequency vector for the i’th image in the bag. The angle between any two WFVs is an indication of image similarity.

If i is None then the word-frequency matrix is returned, with one row per image in the bag; each row is that image’s word-frequency vector.

Note

The word vector is expensive to compute so a lazy evaluation is performed on the first call to this method.
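The weighting follows the term frequency–inverse document frequency scheme of the Video Google paper cited above; the exact formula used by the toolbox is an assumption here, and the labels below are illustrative:

```python
import numpy as np

# hypothetical word labels for the features of a 3-image bag,
# and the .id-style image index of each feature
words = np.array([0, 1, 1, 2, 0, 2, 2, 3, 1, 0])
image = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])

K = 4  # usable vocabulary size (k - nstopwords)
N = 3  # number of images in the bag

# term frequency: occurrences of each word in each image
tf = np.zeros((N, K))
for w, i in zip(words, image):
    tf[i, w] += 1

# inverse document frequency: log(N / number of images containing the word)
ni = (tf > 0).sum(axis=0)
idf = np.log(N / ni)

# weighted word-frequency vectors, one row per image, cf. ndarray(N, K)
wwfv = (tf / tf.sum(axis=1, keepdims=True)) * idf
```

Words that appear in every image (like word 0 here) receive zero weight, which is the same effect that motivates stop-word removal.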

property nimages: int#

Number of images associated with the bag

Returns:

number of images

Return type:

int

property images: Any#

Images associated with this bag

Returns:

images associated with this bag

Return type:

Image iterable

Note

Only valid if the bag was constructed from images rather than features.

property k: int#

Number of words in the visual vocabulary

Returns:

number of words

Return type:

int

Seealso:

nstopwords

property words: ndarray#

Word labels for every feature

Returns:

word labels

Return type:

ndarray(N)

Word labels are arranged such that the highest nstopwords labels correspond to stop words.

Seealso:

nstopwords

word(f: int) int[source]#

Word label for original feature

Parameters:

f (int) – index of the original feature

Returns:

word label

Return type:

int

Word labels are arranged such that the highest nstopwords labels correspond to stop words.

property nwords: int#

Number of usable words

Returns:

number of usable words

Return type:

int

This is k - nstopwords.

Seealso:

k nstopwords

property nstopwords: int#

Number of stop words

Returns:

Number of stop words

Return type:

int

Seealso:

k nwords

property firststop: int#

First stop word

Returns:

word index of first stop word

Return type:

int

property centroids: ndarray#

Word feature centroids

Returns:

centroids of visual word features

Return type:

ndarray(k,N)

An array with one row per visual word; each row is the feature descriptor vector of the word centroid, e.g. 128 elements for SIFT features.

Centroids are arranged such that the last nstopwords rows correspond to the stop words. After clustering against the centroids, any word with a label >= k - nstopwords is a stop word.

Note

The stop words are kept in the centroid array for the recall process.

Seealso:

similarity
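The labelling-and-discard process described above can be sketched with a nearest-centroid assignment in numpy; the sizes and data here are illustrative, not taken from the toolbox:

```python
import numpy as np

k, nstopwords, dims = 6, 2, 8
firststop = k - nstopwords  # labels >= firststop are stop words

rng = np.random.default_rng(1)
centroids = rng.random((k, dims))  # cf. .centroids, stop words in last rows
new_desc = rng.random((20, dims))  # descriptors from a new image

# label each new descriptor with its nearest centroid
d = np.linalg.norm(new_desc[:, None, :] - centroids[None, :, :], axis=2)
labels = d.argmin(axis=1)

# discard features whose label is a stop word
useful = labels[labels < firststop]
```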

similarity(query) ndarray[source]#

Compute similarity between bag of words and query

Parameters:

query (BagOfWords or ndarray) – bag of words or image features

Returns:

similarity matrix

Return type:

ndarray(M,N)

The array has rows corresponding to the images in self and columns corresponding to the queries in query.

query can be:

  • a single image, a list of images, or an Image iterator (such as VideoFile or ZipArchive), for which the visual words are computed using the same dictionary of visual words as the bag, or

  • a set of image features, in which case the similarity is computed between the bag and the query features.
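Since the angle between weighted word-frequency vectors indicates image similarity, the similarity matrix can be sketched as a pairwise cosine computation; the matrices here are illustrative placeholders for the WFVs of the bag and the query:

```python
import numpy as np

rng = np.random.default_rng(2)
W_bag = rng.random((5, 10))    # WFVs for 5 bag images, 10-word vocabulary
W_query = rng.random((3, 10))  # WFVs for 3 query images

# cosine of the angle between every bag/query pair of WFVs
num = W_bag @ W_query.T
den = np.linalg.norm(W_bag, axis=1)[:, None] \
    * np.linalg.norm(W_query, axis=1)[None, :]
S = num / den  # rows: images in the bag, columns: queries, cf. ndarray(M, N)
```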

Seealso:

closest

features(word: int) BaseFeature2D[source]#

Get features corresponding to word

Parameters:

word (int) – visual word label

Returns:

features corresponding to this label

Return type:

BaseFeature2D

Return a slice of the image features corresponding to this word label. The .id attribute of each feature indicates which image in the bag it belongs to.

occurrence(word: int) int[source]#

Number of occurrences of specified word

Parameters:

word (int) – visual word label

Returns:

total number of times that visual word appears in this bag

Return type:

int

wordfreq() tuple[ndarray, ndarray][source]#

Get visual word frequency

Returns:

visual words, visual word frequency

Return type:

ndarray, ndarray

Returns two arrays, one containing all visual words, the other containing the frequency of the corresponding word across all images.
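The frequency count can be sketched with numpy’s unique-with-counts; the word labels below are made up for illustration:

```python
import numpy as np

words = np.array([3, 1, 3, 0, 1, 3, 2])  # word label per feature in the bag
unique_words, freq = np.unique(words, return_counts=True)
# unique_words -> [0, 1, 2, 3], freq -> [1, 2, 1, 3]
```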

closest(S: ndarray, i: int) tuple[ndarray, ndarray][source]#

Find closest image

Parameters:
  • S (ndarray(N,M)) – bag similarity matrix

  • i (int) – the query image index

Returns:

index of the recalled image and similarity

Return type:

int, float

Seealso:

similarity
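Recall amounts to taking the best-scoring bag image for a given query column of the similarity matrix; a minimal sketch, with a hypothetical 3x2 similarity matrix:

```python
import numpy as np

# similarity matrix: rows are bag images, columns are query images
S = np.array([[0.2, 0.9],
              [0.8, 0.1],
              [0.5, 0.4]])

i = 0                    # query image index (a column of S)
best = S[:, i].argmax()  # index of the recalled bag image
score = S[best, i]
# best -> 1, score -> 0.8
```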

contains(word: int) ndarray[source]#

Images that contain specified word

Parameters:

word (int) – visual word label

Returns:

indices of the images containing this word

Return type:

ndarray

Seealso:

exemplars

exemplars(word: int, images=None, maxperimage: int = 2, columns: int = 10, max: int | None = None, width: int = 50, **kwargs)[source]#

Composite image containing exemplars of specified word

Parameters:
  • word (int) – visual word label

  • images – the set of images corresponding to this bag, only required if the bag was constructed from features not images.

  • maxperimage (int, optional) – maximum number of exemplars drawn from any one image, defaults to 2

  • columns (int, optional) – number of exemplar images in each row, defaults to 10

  • max (int, optional) – maximum number of exemplar images, defaults to None

  • width (int, optional) – width of image thumbnail, defaults to 50

Returns:

composite image

Return type:

Image

Produces a grid of examples of a particular visual word.

Seealso:

contains support Tile