Documentation - C API

aib.h File Reference

Agglomerative Information Bottleneck (AIB) More...

#include "generic.h"
#include "mathop.h"

Data Structures

struct  VlAIB
 AIB algorithm data. More...

Functions

Create and destroy
VlAIB * vl_aib_new (double *Pcx, vl_uint nvalues, vl_uint nlabels)
 Allocates and initializes the internal data structure.
void vl_aib_delete (VlAIB *aib)
 Deletes AIB data structure.
Process data
void vl_aib_process (VlAIB *aib)
 Runs AIB on Pcx.
Retrieve results
vl_uint * vl_aib_get_parents (VlAIB const *aib)
 Get resulting list of parents.
double * vl_aib_get_costs (VlAIB const *aib)
 Get a list of merge costs.

Detailed Description

This provides an implementation of Agglomerative Information Bottleneck (AIB) as first described in:

[Slonim] N. Slonim and N. Tishby. Agglomerative information bottleneck. In Proc. NIPS, 1999

AIB takes a discrete valued feature $x$ and a label $c$ and gradually compresses $x$ by iteratively merging values which minimize the loss in mutual information $I(x,c)$.

While the algorithm is equivalent to the one described in [Slonim], it has some speedups that enable handling much larger datasets. Let N be the number of feature values and C the number of labels. The algorithm of [Slonim] is $O(N^2)$ in space and $O(C N^3)$ in time. This algorithm is $O(N)$ space and $O(C N^2)$ time in common cases ( $O(C N^3)$ in the worst case).

Overview

Given a discrete feature $x \in \mathcal{X} = \{x_1,\dots,x_N\}$ and a category label $c = 1,\dots,C$ with joint probability $p(x,c)$, AIB computes a compressed feature $[x]_{ij}$ by merging two values $x_i$ and $x_j$. Among all the pairs $ij$, AIB chooses the one that yields the smallest loss in the mutual information

\[ D_{ij} = I(x,c) - I([x]_{ij},c) = \sum_c p(x_i,c) \log \frac{p(x_i,c)}{p(x_i)p(c)} + \sum_c p(x_j,c) \log \frac{p(x_j,c)}{p(x_j)p(c)} - \sum_c (p(x_i,c)+p(x_j,c)) \log \frac {p(x_i,c)+p(x_j,c)}{(p(x_i)+p(x_j))p(c)} \]

AIB iterates this procedure until the desired level of compression is achieved.

Algorithm details

Computing $D_{ij}$ requires $O(C)$ operations. For example, in standard AIB we need to calculate

\[ D_{ij} = I(x,c) - I([x]_{ij},c) = \sum_c p(x_i,c) \log \frac{p(x_i,c)}{p(x_i)p(c)} + \sum_c p(x_j,c) \log \frac{p(x_j,c)}{p(x_j)p(c)} - \sum_c (p(x_i,c)+p(x_j,c)) \log \frac {p(x_i,c)+p(x_j,c)}{(p(x_i)+p(x_j))p(c)} \]

Thus in a basic implementation of AIB, finding the optimal pair $ij$ of feature values requires $O(CN^2)$ operations in total. In order to join all the $N$ values, we repeat this procedure $O(N)$ times, yielding $O(N^3 C)$ time and $O(1)$ space complexity (this does not account for the space needed to store the input).

The complexity can be improved by reusing computations. For instance, we can store the matrix $D = [ D_{ij} ]$ (which requires $O(N^2)$ space). After joining $ij$, the matrix $D$ remains unchanged except for the rows and columns of indexes $i$ and $j$ (the matrix is symmetric). These two rows and columns are deleted, and a new row and column for the merged value $x_{ij}$, whose computation requires $O(NC)$ operations, are added. Finding the minimal element of the matrix still requires $O(N^2)$ operations, so the complexity of this algorithm is $O(N^2C + N^3)$ time and $O(N^2)$ space.

We can obtain a much better expected complexity as follows. First, instead of storing the whole matrix D, we store the smallest element (index and value) of each row as $(q_i, D_i)$ (notice that this is also the best element of each column since D is symmetric). This requires $O(N)$ space and finding the minimal element of the matrix requires $O(N)$ operations. After joining $ij$, we have to efficiently update this representation. This is done as follows:

  • The entries $(q_i,D_i)$ and $(q_j,D_j)$ are deleted.
  • A new entry $(q_{ij},D_{ij})$ for the joint value $x_{ij}$ is added. This requires $O(CN)$ operations.
  • We test which other entries $(q_{k},D_{k})$ need to be updated. Recall that $(q_{k},D_{k})$ means that, before the merge, the value closest to $x_k$ was $x_{q_k}$ at a distance $D_k$. Let $D_{k,ij}$ denote the cost of merging $x_k$ with the new value $x_{ij}$. Then
    • If $q_k \not = i$, $q_k \not = j$ and $D_{k,ij} \geq D_k$, then $q_k$ is still the closest element and we do not do anything.
    • If $q_k \not = i$, $q_k \not = j$ and $D_{k,ij} < D_k$, then the closest element is $ij$ and we update the entry in constant time.
    • If $q_k = i$ or $q_k = j$, then we need to re-compute the closest element in $O(CN)$ operations.

This algorithm requires only $O(N)$ space and $O(\gamma(N) C N^2)$ time, where $\gamma(N)$ is the expected number of times we fall in the last case. In common cases one has $\gamma(N) \approx \mathrm{const.}$, so the time saving is significant.
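The case analysis above can be sketched as follows. This is a hypothetical helper, not part of the VLFeat API: for each surviving value k, q[k] holds the index of its closest merge partner and D[k] the corresponding merge cost; after i and j are merged into new_idx, and the cost D_k_new of merging k with new_idx has been computed, each entry k is updated.

```c
/* Update entry k after the merge of i and j into new_idx.
 * Returns 1 when a full O(CN) rescan of the candidates for k is
 * required (the last case above), 0 otherwise. */
int
update_entry (int k, int i, int j, int new_idx, double D_k_new,
              int *q, double *D)
{
  if (q [k] != i && q [k] != j) {
    if (D_k_new < D [k]) {
      /* the merged value is now the closest: constant-time update */
      q [k] = new_idx ;
      D [k] = D_k_new ;
    }
    /* otherwise q[k] is still the closest element: nothing to do */
    return 0 ;
  }
  /* the previous closest value was consumed by the merge:
   * the caller must recompute (q[k], D[k]) from scratch */
  return 1 ;
}
```

The expected running time hinges on how rarely this function returns 1, which is precisely the $\gamma(N)$ factor above.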

Author:
Brian Fulkerson
Andrea Vedaldi

Function Documentation

void vl_aib_delete ( VlAIB * aib )
Parameters:
aib: data structure to delete.
double * vl_aib_get_costs ( VlAIB const *  aib ) [inline]
Parameters:
aib: AIB filter.
Returns:
An array of costs
vl_uint * vl_aib_get_parents ( VlAIB const *  aib ) [inline]
Parameters:
aib: AIB filter.
Returns:
An array of parents
VlAIB* vl_aib_new ( double *  Pcx,
vl_uint  nvalues,
vl_uint  nlabels 
)
Parameters:
Pcx: a pointer to a 2D array of probabilities
nvalues: the number of rows in the array
nlabels: the number of columns in the array

Creates a new VlAIB struct containing pointers to all the data that will be used during the AIB process.

Allocates memory for the following:

  • Px (nvalues*sizeof(double))
  • Pc (nlabels*sizeof(double))
  • nodelist (nvalues*sizeof(vl_uint))
  • which (nvalues*sizeof(vl_uint))
  • beta (nvalues*sizeof(double))
  • bidx (nvalues*sizeof(vl_uint))
  • parents ((2*nvalues-1)*sizeof(vl_uint))
  • costs (nvalues*sizeof(double))

Since it simply copies the pointer to Pcx, the total additional memory requirement is:

(3*nvalues+nlabels)*sizeof(double) + (5*nvalues-1)*sizeof(vl_uint)

Returns:
An allocated and initialized VlAIB pointer
void vl_aib_process ( VlAIB * aib )
Parameters:
aib: AIB object to process

The function runs Agglomerative Information Bottleneck (AIB) on the joint probability table aib->Pcx, which has labels along the columns and feature values along the rows. AIB iteratively merges the two values of the feature x whose merging causes the smallest decrease in the mutual information between the random variables x and c.

Merge operations are arranged in a binary tree. The nodes of the tree correspond to the original feature values and any other value obtained as a result of a merge operation. The nodes are indexed in breadth-first order, starting from the leaves. The first index is zero. In this way, the leaves correspond directly to the original feature values. In total there are 2*nvalues-1 nodes.

The results may be accessed through vl_aib_get_parents, which returns an array with one element per tree node. Each element is the index of the parent node. The parent of the root node is equal to zero. The array has 2*nvalues-1 elements.

Feature values with null probability are ignored by the algorithm and their nodes have parents indexing a non-existent tree node (a value bigger than 2*nvalues-1).
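The node indexing convention can be illustrated with a hypothetical parents array for nvalues = 3 (and hence 2*3-1 = 5 nodes); the merge sequence below is made up for the example.

```c
/* Leaves 0, 1, 2; the first merge joins leaves 0 and 1 into node 3,
 * the second joins node 3 and leaf 2 into the root, node 4. */
unsigned int parents [5] = {3, 3, 4, 4, 0} ;

/* Climb from any node to the root. Since node 0 is a leaf, it can
 * never appear as a parent, so a parent equal to zero marks the root. */
unsigned int
find_root (unsigned int const *parents, unsigned int node)
{
  while (parents [node] != 0) node = parents [node] ;
  return node ;
}
```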

The function also computes the information level after each merge. vl_aib_get_costs returns a vector with the information level after each merge. costs has nvalues entries: the first is the value of the cost functional before any merge, and the others are the cost after each of the nvalues-1 merges.
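Putting the calls together, a minimal usage sketch might look as follows. The joint probability table is made up for the example; compile and link against VLFeat.

```c
#include <stdio.h>
#include <vl/aib.h>

int
main (void)
{
  vl_uint nvalues = 3, nlabels = 2 ;
  /* joint probability table p(x,c): feature values along the rows,
   * labels along the columns */
  double Pcx [6] = { 0.30, 0.10,
                     0.15, 0.15,
                     0.10, 0.20 } ;
  vl_uint i ;

  VlAIB * aib = vl_aib_new (Pcx, nvalues, nlabels) ;
  vl_aib_process (aib) ;

  vl_uint * parents = vl_aib_get_parents (aib) ;
  double * costs   = vl_aib_get_costs (aib) ;

  /* one entry per tree node, leaves first */
  for (i = 0 ; i < 2 * nvalues - 1 ; ++i)
    printf ("node %u -> parent %u\n", i, parents [i]) ;

  /* information level before any merge, then after each merge */
  for (i = 0 ; i < nvalues ; ++i)
    printf ("cost after %u merges: %g\n", i, costs [i]) ;

  vl_aib_delete (aib) ;
  return 0 ;
}
```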