In Latent Semantic Indexing, a set of documents is represented by a matrix whose entries are measures of frequencies of occurrences of terms in each document. This matrix is then approximated by a sum of rank-one matrices determined using a singular value or related decomposition.
In this work we investigate the use of a semi-discrete approximation to the matrix in which each rank-one matrix can be represented by a scalar multiplying two vectors whose entries equal plus or minus one or zero. Compression rates are thus much greater than for other decompositions, and the processing of each query is much faster. We compare the methods on three databases and discuss the effects of various frequency measures.
This is joint work with Tamara Gibson Kolda.
Back to Workshop Schedule