  • Essay / Multidimensional Data Indexing for Nearest Neighbor Queries

    Most prior work in the database literature has focused on indexing lower-dimensional data and on other types of queries besides similarity queries. The k-d tree was one of the first structures proposed for indexing multidimensional data for nearest neighbor queries. More recently, it has been used in geographic information systems for queries such as similarity queries, and it could be useful for similarity indexing. Other methods, such as space-filling curves, linear quadtrees, and grid files, do not scale well to high dimensions but can be useful for medium-dimensional data.

    The R-tree and its most successful variant, the R*-tree, are the structures most often used in the database literature to index large data sets. However, because a range is stored for every dimension, the index needs more space and more search time as dimensionality grows. For this reason, higher-dimensional data is typically mapped into a lower-dimensional space before being indexed with R-trees.

    The TV-tree is so far the only method in the database literature proposed specifically for indexing high-dimensional data. Performance comparisons clearly show that the TV-tree can be much more efficient than the R*-tree. However, the improvement depends on two assumptions. The first is that the dimensions of the feature vectors are ordered by "importance". The second is that sets of feature vectors in the dataset will tend to match exactly on some dimensions, especially on the first, most "important" ones. The first assumption is reasonable (even desirable), since an appropriate transformation can be applied. The second assumption was not stated explicitly in the paper, but careful analysis of its algorithms reveals that the performance improvement depends on it.
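
To make the second assumption concrete, here is a minimal sketch in Python. It is not the TV-tree algorithm, only an illustration of what "matching exactly on the first dimensions" means. The helper names (`shared_prefix_len`, `fraction_matching`), the synthetic data, and the choice of four discrete levels are all assumptions made for this example.

```python
import random
from itertools import combinations

def shared_prefix_len(u, v):
    """Count the leading dimensions on which u and v are exactly equal."""
    n = 0
    for a, b in zip(u, v):
        if a != b:
            break
        n += 1
    return n

def fraction_matching(vectors, k=1):
    """Fraction of vector pairs that agree exactly on the first k dimensions."""
    pairs = list(combinations(vectors, 2))
    hits = sum(1 for u, v in pairs if shared_prefix_len(u, v) >= k)
    return hits / len(pairs)

# Hypothetical data: 200 vectors with 8 dimensions each.
rng = random.Random(0)
dims, n = 8, 200

# Features drawn from a small discrete set (four levels per dimension).
discrete = [tuple(rng.randrange(4) for _ in range(dims)) for _ in range(n)]

# Real-valued features drawn uniformly from [0, 1).
real_valued = [tuple(rng.random() for _ in range(dims)) for _ in range(n)]

# With four equally likely levels, about a quarter of pairs share dimension 0.
print(fraction_matching(discrete))

# Two independent random floats almost never coincide, so this is
# expected to be zero or extremely close to it.
print(fraction_matching(real_valued))
```

With discrete features a sizeable share of pairs agree on the leading dimension, so exact matches give an index something to exploit. With real-valued features such matches practically never occur.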
In some applications, the original feature vectors contain a small set of discrete quantities, so the second assumption holds. Unfortunately, it will not normally hold in visual information systems and many other applications. The features in these applications are usually real-valued, so the chance that two vectors match exactly on any dimension is negligible. In that case, the TV-tree reduces to an index on the first few dimensions only. Small changes to the proposed algorithms should let the TV-tree offer a slight improvement over the R*-tree in these applications. However, in this article we will refer to the R-tree (and its variants) as the best-known structure for similarity indexing, as it has proven itself in more similarity indexing applications.

There is also related work outside the database literature. In the information retrieval literature, work has been done on fides clusters which provide structures similar to the SS-tree. In the image database community, a..