K-Nearest Neighbor Search for Fuzzy Objects
ABSTRACT
The K-Nearest Neighbor search (kNN) problem has been investigated extensively in the past due to its broad range of applications.
In this paper we study this problem in the context of fuzzy objects that have indeterministic boundaries. Fuzzy objects play an important role in many areas, such as biomedical image databases and GIS. Existing research on fuzzy objects mainly focuses on modelling
basic fuzzy object types and operations, leaving the processing of more advanced queries such as kNN query untouched. In this paper, we propose two new kinds of kNN queries for fuzzy objects, Ad-hoc kNN query (AKNN) and Range kNN query (RKNN), to find the k nearest objects qualifying at a probability threshold or within a probability range. For efficient AKNN query processing, we optimize the basic best-first search algorithm by deriving more accurate approximations for the distance function between fuzzy objects and the query object. To improve the performance of RKNN search, effective pruning rules are developed to significantly reduce the search space and further speed up the candidate refinement process. The efficiency of our proposed algorithms as well as the optimization techniques are verified with an extensive set of experiments using both synthetic and real datasets.
摘要(粗译)
K最近邻查询问题因为应用面广所以在过去被广泛研究.本文我们在模糊对象边界不可预测的情况下研究问题研究.模糊对象是一些领域中有着重要的作用,比如说生物医学图象数据库和GIS.现有的模糊对象研究主要集中在模糊对象基础模型类型与操作方面,而在诸如KNN等高级查询操作处理方面还没有被涉及过.本文中,我们设计了两个新的模糊knn查询类型,Ad-hoc KNN查询(AKNN)和范围kNN查询(RKNN),在限制概率阈值或概率范围条件下查找K最近邻目标,对于这样有效的AKNN查询处理,我们在模糊对象和查询对象之间建立起更精确的距离渐近函数,从而优化了基础的最好优先查询算法.为了提高RKNN查询的性能,我们开发了有效的剪枝技术压缩了查询空间并且加快了侯选提纯处理的速度.我们通过大量合成数据和真实数上做的实验据验证了设计算法技术的有效性.
===============以上为粗译,不准确的地方请指正=========
读后:
这个文章是发表在sigmod2010中的accepted paper,细读了一下各部分,是从定义模糊集,建立研究背景与两个查询定义并且在概率阈值和范围阈值的条件下设计算法的优化策略(优化了上边界和下边界),建立了一个渐近的边界函数,线性来代替,来提高运算效果,同时也对反k最近邻做了一些评述.后面的剪枝算法给出了一些证明,前半部分看得还行,但反最近邻查询处理与后面空间分析方面有些没太看懂.
这种模糊环境下的概率查询,不但可以用在KNN上,也可以用在topk,反最近邻与skyline中,作者也提出了下一步的相关研究方向.
我在想如果在移动条件环境下,这种模糊查询又会变成什么样子呢?也许这也值得研究.