Efficient Skyline Computation over Sliding Windows 摘要粗译
Abstract
We consider the problem of efficiently computing the skyline against the most recent N elements in a data stream seen so far. Specifically, we study the n-of-N skyline queries; that is, computing the skyline for the most recent n (?n ≤ N) elements. Firstly, we developed an effective pruning technique to minimize the number of elements to be kept. It can be shown that on average storing only O(logd N) elements from the most recent N elements is sufficient to support the precise computation of all n-of-N skyline queries in a d-dimension space if the data distribution on each dimension is independent. Then, a novel encoding scheme is proposed, together with efficient update techniques, for the stored elements, so that computing an n-of-N skyline query in a d-dimension space takes O(logN + s) time that is reduced to O(d log logN + s) if the data distribution is independent, where s is the number of skyline points. Thirdly, a novel trigger based technique is provided to process continuous n-of-N skyline queries with O(δ) time to update the current result per new data element and O(log s) time to update the trigger list per result change, where δ is the number of element changes from the current result to the new result. Finally, we extend our techniques to computing the skyline against an arbitrary window in the most recent N elements. Besides theoretical performance guarantees, our extensive experiments demonstrated that the new techniques can support on-line skyline query computation over very rapid data streams.
我们认为迄今为止关于在数据流以最新的N个元素为背景的skyline有效的计算问题有很多,特别地,我们研究了n个N skyline查询;也就是计算最近的N的元素的skyline计算。首先,我们研究了一个有效的剪枝技术来压缩保留的元素数量。若d维空间中每一维度上数据分布都是独立的话,平均存储中有O(logN)元素,从最新的N个元素中就足以精确计算出所有的n-N skyline 查询。第二、提出了一个新的编码方案和有效的更新技术用来存储元素。从而在多维空间中计算n-N skyline查询的时间,在数据分布独立的条件下,时间效率可以由O(logN+s)压缩到O(dloglogN+s),其中s是skyline点的数量。第三,假设了一个新的触发器技术用来处理连续的n-N skyline查询。用O(δ)时间来更新每一个新数据无素的默认值。用O(logs)时间来更新每个触发器列表。δ是从一个结果到新的结果是元素的变化量。最后,我们扩展了我们的技术在以任意窗口为背景的最近的N元中计算skyline。除了理论证明,我们通过充分的实验证明新技术可以在迅速的数据流中进行在线skyline查询计算。