An Approximate Search Framework for Big Data
In the age of big data, the traditional scan-based search pattern is becoming unfit for a satisfying user experience because of its lengthy computation. In this paper, we propose Hermes, a sampling-based approximate search framework that meets users' demands for results that are both accurate and efficiently obtained. We present a novel metric, (ε, δ)-approximation, that uniformly measures accuracy and efficiency for a big data search service and enables Hermes to work out a feasible search job. Based on this metric, we employ the bootstrapping technique to further speed up the search process. Moreover, we investigate an incremental sampling strategy for processing homogeneous queries, and we study a theory for reusing historical results when data are appended. Theoretical analyses and experiments on a real-world dataset demonstrate that Hermes produces approximate results that meet the preset query requirements with both high accuracy and efficiency.
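
As an illustration of how sampling, bootstrapping, and incremental sample growth can fit together, the following is a minimal sketch and not the Hermes implementation. The abstract does not define (ε, δ)-approximation, so the sketch assumes the common reading that the estimate should lie within relative error ε of the true value with probability at least 1 − δ. The function names, the batch size, the number of resamples, and the choice of a mean query are all hypothetical.

```python
import random
import statistics


def bootstrap_half_width(sample, delta, n_resamples=200, rng=random):
    """Half-width of a (1 - delta) percentile bootstrap interval for the mean."""
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lo = means[int((delta / 2) * (n_resamples - 1))]
    hi = means[int((1 - delta / 2) * (n_resamples - 1))]
    return (hi - lo) / 2


def approximate_mean(data, epsilon, delta, batch=500, seed=0):
    """Grow a random sample in batches until the bootstrap interval is tight.

    Assumes data is non-empty. Stops once the bootstrap half-width is at most
    epsilon * |estimate| (an assumed stopping rule), or when all data is used.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # a shuffled index gives sampling without replacement
    sample = []
    for start in range(0, len(order), batch):
        # Incremental step: extend the existing sample instead of redrawing it.
        sample.extend(data[i] for i in order[start:start + batch])
        estimate = statistics.fmean(sample)
        half_width = bootstrap_half_width(sample, delta, rng=rng)
        if half_width <= epsilon * abs(estimate):
            break
    return estimate, len(sample)


if __name__ == "__main__":
    gen = random.Random(1)
    data = [gen.gauss(100, 15) for _ in range(20000)]
    est, used = approximate_mean(data, epsilon=0.01, delta=0.05)
    print(f"estimate={est:.2f} using {used} of {len(data)} records")
```

The bootstrap interval here is only an empirical proxy for the (ε, δ) guarantee. Each batch extends the previous sample, so earlier work is reused rather than discarded when a tighter bound is needed.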