Publication:
An Efficient Method for Mining Top-K Closed Sequential Patterns

Date
2020
Authors
Thi-Thiet Pham
Tung Do
Anh Nguyen
Bay Vo
Tzung-Pei Hong
Abstract
Mining Closed Sequential Patterns (CSPs) is an essential data mining task with many applications. CSPs are used to cope with huge databases or low minimum support (minsup) thresholds in sequential pattern mining. However, tuning the minsup value so that it yields the number of CSPs a user wants is difficult and time-consuming. To address this issue, the TSP algorithm for mining top-k CSPs was previously proposed, with k being a given parameter. It returns the k CSPs with the highest support values in a database, but its execution time and memory usage are high. In this paper, an algorithm named TKCS (Top-K Closed Sequences) is proposed to mine the top-k CSPs efficiently. To reduce execution time and memory usage, it represents the data as a vertical bitmap database. It also adopts several strategies while mining the top-k CSPs, such as always choosing the sequential patterns with the greatest support values for generating candidate patterns, and storing the top-k CSPs in ascending order of support so that the minsup value can be raised more quickly. The empirical results show that TKCS outperforms TSP in both runtime and memory usage when discovering the top-k CSPs.
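The ideas named in the abstract (a vertical bitmap representation, expanding the highest-support pattern first, and keeping the current top-k ordered by support so that minsup can be raised) can be illustrated with a minimal Python sketch. This is not the TKCS implementation. It assumes every sequence element is a single item rather than an itemset, it omits the closure check, so it returns top-k sequential patterns rather than top-k closed ones, and all function and variable names (`build_bitmaps`, `s_extend`, `top_k_sequences`) are hypothetical.

```python
import heapq
from itertools import count


def build_bitmaps(db):
    """Vertical format: item -> {sequence id -> int with bit j set
    when the item occurs at position j of that sequence}."""
    bitmaps = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            per_seq = bitmaps.setdefault(item, {})
            per_seq[sid] = per_seq.get(sid, 0) | (1 << pos)
    return bitmaps


def s_extend(pattern_bits, item_bits):
    """Append an item to a pattern: keep the item's positions that lie
    strictly after the pattern's earliest possible end position."""
    out = {}
    for sid, bits in pattern_bits.items():
        lowest = bits & -bits               # earliest end position
        after = ~((lowest << 1) - 1)        # all strictly later positions
        hit = item_bits.get(sid, 0) & after
        if hit:
            out[sid] = hit
    return out


def top_k_sequences(db, k):
    """Best-first search for the k highest-support sequential patterns.
    Assumption: no closure check, so non-closed patterns may be returned."""
    bitmaps = build_bitmaps(db)
    minsup = 1
    top = []        # min-heap: the lowest support among the top-k is at top[0]
    frontier = []   # max-heap (negated support): best candidate is popped first
    tie = count()   # tie-breaker so the heap never compares dictionaries
    for item, bits in bitmaps.items():
        heapq.heappush(frontier, (-len(bits), next(tie), (item,), bits))
    while frontier:
        neg_sup, _, pattern, bits = heapq.heappop(frontier)
        sup = -neg_sup
        if sup < minsup:
            break   # every remaining candidate has support <= sup
        heapq.heappush(top, (sup, pattern))
        if len(top) > k:
            heapq.heappop(top)
        if len(top) == k:
            minsup = top[0][0]   # raise the threshold once k patterns are held
        for item, item_bits in bitmaps.items():
            ext = s_extend(bits, item_bits)
            if len(ext) >= minsup:
                heapq.heappush(
                    frontier, (-len(ext), next(tie), pattern + (item,), ext)
                )
    return sorted(top, key=lambda entry: (-entry[0], entry[1]))


if __name__ == "__main__":
    db = [list("abcb"), list("acb"), list("ab"), list("bc")]
    for support, pattern in top_k_sequences(db, 4):
        print(support, pattern)
```

Because an extension can never have higher support than its parent, popping candidates in descending support order lets the search stop as soon as the best remaining candidate falls below the current minsup. A closed-pattern variant would additionally have to discard any pattern that has a super-pattern with the same support.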
Keywords
Closed sequential pattern, data mining, sequential pattern, top-k sequential patterns.