Publication:
An Efficient Method for Mining Top-K Closed Sequential Patterns
Date
2020
Authors
Thi-Thiet Pham
Tung Do
Anh Nguyen
Bay Vo
Tzung-Pei Hong
Abstract
Mining Closed Sequential Patterns (CSPs) is an essential task in data mining, with many
applications. It is used to cope with huge databases or low minimum support (minsup) thresholds
in sequential pattern mining. However, tuning the minsup value so that mining yields the number
of CSPs a user wants is difficult and time-consuming. To address this issue, the TSP algorithm for
mining top-k CSPs was previously proposed, with k being a given parameter. The algorithm returns
the k CSPs that have the highest support values in a database. However, its execution time and
memory usage are high. In this paper, an algorithm named TKCS (Top-K Closed Sequences) is
proposed to mine the top-k CSPs efficiently. To reduce execution time and memory usage, it
represents the data as a vertical bitmap database. It also adopts several useful strategies while
mining the top-k CSPs, such as always choosing the sequential patterns with the greatest support
values for generating candidate patterns, and storing the top-k CSPs in ascending order of support
so that the minsup value rises more quickly. Empirical results show that TKCS outperforms TSP in
both runtime and memory usage when discovering the top-k CSPs.
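The two ideas named in the abstract, a vertical bitmap representation and a top-k buffer ordered by support that raises minsup as it fills, can be illustrated with a minimal Python sketch. This is not the authors' TKCS implementation: the names `vertical_bitmaps`, `support`, and `TopKBuffer` are hypothetical, the bitmaps record only which sequences contain an item (not positions inside a sequence), only single-item patterns are scored, and the closedness check is omitted.

```python
import heapq


def vertical_bitmaps(database):
    """Map each item to an int whose bit i is set if sequence i contains the item.

    Assumption: a database is a list of sequences, each a list of itemsets.
    """
    bitmaps = {}
    for sid, sequence in enumerate(database):
        for itemset in sequence:
            for item in itemset:
                bitmaps[item] = bitmaps.get(item, 0) | (1 << sid)
    return bitmaps


def support(bitmap):
    """Support = number of sequences containing the pattern = number of set bits."""
    return bin(bitmap).count("1")


class TopKBuffer:
    """Keeps at most k (support, pattern) pairs in a min-heap.

    The lowest support sits at the front, so once the buffer holds k
    patterns, that value becomes the new minsup (hypothetical helper).
    """

    def __init__(self, k):
        self.k = k
        self.heap = []
        self.minsup = 1

    def offer(self, pattern, sup):
        if sup < self.minsup:
            return  # cannot enter the top-k, so it is pruned
        heapq.heappush(self.heap, (sup, pattern))
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # evict the weakest pattern
        if len(self.heap) == self.k:
            self.minsup = self.heap[0][0]  # raise the threshold


# Toy database of four sequences.
database = [
    [{"a"}, {"b"}],
    [{"a"}, {"c"}],
    [{"a", "b"}],
    [{"c"}],
]

bitmaps = vertical_bitmaps(database)
buffer = TopKBuffer(k=2)

# Visit candidates with the greatest support first so minsup rises early.
for item, bitmap in sorted(bitmaps.items(), key=lambda kv: -support(kv[1])):
    buffer.offer(item, support(bitmap))
```

Visiting high-support candidates first fills the buffer with strong patterns early, so the threshold rises sooner and more low-support candidates can be discarded without further work.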
Keywords
Closed sequential pattern,
data mining,
sequential pattern,
top-k sequential patterns.