Fast generation of sequential patterns with item constraints from concise representations

Constraint-based frequent sequence mining is an important and necessary task in data mining since it shows results very close to the requirements and interests of users. Most existing algorithms for performing this task are based on a traditional approach that mines patterns directly from a sequence...


Bibliographic details
Main authors: Dương, Văn Hải; Trương, Chí Tín; Trần, Ngọc Anh; Bac Le
Format: Journal article
Language: English
Published: Springer Link 2021
Subjects: Sequential pattern; Generator and closed sequences; Equivalence relation; Partition; Constraint-based pattern mining; Item constraint
Online access: http://scholar.dlu.edu.vn/handle/123456789/555
Holding library: Thư viện Trường Đại học Đà Lạt (Dalat University Library)
id oai:scholar.dlu.edu.vn:123456789-555
record_format dspace
spelling oai:scholar.dlu.edu.vn:123456789-555 2023-03-09T22:52:17Z
Knowl Inf Syst 62 2191–2223 2020-11
2021-09-16T04:28:01Z
Journal article (article published in an ISI-indexed journal, including book chapters)
http://scholar.dlu.edu.vn/handle/123456789/555
DOI 10.1007/s10115-019-01418-2
Electronic ISSN 0219-3116 Print ISSN 0219-1377
en
Springer Link
institution Thư viện Trường Đại học Đà Lạt (Dalat University Library)
collection Thư viện số (Digital library)
language English
topic Sequential pattern
Generator and closed sequences
Equivalence relation
Partition
Constraint-based pattern mining
Item constraint
description Constraint-based frequent sequence mining is an important task in data mining because its results closely match the requirements and interests of users. Most existing algorithms for this task follow a traditional approach that mines patterns directly from a sequence database (SDB). In practice, however, SDBs are often very large. These algorithms therefore often perform poorly because the number of generated candidates and the search space are enormous, especially for low minimum support thresholds. In addition, they must read the SDB again whenever the user changes a constraint. When constraints change frequently, repeatedly scanning the SDB consumes much time. To address this issue, we propose a novel approach for generating frequent sequences with various constraints from two sets, the frequent closed sequences (FCS) and the frequent generator sequences (FGS), which are concise representations of the set FS of all frequent sequences. The proposed approach is based on novel, rigorously proved theoretical results that show an explicit relationship between FS and these two sets. The approach is then used to develop an efficient algorithm named MFS-IC for quickly generating frequent sequences with item constraints, a task that has many real-life applications. Extensive experiments on real-life and synthetic databases show that the proposed MFS-IC algorithm outperforms state-of-the-art algorithms, which mine frequent sequences with constraints directly from an SDB, in terms of runtime, memory usage and scalability.
format Journal article
author Dương, Văn Hải
Trương, Chí Tín
Trần, Ngọc Anh
Bac Le
title Fast generation of sequential patterns with item constraints from concise representations
publisher Springer Link
publishDate 2021
url http://scholar.dlu.edu.vn/handle/123456789/555
_version_ 1768306193049059328