High-level efficient constraint dominance programming for pattern mining problems

  • Gokberk Kocak

Student thesis: Doctoral Thesis (PhD)

Abstract

Pattern mining is a sub-field of data mining that focuses on discovering patterns in data to extract knowledge. There are various techniques to identify different types of patterns in a dataset. Constraint-based mining is a well-known approach to this where additional constraints are introduced to retrieve only interesting patterns. However, in these systems, there are limitations on imposing complex constraints.

Constraint programming is a declarative methodology where the problem is modelled using constraints. Generic solvers can operate on a model to find the solutions. Constraint programming has been shown to be a well-suited and generic framework for various pattern mining problems with a selection of constraints and their combinations. However, a system that handles arbitrary constraints in a generic way has been missing in this field.

In this thesis, we propose a declarative framework where the pattern mining models can be represented in high-level constraint specifications with arbitrary additional constraints. These models can be efficiently solved using underlying optimisations.

The first contribution of this thesis is to determine the key aspects of solving pattern mining problems by creating an ad-hoc solver system. We investigate this further and create Constraint Dominance Programming (CDP) to be able to capture certain behaviours of pattern mining problems in an abstract way. To that end, we integrate CDP into the high-level \essence pipeline. Early empirical evaluation presents that CDP is already competitive with current existing techniques. The second contribution of this thesis is to exploit an additional behaviour, the incomparability, in pattern mining problems. By including the incomparability condition to CDP, we create CDP+I, a more explicit and even more efficient framework to represent these problems. We also prototype an automated system to deduct the optimal incomparability information for a given modelled problem. The third contribution of this thesis is to focus on the underlying solving of CDP+I to bring further efficiency. By creating the Solver Interactive Interface (SII) on SAT and SMT back-ends, we highly optimise not only CDP+I but any iterative modelling and solving, such as optimisation problems. The final contribution of this thesis is to investigate creating an automated configuration selection system to determine the best performing solving methodologies of CDP+I and introduce a portfolio of configurations that can perform better than any single best solver.

In summary, this thesis presents a highly efficient, high-level declarative framework to tackle pattern mining problems.
Date of Award14 Jun 2023
Original languageEnglish
Awarding Institution
  • University of St Andrews
SupervisorIan James Miguel (Supervisor) & Ozgur Akgun (Supervisor)

Keywords

  • Constraint programming
  • Data mining
  • CDP
  • SAT
  • SMT

Access Status

  • Full text open

Cite this

'