Thesis Proposals
A top-k subtree similarity query finds the k subtrees in a large
document tree that are most similar to a given reference tree (the query
tree). The similarity between two trees is assessed using the so-called
tree edit distance (TED), which is the minimum number of node
edit operations that transform one tree into another. The tree edit
distance for ordered trees (OTED) can be computed in cubic time and
quadratic space in the number of tree nodes. A tree is ordered if
the order among siblings is significant; otherwise, the tree is considered
unordered.
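To make the query definition concrete for the ordered case, the following is a minimal brute-force sketch. It is not the efficient solution of [1] and not the cubic-time algorithm mentioned above. It assumes unit costs for the three node edit operations (insert, delete, rename) and represents a tree as a nested tuple `(label, (child, ...))`. The names `forest_dist`, `ted`, `subtrees`, and `top_k_subtrees` are hypothetical and chosen only for illustration.

```python
from functools import lru_cache
import heapq

# Assumed representation: a tree is (label, children), children is a tuple of trees.
# A forest is a tuple of trees. Unit cost per insert, delete, or rename.

def forest_size(forest):
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Edit distance between two ordered forests (memoized textbook recursion)."""
    if not f:
        return forest_size(g)          # insert every node of g
    if not g:
        return forest_size(f)          # delete every node of f
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    return min(
        # delete the rightmost root of f; its children move up in its place
        forest_dist(f[:-1] + kids_f, g) + 1,
        # insert the rightmost root of g
        forest_dist(f, g[:-1] + kids_g) + 1,
        # map the two rightmost roots onto each other (rename if labels differ)
        forest_dist(f[:-1], g[:-1]) + forest_dist(kids_f, kids_g)
        + (label_f != label_g),
    )

def ted(t1, t2):
    """Ordered tree edit distance between two trees."""
    return forest_dist((t1,), (t2,))

def subtrees(tree):
    """Yield every subtree of the document in preorder."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def top_k_subtrees(document, query, k):
    """Return the k (distance, subtree) pairs with the smallest distance."""
    scored = ((ted(s, query), i, s) for i, s in enumerate(subtrees(document)))
    return [(d, s) for d, _, s in heapq.nsmallest(k, scored)]

# Small example: the query occurs exactly once in the document.
query = ("b", (("c", ()), ("d", ())))
document = ("a", (query, ("b", (("c", ()),)), ("e", ())))

result = top_k_subtrees(document, query, 2)
print([d for d, _ in result])  # [0, 1]
```

The sketch compares the query against every subtree of the document, so it is only practical for small inputs. It also relies on sibling order, which is exactly what no longer holds in the unordered setting.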
The Database research group recently proposed an efficient solution for
top-k subtree similarity queries for ordered labeled trees [1]. However,
the problem remains unsolved for unordered trees.

This thesis proposal is offered in collaboration with the Munich-based company CELONIS, the world market
leader in process mining. CELONIS motivates the problem as follows:
Celonis develops a process mining engine, written in C++ and Java, that is used by leading
companies worldwide to analyze and improve their business processes. The engine processes
queries in PQL, a proprietary DSL that we developed and optimized for process mining and
the snowflake schemas typically used in that setting.
To open the engine to additional tools, we would like to evaluate, as part of a
Master's or Bachelor's thesis, the options for supporting SQL.