Courses

Sommer Semester 2016/2017

Thesis proposals

Evaluating Non-standard Algorithms for the Tree Edit Distance

Trees are very common structures to represent data of hierarchical nature. Think of, for example, genealogical trees. Each data item is a node. Nodes have parents, children, and siblings. Such tree structures are used to represent a variety of data: RNA structures, neuronal cells, carbohydrates, sentence structures, websites, and many others. A very interesting problem is to find similarities between tree structures. A standard measure to do that is the tree edit distance.
[...Read more...]

Former Thesis Proposals

This thesis is in collaboration with FINDOLOGIC, a search engine for online shops. FINDOLOGIC motivates the problem as follows:
Eine Suche nach "Hemd" in einem Bekleidungs-Shop findet massenhaft Hemden, die alle gleich relevant sind. Im schlechtesten Fall kommen auf der ersten Seite der Ergebnisse nur blaue Herren-Hemden. Um für solche allgemeinen Anfragen einen guten Überblick über das Artikelsortiment zu liefern ist eine Durchmischung der Artikel wünschenswert, so dass z.B. Hemden in verschiedenen Farben gelistet werden.
[...Read more...]

In the world of rapid data growth, similarity search has emerged. Interesting questions are for example: Do we have in our dataset two different data items which represent the same real world entity? Which data items from two different datasets represent the same real world object? Are two data items similar and, if yes, how?
[...Read more...]

The tree edit distance measure is usually the first choice for comparing tree-structured data. It is defined as the minimum number of simple edit operations (delete, insert, and relabel) to transform one tree into another. However, due to its cubic runtime complexity it is disregarded in some applications. Efficient approximation techniques are required, especially for large input instances. Examples of large trees (having often millions of nodes) are merger trees of galaxies in astrophysics, or assets and bills of material in enterprise resource planning.
[...Read more...]

Many applications deal with hierarchical data. Examples are XML and JSON files which can be represented as trees. In data integration, a join matches elements that are common to two data sources. Since elements are represented slightly different in each source, an approximate join must be used to do the matching. Such a join searches for pairs of similar items. When dealing with massive datasets or large tree instances, efficient techniques must be developed to compute the join in reasonable time.
[...Read more...]

The tree edit distance (TED) is a commonly used measure for comparing rooted ordered labelled trees. TED is defined as the minimum number of simple node operations (delete, insert, relabel) that transform one tree into another. The first recursive formula for TED has been introduced in the late 1970's by Tai. Since then, several algorithms implementing that formula have been proposed. The recent and most efficient development is the RTED algorithm by Pawlik and Augsten (2011). Unfortunately, Demaine et al. have shown that all algorithms that implement the recursive formula by Tai have at least cubic runtime in the worst case.
[...Read more...]

This thesis is in collaboration with FINDOLOGIC, a search engine for online shops. FINDOLOGIC motivates the problem as follows:
Eine große Herausforderung bei der Suche in Online-Shops und der Suche allgemein ist es, die Absicht des Suchenden zu verstehen. Für einen Menschen ist klar, dass bei einer Suche nach "Smartphone" das Samsung Galaxy oder iPhone gewünscht ist. Vielfach werden aber Artikel wie "Smartphone Cover" und andere Zubehörteile gefunden, da in diesen das Wort "Smartphone" im Titel vorkommt, beim iPhone nicht. Gleiches gilt für "Festplatte" (findet den "Computer mit Festplatte", aber nicht die "Festplatte") oder den "Anzug" (findet den "Schlafanzug", obwohl sicherlich ein "Herren-Anzug" gemeint ist). Die spannende Frage ist hier, wie erkannt werden kann, dass ein iPhone ein "Smartphone" ist und der "Bugatti blau" ein "Herren-Anzug". Lösungen hierfür böten einen extremen Mehrwert für Online-Shops.
[...Read more...]