Univ.-Prof. Dipl. Ing. Nikolaus Augsten, PhD
I was previously affiliated to the Faculty of Computer Science at the Free University of Bozen-Bolzano. In 2010/2011 I visited Prof. Alfons Kemper at Technische Universität München (TUM, Munich, Germany) for 6 months (my website at Technische Universität München). In 2005/2006 I spent 6 months at Washington State University with Prof. Curtis Dyreson (now at Utah State University). I received my PhD from Aalborg University, Denmark, in 2008. My supervisor was Prof. Michael Böhlen (University of Zurich).
Main Research Interests
My current research interests include data-centric applications in database and information systems with a particular focus on approximate matching techniques for complex data structures, efficient index structures for distance computations, and similarity search in massive data collections. My research is triggered by problems that arise in concrete applications, for example, e-government and XML search engines.
Selected Publications
- Manuel Widmoser, Daniel Kocher, Nikolaus Augsten. Scalable Distributed Inverted List Indexes in Disaggregated Memory. In Proc. ACM Manag. Data 2(3): 171 (SIGMOD 2024)
- Daniel Ulrich Schmitt, Daniel Kocher, Nikolaus Augsten, Willi Mann, Alexander Miller. A Two-Level Signature Scheme for Stable Set Similarity Joins. In Proc. VLDB Endow. 16(11): 2686-2698 (VLDB 2023)
- Manuel Widmoser, Daniel Kocher, Nikolaus Augsten, Willi Mann. MetricJoin: Leveraging Metric Properties for Robust Exact Set Similarity Joins. In IEEE International Conference on Data Engineering (ICDE 2023)
- Konstantin Emil Thiel, Daniel Kocher, Nikolaus Augsten, Thomas Hütter, Willi Mann, Daniel Schmitt. FINEX: A Fast Index for Exact & Flexible Density-Based Clustering. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD 2023)
- George Papadakis, Marco Fisichella, Franziska Schoger, George Mandilaras, Nikolaus Augsten, Wolfgang Nejdl. Benchmarking Filtering Techniques for Entity Resolution. In IEEE International Conference on Data Engineering (ICDE 2023)
- Pranay Mundra, Jianhao Zhang, Fatemeh Nargesian, Nikolaus Augsten. KOIOS: Top-k Semantic Overlap Set Search. In IEEE International Conference on Data Engineering (ICDE 2023).
- Oksana Dolmatova, Nikolaus Augsten, Michael H. Böhlen. A Relational Matrix Algebra and its Implementation in a Column Store. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD 2020)
-
D. Kocher, N. Augsten. A Scalable Index for Top-k Subtree Similarity Queries.
In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD 2019)
-
T. Hütter, M. Pawlik, R. Löschinger, N. Augsten. Effective Filters and Linear Time Verification for Tree Similarity Joins.
In Proceedings of the IEEE International Conference on Data Engineering (ICDE 2019)
-
F. Fier, N. Augsten, P. Bouros, U. Leser, J.-C. Freytag. Set Similarity Joins on MapReduce. An Experimental Survey.
In The Proceedings of the VLDB Endowment (PVLDB 2018),
PDF -
W. Mann, N. Augsten, P. Bouros. An Empirical Evaluation of Set Similarity Join Techniques.
In The Proceedings of the VLDB Endowment (PVLDB 2016),
PDF and source code -
M. Pawlik, N. Augsten. Efficient Computation of the Tree Edit Distance.
In ACM Transactions on Database Systems (TODS),
40(1):1-40, 2015.
PDF, Abstract -
N. Augsten, A. Miraglia, T. Neumann, A. Kemper.
On-the-Fly Token Similarity Joins in Relational Databases.
In Proceedings of the ACM SIGMOD International Conference on
Management of Data (SIGMOD-14), Snowbird, UT, USA.
June 2014. ACM.
PDF, all downloads -
S. Helmer, N. Augsten, M. Böhlen.
Information-theoretic approaches for measuring the structural similarity
of semistructured documents.
In The VLDB Journal (VLDBJ).
21(5):677-702, 2012. DOI: 10.1007/s00778-012-0263-0.
(GRIN*: A)
PDF, Abstract -
B. Gufler, N. Augsten, A. Reiser, A. Kemper.
Load Balancing in MapReduce Based on Scalable Cardinality Estimates.
In Proceedings of the International Conference on Data Engineering
(ICDE-12), Washington, DC, USA.
April 2012. IEEE Computer Society.
(GRIN*: A)
PDF, Abstract -
M. Pawlik, N. Augsten.
RTED: A Robust Algorithm for the Tree Edit Distance.
In PVLDB 5(4):334-345, 2011 (VLDB-12).
(GRIN*: A)
PDF, Abstract -
N. Augsten, M. Böhlen, C. Dyreson, and J. Gamper.
Windowed pq-Grams for Approximate Joins of Data-Centric XML.
In The VLDB Journal (VLDBJ).
21(4):463-488, 2012. DOI: 10.1007/s00778-011-0254-6.
(GRIN*: A)
PDF, Abstract -
N. Augsten, D. Barbosa, M. Böhlen, and T. Palpanas.
Efficient top-k approximate subtree matching in small memory.
In IEEE Transactions on Knowledge and Data Engineering
(TKDE).
23(8): 1123-1137, 2011.
(GRIN*: A)
PDF, Abstract -
N. Augsten, D. Barbosa, M. Böhlen, and T. Palpanas.
TASM: Top-k approximate subtree matching.
In Proceedings of the International Conference on Data Engineering
(ICDE-10), pages 353-364, Long Beach, California,
USA, March 2010.
IEEE Computer Society.
(acceptance rate: 12.9%, GRIN*:
A)
ICDE 2010 Best Paper Award
PDF, all downloads -
N. Augsten, M. Böhlen, and J. Gamper.
The pq-Gram Distance between Ordered Labeled Trees.
In ACM Transactions on Database Systems (TODS),
35(1):1-36, 2010.
(GRIN*: A)
PDF, all downloads -
N. Augsten, M. Böhlen, C. Dyreson, and J. Gamper.
Approximate Joins for Data Centric XML.
In Proceedings of the International Conference on Data Engineering
(ICDE-08), Cancún, Mexico, April 2008.
IEEE Computer Society.
(acceptance rate: 12.1%, GRIN*:
A)
PDF, all downloads -
N. Augsten, M. Böhlen, and J. Gamper.
An incrementally maintainable index for approximate lookups in
hierarchical data.
In Proceedings of the 32th International Conference on Very Large
Databases (VLDB-06), pages 247-258, Seoul, Korea,
Sep. 2006.
ACM Press.
(acceptance rate: 13.2%, GRIN*:
A)
PDF, all downloads -
N. Augsten, M. Böhlen, and J. Gamper.
Approximate matching of hierarchical data using pq-grams.
In Proceedings of the 31th International Conference on Very Large
Databases (VLDB-05), pages 141-152, Trondheim,
Norway, Aug.-Sep.
2005. ACM Press.
(acceptance rate: 16.5%, GRIN*:
A)
PDF, all downloads
Source Code Downloads
- Set Similarity Join Algorithms: C++ source code for many set similarity join algorithms. We used this source code in our experimental evalution of set similarity join algorithm (PVLDB 2016).
- Approximate Tree Matching Library: pq-gram distance and other tree distances (Java source code).
- Tree Edit Distance code: implementation of all important tree edit distance algorithms (RTED, Demaine, Klein, Zhang-Shasha) with detailed documentation (Java source code).
- Repeatability package including source code, the Bolzano Address Tree dataset and all other datasets used in the paper The pq-Gram Distance between Ordered Labeled Tree (TODS 2010).
Courses at other Universities
Lab Database Systems (Free University of Bozen-Bolzano)
Similarity Search (Free University of Bozen-Bolzano)
This course will discuss similarity search techniques for flat strings and
hierarchical data (for example, XML). Selected methods will be presented,
their effectiveness and efficiency will be discussed. Filtering techniques
to improve the efficiency will be introduced. The students will implement
similarity joins in a relational database management system.
Database Management and Tuning (Free University of Bozen-Bolzano)
This course will give an in-depth understanding of the features that
off-the-shelf database management systems offer, in particular with respect
to system performance. This knowledge is used to tune the database system
and its environment: dimension the hardware for the database system, write
efficient queries, set effective indexes, communicate with the database
efficiently, and diagnose performance problems.
Scalable Similarity Search Algorithms (Technische Universität München, WS 2010)
Approximation: Theory and Algorithms (Free University of Bozen-Bolzano)
This course will discuss approximate matching techniques for flat strings
and hierarchical data. Selected methods will be presented, their
effectiveness and efficiency will be discussed. Filtering techniques to
improve the efficiency will be introduced. The students will implement
approximate matching techniques in a relational database management system.