WFM: Advances in Database Research 4
News
- Kickoff meeting: March 16, 13:00, T04. Attendance at the kickoff meeting is compulsory.
- This seminar can be accredited as Seminar aus Informatik (Master Informatik, Pflichtmodul P2 "Software Vertiefung").
General
Questions and discussions
For questions and discussions (also among students) regarding course-specific topics, please use the Slack channel #advances-in-database-research (workspace dbteaching.slack.com).
Slack registration: Students register with their university email here: https://dbteaching.slack.com/signup
Procedure and Evaluation
The seminar consists of presentations given by the students and discussions of these presentations.
Presentations: The students choose a current research paper (from a given list), which they prepare and present during the seminar. The presentation should be very detailed and didactically well prepared. Students should not just recite the explanations in the research paper, but acquire the necessary background knowledge and build their own understanding. Students should be able to respond to detailed questions and create and solve examples on their own.
Discussion: Students listen to presentations by fellow students and young researchers and engage with the content in discussions.
The quality of the presentation, the participation in the discussions, and the quality of the contributions to the discussion will be evaluated.
Research Papers
The papers are grouped by topic.
Fuzzy and Semantic Token Matching
- Jiannan Wang, Guoliang Li, Jianhua Feng: Fast-join: An efficient method for fuzzy token matching based string similarity join. ICDE 2011: 458-469
- Dong Deng, Albert Kim, Samuel Madden, Michael Stonebraker: SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints. Proc. VLDB Endow. 10(10): 1082-1093 (2017)
- Yeye He, Kris Ganjam, Xu Chu: SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora. Proc. VLDB Endow. 8(12): 1358-1369 (2015)
- Pranay Mundra, Jianhao Zhang, Fatemeh Nargesian, Nikolaus Augsten: KOIOS: Top-k Semantic Overlap Set Search. ICDE 2023: to appear. (PDF will be provided)
Scalable Set Similarity Queries
- Chuan Xiao, Wei Wang, Xuemin Lin, Haichuan Shang: Top-k Set Similarity Joins. ICDE 2009: 916-927
- Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller: JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes. SIGMOD Conference 2019: 847-864
- Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyang Zhang, Yoshiharu Ishikawa: Local Similarity Search for Unstructured Text. SIGMOD Conference 2016: 1991-2005
- Manuel Widmoser, Daniel Kocher, Nikolaus Augsten, Willi Mann: MetricJoin: Leveraging Metric Properties for Robust Exact Set Similarity Joins. ICDE 2023: to appear. (PDF will be provided)
Data Processing Using Fast Networks
- Wolf Rödiger, Tobias Mühlbauer, Alfons Kemper, Thomas Neumann: High-Speed Query Processing over High-Speed Networks. Proc. VLDB Endow. 9(4): 228-239 (2015)
- Claude Barthels, Simon Loesing, Gustavo Alonso, Donald Kossmann: Rack-Scale In-Memory Join Processing using RDMA. SIGMOD Conference 2015: 1463-1475
- Carsten Binnig, Andrew Crotty, Alex Galakatos, Tim Kraska, Erfan Zamanian: The End of Slow Networks: It's Time for a Redesign. Proc. VLDB Endow. 9(7): 528-539 (2016)
- Erfan Zamanian, Carsten Binnig, Tim Harris, Tim Kraska: The End of a Myth: Distributed Transactions Can Scale. Proc. VLDB Endow. 10(6): 685-696 (2017)
Main-Memory Database Systems
- Michael Stonebraker, Samuel Madden, Daniel Abadi, Stavros Harizopoulos, Nabil Hachem, Pat Helland: The End of an Architectural Era: (It's Time for a Complete Rewrite). VLDB 2007: 1150-1160
- Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, Peter Boncz: Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask. Proc. VLDB Endow. 11(13): 2209-2222 (2018)
- Maximilian Rieger, Moritz Sichert, Thomas Neumann: Integrating Deep Learning Frameworks into Main-Memory Databases. ADBIS 2023: to appear
Discovering JSON Schemas
- William Spoth, Oliver Kennedy, Ying Lu, Beda Hammerschmidt, Zhen Hua Liu: Reducing Ambiguity in JSON Schema Discovery. SIGMOD Conference 2021
- Mohamed-Amine Baazizi, Houssem Ben Lahmar, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani: Schema inference for massive JSON datasets. EDBT 2017
Schedule
Location: Seminar room T04
Date | Time | Presenter | Title |
---|---|---|---|
2023-03-16 | 13:00-14:00 | Nikolaus Augsten | Kickoff meeting (compulsory) |
2023-03-23 | 12:00-13:30 | Manuel Widmoser | MetricJoin: Leveraging Metric Properties for Robust Exact Set Similarity Joins |
2023-04-27 | 11:30-12:30 | Fabian Schwaiger | JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes |
2023-05-04 | 11:30-12:30 | Luis Mark Thiele | Fast-join: An efficient method for fuzzy token matching based string similarity join |
2023-05-04 | 12:30-13:30 | David Ralser | KOIOS: Top-k Semantic Overlap Set Search |
2023-05-11 | 12:30-13:30 | Konstantin Thiel | FINEX: A Fast Index for Exact & Flexible Density-Based Clustering |
2023-05-25 | 11:30-12:30 | Jonathan Lainer | The End of an Architectural Era: (It's Time for a Complete Rewrite) |
2023-05-25 | 12:30-13:30 | Alexander Posch | Integrating Deep Learning Frameworks into Main-Memory Databases |
2023-06-01 | 11:30-12:30 | Serhat Yalcin | High-Speed Query Processing over High-Speed Networks (CANCELED) |
2023-06-15 | 11:30-12:30 | Markus Diller | Schema inference for massive JSON datasets |
2023-06-15 | 12:30-13:30 | Bianca Löhnert | Ontology Based Data Access for GQL databases |
2023-06-22 | 11:30-12:30 | Begüm Tosun | The End of a Myth: Distributed Transactions Can Scale |