Advances in Database Research 6
News
- Kickoff meeting: March 7, 09:00, T04. Attendance at the kickoff meeting is compulsory.
- This seminar can be accredited as Seminar aus Informatik (Master Informatik, Pflichtmodul P2 "Software Vertiefung").
General
Questions and discussions
For questions and discussions (including among students) on course-specific topics, please use the Slack channel #advances-in-database-research (workspace dbteaching.slack.com).
Slack registration: Students register with their university email here: https://dbteaching.slack.com/signup
Procedure and Evaluation
The seminar consists of presentations given by the students and discussions of those presentations.
Presentations: The students choose a current research paper (from a given list), which they prepare and present during the seminar. The presentation should be very detailed and didactically well prepared. Students should not just recite the explanations in the research paper, but acquire the necessary background knowledge and build their own understanding. Students should be able to respond to detailed questions and create and solve examples on their own.
Discussion: Students listen to presentations by fellow students and young researchers and engage with the content in discussions.
The quality of the presentation, the participation in the discussions, and the quality of the contributions to the discussion will be evaluated.
Research Papers
The papers are grouped by topic.
Fuzzy and Semantic Token Matching
- Jiannan Wang, Guoliang Li, Jianhua Feng: Fast-join: An efficient method for fuzzy token matching based string similarity join. ICDE 2011: 458-469
- Dong Deng, Albert Kim, Samuel Madden, Michael Stonebraker: SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints. Proc. VLDB Endow. 10(10): 1082-1093 (2017)
- Yeye He, Kris Ganjam, Xu Chu: SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora. Proc. VLDB Endow. 8(12): 1358-1369 (2015)
- Zhizhi Wang, Chaoji Zuo, Dong Deng: TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection. SIGMOD 2022. (PDF will be provided)
- Weiqi Feng, Dong Deng: Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts. SIGMOD 2021. (PDF will be provided)
- Pranay Mundra, Jianhao Zhang, Fatemeh Nargesian, Nikolaus Augsten: KOIOS: Top-k Semantic Overlap Set Search. ICDE 2023: to appear. (PDF will be provided)
Scalable Set Similarity Queries
- Chuan Xiao, Wei Wang, Xuemin Lin, Haichuan Shang: Top-k Set Similarity Joins. ICDE 2009: 916-927
- Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller: JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes. SIGMOD Conference 2019: 847-864
- Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyang Zhang, Yoshiharu Ishikawa: Local Similarity Search for Unstructured Text. SIGMOD Conference 2016: 1991-2005
- Manuel Widmoser, Daniel Kocher, Nikolaus Augsten, Willi Mann: MetricJoin: Leveraging Metric Properties for Robust Exact Set Similarity Joins. ICDE 2023: to appear. (PDF will be provided)
Data Processing Using Fast Networks
- Wolf Rödiger, Tobias Mühlbauer, Alfons Kemper, Thomas Neumann: High-Speed Query Processing over High-Speed Networks. Proc. VLDB Endow. 9(4): 228-239 (2015)
- Claude Barthels, Simon Loesing, Gustavo Alonso, Donald Kossmann: Rack-Scale In-Memory Join Processing using RDMA. SIGMOD Conference 2015: 1463-1475
- Carsten Binnig, Andrew Crotty, Alex Galakatos, Tim Kraska, Erfan Zamanian: The End of Slow Networks: It's Time for a Redesign. Proc. VLDB Endow. 9(7): 528-539 (2016)
- Erfan Zamanian, Carsten Binnig, Tim Harris, Tim Kraska: The End of a Myth: Distributed Transactions Can Scale. Proc. VLDB Endow. 10(6): 685-696 (2017)
Main-Memory and Hardware-Sensitive Database System Aspects
- Timo Kersten, Viktor Leis, Alfons Kemper, Thomas Neumann, Andrew Pavlo, Peter Boncz: Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask. Proc. VLDB Endow. 11(13): 2209-2222 (2018)
- Weijie Zhao, Shulong Tan, Ping Li: SONG: Approximate Nearest Neighbor Search on GPU. ICDE 2020
- Xuchuan Luo, Pengfei Zuo, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael R. Lyu, Yangfan Zhou: SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory. OSDI 2023
Discovering JSON Schemas
- William Spoth, Oliver Kennedy, Ying Lu, Beda Hammerschmidt, Zhen Hua Liu: Reducing Ambiguity in JSON Schema Discovery. SIGMOD 2021
- Mohamed-Amine Baazizi, Houssem Ben Lahmar, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani: Schema Inference for Massive JSON Datasets. EDBT 2017
Schedule
Location: Seminar room T04
Date | Time | Presenter | Title |
---|---|---|---|
2024-03-07 | 09:00-10:00 | Martin Schäler | Kickoff meeting (compulsory) |