Current Events
12th DB Retreat, 2024
Past Seminars
On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML
Matthias Böhm
(Graz University of Technology, TU Graz)
31.01.2019, 15:00, room T03, Dept. of Computer Sciences
Large-scale machine learning (ML) underpins many applications that profoundly transform
our lives, but ML systems to execute these workloads are still in their infancy. In the first
part of this talk, we give an overview of Apache SystemML as a representative ML system
for declarative, large-scale ML. SystemML provides an R-like syntax and automatically
compiles these high-level linear algebra programs into hybrid runtime plans of single-node,
in-memory operations, and distributed operations on Spark. In the second part, we
then present a selected research result on optimizing operator fusion plans. The
opportunities for fused operators (i.e., fused chains of basic operators) are
ubiquitous and include fewer intermediates, scan sharing, and sparsity exploitation
across operators. However, existing fusion heuristics struggle to find good plans for
complex operator DAGs or hybrid plans. Therefore, we introduce an exact yet practical
cost-based optimization framework for fusion plans, including techniques for candidate
exploration, candidate selection, and code generation of local and distributed operations
over dense, sparse, and compressed data. Finally, we share some lessons learned and
ongoing work on properly supporting the entire end-to-end data science lifecycle.
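To make the benefit of fusion concrete, here is a minimal sketch in plain Python (not SystemML's actual code generator) for a toy expression sum(X * Y), chosen here for illustration. The unfused version materializes the element-wise product before aggregating it; the fused version computes the result in a single pass with no intermediate, and a sparse variant touches only the non-zero cells of X. The function names and the dictionary-of-non-zeros format are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def unfused_sum_product(X: Matrix, Y: Matrix) -> float:
    # Basic operator 1: element-wise multiply, materializing an intermediate matrix Z.
    Z = [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    # Basic operator 2: full aggregation, which scans Z a second time.
    return sum(sum(row) for row in Z)


def fused_sum_product(X: Matrix, Y: Matrix) -> float:
    # Fused operator: a single pass over X and Y, no intermediate is allocated.
    total = 0.0
    for rx, ry in zip(X, Y):
        for x, y in zip(rx, ry):
            total += x * y
    return total


def fused_sum_product_sparse(X_nnz: Dict[Tuple[int, int], float], Y: Matrix) -> float:
    # Sparsity exploitation: cells where X is zero cannot contribute to sum(X * Y),
    # so the fused operator only visits the non-zeros of X.
    return sum(v * Y[i][j] for (i, j), v in X_nnz.items())


X = [[1.0, 0.0], [0.0, 2.0]]
Y = [[3.0, 4.0], [5.0, 6.0]]
X_nnz = {(0, 0): 1.0, (1, 1): 2.0}
print(unfused_sum_product(X, Y), fused_sum_product(X, Y), fused_sum_product_sparse(X_nnz, Y))
# 15.0 15.0 15.0
```

All three variants compute the same value; they differ only in how many intermediates they allocate and how many cells they visit.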
Content Recommendation for Viral Social Influence
Panagiotis Karras (Aarhus University)
18.12.2017, 14:00, room T04, Dept. of Computer Sciences
How do we select content that will become viral in a whole network after
we share it with friends or followers? Significant research activity has
been dedicated to the problem of strategically selecting a seed set of
initial adopters so as to maximize a meme's spread in a network. Yet
this line of work assumes that the success of such a campaign depends
solely on the choice of a tunable set of initiators, regardless of how
users perceive the propagated meme, which is fixed. However, in many
real-world settings, the opposite holds: a meme's propagation depends on
users' perceptions of its tunable characteristics, while the set of
initiators is fixed.
We address the natural problem that arises in such circumstances:
Suggest content, expressed as a limited set of attributes, for a
creative promotion campaign that starts out from a given seed set of
initiators, so as to maximize its expected spread over a social network.
To our knowledge, no previous work addresses this problem. We find that
the problem is NP-hard and inapproximable. As a tight approximation
guarantee is not admissible, we design an efficient heuristic,
Explore-Update, as well as a conventional Greedy solution. Our
experimental evaluation demonstrates that Explore-Update selects
near-optimal attribute sets with real data, achieves 30% higher spread
than baselines, and runs an order of magnitude faster than Greedy.
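The conventional Greedy baseline mentioned above can be sketched as follows, under assumptions that are ours rather than the talk's: propagation follows an independent-cascade model, the probability that a user adopts the meme is the (capped) sum of that user's hypothetical affinity weights for the chosen attributes, and expected spread is estimated by Monte Carlo simulation. Explore-Update itself is not shown, and all names are illustrative.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, List[str]]            # directed edges u -> v (v follows u)
Affinity = Dict[str, Dict[str, float]]  # hypothetical: user -> attribute -> weight


def edge_prob(v: str, attrs: Set[str], affinity: Affinity) -> float:
    # Assumed model: v's chance of adopting grows with its affinity for the chosen attributes.
    return min(1.0, sum(affinity.get(v, {}).get(a, 0.0) for a in attrs))


def expected_spread(graph: Graph, seeds: Set[str], attrs: Set[str],
                    affinity: Affinity, runs: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)  # fixed seed so that estimates are reproducible
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:  # one independent-cascade simulation
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and rng.random() < edge_prob(v, attrs, affinity):
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs


def greedy_attributes(graph: Graph, seeds: Set[str], candidates: Set[str],
                      k: int, affinity: Affinity, runs: int = 200) -> Set[str]:
    chosen: Set[str] = set()
    for _ in range(k):
        best, best_val = None, -1.0
        # Try each remaining attribute and keep the one with the largest estimated spread.
        for a in sorted(candidates - chosen):
            val = expected_spread(graph, seeds, chosen | {a}, affinity, runs)
            if val > best_val:
                best, best_val = a, val
        if best is None:
            break
        chosen.add(best)
    return chosen
```

Each greedy round re-estimates the spread for every remaining attribute, which is why Greedy becomes costly on large networks; the talk reports that Explore-Update runs an order of magnitude faster.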
Panagiotis Karras (Panos) is an Associate Professor in Computer Science
at Aarhus University. His interests are in the confluence of data
management, data mining, and database security. He earned a PhD in
Computer Science from the University of Hong Kong and an MEng in
Electrical and Computer Engineering from the National Technical
University of Athens. He has held positions at Aalborg University, the
Skolkovo Institute of Science and Technology, Rutgers Business School,
the National University of Singapore, the University of Zurich, and the
Technical University of Denmark. Panos' work has been published in over
50 research articles, awarded by the Hong Kong Institute of Science, and
funded by the Lee Kuan Yew Endowment Fund and the Skolkovo Foundation.
He regularly serves as a program committee member and referee for the
major international conferences and journals in the above areas.
Unnesting Arbitrary Queries
Thomas Neumann
(Technical University of Munich, TUM)
28.10.2016, 11:00, room T03, Dept. of Computer Sciences
SQL-99 allows for nested subqueries at nearly all places within a
query. From a user's point of view, nested queries can greatly
simplify the formulation of complex queries. However, nested
queries that are correlated with the outer queries frequently lead
to dependent joins with nested loops evaluations and thus poor
performance. Existing systems therefore use a number of
heuristics to unnest these queries, i.e., de-correlate them. These
unnesting techniques can greatly speed up query processing, but
are usually limited to certain classes of queries. To the best of our
knowledge, no existing system can de-correlate queries in the
general case. We present a generic approach for unnesting
arbitrary queries. As a result, the de-correlated queries allow for
much simpler and much more efficient query evaluation.
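As a hedged illustration of what de-correlation buys, consider the hypothetical query SELECT o.id FROM orders o WHERE o.amount > (SELECT AVG(o2.amount) FROM orders o2 WHERE o2.cust = o.cust). The sketch below contrasts nested-loop evaluation of the correlated subquery with a classic rewrite that aggregates once per customer and joins the result back. This group-by rewrite covers only this one query shape; it is not the generic unnesting technique presented in the talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Order = Tuple[int, str, float]  # hypothetical schema: (id, customer, amount)


def nested_loop(orders: List[Order]) -> List[int]:
    # Correlated evaluation: the subquery is re-evaluated for every outer tuple, O(n^2).
    result = []
    for oid, cust, amount in orders:
        same = [a for _, c, a in orders if c == cust]
        if amount > sum(same) / len(same):
            result.append(oid)
    return result


def decorrelated(orders: List[Order]) -> List[int]:
    # Unnested evaluation: aggregate once per customer ...
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _, cust, amount in orders:
        sums[cust] += amount
        counts[cust] += 1
    avg = {c: sums[c] / counts[c] for c in sums}
    # ... then join back on customer via a hash lookup, O(n) overall.
    return [oid for oid, cust, amount in orders if amount > avg[cust]]
```

Both functions return the same ids; only the decorrelated one avoids rescanning the input for every outer row.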
Exploiting Knowledge Facets for Enhanced Information Search
Mouna Kacimi
(Free University of Bolzano)
29.04.2016, 15:30, room T06, Dept. of Computer Sciences
Search results about a given query topic are typically unstructured, making
it hard to understand the relationships between the different sources of
information. Thus, there is a need for organizing search results to help
users (1) gain more insights about query topics, and (2) have easy
access to information sources that spark their interest. This is
particularly helpful for ambiguous queries or faceted topics that involve
a variety of sub-topics, meanings, versions, arguments, opinions, and
many other aspects. In this talk, I present techniques that exploit
existing knowledge bases to enhance information search. I first show how
to exploit Wikipedia for query expansion and search results
diversification. Then, I turn to organizing information sources to enable
effective navigation through knowledge facets.
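For search results diversification, one widely used baseline (not necessarily the technique from the talk) is Maximal Marginal Relevance (MMR), which greedily trades off relevance against redundancy with already selected results. In the sketch below, facet labels stand in for knowledge-base facets such as the Wikipedia senses of an ambiguous query; the documents, scores, and the lam parameter are hypothetical.

```python
from typing import Callable, Dict, List


def mmr(candidates: List[str], relevance: Dict[str, float],
        sim: Callable[[str, str], float], k: int, lam: float = 0.5) -> List[str]:
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(d: str) -> float:
            # Penalize a document by its similarity to the closest already selected result.
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical results for the ambiguous query "jaguar", tagged with a facet.
facet = {"jaguar-car-1": "car", "jaguar-car-2": "car", "jaguar-cat-1": "animal"}
relevance = {"jaguar-car-1": 0.9, "jaguar-car-2": 0.85, "jaguar-cat-1": 0.6}
same_facet = lambda a, b: 1.0 if facet[a] == facet[b] else 0.0
print(mmr(list(facet), relevance, same_facet, k=2))
```

With lam = 1.0 the ranking is purely by relevance and returns the two car pages; with lam = 0.5 the second slot goes to the animal sense of "jaguar".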
Keyword-Based Querying with Local Intent
Christian Jensen
(Aalborg University)
15.04.2016, 09:00
Keynote at the Thesis Development Workshop of the Doctoral College GIScience.
Data Modeling in Application Development with NoSQL Databases
Stefanie Scherzinger
(OTH Regensburg)
17.10.2014
NoSQL databases are increasingly popular, particularly in web development.
Often the driving factor is the large volume of data to be managed, but
these systems also appeal to agile development teams because of their
schema flexibility. Since many NoSQL databases offer no support for
defining, enforcing, and maintaining a global schema, classic tasks of the
database management system shift into the application software. This talk
gives an overview of the concrete challenges that arise in practice when
designing a data model for key-value and document databases. These include
modeling that enables atomic updates, avoiding hot-spot data objects such
as those caused by high-frequency, parallel writes against the same object,
and strategies for handling continuous schema evolution. The talk shows
that the database community in particular, with its wealth of experience
in schema management and its broad repertoire of formal methods, can make
a valuable contribution here.
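One common data-modeling pattern for the hot-spot problem described above is a sharded counter: instead of a single object that every writer updates, the value is split across several objects, each write touches one randomly chosen shard, and reads sum all shards. The sketch below simulates the key-value store with a Python dict; the class, key format, and shard count are hypothetical, and a real store would additionally need each shard increment to be an atomic single-object update.

```python
import random
from typing import Dict


class ShardedCounter:
    """Hypothetical counter spread over several objects of a simulated key-value store."""

    def __init__(self, name: str, num_shards: int = 8, seed: int = 0):
        self.name = name
        self.num_shards = num_shards
        self.store: Dict[str, int] = {}  # stands in for the key-value store
        self.rng = random.Random(seed)

    def _key(self, shard: int) -> str:
        return f"{self.name}#shard{shard}"

    def increment(self, amount: int = 1) -> None:
        # Each write goes to one randomly chosen shard object, spreading concurrent writes.
        key = self._key(self.rng.randrange(self.num_shards))
        self.store[key] = self.store.get(key, 0) + amount

    def value(self) -> int:
        # Reads pay the price: they must sum over all shard objects.
        return sum(self.store.get(self._key(i), 0) for i in range(self.num_shards))


counter = ShardedCounter("page_views", num_shards=4)
for _ in range(100):
    counter.increment()
print(counter.value())  # 100
```

The design trades cheaper, less contended writes for more expensive reads, since value() has to fetch every shard.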
Similarity Queries
Yasin N. Silva (Arizona State University)
11.07.2013
Many application scenarios can significantly benefit from the
identification and processing of similarities in the data. Even though
some work has been done to extend the semantics of certain operators, e.g.,
join and selection, to be aware of data similarities, there has not been
much study on the role and implementation of similarity-aware operations
as first-class database operators. Furthermore, very little work has
addressed the problem of evaluating and optimizing queries that combine
several similarity operations. The focus of this presentation is the study
of similarity queries that contain one or multiple first-class similarity
database operators, e.g., Similarity Selection, Similarity Join, and
Similarity Group-by. We will present implementation techniques for several
similarity operators; a comprehensive conceptual evaluation model for
similarity queries; and a rich set of transformation rules to extend
cost-based query optimization to the case of similarity queries. We will
also discuss techniques to implement similarity operators using the
MapReduce framework to process massive datasets.
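The semantics of two of these operators can be illustrated on one-dimensional numeric data with a minimal sketch, assuming absolute difference as the distance: a similarity join that returns all pairs within distance eps, using a sorted inner input and binary search instead of a nested loop, and a similarity group-by that starts a new group whenever the gap between consecutive sorted values exceeds max_sep. The function names and the exact grouping rule are assumptions for illustration; the operators in the talk may be defined differently and support more options.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def similarity_join(R: List[float], S: List[float], eps: float) -> List[Tuple[float, float]]:
    # All pairs (r, s) with |r - s| <= eps, found via binary search on sorted S.
    S_sorted = sorted(S)
    out: List[Tuple[float, float]] = []
    for r in R:
        lo = bisect_left(S_sorted, r - eps)
        hi = bisect_right(S_sorted, r + eps)
        out.extend((r, s) for s in S_sorted[lo:hi])
    return out


def similarity_group_by(values: List[float], max_sep: float) -> List[List[float]]:
    # Start a new group whenever the gap to the previous sorted value exceeds max_sep.
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_sep:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


print(similarity_join([1, 5, 10], [2, 6, 20, 4], eps=1))
print(similarity_group_by([1, 2, 10, 11, 3, 30], max_sep=1))
```

On these inputs the join yields (1, 2), (5, 4), and (5, 6), and the grouping yields [1, 2, 3], [10, 11], and [30].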
DB Retreats
09.02.2024 - 11.02.2024 at Hotel Oberwirt, Feldthurns, Italy
03.02.2023 - 05.02.2023 at Hotel Oberwirt, Feldthurns, Italy
19.01.2020 - 21.01.2020 at Hotel Schneeberg, Ridnaun, Italy
07.02.2019 - 09.02.2019 at Waldheim, Martell, Italy
23.02.2018 - 25.02.2018 at Zur Goldenen Rose, Karthaus, Italy
17.02.2017 - 19.02.2017 at Hotel Traube, Graun, Italy
14.02.2016 - 16.02.2016 at Glieshof, Matsch, Italy
04.02.2015 - 06.02.2015 at Zur Goldenen Rose, Karthaus, Italy
08.03.2014 - 10.03.2014 at Das Gerstl, Burgeis, Italy
03.02.2013 - 05.02.2013 at Hotel Rainer, Sterzing, Italy
04.03.2012 - 06.03.2012 at Hotel Villa Waldkönigin, St. Valentin auf der Haide, Italy
16.02.2011 - 18.02.2011 at Hotel Cevedale, Sulden, Italy