Parallel and Distributed Data Management
(Non-Standard Database Systems)

Lecture:
Proseminar:
Semester:
SS 2024/2025

News

About this Course

This course comes in two variants: Variant A (UV, 5 ECTS) and Variant B (VO + PS).

Data Science students are encouraged to enroll in Variant A (5 ECTS UV rather than VO + PS) to take advantage of the midterms for the lecture part of the course. Recognition of Variant A as Variant B (as required for Data Science students) is guaranteed. For students enrolled in Variant B, the midterms count toward the lab grade (PS), and an additional final exam is required to pass the lecture (VO).

Lecture

Questions and discussions

For questions and discussions (also among students) regarding course specific topics, please use the Slack channel #pddm (Workspace dbteaching.slack.com).

Slack registration: sign up with your university email address at https://dbteaching.slack.com/signup

Schedule

The course schedule is listed in PlusOnline. Deviations will be announced explicitly in the Slack channel #pddm and/or on the course website.

Slides

Each slide set covers a specific topic area and will be discussed in one or more lecture units. Slides that have not yet been discussed in the lecture may still change. Once a slide set has been discussed in class, only bug fixes will be applied. Each slide set shows its version (date) on the title page.

The slides and their discussion during the lecture are essential for exam preparation.

Note: Last year's slides are already online to give you an overview, but they may still change.

Topic | Slides | Handouts | Literature
Database System Architectures | [1x1] [2x2] | - | DSC6 Ch. 17
Parallel Databases | [1x1] [2x2] | - | DSC6 Ch. 18
Distributed Databases | [1x1] [2x2] | 2-Phase-Commit; Persistent Messaging; Distributed Locking | DSC6 Ch. 19

Previous Knowledge Expected

Basics of transactions:
DSC6 14.1–14.2, 14.4–14.6
Concurrency Control:
2-Phase Locking (2PL): DSC6 15.1.3
Timestamp-Based Protocols: DSC6 15.4
Deadlocks: DSC6 15.2

Literature

DSC6 — Database System Concepts
Silberschatz, Korth, Sudarshan. Database System Concepts. McGraw-Hill, 2011, 6th edition.
Multiple copies of the book are available in the textbook collection of the department library (Itzling).
DSC7 — Database System Concepts
Silberschatz, Korth, Sudarshan. Database System Concepts. McGraw-Hill, 2019, 7th edition.
The book is available online through our university library.

Grading

The grading of the course is based on:

  1. Two midterms: you will write two midterm exams with 15 points each (30 points in total).
  2. Hands-on project: three assignments (see below) with 10 points each (30 points in total).

The overall score is the sum of the midterm score and the project score, with a maximum of 60 points. To pass the course, you need at least 15 points in the midterms and at least 15 points in the project.
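The pass rule can be stated as a small Python sketch. The function and constant names are hypothetical and this is not an official grading tool; the thresholds are the ones given above. Grade boundaries for the overall score are not listed on this page, so the sketch only decides pass or fail.

```python
# Hypothetical sketch of the pass rule described above (not an official grading tool).
MIDTERM_MAX = 30   # two midterms, 15 points each
PROJECT_MAX = 30   # three assignments, 10 points each
MIN_MIDTERM = 15   # minimum midterm score to pass
MIN_PROJECT = 15   # minimum project score to pass


def overall_score(midterm: float, project: float) -> float:
    """Overall score = midterm score + project score (maximum 60)."""
    if not (0 <= midterm <= MIDTERM_MAX and 0 <= project <= PROJECT_MAX):
        raise ValueError("score out of range")
    return midterm + project


def passes(midterm: float, project: float) -> bool:
    """Both parts must reach their minimum independently; a high score
    in one part does not compensate for a low score in the other."""
    overall_score(midterm, project)  # validates the ranges
    return midterm >= MIN_MIDTERM and project >= MIN_PROJECT


print(overall_score(20, 14), passes(20, 14))  # 34 False: project below 15
print(overall_score(15, 15), passes(15, 15))  # 30 True
```

The key design point is that the two minimums are checked separately: an overall score of 34 still fails if the project score is below 15.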

If you have already taken the lecture (e.g., last year) and passed the lecture exam, your exam grade can be accredited in place of the midterms: you receive 15 points for grade 4, 18.75 points for grade 3, 22.5 points for grade 2, and 26.25 points for grade 1. In that case, you do not have to write the midterms.
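The accreditation table is a linear scale: grade 4 (the lowest passing grade) maps to 15 points, and each better grade adds 3.75 points. A minimal Python sketch of the lookup, with a hypothetical function name, assuming only passing grades 1 to 4 can be accredited:

```python
# Hypothetical helper: maps a passed lecture-exam grade (1 = best, 4 = lowest pass)
# to accredited midterm points, following the values listed above.
ACCREDITED_POINTS = {4: 15.0, 3: 18.75, 2: 22.5, 1: 26.25}


def accredited_midterm_points(grade: int) -> float:
    if grade not in ACCREDITED_POINTS:
        raise ValueError("only passing grades 1-4 can be accredited")
    return ACCREDITED_POINTS[grade]


# Each better grade adds 3.75 points on top of the 15 points for grade 4.
assert all(ACCREDITED_POINTS[g] == 15 + (4 - g) * 3.75 for g in ACCREDITED_POINTS)

print(accredited_midterm_points(2))  # 22.5
```

The accredited points take the place of the midterm score, so they are added to the project score as usual.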

Midterm Exams

The midterm exams are planned for:

  1. Tue April 29, 15:00 (T03)
  2. Tue June 24, 15:00 (T03)

Each midterm exam lasts 60 minutes; together, the two midterms are worth 30 points.

Cheat sheet: You may use one A4 sheet with your personal notes (single-sided, handwritten or printed).

Previous exams: 20230627, 20230712, 20230919
Please note that these exams were part of the lecture (VO) variant of this course and differ in format from the midterm exams.

Project (Lab)

The goal of the project is to gain hands-on experience by working on three major programming assignments throughout the semester, in which we will implement parallel join algorithms using the Apache Spark framework.

For questions and discussions (also among students), please use the Slack channel #pddm-lab (Workspace dbteaching.slack.com).

Assignments

You will work on the project in teams of two. The assignment sheets will be published on Blackboard, and we will provide skeleton files in which you are expected to implement your solution. The skeleton files for all assignments are shipped in a single repository via GitHub Classroom. More details on the setup will be given in the kick-off meeting and in the first assignment.

Assignment | Total Points | Release Date | Due Date
A1: Setup & Warmup | 10 points | 18.03.2025 | 06.04.2025 @ 23:59
A2: Parallel Set Similarity Join | 10 points | 08.04.2025 | 04.05.2025 @ 23:59
A3: Fragment-and-Replicate Join | 10 points | 06.05.2025 | 01.06.2025 @ 23:59

Submission

Commit and push your team's solution to the repository provided by GitHub Classroom. You may push as often as you like; only the most recent commit on the main branch before the deadline will be graded. Do not publish your project in a public GitHub repository, and do not copy solutions from anyone else.
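The selection rule (latest commit on main that is not after the deadline) can be illustrated with a small Python sketch. The commit ids, timestamps, and the `graded_commit` helper are made up for illustration. Two assumptions are built in: timestamps are naive local times, and a commit at exactly 23:59 still counts. This page does not say whether commit time or push time is used, so treat the sketch as an illustration of the rule only.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical representation of the main-branch history: (commit id, timestamp).
Commit = Tuple[str, datetime]


def graded_commit(commits: List[Commit], deadline: datetime) -> Optional[str]:
    """Return the id of the latest commit made at or before the deadline, or None."""
    on_time = [c for c in commits if c[1] <= deadline]
    if not on_time:
        return None  # nothing submitted in time
    return max(on_time, key=lambda c: c[1])[0]


deadline_a1 = datetime(2025, 4, 6, 23, 59)  # A1 due date (assumed inclusive)
history = [
    ("a1f3", datetime(2025, 4, 2, 18, 30)),
    ("b7c9", datetime(2025, 4, 6, 22, 10)),
    ("c2d4", datetime(2025, 4, 7, 0, 15)),  # after the deadline: ignored
]
print(graded_commit(history, deadline_a1))  # b7c9
```

In other words, pushing a fix after 23:59 does not replace the on-time version; the earlier commit is the one that counts.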

Every assignment consists of multiple parts (an implementation plus a few questions), and each part is labeled with the number of points it is worth. Together, all assignments are worth 30 points.

Meetings

We will meet every three to four weeks according to the schedule below. In each unit, one or two teams will present their solutions, and we will discuss preliminaries for the upcoming assignment (attendance is compulsory). Between a unit and the deadline of the next assignment, an optional Q&A session is offered.

Date | Unit | Attendance
11.03.2025 | Kick-off meeting | compulsory
18.03.2025 | Unit 1: MapReduce and Apache Spark | compulsory
01.04.2025 | Q&A | optional
08.04.2025 | Unit 2: Set Similarity Joins | compulsory
29.04.2025 | Q&A | optional
06.05.2025 | Unit 3: Fragment-and-Replicate Joins | compulsory
20.05.2025 | Q&A | optional
03.06.2025 | Unit 4: Final Results | compulsory

Course Unenrollment

Unenrollment is possible only before the 3rd lab unit; all students who are still enrolled at the time of the 3rd lab unit will be graded.