Parallel and Distributed Data Management
(Non-Standard Database Systems)
News
- The lecture (VO) on March 11, 2025, will be held online in the form of a video on Blackboard. The course outline will be presented in the lab's kick-off meeting.
- The compulsory kick-off meeting for the lab is on March 11, 2025, at 15:00 in room T03.
About this Course
This course comes in two variants:
- Variant A for Computer Science students: combined lecture and lab (UV, Übung mit Vorlesung, i.e., exercise with lecture) with 5 ECTS.
- Variant B for Data Science students: lecture (VO, 2.5 ECTS) + lab (PS, 2.5 ECTS).
Data Science students are encouraged to enroll in Variant A (the 5 ECTS UV instead of VO+PS) to take advantage of the midterms for the lecture part of the course. Variant A is guaranteed to be recognized as Variant B, as required for Data Science students. For students enrolled in Variant B, the midterms count toward the lab grade (PS), and an additional final exam is required to pass the lecture (VO).
Lecture
Questions and discussions
For questions and discussions (also among students) regarding course specific topics, please use the Slack channel #pddm (Workspace dbteaching.slack.com).
Slack registration: Register with your university email address at https://dbteaching.slack.com/signup
Schedule
The course schedule is listed in PlusOnline. Any deviations will be announced explicitly in the Slack channel #pddm and/or on the course website.
Slides
Each set of slides covers a specific topic area and will be discussed in one or more lecture units. Slides that have not yet been discussed in the lecture may still change. Once a slide set has been discussed in class, only bug fixes will be applied. Each slide set shows its version (date) on the title page.
The slides and their discussion during the lecture are essential for the exam preparation.
Note: Last year's slides are already online to give you an overview, but they may still change.
Topics | Slides | Handouts | Literature |
---|---|---|---|
Database System Architectures | [1x1] [2x2] | — | DSC6 17 |
Parallel Databases | [1x1] [2x2] | — | DSC6 18 |
Distributed Databases | [1x1] [2x2] | 2-Phase-Commit, Persistent Messaging, Distributed Locking | DSC6 19 |
Previous Knowledge Expected
- Basics of transactions:
  - DSC6 14.1–14.2, 14.4–14.6
- Concurrency control:
  - 2-Phase Locking (2PL): DSC6 15.1.3
  - Timestamp-Based Protocols: DSC6 15.4
  - Deadlocks: DSC6 15.2
Literature
- DSC6: Database System Concepts
  - Silberschatz, Korth, Sudarshan. Database System Concepts. McGraw-Hill, 2011, 6th edition.
  - Multiple copies of the book are available in the textbook collection of the department library (Itzling).
- DSC7: Database System Concepts
  - Silberschatz, Korth, Sudarshan. Database System Concepts. McGraw-Hill, 2019, 7th edition.
  - The book is available online from our university library.
Grading
The grading of the course is based on:
- Two midterms: you will write two midterm exams with 15 points each (30 points in total).
- Hands-on project: consisting of 3 assignments (see below) with 10 points each (30 points in total).
If you have already taken the lecture (e.g., last year) and passed the lecture exam, your exam grade can be credited toward the midterms. You will receive 15 points for grade 4, 18.75 points for grade 3, 22.5 points for grade 2, and 26.25 points for grade 1 (1 is the best grade, 4 the lowest passing grade). Hence, if you have already passed the exam, you do not need to take the midterms.
Midterm Exams
The midterm exams are planned for:
- Tue April 29, 15:00 (T03)
- Tue June 24, 15:00 (T03)
Each midterm exam lasts 60 minutes; together, the two midterms are worth 30 points.
Cheat sheet: You may use one A4 sheet with your personal notes (single-sided, handwritten or printed).
Previous exams: 20230627, 20230712, 20230919
Please note that these exams were part of the lecture (VO) variant of this course and differ in format from the midterm exams.
Project (Lab)
The goal of the project is to gain hands-on experience by working on three major programming assignments throughout the semester, in which we will implement parallel join algorithms using the Apache Spark framework.
For questions and discussions (also among students), please use the Slack channel #pddm-lab (Workspace dbteaching.slack.com).
Assignments
You will work on the project in teams of two. The assignment sheets will be published on Blackboard, and we will provide skeleton files in which you are expected to implement your solution. The skeleton files for all assignments are shipped in a single repository via GitHub Classroom. More details on the setup will be given in the kick-off meeting and in the first assignment.
Assignment | Total Points | Release Date | Due Date |
---|---|---|---|
A1: Setup & Warmup | 10 points | 18.03.2025 | 06.04.2025 @ 23:59 |
A2: Parallel Set Similarity Join | 10 points | 08.04.2025 | 04.05.2025 @ 23:59 |
A3: Fragment-and-Replicate Join | 10 points | 06.05.2025 | 01.06.2025 @ 23:59 |
Submission
Commit and push your team's solution to the repository provided by GitHub Classroom.
You can push as many changes as you like; only the most recent commit on the main branch before the deadline will be graded.
Do not post your project on a public GitHub repository and do not copy solutions from anyone else.
Every assignment consists of multiple parts (implementation plus a few questions) and each part is labeled with the amount of points that can be achieved. In total, all assignments sum up to 30 points.
Meetings
We will meet every three to four weeks according to the schedule below. In each unit, one or two teams present their solutions, and we discuss the preliminaries for the upcoming assignment (attendance is compulsory). Between a unit and the deadline of the next assignment, an optional Q&A session is offered.
Date | Unit | Attendance |
---|---|---|
11.03.2025 | Kick-off meeting | compulsory |
18.03.2025 | Unit 1: MapReduce and Apache Spark | compulsory |
01.04.2025 | Q&A | optional |
08.04.2025 | Unit 2: Set Similarity Joins | compulsory |
29.04.2025 | Q&A | optional |
06.05.2025 | Unit 3: Fragment-and-Replicate Joins | compulsory |
20.05.2025 | Q&A | optional |
03.06.2025 | Unit 4: Final Results | compulsory |
Course Unenrollment
Unenrollment is possible only before the 3rd lab unit; all students who are still enrolled at the time of the 3rd lab unit will be graded.