Аспирантский семинар: Coordination Avoidance in Distributed Databases
Where: Faculty of Computer Science, Kochnovskii proezd, 3, room 205
When: 18:10 – 19:30 on February 7, 14, 21, and 28
Coordination Avoidance in Distributed Databases, the dissertation by Peter Bailis (Stanford University), recieved an Honorable Mention for the 2016 ACM Doctoral Dissertation Award. This dissertation will be in the focus of the four meetings of the PhD Seminar on February 7, 10, 21, and 28.
The rise of Internet-scale geo-replicated services has led to upheaval in the design of modern data management systems. Given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., “NoSQL”) has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forego the cost of this expensive coordination, and when must we pay the price? In this thesis, the author investigates the potential for coordination avoidance—the use of as little coordination as possible while ensuring application integrity—in several modern data-intensive domains. He demonstrates how to leverage the semantic requirements of applications in data serving, transaction processing, and web services to enable more efficient distributed algorithms and system designs. The resulting prototype systems demonstrate regular order-of-magnitude speedups compared to their traditional, coordinated counterparts on a variety of tasks, including referential integrity and index maintenance, transaction execution under common isolation models, and database constraint enforcement. A range of open-source applications and systems exhibit similar results.
During the first seminar, Liliya Ziganurova will introduce the problem addressed in the dissertation. We will see why coordination in distributed databases is costly using the results of a measurement study of network behaviour on Amazon’s Elastic Compute Cloud. It is shown that the latency and throughput of the communicational network makes coordination quite expensive.
We will discuss when coordination is needed for correctness and if it is possible to design a coordination-free database. We will see into a formal model for transaction execution and define desirable criteria for coordination-free execution, which implies availability, low latency, and scalability.
This time, we will examine how the invariant confluence property can be applied to the weak isolation guarantees found in today’s database systems. We will show how these guarantees can be implemented in a coordination-free manner if we can prove that they are invariant confluent. Slides
The second part of the seminar will be devoted to a suite of algorithms for enforcing a common semantics in partially-replicated databases. First, we will identify a new isolation level—Read Atomic isolation—that provides atomic visibility and matches the requirements of a large class of real-world applications. In particular, an overview of RAMP (Read Atomic Multi-Partition) transactions will be presented. We will present a detailed comparison with existing isolation guarantees. Besides, three RAMP algorithms will be experimentally evaluated. Slides
We will introduce three Read Atomic Multi-Partition (RAMP) transaction algorithms during this talk. We will also discuss possible ways to implement such algorithms in multiple datacenter network architecture. We will touch upon read-subset-items-written (RSIW) proof and correctness of provided algorithms.
We will discuss invariant confluence of SQL constraints in coordination-free databases. A case study will reveal which of the most frequently used constraints can be expressed in a coordination-free manner. We will show that this approach outperforms the classical one by orders of magnitude on TPC-C New-Order benchmark. Finally, we will take Ruby on Rails MVC framework as an example of a real-world API supporting coordination-free DB interaction and management and discuss its invariant confluence.
It is shown experimentally how unsafe application-level invariant enforcement can lead to database inconsistency in heavy load cases, using Ruby on Rails built-in validations as an example. During investigation of these issues, a bug was unciovered in PostgreSQL serializable isolation implementation. Anomalies in uniqueness and association validations were observed and quantified. These problems are not widely discussed in documentation and are not considered a bug by Ruby on Rails developers even though the default behavior is provably unsafe. Some other ORM frameworks exhibit the same attitude towards database-level consistency. It is concluded that databases should work with ORM and web framework developers to lessen the impedance mismatch and provide useful constraint primitives.
We will also discuss work related to the general themes of the thesis and examine specific areas in depth. We will conclude by discussing the main results of the dissertation, limitations and opportunities for future work in coordination avoidance.