1. Foundations of Data Systems (DDIA)

I highly recommend the book Designing Data-Intensive Applications by Martin Kleppmann. This blog collects my notes on the book, with each post covering one chapter.

Many of today's applications are data-intensive rather than compute-intensive. Over the years, we have successfully abstracted data storage engines into databases. But not all databases are equal. Can we classify them based on some fundamental requirements? This chapter looks at three: reliability, scalability and maintainability.

Reliability

This is the ability of a system to keep working correctly even when faults occur. There is an important distinction between faults and failures.

  • Failure: The system as a whole stops providing the required service to the user.
  • Fault: One component of the system is not working correctly.

Faults can be further classified as

  • Software
    • Runaway process
    • Bugs
  • Hardware
    • Hard disks crash, RAM becomes faulty
  • Human Error
    • Operator Error
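The fault/failure distinction can be illustrated with a minimal sketch. It assumes a hypothetical setup where the same data is readable from several replicas; the names `ReplicaDown`, `read_with_failover`, `broken_replica` and `healthy_replica` are made up for this example and do not come from the book. A single faulty replica is tolerated, and the system only fails when every replica is faulty.

```python
class ReplicaDown(Exception):
    """Raised by a (hypothetical) replica that has suffered a fault."""


def read_with_failover(replicas, key):
    """Try each replica in turn; fail only if every replica is faulty."""
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaDown:
            # A fault: one component misbehaved, so try the next one.
            continue
    # A failure: the system as a whole cannot serve the request.
    raise RuntimeError(f"all {len(replicas)} replicas failed for key {key!r}")


def broken_replica(key):
    # Simulates a hardware fault on one machine.
    raise ReplicaDown("disk crashed")


def healthy_replica(key):
    return {"user:1": "alice"}[key]


value = read_with_failover([broken_replica, healthy_replica], "user:1")
print(value)
```

Here the crashed disk is a fault that the caller never notices. Passing only `broken_replica` would turn the same fault into a failure.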

Scalability

Distributing load across multiple machines is known as horizontal scaling (scaling out), also called a shared-nothing architecture. Vertical scaling (scaling up), i.e. moving to a more powerful machine, is simpler but often expensive. Some systems need to be elastic, i.e. able to add or remove resources as the load changes.

How can we describe a system's performance? Two questions help:

  • If the load increases while the resources (CPU, RAM etc.) stay the same, how is performance affected?
  • If the load increases, how much do the resources have to be increased to keep performance the same?

These are the key performance metrics:

  • Throughput - Number of records processed per second
  • Latency - Time a request spends waiting to be handled
  • Response Time - Time between the client sending a request and receiving the response, including service time, network delays and queueing delays. Response times vary a lot from request to request, so it is better to report percentiles (e.g. median, p95, p99) than a single average.
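To see why percentiles say more than an average, here is a small sketch. The sample response times are invented for illustration, and `percentile` is a helper written for this post using the nearest-rank method, which is only one of several common ways to define a percentile.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that
    at least p percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Made-up response times in milliseconds; two requests were very slow.
response_times_ms = sorted([12, 15, 11, 14, 13, 250, 16, 12, 18, 900])

p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)
mean = sum(response_times_ms) / len(response_times_ms)

print(f"p50={p50}ms p95={p95}ms p99={p99}ms mean={mean}ms")
```

With these samples the median is 14 ms while the mean is pulled above 100 ms by the two outliers, so the mean describes neither the typical request nor the slow ones. The high percentiles show what the slowest requests actually experience.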

For batch applications, throughput is the major consideration, whereas for online applications response time is more important.
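For a batch job, throughput can be estimated by dividing the number of records processed by the elapsed wall-clock time. The sketch below assumes a hypothetical per-record function; `measure_throughput` and `parse_record` are names invented for this example.

```python
import time


def measure_throughput(process, records):
    """Return records processed per second for one pass over `records`."""
    start = time.perf_counter()
    for record in records:
        process(record)
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        # Guard against a timer too coarse to measure a tiny batch.
        return float("inf")
    return len(records) / elapsed


def parse_record(line):
    # Stand-in for real per-record work.
    return line.split(",")


batch = [f"{i},user{i},ok" for i in range(100_000)]
records_per_second = measure_throughput(parse_record, batch)
print(f"{records_per_second:.0f} records/s")
```

The number printed depends on the machine, so it is only meaningful when compared across runs on the same hardware and load.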

Maintainability

Most of the cost of software lies in its ongoing maintenance rather than its initial development, yet maintainability is often overlooked while building it. Three design principles make software easier to maintain:

  • Operability - Make routine tasks easy
  • Simplicity - Good Abstractions to reduce complexity
  • Evolvability - Make changes easier

The goal is to design a reliable, scalable and maintainable application. This book explores the common patterns and techniques involved in creating such an application.