Database Reliability Engineering: Designing and Operating Resilient Database Systems

The infrastructure-as-code revolution in IT is also affecting database administration. With this practical book, developers, system administrators, and junior to mid-level DBAs will learn how the modern practice of site reliability engineering applies to the craft of database architecture and operations. Authors Laine Campbell and Charity Majors provide a framework for professionals looking to join the ranks of today’s database reliability engineers (DBRE).

You’ll begin by exploring core operational concepts that DBREs need to master. Then you’ll examine a wide range of database persistence options, including how to implement key technologies to provide resilient, scalable, and performant data storage and retrieval. With a firm foundation in database reliability engineering, you’ll be ready to dive into the architecture and operations of any modern database.

This book covers:

  • Service-level requirements and risk management
  • Building and evolving an architecture for operational visibility
  • Infrastructure engineering and infrastructure management
  • How to facilitate the release management process
  • Data storage, indexing, and replication
  • Identifying datastore characteristics and best use cases
  • Datastore architectural components and data-driven architectures

User Comments:

  • The authors have very, very deep knowledge – not just database specifics, but how the database interacts with applications and business requirements. They abstract their experience just enough to make it relevant to all data professionals, yet keep the language clear enough that it’s still directly mappable to the technologies you use today.
    It’s the kind of book that’s easy to read, and hard to implement. Seriously, just implementing the SLOs described in chapter 2 takes most traditional companies months to agree on and monitor.
    Over time, the brand names and open source tools will change, but the concepts are going to be rock solid for at least a decade. This book is a great waypoint marker set about 5-10 years in the future for most of us, but it’ll be one you’ll be excited to work towards.
  • A great book for anyone looking to take over the datastore layer of their growing company or for old schoolers looking to modernize their skills for the times to come.
    Laine founded one of the best and best-known MySQL DBA consultancies, and Charity took a bleeding-edge database like Mongo and helped shape it into a better technology. With both of them as authors, you get expertise and perspective geared towards quality of service and operating at scale, without any tech-worship bias or market-speak.
  • Database Reliability Engineering – highly recommended. I learned a lot and will be referring to this book over the course of the year as I make some changes in my company’s engineering practices.
    I expected this book to cover techniques and patterns for building reliability and resiliency into databases. It delivered on that, but lots more as well. The book starts with a discussion of SLAs, SLOs, risk management, and visibility – after all, you can’t really say meaningful things about reliability until you say what you mean BY reliability, in the context of your organizational commitments. Much of the rest of the book grounds back in this discussion in terms of how different techniques are relevant at different SLO and risk tradeoffs, and what things you need to watch to know how you’re doing relative to these tradeoffs. It also notes that you can’t prevent failures, so rather than trying to engineer with that as the goal (which results in fragile and brittle systems), it’s wiser to aim for resiliency and recoverability. We’re almost to page 75 before the book gets into meaty database topics like database-specific metrics (IOPS, TPS). We then dive into infrastructure, backup and recovery (and the importance of practicing – backup is boring, but when you need it, functional recovery is awesome like nothing else). The later chapters cover security, crypto, replication, and an overview of datastore architectures (not just relational). The book is a great blend of general guidelines and advice, concrete actions to consider, and architectural patterns that help you reason about data storage in general. Across all of the topics, you’ll hear a lot about automation, infrastructure as code, and the need to automate routine operations to ensure consistency, repeatability, and ability to execute in a hurry.
    This should not be your first book on databases. It assumes a significant level of knowledge about data storage, devops, and automation. However, if you’re responsible for keeping a database system up and running, because that database is keeping your business up and running, this book deserves a place on your bookshelf.
  • It is clear that Laine Campbell and Charity Majors know what modern database engineering is about. They know this from having worked in the field, they have fought through the issues, and they have found a way to describe and explain the concepts in a way that is informative, interesting, and relatable. Their writing feels as though it comes from a trusted adviser, from someone who truly understands and wants to share — without condescension.
    The topics in the book are grouped and sorted into well-ordered, carefully considered chapters, which makes it easier to internalize the information being presented. Topics range from deeply technical to higher-level soft skills, and all are on point and very applicable to DBRE. This book is for more than database developers and DBAs, though. Upper management, system architects, operational support engineers, and project managers would all do well to read and learn what this book has to offer. The chapter on Risk Management should be required reading for everyone who works in the computer industry.
  • A decade ago database guru Buck Woody wrote “Don’t be a DBA – Be a Data Professional”. His post explained why we needed to start being strategic (rather than tactical) with data. Soon after reading Buck’s post I was hired to lead the engineering team of a data management firm. In my new role I learned that there were points of stress between the engineering and operations teams, and much of that stress involved data and database issues. To get a feel for the goals and worldview of the operations domain I read books such as Schlossnagle’s “Scalable Internet Architectures” and Allspaw’s “Web Operations”, then I attempted to “shoehorn” the concepts of those books into the database realm.
    Things would have been much easier back then if the book “Database Reliability Engineering” had been available. First, data and databases are the focus and there is no need to attempt “shoehorning” them into concepts meant for another discipline. Second, the content of this book is backed by Laine and Charity’s many years of hands-on and intense data experience and their widely acknowledged expertise in database technologies. Finally, readers who put this book to practice will find it valuable that Service Level Objectives are introduced in an early chapter and then tied to concepts throughout the rest of the book.
    The obvious audience for this book will include those database developers and DBAs who want to be a Data Professional and see the bigger picture of their chosen field. However, this book should also be of interest to other technologists. Examples might be network administrators who find themselves tasked with database-related duties because their employer doesn’t have a dedicated DBA, system administrators interested in widening their skill set into the database field, or technology managers responsible for database resources.
