When I first started leading software projects that involved system design, I thought system design was some sort of dark magic that only a few special individuals possess. How do these people make decisions that can have such huge implications for the service we are building and, in many cases, for the success of an entire business? It all seemed so complex and so difficult to wrap your head around.
During that time, one concept I learned was the idea of system decomposition, an essence of system architecture: breaking the whole down into its component parts and then developing an understanding of how those parts interact with one another.
The analogy that came to mind almost instantly was designing a house. When you are tasked with designing a house, it seems daunting at first. But if we break it down into component parts, it becomes much easier to understand, as long as we also understand how all of the components interface with one another in a way that creates a safe and functioning home.
Naturally, I started breaking down a software system we were building into components. To do that, I started thinking about our system’s functional requirements: invoicing, billing, shipping. Thinking about these functions as separate system components made a lot of sense to me and made it easier to understand how the entire system needed to be built, one component at a time.
This, again, I thought, was similar to building a house: we have a kitchen, bathrooms, bedrooms, a garage, all parts of the house, each performing a separate function, so designing each separately makes the entire process seem more manageable.
However, as I looked deeper, I quickly realized that even though this approach appears simple on the surface, in practice it is really complex. Suppose we break things down into functional components: Invoice, Billing, Shipping. For the system to perform its function, A (invoice) would call B (billing), which in turn calls C (shipping). This tightly coupled design forces the logic in service A to include the calls to B, and the logic in B to include the calls to C. The entire system needs a great degree of synchronization and awareness of any errors that occur in any of its coupled services or sub-components.

What if we need to change one of the components? Because the services are so tightly coupled, a change in one means making changes everywhere. And if every single function is a separate service, pretty soon we may have hundreds of services to worry about. Worse, because there are so many ways to perform a particular function depending on the circumstances, each function’s business logic will only continue to grow in size and complexity.

If we instead make the client tier of our system responsible for orchestrating the different function-based services, the client ends up containing a lot of business logic that doesn’t belong there, and any service change requires significant changes on the client side as well. As if these problems weren’t enough, the client now needs multiple entry points into our system (via A, B, and C in our example), so we have a larger surface to worry about for security controls, authentication, and other failure modes.
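The coupling problem above can be made concrete with a small sketch. The service and method names below are invented for illustration; the point is that each service hard-wires a reference to the next one, so a change to Shipping ripples back through Billing and Invoice:

```python
class ShippingService:
    def ship(self, order_id: str) -> str:
        return f"shipped:{order_id}"


class BillingService:
    def __init__(self) -> None:
        # Hard-wired dependency: Billing constructs Shipping itself.
        self.shipping = ShippingService()

    def bill(self, order_id: str) -> str:
        # Billing must know how and when to trigger shipping,
        # and must handle shipping's failures itself.
        self.shipping.ship(order_id)
        return f"billed:{order_id}"


class InvoiceService:
    def __init__(self) -> None:
        # Hard-wired dependency: Invoice constructs Billing itself.
        self.billing = BillingService()

    def invoice(self, order_id: str) -> str:
        # Invoicing embeds the entire downstream call chain A -> B -> C.
        self.billing.bill(order_id)
        return f"invoiced:{order_id}"


# One call from the client drags the whole rigid chain along.
print(InvoiceService().invoice("A42"))  # prints "invoiced:A42"
```

Replacing ShippingService with a different implementation, or changing its interface, forces edits in BillingService, and potentially in InvoiceService and the client too.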
Even when we extend this logic to our house design, it just doesn’t make sense. Breaking down the house into functional components such as eating, sleeping, bathing, learning, and working doesn’t seem right.
Can we tell the builder to go ahead and start building the cooking part of the house? Do we put a stove on a bit of flat flooring, connect the power, build some walls and a roof over it, and call the cooking part done? This cooking part is not going to work as a part of our house. And how will the next functional part (e.g., sleeping) be added? Do we do the same thing we did for the kitchen all over again and call it done?
On top of that, system testability becomes a huge issue. Once we build one function, such as invoicing, can we test it? We could test invoicing as a unit, but how will it work as a part of the overall system? We won’t know until the other functional pieces are built. What if, by the time we are done with the shipping component, we discover that invoicing is broken? Making changes this late in the process is very costly.
Going back to our house design example, can we test the “Eating” part of the house on its own and declare it done?
So what is the right approach to system decomposition?
A much better approach is to decompose the system based on the areas of potential change and then encapsulate the areas of volatility as separate components.
For instance, our system has to read, write, and store data; implement security controls; perform CI/CD functions; provide a UX for its users; and so on. Each of those functions can be performed in a variety of ways depending on the nature of our data, the structure of our APIs, and the database type and schema we choose to use. Each of these components can be designed and fully tested on its own.
Similarly, we can test the power in our house independently from plumbing, for instance, with a high level of confidence that it will work correctly when the entire house is fully built.
It’s very important to be aware that we are not listing solutions, but rather our system’s requirements. For example, storage is a requirement (which can be solved in a number of ways, including caching), while a database is one possible solution. Solutions are implementation details, not core requirements. The idea is that we should be able to swap out any system component with minimal consequences to the rest of the system.
So how do we identify these areas of volatility?
Let’s think from a business perspective: what drives the decision to make a change in a system? It is typically either the evolving needs of our existing customers or new customers coming onboard and asking for different features. In summary, volatility can be associated with a single customer over time or across a multitude of customers.
Going back to our house example, what things do we see change in our own homes over time? Off the top of my head, I am thinking furniture, appliances, utilities, appearance (paint color, decorations, etc.), power (electrical system), plumbing, roofing, and structure (framing). All of these may be fully implemented by separate subject matter experts without the need for a whole lot of coupling. Each of these systems has a significant degree of volatility depending on the implementation. Take the electrical wiring, for instance: we have AC/DC, cabling length, termination, and so on.
Our goal is to encapsulate the components that are likely to change in our system over its lifetime. For instance, as our needs change, we may need to introduce a different storage system (such as AWS Aurora or S3 buckets), a different third-party authentication mechanism, a DDoS mitigation service, a different UI, and so on. These things can then change without us having to decommission and rebuild the entire system from scratch.
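As a minimal sketch of encapsulating one such volatile area, consider authentication. The provider names below (`token_auth`, `sso_auth`) and the credential formats are invented; the point is that the rest of the system depends on a single contract, so swapping providers is a one-line change:

```python
from typing import Callable

# The contract: given a credential string, decide if it is valid.
Authenticator = Callable[[str], bool]


def token_auth(credential: str) -> bool:
    # Hypothetical provider #1: opaque API tokens.
    return credential.startswith("tok-")


def sso_auth(credential: str) -> bool:
    # Hypothetical provider #2: SSO-issued credentials.
    return credential.startswith("sso-")


class Gateway:
    """The only entry point that cares about authentication."""

    def __init__(self, auth: Authenticator) -> None:
        # Injected, so the mechanism can change over the system's lifetime.
        self.auth = auth

    def handle(self, credential: str) -> str:
        return "ok" if self.auth(credential) else "denied"


# Swapping the third-party auth provider touches one constructor call:
print(Gateway(token_auth).handle("tok-123"))  # prints "ok"
print(Gateway(sso_auth).handle("tok-123"))    # prints "denied"
```

The same injection pattern applies to the other volatile areas mentioned above: storage, DDoS mitigation, or the UI layer can each sit behind its own contract.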
Thinking of the lifetime of our house, I can think of changing wiring, roofing, adding a room (framing, layout, etc), getting new appliances, changing interior / exterior paint color, etc. This is a good way to identify all possible areas of volatility that we should design as separate system components.
Does it make sense? Sure. Is it easy? It is not. It’s very hard to rewire our brains to identify areas of volatility as our system components, because we often gravitate to the comfort zone of functional decomposition. Just like with anything else in life, the only way to get better is practice. I find it very useful to practice this way of thinking by looking at software systems I am familiar with, past and current projects, or even everyday objects I use daily, like my bike, camera, and computers, and asking myself: how are these systems designed, and could they be improved with a volatility-based approach?
Unsurprisingly, the answer is often yes.