Saturday, July 3, 2021

Event-Driven (Notification) Architecture with AWS

Event-driven architecture has become very popular. We recently built a billing system that is entirely based on event-driven architecture. Here is what we have learned.

When should we use Event-Driven architecture?

Event-driven architecture brings extra complexity (more error scenarios) and cost (more infrastructure) compared to API calls. So DO NOT use it unless you want:
  • to reverse dependencies
  • to split loads, e.g. generating 10k invoices takes a few minutes but processing 10k payments takes hours, so we can use a queue to decouple the 2 processes
  • eventual consistency (this requires additional event storage on the producer side so that events can be replayed)
And make sure the producer does not care about the process results (success or failure) from consumers.

There are 2 patterns used in our systems.

Pattern 1: one producer → many consumers (SNS)

We create an AWS SNS topic that allows many consumers to subscribe:
  • A Lambda is used if a processor (such as generating invoices) is required to handle the events
  • An SQS queue is introduced if a service needs to know about the events
  • Emails are sent if a third-party system (such as alerting) needs to be integrated

Error Handling

Consumers handle errors based on their own situations. The following scenarios are only for the producer:

Mitigating errors on the event producer rather than upstream isn't worth the effort, because it is impossible to guarantee 100% that incoming events are saved successfully. So make the event producer stateless, if you can, to remove the unnecessary complexity.
Additionally, the event producer should provide the capability to re-send events or query for historical events (it can rely on upstream to re-create the events).

Pattern 2: one producer → one consumer (SQS)

Push model: a Lambda gets triggered by every message

Pull model: the consumer pulls messages from the queue regularly

Note: the push model does not reverse the dependencies, since the implementations of AWS SQS and AWS Lambda are tightly coupled. However, the coupling can be removed by adding an SNS topic between the event producer and SQS; please refer to Pattern 1.

Error Handling:

All the scenarios in Pattern 1 apply.

Additionally, for the push model:
  • A DLQ (dead-letter queue) can easily be configured on the original SQS queue.
  • A simple Lambda can be introduced to copy messages between queues for replaying failed messages.
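
The replay idea can be sketched without any AWS dependency. The snippet below is only an in-memory illustration: plain `java.util.Queue` objects stand in for the DLQ and the original queue, and the class name `DlqReplayer` and the `maxMessages` cap are assumptions made for this sketch, not part of our system or of any AWS API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// In-memory stand-in for a "copy messages between queues" replayer.
// A real one would read from and write to SQS; that part is deliberately left out.
final class DlqReplayer {

    /** Moves up to maxMessages from the dead-letter queue back to the source queue, in order. */
    static int replay(Queue<String> deadLetterQueue, Queue<String> sourceQueue, int maxMessages) {
        int moved = 0;
        while (moved < maxMessages) {
            String message = deadLetterQueue.poll(); // null once the DLQ is empty
            if (message == null) {
                break;
            }
            sourceQueue.add(message);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        Queue<String> dlq = new ArrayDeque<>(Arrays.asList("payment-1", "payment-2", "payment-3"));
        Queue<String> source = new ArrayDeque<>();
        int moved = replay(dlq, source, 2);
        System.out.println(moved + " replayed, " + dlq.size() + " left in the DLQ");
    }
}
```

Capping the number of messages per run keeps a replay from flooding a consumer that has only just recovered.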

Best practices:

In an event-driven architecture, the delivery of messages is very hard to predict. So to reduce the number of error-handling scenarios, we can follow these practices:
  • idempotency (retry-safe design): each event can be safely processed multiple times without additional side effects
  • insensitivity to event order: the system should not assume that events always arrive in order. It should be able to handle the events regardless of the order.
  • fully automated monitoring is in place
  • the producer should not care about the process results (success or failure) from consumers
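
The first two practices can be illustrated with a small consumer. This is a minimal sketch under assumptions: the event shape (`eventId`, `agreementId`, `version`, `status`) and the class names are hypothetical, and the state is kept in memory where a real consumer would persist it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical event shape; every field name here is an assumption for this sketch.
final class AgreementEvent {
    final String eventId;     // unique per event, used for de-duplication
    final String agreementId; // the entity the event is about
    final long version;       // increases with every change to the agreement
    final String status;

    AgreementEvent(String eventId, String agreementId, long version, String status) {
        this.eventId = eventId;
        this.agreementId = agreementId;
        this.version = version;
        this.status = status;
    }
}

final class IdempotentAgreementConsumer {
    // In-memory state for illustration only.
    private final Set<String> processedEventIds = new HashSet<>();
    private final Map<String, AgreementEvent> latestByAgreement = new HashMap<>();

    /** Returns true if the event changed state, false if it was a duplicate or stale. */
    boolean handle(AgreementEvent event) {
        // Idempotency: a redelivered event is ignored.
        if (!processedEventIds.add(event.eventId)) {
            return false;
        }
        // Order insensitivity: an older version never overwrites a newer one.
        AgreementEvent known = latestByAgreement.get(event.agreementId);
        if (known != null && known.version >= event.version) {
            return false;
        }
        latestByAgreement.put(event.agreementId, event);
        return true;
    }

    /** Returns the latest known status, or null if the agreement is unknown. */
    String statusOf(String agreementId) {
        AgreementEvent known = latestByAgreement.get(agreementId);
        return known == null ? null : known.status;
    }
}
```

With this shape, delivering version 2 before version 1, or delivering either of them twice, leaves the consumer in the same final state.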

An Example

The diagram describes a billing system that listens to an agreement (purchase) event and generates recurring invoices every day.
a billing system


Event-driven architecture is very useful when we want to decouple some parts of a system. However, it also brings extra complexity to our overall system design and requires good developers who can follow the best practices to run it well. So use it carefully and make sure the cost pays off.


Saturday, June 5, 2021

Stay Objective in Technical Discussions

In my last post, I mentioned that long-running discussions and arguments are one of the biggest wastes in software development. I described a way to structure your “pros and cons” in technical discussions. However, if people keep expressing themselves subjectively, we’ll still suffer from endless arguments.

Today, I’m going to describe a framework that helps people focus on objective aspects when we compare technical options.

When should we apply this framework?

In my previous post (Best Time to do Technical Pre-design), I explained that the time spent in discussions is not worth it if the problem can be solved easily by just coding and refactoring. So this framework should only be used for the unavoidable discussions about complex problems.

For example:

  • Technical Vision
  • Aligning technical solutions with other teams
  • Technical solutions at the beginning of a project

It is not for choosing technologies such as a language or a type of database.

Before we dig into the framework, let’s have a look at the requirements.

Requirements keep changing

It is one of the always-true statements in software development. The requirements we think are important at the beginning might not be implemented after many years. Unexpected requirements come all the time. And the understanding of a requirement also changes over time.

So the stability of requirements looks like this:

Requirements change over time

Because of that, it is impossible to guess future requirements correctly at the beginning. It means changes are inevitable and unpredictable. So it is very important to maintain your systems in a state that is easy to change.


However, maintaining changeability is not easy.
Clean Architecture in Math describes that the smaller the scope a future change lands in, the less effort the change requires. Also, the fewer dependencies we have in our architecture, the lower the chance that a change lands in many places. (If two systems are tightly coupled, they likely need to be changed together.)
fewer dependencies, easier to change

In an organization, normally we have 5 levels of scope:

  • Organization level: a change involves multiple Verticals
  • Vertical level: a change involves multiple Domains within one Vertical
  • Domain/team level: a change involves multiple systems/components within a Domain
  • System level: a change lands in one system, including its databases
  • Code level: a change requires only code changes in a system
organization structure

So by reducing the higher level dependencies, we can reduce the scope of changes.

There are two types of dependencies:

  • Data dependency: the consumer needs to understand the data mastered by the provider, or vice versa
  • Function dependency: the consumer needs to call the provider to fulfil a requirement

The Framework

Now we know that to keep our systems changeable, we need to reduce the higher-level dependencies as much as possible.
We can compare technical options by visualising the dependencies and calculating the complexity of each option, so that the least complex solution can be picked to keep our systems changeable.

The steps:

  1. List requirements: only pick the requirements that you are going to support in the near future; do not pick any uncertain requirements
  2. Choose 2 to 3 levels: pick the levels relating to your decision (the following example chooses higher levels)
  3. List your options: describe your options with diagrams
  4. Count dependencies in each level for the options
  5. Summarise your overall dependencies
  6. Calculate the complexity of each option: you can simply triple the complexity of higher-level dependencies because changes in higher levels are much harder than changes in lower levels. You can also adjust the factor based on your context.
  7. Choose the least complex option
The table for the comparison looks like this:

If the difference between the options is small, it doesn't matter which one you pick.
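
Steps 4 to 7 can be sketched in a few lines. This is one possible reading of the weighting rather than a fixed rule: it assumes each level up multiplies the weight by the factor (1, 3, 9 for three levels with a factor of 3). The class name and the dependency counts are made up for illustration.

```java
// One reading of steps 4-7: count dependencies per level, weight them, compare the totals.
final class DependencyComplexity {

    /**
     * countsLowestToHighest[i] is the number of dependencies at level i, ordered from
     * the lowest chosen level (index 0) to the highest. Each level up multiplies the
     * weight by factor (assumption: 1, 3, 9, ... for factor 3).
     */
    static long complexity(int[] countsLowestToHighest, int factor) {
        long total = 0;
        long weight = 1;
        for (int count : countsLowestToHighest) {
            total += weight * count;
            weight *= factor;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up counts over the levels [Domain, Vertical, Organization].
        int[] optionA = {4, 1, 0}; // 4*1 + 1*3 + 0*9
        int[] optionB = {1, 1, 1}; // 1*1 + 1*3 + 1*9
        System.out.println("Option A: " + complexity(optionA, 3));
        System.out.println("Option B: " + complexity(optionB, 3));
    }
}
```

In this made-up case option A has more dependencies in total (5 versus 3) but still scores lower, because none of them sit at the organization level.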


Staying objective is one of the key aspects of keeping discussions valuable. This framework tries to give you a way to pick options with data instead of arguing. It helps you calculate the complexity of each technical option with math instead of talking about extensibility or flexibility based on feelings, so teams can reach an agreement without worthless arguments.

Monday, May 31, 2021

Structure your “pros and cons” in technical discussions

If an argument cannot be settled in 5 minutes, it will never be settled unless you bring some data. Uncle Bob mentioned this in his book The Clean Coder in 2011.

However, 10 years later, most people still use only “pros and cons” in technical discussions. Different roles (developers, BAs, QAs, product owners) bring their “pros and cons” from their own perspective, in their own language. They list everything they can think of, whether or not it matters in the discussion. The more people are involved, the longer the list gets. It ends up with a table like the one below:

options with pros and cons

How much does the table help you to choose one of the options?

Problems of “pros and cons”

  • Subjective: pros and cons listed by different people are from their own perspective, mostly they are not comparable
  • Ambiguous: the same point sometimes can be a pro in some context and a con in others
  • Lack of visualisation: having more pros or more cons doesn’t make an option better or worse
  • Not friendly to readers: readers have to read everything to understand each option
  • Time-consuming: the writer needs to type a lot of words

As we know, everything has pros and cons, so just listing them without a structure that allows people to compare options easily wouldn’t help to reach an agreement. It normally ends up with the tyranny of the majority or blind belief in authority.

Structure your “pros and cons”

The intention of “pros and cons” is to ask people to listen and understand others’ opinions. Get everyone to align on the following things during a decision-making process/discussion:

  • the aspects that matter to this decision (make them as objective as you can)
  • priority of the aspects
  • data or an agreed rating on each aspect for each option

All of them can normally be extracted from the original “pros and cons”. The final structure looks like this:

options comparison

The table visualises the things we care about and the trade-offs of each option. The discussions should focus on reaching agreement on each cell in the table instead of listing more “pros and cons”. Once the participants agree on the content of the table, an aligned decision is normally reached.


Technical discussions are one of the biggest wastes in software development nowadays. So stop wasting time arguing “pros and cons” with your colleagues. Start visualising them with others so that you can improve the efficiency of your decision-making process.

Sunday, April 25, 2021

How programming languages contribute to clean code

I had a conversation with some friends about the programming languages different companies use. Many companies only choose the most popular languages, such as Java and C#. They believe "Developers can write clean code in any language", so they just pick the one with the biggest resource pool to make hiring easier.

However, clean code is the key to success for a technology company. It allows the company to maintain a fast pace of delivery. Companies that limit the languages don't realize that they might be losing speed of delivery, which can be measured by the four key metrics.

What decides the level of clean code?

Good code is working code (passes all the tests) with very good readability (reveals intentions) and the fewest elements (no duplication). Good code also allows the reader to exit early, so they can understand it quickly by reading a minimal amount of information.

We have been trying to achieve that for decades. Many frameworks, libraries, and language features (I’ll call them “styles” in the rest of the article) have been introduced for this purpose: to allow developers to write less and clearer code that does more, and to separate the concerns of different levels of abstraction.

However, we have a problem: all of the styles are contextual. If we use a low-level style (close to computer interaction, such as copying an array from memory A to memory B) to solve high-level problems (close to the real world, such as generating an invoice), it is very hard to keep the code clean. But if we use a high-level style to solve high-level problems, we can write much cleaner code easily. So it’s all about putting the right styles in the right places. The more styles a programming language provides, the higher the chance of having clean code. But there is also a higher chance of having messy code if we fail to choose the right styles.

So many people think the level of clean code totally depends on the developers.

Is that true? Do languages have no contribution to clean code?

Styles supported by different languages

I choose 4 languages as examples, but keep in mind that there isn’t a language that supports all the styles (no language rules all):

  • Java - it is primarily an OO language. It has supported basic lambda expressions since Java 8. But the capability of the language is very limited. Most of the time we have to write code in low-level styles to solve problems.
  • C# - very similar to Java. But because of the LINQ syntax, which is more powerful than streams in Java, it allows developers to write cleaner code in high-level styles. However, the patterns are still very limited.
  • Kotlin - it has massive improvements on the basic syntax of Java. It also provides immutable collections, which give a much better experience than streams. However, due to the lack of ways to combine types and methods, it is still hard to keep complicated logic clean.
  • Scala - it is a multi-paradigm language. It supports many styles, from low-level to high-level. It supports 7 levels of mastery in functional programming, whereas Kotlin/Java/C# only support 2 levels because type classes can’t be implemented easily in them. Also, because of for-expressions and implicits, it allows developers to easily separate the different levels of abstraction.

From these examples, we can see that different languages support different numbers of styles. Some support many more styles than others, which means they provide more options that allow good developers to write cleaner code. On the other hand, it will be very hard for developers to write clean code if the right styles are not supported by a certain language.

Relationship among clean code, developers, and languages

So the relationship among clean code, developers, and languages looks like this:

Notice: the diagram does not say that one language is better than another; it says some languages provide more options than others. As long as developers make the right choices, they can write better code in one language than in others. And the more options a language provides, the better the code can be.


Changing a language does not guarantee an increase in your team's performance. However, stopping good developers from writing better code guarantees a decrease in your team's performance. Companies that want long-term success should allow good developers to choose a more powerful language when they reach the bottleneck. It helps the company to attract more good developers as well as to continuously improve the team's performance.


How do languages evolve?

By the way, good developers also try to extend the language to support the styles they want until it’s too hard, then they create a new programming language. Many new languages die because they couldn’t fit the purpose. Only a few survive with a strong community and its ecosystem. They can normally be safely chosen for commercial use.

Be careful about choosing a language for learning purposes

Trying new languages in a long-term system is dangerous. Writing clean code in a language requires a certain level of proficiency in that language. It takes time to learn. We should practice the new language in coding dojos first, then experiment with it in some short-term products or very simple products which you could rewrite within a week in a language you have mastered.

Sunday, April 18, 2021

Cohesion and coupling of an object in OO programming

I keep seeing developers extracting coupled logic into new classes, which reduces the cohesion of objects and makes objects tightly coupled to each other. This article describes a way to check the cohesion of the code in an object and the coupling between objects, which helps developers check whether a refactoring improves the cleanness of the code or not.

Please note that it cannot be applied to pure functional programming.


VF = number of private fields (no non-private getter or setter, not a property)
OF = number of non-private fields
U = number of usages of fields by public methods (count 1 per public method per field)
M = number of public methods

C = cohesiveness (0 to 1, 0 means the worst, 1 means the best)

C = U / ((M + OF) * (VF + OF))

Other rules:

  • protected counts as private
  • Calculation includes all fields and methods in the parent classes (except Object) 


F = number of fields (both private and non-private) as:
  F = VF + OF

R = number of references

P = coupling factor (0 to 1, 0 means no coupling, 1 means fully coupled to others)

P = R / F

I = factor of independence
I = 1 - P
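
The two formulas translate directly into code. The sketch below is a plain calculator over counts that you tally by hand. The class and method names are hypothetical, and throwing for a class with no fields or no counted members is my own choice, since the formulas leave that case undefined.

```java
// Calculator for the cohesion (C) and independence (I) formulas above.
final class ObjectMetrics {

    /** C = U / ((M + OF) * (VF + OF)) */
    static double cohesion(int u, int m, int vf, int of) {
        int denominator = (m + of) * (vf + of);
        if (denominator == 0) {
            // Assumption: the formula is undefined without fields or counted members.
            throw new IllegalArgumentException("cohesion is undefined for these counts");
        }
        return (double) u / denominator;
    }

    /** I = 1 - P, where P = R / F and F = VF + OF */
    static double independence(int r, int vf, int of) {
        int f = vf + of;
        if (f == 0) {
            throw new IllegalArgumentException("coupling is undefined for a class without fields");
        }
        return 1.0 - (double) r / f;
    }

    public static void main(String[] args) {
        // Counts of the "Mix" example (example 5): U = 3, M = 1, VF = 3, OF = 2, R = 1.
        System.out.println("C = " + cohesion(3, 1, 3, 2));
        System.out.println("I = " + independence(1, 3, 2));
    }
}
```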


1. Full cohesion:

public class Example {
    private final String value; // VF: +1

    public Example(String value) {
        this.value = value;
    }

    public int m1() { // M: +1
        return value.length(); // U: +1
    }
}

// C = 1/((1 + 0) * (1 + 0)) = 1

2. No cohesion:

public class Example {
    private final String value; // OF: +1 (exposed through a public getter)

    public Example(String value) {
        this.value = value;
    }

    public String getValue() { // M: does not count
        return value;  // U: does not count
    }
}

// C = 0/((0 + 1) * (0 + 1)) = 0

3. Half cohesion:

public class Example {
    private final String value; // OF: +1 (exposed through a public getter)

    public Example(String value) {
        this.value = value;
    }

    public String getValue() { // M: does not count
        return value;  // U: does not count
    }

    public int m1() { // M: +1
        return value.length() + value.indexOf("a"); // U: +1
    }
}

// C = 1/((1 + 1) * (0 + 1)) = 0.5

4. Coupling:

public class Reference {
    public int findI() {
        return 1;
    }
}

public class Example {
    private final Reference r1; // F: +1 & R: +1

    public Example(Reference r1) {
        this.r1 = r1;
    }

    public int m1() {
        return r1.findI();
    }
}

// P = 1/1 = 1
// I = 0

5. Mix

public abstract class Parent {
    protected final int a; // VF: +1

    public Parent(int a) {
        this.a = a;
    }
}

public class Reference {
    public int findI() {
        return 1;
    }
}

public class Example extends Parent {
    public final String f1; // OF: +1

    private final String f2; // OF: +1 (exposed through a public getter)

    public String getF2() { // M: does not count
        return f2;
    }

    private final String f3; // VF: +1;

    private final Reference r1; // VF: +1; R: +1

    public Example(String f1, String f2, String f3, int a, Reference r1) {
        super(a); // Parent has no default constructor, so it must be called explicitly
        this.f1 = f1;
        this.f2 = f2;
        this.f3 = f3;
        this.r1 = r1;
    }

    public int m1() { // M: +1
        return l() + a + r1.findI(); // U: +3 (f3, a, r1)
    }

    private int l() {
        return f3.length();
    }
}

// C (Example) = 3/((1 + 2)*(3 + 2)) = 0.2
// I (Example) = 1 - 1/(3 + 2) = 0.8

Monday, April 12, 2021

The best time to do technical pre-design

I've been seeing developers discussing the details of implementation a lot recently. They spend a lot of time writing documents to choose between data-model options, drawing flow diagrams to explain different ideas for a requirement, and discussing how entities should be designed in tech huddles. But they just don't write code.

Pre-designs are required, but not all the time. Sometimes the best way to find the best design is to write code. In this blog, I'm going to explain when we should do pre-designs and when we should not.

What is design (noun)?

  • High-level design: the structure of domains, the interactions (also called contracts or interfaces) between different high-level domains

  • Middle-level design: the structure in the high-level domains, which is about subdomains and systems, plus the interactions among them

  • Low-level design: the structure of code in a system, which is about components (packages, classes), plus interactions among them.

Interactions include:

  • what - input and output

  • how - invoke or subscribe, plus the direction of the dependency


What is good design?

A good design supports future changes at a low cost at all levels.

What is a design activity?

Try our best to make a series of decisions for good designs at different levels.


Can we get the best design from the first design activity?

Of course, NO. Decisions in design activities are subjective. They are very similar to bets. We are not able to verify their correctness until we implement them.

The only way to find the best design is through evolutions by following the 4 rules of simple design and YAGNI all the time. This is very similar to the TDD practice.

However, the cost of each evolution is also based on the level of designs. Code-level refactoring is much cheaper than the system level. The reason is the complexity of high-level designs is much higher than low-level designs.

Diagram 1: cost to make an evolution (design change) vs the complexity of the current design


The diagram shows that it is not possible to rely only on evolutions to find the best design when the system gets complicated. So we have to find another way to maintain such systems at a reasonable cost. Since we can’t reduce the cost of each change, the only way is to reduce the number of changes. This is why we need pre-design activities, which should help us to avoid some future changes. But please note that we’ll no longer be able to have the best design, and we have to rely on experts to make good decisions.

To summarize, there are 2 ways to make a good design:

  1. TDD - refactor to make the design better

  2. Pre-design - rely on experts to make good decisions


When is the best time to have pre-design activities?

Diagram 2: compare 2 ways to make a good design


The diagram shows that if the systems involved (e.g. systems in different domains) have high complexity, pre-design may perform better than the TDD approach. But keep the pre-design at a high level (like contracts between domains) and avoid designing the internals of each system, because TDD is still the best approach for internal designs, which don’t have such high complexity.



We should follow the TDD approach by doing a lot of refactorings to find the best design in low-level designs. We should do pre-design to get a good enough design in high-level designs. We can mix the 2 approaches in middle-level designs.


By the way, a small company such as a startup usually doesn’t have high-level designs. So they can iterate very fast without having pre-design activities. However, they must make sure to do refactorings to keep the systems clean so that they can maintain the speed of delivery.


Monday, February 8, 2021

Clean Architecture in Math

What is software architecture?

Architecture is more like the decisions you wish you could get right early in a project.

What is a clean software architecture?

An architecture full of good decisions. As:

CA = cleanness of architecture

GD = number of good decisions

AD = number of all decisions

CA = GD / AD


What is a good decision?

A good decision allows future changes to become easier. As:

F = number of future changes

LE = low-effort changes

ME = middle-effort changes

HE = high-effort changes

EHE = extremely-high-effort changes

F = LE + ME + HE + EHE

CA = GD / AD = (10 * LE + 5 * ME + 2 * HE) / (10 * F)


How do we measure effort?

Effort depends on the complexity of the target systems and parties (teams, domains, verticals) involved. As:

C = Complexity of a system

T = number of teams involved

D = number of domains involved

V = number of verticals involved

E = total effort

E = (C1 + C2 + ... + Cn) + (T - 1) + 10 * (D - 1) + 100 * (V - 1)

LE = count(0 < E <= 10)

ME = count(10 < E <= 30)

HE = count(30 < E <= 100)

EHE = count(E > 100)
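
To make the arithmetic concrete, here is a minimal sketch that computes E for one change, puts it into a bucket, and derives CA from a list of efforts using the formulas above. The class and method names are hypothetical and the input numbers in `main` are made up. Rejecting efforts that are not positive is an assumption, because the buckets start above 0.

```java
// Effort of a change, its bucket, and the resulting cleanness of architecture (CA).
final class ChangeEffort {

    enum Bucket { LE, ME, HE, EHE }

    /** E = (C1 + ... + Cn) + (T - 1) + 10 * (D - 1) + 100 * (V - 1) */
    static double effort(double[] systemComplexities, int teams, int domains, int verticals) {
        double sum = 0;
        for (double c : systemComplexities) {
            sum += c;
        }
        return sum + (teams - 1) + 10 * (domains - 1) + 100 * (verticals - 1);
    }

    /** Maps an effort onto the buckets defined above. */
    static Bucket bucket(double effort) {
        if (effort <= 0) {
            throw new IllegalArgumentException("effort must be positive");
        }
        if (effort <= 10) return Bucket.LE;
        if (effort <= 30) return Bucket.ME;
        if (effort <= 100) return Bucket.HE;
        return Bucket.EHE;
    }

    /** CA = (10 * LE + 5 * ME + 2 * HE) / (10 * F), where F is the number of changes. */
    static double cleanness(double[] efforts) {
        if (efforts.length == 0) {
            throw new IllegalArgumentException("at least one change is required");
        }
        int score = 0;
        for (double e : efforts) {
            switch (bucket(e)) {
                case LE: score += 10; break;
                case ME: score += 5; break;
                case HE: score += 2; break;
                default: break; // EHE contributes nothing
            }
        }
        return score / (10.0 * efforts.length);
    }

    public static void main(String[] args) {
        // Two systems (complexity 4.0 and 2.5) touched by 2 teams in 1 domain and 1 vertical.
        double e = effort(new double[] {4.0, 2.5}, 2, 1, 1);
        System.out.println("E = " + e + " -> " + bucket(e));
        // Four future changes, one in each bucket.
        System.out.println("CA = " + cleanness(new double[] {5, 20, 50, 150}));
    }
}
```

An architecture in which every future change is low-effort scores 1, and one in which every change is extremely-high-effort scores 0.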


How do we measure complexity?

I use an extremely simplified version here for easy understanding.

Complexity grows quadratically with the number of requirements in a system. As:

C = complexity

R = number of requirements

P = complexity constant (about 0.0004, depending on the language)

C = P * R * R
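
As a quick illustration of how fast this grows, the sketch below evaluates the formula for a few system sizes. The class name is hypothetical, and the value of P is the rough figure quoted above, not a measured constant.

```java
// C = P * R * R, evaluated for a few made-up system sizes.
final class SystemComplexity {

    static final double P = 0.0004; // rough constant from the text; language-dependent

    static double complexity(int requirements) {
        return P * requirements * requirements;
    }

    public static void main(String[] args) {
        for (int r : new int[] {50, 100, 200, 400}) {
            System.out.printf("R = %d -> C = %.2f%n", r, complexity(r));
        }
    }
}
```

Because R is squared, doubling the number of requirements multiplies the complexity by four, which is why limiting the size of a system matters so much for the effort of each change.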

Assume all the systems adopt continuous deployment.


CA = (10 * count(0 < E <= 10) + 5 * count(10 < E <= 30) + 2 * count(30 < E <= 100)) / (10 * F)

F is decided by the business which is decided by the market. It’s not changeable.

So clean architecture is trying to do the following things:

  1. Maintain a system in a proper size by limiting the number of requirements supported.
  2. Analyse potential future changes so that they can be anticipated
  3. Reduce the impact of future changes as much as possible
  4. Choose the options which support the above