Hello everyone, good to see you again! We had a good discussion on data sharing in our previous article. In this article, we are going to discuss event-driven design and the distributed data management problems that arise in a microservice architecture. As we discussed in our introductory articles, a monolithic application typically has a single relational database. The main benefit of a relational database is that the application can use ACID transactions, which provide Atomicity, Consistency, Isolation, and Durability.
So operations such as inserting, updating, and deleting multiple rows and then committing can be done easily. There are more benefits to using a relational database. It provides SQL, a rich, declarative, and standardized query language. One can easily write queries that combine data from multiple tables, and the RDBMS (Relational Database Management System) query planner determines an optimal way to execute them. One doesn't need to worry about low-level details of how to access the data. It is also easy to query data because all the application's data is in one database.
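To make the declarative nature of SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The "customers" and "orders" tables and their columns are purely illustrative; the point is that we declare what data we want, and the query planner decides how to fetch it.

```python
import sqlite3

# Illustrative schema: two tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

# A declarative query combining data from multiple tables: we state WHAT
# we want; the RDBMS query planner determines HOW to execute it.
rows = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(rows)  # [('Alice', 99.5)]
```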
But when we move to a microservice architecture, accessing data becomes complicated. Can you guess why? We discussed something related to APIs and microservices in our previous article. That is a clue! The reason is that the data owned by each microservice is private and can only be accessed through its API. Encapsulating the data in this way keeps the microservices loosely coupled, and loose coupling is a good thing. If multiple services accessed the same data directly, every schema update would require time-consuming coordination across all of them. In addition, different microservices use different kinds of data. Modern applications store and process diverse kinds of data, and a relational database isn't always the best choice. Interestingly, for some use cases, a particular NoSQL database might have a more convenient data model and could also offer better performance and scalability.
Consequently, microservices-based applications often use a mixture of SQL and NoSQL databases, the so-called polyglot persistence approach. It has many benefits, including loosely coupled services and better performance and scalability. But it also exposes the application to distributed data management challenges. The first challenge is how to implement business transactions that maintain consistency across multiple services. Many modern technologies, such as NoSQL databases, don't support the two-phase commit (2PC) mechanism, yet maintaining data consistency across services and databases is essential. So we need another solution.
The second challenge is how to implement queries that retrieve data from multiple services. Sometimes there is simply no obvious way to retrieve the needed data from a single service, so again we need a solution. All of this motivates event-driven design. So what is the event-driven architecture? What does it look like? Let us have a look.
In an event-driven architecture, a microservice publishes an event when something notable happens, for example, when it updates a business entity. Other microservices subscribe to these events in order to get notified. When a microservice receives a notification, it updates its own business entities, which may in turn lead to more events being published.
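The publish/subscribe mechanism can be sketched in a few lines. This is a toy in-process event bus, not a real message broker API; the event name and payload fields are illustrative.

```python
from collections import defaultdict

class EventBus:
    """Toy in-process stand-in for a message broker."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # A microservice registers interest in an event type.
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Notify every subscriber of this event type.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
received = []
# Another microservice subscribes; on notification it would update
# its own business entities (here we just record the event).
bus.subscribe("OrderCreated", lambda evt: received.append(evt))
bus.publish("OrderCreated", {"order_id": 42, "status": "NEW"})
print(received)  # [{'order_id': 42, 'status': 'NEW'}]
```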
To implement business transactions that span multiple services, one can use events. Each business transaction consists of a series of steps, and each step consists of a microservice updating a business entity and publishing an event that triggers the next step. The microservices exchange these events through a message broker. Let us take the example of an event-driven approach to checking the available credit before creating an order.
First of all, the Order Service creates an order with the status NEW and publishes an Order Created event. The Customer Service consumes the Order Created event and reserves credit for that order. It then publishes a Credit Reserved event. The Order Service consumes the Credit Reserved event and changes the status of the order to OPEN. In this manner, the business transaction is implemented as a chain of events.
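The three steps above can be sketched end to end. A simple in-process bus stands in for the message broker, and the two handler functions play the roles of the Customer Service and the Order Service; all names are illustrative.

```python
from collections import defaultdict

class Bus:
    """Stand-in for a message broker."""
    def __init__(self):
        self.handlers = defaultdict(list)
    def subscribe(self, event, fn):
        self.handlers[event].append(fn)
    def publish(self, event, data):
        for fn in self.handlers[event]:
            fn(data)

bus = Bus()
orders = {}

def on_order_created(evt):
    # Customer Service: confirm credit for the order, then announce it.
    bus.publish("CreditReserved", {"order_id": evt["order_id"]})

def on_credit_reserved(evt):
    # Order Service: credit is reserved, so open the order.
    orders[evt["order_id"]]["status"] = "OPEN"

bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("CreditReserved", on_credit_reserved)

# Step 1: Order Service creates the order in state NEW and publishes.
orders[1] = {"status": "NEW"}
bus.publish("OrderCreated", {"order_id": 1})
print(orders[1]["status"])  # OPEN
```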
If the application is more complex, it may involve additional steps, such as reserving inventory at the same time the customer's credit is checked. But there are some conditions. First, each service must atomically update the database and publish the event. Second, the message broker must guarantee that events are delivered at least once. Only then can one implement business transactions in this way. Please note that these aren't ACID transactions; they offer much weaker guarantees, such as eventual consistency.
An event-driven architecture has its own advantages and disadvantages. The advantage is that it allows the implementation of transactions that span multiple services and provide eventual consistency. It also makes it possible to maintain materialized views. Now let us discuss the disadvantages. The programming model is more complex than when using ACID transactions. One must implement compensating transactions to recover from application-level failures, for example, cancelling the order if the credit check fails. The application must also be able to deal with inconsistent data, because changes made by in-flight transactions are visible, and because it might read a materialized view that hasn't been updated yet. In addition to all this, subscribers must detect and ignore duplicate events.
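That last point, detecting and ignoring duplicates, is worth a small sketch. With at-least-once delivery the broker may redeliver an event, so a subscriber can track the ids of events it has already processed. The `event_id` field and the set-based bookkeeping here are illustrative assumptions, not a prescribed design.

```python
# Idempotent consumer sketch: assumes each event carries a unique id.
processed_ids = set()
balance = 0

def handle_credit_reserved(event):
    """Apply the event at most once, even if it is delivered twice."""
    global balance
    if event["event_id"] in processed_ids:
        return  # duplicate delivery: ignore it
    processed_ids.add(event["event_id"])
    balance += event["amount"]

evt = {"event_id": "e-1", "amount": 50}
handle_credit_reserved(evt)
handle_credit_reserved(evt)  # redelivered by an at-least-once broker
print(balance)  # 50, not 100
```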
As mentioned earlier, in an event-driven architecture there is the problem of atomically updating the database and publishing the event. A service must perform both as one atomic operation: if it crashes after updating the database but before publishing the event, the system becomes inconsistent. The straightforward way to guarantee atomicity would be a distributed transaction involving the database and the message broker, but as described above, 2PC is often not an option. Fortunately, there are a few other ways to achieve atomicity, and we will discuss them now.
In the first approach, the application publishes events using a multi-step process that involves only local transactions. The trick is to have an EVENT table, which functions as a message queue, in the same database that stores the state of the business entities. The application begins a database transaction, updates the state of the business entity, inserts an event into the EVENT table, and commits the transaction. A separate application thread then queries the EVENT table, publishes the events to the message broker, and uses a local transaction to mark each event as published.
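The key idea, one local transaction covering both the state update and the event insert, can be sketched with SQLite. The table and column names are illustrative, and the broker call is left as a commented placeholder.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "payload TEXT, published INTEGER DEFAULT 0)")

# One LOCAL transaction: update the business entity AND insert the event.
# Either both commit or neither does, so no 2PC is needed.
with conn:
    conn.execute("INSERT INTO orders (id, status) VALUES (1, 'NEW')")
    conn.execute("INSERT INTO events (payload) VALUES (?)",
                 (json.dumps({"type": "OrderCreated", "order_id": 1}),))

# A separate publisher thread would poll the EVENT table, publish each
# event to the broker, and mark it as published in another local transaction.
unpublished = conn.execute(
    "SELECT id, payload FROM events WHERE published = 0").fetchall()
for event_id, payload in unpublished:
    # broker.publish(payload)  # hypothetical broker call
    conn.execute("UPDATE events SET published = 1 WHERE id = ?", (event_id,))
conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM events WHERE published = 0").fetchone()[0]
print(remaining)  # 0
```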
There are two main advantages of this approach. First, it guarantees that an event is published for each update without relying on 2PC. Second, the application publishes business-level events, which eliminates the need to infer them. The drawback is that this approach is error-prone, since the developer must remember to publish events. It is also challenging to implement with some NoSQL databases because of their limited transaction and query capabilities.
In the second approach, a thread or process mines the database's transaction or commit log and publishes the events. In this way, atomicity is achieved without 2PC. The application simply updates the database, and the resulting changes are recorded in the transaction log. The transaction log miner thread or process then reads the log and publishes the corresponding events to the message broker.
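A toy model of log mining is shown below. A plain list stands in for the database's commit log, and a "miner" turns the low-level change records into published events; real log formats are database-specific and proprietary, so every field name here is an assumption.

```python
# Stand-in for the database's commit log: low-level row-change records.
commit_log = [
    {"table": "orders", "op": "INSERT", "row": {"id": 1, "status": "NEW"}},
    {"table": "orders", "op": "UPDATE", "row": {"id": 1, "status": "OPEN"}},
]
published = []

def mine(log):
    """Read change records from the log and publish them as events."""
    for record in log:
        # Note the event is inferred from a low-level update; recovering a
        # true business-level event from this can be hard in practice.
        event = {"type": f"{record['table']}.{record['op']}",
                 "data": record["row"]}
        published.append(event)  # stand-in for broker.publish(event)

mine(commit_log)
print(len(published))  # 2
```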
The advantage of this approach is that it guarantees an event is published for every update without using 2PC. It also simplifies the application by separating event publishing from the business logic. The disadvantage is that the format of the transaction log is proprietary to each database and can even change between database versions. It can also be difficult to reverse engineer high-level business events from the low-level updates recorded in the transaction log.
The third approach, event sourcing, eliminates direct state updates and relies entirely on events, so it too achieves atomicity without 2PC. Rather than storing the current state of an entity, the application stores the sequence of state-changing events and reconstructs the entity's current state by replaying those events. Whenever the state changes, a new event is appended to the sequence, and since appending a single event is a single operation, it is inherently atomic. This approach has many advantages. First, it solves the key problem of implementing an event-driven architecture: it makes it possible to reliably publish an event whenever state changes, which addresses the data consistency issues as well. Second, it avoids the object-relational impedance mismatch problem. Third, it provides a 100% reliable audit log of the changes made to a business entity, which also makes it possible to implement temporal queries. Fourth, it makes it easier to migrate from a monolithic application to a microservice architecture, because the business logic consists of loosely coupled business entities.
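Event sourcing's core mechanic, appending events and replaying them to rebuild state, can be sketched briefly. The event names follow the order example from earlier in the article; the list-based store and the replay logic are simplifying assumptions.

```python
events = []  # stand-in for an event store (keyed per entity in practice)

def append(event):
    # Appending a single event is one operation, hence inherently atomic.
    events.append(event)

def replay(event_list):
    """Reconstruct the order's current state by replaying its events."""
    state = {}
    for evt in event_list:
        if evt["type"] == "OrderCreated":
            state = {"order_id": evt["order_id"], "status": "NEW"}
        elif evt["type"] == "CreditReserved":
            state["status"] = "OPEN"
    return state

append({"type": "OrderCreated", "order_id": 1})
append({"type": "CreditReserved", "order_id": 1})
order = replay(events)
print(order)  # {'order_id': 1, 'status': 'OPEN'}
```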
The disadvantage of event sourcing is that it is a different and unfamiliar style of programming, so there is a learning curve for developers. Also, the event store only directly supports looking up a business entity by its primary key, so the developer must use Command Query Responsibility Segregation (CQRS) to implement queries. As a consequence, the application must be able to handle eventually consistent data.
In this article, we discussed event-driven design. We know that each microservice has a private datastore and that different microservices may use different SQL and NoSQL databases. This database architecture has many advantages, but on the other side, it also leads to many distributed data management challenges, and the solution is to implement an event-driven architecture. We examined the event-driven architecture and learned about its advantages and disadvantages.
The main challenge with an event-driven architecture is atomically updating state and publishing events. To obtain atomicity, we discussed three different approaches: using the database as a message queue via an event table, transaction log mining, and event sourcing. We covered all the aspects of these approaches and saw how the drawbacks of one approach lead to the birth of another. So this is all about event-driven architecture, and we will meet soon in our upcoming article. So be there!
Here is the link to the previous article of this series.