The Concepts of Data Sharing

Microservices Level Up

Hello friends, good to see you again! Welcome to another article in our microservice architecture series. After discussing inter-service communication, it is time to discuss data sharing. Here, we will study how to manage inter-service dependencies that take the form of implicit interfaces created by shared data.

In our previous articles, we discussed that the main problem with a distributed architecture is dependencies. In a microservice-based architecture, the services are built as isolated units. Each one deals with its own set of concerns, but complex systems rely on their cooperation and integration. In monolithic applications, dependencies take the form of method calls and are less complicated. In microservices, every service operates on its own, yet many features need the services to cooperate. In some systems, one part of the system needs to access data managed by another part. This is where data sharing comes in. Data sharing means that two separate parts of a system use the same data. Developers who have done multithreaded programming know that sharing data isn't easy. Multithreading is difficult, and data sharing between services has its own problems. Can a single database be shared? Does it scale as services are added? Can it handle big data volumes?

So we already have such problems, right? Could it be that we made a mistake while modeling the data or the API? Yes, it could be. To answer that, we need to understand the concept of separating concerns. The first idea is loose coupling, which means that a microservice should be flexible enough to be modified without affecting other microservices. The second is problem locality, which means that related problems should be grouped together. If a change in one part of the system requires an update in another part, those parts should be close to each other.

For loose coupling, microservices should provide interfaces that model the data and the access patterns related to that data, and they should stick to those interfaces. For problem locality, microservices should be grouped according to their problem domain, so that when a change forces matching changes in other microservices, those microservices sit in the same domain. In short, each microservice should belong to a clearly defined problem domain. If a problem cannot be split into independent units of work, keep it inside one domain: sharing data inside a domain boundary makes more sense than sharing data between unrelated domains.

Another interesting question: are the microservices so entangled with each other that they should be merged into one? Merging them into a single service can have positive as well as negative effects. Microservices should be small, but only as far as it stays convenient. Maintaining that balance is the key!

Now let us understand what a shared database is.

A shared database falls under the category of mutable shared data. The biggest issue with shared data is what to do when it changes. There are different approaches to this issue, and the shared database is one of them. So let us describe it and discuss its advantages and disadvantages.

For dealing with shared data across databases, there are two approaches available. One is transactions and the other is eventual consistency. Transactions are a mechanism that lets database clients make sure that a series of changes either all take effect or none of them do. This guarantees consistency, which matters especially in distributed transactions. To implement distributed transactions, there must be a transaction manager that is notified whenever a client wants to start a transaction. The transaction begins only when the transaction manager allows it to go ahead. The transaction manager is also responsible for informing other clients of the transaction's intentions. This approach is quite useful for small changes and short transactions.
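To make the role of the transaction manager more concrete, here is a minimal in-memory sketch in the style of a two-phase commit. The names `Participant` and `TransactionManager` are made up for illustration and do not come from any library; a real distributed transaction manager also has to deal with timeouts, crashes, and recovery, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Participant:
    """A hypothetical service-owned data store taking part in a transaction."""
    name: str
    data: Dict[str, object] = field(default_factory=dict)
    pending: Optional[Dict[str, object]] = None
    fail_on_prepare: bool = False  # set to True to simulate a refusal

    def prepare(self, changes: Dict[str, object]) -> bool:
        # Phase 1: stage the changes and vote on whether they can be applied.
        if self.fail_on_prepare:
            return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        # Phase 2 (success): make the staged changes visible.
        if self.pending is not None:
            self.data.update(self.pending)
            self.pending = None

    def rollback(self) -> None:
        # Phase 2 (failure): discard the staged changes.
        self.pending = None


class TransactionManager:
    """Coordinates one all-or-nothing change across several participants."""

    def run(self, work: List[Tuple[Participant, Dict[str, object]]]) -> bool:
        prepared: List[Participant] = []
        for participant, changes in work:
            if participant.prepare(changes):
                prepared.append(participant)
            else:
                # A single "no" vote aborts the whole transaction.
                for p in prepared:
                    p.rollback()
                return False
        # Everyone voted "yes", so every participant commits.
        for participant, _ in work:
            participant.commit()
        return True
```

If any participant refuses in the prepare phase, none of the stores change, which is the "all or nothing" guarantee described above.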

On the other hand, eventual consistency deals with the issues of distributed data by allowing inconsistencies. Yes, you read that correctly! It allows inconsistencies, but only for a certain period of time. The system accepts that data may temporarily be in an inconsistent state and that the situation will be resolved after some time, so the reconciling operation can be postponed. Eventual consistency is quite helpful when it comes to big volumes of data. Both approaches are good, and the choice between them depends on the type of application or system.
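The idea of a temporary inconsistency can be sketched in a few lines. The class below is a toy model, assuming one primary copy and one replica inside a single process; the name `EventuallyConsistentPair` and its methods are invented for this example. In a real system the pending changes would travel over a message queue or a replication log.

```python
from collections import deque


class EventuallyConsistentPair:
    """Toy model: writes hit the primary now and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # changes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_replica(self, key):
        # May return stale data (or None) until sync() has run.
        return self.replica.get(key)

    def sync(self):
        # Apply postponed changes in order; returns how many were applied.
        applied = 0
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied
```

Between `write` and `sync`, a reader of the replica sees old data. After `sync`, both copies agree again.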

In most applications, services need to persist data in some kind of database. So what conditions does a shared database have to satisfy in a microservice architecture?

Firstly, the services should be loosely coupled, because only then can they be developed, deployed, and scaled independently. Secondly, some business transactions must enforce invariants that span multiple services. Thirdly, some business transactions need to query data owned by other services, and some queries must join data that is owned by multiple services. Fourthly, databases must sometimes be replicated and sharded in order to scale. Fifthly, different services have different storage requirements; for some of them a relational database is a good choice, while others may be better served by another kind of store. So what is the solution for all this?

The solution in this pattern is to use a single database that is shared by multiple services. Each service can freely access data in this database, even data that is owned by other services, using local ACID transactions.
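Here is a small sketch of what that looks like in practice, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, the credit-limit rule, and the function names are assumptions made up for this example. The point is that an order service can read the customer table and write the order table inside one local transaction.

```python
import sqlite3


def create_shared_db() -> sqlite3.Connection:
    """Create one database that both (hypothetical) services use."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            credit_limit INTEGER NOT NULL
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            total INTEGER NOT NULL
        );
        INSERT INTO customers (id, credit_limit) VALUES (1, 100);
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, customer_id: int, total: int) -> bool:
    """Order-service code that reads customer-service data directly."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("BEGIN")  # open the transaction before the reads
            row = conn.execute(
                "SELECT credit_limit FROM customers WHERE id = ?",
                (customer_id,),
            ).fetchone()
            if row is None:
                raise ValueError("unknown customer")
            spent = conn.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
                (customer_id,),
            ).fetchone()[0]
            # Invariant spanning both tables: orders must stay within the limit.
            if spent + total > row[0]:
                raise ValueError("credit limit exceeded")
            conn.execute(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                (customer_id, total),
            )
        return True
    except ValueError:
        return False
```

With a credit limit of 100, a first order of 60 is accepted and a second order of 60 is rejected without leaving a partial write behind.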

What are the advantages and disadvantages of this pattern? The advantages are that a developer can use familiar and straightforward ACID transactions to keep data consistent, and that a single database is simple to operate. There are a few disadvantages as well. The first is development-time coupling. A developer working on one service might need to coordinate schema changes with the developers of other services that access the same tables, which can slow down development. The second is runtime coupling. Because all the services access the same database, they can frequently interfere with one another, so some services may be blocked or left waiting, for example on a lock held by another service's transaction. Also, a single database might not be enough for large data volumes.

Now let us have a look at the concept of database per service.

As the name suggests, database per service means giving each individual service its own database. Not every service necessarily needs its own database; it depends on the type of application. For example, an online store application will have services such as an order service and a customer service, and these need a database to store information such as customer records. But saying it is one thing and implementing it is another. The main question is what kind of database architecture should be followed in a microservice architecture. What conditions does a microservice-based application need to fulfill? Let us discuss!

First of all, the services need to be loosely coupled. Only then can they be developed, deployed, and scaled independently. Next, some business transactions must enforce invariants that span multiple services. For example, if one service guarantees that a particular condition holds, the other related services must see the same information once the transaction completes. There are also business transactions that need information from other services, so one service must be able to query data owned by another. Some queries might even need to join data that is owned by multiple services. Databases must sometimes be replicated and sharded in order to scale. Finally, different services have different storage requirements, and these should be taken into account.

The solution for all this is simple. Each microservice's persistent data should be kept private to that service and be accessible only through its API. Other services can't access the data directly; they have to go through the API. So how do you keep a microservice's persistent data private? There are a few ways, and you don't need to provision a database server for each service. If you are using a relational database, there are three options. The first is private-tables-per-service, where each service owns a set of tables that only that service may access. The second is schema-per-service, where each service has its own database schema that is private to the service. The third is database-server-per-service, where each service has its own database server. Of these, private-tables-per-service and schema-per-service have the lowest overhead. Schema-per-service is preferred because it makes ownership clear. In some big applications, high-throughput services might need their own database server.
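The following sketch shows the "private data, public API" idea with two toy services. Everything here is hypothetical: `CustomerService` and `OrderService` are plain Python classes standing in for real services, each one owns a separate in-memory SQLite database, and a method call stands in for an HTTP or messaging API.

```python
import sqlite3
from typing import Optional


class CustomerService:
    """Owns the customer data; nobody else touches its database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # private to this service
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        self._db.commit()

    def add_customer(self, name: str) -> int:
        with self._db:
            cur = self._db.execute(
                "INSERT INTO customers (name) VALUES (?)", (name,)
            )
        return cur.lastrowid

    def get_customer_name(self, customer_id: int) -> Optional[str]:
        # Public API: the only way for other services to read customer data.
        row = self._db.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class OrderService:
    """Owns the order data and asks CustomerService for customer details."""

    def __init__(self, customers: CustomerService):
        self._customers = customers
        self._db = sqlite3.connect(":memory:")  # a separate private database
        self._db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, item TEXT NOT NULL)"
        )
        self._db.commit()

    def place_order(self, customer_id: int, item: str) -> int:
        # Validate through the other service's API, not through its tables.
        if self._customers.get_customer_name(customer_id) is None:
            raise LookupError(f"unknown customer {customer_id}")
        with self._db:
            cur = self._db.execute(
                "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
                (customer_id, item),
            )
        return cur.lastrowid

    def describe_order(self, order_id: int) -> str:
        row = self._db.execute(
            "SELECT customer_id, item FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown order {order_id}")
        # The "join" happens in code, via an API call, not in SQL.
        name = self._customers.get_customer_name(row[0])
        return f"{name} ordered {row[1]}"
```

Notice that what would be a SQL join in a shared database becomes an API call plus a bit of code here. That is the price of keeping each service's data private.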

Creating barriers that enforce this encapsulation is a good idea. You could assign a different database user ID to each service and use a database access control mechanism such as grants. If there are no such barriers, developers will be tempted to bypass a service's API and access its data directly.
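As a rough illustration of such a barrier, the helper below generates PostgreSQL-style statements that give one service its own role and its own schema. The function name `isolation_statements` and the convention of naming the role and the schema after the service are assumptions for this sketch. The exact statements differ between database products, so treat the output as a starting point and check it against your database's documentation.

```python
import re
from typing import List

# Allow only simple lowercase identifiers, so nothing unexpected ends up in SQL.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def isolation_statements(service: str) -> List[str]:
    """Return SQL that sets up one role and one private schema for a service."""
    if not _IDENTIFIER.fullmatch(service):
        raise ValueError(f"not a safe SQL identifier: {service!r}")
    return [
        # A dedicated database user for the service.
        f"CREATE ROLE {service} LOGIN;",
        # A schema owned by that user (schema-per-service).
        f"CREATE SCHEMA {service} AUTHORIZATION {service};",
        # Make sure no other role gets access through PUBLIC.
        f"REVOKE ALL ON SCHEMA {service} FROM PUBLIC;",
    ]
```

Calling `isolation_statements("order_service")` returns three statements that a database administrator could review and run once per service.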

Conclusion

In this article, we have discussed data sharing. As you may have noticed, every article in the microservice architecture series continues from the previous one, and data sharing is the next step after inter-service communication. We started with the problem of data sharing and the concept of separating concerns. We then moved on to the shared database: the concept, the conditions it has to fulfill, the solution, and its advantages and disadvantages. In the same manner, we discussed the concept of database per service. Here we end this article, and we will meet again soon in the upcoming one. So stay connected!

Here is the link to the previous article of this series.

Tao
Tao is a passionate software engineer who works at a leading big data analysis company in Silicon Valley. Previously, Tao worked at big IT companies such as IBM and Cisco. Tao has an MS degree in Computer Science from McGill University and many years of experience as a teaching assistant for various computer science classes.
