Posts tagged system design

Kafka In Production

Kafka is the main component of any system that implements an Event-Driven Architecture.

What do you need to know before deploying Kafka in production?

Kafka is a low-latency component that acts as the broker between producers and consumers.

Low latency is extremely important, so you’ll need to verify and monitor the following properties of the cluster:

Low-latency disk IO

  • Kafka writes messages from producers to disk, and consumers read those messages back from disk.
  • Kafka accesses the disk sequentially, which is very fast.
  • Zero Copy – Kafka copies data from the local disk directly to the network interface, without passing it through the application.
  • Disk size per Kafka broker should probably be 6TB or more, depending on the use case.
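
If you want a quick sanity check of sequential write speed on a broker’s data disk, a rough sketch like the following can help. The file path is an assumption for illustration – point it at the same mount as Kafka’s log.dirs:

    # Rough sketch: measure sequential write + fsync time on the Kafka data disk.
    # TEST_FILE is a hypothetical path; use the same mount as the broker's log.dirs.
    import os
    import time

    TEST_FILE = "/var/lib/kafka/latency_probe.tmp"
    CHUNK = b"x" * 1024 * 1024          # 1 MiB sequential writes
    CHUNKS = 256                        # ~256 MiB total

    fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.perf_counter()
    for _ in range(CHUNKS):
        os.write(fd, CHUNK)
    os.fsync(fd)                        # make sure the data is durably on disk, not only in the page cache
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.remove(TEST_FILE)

    print(f"sequential write throughput: {CHUNKS / elapsed:.1f} MiB/s")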

RAM

  • RAM is extremely important, as the Kafka process runs on the Java heap.
  • Page Cache – the main disk cache; the Linux kernel uses the page cache as a buffer for reading from and writing to disk.
    • If memory is available, the page is kept in the cache and served without accessing the disk.
    • This is extremely efficient and is a big part of what makes Kafka so fast.
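
As a rough illustration, here is a small Linux-only sketch that shows how much of a broker host’s RAM is currently sitting in the page cache (the field names come straight from /proc/meminfo):

    # Rough sketch: report how much of the host's RAM is used by the page cache (Linux only).
    def meminfo_kb(field):
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])   # values in /proc/meminfo are reported in kB
        return 0

    total = meminfo_kb("MemTotal")
    cached = meminfo_kb("Cached")
    print(f"page cache: {cached / 1024:.0f} MiB ({100 * cached / total:.1f}% of RAM)")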

Network

  • High throughput, since the Kafka brokers carry all the data that moves between services.

What do you need to know after deploying Kafka in production?

Tuning and reassigning partitions

Even after your cluster is working as expected in production, you’ll need to keep tuning it with new config options and reassigning partitions.
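
As one example of the kind of tuning you can do from code, here is a hedged sketch using the confluent-kafka Python AdminClient to change a topic’s retention and grow its partition count. The broker address, topic name, and values are assumptions; moving replicas between brokers is normally done with the kafka-reassign-partitions tool rather than this client:

    # Sketch: tune a topic's retention and partition count with the admin client.
    from confluent_kafka.admin import AdminClient, ConfigResource, NewPartitions

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})

    # Lower the topic's retention to 3 days (value in milliseconds).
    # Note: alter_configs replaces the topic's full config set non-incrementally,
    # so unspecified options fall back to their defaults.
    retention = ConfigResource(
        ConfigResource.Type.TOPIC, "events",
        set_config={"retention.ms": str(3 * 24 * 60 * 60 * 1000)},
    )
    for resource, future in admin.alter_configs([retention]).items():
        future.result()   # raises if the broker rejected the change

    # Grow the topic to 12 partitions (partition count can only increase).
    for topic, future in admin.create_partitions([NewPartitions("events", 12)]).items():
        future.result()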

Monitoring

Monitoring your cluster is extremely important, as it reflects the actual status of the cluster; and since the cluster is the main component of an Event-Driven System, it should perform in microseconds and milliseconds.

Monitoring will also make your debugging much easier, since Kafka metrics display the current status of the cluster.
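
One concrete metric worth watching is consumer lag. Below is a hedged sketch with the confluent-kafka Python client that compares committed offsets with the log end offsets; the group id and topic name are made up for illustration:

    # Sketch: check consumer-group lag for one topic.
    from confluent_kafka import Consumer, TopicPartition

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "feed-service",      # the group whose lag we want to inspect
        "enable.auto.commit": False,
    })

    topic = "events"
    metadata = consumer.list_topics(topic, timeout=10)
    partitions = [TopicPartition(topic, p) for p in metadata.topics[topic].partitions]

    for tp in consumer.committed(partitions, timeout=10):
        low, high = consumer.get_watermark_offsets(tp, timeout=10)
        # tp.offset is negative if the group has no committed offset for this partition yet.
        lag = high - tp.offset if tp.offset >= 0 else high - low
        print(f"partition {tp.partition}: committed={tp.offset} end={high} lag={lag}")

    consumer.close()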

SpinningOps helps startups improve their system design – contact us HERE and ask what we can do for your application.

Benefits of Databases in Microservices

If you are using a microservices approach in your stack, you might want to take it a step further and add a dedicated database for each service.

In this blog post we’ll discuss the benefits of having dedicated databases in a microservices architecture.

Dedicated database per service

When every service has its own database, it actually simplifies things, since every application is essentially built on CRUD operations.

And every service can be built around the specific requirements it has.
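
To make this concrete, here is a minimal sketch of two services that each own their own database and their own small CRUD layer, backed by SQLite for simplicity. The service names, table schemas, and file paths are invented for illustration:

    # Sketch: each service owns its own database file and exposes its own tiny CRUD layer.
    import sqlite3

    class UsersService:
        def __init__(self, db_path="users_service.db"):
            self.conn = sqlite3.connect(db_path)   # this file belongs to this service only
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)"
            )

        def create(self, email):
            with self.conn:
                return self.conn.execute(
                    "INSERT INTO users (email) VALUES (?)", (email,)
                ).lastrowid

        def read(self, user_id):
            return self.conn.execute(
                "SELECT id, email FROM users WHERE id = ?", (user_id,)
            ).fetchone()

    class OrdersService:
        def __init__(self, db_path="orders_service.db"):
            self.conn = sqlite3.connect(db_path)   # a completely separate database file
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
            )

        def create(self, user_id, total):
            with self.conn:
                return self.conn.execute(
                    "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total)
                ).lastrowid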

Mixing database types

One of the common questions when building software is how to store the data: what kind of database should we use? SQL? NoSQL?

Choosing one type of database over another can limit the application stack, so why not use more than one type?

Let’s assume the login service stores users, passwords, and emails. It does not require much efficiency or speed, since a user logs in only once for as long as the session stays open and the user has not logged out.

In this case we can choose whichever database makes the login implementation easiest and fastest to build.

What about the feed service? Let’s say your application has a feed of data for the users; this should be low latency and very fast, so you’ll probably want to use a key-value store.
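
For example, here is a hedged sketch of what the feed service’s storage might look like with Redis as the key-value store; the host, key format, and list size are assumptions:

    # Sketch: the feed service keeps each user's feed in Redis, while other services
    # keep using whatever database fits them best.
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def push_feed_item(user_id, item):
        key = f"feed:{user_id}"
        r.lpush(key, item)        # newest items first
        r.ltrim(key, 0, 99)       # keep only the latest 100 items per user

    def get_feed(user_id, count=20):
        return r.lrange(f"feed:{user_id}", 0, count - 1)

    push_feed_item(42, "alice posted a photo")
    print(get_feed(42))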

You get the gist: every service now has its own mini stack.

Enhanced database security

Once each service uses its own database, only that specific service accesses that database; in other words, access is scoped per service.

This is unlike one big database that all services connect to and, let’s be honest, probably access with the same credentials.

So you can add rules so that only a specific service can access its database and no other service can; likewise, only that service holds the CRUD credentials for that specific database.
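
Here is a hedged sketch of what that could look like on PostgreSQL using psycopg2. The role names, database names, and passwords are placeholders, not a recommendation to hard-code credentials:

    # Sketch: give each service its own database role with CRUD rights on its own database only.
    import psycopg2

    # Create the service role from an admin connection.
    admin = psycopg2.connect("dbname=postgres user=postgres password=CHANGE_ME host=localhost")
    admin.autocommit = True
    with admin.cursor() as cur:
        cur.execute("CREATE ROLE orders_service LOGIN PASSWORD 'CHANGE_ME_TOO'")
        cur.execute("GRANT CONNECT ON DATABASE orders_db TO orders_service")
    admin.close()

    # Table-level grants have to run inside the service's own database.
    orders_db = psycopg2.connect("dbname=orders_db user=postgres password=CHANGE_ME host=localhost")
    orders_db.autocommit = True
    with orders_db.cursor() as cur:
        cur.execute("GRANT USAGE ON SCHEMA public TO orders_service")
        cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO orders_service")
    orders_db.close()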

Reduce database load

If every service connects to its own database, then read/write operations are faster thanks to fewer connections per database, unlike one big database that all services connect to.

That reduces hardware requirements as well, since each database’s hardware can be sized for the load of its own service.

Clean database

What I mean by a clean database is that every database accumulates unused or deprecated data that needs to be cleaned up.

With a dedicated database for each service, that cleanup is easier, since you know the data belongs only to its parent service, and any modifications to the application can easily be applied to its database too.

Let’s assume you decide that a specific service is deprecated; you’ll probably want to delete its data.

How do you do that if you work with one big database? But if that service only uses its dedicated database, then you simply retire the service along with its database.

Backup is easier

When you have one big database for all services, you need to back up that entire database, regardless of how much of its data is actually used.

What about restore? The same applies here: you’ll need to restore the entire database, not just the specific data that might be affected.

Now assume that every service has its own database and you need to schedule a backup or restore; you only need to do it per service, not for one big database holding every service’s data.
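
Continuing the SQLite example from earlier, here is a hedged sketch of per-service backups; the file names are invented, and the point is simply that each backup covers one service’s data only:

    # Sketch: back up each service's database independently, on its own schedule.
    import sqlite3
    from datetime import datetime

    SERVICE_DBS = ["users_service.db", "orders_service.db", "feed_service.db"]

    def backup(db_path):
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect(f"{db_path}.{stamp}.bak")
        with dst:
            src.backup(dst)      # online, consistent copy of just this service's data
        src.close()
        dst.close()

    for db in SERVICE_DBS:
        backup(db)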

SpinningOps helps startups improve their system design – contact us HERE and ask what we can do for your application.