Posts in DevOps

Cloud Migration Is Easy


Cloud migration can be easy if you plan it and understand how the cloud works.

Cloud vs. On-premises

Both cloud and on-premises environments provide servers that act as compute resources, but they are very different in approach, maintenance, and information security.

For example, when your company runs its own server farm (whether a server room in the office or at a hosted facility), the amount of maintenance is extremely high compared to the cloud.

Some of the maintenance tasks can be:

  • Physical alarm system
  • Air conditioning monitoring
  • Operating network devices (switches, routers and firewalls)
  • Local backup and remote backup
  • Hardware maintenance (servers)
  • Operating the local cloud (hypervisors and VMs)
  • etc

For the cloud it’s mostly:

  • Operating services
  • Optimizing billing
  • Cloud security
  • Automation
  • etc

It’s very different and that is why companies struggle with migrating their compute infrastructure to the cloud.

Cloud approach vs. On-premises approach

The main reason companies struggle to migrate easily to the cloud is approach: they think that what worked on-premises will work in the cloud too.

The two main roles behind these two approaches are the cloud engineer and the system administrator, and they come from different backgrounds (but that’s for a separate blog post).

The mindset of on-premises vs. cloud correlates with whether the company runs a production environment: companies that stay on-premises usually don’t run a production environment (though some do), and most companies that use the cloud do have a product running in production (though some don’t).

So the approach is production vs. non-production, and it’s mainly about how to deploy software quickly and reliably to compute resources in the cloud.

How to have a successful and easy migration?

Understand that the approach is very different between cloud and on-premises (that’s why DevOps was invented).

Train or hire cloud engineers who understand the DevOps approach.

Get a cloud architect to lead the cloud migration project (verify the cloud architect’s credentials and experience before hiring them!).

Understand that this kind of project is not cheap and will require a budget.

To learn more about cloud migration you can download the cloud-migration PDF HERE.

The SpinningOps team has delivered very successful cloud migration projects, but we only onboard clients who understand the DevOps approach. If you’re interested in cloud migration, fill out this FORM.

Build and deploy Flask app to EKS

Here’s an example of how to adopt the DevOps approach in your development process.

Code your application

In this example we’ll use Flask to build a website, and you can apply this approach to any programming language.

This app is a simple Flask application (video link below) with one route that returns the index.html file.
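A minimal sketch of such an app (the file and route names follow Flask defaults and are assumptions, not the exact code from the video):

```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    # The single route: serve index.html from the templates/ directory
    return render_template("index.html")

# Run locally with: flask --app app run
```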

Build your application

Once the current version is ready you can build the project; the pipeline should include the build and tests.

If you’re using a statically typed language, you’ll need to compile the application before building the container image and then copy the artifact into the image.

For dynamically typed languages there is no compile step, so it’s just building the container image in the CI pipeline.
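For example, building a container image for a Flask app could look like this (a sketch only; the file names and base image tag are assumptions):

```dockerfile
# Build a container image for a dynamically typed (Python) app: no compile step,
# just copy the code and its dependencies into the image.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["flask", "--app", "app", "run", "--host=0.0.0.0", "--port=5000"]
```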

Deploy your application

To keep a software product updated and maintained, the deployment process should be on-demand and easy (click a button).

This code > test > deploy cycle should be repeated for every code change.

What can be accomplished here?

By doing the work (coding) and implementing a deployment process, the gap between development and production is closed, and the outcome is faster software delivery (CD) with reliable code (tests).

So DevOps is an approach and a way to build software.

Check out this video tutorial for an E2E build and deploy of a Flask website to a Kubernetes cluster (EKS).

Cloud Native

How does your team work with cloud infrastructure?

Cloud Native Topics

  • Development process
  • System design
  • Builds and packages
  • Deployments
  • Release
  • Cloud infrastructure

Development process

When working with cloud systems, the proven method to develop and run applications in production is DevOps.

DevOps is the practice of code > build > test > deploy > release > repeat.

DevOps is about bridging the gap between development and production.

System design

You can use different methods to design applications, but if your applications run in the cloud then microservices might be the better fit; otherwise, why use an on-premises approach when you can enable the benefits of cloud infrastructure?

Builds CI and packages

Containers are by now the default approach to building applications, as they are convenient to ship and deploy.

It’s easy to start a local development environment using containers and work on your application.

CI is an integral part, as it takes your committed code and gets it built and deployed.

Deployments

Deployments to production are easier when the application is packaged as a container image.

This is the next step after the build: it confirms that the tests passed and that the container image with the latest code has already been pushed to the image repository and is ready for deployment to production.

Release

This is the step after the deployment is successful and the new image is used in production

Now the choice is when to enable the new features using a feature toggle.
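A feature toggle can be as simple as a configuration check: the new code ships disabled and is enabled at release time. A minimal sketch (the flag name, environment variable, and return values are placeholders):

```python
import os

def is_enabled(flag: str) -> bool:
    # Read the toggle from configuration; default is "off" (dark launch)
    return os.environ.get(f"FEATURE_{flag.upper()}", "off") == "on"

def render_feed() -> str:
    # New code path is deployed but only runs once the toggle is flipped
    if is_enabled("new_feed"):
        return "new feed"
    return "old feed"
```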

Once the new features are enabled the release step is complete

Cloud Infrastructure

To manage the containers and successfully implement microservices you’ll need a Kubernetes cluster to orchestrate the container runtime.

Other services, like sending email, databases, load balancers, and more, can be integrated with your Kubernetes cluster and used by the entire stack.

Summary

Cloud Native is proven to deliver better results and happier developers.

But hey! You can always start a long-running VM and install some stuff on it.

How to deploy production systems

In some situations development teams deploy some of their production systems from local machines, yes, from their laptops. That is not a recommended practice, as it causes issues in production.

Why do developers modify production from their local laptops?

There can be a few reasons why deploying new code and modifying production config is done from a local laptop; here are a few examples:

  • No CI pipelines
  • Failed CI pipeline
  • Just bad practice

In any case it is not recommended to modify production systems from a local laptop; it’s better to use CI tools.

How to modify production systems?

When you update production with new code, config, or new services, the downtime should be zero.

For a successful deployment to production you’ll need to adopt a few approaches.

GitOps

In summary this means that every modification will be committed to your Git repository.

Infrastructure as code

In summary this means that the creation and update of the infrastructure should be declared in code/template.

CI Pipeline

When a commit is pushed to the relevant repository, Git sends a webhook to the CI system to start the CI pipeline, which initiates: build > test > deploy.

When the CI pipeline is complete, we know that the build, tests, and deploy succeeded as expected. (The DevOps outcome should be predictable for every pipeline.)

Note: there’s another step, release, which “enables” the actual new code; this is done via a feature toggle.
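As a sketch, such a pipeline could be declared like this (GitHub Actions is just one example CI system; the image name, test command, and deploy script are assumptions):

```yaml
name: ci
on:
  push:
    branches: [main]          # the Git hook: every push to main starts the pipeline
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: docker build -t example-app:${{ github.sha }} .
      - name: Test
        run: docker run --rm example-app:${{ github.sha }} pytest
      - name: Deploy
        run: ./deploy.sh example-app:${{ github.sha }}   # project-specific step
```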

Deployment to production

This is where the deployment strategy comes in; obviously a force-deploy will shut down the services and start them again (downtime).

We’ll need a better approach, like blue-green, which creates another group of resources running the new code; only after it’s active is the current traffic redirected to the new group of services.

After the deploy is verified OK, the old resources can be deleted.
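In Kubernetes terms, the blue-green traffic switch can be sketched as a Service selector flip between two Deployments (the names, labels, and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
    version: green   # was "blue"; flip only after the green group is healthy
  ports:
    - port: 80
      targetPort: 8080
```

Once traffic flows to the green group, the blue Deployment can be deleted.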

Debugging

To verify that the deploy is OK:

  • Check your metrics and logs
  • Check that the service is operational (this can be done via automated QA)

Summary

Do not be tempted to deploy from a local laptop, as this will cause issues and the change will not be registered in logs or as a commit.

Use CI pipeline!

Kafka In Production

Kafka is the main component of any system that implements Event-Driven Architecture.

What do you need to know before deploying Kafka in production?

Kafka is a low-latency component that acts as the broker between producers and consumers.

Low latency is extremely important, and you’ll need to verify and monitor that the cluster meets the following:

Low-latency disk IO

  • Kafka writes messages to disk (from producers) and messages are read from disk (by consumers)
  • Kafka uses sequential disk access, which is very fast.
  • Zero copy – Kafka copies data from the local disk directly to the network interface, without passing through application memory.
  • Disk size per Kafka broker should probably be 6TB or more, depending on the use case.

RAM

  • RAM is extremely important, as the Kafka process uses the Java heap.
  • Page cache – the main disk cache; the Linux kernel uses the page cache as a buffer for reading from and writing to disk.
    • If the page is available in memory, it is served from the cache without accessing the disk.
    • This is extremely efficient and is what makes Kafka so fast.

Network

  • High throughput, as the Kafka brokers carry the entire data flow between services.

What do you need to know after deploying Kafka in production?

Tuning and reassign partitions

Even after your cluster is working as expected in production, you’ll need to tune it with new config options and reassign partitions.

Monitor

Monitoring your cluster is extremely important, as it reflects the actual status of the cluster; and since the cluster is the main component of an event-driven system, it should perform in microseconds and milliseconds.

Monitoring will also make your debugging much easier, since Kafka metrics display the current status.

SpinningOps helps startups improve their system design; contact us HERE and ask what we can do for your application.

Benefits of Databases in Microservices

If you are using a microservices approach in your stack, you might want to take it a step further and add a dedicated database for each service.

In this blog post we’ll discuss the benefits of having databases in Microservices.

Dedicated database per service

When every service has its own database, it actually simplifies the process, since every application is based on CRUD.

And every service can be built with the specific requirements it needs.

Mix databases types

One of the common questions when building software is how to store the data: what kind of database should we use? SQL? NoSQL?

Choosing one type of database over another can limit the application stack, so why not use all types of databases?

Let’s assume that the login service stores users, passwords, and emails. It does not require much efficiency or speed, since login happens once per user as long as the session is open and the user has not logged out.

In this case we can choose the easiest and fastest login implementation.

What about a feed service? Let’s say your application has a feed of data for its users; this should be low latency and very fast, so you’ll probably want to use a key-value store database.

You get the gist: every service now has its own mini stack.

Enhanced database security

Once your services use their own databases, only the specific owning service accesses each database.

In other words, access is scoped per service.

Unlike one big database that all services connect to (and, let’s be honest, probably with the same credentials).

So you can add rules so that only a specific service can access its database and no other service can; and only that service holds the CRUD credentials for that specific database.

Reduce database load

If every service connects to its own database, then R/W operations are faster due to fewer connections per database, unlike one big database that all services connect to.

That reduces hardware requirements as well, since each database’s hardware can be sized for the load of its own service.

Clean database

What I mean by a clean database is that every database accumulates unused or deprecated data that needs to be cleaned up.

By using a dedicated database for each service, that cleanup is easier, since you know the data belongs only to its parent service, and any modification to the application can easily be applied to its database too.

Let’s assume you decide that a specific service is deprecated; you’ll probably want to delete its data.

How do you do that if you work with one big database? But if that service uses only its own dedicated database, then you simply deprecate the service together with its database.

Backup is easier

When you have one big database for all services, you need to back up that entire database, regardless of usage or unused data.

What about restore? The same applies: you’ll need to restore the entire database, not just the specific data that might be affected.

Now assume that every service has its own database and you need to schedule a backup or restore; you only need to do it per service, not for the entire database with all services’ data.

SpinningOps helps startups improve their system design; contact us HERE and ask what we can do for your application.

Bots As Part Of Your Cloud Ops

Do you have bots in your stack?

How do you assign permissions to a bot?

Bots in the context of your cloud ops

I’ll explain. Let’s say you use Jenkins for your CI/CD pipelines; how does Jenkins clone new code from the code repository?

Or how does your Slack channel receive alerts from an app?

Sometimes you need bots in your stack, but here’s the challenge: what permissions does a bot get?

Let’s start with naming your bot

A good practice is to name the bot after what it’s supposed to do, for example:

  • bot-jenkins
  • bot-slack
  • bot-s3-read-only
  • etc..

You get the point: start the name with bot- so other people won’t confuse it with human users.

What about naming policies?

I use the same naming for permissions and policies; it’s very easy to manage something when you know what it is and what it does by its name.

Keys or Roles?

I prefer using roles; it’s better than just putting a secret somewhere and not knowing who uses it.

But there might be a situation where you’ll need to use keys. For example, if you need to rsync files from a remote server on a different cloud vendor, then roles are not an option; just make sure those keys have the exact permissions needed for the task.

Once the task is done, deactivate the keys.
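As a sketch, a least-privilege policy for a bot like bot-s3-read-only might look like this in AWS IAM (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BotS3ReadOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```

Attach it to the bot’s role (or, if keys are unavoidable, to the bot user) so the bot can read that bucket and nothing else.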

Summary

Bots are part of your stack, so grant the exact permissions for the specific task.

6 Rules For Cloud Architect

Are you a cloud architect? How do you plan a new infrastructure for a product?

How do you build workflow in the cloud?

What are the considerations of cloud security, costs and automation?

All are relevant questions when planning a new cloud design for a product runtime, so let’s discuss them.
Also, this is my approach, and it has served me well in all of my designs and cloud operations in production.

Preplan

Preplanning is not part of the 6 rules, just a starting point.

It’s better to plan before starting to build any project. In order to plan, you’ll need to understand the product; ask these questions:

  • What problem does the product solve?
  • Who’s going to use it? (demographics)
  • What are the business risks of downtime?
  • What is the expected or current revenue?
  • What is the technical flow of the product? (user login, integrate with API, consume data from database, etc..)

The more you ask, the more information you’ll have in the design process, so don’t skip this step.
It’s easy to just go build stuff without asking about needs and requirements.

Costs

If your design will cost more than the revenue, the product won’t justify itself. This is very important, as a design that is bad from a cost perspective can have a significant effect on the entire business operation.

So, in every step of the planning consider costs!

Cloud Security

In every product there’s a risk factor in terms of business risk: what if the application is down for 1 hour? What is the effect on reputation and revenue?

What if some services and data are exposed to unauthorized parties?

So ensure you include security measures in the design to make sure your product is protected, but don’t overdo it, as that can cause issues with workflow and runtime.

Balance is key here.

Automation

Building and working without automation means spending time on repetitive tasks; this is not efficient and will cause slow delivery.

Try using an IaC (Infrastructure as Code) approach; this means you can deploy and modify entire infrastructures in minutes.

Also, you can find out the current stack components by checking the IaC files.

Combine IaC and Immutable infrastructure to get maximum results.
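For example, a minimal IaC sketch (Terraform with the AWS provider assumed; the resource and bucket names are placeholders):

```hcl
# The bucket exists because this file declares it: it is created, updated,
# and destroyed from code, and the file doubles as stack documentation.
resource "aws_s3_bucket" "app_assets" {
  bucket = "example-app-assets"

  tags = {
    ManagedBy = "terraform"
  }
}
```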

Decouple Dependencies

When building software and infrastructure it’s easy to tie components together and hard-code things; the more hard-coding and dependencies there are between components, the more issues they will cause.

Let’s say you designed the infrastructure with hard-coded IP addresses; this means those IPs can never change, and the same goes for other config files.

Another example is the start-up of a service that depends on other services, for example an application that requires the monitoring agent before it can start; monitoring is nice but should not affect production services.
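One way to avoid the hard-coded IP problem is to resolve endpoints from configuration at start-up. A minimal sketch (the variable name and default address are placeholders):

```python
import os

def backend_url() -> str:
    # Read the backend endpoint from the environment instead of hard-coding
    # an IP, so the address can change without a code change.
    return os.environ.get("BACKEND_URL", "http://backend.internal:8080")
```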

Continuous Software Updates

A software freeze is a risk in my opinion; this approach leads to more work that needs planning, and the longer the freeze, the harder it is to upgrade.

Let’s say you’re using Python 3.6 with pip packages in your code; this means you cannot upgrade your OS, because a new OS comes with the latest Python version, and that Python version uses the latest pip version.

So now you can’t upgrade your Python, pip, or OS, just because you did not integrate updates into your regular operations.

Keep your code and system up to date!

Remove Single Point Of Failure

Similar to coupled dependencies that can cause issues, relying on a single endpoint or component is risky. Let’s say you’re using one load balancer; what happens if that load balancer is overloaded?

A single database? The same issue.

Those are simple examples, but your product probably has more components that are a single point of failure.

The fewer single points of failure, the better!

Compute Group Vs. Cluster

Do you have a cluster or compute group for your production?

What is a compute group?

What is a cluster?

Compute Group

Compute Group is a set of identical servers doing the same function.

Example of a compute group: Apache (web servers).
Your website traffic is increasing and you need to add more servers to handle the load, so you add another web server (let’s say it’s Apache), and then another.

Those web servers do the same thing (function), which is serving web files (HTML) to visitors of the website; those web servers are not connected together and don’t “know” about the other web servers.

So, how does this scenario work?

The load balancer forwards traffic to the web servers (let’s say round-robin) and those web servers serve files to the visitors.

You can add or remove web servers on demand.
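The round-robin routing above can be sketched in a few lines (server names are placeholders; a real load balancer does this over the network):

```python
from itertools import cycle

# Each incoming request is handed to the next web server in the group;
# the servers don't know about each other, only the balancer tracks them.
servers = ["web-1", "web-2", "web-3"]
_rotation = cycle(servers)

def route_request() -> str:
    return next(_rotation)
```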

Cluster

Cluster is a group of compute servers that are connected and can operate together and “know” about each member of the cluster.

Example of a cluster: Kafka

Kafka runs with a minimum of 3 nodes in a cluster; those 3 nodes are configured with the addresses of the other cluster members and elect a leader among themselves.

If there’s an issue with one of the nodes, another can take over its responsibility; thus you achieve high availability.