Posts by SpinningOps Editor

Bots As Part Of Your Cloud Ops

Do you have bots in your stack?

How do you assign permissions to a bot?

Bots in the context of your cloud ops

Let me explain. Say you use Jenkins for your CI/CD pipelines: how does Jenkins clone new code from the code repository?

Or how does your Slack channel receive alerts from an app?

Sometimes you need bots in your stack, but here’s the challenge: what permissions does a bot get?

Let’s start with naming your bot

A good practice is to name the bot after what it’s supposed to do, for example:

  • bot-jenkins
  • bot-slack
  • bot-s3-read-only
  • etc.

You get the point: start the name with “bot” so other people won’t confuse it with human users.

What about naming policies?

I use the same naming for permissions and policies; it’s very easy to manage something when its name tells you what it is and what it does.
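As a sketch of the naming rule in practice, assuming AWS IAM, here is a small Python example that derives the identity name from the bot’s task and builds a least-privilege policy with the same name for a bot like bot-s3-read-only. The bucket name example-reports-bucket is hypothetical; the policy uses the standard IAM JSON policy format (s3:ListBucket on the bucket, s3:GetObject on its objects).

```python
import json


def bot_name(task: str) -> str:
    """Apply the naming rule: every bot identity starts with 'bot-'."""
    return task if task.startswith("bot-") else f"bot-{task}"


def s3_read_only_policy(bucket: str) -> dict:
    """IAM policy document granting read-only access to a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # list the objects in the bucket
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            # read (download) objects, but never write or delete them
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


bot = bot_name("s3-read-only")
policy_name = bot  # same name for the bot and its policy
policy = s3_read_only_policy("example-reports-bucket")  # hypothetical bucket
print(policy_name)
print(json.dumps(policy, indent=2))
```

Anyone reading the account later sees bot-s3-read-only on both the identity and the policy and knows what it does without opening the JSON.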

Keys or Roles?

I prefer using Roles; it’s better than putting a secret somewhere without knowing who uses it.

But there might be situations where you need to use keys. For example, if you need to rsync files from a remote server on a different cloud vendor, Roles are not an option; just make sure those keys have exactly the permissions the task needs.

Once the task is done, deactivate the keys.
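Deactivating keys can be scripted too. A minimal sketch using boto3’s IAM client (list_access_keys and update_access_key), assuming the bot is an IAM user; the user name bot-rsync is hypothetical. The client is passed in as a parameter so the function can be tried without touching a real account.

```python
def deactivate_bot_keys(iam, user_name: str) -> list:
    """Set every active access key of an IAM user to Inactive; return the key IDs changed."""
    changed = []
    # IAM allows at most two access keys per user, so a single call lists them all.
    for key in iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            iam.update_access_key(
                UserName=user_name,
                AccessKeyId=key["AccessKeyId"],
                Status="Inactive",  # deactivate, don't delete: easy to re-enable if needed
            )
            changed.append(key["AccessKeyId"])
    return changed


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   deactivate_bot_keys(boto3.client("iam"), "bot-rsync")
```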

Summary

Bots are part of your stack, so give them exactly the permissions their specific task needs.

6 Rules For Cloud Architects

Are you a cloud architect? How do you plan a new infrastructure for a product?

How do you build workflows in the cloud?

What are the considerations for cloud security, costs and automation?

All are relevant questions when planning a new cloud design for a product runtime, so let’s discuss them.
This is my approach, and it has served me well in all of my designs and cloud operations in production.

Preplan

Preplan is not one of the 6 rules, just a starting point.

It’s better to plan before you start building any project. To plan, you’ll need to understand the product, so ask these questions:

  • What problem does the product solve?
  • Who’s going to use it? (demographics)
  • What are the business risks of downtime?
  • What is the expected or current revenue?
  • What is the technical flow of the product? (user login, API integrations, consuming data from a database, etc.)

The more you ask, the more information you’ll have for the design process, so don’t skip this step.
It’s easy to jump straight into building stuff without asking about needs and requirements.

Costs

If your design costs more than the revenue the product brings in, the product won’t justify itself. This is very important, because a design that is bad from a cost perspective can have a significant effect on the entire business operation.

So, in every step of the planning consider costs!
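Even a rough calculation helps at each step. A minimal sketch that totals the monthly cost of a design and compares it with revenue; every price, count and the revenue figure are hypothetical, and 730 hours per month is the usual approximation (365 days × 24 hours / 12).

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12 months


def monthly_cost(resources):
    """Sum the monthly cost of (hourly_price, count) pairs."""
    return sum(price * count * HOURS_PER_MONTH for price, count in resources)


# Hypothetical design: 4 web servers at $0.10/h and 1 database server at $0.50/h
design = [(0.10, 4), (0.50, 1)]
cost = monthly_cost(design)
revenue = 5000.0  # hypothetical monthly revenue

print(f"monthly cost: {cost:.2f}, share of revenue: {cost / revenue:.1%}")
```

Rerunning a calculation like this whenever the design changes keeps costs visible while you plan instead of after the first invoice.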

Cloud Security

Every product carries business risk: what if the application is down for 1 hour? What is the effect in terms of reputation and revenue?

What if some services and data are exposed to unauthorized parties?

So, include security measures in the design to make sure your product is protected, but don’t overdo it, as it can cause issues with workflow and runtime.

Balance is key here.

Automation

Building and working without automation means spending time on repetitive tasks, which is inefficient and slows delivery.

Try the IaC (Infrastructure as Code) approach; it means you can deploy and modify an entire infrastructure in minutes.

You can also see which components make up the current stack by checking the IaC files.

Combine IaC and immutable infrastructure to get maximum results.

Decouple Dependencies

When building software and infrastructure, it’s easy to tie components together and hard-code stuff; the more hard-coded values and dependencies there are between components, the more issues they will cause.

Let’s say you designed the infrastructure with hard-coded IP addresses: this means those IPs can never change, and the same goes for other config files.
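A common way to avoid hard-coded addresses is to read them from the environment (or a config service) and point them at DNS names rather than raw IPs. A minimal Python sketch; the variable names DB_HOST and DB_PORT and the default name db.internal are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int


def load_db_config() -> DbConfig:
    """Build the DB config from the environment instead of hard-coded IPs."""
    # DB_HOST / DB_PORT are hypothetical variable names; the default is a DNS name,
    # so the database can move to a new IP without touching the code.
    return DbConfig(
        host=os.environ.get("DB_HOST", "db.internal"),
        port=int(os.environ.get("DB_PORT", "3306")),
    )


print(load_db_config())
```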

Another example is the start-up of a service that depends on other services, for example an application that requires the monitoring agent to be running before it can start. Monitoring is nice, but it should not affect production services.
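One way to keep monitoring from blocking production is to treat the agent as optional at start-up: check whether it is reachable, log a warning if not, and start anyway. A minimal sketch, assuming the agent listens on TCP; the address 127.0.0.1:9000 is hypothetical.

```python
import logging
import socket

log = logging.getLogger("app")


def monitoring_available(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at the agent address."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_app(agent_host: str = "127.0.0.1", agent_port: int = 9000) -> bool:
    """Start the app whether or not the agent is up; return True if metrics are on."""
    metrics_enabled = monitoring_available(agent_host, agent_port)
    if not metrics_enabled:
        # Monitoring is nice to have: warn and keep going instead of refusing to start.
        log.warning("monitoring agent unreachable at %s:%s, starting without metrics",
                    agent_host, agent_port)
    # ... start serving requests here, regardless of monitoring state ...
    return metrics_enabled
```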

Continuous Software Updates

Software freeze is a risk in my opinion: this approach leads to more work that needs planning, and the longer the freeze, the harder it is to upgrade.

Let’s say you’re using Python 3.6 with pip packages in your code. You cannot upgrade your OS, because a new OS comes with a newer Python version, and the pip packages you pinned for Python 3.6 may not work with it.

So now you can’t upgrade your Python, pip packages or OS, just because you did not integrate updates into your regular operations.

Keep your code and system up to date!

Remove Single Point Of Failure

Similar to coupled dependencies, relying on a single endpoint or component is risky. Let’s say you’re using one load balancer: what happens if that load balancer is overloaded or fails?

Single database? Same issue.

Those are simple examples, but your product probably has more components that are single points of failure.

The fewer single points of failure, the better!
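In infrastructure, the fix is redundancy (more than one load balancer, database replicas and so on). On the client side, a complementary habit is never depending on a single endpoint. A minimal sketch of ordered failover; the endpoint names and the request function are hypothetical stand-ins for real network calls.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful response."""
    errors = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except ConnectionError as exc:
            # remember why this endpoint failed, then move on to the next one
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


# Hypothetical request function: lb-a is overloaded, lb-b answers.
def fake_request(endpoint: str) -> str:
    if endpoint == "lb-a":
        raise ConnectionError("overloaded")
    return f"ok from {endpoint}"


print(call_with_failover(["lb-a", "lb-b"], fake_request))  # ok from lb-b
```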

Compute Group Vs. Cluster

Do you have a cluster or compute group for your production?

What is a compute group?

What is a cluster?

Compute Group

Compute Group is a set of identical servers doing the same function.

Example of a compute group: Apache (web servers)
Your website traffic is increasing and you need more servers to handle the load, so you add another web server (let’s say it’s Apache) and then another.

Those web servers all do the same thing (function), which is serving web files (HTML) to the website’s visitors. The web servers are not connected to each other and don’t “know” about the other web servers.

So, how does this scenario work?

The load balancer forwards traffic to the web servers (let’s say round-robin), and those web servers serve files to the visitors.

You can add or remove web servers on demand.
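The round-robin behavior described above can be sketched in a few lines of Python. This is a toy model of the idea, not how a real load balancer is implemented; the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out servers of an identical compute group in rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._index = 0

    def next_server(self):
        if not self.servers:
            raise RuntimeError("no servers in the compute group")
        server = self.servers[self._index % len(self.servers)]
        self._index += 1
        return server

    def add(self, server):
        # servers don't know about each other, so adding one is just a list append
        self.servers.append(server)

    def remove(self, server):
        self.servers.remove(server)


lb = RoundRobinBalancer(["web-1", "web-2"])
print([lb.next_server() for _ in range(3)])  # ['web-1', 'web-2', 'web-1']
lb.add("web-3")  # traffic grew, add a server on demand
print([lb.next_server() for _ in range(3)])  # ['web-1', 'web-2', 'web-3']
```

Because the members are identical and independent, adding or removing one only changes the rotation list.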

Cluster

A cluster is a group of compute servers that are connected, operate together, and “know” about each member of the cluster.

Example of a cluster: Kafka

A production Kafka cluster typically runs a minimum of 3 nodes; those nodes are configured with the addresses of the other cluster members and elect a leader among themselves.

If there’s an issue with one of the nodes, the others can take over its responsibilities, and that’s how you achieve high availability.
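Why 3 nodes? Clustered systems that elect a leader by majority vote (for example a ZooKeeper ensemble or Kafka’s KRaft controller quorum) need more than half of their members alive to keep working. A small sketch of that arithmetic:

```python
def fault_tolerance(nodes: int) -> int:
    """Number of failed members a majority-quorum cluster of `nodes` can survive."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    quorum = nodes // 2 + 1  # strict majority
    return nodes - quorum


for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes -> survives {fault_tolerance(n)} failure(s)")
```

2 nodes survive no failures and 4 nodes survive no more than 3 do, which is why 3 (and odd sizes in general) is the usual choice.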

What is the responsibility of a DPO?

Does your company hire the services of a DPO?

What is a DPO?

Data Protection Officer

The role of a DPO is to protect the customers’ data. Yes, the customers’.

A DPO is usually outsourced or an independent consultant rather than an employee (it depends on the organization).

Why do you need a DPO?

If your company does business with customers located in the EU, then you need to comply with GDPR.

What is GDPR?

The EU has a set of regulations to protect the data of customers within the EU.

GDPR stands for: General Data Protection Regulation

In general terms, it governs how companies collect and protect customers’ data (in-depth details can be found at THIS LINK).

Is it a requirement to hire a DPO?

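If your company does business in the EU, it may well be: GDPR (Article 37) makes a DPO mandatory for public authorities and for organizations whose core activities involve large-scale, regular and systematic monitoring of individuals, or large-scale processing of special categories of data.

Where that applies, it is a legal requirement.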

CISO Vs. DPO

The Chief Information Security Officer (CISO) is responsible for the security of the company’s internal information and data.

The Data Protection Officer is responsible for the customers’ privacy, data and information, and, as an external position, makes sure the customers are represented.

Do you need a DPO? Are you doing business in the EU?
If your answer is yes, then contact us now to hire your outsourced DPO.

To contact us click HERE

3 Rules For Cloud Security

What is your cloud security approach?

When designing a product to run in the cloud, it’s best practice to include IT and cloud security in the product runtime, infrastructure and operations.

The Challenge

When using the cloud, the approach needs to be different than on-prem or just consuming SaaS from another provider. It’s very easy to open ports and permit access to cloud resources, and because it’s in the “cloud”, it might be accessible from public and external networks.

Tracking every modification, or preventing admins and developers from modifying resources, can hinder the normal operation of IT and Development, so it’s better to take a different approach.

That approach is a mindset: consider cloud security in every project and modification, because changes are necessary in order to improve and develop the product you’re working on.

Authentication

Authentication means: who are you?

Examples of identities by role and position:

  • admin
  • developer
  • contractor
  • customers
  • etc.

Authorization

Authorization means: What can you do?

Examples of permissions:

  • add users
  • delete users
  • add new clients
  • open security-group ports
  • download files
  • access resources (databases, servers)
  • etc.

Connection

Connection means: Where are you connecting from?

Examples of connections:

  • Official HQ Offices
  • Remote workers (VPN)
  • Customers (anywhere)
  • Private-Link
  • etc.
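The three rules can be combined into a single access check: authenticate, authorize, then verify the source network. A minimal sketch using Python’s ipaddress module; the role-to-permission mapping and the networks (203.0.113.0/24 for HQ, 10.8.0.0/16 for the VPN) are hypothetical, and customer-facing endpoints that accept connections from anywhere would skip the network rule.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical role -> permissions mapping (rule 2: authorization)
ROLE_PERMISSIONS = {
    "admin": {"add users", "delete users", "open security-group ports"},
    "developer": {"download files", "access resources"},
    "contractor": {"download files"},
}

# Hypothetical allowed networks (rule 3: connection): HQ office range and VPN pool
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("10.8.0.0/16"),
]


@dataclass
class Request:
    user: Optional[str]  # None means the caller did not authenticate
    role: str
    action: str
    source_ip: str


def is_allowed(req: Request) -> bool:
    # Rule 1, authentication: who are you?
    if req.user is None:
        return False
    # Rule 2, authorization: what can you do?
    if req.action not in ROLE_PERMISSIONS.get(req.role, set()):
        return False
    # Rule 3, connection: where are you connecting from?
    source = ipaddress.ip_address(req.source_ip)
    return any(source in network for network in ALLOWED_NETWORKS)


print(is_allowed(Request("alice", "admin", "add users", "10.8.1.20")))     # True
print(is_allowed(Request("alice", "admin", "add users", "198.51.100.7")))  # False
```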

How do you successfully keep a cloud account secure?

Choose the approach best suited to you and your team and implement it as a mindset; an approach built on these 3 recommended rules is easy to remember and easy to implement.

Do you maintain a regular cloud security operation?
Do you know the status of your cloud security?
If your answer is no, then contact us now and we’ll handle your cloud security for you.
To contact us click HERE

How software update freeze can make your stack obsolete

Do you update your software frequently?

Is software update part of your CI/CD pipelines?

What is continuous update?

The issue with hard-coding software versions and not updating

Just to clarify, this post is relevant to 3rd-party software and packages you import into your application (via apt, yum, pip, gem, etc., or by downloading binaries such as .jar files), and also to OS versions.

The wrong approach, in my opinion, is to statically pin the version numbers of imported packages and use the same OS version throughout your infrastructure and code.

Why, you ask?

Once your code works with other (3rd-party) software and the tests pass, you assume the process is complete and resume working on your code.

Everything works until it doesn’t!

Scenario 1

Let’s say you’re using Java and a vulnerability is discovered and fixed in a new release. Now you need to upgrade to the new release, but your runtime version is too far behind the latest version and cannot be upgraded.

Or better yet, you can upgrade, but other components that communicate with your code are not compatible with the latest version.

What do you do? Oh yes, revert!

Scenario 2

The operating system is a few versions behind the latest, let’s say Ubuntu 18.04, and now you want to use Ubuntu 22.04, but your code runs on Python 3.6.

Guess what: Ubuntu 22.04 does not ship with the same Python as Ubuntu 18.04 (18.04 ships Python 3.6, 22.04 ships Python 3.10).

Now you need to compile Python 3.6 from source, install it on Ubuntu 22.04, and make sure to update the PATH to use Python 3.6.

Backlog

So now you decide to use Python 3.10 instead of Python 3.6, but what about your pip packages? They are probably not compatible, because packages are released for specific Python versions too.

Now go over your entire code and make sure every function works with the new Python version. Then you’ll probably decide it’s too much work right now and not upgrade.

Solution

Simple, don’t freeze software updates!

If you keep your software up to date (including your OS), it forces you to adapt as you go! There’s no need to schedule big upgrades, because it’s a mindset: your software keeps evolving.

Stopping your software from evolving does not make sense; in fact, it’s the opposite of your job description… developer.
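Staying current starts with knowing where your stack is frozen. A minimal sketch that lists exact == pins in a pip requirements file so you can review them; it only handles simple name==version lines (no extras, URLs or environment markers), and the sample contents are hypothetical.

```python
import re

# package name, optional spaces, '==', then the pinned version
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def pinned_packages(requirements_text: str) -> dict:
    """Return {package: version} for lines that hard-pin an exact version."""
    pins = {}
    for line in requirements_text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


sample = """
requests==2.19.1
boto3>=1.26
# comment line
flask == 1.0.2  # pinned long ago
"""
print(pinned_packages(sample))  # {'requests': '2.19.1', 'flask': '1.0.2'}
```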

How do you keep your software on the latest version?

The answer is CD (continuous delivery).

CD is about how fast, how reliably and how frequently you deploy your code to production.

So the goal is to be able to deploy to production whenever you want, even a few times per day; if you can do that, you know your code is in a releasable state.

So using the latest software releases while keeping a releasable state will make your job easier and your product better.

Backup vs. Restore

What is your approach regarding your application’s data recovery?

Is it Backup? or Restore?

What is application backup?

It depends on your application, but in most cases your application will have a database, so that is your priority, along with the core code of your application.

You’re probably using a code repository to manage and store your code, so you can always clone the entire repo and save it somewhere safe.

As for the database, you should dump it to a safe storage location.

Both approaches copy the data to a safe external location, such as S3, a local hard drive or a remote server; all of these options mean keeping a copy of the data outside the application.

The challenge comes when you need to use that backup: will it be restorable?

What is application restore?

Application restore is the process where you back up and restore at the same time: you back up the data and restore it right away.

When you restore the data right after the backup is made, you ensure it does what it’s supposed to do, and you can be sure that when (or if) you need it, you can use it.

How can you backup and restore in one process?

Let’s walk through a simple example. Assume it’s a MySQL database: you can dump the database, copy the dump to a testing server and restore it to a MySQL container.

Once the restore is complete, the next stage is to connect to the database and verify the data is OK.

So this is the process:

  1. dump the db
  2. copy it to testing server / instance
  3. restore it to a container
  4. connect to the db and run a query
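The four steps can be sketched end to end. To keep the example self-contained, it swaps MySQL for Python’s built-in sqlite3: iterdump() produces an SQL dump (playing the role of mysqldump), the dump is restored into a fresh throwaway database (playing the role of the test container), and a COUNT(*) query verifies the rows came back. In a real setup the dump file would be copied to the testing server between steps 1 and 3; the users table is hypothetical.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Dump a database, restore the dump into a fresh database and compare row counts."""
    # 1. dump the db as SQL statements (the sqlite3 stand-in for mysqldump)
    dump_sql = "\n".join(source.iterdump())

    # 2 + 3. restore the dump into a throwaway database (stand-in for the test container)
    restored = sqlite3.connect(":memory:")
    try:
        restored.executescript(dump_sql)

        # 4. connect to the restored db and run a query to verify the data
        # (table is a trusted name from our own code, not user input)
        query = f"SELECT COUNT(*) FROM {table}"
        original_rows = source.execute(query).fetchone()[0]
        restored_rows = restored.execute(query).fetchone()[0]
        return restored_rows == original_rows
    finally:
        restored.close()


# Demo with a hypothetical users table
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
src.commit()
print(backup_and_verify(src, "users"))  # True
```

A failed restore or a row-count mismatch shows up now, while you still have the original, instead of on the day you need the backup.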

Why should you backup and restore in one process?

Again, it ensures the backup is OK and gives you the reliability and confidence that it’s available on demand.

How to choose the best performing hardware for a server

Choosing the optimal hardware for a server is something that happens often, and making the right decision can prevent issues with those servers later on. So how do you choose server hardware?

Ask

My approach is to ask first: what is the server meant for? What tasks should that server run?

The more you ask, the better informed your decision will be.

Calculate

Make a list of hardware flavors with their prices, and choose the lowest-priced flavor that comes closest to the application’s hardware requirements.
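The calculation is a simple filter-then-minimize. A minimal sketch; the flavor names, sizes and hourly prices are hypothetical, not any provider’s real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: int
    memory_gib: int
    hourly_price: float


def cheapest_fit(flavors, min_vcpus, min_memory_gib):
    """Return the lowest-priced flavor that meets the minimum requirements, or None."""
    candidates = [
        f for f in flavors
        if f.vcpus >= min_vcpus and f.memory_gib >= min_memory_gib
    ]
    return min(candidates, key=lambda f: f.hourly_price, default=None)


# Hypothetical flavors and prices, for illustration only
FLAVORS = [
    Flavor("small", 2, 4, 0.04),
    Flavor("medium", 4, 8, 0.08),
    Flavor("memory", 2, 16, 0.09),
    Flavor("large", 8, 16, 0.16),
]

best = cheapest_fit(FLAVORS, min_vcpus=2, min_memory_gib=8)
print(best.name if best else "no flavor fits")  # medium
```

The result is only a starting point for the test step, where you adjust the flavor based on how the application actually behaves.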

Test

Once you have the information about what tasks the server should do, you’re ready to test solutions.

Start a test server with the minimal hardware requirements, install the application and run it in a lab environment. Obviously you can’t fully test production systems like that, but you can test everything except real traffic and user behavior.

Keep adjusting the server’s hardware flavor until you find the best balance of performance and cost efficiency.

Disposable Application Resources

Build your servers as disposable application resources. What does that mean?

It means the only thing installed on your server is the application itself: no database and no locally saved configuration, just the runtime code of the application.

For data and config, use a mounted volume, disk or NFS share attached to the server, which makes the server a disposable compute resource.

Using this approach, you can scale your servers however you need, based on runtime and load.

Deploy vs. Release

What is the difference between deploy and release?

When you’re working on servers and operating an application runtime, you need to make sure new software is delivered to the users.

There are two options for doing so:

  1. just deploy the new software to the servers
  2. deploy the new software but don’t make it available yet

Deploy

Deploy means you put the new software on the servers.

Release

Release means the new software is now available to users.

Two-step process

Making deploy and release a two-step process ensures a smooth integration of new software into production servers. Yes, you can still deploy after testing and turn the two-step process into a one-step process.

How can you deploy the new software without it replacing the current software right away?

Using feature flags.

With feature flags, you can deploy the new software but not make it available yet. Once you decide when to activate the new software, you turn the feature flag on and expose the new software to users (release).

A feature flag is just an if/else statement around the new function of the new software.
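A minimal sketch of that if/else in Python, with the flag read from an environment variable; the variable name FEATURE_NEW_CHECKOUT and the checkout functions are hypothetical. Many setups read flags from a config service instead, so a flag can change without a restart.

```python
import os


def flag_enabled(name: str) -> bool:
    """Read a feature flag from the environment, e.g. FEATURE_NEW_CHECKOUT=on."""
    value = os.environ.get(f"FEATURE_{name.upper()}", "off")
    return value.lower() in {"1", "true", "on"}


def checkout_v1(cart):
    return f"v1 total={sum(cart)}"


def checkout_v2(cart):
    return f"v2 total={sum(cart)}"


def checkout(cart):
    # Both versions are deployed; the flag decides which one is released to users.
    if flag_enabled("new_checkout"):
        return checkout_v2(cart)
    return checkout_v1(cart)


print(checkout([1, 2]))
```

Deploying with the flag off changes nothing for users; turning it on is the release, and turning it off again is the rollback.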

10 tools for cloud admins

As a cloud admin your work is broad and you need tools to improve your work and efficiency.

Here are, in my opinion, the 10 best tools every cloud admin should have in their tool belt, or on their laptop in our case.

Code

1. VS Code is one of the most popular code editors and my preferred one; it has many plugins and you can customize it to your preferences.

2. Python is the easiest and most robust coding tool for cloud admins, with SDK support for almost any cloud.
No introduction needed here.

3. boto3 is specific to AWS, but still worth mentioning if AWS is your cloud.
It has very good documentation and does the job, and it’s very important when you integrate automation into your projects.

Servers

4. Docker is by far the fastest way to install something: it’s just docker run image-name and the app is installed. Although it’s not recommended for production, it is suitable for IT servers; just remember to back up the volumes.

5. OpenVPN is a great tool to encrypt the connection to your servers, and it can also save public IP addresses, since you connect to the servers’ private IP addresses.

6. Sensu is great for knowing what’s up with your servers: collect the metrics you want and get alerts for what you need to know.

7. Puppet is your preferred option if you don’t use Docker on your servers. It can configure your servers and maintain their state, it’s easy to use, and there are plenty of modules to choose from.

Misc

8. Zsh is better than the default shell, and it can be used on your laptop and on your servers; install it on your servers as well to get the same experience you get on your laptop.

9. Let’s Encrypt made it so easy to secure the public connections to your apps that there’s really no reason not to use it.

10. KeePass is a password manager you need because you connect to many servers and services, and it’s a great tool for this task; just remember to back up the KeePass database.