Deployment Done Right

Deploying our applications is a critical part of the software development lifecycle. It happens at different frequencies and in different situations, such as validating a new feature in an integration environment or delivering a new version to production. The purpose of this article is to highlight the important points to consider, along with the pitfalls to avoid, when choosing a solution for deploying your applications.

Define responsibilities

Before getting to the technical aspects, we must first focus on the often overlooked people aspect. Your application is deployed to different environments (hopefully) and, depending on the environment, different teams may be involved. You may even have different teams involved in deploying to the same environment (for example, a production operator deploying the binaries and a DBA running the SQL migration scripts).

For each environment, you must be able to determine the responsibilities of each team along with what they are allowed to do. It is not as simple as saying that the development team has access to the integration environment and the production team to the staging and production environments. Besides defining who has read and write access to an environment, you must also take into account who can request and who can validate a modification (here ‘can’ may mean both ‘has the permission’ and ‘is competent to’).

For example, suppose that a newer version of your application has to connect to a service from another application and you need three new configuration parameters: the service address (non-sensitive), the authentication login (moderately sensitive) and the authentication password (highly sensitive). Now ask yourself the following question: for each configuration parameter and each environment, who is supposed to know its value, or at least be responsible for finding it out? If you can answer this question, you should have a better idea of how to organize your deployment workflow.
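One way to make this question concrete is to write the answer down as a matrix and look for the holes. The following is only a sketch: the parameter names, the team names and the ownership assignments are all invented for illustration, and your own matrix will look different.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    NON_SENSITIVE = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class Parameter:
    name: str
    sensitivity: Sensitivity


# Hypothetical parameters matching the three described in the example.
PARAMETERS = [
    Parameter("service.address", Sensitivity.NON_SENSITIVE),
    Parameter("service.login", Sensitivity.MODERATE),
    Parameter("service.password", Sensitivity.HIGH),
]

ENVIRONMENTS = ["integration", "staging", "production"]

# Hypothetical ownership: (parameter name, environment) -> responsible team.
OWNERS = {
    ("service.address", "integration"): "development",
    ("service.address", "staging"): "operations",
    ("service.address", "production"): "operations",
    ("service.login", "integration"): "development",
    ("service.login", "staging"): "operations",
    ("service.login", "production"): "operations",
    ("service.password", "integration"): "development",
    ("service.password", "staging"): "security",
    # ("service.password", "production") is deliberately left unassigned.
}


def unassigned(parameters, environments, owners):
    """Return every (parameter, environment) pair nobody is responsible for."""
    return [
        (parameter.name, environment)
        for parameter in parameters
        for environment in environments
        if (parameter.name, environment) not in owners
    ]


print(unassigned(PARAMETERS, ENVIRONMENTS, OWNERS))
```

Any pair reported by `unassigned` is a question your deployment workflow cannot answer yet.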

Trace deliverables

Traceability of what is and can be deployed, especially binaries, is of the highest importance. First, you want to guarantee that the deliverables which will be deployed in a production environment are the ones which were deployed in a staging environment (or whatever name you may use for the environment used to validate your application before a production deployment), minus environment-specific settings. Then, if you encounter an issue with an already deployed deliverable, you want to be able to reproduce the issue in another environment (or on your local workstation), which implies either using the same deliverable or at least being able to rebuild it.

To offer good traceability, you need the following:

A single golden source for your deliverables used for deployment on all environments, with the following exceptions:
- You may have a separate source for non-release builds (continuous, nightly…) used on development environments for quick testing purposes.
- You have a reliable way to replicate a binary from one source to another like a promotion process (but you still have a single origin for your deliverable).
A deliverable should be immutable and generated by an automatic process.
A deliverable should carry enough information to identify the source it was built from. This may be a:
- Version number related to a source control tag, provided that tags are never overwritten.
- Build number (implying the build is kept and contains the information related to your source control version).
- Source control revision/commit identifier.
- A timestamp is not enough to trace a deliverable, but may be fine for identifying its version.
Provided they have the appropriate tools, any team member (for example, a developer for binary deliverables) should be able to recreate a specific deliverable from its source.
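The requirements above can be sketched as a small manifest produced by the build and shipped with the deliverable. This is a minimal illustration rather than a recommended format: the field names, the `billing-service` component and its identifiers are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: a manifest is never modified after the build
class Manifest:
    name: str
    version: str       # release version, tied to a source control tag
    commit: str        # source control commit identifier
    build_number: int  # identifier of the automated build that produced it
    sha256: str        # checksum of the deliverable content


def make_manifest(name, version, commit, build_number, content: bytes) -> Manifest:
    """Describe a deliverable at build time, including a content checksum."""
    digest = hashlib.sha256(content).hexdigest()
    return Manifest(name, version, commit, build_number, digest)


def is_same_deliverable(manifest: Manifest, content: bytes) -> bool:
    """Check that the content about to be deployed is the one that was built."""
    return hashlib.sha256(content).hexdigest() == manifest.sha256


# Hypothetical deliverable and identifiers, for illustration only.
binary = b"pretend this is the packaged application"
manifest = make_manifest("billing-service", "1.4.0", "9f2c1ab", 128, binary)

print(json.dumps(asdict(manifest), indent=2))
print(is_same_deliverable(manifest, binary))
print(is_same_deliverable(manifest, binary + b"tampered"))
```

Comparing the checksum before a production deployment is one way to verify that the deliverable is the one that was validated in staging.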

Binary deliverables are the most straightforward to manage. Configuration files are trickier and database migration scripts are even worse. We could write dedicated articles on how to handle those, so we’ll just focus on the basics.

The purpose of configuration is to customize your application for a given environment. Most of the time the configuration values differ from one environment to another so, unlike binaries, it is hard to provide a single source for all environments (at best you may provide common templates with placeholders, depending on your configuration file format and your deployment process). However, you should apply the same principle as for binaries, except that you must have one golden source per environment and this source must be versioned (preferably using source control). The hardest part to get right is the modification process across environments (that’s why it is important to define responsibilities properly, as discussed in the previous section).
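As an illustration of a common template with placeholders, here is a minimal sketch using Python's standard `string.Template`. The keys, addresses and logins are invented; in practice each environment's values would live in its own versioned source rather than in one dictionary.

```python
from string import Template

# Common template shared by all environments (hypothetical keys).
TEMPLATE = Template(
    "service.address=${service_address}\n"
    "service.login=${service_login}\n"
    "log.level=${log_level}\n"
)

# One set of values per environment (hypothetical values).
VALUES = {
    "integration": {
        "service_address": "http://int.example.test",
        "service_login": "app_int",
        "log_level": "DEBUG",
    },
    "production": {
        "service_address": "https://prod.example.test",
        "service_login": "app_prod",
        "log_level": "WARN",
    },
}


def render(environment: str) -> str:
    """Render the configuration file of one environment.

    substitute() raises KeyError when a placeholder has no value, so an
    incomplete environment is detected at packaging time, not at startup.
    """
    return TEMPLATE.substitute(VALUES[environment])


print(render("production"))
```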

Database migration scripts are trickier because, contrary to binaries and configuration where you replace a package, the deliverable provides a differential between two states (you usually don’t replace your existing database schema on each deployment). The main difficulty when handling differentials is that their scope differs from one environment to another: some intermediate versions may be deployed in a development environment but not in a staging environment, and the same may apply between a staging environment and a production environment.

Another issue is that differentials play poorly with some concepts commonly used in source control, such as branching and merging. While there is no silver bullet, especially if you want to handle rollbacks properly, there are many available tools which simplify the task. In any case, migration scripts should be environment-independent (at least the ones impacting the database schema) and should be versioned in source control. Also, each migration should be uniquely identified by an ordered index (how it is identified may depend on your database migration tool).
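To show why an ordered index matters, here is a sketch of how the set of migrations to run can be derived per environment. It does not reproduce any particular migration tool: the `Migration` type, the script names and the applied sets are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    index: int  # unique, ordered identifier
    name: str


# Hypothetical migrations as they would be listed in source control.
MIGRATIONS = [
    Migration(1, "create_customer"),
    Migration(2, "add_customer_email"),
    Migration(3, "create_invoice"),
    Migration(4, "index_invoice_date"),
]


def pending(migrations, applied):
    """Return the migrations still to run on an environment, in index order."""
    known = {migration.index for migration in migrations}
    if len(known) != len(migrations):
        raise ValueError("duplicate migration index")
    unknown = set(applied) - known
    if unknown:
        # The environment contains something source control does not know.
        raise ValueError(f"applied but not in source control: {sorted(unknown)}")
    todo = [m for m in migrations if m.index not in applied]
    return sorted(todo, key=lambda m: m.index)


# The same scripts yield a different scope on each environment.
print([m.index for m in pending(MIGRATIONS, {1, 2, 3})])  # e.g. development
print([m.index for m in pending(MIGRATIONS, {1})])        # e.g. production
```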

Automate your process

Each step in your deployment process which involves human intervention is a potential cause of failure. This issue is exactly the same as for your build process: the fewer manual steps it takes, the better off you’ll be. Not only are manual steps time-consuming, they are also unreliable (even the most rigorous people make mistakes). That’s why it is important to automate your deployment process as much as you can.

There are two main axes to follow regarding deployment automation. One is the “how”, which consists of automating the steps of the deployment process. The other, often overlooked but also harder, is the “what”, which is to determine what to deploy (which components, which instances, binary or configuration only, …).

Automation comes at a cost. Not only is there the upfront cost of integrating a deployment tool, but there is also the maintenance cost of both adapting the tool to new needs and maintaining the data used by your automation tool. You also need to consider how far you want to go: should it include provisioning of infrastructure components? The further you want to go, the higher the cost and the higher the number of teams involved.

Supposing you have limited time, manpower and scope, you need to prioritize what to automate. Some parts of the deployment process are easier to automate than others, and some are performed more often than others (for example, you usually upgrade an existing service more often than you install a new one). In theory these two criteria are orthogonal, but in practice people try to avoid performing costly operations whenever they can (which sometimes leads to strange design decisions, but that’s another topic).

The easiest scenario to automate is a version upgrade of an existing component, optionally with a way to perform a configuration-only upgrade. Depending on your architecture, it may fulfill most of your needs.

Installing a new component or a new instance is the next step. It (and its opposite, removing an instance or a component) can be a bit trickier depending on your technology stack. If you have a mostly-monolithic application and don’t face this scenario more than once a year, it may not be worth your time to automate it (you’ll still need clear process documentation). However, if you are using a microservice-based architecture, it becomes mandatory.
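The “what” can be sketched as a comparison between what is currently deployed and what should be deployed. The code below is only one possible way to frame it: the `State` type, the component names and the action labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    version: str          # version of the binary deliverable
    config_revision: str  # revision of the environment's configuration


def plan(current: dict, desired: dict) -> dict:
    """Return the action to perform for each component that needs one."""
    actions = {}
    for name in sorted(current.keys() | desired.keys()):
        before, after = current.get(name), desired.get(name)
        if before is None:
            actions[name] = "install"
        elif after is None:
            actions[name] = "remove"
        elif before.version != after.version:
            actions[name] = "upgrade"
        elif before.config_revision != after.config_revision:
            actions[name] = "configure"  # configuration-only upgrade
        # Identical states need no action and are left out of the plan.
    return actions


# Hypothetical components, for illustration only.
current = {
    "api": State("1.3.0", "c10"),
    "worker": State("2.0.1", "c7"),
    "legacy-batch": State("0.9.0", "c2"),
}
desired = {
    "api": State("1.4.0", "c11"),
    "worker": State("2.0.1", "c8"),
    "notifier": State("1.0.0", "c1"),
}

print(plan(current, desired))
```

If you only automate the "upgrade" and "configure" actions at first, the plan still tells you which components need the documented manual procedure.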

Whatever solution you choose, make sure that it fits your deployment workflow and respects each party’s responsibilities.

Share the tools

In the previous section, we emphasized the importance of automating deployments. However, the main pitfall to avoid is using different solutions for different environments. Unfortunately this scenario happens, usually for at least one of the following reasons:

Licensing costs that prevent the tool from being used by non-production teams or for non-production purposes
Silo effect between the different teams involved (which is what the DevOps initiative tries to break)
Design issues that prevent the tool from being used on a specific environment for security, performance or scalability reasons

The main issue with using different tools is that it adds another source of unforeseen failures when moving an application from one environment to another. Those failures can be non-trivial to diagnose, especially if the tool is provided by a third party, and they can be just as costly to remediate (as the cause is more likely to be a high-level design choice than faulty business logic).

For this reason it is important that all impacted teams share the same tools under the same conditions. The latter should not be overlooked either, as another pitfall is to use “open-bar” settings (mostly related to security) on development environments and restricted ones on production. While there are justifications for having unrestricted settings for some use cases (proofs of concept, quick validation, …), all impacted teams should be able to validate a deployment under similar restrictions. Again, not following that rule may lead to failures which may be non-trivial to fix.

Check and fail fast

Even if the application has no technical bugs and the deployment tool is reliable, bad things may happen (binaries may not be available, the target servers may not have enough disk space, …). In some cases, the application may fail to start, and in other cases it may be stuck in an unstable state (meaning it is running but not usable). It is important to detect those issues as soon as possible, even if it implies rolling back your deployment.

An important part of the deployment is the post-check, whose purpose is to validate whether the deployed component is ready for use. There is no universal way to implement this check as it depends on the component’s purpose and the technology stack involved (it may consist of checking that a process is running, checking that the target hosts listen on a specific port, parsing log files, …). Whatever the implementation chosen, it has to:

Provide a result (success or failure) as soon as possible (beware of waits and timeouts)
Avoid false positives
Avoid false negatives
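As an example of one of the checks mentioned above (a target host listening on a specific port), here is a minimal sketch with a bounded waiting time. The function name, the default timeout and the polling interval are assumptions; an open port only proves that something is listening, so a real post-check would usually go further to avoid false positives.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True as soon as host:port accepts a connection.

    Returns False once `timeout` seconds have elapsed, so the check always
    ends with a clear result instead of hanging.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Never wait longer than the time left before the deadline.
            with socket.create_connection((host, port),
                                          timeout=min(interval, remaining)):
                return True
        except OSError:
            # Not ready yet: pause briefly, then retry until the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))


# Demonstration against a local listener standing in for a deployed component.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the system pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(wait_for_port("127.0.0.1", demo_port, timeout=2.0))
listener.close()
```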

Do not forget that just because a deployed component is detected as technically fully operational, it doesn’t mean that the whole chain is, or that there won’t be non-technical issues. Those checks do not replace end-to-end testing, which is another subject.

To increase the speed and reliability of checks, it is important for a component to fail during its bootstrap as soon as an unexpected and unrecoverable issue happens. Having an unstable state makes failure detection harder. This also applies to your deployment process: if a step fails (for example, a file transfer to a server without enough disk space), the deployment procedure must be either paused, interrupted or rolled back with a clear error message.
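The following sketch illustrates the rollback variant of that rule: steps run in order, the first failure stops the procedure, completed steps are undone in reverse order, and the error names the failing step. The step names and the simulated disk-space failure are made up for the example.

```python
from typing import Callable, List, Tuple

# A step is (name, action, undo); all three are hypothetical here.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class DeploymentError(Exception):
    """Raised when a deployment step fails; the message names the step."""


def deploy(steps: List[Step]) -> None:
    done = []  # (name, undo) of the steps that completed successfully
    for name, action, undo in steps:
        try:
            action()
        except Exception as error:
            # Fail fast: stop here and undo completed steps, last one first.
            for _, previous_undo in reversed(done):
                previous_undo()
            rolled_back = ", ".join(n for n, _ in reversed(done)) or "nothing"
            raise DeploymentError(
                f"step '{name}' failed: {error}; rolled back: {rolled_back}"
            ) from error
        done.append((name, undo))


log = []


def transfer_binaries():
    # Simulated failure standing in for a full disk on the target server.
    raise OSError("not enough disk space")


steps = [
    ("stop service", lambda: log.append("stop"), lambda: log.append("restart")),
    ("transfer binaries", transfer_binaries, lambda: log.append("clean up")),
    ("start service", lambda: log.append("start"), lambda: log.append("stop again")),
]

try:
    deploy(steps)
except DeploymentError as error:
    print(error)
print(log)
```

The "start service" step never runs, and the failed step's own undo is not called because it never completed.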

Conclusion

In this article we covered the different aspects to consider when choosing an appropriate deployment strategy. The first is assessing the organization and the responsibilities of the different actors involved. The second is being able to trace the deliverables to deploy and to automate at least a reasonable part of the deployment process. The third is making sure that deployments can be done with the same tools and under the same conditions on all environments. The last is having early failure detection in place after a deployment. These points impact not only which tools you choose, but also how you use them.