Acceptance of a Risky Environment

5th of May 2016

There are many dogmas in the world of software development (such as S.O.L.I.D. or the Gang of Four Design Patterns), and most of them we, as IT people, accept at face value, not really knowing why they are applied, just realizing that they need to be. Think of it as a modern-day Charge of the Light Brigade for developers: “Theirs not to reason why, Theirs but to do and die”. One of the things we accept as a given is that our Acceptance or Staging environment needs to be as close to the targeted Production environment as possible. So how do we justify this need when management decides to challenge such a costly proposition? A quote from Alfred, Lord Tennyson will not be enough to persuade them.

 

To tackle this justification, we need to look at which requirements an Acceptance environment is trying to address. The environment primarily needs to support the validation activities of the business users. The question “Did we build the right thing?” needs to be answered. User specifications need to receive a tick in the box. This is done by using scenarios which simulate the actual use of the TO-BE solution. In the widespread V-Model (as shown in the illustration below), this is clearly indicated as a responsibility of this environment.

However, when developing a new solution, we have to take into account not only the functional requirements, but also the non-functional ones. For these, the other test types derived from the quality characteristics of the ISO/IEC 25010:2011 standard come into focus:

  1. Functional Testing
  2. Reliability Testing
  3. Performance Testing
  4. Operability Testing
  5. Security Testing
  6. Compatibility Testing
  7. Maintainability Testing
  8. Transferability Testing

While I do not consider this list to be exhaustive (think of regression testing or usability testing), it does indicate that there are other needs the Acceptance environment has to address. These tests are aimed not at the functionality of the solution, but rather at its behavior, and this is the behavior that the TO-BE solution will have to exhibit in Production. This is where the similarity of the two environments comes in: if the environments are not the same, what exactly are you testing?

It is true that performance tests could be run on an environment that is a scaled-down version of Production, but to state that performance can then be measured through extrapolation is a little shortsighted. Assume, for instance, that performance is dictated by I/O capacity (2 imaginary units) and memory (2 imaginary units), and that the two multiply. The capacity of your system would then be 2 × 2 = 4 imaginary units. However, if we scale this environment down to half its size (one imaginary unit each), the overall capacity becomes 1 × 1 = 1 unit: it drops by a factor of 4, not 2. This simplistic view illustrates that it is not easy to determine how downscaling affects the capacity of an environment, let alone to extrapolate from it. And in real situations, a large number of factors besides the two I mentioned affect the capacity of an environment (L1 cache, CPU speed, network, transaction pools…). The conclusion is simple: even if you cannot have an environment identical to Production, try to keep as many of these factors the same as possible.
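The arithmetic of this toy example can be sketched in a few lines of Python. This is only an illustration: the `capacity` function and the multiplicative model are assumptions that match the imaginary units used here, not a real capacity model for any system.

```python
def capacity(io_units, memory_units):
    """Toy model (an assumption): the two resources multiply."""
    return io_units * memory_units


# Production: 2 imaginary units of I/O and 2 of memory.
production = capacity(2, 2)

# Acceptance scaled down to half of each resource.
scaled_down = capacity(1, 1)

# Naive linear extrapolation: "half the hardware, so half the capacity".
naive_estimate = production / 2

# What the toy model actually yields when both resources are halved.
drop_factor = production / scaled_down

print("production capacity:  ", production)
print("scaled-down capacity: ", scaled_down)
print("naive linear estimate:", naive_estimate)
print("actual drop factor:   ", drop_factor)
```

Even in this two-factor model, the linear guess is off by a factor of two. With more interacting factors, the gap between a guess and a measurement is harder still to predict.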

When we talk about security testing, extrapolation becomes even more elusive. Although the security at the software level and the system configuration level will be the same, the security hardware could be downscaled along with the capacity. An example of this would be a denial-of-service attack. If your security hardware needs to protect the solution from this type of attack, the risk becomes very similar to that of running performance tests on reduced hardware.

It all boils down to risk appetite. Not having a similar Acceptance environment comes with a number of risks, the most obvious of these being:

  • Encountering issues upon deployment to Production.
  • The cost of the solution going down due to an unforeseen issue.
  • The potential cost in time for your support staff and developers to pinpoint the issues as they occur, and the off chance that they will not be able to reproduce the problem in said Acceptance environment.
  • The increase in stress and unhappiness due to repeated failed deployments and the subsequent higher turnover rate of your employees.
  • If a rollback scenario is not possible, the cost in time and assets while the solution is down.
