I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone; he has posted 638 posts at DZone.

What your test suite won't catch


Automated test suites are a wonderful tool: they allow incremental development by catching regressions in functionality between one user story implementation and the next. They remove fear from deployments because the code is first exercised on development and CI machines, keeping bugs out of production.

However, there is a widespread misconception that every property of the code we want to ensure can be covered by a test. While this is possible for some non-functional requirements, it's still in its infancy (or may not even be possible) for others.

Performance
Performance testing is one of the areas where automating the load-generating process is very useful, provided some adjustments are made in executing the tests:

  • The environment they hit should have the same resources as production, or a vertical subset of it (e.g. at least a web server and a separate database server, as in production). This may be costly if you only have a few servers, whereas if there are already dozens, adding a couple more shouldn't be a problem.
  • The load must be generated from a separated machine.
  • The databases in the environment should be of a size comparable to production, scaled down to account for the reduced capacity of these machines.
Judging the results of performance tests is not easy, but you can set up a graph of the last runs and a threshold you do not want to exceed (such as 10 seconds to perform a payment or to open a new thread in a forum).
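As a sketch of how such a threshold could be enforced automatically, the check below times repeated runs of an operation and fails if the 95th percentile exceeds a limit. All names, numbers, and the stand-in operation here are illustrative assumptions, not from any specific project; in a real setup the load would come from a separate machine against a production-like environment.

```python
import statistics
import time

# Hypothetical threshold, echoing the article's example of 10 seconds
# for a payment or for opening a new forum thread.
THRESHOLD_SECONDS = 10.0

def measure_latency(operation, runs=50):
    """Time repeated executions of `operation` and return the samples."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return samples

def check_threshold(samples, threshold=THRESHOLD_SECONDS):
    """Gate on the 95th percentile, not the mean, so outliers count."""
    p95 = statistics.quantiles(samples, n=20)[-1]
    return p95 <= threshold

# Stand-in for a real request; a genuine test would hit the application.
samples = measure_latency(lambda: time.sleep(0.01))
assert check_threshold(samples)
```

Gating on a percentile rather than an average keeps a handful of slow outliers from hiding behind many fast runs, which fits the "graph plus threshold" approach described above.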

Maintainability (and extensibility)

Testing a design's maintainability (against what kind of changes, by the way?) automatically is already more difficult than testing its performance. By definition, maintenance is a future cost and so it cannot be measured directly; we can only hope to find some numbers or smells that, in our experience, correlate with high or low maintenance cost.

Metrics generation is a solved problem, but as Gojko would say coverage, indexes, duplication measurements and such are negative indicators: when they are red they tell you something is wrong. However, when they are green they cannot tell you anything about how good your code is.

Take unit tests, for example: it is known that they give developers better feedback about code quality than end-to-end tests do. Being able to write a unit test with a few lines of code for every part of the application means you're avoiding singletons, global state and hidden dependencies; that the API of each object you have produced can be used in isolation; and that objects do not chat with each other too much.

However, the presence of a battery of unit tests just means that you have avoided these particular problems, not that there aren't others. It is still possible to produce a perfectly unit-tested design of 100-line-long classes that is then torn apart when the time comes to implement the next requirement. Domain knowledge, such as the probability of changes and the reflection of real-world concepts in the code that is proper to DDD, is king here.
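To make the "injected dependency instead of a singleton" point concrete, here is a minimal sketch: the classes and names (`Checkout`, `PaymentGateway`, `StubGateway`) are hypothetical, but they show how a collaborator passed in through the constructor lets a unit test exercise an object in isolation with a few lines of code.

```python
class PaymentGateway:
    """Production collaborator; in real life this would make a network call."""
    def charge(self, amount):
        raise NotImplementedError("network call in real life")

class Checkout:
    def __init__(self, gateway):
        # The dependency is passed in, not reached through a singleton
        # or global state, so a test can substitute a stub for it.
        self.gateway = gateway

    def pay(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.charge(amount)

class StubGateway:
    """Test double that records calls instead of hitting the network."""
    def __init__(self):
        self.charged = []
    def charge(self, amount):
        self.charged.append(amount)
        return True

# A unit test in a few lines, exercising Checkout in isolation:
stub = StubGateway()
assert Checkout(stub).pay(100) is True
assert stub.charged == [100]
```

As the article notes, passing such tests shows only that these particular design problems are absent; it says nothing about whether the class boundaries will survive the next requirement.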

Security
While performance can be tested and unit tests can act as a first line of defense against poor maintainability, we are at a loss when it comes to guaranteeing security requirements.

How do you test that multiple payments cannot be performed with the same money? Or that HTML code cannot be injected through your forms? As with functionality-oriented tests, a particular test can only guarantee that a certain attack does not work against a specific form. It cannot guarantee there is no similar attack, and the set of attacks you can test is limited to the ones known to you. The difference from functional tests is that functionality issues manifest as bugs that can be fixed, while security issues can cost you much more.
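A sketch of what such a test looks like, and of its limits: `render_comment` and the payload are hypothetical, and the test only shows that this one known payload is neutralized in this one rendering path. It says nothing about attacks we have not thought of.

```python
import html

def render_comment(text):
    # Hypothetical sanitization step: escape user input before
    # embedding it in a page. Real applications should also lean on
    # templating engines that escape by default.
    return "<p>" + html.escape(text) + "</p>"

# One known attack, one form: a passing test here proves nothing
# about similar attacks or other input paths.
payload = "<script>alert('xss')</script>"
rendered = render_comment(payload)
assert "<script>" not in rendered
assert "&lt;script&gt;" in rendered
```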

The only two ways I know of improving an application's security are to study common attacks and the patterns used to overcome them, and to have an audit performed by external security experts who devote their career to the matter.

Concurrency
Concurrency bugs are similar to security issues in how difficult they are to find. They manifest themselves when many different processes are running over the same data. A successful run of your test suite, even one performing multiple transactions at the same time, does not guarantee these bugs won't manifest in production, where there is much more data to deal with.

If you have a race condition in your code, at best the test suite will fail intermittently, instilling doubt in you and raising the time-to-production. At worst, the tests won't even expose the problem and will just keep passing, because the probability of the race condition manifesting with just a few hundred transactions is too low, or because the transactions are too spread out in time to constitute a significant load. In fact, almost all of the work in resolving concurrency bugs is in reproducing them.
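A toy illustration of why this is so, with hypothetical names: the unsynchronized version below performs a non-atomic read-modify-write on a shared counter, so it can lose updates, yet at low contention it may still pass every run of a test suite. Only the locked version is deterministically correct, which is all a test can reliably assert.

```python
import threading

def increment_many(counter_holder, lock=None, iterations=100_000):
    """Increment a shared counter; without a lock, updates can be lost."""
    for _ in range(iterations):
        if lock:
            with lock:
                counter_holder[0] += 1
        else:
            counter_holder[0] += 1  # read-modify-write, not atomic

def run(workers=4, lock=None):
    holder = [0]
    threads = [threading.Thread(target=increment_many, args=(holder, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return holder[0]

# With a lock the result is deterministic. Without one it MAY come out
# short, but only sometimes -- which is exactly why a green test suite
# proves so little about concurrency.
assert run(lock=threading.Lock()) == 4 * 100_000
```

Note that no assertion is made on the unlocked variant: its failure is probabilistic, which mirrors the article's point that the real work lies in reproducing the bug at all.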

Conclusions
Test suites are one of the tools we have for improving the quality of software while we're building it; they are particularly fitting for checking functional requirements and some other properties, such as performance and some forms of maintainability. However, there are other critical properties that may matter in your project and that your test suite cannot help you fix. We can still put in place processes that give us feedback on them: security audits, code review and pair programming for maintainability, and higher-level models than code for concurrency issues. Tests are a tool, not an end.

Published at DZone with permission of Giorgio Sironi, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)



Christophe Blin replied on Wed, 2013/09/18 - 8:24am

I think you are contradicting yourself:

"While this is possible for some non-functional requirements, it's still in its infancy (or may not even be possible) for others." — and then you describe how difficult it is for non-functional requirements...

Did you mean: "While this is possible for some functional requirements, it's still in its infancy (or may not even be possible) for others."?

BTW, I agree with your article, but IMHO you are forgetting lots of things, the most important being the GUI.
Who tests the GUI at 100% when the GUI has more than 20 screens?

Secondary points like code homogeneity, code formatting, etc. are also important.

The point of the tests, as Uncle Bob says, is to be confident ENOUGH about the deployment in production.

Also, IMHO, you should keep the ROI in mind: the customer does not pay for a test suite, he pays for working software...

Giorgio Sironi replied on Wed, 2013/09/18 - 2:49pm in response to: Christophe Blin

Believe me, I trust test suites, and I currently work on projects with several thousands of unit and end2end tests running in CI. It's just that when you become very good at removing functional defects, the only kind you see is related to the categories listed above. A list that is by no means exhaustive; let's just think about usability...

What I meant in that passage was that automated testing is possible for some non-functional requirements (performance, via stress tests) but not possible, or very difficult, for other non-functional requirements (security). It's a spectrum.
