Saturday, April 5, 2008

Generated Unit Tests

Recently, my employer asked me to evaluate a tool called AgitarOne. The tool claims to achieve an average of 80% test coverage with automatically generated unit tests. The idea is to exercise the code with permutations of inputs. The online demos were quite impressive, but seemed rather simplistic: they were essentially testing a bean's validation of its constructor parameters. Right away, my concern was how it would handle large classes with many "connections" to other classes and complex interplay. One of our developers had a test/demo system we could use, so we sent off some of our code to see what would be generated. Well, their claims were correct: it did reach 77-85% coverage on the files I passed in. But what was I to make of the output? Some of the tests were hundreds of lines long. It turns out the mocking framework used (Mockingbird) was mocking out every occurrence of "B" used by "A". While this seems inevitable if a generated test is to reach those coverage goals, after discussion with our testing "round circle" we came to some common conclusions which I think are relevant in general:
  • The process of manually creating tests forces developers to think about their code: how they would test it, whether they have covered all the conditions, and so on. Generating tests for them essentially takes away this process, and with it the valuable design, test, code, refactor cycle. I would estimate this alone accounts for 25-50% of the benefit of writing unit tests.
  • Tests become executable documentation for the code under test. Generated tests can never know the intent of the code (short of a brilliant generator that could mine Javadoc and other design deliverables for test conditions, which to my knowledge does not exist). They only know that a method takes N arguments and returns a value Y (or throws exception Z), and smarter tools like AgitarOne can additionally make some assertions about what is modified during method invocation.
  • Generating tests from existing code assumes the code as written is correct. Therein lies a problem, but as I mention in the previous point, without some sort of language that can derive proofs, I do not see test generation being able to make that assertion.
  • Generation violates one of the key development practices: test code should be written and maintained with the same care, style and detail as production code.
  • One simple question: have you ever come back to a failing test after six months to see why it broke, only to face a hundred lines of mock setup followed by 30 assertions, while trying to figure out why your code change caused the failure? Enough said.
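To make the contrast concrete, here is a minimal sketch of the kind of hand-written test I have in mind. The classes (`PriceCalculator`, `TaxPolicy`) are hypothetical stand-ins of my own, not anything produced by AgitarOne: one small hand-rolled stub for the single collaborator that matters, and one intention-revealing assertion, instead of pages of generated mocks.

```java
import java.math.BigDecimal;

// Hypothetical collaborator "B" -- the thing a generator would auto-mock.
interface TaxPolicy {
    BigDecimal rate();
}

// Hypothetical class under test, "A", which depends on "B".
class PriceCalculator {
    private final TaxPolicy taxPolicy;

    PriceCalculator(TaxPolicy taxPolicy) {
        this.taxPolicy = taxPolicy;
    }

    BigDecimal total(BigDecimal net) {
        // total = net + (net * tax rate)
        return net.add(net.multiply(taxPolicy.rate()));
    }
}

public class PriceCalculatorTest {
    public static void main(String[] args) {
        // A hand-written stub makes the test's assumption explicit:
        // this scenario is about a flat 10% tax, nothing else.
        TaxPolicy tenPercent = new TaxPolicy() {
            public BigDecimal rate() { return new BigDecimal("0.10"); }
        };

        PriceCalculator calc = new PriceCalculator(tenPercent);
        BigDecimal total = calc.total(new BigDecimal("100.00"));

        // One assertion that documents intent: 100 net + 10% tax = 110.
        // (compareTo, not equals, so BigDecimal scale does not matter.)
        if (total.compareTo(new BigDecimal("110.00")) != 0) {
            throw new AssertionError("expected 110.00 but got " + total);
        }
        System.out.println("ok");
    }
}
```

The point is not the arithmetic; it is that a reader six months from now can see in a dozen lines *why* 110 is the right answer, which is exactly what a wall of generated mocks and assertions fails to convey.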

In no way am I advocating the generation of unit tests. It is a very novel concept, but I think it removes a crucial component of good software development practice. Automation tends to lead to dependence, and dependence can lead to ignorance. From my viewpoint, generating tests does not inspire one to follow good software development practices, but to ignore them behind a safety net. As one of my colleagues said: "I'd rather have one good test that is comprehensible than 20 tests I cannot read."
