The identity crisis: “Testing costs me too much time for nothing?”

Is the short-term pain of testing worth the long-term gain? Is there even a long-term gain in the first place? Simulation time!!!

Don’t burn what you earn by not investing additional time now for the better tomorrow (image source Jp Valery)

Tradeoffs… We are the software crew, so this is our daily routine. Picking the coding methods, frameworks, cloud providers and anything in between.

I am using TDD (or at least I like to think I am) when writing code. Don’t worry, it’s not one of those TDD or TDDie posts.

This is a story about how I questioned some of the premises I have taken for granted for so long, and wanted them answered using hard solid data. The question is:

Under which conditions is method X profitable in the long-run even though it takes more time short-term?

For me, method X is TDD, but for you it can be something else.

It can be a controversial topic, but what I know for sure is that at least I have a method for approaching the problems I face every day. This is a sufficient benefit for me on its own. Though, is it worth the price? Let me guide us through the process of trying to find out.

The story begins with my first green, passing test suite.

At first, it all seemed simple: It is true that with tests, some things take longer to implement initially. On the other hand, by first writing tests, I wrap my head around a problem. Besides, I am counting on the long-term savings I am making by introducing less technical debt and preventing regressions from occurring when refactoring or otherwise changing code. And dem gainzzz are real long-term, right? … Right? Anybody?

The other day, I asked myself if this was really true. There are those moments when you start questioning even the most elementary premise you go by. I mean, I do believe it is, but can I somehow quantify it? Or at least show myself that it makes some sense? Once I started thinking about it, I felt more and more uncomfortable not having the answer and blindly following the method without ever trying to analyze it a bit deeper. At least in this way.

So finally one afternoon, I came up with an idea:

Can I make a simple mathematical model that simulates what would happen with and without using some method that initially takes longer, but reduces technical debt in the long run?

This is not TDD specific, and is applicable to any method with such characteristics. I personally would use this simulation to justify my use of TDD of course.

Please note that this model is really crude; the numbers and assumptions are best guesstimates off the top of my head. So consider this post a conversation starter of sorts.

If it really does get the conversation started, you can contact me directly via my Twitter handle or in the comments down below.

Sorry to say this, but I have a disclaimer to make: don’t use the data shown here to go to your naggy friend and tell him you told him so. Do your own research, play with the model and draw your own conclusions.

Or even make a new shiny, fancy statistical model that takes a lot more factors into consideration. I would be glad in that case, because inspiring somebody to do something has merit of its own.

Our goal

It is always good to clearly state what you want, not just when running simulations. Repeat after me… We… Want…

We want to measure if it is profitable to invest additional effort in TDD or similar approaches given the potential reductions in cost and time.

Model assumptions

First, let’s agree on this: we will examine the assumptions and results, and only then go into the mathy meat of the model. If we are still on a streak by then 🤭.

So let’s open up the question that each modeler meditates on:

What does our model assume?

Before delving into that, we should briefly introduce some naming.

Let’s call coding methods:

  • TESTED: i.e. TDD and other methods that involve testing or other practices that improve reliability
  • QUICK: i.e. untested OOP code with bad patterns sprinkled on top

Below, in Table 1, is a list of our estimates for the method attributes. Of course, you still don’t know what these attributes mean, but if you’ve kept track so far, you know that the next step is to explain each one of them.

Table 1: Comparison of TESTED and QUICK attribute values.

The meanings of the factors are as follows:

  • m = Measures how many times longer the method takes to implement the same code compared to the QUICK method. In a nutshell, this factor measures how much additional work is added while coding. We assume that each feature takes the same amount of time to build (unrealistic, I know, but let’s go one step at a time 👶).
  • d = This one might seem complex, but it is entirely natural and logical. Just follow along. It measures how much more time each new feature or change takes purely because of the effect of technical debt. For example, if d = 1%, then even if we add a new feature of the same complexity as the previous one, it will still take 1% more time to implement. This happens for various reasons, and one of the best known is the so-called spaghetti code. The more you add to it, the more time you need to spend locating the place where the new feature or change should reside.
  • b_old = The percentage of time spent fixing old bugs. These are bugs that were introduced into the system with some of the previous features, not the ones being added at the moment. Or even bugs re-introduced because we didn’t have tests covering those cases.
  • b_new = We all know the feeling: we spend some of our time chasing bugs caused by edge cases we forgot to cover when hastily trying to bring a new feature closer to production. There is a Latin saying, Festina lente, meaning “make haste slowly”. Sometimes, by trying to be faster, you end up being slower. With this attribute, we measure the percentage of time spent fixing this type of bugs and glitches.

Estimates in Table 1 roughly assume that TESTED halves the number of both old and new bugs and reduces technical debt by a factor of 3 when it’s really small, but takes 50% longer. These estimates are near the ones outlined in this article. Initially, we assume there is only a small amount of technical debt to be reduced in the first place (d = 0.3%), but later on we will see that the larger the d, the more the effect of TESTED is felt.
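In code, the Table 1 estimates could look roughly like the dicts below. The m and d values follow the text; the bug-time percentages are hypothetical placeholders I chose myself, constrained only by the “TESTED halves the bugs” assumption.

```python
# Table 1 estimates as Python dicts.
# m (time multiplier) and d (technical debt factor) follow the text:
# TESTED takes 50% longer and cuts d from 0.3% to 0.1%.
# The b_old / b_new percentages are HYPOTHETICAL placeholders; the
# article only says that TESTED halves both kinds of bugs.
QUICK = {"m": 1.0, "d": 0.003, "b_old": 0.10, "b_new": 0.10}
TESTED = {"m": 1.5, "d": 0.001, "b_old": 0.05, "b_new": 0.05}
```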

You might have noticed while reading the attributes that we are already making some bold assumptions here, exponential growth due to technical debt, roughly the same complexity of each feature… What’s next, constant array search time?

Now you are starting to see why I said this is a simple, rough model. I can already hear the suggestions flowing, all valid and fair points, but I only wanted to get some initial numbers to play with; the model can be improved for sure. Besides, you haven’t read the entire post yet. Maybe I have some niceties down the road 🔽

Now that we know what the attributes mean, here they come, the long-awaited assumptions of our model. They can be read off the factors, but let’s state them explicitly:

  • TESTED takes more time. This is what we started with; if it took the same time, there would be nothing to discuss.
  • TESTED reduces the accumulation and the effect of technical debt.
  • TESTED reduces the rate of bug introduction since it tests more cases than QUICK. It also makes it impossible to re-introduce a bug or break a tested case that worked before.
  • Part of the time is spent debugging new code (b_new). Since TESTED involves writing down the tests and edge cases first, it reduces the percentage of time spent debugging. Or at least it speeds up the debugging process by telling us where the code fails (in which tests).

Results

Aaand lift off 🚀. We start our simulations with initial parameters as shown in Table 1 so we get some rough ideas and get this baby rollin’.

We use feature count as our x variable rather than time, purely for convenience. This way, it was simpler to write down the model.

We assume all features take roughly the same amount of time: about a week, i.e. 5 working days of 8 working hours, with the QUICK method (hah, the PO’s eternal dream).

Remember, we compare TESTED to a QUICK that has bad practices. But we still assume a reasonably small d variable, the one that measures technical debt accumulation. This is intentional: our goal is to compare the different approaches even when the potential for improvement is low.

Figure 1: left: Simulation result for F = 20 run with parameters from Table 1. right: Same simulation with F = 500.

If we run the simulation for only F = 20 features, even after those 20 features are built, the QUICK method takes double the time to build the same number of features as TESTED, as per Figure 1. Not so quick after all.

But if we crank it up to F = 500, we start to notice an even larger difference. And it seems to be growing exponentially!

Okay, we kind of understand what’s happening. The technical debt slows down the QUICK method more and more.

If we convert developer hours to dollars at a fixed rate of $40/hour, we get the result shown in Figure 2. That means a rough loss of $1,350,000 on ~500 features, each taking a week to implement. Ouch. Considering we are just a startup, the results we got are not happy news. But then again, perhaps our assumptions are way off.
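As a sanity check, the dollar figure can be sketched like this. This is my own reading of the model (multiplicative overheads, compounding debt) with hypothetical bug percentages, so the exact numbers will differ from the article’s figures.

```python
HOURS_PER_FEATURE = 40  # one week: 5 days x 8 hours
RATE = 40               # dollars per developer hour

def total_cost(m, d, b_old, b_new, features):
    """Dollar cost of `features` features, with technical debt compounding.

    Each feature is slowed by the method multiplier m, the bug-fixing
    overheads b_old and b_new, and the debt factor (1 + d)**i.
    """
    hours = sum(
        HOURS_PER_FEATURE * m * (1 + b_old + b_new) * (1 + d) ** i
        for i in range(1, features + 1)
    )
    return hours * RATE

quick = total_cost(m=1.0, d=0.003, b_old=0.10, b_new=0.10, features=500)
tested = total_cost(m=1.5, d=0.001, b_old=0.05, b_new=0.05, features=500)
print(f"QUICK:  ${quick:,.0f}")
print(f"TESTED: ${tested:,.0f}")
print(f"QUICK overshoot: ${quick - tested:,.0f}")
```

Even with these made-up bug rates, the compounding d term dominates at 500 features and QUICK ends up the more expensive method.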

From now on we will only show this final graph, because money talks after all.

Figure 2: Simulation result for F = 500 run with initial parameters.

The real world is non-deterministic

By fixing the values to some constants, or by calculating some deterministic transformation of them, we do gain some insight, but we only see one possible scenario. That’s why we are going to add something to our model that we have all been waiting for: raNDoMnEsS. That’s right, let us get our hands dirty with some real-world-like data and see what happens. I am starting to get that feeling, are you? You know the one… You must know the one. Some people are using stims, we are using sims. And we sure do like it.
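To give a flavor of what a randomized run could look like, here is a minimal sketch. It is not the notebook’s code: the ±25% jitter, the seed, and picking the median run as the “most representative” sample are all my own assumptions.

```python
import random

random.seed(1337)  # reproducible runs

def noisy_total_hours(m, d, b_old, b_new, features, jitter=0.25):
    """Total hours with every factor perturbed by up to +/-25% per
    feature, to mimic real-world variation around the deterministic model."""
    def wobble(x):
        return x * random.uniform(1 - jitter, 1 + jitter)

    total = 0.0
    for i in range(1, features + 1):
        total += 40 * wobble(m) * (1 + wobble(b_old) + wobble(b_new)) * (1 + wobble(d)) ** i
    return total

# run the simulation 20 times and keep the median as the representative
runs = sorted(noisy_total_hours(1.0, 0.003, 0.10, 0.10, 500) for _ in range(20))
representative = runs[len(runs) // 2]
```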

Before the goodies: clap clap to you for reading this far! Here is the notebook link, as promised, so you can play around if you want to. I will leave it at the bottom as well.

Since it is a randomized simulation, we run each one 20 times and pick the most representative sample. We will do 4 simulations:

  • The same old: Parameters from Table 1
  • Methods closer together: for QUICK
  • We are really slow: for TESTED
  • A lot of debt to reduce: for QUICK

The results are as follows:

Figure 3: Results. The first three assume a small percentage of technical debt and progressively make the TESTED method worse. The last one assumes a larger percentage of technical debt.

If we take a look at Figure 3 and its first graph, which is the simulation with our default parameters, we can see that by running the randomized version we save up to $2,400,000. That’s a full funding round!

Conclusion

Recall that we wanted to answer:

Under which conditions is method X profitable in the long-run even though it takes more time short-term?

To sum the results up, after these experiment runs, we see data that roughly shows (in the order of the graphs in Figure 3):

Simulation 1: Even with small values for the technical debt factor, where there is little margin for improvement, it is almost twice as profitable to use the TESTED approach (if done right).

Simulation 2: If the TESTED approach takes even more time compared to QUICK than assumed by the default parameters, but still relieves an existing non-trivial debt burden, it is profitable. The catch is that it will take some time to show the improvements.

Simulation 3: When TESTED takes too much extra time and there is no real technical burden to relieve, only then does using TESTED make no sense, as it only makes the development process longer.

Simulation 4: When the technical debt accumulation variable d is high for QUICK, TESTED is a no-brainer.

We now see that d is the dominating variable. The m variable can play a role for some time if it is large enough, so it should not be ignored. We should always strive for a simple, actionable conclusion. Preferably a measurable one. So, to simplify even more, the question is:

How low can you go?

Meaning: how low can you push the d factor relative to the other method? And also, how much can you perfect and become proficient at your method so that you reduce your m?

You can’t control how bad the other method is. But you can control how good your method is. Because of this, it pays off to research various options until you find the sweet spot that enables you to develop at the right multiple compared to the other method, so that you are saving time and money.

The variables you do control by your skill and hard work are your m and d. For example, testing strategy is of utmost importance: if we have redundant tests, we are directly impacting our m in a negative way. So we must find a good one. Also, if we don’t test the right things, we increase our b_old, since bugs will slip under our radar.

And this is where consuming quality content from people who have already gone through that journey comes in. This is my initial post to get the ball rolling with something quick and dirty I had in my back pocket. But I do plan on publishing higher-quality content in areas such as teaching coding without code, adaptable code via good software architecture and structure, software development techniques, TDD and more.

If interested, swipe that thunder and let’s get this party started. Following the COVID restrictions of course.

Based on the points listed, there are some actionable conclusions as well:

  • If your current method is good enough, i.e. using tests but not TDD, it might not be worth switching, since there is not sufficient technical debt to be reduced.
  • Avoid actions that increase development time without decreasing technical debt levels. For example, we should not write redundant tests. This is why you should plan your testing strategy really well. Given good sources and readings, you should be good to go.

Caveats

Even though this model is far from a fully accurate one, I do feel more assured that I tend to gain time and money long-term by using TDD. I also hope to spark somebody’s passion to make an even better and more accurate model.

A better model with more variables would perhaps spit out different numbers, but I am now pretty certain that if the current method is accumulating technical debt and/or not testing code, there is more than enough room to introduce TDD as an approach.

Once again (I know, getting boring): be my guest, fiddle with the model and prove the point in one or the other direction. There are many variables I didn’t consider, and assumptions I could have modeled differently. Maybe I am totally wrong and living in delusion 😕 You know where to find me if urgent information pops up.

The Math Part

If you are one of those people who are not into it, sorry, it had to come to this at some point. We just can’t go any further without knowing the model. Heck, we got pretty far without knowing it anyway.

As per The Great Math Modeling Guide, let’s go to Step 1 (that we do as Step 100): Explicitly write the mathematical equation of the model

t_i = t_avg · m · (1 + f(b_new) + g(b_old)) · h(i), where h(i) = (1 + d)^i

T = t_1 + t_2 + … + t_F

Equation 1: And then there was light

All clear? Nice, we shall continue with simulations then.

Okay, fine, I can tell you which symbol stands for what:

  • t_avg is the average feature/change time
  • m is the method factor
  • d is the technical debt factor that makes each new feature take longer
  • b_new and b_old are the percentages of time spent fixing new and old bugs respectively
  • t_i is the time it takes to code the i-th feature/change
  • The total time cost T is then just the sum of the times it takes to code each new feature/change. We sum over as many features F as we want to make the comparison on.
  • f, g and h are functions that allow us to control and scale the duration in hours of each feature. For the initial experiments, f and g are identity functions, while h is exponential.
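Assuming the blanked-out symbols are named t_avg, m, d, b_new, b_old, t_i and T (my own labels, matching the descriptions above), the model transcribes to code almost one-to-one:

```python
def f(b_new):  # identity in the initial experiments
    return b_new

def g(b_old):  # identity in the initial experiments
    return b_old

def h(i, d):   # exponential: technical debt compounds with every feature
    return (1 + d) ** i

def t(i, t_avg, m, d, b_new, b_old):
    """Time to code the i-th feature/change."""
    return t_avg * m * (1 + f(b_new) + g(b_old)) * h(i, d)

def T(F, **params):
    """Total time cost: sum over the first F features/changes."""
    return sum(t(i, **params) for i in range(1, F + 1))
```

With d = 0 and no bugs, T collapses to F · t_avg · m, which is a quick way to check the transcription.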

Resources

Notebook: here

References

[x] Not this time, I have put them in links directly

Heey heey

Wanna connect via Twitter or spiritually?
My Handle: https://twitter.com/faruk_mustafic
My mantra: If the code comes last, it lasts. Think more, code less.

Thanks for reading this chunk of text, no small thing at all.
I am Faruk Mustafic, the author of this blog post,
or sometimes I go by tms1337, the Gradual Simplifier.
All feedback is welcome, especially the negative kind, since that is how you improve.

