It's fairly telling of the state of the software industry that the exotic craft of 'fixing bugs' is apparently worth a LinkedIn-style self-promotional blog post.
I don't mean to be too harsh on the author. They mean well. But I am saddened by the wider context, where a dev posts 'we fix bugs occasionally' and everyone is thrilled, because the idea of ensuring software continues to work well over time is now as alien to software dev as the idea of fair dealing is to used car salesmen.
That is why I stand on the side of stronger laws around company responsibility for their products.
We as an industry have taught people that broken products are acceptable.
In any other industry, unless people knowingly buy something broken or low quality from the start (a flea market, a one-euro shop, or similar), they will return the product, ask for their money back, or sue the company.
> But I am saddened by the wider context, where a dev posts 'we fix bugs occasionally' and everyone is thrilled, because the idea of ensuring software continues to work well over time is now as alien to software dev as the idea of fair dealing is to used car salesmen
This is not the vibe I got from the post at all. I am sure they fix plenty of bugs throughout the rest of the year, but that work is balanced against new features and the like and guided by wider business priorities. The point of the exercise seems to be focusing solely on bugs to the exclusion of everything else, with a lot of latitude to just pick whatever has been annoying you personally.
I love the idea, but this line:
> 1) no bug should take over 2 days
Is odd. It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.
That said, unless fixing a bug requires a significant refactor/rewrite, I can’t imagine spending more than a day on one.
Also, I tend to attack bugs by priority/severity, as opposed to difficulty.
Some of the most serious bugs are often quite easy to find.
Once I find the cause of a bug, the fix is usually just around the corner.
Sometimes, a "bug" can be caused by nasty architecture with intertwined hacks. Particularly in games, where you can easily have event A that triggers B unless C is in X state...
What I want to say is that I've seen what happens in a team with a history of quick fixes and an architecture inadequate for the complex features it has to support. In that case, a proper bugfix could require significant rework and QA.
In that case, maybe having bug fixing be a two-step process (identify, then fix), might be sensible.
I do this frequently. But sometimes identifying and/or fixing takes more than 2 days.
But you hit on a point that seems to come up a lot. When a user story takes longer than the allotted points, I encourage my junior engineers to split it into two bugs. Exactly like what you say... one bug (or issue or story) describing what you did to typify the problem, and another with a suggestion for what to do to fix it.
There doesn't seem to be a lot of industry best practice about how to manage this, so we just do whatever seems best to communicate to other teams (and to ourselves later in time after we've forgotten about the bug) what happened and why.
Bug fix times are probably a Pareto distribution. The overwhelming majority will be identifiable within a fixed time box, but not all. So in addition to saying "no bug should take more than 2 days" I would add "if the bug takes more than 2 days, you really need to tell someone; something's going on." And one of the things I work VERY HARD to create is a sense of psychological safety, so devs know they're not going to lose their bonus if they randomly picked a bug that was much more wicked than anyone thought.
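For what it's worth, a quick simulation shows what that distribution implies (a Python sketch; the 4-hour floor and the shape parameter are invented for illustration, not measured):

    import random

    # Toy model: fix times with a hard 4-hour floor and a Pareto tail.
    random.seed(1)
    fix_hours = [4 * random.paretovariate(2.0) for _ in range(10_000)]

    # Share of bugs that blow through a 2-day (16 working hours) time box.
    share_over = sum(h > 16 for h in fix_hours) / len(fix_hours)
    print(f"{share_over:.1%} of simulated bugs exceed the box")

Most samples land inside the box, but the tail never quite disappears, which is exactly why the "tell someone after 2 days" escape hatch matters.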
You sound like a great team leader.
Wish there were more like you, out there.
At Amazon we had a bug that was the result of a compiler bug and the behaviour of Intel cores being misdocumented. It was intermittent and related to one core occasionally being allowed to access stale data in the cache. We debugged it with a logic analyzer, the commented nginx source, and a copy of the C++11 spec.
It took longer than 2 days to fix.
I’m old enough to have used ICEs to trace program execution.
They were damn cool. I seriously doubt that something like that exists outside of a TSMC or Intel lab these days.
ICE meaning in-circuit emulator in this instance, I assume?
What kind of LA did you use to debug an Intel core?
Sometimes you find the cause of the bug in 5 minutes because it's precisely where you thought it was; sometimes it's not there, and you end up writing some extra logging to hopefully expose the cause in production after the next release, because you can't reproduce it as it's transient. I don't know how to predict how long a bug will take to reproduce and track down, and only once it's understood do we know how long it will take to fix.
I find most bugs take less time to fix than they take to verify and reproduce.
LLMs have helped me here the most. Adding copious detailed logging across the app on demand, then inspecting the logs to figure out the bug and even how to reproduce it.
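As a concrete (and hypothetical) illustration of that style of debugging, a stdlib-only Python sketch: tag every log line with a correlation id so a transient failure can be reconstructed from the logs afterwards. All names here are invented:

    import logging
    import uuid

    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    log = logging.getLogger("app")

    def handle_request(payload):
        rid = uuid.uuid4().hex[:8]  # correlates every line of one request
        log.debug("[%s] received payload keys=%s", rid, sorted(payload))
        try:
            result = payload["a"] / payload["b"]
            log.debug("[%s] computed result=%r", rid, result)
            return result
        except Exception:
            log.exception("[%s] failed; payload=%r", rid, payload)
            raise

    handle_request({"a": 6, "b": 3})

Grepping for one id then gives the full story of a single failing request, which is the part an LLM is happy to sift through.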
Yes. I often just copy the whole core dump, and feed it into the prompt.
> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.
In my experience there are two types of low-priority bugs (high-priority bugs just have to be fixed immediately no matter how easy or hard they are).
1. The kind where I facepalm and go “yup, I know exactly what that is”, though sometimes it’s too low of a priority to do it right now, and it ends up sitting on the backlog forever. This is the kind of bug the author wants to sweep for, they can often be wiped out in big batches by temporarily making bug-hunting the priority every once in a while.
2. The kind where I go “Hmm, that’s weird, that really shouldn’t happen.” These can be easy and turn into a facepalm after an hour of searching, or they can turn out to be brain-broiling heisenbugs that eat up tons of time, and it’s difficult to figure out which. If you wipe out a ton of category 1 bugs then trying to sift through this category for easy wins can be a good use of time.
And yeah, sometimes a category 1 bug turns out to be category 2, but that’s pretty unusual. This is definitely an area where the perfect is the enemy of the good, and I find this mental model to be pretty good.
> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.
This is explained later in the post. The 2 day hard limit is applied not to the estimate but rather to the actual work: "If something is ballooning, cut your losses. File a proper bug, move it to the backlog, pick something else."
Most of the work in finding/fixing bugs is reproducing them reliably enough to determine the root cause.
Once I find a bug, the fix is often negligible.
But I can get into a rabbithole, tracking down the root cause. I don’t know if I’ve ever spent more than a day, trying to pin down a bug, but I have walked away from rabbitholes, a couple of times. I hate doing that. Leaves an unscratchable itch.
Bugs taking less than 2 days is a great target to have, but not something that can be guaranteed.
Next up: a new programming language or methodology that guarantees all bugs take less than two days to fix.
> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.
Now I find that odd.
I don’t. I worked on firmware stuff where unexplainable behavior occurs; digging around the code, you start to feel like it’s going to take some serious work to even start to comprehend the root cause; and suddenly you find the one line of code that sets the wrong byte somewhere as a side effect, and what you thought would fill up your week ended up taking 2 hours.
And sometimes, the exact opposite happens.
Yeah, I’m obviously a terrible programmer. Ya got me.
I just find it so oversimplified that I can't believe you're sincere. Like you have entirely no internal heuristic for even a coarse estimation of a few minutes, hours, or days? I would say you're not being very introspective or are just exaggerating.
I think it's very sector dependent.
Working on drivers, a relatively recent example is when we started looking at a "small" image corruption issue in some really specific cases that slowly spidered out to what was fundamentally a hardware bug affecting an entire class of possible situations; it was just that this one case happened to be noticed first.
There was even talk about a hardware ECO at points during this, though an acceptable workaround was eventually found.
I could never have predicted that when I started working on it, and it seemed that every time we thought we had a decent idea of what was happening, even more was revealed.
And then there have been many other issues where you fall onto the cause pretty much instantly, and a trivial fix can be completed and in testing faster than it would take to update the bug tracker with an estimate.
True, there's probably a decent share, maybe even 50%, where you can make a reasonable guess after putting in some time and be correct within a factor of 2 or so, but I always felt the "long tail" was large enough to make that pretty damn inaccurate.
My team once encountered a bug that was due to a supplier misstating the delay timing needed for a memory chip.
The timings we had in place worked for most chips, but they failed for a small % of chips in the field. The failure was always exactly identical, the same memory address got corrupted, so it looked exactly like an invalid pointer access.
It took multiple engineers months of investigating to finally track down the root cause.
But what was the original estimate? And even so I'm not saying it must be completely and always correct. I'm saying it seems wild to have no starting point, to simply give up.
Have you ever fixed random memory corruption in an OS without memory protection?
Best case you trap on memory access to an address if your debugger supports it (ours didn't). Worst case you go through every pointer that is known to access nearby memory and go over the code very very carefully.
Of course it doesn't have to be a nearby pointer, it can be any pointer anywhere in the code base causing the problem, you just hope it is a nearby pointer because the alternative is a needle in a haystack.
I forget how we did find the root cause. I think someone may have just guessed a bit flip in a pointer (vs. an overrun), un-bit-flipped each of the possible bits one by one (not that many; with only a few MB of memory there aren't many active bits in a pointer...), looked at what was near each candidate address (to figure out what the pointer was originally meant to point at), and started investigating which pointer it was originally supposed to be.
Then after confirming it was a bit flip you have to figure out why the hell a subset of your devices are reliably seeing the exact same bit flipped, once every few days.
So to answer your question, you get a bug (memory is being corrupted), you do an initial investigation, and then provide an estimate. That estimate can very well be "no way to tell".
The principal engineer on this particular project (Microsoft Band) had a strict rule of zero user-impacting bugs. Accordingly, after one of my guys spent a couple of weeks investigating, the principal engineer assigned one of the top firmware engineers in the world to track down this one bug and fix it. It took over a month.
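For readers who haven't done this kind of hunt, a toy Python sketch of the "un-bit-flip" step described above (every address, region bound, and width here is invented; a real device would use its actual memory map):

    # Given a corrupted pointer value, flip each bit in turn and report which
    # candidates land inside a known-valid memory region; those are the
    # plausible "originally intended" addresses worth investigating.

    CORRUPTED_PTR = 0x0032_3F58  # value observed at the crash site (made up)
    VALID_REGIONS = [            # (start, end) of plausible heap/static areas
        (0x0010_0000, 0x0014_0000),
        (0x0020_0000, 0x0021_0000),
    ]
    PTR_BITS = 24                # a few MB of memory -> few active address bits

    def plausible(addr):
        return any(lo <= addr < hi for lo, hi in VALID_REGIONS)

    for bit in range(PTR_BITS):
        candidate = CORRUPTED_PTR ^ (1 << bit)
        if plausible(candidate):
            print(f"bit {bit:2}: 0x{candidate:08X} -- check what lives here")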
This is weird to me...
The way I learned the trade, and usually worked, is that bug fixing always comes first!
You don't work on new features until the old ones work as they should.
This worked well for the teams I was on. Having a (AFAYK) bug free code base is incredibly useful!!
Where have you worked where this was practiced if you don’t mind sharing?
I’ve seen very close to bug-free backends (more early on in development). But every frontend codebase just always seems to have a long list of low-impact bugs: weird devices, a11y issues, unanticipated screen widths, weird iOS Safari quirks, and so on.
Also, I feel like if this were official policy, many managers would just start classifying whatever they wanted done as a bug (and the line can be somewhat blurry anyway). So I’m curious if that was an issue that needed dealing with.
Depending on the size of the team/org/company, working on anything other than the next feature is a hard sell to PM/PO/PgM/management.
I've had to inform leadership that stability is a feature, just like anything else, and that you can't just expect it to happen without giving it time.
One leader kind of listened. Sort of. I'm pretty sure I was lucky.
That's what I hear.
I've had some mix of luck and skill in finding these jobs. Working with people you've worked with before helps with knowing what you're in for.
I also don't really ask anyone, I just fix any bugs I find. That may not work in all organizations :)
Bugs have priorities associated with them, too. It's reasonable for a new feature to be more important than fixing a lower-priority bug. For example, if reading the second "page" of results for an API isn't working correctly, but nobody is actually using that functionality, then it might not be that important to fix it.
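To make that concrete, a hypothetical Python sketch of exactly that kind of second-page bug (names and page size invented):

    ITEMS = list(range(95))  # stand-in for query results
    PAGE_SIZE = 20

    def get_page_buggy(page):  # pages are 1-indexed
        offset = (page - 1) * PAGE_SIZE
        return ITEMS[offset:PAGE_SIZE]  # bug: end bound should be offset + PAGE_SIZE

    def get_page_fixed(page):
        offset = (page - 1) * PAGE_SIZE
        return ITEMS[offset:offset + PAGE_SIZE]

    assert get_page_buggy(1) == list(range(20))  # page 1 looks fine...
    assert get_page_buggy(2) == []               # ...page 2 is silently empty
    assert get_page_fixed(2) == list(range(20, 40))

Page 1 works, so the bug sails through a demo; it only bites whoever reads page 2, and if nobody does, it can reasonably sit in the backlog.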
I'd love to see an actual bug-free codebase. People who state that a codebase is bug-free probably just lack awareness. Even stating we 'have only x bugs' is likely not true.
Top commenter's "AFAYK" acronym is covering that.
The type that claims they're going to achieve zero known and unknown bugs is also going to be the type to get mad at people for finding bugs.
> The type that claims they're going to achieve zero known and unknown bugs is also going to be the type to get mad at people for finding bugs.
This is usually EMs in my experience.
At my last job, I remember reading a codebase that had recently been written by another developer to implement something in another project, and finding a thread safety issue. When I brought this up, and how we’d push the fix as part of the next release, he went on a little tirade about how proper processes weren’t being followed, etc., although it was a mistake anyone could have made.
We kinda always leave documentation and test bugs in. Documentation teams have different scheduling, and tests make nice TODOs.
There are also always bugs detected after shipping (usually in beta), which need to be accounted for.
https://github.com/kelseyhightower/nocode
>I'd love to see an actual bug-free codebase.
cat /dev/null .
In the places that I worked, features came before all else, and bugs weren't fixed unless customers complained.
In your experience, is there a lot of contention over whether a given issue counts as a bug fix or a feature/improvement? In the article, some of the examples were saving people a few clicks in a frequent process, or updating documentation. Naively, I expect that in an environment where bug fixes get infinite priority, those wouldn't count as bugs, so they would potentially stick around forever too.
In my world, improving the UI to save clicks is a new feature, not a bug fix.
Assuming it works as intended.
This is the 'Zero Defects'[1] mode of development. A Microsoft department adopted it in 1989 after their product quality dropped. (Ballmer is cc'd on the memo.)
1. https://sriramk.com/memos/zerodef.pdf
As opposed to the current 100% defects approach they seem to have adopted.
In my experience, having a fixit week on the calendar encourages teams to just defer what otherwise could be done relatively easily at first report. ("ah we'll get to it in fixit week"). Sometimes it's a PM justifying putting their feature ahead of product quality, other times it's because a dev thinks they're lining up work for an anticipated new hire's onboarding. It's even hinted at in the article ('All year round, we encourage everyone to tag bugs as “good fixit candidates” as they encounter them.')
My preferred approach is to explicitly plan 'keep the lights on' capacity into the quarter/sprint/etc., in much the same way that oncall/incident handling is budgeted for. With the right guidelines, it gives an engineer the air cover to justify fixing something right away, and it builds a culture of constantly making small tweaks.
That said, I totally resonate with the culture aspect. I think I'd just expand the scope of the week-long event to include enhancements and POCs, like a quasi-hackathon.
We do this too sometimes and I love it. When I work on my own projects I always stop and refactor/fix problems before adding any new features. I wish companies would see the value in doing this
Also love the humblebrag: "I've just closed my 12th bug" and later "12 was the maximum number of bugs closed by one person".
We did this ages ago at our company (back then we were making silly Facebook games, remember those?)
It was by far the most fun, productive, and fulfilling week.
It went on to shape the course of our development strategy when I started my own company: we regularly work on tech debt, and I actively applaud it when others do it too.
False sense of accomplishment.
Doing what you want to do instead of what you should be doing (hint: you should be busy making money).
Inability to triage and live with imperfections.
Not prioritizing business and democratizing decision making.
You criticize the initiative because you judge that it has no impact on the product or business. I would challenge that assumption with the claim that a sense of accomplishment, of decision-making, and of completion are strong retention and productivity enhancers. Therefore, they absolutely, albeit indirectly, impact product and business.
I've never understood why bugs get treated differently from new features. If there was a bug, the old feature was never completed. The time cost and benefits should be considered equally.
If the bug affects 1 customer and the feature affects the rest, is the old feature complete?
It's not binary.
Because the goal of most businesses is not to create complete features. There are only actions taken in response to the repeated question: "which next action do we think will lead us to the most money?"
Bugs can get introduced for other reasons besides “feature not completed”.
Until we develop a way for MBAs with spreadsheets to quantify profit/loss w.r.t. bugs, it will never be valued.
The solution is to never hire an MBA.
A company I worked at also did this, though there were no limits. Some folks would choose to spend the whole week working on a larger refactor; for example, I unified all of our Redis usage onto a single modern library, replacing the mess of 3 libraries of various ages across our codebase. This was relatively easy, but tedious, and required some new tests, etc.
Overall, I think this kind of thing is very positive for the health of building software, and for morale, since it shows that actually addressing these things is a priority.
I'm a bit torn on Fix-it weeks. They are nice but many bugs simply aren't worth fixing. Generally, if they were worth fixing - they would have been fixed.
I do appreciate though that certain people, often very good detail oriented engineers, find large backlogs incredibly frustrating so I support fix-it weeks even if there isn't clear business ROI.
> Generally, if they were worth fixing - they would have been fixed.
???
Basically any major software product accumulates issues over time. There's always a "we can fix that later" mindset, and it all piles up. macOS and Windows are both buggy messes. I think I speak for the vast majority of people when I say that I'd prefer they have a fix-it year and just get rid of all the issues instead of trying to rush new features out the door.
Maybe rushing out features is good for more money now, but someday there'll be a straw that breaks the camel's back and they'll need to devote a lot of time to fix things or their products will be so bad that people will move to other options.
Oh boy, I’d trade one (or easily two or three) major macOS versions for a year’s worth of bug fixes in a heartbeat.
You got it per Gurman:
>For iOS 27 and next year’s other major operating system updates — including macOS 27 — the company is focused on improving the software’s quality and underlying performance.
-via Bloomberg today
A greedy algorithm (in the academic sense, although I suppose also in the colloquial sense) isn't the optimal solution to every problem. Sometimes doing the next most valuable thing at a given step can still lead you down a path where you're stuck at a local optimum, and the only way to get somewhere better is to do something that might not be the most valuable thing measured at the current moment only; fixing bugs is the exact type of thing that sometimes has a low initial return but can pay dividends down the line.
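The classic textbook instance of that failure, as a small Python sketch (coin change with an invented denomination set {4, 3, 1} and target 6): grabbing the biggest coin at every step yields 4+1+1 = 3 coins, while the global optimum is 3+3 = 2.

    from functools import lru_cache

    COINS = (4, 3, 1)

    def greedy(amount):
        # Always take the largest coin that fits: locally best, globally not.
        used = []
        for c in sorted(COINS, reverse=True):
            while amount >= c:
                amount -= c
                used.append(c)
        return used

    @lru_cache(maxsize=None)
    def optimal_count(amount):
        # Exhaustive dynamic programming: the true minimum number of coins.
        if amount == 0:
            return 0
        return 1 + min(optimal_count(amount - c) for c in COINS if c <= amount)

    print(greedy(6), optimal_count(6))  # [4, 1, 1] vs 2

Swap "coins" for "sprint tasks" and the analogy to deferred bug fixing holds: the locally most valuable pick is not always on the globally best path.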
I just had a majorly fun time addressing tech debt, deleting about 15k lines-of-code from a codebase that now has ~45k lines of implementation, and 50k lines of tests. This was made possible by moving from a homegrown auth system to Clerk, as well as consolidating some Cloudflare workers, and other basic stuff. Not as fun as creating the tech debt in the first place, but much more satisfying. Open source repo if you like to read this sort of thing: https://github.com/VibesDIY/vibes.diy/pull/582
I would be weirdly happy to have a role whose entire job was literally just deleting code. It is extremely satisfying.
I wanted to take a look at some of these bug fixes, and one of the linked ones [1] seems more like a feature to me. So maybe it should be the week of "low priority" issues, or something like that.
I don't mean to sound negative, I think it's a great idea. I do something like this at home from time to time. Just spend a day repairing and fixing things. Everything that has accumulated.
1: https://github.com/google/perfetto/issues/154
To be fair, the blog post does not explicitly say anywhere that the week was for bug fixes only.
> We also have a “points system” for bugs and a leaderboard showing how many points people have. [...] It’s a simple structure, but it works surprisingly well.
What good and bad experiences have people had with software development metrics leaderboards?
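(For discussion's sake, the quoted mechanism is simple enough to sketch in a few lines of Python; the point weights and names here are invented, not the article's:)

    from collections import Counter

    POINTS = {"small": 1, "medium": 2, "gnarly": 3}  # hypothetical weights

    fixes = [("alice", "small"), ("bob", "gnarly"), ("alice", "medium")]

    leaderboard = Counter()
    for person, size in fixes:
        leaderboard[person] += POINTS[size]

    for person, pts in leaderboard.most_common():
        print(person, pts)

The interesting question is less the mechanism than the incentives it creates.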
One nice thing if you work on the B2B software side - end of year is generally slow in terms of new deals. Definitely a good idea to schedule bug bashes, refactors, and general tech debt payments with greater buy in from the business
FYI, this article describes how traditional Google fixit was conducted: https://mike-bland.com/2011/10/04/fixits.html
Focused bug-fixing weeks like this really help improve product quality and team morale. It’s impressive to see the impact when everyone pitches in on these smaller but important issues that often get overlooked.
We’ve done little mini competitions like this at my company, and it’s always great for morale. Celebrating tiny wins in a light, semi-competitive way goes a long way for elevating camaraderie. Love it!
189 bugs in one week. How many employees quit after that?
They said they only pick bugs that take at most 2 days to fix.
Places where you can move fast and actually do things are far better places to work. I mean the ones where you can show up, do 5 hours of really good work, and then slack off/leave a little early.
Too bad many places care more about how long you stay warming the seat than how useful the work done actually is.
Nothing takes 2 days to fix. Those are definitely not bugs, like someone else mentioned.
You haven't seen the same kind of bugs I have, I guess.
This kind of thing takes more than 2 days to fix, unless you're really good.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637
Or this one
https://security.stackexchange.com/questions/104845/dhe-rsa-...
I can find more of these that I've run into if I look. I've had tricky bugs in my team's code too, but those don't result in public artifacts. And I'm responsible for all the code that runs on my server, regardless of who wrote it... and I also can't crash client code, regardless of who wrote it, even if my code just follows the RFC.
That's what I'm saying: "nothing takes 2 days to fix" meaning it takes more time than that.
Oh. Well, I've done easy fixes too. There's plenty of things that just need a couple minutes, like a copy error somewhere.
Or just an hour or two. I can't find it anymore, but I've run into libraries where simple things involving months didn't work, because, say, May only has three letters, or July and June both start with "Ju". That can turn into a big deal, but often it's easy once someone notices it.
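For anyone who hasn't hit this class of bug, a small stdlib-Python sketch of how the ambiguity arises (the prefix-matching scheme is a stand-in for whatever the broken library actually did):

    import calendar

    def months_matching_prefix(text, prefix_len):
        # Returns every month whose name starts with the given prefix;
        # more than one hit means the prefix was ambiguous.
        prefix = text[:prefix_len].lower()
        return [i for i in range(1, 13)
                if calendar.month_name[i].lower().startswith(prefix)]

    print(months_matching_prefix("June", 2))  # [6, 7] -- "Ju" is June AND July
    print(months_matching_prefix("June", 3))  # [6]    -- three letters disambiguate
    print(months_matching_prefix("May", 3))   # [5]    -- fine here, but any code
                                              # assuming a 4-letter abbreviation breaks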
189 presumably
How did you not get fired?
So much of tech debt scheduling feels like a coordination or cover problem. We're overdue for a federal "Tech Debt Week" holiday once a year, just to save people all the hand-wringing over how, when, or how much. If big tech brands can keep affording to celebrate April Fools' jokes, they can afford to celebrate this.
Fixing bugs before new code can shed interesting light on how a dev team can become more effective.
hello b/Googler :)