Channel: Hacker News

Ask HN: Have you ever worked on a product that was killed by technical debt?


I'm currently working on the reincarnation of a project that was killed by technical debt -- TWICE.

The original codebase was about 20 years old. It was control code for something best described as an industrial robot. Written for the last 20 years by greybeards who knew a lot about the manufacturing process, and were reasonably good at getting a product out the door.

But the whole thing was riddled with #ifdefs for this customer or that, or one batch of machines or another. All long forgotten, written by people who had since left, or been pensioned. It was in dire need of improvement and extension, but it would have been superhuman to inject new features into this rat's nest. Plus their electronics supplier was discontinuing the control electronics the system was designed for. The UI also looked like it had been designed by German engineers in the 1980s. Which was the case.

So they made the defensible decision to start from scratch. A team of engineers was to develop a brand new machine, with all new electronics and all new code. They got to work -- and had to scrap the new software about three years in. It was just utterly misdesigned, and riddled with bugs. It featured wonderful WTFs like the embedded realtime code depending on the Qt libraries.

I observed its instability myself: it would just spontaneously crash every five minutes, sometimes just while idling. Once the project lead was on holiday, the programmers revolted, went to the head of the company, and the project lead found himself without a project on his return. Whee.

Now we've started from scratch again, and have at least succeeded in making different mistakes this time around. Fingers crossed, this might end up working.


It happened to me twice. The first time was in a start-up at the beginning of the century. We were developing an electronic health record and had outsourced the database abstraction layer to a company in Greece. In the beginning things went fine, but after a while development of the DAL got slower and slower and it became unstable as well. Eventually the word came out: the main developer of the DAL framework had left the company and, according to the Greek CEO, she had been 'too smart', which meant that nobody understood her code. They had tried adding features, but that had only made things worse and the DAL had started to crash randomly. We tried to take over the framework ourselves, but it was written in Eiffel and the code was a horribly entangled mess. Eventually we rewrote it in Java but, being a start-up, we had already lost too much precious time, and we eventually went almost bankrupt and were bought up by a competitor.

The second time was in a small company whose product was a search engine for consumers. The web layer was written in a mixture of JSF, jQuery and Ajax. While that combination already slowed down development on the front end, the main problem was the performance of JSF on the server. Because JSF is rendered on the backend, it placed a massive load on our server for certain heavily used pages and we just couldn't scale any further. Swapping JSF for a framework that rendered on the front end would have been the solution, but that was a massive refactor for which the company just didn't have enough resources. Eventually the company had to drop their search product and change their business model to a more community-based website.


This sounds less like technical debt, and more like liabilities of over engineering. Possibly feature creep.

That is, technical debt is not necessarily tangled over-engineered code. It is more compromises that were made to actually ship and operate in the world. You can see this in the world with devices.

Consider, technical debt is the reason you have AC delivered to your house going through as many converters as you do devices. Often to the same target power characteristics for those devices. It is not the reason that your coffee machine that also grinds and whatever, is likely to fail within the year.

Another example: technical debt is the reason we are still predominantly using petrol for automobiles. It is not the reason the dashboards are horribly non-responsive on modern cars.


Yes and no.

The project wasn't killed specifically because "you have technical debt". It was killed because there was no way for anyone to be effective with the combination of poor undocumented code.

"We need to change the email message that goes out when someone registers". This took a team of (4?) people 5 calendar days to change. As a contractor, I had to VPN in to one system, then remote desktop over another VPN to another system. Although we were building web apps, these dev systems were not allowed to talk to the internet at all, so pulling external dependencies (security libraries, templating libraries, etc) was impossible - pretty much everything was handrolled, largely due to this restriction.

The last big killer was that the system was not passing accessibility audits. Trying to determine where to make a change to any single element would take minutes to hours, vs seconds to minutes you'd normally expect. Much of the 'templates' used were the result of a SQL statement joining 12 tables (html_meta, html_form, html_link, html_grid, etc) and complex concat()s, so adding a page or making a change might take an hour to track down the appropriate collection of tables, then figure out a SQL script to run, then send it to the person who had permissions to make updates to the SQL, then wait and see.

Did the technical debt itself kill the project? Technically no, but the inability to do anything productive in a reasonable amount of time forced the project to shut down.


This account kind of confirms what I think about technical debt: most of the time, the problem is more likely a lack of documentation than anything else.

I don't see how a big project could be coded without containing anything specific to the project. And even then, the architecture by itself is unique and deserves documentation.


Templates stored across a database is probably the worst thing I've seen repeatedly across projects. Just because a database can store everything doesn't mean it has to.

Some people really seem(ed) to have an allergy to plain files for storage. A plain file with OS level caching will beat most (if not all) databases for static content. But doesn't sound as fancy, so it's probably harder to charge a lot of money for it.
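To make the plain-file approach concrete, here is a minimal sketch, assuming templates live as ordinary text files in one directory and use `$name`-style placeholders. The `render` function and the file layout are hypothetical and not taken from any project in this thread; the sketch simply reads the file on each call and leaves caching to the OS.

```python
from pathlib import Path
from string import Template


def render(template_name: str, template_dir: Path, /, **values: str) -> str:
    """Render one plain-file template with the given placeholder values.

    The file is read on every call; repeated reads of an unchanged file
    are normally served from the operating system's page cache.
    """
    text = (template_dir / template_name).read_text(encoding="utf-8")
    # substitute() raises KeyError if a placeholder has no value,
    # so a broken template fails loudly instead of rendering garbage.
    return Template(text).substitute(**values)
```

A call such as `render("welcome.txt", Path("templates"), name="Ada")` would fill in `$name` in a hypothetical `templates/welcome.txt`. Changing a template is then a file edit that can go through normal version control rather than a SQL update.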


A template in one database table I can live with (pros and cons, multiple front-ends, etc). One template broken up into 12 tables, requiring a 100+ line SQL statement with concat()s and HTML interspersed, is insane. Had there been an API or utilities with it to manage it, it might have been manageable, but nope - just "write some queries".

Also, just repeated your comment to a friend who said "that's the worst thing you've seen? can i have your job?" :)


The closest I've come was a Rails project I inherited from a star developer who had just left the company. It was a B2B project that involved importing large Excel spreadsheets of various different formats into a standardized database for itemized review.

The code was pretty sloppy, but didn't deviate much from standard Rails idioms. Not many people on the team understood Rails well enough to read it, but I did. Bug reports were constantly flooding in. I suggested taking a sprint to build up an integration test suite and then letting loose on the backlog.

We did build up a sufficient test suite in one sprint. But the bug reports never slowed. By the time we had the confidence to truly start tackling bugs at speed, the battle had been lost. We had been so busy writing tests that we forgot to manage the bug tracker. The impression was that we were overwhelmed and unable to make progress. The project was swiftly closed.

People remembered that codebase as an exemplar of sloppy code and technical debt, but that's not the lesson I took from it. I had seen, and others would see later, much worse. The lesson I took was that perceptions are as important to manage as results.


The last place I worked at will die because it will take them years to migrate from Oracle to postgres due to "technical debt" (the codebase is coupled with the database to a hilarious degree; business logic in triggers, huge plsql packages, plain sql queries in the java codebase, halfassed homerolled ORM). They're not getting as many new customers as they could because, for various reasons, the Oracle licensing terms are now unacceptable for the new customers they have been in contact with over the last two years.

That's the most concrete reason I can come up with why the technical debt will kill them, but there's plenty of vaguer reasons why it's been killing them for the past 5 years and will finish them off over the next 5. The attrition rate has been around 20% a year since I joined. For most of the time I worked there they compensated somewhat by hiring new people. Word has gotten around though, and they've run out of qualified candidates willing to work on their mess. Hell, we even had a couple of gifted hires leave after a month or two while shaking their heads.

My current workplace's main product uses the same tech, is the same size (LOC) and has the same functionality as the other company's, but serves a different market. They did the Oracle to Postgres migration in 2 months. 2 MAN months, one guy.

New workplace: 15ish developers, serving the same amount of customers, doing similar revenue, making stable releases every week

Old workplace: 80 developers at its peak, doing non-hotfix releases around every 3 months. Just a mess in every way. Mostly stemmed from the codebase and the architectural choices that had been made along the way.


Hey, sounds like we worked at the same place! That, or the "wedded to Oracle for life" is a common antipattern. I'd add "shared everything architecture" to the horrors.

Yeah, once you get that deeply entrenched in Oracle, it's almost impossible to get away, and after that experience I vowed never to work at another Oracle shop.


Projects rarely die because of technical debt. Instead, it becomes ridiculously expensive and difficult to add new features. But the software itself can remain in use for decades, gradually decaying and rarely adapting to changes in the business environment. Eventually either the software gets thrown out and replaced with something new, or the company is no longer able to compete.

I've seen this play out probably close to a dozen times now, at different employers and consulting clients.


Wow.

"The consequences of the failures were substantial. For the 212 incoming parent orders that were processed by the defective Power Peg code, SMARS sent millions of child orders, resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately 45 minutes. Knight inadvertently assumed an approximately $3.5 billion net long position in 80 stocks and an approximately $3.15 billion net short position in 74 stocks. Ultimately, Knight realized a $460 million loss on these positions. "

https://www.sec.gov/litigation/admin/2013/34-70694.pdf


I'd say it wasn't due to technical debt, more a start-up like development approach to a company that trades millions within seconds in full automation. It sounds like the deployment process wasn't that complicated for a company of that size, but it was deployed without a single check by a second person.

If you're trading automatically, you'll need a very, very solid deployment and audit process, even if you're just a small company. The reason banks are so slow in deploying software is because most of them lost a few millions at some point due to some bug.

Startups that think they can act faster than banks just haven't had that bug yet. That's also why I'm rather negative on the whole Fintech scene at the moment.


> During the deployment of the new code, however, one of Knight’s technicians did not copy the new code to one of the eight SMARS computer servers.

Was the issue technical debt or a sloppy deployment?
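One kind of automated check that could flag a server left on old code is a checksum comparison after deployment. This is a minimal sketch, assuming the release artifact and each server's deployed copy are available as bytes. The server names and the `find_stale_servers` function are hypothetical, and this does not describe Knight's actual process.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a deployed artifact."""
    return hashlib.sha256(payload).hexdigest()


def find_stale_servers(release: bytes, deployed: dict[str, bytes]) -> list[str]:
    """List the servers whose deployed artifact differs from the release.

    `deployed` maps a (hypothetical) server name to the bytes currently
    installed there; how those bytes are fetched is out of scope here.
    """
    expected = digest(release)
    return sorted(
        name for name, payload in deployed.items() if digest(payload) != expected
    )


# Example: eight servers, one of which was missed during the rollout.
new_code = b"release-2"
fleet = {f"server-{i}": new_code for i in range(1, 8)}
fleet["server-8"] = b"release-1"
stale = find_stale_servers(new_code, fleet)  # ["server-8"]
```

A deployment script could refuse to enable the new feature while the returned list is non-empty, which turns a missed copy into a blocked release rather than a production incident.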


I have seen a product getting killed by trying to resolve technical debt. The refactor took nine months and in the end didn't work better.

I am a big fan of constant refactoring on a small scale but I am very skeptical of large refactoring of a whole project. You may end up with something that's just different but not really better.


I've had the opposite happen every time the team I've been on decided to refactor a large portion (or even the entire code base). Every time, what was a source of constant bugs (i.e., X bugs per week, every week, never lessening), became tractable and moved to stable post the rewrite (X bugs first week, .7x bugs second week, etc, until finally we're encountering the odd bug only once every few months, if at all).

I'm not sure what the differentiator is. I'd be curious if others have ideas. I think part of it is that in both cases it was a small team, who caught the issues early enough that it hadn't gotten too bad yet, but late enough that the right direction to move in was clear.
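The bug-count pattern described above (X bugs the first week, 0.7X the second, and so on) is a geometric decay. Here is a rough sketch of that arithmetic, assuming a constant weekly factor. The function names are made up for illustration, and real bug counts will not follow a curve this cleanly.

```python
def weekly_bugs(initial: float, decay: float, weeks: int) -> list[float]:
    """Expected bug count per week when each week keeps `decay` of the last."""
    return [initial * decay ** week for week in range(weeks)]


def weeks_until_below(initial: float, decay: float, threshold: float) -> int:
    """Number of weeks until the expected weekly count drops below threshold."""
    if not 0 < decay < 1:
        raise ValueError("decay must be strictly between 0 and 1")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    week = 0
    count = initial
    while count >= threshold:
        count *= decay
        week += 1
    return week
```

For instance, `weekly_bugs(10, 0.5, 3)` gives `[10.0, 5.0, 2.5]`. The contrast with the pre-rewrite situation is that a constant X bugs per week never reaches any threshold.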


The problem is when you need to change your "platform".

I worked on a 300k LOC business basic application at one point.

The big question everyone was asking is how do you move to something else? Everyone wanted something else, they started writing new services on top of the old system, they had some ideas on where to go, but it just didn't seem like a gradual rewrite was possible.

And to be honest, a greenfield rewrite just wouldn't work for something this size with the resources they had. So it stayed in business basic.


Except for one measure: Netscape died as a company. The huge rewrite contributed to killing it. If you don't ship a product (for like 4-6 years?) you're gonna die. Mozilla originally chose the name Phoenix (then Firebird to avoid trademark problems, then finally Firefox) because it was a phoenix rising from Netscape's ashes. Its major innovation: it was 'blazing fast' when compared to IE 5.5 / 6. Tabbed browsing was also pretty cool.

You can learn a lot of lessons from Netscape, but this isn't one of them. Servo is a great example of how a rewrite should / can work. Mozilla hasn't devoted 100% of resources to Servo, but instead is letting servo build all on its own, and someday unclearly defined in the future, the two could merge. (but might not!) It's a separate product, and nobody is pinning all their hopes and dreams on it.


I remember how long it took to release a stable version of Mozilla and Mozilla Phoenix. In the meantime, I had to recompile newer releases manually all the time. There was no alternative browser on Linux or *NIX for that matter (OK, macOS still had MSIE).

The successor of Netscape Communicator was Mozilla (IIRC it was just called that, later renamed Mozilla SeaMonkey), and the successor of Netscape Navigator was Mozilla Phoenix (later renamed Mozilla Firebird and eventually Mozilla Firefox). Firefox and Thunderbird were once again separate clients.

Mozilla was still considered bloated, but Phoenix was far less bloated which is nice on lower RAM machines, and allowed the start of Web 2.0. It was also the return of doing one thing and doing it right: browsing the WWW. As Netscape Communicator (unlike its predecessor, Netscape Navigator) came with a Usenet client and e-mail client.

Later in development, addons became a thing, and you could add features which were previously part of Netscape Communicator such as calendar, HTML editor, etc. You can also add such features with addons to Mozilla Thunderbird.

Then Google Chrome happened, and people switched to that, but I'm not entirely sure why.


Also, Servo is just the engine. And modern web render engines are themselves highly modular. I think the Gecko engine powering Firefox has had its JavaScript interpreter replaced 2-3 times.

So when it comes to it, the most likely outcome will be a kind of "my grandfather's axe" scenario where over time parts of Servo replace Gecko within Firefox until Servo has completely replaced Gecko.


Then again Firefox was itself a strip down of a rewritten Netscape suite. Stripped down in that the suite included not just a browser but also an email client, IRC client and an HTML editor, and the UI was done using JS and XUL markup.

What Firefox devs did was to take the browser part, make it stand alone, and replace much of the XUL UI with native widgets (GTK on *nix).


I worked for a startup that had basically the right idea but proper execution took so long we ran out of runway. The first iteration of the product was built in an extremely haphazard, cowboy way - and took months, if not years, to refactor into something stable, usable and crash-proof. By the time the product was operational, the company was bankrupt. We simply hemorrhaged money until we bled to death.

As someone else pointed out - technological debt is not a cause per se; it's an indication of some deeper problem - usually of human, not technological, nature.


That may be a 'startup problem'. Do it cheap and cowboy because, runway. Assuming the money will come along later to do it all again. But that happens (lots of money later) only if you get bought out. Not if you have to make it on your own.

So any business plan that includes the steps "A miracle occurs" and then "We get bought out" is probably going to suffer that fate?


Our product, a large-scale enterprise software, is slowly getting killed. It's old and it's rather unusable (by the users). Plus, for "backward compatibility", it supports dozens of strange configurations. It's dragged down by so much technical debt (functions longer than 3000 lines with 60 parameters!) that every small change requires so much time.

We're slowly killing (i.e. no big new developments, but only maintenance for existing customers) and abandoning it. And luckily we're not rewriting it. :-)


I've seen a whole company killed by technical debt. Because the software was written so badly far more developers had to be hired to firefight than the company could afford. The technical support team was similarly bloated to deal with the endless problems the customers had. Sales were low due to the bad reputation.

A rewrite was started, but never got anywhere. The company folded under the weight of its massive salary costs.


I've worked somewhere that died for a combination of reasons, one of which was an effect of technical debt and inappropriate outsourcing.

I don't think technical debt alone will kill you. But it may render you unable to cope with another problem, which will then kill you.


"Killed" as in "That was the reason it was ultimately replaced by a green field project, after ten years in the market" or "Killed" as in "It never shipped because it was so bogged down in technical debt we could not ship it"?

The former has happened to every project I know, which doesn't die for another reason (market disappearing, etc). The latter I have not experienced.


ITA software is a good example of a company that succeeded due to the collective technical debt across their competitors.

Though they only really succeeded on the shopping part. They didn't ever get to a credible booking engine that anyone would buy. Which may point to something other than tech debt being the biggest barrier to modernizing an airline reservation system.


I do agree that QPX was untouchable, but I still think tech debt in competitors was a major factor. There were plenty of smart people at your competitors...I'm sure graph search occurred to them. I suspect there were efforts to greenfield that were squashed...nobody wanted to throw out the hairball they had because of the existing investment. Thus, they tried to "fix" what they already had...with obviously bad results.

Edit: And, worth mentioning that your competitors wouldn't have had to be better than, or even as good as QPX. "Good enough" would have squashed several big sales, since shopping was typically bundled in with what their customers already paid.


Their shopping/pricing engine was used in those two companies as well as others. It was wildly successful, but...you shop on ITA, and book elsewhere.

A booking engine (CRS/GDS) would be used by either airlines or a reservations system (Amadeus, Sabre, etc). That's the piece they didn't deliver on.

Edit: Reference to the announcement of abandoning the booking space: https://skift.com/2013/05/15/google-and-ita-software-abandon...

"This is indeed a bitter pill for ITA Software’s founders to swallow as they put years and millions of dollars into their dream to transform the nuts and bolts of the way airline reservations systems...are handled"


I think implementations are more often killed by technical debt rather than products or brands tbh.

Although at that point I wouldn't call it technical debt. If you've a million lines of spaghetti code, then you've a million lines of spaghetti code not technical debt. I.e. a camel is a camel. It's not a horse with technical debt.


Sort-of. I worked on a product that was a rewrite of an older product, and suffered second system syndrome.

The problem was that the original programmers didn't understand how to program with a database, and management was unwilling to address the core design flaws.

As a result, upper management told us we missed our market window and the project was killed. In reality, it was the technical debt from not understanding how to correctly write a data access layer that made us move too slowly to meet our market window.


Yes. Technical debt rendered it difficult to create features, and difficult to hire. The project dragged on as engineers came, tried to refactor and then left. I didn't stay.

My understanding is that it was never released, so all of the money the company put into the project was wasted.

This is maybe different from what people normally consider 'technical debt'. I don't mean just code aesthetics but also bugs, redundant code, and bad abstractions.


This is true: the firm I work for at the moment just completed buying an incredibly expensive Fortran-based platform. That's despite the fact it cannot integrate with our existing products without serious work, nobody knows Fortran here, and the original developer sold it so he could retire.

So even though it makes no technical sense it bolsters a gap in the product offering, and they'll have to find consultants to limp it along every time they need something small done that would otherwise be very cheap. It's all about the balancing act.


Sort of. I worked on a project that gradually lost customers due to instability and a lack of new features.

It's still around and still being maintained but it's a shadow of what it once was.

I've worked at multiple other companies which have gone through rounds of expensive rewrites.


Killed - no. But I worked on several products that weren't able to move fast enough because of it, and lost money as a result.

One of those products was released half a year late and turned out to be a poor market fit. The company closed several months later. It could've used this half a year to complete a pivot with another product, which could have been successful.

