Channel: Hacker News

Ask HN: What open source project, in your opinion, has the highest code quality?


SQLite.

and for this reason alone!

https://www.sqlite.org/testing.html

    As of version 3.23.0 (2018-04-02), the SQLite library consists of approximately 
    128.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in 
    other words, lines of code excluding blank lines and comments.) 

    By comparison, the project has 711 times as much test code and test scripts - 
    91772.0 KSLOC.

Automated testing is useful and good. But I really feel it's reached a level of fetishisation that is quite concerning.

Test code is code: it needs to be written, read, maintained, and refactored. Very often nowadays I have to wade through tests which test nothing useful except syntax. Even worse, with developers who adopt the mock-everything approach, I often find tests which only verify that the implementation is exactly the one they wrote. That makes refactoring a pain: even if you rewrite a method in a better way that produces exactly the results you wanted, the test will fail.

So, the ratio of testing code vs implementation code is a completely wrong proxy for code quality.

EDIT: I'm not criticising SQLite and their code quality - which I have never studied - but the idea that you can judge a project's code quality just by the ratio of test code vs implementation code.


They actually have to test to that degree to follow aviation standards (DO-178b [0]) because they're used in aviation equipment.

Dr. Hipp said he started really following it when Android came out and included SQLite and suddenly there were 200M mobile SQLite users finding edge cases: https://youtu.be/Jib2AmRb_rk?t=3413

Lightly edited transcript here:

> It made a huge difference. That was when Android was just kicking off. In fact Android might not have been publicly announced, but we had been called in to help with getting Android going with SQLite. [Actually], they had been publicly announced and there were a bunch of Android phones out, and we were getting flooded with problems coming in from Android.

> I mean, it worked great in the lab, it worked great in all the testing, and then [...] you give it to 200 million people and let them start clicking on their phone all day, and suddenly bugs come up. And this is a big problem for us.

> So I started following this DO-178B process, and it took a good solid year to get us there. A good solid year of 12-hour days, six days a week. I mean, we really, really pushed, but we got it there. And you know, once we got SQLite to the point where it was at that DO-178B-level standard, we still get bugs, but you know, they're very manageable. They're infrequent and they don't affect nearly as many people.

> So it's been a huge, huge thing. If you're writing an application, you know, a website, DO-178B/A is way overkill, okay? It's just very expensive and very time-consuming. But if you're running an infrastructure thing like SQL, it's the only way to do it.

[0]: https://youtu.be/Jib2AmRb_rk?t=677 "SQLite: The Database at the Edge of the Network with Dr. Richard Hipp"


SQLite is very high quality software, but they use a DO-178B-"inspired" testing process. As far as I know they don't have a version of the software that is, or can be, used in safety-critical parts, despite their boasting.

They say in their site that:

> Airbus confirms that SQLite is being used in the flight software for the A350 XWB family of aircraft.

"Flight software" does not imply the safety-critical parts of avionics. It could be the entertainment system or some logging that is not critical.


Yeah, DO-178B defines several levels for software, from DAL A (highest) to DAL E (lowest). If DAL A software fails, the results are catastrophic; if DAL E software fails, there is no effect on the aircraft. Since DAL E is usually just test equipment and such, they might be at the DAL D level. That still requires a lot of testing, but not nearly to the level DAL A requires.

https://en.wikipedia.org/wiki/DO-178B


While I agree in general, I disagree here. If you read about the SQLite tests, you will find that they do test sensibly.

One suite I'm particularly impressed with runs tests starting from zero bytes of available memory and slowly increases it until the program passes. The tests verify that at no point is the DB corrupted by an OOM event.


A better comparison would be some sort of defect rate. Does SQLite have fewer defects per line of code per month (or whatever) than PostgreSQL with that test suite?

Is there a distinction between the best codebase and the best test suite? Probably.


I hold the source code of Go standard library & base distribution (i.e. compiler, etc.) in very high regard. Especially the standard library is, in my opinion, stunningly easy to read, explore and understand, while at the same time being well thought through, easy to use (great and astonishingly well documented APIs), of very good performance, and with huge amounts of (also well readable!) tests. The compiler (including the runtime library) is noticeably harder to read and understand (especially because of sparse comments and somewhat idiosyncratic naming conventions; that's partly explained by it being constantly in flux). But still doable for a human being, and I guess probably significantly easier than in most modern compilers. (Though I'd love to be proven wrong on this account!)

At the same time, the apparent simplicity should not be mistaken for lack of effort; on the contrary, I feel every line oozes with purpose, practicality, and to-the-point-ness, like a well sharpened knife, or a great piece of art where it's not about that you cannot add more, but that you cannot remove more.


> standard library is, in my opinion, stunningly easy to read

Reading this brought to mind the JDK. All well structured, neatly formatted and well documented. I’ll often just click thru to the source to get the nitty-gritty on a function, I rarely need to consult the actual docs!


Just wanted to +1 this.

Once had to make some changes to OpenSSH for an internal project and it was surprisingly easy to find the relevant code and make the necessary changes. One of the few times my code worked on the first compile.


How is the coreutils true faster?

I would expect the OpenBSD true to be the fastest: it doesn't need to spawn a subshell, and it doesn't do more than the POSIX specification requires (AFAIK --help/--version should be ignored).


I did say “one of the goals”.

I don’t see how those examples are relevant. Why would that last one be faster?

I agree that the OpenBSD code here is good, no more and no less than needed.

I assumed the grandparent was referring to cases where an O(n) algorithm is used where it might be O(log n) or O(1) with just a little more effort. It’s a tradeoff, sure, and in some cases linear searches can work surprisingly well, but in general I think this kind of thing should always be considered in good code.

Micro-optimizations like inline assembly for inner loops may or may not be a good idea, depending on the application. All else being equal, I’d certainly agree that good clean code would not use assembly.


It's older than some HN posters, but the GPLed DOOM source code was one I liked.

The performance reached by the game was considered impossible until Carmack showed us otherwise. So I expected lots of ASM and weird hacks, especially as compiler optimization wasn't as good as it is today.

Surprise, surprise: the thing was easy to read, easy to get going, easy to port, and reasonably documented. It showed me what a good balance between nice code and usable code is.

If you want to browse: https://github.com/id-Software/DOOM


I didn't have internet at the time, so I didn't check GitHub ;-)

On the more serious side, I wanted to say something about the TODOs as an example of that balance, but couldn't find any. I thought I was confusing it with Quake, but the cleanup might explain it better.


NetBSD.

Why? I was able to do substantial changes to the kernel when I was a teenager (late 90s), mostly on my first try. There was no giant wall of abstraction I had to climb over or some huge swath of mutually interacting code I had to comprehend. There was also nothing that required fancy code navigation and the creation of something like the ctags database in order to find out what on earth was happening.

No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.

Nothing had obtuse documentation that tried my patience or required much more than enthusiasm and basic C knowledge.

I did things like got a wireless card working from code written for one with a similar chipset and got various other things like the IrDA transmitter on my laptop at the time to do a slattach and thus work as a primitive wireless network - all in the late 90s.

I likely had no idea what, say, the difference between network byte order and host byte order was at the time or how the 802.11b protocol worked or what a radiotap header was or any of that. The separation of concerns was so good however, that none of that knowledge was actually needed.

Compare that to, say, the Qualcomm-compatible WWAN I just dealt with over the past few weeks, where I needed in-depth knowledge of an exhaustive number of things (very specific chipset and network details) just to get a basic IPv4 address working. Then I needed to read up on GNSS technology and NMEA data to debug codes over usbmon to get the GPS from the WWAN working. Then, after I had the QMI kernel modules doing what I wanted and the QMI userland toolsets, I had to write some Python scripts to talk to D-Bus to get the data from ModemManager that I needed in order to log the GPS. All the maintainers of these pieces were very nice and helpful and I have nothing negative to say. This is just how it usually is these days.

Back then however, I wasn't a good programmer, I was likely pretty terrible in fact but with the NetBSD codebase I was able to knockout whatever I wanted every time, fast, on a 486.

I miss those days.


I have no idea, haven't kept up with it. I'd recommend 1.x (<=4) any day though, simply for the education alone.

I don't really use it these days because I need systems that future cheap devs can maintain, and once you enter userland it takes commitment and time I simply don't have to stay with NetBSD.

Debian permits me to usually not have to care and that's pretty invaluable


This is not necessarily about the code, but I've been really impressed for a while by the lodash project and its maintainer's dedication to constantly keep the number of open issues at 0. Any issues get dealt with at record speed, it's quite a sight to see.

https://github.com/lodash/lodash/issues


It's not clean at all. Thousands of different styles, no single convention on function-naming, etc.

Want a clean kernel? Go look at the BSDs.


Python: I really like requests, scikit-learn, the pathlib module from the standard library, Keras, Django.

C: Redis, SQLite, Lua.

Java: Joda Time, Guava


Joda Time is one of my all time favourite libraries.

After struggling with JVM stdlib time nonsense, JodaTime was a breath of fresh air and actually made programming with time fun.


Strictly talking about code quality, I will nominate RCP100, which is a small, virtually unknown, now-abandoned routing software written in C [0]. I started programming with C way back in the 90s, and this is one of only two projects I can recall being immediately struck by the beauty of the code (Redis being the other). I know almost nothing about the author but he seems not to want to be known by name. You can browse the source on Github [1], which I uploaded myself, since you can only get a tarball from sourceforge. Anyway, as someone else mentions, C is usually a mess, but RCP100 struck me as beautiful.

[0] http://rcp100.sourceforge.net/

[1] https://github.com/curtiszimmerman/rcp100


There are too many in very different domains and languages.

However, I'll opt for jQuery here. It is one of the greatest examples of how constant refactoring and thoughtful use of design patterns get you a very long way.

If you are designing JavaScript libraries, please have a look at jQuery. So many great design decisions, aka great code quality.


When we take the language into consideration, I would like to mention Redis.

Often codebases written in C are a mess to understand and a mess to read. The Redis source code is understandable even without deep knowledge of C.


I definitely admired PostgreSQL's code when I first looked at it.

Projects written in C require a fair amount of care and discipline to be scaled up to larger codebases and teams. PostgreSQL is such a codebase.

I've also seen various parts of Spring's codebase and found all of it to be consistently solid and careful. They take a lot of care to structure carefully and comment immaculately.

Disclosure: I work for Pivotal, which sponsors Spring. Which is why Spring is highly visible in my working life.


Nice to see you include / mention docs and community. I believe a code-based product has a UX. That UX is the code (with comments), documentation and community. That UX is your (i.e., a dev / engineer) end to end experience with "the product." It's not simply the code.

Put another way, there's more to a product that's easy and sensible to work with than code quality.


Agreed. Almost every time I've looked deeply into stdlib code I was surprised by how hard to follow it is and how frequently antipatterns are employed. Doubly so for anything near a C module.

I consider the Python stdlib in a similar vein as the C++ stdlib or Boost: Yes, some useful bits in there, but (1) lots of rot (2) you don't want to have your code look anything like it.


I really liked the clojure core, I read it quite a lot when learning the language.

I have heard good things about sqlite, and some day, I plan to read it :-)


My first experience with high-quality code was with the Quake 2 engine.

I was amazed both by the simplicity of the architecture (a huge single event loop) and by the attention to code presentation and indentation.


The open source code I know from web development has to be fixed with various hacks - PHP and the frontend JavaScript that goes with it. Therefore the code I know is not of the "highest code quality". If it were, I would not have had to get to know it.

Therefore the highest code quality is likely to be in projects where I do not have to go under the hood, e.g. the Chromium project where all contributors are vastly more educated and capable than myself.


Software quality: https://en.wikipedia.org/wiki/Software_quality

Software metric: https://en.wikipedia.org/wiki/Software_metric

> Common software measurements include:
>
> - Balanced scorecard
> - Bugs per line of code
> - Code coverage
> - Cohesion
> - Comment density
> - Connascent software components
> - Constructive Cost Model
> - Coupling
> - Cyclomatic complexity (McCabe's complexity)
> - DSQI (design structure quality index)
> - Function Points and Automated Function Points, an Object Management Group standard
> - Halstead Complexity
> - Instruction path length
> - Maintainability index
> - Number of classes and interfaces
> - Number of lines of code
> - Number of lines of customer requirements
> - Program execution time
> - Program load time
> - Program size (binary)
> - Weighted Micro Function Points
> - CISQ automated quality characteristics measures

Category: Software metrics: https://en.wikipedia.org/wiki/Category:Software_metrics


Well, if trolling is permitted, I would say that the "Hello World" example has the most exquisite code.

In most cases "Hello World" is open source, but I still don't know if it can be called a "project".


It's funny to see nobody is even questioning the question.

What would it even mean to be the project with the highest code quality in the world? How could a consensus exist if we can't even agree on best practices?

If it's for learning purposes, why even look for the ONE project with the HIGHEST quality? Just go with any GOOD ENOUGH project.

I see this all the time: what's the best editor, the best color scheme, the best font, etc.

How about we just start saying: what's a good enough X for my purpose ?


Spring Framework

1. Elegant structure
2. Strict code style
3. Project size is not too large
4. Detailed documentation


Just wanted to mention some bias in successful open source projects: they are often structured as a number of similar plug-in pieces, like youtube-dl for different video publishers.

This is great for open source, because you can easily discover and navigate to the part you want, and change it. You might need to understand the plugin interface - or you might not. This flat architecture makes it easy for people to contribute, an important aspect of a successful open source project.

But it's not the ideal architecture for every project. In some cases, a cleverer, harder to understand approach is more elegant, shorter, more efficient, simpler.

Of course... one might argue that ease of understanding is more important than anything else.

