Why Programming Is Hard

Posted Oct 29, 2024

By Michael de Lang 9 min read

Running into issues

Let me share with you some of my regular run-ins while maintaining medium- to large-sized C++ codebases.

Have you ever run into a situation where you’ve been using a library or dependency for a while and an upgrade breaks it for you? Obviously, changes in API require refactoring, but I often run into issues with libraries that have only been tested in a narrow sense or that regularly raise the required toolchain version.

I have multiple examples to ~~rant about~~ share:

Glaze

Glaze is a header-only C++ library to serialize and deserialize JSON (and more) without needing a lot of boilerplate, with performance that rivals simdjson. It is an amazing addition to the C++ ecosystem, provided you are on a new enough compiler that supports the C++23 standard.

Up to and including version 2.9.5, I happily used glaze in my own project. Upgrading it, however, gave me this beautiful error message (cue C++ error jokes):

In file included from test.cpp:1:
In file included from ../external/glaze/include/glaze/glaze.hpp:35:
In file included from ../external/glaze/include/glaze/binary.hpp:6:
In file included from ../external/glaze/include/glaze/binary/custom.hpp:6:
In file included from ../external/glaze/include/glaze/binary/read.hpp:10:
../external/glaze/include/glaze/core/refl.hpp:1474:19: error: constexpr variable 'keys_info<EtcdReply>' must be initialized by a constant expression
 1474 |    constexpr auto keys_info = make_keys_info(refl<T>.keys);
      |                   ^           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../external/glaze/include/glaze/core/refl.hpp:1481:35: note: in instantiation of variable template specialization 'glz::detail::keys_info' requested here
 1481 |          constexpr auto& k_info = keys_info<T>;
      |                                   ^
../external/glaze/include/glaze/core/refl.hpp:1477:34: note: while substituting into a lambda expression here
 1477 |    constexpr auto hash_info = [] {
      |                                  ^
../external/glaze/include/glaze/json/read.hpp:2138:60: note: in instantiation of variable template specialization 'glz::detail::hash_info' requested here
 2138 |                                                    && bool(hash_info<T>.type);
      |                                                            ^
../external/glaze/include/glaze/json/read.hpp:69:39: note: in instantiation of function template specialization 'glz::detail::from_json<EtcdReply>::op<opts{10, 1, 0, 1, 1, 1, 0, 0, 32, 3, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 32}, string_literal<1>{""}, EtcdReply &, glz::context &, const char *&, const char *&>' requested here
   69 |                from_json<V>::template op<Opts>(std::forward<T>(value), std::forward<Ctx>(ctx), std::forward<It0>(it),
      |                                       ^
../external/glaze/include/glaze/core/read.hpp:63:46: note: in instantiation of function template specialization 'glz::detail::read<10>::op<opts{10, 1, 0, 1, 1, 1, 0, 0, 32, 3, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 32}, EtcdReply &, glz::context &, const char *&, const char *&>' requested here
   63 |          detail::read<Opts.format>::template op<is_padded_on<Opts>()>(value, ctx, it, end);
      |                                              ^
../external/glaze/include/glaze/json/read.hpp:3155:14: note: in instantiation of function template specialization 'glz::read<opts{10, 1, 0, 1, 1, 1, 0, 0, 32, 3, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0}, EtcdReply, std::vector<unsigned char> &, glz::context &>' requested here
 3155 |       return read<opts{}>(value, std::forward<Buffer>(buffer), ctx);
      |              ^
test.cpp:17:21: note: in instantiation of function template specialization 'glz::read_json<EtcdReply, std::vector<unsigned char> &>' requested here
   17 |     auto err = glz::read_json(etcd_reply, body);
      |                     ^
../external/glaze/include/glaze/core/refl.hpp:989:41: note: non-constexpr constructor 'vector' cannot be used in a constant expression
  989 |       std::vector<std::vector<uint8_t>> cols(min_length);
      |                                         ^
../external/glaze/include/glaze/core/refl.hpp:1299:31: note: in call to 'find_unique_index<std::array<std::basic_string_view<char>, 2>>(keys)'
 1299 |       if (const auto uindex = find_unique_index(keys)) {
      |                               ^~~~~~~~~~~~~~~~~~~~~~~
../external/glaze/include/glaze/core/refl.hpp:1474:31: note: in call to 'make_keys_info<2UL>(keys)'
 1474 |    constexpr auto keys_info = make_keys_info(refl<T>.keys);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/debug/vector:181:7: note: declared here
  181 |       vector(size_type __n, const _Allocator& __a = _Allocator())

What was the kicker? Glaze 3.0.0 and newer require containers to be usable in constexpr contexts. In my library, for debug builds, I am using _GLIBCXX_DEBUG to add expensive and ABI-breaking checks, like bounds and precondition checking, to C++’s STL. However, the implementation of this debug mode isn’t constexpr until gcc 14.
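To make the failure mode concrete, here is a minimal sketch of the kind of code that triggers it. The helper `longest_key`, the `SKETCH_CONSTEXPR` macro and the example keys are made up for illustration and are not Glaze's code. The only point is that a function using std::vector as scratch space can be evaluated at compile time only if the vector constructor it calls is constexpr, which, per the error above, the debug-mode vector in gcc 12 is not.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Illustrative macro (an assumption of this sketch): only mark the helper
// constexpr when the standard library advertises a constexpr std::vector.
#if defined(__cpp_lib_constexpr_vector)
#define SKETCH_CONSTEXPR constexpr
#else
#define SKETCH_CONSTEXPR inline
#endif

// Hypothetical helper, not Glaze's code. It uses std::vector as scratch
// space, so compile-time evaluation needs a constexpr vector constructor.
SKETCH_CONSTEXPR std::size_t longest_key(const std::array<std::string_view, 2>& keys) {
    std::vector<std::size_t> lengths(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        lengths[i] = keys[i].size();
    }
    std::size_t best = 0;
    for (std::size_t len : lengths) {
        if (len > best) {
            best = len;
        }
    }
    return best;
}

constexpr std::array<std::string_view, 2> example_keys{"action", "node"};

#if defined(__cpp_lib_constexpr_vector)
// Forces compile-time evaluation. This is the step that breaks when the
// vector constructor being called is not constexpr.
static_assert(longest_key(example_keys) == 6);
#endif
```

Calling `longest_key(example_keys)` at runtime works either way; it is only the compile-time evaluation that depends on the container implementation.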

Now one can say the “ABI-breaking” part of the debug mode is a self-inflicted pain. But the debug mode has helped me immensely in finding bugs and issues of various kinds. It is one feature that I do not want to do without. Unfortunately, the maintainer of glaze has put the onus of supporting this use case solely on the compiler vendors. So I’m left with working around the issue somehow, enforcing a newer compiler requirement on my users or dropping the debug mode support.

spdlog

Spdlog is another great header-only C++ library that I love to use. It relatively recently made a change to the flush function, making it synchronous. On the surface, that looks like an innocent change that improves the library.

However, the change forgot to add the new member variable to the VS2013-specific move constructor and move assignment operator, which broke compilation with VS2013. I’m not trying to call anyone out here, just showing how difficult programming is. But that is not what I initially ran into. I compile Ichor with the -ftls-model=local-exec flag. This is a somewhat esoteric flag, but it essentially boils down to allowing the thread_local keyword only inside the final executable being linked, not in dynamically loaded libraries.
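For readers who have not used thread_local much, here is a small sketch of what the keyword does. The names `calls_on_this_thread`, `record_call` and `calls_seen_by_new_thread` are hypothetical and not taken from Ichor or spdlog. The TLS model chosen at compile time only changes how the compiler generates the access to such a variable; the observable behaviour below stays the same.

```cpp
#include <thread>

// Every thread gets its own independent copy of this variable.
thread_local int calls_on_this_thread = 0;

// Hypothetical helper: counts how often the calling thread has called it.
int record_call() {
    return ++calls_on_this_thread;
}

// Runs `n` calls on a fresh thread and returns that thread's final count.
// The new thread starts from its own zero-initialized copy of the counter.
int calls_seen_by_new_thread(int n) {
    int result = 0;
    std::thread worker([&result, n] {
        for (int i = 0; i < n; ++i) {
            result = record_call();
        }
    });
    worker.join();
    return result;
}
```

If the main thread calls `record_call()` twice and then asks for `calls_seen_by_new_thread(3)`, the new thread reports 3, not 5, because it never sees the main thread's copy.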

The aforementioned change uses the std::promise and std::future classes to implement waiting on the flush request being handled or discarded on the logging thread. The promise::set_value() function ends up calling std::call_once, which by its very nature requires a thread-safe mechanism and by default uses thread-local storage. This results in the following cryptic error when compiling with the local-exec mode:

/usr/bin/ld: /opt/ichor/src/bin/libichor.a(async.cpp.o)(.text+0x266e): unresolvable R_X86_64_TPOFF32 relocation against symbol `_ZSt15__once_callable@@GLIBCXX_3.4.11'

The linker is essentially saying “look, you disabled thread-local storage outside of your executable but you’re trying to use thread-local storage outside of your executable.”
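The promise/future pattern itself is simple. Below is a rough sketch of a synchronous flush under my own assumptions: `flush_and_wait` and its message vector are invented for illustration and do not mirror spdlog's implementation. A worker thread stands in for the logging thread, and the caller blocks on the future until the worker calls set_value().

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a synchronous flush; not spdlog's code.
// Returns the number of messages the worker reported as written.
std::size_t flush_and_wait(std::vector<std::string> pending) {
    std::promise<std::size_t> done;
    std::future<std::size_t> finished = done.get_future();

    // The worker plays the role of the logging thread.
    std::thread worker([p = std::move(done), msgs = std::move(pending)]() mutable {
        std::size_t written = 0;
        for (const auto& msg : msgs) {
            (void)msg;  // a real backend would write the message here
            ++written;
        }
        // As described above, in libstdc++ this call is where
        // std::call_once gets involved.
        p.set_value(written);
    });

    // Blocks until the worker has called set_value().
    const std::size_t written = finished.get();
    worker.join();
    return written;
}
```

From the caller's perspective, `flush_and_wait({"a", "b", "c"})` simply returns once all three messages have been handled.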

libstdc++ does provide a way to disable this and use a mutex instead, but that would require users of my library to have a specially built gcc everywhere they want to use it in combination with spdlog.

But then, after some ~~stackoverflowing~~ ~~googling~~ rigorous research, I discovered that there’s a long-standing bug in gcc with std::call_once: exception support is broken. And fixing this is currently not possible, because it would break ABI and there are probably users relying on said ABI.

So where does the hard part of programming come into effect? Well, I made a pull request to fix this, in which I put a lot of effort into describing why I believe this change was needed and which decisions I made. However, from the resulting conversation, it is, in my eyes at least, apparent that the library maintainer is under time pressure and not able to put in the time and effort required to review it properly. Moreover, only after merging did I find out about another issue: the move assignment function should swap the flush_callback member variable instead. Apparently, there is no test that catches this. And lastly, in THAT resulting PR, there is a basic misunderstanding of how move assignment works.
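To illustrate why hand-written move operations are easy to get wrong, here is a sketch with a made-up `message_sketch` type. Only the member name flush_callback comes from the story above; everything else is an assumption and not spdlog's code. It also includes the kind of tiny regression check that would catch a forgotten member.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical message type; the layout is illustrative only.
struct message_sketch {
    std::string payload;
    std::function<void()> flush_callback;

    message_sketch() = default;
    message_sketch(std::string p, std::function<void()> cb)
        : payload(std::move(p)), flush_callback(std::move(cb)) {}

    // Hand-written move operations must name every member. A member that is
    // added later and forgotten here is silently left behind.
    message_sketch(message_sketch&& other) noexcept
        : payload(std::move(other.payload)),
          flush_callback(std::move(other.flush_callback)) {}

    message_sketch& operator=(message_sketch&& other) noexcept {
        if (this != &other) {
            payload = std::move(other.payload);
            // Swapping hands our previous callback to `other`, which
            // releases it when `other` is destroyed.
            std::swap(flush_callback, other.flush_callback);
        }
        return *this;
    }
};

// A small regression check for the move assignment operator.
bool move_assignment_transfers_callback() {
    int calls = 0;
    message_sketch source("flush", [&calls] { ++calls; });
    message_sketch target;
    target = std::move(source);
    if (!target.flush_callback) {
        return false;  // the callback was dropped during the move
    }
    target.flush_callback();
    return calls == 1 && target.payload == "flush";
}
```

Where possible, defaulting the move operations (`= default`) avoids this class of bug entirely, but that was not an option for compilers like VS2013.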

Improving stuff is hard for all parties involved.

fmt

A third issue I ran into: whenever I compiled Ichor’s Etcd tests in release mode using clang 19.0.0 or gcc 12.4.0 with fmt 11.0.2 and called fmt::format_to deeply nested inside multiple coroutines (jesus, that’s a long sentence), the program would get stuck in a never-ending loop. After some investigation, it seems that function overload resolution by the compiler sometimes failed, setting a function pointer to a completely wrong function that (in this case “luckily”) never ended up growing the capacity of the underlying container. So that is not a security bug AFAIK, but the printing function in fmt implicitly assumes that the grow function always grows the capacity by at least 1. But it didn’t.
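The implicit assumption is easier to see in code. The sketch below uses a made-up `buffer_sketch` class with a pluggable grow function; it is loosely inspired by the description above and is not fmt's buffer implementation. The check after the grow call turns a potential endless loop into an error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical buffer with a pluggable grow function; not fmt's code.
class buffer_sketch {
public:
    using grow_fn = void (*)(buffer_sketch&, std::size_t);

    explicit buffer_sketch(grow_fn grow) : grow_(grow) {}

    void append(std::string_view text) {
        std::size_t i = 0;
        while (i < text.size()) {
            if (size_ == storage_.size()) {
                const std::size_t before = storage_.size();
                grow_(*this, before + 1);
                // Without this check, a grow function that adds no capacity
                // would leave a retry-until-there-is-room loop spinning forever.
                if (storage_.size() <= before) {
                    throw std::length_error("grow() did not add capacity");
                }
            }
            storage_[size_++] = text[i++];
        }
    }

    void resize_storage(std::size_t n) { storage_.resize(n); }
    std::string_view view() const { return {storage_.data(), size_}; }

private:
    std::vector<char> storage_;
    std::size_t size_ = 0;
    grow_fn grow_;
};

// A well-behaved grow function: at least doubles what was asked for.
inline void doubling_grow(buffer_sketch& b, std::size_t wanted) {
    b.resize_storage(wanted * 2);
}

// Models the "completely wrong function": it never adds capacity.
inline void broken_grow(buffer_sketch&, std::size_t) {}
```

With `doubling_grow` the buffer behaves normally; with `broken_grow` the first append throws instead of hanging.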

I went for the stupid workaround: forking fmt and renaming functions by adding a number in their name, so that overload resolution always works correctly.

What do you do when your compiler fails you?

Where does it come from?

After these exchanges, I started asking myself: “How can it be that maintainers of otherwise high-quality libraries mess up?” It is a variant of a question I see floating around in the programming community and in the workplace a lot: “Why is this so hard?” IMO, that question is a pitfall that human thinking falls into regularly. Programmers are known to underestimate the effort projects take.

Let’s list a number of contributing factors in the issues I ran into:

- Different projects use different testing methodologies/standards.
  - Projects often lack either the knowledge or the resources to test all available combinations of deployments.
  - Creating exhaustive tests requires knowing what to test, which is not a given.
- Best practices are not shared/used by enough people to achieve enough penetration and reduce defects. (Or maybe we have too many new developers coming in each year.)
- PR reviews are very time-consuming, and humans tend to optimize for doing as little work as possible.
- Human knowledge deteriorates unless it is constantly being exercised.
- Knowing the ins and outs of C++ is nigh-on impossible for one person, given that it also includes compiler-specific implementations and bugs, or how executables work in operating systems (e.g. the ELF spec).
  - One can say this also applies to other languages like C#, where knowledge of the JIT and MSIL specifications is required to get the best performance, or when dealing with IL rewriting.

Every project has a different collection of knowledge through which it looks at solving problems. Adding people changes this, but no project will deal with the same set of circumstances, knowledge and experiences as any other project.

Given these factors, one could become cynical and determine that inventing everything yourself, creating your own pool of knowledge (and problems), is the way forward.

What could be done?

One of the more recent, and in my opinion very positive, articles on C++ safety is the Compiler Options Hardening Guide for C and C++ by the OpenSSF. Jason Turner has a great resource on C++ best practices, which I can also recommend.

However, I think the underlying issue is that programming just is hard. Period. Improving tooling and reducing fragmentation is the best solution I can come up with. I can’t wait until the Safe C++ proposal gets wide adoption.

It would be nice if there were a single organisation that went through all open-source projects and tried to uplift the quality: set up CI/CD pipelines for a large set of architectures, OSes and compilation flags, and give suggestions on which tests are missing and what possible bugs there are.

What do you think? Can we make programming easier?

This post is licensed under CC BY 4.0 by the author.
