Some (GNU) C Heavy Wizardry

José Goudet Alvim

2023-11-27

We have λs at Home

I happen to think C is a beautiful language. I also happen to have deep intrusive thoughts about macros. Recently (a year or so ago) I came to know about certain compiler extensions that gcc provides, namely 1) nested functions and 2) statement expressions. By themselves, those are rather innocuous constructs:

  1. Nested functions permit function declarations and implementations to occur within other function’s blocks, wherever a variable declaration would be allowed, and being such that the variables visible at the point of definition remain visible within the nested function’s scope. Calling a nested function after its parent’s scope has exited is ill-advised.
  2. A statement can be turned into an expression (something that evaluates to a value) with the appropriate incantations. Namely this means that you can turn some code block {/*...*/} into a value by doing ({/*...*/}); and the value that expression takes is the value of the last expression in the statement (with some rules to account for when that statement doesn’t have a value etc.)

Statement expressions, specially, are used in the Linux Kernel—inside macros—to sanitize inputs and avoid macros evaluating an input more than once, leading to weird side-effects.

Put together though, nothing stops you from writing something like

({
	void* some_function(void*) {
		/* some crazy stuff */
	}
	some_function;
})

This fellow is a sort of anonymous function, in that you cannot refer to that function by name anywhere outside of that statement (you get a proper undefined reference, which is somewhat surprising considering how the linker eagerly gives you symbols that someone forgot to mark as static). Now, this is rather tame, we need more #defines to make this spicy interesting. Consider the following incantation:

#define λ(Ret, Args, Body) ({Ret λ Args Body λ;})

Which I believe is wonderfully disruptive. It doesn’t even look like C. Anyway, unfortunately I cannot lay claim to this spell. I’ve seen it referenced on Stack Overflow 1

Such lambda constructions have two major limitations: their lifetime and their closure’s lifetime. A nested function may not be called after its containing function has exited—the online docs state “all hell breaks loose”. And if that function refers to a variable that has gone out of scope by the time the function is called, then you’re probably on thin ice.

Anyway, let’s play with them

λ(int, (int a, int b), {return a + b;})(10, 5);

works perfectly fine, for instance (pay no attention to the warning)

/usr/bin/ld: warning: /tmp/ccxTpcUr.o: requires executable stack (because the .note.GNU-stack section is executable)

The linker was written by bigots and regressive old farts.

These little buggers are actually somewhat useful for ad-hoc comparison functions for functions like qsort or bsearch etc.

Overloading

An interesting side-effect of lambdas—one that I can actually claim as my own (as far as I know no one had this combination of bad ideas all at the same time nope, someone had it before)2, for once—is that you can take a list of types and turn them into term that (conceptually and very tenuously) represents the product of those types. Why would you do this? Well, because it lets us do overloading:

Suppose you can take a couple of expressions x_1x_n, there are incantations (say, __typeof__(x_1)) to get the type of an expression (in an unevaluated context!) and we can abuse this to write λ(void, (TYPEOF(x_1),...,TYPEOF(x_n)), {}) which is a perfectly valid term, and is even callable. Now, to do this variadically, we need to kneed the preprocessor a bit and make use of some awful techniques but we’ll get there when it is relevant.

For now, let us call the lambda-expression above something like TERMINAL(x_1,...,x_n) because it acts like the terminal arrow. Don’t worry about it. Anyway, moving on, we use a standard feature for a change: the recently added _Generic macro which takes a value and effectively yields a janky switch statement on its type:

_Generic((some_var), 
	int: "int",
	char: "char",
	my_struct_t: "banana struct"
)

You probably know where this is going:

_Generic((TERMINAL(...)), 
	void(*)(some, args): overload_some_args,
	void(*)(other, args): overload_other_args
)

We can use TERMINAL’s value to query the collective type of all the arguments at the same time. There are some technicalities. Let us go on a tangent:

For Each According to Their Macro

I’ve come across this cute post by David Mazières 3 that made me realize I had all the things I needed for function overloading with cursed macros. I’ve copied the relevant portions for our purposes but I recommend you give David’s post a read nonetheless.

#define PARENS ()

#define  EXPAND(...) EXPAND4(EXPAND4(EXPAND4(EXPAND4(__VA_ARGS__))))
#define EXPAND4(...) EXPAND3(EXPAND3(EXPAND3(EXPAND3(__VA_ARGS__))))
#define EXPAND3(...) EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__

#define FOR_EACH(macro, ...)                                    \
  __VA_OPT__(EXPAND(FOR_EACH_HELPER(macro, __VA_ARGS__)))
#define FOR_EACH_HELPER(macro, a1, ...)                         \
  macro(a1)                                                     \
  __VA_OPT__(FOR_EACH_AGAIN PARENS (macro, __VA_ARGS__))
#define FOR_EACH_AGAIN() FOR_EACH_HELPER

This specific ritual just sets up a recursive macro FOR_EACH which takes in the name of a macro function and applies it to the rest of the arguments you pass it. The crazy EXPAND repetition is due to the fact that macro expansion can’t really go on forever, and this gives you sufficient depth that I don’t believe we will need to worry about running out. Consider the following:

#define TYPEOF(x) __typeof__((x)),

Which gives us TYPEOF(1)int,4. Obviously the comma next to int is going to be a problem. Why do we put the comma in to begin with? Because we want to pass TYPEOF to the FOR_EACH macro, and we want to get a list of types so we can write TERMINAL for real. To handle that comma, we can just conjure up a dummy argument type and put it at the end of the function.

typedef struct dummy {} dummy;
#define TERMINAL(...) λ(void, (FOR_EACH(TYPEOF, __VA_ARGS__) dummy), {})

so TERMINAL(x,y,z) is a function pointer of a function that takes in values of the same types as x, y and z and yields void. Now write

#define CASE(...) void (*)(__VA_ARGS__ dummy)

We are ready: Suppose we have a string_view_t type which can be built from a null-terminated string, or from a buffer and a size. We wish to write a constructor-like operation string_view that takes some parameters overloads to string_view_str and string_view_nstr for the null-terminated and for bounded-buffer cases respectively.

#define string_view(...) _Generic(TERMINAL(__VA_ARGS__), \
    CASE(char const*):         string_view_str,          \
    CASE(char const*, size_t): string_view_nstr          \
)(__VA_ARGS__)

Caveats

All of this is deeply deeply cursed. As mentioned, lambdas are a gateway and a fast-track to UB and, even worse, functional programming. We like our programming dysfunctional, and this is perfect for the task though, so I will allow it. In terms of language lawyering, I imagine there isn’t anything particularly wrong with this, the lambda is never called, the argument types get synthesized in an unevaluated context, so… it seems fine.

I wouldn’t recommend putting this up for code-review, but then again if you’re at a place that does raw C like this and also has code review you probably have a wonderful job and I envy you.

The biggest problem is that it’s GNU-C specific heavy-magic. Clang doesn’t seem to like my code here, and I can’t blame it.

I have tested it and it works on my machine™ on my compiler™ (gcc (Debian 12.2.0-14) 12.2.0), but I’ve modified it for brevity and exposition, and so I haven’t actually tested to see if I mistyped or made any mistakes. It should be simple to fix.

References & Footnotes

Title5


  1. User “ouah”’s answer on Stack Overflow implies the trick has been known since at least May 2012, but as it isn’t treated as anything new or remarkable. Probably pointing towards the fact that the old sages were long aware of this cantrip.↩︎

  2. I stand corrected. Cosinus on Stack Overflow did it, apparently, did in May 2023 link.↩︎

  3. David Mazières, June 2021, “Recursive macros with C++20 __VA_OPT__”↩︎

  4. This is a bit wrong, it doesn’t expand to int strictly speaking, but __typof__(1) is the same type as int.↩︎

  5. Refers to Heavy Wizardry

    ↩︎