2023-11-27
I happen to think C
is a beautiful language. I also
happen to have deep intrusive thoughts about macros. Recently (a year or
so ago) I came to know about certain compiler extensions that
gcc
provides, namely 1) nested functions and 2) statement
expressions. By themselves, those are rather innocuous constructs:
{/*...*/}
into a value by doing ({/*...*/})
;
and the value that expression takes is the value of the last expression
in the statement (with some rules to account for when that statement
doesn’t have a value etc.)Statement expressions, specially, are used in the Linux Kernel—inside macros—to sanitize inputs and avoid macros evaluating an input more than once, leading to weird side-effects.
Put together though, nothing stops you from writing something like
({
void* some_function(void*) {
/* some crazy stuff */
}
;
some_function})
This fellow is a sort of anonymous function, in that you cannot refer
to that function by name anywhere outside of that statement (you get a
proper undefined reference, which is somewhat surprising considering how
the linker eagerly gives you symbols that someone forgot to mark as
static
). Now, this is rather tame, we need more
#define
s to make this spicy interesting.
Consider the following incantation:
#define λ(Ret, Args, Body) ({Ret λ Args Body λ;})
Which I believe is wonderfully disruptive. It doesn’t even look like C. Anyway, unfortunately I cannot lay claim to this spell. I’ve seen it referenced on Stack Overflow 1
Such lambda constructions have two major limitations: their lifetime and their closure’s lifetime. A nested function may not be called after its containing function has exited—the online docs state “all hell breaks loose”. And if that function refers to a variable that has gone out of scope by the time the function is called, then you’re probably on thin ice.
Anyway, let’s play with them
(int, (int a, int b), {return a + b;})(10, 5); λ
works perfectly fine, for instance (pay no attention to the warning)
/usr/bin/ld: warning: /tmp/ccxTpcUr.o: requires executable stack (because the .note.GNU-stack section is executable)
The linker was written by bigots and regressive old farts.
These little buggers are actually somewhat useful for ad-hoc
comparison functions for functions like qsort
or
bsearch
etc.
An interesting side-effect of lambdas—one that I can actually claim
as my own (as far as I know no one had this combination of bad
ideas all at the same time nope, someone had it before)2, for once—is that you can take a
list of types and turn them into term that (conceptually and very
tenuously) represents the product of those types. Why would you do this?
Well, because it lets us do overloading:
Suppose you can take a couple of expressions x_1
…
x_n
, there are incantations (say,
__typeof__(x_1)
) to get the type of an expression (in an
unevaluated context!) and we can abuse this to write
λ(void, (TYPEOF(x_1),...,TYPEOF(x_n)), {})
which is a
perfectly valid term, and is even callable. Now, to do this
variadically, we need to kneed the preprocessor a bit and make use of
some awful techniques but we’ll get there when it is relevant.
For now, let us call the lambda-expression above something like
TERMINAL(x_1,...,x_n)
because it acts like the terminal
arrow. Don’t worry about it. Anyway, moving on, we use a standard
feature for a change: the recently added _Generic
macro
which takes a value and effectively yields a janky switch statement on
its type:
_Generic((some_var),
int: "int",
char: "char",
: "banana struct"
my_struct_t)
You probably know where this is going:
_Generic((TERMINAL(...)),
void(*)(some, args): overload_some_args,
void(*)(other, args): overload_other_args
)
We can use TERMINAL
’s value to query the collective type
of all the arguments at the same time. There are some technicalities.
Let us go on a tangent:
I’ve come across this cute post by David Mazières 3 that made me realize I had all the things I needed for function overloading with cursed macros. I’ve copied the relevant portions for our purposes but I recommend you give David’s post a read nonetheless.
#define PARENS ()
#define EXPAND(...) EXPAND4(EXPAND4(EXPAND4(EXPAND4(__VA_ARGS__))))
#define EXPAND4(...) EXPAND3(EXPAND3(EXPAND3(EXPAND3(__VA_ARGS__))))
#define EXPAND3(...) EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__
#define FOR_EACH(macro, ...) \
__VA_OPT__(EXPAND(FOR_EACH_HELPER(macro, __VA_ARGS__)))
#define FOR_EACH_HELPER(macro, a1, ...) \
macro(a1) \
__VA_OPT__(FOR_EACH_AGAIN PARENS (macro, __VA_ARGS__))
#define FOR_EACH_AGAIN() FOR_EACH_HELPER
This specific ritual just sets up a recursive macro
FOR_EACH
which takes in the name of a macro function and
applies it to the rest of the arguments you pass it. The crazy
EXPAND
repetition is due to the fact that macro expansion
can’t really go on forever, and this gives you sufficient depth that I
don’t believe we will need to worry about running out. Consider the
following:
#define TYPEOF(x) __typeof__((x)),
Which gives us TYPEOF(1)
→ int,
4. Obviously the comma next to
int
is going to be a problem. Why do we put the comma in to
begin with? Because we want to pass TYPEOF
to the
FOR_EACH
macro, and we want to get a list of types so we
can write TERMINAL
for real. To handle that comma, we can
just conjure up a dummy argument type and put it at the end of the
function.
typedef struct dummy {} dummy;
#define TERMINAL(...) λ(void, (FOR_EACH(TYPEOF, __VA_ARGS__) dummy), {})
so TERMINAL(x,y,z)
is a function pointer of a function
that takes in values of the same types as x
, y
and z
and yields void. Now write
#define CASE(...) void (*)(__VA_ARGS__ dummy)
We are ready: Suppose we have a string_view_t
type which
can be built from a null-terminated string, or from a buffer and a size.
We wish to write a constructor-like operation string_view
that takes some parameters overloads to string_view_str
and
string_view_nstr
for the null-terminated and for
bounded-buffer cases respectively.
#define string_view(...) _Generic(TERMINAL(__VA_ARGS__), \
CASE(char const*): string_view_str, \
CASE(char const*, size_t): string_view_nstr \
)(__VA_ARGS__)
All of this is deeply deeply cursed. As mentioned, lambdas are a gateway and a fast-track to UB and, even worse, functional programming. We like our programming dysfunctional, and this is perfect for the task though, so I will allow it. In terms of language lawyering, I imagine there isn’t anything particularly wrong with this, the lambda is never called, the argument types get synthesized in an unevaluated context, so… it seems fine.
I wouldn’t recommend putting this up for code-review, but then again if you’re at a place that does raw C like this and also has code review you probably have a wonderful job and I envy you.
The biggest problem is that it’s GNU-C specific heavy-magic. Clang doesn’t seem to like my code here, and I can’t blame it.
I have tested it and it works on my machine™ on my compiler™ (gcc (Debian 12.2.0-14) 12.2.0), but I’ve modified it for brevity and exposition, and so I haven’t actually tested to see if I mistyped or made any mistakes. It should be simple to fix.
Title5
User “ouah”’s answer on Stack Overflow implies the trick has been known since at least May 2012, but as it isn’t treated as anything new or remarkable. Probably pointing towards the fact that the old sages were long aware of this cantrip.↩︎
I stand corrected. Cosinus on Stack Overflow did it, apparently, did in May 2023 link.↩︎
David Mazières, June 2021, “Recursive macros with C++20 __VA_OPT__”↩︎
This is a bit wrong, it doesn’t
expand to int
strictly speaking, but
__typof__(1)
is the same type as
int
.↩︎