Zig extension for Zephyr - better error handling

Apr 27, 2026

One of the things we can improve in the Zig extension for Zephyr RTOS series is how we handle the errors returned by Zephyr syscalls.

Being written in C, Zephyr doesn’t have many options for handling errors. The infamous errno is not used; instead, Zephyr functions usually return a negative number whose absolute value is one of the POSIX error numbers.

Zig, on the other hand, has errors built into the language, so it makes sense that we use them.

Mapping error values

In Zig, one can use error sets to group all the errors a function can return. Take for instance std.posix and note how functions have an error set for their returns: accept has the AcceptError and the access family has AccessError. So this seems an approach we can reuse for Zephyr.

It is also worth mentioning that all these sets include (or better, are merged with) the UnexpectedError set. That error points more to a Zig library issue: it is returned when the library didn’t account for a possible error return from the call being mapped. We should probably have something similar for Zephyr too.

With all that in mind, we can start. Using k_sem_init as our first function, we see that in Zephyr it can return -EINVAL if the parameters are... invalid. Which is fair, but too broad. If we look at the code, we see a nice comment about the accepted values:

/*
 * Limit cannot be zero and count cannot be greater than limit
 */

And what is good about that is that we can check those restrictions on the Zig side! So we can start our error set:

pub const SemInitError = error{
    InvalidLimit,
    InvalidCount,
} || UnexpectedError;

Where UnexpectedError is:

pub const UnexpectedError = error{
    Unexpected,
};

Now, looking at the k_sem_init translation:

pub fn k_sem_init(sem: *struct_k_sem, initial_count: u32, limit: u32) i32 {
    if (comptime CONFIG_USERSPACE == 1) {
        if (z_syscall_trap()) {
            return @bitCast(arch_syscall_invoke3(@intFromPtr(sem), initial_count, limit, K_SYSCALL_K_SEM_INIT));
        }
    }

    compiler_barrier();
    return z_impl_k_sem_init(sem, initial_count, limit);
}

We see that it has two return points: one when running in user space, which goes the syscall way, and the other when running in kernel space, where it calls the implementation directly. We need to encapsulate that so we can simplify dealing with the return value. We could have another function, but then we remember that in Zig a labeled block can produce a value. So we refactor it to:

pub fn k_sem_init(sem: *struct_k_sem, initial_count: u32, limit: u32) SemInitError!void {
    if (limit == 0) return error.InvalidLimit;
    if (initial_count > limit) return error.InvalidCount;

    const ret: i32 = blk: {
        if (comptime CONFIG_USERSPACE == 1) {
            if (z_syscall_trap()) {
                break :blk @bitCast(arch_syscall_invoke3(@intFromPtr(sem), initial_count, limit, K_SYSCALL_K_SEM_INIT));
            }
        }

        compiler_barrier();
        break :blk z_impl_k_sem_init(sem, initial_count, limit);
    };

    switch (ret) {
        SUCCESS => return,
        -EINVAL => unreachable,
        else => |err| return unexpectedErrno(err),
    }
}

Let’s unpack that. First, the new signature returns either void or our error set SemInitError. That’s easy to understand: either we are successful in initialising the semaphore or we get an error.

Then, we do the validity checks in Zig, so that we can return better, finer-grained errors, instead of the “invalid parameter” umbrella that is EINVAL:

if (limit == 0) return error.InvalidLimit;
if (initial_count > limit) return error.InvalidCount;
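
To see those two checks in isolation, here is a minimal sketch that can be run on a host with `zig test`. The helper name checkSemInitParams is hypothetical (the extension does the checks inline in k_sem_init); the error sets are the ones defined earlier.

```zig
const std = @import("std");

// Error sets as defined in the article.
pub const UnexpectedError = error{Unexpected};
pub const SemInitError = error{
    InvalidLimit,
    InvalidCount,
} || UnexpectedError;

// Hypothetical helper: only the parameter validation that the wrapper
// performs before it calls into Zephyr.
pub fn checkSemInitParams(initial_count: u32, limit: u32) SemInitError!void {
    if (limit == 0) return error.InvalidLimit;
    if (initial_count > limit) return error.InvalidCount;
}

test "each invalid parameter gets its own error" {
    try checkSemInitParams(0, 1);
    try std.testing.expectError(error.InvalidLimit, checkSemInitParams(0, 0));
    try std.testing.expectError(error.InvalidCount, checkSemInitParams(2, 1));
}
```

Note that the limit is checked first, so a zero limit is always reported as InvalidLimit, whatever the count is.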

Next, we store the return value in our new ret variable, regardless of whether it came via the syscall or the direct implementation route:

const ret: i32 = blk: {
    if (comptime CONFIG_USERSPACE == 1) {
        if (z_syscall_trap()) {
            break :blk @bitCast(arch_syscall_invoke3(@intFromPtr(sem), initial_count, limit, K_SYSCALL_K_SEM_INIT));
        }
    }

    compiler_barrier();
    break :blk z_impl_k_sem_init(sem, initial_count, limit);
};
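
For readers new to this idiom, the same pattern can be sketched without any Zephyr dependency. The functions viaSyscall and viaDirectCall are made-up stand-ins for the two routes, and their return values are arbitrary.

```zig
const std = @import("std");

// Hypothetical stand-ins for the two routes; names and values are made up.
fn viaSyscall() i32 {
    return 0;
}

fn viaDirectCall() i32 {
    return -22;
}

// Both paths feed a single `ret`, just like the blk in k_sem_init.
pub fn pickRoute(use_syscall: bool) i32 {
    const ret: i32 = blk: {
        if (use_syscall) {
            // Leaves the block early, yielding the syscall result.
            break :blk viaSyscall();
        }
        // Falls through to the direct call otherwise.
        break :blk viaDirectCall();
    };
    return ret;
}
```

Whatever route is taken, the code after the block only has one value to inspect.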

In Zig, we use break with the block’s label to return a value from a block. Finally, we handle the return:

switch (ret) {
    SUCCESS => return,
    -EINVAL => unreachable,
    else => |err| return unexpectedErrno(err),
}

Since we check the parameters beforehand, we can be sure -EINVAL won’t be returned, hence the unreachable. That allows the compiler to optimise things away. Of course, if we were wrong and -EINVAL can still be returned, we land on undefined behaviour in ReleaseFast and ReleaseSmall builds. In Debug and ReleaseSafe builds, however, reaching the unreachable triggers a panic, so the mistake would be caught.

Astute readers may be wondering if SUCCESS is defined in Zephyr at all, to be used here. After all, only the errors are. And they would be right, we do need to define SUCCESS:

pub const SUCCESS = 0;

It seems success is not much ;-)

We’re also missing unexpectedErrno, which handles the case where we failed to properly identify the possible return values:

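// `builtin` exposes the optimisation mode selected for this build.
const builtin = @import("builtin");

pub fn unexpectedErrno(err: i32) UnexpectedError {
    // Log the raw value only in debug builds.
    if (builtin.mode == .Debug) {
        printk("unexpected errno: %d\n", err);
    }

    return error.Unexpected;
}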
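
A host-side sketch of the same helper, in case one wants to try it outside Zephyr. It assumes std.debug.print in place of printk, and mapReturn is a hypothetical caller that treats every non-zero value as unexpected.

```zig
const std = @import("std");
const builtin = @import("builtin");

pub const UnexpectedError = error{Unexpected};

// Same shape as the helper above, but logging through std.debug.print
// (an assumption for host builds) instead of Zephyr's printk.
pub fn unexpectedErrno(err: i32) UnexpectedError {
    if (builtin.mode == .Debug) {
        std.debug.print("unexpected errno: {d}\n", .{err});
    }
    return error.Unexpected;
}

// Hypothetical mapper: anything other than 0 is treated as unexpected.
pub fn mapReturn(ret: i32) UnexpectedError!void {
    switch (ret) {
        0 => return,
        else => |err| return unexpectedErrno(err),
    }
}
```

The helper returns a plain error value rather than an error union, which is why callers write `return unexpectedErrno(err)`.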

Again taking inspiration from the Zig standard library, we log the unexpected value in debug mode. Moving ahead, if we try to build, we’d get something like:

(...)k-ext1/src/main.zig:38:17: error: error union is ignored
    c.k_sem_init(my_sem, 0, 1);

So now we have to handle it. Zig provides a few ways to do that. Here, the errors we could get are not actually possible, since the values we are passing are valid and static. Thus, instead of handling the error, we can assert that it never happens:

c.k_sem_init(my_sem, 0, 1) catch unreachable;
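
For comparison, here is a rough sketch of three common ways to deal with an error union. semInit is a hypothetical stand-in that only performs the parameter checks, so the example runs without Zephyr; the message strings are made up too.

```zig
const std = @import("std");

pub const SemInitError = error{ InvalidLimit, InvalidCount, Unexpected };

// Hypothetical stand-in for the wrapped k_sem_init: only the checks.
fn semInit(initial_count: u32, limit: u32) SemInitError!void {
    if (limit == 0) return error.InvalidLimit;
    if (initial_count > limit) return error.InvalidCount;
}

// 1. Propagate the error to the caller with `try`.
pub fn setup(limit: u32) SemInitError!void {
    try semInit(0, limit);
}

// 2. Handle each error explicitly; the switch has to be exhaustive.
pub fn describe(initial_count: u32, limit: u32) []const u8 {
    semInit(initial_count, limit) catch |err| switch (err) {
        error.InvalidLimit => return "limit must not be zero",
        error.InvalidCount => return "count exceeds limit",
        error.Unexpected => return "unexpected error",
    };
    return "ok";
}

// 3. Assert that the call cannot fail, as done in the article.
pub fn setupStatic() void {
    semInit(0, 1) catch unreachable;
}
```

The third form is only appropriate when the arguments are known to be valid, as they are here.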

We can now move on to the other functions. Most of the work is more of the same: we only need to use our brains to decide on the best error set to map the errors from Zephyr and on how to handle the errors in our extension.

Some interesting cases

If interested, the complete code is available on my Zephyr fork. Here I’ll list some interesting cases I’ve found during this work.

k_sem_take

The case of k_sem_take is another one of “overloading” the error return value. Its documentation states that -EAGAIN will be returned if “Waiting period timed out, or the semaphore was reset during the waiting period”. This time, we can’t check that from Zig, so we have to ride along. In an attempt to be more meaningful, we can define our error set as:

pub const SemTakeError = error{
    Busy,
    TimedOutOrReset,
} || UnexpectedError;

So we can be a bit more explicit about the error, but not by much. The switch that handles the errors is straightforward:

switch (ret) {
    SUCCESS => return,
    -EBUSY => return error.Busy,
    -EAGAIN => return error.TimedOutOrReset,
    else => |err| return unexpectedErrno(err),
}
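
The mapping can be exercised on its own with a small sketch. The errno values below are assumptions that mirror the usual POSIX numbers (in the extension they come from the translated Zephyr headers), and both mapSemTake and shouldRetry are hypothetical helpers.

```zig
const std = @import("std");

// Assumed errno values, mirroring the usual POSIX numbers.
const SUCCESS = 0;
const EAGAIN = 11;
const EBUSY = 16;

pub const UnexpectedError = error{Unexpected};
pub const SemTakeError = error{
    Busy,
    TimedOutOrReset,
} || UnexpectedError;

// Hypothetical helper isolating the switch shown above.
pub fn mapSemTake(ret: i32) SemTakeError!void {
    switch (ret) {
        SUCCESS => return,
        -EBUSY => return error.Busy,
        -EAGAIN => return error.TimedOutOrReset,
        else => return error.Unexpected,
    }
}

// One possible caller-side policy (an assumption, not from the article):
// retry on the documented errors, give up on anything unexpected.
pub fn shouldRetry(err: SemTakeError) bool {
    return switch (err) {
        error.Busy, error.TimedOutOrReset => true,
        error.Unexpected => false,
    };
}
```

Since the error set is closed, the switch in shouldRetry stops compiling if a new error is added and not handled.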

register_subscriber

This is a pretty simple case, but what is interesting is that one has to look into the sample code to figure out what errors can be returned, as there’s no documentation for the sample API - it’s a sample, after all, and not something someone is expected to use.

After some digging, one will learn that:

-EINVAL is returned for invalid channels only;
-ENOENT is returned for invalid subscribers only;
-ENOMEM is returned if the number of subscribers would exceed the limit supported.

So it becomes natural to have the Zig counterpart be like:

pub const RegisterSubscriberError = error{
    InvalidChannel,
    InvalidSubscriber,
    TooManySubscribers,
} || UnexpectedError;

One could go a step further and make the InvalidChannel error impossible by properly translating the enum of channels to Zig, thus making it impossible to pass an invalid channel to start with. Left as an exercise to the reader.

But a simpler thing to check is if the channel is in the valid range (replicating the check done by C in Zig):

if (channel >= CHAN_LAST) return error.InvalidChannel;

So we can make it unreachable when handling the returned errors:

switch (ret) {
    SUCCESS => return,
    -EINVAL => unreachable,
    -ENOENT => return error.InvalidSubscriber,
    -ENOMEM => return error.TooManySubscribers,
    else => |err| return unexpectedErrno(err),
}
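
As one possible take on the enum idea mentioned above, the sketch below uses a Zig enum for the channel parameter. Everything in it is hypothetical: the channel names, the ChannelId type, the errno values and the c_register_subscriber stub that stands in for the C function.

```zig
const std = @import("std");

// Hypothetical channel list; the real names live in the sample's C enum.
pub const ChannelId = enum(u32) {
    sensor = 0,
    button = 1,
};

// InvalidChannel is gone: the type system rules it out.
pub const RegisterSubscriberError = error{
    InvalidSubscriber,
    TooManySubscribers,
    Unexpected,
};

// Assumed errno values, mirroring the usual POSIX numbers.
const ENOENT = 2;
const ENOMEM = 12;
const EINVAL = 22;

// Hypothetical stub for the C function: rejects out-of-range channels.
fn c_register_subscriber(channel: u32) i32 {
    if (channel >= 2) return -EINVAL;
    return 0;
}

pub fn registerSubscriber(channel: ChannelId) RegisterSubscriberError!void {
    const ret = c_register_subscriber(@intFromEnum(channel));
    switch (ret) {
        0 => return,
        -EINVAL => unreachable, // every ChannelId tag is a valid channel
        -ENOENT => return error.InvalidSubscriber,
        -ENOMEM => return error.TooManySubscribers,
        else => return error.Unexpected,
    }
}
```

With this shape the range check disappears from the Zig side, at the cost of keeping the Zig enum in sync with the C one.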

receive

For receive, we also have -EINVAL being returned for invalid channels, and we know how to handle that. It may also return -EINVAL if data is NULL, but in Zig we don’t have null pointers: only optional ones. So removing the optional from the signature should take care of that:

pub fn receive(channel: Channels, data: *anyopaque, data_len: usize) ReceiveError!void {

Finally, -EINVAL can be returned if the buffer used to receive the data is too small. Once all the parameter checks succeed, the function returns the result of zbus_chan_read. Here, the only interesting bit is that -EFAULT can only be returned if CONFIG_ZBUS_ASSERT_MOCK is enabled, which is not the case in the EDK sample app. So the switch will look like:

switch (ret) {
    SUCCESS => return,
    -EINVAL => return error.ReceivingBufferTooSmall,
    -EBUSY => return error.BusyChannel,
    -EAGAIN => return error.TimedOut,
    -EFAULT => unreachable,
    else => |err| return unexpectedErrno(err),
}

With ReceiveError defined as:

pub const ReceiveError = error{
    InvalidChannel,
    ReceivingBufferTooSmall,
    BusyChannel,
    TimedOut,
} || UnexpectedError;
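
The difference between an optional and a plain pointer parameter can be sketched as follows. Both functions and the NullData error are hypothetical and only illustrate why dropping the optional removes the NULL case.

```zig
const std = @import("std");

// Hypothetical error, only needed by the optional-pointer variant.
pub const NullDataError = error{NullData};

// With an optional pointer, null is a legal argument and must be checked.
pub fn receiveOptional(data: ?*u32) NullDataError!void {
    const ptr = data orelse return error.NullData;
    ptr.* = 42;
}

// With a plain pointer, passing null is rejected at compile time, so the
// NULL flavour of -EINVAL cannot occur and needs no error.
pub fn receiveRequired(data: *u32) void {
    data.* = 42;
}
```

Calling receiveRequired(null) does not compile, which is the guarantee the new receive signature relies on.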

gpio_pin_configure

This one is pretty straightforward, but the issue is that we actually use gpio_pin_configure_dt, which got an automatic translation. So after we take care of gpio_pin_configure, we need to “re-translate” gpio_pin_configure_dt:

pub fn gpio_pin_configure_dt(spec: *const struct_gpio_dt_spec, extra_flags: gpio_flags_t) GPIOPinConfigureError!void {
    return gpio_pin_configure(spec.*.port, spec.*.pin, spec.*.dt_flags | extra_flags);
}

After that, we need to replace the original translation with the new one. That calls for another quick’n’dirty hack in our build.sh:

DUPLICATE_TRANSLATED_FUNCTIONS=$(grep -Pro "(?<=pub fn )(\w+)" "$MANUAL_IMPORTS")

for function in $DUPLICATE_TRANSLATED_FUNCTIONS ; do
    sed -ri "/pub fn $function\>.*$/,/^}/d" "$TRANSLATED_FILE"
done

gpio_port_toggle_bits

This is the same case as before, in which the application actually calls a translated function, except that we need to take it a bit deeper: after we handle gpio_port_toggle_bits, we have to take care of gpio_pin_toggle_dt and gpio_pin_toggle.

But when taking care of gpio_pin_toggle, we face a small problem: Zig doesn’t allow using a u8 to shift a u32 value, only a u5. So we have to truncate the pin value:

pub fn gpio_pin_toggle(port: *const struct_device, pin: gpio_pin_t) GPIOPortToggleBitsError!void {
    return gpio_port_toggle_bits(port, @as(u32, 1) << @as(u5, @truncate(pin)));
}

(If someone ever gets to a pin number that doesn’t fit in a u5, Zephyr would have had problems long before.)
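
To make the effect of the truncation visible, here is a small sketch with two hypothetical helpers: pinMask mirrors the @truncate approach, while pinMaskChecked is an alternative (not what the extension does) that reports out-of-range pins.

```zig
const std = @import("std");

// Hypothetical helper: build a single-bit mask from a u8 pin number.
pub fn pinMask(pin: u8) u32 {
    // The shift amount for a u32 must be a u5; @truncate keeps the low 5 bits.
    const shift: u5 = @truncate(pin);
    return @as(u32, 1) << shift;
}

// Alternative sketch: reject pins that do not fit instead of wrapping.
pub fn pinMaskChecked(pin: u8) error{InvalidPin}!u32 {
    if (pin >= 32) return error.InvalidPin;
    const shift: u5 = @intCast(pin);
    return @as(u32, 1) << shift;
}
```

With @truncate, pin 33 silently becomes pin 1, which is harmless only as long as pin numbers stay below 32.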

Is there a catch?

Not really. The size of the final executable went from 4.0K to 4.2K. I didn’t compare the assembly files, but we now do some checks on the Zig side that were not done before, so it’s the traditional space vs speed trade-off: with the checks in Zig, a failing check returns early and saves us the call into Zephyr. If the checks pass, though, one could argue that we’re wasting cycles, as Zephyr will do those checks again. On the third side, we have better error messages in Zig, so I guess it’s a win!?

Naturally, these compromises can be decided on a case-by-case basis. However, being able to write the extensions as true Zig - for some definition of true - instead of programming in C using Zig syntax, is definitely a win.

Again, the complete code is available on my Zephyr fork.

tags: zig, zephyr, zig-zephyr-series