Writing a Zig extension for Zephyr - Part IV

Oct 12, 2025

In the previous installment, we rewrote the kernel space extension from the Zephyr EDK sample in Zig. In this installment, the last one in this series, we’ll rewrite one of the userspace extensions!

We’ll look at ext1. Its code (at samples/subsys/llext/edk/ext1/src/main.c) is fairly straightforward: it simply sets up the subscription and then loops, waiting for events. So let’s dive in!

Hello world from userspace

As with the kernel space extension, let’s start simple. Borrowing from the kernel extension, we can write a simple “hello world!” one:

const c = @cImport({
    @cInclude("autoconf.h");
    @cInclude("zephyr/kernel.h");
    @cInclude("app_api.h");
});

pub fn start() callconv(.c) c_int {
    c.printk("[zig][ext1]Hello world!\n");

    return 0;
}

const StartSym = extern struct {
    name: [*:0]const u8,
    addr: *const fn() callconv(.c) c_int,
};

export const start_sym: StartSym linksection(".exported_sym") = .{
    .name = "start",
    .addr = start,
};

And we can also borrow the build.sh used in the kernel extension (don’t forget to use ext1 instead of kext1):

#!/bin/bash

set -ex

LLEXT_ALL_INCLUDE_CFLAGS="-I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/include/generated/zephyr -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/include -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/include/generated -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/soc/renode/cortex_r8_virtual -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/lib/libc/common/include -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/soc/renode/cortex_r8_virtual/. -I${LLEXT_EDK_INSTALL_DIR}/include/modules/hal/cmsis/CMSIS/Core_R/Include -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/modules/cmsis/. -I${LLEXT_EDK_INSTALL_DIR}/include/modules/hal/ti/mspm0/source/ti/devices/msp/. -I${LLEXT_EDK_INSTALL_DIR}/include/modules/hal/ti/mspm0/source/ti/devices/msp/m0p -I${LLEXT_EDK_INSTALL_DIR}/include/modules/hal/ti/mspm0/source/ti/devices/msp/peripherals -I${LLEXT_EDK_INSTALL_DIR}/include/modules/hal/ti/mspm0/source/ti/devices/msp/peripherals/m0p -I${LLEXT_EDK_INSTALL_DIR}/include/modules/hal/ti/mspm0/source/ti/devices/msp/peripherals/m0p/sysctl -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/build/modules/picolibc/picolibc/include -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/build/modules/picolibc/picolibc/include -I${LLEXT_EDK_INSTALL_DIR}/include/sdk/zephyr-sdk-0.17.2/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/include -I${LLEXT_EDK_INSTALL_DIR}/include/sdk/zephyr-sdk-0.17.2/arm-zephyr-eabi/bin/../lib/gcc/arm-zephyr-eabi/12.2.0/include-fixed -I${LLEXT_EDK_INSTALL_DIR}/include/zephyr/samples/subsys/llext/edk/app/include"

zig build-obj -target thumb-freestanding-eabi $LLEXT_ALL_INCLUDE_CFLAGS -OReleaseSmall ../src/main.zig

arm-none-eabi-objcopy --remove-section .ARM.exidx main.o ext1.llext
xxd -ip ext1.llext ext1.inc

We can also place this script in a zigbuild directory inside the extension’s own directory, samples/subsys/llext/edk/ext1. From there, we can build our extension:

 $ ./build.sh

And it builds! Can we run it?

As before, let’s edit samples/subsys/llext/edk/app/src/main.c so that it uses our Zig extension instead of the C one:

// (...)
#ifndef EDK_BUILD
#include "../../ext1/zigbuild/ext1.inc"
#define ext1_inc ext1_llext
#define ext1_len ext1_llext_len
// (...)

When we run it with west build -t run, we can see this in the log:

(...)
12:34:12.5935 [INFO] uart0: [host: 0.72s (+2.47ms)|virt:  41.3ms (+3.4ms)] [app]Thread 0x23a38 created to run extension [ext1], at userspace.
12:34:12.5938 [INFO] uart0: [host:  0.72s (+0.3ms)|virt:  41.7ms (+0.4ms)] [zig][ext1]Hello world!
12:34:12.5942 [INFO] uart0: [host: 0.72s (+0.36ms)|virt:  42.3ms (+0.6ms)] [app]Thread 0x23a38 done
(...)

It just works! Seriously? What about the syscall stuff?

Adding more features

Let’s try to write the k_object_alloc line. Borrowing (and adjusting) from our previous Zig extension:

// (...)
const obj: *anyopaque = c.z_impl_k_object_alloc(c.K_OBJ_EVENT) orelse {
    c.printk("[zig][ext1]z_impl_k_object_alloc failed!\n");
    return 1;
};
const tick_evt: [*c]c.k_event = @ptrCast(@alignCast(obj));
c.printk("[zig][ext1]Got tick_evt %p\n", tick_evt);
// (...)

After building and running it, we see things are not that simple:

(...)
12:40:33.0902 [INFO] uart0: [host:  0.7s (+1.84ms)|virt:  42.2ms (+3.4ms)] [app]Thread 0x23ad8 created to run extension [ext1], at userspace.
12:40:33.0908 [INFO] uart0: [host:  0.7s (+0.54ms)|virt:  42.7ms (+0.3ms)] E: ***** DATA ABORT *****
12:40:33.0913 [INFO] uart0: [host:   0.7s (+0.6ms)|virt:  43.2ms (+0.5ms)] E: Permission Fault @ 0x000242c8
12:40:33.0918 [INFO] uart0: [host: 0.71s (+0.51ms)|virt:  43.9ms (+0.7ms)] E: r0/a1:  0x00000004  r1/a2:  0x00000018  r2/a3:  0x00009d6d
12:40:33.0924 [INFO] uart0: [host: 0.71s (+0.56ms)|virt:    44.9ms (+1ms)] E: r3/a4:  0x000242c8 r12/ip:  0x0000a027 r14/lr:  0x00007bd1
12:40:33.0926 [INFO] uart0: [host: 0.71s (+0.25ms)|virt:  45.4ms (+0.5ms)] E:  xpsr:  0x20000130
12:40:33.0933 [INFO] uart0: [host: 0.71s (+0.71ms)|virt:    46.4ms (+1ms)] E: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
12:40:33.0940 [INFO] uart0: [host: 0.71s (+0.63ms)|virt:    47.4ms (+1ms)] E: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x00000000  s[ 7]:  0x00000000
12:40:33.0947 [INFO] uart0: [host: 0.71s (+0.65ms)|virt:    48.4ms (+1ms)] E: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
12:40:33.0955 [INFO] uart0: [host: 0.71s (+0.81ms)|virt:  50.2ms (+1.8ms)] E: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0xffffffff  s[15]:  0xffffffff
12:40:33.0956 [INFO] uart0: [host: 0.71s (+0.18ms)|virt:  50.5ms (+0.3ms)] E: fpscr:  0x00000000
12:40:33.0961 [INFO] uart0: [host: 0.71s (+0.41ms)|virt:  51.1ms (+0.6ms)] E: Faulting instruction address (r15/pc): 0x0000af26
12:40:33.0969 [INFO] uart0: [host: 0.71s (+0.83ms)|virt:  51.7ms (+0.6ms)] E: >>> ZEPHYR FATAL ERROR 48: Unknown error on CPU 0
12:40:33.0973 [INFO] uart0: [host: 0.71s (+0.45ms)|virt:  52.2ms (+0.5ms)] E: Current thread: 0x23ad8 (unknown)
(...)

We got a “permission fault”. That’s not unexpected, though: we’re not in kernel space, so we can’t access data there. Simply calling the implementation of k_object_alloc won’t cut it now.

Trapping to the kernel

Taking a look at what Zephyr does during a syscall, we see the following code [1]:

__pinned_func
static inline void * k_object_alloc(enum k_objects otype)
{
#ifdef CONFIG_USERSPACE
    if (z_syscall_trap()) {
        union { uintptr_t x; enum k_objects val; } parm0 = { .val = otype };
        return (void *) arch_syscall_invoke1(parm0.x, K_SYSCALL_K_OBJECT_ALLOC);
    }
#endif
    compiler_barrier();
    return z_impl_k_object_alloc(otype);
}

As we are now in userspace, the code inside the #ifdef CONFIG_USERSPACE block is the one that matters. The if (z_syscall_trap()) checks whether the current thread runs in kernel space or in userspace. If the latter, it uses arch_syscall_invoke1 to invoke the K_SYSCALL_K_OBJECT_ALLOC syscall.

So that’s what we need to reproduce in our Zig extension. We don’t need to care about the ifdef or verifying if we are running in userspace: our extension certainly is.
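
To make the flow concrete, here is a host-side model of that stub logic. It is only a sketch: the syscall IDs, the handler table and the names fakeSyscallInvoke1, implObjectAlloc, implEventInit and user_mode are all hypothetical stand-ins, and a plain function call takes the place of the svc instruction, so nothing here traps to a real kernel.

```zig
const std = @import("std");

// Hypothetical syscall IDs; the real K_SYSCALL_* values are generated by Zephyr's build.
const SyscallId = enum(usize) { object_alloc = 0, event_init = 1 };

// "Kernel side" implementations that userspace must not call directly.
fn implObjectAlloc(otype: usize) usize {
    _ = otype;
    return 0x2b060; // pretend address of the freshly allocated object
}

fn implEventInit(event_addr: usize) usize {
    _ = event_addr;
    return 0;
}

// Table indexed by syscall ID, standing in for what a trap handler would consult.
const Handler = *const fn (usize) usize;
const handlers = [_]Handler{ &implObjectAlloc, &implEventInit };

// Stand-in for arch_syscall_invoke1: on the target this is where `svc` happens.
fn fakeSyscallInvoke1(arg1: usize, call_id: SyscallId) usize {
    return handlers[@intFromEnum(call_id)](arg1);
}

// Stand-in for z_syscall_trap(): true when the caller runs in user mode.
var user_mode: bool = true;

fn objectAlloc(otype: usize) usize {
    if (user_mode) return fakeSyscallInvoke1(otype, .object_alloc);
    return implObjectAlloc(otype); // kernel-mode callers skip the trap
}

pub fn main() void {
    std.debug.print("0x{x}\n", .{objectAlloc(4)}); // prints 0x2b060
}
```

Both paths end up in the same implementation; the difference on real hardware is that only the trap path switches privilege level, which is why calling z_impl_k_object_alloc directly from userspace faults.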

An initial implementation for k_object_alloc may look like:

pub fn k_object_alloc(arg_otype: c.k_objects) ?*anyopaque {
    const ret: usize = arch_syscall_invoke1(arg_otype, c.K_SYSCALL_K_OBJECT_ALLOC);
    return @ptrFromInt(ret);
}

For the arch_syscall_invoke1 implementation, we can look at how Zephyr does it (code is at <zephyr-base>/include/zephyr/arch/arm/syscall.h):

static inline uintptr_t arch_syscall_invoke1(uintptr_t arg1,
                         uintptr_t call_id)
{
    register uint32_t ret __asm__("r0") = arg1;
    register uint32_t r6 __asm__("r6") = call_id;

    __asm__ volatile("svc %[svid]\n"
             IF_ENABLED(CONFIG_ARM_BTI, ("bti\n"))
             : "=r"(ret)
             : [svid] "i" (_SVC_CALL_SYSTEM_CALL),
               "r" (ret), "r" (r6)
             : "r8", "memory", "r1", "r2", "r3", "ip");
    return ret;
}

The equivalent Zig code for this looks like the following (let’s just ignore the ARM BTI for now):

pub fn arch_syscall_invoke1(arg1: usize, call_id: usize) usize {
    return asm volatile ("svc %[svid]\n"
        : [ret] "={r0}" (-> usize),
        : [svid] "i" (c._SVC_CALL_SYSTEM_CALL),
          [call_id] "{r6}" (call_id),
          [arg1] "{r0}" (arg1),
        : .{ .r8 = true, .memory = true, .r1 = true, .r2 = true, .r3 = true, .r12 = true }
    );
}

Note that the ip register is spelled r12 in Zig, which doesn’t seem to recognize the alias. We can now use our local k_object_alloc implementation:

// (...)
const obj: *anyopaque = k_object_alloc(c.K_OBJ_EVENT) orelse {
    c.printk("[zig][ext1]z_impl_k_object_alloc failed!\n");
    return 1;
};
const tick_evt: [*c]c.k_event = @ptrCast(@alignCast(obj));
c.printk("[zig][ext1]Got tick_evt %p\n", tick_evt);
// (...)

And if we build and run, we can see:

17:34:51.2161 [INFO] uart0: [host: 0.71s (+1.95ms)|virt:  41.6ms (+3.4ms)] [app]Thread 0x23ab8 created to run extension [ext1], at userspace.
17:34:51.2173 [INFO] uart0: [host: 0.71s (+0.34ms)|virt:    43ms (+0.6ms)] [zig][ext1]Got tick_evt 0x2b060
17:34:51.2177 [INFO] uart0: [host: 0.71s (+0.74ms)|virt:  43.6ms (+0.6ms)] [app]Thread 0x23ab8 done

It worked! Now we just need our own implementations of the other syscalls, namely k_event_init, k_event_wait, register_subscriber and receive, plus any arch_syscall_invokeX variants they need. Let’s keep toiling!

Churning syscalls out

We’ll write the arch_syscall_invokeX functions first. We’ll need the two-, three- and five-parameter variants:

pub fn arch_syscall_invoke2(arg1: usize, arg2: usize, call_id: usize) usize {
    return asm volatile ("svc %[svid]\n"
        : [ret] "={r0}" (-> usize),
        : [svid] "i" (c._SVC_CALL_SYSTEM_CALL),
          [call_id] "{r6}" (call_id),
          [arg1] "{r0}" (arg1),
          [arg2] "{r1}" (arg2),
        : .{ .r8 = true, .memory = true, .r2 = true, .r3 = true, .r12 = true }
    );
}

pub fn arch_syscall_invoke3(arg1: usize, arg2: usize, arg3: usize, call_id: usize) usize {
    return asm volatile ("svc %[svid]\n"
        : [ret] "={r0}" (-> usize),
        : [svid] "i" (c._SVC_CALL_SYSTEM_CALL),
          [call_id] "{r6}" (call_id),
          [arg1] "{r0}" (arg1),
          [arg2] "{r1}" (arg2),
          [arg3] "{r2}" (arg3),
        : .{ .r8 = true, .memory = true, .r3 = true, .r12 = true }
    );
}

pub fn arch_syscall_invoke5(arg1: usize, arg2: usize, arg3: usize, arg4: usize, arg5: usize, call_id: usize) usize {
    return asm volatile ("svc %[svid]\n"
        : [ret] "={r0}" (-> usize),
        : [svid] "i" (c._SVC_CALL_SYSTEM_CALL),
          [call_id] "{r6}" (call_id),
          [arg1] "{r0}" (arg1),
          [arg2] "{r1}" (arg2),
          [arg3] "{r2}" (arg3),
          [arg4] "{r3}" (arg4),
          [arg5] "{r4}" (arg5),
        : .{ .r8 = true, .memory = true, .r12 = true }
    );
}

And for all needed syscalls:

pub fn register_subscriber(arg_channel: c.Channels, arg_evt: *c.k_event) i32 {
    return @bitCast(arch_syscall_invoke2(arg_channel, @intFromPtr(arg_evt), c.K_SYSCALL_REGISTER_SUBSCRIBER));
}

pub fn k_event_init(arg_event: *c.k_event) void {
    _ = arch_syscall_invoke1(@intFromPtr(arg_event), c.K_SYSCALL_K_EVENT_INIT);
}

pub fn k_event_wait(arg_event: *c.k_event, arg_events: u32, arg_reset: bool, arg_timeout: c.k_timeout_t) u32 {
    const ticks: u64 = @bitCast(arg_timeout.ticks);
    const hi: u32 = @intCast(ticks >> 32);
    const lo: u32 = @intCast(ticks & 0xffffffff);

    return arch_syscall_invoke5(@intFromPtr(arg_event), arg_events, @intFromBool(arg_reset), lo, hi, c.K_SYSCALL_K_EVENT_WAIT);
}

pub fn receive(arg_channel: c.Channels, arg_data: ?*anyopaque, arg_data_len: usize) i32 {
    return @bitCast(arch_syscall_invoke3(arg_channel, @intFromPtr(arg_data), arg_data_len, c.K_SYSCALL_RECEIVE));
}
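
The k_event_wait wrapper is the only one that passes a value wider than a register: it assumes the timeout's tick count is 64 bits wide and hands it over as two 32-bit words, low word first. The splitting can be tried on a host with a small sketch like this one. The Words struct and the splitTicks helper are hypothetical names, and @truncate is used here where the wrapper uses a mask and @intCast; both give the same halves.

```zig
const std = @import("std");

// Two register-sized halves of a 64-bit value (hypothetical helper type).
const Words = struct { lo: u32, hi: u32 };

// Split a signed 64-bit tick count into low and high 32-bit words.
fn splitTicks(ticks: i64) Words {
    const bits: u64 = @bitCast(ticks); // keep the bit pattern, drop the sign
    return .{
        .lo = @truncate(bits), // low 32 bits
        .hi = @truncate(bits >> 32), // high 32 bits
    };
}

pub fn main() void {
    // -1 is used here as an assumed "wait forever" tick count.
    const forever = splitTicks(-1);
    std.debug.print("lo=0x{x} hi=0x{x}\n", .{ forever.lo, forever.hi }); // lo=0xffffffff hi=0xffffffff

    const w = splitTicks(0x1_0000_0002);
    std.debug.print("lo={d} hi={d}\n", .{ w.lo, w.hi }); // lo=2 hi=1
}
```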

That was hard work! A hint here is that the prototypes for these functions are available in the auto-generated cimport.zig, but with callconv, extern and extra enum_ or struct_ prefixes in them. Note that sometimes we need @bitCast, as this is the most C-like cast in Zig: it simply reinterprets the bits as the other type. @intCast, on the other hand, is “safe”, in the sense that the value being cast must be representable in the target type. Nice, but when interacting with C, we need to do as C does.
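
The difference between the two casts is easy to see on a host. In the sketch below, u32 stands in for the 32-bit target's usize (an assumption made so the example also builds on 64-bit machines), reinterpret and fitsInU32 are hypothetical helper names, and std.math.cast is used as the non-panicking sibling of @intCast to show when a value-preserving conversion is possible.

```zig
const std = @import("std");

// Reinterpret a raw 32-bit return value as signed, the way a C cast would.
fn reinterpret(raw: u32) i32 {
    return @bitCast(raw);
}

// Value-preserving conversion: succeeds only if v is representable as u32.
// @intCast has the same requirement, but violating it is illegal behavior
// (a panic in safe build modes) instead of a null result.
fn fitsInU32(v: i32) bool {
    return std.math.cast(u32, v) != null;
}

pub fn main() void {
    const raw: u32 = 0xffff_fff2; // bit pattern of -14 in two's complement
    std.debug.print("{d}\n", .{reinterpret(raw)}); // prints -14

    std.debug.print("{}\n", .{fitsInU32(14)}); // prints true
    std.debug.print("{}\n", .{fitsInU32(-14)}); // prints false
}
```

A negative error code coming back from a syscall is exactly the case where the bits must be kept and the value must be allowed to change, which is why the wrappers reach for @bitCast.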

Using the syscalls

We can finish writing our start function. Without the z_impl_ bits, it even looks a bit better than our kernel extension:

// (...)
k_event_init(tick_evt);

_ = register_subscriber(c.CHAN_TICK, tick_evt);

while (true) {
    var l: usize = undefined;

    c.printk("[zig][ext1]Waiting event\n");
    _ = k_event_wait(tick_evt, c.CHAN_TICK, true, K_FOREVER);

    c.printk("[zig][ext1]Got event, reading channel\n");
    _ = receive(c.CHAN_TICK, &l, @sizeOf(@TypeOf(l)));
    c.printk("[zig][ext1]Read val: %ld\n", l);
}
// (...)

Don’t forget to define K_FOREVER as we did for the kernel extension! When we build and run, we get this in the log:

(...)
17:58:43.1516 [INFO] uart0: [host: 0.74s (+0.62ms)|virt:  64.1ms (+0.8ms)] [zig][ext1]Got event, reading channel
17:58:43.1523 [INFO] uart0: [host: 0.74s (+0.68ms)|virt:  64.6ms (+0.5ms)] [zig][ext1]Read val: 0
17:58:43.1535 [INFO] uart0: [host: 0.74s (+1.14ms)|virt:  65.1ms (+0.5ms)] [zig][ext1]Waiting event
(...)
17:58:44.0134 [INFO] uart0: [host:  1.6s (+0.57ms)|virt:   1.07s (+0.9ms)] [zig][ext1]Got event, reading channel
17:58:44.0137 [INFO] uart0: [host:  1.6s (+0.28ms)|virt:   1.07s (+0.5ms)] [zig][ext1]Read val: 1
17:58:44.0139 [INFO] uart0: [host:  1.6s (+0.25ms)|virt:   1.07s (+0.4ms)] [zig][ext1]Waiting event
(...)
17:58:45.0131 [INFO] uart0: [host:  2.6s (+0.37ms)|virt:   2.07s (+0.9ms)] [zig][ext1]Got event, reading channel
17:58:45.0136 [INFO] uart0: [host:  2.6s (+0.41ms)|virt:   2.07s (+0.5ms)] [zig][ext1]Read val: 2
17:58:45.0142 [INFO] uart0: [host:   2.6s (+0.6ms)|virt:   2.07s (+0.4ms)] [zig][ext1]Waiting event
(...)
17:58:46.0144 [INFO] uart0: [host:  3.6s (+0.89ms)|virt:   3.07s (+0.9ms)] [zig][ext1]Got event, reading channel
17:58:46.0148 [INFO] uart0: [host:   3.6s (+0.5ms)|virt:   3.07s (+0.5ms)] [zig][ext1]Read val: 3
17:58:46.0151 [INFO] uart0: [host:  3.61s (+0.4ms)|virt:   3.07s (+0.4ms)] [zig][ext1]Waiting event
(...)
17:58:47.0154 [INFO] uart0: [host: 4.61s (+0.89ms)|virt:   4.07s (+0.9ms)] [zig][ext1]Got event, reading channel
17:58:47.0160 [INFO] uart0: [host:  4.61s (+0.5ms)|virt:   4.07s (+0.5ms)] [zig][ext1]Read val: 4
17:58:47.0164 [INFO] uart0: [host: 4.61s (+0.48ms)|virt:   4.07s (+0.4ms)] [zig][ext1]Waiting event
(...)

OMG!! It works!

Ziggy

Everything worked, but this is still just using the C API as is. Is there any opportunity to have things a bit more Zig-like, Zigonic, Ziggy? It turns out there is at least one low-hanging fruit.

Let’s look at k_object_alloc again. Using it is a bit cumbersome:

// (...)
const obj: *anyopaque = k_object_alloc(c.K_OBJ_EVENT) orelse {
    c.printk("[zig][ext1]z_impl_k_object_alloc failed!\n");
    return 1;
};
const tick_evt: [*c]c.k_event = @ptrCast(@alignCast(obj));
// (...)

We get this ?*anyopaque that we need to unwrap and then cast to our desired type. This needs to be done at every call site. Here there’s only one, but still. If Zig had some generic mechanism, we could have the cast done inside k_object_alloc. It turns out Zig does have generics.

We can rewrite it as follows:

pub fn k_object_alloc(arg_otype: c.k_objects, comptime T: type) !*T {
    const ret: usize = arch_syscall_invoke1(arg_otype, c.K_SYSCALL_K_OBJECT_ALLOC);
    return if (ret == 0) error.OutOfMemory else @ptrFromInt(ret);
}

Now we ask for a type, and return a pointer to this type. And instead of using an optional, we simply return an error if the allocation failed. After all, a failed allocation is a... failure.

The callsite can be simplified a bit:

// (...)
const tick_evt: *c.k_event = k_object_alloc(c.K_OBJ_EVENT, c.k_event) catch {
    c.printk("[zig][ext1]z_impl_k_object_alloc failed!\n");
    return 1;
};
// (...)

We now use catch to handle the error instead of orelse, and we get our type directly. While at it, we changed the type from a C pointer to a Zig pointer, which can’t be null. Much better!

One can also go a step further: instead of using the C enum to define the type of the allocation, create a map in Zig from the types to the C enum, and make k_object_alloc get a single parameter, the type (without the enum). This is left as an exercise to the reader =D
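
For readers who want a starting point, one possible shape for that exercise is sketched below. It runs on a host, so everything C-related is a stand-in: KObjects, k_event, k_sem, the static storage and fakeObjectAllocSyscall are hypothetical replacements for the cimport declarations and for the real syscall, and kObjTypeOf is an invented helper name.

```zig
const std = @import("std");

// Stand-ins for the C declarations; in the extension these would come from
// the cimport (c.k_objects, c.k_event, ...).
const KObjects = enum(c_uint) { event, sem, mutex };
const k_event = extern struct { dummy: u32 };
const k_sem = extern struct { dummy: u32 };

// Map a Zig type to its kernel object type at compile time.
fn kObjTypeOf(comptime T: type) KObjects {
    return switch (T) {
        k_event => .event,
        k_sem => .sem,
        else => @compileError("no kernel object type for " ++ @typeName(T)),
    };
}

// Fake syscall: hands out static storage instead of trapping to a kernel.
var event_storage: k_event = .{ .dummy = 0 };
var sem_storage: k_sem = .{ .dummy = 0 };

fn fakeObjectAllocSyscall(otype: KObjects) usize {
    return switch (otype) {
        .event => @intFromPtr(&event_storage),
        .sem => @intFromPtr(&sem_storage),
        else => 0, // pretend the allocation failed
    };
}

// Single-parameter variant: the type alone selects the object kind.
fn k_object_alloc(comptime T: type) !*T {
    const ret = fakeObjectAllocSyscall(kObjTypeOf(T));
    if (ret == 0) return error.OutOfMemory;
    const ptr: *T = @ptrFromInt(ret);
    return ptr;
}

pub fn main() !void {
    const evt = try k_object_alloc(k_event);
    evt.dummy = 42;
    std.debug.print("{d}\n", .{event_storage.dummy}); // prints 42
}
```

Asking for a type with no mapping, say k_object_alloc(u8), is rejected at compile time by the @compileError branch, so a mismatch between the enum and the pointer type can no longer reach the kernel.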

Where to now?

This concludes this series: we were able to write not one, but two Zephyr extensions in Zig. And one of them even runs in userspace!

What else can we do now? There are a few options. Having to locally rewrite things the Zig compiler couldn’t translate is not ideal. Indeed, the Zig documentation states that for these cases, instead of using @cImport, one should use the C Translation CLI, so that the generated .zig file is not hidden in Zig’s cache, but is available for editing. I may take a look into that.

A problem with that approach is that as Zephyr keeps evolving, this Zig file may become obsolete. Ideally, it would live inside the Zephyr upstream tree, with tests in CI, so that it could be kept up to date. But that’s a fair amount of work.

This series didn’t touch any devices or Zephyr devicetree - the EDK sample conspicuously does not deal with them, but any realistic Zephyr application will deal with devices.

Another interesting question is why limit it to extensions? While it’s nice to have a clear separation, what if we want to write a Zephyr driver using Zig? Besides the .zig file exposing the Zephyr API, one would also need to deal with building bits of Zephyr using the Zig compiler. That may not be easy, but as Zig is a full-fledged C compiler, one approach would be to define the Zig compiler as another Zephyr toolchain. That also looks like an interesting avenue to explore.


tags: zig, zephyr

  1. This code lives at <llext-edk-dir>/include/zephyr/include/generated/zephyr/syscalls/kobject.h