Chapter 6. Beyond Standard Rust

The Rust toolchain includes support for a much wider variety of environments than just pure Rust application code running in userspace: environments that can't rely on the full standard library, such as embedded systems (Item 33), and programs that have to interoperate with code written in other languages over a foreign function interface (Items 34 and 35).

These nonstandard Rust environments can be harder to work in and may be less safe—they can even be unsafe—but they give more options for getting the job done.

This chapter of the book discusses just a few of the basics for working in these environments. Beyond these basics, you’ll need to consult more environment-specific documentation (such as the Rustonomicon).

Item 33: Consider making library code no_std compatible

Rust comes with a standard library called std, which includes code for a wide variety of common tasks, from standard data structures to networking, from multithreading support to file I/O. For convenience, several of the items from std are automatically imported into your program, via the prelude: a set of common use statements that make common types available without needing to use their full names (e.g., Vec rather than std::vec::Vec).

Rust also supports building code for environments where it’s not possible to provide this full standard library, such as bootloaders, firmware, or embedded platforms in general. Crates indicate that they should be built in this way by including the #![no_std] crate-level attribute at the top of src/lib.rs.

This Item explores what’s lost when building for no_std and what library functions you can still rely on—which turns out to be quite a lot.

However, this Item is specifically about no_std support in library code. The difficulties of making a no_std binary are beyond this text,1 so the focus here is how to make sure that library code is available for those poor souls who do have to work in such a minimal environment.

core

Even when building for the most restricted of platforms, many of the fundamental types from the standard library are still available. For example, Option and Result are still available, albeit under different names, as are various flavors of Iterator.

The different names for these fundamental types start with core::, indicating that they come from the core library, a standard library that’s available even in the most no_std of environments. These core:: types behave exactly the same as the equivalent std:: types, because they’re actually the same types—in each case, the std:: version is just a re-export of the underlying core:: type.

This means that there’s a quick and dirty way to tell if a std:: item is available in a no_std environment: visit the doc.rust-lang.org page for the std item you’re interested in and follow the “source” link (at the top right).2 If that takes you to a src/core/…​ location, then the item is available under no_std via core::.

The types from core are available for all Rust programs automatically. However, in a no_std environment they typically need to be pulled in with explicit use statements, because the std prelude is absent.
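
For example, a minimal sketch of a core-only library crate, naming core items explicitly (the greet function is purely illustrative):

//! Sketch of a library crate that relies only on `core`.
#![no_std]

// With the `std` prelude absent, items beyond the basics are pulled in
// explicitly from `core`.
use core::fmt::Write;

/// Write a greeting to any `core::fmt::Write` implementation.
pub fn greet(out: &mut dyn Write, name: &str) -> core::fmt::Result {
    write!(out, "hello, {}", name)
}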

In practice, relying purely on core is too limiting for many environments, even no_std ones. A core (pun intended) constraint of core is that it performs no heap allocation.

Although Rust excels at putting items on the stack and safely tracking the corresponding lifetimes (Item 14), this restriction still means that standard data structures—vectors, maps, sets—can’t be provided, because they need to allocate heap space for their contents. In turn, this also drastically reduces the number of available crates that work in this environment.

alloc

However, if a no_std environment does support heap allocation, then many of the standard data structures from std can still be supported. These data structures, along with other allocation-using functionality, are grouped into Rust’s alloc library.

As with core, these alloc variants are actually the same types under the covers. For example, the real name of std::vec::Vec is actually alloc::vec::Vec.

A no_std Rust crate needs to explicitly opt in to the use of alloc, by adding an extern crate alloc; declaration to src/lib.rs:3

//! My `no_std` compatible crate.
#![no_std]

// Requires `alloc`.
extern crate alloc;

Pulling in the alloc crate enables many familiar friends, now addressed by their true names:

alloc::boxed::Box
alloc::rc::Rc
alloc::sync::Arc
alloc::string::String
alloc::vec::Vec
alloc::collections::BTreeMap
alloc::collections::BTreeSet
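
For instance, a no_std library with alloc available can build heap-allocated values in the usual way; a minimal sketch:

//! Sketch of a `no_std` + `alloc` library function.
#![no_std]
extern crate alloc;

use alloc::string::String;
use alloc::vec::Vec;

/// Split a line into owned, heap-allocated fields.
pub fn fields(line: &str) -> Vec<String> {
    line.split(',').map(String::from).collect()
}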

With these things available, it becomes possible for many library crates to be no_std compatible—for example, if a library doesn’t involve I/O or networking.

There’s a notable absence from the data structures that alloc makes available, though—the collections HashMap and HashSet are specific to std, not alloc. That’s because these hash-based containers rely on random seeds to protect against hash collision attacks, but safe random number generation requires assistance from the operating system—which alloc can’t assume exists.

Another notable absence is synchronization functionality like std::sync::Mutex, which is required for multithreaded code (Item 17). These types are specific to std because they rely on OS-specific synchronization primitives, which aren’t available without an OS. If you need to write code that is both no_std and multithreaded, third-party crates such as spin are probably your only option.
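
For example, a minimal sketch of a spinlock-protected counter, assuming a dependency on the spin crate:

use spin::Mutex;

// A global counter protected by a spinlock rather than an OS mutex.
static COUNTER: Mutex<u64> = Mutex::new(0);

pub fn increment() -> u64 {
    let mut guard = COUNTER.lock();
    *guard += 1;
    *guard
}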

Writing Code for no_std

The previous sections made it clear that for some library crates, making the code no_std compatible just involves the following:

  • Replacing std:: types with their identical core:: or alloc:: equivalents (which requires use of the full type name, due to the absence of the std prelude), as in the sketch after this list

  • Shifting from HashMap/HashSet to BTreeMap/BTreeSet
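
As a small illustration of these two transformations, here is a sketch of a function written against std, alongside an equivalent no_std + alloc version (the function itself is hypothetical):

// Before: `std` version, relying on the prelude and `HashMap`.
use std::collections::HashMap;

pub fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

// After: `no_std` + `alloc` version, with full paths (no `std` prelude)
// and `BTreeMap` in place of `HashMap`.
use alloc::collections::BTreeMap;
use alloc::string::{String, ToString};

pub fn count_words(text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}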

However, this only makes sense if all of the crates that you depend on (Item 25) are also no_std compatible—there’s no point in becoming no_std compatible if any user of your crate is forced to link in std anyway.

There’s also a catch here: the Rust compiler will not tell you if your no_std crate depends on a std-using dependency. This means that it’s easy to undo the work of making a crate no_std compatible—all it takes is an added or updated dependency that pulls in std.

To protect against this, add a CI check for a no_std build so that your CI system (Item 32) will warn you if this happens. The Rust toolchain supports cross-compilation out of the box, so this can be as simple as performing a cross-compile for a target system that does not support std (e.g., --target thumbv6m-none-eabi); any code that inadvertently requires std will then fail to compile for this target.
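
For example, a CI step along the following lines will fail to build if std creeps back in (a crate with a default std feature would also need --no-default-features):

% rustup target add thumbv6m-none-eabi   # one-time setup
% cargo build --target thumbv6m-none-eabi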

So: if your dependencies support it, and the simple transformations above are all that’s needed, then consider making library code no_std compatible. When it is possible, it’s not much additional work, and it allows for the widest reuse of the library.

If those transformations don’t cover all of the code in your crate but the parts that aren’t covered are only a small or well-contained fraction of the code, then consider adding a feature (Item 26) to your crate that turns on just those parts.

Such a feature is conventionally named either std, if it enables use of std-specific functionality:

#![cfg_attr(not(feature = "std"), no_std)]

or alloc, if it turns on use of alloc-derived functionality:

#[cfg(feature = "alloc")]
extern crate alloc;

Note that there’s a trap for the unwary here: don’t have a no_std feature that disables functionality requiring std (or a no_alloc feature similarly). As explained in Item 26, features need to be additive, and there’s no way to combine two users of the crate where one configures no_std and one doesn’t—the former will trigger the removal of code that the latter relies on.

As ever with feature-gated code, make sure that your CI system (Item 32) builds all the relevant combinations—including a build with the std feature disabled on an explicitly no_std platform.
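
Putting these pieces together, a minimal sketch of a crate with an optional std feature (the feature and function names are illustrative; Cargo.toml would also need a matching std = [] feature entry):

//! Crate that is `no_std` by default, with extra functionality behind `std`.
#![cfg_attr(not(feature = "std"), no_std)]

/// Core functionality, available everywhere.
pub fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

/// Convenience wrapper that reads from a file; only built with `std`.
#[cfg(feature = "std")]
pub fn checksum_file(path: &std::path::Path) -> std::io::Result<u8> {
    Ok(checksum(&std::fs::read(path)?))
}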

Fallible Allocation

The earlier sections of this Item considered two different no_std environments: a fully embedded environment with no heap allocation whatsoever (core) and a more generous environment where heap allocation is allowed (core + alloc).

However, there are some important environments that fall between these two camps—in particular, those where heap allocation is possible but may fail because there’s a limited amount of heap.

Unfortunately, Rust’s standard alloc library includes a pervasive assumption that heap allocations cannot fail, and that’s not always a valid assumption.

Even a simple use of alloc::vec::Vec could potentially allocate on every line:

let mut v = Vec::new();
v.push(1); // might allocate
v.push(2); // might allocate
v.push(3); // might allocate
v.push(4); // might allocate

None of these operations returns a Result, so what happens if those allocations fail?

The answer depends on the toolchain, target, and configuration but is likely to descend into panic! and program termination. There is certainly no answer that allows a failed allocation in v.push(3) to be handled in a way that allows the program to move on to v.push(4).

This assumption of infallible allocation gives good ergonomics for code that runs in a “normal” userspace, where there’s effectively infinite memory—or at least where running out of memory indicates that the computer as a whole has bigger problems elsewhere.

However, infallible allocation is utterly unsuitable for code that needs to run in environments where memory is limited and programs are required to cope. This is a (rare) area where there’s better support in older, less memory-safe languages: C’s malloc signals failure by returning NULL, and C++’s new reports it by throwing std::bad_alloc (or by returning nullptr with the std::nothrow overload).4

Historically, the inability of Rust’s standard library to cope with failed allocation was flagged in some high-profile contexts (such as the Linux kernel, Android, and the curl tool), and so work to fix the omission is ongoing.

The first step was the “fallible collection allocation” changes, which added fallible alternatives to many of the collection APIs that involve allocation. This generally adds a try_<operation> variant that returns a Result; for example, Box::try_new (currently nightly-only) alongside Box::new, returning Result<_, AllocError>, and Vec::try_reserve alongside Vec::reserve, returning Result<(), TryReserveError>.

These fallible APIs only go so far; for example, there is (as yet) no fallible equivalent to Vec::push, so code that assembles a vector may need to do careful calculations to ensure that allocation errors can’t happen:

fn try_build_a_vec() -> Result<Vec<u8>, String> {
    let mut v = Vec::new();

    // Perform a careful calculation to figure out how much space is needed,
    // here simplified to...
    let required_size = 4;

    v.try_reserve(required_size)
        .map_err(|_e| format!("Failed to allocate {} items!", required_size))?;

    // We now know that it's safe to do:
    v.push(1);
    v.push(2);
    v.push(3);
    v.push(4);

    Ok(v)
}

As well as adding fallible allocation entrypoints, it’s also possible to disable infallible allocation operations altogether, by building the alloc library with the no_global_oom_handling config flag enabled (it is off by default). Environments with limited heap (such as the Linux kernel) can explicitly enable this flag, ensuring that no use of infallible allocation can inadvertently creep into the code.

Item 34: Control what crosses FFI boundaries

Even though Rust comes with a comprehensive standard library and a burgeoning crate ecosystem, there is still a lot more non-Rust code in the world than there is Rust code.

As with other recent languages, Rust helps with this problem by offering a foreign function interface (FFI) mechanism, which allows interoperation with code and data structures written in different languages—despite the name, FFI is not restricted to just functions. This opens up the use of existing libraries in different languages, not just those that have succumbed to the Rust community’s efforts to “rewrite it in Rust” (RiiR).

The default target for Rust’s interoperability is the C programming language, which is the same interop target that other languages aim at. This is partly driven by the ubiquity of C libraries but is also driven by simplicity: C acts as a “least common denominator” of interoperability, because it doesn’t need toolchain support for any of the more advanced features that would be necessary for compatibility with other languages (e.g., garbage collection for Java or Go, exceptions and templates for C++, function overloading for Java and C++, etc.).

However, that’s not to say that interoperability with plain C is simple. By including code written in a different language, all of the guarantees and protections that Rust offers can no longer be relied on, particularly those involving memory safety.

As a result, FFI code in Rust is automatically unsafe, and the advice in Item 16 has to be bypassed. This Item explores some replacement advice, and Item 35 will explore some tooling that helps to avoid some (but not all) of the footguns involved in working with FFI. (The FFI chapter of the Rustonomicon also contains helpful advice and information.)

Invoking C Functions from Rust

The simplest FFI interaction is for Rust code to invoke a C function, taking “immediate” arguments that don’t involve pointers, references, or memory addresses:

/* File lib.c */
#include "lib.h"

/* C function definition. */
int add(int x, int y) {
  return x + y;
}

This C code provides a definition of the function and is typically accompanied by a header file that provides a declaration of the function, which allows other C code to use it:

/* File lib.h */
#ifndef LIB_H
#define LIB_H

/* C function declaration. */
int add(int x, int y);

#endif  /* LIB_H */

The declaration roughly says: somewhere out there is a function called add, which takes two integers as input and returns another integer as output. This allows C code to use the add function, subject to a promise that the actual code for add will be provided at a later date—specifically, at link time.

Rust code that wants to use add needs to have a similar declaration, with a similar purpose: to describe the signature of the function and to indicate that the corresponding code will be available later:

use std::os::raw::c_int;
extern "C" {
    pub fn add(x: c_int, y: c_int) -> c_int;
}

The declaration is marked as extern "C" to indicate that an external C library will provide the code for the function.5 The extern "C" marker also automatically marks the function as no_mangle, which we explore in “Name mangling”.

Linking logistics

The details of how the C toolchain generates an external C library—and its format—are environment-specific and beyond the scope of a Rust book like this. However, one simple variant that’s common on Unix-like systems is a static library file, which will normally have the form lib<something>.a (e.g., libcffi.a) and which can be generated using the ar tool.
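
For example, the static library for the earlier lib.c might be produced with something like:

% cc -c lib.c -o lib.o    # compile the C source to an object file
% ar rcs libcffi.a lib.o  # archive object file(s) into a static library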

The Rust build system then needs an indication of which library holds the relevant C code. This can be specified either via the link attribute in the code:

#[link(name = "cffi")] // An external library like `libcffi.a` is needed
extern "C" {
    // ...
}

or via a build script that emits a cargo:rustc-link-lib instruction to cargo:6

// File build.rs
fn main() {
    // An external library like `libcffi.a` is needed
    println!("cargo:rustc-link-lib=cffi");
}

The latter option is more flexible, because the build script can examine its environment and behave differently depending on what it finds.

In either case, the Rust build system is also likely to need information about how to find the C library, if it’s not in a standard system location. This can be specified by having a build script that emits a cargo:rustc-link-search instruction to cargo, containing the library location:

// File build.rs
fn main() {
    // ...

    // Retrieve the location of `Cargo.toml`.
    let dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
    // Look for native libraries one directory higher up.
    println!(
        "cargo:rustc-link-search=native={}",
        std::path::Path::new(&dir).join("..").display()
    );
}

Code concerns

Returning to the source code, even this simplest of examples comes with some gotchas. First, use of FFI functions is automatically unsafe:

let x = add(1, 1);
error[E0133]: call to unsafe function is unsafe and requires unsafe function
              or block
   --> src/main.rs:176:13
    |
176 |     let x = add(1, 1);
    |             ^^^^^^^^^ call to unsafe function
    |
    = note: consult the function's documentation for information on how to
            avoid undefined behavior

and so needs to be wrapped in unsafe { }.

The next thing to watch out for is the use of C’s int type, represented as std::os::raw::c_int. How big is an int? It’s probably true that the following two things are the same:

  • The size of an int for the toolchain that compiled the C library

  • The size of a std::os::raw::c_int for the Rust toolchain

But why take the chance? Prefer sized types at FFI boundaries, where possible—which for C means making use of the types (e.g., uint32_t) defined in <stdint.h>. However, if you’re dealing with an existing codebase that already uses int/long/size_t, this may be a luxury you don’t have.
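
For example, a C declaration written with fixed-size types, uint32_t add32(uint32_t x, uint32_t y);, maps onto a Rust declaration with no ambiguity about sizes:

extern "C" {
    // Matches `uint32_t add32(uint32_t x, uint32_t y);` exactly: a `u32` is
    // 32 bits on every platform, so there's no guessing about `int` widths.
    pub fn add32(x: u32, y: u32) -> u32;
}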

The final practical concern is that the C code and the equivalent Rust declaration need to exactly match. Worse still, if there’s a mismatch, the build tools will not emit a warning—they will just silently emit incorrect code.

Item 35 discusses the use of the bindgen tool to prevent this problem, but it’s worth understanding the basics of what’s going on under the covers to understand why the build tools can’t detect the problem on their own. In particular, it’s worth understanding the basics of name mangling.

Name mangling

Compiled languages generally support separate compilation, where different parts of the program are converted into machine code as separate chunks (object files), which can then be combined into a complete program by the linker. This means that if only one small part of the program’s source code changes, only the corresponding object file needs to be regenerated; the link step then rebuilds the program, combining both the changed object and all the other unmodified objects.

The link step is (roughly speaking) a “join-the-dots” operation: some object files provide definitions of functions and variables, and other object files have placeholder markers indicating that they expect to use a definition from some other object, but it wasn’t available at compile time. The linker combines the two: it ensures that any placeholder in the compiled code is replaced with a reference to the corresponding concrete definition.

The linker performs this correlation between the placeholders and the definitions by simply checking for a matching name, meaning that there is a single global namespace for all of these correlations.

Historically, this was fine for linking C language programs, where a single name could not be reused in any way—the name of a function is exactly what appears in the object file. (As a result, a common convention for C libraries is to manually add a prefix to all symbols so that lib1_process doesn’t clash with lib2_process.)

However, the introduction of C++ caused a problem, because C++ allows multiple definitions with the same name, whether as overloads or in different namespaces:

// C++ code
namespace ns1 {
int32_t add(int32_t a, int32_t b) { return a+b; }
int64_t add(int64_t a, int64_t b) { return a+b; }
}
namespace ns2 {
int32_t add(int32_t a, int32_t b) { return a+b; }
}

The solution for this is name mangling: the compiler encodes the signature and type information for the overloaded functions into the name that’s emitted in the object file, and the linker continues to perform its simple-minded 1:1 correlation between placeholders and definitions.

On Unix-like systems, the nm tool can help show what the linker works with:

% nm ffi-lib.o | grep add  # what the linker sees for C
0000000000000000 T _add

% nm ffi-cpp-lib.o | grep add  # what the linker sees for C++
0000000000000000 T __ZN3ns13addEii
0000000000000020 T __ZN3ns13addExx
0000000000000040 T __ZN3ns23addEii

In this case, it shows three mangled symbols, all of which refer to code (the T indicates the text section of the binary, which is the traditional name for where code lives).

The c++filt tool7 helps translate this back into what would be visible in C++ code:

% nm ffi-cpp-lib.o | grep add | c++filt  # what the programmer sees
0000000000000000 T ns1::add(int, int)
0000000000000020 T ns1::add(long long, long long)
0000000000000040 T ns2::add(int, int)

Because the mangled name includes type information, the linker can and will complain about any mismatch in the type information between placeholder and definition. This gives some measure of type safety: if the definition changes but the place using it is not updated, the toolchain will complain.

Returning to Rust, extern "C" foreign functions are implicitly marked as #[no_mangle], and the symbol in the object file is the bare name, exactly as it would be for a C program. This means that the type safety of function signatures is lost: because the linker sees only the bare names for functions, if there are any differences in type expectations between definition and use, the linker will carry on regardless and problems will arise only at runtime.

Accessing C Data from Rust

The C add example in the previous section passed the simplest possible type of data back and forth between Rust and C: an integer that fits in a machine register. Even so, there were still things to be careful about, so it’s no surprise that dealing with more complex data structures also has wrinkles to watch out for.

Both C and Rust use struct types to combine related data into a single data structure. However, when a struct is realized in memory, the two languages may well choose to put different fields in different places or even in different orders (the layout). To prevent mismatches, use #[repr(C)] for Rust types used in FFI; this representation is designed for the purpose of allowing C interoperability:

/* C data structure definition. */
/* Changes here must be reflected in lib.rs. */
typedef struct {
    uint8_t byte;
    uint32_t integer;
} FfiStruct;

// Equivalent Rust data structure.
// Changes here must be reflected in lib.h / lib.c.
#[repr(C)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

The structure definitions have a comment to remind the humans involved that the two places need to be kept in sync. Relying on the constant vigilance of humans is likely to go wrong in the long term; as for function signatures, it’s better to automate this synchronization between the two languages via a tool like bindgen (Item 35).

One particular type of data that’s worth thinking about carefully for FFI interactions is strings. The default definitions of what makes up a string are somewhat different between C and Rust: a C string is a contiguous sequence of bytes terminated by a NUL (\0) byte, whereas a Rust String or &str is a length-delimited collection of bytes that is guaranteed to be valid UTF-8 and may contain interior NUL bytes.

Fortunately, dealing with C-style strings in Rust is comparatively straightforward, because the Rust library designers have already done the heavy lifting by providing a pair of types to encode them. Use the CString type to hold (owned) strings that need to be interoperable with C, and use the corresponding CStr type when dealing with borrowed string values. The latter type includes the as_ptr() method, which can be used to pass the string’s contents to any FFI function that’s expecting a const char* C string. Note that the const is important: this can’t be used for an FFI function that needs to modify the contents (char *) of the string that’s passed to it.
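
For example, passing a Rust string to a C function that expects a const char* might look like the following sketch (the log_message function is hypothetical):

use std::ffi::{CString, NulError};
use std::os::raw::c_char;

extern "C" {
    // Hypothetical C function: `void log_message(const char* msg);`
    fn log_message(msg: *const c_char);
}

fn log_str(s: &str) -> Result<(), NulError> {
    // Fails if `s` contains an interior NUL byte.
    let c_string = CString::new(s)?;
    // The pointer from `as_ptr()` is only valid while `c_string` is alive.
    unsafe { log_message(c_string.as_ptr()) };
    Ok(())
}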

Lifetimes

Most data structures are too big to fit in a register and so have to be held in memory instead. That in turn means that access to the data is performed via the location of that memory. In C terms, this means a pointer: a number that encodes a memory address—with no other semantics attached (Item 8).

In Rust, a location in memory is generally represented as a reference, and its numeric value can be extracted as a raw pointer, ready to feed into an FFI boundary:

extern "C" {
    // C function that does some operation on the contents
    // of an `FfiStruct`.
    pub fn use_struct(v: *const FfiStruct) -> u32;
}

let v = FfiStruct {
    byte: 1,
    integer: 42,
};
let x = unsafe { use_struct(&v as *const FfiStruct) };

However, a Rust reference comes with additional constraints around the lifetime of the associated chunk of memory, as described in Item 14; these constraints get lost in the conversion to a raw pointer.

As a result, the use of raw pointers is inherently unsafe, as a marker that Here Be Dragons: the C code on the other side of the FFI boundary could do any number of things that will destroy Rust’s memory safety, such as freeing the memory that the pointer refers to, writing beyond its bounds, or stashing the pointer away and using it long after the corresponding Rust lifetime has ended.

All of these dangers form part of the cost-benefit analysis of using an existing library via FFI. On the plus side, you get to reuse existing code that’s (presumably) in good working order, with only the need to write (or auto-generate) corresponding declarations. On the minus side, you lose the memory protections that are a big reason to use Rust in the first place.

As a first step to reduce the chances of memory-related problems, allocate and free memory on the same side of the FFI boundary. For example, this might appear as a symmetric pair of functions:

/* C functions. */

/* Allocate an `FfiStruct` */
FfiStruct* new_struct(uint32_t v);
/* Free a previously allocated `FfiStruct` */
void free_struct(FfiStruct* s);

with corresponding Rust FFI declarations:

extern "C" {
    // C code to allocate an `FfiStruct`.
    pub fn new_struct(v: u32) -> *mut FfiStruct;
    // C code to free a previously allocated `FfiStruct`.
    pub fn free_struct(s: *mut FfiStruct);
}

To make sure that allocation and freeing are kept in sync, it can be a good idea to implement an RAII wrapper that automatically prevents C-allocated memory from being leaked (Item 11). The wrapper structure owns the C-allocated memory:

/// Wrapper structure that owns memory allocated by the C library.
struct FfiWrapper {
    // Invariant: inner is non-NULL.
    inner: *mut FfiStruct,
}

and the Drop implementation returns that memory to the C library to avoid the potential for leaks:

/// Manual implementation of [`Drop`], which ensures that memory allocated
/// by the C library is freed by it.
impl Drop for FfiWrapper {
    fn drop(&mut self) {
        // Safety: `inner` is non-NULL, and besides `free_struct()` copes
        // with NULL pointers.
        unsafe { free_struct(self.inner) }
    }
}

The same principle applies to more than just heap memory: implement Drop to apply RAII to FFI-derived resources—open files, database connections, etc. (see Item 11).

Encapsulating the interactions with the C library into a wrapper struct also makes it possible to catch some other potential footguns, for example, by transforming an otherwise invisible failure into a Result:

type Error = String;

impl FfiWrapper {
    pub fn new(val: u32) -> Result<Self, Error> {
        let p: *mut FfiStruct = unsafe { new_struct(val) };
        // Raw pointers are not guaranteed to be non-NULL.
        if p.is_null() {
            Err("Failed to get inner struct!".into())
        } else {
            Ok(Self { inner: p })
        }
    }
}

The wrapper structure can then offer safe methods that allow use of the C library’s functionality:

impl FfiWrapper {
    pub fn set_byte(&mut self, b: u8) {
        // Safety: relies on invariant that `inner` is non-NULL.
        let r: &mut FfiStruct = unsafe { &mut *self.inner };
        r.byte = b;
    }
}

Alternatively, if the underlying C data structure has an equivalent Rust mapping, and if it’s safe to directly manipulate that data structure, then implementations of the AsRef and AsMut traits (described in Item 8) allow more direct use:

impl AsMut<FfiStruct> for FfiWrapper {
    fn as_mut(&mut self) -> &mut FfiStruct {
        // Safety: `inner` is non-NULL.
        unsafe { &mut *self.inner }
    }
}

let mut wrapper = FfiWrapper::new(42).expect("real code would check");
// Directly modify the contents of the C-allocated data structure.
wrapper.as_mut().byte = 12;

This example illustrates a useful principle for dealing with FFI: encapsulate access to an unsafe FFI library inside safe Rust code. This allows the rest of the application to follow the advice in Item 16 and avoid writing unsafe code. It also concentrates all of the dangerous code in one place, which you can then study (and test) carefully to uncover problems—and treat as the most likely suspect when something does go wrong.

Invoking Rust from C

What counts as “foreign” depends on where you’re standing: if you’re writing an application in C, then it may be a Rust library that’s accessed via a foreign function interface.

The basics of exposing a Rust library to C code are similar to the opposite direction: the Rust functions to be exposed are marked as pub extern "C" (and #[no_mangle]) so that they use C calling conventions and unmangled symbol names, any shared data structures are declared with #[repr(C)], and a C header file provides the matching declarations for the C compiler.

Also like the opposite direction, more subtle problems arise when dealing with pointers, references, and lifetimes. A C pointer is different from a Rust reference, and you forget that at your peril:

/* C code invoking Rust. */
uint32_t result = add_contents(NULL); // Boom!
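
For this to make sense, assume that add_contents is a Rust function exported along the following lines (a sketch, consistent with the safer version shown next); it blindly converts the incoming pointer to a reference, so a NULL or otherwise invalid pointer from the C side triggers undefined behavior:

#[no_mangle]
pub extern "C" fn add_contents(p: *const FfiStruct) -> u32 {
    // Undefined behavior if `p` is NULL, dangling, or misaligned.
    let s: &FfiStruct = unsafe { &*p };
    s.integer + s.byte as u32
}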

When you’re dealing with raw pointers, it’s your responsibility to ensure that any use of them complies with Rust’s assumptions and guarantees around references:

#[no_mangle]
pub extern "C" fn add_contents_safer(p: *const FfiStruct) -> u32 {
    let s = match unsafe { p.as_ref() } {
        Some(r) => r,
        None => return 0, // Pesky C code gave us a NULL.
    };
    s.integer + s.byte as u32
}

In these examples, the C code provides a raw pointer to the Rust code, and the Rust code converts it to a reference in order to operate on the structure. But where did that pointer come from? What does the Rust reference refer to?

The very first example in Item 8 showed how Rust’s memory safety prevents references to expired stack objects from being returned; those problems reappear if you hand out a raw pointer:
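
For example, an exported function along these lines (a sketch; the function name is hypothetical) compiles happily but hands the C caller a pointer to a stack object that has already been dropped:

#[no_mangle]
pub extern "C" fn new_struct_stack(v: u32) -> *const FfiStruct {
    let s = FfiStruct {
        byte: 0,
        integer: v,
    };
    // The raw pointer escapes the lifetime checks; `s` is dropped on return,
    // so the returned pointer is immediately dangling.
    &s as *const FfiStruct
}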

Any pointers passed back from Rust to C should generally refer to heap memory, not stack memory. But naively trying to put the object on the heap via a Box doesn’t help:
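
For instance, this sketch (again with a hypothetical function name) still returns a dangling pointer:

#[no_mangle]
pub extern "C" fn new_struct_boxed(v: u32) -> *const FfiStruct {
    let b = Box::new(FfiStruct {
        byte: 0,
        integer: v,
    });
    // `b` still owns the heap allocation; it is dropped (and the memory
    // freed) when this function returns.
    &*b as *const FfiStruct
}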

The owning Box is on the stack, so when it goes out of scope, it will free the heap object and the returned raw pointer will again be invalid.

The tool for the job here is Box::into_raw, which abnegates responsibility for the heap object, effectively “forgetting” about it:

#[no_mangle]
pub extern "C" fn new_struct_raw(v: u32) -> *mut FfiStruct {
    let s = FfiStruct::new(v); // create `FfiStruct` on stack
    let b = Box::new(s); // move `FfiStruct` to heap

    // Consume the `Box` and take responsibility for the heap memory.
    Box::into_raw(b)
}

This raises the question of how the heap object now gets freed. The previous advice was to perform allocation and freeing of memory on the same side of the FFI boundary, which means that we need to persuade the Rust side of things to do the freeing. The corresponding tool for the job is Box::from_raw, which builds a Box from a raw pointer:

#[no_mangle]
pub extern "C" fn free_struct_raw(p: *mut FfiStruct) {
    if p.is_null() {
        return; // Pesky C code gave us a NULL
    }
    let _b = unsafe {
        // Safety: p is known to be non-NULL
        Box::from_raw(p)
    };
} // `_b` drops at end of scope, freeing the `FfiStruct`

This still leaves the Rust code at the mercy of the C code; if the C code gets confused and asks Rust to free the same pointer twice, Rust’s allocator is likely to become terminally confused.

That illustrates the general theme of this Item: using FFI exposes you to risks that aren’t present in standard Rust. That may well be worthwhile, as long as you’re aware of the dangers and costs involved. Controlling the details of what passes across the FFI boundary helps to reduce that risk but by no means eliminates it.

Controlling the FFI boundary for C code invoking Rust also involves one final concern: if your Rust code ignores the advice in Item 18, you should prevent panic!s from crossing the FFI boundary, as this always results in undefined behavior—undefined but bad!8
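
One way to do that is to wrap the body of each exported function in std::panic::catch_unwind, converting any panic into a value that C can cope with; a sketch, reusing the earlier example:

#[no_mangle]
pub extern "C" fn add_contents_no_panic(p: *const FfiStruct) -> u32 {
    // Catch any panic before it can unwind across the FFI boundary.
    let result = std::panic::catch_unwind(|| {
        let s = match unsafe { p.as_ref() } {
            Some(r) => r,
            None => return 0, // Pesky C code gave us a NULL.
        };
        s.integer + s.byte as u32
    });
    // Report a panic as a sentinel value rather than unwinding into C.
    result.unwrap_or(0)
}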

Item 35: Prefer bindgen to manual FFI mappings

Item 34 discussed the mechanics of invoking C code from a Rust program, describing how declarations of C structures and functions need to have an equivalent Rust declaration to allow them to be used over FFI. The C and Rust declarations need to be kept in sync, and Item 34 also warned that the toolchain wouldn’t help with this—mismatches would be silently ignored, hiding problems that would arise later.

Keeping two things perfectly in sync sounds like a good target for automation, and the Rust project provides the right tool for the job: bindgen. The primary function of bindgen is to parse a C header file and emit the corresponding Rust declarations.

Taking some of the example C declarations from Item 34:

/* File lib.h */
#include <stdint.h>

typedef struct {
    uint8_t byte;
    uint32_t integer;
} FfiStruct;

int add(int x, int y);
uint32_t add32(uint32_t x, uint32_t y);

the bindgen tool can be manually invoked (or invoked by a build.rs build script) to create a corresponding Rust file:

% bindgen --no-layout-tests \
          --allowlist-function="add.*" \
          --allowlist-type=FfiStruct \
          -o src/generated.rs \
          lib.h

The generated Rust is identical to the handcrafted declarations in Item 34:

/* automatically generated by rust-bindgen 0.59.2 */

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}
extern "C" {
    pub fn add(
        x: ::std::os::raw::c_int,
        y: ::std::os::raw::c_int,
    ) -> ::std::os::raw::c_int;
}
extern "C" {
    pub fn add32(x: u32, y: u32) -> u32;
}

and can be pulled into Rust code with the source-level include! macro:

// Include the auto-generated Rust declarations.
include!("generated.rs");
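
Alternatively, the same generation can be driven from a build.rs script using the bindgen library crate; a minimal sketch (assuming bindgen appears under [build-dependencies], with the include! line then referencing concat!(env!("OUT_DIR"), "/generated.rs")):

// File build.rs
fn main() {
    let bindings = bindgen::Builder::default()
        .header("lib.h")
        .allowlist_function("add.*")
        .allowlist_type("FfiStruct")
        .generate()
        .expect("failed to generate bindings");

    // Write the generated declarations into Cargo's output directory.
    let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out.join("generated.rs"))
        .expect("failed to write bindings");
}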

For anything but the most trivial FFI declarations, use bindgen to generate Rust bindings for C code—this is an area where machine-made, mass-produced code is definitely preferable to artisanal handcrafted declarations. If a C function definition changes, the C compiler will complain if the C declaration no longer matches the C definition, but nothing will complain that a handcrafted Rust declaration no longer matches the C declaration; auto-generating the Rust declaration from the C declaration ensures that the two stay in sync.

This also means that the bindgen step is an ideal candidate to include in a CI system (Item 32); if the generated code is included in source control, the CI system can error out if a freshly generated file doesn’t match the checked-in version.

The bindgen tool comes into its own when you’re dealing with an existing C codebase that has a large API. Creating Rust equivalents to a big lib_api.h header file is manual and tedious, and therefore error-prone—and as noted, many categories of mismatch error will not be detected by the toolchain. bindgen also has a panoply of options that allow specific subsets of an API to be targeted (such as the --allowlist-function and --allowlist-type options previously illustrated).9

This also allows a layered approach for exposing an existing C library in Rust; a common convention for wrapping some xyzzy library is to have the following:

  • An xyzzy-sys crate that holds (just) the bindgen-erated code—use of which is necessarily unsafe

  • An xyzzy crate that encapsulates the unsafe code and provides safe Rust access to the underlying functionality

This concentrates the unsafe code in one layer and allows the rest of the program to follow the advice in Item 16.
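
In sketch form, the safe layer might look like the following (xyzzy_sys::add stands in for a hypothetical bindgen-erated binding):

// In the `xyzzy` crate: safe wrappers that keep `unsafe` confined
// to this one layer.
use std::os::raw::c_int;

pub fn add(x: c_int, y: c_int) -> c_int {
    // Safety: `xyzzy_sys::add` has no preconditions beyond valid integers.
    unsafe { xyzzy_sys::add(x, y) }
}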

1 See The Embedonomicon or Philipp Oppermann’s older blog post for information about what’s involved in creating a no_std binary.

2 Be aware that this can occasionally go wrong. For example, at the time of writing, the Error trait is defined in core:: but is marked as unstable there; only the std:: version is stable.

3 Prior to Rust 2018, extern crate declarations were used to pull in dependencies. This is now entirely handled by Cargo.toml, but the extern crate mechanism is still used to pull in those parts of the Rust standard library (the sysroot crates) that are optional in no_std environments.

4 It’s also possible to add the std::nothrow overload to calls to new and check for nullptr return values. However, there are still container methods like vector<T>::push_back that allocate under the covers and that can therefore signal allocation failure only via an exception.

5 If the FFI functionality you want to use is part of the standard C library, then you don’t need to create these declarations—the libc crate already provides them.

6 A corresponding links key in the Cargo.toml manifest can help to make this dependency visible to Cargo.

7 A Rust equivalent of the c++filt tool for translating mangled names back to programmer-visible names is rustfilt, which builds on the rustc-demangle command.

8 Note that Rust version 1.71 includes the C-unwind ABI, which makes some cross-language unwinding functionality possible.

9 The example also used the --no-layout-tests option to keep the output simple; by default, the generated code will include #[test] code to check that structures are indeed laid out correctly.