4 Data structures

This chapter covers

Up to this point in the book, we haven’t spent much time talking about the Rust language itself. In the previous two chapters, we discussed tooling. With that out of the way, we can start diving into the Rust language and its features, which we’ll focus on for the rest of this book. In this chapter, we’ll cover the most important part of Rust after its basic syntax: data structures.

When working with Rust, you’ll spend a great deal of time interacting with its data structures, as you would in any other language. Rust offers most of the data structure features you’d expect from a modern programming language, but it does so while offering exceptional safety and performance. Once you get a handle on Rust’s core data types, you’ll find the rest of the language comes into great clarity, as the patterns often repeat themselves.

In this chapter, we’ll discuss how Rust differs from other languages in its approach to data, review the core data types and structures, and discuss how to effectively use them. We’ll also discuss how Rust’s primitive types map to C types, which allows you to integrate with non-Rust software.

When working with Rust, you’ll likely spend most of your time working with three core data structures: strings, vectors, and maps. The implementations included with Rust’s standard library are fast and full featured and will cover the majority of your typical programming use cases. We’ll begin by discussing strings, which are commonly used to represent a plethora of data sources and sinks.

4.1 Demystifying String, str, &str, and &'static str

In my first encounters with Rust, I was a little confused by the string types. If you find yourself in a similar position, worry not, for I have good news: while they seem complicated, largely due to Rust’s concepts of borrowing, lifetimes, and memory management, I can assure you it’s all very straightforward once you get a handle on the underlying memory layout.

Sometimes, you may find yourself with a str when you want a String, or you may end up with a String but have a function that wants a &str. Getting from one to the other isn’t hard, but it may seem confusing at first. We’ll discuss all that and more in this section.

It’s important to separate the underlying data (a contiguous sequence of characters) from the interface you’re using to interact with them. There is only one kind of string in Rust, but there are multiple ways to handle a string’s allocation and references to that string.

4.1.1 String vs str

Let’s start by clarifying a few things: first, there are, indeed, two separate core string types in Rust (String and str). And while they are technically different types, they are, for the most part, the same thing. They both represent a UTF-8 sequence of characters of arbitrary length, stored in a contiguous region of memory. The only practical difference between String and str is how the memory is managed. Additionally, to understand all core Rust types, it’s helpful to think about them in terms of how memory is managed. Thus, the two Rust string types can be summarized as follows: str is a stack-allocated UTF-8 string that can be borrowed but not moved or mutated, and String is a heap-allocated UTF-8 string that can be borrowed, moved, and mutated.

In languages like C and C++, the difference between heap- and stack-allocated data can be blurry, as C pointers don’t tell you how memory was allocated. At best, they tell you that there’s a region of memory of a specific type, which might be valid and may be anywhere from 0 to N elements in length. In Rust, memory allocation is explicit; thus your types, themselves, usually define how memory is allocated, in addition to the number of elements.

In C, you can allocate strings on the stack and mutate them, but this is not allowed in Rust without using the unsafe keyword. Not surprisingly, this is a major source of programming errors in C.

Let’s illustrate some C strings:

char *stack_string = "stack-allocated string";
char *heap_string  = strdup("heap-allocated string");

In this code, we have two identical pointer types, pointing to different kinds of memory. The first, stack_string, is a pointer to stack-allocated memory. Memory allocated on the stack is usually handled by the compiler, and the allocation is essentially instantaneous. heap_string is a pointer of the same type, to a heap-allocated string. strdup() is a C library function (specified by POSIX and declared in <string.h>) that allocates a region of memory on the heap using malloc(), copies the input into that region, and returns the address of the newly allocated region.

Note If we’re being pedantic, we might say that heap-allocated string in the preceding example is initially stack allocated but converted into a heap-allocated string after the call to strdup(). You can prove this by examining the binary generated by the compiler, which would contain the literal heap-allocated string in the binary.

Now, as far as C is concerned, all strings are the same: they’re just contiguous regions of memory of arbitrary length, terminated by a null character (hex byte value 0x00). So if we switch back to thinking about Rust, we can think of str as equivalent to the first line, stack_string. String is equivalent to the second line, heap_string. While this is somewhat of an oversimplification, it’s a good model to help us understand strings in Rust.

4.1.2 Using strings effectively

Most of the time, when working in Rust, you’re going to be working with either a String or &str but never a str. The Rust standard library’s immutable string functions are implemented for the &str type, but the mutable functions are only implemented for the String type.

It’s not possible to create a str directly; you can only borrow a reference to one. The &str type serves as a convenient lowest common denominator, such as when used as a function argument, because you can always borrow a String as &str.
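To make the lowest-common-denominator idea concrete, here is a minimal sketch. The function name shout is hypothetical and exists only for illustration; it accepts a &str, so it can be called with a borrowed String or with a string literal.

```rust
// Hypothetical helper (not from the standard library): accepts any
// borrowed string data and returns a new, owned String.
fn shout(s: &str) -> String {
    // to_uppercase() allocates and returns an owned String.
    s.to_uppercase()
}

fn main() {
    let owned: String = String::from("hello");
    let literal: &str = "world";

    // A &String coerces to &str automatically.
    println!("{}", shout(&owned));
    // as_str() makes the same borrow explicit.
    println!("{}", shout(owned.as_str()));
    // A string literal is already a &str.
    println!("{}", shout(literal));

    // Going from &str to String requires a new allocation.
    let copied: String = literal.to_string();
    println!("{}", copied);
}
```

Taking &str rather than &String in a signature keeps the function usable with both kinds of caller without forcing an allocation.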

Let’s quickly discuss static lifetimes: In Rust, 'static is a special lifetime specifier that defines a reference (or borrowed variable) that is valid for the entire life of a process. There are a few special cases in which you may need an explicit &'static str, but in practice, it’s something infrequently encountered.

Deciding to use String or a static string comes down to mutability, as shown in figure 4.1. If you don’t require mutability, a static string is almost always the best choice.

Figure 4.1 Deciding when to use str or a String, in a very simple flowchart

The only real difference between &'static str and &str is that, while a String can be borrowed as &str, String can never be borrowed as &'static str because the life of a String is never as long as the process. When a String goes out of scope, it’s released with the Drop trait (we’ll explore traits in greater detail in chapter 8).
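As a small sketch of where &'static str tends to show up, consider a function that only ever returns string literals. The name level_name and its numeric levels are assumptions made up for this example.

```rust
// Hypothetical helper: every arm returns a string literal, so the
// returned reference is valid for the whole life of the process.
fn level_name(level: u8) -> &'static str {
    match level {
        0 => "error",
        1 => "warn",
        _ => "info",
    }
}

fn main() {
    let name: &'static str = level_name(1);
    println!("{}", name);

    let owned = String::from("temporary");
    // Borrowing a String yields a &str tied to `owned`,
    // not a &'static str; it is only valid while `owned` is alive.
    let borrowed: &str = &owned;
    println!("{}", borrowed);
}
```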

Under the hood, a String is actually just a Vec of bytes (Vec<u8>) that is guaranteed to hold valid UTF-8. We’ll discuss Vec in greater detail later in the chapter. Additionally, a str is just a slice of UTF-8 bytes, and we’ll discuss slices more in the next section. Table 4.1 summarizes the core string types you will encounter and how to differentiate them.
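One practical consequence of the UTF-8 representation is that the length of a string in bytes can differ from its length in characters. The following sketch uses a hypothetical helper, byte_and_char_counts, to show both numbers for the same string.

```rust
// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // \u{e9} is a single character that UTF-8 encodes as two bytes.
    let s = String::from("h\u{e9}llo");
    let (bytes, chars) = byte_and_char_counts(&s);
    println!("bytes={} chars={}", bytes, chars);
    // as_bytes() exposes the underlying UTF-8 bytes without copying.
    println!("{:?}", s.as_bytes());
}
```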

Table 4.1 String types summarized

| Type | Kind | Components | Use |
| --- | --- | --- | --- |
| str | Stack-allocated UTF-8 string slice | A pointer to an array of characters plus its length | Immutable string, such as logging or debug statements or anywhere else you may have an immutable stack-allocated string |
| String | Heap-allocated UTF-8 string | A vector of characters | Mutable, resizable string, which can be allocated and deallocated as needed |
| &str | Immutable string reference | A pointer to either borrowed str or String plus its length | Can be used anywhere you want to borrow either a str or a String immutably |
| &'static str | Immutable static string reference | A pointer to a str plus its length | A reference to a str with an explicit static lifetime |

Another difference between str and String is that String can be moved, whereas str cannot. In fact, it’s not possible to own a variable of type str; it’s only possible to hold a reference to a str. To illustrate, consider the following listing.

Listing 4.1 Movable and nonmovable strings

fn print_String(s: String) {
    println!("print_String: {}", s);
}

fn print_str(s: &str) {
    println!("print_str: {}", s);
}

fn main() {
    // Does not compile; rustc will report "error[E0277]: the size for
    // values of type str cannot be known at compilation time."
    // let s: str = "impossible str";

    // OK: moves a String out of main into print_String
    print_String(String::from("String"));

    // OK: borrows a String in main as &str
    print_str(&String::from("String"));

    // OK: passes a string literal, which is already a &str, to print_str
    print_str("str");

    // Does not compile; rustc will report "error[E0308]: mismatched
    // types, expected struct String, found &str."
    // print_String("str");
}

The preceding code, when run, prints the following output:

print_String: String
print_str: String
print_str: str

4.2 Understanding slices and arrays

Slices and arrays are special types in Rust. They represent a sequence of arbitrary values of the same type. You can also have multidimensional slices or arrays (i.e., slices of slices, arrays of arrays, arrays of slices, or slices of arrays).

Slices are a somewhat new programming concept, as you generally won’t find the term slice used when discussing sequences in the language syntax for Java, C, C++, Python, or Ruby. Typically, sequences are referred to as either arrays (as in Java, C, C++, and Ruby), lists (as in Python), or simply sequences (as in Scala). Other languages may provide equivalent behavior, but slices are not necessarily a first-class language concept or type in the way they are in Rust or Go (although the slice abstraction has been catching on in other languages). C++ does have std::span and std::string_view, which provide equivalent behavior, but the term slice is not used in C++ when describing these.

Note The term slices appears to have originated with the Go language, as described in this blog post from 2013 by Rob Pike: https://go.dev/blog/slices.

In Rust, specifically, slices and arrays differ subtly. An array is a fixed-length sequence of values, and a slice is a sequence of values with an arbitrary length. That is, a slice can be of a variable length, determined at run time, whereas an array has a fixed length known at compile time. Slices have another interesting property, which is that you can destructure slices into as many nonoverlapping subslices as desired; this can be convenient for implementing algorithms that use divide-and-conquer or recursive strategies.

Working with arrays can, at times, be tricky in Rust because knowing the length of a sequence at compile time requires the information to be passed to the compiler at compile time and present in the type signature. As of Rust 1.51, it’s possible to use a feature called const generics (discussed in greater detail in chapter 10) to define generic arrays of arbitrary length but only at compile time.
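As a brief sketch of what const generics look like in practice, the following function accepts an array of any length, with the length N carried in the type and resolved at compile time. The function name sum_array is hypothetical.

```rust
// Hypothetical helper: N is a const generic parameter, so a separate
// version of this function is compiled for each array length used.
fn sum_array<const N: usize>(values: [u32; N]) -> u32 {
    values.iter().sum()
}

fn main() {
    // N is inferred as 3 here.
    println!("{}", sum_array([1, 2, 3]));
    // N is inferred as 8 here.
    println!("{}", sum_array([10; 8]));
}
```

If the length is only known at run time, a slice parameter (&[u32]) is the better fit.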

Let’s illustrate the difference between slices and arrays with the following code.

Listing 4.2 Creating an array and a slice

let array = [0u8; 64];        // The type here is [u8; 64], an array, initialized with zeroes
let slice: &[u8] = &array;    // This borrows a slice of the array

In this code, we’ve initialized a byte array containing 64 elements, all of which are zero. 0u8 is shorthand for an unsigned integral type, 8 bits in length, with a value of 0. 0 is the value, and u8 is the type.

On the second line, we’re borrowing the array as a slice. Up until now, this hasn’t been particularly interesting. You can do some slightly more interesting things with slices, such as borrowing twice:

// Splits and borrows a slice twice, destructuring it into two
// separate, nonoverlapping subslices
let (first_half, second_half) = slice.split_at(32);
println!(
    "first_half.len()={} second_half.len()={}",
    first_half.len(),
    second_half.len()
);

The preceding code is calling the split_at() function, which is part of Rust’s core library and implemented for all slices, arrays, and vectors. split_at() destructures the slice (which is already borrowed from array) and gives us two nonoverlapping slices that correspond to the first and second half of the original array.
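To sketch how split_at() supports a divide-and-conquer strategy, the following hypothetical function, max_recursive, finds the largest value in a slice by repeatedly splitting it in half. It is an illustration of the pattern only; for real code, the iterator method max() is simpler.

```rust
// Hypothetical helper: finds the maximum by splitting the slice into
// two nonoverlapping halves and recursing on each half.
fn max_recursive(values: &[i32]) -> Option<i32> {
    match values.len() {
        0 => None,
        1 => Some(values[0]),
        n => {
            // For n >= 2, both halves contain at least one element.
            let (left, right) = values.split_at(n / 2);
            let left_max = max_recursive(left)?;
            let right_max = max_recursive(right)?;
            Some(left_max.max(right_max))
        }
    }
}

fn main() {
    let values = [3, 9, 2, 7];
    println!("{:?}", max_recursive(&values));
}
```

No data is copied at any level of the recursion; each call only receives a pointer and a length.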

This concept of destructuring is important in Rust because you may find yourself in situations where you need to borrow a portion of an array or slice. In fact, you can borrow the same slice or array multiple times using this pattern, as slices don’t overlap. One common use case for this is parsing or decoding text or binary data. For example:

let wordlist = "one,two,three,four";
for word in wordlist.split(',') {
    println!("word={}", word);
}

Looking at the preceding code, it may be immediately obvious that we’ve taken a string, split it on commas, and then printed each word within that string. The output from this code is the following:

word=one
word=two
word=three
word=four

What’s worth noting about the preceding code is that there’s no heap allocation happening. All of the memory is allocated on the stack, of a fixed length known at compile time, with no calls to malloc() under the hood. This is the equivalent of working with raw C pointers: there’s no reference counting or garbage collection involved and, therefore, none of the overhead. And unlike code that uses C pointers, the result is succinct and safe.

Slices, additionally, have a number of optimizations for working with contiguous regions of memory. One such optimization is the copy_from_slice() method, which works on slices. A call to copy_from_slice() from the standard library uses the memcpy() function to copy memory, as shown in the following listing.

Listing 4.3 Snippet of slice/mod.rs, from http://mng.bz/5oRO

pub fn copy_from_slice(&mut self, src: &[T])
where
    T: Copy,
{
    // Code intentionally omitted

    // SAFETY: `self` is valid for `self.len()` elements by definition,
    // and `src` was checked to have the same length. The slices cannot
    // overlap because mutable references are exclusive.
    unsafe {
        ptr::copy_nonoverlapping(
            src.as_ptr(),
            self.as_mut_ptr(),
            self.len()
        );
    }
}

In the preceding listing, which comes from Rust’s core library, ptr::copy_nonoverlapping() is semantically equivalent to the C library’s memcpy() and is typically compiled down to it. On some platforms, memcpy() has additional optimizations beyond what you might be able to accomplish with normal code. Other optimized functions are fill() and fill_with(), which the compiler can often lower to memset() when filling memory.
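From the caller’s side, these methods look like ordinary safe slice operations. The following sketch uses a hypothetical function, build_buffer, that copies a 4-byte header into the front of an 8-byte buffer and fills the remainder with a padding byte; the sizes and the 0xff padding value are arbitrary choices for the example.

```rust
// Hypothetical helper: builds an 8-byte buffer from a 4-byte header.
fn build_buffer(header: &[u8; 4]) -> [u8; 8] {
    let mut buffer = [0u8; 8];
    // copy_from_slice() panics if the two slices differ in length,
    // so the destination is narrowed to exactly 4 elements.
    buffer[..4].copy_from_slice(header);
    // fill() sets every element of the remaining subslice.
    buffer[4..].fill(0xff);
    buffer
}

fn main() {
    let buffer = build_buffer(&[1, 2, 3, 4]);
    println!("{:?}", buffer);
}
```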

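Let’s review the core attributes of arrays and slices: an array is a fixed-length sequence of values whose length is known at compile time; a slice is a borrowed sequence of values whose length is determined at run time; slices can be destructured into nonoverlapping subslices; and slice operations such as copy_from_slice() and fill() are optimized for contiguous regions of memory.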

4.3 Vectors

Vectors are, arguably, Rust’s most important data type (the next most important being String, which is based on Vec). When working with data in Rust, you’ll find yourself frequently creating vectors when you need a resizable sequence of values. If you’re coming from C++, you’ve likely heard the term vectors before, and in many ways Rust’s vector type is very similar to what you’d find in C++. Vectors serve as a general-purpose container for just about any kind of sequence.

Vectors are one of the ways to allocate memory on the heap in Rust (another being smart pointers, like Box; smart pointers are covered in greater detail in chapter 5). Vectors have a few internal optimizations to limit excessive allocations, such as allocating memory in blocks. Additionally, in nightly Rust, you can supply a custom allocator (discussed in greater detail in chapter 5) to implement your own memory allocation behavior.
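When the final size is known ahead of time, you can also reserve the space yourself. The sketch below uses Vec::with_capacity() in a hypothetical function named squares; the exact growth strategy Vec uses when it does have to reallocate is an implementation detail and is not shown here.

```rust
// Hypothetical helper: builds a vector of the first n squares.
fn squares(n: usize) -> Vec<usize> {
    // Reserve room for n elements up front so the pushes below do not
    // need to reallocate.
    let mut values = Vec::with_capacity(n);
    for i in 0..n {
        values.push(i * i);
    }
    values
}

fn main() {
    let values = squares(10);
    // capacity() is guaranteed to be at least the requested amount.
    println!("len={} capacity={}", values.len(), values.capacity());
}
```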

4.3.1 Diving deeper into Vec

Vec inherits the methods of slices because we can obtain a slice reference from a vector. Rust does not have inheritance in the sense of object-oriented programming, but rather Vec is a special type that is both a Vec and a slice at the same time. For example, let’s take a look at the standard library implementation for as_slice().

Listing 4.4 Snippet of vec/mod.rs, from http://mng.bz/6nRe

pub fn as_slice(&self) -> &[T] {
    self
}

The preceding code listing is performing a special conversion that (under normal circumstances) wouldn’t work. It’s taking self, which is Vec<T> in the preceding code, and simply returning it as &[T]. If you try to compile the same code yourself, it will fail.

How does this work? Rust provides a trait called Deref (and its mutable companion DerefMut), which may be used by the compiler to coerce one type into another, implicitly. Once implemented for a given type, that type will also automatically implement all the methods of the dereferenced type. In the case of Vec, Deref and DerefMut are implemented in the Rust standard library, as shown in the following listing.

Listing 4.5 Snippet of the Deref implementation for Vec, from http://mng.bz/6nRe

impl<T, A: Allocator> ops::Deref for Vec<T, A> {
    type Target = [T];
 
    fn deref(&self) -> &[T] {
        unsafe { slice::from_raw_parts(self.as_ptr(), self.len) }
    }
}
 
impl<T, A: Allocator> ops::DerefMut for Vec<T, A> {
    fn deref_mut(&mut self) -> &mut [T] {
        unsafe { slice::from_raw_parts_mut(self.as_mut_ptr(), self.len) }
    }
}

In the preceding code listing, dereferencing the vector will coerce it into a slice from its raw pointer and length. It should be noted that such an operation is temporary—that is to say, a slice cannot be resized, and the length is provided to the slice at the time of dereferencing.
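The same mechanism is available to your own types. As a minimal sketch, the hypothetical Sentence type below wraps a Vec<String> and implements Deref with a slice as its target, which makes slice methods such as len(), first(), and join() callable directly on a Sentence.

```rust
use std::ops::Deref;

// Hypothetical wrapper type, used only for illustration.
struct Sentence {
    words: Vec<String>,
}

impl Sentence {
    fn new(words: &[&str]) -> Self {
        Sentence {
            words: words.iter().map(|w| w.to_string()).collect(),
        }
    }
}

impl Deref for Sentence {
    type Target = [String];

    fn deref(&self) -> &[String] {
        // &Vec<String> coerces to &[String] here.
        &self.words
    }
}

fn main() {
    let sentence = Sentence::new(&["hello", "world"]);
    // None of these methods are defined on Sentence itself; the
    // compiler finds them on [String] through Deref.
    println!("{}", sentence.len());
    println!("{:?}", sentence.first());
    println!("{}", sentence.join(" "));
}
```

Implementing Deref is usually reserved for wrapper or pointer-like types such as this one, since implicit coercion can otherwise make code harder to follow.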

If, for some reason, you took a slice of a vector and resized the vector, the slice’s size would not change. This would only be possible in unsafe code, however, because the borrow checker will not let you borrow a slice from a vector and change the vector at the same time. Take the following to illustrate:

let mut vec = vec![1, 2, 3];
let slice = vec.as_slice();   // Returns &[i32]; vec is borrowed here
vec.resize(10, 0);            // This is a mutable operation
println!("{}", slice[0]);     // This fails to compile

The preceding code will fail to compile, as the borrow checker returns this error:

error[E0502]: cannot borrow `vec` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:5
  |
3 |     let slice = vec.as_slice();
  |                 --- immutable borrow occurs here
4 |     vec.resize(10, 0);
  |     ^^^^^^^^^^^^^^^^^ mutable borrow occurs here
5 |     println!("{}", slice[0]);
  |                    -------- immutable borrow later used here

4.3.2 Wrapping vectors

Some types in Rust merely wrap a Vec, such as String. The String type is a Vec<u8> and dereferences (using the previously mentioned Deref trait) into a str.

Listing 4.6 Snippet of string.rs, from http://mng.bz/orAZ

pub struct String {
    vec: Vec<u8>,
}

Wrapping vectors is a common pattern, as Vec is the preferred way to implement a resizable sequence of any type.
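As a sketch of the wrapping pattern, here is a hypothetical Stack type that stores its items in a Vec and exposes only stack-like operations. The type and its method names are assumptions for illustration, not something from the standard library.

```rust
// Hypothetical wrapper type: a stack backed by a Vec.
struct Stack<T> {
    items: Vec<T>,
}

impl<T> Stack<T> {
    fn new() -> Self {
        Stack { items: Vec::new() }
    }

    // Adds an item to the top of the stack.
    fn push(&mut self, item: T) {
        self.items.push(item);
    }

    // Removes and returns the top item, or None if the stack is empty.
    fn pop(&mut self) -> Option<T> {
        self.items.pop()
    }

    // Borrows the top item without removing it.
    fn peek(&self) -> Option<&T> {
        self.items.last()
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut stack = Stack::new();
    stack.push(1);
    stack.push(2);
    println!("peek={:?} len={}", stack.peek(), stack.len());
    println!("pop={:?}", stack.pop());
}
```

Keeping the Vec private lets the wrapper enforce its own rules while the Vec handles allocation and resizing.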

4.3.3 Types related to vectors

In 90% of cases, you’ll want to use a Vec. In the other 10% of cases, you’ll probably want to use a HashMap (discussed in the next section). Container types other than Vec or HashMap may make sense in certain situations, or cases when you need special optimization, but most likely, a Vec will be sufficient, and using another type will not provide noticeable performance improvements. A quote comes to mind:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

—Donald Knuth

In cases where you are concerned about allocating excessively large regions of contiguous memory or about where the memory is located, you can easily get around this problem by simply stuffing a Box into a Vec (i.e., using Vec<Box<T>>). With that said, there are several other collection types in Rust’s standard library, some of which wrap a Vec internally, and you may occasionally need to use them: VecDeque, LinkedList, HashMap, BTreeMap, HashSet, BTreeSet, and BinaryHeap.

Additional recommendations, including up-to-date performance details of Rust’s core data structures, can be found in the Rust standard library collections reference at https://doc.rust-lang.org/std/collections/index.html.

Tip It’s also reasonable to build your own data structures on top of Vec, should you need to. For an example of how to do this, the BinaryHeap from Rust’s standard library provides a complete example, which is documented at http://mng.bz/n1A5.

4.4 Maps

HashMap is the other container type in Rust that you’ll find yourself using. If Vec is the preferred resizable type of the language, HashMap is the preferred type for cases where you need a collection of items that can be retrieved in constant time, using a key. Rust’s HashMap is not much different from hash maps you may have encountered in other languages, but Rust’s implementation is likely faster and safer than what you might find in other libraries, thanks to some optimizations Rust provides.

HashMap uses the SipHash-1-3 function for hashing, which is also used in Python (starting from 3.4), Ruby, Swift, and Haskell. This function provides good tradeoffs for common cases, but it may be inappropriate for very small or very large keys, such as integral types or very large strings.

It’s also possible to supply your own hash function for use with HashMap. You may want to do this in cases where you want to hash very small or very large keys, but for most cases, the default implementation is adequate.

4.4.1 Custom hashing functions

To use a HashMap with a custom hashing function, you need to first find an existing implementation or write a hash function that implements the necessary traits. HashMap requires that std::hash::BuildHasher, std::hash::Hasher, and std::default::Default are implemented for the hash function you wish to use. Traits are discussed in greater detail in chapter 8.

Let’s examine the implementation of HashMap from the standard library in the following listing.

Listing 4.7 Snippet of HashMap, from http://mng.bz/vPAp

impl<K, V, S> HashMap<K, V, S>
where
    K: Eq + Hash,
    S: BuildHasher,
{
    // Code intentionally omitted
}

In this listing, you can see BuildHasher specified as a trait requirement on the S type parameter. Digging a little deeper, in the following listing, you can see BuildHasher is just a wrapper around the Hasher trait.

Listing 4.8 Snippet of BuildHasher, from http://mng.bz/46RR

pub trait BuildHasher {
    /// Type of the hasher that will be created.
    type Hasher: Hasher;      // Here, there's a requirement on the Hasher trait

    // Code intentionally omitted
}

The BuildHasher and Hasher APIs leave most of the implementation details up to the author of the hash function. For BuildHasher, only a build_hasher() method is required, which returns the new Hasher instance. The Hasher trait only requires two methods: write() and finish(). write() takes a byte slice (&[u8]), and finish() returns an unsigned 64-bit integer representing the computed hash. The Hasher trait also provides a number of methods with default implementations (such as write_u8() and write_u32()), which you get for free if you implement the Hasher trait. It’s worth examining the documentation for the traits themselves at http://mng.bz/QR76 and http://mng.bz/Xqo9 to get a clearer picture of how they work.
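To see how little is required, here is a minimal sketch of a hand-written hasher that uses only the standard library. It implements the 64-bit FNV-1a function, which was chosen for this example because it is short, not because it is a recommended replacement for the default. The names Fnv1a, FnvBuildHasher, and word_lengths are hypothetical; BuildHasherDefault is a standard library helper that provides a BuildHasher for any Hasher that also implements Default.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical hasher type holding the running 64-bit FNV-1a state.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // The 64-bit FNV offset basis.
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    // Mixes each input byte into the state: XOR, then multiply.
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            // The 64-bit FNV prime.
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    // Returns the hash computed so far.
    fn finish(&self) -> u64 {
        self.0
    }
}

// BuildHasherDefault creates a fresh Fnv1a via Default for each hash.
type FnvBuildHasher = BuildHasherDefault<Fnv1a>;

// Hypothetical helper: maps each word to its length in bytes.
fn word_lengths(words: &[&str]) -> HashMap<String, usize, FnvBuildHasher> {
    let mut map = HashMap::default();
    for word in words {
        map.insert(word.to_string(), word.len());
    }
    map
}

fn main() {
    let map = word_lengths(&["one", "three"]);
    println!("{:?}", map.get("three"));
}
```

Note that a simple function like this one offers no protection against deliberately colliding keys, which is one of the tradeoffs to weigh before replacing the default.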

Many crates are available on https://crates.io that already implement a wide variety of hash functions. As an example, in the following listing, let’s construct a HashMap with MetroHash, an alternative to SipHash, designed by J. Andrew Rogers, described at https://www.jandrewrogers.com/2015/05/27/metrohash/. The MetroHash crate already includes the necessary implementation of the std::hash::BuildHasher and std::hash::Hasher traits, which makes this very easy.

Listing 4.9 Code listing for using HashMap with MetroHash

use metrohash::MetroBuildHasher;
use std::collections::HashMap;

// Creates a new HashMap instance, using MetroHash
let mut map = HashMap::<String, String, MetroBuildHasher>::default();
// Inserts a key and value pair into the map, using the Into trait for
// conversion from &str to String
map.insert("hello?".into(), "Hello!".into());

// Retrieves the value from the map, which returns an Option; the {:?}
// argument to println! formats this value using the fmt::Debug trait
println!("{:?}", map.get("hello?"));

4.4.2 Creating hashable types

HashMap can be used with arbitrary keys and values, but the keys must implement the std::cmp::Eq and std::hash::Hash traits. Many traits, such as Eq and Hash, can be automatically derived using the #[derive] attribute. Consider the following example.

Listing 4.10 Code listing for a compound key type

#[derive(Hash, Eq, PartialEq, Debug)]
struct CompoundKey {
    name: String,
    value: i32,
}

The preceding code represents a compound key composed of a name and value. We’re using the #[derive] attribute to derive four traits: Hash, Eq, PartialEq, and Debug. While HashMap only requires Hash and Eq, we need to also derive PartialEq because Eq depends on PartialEq. I’ve also derived Debug, which provides automatic debug print methods. This is extremely convenient for debugging and testing code.

We haven’t discussed #[derive] much in this book yet, but it’s something you’ll use frequently in Rust. We’ll go into more detail on traits and #[derive] in chapters 8 and 9. For now, you should just think of it as an automatic way to generate trait implementations. These trait implementations have the added benefit of being composable: a trait can be derived for a compound type so long as it is already implemented for each of the types that compound type contains.
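To round out the example, the following sketch uses the derived traits by inserting CompoundKey values into a HashMap and looking them up again. The struct is repeated so the example is self-contained, and the key() constructor function is a hypothetical convenience added for this illustration.

```rust
use std::collections::HashMap;

#[derive(Hash, Eq, PartialEq, Debug)]
struct CompoundKey {
    name: String,
    value: i32,
}

// Hypothetical convenience constructor.
fn key(name: &str, value: i32) -> CompoundKey {
    CompoundKey {
        name: name.to_string(),
        value,
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(key("alpha", 1), "first");
    map.insert(key("alpha", 2), "second");

    // Two keys are equal only if both fields are equal.
    println!("{:?}", map.get(&key("alpha", 1)));
    println!("{:?}", map.get(&key("beta", 1)));
    // The derived Debug implementation prints the field names and values.
    println!("{:?}", key("alpha", 1));
}
```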

4.5 Rust types: Primitives, structs, enums, and aliases

Being a strongly typed language, Rust provides several ways to model data. At the bottom are primitive types, which handle our most basic units of data, like numeric values, bytes, and characters. Moving up from there, we have structs and enums, which are used to encapsulate other types. Finally, aliases let us rename and combine other types into new types.

To summarize, in Rust, there are four categories of types: primitives, structs, enums, and aliases.

4.5.1 Using primitive types

Primitive types are provided by the Rust language and core library. These are equivalent to the primitives you’d find in any other strongly typed language, with a few exceptions, which we’ll review in this section. The core primitive types are summarized in table 4.2, which includes integers, floats, tuples, and arrays.

Table 4.2 Summary of primitive types in Rust

| Class | Kind | Description |
| --- | --- | --- |
| Scalar | Integers | Can be either a signed or unsigned integer, anywhere from 8-128 bits in length (bound to a byte; i.e., 8 bits) |
| Scalar | Sizes | An architecture-specific size type, which can be signed or unsigned |
| Scalar | Floating point | 32- or 64-bit floating point numbers |
| Compound | Tuples | Fixed-length collection of types or values, which can be destructured |
| Sequence | Arrays | Fixed-length sequence of values of a type that can be sliced |

Integer types

Integer types can be recognized by their signage designation (either i or u for signed and unsigned, respectively), followed by the number of bits. Sizes begin with i or u, followed by the word size. Floating-point types begin with f, followed by the number of bits. Table 4.3 summarizes the primitive integer types.

Table 4.3 Summary of integer-type identifiers

| Length | Signed identifier | Unsigned identifier | C equivalent |
| --- | --- | --- | --- |
| 8 bits | i8 | u8 | char and unsigned char |
| 16 bits | i16 | u16 | short and unsigned short |
| 32 bits | i32 | u32 | int and unsigned int |
| 64 bits | i64 | u64 | long, long long, unsigned long, and unsigned long long, depending on the platform |
| 128 bits | i128 | u128 | Extended integers are nonstandard C but provided as __int128 and unsigned __int128 with GCC and Clang |

The type for an integer literal can be specified by appending the type identifier. For example, 0u8 denotes an unsigned 8-bit integer with a value of 0. Integer values can be prefixed with 0b, 0o, 0x, or b for binary, octal, hexadecimal, and byte literals. Consider the following listing, which prints each value as a decimal (base 10) integer.

Listing 4.11 Code listing with integer literals

let value = 0u8;
println!("value={}, length={}", value, std::mem::size_of_val(&value));
let value = 0b1u16;
println!("value={}, length={}", value, std::mem::size_of_val(&value));
let value = 0o2u32;
println!("value={}, length={}", value, std::mem::size_of_val(&value));
let value = 0x3u64;
println!("value={}, length={}", value, std::mem::size_of_val(&value));
let value = 4u128;
println!("value={}, length={}", value, std::mem::size_of_val(&value));
 
println!("Binary (base 2)         0b1111_1111={}", 0b1111_1111);
println!("Octal (base 8)          0o1111_1111={}", 0o1111_1111);
println!("Decimal (base 10)         1111_1111={}", 1111_1111);
println!("Hexadecimal (base 16)   0x1111_1111={}", 0x1111_1111);
println!("Byte literal            b'A'={}", b'A');

When we run this code, we get the following output.

Listing 4.12 Output from listing 4.11

value=0, length=1
value=1, length=2
value=2, length=4
value=3, length=8
value=4, length=16
Binary (base 2)         0b1111_1111=255
Octal (base 8)          0o1111_1111=2396745
Decimal (base 10)         1111_1111=11111111
Hexadecimal (base 16)   0x1111_1111=286331153
Byte literal            b'A'=65

Size types

For size types, the identifiers are usize and isize. These are platform-dependent sizes, which are typically 32 or 64 bits in length for 32- and 64-bit systems, respectively. usize is equivalent to C’s size_t, and isize is provided to permit signed arithmetic with sizes. In the Rust standard library, functions that return or accept a length use usize.

Arithmetic on primitives

Many languages permit unchecked arithmetic on primitive types. In C and C++, in particular, many arithmetic operations have undefined results and produce no errors. One such example is division by zero. Consider the following C program.

Listing 4.13 Code of divide_by_zero.c

#include <stdio.h>
 
int main() {
    printf("%d\n", 1 / 0);
}

If you compile and run this code with clang divide_by_zero.c && ./a.out, it will print a value that appears random. Both Clang and GCC happily compile this code, and they both print a warning, but there is no run-time check for an undefined operation.

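In Rust, arithmetic is checked by default: integer division by zero always panics, and integer overflow panics in debug builds (release builds wrap instead unless overflow checks are enabled). Consider the following Rust program: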

// println!("{}", 1 / 0);         // Does not compile

let one = 1;
let zero = 0;
// println!("{}", one / zero);    // Still doesn't compile

let one = 1;
let zero = one - 1;
// println!("{}", one / zero);    // Still doesn't compile

let one = { || 1 }();
let zero = { || 0 }();
println!("{}", one / zero);       // The code panics here!

As the preceding code shows, Rust’s compiler is pretty good at catching these errors at compile time, so we need to trick it into letting the code compile and run. Here, we do this by initializing each variable from the return value of a closure. Another way to do it would be to just create a regular function that returns the desired value. In any case, running the program produces the following output:

Running `target/debug/unchecked-arithmetic`
thread 'main' panicked at 'attempt to divide by zero', src/main.rs:14:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

If you need more control over arithmetic in Rust, the primitive types provide several methods for handling such operations. For example, to safely handle division by zero, you can use the checked_div() method, which returns an Option:

assert_eq!((100i32).checked_div(1i32), Some(100i32));    // 100 / 1 = 100
assert_eq!((100i32).checked_div(0i32), None);            // 100 / 0: the result is undefined
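Because checked_div() returns an Option, it composes naturally with Rust’s usual error-handling tools. The sketch below wraps it in a hypothetical safe_div function and handles the None case with match. Besides division by zero, checked_div() also returns None for the one signed division that overflows, i32::MIN divided by -1.

```rust
// Hypothetical helper: returns None instead of panicking.
fn safe_div(numerator: i32, denominator: i32) -> Option<i32> {
    numerator.checked_div(denominator)
}

fn main() {
    for denominator in [4, 0] {
        match safe_div(100, denominator) {
            Some(result) => println!("100 / {} = {}", denominator, result),
            None => println!("100 / {} cannot be computed", denominator),
        }
    }
}
```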

For scalar types (integers, sizes, and floats), Rust provides a collection of methods that provide basic arithmetic operations (e.g., division, multiplication, addition, and subtraction) in checked, unchecked, overflowing, and wrapping forms.
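As a side-by-side sketch of these forms, the hypothetical function add_forms below applies several variants of addition to the same pair of u8 values. It also includes the saturating form, which clamps the result at the type’s bounds; the unchecked form is omitted because it requires unsafe code.

```rust
// Hypothetical helper: returns the result of adding a and b using the
// checked, wrapping, overflowing, and saturating forms, in that order.
fn add_forms(a: u8, b: u8) -> (Option<u8>, u8, (u8, bool), u8) {
    (
        a.checked_add(b),     // None on overflow
        a.wrapping_add(b),    // wraps around (modular arithmetic)
        a.overflowing_add(b), // wrapped value plus an overflow flag
        a.saturating_add(b),  // clamps to u8::MAX on overflow
    )
}

fn main() {
    println!("{:?}", add_forms(1, 2));
    println!("{:?}", add_forms(u8::MAX, 1));
}
```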

When you want to achieve compatibility with the behavior from languages like C, C++, Java, C#, and others, the method you probably want to use is the wrapping form, which performs modular arithmetic and is compatible with the C-equivalent operations. Keep in mind that overflow on signed integers in C is undefined. Here’s an example of modular arithmetic in Rust:

assert_eq!(0xffu8.wrapping_add(1), 0);
assert_eq!(0xffffffffu32.wrapping_add(1), 0);
assert_eq!(0u32.wrapping_sub(1), 0xffffffff);
assert_eq!(0x80000000u32.wrapping_mul(2), 0);

The full listing of arithmetic functions for each primitive is available in the Rust documentation. For i32, it can be found at https://doc.rust-lang.org/std/primitive.i32.html.

4.5.2 Using tuples

Rust’s tuples are similar to what you’ll find in other languages. A tuple is a fixed-length sequence of values, and the values can each have different types. Tuples in Rust are not reflective; unlike arrays, you can’t iterate over a tuple, take a slice of a tuple, or determine the type of its components at run time. Tuples are essentially a form of syntax sugar in Rust, and while useful, they are quite limited.

Consider the following example of a tuple:

let tuple = (1, 2, 3);

This code looks somewhat similar to what you might expect for an array, except for the limitations mentioned above (you can’t slice, iterate, or reflect tuples). To access individual elements within the tuple, you can refer to them by their position, starting at 0:

println!("tuple = ({}, {}, {})", tuple.0, tuple.1, tuple.2);   

This prints "tuple = (1, 2, 3)".

Alternatively, you can use match, which provides temporary destructuring, provided there’s a pattern match (pattern matching is discussed in greater detail in chapter 8):

match tuple {
    (one, two, three) => println!("{}, {}, {}", one, two, three),    
}

This prints "1, 2, 3".

We can also destructure a tuple into its parts with the following syntax, which moves the values out of the tuple:

let (one, two, three) = tuple;
println!("{}, {}, {}", one, two, three);    

This prints "1, 2, 3".

In my experience, the most common use of tuples is returning multiple values from a function. For example, consider this succinct swap() function:

fn swap<A, B>(a: A, b: B) -> (B, A) {
    (b, a)
}
 
fn main() {
    let a = 1;
    let b = 2;
 
    println!("{:?}", swap(a, b));    
}

This prints "(2, 1)".
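As another sketch of returning multiple values, the following hypothetical min_max() function returns the smallest and largest values of a slice as a tuple, which the caller destructures on the spot. The function name and sample data are assumptions made for illustration.

```rust
/// Hypothetical helper: returns the smallest and largest value in a slice,
/// or None if the slice is empty.
fn min_max(values: &[i32]) -> Option<(i32, i32)> {
    let first = *values.first()?;
    Some(values.iter().fold((first, first), |(lo, hi), &v| {
        (lo.min(v), hi.max(v))
    }))
}

fn main() {
    // Destructure the returned tuple directly in the pattern.
    if let Some((lo, hi)) = min_max(&[3, -1, 7, 4]) {
        println!("min = {}, max = {}", lo, hi);
    }
}
```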

Tip It’s recommended that you don’t make tuples with more than 12 elements, although there is no strict upper limit to the length of a tuple. The standard library only provides trait implementations for tuples with up to 12 elements.

4.5.3 Using structs

Structs are the main building block in Rust. They are composite data types, which can contain any set of types and values. They are similar in nature to C structs or classes in object-oriented languages. They can be composed generically in a fashion similar to templates in C++ or generics in Java, C#, or TypeScript (generics are covered in greater detail in chapter 8).

You should use a struct any time you need to

You are not required to use structs. You can write APIs with functions only, if you desire, in a fashion similar to C APIs. Additionally, structs are only needed to define implementations—they are not for specifying interfaces. This differs from object-oriented languages, like C++, Java, and C#.

The simplest form of a struct is an empty struct:

struct EmptyStruct {}
 
struct AnotherEmptyStruct;     

A unit struct, which ends with a semicolon and has no braces

Empty structs (or unit structs) are something you may encounter occasionally. Another form of struct is the tuple struct, which looks like this:

struct TupleStruct(String);
 
let tuple_struct = TupleStruct("string value".into());    
println!("{}", tuple_struct.0);                           

Initializes the struct similarly to a tuple

The first tuple element can be accessed with .0, the second with .1, the third with .2, and so on.

A tuple struct is a special form of struct, which behaves like a tuple. The main difference between a tuple struct and a regular struct is that, in a tuple struct, the values have no names, only types. Notice how a tuple struct has a semicolon (;) at the end of the declaration, which is not required for regular structs (except for an empty declaration). Tuple structs can be convenient in certain cases by allowing you to omit the field names (thereby shaving a few characters off your source code), but they also create ambiguity.

A typical struct has a list of elements with names and types, like this:

struct TypicalStruct {
  name: String,
  value: String,
  number: i32,
}

Each element within a struct has module visibility by default. That means values within the struct are accessible anywhere within the scope of the current module. Visibility can be set on a per-element basis:

pub struct MixedVisibilityStruct {    
  pub name: String,                   
  pub(crate) value: String,           
  pub(super) number: i32,             
}

A public struct, visible outside the crate

This element is public, accessible outside of the crate.

This element is public anywhere within the crate.

This element is accessible anywhere within the parent scope.

Most of the time, you shouldn’t need to make struct elements public. An element can be read and modified by any code within the scope to which that element is visible. The default visibility (which is equivalent to pub(self)) allows any code within the same module to access and modify the elements within a struct.

Visibility semantics also apply to the structs themselves, just like their member elements. For a struct to be visible outside of a crate (i.e., to be consumed from a library), it must be declared with pub struct MyStruct { ... }. A struct that’s not explicitly declared as public won’t be accessible outside of the crate (this also applies generally to functions, traits, and any other declarations).
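To make the visibility rules a little more concrete, here is a minimal sketch with a hypothetical account module. The module, struct, and method names are invented for illustration; the point is only that a private element can be changed through the module’s own methods but not from outside it.

```rust
mod account {
    pub struct Account {
        pub owner: String, // readable and writable from outside the module
        balance: i64,      // private: only code inside `account` can touch it
    }

    impl Account {
        pub fn new(owner: &str) -> Self {
            Self { owner: owner.into(), balance: 0 }
        }

        pub fn deposit(&mut self, amount: i64) {
            self.balance += amount;
        }

        pub fn balance(&self) -> i64 {
            self.balance
        }
    }
}

fn main() {
    let mut acct = account::Account::new("Ferris");
    acct.deposit(50);
    // acct.balance = 1_000_000; // would not compile: `balance` is private
    println!("{} has {}", acct.owner, acct.balance());
}
```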

When you declare a struct, you’ll probably want to derive a few standard trait implementations:

#[derive(Debug, Clone, Default)]
struct DebuggableStruct {
    string: String,
    number: i32,
}

In this code, we’re deriving the Debug, Clone, and Default traits. These traits are summarized as follows:

You can implement these traits yourself if you wish (such as in cases where you want to customize their behavior), but so long as all elements within a struct implement each trait, you can derive them automatically and save a lot of typing.

With these three traits derived for the preceding example, we can now do the following:

let debuggable_struct = DebuggableStruct::default();
println!("{:?}", debuggable_struct);                  
println!("{:?}", debuggable_struct.clone());          

Prints DebuggableStruct { string: "", number: 0 }

Also prints DebuggableStruct { string: "", number: 0 }
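For cases where the derived behavior isn’t what you want, a trait can be implemented by hand instead. The following sketch uses a hypothetical Connection struct (not part of the earlier example) and writes Default manually so that the defaults are something other than an empty string and zero, while still deriving Debug and Clone.

```rust
#[derive(Debug, Clone)]
struct Connection {
    host: String,
    port: u16,
}

// Implemented by hand because the derived Default would produce "" and 0.
impl Default for Connection {
    fn default() -> Self {
        Self {
            host: "localhost".into(),
            port: 8080,
        }
    }
}

fn main() {
    let conn = Connection::default();
    println!("{:?}", conn);
}
```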

To define methods for a struct, you will implement them using the impl keyword:

impl DebuggableStruct {
  fn increment_number(&mut self) {    
    self.number += 1;
  }
}

A function that takes a mutable reference to self

This method takes a mutable reference to our struct and increments its number element by 1. Another way to do this would be to consume the struct and return it from the function:

impl DebuggableStruct {
  fn incremented_number(mut self) -> Self {     
    self.number += 1;
    self
  }
}

A function that takes an owned mutable instance of self

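There’s a subtle difference between these two implementations: the first modifies the struct in place through a borrow, while the second takes ownership of the struct and hands it back to the caller. For a caller who keeps the returned value, the result is the same. There may be cases when you want a method to consume its input, but in most cases, the first version (using &mut self) is preferred.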
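The difference shows up mostly at the call site. The following sketch uses a small hypothetical Counter struct with the same two method shapes: the &mut self version needs a mutable binding, while the consuming version can be chained on a temporary value.

```rust
#[derive(Debug, Default)]
struct Counter {
    number: i32,
}

impl Counter {
    // Borrows the struct mutably and changes it in place.
    fn increment_number(&mut self) {
        self.number += 1;
    }

    // Takes ownership of the struct, changes it, and returns it.
    fn incremented_number(mut self) -> Self {
        self.number += 1;
        self
    }
}

fn main() {
    // In-place version: the binding must be declared `mut`.
    let mut a = Counter::default();
    a.increment_number();
    a.increment_number();

    // Consuming version: calls can be chained without a `mut` binding.
    let b = Counter::default().incremented_number().incremented_number();

    println!("{} {}", a.number, b.number);
}
```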

4.5.4 Using enums

Enums can be thought of as a specialized type of struct that contains enumerated mutually exclusive variants. An enum can be one of its variants at a given time. With a struct, all elements of the struct are present. With an enum, only one of the variants is present. An enum can contain any kind of type, not just integral types. The types may be named or anonymous.

This is quite different from enums in languages like C, C++, Java, or C#. In those languages, enums are effectively used as a way to define constant values. Rust’s enums can emulate the enums you might expect from other languages, but they are conceptually different. While C++ has enums, Rust’s enums are more similar to C++’s std::variant than to its enum.

Consider the following enum:

#[derive(Debug)]
enum JapaneseDogBreeds {
    AkitaKen,
    HokkaidoInu,
    KaiKen,
    KishuInu,
    ShibaInu,
    ShikokuKen,
}

For the preceding enum, JapaneseDogBreeds is the name of the enum type, and each of the elements within the enum is a unit-like type. Since the types in the enum don’t exist outside the enum, they are created within the enum. We can run the following code now:

println!("{:?}", JapaneseDogBreeds::ShibaInu);          
println!("{:?}", JapaneseDogBreeds::ShibaInu as u32);   

This prints "ShibaInu".

This prints "4", the 32-bit unsigned integer representation of the enum value.

Casting the enum type to a u32 works because enum types are enumerated. Now, what if we want to go from the number 4 to the enum value? For that, there is no automatic conversion, but we can implement it ourselves using the From trait:

impl From<u32> for JapaneseDogBreeds {
    fn from(other: u32) -> Self {
        match other {
            other if JapaneseDogBreeds::AkitaKen as u32 == other => {
                JapaneseDogBreeds::AkitaKen
            }
            other if JapaneseDogBreeds::HokkaidoInu as u32 == other => {
                JapaneseDogBreeds::HokkaidoInu
            }
            other if JapaneseDogBreeds::KaiKen as u32 == other => {
                JapaneseDogBreeds::KaiKen
            }
            other if JapaneseDogBreeds::KishuInu as u32 == other => {
                JapaneseDogBreeds::KishuInu
            }
            other if JapaneseDogBreeds::ShibaInu as u32 == other => {
                JapaneseDogBreeds::ShibaInu
            }
            other if JapaneseDogBreeds::ShikokuKen as u32 == other => {
                JapaneseDogBreeds::ShikokuKen
            }
            _ => panic!("Unknown breed!"),
        }
    }
}

In the preceding code, we must cast the enum type to a u32 to perform the comparison, and then we return the enum type if there’s a match. In the case where no value matches, we call panic!(), which causes the program to crash. The preceding syntax uses the match guard feature, which lets us attach an if condition to a match arm.

It’s possible to specify the integer value of each enum variant as well. This can be used to achieve behavior similar to C enums:

enum Numbers {
    One = 1,
    Two = 2,
    Three = 3,
}
 
fn main() {
  println!("one={}", Numbers::One as u32);    
}

This prints "one=1". Note that without the as cast, this does not compile because Numbers doesn’t implement std::fmt::Display.

Enum variants may be unit-like, tuple-like, or struct-like:

enum EnumTypes {
    NamedType,
    String,
    NamedString(String),
    StructLike { name: String },
    TupleLike(String, i32),
}

A unit-like variant, which carries no data

Also a unit-like variant; it is merely named String and does not hold a String value

A tuple-like variant with one item, a String

A struct-like variant, with a single element called name

A tuple-like variant with two elements

To clarify, every enum variant has a name. A variant written as a bare identifier, such as String above, is a unit-like variant that happens to share its name with a type; to store a String inside a variant, you need the tuple-like or struct-like form. If you want to emulate the behavior of enums from languages like C, C++, or Java, you’ll be using unit-like variants, which correspond to enumerated integer values and, in an enum made up only of such variants, can be cast to an integer type.

As a general rule, it’s good practice to avoid naming a variant after an existing type, as it can be confusing.

4.5.5 Using aliases

Aliases are a language feature in Rust that allows you to provide an alternative and equivalent name for any other type. They are equivalent to C and C++’s typedef or the C++ using keyword. Defining an alias does not create a new type.

Aliases have two common uses:

For example, I may want to create a type alias for a hash map I frequently use within my crate:

pub(crate) type MyMap = std::collections::HashMap<String, MyStruct>;

Now, rather than having to type the full std::collections::HashMap<String, MyStruct>, I can use MyMap instead.
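Here’s a small self-contained sketch of the same idea. The WordCounts alias and the count_words() function are hypothetical names chosen for illustration. Because an alias does not create a new type, the value returned as WordCounts can be assigned to a binding annotated with the full HashMap type.

```rust
use std::collections::HashMap;

// Hypothetical alias: maps a word to the number of times it occurs.
type WordCounts = HashMap<String, usize>;

fn count_words(text: &str) -> WordCounts {
    // Associated functions are available through the alias.
    let mut counts = WordCounts::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // The alias and the full type are interchangeable.
    let counts: HashMap<String, usize> = count_words("a b a");
    println!("{:?}", counts.get("a"));
}
```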

For libraries, it’s common practice to export public type aliases with sensible defaults for type construction when generics are used. It can be difficult at times to determine which types are required for a given interface, and aliases provide one way for library authors to signal that information.

In the dryoc crate, I provide a number of type aliases, for convenience. The API makes heavy use of generics. One such example is shown in the following listing.

Listing 4.14 Snippet for kdf.rs, from http://mng.bz/yZAp

/// Stack-allocated key type alias for key derivation with [`Kdf`].
pub type Key = StackByteArray<CRYPTO_KDF_KEYBYTES>;
/// Stack-allocated context type alias for key derivation with [`Kdf`].
pub type Context = StackByteArray<CRYPTO_KDF_CONTEXTBYTES>;

In the preceding code, the Key and Context type aliases are provided within this module, so the user of this library does not need to worry about implementation details.

4.6 Error handling with Result

Rust provides a few features to make error handling easier. These features are based on an enum called Result, defined in the following listing.

Listing 4.15 Snippet of std::result::Result, from http://mng.bz/M97Q

pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

A Result represents an operation that can either succeed (returning a result) or fail (returning an error). You will quickly become accustomed to seeing Result as the return type for many functions in Rust.

You will likely want to create your own error type in your crate. That type could be either an enum containing all the different kinds of errors you expect or simply a struct with something actionable, such as an error message. I, being a simple person, prefer to just provide a helpful message and move on with my life. Here’s a very simple error struct:

#[derive(Debug)]
struct Error {
    message: String,
}

Within your crate, you’ll need to decide what type of errors you want your functions to return. My suggestion is to have your crate return its own error type. This is convenient for anyone else using your crate because it will be clear to them where the error originates from.

To make this pattern work with the ? operator, you’ll need to implement the From trait for your error type so that errors returned by the functions you call can be converted into the error type returned by your own function, in cases where the types differ. Doing this is relatively easy because the compiler will tell you when it’s necessary.

Now, within your crate, suppose you have a function that reads the contents of a file, like this:

fn read_file(name: &str) -> Result<String, Error> {
    use std::fs::File;
    use std::io::prelude::*;
 
    let mut file = File::open(name)?;      
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;   
    Ok(contents)
}

Using the ? operator here for implicit error handling

Using the ? operator here too

In the preceding code, we have a function that opens a file, name; reads the contents into a string; and returns the contents as a result. We use the ? operator twice, which works by unwrapping the value upon success or returning the error from the enclosing function immediately. Both File::open and read_to_string() use the std::io::Error type, so we’ve provided the following From implementation, which permits this conversion automatically:

impl From<std::io::Error> for Error {
    fn from(other: std::io::Error) -> Self {
        Self {
            message: other.to_string(),
        }
    }
}
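The enum-based error type mentioned earlier follows the same pattern, with one From implementation per source error. The following is only a sketch under a few assumptions: the read_number() function, the file name, and the variant names are all hypothetical, and the error enum simply wraps the underlying errors.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum Error {
    Io(std::io::Error),
    Parse(ParseIntError),
}

impl From<std::io::Error> for Error {
    fn from(other: std::io::Error) -> Self {
        Error::Io(other)
    }
}

impl From<ParseIntError> for Error {
    fn from(other: ParseIntError) -> Self {
        Error::Parse(other)
    }
}

/// Hypothetical helper: reads a file expected to contain a single integer.
fn read_number(name: &str) -> Result<i64, Error> {
    let contents = std::fs::read_to_string(name)?; // io::Error -> Error::Io
    let number = contents.trim().parse::<i64>()?;  // ParseIntError -> Error::Parse
    Ok(number)
}

fn main() {
    match read_number("number.txt") {
        Ok(n) => println!("read {}", n),
        Err(Error::Io(e)) => println!("I/O error: {}", e),
        Err(Error::Parse(e)) => println!("parse error: {}", e),
    }
}
```

The trade-off compared with the message-only struct is a little more code in exchange for callers being able to match on the kind of failure.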

4.7 Converting types with From/Into

Rust provides two very useful traits as part of its core library: the From and Into traits. If you browse the Rust standard library, you may notice that From and Into are implemented for a great number of different types because of the usefulness of these traits. You will frequently encounter these traits when working with Rust.

These traits provide a standard way to convert between types. They are occasionally used by the compiler to automatically convert types on your behalf.

As a general rule, you only need to implement the From trait and almost never Into. The Into trait is the reciprocal of From, and the standard library provides it automatically for any type that implements From. There is one exception to this rule: versions of Rust prior to 1.41 had slightly stricter rules, which didn’t allow implementing From when the conversion destination was an external type, so Into had to be implemented directly in that case.

From is preferred because it doesn’t require specifying the destination type, resulting in slightly simpler syntax. The signature for the From trait (from the standard library) is as follows:

pub trait From<T>: Sized {
    /// Performs the conversion.
    fn from(_: T) -> Self;
}

Let’s create a very simple String wrapper and implement this trait for our type:

struct StringWrapper(String);
 
impl From<&str> for StringWrapper {
    fn from(other: &str) -> Self {
        Self(other.into())            
    }
}
 
fn main() {
    println!("{}", StringWrapper::from("Hello, world!").0);
}

Returns a copy of the string, wrapped in a new StringWrapper

In the preceding code, we’re allowing conversion from a &str, a borrowed string, into a StringWrapper. To convert the borrowed string into an owned String, we just call into(), which comes from the Into<String> implementation for &str. In this example, we use both From and Into.
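Because implementing From also makes Into available, the same wrapper can be created with into() or accepted through an Into bound. The sketch below repeats the StringWrapper definition so that it stands alone; the shout() function is a hypothetical example of an API that accepts anything convertible into a StringWrapper.

```rust
struct StringWrapper(String);

impl From<&str> for StringWrapper {
    fn from(other: &str) -> Self {
        Self(other.into())
    }
}

// Hypothetical function: accepts any type that converts into a StringWrapper.
fn shout(input: impl Into<StringWrapper>) -> String {
    let wrapper: StringWrapper = input.into();
    wrapper.0.to_uppercase()
}

fn main() {
    // Into<StringWrapper> for &str comes for free from the From impl above.
    let wrapped: StringWrapper = "Hello, world!".into();
    println!("{}", wrapped.0);
    println!("{}", shout("hello"));
}
```

Note that into() needs the destination type to be known, here from the type annotation, which is part of why From often reads more simply.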

In practice, you will find yourself needing to convert between types for a variety of reasons. One such case is for handling errors when using Result. If you call a function that returns a result and use the ? operator within that function, you’ll need to provide a From implementation if the error type returned by the inner function differs from the error type used by the Result.

Consider the following code:

use std::{fs::File, io::Read};
 
struct Error(String);
 
fn read_file(name: &str) -> Result<String, Error> {
    let mut f = File::open(name)?;
    let mut output = String::new();
 
    f.read_to_string(&mut output)?;
 
    Ok(output)
}

The preceding code attempts to read a file into a string and returns the result. We have a custom error type, which just contains a string. The code, as is, does not compile:

error[E0277]: `?` couldn't convert the error to `Error`
 --> src/main.rs:6:33
  |
5 | fn read_file(name: &str) -> Result<String, Error> {
  |                             --------------------- expected `Error`
  because of this
6 |     let mut f = File::open(name)?;
  |                                 ^ the trait `From<std::io::Error>` is
  not implemented for `Error`
  |
  = note: the question mark operation (`?`) implicitly performs a conversion
  on the error value using the `From` trait
  = note: required by `from`
 
error[E0277]: `?` couldn't convert the error to `Error`
 --> src/main.rs:9:34
  |
5 | fn read_file(name: &str) -> Result<String, Error> {
  |                             --------------------- expected `Error`
  because of this
...
9 |     f.read_to_string(&mut output)?;
  |                                  ^ the trait `From<std::io::Error>` is
  not implemented for `Error`
  |
  = note: the question mark operation (`?`) implicitly performs a conversion
  on the error value using the `From` trait
  = note: required by `from`

To make it compile, we need to implement the From trait for Error such that the compiler knows how to convert std::io::Error into our own custom error. The implementation looks like this:

impl From<std::io::Error> for Error {
    fn from(other: std::io::Error) -> Self {
        Self(other.to_string())
    }
}

Now, if we compile and run the code, it works as expected.

4.7.1 TryFrom and TryInto

In addition to the From and Into traits, there are TryFrom and TryInto. These traits are nearly identical, except they are for cases in which the type conversion may fail. The conversion methods in these traits return Result, whereas with From and Into, there is no way to return an error aside from panicking, which causes the entire program to crash.
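As a minimal sketch of a fallible conversion, the following example implements TryFrom<u32> for a hypothetical Direction enum. The enum, its integer mapping, and the use of String as the error type are all assumptions made to keep the example short; a real crate would more likely use its own error type.

```rust
use std::convert::{TryFrom, TryInto};

#[derive(Debug, PartialEq)]
enum Direction {
    North,
    East,
    South,
    West,
}

impl TryFrom<u32> for Direction {
    type Error = String;

    fn try_from(value: u32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Direction::North),
            1 => Ok(Direction::East),
            2 => Ok(Direction::South),
            3 => Ok(Direction::West),
            // Unknown values become an error instead of a panic.
            other => Err(format!("{} is not a valid direction", other)),
        }
    }
}

fn main() {
    println!("{:?}", Direction::try_from(2));

    // TryInto is provided automatically, just as Into is for From.
    let parsed: Result<Direction, String> = 7u32.try_into();
    println!("{:?}", parsed);
}
```

Compared with a From implementation that panics on unknown input, this lets the caller decide how to handle an invalid value.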

4.7.2 Best practices for type conversion using From and Into

We can summarize the best practices for type conversion with the From and Into traits as follows:

4.8 Handling FFI compatibility with Rust’s types

You may, occasionally, need to call functions from non-Rust libraries (or vice versa), and in many cases, that requires modeling C structs in Rust. To do this, you must use Rust’s foreign function interface (FFI) features. Rust’s structs are not compatible with C structs by default. To make them compatible, you should do the following:

To make this whole process much easier, the Rust team provides a tool called rust-bindgen. With rust-bindgen, you can generate bindings to C libraries automatically from C headers. Most of the time, you should use rust-bindgen to generate bindings, and you can follow the instructions at http://mng.bz/amgj to do so.

In some cases, I have found I need to call C functions for test purposes or some other reason, and dealing with rust-bindgen is not worth the trouble for simple cases. In those cases, the process for mapping C structs to Rust is as follows:

Following up on the zlib example from chapter 2, let’s quickly implement zlib’s file struct, which looks like this in C:

struct gzFile_s {
    unsigned have;
    unsigned char *next;
    z_off64_t pos;
};

The corresponding Rust struct, after conversion, would look like this:

#[repr(C)]             
struct GzFileState {   
    have: c_uint,
    next: *mut c_uchar,
    pos: i64,
}

Instructs rustc to lay out the memory in this struct as a C compiler would, for compatibility with C

A C struct representing a zlib file state, as defined in zlib.h
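One quick way to sanity-check a mapping like this is to inspect the struct’s size and alignment from Rust and compare them with what the C compiler reports for the original struct. The sketch below repeats the struct so that it compiles on its own; the gz_file_state_layout() helper is a hypothetical name, and the exact numbers depend on the target platform.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::{c_uchar, c_uint};

#[repr(C)]
struct GzFileState {
    have: c_uint,
    next: *mut c_uchar,
    pos: i64,
}

/// Hypothetical helper: returns (size, alignment) of GzFileState in bytes.
fn gz_file_state_layout() -> (usize, usize) {
    (size_of::<GzFileState>(), align_of::<GzFileState>())
}

fn main() {
    // With #[repr(C)], fields keep their declared order and C padding rules
    // apply, so these values should match sizeof/alignof in C on the same target.
    let (size, align) = gz_file_state_layout();
    println!("size = {} bytes, alignment = {} bytes", size, align);
}
```

On common 64-bit targets, such as x86-64 Linux, you would expect a size of 24 bytes and an alignment of 8 bytes, with 4 bytes of padding after the have element.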

Putting it all together, you can call C functions from zlib with the struct that zlib expects:

use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_uchar, c_uint};
 
type GzFile = *mut GzFileState;
 
#[link(name = "z")]                                                   
extern "C" {                                                          
    fn gzopen(path: *const c_char, mode: *const c_char) -> GzFile;    
    fn gzread(file: GzFile, buf: *mut c_uchar, len: c_uint) -> c_int; 
    fn gzclose(file: GzFile) -> c_int;                                
    fn gzeof(file: GzFile) -> c_int;                                  
}
 
fn read_gz_file(name: &str) -> String {
    let mut buffer = [0u8; 0x1000];
    let mut contents = String::new();
    unsafe {
        let c_name = CString::new(name).expect("CString failed");     
        let c_mode = CString::new("r").expect("CString failed");
        let file = gzopen(c_name.as_ptr(), c_mode.as_ptr());
        if file.is_null() {
            panic!(
                "Couldn’t read file: {}",
                std::io::Error::last_os_error()
            );
        }
        while gzeof(file) == 0 {
            let bytes_read = gzread(
                file,
                buffer.as_mut_ptr(),
                (buffer.len() - 1) as c_uint,
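            );
            // gzread() returns -1 on error; don't use that as a slice length.
            assert!(bytes_read >= 0, "gzread failed");
            let s = std::str::from_utf8(&buffer[..(bytes_read as usize)])
                .unwrap();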
            contents.push_str(s);
        }
        gzclose(file);
    }
 
    contents
}

Instructs rustc that these functions belong to the external z library

External zlib functions as defined in zlib.h

Converts a Rust UTF-8 string into a null-terminated C string, panicking if the conversion fails (for example, if the string contains an interior null byte)

The read_gz_file() function opens a gzipped file, reads its contents, and returns them as a string.

Summary