All hail the dirt bike / Philosopher dirt bike /
Silence as we gathered round / We saw the word and were on our way
They Might Be Giants, “Dirt Bike” (1994)
For this chapter’s challenge, you will create a version of the venerable wc (word count) program, which dates back to version 1 of AT&T Unix.
This program will display the number of lines, words, and bytes found in text from STDIN or one or more files.
I often use it to count the number of lines returned by some other process.
In this chapter, you will learn how to do the following:
Use the Iterator::all function
Create a module for tests
Fake a filehandle for testing
Conditionally format and print a value
Conditionally compile a module when testing
Break a line of text into words, bytes, and characters
Use Iterator::collect to turn an iterator into a vector
I’ll start by showing how wc works so you know what is expected by the tests.
Following is an excerpt from the BSD wc manual page that describes the elements that the challenge program will implement:
WC(1) BSD General Commands Manual WC(1)
NAME
wc -- word, line, character, and byte count
SYNOPSIS
wc [-clmw] [file ...]
DESCRIPTION
The wc utility displays the number of lines, words, and bytes contained
in each input file, or standard input (if no file is specified) to the
standard output. A line is defined as a string of characters delimited
by a <newline> character. Characters beyond the final <newline> charac-
ter will not be included in the line count.
A word is defined as a string of characters delimited by white space
characters. White space characters are the set of characters for which
the iswspace(3) function returns true. If more than one input file is
specified, a line of cumulative counts for all the files is displayed on
a separate line after the output for the last file.
The following options are available:
-c The number of bytes in each input file is written to the standard
output. This will cancel out any prior usage of the -m option.
-l The number of lines in each input file is written to the standard
output.
-m The number of characters in each input file is written to the
standard output. If the current locale does not support multi-
byte characters, this is equivalent to the -c option. This will
cancel out any prior usage of the -c option.
-w The number of words in each input file is written to the standard
output.
When an option is specified, wc only reports the information requested by
that option. The order of output always takes the form of line, word,
byte, and file name. The default action is equivalent to specifying the
-c, -l and -w options.
If no files are specified, the standard input is used and no file name is
displayed. The prompt will accept input until receiving EOF, or [^D] in
most environments.
A picture is worth a kilobyte of words, so I’ll show you some examples using the following test files in the 05_wcr/tests/inputs directory:
empty.txt: an empty file
fox.txt: a file with one line of text
atlamal.txt: a file with the first stanza from “Atlamál hin groenlenzku” or “The Greenland Ballad of Atli,” an Old Norse poem
When run with an empty file, the program reports zero lines, words, and bytes in three right-justified columns eight characters wide:
$ cd 05_wcr
$ wc tests/inputs/empty.txt
0 0 0 tests/inputs/empty.txt
Next, consider a file with one line of text with varying spaces between words and a tab character.
Let’s take a look at it before running wc on it.
Here I’m using cat with the flag -t to display the tab character as ^I and -e to display $ for the end of the line:
$ cat -te tests/inputs/fox.txt
The quick brown fox^Ijumps over the lazy dog.$
This example is short enough that I can manually count all the lines, words, and bytes as shown in Figure 5-1, where spaces are noted with raised dots, the tab character with \t, and the end of the line as $.
I find that wc is in agreement:
$ wc tests/inputs/fox.txt
1 9 48 tests/inputs/fox.txt
As mentioned in Chapter 3, bytes may equate to characters for ASCII, but Unicode characters may require multiple bytes. The file tests/inputs/atlamal.txt contains many such examples:1
$ cat tests/inputs/atlamal.txt
Frétt hefir öld óvu, þá er endr of gerðu
seggir samkundu, sú var nýt fæstum,
æxtu einmæli, yggr var þeim síðan
ok it sama sonum Gjúka, er váru sannráðnir.
According to wc, this file contains 4 lines, 29 words, and 177 bytes:
$ wc tests/inputs/atlamal.txt
4 29 177 tests/inputs/atlamal.txt
If I want only the number of lines, I can use the -l flag and only that column will be shown:
$ wc -l tests/inputs/atlamal.txt
4 tests/inputs/atlamal.txt
I can similarly request only the number of bytes with -c and words with -w, and only those two columns will be shown:
$ wc -w -c tests/inputs/atlamal.txt
29 177 tests/inputs/atlamal.txt
I can request the number of characters using the -m flag:
$ wc -m tests/inputs/atlamal.txt
159 tests/inputs/atlamal.txt
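The gap between 177 bytes and 159 characters comes from characters that need more than one byte in UTF-8. The following is a minimal sketch of the same distinction in Rust; the helper name byte_and_char_counts is my own and is not part of the challenge program:

```rust
/// Return (bytes, chars) for a string slice.
/// `byte_and_char_counts` is a hypothetical helper used only for illustration.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    // str::len is the length in bytes of the UTF-8 encoding.
    // str::chars yields Unicode scalar values, which may span several bytes.
    (text.len(), text.chars().count())
}

fn main() {
    // ASCII: every character is one byte.
    println!("{:?}", byte_and_char_counts("fox"));
    // "\u{f6}" is the letter o with diaeresis, which takes two bytes in UTF-8.
    println!("{:?}", byte_and_char_counts("\u{f6}ld"));
}
```

The first call reports three bytes and three characters, while the second reports four bytes for three characters.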
The GNU version of wc will show both character and byte counts if you provide both the flags -m and -c, but the BSD version will show only one or the other, with the latter flag taking precedence:
$ wc -cm tests/inputs/atlamal.txt
159 tests/inputs/atlamal.txt
$ wc -mc tests/inputs/atlamal.txt
177 tests/inputs/atlamal.txt
Note that no matter the order of the flags, like -wc or -cw, the output columns are always ordered by lines, words, and bytes/characters:
$ wc -cw tests/inputs/atlamal.txt
29 177 tests/inputs/atlamal.txt
If no positional arguments are provided, wc will read from STDIN and will not print a filename:
$ cat tests/inputs/atlamal.txt | wc -lc
4 177
The GNU version of wc will understand a filename consisting of a dash (-) to mean STDIN, and it also provides long flag names as well as some other options:
$ wc --help
Usage: wc [OPTION]... [FILE]...
or: wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE, and a total line if
more than one FILE is specified. With no FILE, or when FILE is -,
read standard input. A word is a non-zero-length sequence of characters
delimited by white space.
The options below may be used to select which counts are printed, always in
the following order: newline, word, character, byte, maximum line length.
-c, --bytes print the byte counts
-m, --chars print the character counts
-l, --lines print the newline counts
--files0-from=F read input from the files specified by
NUL-terminated names in file F;
If F is - then read names from standard input
-L, --max-line-length print the length of the longest line
-w, --words print the word counts
--help display this help and exit
--version output version information and exit
If processing more than one file, both versions will finish with a total line showing the number of lines, words, and bytes for all the inputs:
$ wc tests/inputs/*.txt
4 29 177 tests/inputs/atlamal.txt
0 0 0 tests/inputs/empty.txt
1 9 48 tests/inputs/fox.txt
5 38 225 total
Nonexistent files are noted with a warning to STDERR as the files are being processed.
In the following example, blargh represents a nonexistent file:
$ wc tests/inputs/fox.txt blargh tests/inputs/atlamal.txt
1 9 48 tests/inputs/fox.txt
wc: blargh: open: No such file or directory
4 29 177 tests/inputs/atlamal.txt
5 38 225 total
As I first showed in Chapter 2, I can redirect the STDERR filehandle 2 in bash to verify that wc prints the warnings to that channel:
$ wc tests/inputs/fox.txt blargh tests/inputs/atlamal.txt 2>err
1 9 48 tests/inputs/fox.txt
4 29 177 tests/inputs/atlamal.txt
5 38 225 total
$ cat err
wc: blargh: open: No such file or directory
There is an extensive test suite to verify that your program implements all these options.
The challenge program should be called wcr (pronounced wick-er) for our Rust version of wc.
Use cargo new wcr to start, then modify your Cargo.toml to add the following dependencies:
[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "2"
predicates = "2"
rand = "0.8"
Copy the 05_wcr/tests directory into your new project and run cargo test to perform an initial build and run the tests, all of which should fail.
Use the same structure for src/main.rs from previous programs:
fn main() {
    if let Err(e) = wcr::get_args().and_then(wcr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
Following is a skeleton for src/lib.rs you can copy.
First, here is how I would define the Config to represent the command-line parameters:
use clap::{App, Arg};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    files: Vec<String>,
    lines: bool,
    words: bool,
    bytes: bool,
    chars: bool,
}

The files parameter will be a vector of strings.

The lines parameter is a Boolean for whether or not to print the line count.

The words parameter is a Boolean for whether or not to print the word count.

The bytes parameter is a Boolean for whether or not to print the byte count.

The chars parameter is a Boolean for whether or not to print the character count.
The main function assumes you will create a get_args function to process the command-line arguments.
Here is an outline you can use:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("wcr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust wc")
        // What goes here?
        .get_matches();

    Ok(Config {
        files: ...
        lines: ...
        words: ...
        bytes: ...
        chars: ...
    })
}
You will also need a run function, and you can start by printing the configuration:
pub fn run(config: Config) -> MyResult<()> {
    println!("{:#?}", config);
    Ok(())
}
Try to get your program to generate --help output similar to the following:
$ cargo run -- --help
wcr 0.1.0
Ken Youens-Clark <kyclark@gmail.com>
Rust wc
USAGE:
wcr [FLAGS] [FILE]...
FLAGS:
-c, --bytes Show byte count
-m, --chars Show character count
-h, --help Prints help information
-l, --lines Show line count
-V, --version Prints version information
-w, --words Show word count
ARGS:
<FILE>... Input file(s) [default: -]
Like the BSD wc, the challenge program will never show both character and byte counts; it will reject the -m (characters) and -c (bytes) flags when they are used together:
$ cargo run -- -cm tests/inputs/fox.txt
error: The argument '--bytes' cannot be used with '--chars'
USAGE:
wcr --bytes --chars
The default behavior will be to print lines, words, and bytes from STDIN, which means those values in the configuration should be true when none have been explicitly requested by the user:
$ cargo run
Config {
    files: [
        "-",
    ],
    lines: true,
    words: true,
    bytes: true,
    chars: false,
}

The default value for files should be a dash (-) for STDIN.

The chars value should be false unless the -m|--chars flag is present.
If any single flag is present, then all the other flags not mentioned should be false:
$ cargo run -- -l tests/inputs/*.txt
Config {
    files: [
        "tests/inputs/atlamal.txt",
        "tests/inputs/empty.txt",
        "tests/inputs/fox.txt",
    ],
    lines: true,
    words: false,
    bytes: false,
    chars: false,
}

The -l flag indicates only the line count is wanted, and bash will expand the file glob tests/inputs/*.txt into all the filenames in that directory.

Because the -l flag is present, the lines value is the only one that is true.
Stop here and get this much working. My dog needs a bath, so I’ll be right back.
Following is the first part of my get_args.
There’s nothing new to how I declare the parameters, so I’ll not comment on this:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("wcr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust wc")
        .arg(
            Arg::with_name("files")
                .value_name("FILE")
                .help("Input file(s)")
                .default_value("-")
                .multiple(true),
        )
        .arg(
            Arg::with_name("words")
                .short("w")
                .long("words")
                .help("Show word count")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("bytes")
                .short("c")
                .long("bytes")
                .help("Show byte count")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("chars")
                .short("m")
                .long("chars")
                .help("Show character count")
                .takes_value(false)
                .conflicts_with("bytes"),
        )
        .arg(
            Arg::with_name("lines")
                .short("l")
                .long("lines")
                .help("Show line count")
                .takes_value(false),
        )
        .get_matches();
After clap parses the arguments, I unpack them and try to figure out the default values:
    let mut lines = matches.is_present("lines");
    let mut words = matches.is_present("words");
    let mut bytes = matches.is_present("bytes");
    let chars = matches.is_present("chars");

    if [lines, words, bytes, chars].iter().all(|v| v == &false) {
        lines = true;
        words = true;
        bytes = true;
    }

    Ok(Config {
        files: matches.values_of_lossy("files").unwrap(),
        lines,
        words,
        bytes,
        chars,
    })
}

Unpack all the flags.

If all the flags are false, then set lines, words, and bytes to true.

Use the struct field initialization shorthand to set the values.
I want to highlight how I create a temporary list using a slice with all the flags.
I then call the slice::iter method to create an iterator so I can use the Iterator::all function to find if all the values are false.
This method expects a closure, which is an anonymous function that can be passed as an argument to another function.
Here, the closure is a predicate or a test that figures out if an element is false.
The values are references, so I compare each value to &false, which is a reference to a Boolean value.
If all the evaluations are true, then Iterator::all will return true.2
A slightly shorter but possibly less obvious way to write this would be:
if [lines, words, bytes, chars].iter().all(|v| !v) {

Negate each Boolean value v using std::ops::Not, which is written using a prefix exclamation point (!).
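To see Iterator::all in isolation, here is a small sketch that wraps the shorter form in a function. The name none_set is hypothetical and is not used in the challenge program:

```rust
/// Return true when no flag in the slice is set.
/// `none_set` is a hypothetical helper used only for illustration.
fn none_set(flags: &[bool]) -> bool {
    // slice::iter yields references (&bool), and `!` works on &bool
    // through the std::ops::Not implementation for references.
    flags.iter().all(|v| !v)
}

fn main() {
    // Every flag is false, so the predicate holds for all elements.
    println!("{}", none_set(&[false, false, false, false]));
    // One flag is true, so the predicate fails for that element.
    println!("{}", none_set(&[false, true, false, false]));
    // Iterator::all returns true for an empty iterator.
    println!("{}", none_set(&[]));
}
```

Iterator::all stops at the first element for which the closure returns false, so it does not need to visit the rest of the slice.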
Now to work on the counting part of the program.
This will require iterating over the file arguments and trying to open them, and I suggest you use the open function from Chapter 2 for this:
fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}
Be sure to expand your imports to the following:
use clap::{App, Arg};
use std::error::Error;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
Here is a run function to get you going:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(_) => println!("Opened {}", filename),
        }
    }
    Ok(())
}
You are welcome to write your solution however you like, but I decided to create a function called count that would take a filehandle and possibly return a struct called FileInfo containing the number of lines, words, bytes, and characters, each represented as a usize.
I say that the function will possibly return this struct because the function will involve I/O, which could go sideways.
I put the following definition in src/lib.rs just after the Config struct.
For reasons I will explain shortly, this must derive the PartialEq trait in addition to Debug:
#[derive(Debug, PartialEq)]
pub struct FileInfo {
    num_lines: usize,
    num_words: usize,
    num_bytes: usize,
    num_chars: usize,
}
My count function might succeed or fail, so it will return a MyResult<FileInfo>, meaning that on success it will have a FileInfo in the Ok variant or else will have an Err.
To start this function, I will initialize some mutable variables to count all the elements and will return a FileInfo struct:
pub fn count(mut file: impl BufRead) -> MyResult<FileInfo> {
    let mut num_lines = 0;
    let mut num_words = 0;
    let mut num_bytes = 0;
    let mut num_chars = 0;

    Ok(FileInfo {
        num_lines,
        num_words,
        num_bytes,
        num_chars,
    })
}

The count function will accept a mutable file value, and it might return a FileInfo struct.

Initialize mutable variables to count the lines, words, bytes, and characters.

For now, return a FileInfo with all zeros.
I’m introducing the impl keyword to indicate that the file value must implement the BufRead trait. Recall that open returns a value that meets this criterion. You’ll shortly see how this makes the function flexible.
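As a rough illustration of that flexibility, the sketch below passes three different kinds of values to one function that accepts impl BufRead. The function count_lines is a hypothetical stand-in for count, and the inputs are in-memory byte slices rather than real files:

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: count the lines available from any buffered reader.
fn count_lines(mut file: impl BufRead) -> io::Result<usize> {
    let mut num_lines = 0;
    let mut line = String::new();
    // read_line returns Ok(0) once the end of the input is reached.
    while file.read_line(&mut line)? > 0 {
        num_lines += 1;
        line.clear();
    }
    Ok(num_lines)
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead directly.
    let from_slice = count_lines("one\ntwo\n".as_bytes())?;
    // A BufReader wrapping any Read value also implements BufRead.
    let from_reader = count_lines(BufReader::new("one\ntwo\nthree\n".as_bytes()))?;
    // A boxed trait object, like the value returned by open, works too.
    let boxed: Box<dyn BufRead> = Box::new(BufReader::new("x\n".as_bytes()));
    let from_box = count_lines(boxed)?;
    println!("{} {} {}", from_slice, from_reader, from_box);
    Ok(())
}
```

Because the parameter names a trait rather than a concrete type, the caller decides where the bytes come from, which is what makes the function easy to test without touching the filesystem.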
In Chapter 4, I showed you how to write a unit test, placing it just after the function it was testing.
I’m going to create a unit test for the count function, but this time I’m going to place it inside a module called tests.
This is a tidy way to group unit tests, and I can use the #[cfg(test)] configuration option to tell Rust to compile the module only during testing.
This is especially useful because I want to use std::io::Cursor in my test to fake a filehandle for the count function.
According to the documentation, a Cursor is “used with in-memory buffers, anything implementing AsRef<[u8]>, to allow them to implement Read and/or Write, allowing these buffers to be used anywhere you might use a reader or writer that does actual I/O.”
Placing this dependency inside the tests module ensures that it will be included only when I test the program.
The following is how I create the tests module and then import and test the count function:
#[cfg(test)]
mod tests {
    use super::{count, FileInfo};
    use std::io::Cursor;

    #[test]
    fn test_count() {
        let text = "I don't want the world. I just want your half.\r\n";
        let info = count(Cursor::new(text));
        assert!(info.is_ok());
        let expected = FileInfo {
            num_lines: 1,
            num_words: 10,
            num_chars: 48,
            num_bytes: 48,
        };
        assert_eq!(info.unwrap(), expected);
    }
}

The cfg enables conditional compilation, so this module will be compiled only when testing.

Define a new module (mod) called tests to contain test code.

Import the count function and FileInfo struct from the parent module super, meaning next above and referring to the module above tests that contains it.

Import std::io::Cursor.

Run count with the Cursor.

Ensure the result is Ok.

Compare the result to the expected value. This comparison requires FileInfo to implement the PartialEq trait, which is why I added derive(PartialEq) earlier.
Run this test using cargo test test_count.
You will see lots of warnings from the Rust compiler about unused variables or variables that do not need to be mutable.
The most important result is that the test fails:
failures:
---- tests::test_count stdout ----
thread 'tests::test_count' panicked at 'assertion failed: `(left == right)`
left: `FileInfo { num_lines: 0, num_words: 0, num_bytes: 0, num_chars: 0 }`,
right: `FileInfo { num_lines: 1, num_words: 10, num_bytes: 48,
num_chars: 48 }`', src/lib.rs:146:9
This is an example of test-driven development, where you write a test to define the expected behavior of your function and then write the function that passes the unit test.
Once you have some reasonable assurance that the function is correct, use the returned FileInfo to print the expected output.
Start as simply as possible using the empty file, and make sure your program prints zeros for the three columns of lines, words, and bytes:
$ cargo run -- tests/inputs/empty.txt
0 0 0 tests/inputs/empty.txt
Next, use tests/inputs/fox.txt and make sure you get the following counts. I specifically added various kinds and numbers of whitespace to challenge you on how to split the text into words:
$ cargo run -- tests/inputs/fox.txt
1 9 48 tests/inputs/fox.txt
Be sure your program can handle the Unicode in tests/inputs/atlamal.txt correctly:
$ cargo run -- tests/inputs/atlamal.txt
4 29 177 tests/inputs/atlamal.txt
And that you correctly count the characters:
$ cargo run -- tests/inputs/atlamal.txt -wml
4 29 159 tests/inputs/atlamal.txt
Next, use multiple input files to check that your program prints the correct total column:
$ cargo run -- tests/inputs/*.txt
4 29 177 tests/inputs/atlamal.txt
0 0 0 tests/inputs/empty.txt
1 9 48 tests/inputs/fox.txt
5 38 225 total
When all that works correctly, try reading from STDIN:
$ cat tests/inputs/atlamal.txt | cargo run
4 29 177
Now, I’ll walk you through how I went about writing the wcr program.
Bear in mind that you could have solved this many different ways.
As long as your code passes the tests and produces the same output as the BSD version of wc, then it works well and you should be proud of your accomplishments.
I left you with an unfinished count function, so I’ll start there.
As we discussed in Chapter 3, BufRead::lines will remove the line endings, and I don’t want that because newlines in Windows files are two bytes (\r\n) but Unix newlines are just one byte (\n).
I can copy some code from Chapter 3 that uses BufRead::read_line to read each line into a buffer.
Conveniently, this function tells me how many bytes have been read from the file:
pub fn count(mut file: impl BufRead) -> MyResult<FileInfo> {
    let mut num_lines = 0;
    let mut num_words = 0;
    let mut num_bytes = 0;
    let mut num_chars = 0;
    let mut line = String::new();

    loop {
        let line_bytes = file.read_line(&mut line)?;
        if line_bytes == 0 {
            break;
        }
        num_bytes += line_bytes;
        num_lines += 1;
        num_words += line.split_whitespace().count();
        num_chars += line.chars().count();
        line.clear();
    }

    Ok(FileInfo {
        num_lines,
        num_words,
        num_bytes,
        num_chars,
    })
}

Create a mutable buffer to hold each line of text.

Create an infinite loop for reading the filehandle.

Try to read a line from the filehandle.

End of file (EOF) has been reached when zero bytes are read, so break out of the loop.

Add the number of bytes from this line to the num_bytes variable.

Each time through the loop is a line, so increment num_lines.

Use the str::split_whitespace method to break the string on whitespace and use Iterator::count to find the number of words.

Use the str::chars method to break the string into Unicode characters and use Iterator::count to count the characters.

With these changes, the test_count test will pass.
To integrate this into my code, I will first change run to simply print the FileInfo struct or print a warning to STDERR when the file can’t be opened:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!("{:?}", info);
                }
            }
        }
    }
    Ok(())
}
When I run it on one of the test inputs, it appears to work for a valid file:
$ cargo run -- tests/inputs/fox.txt
FileInfo { num_lines: 1, num_words: 9, num_bytes: 48, num_chars: 48 }
It even handles reading from STDIN:
$ cat tests/inputs/fox.txt | cargo run
FileInfo { num_lines: 1, num_words: 9, num_bytes: 48, num_chars: 48 }
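As a side note on why count uses BufRead::read_line rather than BufRead::lines, the following sketch compares the byte totals each approach reports for a small text with one Windows and one Unix line ending. Both function names are hypothetical and exist only for this comparison:

```rust
use std::io::{BufRead, Cursor};

/// Total bytes seen when iterating with BufRead::lines (line endings stripped).
/// Hypothetical helper for illustration only.
fn bytes_via_lines(text: &str) -> usize {
    Cursor::new(text)
        .lines()
        .map(|line| line.expect("reading valid UTF-8 from memory").len())
        .sum()
}

/// Total bytes reported by BufRead::read_line (line endings kept).
/// Hypothetical helper for illustration only.
fn bytes_via_read_line(text: &str) -> usize {
    let mut file = Cursor::new(text);
    let mut line = String::new();
    let mut total = 0;
    loop {
        let n = file
            .read_line(&mut line)
            .expect("reading valid UTF-8 from memory");
        if n == 0 {
            break; // End of input
        }
        total += n;
        line.clear();
    }
    total
}

fn main() {
    // One line ending in \r\n and one ending in \n: five bytes in total.
    let text = "a\r\nb\n";
    println!("lines:     {}", bytes_via_lines(text));
    println!("read_line: {}", bytes_via_read_line(text));
}
```

Only the read_line total matches the length of the input, because lines discards the three bytes of line endings.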
Next, I need to format the output to meet the specifications.
To create the expected output, I can start by changing run to always print the lines, words, and bytes followed by the filename:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{:>8}{:>8}{:>8} {}",
                        info.num_lines,
                        info.num_words,
                        info.num_bytes,
                        filename
                    );
                }
            }
        }
    }
    Ok(())
}
If I run it with one input file, it’s already looking pretty sweet:
$ cargo run -- tests/inputs/fox.txt
1 9 48 tests/inputs/fox.txt
If I run cargo test fox to run all the tests with the word fox in the name, I pass one out of eight tests.
Huzzah!
running 8 tests
test fox ... ok
test fox_bytes ... FAILED
test fox_chars ... FAILED
test fox_bytes_lines ... FAILED
test fox_words_bytes ... FAILED
test fox_words ... FAILED
test fox_words_lines ... FAILED
test fox_lines ... FAILED
I can inspect tests/cli.rs to see what the passing test looks like. Note that the tests reference constant values declared at the top of the module:
const PRG: &str = "wcr";
const EMPTY: &str = "tests/inputs/empty.txt";
const FOX: &str = "tests/inputs/fox.txt";
const ATLAMAL: &str = "tests/inputs/atlamal.txt";
Again I have a run helper function to run my tests:
fn run(args: &[&str], expected_file: &str) -> TestResult {
    let expected = fs::read_to_string(expected_file)?;
    Command::cargo_bin(PRG)?
        .args(args)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}

Try to read the expected output for this command.

Run the wcr program with the given arguments. Assert that the program succeeds and that STDOUT matches the expected value.
The fox test is running wcr with the FOX input file and no options, comparing it to the contents of the expected output file that was generated using 05_wcr/mk-outs.sh:
#[test]
fn fox() -> TestResult {
    run(&[FOX], "tests/expected/fox.txt.out")
}
Look at the next function in the file to see a failing test:
#[test]
fn fox_bytes() -> TestResult {
    run(&["--bytes", FOX], "tests/expected/fox.txt.c.out")
}
When run with --bytes, my program should print only that column of output, but it always prints lines, words, and bytes.
So I decided to write a function called format_field in src/lib.rs that would conditionally return a formatted string or the empty string depending on a Boolean value:
fn format_field(value: usize, show: bool) -> String {
    if show {
        format!("{:>8}", value)
    } else {
        "".to_string()
    }
}

The function accepts a usize value and a Boolean and returns a String.

Check if the show value is true.

Return a new string by formatting the number into a string eight characters wide.

Otherwise, return the empty string.
Why does this function return a String and not a str? They’re both strings, but a str is an immutable, fixed-length string. The value that will be returned from the function is dynamically generated at runtime, so I must use String, which is a growable, heap-allocated structure.
I can expand my tests module to add a unit test for this:
#[cfg(test)]
mod tests {
    use super::{count, format_field, FileInfo};
    use std::io::Cursor;

    #[test]
    fn test_count() {} // Same as before

    #[test]
    fn test_format_field() {
        assert_eq!(format_field(1, false), "");
        assert_eq!(format_field(3, true), "       3");
        assert_eq!(format_field(10, true), "      10");
    }
}

Add format_field to the imports.

The function should return the empty string when show is false.

Check width for a single-digit number.

Check width for a double-digit number.
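The width checks depend on how the {:>8} format specifier pads a value. Here is a minimal sketch of that behavior on its own; right_justify is a hypothetical name and covers only the show-is-true branch of format_field:

```rust
/// Right-justify a number in a field at least eight characters wide.
/// `right_justify` is a hypothetical helper used only for illustration.
fn right_justify(value: usize) -> String {
    // `>` means right-aligned and `8` is the minimum width.
    // format! builds a new String at runtime.
    format!("{:>8}", value)
}

fn main() {
    // Brackets make the padding visible.
    println!("[{}]", right_justify(3));
    println!("[{}]", right_justify(48));
    // The width is a minimum, so longer values are not truncated.
    println!("[{}]", right_justify(123456789));
}
```

A count with more than eight digits would therefore run into the neighboring column rather than lose digits.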
Here is how I use the format_field function in context, where I also handle printing the empty string when reading from STDIN:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{}{}{}{}{}",
                        format_field(info.num_lines, config.lines),
                        format_field(info.num_words, config.words),
                        format_field(info.num_bytes, config.bytes),
                        format_field(info.num_chars, config.chars),
                        if filename == "-" {
                            "".to_string()
                        } else {
                            format!(" {}", filename)
                        }
                    );
                }
            }
        }
    }
    Ok(())
}

Format the output for each of the columns using the format_field function.

When the filename is a dash, print the empty string; otherwise, print a space and the filename.
With these changes, all the tests for cargo test fox pass.
But if I run the entire test suite, I see that my program is still failing the tests with names that include the word all:
failures:
test_all
test_all_bytes
test_all_bytes_lines
test_all_lines
test_all_words
test_all_words_bytes
test_all_words_lines
Looking at the test_all function in tests/cli.rs confirms that the test is using all the input files as arguments:
#[test]
fn test_all() -> TestResult {
    run(&[EMPTY, FOX, ATLAMAL], "tests/expected/all.out")
}
If I run my current program with all the input files, I can see that I’m missing the total line:
$ cargo run -- tests/inputs/*.txt
4 29 177 tests/inputs/atlamal.txt
0 0 0 tests/inputs/empty.txt
1 9 48 tests/inputs/fox.txt
Here is my final run function that keeps a running total and prints those values when there is more than one input:
pub fn run(config: Config) -> MyResult<()> {
    let mut total_lines = 0;
    let mut total_words = 0;
    let mut total_bytes = 0;
    let mut total_chars = 0;

    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{}{}{}{}{}",
                        format_field(info.num_lines, config.lines),
                        format_field(info.num_words, config.words),
                        format_field(info.num_bytes, config.bytes),
                        format_field(info.num_chars, config.chars),
                        if filename.as_str() == "-" {
                            "".to_string()
                        } else {
                            format!(" {}", filename)
                        }
                    );

                    total_lines += info.num_lines;
                    total_words += info.num_words;
                    total_bytes += info.num_bytes;
                    total_chars += info.num_chars;
                }
            }
        }
    }

    if config.files.len() > 1 {
        println!(
            "{}{}{}{} total",
            format_field(total_lines, config.lines),
            format_field(total_words, config.words),
            format_field(total_bytes, config.bytes),
            format_field(total_chars, config.chars)
        );
    }

    Ok(())
}

Create mutable variables to track the total number of lines, words, bytes, and characters.

Update the totals using the values from this file.

Print the totals if there is more than one input.
This appears to work well:
$ cargo run -- tests/inputs/*.txt
4 29 177 tests/inputs/atlamal.txt
0 0 0 tests/inputs/empty.txt
1 9 48 tests/inputs/fox.txt
5 38 225 total
I can count characters instead of bytes:
$ cargo run -- -m tests/inputs/atlamal.txt
159 tests/inputs/atlamal.txt
And I can show and hide any columns I want:
$ cargo run -- -wc tests/inputs/atlamal.txt
29 177 tests/inputs/atlamal.txt
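The four separate total variables in the final run function could also be grouped into one value. What follows is only a sketch of one possible alternative, not the approach used in this chapter; the Counts struct and the total function are hypothetical names:

```rust
use std::ops::AddAssign;

/// Hypothetical struct grouping the four counts for one input.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
    bytes: usize,
    chars: usize,
}

// Implementing AddAssign allows `sum += counts` on whole structs.
impl AddAssign for Counts {
    fn add_assign(&mut self, other: Self) {
        self.lines += other.lines;
        self.words += other.words;
        self.bytes += other.bytes;
        self.chars += other.chars;
    }
}

/// Add up the counts for all inputs, starting from all zeros.
fn total(all: &[Counts]) -> Counts {
    let mut sum = Counts::default();
    for counts in all {
        sum += *counts;
    }
    sum
}

fn main() {
    // Values taken from the wc output for the three test files.
    let files = [
        Counts { lines: 4, words: 29, bytes: 177, chars: 159 },
        Counts { lines: 0, words: 0, bytes: 0, chars: 0 },
        Counts { lines: 1, words: 9, bytes: 48, chars: 48 },
    ];
    println!("{:?}", total(&files));
}
```

Whether this is an improvement is a matter of taste. It moves the addition into one place at the cost of an extra trait implementation.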
Write a version that mimics the output from the GNU wc instead of the BSD version.
If your system already has the GNU version, run the mk-outs.sh program to generate the expected outputs for the given input files.
Modify the program to create the correct output according to the tests.
Then expand the program to handle the additional options like --files0-from for reading the input filenames from a file and --max-line-length to print the length of the longest line.
Add tests for the new functionality.
Next, ponder the mysteries of the iswspace function mentioned in the BSD manual page noted at the beginning of the chapter.
What if you ran the program on the spiders.txt file of the Issa haiku from Chapter 2, but it used Japanese characters?3
隅の蜘案じな煤はとらぬぞよ
What would the output be? If I place this into a file called spiders.txt, BSD wc thinks there are three words:
$ wc spiders.txt
1 3 40 spiders.txt
The GNU version says there is only one word:
$ wc spiders.txt
1 1 40 spiders.txt
I didn’t want to open that can of worms (or spiders?), but if you were creating a version of this program to release to the public, how could you replicate the BSD and GNU versions?
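As a starting point for that investigation, here is a sketch of how the Rust methods used in this chapter treat the haiku. It assumes the file holds the single line shown above followed by one newline, which is consistent with the 40 bytes both versions report; the counts function is a hypothetical helper, not the chapter's count:

```rust
/// Count lines, words, bytes, and characters in a string slice.
/// `counts` is a hypothetical helper used only for illustration.
fn counts(text: &str) -> (usize, usize, usize, usize) {
    let lines = text.lines().count();
    // split_whitespace splits on characters with the Unicode
    // White_Space property, none of which appear inside this line.
    let words = text.split_whitespace().count();
    let bytes = text.len();
    let chars = text.chars().count();
    (lines, words, bytes, chars)
}

fn main() {
    // Thirteen characters of three bytes each, plus a one-byte newline.
    let haiku = "隅の蜘案じな煤はとらぬぞよ\n";
    println!("{:?}", counts(haiku));
}
```

Under these assumptions the word count is one, so a program built on str::split_whitespace would agree with the GNU output rather than the BSD output for this input.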
Well, that was certainly fun. In about 200 lines of Rust, we wrote a pretty passable replacement for one of the most widely used Unix programs. Compare your version to the 1,000 lines of C in the GNU source code. Reflect upon your progress in this chapter:
You learned that the Iterator::all function will return true if all the elements evaluate to true for the given predicate, which is a closure accepting an element. Many similar Iterator methods accept a closure as an argument for testing, selecting, and transforming the elements.
You used the str::split_whitespace and str::chars methods to break text into words and characters.
You used the Iterator::count method to count the number of items.
You wrote a function to conditionally format a value or the empty string to support the printing or omission of information according to the flag arguments.
You organized your unit tests into a tests module and imported functions from the parent module, called super.
You used the #[cfg(test)] configuration option to tell Rust to compile the tests module only when testing.
You saw how to use std::io::Cursor to create a fake filehandle for testing a function that expects something that implements BufRead.
You’ve learned quite a bit about reading files with Rust, and in the next chapter, you’ll learn how to write files.
1 The text shown in this example translates to: “There are many who know how of old did men, in counsel gather / little good did they get / in secret they plotted, it was sore for them later / and for Gjuki’s sons, whose trust they deceived.”
2 When my youngest first started brushing his own teeth before bed, I would ask if he’d brushed and flossed. The problem was that he was prone to fibbing, so it was hard to trust him. In an actual exchange one night, I asked, “Did you brush and floss your teeth?” Yes, he replied. “Did you brush your teeth?” Yes, he replied. “Did you floss your teeth?” No, he replied. So clearly he failed to properly combine Boolean values because a true statement and a false statement should result in a false outcome.
3 A more literal translation might be “Corner spider, rest easy, my soot-broom is idle.”