Introduction

This repository contains a set of relatively small Rust programs that I ported from C++ to learn Rust.

Programs

The C++ programs ported are the non-iteractive programs from the LLVM Test Suite Shootout C++ which originates in the Programming Language Shootout:

The current criteria for programs I picked are:

they are small
they produce some output so that the correctness of porting can be checked
they are deterministic and can be run without any interactive user input from a beginning to an end
their execution times indicate their run time performance

I'd like to add more programs in the future.

Porting notes

When porting from C++ to Rust, I stick with the idiomatic style, including the use of the standard library types, as much as appropriate. For example, I'd use Rust's String type where C++ std::string (or a C char* string) is used. Similarly, I'd use Rust's Vec type in place for C++ std::vector and C/C++ raw array types.

I wrote down porting notes for each program.

Benchmark results and performance analysis

I included benchmark results between the C++ and the Rust versions. I also included performance analysis based on the generated code for the programs whose benchmark results were significantly different,

ackermann

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <iostream>
#include <stdlib.h>

using namespace std;

int Ack(int M, int N) { return(M ? (Ack(M-1,N ? Ack(M,(N-1)) : 1)) : N+1); }

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 11
#else
#define LENGTH 12
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);

    cout << "Ack(3," << n << "): " << Ack(3, n) << endl;
    return(0);
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 11;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 12;

fn ack(m: i32, n: i32) -> i32 {
    if m != 0 { ack(m - 1, if n != 0 { ack(m, n - 1) } else { 1 }) } else { n + 1 }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };
    println!("Ack(3,{}): {}", n, ack(3, n));
}

Porting notes

Conditionally declaring constants

C++

#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 11
#else
#define LENGTH 12
#endif

Rust

#![allow(unused)]
fn main() {
#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 11;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 12;
}

It's idiomatic to use const variables along with the cfg attributes to conditionally declare constants, as opposed to using macros in C/C++.

Declaring a function

C++

int Ack(int M, int N) { return(M ? (Ack(M-1,N ? Ack(M,(N-1)) : 1)) : N+1); }

Rust

#![allow(unused)]
fn main() {
fn ack(m: i32, n: i32) -> i32 {
    if m != 0 { ack(m - 1, if n != 0 { ack(m, n - 1) } else { 1 }) } else { n + 1 }
}
}

Declaring a function in Rust is not too dissimilar to C++ with some differences:

Functions are declared with the fn keyword.
The parameter type comes after the parameter name and a colon.
The return type comes after the parameter list and the ->.
The if statement condition expression doesn't need parentheses around them.
Rust has if expressions in addition to if statements (or the last statement in the blocks can be an expression), and lacks the ternary operator.
Rust has no int type and uses integer types with explicit widths such as i32.

Accessing program arguments

C++

int main(int argc, char *argv[]) {
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    ...

Rust

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };
    ...

Rust's main function doesn't have the argc and argv parameters. Instead the program arguments are accessed with the std::env API.

The std::option type is commonly used in Rust to represent the optional result of an operation. The unwrap function is used to access the result when it is assumed that it contains a valid result, which would cause a panic if it does not. For example, accessing the nth element of an argument array and parsing a string into an integer.

This common pattern on the program arguments are used in most of the programs here. I believe it achieves dual purposes, to allow the problem size changed and to makes it harder for the compiler to compile-time evaluate (constant-fold) the entire program logic.

Printing to standard output

C++

    cout << "Ack(3," << n << "): " << Ack(3, n) << endl;

Rust

#![allow(unused)]
fn main() {
    println!("Ack(3,{}): {}", n, ack(3, n));
}

println! is the standard way to print to standard output in Rust. The ! at the end indicates it's a macro. I won't go into macros here.

To print without the new line, we'd use print! instead.

ary

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <cstdlib>
#include <iostream>
#include <vector>

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 900000
#else
#define LENGTH 9000000
#endif
    int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    typedef std::vector<int> ARY;
    ARY x(n);
    ARY y(n);

    for (i=0; i<n; i++) {
        x[i] = i;
    }
    for (int i = n - 1; i >= 0; --i) {
        y[i] = x[i];
    }

    std::cout << y.back() << std::endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 900000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 9000000;

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut x = Vec::<i32>::new();
    x.resize(n as usize, 0);
    let mut y = Vec::<i32>::new();
    y.resize(n as usize, 0);

    for i in 0..n {
        x[i as usize] = i;
    }
    for i in (0..n).rev() {
        y[i as usize] = x[i as usize];
    }

    println!("{}", y[(n - 1) as usize]);
}

Porting notes

Initializing vectors of type `i32` and of size `n` as local, mutable variables

C++

    typedef std::vector<int> ARY;
    ARY x(n);
    ARY y(n);

Rust

#![allow(unused)]
fn main() {
    let mut x = Vec::<i32>::new();
    x.resize(n as usize, 0);
    let mut y = Vec::<i32>::new();
    y.resize(n as usize, 0);
}

We initialize the two vectors, x and y using let mut. let is the standard way to declare a local variable and mut is added because we will modify (mutate) the vector later.

When initializing a Vec of element type i32, we put the element type i32 in angly brackets as in Vec::<i32>, as opposed to the C++ style Vec<i32>.

As there isn't currently a way to initialize a Vec with a given size n in one shot in Rust, we initialize a vVec of size zero and then resize to n with default element value 0.

The vector indices in Rust are of type usize like size_t and there are no implicit integer conversions, we need to cast n (of type i32) to usize using the as keyword as in n as usize.

Writing `for` range loops

C++

    for (i=0; i<n; i++) {
        x[i] = i;
    }
    for (int i = n - 1; i >= 0; --i) {
        y[i] = x[i];
    }

Rust

#![allow(unused)]
fn main() {
    for i in 0..n {
        x[i as usize] = i;
    }
    for i in (0..n).rev() {
        y[i as usize] = x[i as usize];
    }
}

A for range loop is written as for i in 0..n { ... }. The loop iterates with the index range 0 up to n - 1 induction variable where n is exclusive. There are no parentheses around the for loop header like the if statement condition expression. It's equivalent to for (int i = 0; i < n; ++i) { ... } in C++.

There is no general for loop in Rust where the loop variable declaration, the loop termination condition and the loop increment statements are spelled out and can be customizable. We would use while loops instead.

To make n to be inclusive, we would write for i in 0..<=n { ... }.

A for range loop in the reverse order is written as for i in (0..n).rev() { ... }. This is equivalent to for (int i = n - 1; i >= 0; --i) { ... } in C++.

Since the variable i is of type i32, we convert it to usize, as in i as usize.

ary2

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <cstdlib>
#include <iostream>
#include <vector>

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 90000
#else
#define LENGTH 900000
#endif
    int i, n = 10*((argc == 2) ? atoi(argv[1]) : LENGTH);
    typedef std::vector<int> ARY;
    ARY x(n);
    ARY y(n);

    for (i=0; i<n;) {
        x[i] = i; ++i;
        x[i] = i; ++i;
        x[i] = i; ++i;
        x[i] = i; ++i;
        x[i] = i; ++i;

        x[i] = i; ++i;
        x[i] = i; ++i;
        x[i] = i; ++i;
        x[i] = i; ++i;
        x[i] = i; ++i;
    }
    for (int i = n - 1; i >= 0;) {
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;

        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
        y[i] = x[i]; --i;
    }

    std::cout << y.back() << std::endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 90000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 900000;

fn main() {
    let mut args = env::args();
    let n = 10 * if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut x = Vec::<i32>::new();
    x.resize(n as usize, 0);
    let mut y = Vec::<i32>::new();
    y.resize(n as usize, 0);

    let mut i: i32 = 0;
    while i < n {
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;

        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
        x[i as usize] = i; i += 1;
    }

    i = n - 1;
    while i >= 0 {
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;

        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
        y[i as usize] = x[i as usize]; i -= 1;
    }

    println!("{}", y.iter().last().unwrap());
}

Porting notes

Using `while` loops instead of general `for` loops

C++

    int i, ...
    for (i=0; i<n;) {
      ...
    }
    for (int i = n - 1; i >= 0;) {
      ...
    }

Rust

#![allow(unused)]
fn main() {
    let mut i: i32 = 0;
    while i < n {
      ...
    }
    i = n - 1;
    while i >= 0 {
      ...
    }
}

Since Rust has no general for loops, we use while loops instead.

We declare the loop variable i outside the loop. If we want to explicitly write the type of a local variable, we put the type after the variable name and the : but before the initializer, as in let mut i: i32 = 0;. The type are often omitted relying on type inference.

Incrementing or decrementing a variable

C++

        x[i] = i; ++i;
        ...
        y[i] = x[i]; --i;

Rust

#![allow(unused)]
fn main() {
        x[i as usize] = i; i += 1;
        ...
        y[i as usize] = x[i as usize]; i -= 1;
}

Rust doesn't have the prefix or postfix increment and decrement operators. I think it's to avoid bugs caused by the prefix/postfix evaluation orders. We use the compound assignment operators += and -=.

ary3

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <cstdlib>
#include <iostream>
#include <vector>

using namespace std;

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 150000
#else
#define LENGTH 1500000
#endif
    int i, k, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    typedef vector<int> ARY;
    ARY x(n);
    ARY y(n);

    for (i=0; i<n; i++) {
        x[i] = i + 1;
    }
    for (k=0; k<1000; k++) {
        for (int i = n - 1; i >= 0; --i) {
            y[i] += x[i];
        }
    }

    cout << y[0] << " " << y.back() << endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 150000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 1500000;

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut x = Vec::<i32>::new();
    x.resize(n as usize, 0);
    let mut y = Vec::<i32>::new();
    y.resize(n as usize, 0);

    for i in 0..n {
        x[i as usize] = i + 1;
    }

    for _k in 0..1000 {
        for i in (0..n).rev() {
            y[i as usize] += x[i as usize];
        }
    }

    println!("{} {}", y[0], y.iter().last().unwrap());
}

Porting notes

A nested loop and an unused loop index variable

C++

    for (k=0; k<1000; k++) {
	for (int i = n - 1; i >= 0; --i) {
	    y[i] += x[i];
	}
    }

Rust

#![allow(unused)]
fn main() {
    for _k in 0..1000 {
        for i in (0..n).rev() {
            y[i as usize] += x[i as usize];
        }
    }
}

A nested loop is written in a way similar to C++.

The Rust compiler emits a warning for an unused loop variable, as in _k, to suggest adding the preceding underscore in its name though it doesn't cause an error.

Performance analysis

The benchmark results show that the Rust version is more than 4 times slower than the C++ version.

We compare the generated code between C++ and Rust.

C++

Here is the hot loop according to the Linux Perf:

C++ perf data

  7.51 │1c0:   movdqu  -0x10(%rsi,%r10,4),%xmm0
  7.65 │       movdqu  (%rsi,%r10,4),%xmm1
 14.29 │       movdqu  -0x10(%rdi,%r10,4),%xmm2
  7.07 │       paddd   %xmm0,%xmm2
  8.11 │       movdqu  (%rdi,%r10,4),%xmm0
  9.28 │       paddd   %xmm1,%xmm0
  8.99 │       movdqu  %xmm0,(%rdi,%r10,4)
 18.54 │       movdqu  %xmm2,-0x10(%rdi,%r10,4)
  9.43 │       add     $0xfffffffffffffff8,%r10
  8.98 │       cmp     %r10,%r8
       │     ↑ jne     1c0

This corresponds to the nested loop in the second loop in the source code.

C++ LLVM IR (the frontend output)

77:                                               ; preds = %76, %77
  %78 = phi i64 [ %93, %77 ], [ 0, %76 ]
  %79 = xor i64 %78, -1
  %80 = add i64 %79, %40
  %81 = getelementptr inbounds nuw i32, ptr %33, i64 %80
  %82 = getelementptr inbounds i8, ptr %81, i64 -12
  %83 = getelementptr inbounds i8, ptr %81, i64 -28
  %84 = load <4 x i32>, ptr %82, align 4, !tbaa !10
  %85 = load <4 x i32>, ptr %83, align 4, !tbaa !10
  %86 = getelementptr inbounds nuw i32, ptr %36, i64 %80
  %87 = getelementptr inbounds i8, ptr %86, i64 -12
  %88 = getelementptr inbounds i8, ptr %86, i64 -28
  %89 = load <4 x i32>, ptr %87, align 4, !tbaa !10
  %90 = load <4 x i32>, ptr %88, align 4, !tbaa !10
  %91 = add nsw <4 x i32> %89, %84
  %92 = add nsw <4 x i32> %90, %85
  store <4 x i32> %91, ptr %87, align 4, !tbaa !10
  store <4 x i32> %92, ptr %88, align 4, !tbaa !10
  %93 = add nuw i64 %78, 8
  %94 = icmp eq i64 %93, %43
  br i1 %94, label %95, label %77, !llvm.loop !17

C++ LLVM IR (right before the Machine IR generation)

87:                                               ; preds = %.preheader, %87
  %lsr.iv = phi i64 [ 0, %.preheader ], [ %lsr.iv.next, %87 ]
  %88 = shl nsw i64 %lsr.iv, 2
  %scevgep8 = getelementptr i8, ptr %scevgep5, i64 %88
  %89 = shl nsw i64 %lsr.iv, 2
  %scevgep6 = getelementptr i8, ptr %scevgep5, i64 %89
  %scevgep7 = getelementptr i8, ptr %scevgep6, i64 -16
  %90 = load <4 x i32>, ptr %scevgep8, align 4, !tbaa !10
  %91 = load <4 x i32>, ptr %scevgep7, align 4, !tbaa !10
  %92 = shl nsw i64 %lsr.iv, 2
  %scevgep4 = getelementptr i8, ptr %scevgep, i64 %92
  %93 = shl nsw i64 %lsr.iv, 2
  %scevgep2 = getelementptr i8, ptr %scevgep, i64 %93
  %scevgep3 = getelementptr i8, ptr %scevgep2, i64 -16
  %94 = load <4 x i32>, ptr %scevgep4, align 4, !tbaa !10
  %95 = load <4 x i32>, ptr %scevgep3, align 4, !tbaa !10
  %96 = add nsw <4 x i32> %94, %90
  %97 = add nsw <4 x i32> %95, %91
  store <4 x i32> %96, ptr %scevgep4, align 4, !tbaa !10
  store <4 x i32> %97, ptr %scevgep3, align 4, !tbaa !10
  %lsr.iv.next = add nsw i64 %lsr.iv, -8
  %98 = icmp eq i64 %48, %lsr.iv.next
  br i1 %98, label %99, label %87, !llvm.loop !17

Rust

Here is the hot loop according to the Linux Perf:

Rust perf data

  8.96 │370:   cmp     %rdi,%r14
       │     ↓ jbe     587
  8.81 │       cmp     %rdi,%rbp
       │     ↓ jbe     581
  8.59 │       mov     (%rbx,%rdi,4),%r8d
  8.81 │       add     %r8d,(%r15,%rdi,4)
 16.53 │       lea     -0x1(%rdi),%r8
  8.92 │       inc     %rdi
  7.97 │       cmp     $0x1,%rdi
 15.84 │       mov     %r8,%rdi
 15.52 │     ↑ jg      370

This also corresponds to the same nested loop in the second loop in the source code.

Rust LLVM IR (the frontend output)

bb16.us:                                          ; preds = %bb44.us, %bb51.us
  %indvars.iv124 = phi i64 [ %73, %bb44.us ], [ %indvars.iv.next125, %bb51.us ]
  %indvars.iv.next125 = add nsw i64 %indvars.iv124, -1
  %_86.us = icmp ugt i64 %46, %indvars.iv.next125
  br i1 %_86.us, label %bb50.us, label %panic10.invoke

bb50.us:                                          ; preds = %bb16.us
  %_92.us = icmp ugt i64 %56, %indvars.iv.next125
  br i1 %_92.us, label %bb51.us, label %panic10.invoke

bb51.us:                                          ; preds = %bb50.us
  %_33.us = getelementptr inbounds i32, ptr %_24.i.i, i64 %indvars.iv.next125
  %_32.us = load i32, ptr %_33.us, align 4, !noundef !4
  %_35.us = getelementptr inbounds i32, ptr %_24.i.i54, i64 %indvars.iv.next125
  %75 = load i32, ptr %_35.us, align 4, !noundef !4
  %76 = add i32 %75, %_32.us
  store i32 %76, ptr %_35.us, align 4
  %_77.us = icmp sgt i64 %indvars.iv124, 1
  br i1 %_77.us, label %bb16.us, label %bb15.bb12.loopexit_crit_edge.us

It looks like the difference is caused by a lack of loop vectorization in the Rust version.

A June 2025 update on the performance difference

I filed an issue at the Rust repo about this performance difference. According to the conversation, if the second loop is rewritten to use iterators as below, the loop vectorization occurs and the performance becomes on par with the C++ version.

A version of the second loop that uses iterators:

#![allow(unused)]
fn main() {
    for _k in 0..1000 {
        y.iter_mut().zip(x.iter()).rev().for_each(|(y, x)| {
            *y += *x
        })
    }
}

I think that the Rust compiler could be improved to recognize the index-based version of the loop and perform loop vectorization.

fibo

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <iostream>
#include <stdlib.h>

using namespace std;

unsigned long fib(unsigned long n) {
    if (n < 2)
        return(1);
    else
        return(fib(n-2) + fib(n-1));
}

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 40
#else
#define LENGTH 43
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);

    cout << fib(n) << endl;
    return(0);
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 40;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 43;

fn fib(n: u64) -> u64 {
    if n < 2 {
        return 1;
    } else {
        return fib(n - 2) + fib(n - 1);
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    println!("{}", fib(n as u64));
}

Porting notes

None.

hash

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <stdio.h>
#include <iostream>
#include <ext/hash_map>
#include <cstring>

using namespace std;
using namespace __gnu_cxx;

struct eqstr {
    bool operator()(const char* s1, const char* s2) const {
        return strcmp(s1, s2) == 0;
    }
};

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 50000
#else
#define LENGTH 500000
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    char buf[16];
    typedef __gnu_cxx::hash_map<const char*, int, __gnu_cxx::hash<const char*>, eqstr> HM;
    HM X;

    for (int i=1; i<=n; i++) {
        sprintf(buf, "%x", i);
        X[strdup(buf)] = i;
    }

    int c = 0;
    for (int i=n; i>0; i--) {
        sprintf(buf, "%d", i);
        if (X[strdup(buf)]) c++;
    }

    cout << c << endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;
use std::collections::HashMap;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 50000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 500000;

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut x = HashMap::<String, i32>::new();

    for i in 1..=n {
        let s = format!("{:X}", i);
        x.insert(s, i);
    }

    let mut c: i32 = 0;
    for i in (1..=n).rev() {
        let s = format!("{}", i);
        if x.contains_key(&s) {
            c += 1;
        }
    }

    println!("{}", c);
}

Porting notes

Using the standard string and hash map types

C++

    char buf[16];
    typedef __gnu_cxx::hash_map<const char*, int, __gnu_cxx::hash<const char*>, eqstr> HM;
    HM X;

Rust

#![allow(unused)]
fn main() {
    let mut x = HashMap::<String, i32>::new();
}

The C++ version uses the C string char* and the GNU extension hash map __gnu_cxx::hash_map as opposed to std::string and std::unordered_map. I suspect that this program was originally written before the standard hash map became part of the C++ standard.

In the Rust version, we use the Rust standard String and std::collections::HashMap rather than more faithfully reproducing C string-like data structures or reimplementing the C++ hash_map in Rust to stick to the idiomatic Rust style.

There is no need to implement or pass the hash and the equal functions for the string type for the hash map in the Rust version, which would be equivalent to __gnu_cxx::hash<const char*> and eqstr, because they are provided by the standard library. This is also the case in std::unordered_map in C++.

`For` range loops with the inclusive range ends

C++

    for (int i=1; i<=n; i++) {
        ...
    }
    ...
    for (int i=n; i>0; i--) {
        ...
    }

Rust

#![allow(unused)]
fn main() {
    for i in 1..=n {
        ...
    }
    ...
    for i in (1..=n).rev() {
        ...
    }
}

The for range loop for i in 1..=n { ... } has the range end n inclusive and is equivalent to for (int i=1; i<=n; i++) { ... } in C++.

Similarly, the reverse ordered range loop for i in (1..=n).rev() { ... } is equivalent to for (int i=n; i>0; i--) { ... } in C++.

Formating strings

C++

        sprintf(buf, "%x", i);

Rust

#![allow(unused)]
fn main() {
        let s = format!("{:X}", i);
}

A common way to format a string in Rust is to use format! roughly corresponding to sprintf in C++.

While the C++ version writes into an already allocated buffer, the Rust version allocates a new string. This could significantly affect the performance difference but it will be roughly canceled out by the fact that the C++ version will copy the key when inserting the key into hash map in the next line.

Inserting into a hash map

C++

        X[strdup(buf)] = i;

Rust

#![allow(unused)]
fn main() {
        x.insert(s, i);
}

Rust HashMap has the square bracket operator [] but it will panic if the key already exists, unlike the C++ hash map. So we use insert instead.

Checking whether an hash map already contains a key

C++

        if (X[strdup(buf)]) c++;

Rust

#![allow(unused)]
fn main() {
        if x.contains_key(&s) {
            c += 1;
        }
}

There is a subtle difference that the C++ version inserts a new entry with the default value 0 when the key is not already contained in the hash map, in addition to checking whether the key is already contained, while the Rust version only checks if the key is already contained, again, because Rust HashMap's [] operator will panic if the key does not already exist in the hash map.

This difference is absorbed by the fact that value 0 is not used across the two loops and the two if statements branch the same way.

hash2

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <stdio.h>
#include <iostream>
#include <ext/hash_map>
#include <cstring>

using namespace std;
using namespace __gnu_cxx;

struct eqstr {
    bool operator()(const char* s1, const char* s2) const {
        return strcmp(s1, s2) == 0;
    }
};

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 200
#else
#define LENGTH 2000
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    char buf[16];
    typedef __gnu_cxx::hash_map<const char*, int, __gnu_cxx::hash<const char*>, eqstr> HM;
    HM hash1, hash2;

    for (int i=0; i<10000; i++) {
        sprintf(buf, "foo_%d", i);
        hash1[strdup(buf)] = i;
    }
    for (int i=0; i<n; i++) {
        for (HM::iterator k = hash1.begin(); k != hash1.end(); ++k) {
            hash2[(*k).first] += hash1[(*k).first];
        }
    }
    cout << hash1["foo_1"] << " " << hash1["foo_9999"] << " "
         << hash2["foo_1"] << " " << hash2["foo_9999"] << endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;
use std::collections::HashMap;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 200;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 2000;

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut hash1: HashMap<String, i32> = HashMap::new();
    let mut hash2: HashMap<String, i32> = HashMap::new();

    for i in 0..10000 {
        let s = format!("foo_{}", i);
        hash1.insert(s, i);
    }

    for _i in 0..n {
        for (key, value) in &hash1 {
            hash2.entry(key.to_string()).and_modify(|e| { *e += *value }).or_insert(*value);
        }
    }

    println!("{} {} {} {}", hash1["foo_1"], hash1["foo_9999"], hash2["foo_1"], hash2["foo_9999"]);
}

Porting notes

Most of the code structure is similar to hash.

Iterating hash map entries and conditionally mutating another hash map

C++

	for (HM::iterator k = hash1.begin(); k != hash1.end(); ++k) {
	    hash2[(*k).first] += hash1[(*k).first];
	}

Rust

#![allow(unused)]
fn main() {
        for (key, value) in &hash1 {
            hash2.entry(key.to_string()).and_modify(|e| { *e += *value }).or_insert(*value);
        }
}

Similar to hash, because C++ hash_map's [] operator inserts a new entry if it's used with a new key while Rust HashMap's [] operator will panic if the key does not already exist in the hash map, we cannot just use Rust HashMap's [] operator.

Instead we use the and_modify and the or_insert methods to achieve similar behaviors. and_modify takes a closure |e| { *e += *value }, which will be executed when the key already exists in the hash map, or_insert takes a value for a new entry to be inserted when the key doesn't exist.

We need to convert the key of type &String to String when passing it to HashMap.entry because it takes the key type String, not a reference to it. Due to the cloning that is not necessary in the C++ version, this could be a potential performance disadvantage, but it doesn't seem to be the case according to the performance results where the Rust version was not slower. I suspect that the Rust compiler can optimize away the cloning.

heapsort

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <iostream>
#include <stdlib.h>
#include <math.h>
#include <cstdio>

using namespace std;

#define IM 139968
#define IA   3877
#define IC  29573

double
gen_random(double max) {
    static long last = 42;
    return( max * (last = (last * IA + IC) % IM) / IM );
}

void
heapsort(int n, double *ra) {
    int i, j;
    int ir = n;
    int l = (n >> 1) + 1;
    double rra;

    for (;;) {
        if (l > 1) {
            rra = ra[--l];
        } else {
            rra = ra[ir];
            ra[ir] = ra[1];
            if (--ir == 1) {
                ra[1] = rra;
                return;
            }
        }
        i = l;
        j = l << 1;
        while (j <= ir) {
            if (j < ir && ra[j] < ra[j+1]) { ++j; }
            if (rra < ra[j]) {
                ra[i] = ra[j];
                j += (i = j);
            } else {
                j = ir + 1;
            }
        }
        ra[i] = rra;
    }
}

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 800000
#else
#define LENGTH 8000000
#endif
    int N = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    double *ary;
    int i;
    
    /* create an array of N random doubles */
    ary = (double *)malloc((N+1) * sizeof(double));
    for (i=1; i<=N; i++) {
        ary[i] = gen_random(1);
    }

    heapsort(N, ary);

    printf("%.10f\n", ary[N]);
    free(ary);
    return(0);
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: usize = 800000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: usize = 8000000;

const IM: i64 = 139968;
const IA: i64 = 3877;
const IC: i64 = 29573;

fn get_random(last_random: &mut i64, max: f64) -> f64 {
    let new_last = (*last_random * IA + IC) % IM;
    *last_random = new_last;
    return max * (new_last as f64) / (IM as f64);
}

fn heapsort(n: usize, ra: &mut Vec<f64>) {
    let mut i: usize;
    let mut j: usize;
    let mut ir = n;
    let mut l = (n >> 1) + 1;
    let mut rra: f64;

    loop {
        if l > 1 {
            l -= 1;
            rra = ra[l];
        } else {
            rra = ra[ir];
            ra[ir] = ra[1];
            ir -= 1;
            if ir == 1 {
                ra[1] = rra;
                return;
            }
        }
        i = l;
        j = l << 1;
        while j <= ir {
            if j < ir && ra[j] < ra[j+1] { j += 1; }
            if rra < ra[j] {
                ra[i] = ra[j];
                i = j;
                j += i;
            } else {
                j = ir + 1;
            }
        }
        ra[i] = rra;
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<usize>().unwrap()
    } else {
        LENGTH
    };

    let mut last_random: i64 = 42;

    let mut ary = Vec::<f64>::new();
    ary.resize_with(n + 1, Default::default);
    for i in 1..=n {
        ary[i] = get_random(&mut last_random, 1.0);
    }

    heapsort(n, &mut ary);

    println!("{:.10}", ary[n]);
}

Porting notes

Replacing a C++ block-scoped static variable with a locally-scoped local variable

C++

inline double gen_random(double max) {
    static long last = 42;
    return( max * (last = (last * IA + IC) % IM) / IM );
}

Rust

fn gen_random(last_random: &mut i64, max: f64) -> f64 {
    let new_last = (*last_random * IA + IC) % IM;
    *last_random = new_last;
    return max * (new_last as f64) / (IM as f64);
}
...

fn main() {
    ...
    let mut last_random: i64 = 42;

    for i in 1..=n {
        ary[i] = get_random(&mut last_random, 1.0);
    }
    ...

Rust doesn't support a static variable nested in a function or block scope.

Furthermore, Rust doesn't allow a mutable global (static) variable without using a thread-safe mechanism such as a mutex.

To avoid the mutex overhead, the Rust version uses a locally scoped local variable last_random in the main function and passes its mutable reference to the gen_random function. This is safe because the lifetime of the variable covers the entire uses of the gen_random function.

I interpret this as Rust discouraging casual uses of global variables and encouraing uses of locally scoped thread-local data for thread safety reasons.

Using `usize` variables intead of `int`

C++

void
heapsort(int n, double *ra) {
    int i, j;
    int ir = n;
    int l = (n >> 1) + 1;

Rust

#![allow(unused)]
fn main() {
fn heapsort(n: usize, ra: &mut Vec<f64>) {
    let mut i: usize;
    let mut j: usize;
    let mut ir = n;
    let mut l = (n >> 1) + 1;
    ...
}

The Rust version uses the index variables of type `usize` instead of
`int`. This is more idiomatic because of the more explict distinction
between `usize` and `int` for vector indices in Rust.

## Writing an infinite loop

C++
```cpp
    for (;;) {
        ...
    }
}

Rust

#![allow(unused)]
fn main() {
    loop {
        ...
    }
}

In Rust, an infinite loop, which unconditionally loops (often with a `break` or `return` in the loop body), is written with the `loop` keyword, which is analogous to `for (;;) { ... }` or `while (true) { ... }` in C++.

## Formatting floating point values to show 10 digits after the decimal point

C++
```cpp
    printf("%.10f\n", ary[N]);
}

Rust

#![allow(unused)]
fn main() {
    println!("{:.10}", ary[n]);
}

Since printf formats a double to show the 6 digits after the decimal point by default, the Rust version uses the format specifier {:.6} to achieve the similar.

lists

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
// from Bill Lear

#include <cstdlib>
#include <iostream>
#include <list>
#include <numeric>

using namespace std;

#ifdef SMALL_PROBLEM_SIZE
const size_t SIZE = 1000;
#else
const size_t SIZE = 10000;
#endif

template <class _ForwardIterator, class _Tp>
void 
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
  while (__first != __last)
    *__first++ = __value++;
}

size_t test_lists() {
    std::list<size_t> li1(SIZE);

    ::iota(li1.begin(), li1.end(), 1);

    std::list<size_t> li2(li1);

    std::list<size_t> li3;

    size_t N = li2.size();
    while (N--) {
        li3.push_back(li2.front());
        li2.pop_front();
    }

    N = li3.size();
    while (N--) {
        li2.push_back(li3.back());
        li3.pop_back();
    }

    li1.reverse();

    return (li1.front() == SIZE) && (li1 == li2) ? li1.size() : 0;
}

int main(int argc, char* argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 300
#else
#define LENGTH 3000
#endif
    size_t ITER = (argc == 2 ? (atoi(argv[1]) < 1 ? 1 : atoi(argv[1])): LENGTH);

    size_t result = 0;
    while (ITER > 0) {
        result = test_lists();
        --ITER;
    }

    std::cout << result << std::endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
// from Bill Lear
use std::env;
use std::collections::LinkedList;
use std::collections::linked_list::IterMut;

#[cfg(feature = "small_problem_size")]
const SIZE: usize = 1000;

#[cfg(not(feature = "small_problem_size"))]
const SIZE: usize = 10000;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 300;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 3000;

fn iota(iter: IterMut<'_, usize>, v: usize) {
    let mut i = 0;
    for e in iter {
        *e = i + v;
        i += 1;
    }
}

fn list_reverse<T>(list: LinkedList<T>) -> LinkedList<T> {
    let mut reversed = LinkedList::new();
    for elem in list {
        reversed.push_front(elem);
    }
    return reversed
}

fn test_lists() -> usize {
    let mut li1 = LinkedList::<usize>::new();
    for _i in 0..SIZE {
        li1.push_back(Default::default());
    }

    iota(li1.iter_mut(), 1);

    let mut li2 = li1.clone();

    let mut li3 = LinkedList::<usize>::new();

    let mut n = li2.len();
    while n != 0 {
        n -= 1;
        li3.push_back(li2.pop_front().unwrap());
    }

    n = li3.len();
    while n != 0 {
        n -= 1;
        li2.push_back(li3.pop_back().unwrap());
    }

    li1 = list_reverse(li1);

    return if *li1.front().unwrap() == SIZE && li1.eq(&li2) {
        li1.len()
    } else {
        0 as usize
    };
}

fn main() {
    let mut args = env::args();
    let mut iter = if args.len() == 2 {
        if args.nth(1).unwrap().parse::<i32>().unwrap() < 1 {
            1
        } else {
            args.nth(1).unwrap().parse::<i32>().unwrap()
        }
    } else {
        LENGTH
    };

    let mut result: usize = 0;
    while iter > 0 {
        result = test_lists();
        iter -= 1;
    }

    println!("{}", result);
}

Porting notes

Initializing a list with incrementing values (iota)

C++

template <class _ForwardIterator, class _Tp>
void 
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
  while (__first != __last)
    *__first++ = __value++;
}

Rust

#![allow(unused)]
fn main() {
fn iota(iter: IterMut<'_, usize>, v: usize) {
    let mut i = 0;
    for e in iter {
        *e = i + v;
        i += 1;
    }
}
}

I omitted the use of generics in the Rust version because it's used only for one call site.

In Rust, an iterator is a single entity, an IterMut in this case (an Iter if mutation isn't needed) and the iteration over the iterator uses the for range loop, as opposed to a pair of separate, pointer-like iterators and more free-style iteration over them in C++. I think this helps make the handling of iterators safer in Rust.

Reversing a linked list

C++ none.

Rust

#![allow(unused)]
fn main() {
fn list_reverse<T>(list: LinkedList<T>) -> LinkedList<T> {
    let mut reversed = LinkedList::new();
    for elem in list {
        reversed.push_front(elem);
    }
    return reversed
}
}

Rust's LinkedList interestingly doesn't have a reverse method.

I implemented a simple reverse function that copies to a new linked list in the reverse order. I'd imagine that this could be slower than a built-in reversemethod that reverses the directions of the internal pointers without copying the nodes, but the benchmark results show the Rust version is no slower.

The type inference allows the type T to be inferred in the LinkedList::new() call.

Initializinga linked list of a given size with the default element value

C++

    std::list<size_t> li1(SIZE);

Rust

#![allow(unused)]
fn main() {
    let mut li1 = LinkedList::<usize>::new();
    for _i in 0..SIZE {
        li1.push_back(Default::default());
    }
}

Similar to Vec, there isn't a way to initialize a LinkedList with a given size, unlike std::list in C++. I used a two-step approach of allocating a zero-sized linked list and then resizing it with the default element value. Default::default() is a convenient way to specify the default value of a certain type without typing the type name.

Copying a linked list

C++

    std::list<size_t> li2(li1);

Rust

#![allow(unused)]
fn main() {
    let mut li2 = li1.clone();
}

Rust's LinkedList doesn't provide a way to initialize a linked list as a copy of another. There is no copy constructor as in C++. We use clone() to achieve the same.

Popping the front of a linked list and capturing the front

C++

        li3.push_back(li2.front());
        li2.pop_front();

Rust

#![allow(unused)]
fn main() {
        li3.push_back(li2.pop_front().unwrap());
}

Rust's pop_front() returns an std::optional value in addition to removing the front element from the linked list and achieves what both front() and pop_front() need to be used to achieve the same in C++.

An `if` expression

C++

    return (li1.front() == SIZE) && (li1 == li2) ? li1.size() : 0;

Rust

#![allow(unused)]
fn main() {
    return if *li1.front().unwrap() == SIZE && li1.eq(&li2) {
        li1.len()
    } else {
        0 as usize
    };
}

To peek the value of the front element of a linked list, we get a std::optional by calling front(), takes the reference by calling unwrap on the std::optional and dereference it. This would normally force programmars to think about the case where the linked list is empty but here we bypass that for simplicity by asserting otherwise.

The == operator in C++ would compare the two linked lists as a whole. For that, we use the eq method in Rust.

We use the if statement instead of a ternary operator which doesn't exist in Rust.

lists1

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <list>
#include <numeric>

template <class _ForwardIterator, class _Tp>
void 
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
  while (__first != __last)
    *__first++ = __value++;
}

using namespace std;

void list_print_n (list<int> L, int n) {
    int c, lastc = n - 1;
    list<int>::iterator i;
    for (c = 0, i = L.begin(); i != L.end() && c < n; ++i, ++c) {
        cout << (*i);
        if (c < lastc) cout << " ";
    }
    cout << endl;
}

int main(int argc, char* argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 100000
#else
#define LENGTH 1000000
#endif
    int N = (argc == 2 ? (atoi(argv[1]) < 1 ? 1 : atoi(argv[1])): LENGTH);
    list<int>::iterator i;

    // create empty list B
    list<int> B;

    // create list (A) of integers from 1 through N
    list<int> A(N);
    ::iota(A.begin(), A.end(), 1);

    // move each individual item from A to B, in a loop, reversing order
    while (! A.empty()) {
        B.push_front(A.front());
        A.pop_front();
    }
    
    // print first 2 elements of B
    list_print_n(B, 2);

    // reverse B (can be done in place)
    B.reverse();
    // reverse(B.begin(), B.end());

    // is 0 a member of B?
    cout << ((find(B.begin(), B.end(), 0) == B.end()) ? "false" : "true") << endl;

    // is N a member of B?
    cout << ((find(B.begin(), B.end(), N) == B.end()) ? "false" : "true") << endl;

    // filter values from B to A that are less than N/2, preserving order
    int mid = N/2;
    for (i = B.begin(); i != B.end(); ++i) {
        if ((*i) < mid) A.push_back(*i);
    }

    // print first ten items of A
    list_print_n(A, 10);

    // print sum of items in A that are less than 1000
    int sum = 0;
    for (i = A.begin(); i != A.end(); ++i) {
        if ((*i) < 1000) sum += (*i);
    }
    cout << sum << endl;

    // append B to end of A
    A.splice(A.end(), B);

    // print length and last element of A
    cout << A.size() << " " << A.back() << endl;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;
use std::collections::LinkedList;
use std::collections::linked_list::IterMut;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 100000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 1000000;

fn iota(iter: IterMut<'_, i32>, v: i32) {
    let mut i = 0;
    for e in iter {
        *e = i + v;
        i += 1;
    }
}

fn list_print_n(list: &LinkedList<i32>, n: i32) {
    let lastc = n - 1;
    let mut c = 0;
    for elem in list.iter() {
        if c >= n {
            break;
        }
        print!("{}", elem);
        if c < lastc {
            print!(" ");
        }
        c += 1;
    }
    println!();
}

fn list_reverse<T>(list: LinkedList<T>) -> LinkedList<T> {
    let mut reversed = LinkedList::new();
    for elem in list {
        reversed.push_front(elem);
    }
    return reversed
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        if args.nth(1).unwrap().parse::<i32>().unwrap() < 1 {
            1
        } else {
            args.nth(1).unwrap().parse::<i32>().unwrap()
        }
    } else {
        LENGTH
    };

    let mut b = LinkedList::<i32>::new();

    let mut a = LinkedList::<i32>::new();
    for _i in 0..n {
        a.push_back(Default::default());
    }
    iota(a.iter_mut(), 1);

    while a.len() > 0 {
        b.push_front(a.pop_front().unwrap());
    }

    list_print_n(&b, 2);

    b = list_reverse(b);

    println!("{}", if !b.contains(&0) { "false" } else { "true" });

    println!("{}", if !b.contains(&n) { "false" } else { "true" });

    let mid = n / 2;
    for elem in b.iter() {
        if *elem < mid {
            a.push_back(*elem);
        }
    }

    list_print_n(&a, 10);

    let mut sum = 0;
    for elem in b.iter() {
        if *elem < 1000 {
            sum += *elem;
        }
    }
    println!("{}", sum);

    a.append(&mut b);

    println!("{} {}", a.len(), *a.back().unwrap());
}

Porting notes

Many parts are shared with lists with some differences.

There are no particular notes.

matrix

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <iostream>
#include <stdlib.h>

using namespace std;

#define SIZE 30

int **mkmatrix(int rows, int cols) {
    int i, j, count = 1;
    int **m = (int **) malloc(rows * sizeof(int *));
    for (i=0; i<rows; i++) {
        m[i] = (int *) malloc(cols * sizeof(int));
        for (j=0; j<cols; j++) {
            m[i][j] = count++;
        }
    }
    return(m);
}

void zeromatrix(int rows, int cols, int **m) {
    int i, j;
    for (i=0; i<rows; i++)
        for (j=0; j<cols; j++)
            m[i][j] = 0;
}

void freematrix(int rows, int **m) {
    while (--rows > -1) { free(m[rows]); }
    free(m);
}

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
    int i, j, k, val;
    for (i=0; i<rows; i++) {
        for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
                val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
        }
    }
    return(m3);
}

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 10000
#else
#define LENGTH 100000
#endif
    int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);

    int **m1 = mkmatrix(SIZE, SIZE);
    int **m2 = mkmatrix(SIZE, SIZE);
    int **mm = mkmatrix(SIZE, SIZE);

    for (i=0; i<n; i++) {
        mm = mmult(SIZE, SIZE, m1, m2, mm);
    }
    cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " << mm[4][4] << endl;

    freematrix(SIZE, m1);
    freematrix(SIZE, m2);
    freematrix(SIZE, mm);
    return(0);
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

const SIZE: usize = 30;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 10000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 100000;

fn mkmatrix(rows: usize, cols: usize) -> Vec::<Vec::<i32>> {
    let mut count = 1;
    let mut m = Vec::<Vec::<i32>>::new();
    m.resize(rows, Default::default());
    for i in 0..rows {
        m[i].resize(cols, Default::default());
        for j in 0..cols {
            m[i][j] = count;
            count += 1;
        }
    }
    return m;
}

fn mmult(rows: usize, cols: usize, m1: &Vec::<Vec::<i32>>,
         m2: &Vec::<Vec::<i32>>, m3: &mut Vec::<Vec::<i32>>) {
    for i in 0..rows {
        for j in 0..cols {
            let mut val = 0;
            for k in 0..cols {
                val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
        }
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let m1 = mkmatrix(SIZE, SIZE);
    let m2 = mkmatrix(SIZE, SIZE);
    let mut mm = mkmatrix(SIZE, SIZE);

    for _i in 0..n {
        mmult(SIZE, SIZE, &m1, &m2, &mut mm);
    }
    println!("{} {} {} {}", mm[0][0], mm[2][3], mm[3][2], mm[4][4]);
}

Porting notes

Using nested `Vec`s to represent a matrix

C++

int **mkmatrix(int rows, int cols) {
    int i, j, count = 1;
    int **m = (int **) malloc(rows * sizeof(int *));
    for (i=0; i<rows; i++) {
        m[i] = (int *) malloc(cols * sizeof(int));
        for (j=0; j<cols; j++) {
            m[i][j] = count++;
        }
    }
    return(m);
}

Rust

#![allow(unused)]
fn main() {
fn mkmatrix(rows: usize, cols: usize) -> Vec::<Vec::<i32>> {
    let mut count = 1;
    let mut m = Vec::<Vec::<i32>>::new();
    m.resize(rows, Default::default());
    for i in 0..rows {
        m[i].resize(cols, Default::default());
        for j in 0..cols {
            m[i][j] = count;
            count += 1;
        }
    }
    return m;
}
}

We use a two-level, nested Vec data structure where the first level is for the rows and the second level is for the columns. That is similar to the data structure that uses arrays in the C++ version. While it'd be possible to use raw i32 arrays in Rust, it is more idiomatic to use Vec treating Vec as the safe equivalent of raw arrays.

Passing matrices to the matrix multiplication function using references

C++

    int **m1 = mkmatrix(SIZE, SIZE);
    int **m2 = mkmatrix(SIZE, SIZE);
    int **mm = mkmatrix(SIZE, SIZE);

    for (i=0; i<n; i++) {
        mm = mmult(SIZE, SIZE, m1, m2, mm);
    }

Rust

#![allow(unused)]
fn main() {
    let m1 = mkmatrix(SIZE, SIZE);
    let m2 = mkmatrix(SIZE, SIZE);
    let mut mm = mkmatrix(SIZE, SIZE);

    for _i in 0..n {
        mmult(SIZE, SIZE, &m1, &m2, &mut mm);
    }
}

Freeing matrices

C++

    freematrix(SIZE, m1);
    freematrix(SIZE, m2);
    freematrix(SIZE, mm);

Rust

#![allow(unused)]
fn main() {
// empty
}

The Rust version doesn't need to explicitly free the matrices because they are automatically dropped at the end of the main function thanks to the ownership mechanism.

Performance analysis

The benchmark results show that the Rust version is more than 2 times slower than the C++ version.

We compare the generated code between C++ and Rust.

C++

Here is the hot loop according to the Linux Perf:

C++ perfdata

  5.98 │2b0:   mov    -0x10(%r14,%r8,8),%r10
  5.93 │       mov    -0x8(%r14,%r8,8),%r11
  7.04 │       mov    (%r10,%rdi,4),%r10d
 11.66 │       imul   -0x8(%rsi,%r8,4),%r10d
  5.84 │       add    %r9d,%r10d
  5.71 │       mov    (%r11,%rdi,4),%r11d
 10.56 │       imul   -0x4(%rsi,%r8,4),%r11d
 11.00 │       mov    (%r14,%r8,8),%r9
  5.17 │       mov    (%r9,%rdi,4),%r9d
  4.77 │       imul   (%rsi,%r8,4),%r9d
  4.52 │       add    %r11d,%r9d
  4.53 │       add    %r10d,%r9d
  8.94 │       add    $0x3,%r8
  4.73 │       cmp    $0x20,%r8
       │     ↑ jne    2b0

C++ LLVM IR

106:                                              ; preds = %106, %104
  %107 = phi i64 [ 0, %104 ], [ %135, %106 ]
  %108 = phi i32 [ 0, %104 ], [ %134, %106 ]
  %109 = getelementptr inbounds nuw i32, ptr %103, i64 %107
  %110 = load i32, ptr %109, align 4, !tbaa !10
  %111 = getelementptr inbounds nuw ptr, ptr %39, i64 %107
  %112 = load ptr, ptr %111, align 8, !tbaa !5
  %113 = getelementptr inbounds nuw i32, ptr %112, i64 %105
  %114 = load i32, ptr %113, align 4, !tbaa !10
  %115 = mul nsw i32 %114, %110
  %116 = add nsw i32 %115, %108
  %117 = add nuw nsw i64 %107, 1
  %118 = getelementptr inbounds nuw i32, ptr %103, i64 %117
  %119 = load i32, ptr %118, align 4, !tbaa !10
  %120 = getelementptr inbounds nuw ptr, ptr %39, i64 %117
  %121 = load ptr, ptr %120, align 8, !tbaa !5
  %122 = getelementptr inbounds nuw i32, ptr %121, i64 %105
  %123 = load i32, ptr %122, align 4, !tbaa !10
  %124 = mul nsw i32 %123, %119
  %125 = add nsw i32 %124, %116
  %126 = add nuw nsw i64 %107, 2
  %127 = getelementptr inbounds nuw i32, ptr %103, i64 %126
  %128 = load i32, ptr %127, align 4, !tbaa !10
  %129 = getelementptr inbounds nuw ptr, ptr %39, i64 %126
  %130 = load ptr, ptr %129, align 8, !tbaa !5
  %131 = getelementptr inbounds nuw i32, ptr %130, i64 %105
  %132 = load i32, ptr %131, align 4, !tbaa !10
  %133 = mul nsw i32 %132, %128
  %134 = add nsw i32 %133, %125
  %135 = add nuw nsw i64 %107, 3
  %136 = icmp eq i64 %135, 30
  br i1 %136, label %137, label %106, !llvm.loop !22

The loop is unrolled by a factor of 3.

Rust

Here is the hot loop according to the Linux Perf:

Rust perfdata

 16.54 │ 260:   cmp    %rbx,%rsi
       │      ↓ je     5c3
  5.16 │        cmp    %rbx,%r8
       │      ↓ je     5d2
  4.99 │        mov    (%r12),%r13
  4.84 │        cmp    %r13,%rdi
       │      ↓ jae    5e1
 10.44 │        mov    0x8(%rax),%r13
  5.67 │        mov    0x0(%r13,%rbx,4),%r13d
  5.73 │        mov    -0x8(%r12),%r14
  5.16 │        imul   (%r14,%rdi,4),%r13d
  5.51 │        inc    %rbx
 10.59 │        add    %r13d,%r9d
  5.17 │        add    $0x18,%r12
 17.06 │        cmp    $0x1e,%rbx
       │      ↑ jne    260

Rust LLVM IR

bb10.us.i:                                        ; preds = %bb17.us.i, %bb7.us.i
  %iter2.sroa.0.024.us.i = phi i64 [ 0, %bb7.us.i ], [ %_0.i13.us.i, %bb17.us.i ]
  %val.sroa.0.023.us.i = phi i32 [ 0, %bb7.us.i ], [ %95, %bb17.us.i ]
  %_0.i13.us.i = add nuw nsw i64 %iter2.sroa.0.024.us.i, 1
  %exitcond.not.i = icmp eq i64 %iter2.sroa.0.024.us.i, %_47.us.i
  br i1 %exitcond.not.i, label %panic7.i.invoke, label %bb15.us.i

bb15.us.i:                                        ; preds = %bb10.us.i
  %_48.us.i = load ptr, ptr %90, align 8, !nonnull !4, !noundef !4
  %_15.us.i = getelementptr inbounds i32, ptr %_48.us.i, i64 %iter2.sroa.0.024.us.i
  %_14.us.i = load i32, ptr %_15.us.i, align 4, !noundef !4
  %exitcond89.not.i = icmp eq i64 %iter2.sroa.0.024.us.i, %m2.val25
  br i1 %exitcond89.not.i, label %panic7.i.invoke, label %bb16.us.i

bb16.us.i:                                        ; preds = %bb15.us.i
  call void @llvm.assume(i1 %35)
  %_19.us.i = getelementptr inbounds %"alloc::vec::Vec<i32>", ptr %m2.val, i64 %iter2.sroa.0.024.us.i
  %93 = getelementptr inbounds i8, ptr %_19.us.i, i64 16
  %_57.us.i = load i64, ptr %93, align 8, !noundef !4
  %_60.us.i = icmp ult i64 %iter1.sroa.0.025.us.i, %_57.us.i
  br i1 %_60.us.i, label %bb17.us.i, label %panic7.i.invoke

bb17.us.i:                                        ; preds = %bb16.us.i
  %94 = getelementptr inbounds i8, ptr %_19.us.i, i64 8
  %_58.us.i = load ptr, ptr %94, align 8, !nonnull !4, !noundef !4
  %_18.us.i = getelementptr inbounds i32, ptr %_58.us.i, i64 %iter1.sroa.0.025.us.i
  %_17.us.i = load i32, ptr %_18.us.i, align 4, !noundef !4
  %_13.us.i = mul i32 %_17.us.i, %_14.us.i
  %95 = add i32 %_13.us.i, %val.sroa.0.023.us.i
  %exitcond90.not.i = icmp eq i64 %_0.i13.us.i, 30
  br i1 %exitcond90.not.i, label %bb12.us.i, label %bb10.us.i

The key differences from the C++ version are the following.

First, the mmult function isn't inlined into the main function. That makes it non-trivial to constant-propgate the Vec sizes as compile-time constant 30. If intraprocedural constant propagation was implemented indepedent of inlining, it would do but it doesn't seem to be there.

Second, the bounds checks are not elided. The bounds checks could be hoisted out of the loop (loop invariant code motion) and the special-cased, bounds-check-free version of the innermost loop could be generated (loop versioning). The LLVM backend probably has some form of these implemented but it doesn't seem to trigger here.

Third, loop unrolling has not been applied. I suspect that because of the above two lack of optimizations, the loop doesn't look simple enough for loop unrolling to kick in.

Note these are differences in the frontends. The Clang frontend applied these three optimiations for the C++ version. But the Rustc frontend did not for the Rust version. While the LLVM backend could also apply these in theory but it doesn't seem to at least in the given environment.

A June 2025 update on the performance difference

I filed an issue at the Rust repo about this performance difference. According to the conversation, if the program uses arrays (of compile-time constant size), as opposed to vectors to represent the matrices, where the arrays are heap-allocated by using Box's to be equivalent to the C++ version, as below, similar set of optimizations are performanced and the performance gets on par with the C++ version.

A version that uses heap allocated arrays instead of vectors:

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/

use std::env;
use std::convert::TryInto;

const SIZE: usize = 30;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 10000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 100000;

fn mkmatrix(rows: usize, cols: usize) -> Box::<[Box::<[i32; SIZE]>; SIZE]> {
    let mut count = 1;
    let mut row_vec = Vec::<Box<[i32; SIZE]>>::new();
    for _i in 0..rows {
        let mut row: [i32; SIZE] = [0; SIZE];
        for j in 0..cols {
            row[j] = count;
            count += 1;
        }
        row_vec.push(Box::new(row));
    }
    let m: [Box<[i32; SIZE]>; SIZE] = row_vec.try_into().unwrap_or_else(
        |v: Vec<Box<[i32; SIZE]>>| panic!("Expected a Vec of length {} but it was {}", SIZE, v.len()));
    return Box::new(m);
}

fn mmult(rows: usize, cols: usize, m1: &Box::<[Box::<[i32; SIZE]>; SIZE]>,
         m2: &Box::<[Box::<[i32; SIZE]>; SIZE]>, m3: &mut Box::<[Box::<[i32; SIZE]>; SIZE]>) {
    for i in 0..rows {
        for j in 0..cols {
            let mut val = 0;
            for k in 0..cols {
                val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
        }
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let m1 = mkmatrix(SIZE, SIZE);
    let m2 = mkmatrix(SIZE, SIZE);
    let mut mm = mkmatrix(SIZE, SIZE);

    for _i in 0..n {
        mmult(SIZE, SIZE, &m1, &m2, &mut mm);
    }
    println!("{} {} {} {}", mm[0][0], mm[2][3], mm[3][2], mm[4][4]);
}

But this may not be considered equivalent to the C++ version in that the array version of mmult in Rust is tied to the specific size while the Rust vector version or the C++ version work on any sizes.

I think that the Rust compiler could be improved to similarly optimize the vector version of the code.

methcall

Full source code

C++

// -*- mode: c++ -*-
// $Id
// http://www.bagley.org/~doug/shootout/

// with some help from Bill Lear

#include <stdlib.h>
#include <iostream>

using namespace std;

class Toggle {
public:
    Toggle(bool start_state) : state(start_state) { }
    virtual ~Toggle() {  }
    bool value() {
        return(state);
    }
    virtual Toggle& activate() {
        state = !state;
        return(*this);
    }
    bool state;
};

class NthToggle : public Toggle {
public:
    NthToggle(bool start_state, int max_counter) :
        Toggle(start_state), count_max(max_counter), counter(0) {
    }
    Toggle& activate() {
        if (++this->counter >= this->count_max) {
            state = !state;
            counter = 0;
        }
        return(*this);
    }
private:
    int count_max;
    int counter;
};


int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 100000000
#else
#define LENGTH 1000000000
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);

    bool val = true;
    Toggle *toggle = new Toggle(val);
    for (int i=0; i<n; i++) {
        val = toggle->activate().value();
    }
    cout << ((val) ? "true" : "false") << endl;
    delete toggle;

    val = true;
    NthToggle *ntoggle = new NthToggle(val, 3);
    for (int i=0; i<n; i++) {
        val = ntoggle->activate().value();
    }
    cout << ((val) ? "true" : "false") << endl;
    delete ntoggle;

    return 0;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
// with some help from Bill Lear
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 100000000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 1000000000;

struct Toggle {
    state: bool
}

impl Toggle {
    fn new(start_state: bool) -> Toggle {
        Toggle { state: start_state }
    }
}

trait Togglable {
    fn activate(&mut self);
    fn value(&self) -> bool;
}

impl Togglable for Toggle {
    fn activate(&mut self) {
        self.state = !self.state;
    }
    fn value(&self) -> bool {
        return self.state;
    }
}

struct NthToggle {
    state: bool,
    count_max: i32,
    counter: i32,
}

impl NthToggle {
    fn new(start_state: bool, max_counter: i32) -> NthToggle {
        NthToggle {
            state: start_state,
            count_max: max_counter,
            counter: 0,
        }
    }
}

impl Togglable for NthToggle {
    fn activate(&mut self) {
        self.counter += 1;
        if self.counter >= self.count_max {
            self.state = !self.state;
            self.counter = 0;
        }
    }
    fn value(&self) -> bool {
        return self.state;
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut val = true;
    let mut toggle = Toggle::new(val);
    for _i in 0..n {
        toggle.activate();
        val = toggle.value();
    }
    println!("{}", if val { "true" } else { "false" });

    val = true;
    let mut ntoggle = NthToggle::new(val, 3);
    for _i in 0..n {
        ntoggle.activate();
        val = ntoggle.value();
    }
    println!("{}", if val { "true" } else { "false" });
}

Porting notes

Declaring a struct in place for a class

C++

class Toggle {
public:
    Toggle(bool start_state) : state(start_state) { }
    virtual ~Toggle() {  }
    bool value() {
        return(state);
    }
    virtual Toggle& activate() {
        state = !state;
        return(*this);
    }
    bool state;
};

Rust

#![allow(unused)]
fn main() {
struct Toggle {
    state: bool
}

impl Toggle {
    fn new(start_state: bool) -> Toggle {
        Toggle { state: start_state }
    }
}

trait Togglable {
    fn activate(&mut self);
    fn value(&self) -> bool;
}

impl Togglable for Toggle {
    fn activate(&mut self) {
        self.state = !self.state;
    }
    fn value(&self) -> bool {
        return self.state;
    }
}
}

There are no classes and structs are used in Rust.

Member variables and methods aren't placed together in a Rust struct. Member variables are declared in the struct and member methods are declared in the impl block for the struct.

The new method, which is like a C++ constructor, is usually used to create an instance of the struct. The struct values are often initialized with struct literals, as in Toggle { state: start_state }, as opposed to the C++ member-by-member style.

The drop method, which is analogous to a C++ destructor but is often omitted because the owership takes care of destruction unless the struct requires a custom logic for deinitization like resource reclamation.

The methods activate and value would be put in the impl Toggle { ... } otherwise but they are put into a trait Togglable here to handle the use of inheritance, which will be discussed next. A trait is like an interface in Go or Java. For now, note that the struct Toggle implements the trait Togglable that has the two methods activate and value and they are defined in the impl Tooglable for Toggle { ... } block.

Using a trait in place for inheritance.

C++

class NthToggle : public Toggle {
public:
    NthToggle(bool start_state, int max_counter) :
        Toggle(start_state), count_max(max_counter), counter(0) {
    }
    Toggle& activate() {
        if (++this->counter >= this->count_max) {
            state = !state;
            counter = 0;
        }
        return(*this);
    }
private:
    int count_max;
    int counter;
};

Rust

#![allow(unused)]
fn main() {
struct NthToggle {
    state: bool,
    count_max: i32,
    counter: i32,
}

impl NthToggle {
    fn new(start_state: bool, max_counter: i32) -> NthToggle {
        NthToggle {
            state: start_state,
            count_max: max_counter,
            counter: 0,
        }
    }
}

impl Togglable for NthToggle {
    fn activate(&mut self) {
        self.counter += 1;
        if self.counter >= self.count_max {
            self.state = !self.state;
            self.counter = 0;
        }
    }
    fn value(&self) -> bool {
        return self.state;
    }
}
}

Rust is not an object oriented programming lanuage. In the C++ version, the class NthToggle inherits from the class Toggle. Because Rust doesn't support inheritance, we instead use a trait, which is like interfaces in Go or Java. The activate and value methods, which are inherited or overridden, are put into a trait Togglable, as opposed to directly in the impl Toggle { ... } block. The struct NthToggle has three variables, instead of one for Toogle and it also implements the trait Togglable. The impl Togglable for NthToggle { ... } block implements the two methods for NthToggle. That way, the two methods can be separately implemented for Toggle and NthToggle and can be polymorphically called like C++ virtual calls. A caveat is that the state variable and the value method need to be duplicated.

Avoding unnecessarily returning a reference

C++

        val = toggle->activate().value();

Rust

#![allow(unused)]
fn main() {
        toggle.activate();
        val = toggle.value();
}

I chose to change the activate method not to return a reference to the instance. I believe that we would want to avoid returning a reference in Rust as it might require lifetime annotations to be used and lead to unnecessary complexity. In this example, we could simply call the value method to get the state value in the second line.

Performance analysis

The benchmark results show that the Rust version is about 8 times faster than the C++ version.

We will compare the generated code between C++ and Rust.

C++

C++ perfdata (the hot function list)

Overhead  Command   Shared Object        Symbol
  78.05%  methcall  methcall             [.] main
  12.20%  methcall  methcall             [.] NthToggle::activate()
   9.65%  methcall  methcall             [.] Toggle::activate()

C++ perfdata (the main function)

  4.86 │ 50:   mov    (%r14),%rax
  4.73 │       mov    %r14,%rdi
 40.76 │     → call   *0x10(%rax)
  2.60 │       dec    %ebp
  4.86 │     ↑ jne    50

C++ perfdata (Toggle::activate())

 21.28 │     mov  %rdi,%rax
 39.85 │     xorb $0x1,0x8(%rdi)
 38.87 │   ← ret

C++ perfdata (NthToggle::activate())

 10.18 │      mov  %rdi,%rax
  9.78 │      mov  0x10(%rdi),%ecx
 10.25 │      inc  %ecx
 21.22 │      mov  %ecx,0x10(%rdi)
  9.13 │      cmp  0xc(%rdi),%ecx
       │    ↓ jl   1b
  3.94 │      xorb $0x1,0x8(%rax)
  7.03 │      movl $0x0,0x10(%rax)
 28.45 │1b: ← ret

It looks like the activate call isn't devirtualized or inlined.

Rust

Rust perfdata (the host function list)

Overhead  Command   Shared Object     Symbol
  99.92%  methcall  methcall          [.] methcall::main

Rust perfdata (the host loop in the main function)

  2.52 │280:   xor    %dil,%cl
  4.92 │       xor    %r8b,%cl
  4.50 │       xor    %r9b,%cl
  2.40 │       xor    %r11b,%cl
  3.75 │       add    $0xfffffffc,%ebx
  3.51 │     ↑ je     23a
  4.33 │291:   lea    0x1(%rdx),%r8d
  6.20 │       cmp    $0x2,%edx
  3.23 │       setge  %dil
  7.95 │       cmovge %esi,%r8d
  3.92 │       lea    0x1(%r8),%edx
  2.63 │       cmp    $0x2,%r8d
  9.47 │       cmovge %esi,%edx
  4.91 │       setge  %r8b
  2.28 │       mov    $0x0,%r10d
  3.27 │       cmp    $0x2,%edx
  2.52 │       setge  %r9b
  3.39 │     ↓ jge    2c3
  2.75 │       inc    %edx
  1.99 │       mov    %edx,%r10d
  3.63 │2c3:   mov    $0x0,%edx
  2.92 │       cmp    $0x2,%r10d
  2.92 │       setge  %r11b
  3.80 │     ↑ jge    280
  1.93 │       inc    %r10d
  2.83 │       mov    %r10d,%edx
  1.52 │     ↑ jmp    280

We can see that the activate call is devirtualized, inlined and the loop is unrolled by a factor of 4. This explains the performance difference.

moments

Full source code

C++

// -*- mode: c++ -*-
// $Id
// http://www.bagley.org/~doug/shootout/
// Calculate statistical moments of a region, from Bill Lear
// Modified by Tamás Benkõ
// Further modified by Tom Hyer

#include <vector>
#include <numeric>
#include <iterator>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cmath>

using namespace std;

template <class T>
struct moments {
public:
    template <class InputIterator>
    moments(InputIterator begin, InputIterator end)
        : median(0.0), mean(0.0), average_deviation(0.0),
          standard_deviation(0.0), variance(0.0),
          skew(0.0), kurtosis(0.0)
        {
            T sum = accumulate(begin, end, 0.0);
            size_t N = end - begin;
            mean = sum / N;
            for (InputIterator i = begin; i != end; ++i) {
                T deviation = *i - mean;
                average_deviation += fabs(deviation);
                T temp = deviation * deviation;
                variance += temp;
                temp *= deviation;
                skew += temp;
                kurtosis += temp * deviation;
            }
            average_deviation /= N;
            variance /= (N - 1);
            standard_deviation = sqrt(variance);

            if (variance) {
                skew /= (N * variance * standard_deviation);
                kurtosis = kurtosis/(N * variance * variance) - 3.0;
            }

            InputIterator mid = begin+N/2;
            nth_element(begin, mid, end);
            if (N % 2 == 0) {
                InputIterator next_biggest = max_element(begin, mid);
                median = (*mid+*next_biggest)/2;
            }
            else
                median = *mid;
        }

    T median;
    T mean;
    T average_deviation;
    T standard_deviation;
    T variance;
    T skew;
    T kurtosis;
};

int main(int argc, char**argv) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 500000
#else
#define LENGTH 5000000
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    vector<double> v;
    double d;

    for (unsigned i = 0; i != n; ++i) v.push_back(i);
    moments<double> m(v.begin(), v.end());

    printf("n:                  %d\n", v.end() - v.begin());
    printf("median:             %f\n", m.median);
    printf("mean:               %f\n", m.mean);
    printf("average_deviation:  %f\n", m.average_deviation);
    printf("standard_deviation: %f\n", m.standard_deviation);
    printf("variance:           %f\n", m.variance);
    printf("skew:               %f\n", m.skew);
    printf("kurtosis:           %f\n", m.kurtosis);

    return 0;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
// Calculate statistical moments of a region, from Bill Lear
// Modified by Tamás Benkõ
// Further modified by Tom Hyer
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 500000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 5000000;

struct Moments<T> {
    median: T,
    mean: T,
    average_deviation: T,
    standard_deviation: T,
    variance: T,
    skew: T,
    kurtosis: T,
}

impl Moments<f64> {
    fn new(slice: &mut [f64]) -> Moments<f64> {
        let mut m = Moments::<f64> {
            median: 0.0,
            mean: 0.0,
            average_deviation: 0.0,
            standard_deviation: 0.0,
            variance: 0.0,
            skew: 0.0,
            kurtosis: 0.0,
        };

        let sum: f64 = slice.iter().sum();
        let n = slice.iter().count();
        m.mean = sum / (n as f64);
        for e in slice.iter() {
            let deviation = *e - m.mean;
            m.average_deviation += deviation.abs();
            let mut temp = deviation * deviation;
            m.variance += temp;
            temp *= deviation;
            m.skew += temp;
            m.kurtosis += temp * deviation;
        }
        m.average_deviation /= n as f64;
        m.variance /= (n - 1) as f64;
        m.standard_deviation = m.variance.sqrt();

        if m.variance != 0.0 {
            m.skew /= (n as f64) * m.variance * m.standard_deviation;
            m.kurtosis =
                m.kurtosis / ((n as f64) * m.variance * m.variance) - 3.0;
        }

        let (before, mid, _after) =
            slice.select_nth_unstable_by(n / 2,
                                         |a, b| a.partial_cmp(b).unwrap());
        if n % 2 == 0 {
            let next_biggest =
                before.iter().max_by(|a, b| a.partial_cmp(b).unwrap());
            m.median = (*mid + *next_biggest.unwrap()) / 2.0;
        } else {
            m.median = *mid;
        }

        return m;
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };
    let mut v = Vec::<f64>::new();

    for i in 0..n {
        v.push(i as f64);
    }
    let m = Moments::<f64>::new(&mut v[..]);

    println!("n:                  {}", v.iter().count());
    println!("median:             {:.6}", m.median);
    println!("mean:               {:.6}", m.mean);
    println!("average_deviation:  {:.6}", m.average_deviation);
    println!("standard_deviation: {:.6}", m.standard_deviation);
    println!("variance:           {:.6}", m.variance);
    println!("skew:               {:.6}", m.skew);
    println!("kurtosis:           {:.6}", m.kurtosis);
}

Porting notes

Passing the input vector using a slice, as opposed to C++ iterators

C++

    template <class InputIterator>
    moments(InputIterator begin, InputIterator end)
        : median(0.0), mean(0.0), average_deviation(0.0),
          standard_deviation(0.0), variance(0.0),
          skew(0.0), kurtosis(0.0)
        {
            ...
        }

Rust

#![allow(unused)]
fn main() {
    fn new(slice: &mut [f64]) -> Moments<f64> {
        let mut m = Moments::<f64> {
            median: 0.0,
            mean: 0.0,
            average_deviation: 0.0,
            standard_deviation: 0.0,
            variance: 0.0,
            skew: 0.0,
            kurtosis: 0.0,
        };
        ...
    }
}

The C++ version passes the input vector using its begin and end iterators to the moments constructor.

In constract, the Rust version passes a mutable slice of the input vector. &mut [f64] represents a mutable slice of type f64. &mut v[..] is the syntax that takes a mutable slice of a vector v. If it were an immutable slice, the syntax would be &[f64] and &v[..], respectively.

Slices are safer than the C++ iterators in that they are bounds-checked and, being single entities, they are less error-prone, avoiding potential mistakes like mixing up iterators from different containers or passing them in a wrong order.

The Rust version uses a slice instead of an iterator because the select_nth_unstable_by method that is used to implement the C++ std::nth_element below is available for a slice but not for an iterator.

Explicitly converting types

C++

            mean = sum / N;

Rust

#![allow(unused)]
fn main() {
        m.mean = sum / (n as f64);
}

In the C++ version, the division implicit converts the N of type size_t to double.

In Rust, implicit conversions are not supported for safety reasons and types need to be explicitly converted. The Rust version uses the as operator as in (n as f64) to convert from usize to f64.

Implementing `std::nth_element` with `select_nth_unstable_by`, multiple return values as a tuple, explicitly handling 64-bit floating point not having a total order

C++

            InputIterator mid = begin+N/2;
            nth_element(begin, mid, end);

Rust

#![allow(unused)]
fn main() {
        let (before, mid, _after) =
            slice.select_nth_unstable_by(n / 2,
                                         |a, b| a.partial_cmp(b).unwrap());
}

The C++ version uses the std::nth_element function to find the median value.

The Rust version uses the select_nth_unstable_by method as an equivalent.

select_nth_unstable_by uses a tuple return type to return multiple values. let (before, mid, _after) = ... declares and assigns each element of the returned tuple as local variables conveniently in a single line.

The 64-bit floating point type (double or f64) doesn't technically have a total order because of NaN. So there could be subtle undefined or implementation-dependent behaviors where strict ordering is expected. The Rust version makes it explicit by not letting f64 implement the cmp method. The closure |a, b| a.partial_cmp(b).unwrap() makes it explicit that it would panic if NaN is encountered, though it doesn't occur in this program.

nestedloop

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <iostream>
#include <stdlib.h>

using namespace std;

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 30
#else
#define LENGTH 46
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    int a, b, c, d, e, f, x=0;
        
    for (a=0; a<n; a++)
        for (b=0; b<n; b++)
            for (c=0; c<n; c++)
                for (d=0; d<n; d++)
                    for (e=0; e<n; e++)
                        for (f=0; f<n; f++)
                            x++;

    cout << x << endl;
    return(0);
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 30;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 46;

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut x: i32 = 0;

    for _a in 0..n {
        for _b in 0..n {
            for _c in 0..n {
                for _d in 0..n {
                    for _e in 0..n {
                        for _f in 0..n {
                            x = x.wrapping_add(1);
                        }
                    }
                }
            }
        }
    }

    println!("{}", x);
}

Porting notes

Integer addition without trapping

C++

                            x++;

Rust

#![allow(unused)]
fn main() {
                            x = x.wrapping_add(1);
}

In Rust, the normal integer addition would panic if it causes an overflow, unlike in C++. To avoid this, the Rust version uses the wrapping_add method.

objinst

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <stdlib.h>
#include <iostream>

using namespace std;

class Toggle {
public:
    Toggle(bool start_state) : state(start_state) { }
    virtual ~Toggle() {  }
    bool value() {
        return(state);
    }
    virtual Toggle& activate() {
        state = !state;
        return(*this);
    }
    bool state;
};

class NthToggle : public Toggle {
public:
    NthToggle(bool start_state, int max_counter) :
        Toggle(start_state), count_max(max_counter), counter(0) {
    }
    Toggle& activate() {
        if (++this->counter >= this->count_max) {
            state = !state;
            counter = 0;
        }
        return(*this);
    }
private:
    int count_max;
    int counter;
};

int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 1000000
#else
#define LENGTH 70000000
#endif
    int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);

    Toggle *toggle1 = new Toggle(true);
    for (int i=0; i<5; i++) {
        cout << ((toggle1->activate().value()) ? "true" : "false") << endl;
    }
    delete toggle1;
    for (int i=0; i<n; i++) {
        Toggle *toggle = new Toggle(true);
        delete toggle;
    }
    
    cout << endl;
    
    NthToggle *ntoggle1 = new NthToggle(true, 3);
    for (int i=0; i<8; i++) {
        cout << ((ntoggle1->activate().value()) ? "true" : "false") << endl;
    }
    delete ntoggle1;
    for (int i=0; i<n; i++) {
        NthToggle *ntoggle = new NthToggle(true, 3);
        delete ntoggle;
    }
    return 0;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 1000000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 70000000;

struct Toggle {
    state: bool
}

impl Toggle {
    fn new(start_state: bool) -> Toggle {
        Toggle { state: start_state }
    }
}

trait Togglable {
    fn activate(&mut self);
    fn value(&self) -> bool;
}

impl Togglable for Toggle {
    fn activate(&mut self) {
        self.state = !self.state;
    }
    fn value(&self) -> bool {
        return self.state;
    }
}

struct NthToggle {
    state: bool,
    count_max: i32,
    counter: i32,
}

impl NthToggle {
    fn new(start_state: bool, max_counter: i32) -> NthToggle {
        NthToggle {
            state: start_state,
            count_max: max_counter,
            counter: 0,
        }
    }
}

impl Togglable for NthToggle {
    fn activate(&mut self) {
        self.counter += 1;
        if self.counter >= self.count_max {
            self.state = !self.state;
            self.counter = 0;
        }
    }
    fn value(&self) -> bool {
        return self.state;
    }
}

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut toggle1 = Toggle::new(true);
    for _i in 0..5 {
        toggle1.activate();
        println!("{}", if toggle1.value() { "true" } else { "false" });
    }
    for _i in 0..n {
        let _toggle = Toggle::new(true);
    }

    println!();

    let mut ntoggle1 = NthToggle::new(true, 3);
    for _i in 0..8 {
        ntoggle1.activate();
        println!("{}", if ntoggle1.value() { "true" } else { "false" });
    }
    for _i in 0..n {
        let _ntoggle = NthToggle::new(true, 3);
    }
}

Porting notes

This program is similar to methcall. The Toggle and the NthToggle classes or structs are identical and the usage of them are somewhat different in the main function.

random

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/

#include <iostream>
#include <stdlib.h>
#include <math.h>

using namespace std;

#define IM 139968
#define IA 3877
#define IC 29573

inline double gen_random(double max) {
    static long last = 42;
    last = (last * IA + IC) % IM;
    return( max * last / IM );
}

int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 4000000
#else
#define LENGTH 400000000
#endif
    int N = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    double result = 0;
    
    while (N--) {
        result = gen_random(100.0);
    }
    cout.precision(9);
    cout.setf(ios::fixed);
    cout << result << endl;
    return(0);
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 4000000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 400000000;

const IM: i64 = 139968;
const IA: i64 = 3877;
const IC: i64 = 29573;

fn gen_random(last_random: &mut i64, max: f64) -> f64 {
    let new_last = (*last_random * IA + IC) % IM;
    *last_random = new_last;
    return max * (new_last as f64) / (IM as f64);
}

fn main() {
    let mut args = env::args();
    let mut n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut result: f64 = 0.0;
    let mut last_random: i64 = 42;

    while n != 0 {
        n -= 1;
        result = gen_random(&mut last_random, 100.0);
    }

    println!("{:.9}", result);
}

Porting notes

The gen_random function is the same as the one in heapsort.

sieve

Full source code

C++

#include <iostream>
#include <vector>
#include <list>
#include <cstdlib>

using namespace std;

void sieve(list<int>& unknown, vector<int>& primes)
{
  while (!unknown.empty())
  {
    int p = unknown.front();
    unknown.pop_front();
    list<int>::iterator i = unknown.begin();
    while (i != unknown.end())
    {
      if (*i % p)
        ++i;
      else
        i = unknown.erase(i);
    }
    primes.push_back(p);
  }
}

int main(int argc, char *argv[]) 
{
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 50
#else
#define LENGTH 500
#endif
  size_t NUM = (argc == 2 ? (atoi(argv[1]) < 1 ? 1 : atoi(argv[1])): LENGTH);

  vector<int> primes;

  // run the sieve repeatedly
  while (NUM--) {
    list<int> integers;
    for (int i = 2; i < 8192; ++i)
      integers.push_back(i);
    primes.clear();
    sieve(integers, primes);
  }

  cout << "Count: " << primes.size() << endl;
  return 0;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
use std::env;

// Use VecDeque instead of LinkedList because Rust doesn't have stable
// ways to remove items from LinkedList and strongly encourage using
// Vec or VecDeque.
use std::collections::VecDeque;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 50;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 500;

fn sieve(unknown: &mut VecDeque::<i32>, primes: &mut Vec::<i32>) {
    while unknown.len() > 0 {
        let p = unknown.pop_front().unwrap();
        unknown.retain(|&x| x % p != 0);
        primes.push(p);
    }
}

fn main() {
    let mut args = env::args();
    let mut num = if args.len() == 2 {
        let arg1 = args.nth(1).unwrap().parse::<i32>().unwrap();
        if arg1 < 1 { 1 } else { arg1 }
    } else {
        LENGTH
    };
    let mut primes = Vec::<i32>::new();

    // run the sieve repeatedly
    while num != 0 {
        let mut integers = VecDeque::<i32>::new();
        for i in 2..8192 {
            integers.push_back(i);
        }
        primes.clear();
        sieve(&mut integers, &mut primes);
        num -= 1;
    }

    println!("Count: {}", primes.len());
}

Porting notes

Using `VecDeque` instead of `LinkedList`

C++

void sieve(list<int>& unknown, vector<int>& primes)
{
  while (!unknown.empty())
  {
    int p = unknown.front();
    unknown.pop_front();
    list<int>::iterator i = unknown.begin();
    while (i != unknown.end())
    {
      if (*i % p)
        ++i;
      else
        i = unknown.erase(i);
    }
    primes.push_back(p);
  }
}

Rust

#![allow(unused)]
fn main() {
fn sieve(unknown: &mut VecDeque::<i32>, primes: &mut Vec::<i32>) {
    while unknown.len() > 0 {
        let p = unknown.pop_front().unwrap();
        unknown.retain(|&x| x % p != 0);
        primes.push(p);
    }
}
}

The Rust LinkedList interestingly does not support mutation such as element insertion and removal in its stable API, as of this writing. Rust instead encourages to use VecDeque which is double-sided vector for performance reasons when a doubly linked list would be otherwise used. So the Rust version uses VecDeque even though it may have different performance characteristics from the C++ std::list. The benchmark results shows that the performance differences are small.

Furthermore, the Rust version uses the retain method to remove elements from the VecDeque I believe this is less error-prone than the C++ version where it uses an iteration over the list along with the erase function during the iteration.

strcat

Full source code

C++

// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
// with help from PeterB

#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 1000000
#else
#define LENGTH 5000000
#endif
    int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
    string str;
    size_t capacity = 31;
    str.reserve(capacity); // as per C-string
    size_t newLength = 6;
    for (i = 0; i < n; i++)
    {
        if(newLength > capacity)
        {
            capacity *= 2;
            str.reserve(capacity);
        }
        str += "hello\n";
        newLength += 6;
    }
    cout << str.length() << endl;
    return 0;
}

Rust

// Adapted from https://github.com/llvm/llvm-test-suite and
// http://www.bagley.org/~doug/shootout/
// with help from PeterB
use std::env;

#[cfg(feature = "small_problem_size")]
const LENGTH: i32 = 1000000;

#[cfg(not(feature = "small_problem_size"))]
const LENGTH: i32 = 5000000;

fn main() {
    let mut args = env::args();
    let n = if args.len() == 2 {
        args.nth(1).unwrap().parse::<i32>().unwrap()
    } else {
        LENGTH
    };

    let mut str = String::new();
    let mut capacity: usize = 31;
    str.reserve(capacity);
    let mut new_length: usize = 6;
    for _i in 0..n {
        if new_length > capacity {
            capacity *= 2;
            str.reserve(capacity);
        }
        str.push_str("hello\n");
        new_length += 6;
    }
    println!("{}", str.len());
}

Porting notes

The porting was straightforward with no specific notes.

Performance analysis

The benchmark results show that the Rust version is more than 2 times faster than the C++ version.

We compare the generated code between C++ and Rust.

C++

C++ perfdata (the host function list)

Overhead  Command  Shared Object         Symbol
  19.78%  strcat   libstdc++.so.6.0.30   [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)
  12.01%  strcat   libc.so.6             [.] __memmove_avx_unaligned_erms
   8.25%  strcat   strcat                [.] main
   6.77%  strcat   [unknown]             [k] 0xffffffff9fd9f917
   3.87%  strcat   libc.so.6             [.] __memmove_avx_unaligned
   2.43%  strcat   libstdc++.so.6.0.30   [.] memcpy@plt
   1.95%  strcat   strcat                [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@plt

C++ perfdata (the hottest function std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@@GLIBCXX_3.4.21)

  7.22 │      endbr64
  9.87 │      push    %rbp
  2.46 │      mov     %rdx,%r8
       │      push    %rbx
  4.92 │      mov     %rdi,%rbx
  2.46 │      sub     $0x8,%rsp
  2.46 │      mov     0x8(%rdi),%rax
       │      mov     (%rdi),%rdi
       │      lea     (%rdx,%rax,1),%rbp
  2.46 │      lea     0x10(%rbx),%rdx
  2.48 │      cmp     %rdx,%rdi
       │    ↓ je      78
       │      mov     0x10(%rbx),%rdx
  7.27 │28:   cmp     %rbp,%rdx
       │    ↓ jb      60
  4.87 │      test    %r8,%r8
       │    ↓ je      46
  7.33 │      add     %rax,%rdi
  7.40 │      cmp     $0x1,%r8
       │    ↓ je      80
  2.46 │      mov     %r8,%rdx
  7.35 │    → call    memcpy@plt
       │      mov     (%rbx),%rdi
  2.46 │46:   mov     %rbp,0x8(%rbx)
  2.46 │      mov     %rbx,%rax
  2.37 │      movb    $0x0,(%rdi,%rbp,1)
  7.11 │      add     $0x8,%rsp
  9.64 │      pop     %rbx
       │      pop     %rbp
  4.94 │    ← ret

C++ perfdata (The second hottest function __memcpy_avx_unaligned_erms)

 16.08 │ 30:   cmp        $0x10,%edx
       │     ↓ jae        62
  8.10 │       cmp        $0x8,%edx
       │     ↓ jae        80
  4.07 │       cmp        $0x4,%edx
       │     ↓ jae        55
       │       cmp        $0x1,%edx
       │     ↓ jl         54
       │       mov        (%rsi),%cl
       │     ↓ je         52
       │       movzwl     -0x2(%rsi,%rdx,1),%esi
       │       mov        %si,-0x2(%rdi,%rdx,1)
       │ 52:   mov        %cl,(%rdi)
       │ 54: ← ret
  3.77 │ 55:   mov        -0x4(%rsi,%rdx,1),%ecx
  4.06 │       mov        (%rsi),%esi
 12.17 │       mov        %ecx,-0x4(%rdi,%rdx,1)
  4.06 │       mov        %esi,(%rdi)
 12.05 │     ← ret

C++ LLVM IR (the loop in the main function, the frontend output)

16:                                               ; preds = %14, %38
  %17 = phi i64 [ %39, %38 ], [ 6, %14 ]
  %18 = phi i64 [ %30, %38 ], [ 31, %14 ]
  %19 = phi i32 [ %40, %38 ], [ 0, %14 ]
  %20 = icmp ugt i64 %17, %18
  br i1 %20, label %21, label %29

21:                                               ; preds = %16
  %22 = shl i64 %18, 1
  invoke void @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7reserveEm(ptr noundef nonnull align 8 dereferenceable(32) %3, i64 noundef %22)
          to label %29 unwind label %25

23:                                               ; preds = %10
  %24 = landingpad { ptr, i32 }
          cleanup
  br label %83

25:                                               ; preds = %21, %36
  %26 = landingpad { ptr, i32 }
          cleanup
  br label %83

27:                                               ; preds = %34, %42, %53, %62, %63, %68, %71
  %28 = landingpad { ptr, i32 }
          cleanup
  br label %83

29:                                               ; preds = %21, %16
  %30 = phi i64 [ %22, %21 ], [ %18, %16 ]
  %31 = load i64, ptr %13, align 8, !tbaa !12
  %32 = add i64 %31, -4611686018427387898
  %33 = icmp ult i64 %32, 6
  br i1 %33, label %34, label %36

34:                                               ; preds = %29
  invoke void @_ZSt20__throw_length_errorPKc(ptr noundef nonnull @.str.1) #11
          to label %35 unwind label %27

35:                                               ; preds = %34
  unreachable

36:                                               ; preds = %29
  %37 = invoke noundef nonnull align 8 dereferenceable(32) ptr @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm(ptr noundef nonnull align 8 dereferenceable(32) %3, ptr noundef nonnull @.str, i64 noundef 6
)
          to label %38 unwind label %25

38:                                               ; preds = %36
  %39 = add nuw nsw i64 %17, 6
  %40 = add nuw nsw i32 %19, 1
  %41 = icmp eq i32 %40, %11
  br i1 %41, label %42, label %16, !llvm.loop !16

The loop body in the main function doesn't have the basic_string::append and its internal memcpy inlined. IThey aren't probably defined in the C++ string and the string.h header files as its body is not part of the compilation unit. Even though it uses the SIMD (AVX) version of the memcpy, it doesn't use the AVX instructions because it copies only 6 bytes at a time.

Rust

Rust perfdata (the hot function list)

Overhead  Command  Shared Object     Symbol
  34.98%  strcat   strcat            [.] strcat::main

Rust perfdata

       │1d0:   mov    0x8(%rsp),%rax
  4.84 │       cmp    %rbx,%r12
       │     ↓ jbe    1e8
       │       add    %rbx,%rbx
       │       mov    %rax,%rcx
       │       sub    %rsi,%rcx
       │       cmp    %rbx,%rcx
       │     ↓ jb     218
  3.24 │1e8:   sub    %rsi,%rax
 18.89 │       cmp    $0x5,%rax
       │     ↓ jbe    236
  3.25 │1f1:   mov    0x10(%rsp),%rax
 14.99 │       movw   $0xa6f,0x4(%rax,%rsi,1)
 19.38 │       movl   $0x6c6c6568,(%rax,%rsi,1)
  2.98 │       add    $0x6,%rsi
 13.03 │       mov    %rsi,0x18(%rsp)
  3.25 │       add    $0x6,%r12
  3.27 │       dec    %r15d
 12.90 │     ↑ jne    1d0

Rust LLVM IR (the loop body in the main function, the frontend output)

bb29:                                             ; preds = %bb8.preheader, %bb32
  %len.i.i.i.pre62 = phi i64 [ %49, %bb32 ], [ %len.i.i.i.pre62.pre, %bb8.preheader ]
  %capacity.sroa.0.051 = phi i64 [ %capacity.sroa.0.1, %bb32 ], [ 31, %bb8.preheader ]
  %new_length.sroa.0.050 = phi i64 [ %50, %bb32 ], [ 6, %bb8.preheader ]
  %iter.sroa.0.049 = phi i32 [ %47, %bb32 ], [ 0, %bb8.preheader ]
  %47 = add nuw nsw i32 %iter.sroa.0.049, 1
  %_18 = icmp ugt i64 %new_length.sroa.0.050, %capacity.sroa.0.051
  %self2.i.i.i.pre64 = load i64, ptr %str, align 8, !range !33
  br i1 %_18, label %bb10, label %bb12

bb10:                                             ; preds = %bb29
  %48 = shl i64 %capacity.sroa.0.051, 1
  %_10.i31 = sub i64 %self2.i.i.i.pre64, %len.i.i.i.pre62
  %_7.i32 = icmp ult i64 %_10.i31, %48
  br i1 %_7.i32, label %bb1.i33, label %bb12, !prof !32

bb1.i33:                                          ; preds = %bb10
; invoke alloc::raw_vec::RawVecInner<A>::reserve::do_reserve_and_handle
  invoke fastcc void @"_ZN5alloc7raw_vec20RawVecInner$LT$A$GT$7reserve21do_reserve_and_handle17h0f49121d9a52801dE"(ptr noalias noundef nonnull align 8 dereferenceable(16) %str, i64 noundef %len.i.i.i.pre62, i64 noundef %48)
          to label %bb1.i33.bb12_crit_edge unwind label %cleanup2.loopexit

bb1.i33.bb12_crit_edge:                           ; preds = %bb1.i33
  %len.i.i.i.pre = load i64, ptr %_40.sroa.5.0.str.sroa_idx, align 8, !alias.scope !64
  %self2.i.i.i.pre = load i64, ptr %str, align 8, !range !33, !alias.scope !69
  br label %bb12

bb12:                                             ; preds = %bb1.i33.bb12_crit_edge, %bb10, %bb29
  %self2.i.i.i = phi i64 [ %self2.i.i.i.pre64, %bb29 ], [ %self2.i.i.i.pre, %bb1.i33.bb12_crit_edge ], [ %self2.i.i.i.pre64, %bb10 ]
  %len.i.i.i = phi i64 [ %len.i.i.i.pre62, %bb29 ], [ %len.i.i.i.pre, %bb1.i33.bb12_crit_edge ], [ %len.i.i.i.pre62, %bb10 ]
  %capacity.sroa.0.1 = phi i64 [ %capacity.sroa.0.051, %bb29 ], [ %48, %bb1.i33.bb12_crit_edge ], [ %48, %bb10 ]
  call void @llvm.experimental.noalias.scope.decl(metadata !72)
  call void @llvm.experimental.noalias.scope.decl(metadata !73)
  %_10.i.i.i = sub i64 %self2.i.i.i, %len.i.i.i
  %_7.i.i.i = icmp ult i64 %_10.i.i.i, 6
  br i1 %_7.i.i.i, label %bb1.i.i.i, label %bb32, !prof !32

bb1.i.i.i:                                        ; preds = %bb12
; invoke alloc::raw_vec::RawVecInner<A>::reserve::do_reserve_and_handle
  invoke fastcc void @"_ZN5alloc7raw_vec20RawVecInner$LT$A$GT$7reserve21do_reserve_and_handle17h0f49121d9a52801dE"(ptr noalias noundef nonnull align 8 dereferenceable(16) %str, i64 noundef %len.i.i.i, i64 noundef 6)
          to label %.noexc36 unwind label %cleanup2.loopexit

.noexc36:                                         ; preds = %bb1.i.i.i
  %len.pre.i.i = load i64, ptr %_40.sroa.5.0.str.sroa_idx, align 8, !alias.scope !64
  br label %bb32

bb32:                                             ; preds = %.noexc36, %bb12
  %len.i.i = phi i64 [ %len.i.i.i, %bb12 ], [ %len.pre.i.i, %.noexc36 ]
  %_9.i.i = icmp sgt i64 %len.i.i, -1
  call void @llvm.assume(i1 %_9.i.i)
  %_10.i.i = load ptr, ptr %_40.sroa.4.0.str.sroa_idx, align 8, !alias.scope !64, !nonnull !4, !noundef !4
  %dst.i.i = getelementptr inbounds i8, ptr %_10.i.i, i64 %len.i.i
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(6) %dst.i.i, ptr noundef nonnull align 1 dereferenceable(6) @alloc_67205f92287559ab7be9a96d10923d6c, i64 6, i1 false), !noalias !64
  %49 = add nuw i64 %len.i.i, 6
  store i64 %49, ptr %_40.sroa.5.0.str.sroa_idx, align 8, !alias.scope !64
  %50 = add nuw nsw i64 %new_length.sroa.0.050, 6
  %exitcond.not = icmp eq i32 %47, %n.sroa.0.0
  br i1 %exitcond.not, label %bb30, label %bb29

The hottest of the loop is fully inlined into the main function and is a straightforward memory copy from the string literal hello\n into the string buffer with occasional size expansion through the reverse call.

I believe that the fact that the push_str and memcpy functions are inlined in the Rust version explains the the performance difference.

Benchmarking results

Program	C++ (s)¹	Rust (s)²	Ratio (x)³
ackermann	0.5849	0.6696	1.1448
ary	0.0403	0.0430	1.0679
ary2	0.0414	0.0422	1.0194
ary3	0.1707	0.7217	4.2281
fibo	1.1382	1.2286	1.0795
hash	0.2829	0.2104	0.7437
hash2	1.1647	0.8224	0.7061
heapsort	1.5964	1.6009	1.0028
lists	1.8520	1.7653	0.9532
lists1	0.1186	0.0862	0.7265
matrix	0.9690	2.0188	2.0834
methcall	3.0316	0.4226	0.1394
moments	0.0736	0.0554	0.7534
nestedloop	0.0017	0.0015	0.8347
objinst	0.0016	0.0012	0.7930
random	1.2529	1.0735	0.8568
sieve	0.9437	0.9028	0.9566
strcat	0.0534	0.0248	0.4640

The outliers are ary3, matrix, methcall and strcat.

I examined the generated code for those to understand the performance differences.

With Clang 20.1.5 on x86_64-pc-linux-gnu, the option -O2, over 30 runs, on an AMD Zen 2 CPU. ↩
With Rustc 1.86.0 (05f9846f8 2025-03-31), the option -O, over 30 runs, on the same machine. ↩
Ratios > 1.0 indicates the Rust version was slower than the C++ version. Ratios < 1.0 indicates the Rust version was faster than the C++ version. ↩