Introduction
This repository contains a set of relatively small Rust programs that I ported from C++ to learn Rust.
Programs
The C++ programs ported are the non-iteractive programs from the LLVM Test Suite Shootout C++ which originates in the Programming Language Shootout:
- ackermann
- ary
- ary2
- ary3
- fibo
- hash
- hash2
- heapsort
- lists
- lists1
- matrix
- methcall
- moments
- nestedloop
- objinst
- random
- sieve
- strcat
The current criteria for programs I picked are:
- they are small
- they produce some output so that the correctness of porting can be checked
- they are deterministic and can be run without any interactive user input from a beginning to an end
- their execution times indicate their run time performance
I'd like to add more programs in the future.
Porting notes
When porting from C++ to Rust, I stick with the idiomatic style,
including the use of the standard library types, as much as
appropriate. For example, I'd use Rust's String
type where C++
std::string
(or a C char*
string) is used. Similarly, I'd use
Rust's Vec
type in place for C++ std::vector
and C/C++ raw array
types.
I wrote down porting notes for each program.
Benchmark results and performance analysis
I included benchmark results between the C++ and the Rust versions. I also included performance analysis based on the generated code for the programs whose benchmark results were significantly different,
ackermann
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <iostream>
#include <stdlib.h>
using namespace std;
int Ack(int M, int N) { return(M ? (Ack(M-1,N ? Ack(M,(N-1)) : 1)) : N+1); }
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 11
#else
#define LENGTH 12
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
cout << "Ack(3," << n << "): " << Ack(3, n) << endl;
return(0);
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 11; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 12; fn ack(m: i32, n: i32) -> i32 { if m != 0 { ack(m - 1, if n != 0 { ack(m, n - 1) } else { 1 }) } else { n + 1 } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; println!("Ack(3,{}): {}", n, ack(3, n)); }
Porting notes
Conditionally declaring constants
C++
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 11
#else
#define LENGTH 12
#endif
Rust
#![allow(unused)] fn main() { #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 11; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 12; }
It's idiomatic to use const
variables along with the cfg
attributes to conditionally declare constants, as opposed to using macros in C/C++.
Declaring a function
C++
int Ack(int M, int N) { return(M ? (Ack(M-1,N ? Ack(M,(N-1)) : 1)) : N+1); }
Rust
#![allow(unused)] fn main() { fn ack(m: i32, n: i32) -> i32 { if m != 0 { ack(m - 1, if n != 0 { ack(m, n - 1) } else { 1 }) } else { n + 1 } } }
Declaring a function in Rust is not too dissimilar to C++ with some differences:
- Functions are declared with the
fn
keyword. - The parameter type comes after the parameter name and a colon.
- The return type comes after the parameter list and the
->
. - The if statement condition expression doesn't need parentheses around them.
- Rust has if expressions in addition to if statements (or the last statement in the blocks can be an expression), and lacks the ternary operator.
- Rust has no
int
type and uses integer types with explicit widths such asi32
.
Accessing program arguments
C++
int main(int argc, char *argv[]) {
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
...
Rust
fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; ...
Rust's main function doesn't have the argc
and argv
parameters. Instead the program arguments are accessed with the std::env
API.
The std::option
type is commonly used in Rust to represent the optional result of an operation. The unwrap
function is used to access the result when it is assumed that it contains a valid result, which would cause a panic if it does not. For example, accessing the n
th element of an argument array and parsing a string into an integer.
This common pattern on the program arguments are used in most of the programs here. I believe it achieves dual purposes, to allow the problem size changed and to makes it harder for the compiler to compile-time evaluate (constant-fold) the entire program logic.
Printing to standard output
C++
cout << "Ack(3," << n << "): " << Ack(3, n) << endl;
Rust
#![allow(unused)] fn main() { println!("Ack(3,{}): {}", n, ack(3, n)); }
println!
is the standard way to print to standard output in Rust. The !
at the end indicates it's a macro. I won't go into macros here.
To print without the new line, we'd use print!
instead.
ary
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <cstdlib>
#include <iostream>
#include <vector>
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 900000
#else
#define LENGTH 9000000
#endif
int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
typedef std::vector<int> ARY;
ARY x(n);
ARY y(n);
for (i=0; i<n; i++) {
x[i] = i;
}
for (int i = n - 1; i >= 0; --i) {
y[i] = x[i];
}
std::cout << y.back() << std::endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 900000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 9000000; fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut x = Vec::<i32>::new(); x.resize(n as usize, 0); let mut y = Vec::<i32>::new(); y.resize(n as usize, 0); for i in 0..n { x[i as usize] = i; } for i in (0..n).rev() { y[i as usize] = x[i as usize]; } println!("{}", y[(n - 1) as usize]); }
Porting notes
Initializing vectors of type i32
and of size n
as local, mutable variables
C++
typedef std::vector<int> ARY;
ARY x(n);
ARY y(n);
Rust
#![allow(unused)] fn main() { let mut x = Vec::<i32>::new(); x.resize(n as usize, 0); let mut y = Vec::<i32>::new(); y.resize(n as usize, 0); }
We initialize the two vectors, x
and y
using let mut
. let
is the standard way to declare a local variable and mut
is added because we will modify (mutate) the vector later.
When initializing a Vec
of element type i32
, we put the element type i32
in angly brackets as in Vec::<i32>
, as opposed to the C++ style Vec<i32>
.
As there isn't currently a way to initialize a Vec
with a given size n
in one shot in Rust, we initialize a vVec
of size zero and then resize to n
with default element value 0
.
The vector indices in Rust are of type usize
like size_t
and there are no implicit integer conversions, we need to cast n
(of type i32
) to usize
using the as
keyword as in n as usize
.
Writing for
range loops
C++
for (i=0; i<n; i++) {
x[i] = i;
}
for (int i = n - 1; i >= 0; --i) {
y[i] = x[i];
}
Rust
#![allow(unused)] fn main() { for i in 0..n { x[i as usize] = i; } for i in (0..n).rev() { y[i as usize] = x[i as usize]; } }
A for
range loop is written as for i in 0..n { ... }
. The loop iterates with the index range 0
up to n - 1
induction variable where n
is exclusive. There are no parentheses around the for loop header like the if statement condition expression. It's equivalent to for (int i = 0; i < n; ++i) { ... }
in C++.
There is no general for
loop in Rust where the loop variable declaration, the loop termination condition and the loop increment statements are spelled out and can be customizable. We would use while
loops instead.
To make n
to be inclusive, we would write for i in 0..<=n { ... }
.
A for
range loop in the reverse order is written as for i in (0..n).rev() { ... }
. This is equivalent to for (int i = n - 1; i >= 0; --i) { ... }
in C++.
Since the variable i
is of type i32
, we convert it to usize
, as in i as usize
.
ary2
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <cstdlib>
#include <iostream>
#include <vector>
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 90000
#else
#define LENGTH 900000
#endif
int i, n = 10*((argc == 2) ? atoi(argv[1]) : LENGTH);
typedef std::vector<int> ARY;
ARY x(n);
ARY y(n);
for (i=0; i<n;) {
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
x[i] = i; ++i;
}
for (int i = n - 1; i >= 0;) {
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
y[i] = x[i]; --i;
}
std::cout << y.back() << std::endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 90000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 900000; fn main() { let mut args = env::args(); let n = 10 * if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut x = Vec::<i32>::new(); x.resize(n as usize, 0); let mut y = Vec::<i32>::new(); y.resize(n as usize, 0); let mut i: i32 = 0; while i < n { x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; x[i as usize] = i; i += 1; } i = n - 1; while i >= 0 { y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; y[i as usize] = x[i as usize]; i -= 1; } println!("{}", y.iter().last().unwrap()); }
Porting notes
Using while
loops instead of general for
loops
C++
int i, ...
for (i=0; i<n;) {
...
}
for (int i = n - 1; i >= 0;) {
...
}
Rust
#![allow(unused)] fn main() { let mut i: i32 = 0; while i < n { ... } i = n - 1; while i >= 0 { ... } }
Since Rust has no general for loops, we use while
loops instead.
We declare the loop variable i
outside the loop. If we want to explicitly write the type of a local variable, we put the type after the variable name and the :
but before the initializer, as in let mut i: i32 = 0;
. The type are often omitted relying on type inference.
Incrementing or decrementing a variable
C++
x[i] = i; ++i;
...
y[i] = x[i]; --i;
Rust
#![allow(unused)] fn main() { x[i as usize] = i; i += 1; ... y[i as usize] = x[i as usize]; i -= 1; }
Rust doesn't have the prefix or postfix increment and decrement operators. I think it's to avoid bugs caused by the prefix/postfix evaluation orders. We use the compound assignment operators +=
and -=
.
ary3
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 150000
#else
#define LENGTH 1500000
#endif
int i, k, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
typedef vector<int> ARY;
ARY x(n);
ARY y(n);
for (i=0; i<n; i++) {
x[i] = i + 1;
}
for (k=0; k<1000; k++) {
for (int i = n - 1; i >= 0; --i) {
y[i] += x[i];
}
}
cout << y[0] << " " << y.back() << endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 150000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 1500000; fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut x = Vec::<i32>::new(); x.resize(n as usize, 0); let mut y = Vec::<i32>::new(); y.resize(n as usize, 0); for i in 0..n { x[i as usize] = i + 1; } for _k in 0..1000 { for i in (0..n).rev() { y[i as usize] += x[i as usize]; } } println!("{} {}", y[0], y.iter().last().unwrap()); }
Porting notes
A nested loop and an unused loop index variable
C++
for (k=0; k<1000; k++) {
for (int i = n - 1; i >= 0; --i) {
y[i] += x[i];
}
}
Rust
#![allow(unused)] fn main() { for _k in 0..1000 { for i in (0..n).rev() { y[i as usize] += x[i as usize]; } } }
A nested loop is written in a way similar to C++.
The Rust compiler emits a warning for an unused loop variable, as in _k
, to suggest adding the preceding underscore in its name though it doesn't cause an error.
Performance analysis
The benchmark results show that the Rust version is more than 4 times slower than the C++ version.
We compare the generated code between C++ and Rust.
C++
Here is the hot loop according to the Linux Perf:
C++ perf data
7.51 │1c0: movdqu -0x10(%rsi,%r10,4),%xmm0
7.65 │ movdqu (%rsi,%r10,4),%xmm1
14.29 │ movdqu -0x10(%rdi,%r10,4),%xmm2
7.07 │ paddd %xmm0,%xmm2
8.11 │ movdqu (%rdi,%r10,4),%xmm0
9.28 │ paddd %xmm1,%xmm0
8.99 │ movdqu %xmm0,(%rdi,%r10,4)
18.54 │ movdqu %xmm2,-0x10(%rdi,%r10,4)
9.43 │ add $0xfffffffffffffff8,%r10
8.98 │ cmp %r10,%r8
│ ↑ jne 1c0
This corresponds to the nested loop in the second loop in the source code.
C++ LLVM IR (the frontend output)
77: ; preds = %76, %77
%78 = phi i64 [ %93, %77 ], [ 0, %76 ]
%79 = xor i64 %78, -1
%80 = add i64 %79, %40
%81 = getelementptr inbounds nuw i32, ptr %33, i64 %80
%82 = getelementptr inbounds i8, ptr %81, i64 -12
%83 = getelementptr inbounds i8, ptr %81, i64 -28
%84 = load <4 x i32>, ptr %82, align 4, !tbaa !10
%85 = load <4 x i32>, ptr %83, align 4, !tbaa !10
%86 = getelementptr inbounds nuw i32, ptr %36, i64 %80
%87 = getelementptr inbounds i8, ptr %86, i64 -12
%88 = getelementptr inbounds i8, ptr %86, i64 -28
%89 = load <4 x i32>, ptr %87, align 4, !tbaa !10
%90 = load <4 x i32>, ptr %88, align 4, !tbaa !10
%91 = add nsw <4 x i32> %89, %84
%92 = add nsw <4 x i32> %90, %85
store <4 x i32> %91, ptr %87, align 4, !tbaa !10
store <4 x i32> %92, ptr %88, align 4, !tbaa !10
%93 = add nuw i64 %78, 8
%94 = icmp eq i64 %93, %43
br i1 %94, label %95, label %77, !llvm.loop !17
C++ LLVM IR (right before the Machine IR generation)
87: ; preds = %.preheader, %87
%lsr.iv = phi i64 [ 0, %.preheader ], [ %lsr.iv.next, %87 ]
%88 = shl nsw i64 %lsr.iv, 2
%scevgep8 = getelementptr i8, ptr %scevgep5, i64 %88
%89 = shl nsw i64 %lsr.iv, 2
%scevgep6 = getelementptr i8, ptr %scevgep5, i64 %89
%scevgep7 = getelementptr i8, ptr %scevgep6, i64 -16
%90 = load <4 x i32>, ptr %scevgep8, align 4, !tbaa !10
%91 = load <4 x i32>, ptr %scevgep7, align 4, !tbaa !10
%92 = shl nsw i64 %lsr.iv, 2
%scevgep4 = getelementptr i8, ptr %scevgep, i64 %92
%93 = shl nsw i64 %lsr.iv, 2
%scevgep2 = getelementptr i8, ptr %scevgep, i64 %93
%scevgep3 = getelementptr i8, ptr %scevgep2, i64 -16
%94 = load <4 x i32>, ptr %scevgep4, align 4, !tbaa !10
%95 = load <4 x i32>, ptr %scevgep3, align 4, !tbaa !10
%96 = add nsw <4 x i32> %94, %90
%97 = add nsw <4 x i32> %95, %91
store <4 x i32> %96, ptr %scevgep4, align 4, !tbaa !10
store <4 x i32> %97, ptr %scevgep3, align 4, !tbaa !10
%lsr.iv.next = add nsw i64 %lsr.iv, -8
%98 = icmp eq i64 %48, %lsr.iv.next
br i1 %98, label %99, label %87, !llvm.loop !17
Rust
Here is the hot loop according to the Linux Perf:
Rust perf data
8.96 │370: cmp %rdi,%r14
│ ↓ jbe 587
8.81 │ cmp %rdi,%rbp
│ ↓ jbe 581
8.59 │ mov (%rbx,%rdi,4),%r8d
8.81 │ add %r8d,(%r15,%rdi,4)
16.53 │ lea -0x1(%rdi),%r8
8.92 │ inc %rdi
7.97 │ cmp $0x1,%rdi
15.84 │ mov %r8,%rdi
15.52 │ ↑ jg 370
This also corresponds to the same nested loop in the second loop in the source code.
Rust LLVM IR (the frontend output)
bb16.us: ; preds = %bb44.us, %bb51.us
%indvars.iv124 = phi i64 [ %73, %bb44.us ], [ %indvars.iv.next125, %bb51.us ]
%indvars.iv.next125 = add nsw i64 %indvars.iv124, -1
%_86.us = icmp ugt i64 %46, %indvars.iv.next125
br i1 %_86.us, label %bb50.us, label %panic10.invoke
bb50.us: ; preds = %bb16.us
%_92.us = icmp ugt i64 %56, %indvars.iv.next125
br i1 %_92.us, label %bb51.us, label %panic10.invoke
bb51.us: ; preds = %bb50.us
%_33.us = getelementptr inbounds i32, ptr %_24.i.i, i64 %indvars.iv.next125
%_32.us = load i32, ptr %_33.us, align 4, !noundef !4
%_35.us = getelementptr inbounds i32, ptr %_24.i.i54, i64 %indvars.iv.next125
%75 = load i32, ptr %_35.us, align 4, !noundef !4
%76 = add i32 %75, %_32.us
store i32 %76, ptr %_35.us, align 4
%_77.us = icmp sgt i64 %indvars.iv124, 1
br i1 %_77.us, label %bb16.us, label %bb15.bb12.loopexit_crit_edge.us
It looks like the difference is caused by a lack of loop vectorization in the Rust version.
A June 2025 update on the performance difference
I filed an issue at the Rust repo about this performance difference. According to the conversation, if the second loop is rewritten to use iterators as below, the loop vectorization occurs and the performance becomes on par with the C++ version.
A version of the second loop that uses iterators:
#![allow(unused)] fn main() { for _k in 0..1000 { y.iter_mut().zip(x.iter()).rev().for_each(|(y, x)| { *y += *x }) } }
I think that the Rust compiler could be improved to recognize the index-based version of the loop and perform loop vectorization.
fibo
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <iostream>
#include <stdlib.h>
using namespace std;
unsigned long fib(unsigned long n) {
if (n < 2)
return(1);
else
return(fib(n-2) + fib(n-1));
}
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 40
#else
#define LENGTH 43
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
cout << fib(n) << endl;
return(0);
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 40; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 43; fn fib(n: u64) -> u64 { if n < 2 { return 1; } else { return fib(n - 2) + fib(n - 1); } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; println!("{}", fib(n as u64)); }
Porting notes
None.
hash
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <stdio.h>
#include <iostream>
#include <ext/hash_map>
#include <cstring>
using namespace std;
using namespace __gnu_cxx;
struct eqstr {
bool operator()(const char* s1, const char* s2) const {
return strcmp(s1, s2) == 0;
}
};
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 50000
#else
#define LENGTH 500000
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
char buf[16];
typedef __gnu_cxx::hash_map<const char*, int, __gnu_cxx::hash<const char*>, eqstr> HM;
HM X;
for (int i=1; i<=n; i++) {
sprintf(buf, "%x", i);
X[strdup(buf)] = i;
}
int c = 0;
for (int i=n; i>0; i--) {
sprintf(buf, "%d", i);
if (X[strdup(buf)]) c++;
}
cout << c << endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; use std::collections::HashMap; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 50000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 500000; fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut x = HashMap::<String, i32>::new(); for i in 1..=n { let s = format!("{:X}", i); x.insert(s, i); } let mut c: i32 = 0; for i in (1..=n).rev() { let s = format!("{}", i); if x.contains_key(&s) { c += 1; } } println!("{}", c); }
Porting notes
Using the standard string and hash map types
C++
char buf[16];
typedef __gnu_cxx::hash_map<const char*, int, __gnu_cxx::hash<const char*>, eqstr> HM;
HM X;
Rust
#![allow(unused)] fn main() { let mut x = HashMap::<String, i32>::new(); }
The C++ version uses the C string char*
and the GNU extension hash map __gnu_cxx::hash_map
as opposed to std::string
and std::unordered_map
. I suspect that this program was originally written before the standard hash map became part of the C++ standard.
In the Rust version, we use the Rust standard String
and std::collections::HashMap
rather than more faithfully reproducing C string-like data structures or reimplementing the C++ hash_map
in Rust to stick to the idiomatic Rust style.
There is no need to implement or pass the hash and the equal functions for the string type for the hash map in the Rust version, which would be equivalent to __gnu_cxx::hash<const char*>
and eqstr
, because they are provided by the standard library. This is also the case in std::unordered_map
in C++.
For
range loops with the inclusive range ends
C++
for (int i=1; i<=n; i++) {
...
}
...
for (int i=n; i>0; i--) {
...
}
Rust
#![allow(unused)] fn main() { for i in 1..=n { ... } ... for i in (1..=n).rev() { ... } }
The for
range loop for i in 1..=n { ... }
has the range end n
inclusive and is equivalent to for (int i=1; i<=n; i++) { ... }
in C++.
Similarly, the reverse ordered range loop for i in (1..=n).rev() { ... }
is equivalent to for (int i=n; i>0; i--) { ... }
in C++.
Formating strings
C++
sprintf(buf, "%x", i);
Rust
#![allow(unused)] fn main() { let s = format!("{:X}", i); }
A common way to format a string in Rust is to use format!
roughly corresponding to sprintf
in C++.
While the C++ version writes into an already allocated buffer, the Rust version allocates a new string. This could significantly affect the performance difference but it will be roughly canceled out by the fact that the C++ version will copy the key when inserting the key into hash map in the next line.
Inserting into a hash map
C++
X[strdup(buf)] = i;
Rust
#![allow(unused)] fn main() { x.insert(s, i); }
Rust HashMap
has the square bracket operator []
but it will panic if the key already exists, unlike the C++ hash map. So we use insert
instead.
Checking whether an hash map already contains a key
C++
if (X[strdup(buf)]) c++;
Rust
#![allow(unused)] fn main() { if x.contains_key(&s) { c += 1; } }
There is a subtle difference that the C++ version inserts a new entry with the default value 0
when the key is not already contained in the hash map, in addition to checking whether the key is already contained, while the Rust version only checks if the key is already contained, again, because Rust HashMap
's []
operator will panic if the key does not already exist in the hash map.
This difference is absorbed by the fact that value 0
is not used across the two loops and the two if statements branch the same way.
hash2
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <stdio.h>
#include <iostream>
#include <ext/hash_map>
#include <cstring>
using namespace std;
using namespace __gnu_cxx;
struct eqstr {
bool operator()(const char* s1, const char* s2) const {
return strcmp(s1, s2) == 0;
}
};
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 200
#else
#define LENGTH 2000
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
char buf[16];
typedef __gnu_cxx::hash_map<const char*, int, __gnu_cxx::hash<const char*>, eqstr> HM;
HM hash1, hash2;
for (int i=0; i<10000; i++) {
sprintf(buf, "foo_%d", i);
hash1[strdup(buf)] = i;
}
for (int i=0; i<n; i++) {
for (HM::iterator k = hash1.begin(); k != hash1.end(); ++k) {
hash2[(*k).first] += hash1[(*k).first];
}
}
cout << hash1["foo_1"] << " " << hash1["foo_9999"] << " "
<< hash2["foo_1"] << " " << hash2["foo_9999"] << endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; use std::collections::HashMap; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 200; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 2000; fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut hash1: HashMap<String, i32> = HashMap::new(); let mut hash2: HashMap<String, i32> = HashMap::new(); for i in 0..10000 { let s = format!("foo_{}", i); hash1.insert(s, i); } for _i in 0..n { for (key, value) in &hash1 { hash2.entry(key.to_string()).and_modify(|e| { *e += *value }).or_insert(*value); } } println!("{} {} {} {}", hash1["foo_1"], hash1["foo_9999"], hash2["foo_1"], hash2["foo_9999"]); }
Porting notes
Most of the code structure is similar to hash.
Iterating hash map entries and conditionally mutating another hash map
C++
for (HM::iterator k = hash1.begin(); k != hash1.end(); ++k) {
hash2[(*k).first] += hash1[(*k).first];
}
Rust
#![allow(unused)] fn main() { for (key, value) in &hash1 { hash2.entry(key.to_string()).and_modify(|e| { *e += *value }).or_insert(*value); } }
Similar to hash, because C++ hash_map
's []
operator inserts a new entry if it's used with a new key while Rust HashMap
's []
operator will panic if the key does not already exist in the hash map, we cannot just use Rust HashMap
's []
operator.
Instead we use the and_modify
and the or_insert
methods to achieve similar behaviors. and_modify
takes a closure |e| { *e += *value }
, which will be executed when the key already exists in the hash map, or_insert
takes a value for a new entry to be inserted when the key doesn't exist.
We need to convert the key of type &String
to String
when passing it to HashMap.entry
because it takes the key type String
, not a reference to it. Due to the cloning that is not necessary in the C++ version, this could be a potential performance disadvantage, but it doesn't seem to be the case according to the performance results where the Rust version was not slower. I suspect that the Rust compiler can optimize away the cloning.
heapsort
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <iostream>
#include <stdlib.h>
#include <math.h>
#include <cstdio>
using namespace std;
#define IM 139968
#define IA 3877
#define IC 29573
double
gen_random(double max) {
static long last = 42;
return( max * (last = (last * IA + IC) % IM) / IM );
}
void
heapsort(int n, double *ra) {
int i, j;
int ir = n;
int l = (n >> 1) + 1;
double rra;
for (;;) {
if (l > 1) {
rra = ra[--l];
} else {
rra = ra[ir];
ra[ir] = ra[1];
if (--ir == 1) {
ra[1] = rra;
return;
}
}
i = l;
j = l << 1;
while (j <= ir) {
if (j < ir && ra[j] < ra[j+1]) { ++j; }
if (rra < ra[j]) {
ra[i] = ra[j];
j += (i = j);
} else {
j = ir + 1;
}
}
ra[i] = rra;
}
}
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 800000
#else
#define LENGTH 8000000
#endif
int N = ((argc == 2) ? atoi(argv[1]) : LENGTH);
double *ary;
int i;
/* create an array of N random doubles */
ary = (double *)malloc((N+1) * sizeof(double));
for (i=1; i<=N; i++) {
ary[i] = gen_random(1);
}
heapsort(N, ary);
printf("%.10f\n", ary[N]);
free(ary);
return(0);
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: usize = 800000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: usize = 8000000; const IM: i64 = 139968; const IA: i64 = 3877; const IC: i64 = 29573; fn get_random(last_random: &mut i64, max: f64) -> f64 { let new_last = (*last_random * IA + IC) % IM; *last_random = new_last; return max * (new_last as f64) / (IM as f64); } fn heapsort(n: usize, ra: &mut Vec<f64>) { let mut i: usize; let mut j: usize; let mut ir = n; let mut l = (n >> 1) + 1; let mut rra: f64; loop { if l > 1 { l -= 1; rra = ra[l]; } else { rra = ra[ir]; ra[ir] = ra[1]; ir -= 1; if ir == 1 { ra[1] = rra; return; } } i = l; j = l << 1; while j <= ir { if j < ir && ra[j] < ra[j+1] { j += 1; } if rra < ra[j] { ra[i] = ra[j]; i = j; j += i; } else { j = ir + 1; } } ra[i] = rra; } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<usize>().unwrap() } else { LENGTH }; let mut last_random: i64 = 42; let mut ary = Vec::<f64>::new(); ary.resize_with(n + 1, Default::default); for i in 1..=n { ary[i] = get_random(&mut last_random, 1.0); } heapsort(n, &mut ary); println!("{:.10}", ary[n]); }
Porting notes
Replacing a C++ block-scoped static variable with a locally-scoped local variable
C++
inline double gen_random(double max) {
static long last = 42;
return( max * (last = (last * IA + IC) % IM) / IM );
}
Rust
fn gen_random(last_random: &mut i64, max: f64) -> f64 { let new_last = (*last_random * IA + IC) % IM; *last_random = new_last; return max * (new_last as f64) / (IM as f64); } ... fn main() { ... let mut last_random: i64 = 42; for i in 1..=n { ary[i] = get_random(&mut last_random, 1.0); } ...
Rust doesn't support a static
variable nested in a function or block scope.
Furthermore, Rust doesn't allow a mutable global (static
) variable without using a thread-safe mechanism such as a mutex.
To avoid the mutex overhead, the Rust version uses a locally scoped local variable last_random
in the main function and passes its mutable reference to the gen_random
function. This is safe because the lifetime of the variable covers the entire uses of the gen_random
function.
I interpret this as Rust discouraging casual uses of global variables and encouraing uses of locally scoped thread-local data for thread safety reasons.
Using usize
variables intead of int
C++
void
heapsort(int n, double *ra) {
int i, j;
int ir = n;
int l = (n >> 1) + 1;
Rust
#![allow(unused)] fn main() { fn heapsort(n: usize, ra: &mut Vec<f64>) { let mut i: usize; let mut j: usize; let mut ir = n; let mut l = (n >> 1) + 1; ... } The Rust version uses the index variables of type `usize` instead of `int`. This is more idiomatic because of the more explict distinction between `usize` and `int` for vector indices in Rust. ## Writing an infinite loop C++ ```cpp for (;;) { ... } }
Rust
#![allow(unused)] fn main() { loop { ... } } In Rust, an infinite loop, which unconditionally loops (often with a `break` or `return` in the loop body), is written with the `loop` keyword, which is analogous to `for (;;) { ... }` or `while (true) { ... }` in C++. ## Formatting floating point values to show 10 digits after the decimal point C++ ```cpp printf("%.10f\n", ary[N]); }
Rust
#![allow(unused)] fn main() { println!("{:.10}", ary[n]); }
Since printf
formats a double
to show the 6 digits after the decimal point by default, the Rust version uses the format specifier {:.6}
to achieve the similar.
lists
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
// from Bill Lear
#include <cstdlib>
#include <iostream>
#include <list>
#include <numeric>
using namespace std;
#ifdef SMALL_PROBLEM_SIZE
const size_t SIZE = 1000;
#else
const size_t SIZE = 10000;
#endif
template <class _ForwardIterator, class _Tp>
void
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
while (__first != __last)
*__first++ = __value++;
}
size_t test_lists() {
std::list<size_t> li1(SIZE);
::iota(li1.begin(), li1.end(), 1);
std::list<size_t> li2(li1);
std::list<size_t> li3;
size_t N = li2.size();
while (N--) {
li3.push_back(li2.front());
li2.pop_front();
}
N = li3.size();
while (N--) {
li2.push_back(li3.back());
li3.pop_back();
}
li1.reverse();
return (li1.front() == SIZE) && (li1 == li2) ? li1.size() : 0;
}
int main(int argc, char* argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 300
#else
#define LENGTH 3000
#endif
size_t ITER = (argc == 2 ? (atoi(argv[1]) < 1 ? 1 : atoi(argv[1])): LENGTH);
size_t result = 0;
while (ITER > 0) {
result = test_lists();
--ITER;
}
std::cout << result << std::endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ // from Bill Lear use std::env; use std::collections::LinkedList; use std::collections::linked_list::IterMut; #[cfg(feature = "small_problem_size")] const SIZE: usize = 1000; #[cfg(not(feature = "small_problem_size"))] const SIZE: usize = 10000; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 300; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 3000; fn iota(iter: IterMut<'_, usize>, v: usize) { let mut i = 0; for e in iter { *e = i + v; i += 1; } } fn list_reverse<T>(list: LinkedList<T>) -> LinkedList<T> { let mut reversed = LinkedList::new(); for elem in list { reversed.push_front(elem); } return reversed } fn test_lists() -> usize { let mut li1 = LinkedList::<usize>::new(); for _i in 0..SIZE { li1.push_back(Default::default()); } iota(li1.iter_mut(), 1); let mut li2 = li1.clone(); let mut li3 = LinkedList::<usize>::new(); let mut n = li2.len(); while n != 0 { n -= 1; li3.push_back(li2.pop_front().unwrap()); } n = li3.len(); while n != 0 { n -= 1; li2.push_back(li3.pop_back().unwrap()); } li1 = list_reverse(li1); return if *li1.front().unwrap() == SIZE && li1.eq(&li2) { li1.len() } else { 0 as usize }; } fn main() { let mut args = env::args(); let mut iter = if args.len() == 2 { if args.nth(1).unwrap().parse::<i32>().unwrap() < 1 { 1 } else { args.nth(1).unwrap().parse::<i32>().unwrap() } } else { LENGTH }; let mut result: usize = 0; while iter > 0 { result = test_lists(); iter -= 1; } println!("{}", result); }
Porting notes
Initializing a list with incrementing values (iota)
C++
template <class _ForwardIterator, class _Tp>
void
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
while (__first != __last)
*__first++ = __value++;
}
Rust
#![allow(unused)] fn main() { fn iota(iter: IterMut<'_, usize>, v: usize) { let mut i = 0; for e in iter { *e = i + v; i += 1; } } }
I omitted the use of generics in the Rust version because it's used only for one call site.
In Rust, an iterator is a single entity, an IterMut
in this case (an Iter
if mutation isn't needed) and the iteration over the iterator uses the for
range loop, as opposed to a pair of separate, pointer-like iterators and more free-style iteration over them in C++. I think this helps make the handling of iterators safer in Rust.
Reversing a linked list
C++ none.
Rust
#![allow(unused)] fn main() { fn list_reverse<T>(list: LinkedList<T>) -> LinkedList<T> { let mut reversed = LinkedList::new(); for elem in list { reversed.push_front(elem); } return reversed } }
Rust's LinkedList
interestingly doesn't have a reverse method.
I implemented a simple reverse function that copies to a new linked list in the reverse order. I'd imagine that this could be slower than a built-in reversemethod that reverses the directions of the internal pointers without copying the nodes, but the benchmark results show the Rust version is no slower.
The type inference allows the type T
to be inferred in the LinkedList::new()
call.
Initializinga linked list of a given size with the default element value
C++
std::list<size_t> li1(SIZE);
Rust
#![allow(unused)] fn main() { let mut li1 = LinkedList::<usize>::new(); for _i in 0..SIZE { li1.push_back(Default::default()); } }
Similar to Vec
, there isn't a way to initialize a LinkedList
with a given size, unlike std::list
in C++. I used a two-step approach of allocating a zero-sized linked list and then resizing it with the default element value. Default::default()
is a convenient way to specify the default value of a certain type without typing the type name.
Copying a linked list
C++
std::list<size_t> li2(li1);
Rust
#![allow(unused)] fn main() { let mut li2 = li1.clone(); }
Rust's LinkedList
doesn't provide a way to initialize a linked list as a copy of another. There is no copy constructor as in C++. We use clone()
to achieve the same.
Popping the front of a linked list and capturing the front
C++
li3.push_back(li2.front());
li2.pop_front();
Rust
#![allow(unused)] fn main() { li3.push_back(li2.pop_front().unwrap()); }
Rust's pop_front()
returns an std::optional
value in addition to removing the front element from the linked list and achieves what both front()
and pop_front()
need to be used to achieve the same in C++.
An if
expression
C++
return (li1.front() == SIZE) && (li1 == li2) ? li1.size() : 0;
Rust
#![allow(unused)] fn main() { return if *li1.front().unwrap() == SIZE && li1.eq(&li2) { li1.len() } else { 0 as usize }; }
To peek the value of the front element of a linked list, we get a std::optional
by calling front()
, takes the reference by calling unwrap
on the std::optional
and dereference it. This would normally force programmars to think about the case where the linked list is empty but here we bypass that for simplicity by asserting otherwise.
The ==
operator in C++ would compare the two linked lists as a whole. For that, we use the eq
method in Rust.
We use the if
statement instead of a ternary operator which doesn't exist in Rust.
lists1
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <list>
#include <numeric>
template <class _ForwardIterator, class _Tp>
void
iota(_ForwardIterator __first, _ForwardIterator __last, _Tp __value)
{
while (__first != __last)
*__first++ = __value++;
}
using namespace std;
void list_print_n (list<int> L, int n) {
int c, lastc = n - 1;
list<int>::iterator i;
for (c = 0, i = L.begin(); i != L.end() && c < n; ++i, ++c) {
cout << (*i);
if (c < lastc) cout << " ";
}
cout << endl;
}
int main(int argc, char* argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 100000
#else
#define LENGTH 1000000
#endif
int N = (argc == 2 ? (atoi(argv[1]) < 1 ? 1 : atoi(argv[1])): LENGTH);
list<int>::iterator i;
// create empty list B
list<int> B;
// create list (A) of integers from 1 through N
list<int> A(N);
::iota(A.begin(), A.end(), 1);
// move each individual item from A to B, in a loop, reversing order
while (! A.empty()) {
B.push_front(A.front());
A.pop_front();
}
// print first 2 elements of B
list_print_n(B, 2);
// reverse B (can be done in place)
B.reverse();
// reverse(B.begin(), B.end());
// is 0 a member of B?
cout << ((find(B.begin(), B.end(), 0) == B.end()) ? "false" : "true") << endl;
// is N a member of B?
cout << ((find(B.begin(), B.end(), N) == B.end()) ? "false" : "true") << endl;
// filter values from B to A that are less than N/2, preserving order
int mid = N/2;
for (i = B.begin(); i != B.end(); ++i) {
if ((*i) < mid) A.push_back(*i);
}
// print first ten items of A
list_print_n(A, 10);
// print sum of items in A that are less than 1000
int sum = 0;
for (i = A.begin(); i != A.end(); ++i) {
if ((*i) < 1000) sum += (*i);
}
cout << sum << endl;
// append B to end of A
A.splice(A.end(), B);
// print length and last element of A
cout << A.size() << " " << A.back() << endl;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; use std::collections::LinkedList; use std::collections::linked_list::IterMut; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 100000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 1000000; fn iota(iter: IterMut<'_, i32>, v: i32) { let mut i = 0; for e in iter { *e = i + v; i += 1; } } fn list_print_n(list: &LinkedList<i32>, n: i32) { let lastc = n - 1; let mut c = 0; for elem in list.iter() { if c >= n { break; } print!("{}", elem); if c < lastc { print!(" "); } c += 1; } println!(); } fn list_reverse<T>(list: LinkedList<T>) -> LinkedList<T> { let mut reversed = LinkedList::new(); for elem in list { reversed.push_front(elem); } return reversed } fn main() { let mut args = env::args(); let n = if args.len() == 2 { if args.nth(1).unwrap().parse::<i32>().unwrap() < 1 { 1 } else { args.nth(1).unwrap().parse::<i32>().unwrap() } } else { LENGTH }; let mut b = LinkedList::<i32>::new(); let mut a = LinkedList::<i32>::new(); for _i in 0..n { a.push_back(Default::default()); } iota(a.iter_mut(), 1); while a.len() > 0 { b.push_front(a.pop_front().unwrap()); } list_print_n(&b, 2); b = list_reverse(b); println!("{}", if !b.contains(&0) { "false" } else { "true" }); println!("{}", if !b.contains(&n) { "false" } else { "true" }); let mid = n / 2; for elem in b.iter() { if *elem < mid { a.push_back(*elem); } } list_print_n(&a, 10); let mut sum = 0; for elem in b.iter() { if *elem < 1000 { sum += *elem; } } println!("{}", sum); a.append(&mut b); println!("{} {}", a.len(), *a.back().unwrap()); }
Porting notes
Many parts are shared with lists with some differences.
There are no particular notes.
matrix
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <iostream>
#include <stdlib.h>
using namespace std;
#define SIZE 30
int **mkmatrix(int rows, int cols) {
int i, j, count = 1;
int **m = (int **) malloc(rows * sizeof(int *));
for (i=0; i<rows; i++) {
m[i] = (int *) malloc(cols * sizeof(int));
for (j=0; j<cols; j++) {
m[i][j] = count++;
}
}
return(m);
}
void zeromatrix(int rows, int cols, int **m) {
int i, j;
for (i=0; i<rows; i++)
for (j=0; j<cols; j++)
m[i][j] = 0;
}
void freematrix(int rows, int **m) {
while (--rows > -1) { free(m[rows]); }
free(m);
}
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
int i, j, k, val;
for (i=0; i<rows; i++) {
for (j=0; j<cols; j++) {
val = 0;
for (k=0; k<cols; k++) {
val += m1[i][k] * m2[k][j];
}
m3[i][j] = val;
}
}
return(m3);
}
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 10000
#else
#define LENGTH 100000
#endif
int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
int **m1 = mkmatrix(SIZE, SIZE);
int **m2 = mkmatrix(SIZE, SIZE);
int **mm = mkmatrix(SIZE, SIZE);
for (i=0; i<n; i++) {
mm = mmult(SIZE, SIZE, m1, m2, mm);
}
cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " << mm[4][4] << endl;
freematrix(SIZE, m1);
freematrix(SIZE, m2);
freematrix(SIZE, mm);
return(0);
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; const SIZE: usize = 30; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 10000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 100000; fn mkmatrix(rows: usize, cols: usize) -> Vec::<Vec::<i32>> { let mut count = 1; let mut m = Vec::<Vec::<i32>>::new(); m.resize(rows, Default::default()); for i in 0..rows { m[i].resize(cols, Default::default()); for j in 0..cols { m[i][j] = count; count += 1; } } return m; } fn mmult(rows: usize, cols: usize, m1: &Vec::<Vec::<i32>>, m2: &Vec::<Vec::<i32>>, m3: &mut Vec::<Vec::<i32>>) { for i in 0..rows { for j in 0..cols { let mut val = 0; for k in 0..cols { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let m1 = mkmatrix(SIZE, SIZE); let m2 = mkmatrix(SIZE, SIZE); let mut mm = mkmatrix(SIZE, SIZE); for _i in 0..n { mmult(SIZE, SIZE, &m1, &m2, &mut mm); } println!("{} {} {} {}", mm[0][0], mm[2][3], mm[3][2], mm[4][4]); }
Porting notes
Using nested Vec
s to represent a matrix
C++
int **mkmatrix(int rows, int cols) {
int i, j, count = 1;
int **m = (int **) malloc(rows * sizeof(int *));
for (i=0; i<rows; i++) {
m[i] = (int *) malloc(cols * sizeof(int));
for (j=0; j<cols; j++) {
m[i][j] = count++;
}
}
return(m);
}
Rust
#![allow(unused)] fn main() { fn mkmatrix(rows: usize, cols: usize) -> Vec::<Vec::<i32>> { let mut count = 1; let mut m = Vec::<Vec::<i32>>::new(); m.resize(rows, Default::default()); for i in 0..rows { m[i].resize(cols, Default::default()); for j in 0..cols { m[i][j] = count; count += 1; } } return m; } }
We use a two-level, nested Vec
data structure where the first level is for the rows and the second level is for the columns. That is similar to the data structure that uses arrays in the C++ version. While it'd be possible to use raw i32 arrays in Rust, it is more idiomatic to use Vec
treating Vec
as the safe equivalent of raw arrays.
Passing matrices to the matrix multiplication function using references
C++
int **m1 = mkmatrix(SIZE, SIZE);
int **m2 = mkmatrix(SIZE, SIZE);
int **mm = mkmatrix(SIZE, SIZE);
for (i=0; i<n; i++) {
mm = mmult(SIZE, SIZE, m1, m2, mm);
}
Rust
#![allow(unused)] fn main() { let m1 = mkmatrix(SIZE, SIZE); let m2 = mkmatrix(SIZE, SIZE); let mut mm = mkmatrix(SIZE, SIZE); for _i in 0..n { mmult(SIZE, SIZE, &m1, &m2, &mut mm); } }
Freeing matrices
C++
freematrix(SIZE, m1);
freematrix(SIZE, m2);
freematrix(SIZE, mm);
Rust
#![allow(unused)] fn main() { // empty }
The Rust version doesn't need to explicitly free the matrices because they are automatically dropped at the end of the main function thanks to the ownership mechanism.
Performance analysis
The benchmark results show that the Rust version is more than 2 times slower than the C++ version.
We compare the generated code between C++ and Rust.
C++
Here is the hot loop according to the Linux Perf:
C++ perfdata
5.98 │2b0: mov -0x10(%r14,%r8,8),%r10
5.93 │ mov -0x8(%r14,%r8,8),%r11
7.04 │ mov (%r10,%rdi,4),%r10d
11.66 │ imul -0x8(%rsi,%r8,4),%r10d
5.84 │ add %r9d,%r10d
5.71 │ mov (%r11,%rdi,4),%r11d
10.56 │ imul -0x4(%rsi,%r8,4),%r11d
11.00 │ mov (%r14,%r8,8),%r9
5.17 │ mov (%r9,%rdi,4),%r9d
4.77 │ imul (%rsi,%r8,4),%r9d
4.52 │ add %r11d,%r9d
4.53 │ add %r10d,%r9d
8.94 │ add $0x3,%r8
4.73 │ cmp $0x20,%r8
│ ↑ jne 2b0
C++ LLVM IR
106: ; preds = %106, %104
%107 = phi i64 [ 0, %104 ], [ %135, %106 ]
%108 = phi i32 [ 0, %104 ], [ %134, %106 ]
%109 = getelementptr inbounds nuw i32, ptr %103, i64 %107
%110 = load i32, ptr %109, align 4, !tbaa !10
%111 = getelementptr inbounds nuw ptr, ptr %39, i64 %107
%112 = load ptr, ptr %111, align 8, !tbaa !5
%113 = getelementptr inbounds nuw i32, ptr %112, i64 %105
%114 = load i32, ptr %113, align 4, !tbaa !10
%115 = mul nsw i32 %114, %110
%116 = add nsw i32 %115, %108
%117 = add nuw nsw i64 %107, 1
%118 = getelementptr inbounds nuw i32, ptr %103, i64 %117
%119 = load i32, ptr %118, align 4, !tbaa !10
%120 = getelementptr inbounds nuw ptr, ptr %39, i64 %117
%121 = load ptr, ptr %120, align 8, !tbaa !5
%122 = getelementptr inbounds nuw i32, ptr %121, i64 %105
%123 = load i32, ptr %122, align 4, !tbaa !10
%124 = mul nsw i32 %123, %119
%125 = add nsw i32 %124, %116
%126 = add nuw nsw i64 %107, 2
%127 = getelementptr inbounds nuw i32, ptr %103, i64 %126
%128 = load i32, ptr %127, align 4, !tbaa !10
%129 = getelementptr inbounds nuw ptr, ptr %39, i64 %126
%130 = load ptr, ptr %129, align 8, !tbaa !5
%131 = getelementptr inbounds nuw i32, ptr %130, i64 %105
%132 = load i32, ptr %131, align 4, !tbaa !10
%133 = mul nsw i32 %132, %128
%134 = add nsw i32 %133, %125
%135 = add nuw nsw i64 %107, 3
%136 = icmp eq i64 %135, 30
br i1 %136, label %137, label %106, !llvm.loop !22
The loop is unrolled by a factor of 3.
Rust
Here is the hot loop according to the Linux Perf:
Rust perfdata
16.54 │ 260: cmp %rbx,%rsi
│ ↓ je 5c3
5.16 │ cmp %rbx,%r8
│ ↓ je 5d2
4.99 │ mov (%r12),%r13
4.84 │ cmp %r13,%rdi
│ ↓ jae 5e1
10.44 │ mov 0x8(%rax),%r13
5.67 │ mov 0x0(%r13,%rbx,4),%r13d
5.73 │ mov -0x8(%r12),%r14
5.16 │ imul (%r14,%rdi,4),%r13d
5.51 │ inc %rbx
10.59 │ add %r13d,%r9d
5.17 │ add $0x18,%r12
17.06 │ cmp $0x1e,%rbx
│ ↑ jne 260
Rust LLVM IR
bb10.us.i: ; preds = %bb17.us.i, %bb7.us.i
%iter2.sroa.0.024.us.i = phi i64 [ 0, %bb7.us.i ], [ %_0.i13.us.i, %bb17.us.i ]
%val.sroa.0.023.us.i = phi i32 [ 0, %bb7.us.i ], [ %95, %bb17.us.i ]
%_0.i13.us.i = add nuw nsw i64 %iter2.sroa.0.024.us.i, 1
%exitcond.not.i = icmp eq i64 %iter2.sroa.0.024.us.i, %_47.us.i
br i1 %exitcond.not.i, label %panic7.i.invoke, label %bb15.us.i
bb15.us.i: ; preds = %bb10.us.i
%_48.us.i = load ptr, ptr %90, align 8, !nonnull !4, !noundef !4
%_15.us.i = getelementptr inbounds i32, ptr %_48.us.i, i64 %iter2.sroa.0.024.us.i
%_14.us.i = load i32, ptr %_15.us.i, align 4, !noundef !4
%exitcond89.not.i = icmp eq i64 %iter2.sroa.0.024.us.i, %m2.val25
br i1 %exitcond89.not.i, label %panic7.i.invoke, label %bb16.us.i
bb16.us.i: ; preds = %bb15.us.i
call void @llvm.assume(i1 %35)
%_19.us.i = getelementptr inbounds %"alloc::vec::Vec<i32>", ptr %m2.val, i64 %iter2.sroa.0.024.us.i
%93 = getelementptr inbounds i8, ptr %_19.us.i, i64 16
%_57.us.i = load i64, ptr %93, align 8, !noundef !4
%_60.us.i = icmp ult i64 %iter1.sroa.0.025.us.i, %_57.us.i
br i1 %_60.us.i, label %bb17.us.i, label %panic7.i.invoke
bb17.us.i: ; preds = %bb16.us.i
%94 = getelementptr inbounds i8, ptr %_19.us.i, i64 8
%_58.us.i = load ptr, ptr %94, align 8, !nonnull !4, !noundef !4
%_18.us.i = getelementptr inbounds i32, ptr %_58.us.i, i64 %iter1.sroa.0.025.us.i
%_17.us.i = load i32, ptr %_18.us.i, align 4, !noundef !4
%_13.us.i = mul i32 %_17.us.i, %_14.us.i
%95 = add i32 %_13.us.i, %val.sroa.0.023.us.i
%exitcond90.not.i = icmp eq i64 %_0.i13.us.i, 30
br i1 %exitcond90.not.i, label %bb12.us.i, label %bb10.us.i
The key differences from the C++ version are the following.
First, the mmult
function isn't inlined into the main function. That makes it non-trivial to constant-propgate the Vec
sizes as compile-time constant 30. If intraprocedural constant propagation was implemented indepedent of inlining, it would do but it doesn't seem to be there.
Second, the bounds checks are not elided. The bounds checks could be hoisted out of the loop (loop invariant code motion) and the special-cased, bounds-check-free version of the innermost loop could be generated (loop versioning). The LLVM backend probably has some form of these implemented but it doesn't seem to trigger here.
Third, loop unrolling has not been applied. I suspect that because of the above two lack of optimizations, the loop doesn't look simple enough for loop unrolling to kick in.
Note these are differences in the frontends. The Clang frontend applied these three optimiations for the C++ version. But the Rustc frontend did not for the Rust version. While the LLVM backend could also apply these in theory but it doesn't seem to at least in the given environment.
A June 2025 update on the performance difference
I filed an issue at
the Rust repo about this performance difference. According to the
conversation, if the program uses arrays (of compile-time constant
size), as opposed to vectors to represent the matrices, where the
arrays are heap-allocated by using Box
's to be equivalent to the C++
version, as below, similar set of optimizations are performanced and
the performance gets on par with the C++ version.
A version that uses heap allocated arrays instead of vectors:
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; use std::convert::TryInto; const SIZE: usize = 30; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 10000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 100000; fn mkmatrix(rows: usize, cols: usize) -> Box::<[Box::<[i32; SIZE]>; SIZE]> { let mut count = 1; let mut row_vec = Vec::<Box<[i32; SIZE]>>::new(); for _i in 0..rows { let mut row: [i32; SIZE] = [0; SIZE]; for j in 0..cols { row[j] = count; count += 1; } row_vec.push(Box::new(row)); } let m: [Box<[i32; SIZE]>; SIZE] = row_vec.try_into().unwrap_or_else( |v: Vec<Box<[i32; SIZE]>>| panic!("Expected a Vec of length {} but it was {}", SIZE, v.len())); return Box::new(m); } fn mmult(rows: usize, cols: usize, m1: &Box::<[Box::<[i32; SIZE]>; SIZE]>, m2: &Box::<[Box::<[i32; SIZE]>; SIZE]>, m3: &mut Box::<[Box::<[i32; SIZE]>; SIZE]>) { for i in 0..rows { for j in 0..cols { let mut val = 0; for k in 0..cols { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let m1 = mkmatrix(SIZE, SIZE); let m2 = mkmatrix(SIZE, SIZE); let mut mm = mkmatrix(SIZE, SIZE); for _i in 0..n { mmult(SIZE, SIZE, &m1, &m2, &mut mm); } println!("{} {} {} {}", mm[0][0], mm[2][3], mm[3][2], mm[4][4]); }
But this may not be considered equivalent to the C++ version in that
the array version of mmult
in Rust is tied to the specific size
while the Rust vector version or the C++ version work on any sizes.
I think that the Rust compiler could be improved to similarly optimize the vector version of the code.
methcall
Full source code
C++
// -*- mode: c++ -*-
// $Id
// http://www.bagley.org/~doug/shootout/
// with some help from Bill Lear
#include <stdlib.h>
#include <iostream>
using namespace std;
class Toggle {
public:
Toggle(bool start_state) : state(start_state) { }
virtual ~Toggle() { }
bool value() {
return(state);
}
virtual Toggle& activate() {
state = !state;
return(*this);
}
bool state;
};
class NthToggle : public Toggle {
public:
NthToggle(bool start_state, int max_counter) :
Toggle(start_state), count_max(max_counter), counter(0) {
}
Toggle& activate() {
if (++this->counter >= this->count_max) {
state = !state;
counter = 0;
}
return(*this);
}
private:
int count_max;
int counter;
};
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 100000000
#else
#define LENGTH 1000000000
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
bool val = true;
Toggle *toggle = new Toggle(val);
for (int i=0; i<n; i++) {
val = toggle->activate().value();
}
cout << ((val) ? "true" : "false") << endl;
delete toggle;
val = true;
NthToggle *ntoggle = new NthToggle(val, 3);
for (int i=0; i<n; i++) {
val = ntoggle->activate().value();
}
cout << ((val) ? "true" : "false") << endl;
delete ntoggle;
return 0;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ // with some help from Bill Lear use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 100000000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 1000000000; struct Toggle { state: bool } impl Toggle { fn new(start_state: bool) -> Toggle { Toggle { state: start_state } } } trait Togglable { fn activate(&mut self); fn value(&self) -> bool; } impl Togglable for Toggle { fn activate(&mut self) { self.state = !self.state; } fn value(&self) -> bool { return self.state; } } struct NthToggle { state: bool, count_max: i32, counter: i32, } impl NthToggle { fn new(start_state: bool, max_counter: i32) -> NthToggle { NthToggle { state: start_state, count_max: max_counter, counter: 0, } } } impl Togglable for NthToggle { fn activate(&mut self) { self.counter += 1; if self.counter >= self.count_max { self.state = !self.state; self.counter = 0; } } fn value(&self) -> bool { return self.state; } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut val = true; let mut toggle = Toggle::new(val); for _i in 0..n { toggle.activate(); val = toggle.value(); } println!("{}", if val { "true" } else { "false" }); val = true; let mut ntoggle = NthToggle::new(val, 3); for _i in 0..n { ntoggle.activate(); val = ntoggle.value(); } println!("{}", if val { "true" } else { "false" }); }
Porting notes
Declaring a struct in place for a class
C++
class Toggle {
public:
Toggle(bool start_state) : state(start_state) { }
virtual ~Toggle() { }
bool value() {
return(state);
}
virtual Toggle& activate() {
state = !state;
return(*this);
}
bool state;
};
Rust
#![allow(unused)] fn main() { struct Toggle { state: bool } impl Toggle { fn new(start_state: bool) -> Toggle { Toggle { state: start_state } } } trait Togglable { fn activate(&mut self); fn value(&self) -> bool; } impl Togglable for Toggle { fn activate(&mut self) { self.state = !self.state; } fn value(&self) -> bool { return self.state; } } }
There are no classes and structs are used in Rust.
Member variables and methods aren't placed together in a Rust struct. Member variables are declared in the struct and member methods are declared in the impl
block for the struct.
The new
method, which is like a C++ constructor, is usually used to create an instance of the struct. The struct values are often initialized with struct literals, as in Toggle { state: start_state }
, as opposed to the C++ member-by-member style.
The drop
method, which is analogous to a C++ destructor but is often omitted because the owership takes care of destruction unless the struct requires a custom logic for deinitization like resource reclamation.
The methods activate
and value
would be put in the impl Toggle { ... }
otherwise but they are put into a trait Togglable
here to handle the use of inheritance, which will be discussed next. A trait is like an interface in Go or Java. For now, note that the struct Toggle
implements the trait Togglable
that has the two methods activate
and value
and they are defined in the impl Tooglable for Toggle { ... }
block.
Using a trait in place for inheritance.
C++
class NthToggle : public Toggle {
public:
NthToggle(bool start_state, int max_counter) :
Toggle(start_state), count_max(max_counter), counter(0) {
}
Toggle& activate() {
if (++this->counter >= this->count_max) {
state = !state;
counter = 0;
}
return(*this);
}
private:
int count_max;
int counter;
};
Rust
#![allow(unused)] fn main() { struct NthToggle { state: bool, count_max: i32, counter: i32, } impl NthToggle { fn new(start_state: bool, max_counter: i32) -> NthToggle { NthToggle { state: start_state, count_max: max_counter, counter: 0, } } } impl Togglable for NthToggle { fn activate(&mut self) { self.counter += 1; if self.counter >= self.count_max { self.state = !self.state; self.counter = 0; } } fn value(&self) -> bool { return self.state; } } }
Rust is not an object oriented programming lanuage. In the C++ version, the class NthToggle
inherits from the class Toggle
. Because Rust doesn't support inheritance, we instead use a trait, which is like interfaces in Go or Java. The activate
and value
methods, which are inherited or overridden, are put into a trait Togglable
, as opposed to directly in the impl Toggle { ... }
block. The struct NthToggle
has three variables, instead of one for Toogle
and it also implements the trait Togglable
. The impl Togglable for NthToggle { ... }
block implements the two methods for NthToggle
. That way, the two methods can be separately implemented for Toggle
and NthToggle
and can be polymorphically called like C++ virtual calls. A caveat is that the state
variable and the value
method need to be duplicated.
Avoding unnecessarily returning a reference
C++
val = toggle->activate().value();
Rust
#![allow(unused)] fn main() { toggle.activate(); val = toggle.value(); }
I chose to change the activate
method not to return a reference to the instance. I believe that we would want to avoid returning a reference in Rust as it might require lifetime annotations to be used and lead to unnecessary complexity. In this example, we could simply call the value
method to get the state value in the second line.
Performance analysis
The benchmark results show that the Rust version is about 8 times faster than the C++ version.
We will compare the generated code between C++ and Rust.
C++
C++ perfdata (the hot function list)
Overhead Command Shared Object Symbol
78.05% methcall methcall [.] main
12.20% methcall methcall [.] NthToggle::activate()
9.65% methcall methcall [.] Toggle::activate()
C++ perfdata (the main function)
4.86 │ 50: mov (%r14),%rax
4.73 │ mov %r14,%rdi
40.76 │ → call *0x10(%rax)
2.60 │ dec %ebp
4.86 │ ↑ jne 50
C++ perfdata (Toggle::activate()
)
21.28 │ mov %rdi,%rax
39.85 │ xorb $0x1,0x8(%rdi)
38.87 │ ← ret
C++ perfdata (NthToggle::activate()
)
10.18 │ mov %rdi,%rax
9.78 │ mov 0x10(%rdi),%ecx
10.25 │ inc %ecx
21.22 │ mov %ecx,0x10(%rdi)
9.13 │ cmp 0xc(%rdi),%ecx
│ ↓ jl 1b
3.94 │ xorb $0x1,0x8(%rax)
7.03 │ movl $0x0,0x10(%rax)
28.45 │1b: ← ret
It looks like the activate
call isn't devirtualized or inlined.
Rust
Rust perfdata (the host function list)
Overhead Command Shared Object Symbol
99.92% methcall methcall [.] methcall::main
Rust perfdata (the host loop in the main function)
2.52 │280: xor %dil,%cl
4.92 │ xor %r8b,%cl
4.50 │ xor %r9b,%cl
2.40 │ xor %r11b,%cl
3.75 │ add $0xfffffffc,%ebx
3.51 │ ↑ je 23a
4.33 │291: lea 0x1(%rdx),%r8d
6.20 │ cmp $0x2,%edx
3.23 │ setge %dil
7.95 │ cmovge %esi,%r8d
3.92 │ lea 0x1(%r8),%edx
2.63 │ cmp $0x2,%r8d
9.47 │ cmovge %esi,%edx
4.91 │ setge %r8b
2.28 │ mov $0x0,%r10d
3.27 │ cmp $0x2,%edx
2.52 │ setge %r9b
3.39 │ ↓ jge 2c3
2.75 │ inc %edx
1.99 │ mov %edx,%r10d
3.63 │2c3: mov $0x0,%edx
2.92 │ cmp $0x2,%r10d
2.92 │ setge %r11b
3.80 │ ↑ jge 280
1.93 │ inc %r10d
2.83 │ mov %r10d,%edx
1.52 │ ↑ jmp 280
We can see that the activate
call is devirtualized, inlined and the loop is unrolled by a factor of 4. This explains the performance difference.
moments
Full source code
C++
// -*- mode: c++ -*-
// $Id
// http://www.bagley.org/~doug/shootout/
// Calculate statistical moments of a region, from Bill Lear
// Modified by Tamás Benkõ
// Further modified by Tom Hyer
#include <vector>
#include <numeric>
#include <iterator>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cmath>
using namespace std;
template <class T>
struct moments {
public:
template <class InputIterator>
moments(InputIterator begin, InputIterator end)
: median(0.0), mean(0.0), average_deviation(0.0),
standard_deviation(0.0), variance(0.0),
skew(0.0), kurtosis(0.0)
{
T sum = accumulate(begin, end, 0.0);
size_t N = end - begin;
mean = sum / N;
for (InputIterator i = begin; i != end; ++i) {
T deviation = *i - mean;
average_deviation += fabs(deviation);
T temp = deviation * deviation;
variance += temp;
temp *= deviation;
skew += temp;
kurtosis += temp * deviation;
}
average_deviation /= N;
variance /= (N - 1);
standard_deviation = sqrt(variance);
if (variance) {
skew /= (N * variance * standard_deviation);
kurtosis = kurtosis/(N * variance * variance) - 3.0;
}
InputIterator mid = begin+N/2;
nth_element(begin, mid, end);
if (N % 2 == 0) {
InputIterator next_biggest = max_element(begin, mid);
median = (*mid+*next_biggest)/2;
}
else
median = *mid;
}
T median;
T mean;
T average_deviation;
T standard_deviation;
T variance;
T skew;
T kurtosis;
};
int main(int argc, char**argv) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 500000
#else
#define LENGTH 5000000
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
vector<double> v;
double d;
for (unsigned i = 0; i != n; ++i) v.push_back(i);
moments<double> m(v.begin(), v.end());
printf("n: %d\n", v.end() - v.begin());
printf("median: %f\n", m.median);
printf("mean: %f\n", m.mean);
printf("average_deviation: %f\n", m.average_deviation);
printf("standard_deviation: %f\n", m.standard_deviation);
printf("variance: %f\n", m.variance);
printf("skew: %f\n", m.skew);
printf("kurtosis: %f\n", m.kurtosis);
return 0;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ // Calculate statistical moments of a region, from Bill Lear // Modified by Tamás Benkõ // Further modified by Tom Hyer use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 500000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 5000000; struct Moments<T> { median: T, mean: T, average_deviation: T, standard_deviation: T, variance: T, skew: T, kurtosis: T, } impl Moments<f64> { fn new(slice: &mut [f64]) -> Moments<f64> { let mut m = Moments::<f64> { median: 0.0, mean: 0.0, average_deviation: 0.0, standard_deviation: 0.0, variance: 0.0, skew: 0.0, kurtosis: 0.0, }; let sum: f64 = slice.iter().sum(); let n = slice.iter().count(); m.mean = sum / (n as f64); for e in slice.iter() { let deviation = *e - m.mean; m.average_deviation += deviation.abs(); let mut temp = deviation * deviation; m.variance += temp; temp *= deviation; m.skew += temp; m.kurtosis += temp * deviation; } m.average_deviation /= n as f64; m.variance /= (n - 1) as f64; m.standard_deviation = m.variance.sqrt(); if m.variance != 0.0 { m.skew /= (n as f64) * m.variance * m.standard_deviation; m.kurtosis = m.kurtosis / ((n as f64) * m.variance * m.variance) - 3.0; } let (before, mid, _after) = slice.select_nth_unstable_by(n / 2, |a, b| a.partial_cmp(b).unwrap()); if n % 2 == 0 { let next_biggest = before.iter().max_by(|a, b| a.partial_cmp(b).unwrap()); m.median = (*mid + *next_biggest.unwrap()) / 2.0; } else { m.median = *mid; } return m; } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut v = Vec::<f64>::new(); for i in 0..n { v.push(i as f64); } let m = Moments::<f64>::new(&mut v[..]); println!("n: {}", v.iter().count()); println!("median: {:.6}", m.median); println!("mean: {:.6}", m.mean); println!("average_deviation: {:.6}", m.average_deviation); println!("standard_deviation: {:.6}", m.standard_deviation); println!("variance: {:.6}", m.variance); println!("skew: {:.6}", m.skew); println!("kurtosis: {:.6}", m.kurtosis); }
Porting notes
Passing the input vector using a slice, as opposed to C++ iterators
C++
template <class InputIterator>
moments(InputIterator begin, InputIterator end)
: median(0.0), mean(0.0), average_deviation(0.0),
standard_deviation(0.0), variance(0.0),
skew(0.0), kurtosis(0.0)
{
...
}
Rust
#![allow(unused)] fn main() { fn new(slice: &mut [f64]) -> Moments<f64> { let mut m = Moments::<f64> { median: 0.0, mean: 0.0, average_deviation: 0.0, standard_deviation: 0.0, variance: 0.0, skew: 0.0, kurtosis: 0.0, }; ... } }
The C++ version passes the input vector using its begin
and end
iterators to the moments
constructor.
In constract, the Rust version passes a mutable slice of the input vector. &mut [f64]
represents a mutable slice of type f64
. &mut v[..]
is the syntax that takes a mutable slice of a vector v
. If it were an immutable slice, the syntax would be &[f64]
and &v[..]
, respectively.
Slices are safer than the C++ iterators in that they are bounds-checked and, being single entities, they are less error-prone, avoiding potential mistakes like mixing up iterators from different containers or passing them in a wrong order.
The Rust version uses a slice instead of an iterator because the select_nth_unstable_by
method that is used to implement the C++ std::nth_element
below is available for a slice but not for an iterator.
Explicitly converting types
C++
mean = sum / N;
Rust
#![allow(unused)] fn main() { m.mean = sum / (n as f64); }
In the C++ version, the division implicit converts the N
of type size_t
to double
.
In Rust, implicit conversions are not supported for safety reasons and types need to be explicitly converted. The Rust version uses the as
operator as in (n as f64)
to convert from usize
to f64
.
Implementing std::nth_element
with select_nth_unstable_by
, multiple return values as a tuple, explicitly handling 64-bit floating point not having a total order
C++
InputIterator mid = begin+N/2;
nth_element(begin, mid, end);
Rust
#![allow(unused)] fn main() { let (before, mid, _after) = slice.select_nth_unstable_by(n / 2, |a, b| a.partial_cmp(b).unwrap()); }
The C++ version uses the std::nth_element
function to find the median value.
The Rust version uses the select_nth_unstable_by
method as an equivalent.
select_nth_unstable_by
uses a tuple return type to return multiple values. let (before, mid, _after) = ...
declares and assigns each element of the returned tuple as local variables conveniently in a single line.
The 64-bit floating point type (double
or f64
) doesn't technically have a total order because of NaN
. So there could be subtle undefined or implementation-dependent behaviors where strict ordering is expected. The Rust version makes it explicit by not letting f64
implement the cmp
method. The closure |a, b| a.partial_cmp(b).unwrap()
makes it explicit that it would panic if NaN
is encountered, though it doesn't occur in this program.
nestedloop
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <iostream>
#include <stdlib.h>
using namespace std;
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 30
#else
#define LENGTH 46
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
int a, b, c, d, e, f, x=0;
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x++;
cout << x << endl;
return(0);
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 30; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 46; fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut x: i32 = 0; for _a in 0..n { for _b in 0..n { for _c in 0..n { for _d in 0..n { for _e in 0..n { for _f in 0..n { x = x.wrapping_add(1); } } } } } } println!("{}", x); }
Porting notes
Integer addition without trapping
C++
x++;
Rust
#![allow(unused)] fn main() { x = x.wrapping_add(1); }
In Rust, the normal integer addition would panic if it causes an overflow, unlike in C++. To avoid this, the Rust version uses the wrapping_add
method.
objinst
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <stdlib.h>
#include <iostream>
using namespace std;
class Toggle {
public:
Toggle(bool start_state) : state(start_state) { }
virtual ~Toggle() { }
bool value() {
return(state);
}
virtual Toggle& activate() {
state = !state;
return(*this);
}
bool state;
};
class NthToggle : public Toggle {
public:
NthToggle(bool start_state, int max_counter) :
Toggle(start_state), count_max(max_counter), counter(0) {
}
Toggle& activate() {
if (++this->counter >= this->count_max) {
state = !state;
counter = 0;
}
return(*this);
}
private:
int count_max;
int counter;
};
int
main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 1000000
#else
#define LENGTH 70000000
#endif
int n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
Toggle *toggle1 = new Toggle(true);
for (int i=0; i<5; i++) {
cout << ((toggle1->activate().value()) ? "true" : "false") << endl;
}
delete toggle1;
for (int i=0; i<n; i++) {
Toggle *toggle = new Toggle(true);
delete toggle;
}
cout << endl;
NthToggle *ntoggle1 = new NthToggle(true, 3);
for (int i=0; i<8; i++) {
cout << ((ntoggle1->activate().value()) ? "true" : "false") << endl;
}
delete ntoggle1;
for (int i=0; i<n; i++) {
NthToggle *ntoggle = new NthToggle(true, 3);
delete ntoggle;
}
return 0;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 1000000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 70000000; struct Toggle { state: bool } impl Toggle { fn new(start_state: bool) -> Toggle { Toggle { state: start_state } } } trait Togglable { fn activate(&mut self); fn value(&self) -> bool; } impl Togglable for Toggle { fn activate(&mut self) { self.state = !self.state; } fn value(&self) -> bool { return self.state; } } struct NthToggle { state: bool, count_max: i32, counter: i32, } impl NthToggle { fn new(start_state: bool, max_counter: i32) -> NthToggle { NthToggle { state: start_state, count_max: max_counter, counter: 0, } } } impl Togglable for NthToggle { fn activate(&mut self) { self.counter += 1; if self.counter >= self.count_max { self.state = !self.state; self.counter = 0; } } fn value(&self) -> bool { return self.state; } } fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut toggle1 = Toggle::new(true); for _i in 0..5 { toggle1.activate(); println!("{}", if toggle1.value() { "true" } else { "false" }); } for _i in 0..n { let _toggle = Toggle::new(true); } println!(); let mut ntoggle1 = NthToggle::new(true, 3); for _i in 0..8 { ntoggle1.activate(); println!("{}", if ntoggle1.value() { "true" } else { "false" }); } for _i in 0..n { let _ntoggle = NthToggle::new(true, 3); } }
Porting notes
This program is similar to methcall. The Toggle
and the NthToggle
classes or structs are identical and the usage of them are somewhat different in the main
function.
random
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
#include <iostream>
#include <stdlib.h>
#include <math.h>
using namespace std;
#define IM 139968
#define IA 3877
#define IC 29573
inline double gen_random(double max) {
static long last = 42;
last = (last * IA + IC) % IM;
return( max * last / IM );
}
int main(int argc, char *argv[]) {
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 4000000
#else
#define LENGTH 400000000
#endif
int N = ((argc == 2) ? atoi(argv[1]) : LENGTH);
double result = 0;
while (N--) {
result = gen_random(100.0);
}
cout.precision(9);
cout.setf(ios::fixed);
cout << result << endl;
return(0);
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 4000000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 400000000; const IM: i64 = 139968; const IA: i64 = 3877; const IC: i64 = 29573; fn gen_random(last_random: &mut i64, max: f64) -> f64 { let new_last = (*last_random * IA + IC) % IM; *last_random = new_last; return max * (new_last as f64) / (IM as f64); } fn main() { let mut args = env::args(); let mut n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut result: f64 = 0.0; let mut last_random: i64 = 42; while n != 0 { n -= 1; result = gen_random(&mut last_random, 100.0); } println!("{:.9}", result); }
Porting notes
The gen_random
function is the same as the one in heapsort.
sieve
Full source code
C++
#include <iostream>
#include <vector>
#include <list>
#include <cstdlib>
using namespace std;
void sieve(list<int>& unknown, vector<int>& primes)
{
while (!unknown.empty())
{
int p = unknown.front();
unknown.pop_front();
list<int>::iterator i = unknown.begin();
while (i != unknown.end())
{
if (*i % p)
++i;
else
i = unknown.erase(i);
}
primes.push_back(p);
}
}
int main(int argc, char *argv[])
{
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 50
#else
#define LENGTH 500
#endif
size_t NUM = (argc == 2 ? (atoi(argv[1]) < 1 ? 1 : atoi(argv[1])): LENGTH);
vector<int> primes;
// run the sieve repeatedly
while (NUM--) {
list<int> integers;
for (int i = 2; i < 8192; ++i)
integers.push_back(i);
primes.clear();
sieve(integers, primes);
}
cout << "Count: " << primes.size() << endl;
return 0;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ use std::env; // Use VecDeque instead of LinkedList because Rust doesn't have stable // ways to remove items from LinkedList and strongly encourage using // Vec or VecDeque. use std::collections::VecDeque; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 50; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 500; fn sieve(unknown: &mut VecDeque::<i32>, primes: &mut Vec::<i32>) { while unknown.len() > 0 { let p = unknown.pop_front().unwrap(); unknown.retain(|&x| x % p != 0); primes.push(p); } } fn main() { let mut args = env::args(); let mut num = if args.len() == 2 { let arg1 = args.nth(1).unwrap().parse::<i32>().unwrap(); if arg1 < 1 { 1 } else { arg1 } } else { LENGTH }; let mut primes = Vec::<i32>::new(); // run the sieve repeatedly while num != 0 { let mut integers = VecDeque::<i32>::new(); for i in 2..8192 { integers.push_back(i); } primes.clear(); sieve(&mut integers, &mut primes); num -= 1; } println!("Count: {}", primes.len()); }
Porting notes
Using VecDeque
instead of LinkedList
C++
void sieve(list<int>& unknown, vector<int>& primes)
{
while (!unknown.empty())
{
int p = unknown.front();
unknown.pop_front();
list<int>::iterator i = unknown.begin();
while (i != unknown.end())
{
if (*i % p)
++i;
else
i = unknown.erase(i);
}
primes.push_back(p);
}
}
Rust
#![allow(unused)] fn main() { fn sieve(unknown: &mut VecDeque::<i32>, primes: &mut Vec::<i32>) { while unknown.len() > 0 { let p = unknown.pop_front().unwrap(); unknown.retain(|&x| x % p != 0); primes.push(p); } } }
The Rust LinkedList
interestingly does not support mutation such as
element insertion and removal in its stable API, as of this
writing. Rust instead encourages to use VecDeque
which is
double-sided vector for performance reasons when a doubly linked list
would be otherwise used. So the Rust version uses VecDeque
even
though it may have different performance characteristics from the C++
std::list
. The benchmark results shows
that the performance differences are small.
Furthermore, the Rust version uses the retain
method to remove
elements from the VecDeque
I believe this is less error-prone than
the C++ version where it uses an iteration over the list along with
the erase
function during the iteration.
strcat
Full source code
C++
// -*- mode: c++ -*-
// $Id$
// http://www.bagley.org/~doug/shootout/
// with help from PeterB
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char *argv[])
{
#ifdef SMALL_PROBLEM_SIZE
#define LENGTH 1000000
#else
#define LENGTH 5000000
#endif
int i, n = ((argc == 2) ? atoi(argv[1]) : LENGTH);
string str;
size_t capacity = 31;
str.reserve(capacity); // as per C-string
size_t newLength = 6;
for (i = 0; i < n; i++)
{
if(newLength > capacity)
{
capacity *= 2;
str.reserve(capacity);
}
str += "hello\n";
newLength += 6;
}
cout << str.length() << endl;
return 0;
}
Rust
// Adapted from https://github.com/llvm/llvm-test-suite and // http://www.bagley.org/~doug/shootout/ // with help from PeterB use std::env; #[cfg(feature = "small_problem_size")] const LENGTH: i32 = 1000000; #[cfg(not(feature = "small_problem_size"))] const LENGTH: i32 = 5000000; fn main() { let mut args = env::args(); let n = if args.len() == 2 { args.nth(1).unwrap().parse::<i32>().unwrap() } else { LENGTH }; let mut str = String::new(); let mut capacity: usize = 31; str.reserve(capacity); let mut new_length: usize = 6; for _i in 0..n { if new_length > capacity { capacity *= 2; str.reserve(capacity); } str.push_str("hello\n"); new_length += 6; } println!("{}", str.len()); }
Porting notes
The porting was straightforward with no specific notes.
Performance analysis
The benchmark results show that the Rust version is more than 2 times faster than the C++ version.
We compare the generated code between C++ and Rust.
C++
C++ perfdata (the host function list)
Overhead Command Shared Object Symbol
19.78% strcat libstdc++.so.6.0.30 [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)
12.01% strcat libc.so.6 [.] __memmove_avx_unaligned_erms
8.25% strcat strcat [.] main
6.77% strcat [unknown] [k] 0xffffffff9fd9f917
3.87% strcat libc.so.6 [.] __memmove_avx_unaligned
2.43% strcat libstdc++.so.6.0.30 [.] memcpy@plt
1.95% strcat strcat [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@plt
C++ perfdata (the hottest function std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)@@GLIBCXX_3.4.21
)
7.22 │ endbr64
9.87 │ push %rbp
2.46 │ mov %rdx,%r8
│ push %rbx
4.92 │ mov %rdi,%rbx
2.46 │ sub $0x8,%rsp
2.46 │ mov 0x8(%rdi),%rax
│ mov (%rdi),%rdi
│ lea (%rdx,%rax,1),%rbp
2.46 │ lea 0x10(%rbx),%rdx
2.48 │ cmp %rdx,%rdi
│ ↓ je 78
│ mov 0x10(%rbx),%rdx
7.27 │28: cmp %rbp,%rdx
│ ↓ jb 60
4.87 │ test %r8,%r8
│ ↓ je 46
7.33 │ add %rax,%rdi
7.40 │ cmp $0x1,%r8
│ ↓ je 80
2.46 │ mov %r8,%rdx
7.35 │ → call memcpy@plt
│ mov (%rbx),%rdi
2.46 │46: mov %rbp,0x8(%rbx)
2.46 │ mov %rbx,%rax
2.37 │ movb $0x0,(%rdi,%rbp,1)
7.11 │ add $0x8,%rsp
9.64 │ pop %rbx
│ pop %rbp
4.94 │ ← ret
C++ perfdata (The second hottest function __memcpy_avx_unaligned_erms
)
16.08 │ 30: cmp $0x10,%edx
│ ↓ jae 62
8.10 │ cmp $0x8,%edx
│ ↓ jae 80
4.07 │ cmp $0x4,%edx
│ ↓ jae 55
│ cmp $0x1,%edx
│ ↓ jl 54
│ mov (%rsi),%cl
│ ↓ je 52
│ movzwl -0x2(%rsi,%rdx,1),%esi
│ mov %si,-0x2(%rdi,%rdx,1)
│ 52: mov %cl,(%rdi)
│ 54: ← ret
3.77 │ 55: mov -0x4(%rsi,%rdx,1),%ecx
4.06 │ mov (%rsi),%esi
12.17 │ mov %ecx,-0x4(%rdi,%rdx,1)
4.06 │ mov %esi,(%rdi)
12.05 │ ← ret
C++ LLVM IR (the loop in the main function, the frontend output)
16: ; preds = %14, %38
%17 = phi i64 [ %39, %38 ], [ 6, %14 ]
%18 = phi i64 [ %30, %38 ], [ 31, %14 ]
%19 = phi i32 [ %40, %38 ], [ 0, %14 ]
%20 = icmp ugt i64 %17, %18
br i1 %20, label %21, label %29
21: ; preds = %16
%22 = shl i64 %18, 1
invoke void @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7reserveEm(ptr noundef nonnull align 8 dereferenceable(32) %3, i64 noundef %22)
to label %29 unwind label %25
23: ; preds = %10
%24 = landingpad { ptr, i32 }
cleanup
br label %83
25: ; preds = %21, %36
%26 = landingpad { ptr, i32 }
cleanup
br label %83
27: ; preds = %34, %42, %53, %62, %63, %68, %71
%28 = landingpad { ptr, i32 }
cleanup
br label %83
29: ; preds = %21, %16
%30 = phi i64 [ %22, %21 ], [ %18, %16 ]
%31 = load i64, ptr %13, align 8, !tbaa !12
%32 = add i64 %31, -4611686018427387898
%33 = icmp ult i64 %32, 6
br i1 %33, label %34, label %36
34: ; preds = %29
invoke void @_ZSt20__throw_length_errorPKc(ptr noundef nonnull @.str.1) #11
to label %35 unwind label %27
35: ; preds = %34
unreachable
36: ; preds = %29
%37 = invoke noundef nonnull align 8 dereferenceable(32) ptr @_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm(ptr noundef nonnull align 8 dereferenceable(32) %3, ptr noundef nonnull @.str, i64 noundef 6
)
to label %38 unwind label %25
38: ; preds = %36
%39 = add nuw nsw i64 %17, 6
%40 = add nuw nsw i32 %19, 1
%41 = icmp eq i32 %40, %11
br i1 %41, label %42, label %16, !llvm.loop !16
The loop body in the main function doesn't have the basic_string::append
and its internal memcpy
inlined. IThey aren't probably defined in the C++ string
and the string.h
header files as its body is not part of the compilation unit. Even though it uses the SIMD (AVX) version of the memcpy
, it doesn't use the AVX instructions because it copies only 6 bytes at a time.
Rust
Rust perfdata (the hot function list)
Overhead Command Shared Object Symbol
34.98% strcat strcat [.] strcat::main
Rust perfdata
│1d0: mov 0x8(%rsp),%rax
4.84 │ cmp %rbx,%r12
│ ↓ jbe 1e8
│ add %rbx,%rbx
│ mov %rax,%rcx
│ sub %rsi,%rcx
│ cmp %rbx,%rcx
│ ↓ jb 218
3.24 │1e8: sub %rsi,%rax
18.89 │ cmp $0x5,%rax
│ ↓ jbe 236
3.25 │1f1: mov 0x10(%rsp),%rax
14.99 │ movw $0xa6f,0x4(%rax,%rsi,1)
19.38 │ movl $0x6c6c6568,(%rax,%rsi,1)
2.98 │ add $0x6,%rsi
13.03 │ mov %rsi,0x18(%rsp)
3.25 │ add $0x6,%r12
3.27 │ dec %r15d
12.90 │ ↑ jne 1d0
Rust LLVM IR (the loop body in the main function, the frontend output)
bb29: ; preds = %bb8.preheader, %bb32
%len.i.i.i.pre62 = phi i64 [ %49, %bb32 ], [ %len.i.i.i.pre62.pre, %bb8.preheader ]
%capacity.sroa.0.051 = phi i64 [ %capacity.sroa.0.1, %bb32 ], [ 31, %bb8.preheader ]
%new_length.sroa.0.050 = phi i64 [ %50, %bb32 ], [ 6, %bb8.preheader ]
%iter.sroa.0.049 = phi i32 [ %47, %bb32 ], [ 0, %bb8.preheader ]
%47 = add nuw nsw i32 %iter.sroa.0.049, 1
%_18 = icmp ugt i64 %new_length.sroa.0.050, %capacity.sroa.0.051
%self2.i.i.i.pre64 = load i64, ptr %str, align 8, !range !33
br i1 %_18, label %bb10, label %bb12
bb10: ; preds = %bb29
%48 = shl i64 %capacity.sroa.0.051, 1
%_10.i31 = sub i64 %self2.i.i.i.pre64, %len.i.i.i.pre62
%_7.i32 = icmp ult i64 %_10.i31, %48
br i1 %_7.i32, label %bb1.i33, label %bb12, !prof !32
bb1.i33: ; preds = %bb10
; invoke alloc::raw_vec::RawVecInner<A>::reserve::do_reserve_and_handle
invoke fastcc void @"_ZN5alloc7raw_vec20RawVecInner$LT$A$GT$7reserve21do_reserve_and_handle17h0f49121d9a52801dE"(ptr noalias noundef nonnull align 8 dereferenceable(16) %str, i64 noundef %len.i.i.i.pre62, i64 noundef %48)
to label %bb1.i33.bb12_crit_edge unwind label %cleanup2.loopexit
bb1.i33.bb12_crit_edge: ; preds = %bb1.i33
%len.i.i.i.pre = load i64, ptr %_40.sroa.5.0.str.sroa_idx, align 8, !alias.scope !64
%self2.i.i.i.pre = load i64, ptr %str, align 8, !range !33, !alias.scope !69
br label %bb12
bb12: ; preds = %bb1.i33.bb12_crit_edge, %bb10, %bb29
%self2.i.i.i = phi i64 [ %self2.i.i.i.pre64, %bb29 ], [ %self2.i.i.i.pre, %bb1.i33.bb12_crit_edge ], [ %self2.i.i.i.pre64, %bb10 ]
%len.i.i.i = phi i64 [ %len.i.i.i.pre62, %bb29 ], [ %len.i.i.i.pre, %bb1.i33.bb12_crit_edge ], [ %len.i.i.i.pre62, %bb10 ]
%capacity.sroa.0.1 = phi i64 [ %capacity.sroa.0.051, %bb29 ], [ %48, %bb1.i33.bb12_crit_edge ], [ %48, %bb10 ]
call void @llvm.experimental.noalias.scope.decl(metadata !72)
call void @llvm.experimental.noalias.scope.decl(metadata !73)
%_10.i.i.i = sub i64 %self2.i.i.i, %len.i.i.i
%_7.i.i.i = icmp ult i64 %_10.i.i.i, 6
br i1 %_7.i.i.i, label %bb1.i.i.i, label %bb32, !prof !32
bb1.i.i.i: ; preds = %bb12
; invoke alloc::raw_vec::RawVecInner<A>::reserve::do_reserve_and_handle
invoke fastcc void @"_ZN5alloc7raw_vec20RawVecInner$LT$A$GT$7reserve21do_reserve_and_handle17h0f49121d9a52801dE"(ptr noalias noundef nonnull align 8 dereferenceable(16) %str, i64 noundef %len.i.i.i, i64 noundef 6)
to label %.noexc36 unwind label %cleanup2.loopexit
.noexc36: ; preds = %bb1.i.i.i
%len.pre.i.i = load i64, ptr %_40.sroa.5.0.str.sroa_idx, align 8, !alias.scope !64
br label %bb32
bb32: ; preds = %.noexc36, %bb12
%len.i.i = phi i64 [ %len.i.i.i, %bb12 ], [ %len.pre.i.i, %.noexc36 ]
%_9.i.i = icmp sgt i64 %len.i.i, -1
call void @llvm.assume(i1 %_9.i.i)
%_10.i.i = load ptr, ptr %_40.sroa.4.0.str.sroa_idx, align 8, !alias.scope !64, !nonnull !4, !noundef !4
%dst.i.i = getelementptr inbounds i8, ptr %_10.i.i, i64 %len.i.i
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(6) %dst.i.i, ptr noundef nonnull align 1 dereferenceable(6) @alloc_67205f92287559ab7be9a96d10923d6c, i64 6, i1 false), !noalias !64
%49 = add nuw i64 %len.i.i, 6
store i64 %49, ptr %_40.sroa.5.0.str.sroa_idx, align 8, !alias.scope !64
%50 = add nuw nsw i64 %new_length.sroa.0.050, 6
%exitcond.not = icmp eq i32 %47, %n.sroa.0.0
br i1 %exitcond.not, label %bb30, label %bb29
The hottest of the loop is fully inlined into the main function and is a straightforward memory copy from the string literal hello\n
into the string buffer with occasional size expansion through the reverse
call.
I believe that the fact that the push_str
and memcpy
functions are inlined in the Rust version explains the the performance difference.
Benchmarking results
Program | C++ (s)1 | Rust (s)2 | Ratio (x)3 |
---|---|---|---|
ackermann | 0.5849 | 0.6696 | 1.1448 |
ary | 0.0403 | 0.0430 | 1.0679 |
ary2 | 0.0414 | 0.0422 | 1.0194 |
ary3 | 0.1707 | 0.7217 | 4.2281 |
fibo | 1.1382 | 1.2286 | 1.0795 |
hash | 0.2829 | 0.2104 | 0.7437 |
hash2 | 1.1647 | 0.8224 | 0.7061 |
heapsort | 1.5964 | 1.6009 | 1.0028 |
lists | 1.8520 | 1.7653 | 0.9532 |
lists1 | 0.1186 | 0.0862 | 0.7265 |
matrix | 0.9690 | 2.0188 | 2.0834 |
methcall | 3.0316 | 0.4226 | 0.1394 |
moments | 0.0736 | 0.0554 | 0.7534 |
nestedloop | 0.0017 | 0.0015 | 0.8347 |
objinst | 0.0016 | 0.0012 | 0.7930 |
random | 1.2529 | 1.0735 | 0.8568 |
sieve | 0.9437 | 0.9028 | 0.9566 |
strcat | 0.0534 | 0.0248 | 0.4640 |
The outliers are ary3
, matrix
, methcall
and strcat
.
I examined the generated code for those to understand the performance differences.
-
With Clang 20.1.5 on x86_64-pc-linux-gnu, the option
-O2
, over 30 runs, on an AMD Zen 2 CPU. ↩ -
With Rustc 1.86.0 (05f9846f8 2025-03-31), the option
-O
, over 30 runs, on the same machine. ↩ -
Ratios > 1.0 indicates the Rust version was slower than the C++ version. Ratios < 1.0 indicates the Rust version was faster than the C++ version. ↩