Today, there was a Reddit post asking what one needs to know when going after C++ with Rust basics. I thought this was an interesting question to answer in a blog post, and a good reason to revive my blog.
Since I got a C++ job after learning Rust, I thought it would be interesting to summarize how one would adapt to C++ with some prior Rust experience.
I assume the reader already knows C++ syntax and features, and is interested in how concepts from the Rust world map onto C++.
I could not fit everything I wanted to write into this post, so I will focus on Ownership, Borrowing and Lifetimes.
Ownership and Moves
The big feature in Rust is Ownership, which means that non-primitive values (more precisely, values of types that do not implement Copy) are moved by default instead of being copied.
As an example, if we create a String in Rust and pass it to another function, it will be moved into that function and destroyed there.
    fn foo(val: String) {
        // val destroyed here
    }

    fn main() {
        let val = String::from("Hello");
        foo(val);
        // accessing val here is compile-time error
    }
Let’s look at the same code in C++:
    #include <string>

    using std::string;

    void foo(string val) {
        // val is destroyed here
    }

    int main() {
        string val("Hello");
        foo(val);
        // accessing val here is fine, because we passed a copy to function
        // original val is destroyed here
    }
You may be tempted to reduce copying in C++ too.
C++ has the notion of lvalues versus rvalues. In C++, lvalues are copied, while rvalues can be moved, if the type actually implements move operations (and I am glossing over a lot of details here).
The C++ standard library has a function, std::move, that lets us treat any lvalue as an rvalue.
So, we can modify our previous C++ program to behave similarly to the Rust program and avoid the unnecessary copy by wrapping val with std::move:
    #include <string>
    #include <utility>

    using std::string;

    void foo(string val) {
        // val is destroyed here
    }

    int main() {
        string val("Hello");
        foo(std::move(val));
        // warning: accessing val here is NOT fine!
        // original val is also destroyed here, but contains no value so it's fine
    }
Note that std::move does not actually move anything; it is only a cast that changes how the compiler treats the value at this particular place. In this case, the move works because std::string implements move operations.
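To make the "does not actually move anything" point concrete, here is a minimal sketch (the helper name values_after_move_from_const is made up for illustration). It checks at compile time that std::move only produces an rvalue reference, and shows that "moving" from a const string quietly falls back to a copy, since the move constructor cannot accept a const source.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// std::move does no work at run time: applied to an lvalue of type
// std::string it just yields an expression of type std::string&&.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::string&>())),
                 std::string&&>::value,
    "std::move is only a cast to an rvalue reference");

// Hypothetical helper: "moving" from a const string selects the copy
// constructor, so the source keeps its value.
std::string values_after_move_from_const() {
    const std::string original("Hello");
    std::string stolen = std::move(original);  // copy, not move
    return stolen + " / " + original;          // both still hold "Hello"
}
```

Calling values_after_move_from_const() yields "Hello / Hello": nothing was stolen from the const source.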
In C++, it is possible to accidentally use a moved-from value, and the compiler will not stop you. The value is left in a valid but unspecified state; in practice, move operations on standard containers usually leave the source empty.
Therefore, a good practice in C++ is to avoid std::move in cases like this, even if that means an unnecessary deep copy of the value, so that a moved-from value cannot be used by accident.
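If a move out of a local variable cannot be avoided, the one thing that is always safe to do with a moved-from standard string is to give it a new value before reading it again. A small sketch, with the hypothetical function collect_lines:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: one local variable is moved out of twice,
// and is assigned a fresh value in between.
std::vector<std::string> collect_lines() {
    std::vector<std::string> lines;
    std::string line("first");
    lines.push_back(std::move(line));  // line is now valid but unspecified
    // Do not read line here; assign to it first.
    line = "second";                   // line has a known value again
    lines.push_back(std::move(line));
    return lines;
}
```

The result holds "first" and "second", and no string contents were copied along the way.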
If the value is actually costly to copy and should not be copied, it is worth wrapping it in unique_ptr (like Box) or shared_ptr (like Arc), which keeps a single instance of the value on the heap. Relying on move in such a case is very fragile and incurs a maintenance cost to keep the program correct.
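As a rough sketch of why the smart pointer is less fragile (assuming C++14 for std::make_unique; the names consume and moved_from_pointer_is_null are invented here): a moved-from unique_ptr is guaranteed to be null, so a mistaken later use can at least be checked for, which is not true of a moved-from string.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Takes ownership of the heap-allocated string, much like a Rust
// function that takes Box<String>.
std::size_t consume(std::unique_ptr<std::string> text) {
    return text->size();
    // the string is destroyed when text goes out of scope
}

bool moved_from_pointer_is_null() {
    auto text = std::make_unique<std::string>("Hello");
    consume(std::move(text));  // only the pointer moves, the string is not copied
    return text == nullptr;    // a moved-from unique_ptr is guaranteed to be empty
}
```

Since unique_ptr has no copy constructor, forgetting the std::move at the call site is a compile error rather than a silent copy.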
Functions and Methods
Const references
In Rust, you can create a function that immutably borrows a value:
    fn foo(value: &String) {
        println!("value: {}", value);
    }
The Rust compiler will not allow calling methods or operations on this String that modify its contents. In Rust-talk, it will not allow calling methods that mutably borrow the string or need to take ownership of it.
In C++, you can do the same:
    #include <string>
    #include <iostream>

    using std::string;
    using std::cout;
    using std::endl;

    void foo(const string& value) {
        cout << "value: " << value << endl;
    }
The const T& idiom is similar to &T in Rust. The C++ compiler will not allow modifying the contents of an object through a const T& reference. In C++-talk, the compiler will not allow calling non-const methods on the string.
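One difference worth keeping in mind, sketched below with a made-up function name: const T& only restricts what can be done through that particular reference. Unlike a shared borrow in Rust, it does not stop the owner from changing the value while the reference is still in use.

```cpp
#include <cstddef>
#include <string>

// Hypothetical example: a const reference observes changes made
// through the original, non-const name.
std::size_t length_seen_through_const_ref() {
    std::string owner("abc");
    const std::string& view = owner;  // read-only view, similar to &String
    owner += "def";                   // allowed in C++; Rust would reject this
                                      // because view is used afterwards
    return view.size();               // the view sees the modified string
}
```

The function returns 6, not 3: the const reference is a read-only window, not a guarantee that the value stays unchanged.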
Const methods
Let’s say we have a structure Person in Rust, and use it as a parameter for the function print_full_name:
    struct Person {
        first_name: String,
        last_name: String,
    }

    fn print_full_name(person: &Person) {
        println!("{} {}", person.first_name, person.last_name);
    }
This function could be made into a method on Person:
    struct Person {
        first_name: String,
        last_name: String,
    }

    impl Person {
        pub fn print_full_name(&self) {
            println!("{} {}", self.first_name, self.last_name);
        }
    }
Note that print_full_name can only access self immutably, through the &self reference.
In C++, this is achieved with the const modifier on the method:
    #include <string>
    #include <iostream>

    class Person {
    private:
        std::string first_name;
        std::string last_name;

    public:
        void print_full_name() const {
            std::cout << first_name << " " << last_name << std::endl;
        }
    };
In Rust, we would be able to use the print_full_name method in places where Person can be borrowed immutably.
    fn foo(person: &Person) {
        person.print_full_name();
    }
In C++, we will be able to use print_full_name in places where Person can be const.
    void foo(const Person& person) {
        person.print_full_name();
    }
Methods that Mutably Borrow in C++
In Rust, methods that modify the value must take a &mut reference. For example, a method implemented on Person:
    struct Person {
        first_name: String,
        last_name: String,
    }

    impl Person {
        pub fn clear_name(&mut self) {
            self.first_name.clear();
            self.last_name.clear();
        }
    }
Or a standalone function:
    fn foo(person: &mut Person) {
        person.clear_name(); // "clear_name" mutably re-borrows Person
    }
In C++, this is simply any method without the const qualifier:
    #include <string>

    class Person {
    private:
        std::string first_name;
        std::string last_name;

    public:
        void clear_name() {
            first_name.clear();
            last_name.clear();
        }
    };
And any function that takes a non-const reference:
    void foo(Person& person) {
        person.clear_name();
    }
Methods that Take Ownership in C++
As discussed previously, taking ownership explicitly with std::move is possible in C++, but it is considered a bad practice, and you should leave moves up to the compiler.
However, there are a few cases where a manual std::move might be OK. One of them is a setter function.
Consider a Rust method that changes the name:
    struct Person {
        name: String,
    }

    impl Person {
        pub fn set_name(&mut self, name: String) {
            self.name = name;
        }
    }
We can call it in some function foo that has ownership of the name:
    fn foo(person: &mut Person, name: String) {
        person.set_name(name); // name is moved; keeping it would require an explicit clone
    }
In Rust, set_name will take ownership of name by default. However, in C++ it would copy by default.
The same method in C++:
    #include <string>
    #include <utility>

    class Person {
    private:
        std::string name;

    public:
        void set_name(std::string name) {
            this->name = std::move(name); // we can safely move
        }
    };
We can safely move inside the setter, because the parameter is already our own copy. However, we did not avoid the copy at the call site:
    void foo(Person& person, std::string name) {
        person.set_name(name); // copy
    }
We can use std::move here:
    void foo(Person& person, std::string name) {
        person.set_name(std::move(name)); // move
    }
However, the caller of foo must do the same to ensure the move, and this cycle continues.
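To see what the by-value setter actually costs, here is a small sketch using a hypothetical Tracked type that counts its copies. Tracked, Holder and the two helper functions are not part of the code above; they exist only for this measurement.

```cpp
#include <utility>

// Hypothetical instrumented type that counts copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++moves; return *this; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as the setter above: take by value, then move into the field.
class Holder {
    Tracked value;

public:
    void set(Tracked v) { value = std::move(v); }
};

int copies_when_passing_lvalue() {
    Tracked::copies = 0;
    Holder holder;
    Tracked t;
    holder.set(t);             // the parameter is copy-constructed from t
    return Tracked::copies;
}

int copies_when_passing_rvalue() {
    Tracked::copies = 0;
    Holder holder;
    Tracked t;
    holder.set(std::move(t));  // the parameter is move-constructed from t
    return Tracked::copies;
}
```

Passing an lvalue costs exactly one copy (plus a cheap move inside the setter), while passing std::move(t) costs no copies at all.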
One thing to watch out for when using std::move is mutable references! Let’s say we had a mutable reference in function foo, and moved the value:
    void foo(Person& person, std::string& name) {
        person.set_name(std::move(name)); // move clears the original name
    }
Now the caller of foo will suddenly find the name gone.
In this particular case, the better practice is to use a const T& reference all the way down to the setter. This creates exactly one copy of name, inside the setter, with minimal overhead.
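A minimal sketch of that approach follows. The accessor get_name and the function rename_person are hypothetical names added for illustration.

```cpp
#include <string>

class Person {
private:
    std::string name;

public:
    // The only copy happens here, inside the setter.
    void set_name(const std::string& new_name) { name = new_name; }

    // Hypothetical accessor so callers can read the stored name.
    const std::string& get_name() const { return name; }
};

// Every layer above the setter just passes the reference along.
void rename_person(Person& person, const std::string& new_name) {
    person.set_name(new_name);
}
```

After rename_person(p, n) the caller's string n is untouched, so there is no moved-from value to trip over.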
However, if the name were a very big string, e.g. something like file contents, and it were necessary to ensure no copies for performance reasons, unique_ptr or shared_ptr would come to the rescue:
    #include <string>
    #include <memory>

    class Person {
    private:
        std::shared_ptr<std::string> personal_page;

    public:
        void set_personal_page(const std::shared_ptr<std::string>& personal_page) {
            this->personal_page = personal_page; // note that we copy here
        }
    };
Note that we leave the copy in, but what we copy now is only a shared_ptr (the counterpart of Arc), which points to the same memory contents.
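The sketch below checks that claim. It repeats the class with one hypothetical accessor, get_personal_page, added only so the result can be inspected; page_is_shared is likewise a made-up helper.

```cpp
#include <memory>
#include <string>

class Person {
private:
    std::shared_ptr<std::string> personal_page;

public:
    void set_personal_page(const std::shared_ptr<std::string>& page) {
        personal_page = page;  // copies the pointer, not the string
    }

    // Hypothetical accessor, present only for this example.
    const std::shared_ptr<std::string>& get_personal_page() const {
        return personal_page;
    }
};

// True when the caller and the Person share one string instance.
bool page_is_shared() {
    auto page = std::make_shared<std::string>("<html>...</html>");
    Person person;
    person.set_personal_page(page);
    return person.get_personal_page().get() == page.get()  // same heap object
        && page.use_count() == 2;                           // two owners, one string
}
```

Both pointers refer to the same heap string, and the reference count of 2 confirms that only the pointer was duplicated.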
Lifetimes
One idiomatic thing in Rust is exposing a value’s contents for external mutation. Mutable iterators in Rust are built on this concept, as are many standard library functions.
For example, we may add methods to Person that allow someone else to change the first and the last names:
    #[derive(Debug)]
    struct Person {
        first_name: String,
        last_name: String,
    }

    impl Person {
        pub fn get_first_name_mut(&mut self) -> &mut String {
            &mut self.first_name
        }

        pub fn get_last_name_mut(&mut self) -> &mut String {
            &mut self.last_name
        }
    }
Then we can have a function that appends “foo” to a string reference:
    fn append_foo(value: &mut String) {
        value.push_str(" foo");
    }
Then we can write some code that allows some external function to modify the contents of a String inside the Person:
    fn main() {
        let mut p = Person {
            first_name: String::from("John"),
            last_name: String::from("Smith"),
        };
        append_foo(p.get_first_name_mut());
        append_foo(p.get_last_name_mut());
        println!("{:?}", p);
        // output:
        // Person { first_name: "John foo", last_name: "Smith foo" }
    }
As you may know, the Rust compiler understands lifetime elision. That means you usually do not need to annotate any references with lifetimes, but they are still there.
For example, the impl of Person has these lifetime annotations when they are written out:
    impl Person {
        pub fn get_first_name_mut<'a>(&'a mut self) -> &'a mut String {
            &mut self.first_name
        }
    }
References are basically pointers. The lifetime syntax &'a mut communicates to the compiler that the returned reference must point into the same memory as the function argument, or a narrower part of it, and must not outlive the borrow 'a of that argument.
If we tried to return a reference to a value that does not live as long as 'a, the compiler would complain:
    impl Person {
        pub fn get_first_name_mut<'a>(&'a mut self) -> &'a mut String {
            &mut String::from("Other") // error: borrowed value does not live long enough
            //   ^^^^^^^^^^^^^^^^^^^^^ temporary value created here
        }
    }
Therefore, at the call site, the compiler knows that the Person is borrowed for every call to append_foo and would not allow us to do anything funky:
    fn main() {
        let mut p = Person {
            first_name: String::from("John"),
            last_name: String::from("Smith"),
        };
        {
            let name: &mut String = p.get_first_name_mut();
            p.first_name = String::from("Crash");
            // error: cannot assign to `p.first_name` because it is borrowed
            append_foo(name);
        }
    }
C++, however, has no machinery to understand where pointers or references point to, and does not help here. However, we can still implement the same thing in C++.
First, the Person:
    #include <string>
    #include <utility>

    class Person {
    public:
        std::string first_name;
        std::string last_name;

        Person(std::string first_name, std::string last_name)
            : first_name(std::move(first_name)), last_name(std::move(last_name)) {}

        std::string& get_first_name_mut() {
            return this->first_name;
        }

        std::string& get_last_name_mut() {
            return this->last_name;
        }
    };
Similar to setters, we used the std::move trick in the constructor to avoid copies. This is a usual practice in C++.
Then we create append_foo, which is nothing surprising:
    void append_foo(std::string& value) {
        value += " foo";
    }
And finally, the main function:
    #include <iostream>

    int main() {
        Person p("John", "Smith");
        append_foo(p.get_first_name_mut());
        append_foo(p.get_last_name_mut());
        std::cout << "first name: " << p.first_name << std::endl;
        std::cout << "last name: " << p.last_name << std::endl;
        // output:
        // first name: John foo
        // last name: Smith foo
    }
However, the C++ compiler is not able to track lifetimes and ensure memory safety.
This is a problem when you get used to these things being verified by the compiler.
The objects we have just written might become more complex, and it would become much harder to track runaway modifications to Person:
    int main() {
        Person p("John", "Smith");
        std::string& name = p.get_first_name_mut();
        p = Person("Crash", "Bob");
        append_foo(name);
        std::cout << "first name: " << p.first_name << std::endl;
        std::cout << "last name: " << p.last_name << std::endl;
        // Output:
        // first name: Crash foo
        // last name: Bob
    }
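It worked, even though we have overwritten the contents of Person. In this exact form the code is well-defined, because the assignment reuses the same p object and name still refers to p.first_name, but we have silently modified the new person instead of the old one. It stops working as soon as the object itself is replaced, for example when another developer wraps Person in shared_ptr: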
    int main() {
        auto p = std::make_shared<Person>("John", "Smith");
        std::string& name = p->get_first_name_mut();
        p = std::make_shared<Person>("Crash", "Bob"); // the old Person is freed here
        append_foo(name); // undefined behavior: name is a dangling reference
        std::cout << "first name: " << p->first_name << std::endl;
        std::cout << "last name: " << p->last_name << std::endl;
        // Output:
        // first name: Crash
        // last name: Bob
    }
Now we have modified freed memory. This is undefined behavior: it appeared to work here, but may not work if something else has been written to that memory location in the meantime.
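If a reference into a shared_ptr-managed object really has to survive such a reassignment, one option (a sketch, not the only fix) is to hold a second shared_ptr for as long as the reference is in use, much like cloning an Arc. The Person below is a trimmed-down stand-in for the class above, and first_names_after_reassign is a made-up name.

```cpp
#include <memory>
#include <string>
#include <utility>

// Trimmed-down stand-in for the Person class from this section.
struct Person {
    std::string first_name;
    std::string last_name;

    Person(std::string first, std::string last)
        : first_name(std::move(first)), last_name(std::move(last)) {}
};

std::string first_names_after_reassign() {
    auto p = std::make_shared<Person>("John", "Smith");
    std::shared_ptr<Person> keep_alive = p;        // second owner of the old Person
    std::string& name = keep_alive->first_name;
    p = std::make_shared<Person>("Crash", "Bob");  // the old Person is NOT freed
    name += " foo";                                // fine: keep_alive still owns it
    return keep_alive->first_name + " / " + p->first_name;
}
```

The function returns "John foo / Crash": the old Person stays alive and receives the modification, while p points to the new one. Nothing enforces that keep_alive outlives name, so this still relies on discipline rather than on the compiler.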
The better practice in C++ is to avoid methods that return mutable references. Instead, we could access the fields directly (but trade away privacy):
    int main() {
        Person p("John", "Smith");
        append_foo(p.first_name);
        append_foo(p.last_name);
    }
Or create the additional copy, which is not really a big deal:
    std::string append_foo(const std::string& value) {
        // set capacity and avoid multiple allocations
        std::string ret;
        ret.reserve(value.size() + 4);
        ret += value;
        ret += " foo";
        return ret;
    }

    int main() {
        Person p("John", "Smith");
        p.first_name = append_foo(p.first_name);
        p.last_name = append_foo(p.last_name);
    }
Conclusion
The big hurdle when moving back to C++ from Rust was the missing move-by-default feature. This required learning other idiomatic patterns in C++ land, and in some cases admitting that not all the code needs to be both efficient and easy to maintain.
In most cases maintainability wins, and avoiding “premature optimization” is very much a necessity in C++.