
REDUCING C++: && AND STD::MOVE

19 August 2019 | C++

C++ is one of the most widely used languages in finance today, mainly because of its performance. It was first standardized by the International Organization for Standardization (ISO) in 1998, and a minor bug-fix revision followed in 2003. The C++11 revision, published in September 2011, was a real update: it brought new tools that improve performance and make developers' lives easier. In particular, C++11 introduced a new way of managing temporary objects, whose impact on performance is often underestimated.

LIMITS OF C++98/03

Where do temporary objects appear? Here are two simple examples that illustrate the subject.

#include <iostream>
#include <string>

std::string retrieveCustomerId(const std::string& nom, const std::string& prenom)
{
  std::string id;
  /*
     processing to look up the ID (database call, etc.)
  */
  id = "ID1234";
  return id;
}
int main()
{
  std::string id = retrieveCustomerId("Doe", "John");
  std::cout << id << std::endl;
  return 0;
}
Figure 1: Using a variable and a function returning by value

In the example shown above, we have a function that takes a customer's last name and first name as parameters and returns the corresponding ID by value.

What problem occurs with this implementation?

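When retrieveCustomerId is used this way, two copies can theoretically occur in C++98/03: the first when the local variable id is copied into the function's return value, and the second when that return value is used to initialize the id variable in main. In practice, compilers usually remove the first copy, and often the second as well, through return value optimization, but the standard did not require them to do so.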
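To make these possible copies visible, here is a minimal sketch. CopyCounter, makeNamed and countCopiesOnReturn are hypothetical names introduced only for this illustration; the class declares only a copy constructor, as a C++98/03 class would. The number of copies actually performed depends on the compiler and its settings, so the sketch only checks an upper bound.

```cpp
#include <cassert>

// Hypothetical helper type: counts how many times it is copy-constructed.
struct CopyCounter
{
  static int copies;
  CopyCounter() {}
  CopyCounter(const CopyCounter&) { ++copies; }
};
int CopyCounter::copies = 0;

// Returns a named local object by value, as retrieveCustomerId does.
CopyCounter makeNamed()
{
  CopyCounter local;
  return local;  // possible copy #1: local -> returned temporary
}

// Builds an object from the function result and reports the observed copies.
int countCopiesOnReturn()
{
  CopyCounter::copies = 0;
  CopyCounter result = makeNamed();  // possible copy #2: temporary -> result
  (void)result;
  assert(CopyCounter::copies <= 2);  // 0, 1 or 2 depending on copy elision
  return CopyCounter::copies;
}
```

Calling countCopiesOnReturn() from a main and printing its result shows how many of the two theoretical copies a given compiler really performs.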

Let’s take a look at a second example:

#include <iostream>
#include <string>
#include <vector>

class Basket
{
public:
  Basket() {}
  Basket(const Basket& secondBasket) : objectIds_( secondBasket.objectIds_)
  {
    std::cout << "Copy Constructor" << std::endl;
  }
  void addObjectId(const std::string& objectId) { objectIds_.push_back(objectId); }
  void toString()
  {
    for(size_t i=0; i < objectIds_.size(); ++i)
    {
      std::cout << objectIds_[i] << std::endl;
    }
  }
  std::vector<std::string> objectIds_;
};
int main()
{
  Basket myOriginalBasket;
  myOriginalBasket.addObjectId("1");
  myOriginalBasket.addObjectId("2");
  myOriginalBasket.addObjectId("3");
  myOriginalBasket.toString();
  
  std::cout << "###########################" << std::endl;
  
  Basket myNewBasket(myOriginalBasket);
  myNewBasket.toString();
  
  return 0;
}
Figure 2: Using a copy constructor

In this example, we have a Basket class that contains a vector of object IDs.

As a first step, a first instance of the Basket class, myOriginalBasket, is created and IDs are added to it. Then, myNewBasket is instantiated from myOriginalBasket. The copy constructor is called and all the elements of the myOriginalBasket vector are copied into the vector of myNewBasket.

In this simplistic case, myOriginalBasket plays the role of a temporary object: it only serves to build our working object by copy. The two examples above show that using temporary objects can result in additional copy operations. For such small examples, this causes no noticeable performance problem, but in critical applications that handle large amounts of data, it can quickly become one.

C++11: RVALUE REFERENCES (&&)

Before going further, let's answer one question: what is an rvalue? The notion of rvalue (“right-hand value”) complements the notion of lvalue (“left-hand value”).

An lvalue designates a named element whose memory address can be accessed through its name. An rvalue is a temporary, unnamed value that exists only during the evaluation of an expression. To make this clearer, let’s look at the example below:

int x = 2 + 3;

We have a variable “x” that we initialize with the value of “2 + 3”.

“x” is a named variable that can be accessed in the code that follows, so it is an lvalue. By contrast, “2 + 3” is temporary and only exists during the initialization, so it is an rvalue. Historically, an lvalue could appear on the left of the ‘=’ operator, while an rvalue could only appear on its right; hence their respective names. A reference to an lvalue is declared with “&”. A reference to an rvalue is declared with “&&”.
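The distinction can be observed directly with a pair of overloads, one taking “&” and one taking “&&”. This is only a small sketch; the function name category is a hypothetical helper that is not part of the article's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload pair: the compiler picks one based on the value category.
std::string category(int&)  { return "lvalue"; }
std::string category(int&&) { return "rvalue"; }

void valueCategoryExample()
{
  int x = 2 + 3;
  assert(category(x) == "lvalue");      // x is a named variable
  assert(category(2 + 3) == "rvalue");  // the result of 2 + 3 is a temporary
  assert(category(x + 1) == "rvalue");  // so is the result of any arithmetic expression
}
```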

What is the purpose of this new type of reference? Quite simply, it lets us bind a reference, const or not, to a temporary value; before C++11, only a const reference could do so. This naturally leads to the question: how does this apply in practice?
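As a hedged illustration of this binding rule, the sketch below binds the same kind of temporary first to a const reference, then to an rvalue reference. makeGreeting and the two helper functions are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical factory returning a temporary string.
std::string makeGreeting() { return "hello"; }

// A const reference can bind to the temporary, but only for reading.
std::size_t lengthViaConstRef()
{
  const std::string& r = makeGreeting();
  return r.size();
}

// An rvalue reference binds to the temporary and also allows modifying it.
// (A plain "std::string& r = makeGreeting();" would not compile.)
std::string extendViaRvalueRef()
{
  std::string&& r = makeGreeting();
  r += " world";
  return r;
}

void bindingExample()
{
  assert(lengthViaConstRef() == 5);
  assert(extendViaRvalueRef() == "hello world");
}
```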

Let’s go back to the first example. The problem we noticed was that using retrieveCustomerId in main could cause copies during initialization. With what we have just seen, we can rewrite the code differently:

#include <iostream>
#include <string>

std::string retrieveCustomerId(const std::string& nom, const std::string& prenom)
{
  std::string id;
  /*
     processing to look up the ID (database call, etc.)
  */
  id = "ID1234";
  return id;
}
int main()
{
  std::string&& id = retrieveCustomerId("Doe", "John");
  std::cout << id << std::endl;
  return 0;
}
Figure 3: Using an rvalue reference

Because retrieveCustomerId("Doe", "John") is an rvalue, we can use && to create a reference to it. As with a const reference bound to a temporary, the string returned by our function is not destroyed immediately: its lifetime is extended to that of the reference, and it remains accessible, without copy, through the reference named id. In addition, like standard references, rvalue references can be used as function parameters, which opens up many applications.
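Here is a minimal sketch of an rvalue reference used as a function parameter. The storeIds function is hypothetical and only serves the illustration; it relies on std::move, which the next section of the article introduces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink function: it only accepts rvalues, so the caller must
// pass a temporary or explicitly give up its object with std::move.
std::size_t storeIds(std::vector<std::string>&& ids)
{
  // Inside the function, 'ids' has a name and is therefore an lvalue;
  // std::move is needed again to actually transfer its contents.
  std::vector<std::string> stored(std::move(ids));
  return stored.size();
}

void storeIdsExample()
{
  // A temporary binds directly to the && parameter.
  assert(storeIds(std::vector<std::string>(2, "ID")) == 2);

  // A named vector must be passed through std::move;
  // storeIds(named) alone would not compile.
  std::vector<std::string> named(3, "ID");
  assert(storeIds(std::move(named)) == 3);
}
```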

MOVE CONSTRUCTION AND STD::MOVE

One of the major applications of rvalue references is the move constructor. As its name suggests, its goal is to transfer elements from one object to another. Let’s take the second example, with the copy constructor, and adapt it step by step to apply this new concept. First, let’s add a move constructor to the Basket class.

Basket(Basket&& secondBasket) : objectIds_(secondBasket.objectIds_)
{
  std::cout << "Move Constructor" << std::endl;
}
Figure 4: First attempt at writing a move constructor

At first, we simply use the rvalue reference seen previously. After execution, the output is exactly the same as before: the copy constructor is still called. To understand why, we have to look at the main.

int main()
{
  Basket myOriginalBasket;
  myOriginalBasket.addObjectId("1");
  myOriginalBasket.addObjectId("2");
  myOriginalBasket.addObjectId("3");
  myOriginalBasket.toString();
  
  std::cout << "###########################" << std::endl;
  
  Basket myNewBasket(myOriginalBasket);
  myNewBasket.toString();
  
  return 0;
}

As you can see, the argument passed to the myNewBasket constructor has not changed: myOriginalBasket is a named object, hence an lvalue, so the copy constructor is still the one called.

We are now faced with a new problem: how do we make sure that the move constructor we have just written is called? For this, we can use the std::move function, introduced with C++11. This function does not move anything by itself: it takes an object as a parameter and returns an rvalue reference to it, which allows the move constructor to be selected.

Let’s now fix our example by adding calls to std::move.
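The following sketch, written for this explanation and not taken from the article's code, checks that std::move is only a cast: the object is left untouched until a move constructor (here the one of std::string) actually consumes the rvalue reference.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

void moveIsOnlyACast()
{
  std::string a = "ID1234";

  // std::move(a) has type std::string&&; nothing has been moved yet.
  static_assert(std::is_same<decltype(std::move(a)), std::string&&>::value,
                "std::move yields an rvalue reference");

  std::string&& ref = std::move(a);
  assert(a == "ID1234");  // 'a' is untouched: binding a reference moves nothing
  assert(&ref == &a);     // 'ref' simply refers to 'a'

  // The actual transfer happens here, in std::string's move constructor.
  std::string b = std::move(a);
  assert(b == "ID1234");
  // 'a' is now in a valid but unspecified state: do not rely on its value.
}
```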

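#include <iostream>
#include <string>
#include <utility>  // std::move
#include <vector>

class Basket
{
public:
  Basket() {}
  // Move constructor: std::move is required because secondBasket, being named, is an lvalue here
  Basket(Basket&& secondBasket) : objectIds_(std::move(secondBasket.objectIds_))
  {
    std::cout << "Move Constructor" << std::endl;
  }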
  
  Basket(const Basket& secondBasket) : objectIds_( secondBasket.objectIds_)
  {
    std::cout << "Copy Constructor" << std::endl;
  }
  void addObjectId(const std::string& objectId) { objectIds_.push_back(objectId); }
  void toString()
  {
    for(size_t i=0; i < objectIds_.size(); ++i)
    {
      std::cout << objectIds_[i] << std::endl;
    }
  }
  std::vector<std::string> objectIds_;
};
int main()
{
  Basket myOriginalBasket;
  myOriginalBasket.addObjectId("1");
  myOriginalBasket.addObjectId("2");
  myOriginalBasket.addObjectId("3");
  myOriginalBasket.toString();
  
  std::cout << "###########################" << std::endl;
  
  Basket myNewBasket(std::move(myOriginalBasket));
  myNewBasket.toString();
  
  return 0;
}


We can see that the move constructor is now called. Note that we also used std::move to initialize the objectIds_ attribute: inside the constructor, secondBasket has a name and is therefore an lvalue, so without std::move the vector would be copied. This works because move semantics were added to the STL in the 2011 revision.
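To see what moving a std::vector means in practice, the sketch below compares the address of the element storage before and after the move. It is an illustration written for this article, and the check on data() reflects the behaviour of mainstream standard library implementations.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

void vectorMoveKeepsTheSameBuffer()
{
  std::vector<std::string> source(3, "ID123");
  const std::string* elementsBefore = source.data();

  // Move construction: the new vector takes over the storage of 'source'.
  std::vector<std::string> target(std::move(source));

  assert(target.size() == 3);
  assert(target.data() == elementsBefore);  // same storage, no string was copied
}
```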

PERFORMANCE COMPARISON

We have seen that the C++11 standard introduced a new way to handle temporary objects while avoiding unnecessary copying. But when applied, can we see a real difference?

Let’s take our Basket class and compare the performance of its copy constructor with that of the move constructor we wrote. To do this, we create a Basket object containing a vector of 1,000,000 elements. Then, for each case, we print the processing time in seconds.

#include <chrono>

// Reuses the Basket class (with its copy and move constructors) defined above
int main()
{
  Basket myOriginalBasket;
  for (size_t i = 0; i < 1000000; ++i)
  {
    myOriginalBasket.addObjectId("ID123");
  }
  {
    auto start = std::chrono::high_resolution_clock::now();
    Basket myNewBasket(myOriginalBasket);
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "First duration: " << elapsed.count() << " s\n";
  }
  {
    auto start = std::chrono::high_resolution_clock::now();
    Basket myNewBasket(std::move(myOriginalBasket));
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    std::cout << "Second duration: " << elapsed.count() << " s\n";
  }
  return 0;
}


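In this run, the move constructor is about 1000 times faster than the copy constructor. This is expected: the copy duplicates each of the 1,000,000 strings, while the move only transfers the vector's internal buffer.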
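The two timed blocks above repeat the same measuring code. As a possible refactoring, here is a sketch of a small helper; secondsTaken is a hypothetical name, and it uses std::chrono::steady_clock instead of the high_resolution_clock of the listing, a substitution made here because steady_clock is guaranteed to be monotonic. A single run remains a rough measurement.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs 'work' once and returns the elapsed time in seconds.
template <typename Work>
double secondsTaken(Work work)
{
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}

void secondsTakenExample()
{
  const double elapsed = secondsTaken([] { /* code to measure goes here */ });
  assert(elapsed >= 0.0);  // steady_clock never goes backwards
}
```

With it, a measurement could be written as secondsTaken([&] { Basket myNewBasket(myOriginalBasket); }).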

CASE STUDY: DEFINING A CUSTOMIZED STRING CLASS

We will now apply what we have seen by defining a custom string class that we will call MyString. Let’s start by writing the outline:

#include <cstdio>
#include <cstring>
#include <iostream>

class MyString
{
public:
  MyString(const char* init, size_t size)
  {
    buffer_ = new char[size + 1];     // +1 for the end-of-string character
    size_ = size;
    memcpy(buffer_, init, size + 1);  // init must hold size characters plus '\0'
    printf("MyString custom constructor this= %p     buffer_= %p   value= %s\n", this, buffer_, buffer_);
  }
  ~MyString()
  {
    printf("MyString Destructor this= %p     buffer_= %p\n", this, buffer_);
    if(buffer_)
      delete[] buffer_;
  }
  
  void print()
  {
    std::cout << "buffer_=" << buffer_ << " size_=" << size_ << std::endl;
  }
  char* buffer_;
  size_t size_;
};
int main()
{
  MyString s("abcd", 4);
  MyString s2("efghijklmnop", 12);
  s = s2;
  
  s.print();
  s2.print();
  return 0;
}

So we created a MyString class with the following attributes:

  • buffer_: a char* pointing to the dynamically allocated array that holds our characters
  • size_: a size_t holding the size of our string

We have also added a constructor that initializes our buffer from a character string and its size. When allocating memory, remember to add 1 to account for the end-of-string character. In addition, we add a destructor to deallocate our buffer when our instances are destroyed. Note that we are deallocating an array, so do not forget the []. Finally, we define a print method to display our buffer and its size. Let’s execute the code and observe the output:

At first glance, we get an error message reporting an invalid pointer. Let’s take a closer look at what is going on. First, we built our two MyString instances, s and s2, with the constructor we defined. The output shows the construction of these two strings and the memory addresses of their buffers.

Then we assign s2 to s. When we display our two objects with the print method, the assignment seems to have worked well, since both buffers have the same value and the same size. Nevertheless, we can see during the destruction of our two objects that the two pointers actually point to the same address, so the same buffer is freed twice. The real problem is that we used the assignment operator without explicitly defining it. In this case, the compiler generates a default assignment operator that does not copy the contents of one character string into the other but only copies the pointer.
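The pointer sharing can be reproduced safely with a stripped-down type. ShallowString below is a hypothetical structure written for this illustration: it has no destructor, so the buffer can be released exactly once by hand and the example does not crash.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stripped-down type: a raw pointer and NO user-defined copy operations.
struct ShallowString
{
  char* buffer_;
};

void shallowCopyExample()
{
  ShallowString a = { new char[5] };
  std::memcpy(a.buffer_, "abcd", 5);

  ShallowString b = { nullptr };
  b = a;                           // compiler-generated assignment: copies the pointer only
  assert(a.buffer_ == b.buffer_);  // both objects now share one allocation

  b.buffer_[0] = 'X';
  assert(a.buffer_[0] == 'X');     // a change made through b is visible through a

  delete[] a.buffer_;              // released exactly once; a destructor in each
                                   // object would free the same buffer twice
}
```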

In general, to avoid such problems, it is good practice to explicitly define the assignment operator and the copy constructor of any class that manages a resource itself. If they are not used, or if we want to forbid them, we can declare them private in the class interface or, since C++11, mark them as = delete.
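As an illustration of forbidding copies with the C++11 syntax, here is a minimal sketch. UniqueBuffer is a hypothetical class, unrelated to MyString, and the static_assert lines only confirm at compile time that copying is rejected.

```cpp
#include <type_traits>

// Hypothetical class that owns a resource and must never be copied.
class UniqueBuffer
{
public:
  UniqueBuffer() : data_(new char[16]) {}
  ~UniqueBuffer() { delete[] data_; }

  UniqueBuffer(const UniqueBuffer&) = delete;             // no copy construction
  UniqueBuffer& operator=(const UniqueBuffer&) = delete;  // no copy assignment

private:
  char* data_;
};

static_assert(!std::is_copy_constructible<UniqueBuffer>::value,
              "copy construction is rejected at compile time");
static_assert(!std::is_copy_assignable<UniqueBuffer>::value,
              "copy assignment is rejected at compile time");
```

Compared with private declarations, a deleted function produces an error at the point of use even inside the class itself.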

So let’s add our copy constructor and our assignment operator to our MyString class:

MyString(const MyString& other)
{
  size_ = other.size_;
  buffer_ = new char[size_ + 1];
  memcpy(buffer_, other.buffer_, size_ + 1);
  printf("MyString copy constructor  this= %p     buffer_= %p   value= %s\n", this, buffer_, buffer_);
}
MyString& operator=(const MyString& other)
{
  size_ = other.size_;
  char* newBuffer = new char[size_ + 1];
  memcpy(newBuffer, other.buffer_, size_ + 1);
  delete[] buffer_;
  buffer_ = newBuffer;
  printf("MyString assignment operator  this= %p  other= %p   buffer_= %p   other.buffer_= %p\n", this, &other, buffer_, other.buffer_);
  return *this;
}

Let’s also modify our main to illustrate these new features:

int main()
{
  MyString s("abcd", 4);
  MyString s2("efghijklmnop", 12);
  MyString s3(s);
  s = s2;
  
  s.print();
  s2.print();
  s3.print();
  return 0;
}

After execution, we can see that this time the buffer_ pointers are all different for our 3 instances of MyString. Now let’s add the + operator:

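MyString operator+(const MyString& other) const
{
  size_t size = size_ + other.size_;
  char* newBuffer = new char[size + 1];
  memcpy(newBuffer, buffer_, size_);                          // left operand, without its '\0'
  memcpy(newBuffer + size_, other.buffer_, other.size_ + 1);  // right operand, with its '\0'
  MyString newString(newBuffer, size);                        // copies newBuffer into its own buffer
  delete[] newBuffer;                                         // the working buffer is no longer needed
  return newString;
}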

Let’s also modify the main and adapt the displays, then execute.

int main()
{
  std::cout << "Start of program" << std::endl;
  MyString s("abcd", 4);
  MyString s2("efghijklmnop", 12);
  MyString s3(s);
  std::cout << "Start using operator +" << std::endl;
  s2 = s + s3;
  
  std::cout << "End of program" << std::endl;
  return 0;
}


We notice that when using our + operator:

  • The custom constructor is called for the instantiation of newString: buffer_ address = 02D4D8D0
  • The copy constructor is called: newString is copied to a temporary object when it is returned: buffer_ address = 02D4DBE0
  • newString is deallocated since we leave the body of the method: buffer_ address = 02D4D8D0
  • The previously created temporary object is used for the assignment to s2: address of the temporary buffer_ used = 02D4DBE0
  • The temporary object is deallocated: buffer_ address = 02D4DBE0

Now, let’s define the move constructor seen above:

MyString(MyString&& other) : buffer_(other.buffer_), size_(other.size_)
{
  // Take over the buffer and leave 'other' empty so that its destructor frees nothing
  other.buffer_ = nullptr;
  other.size_ = 0;
  printf("MyString move constructor this= %p     buffer_= %p   value= %s\n", this, buffer_, buffer_);
}
MyString& operator=(MyString&& other)
{
  if (buffer_ != other.buffer_)
  {
    delete[] buffer_;  // release the data that is being replaced
    buffer_ = other.buffer_;
    size_ = other.size_;
    other.size_ = 0;
    other.buffer_ = nullptr;
  }
  printf("MyString move operator  this= %p  other= %p   buffer_= %p   buffer_value= %s\n", this, &other, buffer_, buffer_);
  return *this;
}

We have also defined the move assignment operator, which takes an rvalue reference as a parameter. It follows the same logic as the move constructor, its purpose being to move the elements of an object into another, already created one. So we must not forget to release the data that we are replacing.

Now let’s execute our main, which we left unchanged. We can see that the move constructor and the move assignment operator are now selected instead of their copy counterparts, the value ‘s + s3’ being an rvalue. As the implementations above show, unlike the copy constructor and the copy assignment operator, they only move the pointers from one object to the other, without calling memcpy.

This is also confirmed by the traces that are displayed:

  • The custom constructor is called for the instantiation of newString: buffer_ address = 02EDDE38
  • The move constructor is called: newString is transferred, without copy, to a temporary object when it is returned: buffer_ address = 02EDDE38
  • newString is deallocated since we leave the body of the method: buffer_ address = 00000000
  • The previously created temporary object is used when it is moved into s2: address of the temporary buffer_ used = 02EDDE38
  • The temporary object is deallocated: buffer_ address = 00000000
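The selection between copy and move operations described above can also be checked in isolation. The sketch below uses a hypothetical Tracker type, not part of MyString, that simply records which special member function ran last.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that records which special member function ran last.
struct Tracker
{
  std::string lastOperation;

  Tracker() : lastOperation("default constructor") {}
  Tracker(const Tracker&) : lastOperation("copy constructor") {}
  Tracker(Tracker&&) : lastOperation("move constructor") {}
  Tracker& operator=(const Tracker&) { lastOperation = "copy assignment"; return *this; }
  Tracker& operator=(Tracker&&) { lastOperation = "move assignment"; return *this; }
};

Tracker makeTracker() { return Tracker(); }

void overloadSelectionExample()
{
  Tracker a;
  Tracker b;

  b = makeTracker();  // the function result is an rvalue, like 's + s3'
  assert(b.lastOperation == "move assignment");

  b = a;              // 'a' is an lvalue
  assert(b.lastOperation == "copy assignment");

  b = std::move(a);   // an lvalue explicitly cast to an rvalue
  assert(b.lastOperation == "move assignment");

  Tracker c(b);       // lvalue argument
  assert(c.lastOperation == "copy constructor");

  Tracker d(std::move(b));  // rvalue argument
  assert(d.lastOperation == "move constructor");
}
```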

Consequently, by implementing the move constructor and the move assignment operator, we avoided two unnecessary copies of our buffer in this example. Here, the buffers were small and our program quite simple. However, one can easily imagine the large number of unnecessary copies if we had worked with much larger character strings and many more operations.
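One related point goes slightly beyond what the article covers: when a std::vector grows, it only moves its existing elements if their move constructor is declared noexcept; otherwise it falls back to copying them to keep its exception guarantees. The sketch below illustrates this with two hypothetical types that differ only in that keyword. The MyString operations above are not declared noexcept, so this is a suggestion rather than a description of the article's code.

```cpp
#include <cassert>
#include <vector>

// Two hypothetical types that differ only in the noexcept of the move constructor.
struct NoexceptMove
{
  static int copies;
  NoexceptMove() {}
  NoexceptMove(const NoexceptMove&) { ++copies; }
  NoexceptMove(NoexceptMove&&) noexcept {}
};
int NoexceptMove::copies = 0;

struct ThrowingMove
{
  static int copies;
  ThrowingMove() {}
  ThrowingMove(const ThrowingMove&) { ++copies; }
  ThrowingMove(ThrowingMove&&) {}  // not declared noexcept
};
int ThrowingMove::copies = 0;

// Fills a vector to capacity, then adds one more element to force a reallocation.
// Returns the number of copy constructions observed during that reallocation.
template <typename T>
int copiesDuringReallocation()
{
  std::vector<T> v;
  v.emplace_back();
  while (v.size() < v.capacity())
    v.emplace_back();
  T::copies = 0;
  v.emplace_back();  // the existing elements are transferred to a new buffer
  return T::copies;
}

void reallocationExample()
{
  assert(copiesDuringReallocation<NoexceptMove>() == 0);  // elements were moved
  assert(copiesDuringReallocation<ThrowingMove>() > 0);   // elements were copied
}
```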

CONCLUSION

Throughout this article, we discovered a new and more efficient way of managing temporary objects through rvalue references. In particular, we saw how to use the move constructor and the move assignment operator. Note that these are also implemented in all the containers of the STL.

However, this is only a very small part of the C++11 revision. Other important improvements came with it, such as smart pointers and thread support.