C# vs C++/CLI vs PInvoke Performance – Part III

In Part I, I demonstrated the performance difference between C++/CLI and P/Invoke by calling the sqrt method of a native C++ DLL through several different wrappers. Since that comparison already included a C# version of sqrt, I will keep including a C# version of every native C++ method I use in future posts. I will also optimize every piece of C#, C++ and C++/CLI code as far as I can; otherwise the comparison would not be accurate. The C# wrapper used to P/Invoke the C++ DLL was generated by our PInvoke Interop SDK – A C# Wrapper Generator for C++ DLL.

I added the following methods to the original C++ Calculator class. I chose a simplified implementation of hypot (it can overflow, depending on the arguments passed in) because the implementation itself is not important to our test.
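The original listing is not reproduced here, so the following is only a minimal sketch of what such a simplified hypot could look like. It assumes the straightforward sqrt(x*x + y*y) formula and leaves out the DLL export decoration so that it compiles with any C++ compiler; the exact signatures in the real Calculator class may differ.

```cpp
#include <cmath>

// Hypothetical reconstruction of the two methods added to Calculator.
// Simplified on purpose: x * x + y * y can overflow for very large inputs,
// which is the limitation mentioned in the text.
class Calculator {
public:
    // Float version: std::sqrt has a float overload, so no widening occurs.
    float hypot(float x, float y) const { return std::sqrt(x * x + y * y); }

    // Double version of the same formula.
    double hypot(double x, double y) const { return std::sqrt(x * x + y * y); }
};
```

With this sketch, Calculator().hypot(3.0, 4.0) evaluates sqrt(9 + 16), which is the kind of tiny workload where call overhead dominates the measurement.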

When I built the native C++ DLL, I optimized it as much as I could: the build is configured to maximize speed and to favor fast code.

I also enabled Streaming SIMD Extensions 2 (SSE2); I tried it, and it did improve the performance.

The C++/CLI wrapper code is still very simple, but I had to write each method signature one more time. This is why I do not prefer C++/CLI: I have to repeat every method signature I define in the C++ class, even when it does not require marshaling.

The C# wrapper methods are presented below.

Now, the complete C# testing code.

And then the testing code for the double version of Hypot.

Now, let’s look at the results of calling the two versions of the Hypot method.

Float Version of Hypot

Double version of Hypot

The result is what I expected: C++/CLI is the worst performer in both cases. The C# version of hypot is the fastest because this test involves very little math, so the time spent setting up the thunk is significant; we also have to push two float or double parameters onto the stack, and it costs the .NET Framework more CPU time to set up the call stack. Our C# wrapper class performed very close to the C# version of hypot when the P/Invoke method is called directly, which is the fastest way to P/Invoke a C++ class method. If you define hypot as a static method, the performance is very close to the C# version, and it could actually be better if the math were more complicated. That means we can probably build a C# math library wrapper on top of an existing high-performance C++ library, and its performance could be better than that of a math library written in C#.

Stay tuned, I will present C# vs C++/CLI vs PInvoke Performance Part IV.

C# vs C++/CLI vs PInvoke Performance – Part II

C++/CLI vs Explicit PInvoke Performance – Part II

Today, I am going to continue the comparison of C++/CLI vs Explicit PInvoke performance. This is the second part of the blog; you may want to read the first part if you have not already.

string (or String) is commonly used in .NET. If you know nothing about C++/CLI, you would expect it to convert or marshal a .NET string to a C string such as char* transparently, with no data copy at all, but you will be disappointed to see how C++/CLI actually bridges strings between the managed world and the native world. We will talk about this later.

In order to test the performance of the C++/CLI managed wrapper against a C# wrapper generated by our .NET PInvoke Interop SDK – A C# Wrapper Generator for C++ DLL, we need a C++ class inside a native C++ DLL. We are going to use a sample C++ DLL from now on. It contains a single method, IndexOf, in a C++ class named StringHelper; the method finds the index of a key string in a value string. Here is the implementation of the C++ class.
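The original listing is not reproduced here, so what follows is a minimal sketch of such a class rather than the exact code from the sample DLL. The parameter names, the null checks, and the choice to return -1 when the key is not found are assumptions, and the DLL export decoration is omitted so that the snippet compiles anywhere.

```cpp
#include <cstring>

// Hypothetical sketch of the StringHelper class described in the text.
class StringHelper {
public:
    // Returns the zero-based index of the first occurrence of key inside
    // value, or -1 if key does not occur (or if either pointer is null).
    int IndexOf(const char* value, const char* key) const {
        if (value == nullptr || key == nullptr) {
            return -1;
        }
        // strstr returns a pointer to the first match, or nullptr.
        const char* hit = std::strstr(value, key);
        return hit != nullptr ? static_cast<int>(hit - value) : -1;
    }
};
```

For example, StringHelper().IndexOf("hello world", "world") yields 6 with this sketch, because the match starts seven characters into the buffer.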

In the preceding IndexOf method, we used strstr to find the pointer to the key string inside the value string. We did not use std::string::find because std::string::find is far less efficient than strstr, and it would undermine the efficiency of Explicit P/Invoke when we compare the performance of C++/CLI vs Explicit P/Invoke.
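To make the two options concrete, here is a small side-by-side sketch. The function names IndexOfStrstr and IndexOfFind are hypothetical and exist only for this illustration. One visible difference is that the std::string variant has to construct a std::string from the incoming const char*, which copies the characters before the search starts; how large the actual gap is depends on the compiler and standard library, so it is worth measuring rather than assuming.

```cpp
#include <cstring>
#include <string>

// Variant 1: search the caller's buffer in place with strstr.
int IndexOfStrstr(const char* value, const char* key) {
    const char* hit = std::strstr(value, key);
    return hit != nullptr ? static_cast<int>(hit - value) : -1;
}

// Variant 2: search with std::string::find.
// Building the std::string copies value into a new object first.
int IndexOfFind(const char* value, const char* key) {
    const std::string haystack(value);
    const std::string::size_type pos = haystack.find(key);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

Both variants return the same index for the same inputs, so either one can be dropped into a benchmark loop to compare them on a given toolchain.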

Once again, let’s write the C++/CLI wrapper class first. It is a simple one because we have only one class with one method. If there were many of them, it would be tedious, time-consuming and hard to maintain, because everything you want from the original C++ class has to be wrapped by hand in C++/CLI; it has no intelligence that finds the methods and wraps them for you automatically.

Remember I stated that the string conversion is not transparent: you have to manually convert the .NET string to a C++ const char*, as shown in the preceding code. That is where it performs so much worse than Explicit P/Invoke; I will show you the result later. In order to pass a string parameter from C++/CLI into the native C++ DLL, we are responsible for converting it to const char* by using a class called marshal_context. I appreciate that C++/CLI provides such a type conversion class, but this kind of conversion slows down every call into the native C++ classes, which makes me unwilling to use C++/CLI, not to mention that I still have to maintain the wrapper code whenever I change the public interfaces of my C++ classes.

It took me less than 5 seconds to generate a C# wrapper for the sample C++ DLL with our PInvoke Interop SDK Tools. A new class named StringHelper is ready for me to use in C# code. I will show you the P/Invoke signature and the wrapper code.

Now, let’s write the C# code to test the performance of C++/CLI and Explicit PInvoke. I also want to know the performance difference between the C# wrapper for the native version of IndexOf and the C# version, String.IndexOf. Here is the complete test class.

I was surprised by the test results. I ran the testing program a few times with different iteration counts, and they all gave me similar results. As you can see in the following picture, the wrapper class generated by our PInvoke Interop SDK performs very close to the C# version of String.IndexOf.

The C# wrapper generated by our PInvoke Interop SDK Tools is 3 times faster than the C++/CLI wrapper, and it is very close to the C# version of String.IndexOf. From my experience of working with many C++ DLLs, I would say there is not much string copying when passing a string value to a C++ DLL using Explicit P/Invoke when the C++ method accepts char* as a parameter, though I do not know the details of the .NET implementation.

The C++/CLI wrapper is once again the worst performer when passing a string (char* type) from C# to the C++ DLL.

Stay tuned, we will compare the performance when using std::string in the C++ class next time. I hope the C++/CLI wrapper for the C++ DLL can perform better.

 

C# vs C++/CLI vs PInvoke Performance – Part I

C++/CLI vs Explicit PInvoke Performance – Part I

When I have a C++ DLL with a bunch of C++ classes and I want to call those C++ classes from C#, I have two choices: either use C++/CLI to write a managed wrapper, or use our C# wrapper generator to generate a managed C# wrapper for the native C++ DLL. I will create both wrappers today.

There is a false assumption all over the web that calling into a native C++ DLL from C# via a C++/CLI wrapper is faster than explicitly P/Invoking the C++ DLL. That assumption is wrong: explicit P/Invoke actually performs better than C++/CLI when C++/CLI is used to write a managed wrapper for a C++ DLL that is consumed from .NET. I am going to write a few blog posts about the performance difference between the C++/CLI wrapper and the explicit P/Invoke C# wrapper generated by our PInvoke Interop SDK tools.

This is the first part of my blog comparing C++/CLI to the C# wrapper generated by our C# wrapper generator for C++ DLL.

Let’s write a C++ class with 2 methods that we will call from C# via different wrappers. We are going to use a sqrt method in the C++ DLL. In this example the parameter is simple, its type is blittable, and no marshaling is needed from C# to the native DLL.
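The original listing is not reproduced here. As a rough sketch, and assuming the two methods are the float and double overloads of sqrt measured later in this post, the class could look like the following; the DLL export decoration is omitted so that the snippet compiles with any C++ compiler.

```cpp
#include <cmath>

// Hypothetical sketch of the Calculator class used in the benchmark.
class Calculator {
public:
    // float sqrt(float): float is blittable, so no marshaling is needed.
    float sqrt(float x) const { return std::sqrt(x); }

    // double sqrt(double x): the double counterpart.
    double sqrt(double x) const { return std::sqrt(x); }
};
```

Because both parameter types have the same representation in managed and native code, the cost measured for these calls is almost entirely the cost of crossing the managed/native boundary.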

The implementation of a managed C++/CLI wrapper for the native C++ DLL is very simple for our example, although it becomes very tedious and time-consuming if you want to wrap a large number of exported C++ classes in a C++ DLL, because of type marshaling and maintaining all the method signatures. That is a different story, which we may want to talk about in the future.

Now, I will present the C# class for testing the wrappers. I also wrote a managed C# sqrt for comparison.

Here is the result for calling float sqrt(float) from C# via all different kinds of wrappers.

The C# version did not perform much faster than the explicit P/Invoke C# wrapper when the call is made directly on the P/Invoke method, without going through the C# wrapper class of Calculator. That is because the C# Math.Sqrt accepts a double parameter while we pass a float: one type conversion happens when we call Sqrt, and another happens when we convert the double return value back to float.
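The round trip described above can be illustrated in C++ as well. This is only an analogy, not the managed code that was measured: SqrtViaDouble and SqrtFloat are hypothetical names, and the first function spells out the two conversions that a double-only square root forces on a float caller.

```cpp
#include <cmath>

// Mimics calling a double-only sqrt with a float argument:
// widen the input, compute in double, narrow the result.
float SqrtViaDouble(float x) {
    const double widened = static_cast<double>(x);  // conversion 1: float -> double
    const double root = std::sqrt(widened);         // double-precision sqrt
    return static_cast<float>(root);                // conversion 2: double -> float
}

// Float all the way through: std::sqrt has a float overload,
// so no conversion takes place.
float SqrtFloat(float x) {
    return std::sqrt(x);
}
```

The two extra conversions are cheap individually, but in a benchmark whose body is a single square root they are a noticeable share of the work.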

Calling sqrt directly via the P/Invoke method is faster than calling the sqrt method of the C# wrapper class of Calculator because there is less stack pushing and popping.

The C++/CLI wrapper shows the worst performance. I was not surprised, because the C++/CLI wrapper adds an additional layer to the call while the underlying mechanism of calling into the C++ DLL is still the same: P/Invoke, although it is implicit.

Here is the testing class for the double version of sqrt, and its result for the method double sqrt(double x).

The result of C++/CLI, PInvoke Performance

This time, the C# managed sqrt performs better than the float version because it no longer needs to convert the input parameter and convert the return value back, and converting back and forth does take time.

And, unfortunately, the C++/CLI wrapper is again the worst performer.

To be fair to C++/CLI, I would mention that the

was added to the PInvoke declaration when building the C# Wrapper for the native C++ DLL.

Stay tuned for the second part of the blog of C++/CLI vs PInvoke Performance.
