Results 16 to 21 of 21
  1. #16
    Join Date
    Jan 2001
    Posts
    253

    Re: Problems designing my vectorized math library

    I am not the original poster. I was just curious what the reasons were for the performance problems reported by the original poster.

    My idea was to try to provide the original poster with the syntactic sugar that would make it possible to use the __m128 intrinsic type with normal math operators instead of needing to call the _mm routines.

    The original poster set up the float4 class to provide operator overloads for this syntactic sugar.
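
    A minimal sketch of that syntactic sugar, assuming float4 is a plain typedef for __m128 rather than a class. The helper names lane and madd are hypothetical and only there for illustration. The overloads are guarded because GCC and Clang already give their vector-typed __m128 built-in arithmetic operators and reject user overloads on it, while MSVC defines __m128 as a union and accepts them:

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...

typedef __m128 float4;  // assumption: float4 is just an alias, not a class

#if defined(_MSC_VER) && !defined(__clang__)
// MSVC defines __m128 as a union, so non-member overloads are allowed.
inline float4 operator+(float4 a, float4 b) { return _mm_add_ps(a, b); }
inline float4 operator-(float4 a, float4 b) { return _mm_sub_ps(a, b); }
inline float4 operator*(float4 a, float4 b) { return _mm_mul_ps(a, b); }
inline float4& operator+=(float4& a, float4 b) { a = _mm_add_ps(a, b); return a; }
#endif
// GCC and Clang treat __m128 as a built-in vector type that already
// supports +, -, * and += element-wise, so no overloads are needed there.

// Hypothetical helper: read one lane back out for inspection.
inline float lane(float4 v, int i) {
    assert(i >= 0 && i < 4);
    float tmp[4];
    _mm_storeu_ps(tmp, v);  // unaligned store, so tmp needs no special alignment
    return tmp[i];
}

// Hypothetical example: a multiply-add written with ordinary operators
// instead of nested _mm_* calls.
inline float4 madd(float4 a, float4 b, float4 c) { return a * b + c; }
```

    With this in place, madd(_mm_setr_ps(1, 2, 3, 4), _mm_set1_ps(2), _mm_set1_ps(1)) reads like scalar math while still operating on all four lanes.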

  2. #17
    Join Date
    Aug 2008
    Posts
    902

    Re: Problems designing my vectorized math library

    Thanks for the suggestions, jwbarton.

    I gave those a try; the non-member overloads seem to perform just as well as the inline functions, at least in release mode. In debug, they are a couple of times slower.

    It still seems as if the best method of implementing float4 is as a typedef for __m128, with some non-member overloads for convenience, but I am still unsure of how to implement float4x4.
    Last edited by Chris_F; November 10th, 2010 at 05:52 PM.

  3. #18
    Join Date
    Aug 2008
    Posts
    902

    Re: Problems designing my vectorized math library

    Actually, I took a look through the code in the article I linked in my original post, and he has non-member overloads as well, with the following comment above them:

    Code:
    //	Overloaded operators, left here just as a reference.
    //	WARNING: This bloats the code as expressions grow

  4. #19
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: Problems designing my vectorized math library

    Quote Originally Posted by Chris_F View Post
    Thanks for the suggestions, jwbarton.

    I gave those a try; the non-member overloads seem to perform just as well as the inline functions, at least in release mode. In debug, they are a couple of times slower.

    It still seems as if the best method of implementing float4 is as a typedef for __m128, with some non-member overloads for convenience, but I am still unsure of how to implement float4x4.

    Performance of a debug build is irrelevant.

    The difference between a non-member (potentially non-friend) operator, and a member operator, is purely conceptual. The only difference should be if the compiler allows or doesn't allow the operator, but the result should be the same.

    Quote Originally Posted by Chris_F View Post
    Actually, I took a look through the code in the article I linked in my original post, and he has non-member overloads as well, with the following comment above them:

    Code:
    //	Overloaded operators, left here just as a reference.
    //	WARNING: This bloats the code as expressions grow

    Inline methods are, by definition, code bloat, but a good kind of code bloat. The alternatives are either to make them non-inline, in which case you'd probably feel a difference in performance of several orders of magnitude, or to not provide the overloads at all, in which case users would just write the same thing by hand.

    The important part is for the users to understand the cost of each operation, and always choose the right one:

    Code:
    float4 a = b + c + d;
    vs
    float4 a = b;
    a+=c;
    a+=d;
    Chances are the second version is much faster. I know some libraries use template magic and temporary objects to optimize the first version, but I call it pointless. It's nothing more than syntactic sugar for programmers who should be good enough to understand why they shouldn't have been using the first version in the first place.

    PS: for operator+, consider:
    Code:
    float4 operator+(const float4& lhs, const float4& rhs)
    {
        float4 ret = lhs;  // named local copy of the left operand
        ret += rhs;
        return ret;
    }
    Creating a named variable, rather than a temporary, can help trigger NRVO (named return value optimization). You can read more about it here:

    Boost::operators, or better yet, just use boost operators, and forget about it.

  5. #20
    Join Date
    Jan 2001
    Posts
    253

    Re: Problems designing my vectorized math library

    Originally posted by monarch_dodra
    The difference between a non-member (potentially non-friend) operator, and a member operator, is purely conceptual. The only difference should be if the compiler allows or doesn't allow the operator, but the result should be the same.
    While conceptually this is true, in practice it depends on the compiler implementation. It is true that the computed result of using a non-member operator with __m128 is the same as making a class that contains an __m128 and using a member operator.

    However, the code generated isn't the same (at least with the VS2010 compiler that I use). The compiler understands __m128 as an intrinsic type which it can pass around in the SSE registers of the processor. When using an __m128 member of a class, it stops passing around the results in the SSE registers, and makes significantly more loads and stores from the member variable of the class.
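
    To check this on a given compiler, one option is to build the same computation both ways and compare the generated assembly (for example from an MSVC /FA listing or g++ -S output). The sketch below is an assumption about what such a test could look like; float4_class, mul_add_class, mul_add_raw and the two check helpers are hypothetical names, not code from the original poster's library:

```cpp
#include <cassert>
#include <xmmintrin.h>

// Variant A: a class that holds the __m128 as a member and uses member operators.
struct float4_class {
    __m128 v;
    explicit float4_class(__m128 x) : v(x) {}
    float4_class& operator+=(const float4_class& rhs) {
        v = _mm_add_ps(v, rhs.v);
        return *this;
    }
    float4_class operator*(const float4_class& rhs) const {
        return float4_class(_mm_mul_ps(v, rhs.v));
    }
};

// Variant B: float4 is only an alias for the intrinsic type.
typedef __m128 float4;

// a * b + c through the wrapper class.
float4_class mul_add_class(const float4_class& a, const float4_class& b,
                           const float4_class& c) {
    float4_class r = a * b;
    r += c;
    return r;
}

// a * b + c directly on the intrinsic type.
float4 mul_add_raw(float4 a, float4 b, float4 c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}

// True when all four lanes compare equal.
inline bool same_lanes(__m128 x, __m128 y) {
    return _mm_movemask_ps(_mm_cmpeq_ps(x, y)) == 0xF;
}

// Both variants must compute the same values; only the code generation may differ.
inline void check_variants_agree(__m128 a, __m128 b, __m128 c) {
    float4_class r = mul_add_class(float4_class(a), float4_class(b), float4_class(c));
    assert(same_lanes(r.v, mul_add_raw(a, b, c)));
}
```

    The two functions are deliberately not inline, so each shows up as its own symbol in the listing. How many extra loads and stores appear in mul_add_class depends on the compiler and its optimization settings.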

    Originally posted by monarch_dodra
    Code:
    float4 a = b + c + d;
    vs
    float4 a = b;
    a+=c;
    a+=d;
    Chances are the second version is much faster.

    Whether the first or the second version is faster is also a quality-of-implementation issue. As far as I can tell (I only tried this simple example), the code generation looks the same for both versions when using non-member operators with the __m128 intrinsic. It may change with a more complicated expression. Someone looking for the fastest possible result would need to verify that the compiler generates acceptable code, or would need to code it explicitly.
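
    A small sketch for running that comparison yourself, assuming float4 is a typedef for __m128 with non-member operators. The function names sum_chained, sum_compound, same_lanes and check_sums_agree are hypothetical. The overloads are only compiled for MSVC, since GCC and Clang already provide built-in operators for their vector-typed __m128:

```cpp
#include <cassert>
#include <xmmintrin.h>

typedef __m128 float4;

#if defined(_MSC_VER) && !defined(__clang__)
// Needed on MSVC only; GCC and Clang have built-in vector operators.
inline float4 operator+(float4 a, float4 b) { return _mm_add_ps(a, b); }
inline float4& operator+=(float4& a, float4 b) { a = _mm_add_ps(a, b); return a; }
#endif

// First form from the thread: one chained expression.
float4 sum_chained(float4 b, float4 c, float4 d) {
    float4 a = b + c + d;
    return a;
}

// Second form from the thread: copy, then compound assignments.
float4 sum_compound(float4 b, float4 c, float4 d) {
    float4 a = b;
    a += c;
    a += d;
    return a;
}

// True when all four lanes compare equal.
inline bool same_lanes(float4 x, float4 y) {
    return _mm_movemask_ps(_mm_cmpeq_ps(x, y)) == 0xF;
}

// Both forms add in the same order, (b + c) + d, so the results must match.
inline void check_sums_agree(float4 b, float4 c, float4 d) {
    assert(same_lanes(sum_chained(b, c, d), sum_compound(b, c, d)));
}
```

    Both functions evaluate (b + c) + d in the same order, so they return identical values. Whether they also compile to identical instructions is what the assembly listing has to show for your compiler and flags.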

  6. #21
    Join Date
    Aug 2008
    Posts
    902

    Re: Problems designing my vectorized math library

    Yes, monarch_dodra, the "bloat" I was referring to was not the usual bloat that results from inlining a function, but the unnecessary shuffling of data in and out of the SSE registers.
