CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Results 1 to 8 of 8
  1. #1
    Join Date
    Mar 2015
    Posts
    13

    Code runs 2x faster on windows than on linux

    I have a piece of code that runs 2x faster on windows than on linux. Here are the times I measured:

    g++ -Ofast -march=native -m64
    29.1123
    g++ -Ofast -march=native
    29.0497
    clang++ -Ofast -march=native
    28.9192
    visual studio 2013 Debug 32b
    13.8802
    visual studio 2013 Release 32b
    12.5569
    It really seems to be too huge a difference.

    Here is the code:

    #include <iostream>
    #include <map>
    #include <chrono>
    static std::size_t Count = 1000;

    static std::size_t MaxNum = 50000000;

    bool IsPrime(std::size_t num)
    {
    for (std::size_t i = 2; i < num; i++)
    {
    if (num % i == 0)
    return false;
    }
    return true;
    }

    int main()
    {
    auto start = std::chrono::steady_clock::now();
    std::map<std::size_t, bool> value;
    for (std::size_t i = 0; i < Count; i++)
    {
    value[i] = IsPrime(i);
    value[MaxNum - i] = IsPrime(MaxNum - i);
    }
    std::chrono:uration<double> serialTime = std::chrono::steady_clock::now() - start;
    std::cout << "Serial time = " << serialTime.count() << std::endl;

    system("pause");
    return 0;
    }
    All of this was measured on the same machine with windows 8 vs linux 3.19.5(gcc 4.9.2, clang 3.5.0). Both linux and windows are 64bit.

    What could be the reason for this? Some scheduler issues?

    EDIT: It was caused by building 32b binaries on windows as opposed to 64b binaries on linux, here are 64b numbers for windows:

    Visual studio 2013 Debug 64b
    29.1985
    Visual studio 2013 Release 64b
    29.7469

  2. #2
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Code runs 2x faster on windows than on linux

    Quote Originally Posted by Anthonyemuobo View Post
    visual studio 2013 Release 32b
    12.5569
    ...
    Visual studio 2013 Release 64b
    29.7469
    Now somebody has to explain why 64-bit version takes that much longer.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  3. #3
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Code runs 2x faster on windows than on linux

    Let me take a shot at it myself.
    I have eliminated the map and looping, here is the shortest sample I got:
    Code:
    bool IsPrime(std::size_t num)
    {
    	for (std::size_t i = 2; i < num; i++)
    	{
    		if (num % i == 0)
    			return false;
    	}
    	return true;
    }
    
    int main()
    {
    	auto start = std::chrono::steady_clock::now();
    	bool b = IsPrime(49999991);
    	std::chrono::duration<double> serialTime = std::chrono::steady_clock::now() - start;
    	std::cout << "Serial time = " << serialTime.count() << std::endl;
    
    	system("pause");
    	return b;
    }
    It takes 0.352..0.398 sec in 32-bit and 0.825 sec in 64-bit.
    I know there are twice as many bits to work with, but I didn't know it takes twice the time!
    This example takes 0.365..0.401 sec in 64-bit if the parameter type to IsPrime() is changed from std::size_t to unsigned int.
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  4. #4
    2kaud's Avatar
    2kaud is offline Super Moderator Power Poster
    Join Date
    Dec 2012
    Location
    England
    Posts
    7,923

    Re: Code runs 2x faster on windows than on linux

    Using Vladimir#s code from post #3 with MSVS 2013 release optimised I get
    32 bit - 0.19
    64 bit - 0.42

    Replacing size_t with unsigned int gives
    32 bit - 0.19
    64 bit - 0.19

    However replacing size_t with unsigned long long gives
    32 bit - 0.38
    64 bit - 0.43
    All advice is offered in good faith only. All my code is tested (unless stated explicitly otherwise) with the latest version of Microsoft Visual Studio (using the supported features of the latest standard) and is offered as examples only - not as production quality. I cannot offer advice regarding any other c/c++ compiler/IDE or incompatibilities with VS. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/ and can be used without reference or acknowledgement. Also note that I only provide advice and guidance via the forums - and not via private messages!

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  5. #5
    Join Date
    Oct 2008
    Posts
    1,456

    Re: Code runs 2x faster on windows than on linux

    @2kaud, can you post the generated asm ? maybe, a smaller size allows the compiler to unroll the loop and perform two loop cycles in a single op ...

  6. #6
    Join Date
    Aug 2000
    Location
    New York, NY, USA
    Posts
    5,656

    Re: Code runs 2x faster on windows than on linux

    Is it OK if I will answer?
    There is only one loop, and there is no unrolling.
    32-bit
    Code:
    	bool b = IsPrime(49999991);
    00911047  mov         ecx,2  
    0091104C  lea         esp,[esp]  
    00911050  xor         edx,edx  
    00911052  mov         eax,2FAF077h  
    00911057  div         eax,ecx  
    00911059  test        edx,edx  
    0091105B  je          main+3Ah (091106Ah)  
    0091105D  inc         ecx  
    0091105E  cmp         ecx,2FAF077h  
    00911064  jb          main+20h (0911050h)  
    00911066  mov         bl,1  
    00911068  jmp         main+3Ch (091106Ch)  
    0091106A  xor         bl,bl
    64-bit
    Code:
    	bool b = IsPrime(49999991);
    000000013FAF1050  xor         edx,edx  
    000000013FAF1052  mov         eax,2FAF077h  
    000000013FAF1057  div         rax,rcx  
    000000013FAF105A  test        rdx,rdx  
    000000013FAF105D  je          main+40h (013FAF1070h)  
    000000013FAF105F  inc         rcx  
    000000013FAF1062  cmp         rcx,2FAF077h  
    000000013FAF1069  jb          main+20h (013FAF1050h)  
    000000013FAF106B  mov         dil,1  
    000000013FAF106E  jmp         main+43h (013FAF1073h)  
    000000013FAF1070  xor         dil,dil
    Vlad - MS MVP [2007 - 2012] - www.FeinSoftware.com
    Convenience and productivity tools for Microsoft Visual Studio:
    FeinWindows - replacement windows manager for Visual Studio, and more...

  7. #7
    Join Date
    Oct 2008
    Posts
    1,456

    Re: Code runs 2x faster on windows than on linux

    ok, no compiler magic involved; googling around, it appears that integer division is a special beast with data dependent latency ( as far as I got it, the cpu optimizes division for smaller sizes ) with reported, worst case latencies in 32 vs 64 bits unsigneds div's ranging from 2x to 3x, depending on cpu ... nice

  8. #8
    Join Date
    Apr 2000
    Location
    Belgium (Europe)
    Posts
    4,626

    Re: Code runs 2x faster on windows than on linux

    1) 64bit code by nature will be somewhat slower than 32bit because it has more instruction prefixes which means the CPU pipelines stall.
    2) 64bit code tends to be bigger because of that, so more instruction bytes need to be fetched
    3) pointers are 64bit, native types are 64bit, which means your L1 cache will be flooded faster.
    4) 64bit multiplication and division are slower than 32bit

    When you write 64bit software, don't assume you'll get faster code, typically you won't. You basically start in a hole compared to equivalent 32Bit code, and it isn't until you can make use of 64bit specific features such as having access to more than 2Gb ram, and more efficient handling of 64bit math (if you NEED the math to be 64bit) that you have a means to get out of the hole and exceed 32Bit performance.

    Specific here
    size_t is 32Bit on win32 and 64bit on Win64.
    so the 64bit math suffers from a larger datasize and a slower division.

    change the size_t to __int64 and Win64 will blitz ahead because in Win32 the calculations now need to be simulated in software using 32Bit math.
    or
    change size_t to __int32 and the gap will reduce for Win64 because it now doesn't need to perform a more complex 64bit division.

    But 'technology aside'. This is a horribly inefficient way to do primal-testing. It's "ok enough" if you only need to test a single (or a few) numbers, but if you want to do this for lots of numbers, it'll be very slow indeed, especially for large values.


    Optimisations:
    - you don't need to test for all numbers up to i<num, you only need to test for i<sqrt(num) because any number larger that that will already have been discovered by it's matching smaller counterpart. (only do the sqrt(num) once. Or flipped around, since sqrt() tends to be inefficient. test for i*i<n
    - build a list of previous primes (this'll take memory to speed things up). a test for "divisible by 9" will already have been tested by division by 3., the same i true for any other multiple of 3 (and multiples of 5, 7, 11, ...).
    Look up "Sieve of Eratosthenes" for one possible method for doing this. The question here is.. "can you afford the memory" ?
    Last edited by OReubens; May 4th, 2015 at 07:21 AM.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  





Click Here to Expand Forum to Full Width

Featured