I've used my random number generator classes hundreds of millions of times in tests and have profiled them. Those hundreds of millions of calls take less than a minute to execute. In all the applications I use random numbers for, the code that uses the random numbers takes much longer to run than the code that produces them -- in other words, I would never notice a 10% or even 30% increase in the speed of my random number generators. (That being said, I'm pretty darn sure I have a well-optimized generator -- it beats both MSVC's rand() and the Mersenne Twister, but still has excellent periods [thank you, Numerical Recipes!!!]).
Oh, you call hundreds of millions of calls per minute fast? That's crawling! For the record, a CPU can carry out two to four billion instructions NOT in one minute, BUT in one SECOND. So your random number generator takes roughly a thousand clock cycles to generate just one random number. That's way too slow. I suspect that Microsoft's rand() could be faster than yours.
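
The arithmetic behind that estimate is just clock rate times elapsed time, divided by the number of calls. Here is a minimal sketch of it. The function name and the concrete figures (2 to 3 GHz, 100 to 200 million calls, a full 60 seconds) are assumptions picked to match "hundreds of millions of calls in under a minute", not measurements of anyone's generator.

```cpp
#include <cstdint>

// Rough cycles-per-call estimate: total clock cycles elapsed divided by
// the number of calls made in that time. All inputs are assumed figures.
constexpr std::uint64_t cycles_per_call(std::uint64_t clock_hz,
                                        std::uint64_t seconds,
                                        std::uint64_t calls) {
    return clock_hz * seconds / calls;
}

// Assumed: 2 GHz CPU, 100 million calls taking the full 60 seconds.
static_assert(cycles_per_call(2000000000ULL, 60, 100000000ULL) == 1200,
              "about 1200 cycles per call");

// Assumed: 3 GHz CPU, 200 million calls taking the full 60 seconds.
static_assert(cycles_per_call(3000000000ULL, 60, 200000000ULL) == 900,
              "about 900 cycles per call");
```

Depending on which numbers you plug in, you land somewhere between several hundred and a couple of thousand cycles per call, and that includes whatever loop and test overhead was inside the timed region.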

The OP indicated that he wouldn't even want to check the value of one variable each time, which only takes a few clock cycles. One thousand clock cycles would be a much bigger deal.

rand() is nowhere near fast. MT is 4 times faster than rand() but is still not considered a fast PRNG at all. There are plenty of PRNGs out there that are much faster than MT. The fastest solution would certainly be a hardware-based one.
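
To give an idea of how little work a small PRNG can do per number, here is a sketch of Marsaglia's 32-bit xorshift with the 13/17/5 shift triple. It is only an illustration and not necessarily one of the generators meant above. The function name is made up for this example, the state must never be zero, and the statistical quality is well below MT's, so treat it as a speed illustration only. It needs C++14 or later for the constexpr body.

```cpp
#include <cstdint>

// One step of Marsaglia's xorshift32 (shift triple 13, 17, 5).
// Takes the current state and returns the next state, which also
// serves as the output. The state must be nonzero: zero maps to zero.
constexpr std::uint32_t xorshift32_next(std::uint32_t state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}
```

Each number costs three shifts and three XORs. To use it, keep a nonzero `std::uint32_t` seed around and overwrite it with `xorshift32_next(seed)` every time you need a new value.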