Then I'd recommend going with C and a library, be it DirectX, OpenGL, SDL, or whatever. They probably either used one of those, bought a commercial library, or rolled their own. I doubt you'll want to do either of the last two.

If you want extra speed, DirectX and OpenGL offer hardware acceleration, so the graphics card can do things like rotations and fading really quickly.
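To give a sense of what hardware acceleration takes off your hands, here's a minimal sketch of a software cross-fade in plain C. It assumes two same-sized pixel buffers packed as 32-bit 0xAARRGGBB values; the function names and the pixel format are made up for illustration and aren't from any particular library. Without acceleration, this per-pixel, per-channel loop runs on the CPU for every frame of the fade, whereas with DirectX or OpenGL the card does the blend in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: t = 0 gives a, t = 255 gives b.
   Computes (a * (255 - t) + b * t) / 255, rounded to nearest. */
static uint8_t lerp_u8(uint8_t a, uint8_t b, uint8_t t)
{
    unsigned v = (unsigned)a * (255u - t) + (unsigned)b * t;
    return (uint8_t)((v + 127u) / 255u);
}

/* Cross-fade two hypothetical 0xAARRGGBB buffers into dst.
   Each of the four 8-bit channels is blended separately. */
void crossfade(uint32_t *dst, const uint32_t *from, const uint32_t *to,
               size_t count, uint8_t t)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint8_t a = (uint8_t)(from[i] >> shift);
            uint8_t b = (uint8_t)(to[i] >> shift);
            out |= (uint32_t)lerp_u8(a, b, t) << shift;
        }
        dst[i] = out;
    }
}
```

Stepping `t` from 0 to 255 over successive frames fades `from` into `to`. At 1920x1080 that's about two million pixels per frame, which is why handing this off to the GPU matters.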

Like I said before, you have to be really good at assembly to actually write code that's faster than what a C compiler produces, since compilers have gotten really good at optimizing. If you're not an expert, your hand-written assembly will likely be slower.
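Here's a small illustration of what "compilers have gotten really good" means in practice. The function below is written the obvious way with `/` and `%`; the name `digit_sum` and the file name `digits.c` are just made up for this example. If you compile it with something like `gcc -O2 -S digits.c` and read the generated `digits.s`, on x86-64 you'll typically find no `div` instruction in the loop, because GCC (and Clang) replace division by the constant 10 with a multiply by a precomputed reciprocal plus a shift, which is much cheaper than a hardware divide. That's the kind of trick a non-expert writing the assembly by hand tends to miss.

```c
/* Sum the decimal digits of n, e.g. 1234 -> 1 + 2 + 3 + 4 = 10. */
unsigned digit_sum(unsigned n)
{
    unsigned sum = 0;
    while (n != 0) {
        sum += n % 10u;  /* lowest decimal digit */
        n /= 10u;        /* drop that digit */
    }
    return sum;
}
```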