Your example didn't back up the claims you've made about the inefficiency of virtual destructors. I think you should acknowledge that.
You relied on a compiler-dependent optimization, and it backfired. The C++ standard doesn't require compiler-generated destructors to be faster than user-supplied empty ones. So it's generally a bad idea to bet a program's performance on this, and it's even worse to recommend that others do so.
It's called premature optimization. In my view, many C++ programmers spend too much time fiddling with implementation details that they don't know will have any impact on overall application performance. If you have this inclination, you're better off using C: far less abstraction and much more sweating the small stuff.

