The code actually works the way you initially described. Dividing 1234 by 10 gives a quotient of 123 and a remainder of 4 (the low-order digit). Dividing that quotient by 10 gives a new quotient of 12 and a remainder of 3 (the next low-order digit), and so on until the quotient reaches 0.
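A quick sketch of that loop in Python, just to make the order of the digits concrete. The function names are my own, hypothetical ones, and divmod() stands in for whatever divide routine the real code uses:

```python
def digits_low_to_high(n: int) -> list:
    """Return the decimal digits of a non-negative integer, low-order digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        # The quotient carries on to the next round; the remainder is the next digit.
        n, remainder = divmod(n, 10)
        digits.append(remainder)
    return digits


def to_decimal_string(n: int) -> str:
    # Digits come out low-order first, so reverse them for display.
    return "".join(str(d) for d in reversed(digits_low_to_high(n)))


print(digits_low_to_high(1234))  # [4, 3, 2, 1]
print(to_decimal_string(1234))   # 1234
```

Note that the digits come out in reverse order, which is why assembly versions usually push them on the stack or fill a buffer from the end.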

I suspect your 6502 code relied on the fact that some shift operations treat two registers as one value, moving bits "end-to-end" from one into the other, while other operations work on a single register. That makes the division by 10, which is the real crux here, much easier to code. There are similar operations in the Intel world too (SHLD and SHRD on the 80386 and later, for example), but my assembly language is far too rusty to attempt programming such an algorithm.
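For illustration only, here is roughly what such a divide-by-10 looks like as plain restoring binary long division, written in Python rather than assembly. This is a generic textbook technique, not a reconstruction of your 6502 code; the 16-bit width and the function name are assumptions. The remainder and quotient are shifted left together as one double-width value, which is the "end-to-end" shift described above (on a 6502 you would typically chain ASL/ROL through the carry flag to get the same effect):

```python
def divmod10_shift_subtract(n: int, bits: int = 16) -> tuple:
    """Divide an unsigned `bits`-wide integer by 10 using only shifts,
    compares and subtracts (restoring binary long division).
    Returns (quotient, remainder). Hypothetical helper for illustration."""
    if not 0 <= n < (1 << bits):
        raise ValueError("n out of range for the given width")
    mask = (1 << bits) - 1
    quotient = n   # dividend bits shift out of the top, quotient bits shift in at the bottom
    remainder = 0  # the partial remainder: the "second register" the bits shift into
    for _ in range(bits):
        # Shift the pair (remainder:quotient) left by one bit as a single value.
        top_bit = (quotient >> (bits - 1)) & 1
        quotient = (quotient << 1) & mask
        remainder = (remainder << 1) | top_bit
        # If 10 fits into the partial remainder, subtract it and record a 1 quotient bit.
        if remainder >= 10:
            remainder -= 10
            quotient |= 1
    return quotient, remainder


print(divmod10_shift_subtract(1234))  # (123, 4)
```

Calling it repeatedly on the quotient yields the digits exactly as in the loop described earlier; only the divide step changes.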

As you note, R.J. Mitchell's algorithm is interesting, but hardly of any practical use (other than as an assignment to some poor unsuspecting Computer Science class).