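You can use the multiply instructions on ARM cores as a limited shifter by multiplying by 2^n, although the real barrel shifter is the one applied to the second operand of ordinary data-processing instructions (e.g. ADD r0, r1, r2, LSL #3). The catch is the timing. On early cores such as the ARM7TDMI, the multiplier terminates early. The "8 bitter" is the multiplier operand Rs in MUL Rd, Rm, Rs. A multiply takes one cycle (MUL) or two cycles (MLA), plus one more for each significant 8 bits of Rs. The other operand, Rm, can be any 32-bit value without affecting the timing. Depending on the core and architecture version, more extensive multiply instructions are available (long multiplies, extra multiply-accumulate forms), and later cores generally do not use this byte-by-byte early termination.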