For years I used only AVR microcontrollers, which made me more of an AVR fanboy than I was willing to admit. I recently moved to Texas and took a job where we use PICs 99% of the time.
The first PIC I started on was a PIC12. My boss (and only coworker) was out of town for a week and I couldn't get the environment to work, so I started out using assembly on this guy. I've written assembly for AVR and didn't particularly like it, but it wasn't that bad. Assembly for PIC is TERRIBLE. Jesus Christ... ONE working register? Bank switching? That kind of crap is ridiculous. I was trying to compare some 10-bit numbers from the ADC. The high byte and the low byte were in different banks...
1. Read low ADC byte
2. Move to RAM
3. Change banks
4. Read high ADC byte
5. Do half a dozen moves into and out of RAM to figure out if this ADC reading was higher or lower than the previous one
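For comparison, the steps above collapse to a couple of lines in C, where the compiler takes care of the banking. A minimal sketch, assuming a right-justified 10-bit result with the top two bits in the high byte. The function names are made up for illustration, and the register bytes are passed in as plain arguments rather than read from real ADC registers:

```c
#include <stdint.h>

/* Combine the two ADC result bytes into one 10-bit value.
 * Assumption: right-justified result, so only bits 1:0 of the
 * high byte are meaningful (they become bits 9:8). */
uint16_t adc_combine(uint8_t high, uint8_t low)
{
    return (uint16_t)(((uint16_t)(high & 0x03u) << 8) | low);
}

/* Returns 1 if the current reading is higher than the previous one,
 * -1 if it is lower, and 0 if they are equal. */
int adc_compare(uint16_t current, uint16_t previous)
{
    if (current > previous) return 1;
    if (current < previous) return -1;
    return 0;
}
```

On an 8-bit part with one working register, the compiler still has to turn that 16-bit compare into byte-at-a-time operations, so the C is shorter to write but not free to run.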
The AVR has 32 working registers, so there's little need to shuffle values into and out of RAM, and you save time there. The AVR executes most instructions in 1 clock, while the PIC takes 4 clocks per instruction. Small devices like the PIC10 and PIC12 generally don't have a PLL to make up the difference, so the AVR ends up significantly faster, and on top of that it doesn't have to do constant bank switches and moves into and out of RAM.
Some of the higher-end PICs (not including the PIC24, dsPIC, and PIC32) address the speed problem by adding a PLL, but I think even these guys still only have ONE WORKING REGISTER. That is a PAINFUL limitation in assembly. Even in C it's a terrible headache, as it causes problems trying to do simple things.
A project I did recently was a PIC12-based device. It has to measure the frequency of pulses coming in on the analog comparator and output the same frequency but with a 33% duty cycle. The frequency ranges anywhere from 100 Hz to 20 kHz. The problem is that by the time it calculates the timer value for the output at 20 kHz, it's already past a 33% duty cycle. It can't keep up. In the end, I settled for running TMR0 at 1/4 the speed of TMR1. On every pulse, move TMR0 to TMR1 and clear TMR0. Assert the output until TMR1 overflows. At low frequencies this results in a 25% duty cycle, but at 20 kHz I'm looking at something like a 65% duty cycle. There is no time to do any calcs. It's just too dang slow at 2 MIPS. I think an ATtiny would be a much better device for this.
Also, the lower-end PICs only have ONE INTERRUPT VECTOR. That means when you get an interrupt, you have to start looking at every interrupt flag to determine which one you have. Sometimes you have to look past that to figure out if the UART interrupt was due to a received byte or a transmitted byte. AVR gets an interrupt, and you're on your way to the correct vector.
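To put rough numbers on that, here is the instruction budget worked out from the figures above (2 MIPS, 20 kHz, 33% duty). This is just back-of-the-envelope arithmetic wrapped in C. The function names are hypothetical and nothing here touches real timer hardware:

```c
#include <stdint.h>

/* Instruction cycles available in one period of the input signal. */
uint32_t cycles_per_period(uint32_t instr_per_sec, uint32_t pulse_hz)
{
    return instr_per_sec / pulse_hz;
}

/* Instruction cycles available from the rising edge until the output
 * has to drop again, for a target duty cycle given in percent.
 * 64-bit intermediates avoid overflow in the multiplication. */
uint32_t cycles_in_high_time(uint32_t instr_per_sec, uint32_t pulse_hz,
                             uint32_t duty_percent)
{
    return (uint32_t)(((uint64_t)instr_per_sec * duty_percent) /
                      ((uint64_t)pulse_hz * 100u));
}
```

With `cycles_per_period(2000000, 20000)` you get 100 instruction cycles per period, and `cycles_in_high_time(2000000, 20000, 33)` gives 33 cycles before the output has to go low. That budget has to cover interrupt entry and whatever math sets up the timer, and on a part with no hardware divide a software 16-bit division can easily cost more than that on its own.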
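The flag-polling pattern looks roughly like this. It's a sketch only: the flag names, the `irq_flags` variable, and the counters are hypothetical stand-ins so the code can run on a PC, whereas on a real part the flags live in device-specific special function registers and you would normally check the matching enable bits too:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the interrupt flag bits. */
enum {
    FLAG_TIMER   = 1u << 0,
    FLAG_UART_RX = 1u << 1,
    FLAG_UART_TX = 1u << 2
};

volatile uint8_t irq_flags;                 /* simulated flag register */
unsigned timer_count, rx_count, tx_count;   /* how often each source was serviced */

/* The one and only interrupt entry point: test every flag in turn,
 * service the source, then clear its flag. */
void isr(void)
{
    if (irq_flags & FLAG_TIMER) {
        timer_count++;
        irq_flags &= (uint8_t)~FLAG_TIMER;
    }
    if (irq_flags & FLAG_UART_RX) {
        rx_count++;
        irq_flags &= (uint8_t)~FLAG_UART_RX;
    }
    if (irq_flags & FLAG_UART_TX) {
        tx_count++;
        irq_flags &= (uint8_t)~FLAG_UART_TX;
    }
}
```

Every one of those tests costs instruction cycles on each interrupt, whichever source actually fired. On the AVR the hardware does this dispatch for you by jumping to a per-source vector.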
Now, for PIC24 and up, the AVRs in my experience are not quite able to keep up. There are just a few features that the PIC has that I don't think AVR has yet: you can move peripherals around to the pins where you want them, there's a pretty doggone configurable internal oscillator, there are internal current sources for things such as capacitive touch sensing, and so on.
As for PIC32, it's not even fair to compare them to the AVR. If you're using a PIC32, you're in a different class of MCU.