Nearly every single-chip processor has found its way into a training system of some sort. Heathkit used the 6800 for years and years. The 6502, Z80, 8085, 1802 and other 8-bit processors all had their educational platforms, some as structured training systems such as Heathkit's, others as inexpensive experimenter's kits like those built around the 6502. The 8080 got its big jump-start with the MITS Altair and IMSAI computers, while the 6800 was the core of the Southwest Technical Products machines. Most of the original processors required multiple supply voltages and a number of support chips, and those were the two areas the next generation of µPs took care of. I could never imagine learning about basic microprocessors using an 8088 or anything more modern. That'd swear you off them forever!
As Nigel observes, programming is programming. You're usually learning basic machine-level instructions and interfacing, both of which change from processor to processor, so no one processor is going to have any advantage over another. It's just like learning digital logic: you can do it using "ancient" TTL just as well as with more modern families.
Dean