As far as I can tell you're correct. The other people are getting confused by thinking about instruction-level CPU stuff, which doesn't really have much to do with flip-flops. All pipelined CPUs depend on the fact that, between clock edges, the flip-flops in front of each pipeline stage are still outputting their old values, so the combinational logic on both sides of any given flip-flop is running at the same time on stable inputs. When the clock edge arrives, the new values get written into the flip-flops at both ends of every pipeline segment at the same time.
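
If it helps, here is a rough Python sketch of that behaviour. It is only a toy model under my own assumptions: `clock_edge`, `regs` and `stages` are made-up names, the two stage functions are arbitrary, and real setup/hold timing is ignored. The point it tries to show is that every stage computes from the old register values, and only then do all the registers update together.

```python
def clock_edge(regs, stages, new_input):
    """Model one clock edge of a linear pipeline (toy model, no timing).

    regs[i] is the value held by the flip-flops feeding stage i, and
    stages[i] is that stage's combinational logic (a plain function).
    Returns (new_regs, output), where output is what the last stage
    produced just before the edge.
    """
    # Every stage computes from the OLD register values, which stay
    # stable right up to the edge.
    next_values = [new_input] + [f(r) for f, r in zip(stages, regs)]
    # On the edge all registers capture their new values at once.
    return next_values[:-1], next_values[-1]


# Hypothetical two-stage pipeline: add 1, then double.
stages = [lambda x: x + 1, lambda x: x * 2]
regs = [0, 0]
outputs = []
for value in [1, 2, 3, 0, 0]:
    regs, out = clock_edge(regs, stages, value)
    outputs.append(out)

print(outputs)  # prints [0, 2, 4, 6, 8]
```

The input 1 shows up as (1 + 1) * 2 = 4 on the third edge, with 2 and 3 following one edge apart, because both stages were working on different values during the same cycle. If you updated `regs` in place one element at a time instead of building `next_values` first, a value could fall through both stages in a single edge, which is exactly what the flip-flops prevent.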