Basically, it is true. Caches are one way to increase dataflow; another is hardware multithreading, which lets the processor switch context to another thread whose data is ready when a cache miss occurs, while the first thread waits for its data to be loaded into the cache.
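A rough software analogy of that latency hiding (the delays here are made-up `time.sleep` stand-ins for a stalled memory fetch, not real cache misses): while one worker blocks, the other runs, so the total wall time is close to one fetch rather than two.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(delay):
    # Stand-in for a thread stalled on a slow memory fetch.
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Two "stalls" of 0.2 s each, overlapped by running in parallel.
    results = list(pool.map(fetch, [0.2, 0.2]))
elapsed = time.perf_counter() - start

print(results, elapsed)  # elapsed is ~0.2 s, not ~0.4 s
```

The point is only that overlapping stalls keeps the execution units busy; with a single thread, the two waits would add up.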
This provides better utilisation of CPU power, but it does nothing for the performance of a single-threaded application. The only solution there is to pre-cache data. Under an OS this is usually not possible for the L1–L3 caches, so a memory-intensive application would have to run standalone and use the available caches carefully to get the best performance. For disk I/O, applications should pre-cache data into DRAM, which is roughly 1000× faster than a hard disk, and then work with the data in memory.
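A minimal sketch of the disk-side pre-caching idea: read the whole file once sequentially, then serve every random access from the in-memory copy instead of seeking on disk. The file path and sizes here are invented for the illustration.

```python
import os
import random
import tempfile

# Create a throwaway 1 MiB test file (hypothetical data set).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))

# One sequential read pulls the whole file into DRAM.
with open(path, "rb") as f:
    cached = f.read()

# All subsequent random accesses hit the in-memory copy, not the disk.
random.seed(0)
offsets = [random.randrange(len(cached)) for _ in range(1000)]
from_ram = [cached[o] for o in offsets]

# Spot-check a few bytes against the on-disk file for correctness.
with open(path, "rb") as f:
    for o in offsets[:10]:
        f.seek(o)
        assert f.read(1)[0] == cached[o]
```

Random byte lookups in `cached` cost a memory access each; doing the same with `seek`/`read` per lookup would pay a disk seek every time.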
Another way to get faster DRAM access is to slice the RAM into multiple physical banks (interleaving), so that you get better overall throughput.
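A sketch of the simplest such scheme, low-order interleaving (the 4-bank, 64-byte-burst parameters are illustrative; real memory controllers use more elaborate mappings): consecutive burst-sized addresses map to different banks, so a sequential stream rotates through all banks and keeps several transfers in flight at once.

```python
# Illustrative parameters, not from any specific controller.
NUM_BANKS = 4
BURST = 64  # bytes per burst

def bank_of(addr):
    # Low-order interleaving: the burst index, modulo the bank count.
    return (addr // BURST) % NUM_BANKS

# Eight consecutive bursts rotate through every bank in turn.
banks = [bank_of(a) for a in range(0, 8 * BURST, BURST)]
print(banks)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Because each bank can work on its access independently, a sequential stream can approach NUM_BANKS times the throughput of a single bank instead of queueing every burst on one.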
The second question depends on which memory you're talking about. Hard disks have comparatively larger caches than DRAM does, so slow write speeds can be somewhat masked by the cache. Slow read speed will definitely slow the program down, especially for random seeks.
For DRAM, write speeds are similar to read speeds. If one were slower and you had symmetrical read/write demands, the slower one would be the bottleneck, since the write cache in a processor is pretty small.
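A back-of-the-envelope way to see why the slower side dominates under symmetrical demand (the GB/s figures are made up for illustration): moving one unit read plus one unit written takes 1/read + 1/write time units, so the combined rate is the harmonic mean of the two speeds, which is always dragged toward the slower one.

```python
def effective_bandwidth(read_gbps, write_gbps):
    # Harmonic mean: time for 1 unit read + 1 unit written is
    # 1/read + 1/write, so the combined rate is 2 / (1/r + 1/w).
    return 2.0 / (1.0 / read_gbps + 1.0 / write_gbps)

balanced = effective_bandwidth(10.0, 10.0)      # symmetric speeds
slow_writes = effective_bandwidth(10.0, 5.0)    # writes half as fast
print(balanced, round(slow_writes, 2))  # 10.0 6.67
```

Halving the write speed cuts effective throughput from 10 to about 6.7 GB/s, i.e. by a third, even though reads stayed just as fast.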