There's also a growing need to provide broad application support, not just a few specific TCP applications. Forget about in-house applications for a moment. Real-time applications are increasingly important to organizations, and they generally run over RTP/UDP, not TCP. So all of that corporate voice traffic, the telepresence and videoconferencing streams, and other applications such as VDI and desktop sharing are all going to run over UDP, and none of it will be improved by your WAN optimizer.
Optimizing a VoIP application is vastly different from optimizing a CIFS application, for example. You can't play with TCP windowing (there is no TCP windowing to play with), but there is still a lot you can do. You can dedupe the voice or video in streaming media, and you can strip out silence, much as the silence suppression often used on live VoIP lines does. Sadly, we cannot make ponderous windbags talk faster. You can prioritize voice and video so they take precedence over other applications on the line. You can also maintain packet ordering to keep jitter from degrading voice quality. In practice, this means enterprises can get a better-sounding voice system by using a wideband codec, such as G.722, between sites instead of a narrowband codec, such as G.711 or G.726, along with a voice service that's more resilient to changes in the WAN infrastructure.
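To make the packet-ordering point concrete, here is a minimal sketch of a reorder buffer keyed on RTP sequence numbers. It is an illustration under stated assumptions, not how any particular WAN optimizer works: the class name `ReorderBuffer` and its `max_held` parameter are hypothetical, and a real jitter buffer would also schedule playout by RTP timestamp, which this sketch omits. It only holds out-of-order packets briefly and releases them in sequence, handling the 16-bit wraparound of RTP sequence numbers.

```python
from typing import Dict, List, Optional, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


def seq_diff(a: int, b: int) -> int:
    """Signed distance from b to a, accounting for 16-bit wraparound."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


class ReorderBuffer:
    """Hypothetical reorder buffer: holds out-of-order packets and releases
    them in sequence order. If more than max_held packets are waiting on a
    gap, the missing packet is treated as lost and skipped."""

    def __init__(self, max_held: int = 4) -> None:
        self.max_held = max_held
        self.expected: Optional[int] = None
        self.held: Dict[int, bytes] = {}

    def push(self, seq: int, payload: bytes) -> List[Tuple[int, bytes]]:
        if self.expected is None:
            self.expected = seq  # first packet defines the starting point
        if seq_diff(seq, self.expected) < 0:
            return []  # too late: already released or skipped, so drop it
        self.held[seq] = payload
        out: List[Tuple[int, bytes]] = []
        while True:
            if self.expected in self.held:
                # Next packet in sequence is here: release it.
                out.append((self.expected, self.held.pop(self.expected)))
                self.expected = (self.expected + 1) % SEQ_MOD
            elif len(self.held) > self.max_held:
                # Gap has held up too many packets: give up on it.
                self.expected = (self.expected + 1) % SEQ_MOD
            else:
                break  # wait for the missing packet to arrive
        return out
```

The `max_held` cap is the trade-off knob: a larger value tolerates more reordering but adds delay, which matters for live voice.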
I think enterprises will prefer a WAN optimizer that can improve the performance of all applications, transparently. At the same time, as proxy-based firewalls have proven, vendors that ignore application-specific optimizations will be pressed to make their case. If they can't show that they deliver the same level of performance as application-specific optimizations, they too will move to improve application performance at higher layers. How that will be done is anyone's guess. If DPI is any guide, my bet would be a generalized engine for tuning the performance of any TCP or UDP application.