This isn't necessarily a bad thing. Companies are willing to replace servers every three years. Replacing data center switching every three years is uncommon today, but it may be more likely in the years ahead, particularly given the changes washing over data center switching, including SDN and 40 GbE.
You could make the sweeping statement that configuring multicast trees is an "unknown known" for most network engineers and be largely correct: very few companies have a practical use for multicast. (Notable exceptions exist in niche areas of the financial trading market.)
To complicate matters, securing and operationally "de-risking" multicast is complex and expensive. Check out this presentation for more information.
Even worse, a problem with the multicast protocol can take down the entire VXLAN overlay; the two fail as a single failure domain. Fate sharing across multiple services may be acceptable for enterprise networks and application developers, but it is not viable in hyper-scale cloud networks where hundreds of customers and services share the network. (Note that while this article was being published, Cisco announced enhancements to its Nexus 1000V virtual switch to remove the need for IP multicast in the network.)
VXLAN to Win--But Not As You Know It
So is VMware's network virtualization future based on STT or VXLAN? My guess is neither--instead, a new VXLAN will arise.
The case against STT is a lack of standards and market adoption. Ultimately, user data must leave the overlay network and reach the external world, and this means hardware support for tunnel termination. Network vendors already offer VTEP support for VXLAN in their hardware, and it's hard to imagine that adding STT support is worth their while. It's certainly possible for VMware to force a standard onto the market, but I don't think they have the appetite to upset major networking vendors, especially Cisco.
If not STT, then it must be VXLAN. I predict that VXLAN will be extended to support an SDN network controller design, and its dependence on multicast will be reduced or removed completely. An SDN controller that manages host and network configuration can remove the need for most frame flooding because the controller knows the MAC address and IP address of every device. Thus, ARP requests could be answered by the local vSwitch, and unknown unicast flooding is no longer needed because there are no unknown addresses.
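To make that idea concrete, here is a minimal sketch in Python of a controller-populated directory. The names and structure are hypothetical (this is not any vendor's API): the point is simply that once every endpoint is registered centrally, ARP can be answered locally and every unicast destination maps directly to a known VTEP, so nothing has to be flooded over the overlay.

```python
# Hypothetical sketch: a controller-populated endpoint directory that lets the
# local vSwitch suppress ARP floods and avoid unknown-unicast flooding.

from dataclasses import dataclass
from typing import Optional, Dict, Tuple


@dataclass
class Endpoint:
    mac: str    # VM MAC address, learned from the controller, not from flooding
    ip: str     # VM IP address
    vtep: str   # IP of the host (VTEP) where the VM currently runs
    vni: int    # VXLAN Network Identifier of the VM's segment


class OverlayDirectory:
    """Controller's view of every endpoint in the overlay."""

    def __init__(self) -> None:
        self._by_ip: Dict[Tuple[int, str], Endpoint] = {}
        self._by_mac: Dict[Tuple[int, str], Endpoint] = {}

    def register(self, ep: Endpoint) -> None:
        # Called by the controller whenever a VM is created or migrated.
        self._by_ip[(ep.vni, ep.ip)] = ep
        self._by_mac[(ep.vni, ep.mac)] = ep

    def answer_arp(self, vni: int, target_ip: str) -> Optional[str]:
        # ARP suppression: the local vSwitch answers from the directory
        # instead of broadcasting the request across the overlay.
        ep = self._by_ip.get((vni, target_ip))
        return ep.mac if ep else None

    def vtep_for(self, vni: int, dest_mac: str) -> Optional[str]:
        # Every MAC is known, so there is no "unknown unicast" to flood;
        # the frame goes straight into a VTEP-to-VTEP tunnel.
        ep = self._by_mac.get((vni, dest_mac))
        return ep.vtep if ep else None


# Example: two VMs on segment VNI 5001, hosted on different VTEPs.
directory = OverlayDirectory()
directory.register(Endpoint("00:50:56:aa:aa:01", "10.0.0.11", "192.0.2.1", 5001))
directory.register(Endpoint("00:50:56:aa:aa:02", "10.0.0.12", "192.0.2.2", 5001))

print(directory.answer_arp(5001, "10.0.0.12"))        # 00:50:56:aa:aa:02
print(directory.vtep_for(5001, "00:50:56:aa:aa:02"))  # 192.0.2.2
```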
Consider that VMware's vSwitch and vSphere Distributed Switch (vDS) are actually part of a controller network--vCenter is a "controller" that knows all of the hosts and their MAC addresses, and configures all of the vSwitches across the network.
It wouldn't take much to add a vmknic for VXLAN interfaces to the vSwitch code, and then set up some configuration in the controller to configure all the endpoints in the ESXi vSwitch. That's what Nicira was doing in Open vSwitch and that, I think, is why VMware bought Nicira.
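For a rough sense of what "configuration pushed down from a controller" looks like in Open vSwitch terms, here is a small sketch. The values are made up and the setup is assumed (Open vSwitch installed on the host, an existing bridge named br-int, and a peer VTEP list supplied by the controller); a controller would run logic along these lines on each host to build point-to-point VXLAN tunnel ports instead of relying on multicast flood-and-learn.

```python
# Sketch only: program VXLAN tunnel ports on a local Open vSwitch bridge.
# Assumptions: ovs-vsctl is installed, a bridge "br-int" exists, and the
# peer VTEP list (example addresses below) is pushed down by a controller.

import subprocess

VNI = 5001                                # VXLAN Network Identifier for this segment
PEER_VTEPS = ["192.0.2.2", "192.0.2.3"]   # example peer VTEP addresses from the controller

for i, remote_ip in enumerate(PEER_VTEPS):
    port = f"vxlan{i}"
    subprocess.run(
        [
            "ovs-vsctl", "add-port", "br-int", port,
            "--", "set", "interface", port,
            "type=vxlan",
            f"options:remote_ip={remote_ip}",
            f"options:key={VNI}",
        ],
        check=True,
    )
```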