Vendors of products that work at Layer 4 and above like to quote performance numbers measured with large packet sizes and a handful of sessions, because most products perform better when dealing with large chunks of data from a few sources. But these numbers aren't particularly useful, because real-world traffic consists mostly of small packets spread across many concurrent sessions. Before you trust anyone's performance claims, including those of third parties, make sure they explain how the test was conducted and the nature of the traffic used.
Latency, to paraphrase RFC 1242, is the time interval from when the last bit of a frame enters the SUT to when the first bit exits it. This is the definition for store-and-forward devices--a category that includes routers, switches, load balancers and security equipment--because each packet is received into a buffer, processed and then sent on its way. Latency is a problem for both transactional processing, such as database queries, and interactive processing, such as telnet or SSH (Secure Shell). Measuring latency accurately is difficult, in part because one-way measurements require tightly synchronized clocks on both sides of the SUT, but you can get reliable numbers by placing the testing devices adjacent to one another, as we did in our test of midrange Fibre Channel switches (see "High on Fibre").
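To make the idea concrete, here's a minimal sketch in Python of a round-trip latency probe. Timing the round trip from a single host sidesteps the clock-synchronization problem, for the same reason adjacent test devices do. The echo responder's address and port are hypothetical; point them at whatever echo service sits behind your SUT.

```python
# Minimal sketch: round-trip latency probe against a UDP echo service.
# Timing a round trip from one host avoids the need for synchronized
# clocks on both sides of the SUT.
import socket
import time

SUT_HOST, SUT_PORT = "192.0.2.1", 7  # hypothetical echo service behind the SUT
PROBES = 100

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)

samples = []
for seq in range(PROBES):
    payload = seq.to_bytes(4, "big")
    start = time.perf_counter()
    sock.sendto(payload, (SUT_HOST, SUT_PORT))
    try:
        data, _ = sock.recvfrom(64)
    except socket.timeout:
        continue  # a timeout is loss, not latency; skip the sample
    if data[:4] == payload:
        samples.append(time.perf_counter() - start)

if samples:
    samples.sort()
    print(f"min {samples[0]*1e3:.3f} ms  "
          f"median {samples[len(samples)//2]*1e3:.3f} ms  "
          f"max {samples[-1]*1e3:.3f} ms")
```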
Loss is just that: lost packets or, in the case of TCP, lost sessions. It can be as simple as the SUT refusing to accept new traffic, or the SUT dropping older, queued traffic in favor of new traffic. Loss is commonly the limiting factor in throughput testing.
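The arithmetic behind loss measurement is simple, as this sketch shows. The counter values are illustrative; in a real test, the offered and received counts come from the traffic generator and the receiver.

```python
# Minimal sketch of the loss arithmetic used in throughput testing:
# offer a known packet count, count what arrives, report the difference.
def loss_rate(offered: int, received: int) -> float:
    """Fraction of offered packets the SUT failed to deliver."""
    if offered == 0:
        raise ValueError("no packets offered")
    return (offered - received) / offered

# Example: 1,000,000 packets offered, 988,500 delivered -> 1.15% loss.
print(f"{loss_rate(1_000_000, 988_500):.2%}")
```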
Of course, how an SUT performs during a test can vary widely based on the traffic presented. Most Layer 2/3 devices, when presented with a traffic volume that exceeds capacity, will simply start to drop packets, introducing loss. Latency tends not to manifest itself until the capacity is nearly reached and the SUT is overloaded.
On the other hand, when application-layer devices are tested, per-connection latency rises earlier as throughput increases, because new connections queue up while others are being processed. Especially at Layer 7, latency is often what limits the useful load an SUT can support, long before actual throughput constraints are met. For example, an SUT that processes Web application transactions with latencies under load in the seconds or tens of seconds is all but useless, no matter what its raw throughput numbers say.
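One way to see this knee is to step up concurrency and watch per-transaction latency, as in the sketch below. SUT_URL and the concurrency steps are assumptions; the point is the pattern: when the median jumps from milliseconds to seconds, you've found the useful limit.

```python
# Minimal sketch: time complete HTTP transactions at increasing
# concurrency levels and watch the latency distribution climb.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SUT_URL = "http://192.0.2.10/"  # hypothetical Web application behind the SUT
REQUESTS_PER_STEP = 200

def timed_get(url: str) -> float:
    """Time one full transaction; failures count as infinite latency."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
    except OSError:
        return float("inf")
    return time.perf_counter() - start

for workers in (1, 8, 32, 128):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(lambda _: timed_get(SUT_URL),
                                    range(REQUESTS_PER_STEP)))
    print(f"{workers:4d} concurrent: "
          f"median {latencies[len(latencies)//2]*1e3:.0f} ms, "
          f"p95 {latencies[int(len(latencies)*0.95)]*1e3:.0f} ms")
```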
There are significant differences in how various testing tools measure performance, but the critical component is how they interact with the SUT. The two broad classes of tools are traffic generators, which create large numbers of packets of the sort that may or may not be produced by a working network stack, and transaction generators, which send and receive real transactions over a genuine, working network stack. The main differentiator is that transaction generators implement a true network stack, up through the application layers.
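The contrast is easy to show in code. In this sketch, the first half crafts bare TCP SYNs with Scapy--packets no real stack would ever follow up on--while the second half runs a complete HTTP transaction over the operating system's own stack. The target address is hypothetical, and the Scapy half requires root privileges.

```python
# Minimal sketch of the two tool classes.
from scapy.all import IP, TCP, send   # traffic generation: raw crafted packets
import http.client                    # transaction generation: real stack

SUT_ADDR = "192.0.2.10"  # hypothetical system under test

# Traffic generator: 1,000 SYNs with no handshake completion and no
# application data -- packets a working stack would never leave dangling.
send(IP(dst=SUT_ADDR) / TCP(dport=80, flags="S"), count=1000, verbose=False)

# Transaction generator: one genuine HTTP transaction -- handshake,
# request, response, teardown -- exactly what a real client produces.
conn = http.client.HTTPConnection(SUT_ADDR, 80, timeout=10)
conn.request("GET", "/")
body = conn.getresponse().read()
conn.close()
print(f"received {len(body)} bytes over a real TCP session")
```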