Performance, Latency and Loss
Before we get to specific tools, let's define the goals and measures of performance testing.
First, throughput and latency aren't goals of performance testing, but measures. Typically, the goal of a test is to see whether the system under test (SUT) can handle current or future traffic conditions. It's not enough to test for raw bits per second at a fixed frame rate, because that doesn't come close to modeling real traffic. The goal of any performance test should be to model traffic based on known or estimated patterns. Once the traffic is modeled, the same test can be run against any number of systems, and their performance compared, as the sketch below illustrates.
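As a rough illustration of what "modeling traffic" can mean in practice, the following sketch defines a traffic profile as a weighted mix of frame sizes (loosely in the spirit of a simple IMIX) and converts a target bit rate into the frames per second you would need to offer. The mix values and names are illustrative assumptions, not a standard; substitute the patterns you actually observe or expect.

```python
# Illustrative only: a traffic "model" expressed as a weighted mix of frame sizes.
# Adjust the mix to match the patterns observed or estimated on your own network.
TRAFFIC_MIX = [
    # (frame size in bytes, share of frames)
    (64, 0.58),
    (594, 0.33),
    (1518, 0.09),
]

def frames_per_second(target_bps: float, mix=TRAFFIC_MIX) -> float:
    """Frames/s needed to offer target_bps with the given frame-size mix."""
    avg_frame_bits = sum(size * 8 * share for size, share in mix)
    return target_bps / avg_frame_bits

if __name__ == "__main__":
    for rate in (100e6, 1e9):
        print(f"{rate / 1e6:.0f} Mbit/s ~ {frames_per_second(rate):,.0f} frames/s")
```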
The IETF's RFC 1242, which specifies benchmarking terminology for network interconnection devices, defines throughput as the maximum rate at which there is no frame loss. However, the terminology in RFC 1242 is aimed at Layer 2/3 devices. The concept of frame loss breaks down when testing moves to TCP-oriented traffic, because TCP's error-control and congestion-recovery mechanisms mean that packets, which are carried in frames, may be lost along the way and retransmitted without the application ever noticing. When testing TCP, you'll typically want to find the maximum rate, in bits per second, that can be achieved before sessions start to fail.
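To make the RFC 1242 notion of throughput concrete, here is a minimal sketch of the search procedure commonly used in practice (RFC 2544 formalizes the throughput test itself): offer traffic at a rate, check for frame loss, and search for the highest rate at which no frames are dropped. The offer_load() function is a hypothetical hook into whatever traffic generator you use, and the binary-search loop is one common way to run the search, not the only one.

```python
def offer_load(rate_bps: float, duration_s: float) -> int:
    """Hypothetical hook: drive your traffic generator at rate_bps for
    duration_s and return the number of frames lost during the trial."""
    raise NotImplementedError

def find_throughput(line_rate_bps: float, trial_s: float = 60,
                    resolution_bps: float = 1e6) -> float:
    """Binary-search for the highest offered rate with zero frame loss."""
    low, high = 0.0, line_rate_bps
    best = 0.0
    while high - low > resolution_bps:
        rate = (low + high) / 2
        if offer_load(rate, trial_s) == 0:
            best, low = rate, rate   # no loss: try a higher rate
        else:
            high = rate              # loss seen: back off
    return best
```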
Raw bits per second are of little interest unless you're testing modems and other bit-oriented devices. When testing devices higher up the OSI model, especially at Layer 3 and above, remember two things: a number of factors, including packet size, can alter the SUT's performance, and once you hit Layer 4 and above, you also have to worry about the number of concurrent sessions.
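As one way to fold both of those variables into a test run, the sketch below drives iperf3 across a few write sizes and parallel-stream counts and records the achieved TCP rate. It assumes iperf3 is installed and on the PATH, that the server address is reachable, and that the JSON output exposes the fields shown; verify the field names against your iperf3 version before relying on them.

```python
import json
import subprocess

SERVER = "192.0.2.10"  # hypothetical iperf3 server address

def run_iperf3(length: str, streams: int, seconds: int = 10) -> float:
    """Run one iperf3 TCP test and return the received rate in bits/s.
    -l sets the write size, -P the number of parallel streams, -J asks
    for JSON output (flag meanings assumed from iperf3's documentation)."""
    out = subprocess.run(
        ["iperf3", "-c", SERVER, "-t", str(seconds),
         "-l", length, "-P", str(streams), "-J"],
        capture_output=True, text=True, check=True).stdout
    result = json.loads(out)
    # Field names assumed from iperf3's JSON layout; confirm for your version.
    return result["end"]["sum_received"]["bits_per_second"]

if __name__ == "__main__":
    for length in ("1K", "8K", "64K"):
        for streams in (1, 8, 64):
            bps = run_iperf3(length, streams)
            print(f"write={length:>4} streams={streams:>3} -> {bps / 1e6:8.1f} Mbit/s")
```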