One of the biggest challenges in designing Virtual Environment (VE) training systems is identifying the fidelity requirements for the component technologies. Initial fidelity-related design decisions are often motivated by the belief that the more accurately a VE stimulates individual components of the human sensory system, the more likely it is to provide effective training. Because stimuli in the real world are not presented in a simple, scripted manner, this goal is probably unrealistic. The development of effective VE training systems therefore requires a more holistic approach, one that focuses on how the sensory systems converge to support task-level performance within the VE. Evaluating the success of this approach in turn requires performance metrics that relate a component’s fidelity to training outcomes for different types of sensory information. The current work discusses an initial application of this method to investigate the relationship between system design and performance in the context of a basic Military Operations in Urban Terrain (MOUT) task. While the results provide specific design recommendations for MOUT training, they also suggest a broader application to designing, testing, and evaluating training systems.