Aircraft operation involves many facets of multi-tasking (MT), and breakdowns in task management have serious implications for performance and safety. There is a need to develop valid, predictive tests of MT ability and to provide practical criterion measures of MT performance. Thirty advanced students (15 pilot-copilot pairs) in a university flight training program were tested in a medium-fidelity King Air simulator. The 30-minute scenario was designed to task-load both pilot and copilot during an event-filled instrument approach involving an unpublished holding point, a challenging procedure turn, and a steep descent with loss of glideslope and worsening weather, culminating in a missed approach. Two observers independently scored subjects' performance on 65 MT-relevant behavioral events and rated them on six process dimensions of MT. Subjects also took a battery of predictive tests, including two specially designed MT tests and tests of fluid intelligence, processing speed, and aviation mathematics. The MT tests assessed the ability to simultaneously monitor multiple visual fields and to hold multi-dimensional concepts in memory. The scenario challenged MT for all crews, with 40% of the events showing evidence of disruption. Interobserver reliability for MT ratings and sub-event performance was high (r = .85 to .91), and MT test reliability coefficients exceeded .80. The MT criterion measures showed a complex relationship to the predictive tests. One MT test was significantly related to performance for copilots but not for pilots. The other MT test predicted two key archival measures of student proficiency (hours to instrument rating and number of extra flights). The paper closes with the study's implications for developing other criterion measures of MT, adapting MT tests for student placement, and developing MT training programs for "at-risk" student pilots.