If a training event happens and no one builds a record of its gains and outcomes, does it matter? How do you know
that the gains and outcomes you recorded, or the tools you used to make that record, are even valid and generalizable
to other situations? Are you really improving human performance, or just inferring that you improved it? It’s a
challenge faced by all research communities (Teijlingen & Hundley, 2002), whether soliciting survey data in support of human factors assessments or training effectiveness analyses. The challenge is compounded in multinational events, where results contribute to a shared end state for the coalition. To create a valid new measurement
apparatus, reliability and validity must be established and correlations demonstrated among subscales. However, that requires time, results from a comparable apparatus or repeated tests, and access to audiences that many
researchers lack. During Bold Quest 15.1, two apparatuses were run for precisely this testing and validation purpose
and presented to the multinational training audience under one of two circumstances: uncommented testing of the
apparatuses or careful explanation of the validation and verification purpose. Two-hundred and seven participants
provided over 1600 free text responses which were taken as indicators of their engagement with each apparatus,
compared against a non-pilot-tested survey. The pilot-tested apparatuses that were actively administered, elicited
significantly more productive responses from the participants than the passive administration groups.
Recommendations focus on optimizing apparatuses that cannot be translated into a native language because of constraints, and offer suggestions for bolstering both pilot-tested and non-pilot-tested apparatuses.