Recently, increasingly realistic 3D visual displays have been designed to serve as new, more ecologically valid alternatives to conventional 2D visual displays. However, research has thus far provided inconsistent evidence regarding the effectiveness of 3D displays in facilitating training and task performance. To examine how "immersion" contributes to individuals' ability to spatially transform 3D images, we compared subjects' performance on spatial transformation tasks in traditional 2D non-immersive (2DNI), 3D non-immersive (3DNI: stereo glasses), and 3D immersive (3DI: head-mounted display with position tracking) environments. Twenty-five participants completed a number of spatial transformation tasks, in which they were asked either to mentally rotate 3D objects along different planes (mental rotation task) or to mentally rotate their imagined selves within the environment (perspective-taking task). While the patterns of subjects' responses did not differ significantly between the 2DNI and 3DNI environments, we found a unique pattern of responses in the 3DI environment. Our findings suggest that 2DNI and 3DNI environments might encourage the use of more "artificial" encoding strategies, in which the 3D images are encoded with respect to a scene-based frame of reference (i.e., the computer screen). In contrast, 3DI environments can provide the feedback an individual needs to use the same strategy and egocentric spatial frame of reference that he or she would use in a real-world situation. Overall, the results of this study suggest that immersivity might be one of the most important factors to consider for assessment and training in domains that rely on visual-spatial performance and require strong spatial transformation skills (e.g., robotics, navigation, surgery).