Quantifying the phenomenon of immersion in virtual environments

The emergence of new media fuelled by information technology has had a significant impact on the individual and on postmodern society as a whole. A distinctive feature of new media is the active role of the user: media and individuals are not separate but define and create each other. Nowhere is this seamless melding more apparent than in the emerging field of virtual worlds, commonly known as Virtual Reality (VR).

A growing body of research indicates that a human subject is capable of distributing her attention across the VR Environment (VRE) and that the experience of ‘being present’ or ‘immersed’ can be more intense than the corresponding experience in the ‘physical’ world. However, although the disruptive technology of VR has already found applications in a number of fields (medical and psychological treatment, gaming, education), there is currently no standard or commonly agreed measure of immersion. The fundamental thesis of this paper is that the effect of VREs and related immersive technologies can be successfully studied only through a transdisciplinary approach that combines qualitative theoretical models, widely discussed in media studies, phenomenology and psychology, with quantitative, data-driven empirical models based on Big Data and modern advances in Artificial Intelligence (AI), Machine Learning (ML) and computational modelling.

The initial aim of this research is to model the phenomenon of immersion by automatically capturing and analysing the kinematic features and expressions of the body. The paper examines visual and audio methods for the statistical modelling of immersion, based on behavioural modalities of the physical sensations experienced in a VRE. It focuses on video data in particular, where key body parts and relevant joint locations are automatically identified and their movements tracked and quantified. Similarly, for audio processing, various acoustic features are computed and ML algorithms for detecting emotional motifs are developed.
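As a rough illustration of the kind of pose-based kinematic analysis described above, the following sketch extracts joint trajectories from a video and summarises them as simple movement statistics. It assumes the MediaPipe Pose and OpenCV libraries; the chosen landmarks (shoulders, elbows, wrists) and the mean-speed/variance features are illustrative assumptions, not the actual feature set used in this work.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose

def extract_joint_trajectories(video_path, joints=(11, 12, 13, 14, 15, 16)):
    """Track selected body landmarks across frames and return their
    normalised (x, y) coordinates as a (frames x joints x 2) array.
    Landmark indices 11-16 are MediaPipe's shoulders, elbows and wrists
    (an illustrative choice)."""
    trajectories = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks is None:
                continue  # skip frames where no body was detected
            lm = results.pose_landmarks.landmark
            trajectories.append([(lm[j].x, lm[j].y) for j in joints])
    cap.release()
    return np.asarray(trajectories)

def kinematic_features(trajectories, fps=30.0):
    """Simple per-joint movement statistics: mean speed and speed variance."""
    velocities = np.diff(trajectories, axis=0) * fps   # frame-to-frame displacement per second
    speed = np.linalg.norm(velocities, axis=-1)        # (frames-1 x joints)
    return np.concatenate([speed.mean(axis=0), speed.var(axis=0)])
```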

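Analogously, a minimal sketch of the audio feature extraction, assuming the librosa library; the particular descriptors (MFCCs, RMS energy, YIN pitch) and their mean/standard-deviation summaries are illustrative choices rather than the study's actual feature set.

```python
import librosa
import numpy as np

def audio_features(wav_path, sr=16000):
    """Frame-level audio descriptors commonly used for affect detection,
    summarised as a fixed-length feature vector."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # (13 x frames)
    rms = librosa.feature.rms(y=y)                             # short-time energy
    f0 = librosa.yin(y, fmin=librosa.note_to_hz('C2'),
                     fmax=librosa.note_to_hz('C7'), sr=sr)     # pitch track
    # Summarise each descriptor by its mean and standard deviation
    parts = [mfcc.mean(axis=1), mfcc.std(axis=1),
             [rms.mean(), rms.std(), np.nanmean(f0), np.nanstd(f0)]]
    return np.concatenate([np.atleast_1d(p) for p in parts])
```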
Both supervised statistical models, such as Deep Neural Networks (DNNs), and unsupervised models, such as spectral methods for dimensionality reduction and clustering and non-parametric kernel methods for density estimation, are employed to analyse the high-dimensional data sets. In addition, the paper demonstrates how biometric signals of immersion can be estimated from video data alone, without the use of biometric equipment.
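To illustrate the unsupervised part of this toolbox, the sketch below projects high-dimensional behavioural features onto a low-dimensional spectral embedding, clusters them, and fits a non-parametric kernel density estimate. It assumes scikit-learn; the hyperparameters (embedding dimension, number of clusters, kernel bandwidth) are placeholder values, not those used in the study.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import SpectralClustering
from sklearn.neighbors import KernelDensity

def unsupervised_analysis(features, n_components=2, n_clusters=3, bandwidth=0.5):
    """Spectral embedding + clustering of behavioural feature vectors,
    followed by a Gaussian kernel density estimate over the embedding."""
    embedding = SpectralEmbedding(n_components=n_components).fit_transform(features)
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='nearest_neighbors').fit_predict(features)
    kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(embedding)
    log_density = kde.score_samples(embedding)   # log p(x) for each sample
    return embedding, labels, log_density
```

In practice, the embedding dimension and the kernel bandwidth would be chosen by cross-validation or informed by domain knowledge rather than fixed in advance.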