Technology for speech and language recognition is advancing faster than the scientists who pioneered the field thought possible. Their creativity and labor led to the voices behind popular artificial intelligence systems like Alexa and Siri. In a new study from the Japan Advanced Institute of Science and Technology (JAIST) and the Institute of Scientific and Industrial Research at Osaka University, researchers included physiological signals in the analysis of emotional intelligence for the first time, with favorable results.
Artificial intelligence (AI) is the theory and development of computer systems able to perform tasks that normally require human intelligence. A key feature in the development of dialog AI systems is emotional intelligence: in addition to understanding the content of a message, a system that recognizes the user's emotional state can respond empathically, making the interaction feel less artificial.
“Multimodal sentiment analysis” refers to a group of methods that set the standard for AI dialog systems with sentiment detection. These methods discern a person’s psychological state from their voice color (tone), facial expression, posture, and speech, parameters that are essential for human-centered AI systems.
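For readers curious how such multimodal fusion works in practice, here is a minimal, purely illustrative Python sketch: per-exchange feature blocks from several modalities are concatenated and fed to a single classifier. The feature names, dimensions, and the choice of classifier are assumptions made for demonstration, not the method used in the study.

```python
# Minimal sketch of feature-level fusion for multimodal sentiment estimation.
# All data here is synthetic; modality names and sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_exchanges = 200  # hypothetical number of dialog exchanges

# Hypothetical per-exchange feature vectors, one block per modality.
features = {
    "speech_text": rng.normal(size=(n_exchanges, 16)),  # e.g. text embeddings
    "voice_color": rng.normal(size=(n_exchanges, 8)),   # e.g. prosodic / timbre statistics
    "face":        rng.normal(size=(n_exchanges, 8)),   # e.g. facial-expression descriptors
    "posture":     rng.normal(size=(n_exchanges, 4)),   # e.g. body-pose descriptors
}
labels = rng.integers(0, 2, size=n_exchanges)  # 1 = positive sentiment, 0 = negative

# Feature-level fusion: concatenate the modality blocks and train one classifier.
X = np.hstack(list(features.values()))
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```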
Current emotion-estimation methods rely only on observable information and leave out physiological signals. Systems that can also sense physiological information could considerably improve sentiment estimation.
“Humans are very good at concealing their feelings. The internal emotional state of a user is not always accurately reflected by the content of the dialog, but since it is difficult for a person to consciously control their biological signals, such as heart rate, it may be useful to use these for estimating their emotional state. This could make for an AI with sentiment estimation capabilities that are beyond human,” explains Shogo Okada, professor at JAIST, in a statement.
Okada and Kazunori Komatani, a professor at Osaka University, used a multimodal dialogue data set named “Hazumi1911,” which combines speech recognition, voice-color sensing, facial expression and posture detection with skin potential, a form of physiological response sensing. They analyzed 2,468 exchanges from 26 participants to estimate the level of enjoyment each user experienced during the conversation; the users then reported how enjoyable or boring they found it.
“On comparing all the separate sources of information, the biological signal information proved to be more effective than voice and facial expression. When we combined the language information with biological signal information to estimate the self-assessed internal state while talking with the system, the AI’s performance became comparable to that of a human,” comments Dr. Okada.
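As a rough illustration of the kind of comparison Okada describes, the sketch below trains simple models on synthetic data to contrast language-only, biosignal-only, and combined feature sets. Every name, number, and the classifier used are assumptions for the sake of the example, not the paper's actual protocol or results.

```python
# Illustrative comparison of single-modality vs. combined language + biosignal models.
# Synthetic data only; the "skin_potential" features and binary enjoyment label
# are assumptions for this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300  # hypothetical number of exchanges

enjoyment = rng.integers(0, 2, size=n)  # 1 = user reported enjoying the exchange
# Weakly informative language features and (in this toy setup) more informative biosignals.
language = rng.normal(size=(n, 16)) + enjoyment[:, None] * 0.3
skin_potential = rng.normal(size=(n, 4)) + enjoyment[:, None] * 0.5

def cv_accuracy(X, y):
    """Mean 5-fold cross-validated accuracy of a logistic-regression model."""
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

print("language only:            ", round(cv_accuracy(language, enjoyment), 3))
print("skin potential only:      ", round(cv_accuracy(skin_potential, enjoyment), 3))
print("language + skin potential:", round(cv_accuracy(np.hstack([language, skin_potential]), enjoyment), 3))
```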
These findings, published in the journal IEEE Transactions on Affective Computing, suggest that incorporating human physiological signals could lead to more sophisticated, emotionally intelligent AI dialog systems and more natural human-machine interactions. Such systems could also help identify and monitor mental illness by sensing emotional states.
There is talk around the water cooler about the next spectacular device – by entering a number of just 7-10 digits, the user is connected with the intended receiver in seconds. Hands are freed for other tasks. The user can speak directly into the device (or use a headset) to communicate with another user, in real time! With response times minimized, communication is more efficient, and productivity is greater. It’s called a telephone.