The ultimate goal of many digital technologies is to provide a better experience for people. Today, technologies are moving beyond useful features that enhance convenience and evolving to understand and empathize with people. Cars are no exception to this trend. In the past, reliability and comfort were the key values of a car. Since most cars sold today are reliable and comfortable, to stay competitive today's cars need to evolve into machines that can understand and empathize with the people inside. The Hyundai Mobis Convergence UX Technology Team believes emotion recognition technology will play an important role in that future.
Q. Detecting emotion is not an easy task even for humans. Couples, for instance, often fight over misunderstood emotions, so emotion recognition must be a rather tough task for machines. What is the key mechanism that allows machines to read emotions?
First, the camera captures moving images of facial features such as the eyes, eyebrows, and lips to detect changes in facial expression. We then build an algorithm that can predict what a facial expression means by analyzing the expressions and muscle movements associated with different emotions.
Cameras have been around for more than 100 years, but coupled with AI and machine learning they can now turn images into data for highly detailed analysis. AI can help interpret what subtle changes in facial expression mean in terms of emotions. The rapid progress made in emotion recognition technology has created a lot of buzz lately.
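As a rough illustration of that pipeline, here is a minimal Python sketch: a stock OpenCV face detector crops the face from each camera frame, and a hypothetical classify_emotion function stands in for a trained expression model, which is not specified here.

```python
import cv2  # pip install opencv-python

def classify_emotion(face_img) -> str:
    """Hypothetical stand-in for a trained facial-expression model."""
    ...  # e.g. a network trained on images labeled with emotions
    return "neutral"

# Stock Haar-cascade face detector that ships with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # in-cabin camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Locate faces, crop each one, and classify its expression.
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        face = frame[y:y + h, x:x + w]
        print(classify_emotion(face))
```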
Q. So AI reads people's emotions just like in the movies? Is that how it works?
In 2016, Se-dol Lee scored a single victory against AlphaGo. Today, however, no one can beat AlphaGo, because AI performance improves dramatically over time. Similarly, emotion recognition technology has improved at an accelerated rate as facial expression data and the capacity to process it have grown.
Q. We understand that the camera and AI play an important role. Is any other technology needed?
The most basic requirement is securing high-quality images. A photo of 10 million pixels contains far more information than one of 1 million pixels. Another important technology is the cloud: the system must rely on computing power that cannot fit into a car. It is possible to fit a fairly powerful computer in a car, but its capacity is already reserved for navigation and autonomous driving. Thankfully, cloud services allow cars to tap into the processing power of remote servers.
Even if the onboard computer becomes powerful enough to process facial expression images, the reference data for emotion recognition needs to be stored on a server. The speed of 5G networks can be very helpful in this context: even a five-second processing delay makes the system far less useful, so recognition needs to happen in real time. In summary, a large body of reference data is key to improving the accuracy of emotion recognition, and the capacity for instant processing is a key requirement to make it work.
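To make that division of labor concrete, a minimal sketch of the in-car side might look like the following; the endpoint URL, response format, and latency budget are purely illustrative assumptions, not an actual Hyundai Mobis API.

```python
import time
import requests  # pip install requests

API_URL = "https://emotion.example.com/v1/recognize"  # hypothetical endpoint

def recognize_remotely(jpeg_bytes: bytes) -> str:
    """Ship one camera frame to the cloud and return the detected emotion."""
    start = time.monotonic()
    resp = requests.post(
        API_URL,
        data=jpeg_bytes,
        headers={"Content-Type": "image/jpeg"},
        timeout=0.5,  # real-time budget: a delay of seconds is far too slow
    )
    resp.raise_for_status()
    print(f"round trip: {(time.monotonic() - start) * 1000:.0f} ms")
    return resp.json()["emotion"]  # e.g. "happy", "stressed", ...
```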
Q. As of April 2019, the world population had reached 7.7 billion. Almost every individual looks different and shows wide variation in facial expressions. Isn't this too much even for an AI?
We had an interesting experience with a game that used the emotion recognition technology developed for the M.Vision concept car. The game had falling blocks made of emojis representing certain emotions. Many people tried it, and interestingly, people from Western cultures did noticeably better than people from Eastern cultures. We think this is because people born and raised in Asia are taught to refrain from expressing their emotions.
By contrast, people born and raised in Western cultures have much more distinctive facial expressions. Most Asian players scored below 500, while one middle-aged American woman scored 3,500. This suggests that the system needs further improvement to capture the emotions expressed by people from different cultures.
Q. If we succeed in developing highly advanced emotion recognition technology, what would be another possible application for it?
The entertainment industry would be the first to benefit from the technology, but a safety system based on driver monitoring would be a key application as well. We already have technologies such as driver attention warning, which detects erratic driving behavior and intervenes, but emotion recognition has its own unique applications.
For example, there is a growing problem of aggressive driving. It is highly unlikely for a driver to engage in such behavior if he or she is in a good mood, especially with friends and family on board. People are far more likely to drive aggressively when they are under a lot of stress, so emotion recognition technology could detect the driver's mood and take mitigation measures such as limiting the top speed, prompting the driver to take a break, or playing upbeat music.
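As a sketch of how such a policy might be wired up (the labels, threshold, and action names below are hypothetical, not production calibrations):

```python
def mitigation_actions(emotion: str, stress_level: float) -> list[str]:
    """Map a detected driver state to mitigation measures (illustrative only)."""
    actions = []
    if emotion in ("angry", "stressed"):
        actions.append("play_upbeat_music")
        if stress_level > 0.7:  # hypothetical threshold
            actions += ["limit_top_speed", "suggest_break"]
    return actions

print(mitigation_actions("stressed", 0.8))
# ['play_upbeat_music', 'limit_top_speed', 'suggest_break']
```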
Q. Is there any biometric information besides facial expressions and muscle movement which can be used to recognize emotional states?
Yes. There are technologies based on analyzing pulse, heartbeat, and breathing patterns, but we are not focusing on those at present. Some start-ups are working on skin-mounted sensors that detect changes in conductivity due to sweating, which can help recognize the subject's emotional state.
Q. What is the reason behind your team's focus on using a camera and AI to recognize emotion? You just said there are many other kinds of technology.
We strongly believe that emotion recognition needs to work without requiring a cumbersome wearable device or compromising the convenience of the driver or the passengers. Given these requirements, a camera-based solution is the best technology at present. All the driver has to do is keep their eyes on the road and drive; the camera and computers can do the rest with high accuracy.
In the future, it might be possible to install sensors in the seats that can accurately read skin conductivity or heartbeat, and we would be happy to use them. However, sensors that can detect a heartbeat are currently quite sensitive to vibration, and misreading the car's vibration as a sudden change in heartbeat could lead to serious malfunctions. Still, solutions based on other biometric information certainly have potential, so we are also looking into them.
Q. I heard that the recognition rate is quite high. But can it ever reach 100%?
A lot of teams, including many start-ups, are working on emotion recognition technologies, and some claim that the recognition rate has reached 80%. However, it is hard to say whether such a number is accurate. For example, it would be possible to achieve a 100% recognition rate if accuracy were judged simply by whether the system detected happiness or not. But there are many types and degrees of happiness, and feelings of happiness can vary significantly, so that 80% figure is quite subjective; the number could be very different depending on the criteria used. Therefore, we are focusing more on providing optimized services based on detected emotion rather than solely on improving recognition accuracy.
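A toy example makes the point: the same predictions can score 50% or 100% depending on whether accuracy is graded on fine-grained labels or on the coarse question "did it detect happiness?" (all data below is invented for illustration).

```python
# Collapse fine-grained labels into the coarse "happy" category.
COARSE = {"content": "happy", "excited": "happy", "amused": "happy"}

truth = ["content", "excited", "amused", "content"]
preds = ["excited", "excited", "content", "content"]

fine = sum(t == p for t, p in zip(truth, preds)) / len(truth)
coarse = sum(COARSE[t] == COARSE[p] for t, p in zip(truth, preds)) / len(truth)

print(f"fine-grained accuracy: {fine:.0%}")    # 50%
print(f"coarse accuracy:       {coarse:.0%}")  # 100%
```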
Q. What kind of services do you expect to be made available using emotion recognition technology?
New ideas for applying emotion recognition technology are constantly emerging as it continues to evolve. There are many ideas, but here is a fun one. Suppose the driver is in a great mood for some reason. The system could then crack a joke, or respond to a voice command in a much more casual tone, like an old friend. For example, it could say something like "I've got it" when asked to turn on the radio. It could make a silly remark. Or it could keep silent when that seems like the best thing to do. Sometimes it is best to just say nothing, you know. Overall, we are striving to improve the technology so that drivers find it pleasant to use.
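One way to picture such a service is a mood-aware reply table for the voice assistant; the moods and phrasings below are made up for illustration, not a product specification.

```python
import random

# Hypothetical reply styles keyed by detected mood; None means stay silent.
REPLIES = {
    "great": ["I've got it!", "You got it, boss."],  # casual, playful
    "neutral": ["Turning on the radio."],            # plain confirmation
    "bad": [None],                                   # sometimes silence is best
}

def respond(mood: str) -> None:
    reply = random.choice(REPLIES.get(mood, REPLIES["neutral"]))
    if reply is not None:  # None: carry out the command without a word
        print(reply)

respond("great")  # e.g. "I've got it!"
```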
Q. Was emotion recognition technology created as part of a vision for an autonomous future?
It is not directly related to autonomous driving, although there are some interesting synergies. As we mentioned earlier, we are focusing on developing technology that can help drivers calm down and drive in a civilized manner when they are in a bad mood. For example, the smart cruise control setting could be adjusted to keep more distance from the car in front. The camera can continuously monitor the driver's face and read emotions whether or not the car is in autonomous driving mode.
Q. I think there could be some negative side effects if emotion recognition becomes more advanced, such as people refusing to have their emotions read by a machine.
All technologies have limitations. A camera-based emotion recognition system needs to take videos of the user's face and send them to a centralized server for analysis, which could make people uneasy about using a system that stores video of them. Such concerns can be alleviated with strong security technology, but some people may simply not want to take such risks.
We hear in the news about Tesla drivers falling asleep at the wheel with Autopilot on. These things happen because people rely too heavily on the technology, and similar risks exist for emotion recognition. How should drivers react if the system says they are in a bad mood when they are not? They could become confused, or even feel pressured into actually feeling bad. We should be careful to avoid becoming overly dependent on the technology.
Q. How can emotion recognition technology be developed so that it improves the driving experience while minimizing negative impacts?
There was a comedy program in Korea about a driver arguing with, and refusing to follow, the directions given by the navigation system. Emotion recognition technology could end up helping drivers communicate with their cars. We think the focus should be on what the technology can do for people rather than on the technology itself. The goal of any technology should be to serve people, not merely to advance technology. It does not really matter which technology is used, whether it is emotion recognition or voice recognition.