Here’s something that might increase any feelings of paranoia that you experience around computing and cyberspace.
Researchers in the USA (Zoom on the Keystrokes: Exploiting Video Calls for Keystroke Inference Attacks) have shown that it is possible to analyse a recording of a video call (such as a Zoom, Teams, or Skype call) and use computer software to infer, with a fair degree of accuracy, what the person on the recording is typing. Neither the keyboard nor the user’s hands need to be visible on the recording. I confess that any paper whose introduction starts with “Catalyzed by the ubiquity of the Internet...” is unlikely to capture my undivided attention through to the end, but I think I’ve got the gist of it from Tom’s Guide and a skim of the paper itself.
The basis of the method is that the program looks at reference points in the face of the person in the video and then infers what keys have been pressed from the movement of the arms and shoulders relative to those facial reference points. It sounds fantastic (in the sense of “fanciful” rather than “great!”) and no-one is claiming that it is anywhere near 100% accurate, but it is definitely capable of stealing information.
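To make the "relative to those facial reference points" idea a bit more concrete, here is a minimal sketch in Python. It is not the researchers' code: the function name, the pixel coordinates and the idea of tracking a single shoulder point are all my own assumptions for illustration. It only shows why a facial reference point is useful: movement of the whole body cancels out, and what is left is the movement of the shoulder on its own.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def relative_motion(face: List[Point], shoulder: List[Point]) -> List[Point]:
    """Frame-to-frame shoulder movement, measured relative to a face point.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Position of the shoulder relative to the face in each frame.
    offsets = [(sx - fx, sy - fy) for (fx, fy), (sx, sy) in zip(face, shoulder)]
    # Change in that relative position between consecutive frames.
    return [
        (bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(offsets, offsets[1:])
    ]


# Made-up pixel coordinates for three frames. The whole body drifts right by
# 2 px per frame, and in the last frame the shoulder also dips by 3 px.
face = [(100.0, 50.0), (102.0, 50.0), (104.0, 50.0)]
shoulder = [(60.0, 120.0), (62.0, 120.0), (64.0, 123.0)]

motion = relative_motion(face, shoulder)
print(motion)  # [(0.0, 0.0), (0.0, 3.0)]
```

The sideways drift disappears and only the 3 px dip in the final frame survives. Turning small movements like that into guesses about which key was pressed is the hard part, and this sketch makes no attempt at it.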
If, for instance, it knows the email address of the person in the recording, then it can recognise, with about 90% accuracy, when that email address has just been typed. It then assumes that the next thing typed is a password. If the password is a good, strong, unique one then the analysis is going to struggle, but the supposed password can still be compared against a database of the million most common passwords. If the person in the recording has been lazy and/or predictable in creating their password then they may now be in danger. Remember, there will probably also be an audio track to the recording so, depending on the context, it could be completely obvious which account the password just gleaned belongs to.
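The reason a predictable password is so exposed can be sketched in a few lines of Python. This is purely illustrative and not how the paper does it: the six-entry list stands in for a real list of common passwords, the function name is hypothetical, and the standard library's `difflib` is my own choice for the fuzzy matching. The point is that an imperfect guess at the keystrokes is enough if the real password is sitting on a list of likely candidates.

```python
import difflib

# Tiny stand-in for a list of the most common passwords (assumed sample).
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "dragon", "monkey"]


def likely_passwords(inferred, candidates=COMMON_PASSWORDS, cutoff=0.7, n=3):
    """Return common passwords that closely resemble a noisy inferred string.

    Hypothetical helper: uses difflib's similarity ratio, not the paper's method.
    """
    return difflib.get_close_matches(inferred, candidates, n=n, cutoff=cutoff)


# One key wrongly inferred ("i" instead of "o") still lands on a common password.
weak = likely_passwords("passwird")

# A strong, unique password resembles nothing on the list.
strong = likely_passwords("T7#qLz!9vB")

print(weak)    # ['password']
print(strong)  # []
```

A strong password has no near neighbour on the list to correct towards, so every mis-inferred key stays wrong, which is why the analysis struggles with it.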
The paper’s authors go on to offer advice on how to mitigate the threat. This, naturally, revolves around reducing the accuracy of the analysis. So, wearing long sleeves reduces the accuracy of the measurement of arm movement, and reducing the frame rate or resolution of the video capture also reduces accuracy. Having long hair also affects the analysis, apparently (those were the days!). Some things you might think are relevant, but aren’t, include the make and size of the keyboard (but a “zwerty” keyboard instead of a normal “qwerty” one would probably complicate things). The researchers also acknowledge that they didn’t investigate differences in accuracy caused by the participant’s “error rate” when typing. I’m now thinking of other potential tactics, such as moving the keyboard by a few inches every now and then, or turning off the video when entering sensitive information.
When I first read about this, I thought that you’d have to be paranoid to be worried about it, but the more I think about it, the more realistic the threat appears to become (or the more paranoid I become). Clearly, if your video conference is with someone you trust (and you don’t fear anyone else getting hold of a recording of the session) then there’s probably not a lot to worry about. But what if you are on a conference call with 100 other people who you don’t know?
Will this be just a quirky bit of research that is soon forgotten, or might this become a major new threat to cyber security as the accuracy of the analysis improves? Dunno.