Google’s AI can now lip read better than humans after watching thousands of hours of TV

Researchers from Google’s AI division DeepMind and the University of Oxford have used artificial intelligence to create the most accurate lip-reading software ever. Using thousands of hours of TV footage from the BBC, scientists trained a neural network to annotate video footage with 46.8 percent accuracy. That might not seem impressive at first, especially compared with AI accuracy rates for transcribing audio, but when tested on the same footage, a professional human lip-reader was only able to get the right word 12.4 percent of the time.

The research follows similar work published by a separate group at the University of Oxford earlier this month. Using related techniques, those scientists created a lip-reading program called LipNet that achieved 93.4 percent accuracy in tests, compared to 52.3 percent human accuracy. However, LipNet was only tested on specially recorded footage of volunteers speaking formulaic sentences. By comparison, DeepMind’s software, known as “Watch, Listen, Attend, and Spell,” was tested on far more challenging footage: transcribing natural, unscripted conversations from BBC politics shows.