Recommended Reading: Behind the wheel of the 2023 Mercedes-Benz EQS SUV
2023 Mercedes-Benz EQS SUV first drive: Better because it’s bigger?
John Beltz Snyder, Autoblog
Our colleagues at Autoblog have some in-depth analysis of the 2023 Mercedes-Benz EQS SUV via Snyder’s first drive experience. While it’s similar to the EQS sedan, Snyder argues the SUV variant will likely be more popular.
Your smart thermostat isn’t here to help you
Ian Bogost, The Atlantic
A recent study found that smart thermostats don’t really save you money because you’re more likely to use the convenience of quick adjustments on your phone. So why are energy providers subsidizing them for customers? They’re gathering that sweet data and maybe even throttling your power consumption (with permission). Bogost argues that convenience is still worth it, especially when you don’t have to get out of bed to make yourself comfy.
America’s throwaway spies
Joel Schectman and Bozorgmehr Sharafedin, Reuters
This in-depth report examines how US intelligence failed its informants in Iran while it fought a covert war with Tehran. “A faulty CIA covert communications system” made it easy for Iranian officials to find sources, even if they had been otherwise careful about their work.
AI is already better at lip reading than we are
They Shall Not Grow Old, a 2018 documentary about the lives and aspirations of British and New Zealand soldiers living through World War I from acclaimed Lord of the Rings director Peter Jackson, had its hundred-plus-year-old silent footage modernized through both colorization and the recording of new audio for previously non-existent dialog. To get an idea of what the folks featured in the archival footage were saying, Jackson hired a team of forensic lip readers to guesstimate their recorded utterances. Reportedly, “the lip readers were so precise they were even able to determine the dialect and accent of the people speaking.”
“These blokes did not live in a black and white, silent world, and this film is not about the war; it’s about the soldier’s experience fighting the war,” Jackson told the Daily Sentinel in 2018. “I wanted the audience to see, as close as possible, what the soldiers saw, and how they saw it, and heard it.”
That is quite the linguistic feat given that a 2009 study found that most people can only read lips with around 20 percent accuracy and the CDC’s Hearing Loss in Children Parent’s Guide estimates that, “a good speech reader might be able to see only 4 to 5 words in a 12-word sentence.” Similarly, a 2011 study out of the University of Oklahoma saw only around 10 percent accuracy in its test subjects.
“Any individual who achieved a CUNY lip-reading score of 30 percent correct is considered an outlier, giving them a T-score of nearly 80, three times the standard deviation from the mean. A lip-reading recognition accuracy score of 45 percent correct places an individual 5 standard deviations above the mean,” the 2011 study concluded. “These results quantify the inherent difficulty in visual-only sentence recognition.”
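The arithmetic behind those figures is the standard T-score rescaling, in which z-scores (standard deviations from the mean) are mapped to a scale with mean 50 and standard deviation 10 — a quick sketch:

```python
# T-scores are z-scores rescaled to mean 50, standard deviation 10.
def t_score(z):
    """Convert a z-score (standard deviations from the mean) to a T-score."""
    return 50 + 10 * z

# Three standard deviations above the mean gives the "nearly 80" outlier
# score the study describes; five standard deviations maps to 100.
print(t_score(3))  # 80
print(t_score(5))  # 100
```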
For humans, lip reading is a lot like batting in the Major Leagues — consistently get it right even just three times out of ten and you’ll be among the best to ever play the game. For modern machine learning systems, lip reading is more like playing Go — just round after round of beating up on the meatsacks that created and enslaved you — with today’s state-of-the-art systems achieving well over 95 percent sentence-level word accuracy. And as they continue to improve, we could soon see a day where tasks from silent-movie processing and silent dictation in public to biometric identification are handled by AI systems.
Context matters
Now, one would think that humans would be better at lip reading by now, given that we’ve been officially practicing the technique since the days of the Spanish Benedictine monk Pedro Ponce de León, who is credited with pioneering the idea in the 16th century.
“We usually think of speech as what we hear, but the audible part of speech is only part of it,” Dr. Fabian Campbell-West, CTO of lip reading app developer, Liopa, told Engadget via email. “As we perceive it, a person’s speech can be divided into visual and auditory units. The visual units, called visemes, are seen as lip movements. The audible units, called phonemes, are heard as sound waves.”
“When we’re communicating with each other, face-to-face is often preferred because we are sensitive to both visual and auditory information,” he continued. “However, there are approximately three times as many phonemes as visemes. In other words, lip movements alone do not contain as much information as the audible part of speech.”
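The many-to-one collapse Campbell-West describes can be sketched in a few lines of Python. The mapping below is a simplified illustration, not a standard viseme inventory — but the bilabial case is real: “p,” “b” and “m” all look like pressed-together lips.

```python
# Illustrative phoneme-to-viseme mapping: audibly distinct phonemes
# collapse onto the same visible mouth shape (viseme).
PHONEME_TO_VISEME = {
    # bilabials: lips pressed together look identical
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    # labiodentals: lower lip against upper teeth
    "f": "labiodental", "v": "labiodental",
    # alveolars: the tongue's position is largely hidden from view
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "s": "alveolar", "z": "alveolar",
    # velars: articulated at the back of the mouth, barely visible
    "k": "velar", "g": "velar",
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence into the viseme sequence a lip reader sees."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

# "pat," "bat" and "mat" sound different but are visually identical:
print(to_visemes(["p", "t"]) == to_visemes(["b", "t"]) == to_visemes(["m", "t"]))  # True
```

Because the map is many-to-one, the viseme stream simply carries less information than the phoneme stream — which is exactly why context has to fill the gap.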
“Most lipreading actuations, besides the lips and sometimes tongue and teeth, are latent and difficult to disambiguate without context,” then-Oxford University researcher and LipNet developer, Yannis Assael, noted in 2016, citing Fisher’s earlier studies. These homophemes are the secret to Bad Lip Reading’s success.
What’s wild is that Bad Lip Reading will generally work in any spoken language, whether it’s stress-accented like English or tonal like Vietnamese. “Language does make a difference, especially those with unique sounds that aren’t common in other languages,” Campbell-West said. “Each language has syntax and pronunciation rules that will affect how it is interpreted. Broadly speaking, the methods for understanding are the same.”
“Tonal languages are interesting because they use the same word with different tone (like musical pitch) changes to convey meaning,” he continued. “Intuitively this would present a challenge for lip reading, however research shows that it’s still possible to interpret speech this way. Part of the reason is that changing tone requires physiological changes that can manifest visually. Lip reading is also done over time, so the context of previous visemes, words and phrases can help with understanding.”
“It matters in terms of how good your knowledge of the language is because you’re basically limiting the set of ambiguities that you can search for,” Adrian KC Lee, ScD, Professor and Chair of the Department of Speech and Hearing Sciences at the University of Washington, told Engadget. “Say, ‘cold’ and ‘hold,’ right? If you just sit in front of a mirror, you can’t really tell the difference. So from a physical point of view, it’s impossible, but if I’m holding something versus talking about the weather, you, by the context, already know.”
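Lee’s “cold”/“hold” example can be sketched as a toy disambiguation step: two visually identical candidates, resolved by which one better fits the preceding word. The bigram counts below are invented for illustration, not drawn from any real corpus.

```python
# Toy context model: given visually identical candidates ("cold"/"hold"),
# pick the one that better fits the word that came before it.
# These counts are made up purely for illustration.
BIGRAM_COUNTS = {
    ("weather", "cold"): 40, ("weather", "hold"): 1,
    ("please", "hold"): 30, ("please", "cold"): 1,
}

def disambiguate(prev_word, candidates):
    """Choose the candidate with the highest count following prev_word."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

print(disambiguate("weather", ["cold", "hold"]))  # cold
print(disambiguate("please", ["cold", "hold"]))   # hold
```

Real systems use far richer language models, but the principle is the same: context prunes the ambiguity that the mirror can’t.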
In addition to the general context of the larger conversation, much of what people convey when they speak comes across non-verbally. “Communication is usually easier when you can see the person as well as hear them,” Campbell-West said, “but the recent proliferation of video calls has shown us all that it’s not just about seeing the person; there’s a lot more nuance. There is a lot more potential for building intelligent automated systems for understanding human communication than what is currently possible.”
Missing a forest for the trees, linguistically
While human and machine lip readers have the same general end goal, the aims of their individual processes differ greatly. As a team of researchers from Iran University of Science and Technology argued in 2021, “Over the past years, several methods have been proposed for a person to lip-read, but there is an important difference between these methods and the lip-reading methods suggested in AI. The purpose of the proposed methods for lip-reading by the machine is to convert visual information into words… However, the main purpose of lip-reading by humans is to understand the meaning of speech and not to understand every single word of speech.”
In short, “humans are generally lazy and rely on context because we have a lot of prior knowledge,” Lee explained. And it’s that dissonance in process — the linguistic equivalent of missing a forest for the trees — that presents such a unique challenge to the goal of automating lip reading.
As Assael notes, “Machine lipreading is difficult because it requires extracting spatiotemporal features from the video (since both position and motion are important).” As Mingfeng Hao of Xinjiang University explains in 2020’s A Survey on Lip Reading Technology, “action recognition, which belongs to video classification, can be classified through a single image,” while “lipreading often needs to extract the features related to the speech content from a single image and analyze the time relationship between the whole sequence of images to infer the content.” It’s an obstacle that requires both natural language processing and machine vision capabilities to overcome.
“A major obstacle in the study of lipreading is the lack of a standard and practical database,” Hao added. “The size and quality of the database determine the training effect of this model, and a perfect database will also promote the discovery and solution of more and more complex and difficult problems in lipreading tasks.” Other obstacles can include environmental factors like poor lighting and shifting backgrounds, which can confound machine vision systems, as can variances due to the speaker’s skin tone, the rotational angle of their head (which shifts the viewed angle of the mouth) and the obscuring presence of wrinkles and beards.
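The two-stage pipeline described here — spatial features extracted per frame, then temporal analysis across the whole sequence — can be sketched minimally. The “video” below is a toy list of mouth-crop intensity values standing in for real frames, and the averaging step is a stand-in for a learned spatial model like a CNN.

```python
# Minimal sketch of the spatiotemporal pipeline: (1) spatial features per
# frame, (2) temporal analysis over the sequence. Frames here are toy 1-D
# "mouth crop" intensity rows rather than real video.
frames = [
    [0.1, 0.2],  # mouth closed
    [0.4, 0.5],  # opening
    [0.9, 0.8],  # wide open
    [0.3, 0.2],  # closing
]

def spatial_feature(frame):
    """Stand-in for a CNN: average intensity of the mouth crop."""
    return sum(frame) / len(frame)

features = [spatial_feature(f) for f in frames]            # position, per frame
motion = [b - a for a, b in zip(features, features[1:])]   # frame-to-frame change

# Position (features) and motion (their differences) together form the
# spatiotemporal description: four positions yield three motion steps.
print(len(features), len(motion))  # 4 3
```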
Acronym soup
Today, speech recognition comes in three flavors, depending on the input source. What we’re talking about today falls under Visual Speech Recognition (VSR) research — that is, using only visual means to understand what is being conveyed. Conversely, there’s Automated Speech Recognition (ASR), which relies entirely on audio (i.e., “Hey Siri”), and Audio-Visual Automated Speech Recognition (AV-ASR), which incorporates both audio and visual cues into its guesses.
“Research into automatic speech recognition (ASR) is extremely mature and the current state-of-the-art is unrecognizable compared to what was possible when the research started,” Campbell-West said. “Visual speech recognition (VSR) is still at the relatively early stages of exploitation and systems will continue to mature.” Liopa’s SRAVI app, which enables hospital patients to communicate regardless of whether they can actively verbalize, relies on that hybrid audio-visual methodology. “This can use both modes of information to help overcome the deficiencies of the other,” he said. “In future there will absolutely be systems that use additional cues to support understanding.”
“There are several differences between VSR implementations,” Campbell-West continued. “From a technical perspective the architecture of how the models are built is different … Deep-learning problems can be approached from two different angles. The first is looking for the best possible architecture, the second is using a large amount of data to cover as much variation as possible. Both approaches are important and can be combined.”
In the early days of VSR research, datasets like AVLetters had to be hand-labeled and -categorized, a labor-intensive limitation that severely restricted the amount of data available for training machine learning models. As such, initial research focused first on the absolute basics — alphabet and number-level identification — before eventually advancing to word- and phrase-level identification, with sentence-level being today’s state-of-the-art which seeks to understand human speech in more natural settings and situations.
In recent years, the rise of more advanced deep learning techniques, which train models on essentially the internet at large, along with the massive expansion of social and visual media posted online, has enabled researchers to generate far larger datasets, like the Oxford-BBC Lip Reading Sentences 2 (LRS2), which is based on thousands of spoken lines from various BBC programs. LRS3-TED gleaned 150,000 sentences from various TED programs, while the LSVSR (Large-Scale Visual Speech Recognition) database, among the largest currently in existence, offers 140,000 hours of audio segments with 2,934,899 speech statements and over 127,000 words.
And it’s not just English: Similar datasets exist for a number of languages such as HIT-AVDB-II, which is based on a set of Chinese poems, or IV2, a French database composed of 300 people saying the same 15 phrases. Similar sets exist too for Russian, Spanish and Czech-language applications.
Looking ahead
VSR’s future could wind up looking a lot like ASR’s past, says Campbell-West: “There are many barriers for adoption of VSR, as there were for ASR during its development over the last few decades.” Privacy is a big one, of course. Though the younger generations are less inhibited about documenting their lives online, Campbell-West said, “people are rightly more aware of privacy now than they were before. People may tolerate a microphone while not tolerating a camera.”
Regardless, Campbell-West remains excited about VSR’s potential future applications, such as high-fidelity automated captioning. “I envisage a real-time subtitling system so you can get live subtitles in your glasses when speaking to someone,” Campbell-West said. “For anyone hard-of-hearing this could be a life-changing application, but even for general use in noisy environments this could be useful.”
“There are circumstances where noise makes ASR very difficult but voice control is advantageous, such as in a car,” he continued. “VSR could help these systems become better and safer for the driver and passengers.”
On the other hand, Lee, whose lab at UW has researched Brain-Computer Interface technologies extensively, sees wearable text displays more as a “stopgap” measure until BCI tech further matures. “We don’t necessarily want to sell BCI to that point where, ‘Okay, we’re gonna do brain-to-brain communication without even talking out loud,’“ Lee said. “In a decade or so, you’ll find biological signals being leveraged in hearing aids, for sure. As little as [the device] seeing where your eyes glance may be able to give it a clue on where to focus listening.”
“I hesitate to really say ‘oh yeah, we’re gonna get brain-controlled hearing aids,’” Lee conceded. “I think it is doable, but you know, it will take time.”
By Reading Brainwaves, An A.I. Aims To Predict What Words People Listened To – Smithsonian Magazine
In RPGs, I prefer reading books to slaying dragons
Reading Festival: Watch as armed cops with assault rifles storm in amid brawls & looting on final day
ARMED cops have stormed into Reading Festival after tents were set alight and brawls broke out on the final night.
Yobs torched camping gear, trampled tents and hurled chairs on bonfires amid ugly scenes at the Berkshire music event yesterday.
Some even went looting as armed police were called in to control the mob last night.
Shocking video shows officers wielding assault rifles as they storm the crowds to break up brawls.
Some festival-goers said they decided to leave on the final night of the event amid fears for their safety.
Amber Vellacourt told The Mirror: “We saw fires start at about 4pm in various camps.
Read more on the festival
“The crew and security were fast on them, but all the kids were surrounding and egging it on, throwing rubbish and cans into them.
“We felt the whole vibe of the camp sites change. When we saw people start picking up tents and rubbish, throwing them into the trees and across the camps, we thought it was best to pack up and head out – annoyingly so!
“But it just didn’t feel safe for two grown adults, let alone all the kids there.
“When we left at about 7pm, there was a fair bit of security but not masses.”
Most read in The Sun
Worried parents also took to social media to express their horror over the mayhem.
One claimed her daughter was “stabbed with a needle” in the leg as she waited for the toilets at the main stage.
Another told how she had to pick her son and his three friends up after tents were trampled and set alight.
Last night’s scenes are a common occurrence on the final night of the annual three-day festival.
Thames Valley Police said: “There were some fires in the campsite on Sunday, but festival security had water pumps and extinguished these within minutes.
“There was some disorder in the campsite at about 4.30pm on Sunday, but this was dealt with within minutes by festival security and about fifty people were ejected from the site.
“Those ejected were safeguarded by the festival organisers, Thames Valley Police, and British Transport Police to ensure they could get home safely.”
This year saw thousands of music fans descend on the site to watch headliners including Dave, Arctic Monkeys and The 1975.
Read More on The Sun
It comes after a 16-year-old boy died following a suspected drug overdose at sister event Leeds Festival on Saturday.
A police investigation has been launched into the teen’s death.
Aerial pictures showed tents still burning.
The clean-up operation at the site has begun.
The three-day event attracted huge crowds of music fans.
The BookTok creators whose sci-fi recommendations will shake up your reading list
For all of your science fiction and fantasy bookish needs
Recommended Reading: Productivity surveillance
The rise of the worker productivity score
Jodi Kantor and Arya Sundaram, The New York Times
Imagine if your employer only paid you for the hours you were actively working on your computer. Time spent on the phone, doing tasks on paper or reading isn’t part of your compensation, since monitoring software can’t track those things. This is no far-fetched scenario — it’s already happening. Companies are tracking, recording and ranking employees in the name of efficiency and accountability. And as you read this piece, a simulation shows you what it’s like to be monitored.
Social media was a CEO’s bullhorn, and how he lured women
Karen Weise, The New York Times
Weise writes about Dan Price, the former CEO of a payment processing company who used his social media persona to “bury a troubled past.”
Harlan Band’s descent started with an easy online Adderall prescription
Rolfe Winkler, The Wall Street Journal
A 29-year-old man sought help from online mental-health startup Done, a company that “prescribes stimulants like Adderall in video calls as short as 10 minutes.” Band was already in recovery, and the company’s lax patient monitoring didn’t keep adequate tabs on him. Done advertises on social platforms, “promoting a one-minute ADHD assessment ahead of its 30-minute evaluations” before charging “a $79 monthly service fee for ‘worry-free refills’ and clinician responses to questions.”
Recommended Reading: Imogen Heap’s far-reaching influence on music
The eternal influence of Imogen Heap
Cat Zhang, Pitchfork
Whether it’s “the vivid detailing in each song,” her “openness to new media and technology” or projects like her Mi.Mu Gloves, Imogen Heap’s work has inspired the likes of A$AP Rocky, Taylor Swift and Kacey Musgraves. “Heap’s music sounds like it could be released today, and not simply because the 2000s are trendy again,” Zhang writes.
Where does Alex Jones go from here?
Charlie Warzel, The Atlantic
Warzel’s Galaxy Brain newsletter makes the cut in our weekly roundup a lot because his writing on technology and related topics is consistently on point. This week, he spoke to an ex-Infowars staffer about the Alex Jones trial, including what that work experience was like and what we can do to hold Jones accountable.
Elon Musk is convinced he’s the future. We need to look beyond him
Paris Marx, Time
“Musk has become the figure everyone was looking for: a powerful man who sold the fantasy that faith in the combined power of technology and the market could change the world without needing a role for the government,” Marx writes. “But that collective admiration has only served to bolster an unaccountable and increasingly hostile billionaire. The holes in those future visions, and the dangers of applauding billionaire visionaries, have only become harder to ignore.”
Recommended Reading: What’s next for DALL-E 2?
Tech’s new frontier raises a “buffet of unwanted questions”
Charlie Warzel, Galaxy Brain/The Atlantic
Warzel dives into questions about DALL-E 2 in his newsletter for The Atlantic, many of which have been voiced by others. Those include what it could mean for the future of art and the potential commercial ambitions of OpenAI, the company that created it.
Computer lab week
Polygon
Enjoy a bit of nostalgia this weekend with pieces like “Type to Learn became a battle royale in our computer lab” and “Artists somehow keep making masterpieces with Kid Pix and MS Paint.”
‘Operating with increased intensity’: Zuckerberg leads Meta into next phase
Mike Isaac, The New York Times
Before Meta’s dismal earnings report this week, there was news of how CEO Mark Zuckerberg plans to revitalize the company as it focuses on the metaverse.