WellSaid Labs will have a lot more to say in the years ahead, thanks to $10 million in new investment that’ll be used to beef up the Seattle startup’s efforts to put a widening chorus of AI-generated synthetic voices to work.
The Series A funding round — led by Fuse, an early-stage venture capital firm that counts Seattle Seahawks star linebacker Bobby Wagner among its partners — follows up on $2 million in seed funding that WellSaid raised in 2019 when it was spun out from Seattle’s Allen Institute for Artificial Intelligence.
WellSaid CEO Matt Hocking said the new funding will go toward growing the text-to-speech startup, which has a dozen employees.
“We need to double down on the research that we’re putting to work, and the research that we’re doing here to continually improve our technology,” Hocking told GeekWire. “On top of that, there’s obviously hires to build out our product offering and serve more customers in more diverse and interesting ways. And then as well as that, we’re definitely focused on our sales team and building that up.”
WellSaid Labs makes a wide assortment of natural-sounding synthetic voices available via its audio production platform, for use in applications ranging from in-house training materials to quick-hit social media videos.
“We’re not trying to create better voices than humans,” Hocking said. “That’s not what we’re here for. A lot of content goes unvoiced, simply because of the quick turnaround that needs to happen, or it needs to be updated constantly, or it’s just an internal piece of content that doesn’t have a budget associated with it.”
Those are situations in which WellSaid comes in handy. “It opens up opportunities to allow voice to be added to these productions where they wouldn’t usually have that alternative,” Hocking said.
He declined to name customers, but for what it’s worth, WellSaid’s website lists endorsements from Nokia, the University of California at San Francisco, Blue Sky eLearn and a Canadian food retailer called Sobeys.
WellSaid offers more than a dozen text-to-speech avatars based on human voice patterns, ranging from the revved-up patter of a car salesman to no-nonsense recitations that sound as if they’re coming from a woman researcher. The company claims its software has achieved “human parity” for naturalness in short audio clips.
But wait … there’s more: Customers can create their own “AI Voice Avatars” to spec, capturing the speaking style of a branded voice. Theoretically, WellSaid could bring Jeff Bezos into the studio and create a synthetic voice that makes it sound as if the former Amazon CEO is reading out a welcome message to new employees. (Realistically, if that need ever arose, Amazon would probably have its own voice synthesis team take on the job.)
As time goes on, WellSaid aims to add to its repertoire and increase the fidelity of its synthetic voices. In the future, the company’s voices just might play speaking roles in video games, read scripts on computer-generated news programs, or engage in complex real-time interactions with consumers.
All this raises deeper questions about WellSaid’s technology and its business model. First of all, what’s to stop somebody from synthesizing, say, President Joe Biden’s voice for malign purposes?
“We obviously have a responsibility to ensure that our technology is being used in the right way for the right purposes,” Hocking said. “We create domain-specific voices based on a real voice. We would never go and just build a voice without someone’s consent.”
And when it comes to the business model, how can WellSaid hope to compete with companies like Google, Amazon and Microsoft, all of which have their own voice synthesis platforms?
“We’re in competition with them because they do TTS [text-to-speech],” Hocking acknowledged. “But we’ve re-architected and reinvented what TTS is.”
Hocking argued that WellSaid is well-placed to pursue new applications for text-to-speech technology. “We’ve been exposed to some of these other interesting use cases,” he explained. “The stuff that used to be only possible on a movie set five years ago is now possible in a different perspective today.”
And from Hocking’s perspective, Seattle is the right place for pushing further out into the speech synthesis frontier.
“The majority of our team is from Seattle,” he pointed out. “We all met here, and our preference is obviously to have people living in the area — not only because we feel as though there’s great talent here, but as well as that, it’s just a great place to build a business.”