21 APR 2023

Jesse Shemen: "We want to make video content watchable in any language"

The co-founder and CEO of Papercup explains the potential of the company's AI-powered video translation software, which can convert a video from its original language into several other languages at once.

Founded in 2017, Papercup offers an artificial intelligence and machine learning-based system that localizes videos into other languages using synthetic voices with genuine emotional depth. Content that has been AI-dubbed by Papercup has reached over 350 million people in non-English-speaking territories in the last 12 months alone. To describe the focus and objectives of the company, Señal News spoke with Jesse Shemen, Papercup's co-founder and CEO.

How would you explain the concept behind Papercup's new AI dubbing technology?
"We generate synthetic voices in a target language; that way, people can watch, consume, and enjoy the video content they want to see, but in their native tongue. That happens without us needing to go through the entire end-to-end dubbing process, which can be quite time-consuming and expensive. It's important to note, though, that we also have what's called a human intervening step."

How does the technology manage to interpret languages, accents, and tones?
"With accents, we almost treat it as an independent language. If we want to accommodate another accent, we must treat it like another language study. So, we need to build up a different dictionary and create a different linguistic front end. We must be selective about which languages we prioritize to go into, which is usually just a function of demand. We're using novel machine learning and artificial intelligence techniques to make video content watchable in any language."

How does the technology convert content into other languages?
"To explain how that works: we take the original asset in English and upload it to our platform. It then goes through automated transcription, which is going from audio to text. That text is then automatically translated into whatever the target language is. Then, our speech synthesis system, full text-to-speech, or synthetic voice system, generates a synthetic voice from that translated text. Once that's produced, which can happen very quickly, we always have a human translator who quality-checks each sentence to make sure that it meets the requirements of the content owner."
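
The workflow Shemen describes (transcription, machine translation, speech synthesis, then a human check of each sentence) can be pictured as a simple pipeline. The Python sketch below is only an illustration under stated assumptions and is not Papercup's implementation: the names `Segment`, `dub`, `transcribe`, `translate`, `synthesize`, and `review` are hypothetical, and the rule that a sentence is re-synthesized when the reviewer changes the translation is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One sentence as it moves through the (hypothetical) dubbing pipeline."""
    source_text: str       # transcription in the original language
    translated_text: str   # translation into the target language
    audio: bytes           # synthetic speech for the translated text
    approved: bool = False # True once the human check has been applied


def dub(
    clips: List[bytes],
    target_language: str,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str], bytes],
    review: Callable[[str, str], str],
) -> List[Segment]:
    """Run each audio clip through the four stages named in the interview.

    Every stage is passed in as a plain function, so real services or
    toy stand-ins can be plugged in. `review` represents the human
    translator: it receives the source and translated sentence and
    returns the text that should finally be spoken.
    """
    segments: List[Segment] = []
    for clip in clips:
        source_text = transcribe(clip)                      # audio -> text
        translated = translate(source_text, target_language)  # text -> target language
        audio = synthesize(translated)                      # text -> synthetic voice

        # Human quality check on each sentence (assumption: a corrected
        # sentence is synthesized again so audio and text stay in sync).
        final_text = review(source_text, translated)
        if final_text != translated:
            audio = synthesize(final_text)

        segments.append(Segment(source_text, final_text, audio, approved=True))
    return segments
```

With toy stand-ins, a call might look like `dub([b"clip"], "es", transcribe=..., translate=..., synthesize=..., review=lambda source, translated: translated)`, where the `review` function simply accepts the machine translation unchanged.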

What languages is Papercup working with right now?
"Right now, we focus on five: Latin American Spanish, Brazilian Portuguese, German, Italian, and French. The following objectives on the roadmap are Castilian Spanish and a few others, like Hindi and Arabic, which we've already started testing."

What specific content genres are your primary focus?
"The content types we primarily work on today are factual documentary, informational news, and reality TV. Dramas and comedies are still so vibrant and expressive that we don't think they're best suited for synthetic voices. We think the dubbing industry and voice artists will eventually focus on and migrate to them. The idea is to go to a more creative and performance-based model, whereas synthetic or AI dubbing will also care for those genres."

What are Papercup's short-term expansion plans?
"Two very straightforward things. One is to create even more expressive and emotive voices to expand the different content types we want to translate. The second is more languages. The more languages you can offer, the more you can help content owners reach bigger audiences."

By Karla Florez & Diego Alfagemez