It’s very easy to misread and aggrandize achievements in artificial intelligence. And nowhere is this more evident than in the domain of human language, where appearances can falsely hint at all-embracing capabilities. In the past year, we’ve seen any number of companies giving the impression that their chatbots, robots and other applications can engage in meaningful conversations as a human would.

You just need to look at Google’s Duplex, Hanson Robotics’ Sophia and plenty of other examples to become convinced that we’ve reached a stage where artificial intelligence can manifest human behavior.

But mastering human language requires much more than replicating human-like voices or producing coherent sentences. It requires common sense, an understanding of context, and creativity, none of which current AI trends possess.

To be fair, deep learning and other AI techniques have come a long way toward bringing humans and computers closer to each other. But there’s still a huge gap separating the world of circuits and binary data from the mysteries of the human brain. And unless we understand and acknowledge the differences between AI and human intelligence, we will be disappointed by unmet expectations and miss the real opportunities that advances in artificial intelligence provide.

To understand the true depth of AI’s relationship with human language, we’ve broken down the field into different subdomains, going from the surface to the depths.

Speech to text

Voice transcription is one of the areas where AI algorithms have made the most progress. In all fairness, this shouldn’t even be considered artificial intelligence, but the very definition of AI is a bit vague, and since many people might mistakenly interpret automatic transcription as a sign of intelligence, we decided to examine it here.

The older iterations of the technology required programmers to go through the painstaking process of discovering and codifying the rules for classifying and converting voice samples into text. Thanks to advances in deep learning and deep neural networks, speech-to-text has taken huge leaps and has become both easier and more precise.

With neural networks, instead of coding the rules, you provide plenty of voice samples and their corresponding text. The neural network finds the common patterns in the pronunciation of words and then “learns” to map new voice recordings to their corresponding texts.
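The learn-from-examples idea can be sketched in miniature. This is not a real speech model — it is a toy nearest-pattern classifier in which made-up two-dimensional “acoustic features” stand in for voice recordings, purely to illustrate how labeled samples replace hand-coded rules:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "acoustic signatures" for three spoken words.
centers = {"yes": np.array([1.0, 0.0]),
           "no": np.array([0.0, 1.0]),
           "stop": np.array([1.0, 1.0])}

# "Training data": noisy samples around each word's signature,
# paired with their transcript -- the voice/text pairs from the article.
samples = [(centers[w] + rng.normal(scale=0.1, size=2), w)
           for w in centers for _ in range(50)]

# "Learning": average the samples for each word to find the common pattern.
learned = {w: np.mean([x for x, label in samples if label == w], axis=0)
           for w in centers}

def transcribe(recording):
    # Map a new "recording" to the closest learned pattern.
    return min(learned, key=lambda w: np.linalg.norm(recording - learned[w]))

print(transcribe(np.array([0.05, 0.95])))  # → no
```

A real speech-to-text network learns a far richer mapping (spectrograms to character sequences), but the principle is the same: patterns are extracted from labeled examples rather than written down as rules.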

These advances have enabled many services to provide real-time transcription to their users.

There are plenty of uses for AI-powered speech-to-text. Google recently presented Call Screen, a feature on Pixel phones that handles scam calls and shows you the text of the person speaking in real time. YouTube uses deep learning to provide automatic closed captioning.

But the fact that an AI algorithm can turn voice to text doesn’t mean it understands what it is processing.

Speech synthesis

The flip side of speech-to-text is speech synthesis. Again, this really isn’t intelligence, because it has nothing to do with understanding the meaning and context of human language. But it is nonetheless an integral part of many applications that interact with humans in their own language.

Like speech-to-text, speech synthesis has existed for quite a long time. I remember seeing computerized speech synthesis for the first time in a class in the ’90s.

ALS patients who have lost their voice have been using the technology for decades to communicate by typing sentences and having a computer read them aloud. Blind people also use the technology to read text they can’t see.

However, in the old days, the voice generated by computers did not sound human, and the creation of a voice model required hundreds of hours of coding and tweaking. Now, with the help of neural networks, synthesizing the human voice has become less cumbersome.

The process involves using generative adversarial networks (GANs), an AI technique that pits neural networks against each other to create new data. First, a neural network ingests numerous samples of a person’s voice until it can tell whether a new voice sample belongs to the same person.

Then, a second neural network generates audio data and runs it through the first one to see if it validates it as belonging to the subject. If it doesn’t, the generator corrects its sample and re-runs it through the classifier. The two networks repeat the process until they are able to generate samples that sound natural.
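The adversarial loop described above can be condensed into a toy numerical sketch. This is emphatically not a voice-cloning system: the “voice” is a one-dimensional Gaussian, the discriminator is a tiny logistic regression, and every number is invented for illustration. It only shows the back-and-forth: the discriminator learns to flag fakes, and the generator adjusts until its samples pass.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for the target speaker's recordings: numbers near 4.0.
real_data = rng.normal(loc=4.0, scale=0.5, size=(2000,))

g_w, g_b = 1.0, 0.0   # generator: fake = g_w * noise + g_b
d_w = np.zeros(3)     # discriminator: logistic regression on [1, x, x^2]

lr = 0.01
for step in range(2000):
    # Discriminator update: real samples labeled 1, fakes labeled 0.
    z = rng.normal(size=32)
    fake = g_w * z + g_b
    real = rng.choice(real_data, size=32)
    for x, y in [(real, 1.0), (fake, 0.0)]:
        feats = np.stack([np.ones_like(x), x, x * x])   # shape (3, 32)
        p = sigmoid(d_w @ feats)
        d_w += lr * feats @ (y - p) / 32                # ascend log-likelihood

    # Generator update: nudge fakes toward being classified as real.
    z = rng.normal(size=32)
    fake = g_w * z + g_b
    p = sigmoid(d_w @ np.stack([np.ones_like(fake), fake, fake * fake]))
    grad_fake = (1 - p) * (d_w[1] + 2 * d_w[2] * fake)  # chain rule through D
    g_w += lr * np.mean(grad_fake * z)
    g_b += lr * np.mean(grad_fake)

samples = g_w * rng.normal(size=1000) + g_b
print(float(np.mean(samples)))  # mean of generated samples; compare to the real mean (~4.0)
```

Real voice GANs replace the scalar generator and discriminator with deep networks over raw audio or spectrograms, but the training dynamic — generate, get judged, correct, repeat — is the same one the paragraph above describes.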

There are several websites that enable you to synthesize your own voice using neural networks. The process is as simple as providing them with enough samples of your voice, which is much less than what the older generations of the technology required.

There are many good uses for this technology. For instance, companies are using AI-powered voice synthesis to enhance their customer experience and give their brand its own unique voice.

In the field of medicine, AI is helping ALS patients regain their true voice instead of using a computerized one. And of course, Google is using the technology for its Duplex feature to place calls on behalf of users in their own voice.

AI speech synthesis also has its evil uses. Namely, it can be used for forgery, to place calls with the voice of a targeted person, or to spread fake news by impersonating the voice of a head of state or high-profile politician.

I probably don’t need to remind you that just because a computer can sound like a human doesn’t mean it understands what it says.

Processing human language commands