10 Artificial Intelligence Innovations in 2021 That Gave us Hope for the Future
The application and popularity of artificial intelligence are growing by the day. Businesses around the globe are doubling, if not tripling, their budgets for AI research and applications in pursuit of new innovations. AI is reshaping work across many fields, from biology and chemistry to astronomy, materials science, earth sciences, zoology, and more, and innovative AI technologies are emerging at a rapid rate. As we wrap up 2021, here is our list of the top 10 advancements in artificial intelligence that gave us hope for the future.
Here are 10 of the top AI innovations of 2021.
OpenAI's DALL-E & CLIP
OpenAI is an AI research and deployment company whose stated mission is to ensure that artificial general intelligence benefits all of humanity. In 2021 it introduced two multimodal neural networks that, according to their creators, are "a step towards systems with a deeper understanding of the human world".
DALL-E (named after the artist Salvador Dalí and Pixar's WALL-E) is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text-image pairs. Its creators explain in their blog post that "It receives both the text and the image as a single stream of data containing up to 1280 tokens and is trained using maximum likelihood to generate all of the tokens, one after another." Interactive demos of DALL-E are available on OpenAI's official website, where you can change parts of the text prompt and see how that affects the output.
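The quoted description can be made concrete with a toy sketch. It assumes the split OpenAI describes for the 1280-token stream (256 text tokens followed by 1024 image tokens); the token IDs, the 16-token vocabulary, and `uniform_model` are hypothetical stand-ins, not DALL-E's real tokenizer or network.

```python
import math

TEXT_TOKENS = 256    # text positions in the stream (per OpenAI's DALL-E blog post)
IMAGE_TOKENS = 1024  # image positions, a 32x32 grid of discrete image codes
MAX_STREAM = TEXT_TOKENS + IMAGE_TOKENS  # 1280 tokens in total


def build_stream(text_ids, image_ids):
    """Concatenate text and image tokens into one sequence, text first."""
    if len(text_ids) > TEXT_TOKENS or len(image_ids) > IMAGE_TOKENS:
        raise ValueError("stream would exceed 1280 tokens")
    return list(text_ids) + list(image_ids)


def negative_log_likelihood(stream, next_token_prob):
    """Sum of -log p(token | previous tokens) over the whole stream.

    Maximum-likelihood training minimizes this quantity.
    `next_token_prob(prefix, token)` is a hypothetical stand-in for the model.
    """
    return -sum(
        math.log(next_token_prob(stream[:i], stream[i]))
        for i in range(len(stream))
    )


def uniform_model(prefix, token, vocab_size=16):
    """Toy model: every token is equally likely, whatever the prefix."""
    return 1.0 / vocab_size


# Hypothetical token IDs standing in for a short caption and a tiny image.
stream = build_stream(text_ids=[3, 7, 1], image_ids=[12, 0, 5, 9])
loss = negative_log_likelihood(stream, uniform_model)
```

A real model would replace `uniform_model` with a transformer that conditions on the prefix. At generation time the text tokens are given and the image tokens are sampled one after another.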
CLIP is a neural network that efficiently learns visual concepts from natural language supervision. It can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized. The network is trained on a wide variety of images paired with natural language supervision, which is abundantly available on the internet. Its creators claim that it closes the robustness gap by up to 75% while matching, in a zero-shot setting, the performance of the original ResNet-50 on ImageNet without using any of its labeled examples.
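The idea of classifying by "simply providing the names of the visual categories" can be sketched as follows: each class name is turned into a text prompt, the prompt and the image are embedded in the same space, and the class whose prompt is most similar to the image wins. This is a minimal sketch, not OpenAI's code. The three-dimensional embeddings, the lookup table standing in for the text encoder, and the `scale` value are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, class_names, embed_text, scale=100.0):
    """Pick the class whose text prompt lies closest to the image embedding.

    `embed_text` is a hypothetical text encoder; `scale` sharpens the softmax.
    """
    prompts = [f"a photo of a {name}" for name in class_names]
    sims = [cosine(image_embedding, embed_text(p)) for p in prompts]
    probs = softmax([scale * s for s in sims])
    best = max(range(len(class_names)), key=probs.__getitem__)
    return class_names[best], probs


# Toy stand-in for a trained text encoder: a lookup table of made-up vectors.
TOY_TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}

label, probs = zero_shot_classify(
    image_embedding=[0.8, 0.2, 0.1],  # made-up embedding of a dog photo
    class_names=["dog", "cat", "car"],
    embed_text=TOY_TEXT_EMBEDDINGS.__getitem__,
)
```

No labeled training images for "dog", "cat", or "car" are needed here; swapping in a different list of class names is all it takes to target a different benchmark.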
Unified Transformer
Transformer architectures have driven great improvements in machine learning, especially in natural language processing, and were later extended to the visual domain and beyond. A Unified Transformer can simultaneously learn prominent tasks across various domains. Companies such as Facebook and Google are investing heavily in making the transformer architecture stronger. For now, most models are limited to a single domain or to specific multimodal domains, but new architectures are appearing quickly.
GitHub's Copilot
GitHub Copilot, billed by GitHub as "Your AI pair programmer", suggests line completions and entire function bodies as you type. It is powered by the OpenAI Codex system, which is trained on billions of lines of code. It is already available as an extension for Visual Studio Code and is expected to come to the commercial Visual Studio products. For now, it is available to only a small number of developers, though you can try your luck at getting access.
Blender Bot 2
Blender Bot 2 is an open-source AI chatbot developed by Facebook AI Research (FAIR). According to FAIR, it is the first chatbot that can simultaneously build long-term memory it can continually access, search the internet for timely information, and have sophisticated conversations on nearly any topic. It can conduct longer, more knowledgeable, and more factually consistent conversations over multiple sessions.
Google's Translatotron 2
Researchers from Google have developed a new version of the Translatotron AI translation model that can recreate a speaker's voice in a variety of languages. It is a neural direct speech-to-speech translation model that can be trained end-to-end. It consists of a speech encoder, a phoneme decoder, a mel-spectrogram synthesizer, and an attention module that connects the other three components.
FLAML
FLAML (A Fast and Lightweight AutoML Library) is a Python package developed by Microsoft that finds a well-fitting machine learning model at low computational cost, removing the manual work of choosing the best model and tuning its parameters. Microsoft focused on five main features while developing it:
- Model Selection
- Feature Engineering
- Neural Architecture Search
- Hyperparameter Tuning
- Model Compression
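To illustrate the general idea of automated model selection and hyperparameter tuning under a small compute budget, here is a minimal sketch. It is not FLAML's API, and plain random search is swapped in for FLAML's own search strategy. The two toy learners, the search space, and the data are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def threshold_predict(train, x, cut):
    """Predict class 1 when x exceeds a fixed cut-off (ignores training data)."""
    return 1 if x > cut else 0


# Hypothetical search space: learner name -> (predict function, config sampler).
SEARCH_SPACE = {
    "knn": (knn_predict, lambda rng: rng.choice([1, 3, 5])),
    "threshold": (threshold_predict, lambda rng: rng.uniform(0.0, 10.0)),
}


def validation_error(predict, config, train, valid):
    """Fraction of validation points the configured learner gets wrong."""
    wrong = sum(predict(train, x, config) != y for x, y in valid)
    return wrong / len(valid)


def budgeted_search(train, valid, max_trials, seed=0):
    """Try random (learner, config) pairs and keep the best within the budget."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        predict, sample_config = SEARCH_SPACE[name]
        config = sample_config(rng)
        err = validation_error(predict, config, train, valid)
        if best is None or err < best[0]:
            best = (err, name, config)
    return best


# Toy 1-D data: small values belong to class 0, large values to class 1.
train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
valid = [(2.5, 0), (7.5, 1), (1.5, 0), (8.5, 1)]

best_error, best_learner, best_config = budgeted_search(train, valid, max_trials=20)
```

The trial count plays the role of the compute budget here. An AutoML library replaces the random sampling with a smarter, cost-aware search, but the contract is the same: data and a budget go in, and the best learner and configuration found come out.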
MusicBERT
MusicBERT is a large-scale pre-trained model for symbolic music understanding developed by Microsoft. Symbolic music understanding means understanding music from symbolic data, such as the MIDI format, rather than from audio. It targets music understanding tasks such as emotion classification, genre classification, and music piece matching. Microsoft made use of the OctupleMIDI encoding method, a bar-level masking strategy, and a large-scale symbolic music corpus of more than a million music tracks.
Vertex AI
Google developed a new managed machine learning platform called Vertex AI, which is meant to make it easier for developers to deploy and maintain their AI models. It is primarily aimed at mobile and web developers. It is intended to be a very flexible platform that lets developers train models quickly with, according to Google, about 80% fewer lines of code.
Microsoft's neural TTS
According to Microsoft, Neural Text-to-Speech (Neural TTS) is now capable of matching human voice quality with the release of Uni-TTSv4, the newest version on the platform. It is being rolled out on Microsoft's Azure cloud platform. It currently supports only a few languages, but Microsoft says that other languages will get the update soon. The TTS models are evaluated by Mean Opinion Score (MOS), a widely used subjective measure of speech quality.
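A MOS is simply the average of listener ratings, typically on a scale from 1 (bad) to 5 (excellent). The sketch below shows the arithmetic; the ratings are made-up numbers for illustration, not Microsoft's test data.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # 1 = bad, 5 = excellent


def mean_opinion_score(ratings):
    """Average listener ratings on the 1-5 scale into a single MOS."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < SCALE_MIN or r > SCALE_MAX for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from five listeners for two clips (not real test data).
human_recording = [5, 4, 5, 4, 5]
synthetic_voice = [4, 5, 4, 5, 4]

mos_human = mean_opinion_score(human_recording)
mos_tts = mean_opinion_score(synthetic_voice)
gap = mos_human - mos_tts  # a small gap suggests near-human quality
```

A claim that a synthetic voice matches human quality amounts to saying that this gap is small enough to be statistically insignificant across many listeners and clips.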
[Figure: test results of the top 8 available models, showing the performance of the new version]
TensorFlow 3D
Google's TensorFlow 3D is an open-source library that adds 3D deep-learning capabilities to the TensorFlow machine learning framework. It provides resources and tools that allow developers to build and deploy models that understand three-dimensional scenes. With the rise of 3D sensors such as lidar and depth-sensing cameras, this library will help researchers tackle 3D scene understanding.
Honorable Mentions:
- Quantum ML
- Tiny ML
- AI with RPA (Robotic Process Automation)