20 Computer Vision Projects and Applications to learn with Deep Learning in 2022

What is Computer Vision?

Humans see through their eyes and process what they see. This type of visual perception, when applied in the realm of AI, is commonly known as Computer Vision or Machine Vision. In computer vision, we try to teach computers how to see. Helping blind people navigate, reconstructing images, face recognition, image restoration, and scene understanding are some of the popular computer vision applications. The goal of Computer Vision is to extract meaningful information from images and derive an abstract representation of their content. There is an old saying, "An image is worth more than ten thousand words", and for that reason Computer Vision has received an enormous amount of attention from several scientific communities in recent decades. The history of computer vision is quite interesting, going back to early efforts such as the Summer Vision Project.

20 Computer Vision Applications

Thanks to the work of many scientific and mathematical communities, we have seen many exciting applications of Computer Vision. The field has been greatly advanced by progress in Deep Learning. Popular contemporary applications such as classification, detection, and segmentation are nearly inseparable from Deep Learning. Some of the applications of Computer Vision where deep learning is used are:

  1. Image Classification
  2. Object Detection
  3. Semantic Segmentation
  4. Human Pose Estimation
  5. Face Detection
  6. Face Recognition
  7. Neural Style Transfer
  8. Face Transfer
  9. Image Captioning
  10. Visual Question Answering
  11. Image Colorization
  12. Image Compression
  13. Image Enhancement
  14. Optical Character Recognition
  15. Image Inpainting
  16. Facial Expression Analysis
  17. Object Tracking
  18. Automatic Sign Language Recognition
  19. Robot Navigation
  20. Automatic drone inspections

1. Image Classification

In the Image Classification task, we classify a given image by assigning it to a specific category or label. In this task we assume that there is only one object or target in the image, and we focus on identifying the category of that target.

Input: an image with a single object.

Output: a class label (e.g. cat, dog, etc.)

Example output: class probability (e.g. 84% cat).
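As a minimal sketch of how a class probability like the one above can be produced, the snippet below applies a softmax to raw class scores. The labels and scores are made-up values for illustration; in practice they would come from a trained network.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical labels and raw scores (logits) for a single image.
labels = ["cat", "dog", "bird"]
scores = np.array([2.0, 0.1, -1.0])

probabilities = softmax(scores)
predicted = labels[int(np.argmax(probabilities))]
print(f"{predicted}: {probabilities.max():.0%}")
```

The predicted class is simply the label with the highest probability.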

Classification in Computer Vision

Papers to read

ImageNet Classification with Deep Convolutional Neural Networks 

Gradient-based Learning Applied to Document Recognition

2. Object Detection

Object detection is the task of detecting instances of objects of certain classes within an image. In image classification there is only one output (the class of the image), and we focus only on finding that class. In many cases, however, an image contains multiple objects of different categories. Object detection is the process of classifying each object and finding the position of each object in the given image.




Object Detection: (Object Classification + Object Localization) for each object in the given image.

Object Classification: assigning a class label to the object.

Object Localization: finding the position of the object, usually by drawing a bounding box around it.
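The sketch below shows one way to represent a detection as a class label plus a bounding box. It also computes intersection over union (IoU), a commonly used measure of how well a predicted box matches a reference box. IoU is not discussed in this article, and the `Detection` class, box format, and coordinates are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # result of object classification
    box: tuple   # result of object localization: (x_min, y_min, x_max, y_max)

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical detector output for one image containing two objects.
detections = [
    Detection("cat", (10, 20, 110, 120)),
    Detection("dog", (150, 30, 260, 140)),
]

# Hypothetical hand-labeled box for the cat.
reference_cat_box = (20, 30, 120, 130)
overlap = iou(detections[0].box, reference_cat_box)
print(f"IoU for the cat box: {overlap:.2f}")
```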

Papers to read

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks 

You Only Look Once: Unified, Real-Time Object Detection



3. Semantic Segmentation

Semantic segmentation, a form of image segmentation, is the task of grouping together the parts of an image that belong to the same object class. It is a form of pixel-level prediction because each pixel in the image is classified according to a category. Semantic segmentation is one of the active research fields in Computer Vision.




Input: images

Output: regions, structures (line segments, curve segments, circles, etc.)
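To make the idea of pixel-level prediction concrete, here is a minimal sketch that turns per-pixel class scores into a label map by taking the highest-scoring class at each pixel. The random scores stand in for the output of a segmentation network, and the array shapes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network output: one score map per class, shape (classes, height, width).
num_classes, height, width = 3, 4, 5
class_scores = rng.normal(size=(num_classes, height, width))

# Each pixel gets the class whose score is highest at that location.
label_map = np.argmax(class_scores, axis=0)

print(label_map.shape)  # one class index per pixel
print(label_map)
```

Pixels that share the same class index form the regions of the segmentation.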


4. Human Pose Estimation

The goal of the Human Pose Estimation task is to detect the position and orientation of a person's body in a given image or video frame. The given image shows one outcome of Human Pose Estimation: the positions of human joints determined from the input image. Besides a single image, we can use image sequences, depth images, or skeleton data as input for the Human Pose Estimation task.



5. Face Detection

Face detection is the task of identifying the bounding box that encloses each human face in a given photo or video frame. It is a tremendously important field in Computer Vision because it serves as a preliminary step for face recognition, sentiment analysis, video surveillance, and many other applications. A face detection system takes an arbitrary image or video frame as input and determines whether any faces are present; if so, it returns the location and extent of each face. In the image below, we can see green bounding boxes that enclose human faces. Once a face has been found, it can be analyzed further. As a lighthearted example, we used face detection techniques to find the face of Nirdesh Shrestha, whom we consider the most handsome man of Nepal, in many photos.



 

 

6. Face Recognition



We human beings perform face recognition routinely and effortlessly in our daily lives. The very first step of face recognition is face detection. After obtaining the bounding box of the face, we compare the given face against a database of known faces. In other words, the process begins with detection (distinguishing human faces from other objects in the image) and then proceeds to identifying the detected faces.
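The comparison against a database can be sketched as follows. This example assumes that each detected face has already been converted into a numeric feature vector (an embedding) by some model; the vectors, names, and threshold below are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two vectors, 1.0 meaning they point in the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(face_embedding, database, threshold=0.8):
    """Return the best-matching name, or None if no entry is similar enough."""
    best_name, best_score = None, -1.0
    for name, known_embedding in database.items():
        score = cosine_similarity(face_embedding, known_embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical database of known faces (tiny 3-dimensional embeddings).
database = {
    "alice": np.array([0.9, 0.1, 0.3]),
    "bob": np.array([0.1, 0.8, 0.5]),
}

# Hypothetical embedding of the face found by the detector.
query = np.array([0.85, 0.15, 0.35])
name, score = identify(query, database)
print(name, round(score, 3))
```

The threshold lets the system report an unknown face instead of always returning the nearest database entry.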

 

7. Neural Style Transfer

Neural Style Transfer is the task of rendering the content of one image in the style of another. It involves three images: a style image, a content image, and a generated image. We take the style of the style image and apply it to the content of the content image, producing a generated image that has the content of the content image but the style of the style image.

Content: Objects and their arrangement

Style: Colors, textures, and overall look
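In A Neural Algorithm of Artistic Style, style is compared through correlations between feature channels (Gram matrices), while content is compared through the features themselves. The sketch below is a simplified version of that idea: the normalization constants differ from the paper, and random arrays stand in for the CNN feature maps.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, height, width) feature map."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1)
    return flat @ flat.T

def style_loss(generated_features, style_features):
    diff = gram_matrix(generated_features) - gram_matrix(style_features)
    return float(np.mean(diff ** 2))

def content_loss(generated_features, content_features):
    return float(np.mean((generated_features - content_features) ** 2))

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6, 6))  # stand-in for CNN activations

# Shuffle the spatial positions: the arrangement changes, the channel statistics do not.
permutation = rng.permutation(36)
shuffled = features.reshape(4, -1)[:, permutation].reshape(4, 6, 6)

print("style loss:", style_loss(shuffled, features))
print("content loss:", content_loss(shuffled, features))
```

Shuffling the pixels leaves the style loss at (numerically) zero but makes the content loss large, which illustrates why the Gram matrix captures texture rather than arrangement.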


Papers to read

A Neural Algorithm of Artistic Style

8. Face Transfer

In Face Transfer, we map the facial performance of a source onto the facial animation of a target. Here both the source and the target are human individuals. Face transfer uses the facial expressions and head poses from a video of the source actor to produce a video of the target character.

9. Image Captioning



Image captioning is the task of providing a natural language description of the content within an image. It lies at the intersection of computer vision and natural language processing.


10. Visual Question Answering

Visual Question Answering is an active research area concerned with answering questions about a given input image. The field combines Natural Language Processing and Computer Vision: the questions are asked in natural language and refer to the input image. For example, in the image below the question asked is "What is the mustache made of?". Clearly, to answer such a question we first have to understand the image. The Visual Question Answering task combines the challenges of visual and linguistic processing in order to answer basic "common sense" questions about given images.


 

11. Image Colorization

Old photographs were mostly taken in monochrome. Image colorization is the field of Computer Vision where we add plausible colors to monochrome images and videos. Image colorization is a highly underdetermined problem: it requires mapping a real-valued luminance image to a three-dimensional color-valued one, and this mapping has no unique solution. In other words, there is no single correct outcome in image colorization.
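The lack of a unique solution can be illustrated with a small sketch. It assumes the widely used Rec. 601 weights for converting RGB to luminance (an assumption, since the article does not fix a particular conversion) and constructs two very different colors that produce the same gray value.

```python
import numpy as np

# Rec. 601 luma weights for (red, green, blue); assumed here for illustration.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def luminance(rgb):
    """Gray value of an RGB color with channels in the range [0, 1]."""
    return float(np.dot(LUMA_WEIGHTS, rgb))

gray = np.array([0.5, 0.5, 0.5])

# Choose full red and no blue, then solve for the green value that gives luminance 0.5.
green = (0.5 - LUMA_WEIGHTS[0] * 1.0) / LUMA_WEIGHTS[1]
orange_red = np.array([1.0, green, 0.0])

print(luminance(gray), luminance(orange_red))
```

Both colors collapse to the same luminance, so a colorization model has to choose among many plausible colors for each gray pixel.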




12. Image Compression

Image compression is a type of data compression applied to digital images to reduce their size for efficient storage and transmission. The compression can be either lossless or lossy. In lossless compression, no information is lost when the image is converted from its original form to its compressed form. In lossy compression, some information is lost.
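The difference between lossless and lossy compression can be sketched with the standard library and NumPy. This is a toy illustration, not a deep learning codec: `zlib` provides the lossless step, and reducing the image to 16 gray levels serves as a crude lossy step. The synthetic gradient image is an assumption for the example.

```python
import zlib
import numpy as np

# Synthetic 64x64 grayscale image: a horizontal gradient.
image = np.tile(np.arange(64, dtype=np.uint8) * 4, (64, 1))

# Lossless: compress the raw bytes, then recover the image exactly.
compressed = zlib.compress(image.tobytes())
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.uint8).reshape(image.shape)

# Lossy: keep only 16 gray levels, then reconstruct an approximation.
step = 16
quantized = (image // step).astype(np.uint8)
approximation = quantized * step + step // 2

max_error = int(np.abs(image.astype(int) - approximation.astype(int)).max())
print("raw bytes:", image.nbytes, "compressed bytes:", len(compressed))
print("lossless round trip exact:", np.array_equal(restored, image))
print("largest lossy error:", max_error)
```

The lossless round trip returns the original pixels, while the lossy version only guarantees that each pixel is within half a quantization step of the original.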

Papers to read

Variable Rate Deep Image Compression With a Conditional Autoencoder  

13. Image Enhancement

Image Enhancement is among the most appealing areas of Computer Vision. The input to Image Enhancement is a deteriorated image, such as a low-contrast or blurred image, and the output is a higher-quality image. Image Enhancement brings out details that are obscured or hidden, or simply highlights certain features of interest in an image. Its primary goal is to modify attributes of an image, such as brightness and gray level, to make it more suitable for a given task and a specific observer. The filters we use in Instagram and Snapchat are applications of Image Enhancement. A familiar example is increasing or decreasing the contrast or brightness of an image to make it look better.
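A brightness and contrast adjustment of this kind can be sketched as a simple linear transform of the pixel values. The function name, parameters, and sample pixel values below are illustrative assumptions.

```python
import numpy as np

def adjust_brightness_contrast(image, contrast=1.0, brightness=0.0):
    """Scale pixel values by `contrast`, shift by `brightness`, and clip to [0, 255]."""
    adjusted = contrast * image.astype(np.float64) + brightness
    return np.clip(adjusted, 0, 255).astype(np.uint8)

# A tiny low-contrast image: all values lie between 100 and 130.
low_contrast = np.array([[100, 110], [120, 130]], dtype=np.uint8)

# Doubling the contrast stretches the value range; the shift keeps it centered.
enhanced = adjust_brightness_contrast(low_contrast, contrast=2.0, brightness=-100)
print(enhanced)
```

After the adjustment the pixel values span a range twice as wide as before, which is what makes details easier to see.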

14. Optical Character Recognition (OCR)

Many of the texts and documents we see in daily life are not in a machine-readable format: they are either handwritten or printed. The goal of Optical Character Recognition is to take printed or handwritten text and convert it into digital form so that it can be read and manipulated by computers. Once text is converted from handwritten or printed form to digital form, it requires less memory, can be displayed on the web, and can be transmitted easily from one place to another.

Popular OCR Tools

15. Image Inpainting

Image Inpainting is the task of reconstructing missing regions in an image. It is an important problem in computer vision and an essential functionality in many imaging and graphics applications, e.g. object removal, image restoration, manipulation, re-targeting, compositing, and image-based rendering. The practice of modifying an image in an undetectable way is as ancient as art itself. The goals and applications of inpainting are numerous, from the restoration of damaged paintings and photographs to the removal of selected objects.
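As a rough illustration of filling a missing region, the sketch below uses simple diffusion: each missing pixel is repeatedly replaced by the average of its four neighbors. This is a classical baseline chosen for brevity, not a deep learning method, and the function name and test image are assumptions.

```python
import numpy as np

def inpaint_by_diffusion(image, mask, iterations=200):
    """Fill pixels where `mask` is True by repeated neighbor averaging."""
    result = image.astype(np.float64).copy()
    result[mask] = result[~mask].mean()  # rough initial guess for the hole
    for _ in range(iterations):
        padded = np.pad(result, 1, mode="edge")
        neighbors = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]    # above and below
            + padded[1:-1, :-2] + padded[1:-1, 2:]  # left and right
        ) / 4.0
        result[mask] = neighbors[mask]  # known pixels stay untouched
    return result

# A smooth horizontal gradient with a 3x3 hole in the middle.
original = np.tile(np.arange(9, dtype=np.float64), (9, 1))
mask = np.zeros_like(original, dtype=bool)
mask[3:6, 3:6] = True
damaged = original.copy()
damaged[mask] = 0.0

restored = inpaint_by_diffusion(damaged, mask)
print(np.round(restored[4], 2))
```

Diffusion works well on smooth regions like this gradient but blurs textures and edges, which is one reason learned models are used for realistic inpainting.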


16. Facial Expression Analysis 

Facial Expression Analysis and Emotion Recognition are two separate applications of Computer Vision. Emotion Recognition requires higher-level knowledge, while Facial Expression Analysis can be done with lower-level knowledge of the human face. Humans show facial expressions in response to their internal emotional state, communication, intentions, and feelings. The main goal of facial expression analysis is to develop a computer system that automatically analyzes and recognizes facial movements and changes in facial features from visual information.



17. Object Tracking

Object Tracking is a field of computer vision where we try to estimate the trajectory of an object across a sequence of frames. The idea behind object tracking is simple: first identify the initial coordinates of each object we want to track, then assign a unique ID to each object, and finally follow each object as it moves around the scene.
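These steps can be sketched with a very simple tracker that matches each new detection to the nearest previously seen object. The `CentroidTracker` class, its greedy matching rule, and the sample coordinates are illustrative assumptions; practical trackers handle occlusion, appearance, and disappearing objects far more carefully.

```python
import math

class CentroidTracker:
    """Toy tracker: each object is a point, matched to the nearest known object."""

    def __init__(self):
        self.next_id = 0
        self.objects = {}  # object ID -> last known (x, y) position

    def update(self, centroids):
        unmatched = dict(self.objects)
        updated = {}
        for centroid in centroids:
            if unmatched:
                # Reuse the ID of the closest object not yet matched in this frame.
                nearest = min(unmatched, key=lambda i: math.dist(unmatched[i], centroid))
                updated[nearest] = centroid
                del unmatched[nearest]
            else:
                # No known object left: register a new one with a fresh ID.
                updated[self.next_id] = centroid
                self.next_id += 1
        self.objects = updated  # objects without a match are dropped
        return dict(updated)

tracker = CentroidTracker()
first_frame = tracker.update([(0, 0), (10, 10)])   # initial coordinates get IDs
second_frame = tracker.update([(11, 10), (1, 0)])  # same objects, slightly moved
print(first_frame)
print(second_frame)
```

Even though the detections arrive in a different order in the second frame, each object keeps the ID it was given in the first frame.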

18. Automatic Sign Language Recognition

Automatic Sign Language Recognition (ASLR) is an active research field located between gesture recognition and linguistics. The main aim of automatic sign language recognition is to recognize the meaning of movements and gestures and translate them into spoken human language. In other words, the ultimate goal is to build a translation system that takes a sign language video as input and outputs its meaning in audio or text format.


