
Introduction to Computer Vision / Roadmap to CV Developer in 2022

Updated: Aug 9

What is Computer Vision?


Computer vision is a rapidly growing field of artificial intelligence that is increasingly used across technology companies and startups.



Computer vision lies at the intersection of artificial intelligence and machine learning. A computer vision engineer’s purpose is to help computers “see” through deep learning, machine learning and mathematical models implemented in code.


The technology is about computers being able not only to capture images but also to extract the meaning or purpose of an image, such as determining the distances and movements of approaching objects.


Photos and videos are becoming an integral part of our lives. There are over 4.4 billion internet users, and the number grows every day. In 2019, 4,333,560 videos were watched on YouTube every minute, and around 300 hours of video were uploaded to YouTube every minute. Skype users make around 176k calls.



All of this is to say that graphical data, i.e. images and videos, is being uploaded to the internet constantly from many sources. YouTube is often described as the second-largest search engine after Google, and hours of video are uploaded to it every minute.


It is easy to index and search text, but to index and search images, algorithms need to understand what an image contains. So, to get the most out of images and videos, and to provide the best services to users based on this data, computers need to understand images and actually “see” inside them.



This is an exciting area of technology, and it is only going to grow in the near future. Computer vision technology is already being implemented in numerous sectors and industries, including healthcare and business.



Hopefully this article has interested you in this growing field of artificial intelligence, as well as encouraged you to learn more about computer vision and where it will take us in the future.


Who is a Computer Vision Engineer?

A computer vision engineer uses both software programming languages and an assortment of machine learning and artificial intelligence algorithms to create functioning vision systems.


CV engineers, who work mainly in the tech industry, are primarily responsible for implementing various computer vision applications.


The field of computer vision and the demand for computer vision engineers are growing at a rapid pace, thanks in part to automation and the rise of Big Data.



If you’re someone who enjoys the intersection of coding and innovation by research and experimentation, a career in computer vision engineering may be just right for you.


Computer vision engineers can be paid very well, because their work is so useful to companies that are developing products for the marketplace now and will be in the future.


If you’re someone who loves a challenge, whether it’s in artificial intelligence or machine learning, computer vision engineering might be the career you’ve been looking for.


What does a Computer Vision Engineer do?

Computer vision engineers are versatile and important computer engineers. While the role requires knowledge of many computer engineering fields, computer vision has a host of applications, from air traffic control to airport security.


Computer vision engineers apply computer vision and machine learning research to solve real-world problems. Their work uses large amounts of data and statistics to complete complex tasks, applying supervised or unsupervised learning as part of those tasks.



CV engineers also spend much of their time researching and implementing machine learning and computer vision systems for their client companies and parent corporations.


As a computer vision engineer you will be responsible for researching and developing the latest computer vision applications. You will also be responsible for learning new computer vision programs and keeping up with market research. If you love adventure, change, deadlines, and new ideas, this is the job for you.


What are Computer Vision Engineers Proficient at?

Computer vision engineers generally have a significant amount of experience with a variety of systems, such as image recognition, machine learning, Edge AI, networking and communication, deep learning, artificial intelligence, advanced computing, image annotation, data science, and image/video segmentation.


The tasks required of computer vision engineers often rely on linear algebra libraries and a foundational understanding of algorithms and mathematical processes.


Furthermore, successful CV engineers need software skills in database management, development environments, and component-based or object-oriented software and programming languages.



Every computer vision engineer is required to have the ability to:


  • Develop image analysis algorithms: For example, these algorithms allow programs to recognize images and classify them into categories.

  • Develop deep learning architectures to solve problems: Deep learning is a branch of artificial intelligence used by computer vision engineers to create powerful image recognition or video analysis models.

  • Design and create platforms for image processing and visualization: Aside from building architectures and using algorithms, computer vision engineers are often tasked with assisting the developers of the platforms that host computer vision models, which involves designing the apps, websites, or devices that will run those models.

  • Use knowledge of computer vision libraries: Since computer vision engineers use programming and coding to create computer vision models, their job descriptions often require them to be comfortable with libraries specific to the computer vision task at hand.

  • Understand dataflow programming: Dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations. Working with it involves implementing dataflow principles and architecture.
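
To make the dataflow idea from the last bullet concrete, here is a minimal sketch in plain Python. The node names, the `run_dataflow` helper and the tiny RGB image are all made up for illustration; real frameworks build and schedule such graphs for you. Each node lists the nodes it depends on, and the graph is evaluated in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(graph, operations, inputs):
    """Evaluate a dataflow graph.

    graph      maps each node to a tuple of the nodes it depends on.
    operations maps each computed node to a function of its dependencies.
    inputs     supplies the values of the source nodes.
    """
    results = dict(inputs)
    # static_order() yields every node after all of its dependencies.
    for node in TopologicalSorter(graph).static_order():
        if node in results:
            continue  # source node, value already supplied
        args = [results[dep] for dep in graph[node]]
        results[node] = operations[node](*args)
    return results


# Hypothetical three-step pipeline: RGB image -> grayscale -> mask -> count.
pipeline = {
    "gray": ("image",),
    "mask": ("gray",),
    "foreground_pixels": ("mask",),
}
operations = {
    "gray": lambda img: [[sum(px) // 3 for px in row] for row in img],
    "mask": lambda gray: [[1 if v > 127 else 0 for v in row] for row in gray],
    "foreground_pixels": lambda mask: sum(map(sum, mask)),
}
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 210, 220), (10, 20, 30)],
]

results = run_dataflow(pipeline, operations, {"image": image})
print(results["foreground_pixels"])  # 2
```

The operations never call each other directly. Only the graph says what feeds into what, which is the property that lets dataflow systems reorder or parallelise work.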


What are the key areas in Computer Vision to Master?


Expect projects and tasks to relate to any of the areas of computer vision below.


Image segmentation


Image segmentation is the process of dividing an image into segments for easier processing and representation. Each segment is then handled individually, with attention to its own characteristics.



Semantic segmentation


Semantic segmentation identifies objects in an image and assigns each one to a class such as dog, human or burger. In a picture of 5 dogs, all the dogs are segmented as one class, i.e. dog.


There are two ways to go about semantic segmentation. One is the route of classic and traditional algorithms, while the other dives into deep learning.
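
As a rough illustration of the classic route, the sketch below labels every pixel of a small grayscale image by intensity band. The thresholds and class names are hypothetical and chosen only for this toy image. The images are plain nested lists to keep the example dependency-free; real code would typically use NumPy or OpenCV arrays.

```python
def threshold_segment(gray, thresholds):
    """Label every pixel with a class index based on intensity bands.

    thresholds is an ascending list of cut points. A pixel's label is the
    number of cut points its intensity reaches or exceeds.
    """
    return [[sum(value >= t for t in thresholds) for value in row] for row in gray]


# Hypothetical class names for the three intensity bands.
CLASS_NAMES = {0: "background", 1: "road", 2: "sky"}

gray = [
    [10, 20, 200, 210],
    [15, 120, 130, 220],
    [12, 110, 125, 230],
]

labels = threshold_segment(gray, thresholds=[100, 180])
for row in labels:
    print(row)
```

Every pixel gets a class, and two separate regions with the same intensity would share one label. That is exactly the semantic behaviour described above.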



Instance Segmentation


Unlike semantic segmentation, instance segmentation also identifies objects that belong to the same class as distinct instances. It is usually more computationally intensive, because each instance is treated individually and each pixel in the image is labelled with a class. It’s an example of dense prediction.


For example, in an image of 5 cats, each cat would be segmented as a unique object.
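
The difference can be sketched with a very simple stand-in technique: connected-component labelling on a binary class mask. This is not how modern instance segmentation models work, and touching objects would be merged into one instance. It only shows the output format: one id per object instead of one label per class. The mask below is a made-up "cat" mask with three blobs.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected blob of 1s in mask its own positive id."""
    height, width = len(mask), len(mask[0])
    instance_ids = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or instance_ids[r][c]:
                continue  # background, or already part of an instance
            next_id += 1
            instance_ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and not instance_ids[ny][nx]:
                        instance_ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_ids, next_id


# Semantic result: every cat pixel is simply 1.
cat_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]

instance_ids, count = label_instances(cat_mask)
print(count)  # 3
```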


Some common applications of image segmentation are autonomous driving, medical image segmentation and satellite image processing.



Object Localisation


Object localisation is the process of detecting the single most prominent instance of an object in an image.


Object Detection


Object detection recognises objects in an image with the use of bounding boxes. It also measures the scale of each object and its location in the picture. Unlike object localisation, object detection is not restricted to finding a single instance of an object in the image; it finds all the object instances present.
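
The contrast between the two tasks can be sketched as follows. The input is a hypothetical map of object ids (0 is background), standing in for whatever a real detector would produce. "Most prominent" is interpreted here as "largest bounding box", which is an assumption made only for this example.

```python
def bounding_boxes(instance_ids):
    """Return {object id: (x_min, y_min, x_max, y_max)} for every object."""
    boxes = {}
    for y, row in enumerate(instance_ids):
        for x, obj_id in enumerate(row):
            if obj_id == 0:
                continue  # background pixel
            if obj_id not in boxes:
                boxes[obj_id] = [x, y, x, y]
            else:
                box = boxes[obj_id]
                box[0], box[1] = min(box[0], x), min(box[1], y)
                box[2], box[3] = max(box[2], x), max(box[3], y)
    return {obj_id: tuple(box) for obj_id, box in boxes.items()}


def box_area(box):
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def localise(boxes):
    """Localisation: keep only the single largest box (assumed 'most prominent')."""
    return max(boxes.values(), key=box_area)


object_map = [
    [1, 1, 0, 0, 2],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 0, 0],
    [0, 3, 3, 0, 0],
]

detections = bounding_boxes(object_map)  # detection: one box per object
print(detections)
print(localise(detections))              # localisation: a single box
```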



Object Tracking


Object tracking is the process of following moving objects in a scene or video. It is used widely in surveillance, in CGI movies to track actors, and in self-driving cars. There are two approaches to detecting and tracking the relevant object or objects.


The first is the generative approach, which searches for the regions in the image most similar to the tracked object without paying any attention to the background. In comparison, the second, known as the discriminative approach, finds differences between the object and its background.
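
A minimal sketch of the generative idea is template matching: slide a stored patch of the object over the new frame and keep the position where the pixels differ least. The sum of squared differences used here is one common similarity measure, chosen for simplicity, and the frame and template values are invented.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between template and a frame window."""
    return sum(
        (frame[top + dy][left + dx] - template[dy][dx]) ** 2
        for dy in range(len(template))
        for dx in range(len(template[0]))
    )


def track(frame, template):
    """Return (top, left) of the window most similar to the template."""
    t_height, t_width = len(template), len(template[0])
    candidates = [
        (top, left)
        for top in range(len(frame) - t_height + 1)
        for left in range(len(frame[0]) - t_width + 1)
    ]
    return min(candidates, key=lambda pos: ssd(frame, template, *pos))


# Appearance of the tracked object, e.g. cropped from an earlier frame.
template = [
    [9, 8],
    [7, 9],
]

# In the next frame the object has moved to row 1, column 2.
next_frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]

print(track(next_frame, template))  # (1, 2)
```

The background is never modelled: the search only asks how much each window looks like the object. A discriminative tracker would instead train a classifier to separate object from background.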



Image Classification


Classification means labelling images, or subjects in an image, with a class that relates to their meaning. The following are some of the standard image classification algorithms you should know:

  • Parallelepiped classification

  • Minimum distance classification

  • Mahalanobis classification

  • Maximum likelihood

Some common examples of classification are image recognition, object detection and object tracking.
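
Of the algorithms named above, minimum distance classification is the simplest to sketch: compute the mean feature vector of each class from labelled samples, then assign a new sample to the class whose mean is nearest. The RGB training pixels and class names below are hypothetical.

```python
import math


def class_means(samples):
    """samples maps a class label to a list of feature vectors."""
    means = {}
    for label, vectors in samples.items():
        count = len(vectors)
        means[label] = [sum(column) / count for column in zip(*vectors)]
    return means


def classify(features, means):
    """Return the label whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(features, means[label]))


# Hypothetical labelled RGB pixels from a satellite image.
training_pixels = {
    "water": [(10, 20, 200), (20, 30, 220)],
    "vegetation": [(30, 180, 40), (50, 200, 60)],
}

means = class_means(training_pixels)
print(classify((25, 40, 190), means))  # water
print(classify((45, 170, 70), means))  # vegetation
```

Mahalanobis classification follows the same pattern but replaces the Euclidean distance with one that accounts for how each class's features vary.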


Face Recognition


Face recognition is a non-trivial computer vision problem: recognising faces in an image and tagging them accordingly. It uses neural networks and deep learning models such as CNNs and FaceNet.


First, the face is detected and enclosed in a bounding box. Features are then extracted from the face and normalised for comparison. These features are fed to the model, which labels the face with a name or title.
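
The last two steps, normalising features and comparing them, can be sketched like this. The three-number "embeddings", the names and the 0.8 threshold are invented for illustration. In practice the vectors would come from a trained network and have many more dimensions.

```python
import math


def l2_normalise(vector):
    """Scale a feature vector to unit length so comparisons ignore magnitude."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def identify(embedding, gallery, threshold=0.8):
    """Return the best-matching name by cosine similarity, or None if no
    gallery entry is at least as similar as the threshold."""
    query = l2_normalise(embedding)
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = sum(q * r for q, r in zip(query, l2_normalise(reference)))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical stored features for two known people.
gallery = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}

print(identify([0.85, 0.15, 0.35], gallery))  # alice
print(identify([0.0, 0.0, 1.0], gallery))     # None
```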





Optical Character Recognition


OCR is used for converting printed physical documents, texts and bills into digitised text, which can then be used in many other applications. It is a crossover of pattern recognition and computer vision.


Tesseract is a popular open-source OCR engine written in C++, originally developed by HP and later by Google. To use Tesseract OCR from Python, a common choice is the pytesseract wrapper.
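
A minimal sketch of calling Tesseract from Python might look like the following. It assumes the `pytesseract` and Pillow packages are installed and that the Tesseract binary itself is available on the system. The `extract_text` helper is made up for this example.

```python
def extract_text(image_path, language="eng"):
    """Run Tesseract on an image file and return the recognised text."""
    # Imported inside the function so this module still loads on machines
    # where the OCR dependencies are not installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=language)
```

Calling `extract_text("scanned_bill.png")`, where the file name is only a placeholder, would return the text Tesseract finds in that image as a string.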


Image Processing


One needs a basic understanding of simple image processing techniques such as histogram equalisation, median filtering, RGB manipulation, image denoising and image restoration.
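
As an example of the first technique, here is a small sketch of histogram equalisation for an 8-bit grayscale image stored as nested lists. It uses the usual cumulative-histogram mapping. The low-contrast input is made up, and library routines would normally be used instead of hand-written loops.

```python
def equalise_histogram(gray, levels=256):
    """Spread the intensities of a grayscale image over the full range."""
    pixels = [value for row in gray for value in row]
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution: how many pixels are at or below each level.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return [row[:] for row in gray]  # single-intensity image, nothing to spread

    lookup = [
        round(max(c - cdf_min, 0) / (total - cdf_min) * (levels - 1))
        for c in cdf
    ]
    return [[lookup[value] for value in row] for row in gray]


# Low-contrast image: all values sit between 100 and 103.
dull = [
    [100, 100, 101],
    [101, 102, 103],
]

bright = equalise_histogram(dull)
print(bright)
```

After equalisation the darkest pixels map to 0 and the brightest to 255, so the same structure is shown with far more contrast.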





Image regeneration, or restoration, is a widely used technique that takes a degraded, noisy image and generates a clean image from it. The input image can be noisy, blurred, pixelated or tattered with age.
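
For the noisy case, one of the simplest cleaning steps is the median filter mentioned earlier. The sketch below replaces each pixel with the median of its 3x3 neighbourhood, which removes isolated "salt" pixels. The border handling (using only the neighbours that exist) and the use of `median_low` to stay within the original pixel values are choices made for this example.

```python
from statistics import median_low


def median_filter(gray):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    height, width = len(gray), len(gray[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median_low(window))
        output.append(row)
    return output


# A flat patch with one corrupted pixel in the middle.
noisy = [
    [50, 50, 50],
    [50, 255, 50],
    [50, 50, 50],
]

print(median_filter(noisy))  # [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
```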


Image restoration uses a prior, i.e. an assumption about what a clean image looks like, to fill in the gaps in the image, and rebuilds the image in steps. In each iteration the image is refined, and the final iteration outputs the restored image.
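
A toy version of this loop is sketched below. The prior is a simple smoothness assumption (a missing pixel should look like the average of its neighbours), which is far weaker than the learned priors used in practice. Missing pixels are marked with `None`, and the function assumes an image with at least two pixels, at least one of them known.

```python
def restore(image, iterations=50):
    """Fill pixels marked None, assuming neighbouring pixels are similar."""
    height, width = len(image), len(image[0])
    known = [[value is not None for value in row] for row in image]
    observed = [value for row in image for value in row if value is not None]
    initial = sum(observed) / len(observed)  # crude first guess
    current = [
        [initial if value is None else float(value) for value in row]
        for row in image
    ]
    for _ in range(iterations):
        refined = [row[:] for row in current]
        for y in range(height):
            for x in range(width):
                if known[y][x]:
                    continue  # never overwrite observed pixels
                neighbours = [
                    current[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width
                ]
                refined[y][x] = sum(neighbours) / len(neighbours)
        current = refined
    return current


# A horizontal gradient with two damaged pixels in the middle row.
damaged = [
    [0, 10, 20, 30],
    [0, None, None, 30],
    [0, 10, 20, 30],
]

restored = restore(damaged)
print([round(value) for value in restored[1]])  # [0, 10, 20, 30]
```

Each pass moves the two unknown pixels closer to values consistent with their surroundings. This is the "refined in each iteration" behaviour described above.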


Reading