Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, and active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Applications include speech recognition, facial recognition, and object recognition. Computer vision is the ability to analyse visual input.
Machine perception is the capability of a computer system to interpret data in a manner similar to the way humans use their senses to relate to the world around them. Computers take in and respond to their environment through attached hardware. Until recently, input was limited to a keyboard or a mouse, but advances in both hardware and software have allowed computers to take in sensory input in a way similar to humans.
Machine perception allows the computer to use this sensory input, as well as conventional computational means of gathering information, to collect information with greater accuracy and to present it in a way that is more comfortable for the user. The modes of machine perception include computer vision, machine hearing, machine touch, and machine smell.
The end goal of machine perception is to give machines the ability to see, feel and perceive the world as humans do, and therefore to be able to explain in a human way why they are making their decisions, to warn us when they are failing and, more importantly, why they are failing. This purpose is very similar to the proposed purposes for artificial intelligence generally, except that machine perception would only grant machines limited sentience, rather than bestow upon them full consciousness, self-awareness, and intentionality. Present-day technology and research, however, still have a long way to go before this goal is reached.
Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images and high-dimensional data from the real world to produce numerical or symbolic information, e.g., in the form of decisions. Computer vision has many applications already in use today, such as facial recognition, geographical modeling, and even aesthetic judgement.
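This pipeline, from raw pixel data through numerical processing to a symbolic decision, can be sketched with a deliberately tiny example. The 4x4 "image", the intensity scale, and the brightness threshold below are all illustrative assumptions, not part of any real vision system.

```python
# Toy illustration of computer vision's pipeline: acquire pixel data,
# process it numerically, and emit a symbolic decision.
# The 0-255 intensity scale and the 128 threshold are assumptions.

def classify_scene(image):
    """Return a symbolic label ('bright' or 'dark') from raw pixel intensities."""
    pixels = [p for row in image for p in row]             # flatten the 2-D grid
    mean_intensity = sum(pixels) / len(pixels)             # numerical processing
    return "bright" if mean_intensity >= 128 else "dark"   # symbolic output

daylight = [[200, 210, 190, 220]] * 4
night = [[30, 25, 40, 20]] * 4
print(classify_scene(daylight))  # bright
print(classify_scene(night))     # dark
```

Real systems replace the mean-intensity step with far richer features, but the shape of the computation, numbers in, symbols out, is the same.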
However, machines still struggle to interpret visual input accurately if that input is blurry, or if the viewpoint from which stimuli are viewed varies often. Computers also struggle to determine the proper nature of some stimuli when they are overlapped by, or seamlessly touching, other stimuli; this relates to the principle of good continuation. Machines also struggle to perceive and record stimuli that behave according to the principle of apparent movement, which Gestalt psychologists researched.
Machine hearing, also known as machine listening or computer audition, is the ability of a computer or machine to take in and process sound data such as speech or music. This area has a wide range of applications, including music recording and compression, speech synthesis, and speech recognition. Moreover, this technology allows the machine to replicate the human brain's ability to selectively focus on a specific sound against many other competing sounds and background noise. This particular ability is called "auditory scene analysis". The technology enables the machine to segment several streams occurring at the same time.
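One small ingredient of auditory scene analysis can be sketched directly: deciding which of several known tones are present in a mixed signal by measuring the energy at each tone's frequency (a single-bin discrete Fourier transform). The sample rate, tone frequencies, and energy threshold below are assumptions chosen for illustration.

```python
# Minimal sketch of separating sound "streams" by frequency: synthesise a
# mixture of two tones, then test candidate frequencies for presence.
import math
import cmath

SAMPLE_RATE = 8000  # samples per second (assumed)

def tone_energy(signal, freq):
    """Magnitude of the DFT of `signal` evaluated at `freq` hertz."""
    n = len(signal)
    acc = sum(signal[t] * cmath.exp(-2j * math.pi * freq * t / SAMPLE_RATE)
              for t in range(n))
    return abs(acc) / n

# Mix a 440 Hz tone and an 880 Hz tone; 660 Hz is absent from the mixture.
n = 1024
mixture = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) +
           math.sin(2 * math.pi * 880 * t / SAMPLE_RATE) for t in range(n)]

present = [f for f in (440, 660, 880) if tone_energy(mixture, f) > 0.1]
print(present)  # [440, 880]
```

Genuine auditory scene analysis must cope with overlapping, time-varying sources rather than steady tones, but frequency-domain energy measurements of this kind are one of its basic building blocks.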
Many commonly used devices, such as smartphones, voice translators, and cars, make use of some form of machine hearing. Present technology still occasionally struggles with speech segmentation, that is, picking out individual words within sentences, especially when human accents are taken into account.
Machine touch is an area of machine perception where tactile information is processed by a machine or computer. Applications include tactile perception of surface properties and dexterity, whereby tactile information can enable intelligent reflexes and interaction with the environment. This could possibly be done by measuring when and where friction occurs, and of what nature and intensity the friction is. Machines, however, still have no way of measuring some physical human experiences we consider ordinary, including physical pain. For example, scientists have yet to invent a mechanical substitute for the nociceptors in the body and brain that are responsible for noticing and measuring physical human discomfort and suffering.
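The idea of turning friction measurements into useful tactile judgements can be sketched as follows. The sensor readings, surface labels, and friction-coefficient thresholds are all hypothetical values invented for illustration.

```python
# Hedged sketch of tactile classification: map friction-sensor readings
# (coefficient-of-friction samples from a sliding pass) to a surface label.
# All thresholds and example values here are assumptions.

def classify_surface(friction_readings):
    """Label a surface from a sequence of friction-coefficient samples."""
    mean_mu = sum(friction_readings) / len(friction_readings)
    if mean_mu < 0.2:
        return "slippery"   # e.g. ice or polished metal
    elif mean_mu < 0.6:
        return "smooth"     # e.g. wood or plastic
    return "rough"          # e.g. rubber or sandpaper

print(classify_surface([0.05, 0.08, 0.06]))  # slippery
print(classify_surface([0.9, 1.1, 0.95]))    # rough
```

A label like "slippery" could then trigger a reflex, such as tightening a robotic grip, which is the kind of intelligent reaction the paragraph above describes.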
Scientists are also working on machine olfaction: computers that can recognise and measure smells. Airborne chemicals can be sensed and classified with a device sometimes known as an electronic nose. While present prototypes of this technology are still elementary, the possible future uses for such machines are striking.
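A common approach with electronic noses is to compare a chemical-sensor array's response against stored reference patterns and pick the closest match. The sketch below assumes a hypothetical three-sensor array; the odor names and sensor values are invented for illustration.

```python
# Minimal sketch of electronic-nose classification: nearest reference
# "smell print" by Euclidean distance. All values here are hypothetical.
import math

# Reference response patterns for an assumed 3-sensor array.
SMELL_PRINTS = {
    "coffee":  [0.9, 0.2, 0.4],
    "banana":  [0.1, 0.8, 0.3],
    "solvent": [0.3, 0.3, 0.9],
}

def classify_odor(reading):
    """Return the odor whose reference pattern is nearest to `reading`."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(SMELL_PRINTS, key=lambda odor: distance(reading, SMELL_PRINTS[odor]))

print(classify_odor([0.85, 0.25, 0.35]))  # coffee
```

Real electronic noses use many more sensors and learned models rather than fixed templates, but the pattern-matching idea is the same.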
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do.
Computer vision tasks include methods for acquiring, processing, analysing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
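This "disentangling of symbolic information from image data" can be illustrated with a toy edge detector: raw pixels go in, and a symbolic statement like "edge at column 3" comes out. The synthetic image and the gradient threshold below are assumptions for illustration only.

```python
# Toy extraction of symbolic information from image data: locate the
# column where intensity jumps most sharply (a vertical edge).
# The image contents and the threshold of 100 are assumed values.

def find_vertical_edge(image, threshold=100):
    """Return the column index of the strongest horizontal intensity jump,
    or None if no jump exceeds the threshold."""
    width = len(image[0])
    best_col, best_grad = None, threshold
    for col in range(width - 1):
        # mean absolute gradient between col and col+1 across all rows
        grad = sum(abs(row[col + 1] - row[col]) for row in image) / len(image)
        if grad > best_grad:
            best_col, best_grad = col, grad
    return best_col

# Dark region (columns 0-3) meets bright region (columns 4-7).
image = [[10] * 4 + [200] * 4 for _ in range(5)]
print(find_vertical_edge(image))  # 3 (edge between columns 3 and 4)
```

The models mentioned above (geometric, physical, statistical) serve the same role as this threshold, only vastly more sophisticated: they decide which pixel patterns count as meaningful structure in the world.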
The scientific discipline of computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a 3D scanner or medical scanning device. The technological discipline of computer vision seeks to apply its theories and models to the construction of computer vision systems.
Sub-domains of computer vision include scene reconstruction, object detection, event detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, visual servoing, 3D scene modeling, and image restoration.