Many problems in AI (in reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information, which calls for probabilistic methods. AI researchers have devised a number of powerful tools to solve these problems using methods from probability theory and economics. Bayesian networks are a very general tool that can be used for various problems: reasoning (using the Bayesian inference algorithm), learning (using the expectation-maximisation algorithm), planning (using decision networks), and perception (using dynamic Bayesian networks). Probabilistic algorithms can also be used for filtering, prediction, smoothing, and finding explanations for streams of data, helping perception systems analyse processes that occur over time (e.g., hidden Markov models or Kalman filters).

A key concept from the science of economics is “utility”: a measure of how valuable something is to an intelligent agent. Precise mathematical tools have been developed that analyse how an agent can make choices and plan, using decision theory, decision analysis, and information value theory. These tools include models such as Markov decision processes and dynamic decision networks, as well as game theory and mechanism design.


The probabilistic method is a nonconstructive technique, primarily used in combinatorics and pioneered by Paul Erdős, for proving the existence of a prescribed kind of mathematical object. It works by showing that if one randomly chooses objects from a specified class, the probability that the result is of the prescribed kind is strictly greater than zero. Although the proof uses probability, the final conclusion is determined for *certain*, without any possible error.
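The classic illustration is Erdős’s lower bound on Ramsey numbers: if the *expected* number of monochromatic *k*-cliques under a random 2-colouring of the complete graph K_n is below 1, then some colouring must have none. A minimal sketch of the counting step (the function name is illustrative, not standard):

```python
from math import comb

def ramsey_lower_bound_holds(n: int, k: int) -> bool:
    """Erdős's argument: colour the edges of K_n red/blue uniformly at
    random. The expected number of monochromatic k-cliques is
    C(n, k) * 2^(1 - C(k, 2)). If that expectation is below 1, some
    colouring has zero monochromatic k-cliques, so R(k, k) > n --
    existence is proved without ever constructing the colouring."""
    expected = comb(n, k) * 2 ** (1 - comb(k, 2))
    return expected < 1

# C(5, 4) * 2^(-5) = 5/32 < 1, so R(4, 4) > 5.
print(ramsey_lower_bound_holds(5, 4))  # → True
# C(6, 3) * 2^(-2) = 5 >= 1: the bound says nothing, consistent with R(3, 3) = 6.
print(ramsey_lower_bound_holds(6, 3))  # → False
```

Note that the proof only certifies existence; the code computes the expectation, not a colouring.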

The probabilistic method has since been applied to other areas of mathematics such as number theory, linear algebra, and real analysis, as well as in computer science (e.g. randomised rounding) and information theory.

In mathematics and computer science, probabilistic methods are used to prove the existence of mathematical objects with desired combinatorial properties. The proofs are probabilistic: they work by showing that a random object, chosen from some probability distribution, has the desired properties with positive probability. Consequently, they are nonconstructive: they don’t explicitly describe an efficient method for computing the desired objects.

The method of conditional probabilities converts such a proof, in a very precise sense, into an efficient deterministic algorithm, one that is guaranteed to compute an object with the desired properties. That is, the method derandomises the proof. The basic idea is to replace each random choice in a random experiment by a deterministic choice, so as to keep the conditional probability of failure, given the choices so far, below 1.
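A standard worked example of this derandomisation is MAX-CUT: a uniformly random cut cuts each edge with probability 1/2, so the expected cut size is |E|/2; fixing vertices one at a time while never letting the conditional expectation drop yields a deterministic cut of at least |E|/2. A minimal sketch (function name and representation are illustrative):

```python
def derandomized_max_cut(n, edges):
    """Method of conditional probabilities applied to MAX-CUT.
    Assign each vertex to a side one at a time, each time choosing the
    side that keeps the conditional expectation of the cut >= |E|/2."""
    side = {}
    for v in range(n):
        scores = {}
        for s in (0, 1):
            side[v] = s
            # Conditional expectation of the final cut size: decided
            # edges contribute 0 or 1; undecided edges contribute 1/2.
            exp = 0.0
            for a, b in edges:
                if a in side and b in side:
                    exp += 1.0 if side[a] != side[b] else 0.0
                else:
                    exp += 0.5
            scores[s] = exp
        side[v] = max(scores, key=scores.get)  # never decrease the expectation
    cut = sum(1 for a, b in edges if side[a] != side[b])
    return side, cut

# A triangle has 3 edges, so the guarantee is a cut of size >= 2.
side, cut = derandomized_max_cut(3, [(0, 1), (1, 2), (0, 2)])
print(cut)  # → 2
```

Here the conditional expectation itself serves as the estimator; in harder applications a pessimistic estimator replaces it, as the next paragraph notes.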

The method is particularly relevant in the context of randomised rounding, which uses probabilistic methods to design approximation algorithms. When applying the method of conditional probabilities, the technical term pessimistic estimator refers to a quantity used in place of the true conditional probability (or conditional expectation) underlying the proof.

A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
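The disease–symptom example can be made concrete with a tiny network, Disease → Fever and Disease → Cough, and inference by enumeration over the joint distribution. The conditional probability tables below are illustrative numbers, not medical data:

```python
# Hypothetical CPTs for a two-symptom Bayesian network (illustrative only).
p_disease = {True: 0.01, False: 0.99}
p_fever_given = {True: 0.9, False: 0.1}   # P(fever=True | disease)
p_cough_given = {True: 0.8, False: 0.2}   # P(cough=True | disease)

def joint(d, fever, cough):
    """Joint probability factorised along the DAG:
    P(d, fever, cough) = P(d) * P(fever | d) * P(cough | d)."""
    pf = p_fever_given[d] if fever else 1 - p_fever_given[d]
    pc = p_cough_given[d] if cough else 1 - p_cough_given[d]
    return p_disease[d] * pf * pc

def posterior_disease(fever, cough):
    """P(disease | symptoms) via Bayes' rule: normalise over both
    values of the disease variable."""
    num = joint(True, fever, cough)
    return num / (num + joint(False, fever, cough))

print(round(posterior_disease(True, True), 3))  # → 0.267
```

Even with both symptoms present, the low prior keeps the posterior well below certainty, which is exactly the kind of reasoning the network encodes.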

Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalisations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process, call it *X*, with unobservable (“*hidden*”) states.

As part of the definition, an HMM requires that there be an observable process *Y* whose outcomes are “influenced” by the outcomes of *X* in a known way. Since *X* cannot be observed directly, the goal is to learn about *X* by observing *Y*. An HMM has an additional requirement: the outcome of *Y* at time *t = t₀* may be “influenced” exclusively by the outcome of *X* at *t = t₀*, and the outcomes of *X* and *Y* at *t < t₀* must not affect the outcome of *Y* at *t = t₀*.
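The basic inference task under this definition, computing the probability of an observation sequence *Y* by summing over the hidden states of *X*, is solved by the forward algorithm. A minimal sketch with a made-up two-state weather model (the states, probabilities, and observation symbols are illustrative):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence) for an HMM, computed
    by recursing over the hidden chain X given the observations Y."""
    # alpha[s] = P(obs so far, X currently in state s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Illustrative model: hidden weather, observed activity.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.9},
          "Sunny": {"walk": 0.6, "shop": 0.4}}

print(forward(["walk", "shop"], states, start_p, trans_p, emit_p))
```

The recursion only ever looks at the current alpha vector, mirroring the Markov requirement stated above.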

Hidden Markov models are known for their applications to thermodynamics, statistical mechanics, physics, chemistry, economics, finance, signal processing, information theory, and pattern recognition (such as speech, handwriting, and gesture recognition), as well as part-of-speech tagging, musical score following, and bioinformatics.

In statistics and control theory, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, who was one of the primary developers of its theory.

This digital filter is sometimes termed the *Stratonovich–Kalman–Bucy filter* because it is a special case of a more general, nonlinear filter developed somewhat earlier by the Soviet mathematician Ruslan Stratonovich. In fact, some of the special-case linear filter’s equations appeared in papers by Stratonovich that were published before summer 1960, when Kalman met with Stratonovich during a conference in Moscow.

Kalman filtering has numerous technological applications. A common application is for guidance, navigation, and control of vehicles, particularly aircraft, spacecraft, and dynamically positioned ships. Furthermore, Kalman filtering is a concept much applied in time series analysis, for topics such as signal processing and econometrics. Kalman filtering is also one of the main topics of robotic motion planning and control and can be used for trajectory optimization. Kalman filtering also works for modeling the central nervous system’s control of movement. Due to the time delay between issuing motor commands and receiving sensory feedback, the use of Kalman filters provides a realistic model for making estimates of the current state of a motor system and issuing updated commands.

The algorithm works by a two-phase process. For the prediction phase, the Kalman filter produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some error, including random noise) is observed, these estimates are updated using a weighted average, with more weight being given to estimates with greater certainty. The algorithm is recursive. It can operate in real time, using only the present input measurements and the state calculated previously and its uncertainty matrix; no additional past information is required.
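The two-phase recursion is easiest to see in the scalar case, where state, gain, and covariance are single numbers. A minimal sketch of one predict/update cycle (function name and default noise values are illustrative):

```python
def kalman_step(x, p, z, q=0.01, r=0.1, a=1.0, h=1.0):
    """One predict/update cycle of a scalar Kalman filter.
    x, p : previous state estimate and its variance
    z    : new noisy measurement
    q, r : process and measurement noise variances
    a, h : state transition and observation coefficients."""
    # Prediction phase: propagate the estimate and its uncertainty.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update phase: weighted average of prediction and measurement,
    # with the weight (Kalman gain) set by the relative uncertainties.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1 - k * h) * p_pred
    return x_new, p_new

# Filtering repeated measurements of a constant pulls the estimate toward it.
x, p = 0.0, 1.0
for _ in range(100):
    x, p = kalman_step(x, p, 5.0, q=0.0)
print(round(x, 2), p < 0.01)  # → 5.0 True
```

Note the recursion uses only the current estimate, its variance, and the new measurement, as the paragraph above states: no past data is stored.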

Optimality of Kalman filtering assumes that errors have a normal (Gaussian) distribution. In the words of Rudolf E. Kálmán: “In summary, the following assumptions are made about random processes: Physical random phenomena may be thought of as due to primary random sources exciting dynamic systems. The primary sources are assumed to be independent gaussian random processes with zero mean; the dynamic systems will be linear.” However, regardless of Gaussianity, if the process and measurement covariances are known, the Kalman filter is the best possible *linear* estimator in the minimum mean-square-error sense.

Extensions and generalisations of the method have also been developed, such as the extended Kalman filter and the unscented Kalman filter, which work on nonlinear systems. The underlying model is a hidden Markov model in which the state space of the latent variables is continuous and all latent and observed variables have Gaussian distributions. Kalman filtering has also been used successfully in multi-sensor fusion and in distributed sensor networks to develop distributed or consensus Kalman filtering.

Particle filters, or sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to solve filtering problems arising in signal processing and Bayesian statistical inference. The filtering problem consists of estimating the internal states in dynamical systems when partial observations are made, and random perturbations are present in the sensors as well as in the dynamical system. The objective is to compute the posterior distributions of the states of some Markov process, given some noisy and partial observations. The term “particle filters” was first coined in 1996 by Del Moral in reference to mean-field interacting particle methods used in fluid mechanics since the beginning of the 1960s. The term “sequential Monte Carlo” was coined by Liu and Chen in 1998.

Particle filtering uses a set of particles (also called samples) to represent the posterior distribution of some stochastic process given noisy and/or partial observations. The state-space model can be nonlinear and the initial state and noise distributions can take any form required. Particle filter techniques provide a well-established methodology for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to very high-dimensional systems.

Particle filters update their prediction in an approximate (statistical) manner. The samples from the distribution are represented by a set of particles; each particle has a likelihood weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms; however, it can be mitigated by including a resampling step before the weights become too uneven. Several adaptive resampling criteria can be used, including the variance of the weights and the relative entropy with respect to the uniform distribution. In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights.
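The propagate/weight/resample cycle just described can be sketched for a 1D random-walk state with Gaussian measurement noise (a bootstrap particle filter; the function name and noise parameters are illustrative):

```python
import math
import random

def particle_filter_step(particles, z, process_std=0.5, meas_std=1.0):
    """One bootstrap particle filter step for a 1D random-walk state.
    Propagate each particle through the dynamics, weight it by the
    measurement likelihood, then resample to combat weight collapse."""
    # Propagate through the (random-walk) dynamics.
    moved = [p + random.gauss(0.0, process_std) for p in particles]
    # Weight each particle by a Gaussian likelihood of the measurement z.
    weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample: particles with negligible weight are replaced by copies
    # drawn near the high-weight particles.
    return random.choices(moved, weights=weights, k=len(moved))

# Particles centred at 0 shift toward a measurement at 2.0.
random.seed(0)
cloud = [random.gauss(0.0, 1.0) for _ in range(500)]
cloud = particle_filter_step(cloud, z=2.0)
print(sum(cloud) / len(cloud))  # mean pulled toward 2.0
```

This sketch resamples at every step; as noted above, practical implementations resample adaptively, e.g. when the effective sample size drops.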

From the statistical and probabilistic point of view, particle filters can be interpreted as mean-field particle interpretations of Feynman-Kac probability measures. These particle integration techniques were developed in molecular chemistry and computational physics by Theodore E. Harris and Herman Kahn in 1951 and Marshall N. Rosenbluth and Arianna W. Rosenbluth in 1955, and more recently by Jack H. Hetherington in 1984. In computational physics, these Feynman-Kac type path particle integration methods are also used in quantum Monte Carlo, and more specifically diffusion Monte Carlo methods. Feynman-Kac interacting particle methods are also strongly related to the mutation-selection genetic algorithms currently used in evolutionary computing to solve complex optimization problems.

The particle filter methodology is used to solve hidden Markov model (HMM) and nonlinear filtering problems. With the notable exception of linear-Gaussian signal-observation models (Kalman filter) and wider classes of models (Beneš filter), Mireille Chaleyat-Maurel and Dominique Michel proved in 1984 that the sequence of posterior distributions of the random states of the signal given the observations (a.k.a. the optimal filter) admits no finitely recursive solution. Various other numerical probabilistic methods based on fixed-grid approximations, Markov chain Monte Carlo techniques, conventional linearisation, extended Kalman filters, or determining the best linear system (in the expected cost-error sense) are unable to cope with large-scale systems, unstable processes, or insufficiently smooth nonlinearities.

Particle filters and Feynman-Kac particle methodologies find application in signal and image processing, Bayesian inference, machine learning, risk analysis and rare event sampling, engineering and robotics, artificial intelligence, bioinformatics, phylogenetics, computational science, economics and mathematical finance, molecular chemistry, computational physics, pharmacokinetics, and other fields.

Decision theory (or the theory of choice; not to be confused with choice theory) is the study of an agent’s choices. Decision theory can be broken into two branches: normative decision theory, which analyses the outcomes of decisions or determines the optimal decisions given constraints and assumptions, and descriptive decision theory, which analyses *how* agents actually make the decisions they do.
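The normative branch reduces, in its simplest form, to the rule of maximising expected utility over the available actions. A minimal sketch using a made-up umbrella decision (the action names, probabilities, and utilities are all illustrative):

```python
def best_action(actions, outcomes):
    """Normative decision rule: choose the action with maximal expected
    utility. `outcomes[a]` is a list of (probability, utility) pairs
    covering the possible results of action a."""
    def expected_utility(a):
        return sum(p * u for p, u in outcomes[a])
    return max(actions, key=expected_utility)

# Hypothetical decision: 30% chance of rain, utilities are illustrative.
outcomes = {
    "take umbrella": [(0.3, 60), (0.7, 80)],   # rain / no rain
    "leave it":      [(0.3, 0), (0.7, 100)],
}
print(best_action(outcomes.keys(), outcomes))  # → take umbrella
```

Expected utilities are 74 versus 70 here, so the normatively optimal choice is to take the umbrella; descriptive decision theory asks whether people actually decide this way.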

Decision theory is closely related to the field of game theory and is an interdisciplinary topic, studied by economists, mathematicians, data scientists, psychologists, biologists, political and other social scientists, philosophers, and computer scientists.

Empirical applications of this theory are usually done with the help of statistical and econometric methods.

As a topic of economics, utility is used to model worth or value. Its usage has evolved significantly over time. The term was introduced initially as a measure of pleasure or happiness as part of the theory of utilitarianism by moral philosophers such as Jeremy Bentham and John Stuart Mill. The term has been adapted and reapplied within neoclassical economics, which dominates modern economic theory, as a utility function that represents a single consumer’s preference ordering over a choice set but is not comparable across consumers. This concept of utility is personal and based on choice rather than on pleasure received, and so is specified more rigorously than the original concept, but this makes it less useful (and more controversial) for ethical decisions.
