NUMERICAL METHODS FOR INVERSE RADIATION PROBLEMS
Following from: The concept of regularization
Leading to: Applications of inverse methods
Successful solution of an inverse radiation problem, be it parameter estimation or design optimization, depends on judicious selection of the proper analysis tool. This, in turn, depends on the underlying mathematical character of the inverse problem. Inverse problems are categorized as either linear or nonlinear.
1. LINEAR INVERSE PROBLEMS
Linear inverse problems involve first-kind integral equations; the most important example in radiative transfer is the inverse boundary condition design problem, in which the goal is to estimate the heat flux and temperature distribution over a heater surface that satisfies prescribed (and most often uniform) heat flux and temperature distributions over another surface called the design surface. As mentioned above, in traditional forward analysis, only one boundary condition, either temperature T(u) or heat flux q(u), is specified at each location on the enclosure. The parameter u = [u,v]^{T} denotes a particular location on the enclosure surface. The objective is to find the unknown radiosity distribution, q_{o}(u), over the enclosure surfaces, which is done by solving a Fredholm integral equation of the second kind (Daun and Hollands, 2001),
(1) |
where u R_{uv}, the kernel k(u, u') depends on the geometry between u and u', and
(2) |
if T(u) is known, or
(3) |
if q(u) is known. Equation (1) is mathematically well posed, meaning that there is only one unique distribution, q_{o}(u), that satisfies Eq. (1) and, moreover, it is guaranteed to exist. Equation (1) is rarely analytically tractable, so it is usually solved numerically by assuming uniform radiosity over small but finite area elements and then approximating the integral as a summation. Writing this equation for each finite area results in a matrix equation, Ax = b, where x and b contain unknown radiosity values and corresponding boundary conditions, respectively. The A matrix is well conditioned, so x is readily found using traditional linear algebra methods (e.g., Gauss-Seidel, LU decomposition).
As noted above, however, the desired heat flux and temperature distributions are usually known over the design surface, while the corresponding distributions over a heater surface are unknown. It can be shown that specifying both T(u) and q(u) over a surface means that the radiosity, q_{o}(u), is also known over that surface. This known radiosity distribution is related to the radiosity distribution over other surfaces by a Fredholm integral equation of the first kind (Howell et al., 2000; Wing, 1991),
(4) |
The left-hand side is known, in contrast to Eq. (1), making this equation mathematically ill posed. This can be understood by writing Eq. (4) for only one point u on the design surface so the left-hand side is a constant, and it can be shown that an infinite set of distributions {q_{o}(u')} can be substituted into the integral to satisfy this constant. Writing Eq. (4) over the entire design surface instead of a single point somewhat reduces this ambiguity, but there still exists multiple radiosity distributions that “almost” satisfy f(u), violating Hadamard’s second criteria of well-posed problems.
Following the same discretization procedure described above also yields a matrix equation, Ax = b, but in this case A is ill conditioned, meaning that in addition to the “true” solution x^{*}, there exists a set of solutions {x} that almost satisfies b, even though these solutions may be very different. This is best visualized by comparing plots of the residual norms of a well-conditioned matrix equation,
(5) |
and an ill-conditioned matrix equation,
(6) |
which are 2D analogues to the forward and inverse boundary condition estimation problems. The well-conditioned matrix equation has a well-defined solution that minimizes the residual, x^{*}, as shown in Fig. 1a. The contour plot in Fig. 1b also has a single, unique minimum (guaranteed for nonsingular linear problems), but there is also a locus of solutions along the valley floor that makes the residual norm very small. In the radiant enclosure design problem, these solutions correspond to radiosity distributions over the heaters that produce the desired conditions on the design surface. Moreover, the shallow residual topography in the vicinity of x^{*} makes the solution of an ill-conditioned matrix equation highly sensitive to small perturbations and numerical artifacts, such as round off error, so inverting an ill-conditioned matrix equation using the above-mentioned traditional linear algebra techniques results in an oscillatory solution of large amplitude that would be impossible to implement in a practical setting.
Figure 1. Plot of the residual norm for (a) a well-conditioned and (b) an ill-conditioned matrix equation.
To resolve this ambiguity, it is necessary to employ regularization methods that transform the mathematically ill-posed inverse problem into a well-posed problem by adding auxiliary information based on desired or presumed solution characteristics. One of the earliest examples is Tikhonov regularization (Phillips, 1962; Tikhonov, 1963), which augments the ill-conditioned matrix equation with a second homogeneous matrix equation, Lx = 0, and the regularized solution is then found by minimizing the residual norm,
(7) |
where L is the smoothing matrix and λ is a heuristically adjusted scalar, called the regularization parameter, that controls the extent of regularization. The form of L depends on the presumed or desired solution characteristics, i.e., setting L ≡ I promotes a solution having a small norm, for example, while a smooth solution can be obtained from,
(8) |
When carrying out Tikhonov regularization, it is important to choose λ so that sufficient regularization is used to obtain a useful solution, but using too much overwhelms the ill-conditioned matrix, resulting in an inaccurate solution. Several parametric selection strategies are described in Hansen (1998). In a topographical sense, increasing λ reduces the problem ambiguity by steepening the residual norm in the vicinity of its minimum, x_{λ}^{*}, but this also shifts x_{λ}^{*} away from x^{*}.
A second type of regularization used to solve the inverse boundary condition design problem is truncated singular value decomposition (TSVD) (Hansen, 1998), which is based on the singular value decomposition of A,
(9) |
where U and V are orthogonal matrices, u_{j} and v_{j}^{T} are the jth column vectors of these matrices, Σ is a diagonal matrix containing the singular values arranged in descending order, and σ_{j} = Σ_{jj} is the jth singular value. Due to the orthogonality of U and V, Ax = b can be solved through back-substitution,
(10) |
If A is ill conditioned, some of the singular values are near zero, as shown in Fig. 2, and dividing by these values produces the large-amplitude oscillations in x. A regularized solution, x_{p}, is obtained by ignoring (or truncating) the summation terms corresponding to the p smallest singular values in Eq. (10), and the solution becomes smoother as more singular values are truncated. As is the case in Tikhonov regularization, however, the residual norm ||Ax_{p} - b|| also increases as p increases since the truncated singular values correspond to summation terms of increasing significance in Eq.(9), so p must be chosen with care.
Figure 2. The singular values of an ill-conditioned matrix range over several orders of magnitude, and some are very near zero.
A third technique used to solve the inverse boundary condition problem is the conjugate gradient method (Hansen, 1998; Nash and Sofer, 1996), which is based on the observation that solving Ax = b is equivalent to minimizing the quadratic function F(x) = 1/2x^{T}Ax - b^{T}x, since at x^{*}, _{x}F(x) = Ax - b = 0. The conjugate gradient method is actually a nonlinear programming (NLP) technique, which will be discussed in more detail shortly. All NLP methods work by minimizing F(x) iteratively through a sequence of steps. At the kth step,
(11) |
where α^{k} is the step size and p^{k} is the search direction. The methods differ on how the step sizes and search directions are chosen. The conjugate gradient method sequentially generates mutually conjugate search directions that satisfy p^{k - 1}^{T}Ap^{k} = 0, and the corresponding step size is then found by minimizing F(x^{k} + α^{k}p_{k}) analytically with respect to α^{k}. Mutual-conjugate search directions are “noninterfering” in the sense that minimization progress made by one step is not undone by future steps, and x_{}^{*} is consequently found in exactly n iterations if A is symmetric and positive definite, and exact arithmetic is used. (For matrixes that are not symmetric and positive definite, the normal equations are solved instead of Ax = b.) Consequently, the solution can be written as
(12) |
While it is a highly efficient linear solver in its own right, the conjugate gradient method is also well suited to linear regularization because (Hansen, 1998): (1) starting from x^{0} = 0, it can be shown that ||x^{k}|| increases monotonically with iteration number, k, while ||Ax^{k} - b|| decreases monotonically, and; (2) the search directions associated with large eigenvalues of A (or equivalently large singular values for A^{T}A) are generated first, while steps that produce high-amplitude oscillations in the solution occur later. A regularized solution, x^{k}, can thus be obtained by prematurely truncating iterations at the kth step, before F(x) and ||Ax - b|| are fully minimized.
2. NONLINEAR PROBLEMS
While the inverse boundary condition design problem for radiation alone is linear, most other types of inverse problems involving radiant transfer are nonlinear. Examples include multimode heat transfer, enclosures containing participating media with temperature-dependent properties, and geometric optimization problems; all of which are discussed in more detail later.
As mentioned above, a nonlinear inverse problem can sometimes be written as Ax^{k+1} = b(x^{k}), in which case linear regularization can be used to solve the ill-conditioned matrix equation at each iteration. More often, however, this class of problem is cast as a minimization problem of the general form
(13) |
where c(x) is a vector of inequality constraints. The objective function depends on the type of inverse problem being solved; in parameter estimation problems, F(x) is usually a least squares function,
(14) |
where b contains measured data, b^{model}(x) is corresponding modeled data, and x is the unknown state variable. Under certain conditions, Eq. (14) is minimized by the maximum likelihood estimator of x, the set of parameters most likely responsible for the observed data. In design optimization, the objective function quantifies the “goodness” of a particular design and x contains design parameters, which specify the design configuration, for example, the heater settings or enclosure geometry. If the goal is to obtain a desired temperature and heat flux distribution over an enclosure surface, one of these parameters (say, temperature) is specified over the surface, and the other (say, heat flux) is calculated for a given set of design parameters. The enclosure design that satisfies both imposed boundary conditions is found by minimizing
(15) |
while inequality constraints can be used to enforce a nonnegative heat flux over a heater surface, or make sure the enclosure geometry remains unobstructed and/or confined within some maximum dimension in the case of shape optimization.
Thus, the challenge for parameter estimation and design optimization problems is to find a vector x^{*} that minimizes an objective function, F(x), sometimes subject to constraints. Methods for solving multivariate minimization problems are either nonlinear programming or metaheuristic.
2.1 Nonlinear Programming
As described above, NLP methods find x^{*} iteratively; at the kth iteration x^{k} is updated by taking a “step” using Eq. (11), and iterations continue until ||F(x^{k})|| ≈ ||F(x^{*})|| = 0 as shown in Fig. 3. These methods differ on how the search direction is chosen, but this calculation is always based on the local objective function curvature at x^{k}. Most model F(x^{k}) as being locally quadratic,
Figure 3. NLP algorithms update x^{k} by calculating a search direction and a step size at each iteration.
(16) |
where ^{2}F(x) is the Hessian matrix containing the second-order derivatives of F(x) with respect to the elements of x. If F(x) were truly quadratic, ^{2}F would be constant, and x^{*} could be found by setting F(x^{*}) = 0, resulting in (Nash and Sofer, 1996; Gill et al., 1986)
(17) |
Solving this equation results in Newton’s direction, p^{k} = x^{*} - x^{k}, and this NLP algorithm is called Newton’s method. Since F(x) is not truly quadratic, multiple steps are required to reach x^{*}, but the quadratic approximation improves as x^{k} approaches x^{*}, and the method is second-order convergent.
While Newton’s method normally requires the fewest steps of all NLP algorithms to reach x^{*}, it is not necessarily the most efficient, particularly if ^{2}F(x) is expensive to calculate. An alternative is the quasi-Newton method [also called the variable metric method (VMM)], in which the search direction is found by (Nash and Sofer, 1996; Gill et al., 1986)
(18) |
where B^{k} is a Hessian approximation formed using the gradients from the current and previous steps. Usually, B^{0} ≡ I and p^{0} is the steepest descent direction, -F(x^{0}); the Hessian approximation improves with subsequent steps, and B^{k} ≈ ^{2}F(x^{k}) at large k. The quasi-Newton’s method requires more steps than Newton’s method, but is often computationally faster since it avoids calculating the Hessian.
The conjugate gradient approach described above is a different class of NLP method that generates each new search direction so that it is mutually conjugate to the previous search direction with respect to the current Hessian. As noted above, for quadratic objective functions, x^{*} can be reached in exactly n steps, and otherwise more steps are required.
Although finding F(x) requires solving the well-posed forward problem for b^{model}(x) or q_{DS}^{model}(x), the nonlinear approach does not circumvent the underlying ill posedness of the inverse problem; instead, the ill posedness manifests in Eq. (17). This situation is shown graphically in Fig 4a, which shows the topography of F(x) arising from an ill-conditioned linear problem solved using least squares minimization. As discussed above, solutions that lie in the long shallow valley surrounding x^{*} “almost” minimize F(x), violating Hadamard’s second criterion for well-posed problems; choosing a search direction along this line using Newton’s method (or any other NLP technique) is problematic because the ambiguous curvature of F(x) makes ^{2}F(x^{k}) ill conditioned. In a topographical sense, the two eigenvalues of ^{2}F(x^{*}), λ_{max} and λ_{min}, are the rates at which F(x) increases as x moves away from x^{*} in the principle curvature directions, which are the corresponding eigenvectors, μ_{max} and μ_{min}. Due to the long shallow valley, λ_{max} >> λ_{min}, and eigenvalues of varying orders of magnitude have the same meaning as singular values that range over orders of magnitude that characterize ill-conditioned matrices.
Figure 4. Nonlinear inverse problems may have objective function topographies corresponding to violations of (a) Hadamard’s second and (b) third criteria of well-posed problems.
The solution is to apply linear regularization methods to solve Eq. (17). One common approach (Nash and Sofer, 1996; Gill et al., 1986) is to compute the Cholesky factorization of the Hessian, ^{2}F(x^{*}) = LDL^{T}, and then reset the smallest diagonal elements in D equal to some minimum value before constructing p^{k} through back-substitution, in a way similar to TSVD. A second technique specialized for least squares problems is the Levenburg-Marquardt method (Marquardt, 1963), in which p^{k} is found by solving
(19) |
where (x^{k}) is the Gauss-Newton approximation of the Hessian. This is equivalent to Tikhonov regularization with L ≡ I. Guidance for choosing λ^{k} is provided in (More and Sorensen, 1983).
2.2 Metaheuristics
Nonlinear inverse problems may also produce multimodal objective functions having multiple local minima, as in Fig. 4b. Each minimum corresponds to a possible solution to the inverse problem, violating Hadamard’s third criterion of solution uniqueness. A major drawback of NLP algorithms is that, because they model F(x) as quadratic, they can only find one of these minima, usually the one closest to x^{0}.
Metaheuristic methods avoid this problem by strategically sampling the solution space of a multimodal optimization problem in search of good solutions. While NLP algorithms always choose p^{k} to be a downhill direction, metaheuristic methods include stochastic processes that periodically move the solution uphill, allowing them to escape shallow local minima. Three of the most common metaheuristic methods are simulated annealing (SA), genetic algorithms (GAs), and tabu search (TS).
Simulated annealing (Kirkpatrick and Sorensen, 1983) is named after the microstructure crystallization physics in metal annealing that inspired its development. Solutions are generated randomly, and the resulting objective function is evaluated and compared to the current solution. To diversify the search, SA periodically accepts uphill solutions using a probabilistic formula,
(20) |
where P is the probability that a worse solution will be accepted, ΔE is the difference in the objective function of the current and new solution, and θ is a heuristic set by the designer. At large values of θ, the algorithm escapes from nearly all local minima in the solutions space, allowing the algorithm to traverse more of the solutions space. After a specified number of solutions are explored, θ is reduced, forcing the algorithm to remain in the vicinity of the local minima for more iterations. At the lowest levels of θ, which occur late in the search, SA approaches a deterministic steepest descent algorithm.
Genetic algorithms (GAs) (Goldberg, 1989) are modeled on evolutionary processes in the natural world; although there are many variations, most algorithms include the following steps:
Generate an initial population of candidate solutions, either randomly or using an initial solution heuristic. The elements of x are converted from decimal notation to binary representation.
Select a subset of this population to undergo “mating” based on the objective function of each population member.
Randomly perform crossover, a sharing of design characteristics, between mating solutions.
Perform a low-probability mutation operation by randomly sampling a few bits of the members and reversing their binary value.
Calculate an objective function for each member of the new population.
The selective mating process combines the best attributes of a large population into a few good solutions, while the random paring and mutation operations prevent the algorithm from getting “stuck” in a shallow local minimum.
The tabu search differs from SA and GA in that it keeps track of previously visited local minima, and repeating these steps are thus forbidden or “tabu.” The tabu search has several key components, namely, a solution representation, a neighborhood definition, a candidate list structure, and a memory structure (Glover, 1986, 1989). The neighborhood is the subsection of the solution space to be explored at each iteration of the algorithm, and typically consists of the kinds of changes made to the current solution (e.g., a subsection of heaters to be toggled on and off). After a neighborhood is defined, a candidate list containing a defined number of prospective moves within the neighborhood is generated. Each of these moves is then evaluated by calculating the objective function value, and the best nontabu move is used to update the solution. The best move in the candidate list may or may not be better than the incumbent solution, so by always accepting the best move from the candidate list, worse solutions are inevitably accepted, and local minima are escaped. Inherent in the use of the candidate list is the definition of a tabu move. After selecting the best move from the candidate list, an essential attribute of this move is designated as tabu (e.g., a particular heater is not adjusted in subsequent designs). Once this tabu attribute is defined, a tabu tenure must be determined that dictates how many iterations the tabu attribute will remain active.
REFERENCES
Daun, K. J. and Hollands, K. G. T., Infinitesimal-area radiative analysis using parametric surfaces, through NURBS, J. Heat Transfer, vol. 123, pp. 249-256, 2001.
Gill, P. E., Murray, W., and Wright, M. H., Practical Optimization, Academic Press, New York, 1986.
Glover, F., Future paths for integer programming and links to artificial intelligence, Comput. Operations Res., vol. 5, pp. 533-549, 1986.
Glover, F., Tabu search--Part I, ORSA J. Comput., vol. 1, pp. 191-206, 1989.
Goldberg, D.E., Genetic Algorithms: in Search, Optimization, and Machine Learning, Addison-Wesley, Boston, 1989.
Hansen, P. C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.
Howell, J. R., Ezekoye, O. A., and Morales, J. C., Inverse design model for radiative heat transfer, J. Heat Transfer, vol. 122, pp. 492-502, 2000.
Kirkpatrick, S., Gelatt, C. C., Jr., and Vecchi, M. P., Optimization by simulated annealing, Science, vol. 220, pp. 671-680, 1983.
Marquardt, D. W., An algorithm for least-squares estimation of nonlinear parameters, J. Society Industrial App. Math., vol. 11, pp. 431-441, 1963.
More., J. J. and Sorensen, D. C., Computing a trust region step, SIAM J. Sci. Stat. Comput., vol. 4, pp. 553-572, 1983.
Nash, S. G. and Sofer, A., Linear and Nonlinear Programming, McGraw Hill, New York, pp. 383-391, 1996.
Phillips, D. L., A treatment for the numerical solution of certain integral equations of the first kind, J. Assoc. Comput. Mach., vol. 9, pp. 84-97, 1962.
Tikhonov, A. N., Solution of incorrectly formulated problems and the regularization method, Sov. Math. Dokl., vol. 4, pp. 1035-1038; Engl. trans., Dokl. Akad. Nauk. SSSR, vol. 151, pp. 501-504, 1963.
Wing, G. W., A Primer on Integral Equations of the First Kind, SIAM, Philadelphia, 1991.
References
- Daun, K. J. and Hollands, K. G. T., Infinitesimal-area radiative analysis using parametric surfaces, through NURBS, J. Heat Transfer, vol. 123, pp. 249-256, 2001.
- Gill, P. E., Murray, W., and Wright, M. H., Practical Optimization, Academic Press, New York, 1986.
- Glover, F., Future paths for integer programming and links to artificial intelligence, Comput. Operations Res., vol. 5, pp. 533-549, 1986.
- Glover, F., Tabu search--Part I, ORSA J. Comput., vol. 1, pp. 191-206, 1989.
- Goldberg, D.E., Genetic Algorithms: in Search, Optimization, and Machine Learning, Addison-Wesley, Boston, 1989.
- Hansen, P. C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.
- Howell, J. R., Ezekoye, O. A., and Morales, J. C., Inverse design model for radiative heat transfer, J. Heat Transfer, vol. 122, pp. 492-502, 2000.
- Kirkpatrick, S., Gelatt, C. C., Jr., and Vecchi, M. P., Optimization by simulated annealing, Science, vol. 220, pp. 671-680, 1983.
- Marquardt, D. W., An algorithm for least-squares estimation of nonlinear parameters, J. Society Industrial App. Math., vol. 11, pp. 431-441, 1963.
- More., J. J. and Sorensen, D. C., Computing a trust region step, SIAM J. Sci. Stat. Comput., vol. 4, pp. 553-572, 1983.
- Nash, S. G. and Sofer, A., Linear and Nonlinear Programming, McGraw Hill, New York, pp. 383-391, 1996.
- Phillips, D. L., A treatment for the numerical solution of certain integral equations of the first kind, J. Assoc. Comput. Mach., vol. 9, pp. 84-97, 1962.
- Tikhonov, A. N., Solution of incorrectly formulated problems and the regularization method, Sov. Math. Dokl., vol. 4, pp. 1035-1038; Engl. trans., Dokl. Akad. Nauk. SSSR, vol. 151, pp. 501-504, 1963.
- Wing, G. W., A Primer on Integral Equations of the First Kind, SIAM, Philadelphia, 1991.
Heat & Mass Transfer, and Fluids Engineering