# Inference-guiding on Bayesian knowledge-based systems.

INTRODUCTIONA knowledge-based system is a computer system that automates intelligent processes. Inference on a knowledge-based system is a field of artificial intelligence (AI). AI has found numerous applications in business management. An AI approach can assist international firms in screening markets (Fish, 2006), can help plan and control the performance of a just-in-time manufacturing system (Wray, Markham, & Mathieu, 2003), and can serve for monitoring and detecting financial frauds and abuses (Hall & McPeak, 2003). Knowledge-based systems are necessary components of decision support systems (DSS) that are now widely used in business to enhance managers' decision making process (McManus & Snyder, 2003; Hung, Tang, & Shu, 2008).

A knowledge-based system is composed of a knowledge base, an inference engine, and an environment interface. A knowledge base organizes and stores knowledge. An inference engine, which is composed of software for inference and reasoning, generates logical implications of given data based on the knowledge in the knowledge base. An environment interface consists of software and hardware, such as sensors, monitors, keyboards, speakers, and control mechanisms, for interacting with the environment. Knowledge-based systems have been successfully used in numerous applications such as locating fuel deposits, designing complex computer systems, analyzing electronic circuits, diagnosing diseases, assisting driving vehicles, and doing the jobs that are dangerous to humans.

An inference engine has two functions for practical applications: logical inference and inference-guiding. Logical inference, or simply inference or reasoning, generates facts that are logically implied by the knowledge and given facts. Inference is aimed at the top-level-conclusion (TLC) that is a conclusion in the desired domain. When data are not sufficient, logical inference cannot reach a TLC, and inference-guiding (IG), the second function of an inference engine, is needed to identify missing data and lead inference to a top-level-conclusion. In a disease diagnosing system, for example, TLCs are possible ailments patients may have, and logical inference is the process of deducting the disease a patient has based on the patient's symptoms and doctor's knowledge. A patent's superficial symptoms and initial information are usually not sufficient for the doctor to determine the illness, and he/she has to ask the patient to provide more information and arrange some medical examinations. In a crime investigation example, TLCs are the suspects who may have committed a particular crime. Based on the information and evidences initially available, a detector deducts the logical implications. If the detector is not able to reach any conclusion due to the missed links, he/she has to identify the missing information and go ahead to find evidences so as to solve the criminal puzzle.

A good IG approach must be able to identify relevant missing information to pursue. Gynecological questions, for example, are not relevant for a male patient. A good IG approach must also be able to select key missing information so that confirmation of those pieces of missing information would quickly lead to a TLC. In other words, a good IG approach should lead to a TLC after pursuing minimum amount of missing information. A bad IG approach, on the other hand, would delay the inference process by asking irrelevant, silly, and off-the-point questions.

Inference and inference-guiding are conducted alternately in the process of reaching a TLC as shown in Figure 1. The process starts with the function of inference based on the currently known facts. If they are not sufficient to logically reach a TLC in the target domain, the inference-guiding function is kicked on to select a missing fact to confirm. The system's environment interface then pursues the value of the selected missing fact through a proper information source (a sensor, or a user, for example). The newly confirmed fact is added to the known data, and the process is repeated again. The process stops when a TLC is reached.

[FIGURE 1 OMITTED]

Knowledge can be imprecise and incomplete. Data can be inexact and fragmentary. Knowledge-based systems in many cases must draw conclusions and act under uncertainty that comes from un-sureness of belief and un-sureness of truth. Un-sureness of belief is often associated with a piece of knowledge. For instance, we do not believe with 100% confidence that "if someone breaks in, then the alarm will sound" because the alarm would not work if it is malfunctioned or the power is out. Un-sureness of belief can be associated with data. For example, we may reasonably doubt at the collected data that "57 people saw a UFO at Hutchinson, Kansas, at 1:35am 11/27/2001" since it could be simply an airplane instead of a UFO. Un-sureness of truth occurs where the definition of a truth value is ambiguous. For example, truth of "the patient has a headache" or "it is cloudy today" may not be definite since "headache" and "cloudy" are not defined clearly.

A couple of models have been developed to represent uncertainties in knowledge and data. Non-probabilistic approaches that were proposed in 1970s and 1980s include MYCIN's certainty factor method (Buchanan & Shortliffe, 1984; Shortliffe & Buchanan, 1975), the confidence factor union method (Hayes-Roth, Waterman, & Lenat, 1983), and the fuzzy set method (Zadeh, 1965; Kickert, 1978). They have been used in many expert system shells (Magill & Leech, 1991). Probability theory was considered in 1960s for dealing with uncertainty (Duda et al., 1976). Expert Edge (Human Edge, 1985), an expert system shell, for example, applies the Bayes' Theorem. The probabilistic models fell out of favor in the early 1970s, until in the recent decade when researchers realized the shortcomings of the non-probabilistic approaches and showed that probability systems had theoretical strength in many applications where the knowledge base is large and there exist complex interactions between pieces of knowledge (Russell & Norvig, 2003).

Probability has an intrinsic link with the uncertainty. Un-sureness of belief is subjective probability by its nature, whose calculations and propagations follow the probability theory. Assigning a probability of 0 to a given assertion corresponds to an unequivocal belief that the assertion is false, while assigning a probability of 1 corresponds to an unequivocal belief that the assertion is true. Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the assertion. Although un-sureness of truth is not probability by its nature, it can be viewed as probability and be taken into the uncertainty calculations and propagations by following the probability theory.

Bayesian network is a robust structure for representing knowledge with uncertainties, which has captured attentions of researchers (Ben-Gal, 2007; Pearl, 1988; Langseth & Portinale, 2007). Its structure makes it convenient to depict a complex knowledge base (Chavira & Darwiche, 2007). There are two exclusive advantages of Bayesian network. One is that Bayesian network is inherently capable of allowing uncertainties (Kanal & Lemmer, 1986). Another is that logical inference in Bayesian network becomes calculations of probabilities, which are supported by the probability theory (Wang, 2005b).

Uncertain inference has been a active research field in artificial intelligence for almost four decades. The approaches of uncertain inference available in literatures can be grouped into three categories. One category is for the rule-base, which was built on the success of logical rule-based systems by adding a sort of "fudge factor" to each rule to accommodate uncertainty. Certainty factor is an instance. The methods of this category were developed in the mid-1970s and formed the basis for a large number of expert systems in medicine and other areas. The second category of uncertainty reasoning methods is based on fuzzy logic that was developed in 1980s and specially good for calculating un-sureness of truth (Zadeh, 1965). The third category is based on the probability theory. By such an approach, the process of reasoning becomes a process of calculating probabilities. Researchers have developed a couple of algorithms to have more efficient inference in Bayesian networks more efficient, such as variational methods (Bishop, Spiegelhalter, & Winn, 2003), variable elimination method and likelihood weighting methods (Liu & Soetjipto, 2004), and those surveyed by Russell and Norvig in their book in their book (Russell & Norvig, 2003).

Compared to logical inference, inference-guiding has received less attention from researchers, though there have been some published literature on the topic. For exact inference that does not consider uncertainties, EXPERT uses pre-listed orderings of rules and questions for missing data selection (Hayes-Roth et al., 1983). KAS, a shell over PROSPECT, uses both forward and backward chaining, together with a scoring function, for selecting missing data (Duda, Hart, & Nilsson, 1979). Mellish gave a procedure, using a so-called "Alpha-beta pruning technique", to eliminate irrelevant questions for acyclic inference nets (Mellish, 1985). Wang and Vande Vate (1990) proved that the inference-guiding problem is NP-hard even in a Horn clause system. Based on a dynamic representation of Horn systems (Jerolow & Wang, 1989), they proposed an efficient heuristic strategy, called minimum usage set strategy (MUS). The experiments carried out by Wang and Triantaphyllou (1994) showed that MUS strategy performed well. The inference guiding strategy developed in (Wang, 2005a) was able to select the key pieces of missing information in such a way that the total cost of acquiring additional information for reaching a conclusion is minimized. For uncertain inference, the method presented in (Wang, 1994) is a heuristic for inference-guiding in a rule-base with certainty factors. The research on inference-guiding in the probabilistic Bayesian network started in (Wang, 2005b).

This paper presents the criterion for selecting key missing information for inference-guiding in a Bayesian network, and thereby develops an inference approach that aims at reaching a TLC after pursuing fewest data. Section 2 introduces the fundamental concepts of uncertain inference and the Bayesian network. Section 3 explores the criterion for effectively guiding inference in Bayesian network. An IG index is developed and characterized as an effective criterion. Section 4 incorporates the IG index into an inference approach that would lead inference quickly to a TLC. An example is given throughout the paper to illustrate Bayesian network, the IG index, and the new inference approach.

FUNDAMENTALS

An assertion is a statement can be either 'true' or 'false'. An assertion is observable if its value can be obtained directly from an information source in the environment (a user or a sensor, for example). An assertion is called unconfirmed observable assertion (UOA) if it is observable and its value is not yet known. UOAs represent the missing data. Selecting a missing data is to select a UOA. If assertion [A.sub.k]'s value is logically implied by assertions [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

..., are called the premises of [A.sub.k], and [A.sub.k] is called the inferred assertion. As defined in Section 1, an assertion is a top-level-conclusion (TLC) if it is in the domain that inference is aimed at. For example, possible diseases are TLCs in a disease diagnosing system; possible faults in an engine are TLCs in a diagnosing system for aircraft maintenance; and required operating adjustments are TLCs of a real-time piloting control system.

If the currently known facts are not sufficient to reach a TLC, inference-guiding is needed to identify some UOAs, whose values are then pursued. The newly obtained data are added to the set of facts for purpose of further inference. If there are n UOAs, then there are n options in selecting the next UOA to pursue. Total number of UOAs that are selected to pursue before a TLC is reached can be large or small, depending on what UOAs are selected for pursuing and their sequence. Of cause, we want to reach a TLC after asking for as few UOAs as possible.

Uncertainties may occur in knowledge and facts. Inference on knowledge and facts with uncertainty is called uncertain inference or uncertain reasoning. A TLC is reached when its certainty to be true is high enough. For example, the doctor would not inform a patient about the disease he might have unless, in the doctor's mind, the probability that the patient has that disease is significantly high. Usually, people use a threshold value to represent the 'high enough' probability. Thus, a TLC is reached if its probability to be true is at or above the preset threshold.

The Bayesian network is composed of nodes and arcs. A node represents an assertion. There is a directed arc from node [A.sub.j] to node [A.sub.k] if [A.sub.j] is a premise of [A.sub.k]. In other words, there is an arc from [A.sub.j] to [A.sub.k] if [A.sub.j] has a direct influence on [A.sub.k]. Each node is labeled with the probabilities of possible values. If assertion [A.sub.k] is observable, then the prior probability of '[A.sub.k] to be true', denoted as P([A.sub.k]=T), is given as the label. If [A.sub.k] is an inferred assertion with premises [A.sub.i] and [A.sub.j], for example, then node [A.sub.k] is labeled with full conditional probabilities, {P([A.sub.k]=T | [A.sub.i]=T, [A.sub.j]=T), P([A.sub.k]=T | [A.sub.i]=F, [A.sub.j]=T), P([A.sub.k]=T | [A.sub.i]=T, [A.sub.j]=F), and P([A.sub.k]=T | [A.sub.i]=F, [A.sub.j]=F)}. The complementary conditional probabilities can be derived from the above probabilities, such as P([A.sub.k]=F | [A.sub.i]=T, [A.sub.j]=T) = 1- P([A.sub.k]=T | [A.sub.i]=T, [A.sub.j]=T). The uncertainties are thus embedded in these probabilities. Given the full conditional probabilities, we can calculate any joint probability by applying the probability theory. The process of logical inference in a Bayesian network is therefore a process of probability calculations.

Example 1. A Bayesian network and calculations for inference.

Figure 2 shows a Bayesian network with five assertions. Suppose that [A.sub.1], [A.sub.2] and [A.sub.3] are UOAs, and [A.sub.4] and [A.sub.5] are inferred assertions and TLCs. The arcs show that [A.sub.1] and [A.sub.2] have direct influences on [A.sub.4], and [A.sub.2] and [A.sub.3] have direct influences on [A.sub.5]. In terms of "if ... then ..." statements, that Bayesian network represent two rules: "If [A.sub.1], [A.sub.2] then [A.sub.4]", and "If [A.sub.2] and [A.sub.3], then [A.sub.5]". There are no arcs among [A.sub.1], [A.sub.2] and [A.sub.3], which means they are independent between each other. The prior probabilities are given to the observable assertions [A.sub.1], [A.sub.2] and [A.sub.3], while the full conditional probabilities are given to the inferred assertions [A.sub.4] and [A.sub.5] in form of tables.

With the given prior and conditional probabilities, we can calculate all joint probabilities. For instance, the joint probability:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Similarly,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

With the joint probabilities, the prior probability of inferred assertion [A.sub.4] can be derived:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Similarly, we can calculate the prior probability of [A.sub.5] and have P([A.sub.5]=T) = 0.571.

[FIGURE 2 OMITTED]

THE CRITERION FOR SELECTING KEY MISSING DATA

Let X represent the truth value [A.sub.x]=true, [logical not]X represent the truth value [A.sub.x]=false (i.e., [logical not][A.sub.x]=true). With this notation, we have P(X) for probability P([A.sub.x]=true), P([logical not]X) for P([logical not] [A.sub.x]=true) or P([A.sub.x]=false),and P(K|[logical not]I, J) for P([A.sub.k]=true|[logical not][A.sub.i]=true, [A.sub.j]=true) = P([A.sub.k]=true|[A.sub.i]=false, [A.sub.j]=true).

We define the presumed contribution index of truth value U to proving truth value T, [CI.sub.U[right arrow]T], as the difference between P(T|U) and P(T). That is, [CI.sub.U[right arrow]T] = P(T|U)-P(T) = P([A.sub.t]=true|[A.sub.u]=true)-P([A.sub.t]=true). [CI.sub.[logical not]U[right arrow]T], [CI.sub.U[right arrow][logical not]T], and [CI.sub.U[right arrow][logical not]T] are defined similarly. For example, [CI.sub.[logical not]U[right arrow][logical not]T] = P([logical not]T|[logical not]U)-P([logical not]T) = P([A.sub.t]=false|[A.sub.u]=false)-P([A.sub.t]=false).

[CI.sub.U[right arrow]T] measures how much it would help proving [A.sub.t]=true if knowing [A.sub.u]=true for any [A.sub.u] and any [A.sub.t]. If [A.sub.t] is a TLC and [A.sub.u] is a UOA, then [CI.sub.U[right arrow]T] shows that in case the UOA [A.sub.u]'s value is true, how much it would contribute to inferring that TLC [A.sub.t] is true. The larger the [CI.sub.U[right arrow]T] is, the more significant '[A.sub.u]=true' is towards proving '[A.sub.t]=true'. If [P.sub.u] does not have any influence on [P.sub.t], then [CI.sub.U[right arrow]T]= [CI.sub.[logical not]U[right arrow]T]= [CI.sub.U[right arrow][logical not]T]= [CI.sub.[logical not]U[right arrow][logical not]T] = 0. The presumed contribution index is an indicator of relevance between an UOA and a TLC.

However, the presumed contribution index has two flaws to be the criterion of selecting missing data. (1) Between [A.sub.u] and [A.sub.t], there are four presumed contribution indices, [CI.sub.U[right arrow]T], [CI.sub.[logical not]U[right arrow]T], [CI.sub.U[right arrow][logical not]T], and [CI.sub.[logical not]U[right arrow][logical not]T]. Which one should be used as the criterion? (2) The presumed contribution index may mislead inference sometime. If [A.sub.u1] and [A.sub.u2] are two UOAs, and [CI.sub.U1[right arrow]T] > [CI.sub.U2[right arrow]T], then it looks that [A.sub.u1] is more significant than [A.sub.u2] in proving [A.sub.t]=true if both [A.sub.u1] and [A.sub.u2] are true. But it is not always correct because [A.sub.u1] and [A.sub.u2]'s values can be false and we do not know their values when selecting one from them. If P([A.sub.u1]=true) were much smaller than P([A.sub.u2]=true), then selecting [A.sub.u2] to pursue would be better than selecting [A.sub.u1] even [CI.sub.U1[right arrow]T] > [CI.sub.U2[right arrow]T].

To improve the above two flaws of the presumed contribution index, we define the inference-guiding index (IG-index) of truth value [A.sub.u]=true towards proving [A.sub.t]=true, [IGI.sub.U[right arrow]T], as the product of the presumed contribution index CIU_T and the prior probability of U. That is,

[IGI.sub.U[right arrow]T] = [CI.sub.U[right arrow]T] x P(U).

[IGI.sub.U[right arrow]T] is a better indicator of contribution of [A.sub.u] than [CI.sub.U[right arrow]T] since it takes both the contribution of [A.sub.u]=true for deriving [A.sub.t]=true and the prior probability of [A.sub.u]=true into account. The IG-index [IGI.sub.U[right arrow]T] is high if both the presumed contribution of U and probability of U are high. With this improvement on the presumed contribution index, the IG-index can be called the "true contribution index" of [A.sub.u]=true for deriving [A.sub.t]=true. Thus, the second flaw of the presumed contribution index mentioned above is taken care of. But among the four IG-indices between [A.sub.u] and [A.sub.t], [IGI.sub.U[right arrow]T], [IGI.sub.U[right arrow][logical not]T], [IGI.sub.[logical not]U[right arrow]T], and [IGI.sub.[logical not]U[right arrow][logical not]T], which one should be used as the criterion in inference-guiding? Theorem 1 below answers this question by showing that the absolute values of the four IG-indices are same. To prove Theorem 1, we need three lemmas.

Lemma 1.

[IGI.sub.[logical not]U[right arrow][logical not]T] = [IGI.sub.U[right arrow]T] for any assertion [A.sub.u] and [A.sub.t].

Proof:

By definitions of [IGI.sub.*[right arrow]*] and [CI.sub.*[right arrow]*], and the probability theory, for any [A.sub.u] and [A.sub.t] we have:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Lemma 2.

[IGI.sub.U[right arrow][logical not]T] = [IGI.sub.[logical not]U[right arrow]T], for any assertion [A.sub.u] and [A.sub.t].

Proof:

Let [A.sub.v] an assertions. By Lemma 1, [IGI.sub.[logical not]V[right arrow][logical not]T] = [IGI.sub.V[right arrow]T]. Let [A.sub.u]=[logical not]Av. So, [logical not][A.sub.u]=Av, i.e., [logical not]U=V. Make the substitution, we have [IGI.sub.U[arrow right][logical not]T = [IGI[logical not]U[right arrow]T].

Lemma 3.

[IGI.sub.U[arrow right]T] = -[IGI.sub.U[arrow right][logical not]T] for any assertion [A.sub.u] and [A.sub.t].

Proof:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Theorem 1.

[IGI.sub.U[right arrow]T] = [IGI.sub.[logical not]U[right arrow][logical not]T] = - [IGI.sub.U[right arrow][logical not]T] = - [IGI.sub.[logical not]U[right arrow]T] for any [A.sub.u] and [A.sub.t] in a Bayesian network.

Proof:

Putting Lemma 1, 2, and 3 together, we have [IGI.sub.U[right arrow]T] = [IGI.sub.[logical not]U[right arrow][logical not]T] = - [IGI.sub.U[right arrow][logical not]T] = - [IGI.sub.[logical not]U[right arrow]T] for any [A.sub.u] and [A.sub.t] in a Bayesian network.

Recall that [IGI.sub.U[right arrow]T] is interpreted as the contribution index of [A.sub.u]=true for proving [A.sub.t]=true. Then, [IGI.sub.[logical not]U[right arrow][logical not]T] is interpreted as the contribution index of [A.sub.u]=false for proving [A.sub.t]=false. A negative IGI value is viewed as the contribution to 'disproving'. So, -[IGI.sub.U[right arrow][logical not]T] represents the contribution of '[A.sub.u]=true' to disproving '[A.sub.t]=false'.

With the above interpretation, Theorem 1 tells that the true contribution of [A.sub.u]=true to proving [A.sub.t]=true is same as its true contribution to disproving [A.sub.t]=false; and no matter whether [A.sub.u]=true or [A.sub.u]=false, it has contributions to confirming the value of [A.sub.t], either [A.sub.t]=true or [A.sub.t]=false, and the amounts of contributions are same. That is just the property we want the IG criterion to possess, --It would select such a UOA that once its value is known it would help confirm a TLC to a large extent.

By Theorem 1, absolute values of the four IG-indices between two assertions, [A.sub.u] and [A.sub.t] are same. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denote the absolute value of four IG-indices between [A.sub.u] and [A.sub.t]. For a TLC [A.sub.t], we would select a UOA [A.sub.u] such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the largest comparing to the other UOAs, even we do not yet know the value of [A.sub.u] at the time of selection.

Example 2. Illustration of Calculations and Property of IG-index

Continuing Example 1, in which [A.sub.1], [A.sub.2] and [A.sub.3] are UOAs, and [A.sub.4] and [A.sub.5] are TLCs.

For [A.sub.1], [A.sub.2], and [A.sub.4], the following probabilities are given in the Bayesian network: P([A.sub.1])=0.8, P([A.sub.2])=0.6, P([A.sub.4]|[A.sub.1],[A.sub.2])=0.9, P([A.sub.4]|[A.sub.1],[logical not][A.sub.2])=0.75, P([A.sub.4]|[logical not][A.sub.1],[A.sub.2])=0.6, P([A.sub.4]|[logical not][A.sub.1],[logical not][A.sub.2])=0.2. The calculation results of presumed contribution indices and IG-indices are shown in Table 1.

For [A.sub.2], [A.sub.3], and [A.sub.5], the following probabilities are given in the Bayesian network: P([A.sub.2])=0.6, P([A.sub.3])=0.2, P([A.sub.5]|[A.sub.2],[A.sub.3])=0.85, P([A.sub.5]|[A.sub.2],[logical not][A.sub.3])=0.7, P([A.sub.5]|[logical not][A.sub.2],[A.sub.3])=0.8, P([A.sub.5][logical not]-[A.sub.2], [logical not] [A.sub.3])=0.1. The calculation results of the presumed contribution indices and IG indices are shown in Table 2.

In summary, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Further investigations have revealed more characteristics of the IG-index. Theorem 2 and Corollary 1 below establish the correspondence between IG-indices and covariances.

Theorem 2.

Let [X.sub.u] and [X.sub.t] be 0-1 random variables with 1 representing 'true' and 0 representing 'false'. Let U represent the truth value [X.sub.u]=true, [logical not]U represent the truth value [logical not][X.sub.u]=true (i.e. [X.sub.u]=false). Let T represent the truth value [X.sub.t]=true, [logical not]T represent the truth value [logical not][X.sub.t]=true (i.e. [X.sub.t]=false). Then, [IGI.sub.U[right arrow]T] = Cov([X.sub.u], [X.sub.t]), where Cov([X.sub.u], [X.sub.t]) stands for covariance of [X.sub.u] and [X.sub.t].

Proof:

By the definition of covariance and probability theory:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Let Y=[X.sub.u][X.sub.t]. Y is such a 0-1 random variable that Y=1 if and only if both [X.sub.u]=1 and [X.sub.t]=1. Then, P(Y=1) = P([X.sub.u]=1, [X.sub.t]=1), and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

And E[[X.sub.u]] =P([X.sub.u]=1) = P(U), and E[[X.sub.t]] = P([X.sub.t]=1) = P(T).

Therefore,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Theorem 2 shows that the inference-guiding index, [IGI.sub.U[right arrow]T], is the covariance of two random variables corresponding to U and T. Covariance has limited use in the probability theory (for example, Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)), and its properties are not explored sufficiently (Ross, 1985). Theorem 2 can be extended to the following corollary about other IG-indices and their corresponding covariances. Proofs are straightforward, so not provided.

Corollary 1.

Let us define random variables [[bar.X].sub.u] = 1 - [X.sub.u] and [[bar.X].sub.t] = 1 - [X.sub.t]. [[bar.X].sub.u] = 1 if and only if [X.sub.u]=0 (or [X.sub.u]=false, or [logical not]U); and [[bar.X].sub.u] =0 if and only if [X.sub.u]=1 (or [X.sub.u] is true, or U). [[bar.X].sub.t]=1 if and only if [X.sub.t]=0 (or [X.sub.t] is false, or [logical not]U); and [[bar.X].sub.t]=0 if and only if [X.sub.t]=1 (or [X.sub.t] is true, or U). Then:

[IGI.sub.[logical not]U[right arrow]T] = Cov([[bar.X].sub.u], [X.sub.t]), [IGI.sub.U[right arrow][logical not]T] = Cov([X.sub.u], [[bar.X].sub.t]), [IGI.sub.[logical not]U[right arrow][logical not]T] = Cov([[bar.X].sub.u], [[bar.X].sub.t]).

A REASONING APPROACH WITH INFERENCE-GUIDING

In this section a reasoned approach is presented that integrates logical inference and inference-guiding functions for a Bayesian knowledge-based system. The absolute value of IG-index, as we have investigated in Section 3, is used as the criterion in inference-guiding.

Let {TLC} be the index set of TLCs, {UOA} be the index set current UOAs. Let 1>h>0 be a preset threshold for the acceptable certainty level of a TLC. The reasoning approach with inference-guiding function for Bayesian knowledge-based systems is formalized as follows.

IG-index Inference Approach:

Step 1. Reasoning Calculate P([A.sub.i]) for each i[member of]{TLC}. Pick up [A.sub.t] such that t[member of]{TLC} and P([A.sub.t])=[Max.sub.i[member of]{TLC}]P([A.sub.i]). If P([A.sub.t])[greater than or equal to]h, Stop, - the inference is done with [A.sub.t] being the inferred TLC; otherwise, go to Step 2. Step 2. Inference-Guiding If {UOA} is empty, then Stop, - no TLC with certainty level equal to or higher than h can be derived. Calculate [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for each i[member of]{UOA}. Pick a UOA [A.sub.u] such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Find the truth value of [A.sub.u] from an information resource in environment. Step 3. Updating Update the set {UOA} such that {UOA} = {UOA} \ {[A.sub.u]}. Recalculate the conditional probabilities in Bayesian network with the newly obtained truth value of [A.sub.u]. Go back to Step 1.

Step 1 is doing logical inference. There are various inference algorithms available for the user to choose from, such as variable elimination, direct sampling, and Markov chain simulation (Russell & Norvig, 2003). The problem of logical inference in a Bayesian network is, same as in a knowledge base with other structures, computationally hard. The classic method based on the probability theory gives accurate results but are not efficient for large Bayesian networks. Approximate methods are efficient, but their results are more or less inaccurate. A user may choose an inference algorithm for this step at his/her discretion.

Step 2 is doing inference-guiding. It first calculates the IG-index for each UOA, [A.sub.i], associated with the TLC [A.sub.t] selected in Step 1. To do it, recall that the absolute value of IG-index is calculated with the formula:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The value of P([A.sub.i]) is given in the network since [A.sub.i] is a UOA. P([A.sub.t]) was calculated in Step 1. To calculate P([A.sub.t]|[A.sub.i]), we set [A.sub.i] to 'true' and then use the inference algorithm selected in Step 1 to figure out the probability of [A.sub.t]=true under that circumstance.

The UOA with the largest IG-index absolute value is picked as the key missing data, and its value is pursued from an information source such as a sensor or the user.

After obtaining the truth value of the selected UOA, [A.sub.u], Step 3 removes [A.sub.u] from the UOA set, since [A.sub.u] is no longer unconfirmed, and recalculates the conditional probabilities in the Bayesian network. For each assertion influenced by [A.sub.u], its conditional probabilities must be recalculated. Suppose [A.sub.u]=true from an information source, [A.sub.x] is an inferred assertion that is influenced by [A.sub.u], and the conditional probabilities are in the format of P(X|z, [A.sub.u]) where z represents a set of assertions. For example, z={[A.sub.1], [A.sub.2]}, then P(X|z, [A.sub.u]) = P(X|[A.sub.1], [A.sub.2], [A.sub.u]). By the probability theory, if [A.sub.u] is 'true' (i.e., P([A.sub.u])=1), then the new conditional probability is:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Similarly, if [A.sub.u] is 'false' (i.e., P([A.sub.u])=0), then the new conditional probability is: P(X|z) = (P(X|z,[logical not][A.sub.u]) x (P(z|[logical not][A.sub.u]) / P(z)).

In the formulas for updating conditional probabilities, P(X|z,[A.sub.u]) and P(X|z,[logical not][A.sub.u]) are currently given at the node [A.sub.x] in the Bayesian network before [A.sub.u]'s truth value is obtained. So, the new conditional probability is calculated from the current probability by multiplying a ratio proportional to the correlation between [A.sub.u] and z. If [A.sub.u] is independent of z, then P(z|[A.sub.u]) = P(z|[logical not][A.sub.u]) = P(z), and the ratios (P(z|[A.sub.u]) / P(z)) and (P(z|[logical not][A.sub.u]) / P(z)) will become one, so that P(X|z)=P(X|z,[A.sub.u]) if [A.sub.u]=true, and P(X|z)=P(X|z, [logical not][A.sub.u]) if [A.sub.u]=false. If [A.sub.u] is not independent of z, then the probabilities P(z|[A.sub.u]) and P(z) are calculated from the joint probabilities with the formulas P(z|[A.sub.u])=P(z,[A.sub.u])/P([A.sub.u]), and P(z)=P(z, [A.sub.u])+P(z,[logical not][A.sub.u]).

The three steps are executed repeatedly until a TLC with certainty level higher than the preset threshold h so that we can claim the TLC is reached, or all UOAs are asked and no TLC reaches the threshold.

Example 3. A complete inference process with IG-index Inference Approach

Continuing Example 1 and Example 2. Recall that [A.sub.1], [A.sub.2], and [A.sub.3] are UOAs, and [A.sub.4] and [A.sub.5] are TLCs. Suppose we set threshold h=0.82. We have initially {UOA}={[A.sub.1], [A.sub.2], [A.sub.3]} and {TLC}={[A.sub.4], [A.sub.5]}.

Iteration 1.

Step 1. Calculate the probabilities of TLCs:

P([A.sub.4]) = 0.76, and P([A.sub.5]) = 0.571 (as calculated in Example 1 in Section 2). Neither reaches the threshold h. We pick TLC [A.sub.4] since P([A.sub.4]) > P([A.sub.5]).

Step 2. The two influential UOAs on [A.sub.4] are [A.sub.1] and [A.sub.2] (referring to Fig. 1), whose IG-indices with respective to [A.sub.4] are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], as calculated in Example 2. Since [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the larger, we pick UOA [A.sub.1] and pursue its truth value. Suppose an information source in environment gives [A.sub.1]'s value that is 'true'.

Step 3.

Update {UOA} so that the new {UOA} = {[A.sub.2], [A.sub.3]}. The conditional probabilities at node [A.sub.4], after knowing [A.sub.1]=T, are updated. Note that [A.sub.1] is independent of [A.sub.2]. We have: P([A.sub.4]=true|[A.sub.2]=true)=0.9, and P([A.sub.4]=true|[A.sub.2]=false)=0.75.

The conditional probabilities at node [A.sub.5] do not change since [A.sub.1] has no influence on [A.sub.5].

Iteration 2.

Step 1. Calculate the probabilities of TLCs: P([A.sub.4]) = 0.84, and P([A.sub.5]) = 0.571.

Since P([A.sub.4]) > h = 0.82, we are done with this inference process and the TLC derived is [A.sub.4] whose certainty level is 84%.

CONCLUSION

IG-index and the inference approach presented in this paper are for the Bayesian network that is a robust structure of the knowledge base containing uncertainties. The IG-index provides an effective criterion in selecting the key missing information when given data is not sufficient to reach a top-level-conclusion. The reasoning approach, IG-index Inference Approach, integrates inference-guiding function with logical inference function, and "smartly" leads reasoning to a top-level-conclusion.

REFERENCES

Ben-Gal, I. (2007). Bayesian Networks, in F. Ruggeri, R. Kenett, & F. Faltin (editors), Encyclopedia of Statistics in Quality and Reliability, John Wiley & Sons.

Bishop, C. M., Spiegelhalter, D., & Winn, J. (2003). VIBES: A Variational Inference Engine for Bayesian Networks, on Advances in Neural Information Processing Systems (NIPS), May, http://books.nips.cc/papers/files/nips15/AA37.pdf

Buchanan, B. G. & Shortliffe, E. H. (1984). Rule-Based Expert Systems - The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA.

Chavira, M. & Darwiche, A. (2007). Compiling Bayesian Networks Using Variable Elimination. Proceedings of IJCAI2007.

Duda, R., Gasching, J., & Hart, P. (1979). Model Design in the PROSPECT Consultant System for Mineral Exploration, Expert Systems in the Micro-Electronic Age, Edinburgh Univ. Press, UK.

Duda, R. O., Hart, P. E., & Nilsson, N. J. (1976). Subjective Bayesian Methods for Rulebased Inference Systems, Proceedings of National Computer Conference, 45, AFIPS, 1075-1082.

Fish, K. E. (2006). An Artificial Intelligence Approach to International Market Screening DSS. Journal of International Technology and Information Management, 15(2), 49-60.

Hall, O. P. Jr., & McPeak, C. J. (2003). Using Neural Net Technology to Analyze Corporate Restructuring Announcements. Journal of International Technology and Information Management, 12(2), 29-40.

Hayes-Roth, E., Waterman, D. A., & Lenat, D. B. (1983). Building Expert Systems, Addison-Wesley, Reading, MA

Human Edge Software Corporation, Expert Edge Manual for the IBM Personal Computer, California, 1985.

Hung, S. Y., Tang, K. Z., & Shu, T. C. (2008). Expanding Group Support System Capabilities from the Knowledge Management Perspective. Journal of International Technology and Information Management, 17(1) 21-42.

Ignizio, J. P. (1991). An Introduction to Expert Systems, McGraw-Hill, Inc.

Jensen, F. V. (2001). Bayesian Networks and Decision Graphs, Springer-Verlag, Berlin

Jeroslow, R. G. & Wang, J. (1989). Dynamic Programming, Integral Polyhedra and Horn Clause Knowledge Bases, ORSA's Journal on Computing 1, 7-19

Kanal, L. N. & Lemmer, J. F. (1986). Uncertainty in Artificial Intelligence, Elsevier/North-Holland, Amsterdam, London, New York.

Kickert, W. J. M. (1978). Fuzzy Theories on Decision Making, Leiden, Boston, Mass.

Langseth, H. & Portinale, L. (2007). Bayesian networks in reliability. Reliability Engineering and System Safety 92(1), 92-108.

Liu, R. F. & Soetjipto, R. (2004). Analysis of Three Bayesian Network Inference Algorithms: Variable Elimination, Likelihood Weighting, and Gibbs Samplin., http://www.eecs.berkeley.edu/~rfl/publications/6825p2.pdf, rliu.rusmin@mit.edu, Nov. 4, 2004.

McManus, D. J. & Snyder, C. A. (2003). Knowledge Management: The Role of EPSS. Journal of International technology and Information Management, 12(2), 17-28.

Magill, W. G. W. & Leech, S. A. (1991). Uncertainty Techniques in Expert System Software. Decision Support Systems, 7, 55-65.

Mellish, C. S. (1985). Generalized Alpha-beta Pruning as a Guide to Expert System Question Selection, Expert System 85, Proceedings 5th Technical Conference of the British Computer Society Specialist Group on Expert Systems, Univ. of Warwick, (Cambridge Univ. Press, Cambridge CB2 1RP).

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

Ross, S. (1985). Introduction to Probability Models, 3rd Edition, Academic Press, Inc, London, 49-50.

Russell, S. & Norvig, P. (2003). Artificial Intelligence, a Modern Approach, Prentice Hall, Pearson Education Inc., Upper Saddle River, New Jersey.

Shortliffe, E. H. & Buchanan, B. G. (1975). A Model of Inexact Reasoning in Medicine. Mathematical Biosciences, 23, 351-379.

Triantaphyllou, T. & Wang, J. (1992). A Cost-Effective Question-Asking Strategy. The Proceedings of the ACM/SIGAPP Symposium on Applied Computing, Kansas City, 505511.

Ullman, J. D. (1989). Principles of Database and Knowledge-Based Systems, Computer Science Press, Rockville, Maryland.

Wang, J. (2005a). Inference-Guiding for Intelligent Agents. Journal of International Technology and Information Management, 14(2) 93-108.

Wang, J. (2005-b) Inference-Guiding in a Knowledge Base with Bayesian Network Structure. Proceedings of NEDSI Annual Conference, Philadelphia, PA.

Wang, J. (1994). Identifying Key Missing Data for Inference under Uncertainty. International Journal of Approximate Reasoning, 10, 287-309.

Wang, J. & Triantaphyllou, E. (1994). The Problem of Asking the Minimum Number of Questions in Expert Systems with Horn Clause Knowledge Bases. Mathematical and Computer Modeling, 20(9), 75-87.

Wang, J. & Vande Vate, J. (1990). Question-Asking Strategies for Horn Clause Systems. Annals of Mathematics and Artificial Intelligence, 1, 359-370.

Wray, B. A., Markham, I. S., & Mathieu, R. G. (2003). An Artificial Neural Network Approach to Learning from Factory Performance in a Kanban-Based System. Journal of International Technology and Information Management, 12(2), 85-98.

Zadeh, L. A. (1965). Fuzzy Sets, Information and Control, 8, 338-353.

Jinchang Wang

Richard Stockton College of New Jersey

USA

Table 1: Presumed contribution indices and IG-indices relative to [A.sub.4]. Conditional Probabilities P([A.sub.4]|[A.sub.1]) 0.84 P([A.sub.4]|[logical not][A.sub.1]) 0.44 P([logical not][A.sub.4]|[A.sub.1]) 0.16 P([logical not][A.sub.4]|[logical not][A.sub.1]) 0.56 P([A.sub.4]|[A.sub.2]) 0.84 P([A.sub.4]|[logical not][A.sub.2]) 0.64 P([logical not][A.sub.4]|[A.sub.2]) 0.16 P([logical not][A.sub.4]|[logical not][A.sub.2]) 0.36 Presumed contribution index to [A.sub.4] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.08 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.32 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.08 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.32 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.08 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.12 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.08 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.12 IG-index to [A.sub.4] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.064 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.064 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.064 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.064 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.048 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.048 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.048 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.048 Table 2: Presumed contribution indices and IG-indices relative to [A.sub.5] Conditional Probabilities P([A.sub.5]|[A.sub.2]) 0.745 P([A.sub.5]|[logical not][A.sub.2]) 0.31 P([logical not][A.sub.5]|[A.sub.2]) 0.255 P([logical not][A.sub.5]|[logical not][A.sub.2]) 0.69 P([A.sub.5]|[A.sub.3]) 0.83 P([A.sub.5]|[logical not][A.sub.3]) 0.46 P([logical not][A.sub.5]|[A.sub.3]) 0.17 P([logical not][A.sub.5]|[logical not][A.sub.3]) 0.54 Presumed contribution index to [A.sub.5] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.174 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.261 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.174 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.261 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.259 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.111 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.259 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.111 IG-index to [A.sub.5] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.1044 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.1044 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.1044 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.1044 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.0777 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.0777 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] -0.0777 [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] 0.0777

Printer friendly Cite/link Email Feedback | |

Author: | Wang, Jinchang |
---|---|

Publication: | Journal of International Technology and Information Management |

Date: | May 1, 2009 |

Words: | 7244 |

Previous Article: | The moderating effects of technology on career success: can social networks shatter the glass ceiling? |

Next Article: | Does web site usability correlate with web site usage? |

Topics: |