4.2.2 Integrating Confounders

A target pollutant is likely to have several different causal pathways under
different environmental conditions, which indicates that the causal pathways we
learn may be biased and may not reflect the real reactions or propagation of
pollutants. To overcome this, it is necessary to model the environmental
factors (humidity, wind, etc.) as extraneous variables in the causality model,
which simultaneously influence the cause and the effect. In this section, we
elaborate how to integrate the environmental factors into the GBN-based
graphical model, so as to minimize the biases in causality analysis and ensure
that the causal pathways are faithful enough to support the government's
decision making. We first introduce the definition of a confounder and then
elaborate the integration.
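
To make the bias concrete, the following is a minimal simulation sketch, not the method of this section. It assumes a single hypothetical discrete environmental status `E` (think of a humidity level) that drives both an upstream pollutant `Q` and the target pollutant `P`; all coefficients, variable names, and the linear form are illustrative assumptions. It compares a naive estimate of the effect of `Q` on `P` with one obtained by estimating within each status of `E` and averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical environmental status with 4 levels (assumption for illustration).
E = rng.integers(0, 4, size=n)

# E influences both the cause Q and the effect P; the true effect of Q on P is 0.5.
true_effect = 0.5
Q = 2.0 * E + rng.normal(size=n)
P = true_effect * Q + 3.0 * E + rng.normal(size=n)


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Naive estimate: ignores E, so the influence of E leaks into the slope.
naive = slope(Q, P)

# Adjusted estimate: fit within each status of E, then average the slopes
# weighted by how often each status occurs.
adjusted = sum(slope(Q[E == e], P[E == e]) * np.mean(E == e) for e in range(4))

print(f"naive slope:    {naive:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

Under these assumptions the naive slope lands well above the true value of 0.5, while the averaged within-status slope stays close to it. With a single four-level factor every status still has thousands of samples, which is what makes the within-status estimates stable here.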

Confounder. A confounder is defined as a third variable that simultaneously
correlates with the cause and the effect; e.g., gender K may affect the
recovery P given a medicine Q. Ignoring confounders leads to biased causality
analysis. To guarantee an unbiased causal inference, the cause-and-effect
relationship is usually adjusted by averaging over all the sub-classification
cases of K [11].

When integrating environmental factors as confounders, denoted as
$E_t = \{E_t^{(1)}, E_t^{(2)}, \ldots\}$, into the GBN-based causal pathways,
one challenge is that there can be too many sub-classifications of
environmental statuses. For example, if there are 5 environmental factors and
each factor has 4 statuses, there will be $4^5 = 1024$ sub-classification
cases, each with its own causal pathways. Directly integrating $E_t$ as
confounders of the cause and effect will result in unreliable causality
analysis, because very few samples are available for each sub-classification
case. Therefore, we introduce a discrete hidden confounding variable K, which
determines the probabilities of different causal pathways from $Q_t$ to $P_t$.
The environmental factors $E_t$ are further integrated into K, where K takes
the values $1, 2, \ldots, K$ (the symbol K denotes both the variable and the
number of its values). In this way, the large number of sub-classification
cases of confounders is reduced to a small number K, corresponding to K
clusters of the environmental factors.

Based on Markov equivalence (DAGs which share the same joint probability
distribution [10]), we can reverse the arrow $E_t \to K$ to $K \to E_t$, as
shown in the right part of the corresponding figure. In this case, K determines
the distributions of $P_t$, $Q_t$, and $E_t$, thus enabling us to learn the
distribution of the graphical model from a generative process. To help us learn
the hidden variable K, the generative process further introduces a
hyper-parameter that determines the distribution of K. Thus the graphical model
can be understood as a mixture model under K clusters. We learn the parameters
of the graphical model by maximizing the new log likelihood: