Entropy Computation


Based on the Weather Dataset in the Data Mining Decision Tree notes, answer the following questions. I’ve posted a spreadsheet for calculating entropy values under the Week 12 module; based on the values of n1 and n2, it gives you the value of info([n1, n2]). (A short Python sketch of the same calculation appears after the question list.) Number the solutions (1, 2, 3, …, 8). Submit your solutions online in one file (Word, Excel, or pdf).

1. Calculate the entropy if Temp is used as the top node.
2. Calculate the entropy if Humidity is used as the top node.
3. Calculate the entropy if Windy is used as the top node.
4. If Outlook is the top node, and Windy is the next attribute selected on the Outlook = sunny branch, what’s the entropy?
5. If Outlook = sunny and then Temp is selected, what’s the entropy?
6. If Outlook = sunny and then Humidity is selected, what’s the entropy?
7. If Outlook = rainy and then Windy is selected, what’s the entropy?
8. What’s the final decision tree that C4.5 will generate based on entropy?
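
The spreadsheet itself isn’t reproduced here, so as a convenience, here is a minimal Python sketch of the same calculation for checking your answers (the function name info is simply chosen to mirror the notes’ notation):

from math import log2

def info(counts):
    # Entropy in bits of a class distribution given as raw counts,
    # e.g. info([2, 3]) for a node with 2 yes's and 3 no's.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(info([2, 3]), 3))  # ≈ 0.971 bits, matching the notes
print(round(info([9, 5]), 3))  # ≈ 0.940 bits, the root entropy used later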

Entropy Computation
Given a probability distribution (p1, p2, …, pn),
entropy(p1, p2, …, pn) = – p1 log p1 – p2 log p2 – … – pn log pn
Logs are to the base 2 and entropy is measured in bits.
Examples:
info([2, 3, 4]) = entropy(2/9, 3/9, 4/9)
= – (2/9) × log(2/9) – (3/9) × log(3/9) – (4/9) × log(4/9)
Information for the first leaf node of the tree is:
info([2, 3]) = – (2/5) × log(2/5) – (3/5) × log(3/5) = 0.971 bits
info([2,3,4]) indicates that there are three values (e.g., high, medium,
low). There are two cases with the first value, three with the second
value and four with the third value.
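
Under the same sketch as above (re-defined here so the snippet stands alone), this three-class example evaluates to about 1.530 bits:

from math import log2

def info(counts):
    # Entropy in bits of a class distribution given as raw counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(info([2, 3, 4]), 3))  # ≈ 1.530 bits = entropy(2/9, 3/9, 4/9)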
If we select “outlook” as the first node, then the left branch (outlook = sunny) has 2 yes’s and 3 no’s (see the subtree figure below), the middle branch (outlook = overcast) has 4 yes’s and 0 no’s, and the right branch (outlook = rainy) has 3 yes’s and 2 no’s. The entropy of the outlook node, therefore, is:
info([2,3],[4,0],[3,2]) = (5/14) × info([2,3]) + (4/14) × info([4,0]) + (5/14) × info([3,2])
Since the middle branch has all yes’s, there’s no need to proceed
further along this branch; info([4,0]) = 0, implying that there is no
uncertainty. But we still need to grow the tree along the left and right
branches.
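
The weighted-average computation above can be checked with a small extension of the earlier sketch (split_info is a hypothetical helper name, not something from the notes):

from math import log2

def info(counts):
    # Entropy in bits of a class distribution given as raw counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def split_info(branches):
    # Weighted average entropy over the branches of a split.
    n = sum(sum(b) for b in branches)
    return sum((sum(b) / n) * info(b) for b in branches)

# outlook: sunny [2 yes, 3 no], overcast [4, 0], rainy [3, 2]
print(round(split_info([[2, 3], [4, 0], [3, 2]]), 4))  # ≈ 0.6935 bits (the notes round to 0.693)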
[Figure: Subtree for Weather Data — the outlook node splits into three branches: sunny (2 yes, 3 no), overcast (4 yes, 0 no), and rainy (3 yes, 2 no). The figure shows the number of yes and no classes in the three nodes corresponding to outlook = sunny, outlook = overcast, and outlook = rainy.]
Gain Computations for the Weather Data Tree
For the training examples at the root (9 yes and 5 no instances), the information value is
info([9,5]) = 0.940 bits
gain(outlook) = info([9,5]) – info([2,3],[4,0],[3,2]) = 0.247 bits
where info([2,3],[4,0],[3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693 bits
gain(temperature) = 0.029 bits
gain(humidity) = 0.152 bits
gain(windy) = 0.048 bits
At the root of the tree, before we select any attribute, there are 9 yes and 5 no instances. Entropy is info([9,5]) = 0.940 bits. If we select outlook as the first node, then entropy becomes:
info([2,3],[4,0],[3,2]) = (5/14) × info([2,3]) + (4/14) × info([4,0]) + (5/14) × info([3,2]) = 0.693 bits.
So by using outlook as the first node, the gain in entropy is: 0.940 – 0.693 = 0.247 bits.
Your goal should be to maximize the information gain, which is the same as minimizing the weighted entropy of the resulting branches. You should try out the other attributes and find their gain values. In this example, outlook produces the maximum gain (minimum entropy), so we select outlook as the top node of the tree. Try to develop the entire tree.
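
To see the whole attribute selection end to end, here is a hedged sketch that computes the gain for all four attributes. The per-branch yes/no counts below are my reading of the standard 14-example weather dataset; verify them against the copy in your notes:

from math import log2

def info(counts):
    # Entropy in bits of a class distribution given as raw counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(branches, root=(9, 5)):
    # Information gain: root entropy minus the weighted average
    # entropy of the branches produced by the split.
    n = sum(sum(b) for b in branches)
    return info(root) - sum((sum(b) / n) * info(b) for b in branches)

# [yes, no] counts per branch (assumed from the standard weather data)
splits = {
    "outlook":     [[2, 3], [4, 0], [3, 2]],  # sunny, overcast, rainy
    "temperature": [[2, 2], [4, 2], [3, 1]],  # hot, mild, cool
    "humidity":    [[3, 4], [6, 1]],          # high, normal
    "windy":       [[6, 2], [3, 3]],          # false, true
}
for name, branches in splits.items():
    print(name, round(gain(branches), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048

Outlook gives the largest gain, which reproduces the selection made in the notes.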

Sample Solution

In information theory, the entropy of a random variable is the average level of “information,” “surprise,” or “uncertainty” inherent in the variable’s possible outcomes. Given a discrete random variable X with possible outcomes x1, …, xn, which occur with probability P(x1), …, P(xn), the entropy of X is defined as
H(X) = – Σ (i = 1 to n) P(xi) log P(xi),
where Σ signifies the sum over the variable’s possible values. The base of the logarithm differs depending on the application.
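
As a quick base-2 sanity check of this definition (a minimal sketch; any distribution that sums to 1 will do):

from math import log2

def entropy(probs):
    # Shannon entropy in bits of a discrete probability distribution.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # 1.0 bit: a fair coin is maximally uncertain
print(round(entropy([9/14, 5/14]), 3))  # ≈ 0.940 bits: the root info([9,5]) above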

Decision-making is an act as old as humankind: the ancestors of modern humans made daily decisions based on interpretations of dreams, smoke, divinations and oracles (Buchanan and O’Connell, 2018). According to Gigerenzer (2011), modern decision-making dates back to the seventeenth century, when Descartes and Poincaré invented the first calculus of decision-making. Buchanan and O’Connell (2018) attribute the popularity of modern decision-making to Chester Barnard in the middle of the twentieth century, for importing the term “decision-making,” which was mainly a public administration concept, to the business sector as a substitute for restrictive narratives like policy making and resource allocation. William Starbuck, a professor at the University of Oregon, acknowledges the positive impact of Chester Barnard’s introduction of decision-making on managers by explaining that policy-making and resource allocation are never-ending acts, while a decision denotes the conclusion of a discussion and the start of an action plan (Buchanan and O’Connell, 2018). In addition, Gigerenzer (2011) suggests that the contemporary view of decision-making involves the use of heuristics and human information processing, which is the revolutionary work of Herbert Simon. Heuristics are mental shortcuts, cognitive tools and rules of thumb developed through experience, which enable individuals to make judgements and arrive at decisions quickly (Gigerenzer and Gaissmaier, 2011).

2.6 Decision theory

Decision theory is a divergent field because of the different perceptions held by researchers about decisions (Hansson, 2005). Decision theory, also known as the theory of choice, is the study of the rationale behind the choices made by an agent (Stanford Encyclopaedia of Philosophy, 2015). Decision theory deals with goal-oriented behaviour in the presence of alternatives (Hansson, 2005). It can be broken into three branches: normative, descriptive and prescriptive (Vareman, 2008). Normative theory deals with how to make accurate decisions under uncertainty and values; descriptive theory examines how imperfect individuals actually make decisions; and prescriptive theory combines the descriptive and normative theories to achieve the best decision in any given situation (Vareman, 2008). However, there is no universal agreement on a standardized classification of the theories, and therefore many researchers have classified them as either rational or non-rational (Gigerenzer, 2001; Hansson, 2005; Oliveira, 2007). In differentiating the rational from the non-rational theory, Gigerenzer (2001) id
