Developing Cost Estimating Relationships

Apparently not just anybody can become an independent variable; we had to pass several tests. The analyst didn’t want to waste time and money by investigating and collecting data on all the engine characteristics, just the major drivers. He also wanted only “significant” variables. As it turned out, because the by-pass ratios were about the same for all the engines they looked at, the by-pass ratio was not considered to be significant since it didn’t vary with the cost. He also seemed particular about knowing with some confidence what the value of the independent variable would be for any engine he needed to estimate. He said, for example, that with software estimating one of the parameters they like to use is size, measured in lines of code or function points. The only problem is that before they can use size to estimate cost, they have to figure some way of estimating the size. The more uncertainty associated with the independent variable, the more uncertainty in being able to predict the value of the dependent variable. And, it goes without saying that an independent variable is no good to you if you can’t get the data on it, so some upfront planning is necessary to come up with a reasonable data collection plan. What’s my line? The next step, for those of us that made the first cut, was to determine in a general sense how we related to cost. Did we make the cost increase or decrease? What “specification” or “function” would best suit us? The analyst said that a function could be linear, curvilinear, or even nonlinear in nature like some of the examples below. YY Y Y XX X X He said that starting out with the right function was important when working with small data sets because one or two data points could easily lead you to the wrong conclusion, so you had better know what you were expecting to see. In God We Trust (all others must bring data) They also seemed pretty concerned about how many “analogous” data points or other engines they would be able to find cost and technical data on. The larger the sample of data you have the more the sample looks like the population that it came from and therefore the more confident you would be in any inferences or statements that you would make. Sure, you can run regression on two or three data points, but how much confidence would you have? According to the analyst, when you have a simple regression equation with one independent variable you will lose two “degrees of freedom” because you have estimated the Y-intercept and the slope of the independent variable. So if you have two data points, you have zero degrees of freedom left, with three data points you would have one degree of freedom, and so on. Just like the sample size, the more degrees of freedom the more confident you are in your equation. With each additional independent variable in the equation you lose another degree of freedom. Some analysts1 suggest that in an ideal situation you should have six to ten degrees of freedom for each independent variable in the equation. Now while that may be desirable, often times we find ourselves with a limited amount of data and have to be a little more realistic. Perhaps a more reasonable rule of thumb would be to count the data points, subtract a degree of freedom for the intercept and one more for each independent variable in the equation, and try to leave yourself with three to five degrees of freedom. You can see how this would affect the number of data points you need to collect and the number of independent variables you could have in an equation. If you only have a few data points you might be better off to find the best one-variable equation rather than using several independent variables together. Sometimes, good data is hard to come by because it may be: proprietary; collected in a different format; defined differently; not collected at all. Data can also be difficult to compare because things like costs, prices, labor hours, and technology change over time. These things can result in smaller sets of analogous data points. One piece of advice that the analyst offered was to not overly constrain the term “analogous”. For instance, you want to determine a reasonable price for a piece of fire fighting equipment on a ship. It might be analogous to equipment used at airports, fire departments, and in industrial settings. Another way that analysts can constrain themselves is to look at something like the F-117 stealth aircraft and say, “There’s nothing else like it”. Well, maybe there’s nothing like the aircraft as a whole, but there may be a number of aircraft using the same landing gear, avionics, engines, and material coatings. The same is true with services. Maybe you don’t know what a fair price is for overhauling a truck, but you can find prices for the individual tasks involved in performing an overhaul. Breaking an item or a task down into lower level components is one way of increasing the number of possible analogous data points. Remember, any regression equation is only as good as the data that was used to create it. It’s a lot like painting a room; most of the time is spent in preparation, and if you skip the spackling they won’t care how well you rolled the paint on. 1 Applied Linear Regression Models, Neter, Wasserman, & Kutner, 1989. 85.23567% of all estimates claim a false level of precision Now, after collecting data on a number of engines, the analyst was going to test myself and some of the other independent variables statistically, using a spreadsheet like Excel or one of the many statistical packages available, to see how well we could explain why the cost varied on jet engines. One of the tests was to determine how much of the variation in the engine cost I could explain. This was called the R2 or coefficient of determination, and it is a measure of the strength of the relationship between the dependent variable and myself. I was rated on a scale from 0% to 100%. Scoring 0% would have meant that I had nothing to offer, where as scoring 100% would have meant that there wasn’t anything about the change in engine cost that I couldn’t tell you. The analyst offered two cautions, one, if it sounds too good to be true it probably is; check sample sizes and both the data that was included as well as the data that was not included. Second, the R2 only represents correlation, not causation; don’t go on a fishing trip to find correlated variables, start out with variables that you believe are causal, and then see if there is correlation. Another thing I found out was that as more independent variables are added to the equation the R2 can only get better, not worse. I was just about to invite some of my friends to join me in the equation so that we could really max-out the R2 when the analyst warned me about the adjusted R2 . He said that in equations with more than one independent variable he used the adjusted R2 because it accounted for the loss of the additional degrees of freedom associated with including additional independent variables. Apparently the adjusted R2 can actually decrease if an independent variable is included that explains little of the variation. Another test that was run on me was the T test, a measure of the significance of the relationship between the dependent variable and myself. The analyst was concerned that factors such as sample size and sampling error could create a false impression of my worth. He assumed that I was not related to engine cost and that it would be up to me to prove otherwise. If I was not significantly related to cost I would have a slope of zero or thereabouts. The larger the slope value, relatively speaking, the more confidence the analyst would have in using me. The T statistic represents the number of standard deviations my slope is from zero, the higher the better. Most applications report a “P” value, which is the probability of my slope being zero. If the P value is .01 then there is only a 1% probability that my slope is zero. Roughly translated, the other 99% is the confidence that you can have in using me. Analysts usually have a pass/fail criterion for using an independent variable based on either an acceptable P value (the level of significance) or an acceptable level of confidence. If I fail the T test in a single independent variable model it means that you would actually prefer to use the average engine cost as your estimate rather than the equation with me. Don’t write me off to soon though because even if I don’t do well by myself I still might be helpful if paired with one or more other independent variables. When more than one of us independent variables are used together, our individual T statistics measure the marginal contribution we make to the equation. If I fail the T test here you would conclude that you are better off without me in the equation. After all, I cost you a degree of freedom and I wouldn’t be giving you anything in return. One thing you might consider if you decide to use me with another variable is whether the other variable and I are correlated, and how strong is that correlation. Some applications provide a pairwise correlation matrix that can show the correlation between all the variables in a particular model or the correlation between selected variables. This correlation is measured by the R value, which, appropriately enough, is the square root of R2 . The R value can range from a –1 to +1, the plus or minus driven by whether the slope between any two variables is positive or negative. The closer the R value is to one |1| the stronger the relationship between the two variables. The analyst would actually prefer if we were not related since then it would be easier to tell us apart. Some analysts would prefer not to use two independent variables in a model if their R is greater than |.7| even if they have good T statistics. Their concern is that the slopes of the independent variables are less accurate in this case and that they may change from one sample to the next. Other analysts feel that as long as the slopes make sense and have good T statistics there is no harm in using the equation. Given all that, the real issue is that there are two characteristics of, let’s say an engine, that are strongly related like the thrust and weight. If the engines in the data set have a thrust to weight ratio of 25% to 30% then we better not use this data, in any form, to estimate an engine with a thrust to weight ratio of 45%. This also brings up a good point about the range of the data which is, stay in the range whenever possible. Take me for example. If the turbine inlet temperatures for the engines in the data set range from 2500° to 3500° then you should not estimate any engine with a value outside this range unless you have the guidance of a technical expert. Of course, one of the most important considerations is my accuracy; how well do I predict the cost of an engine. In order to answer this question the analyst will look at my standard error (SE) or what is sometimes called the standard error of the estimate (SEE). The SE or SEE is the average or typical estimating error associated with using the equation. If the SE were $50,000, then, on average, the estimated engine cost would be off by $50,000 from the actual cost. Sometimes we might be off by a few thousand dollars and at other times by $60,000 or $70,000, but on average we would be off by about $50,000. Now while a SE of $50,000 is a lot of money, it is difficult to put the SE into perspective unless we compare it to the average cost of an engine. If the average engine cost $1,000,000 then the SE of $50,000 would only represent on average a 5% estimating error. This calculation where we divide the SE by the average or mean value is known as the coefficient of variation or CV and it represents the average percent estimating error. Putting it into Perspective While there are certainly other important issues to explore like the influence and treatment of outliers, residuals and standardized residuals, and so on, we will leave these for another story. Whether you are building the model yourself or evaluating a model that someone else has developed, the questions are the same. What are the cost drivers? What is the nature of the relationship between the dependent and independent variables? What data points do we include or exclude? What are the statistics and how do we interpret them? Also consider the use of scatter plots because a picture is truly worth a thousand statistics. Keep in mind that when evaluating the results of a parametric model, nothing is more important than the process by which it was built.

Sample Solution

Tribute to pre-winter not a sonnet? J. Keats is a ditty in harvest time and uses a great deal of jargon and jargon. In any case, with the separation of the jargon level, the most recent type of human breakdown: feeling. Some of the time, the state of the best feeling is a genuine writing without an analogy or picture. It is an apparatus to realize what all journalists use to pass on sentiments. Lost, upbeat, irate, authors can discover approaches to communicate sentiments through the thickest allegory. On the off chance that the writer can not communicate feelings, he won't lose the motivation behind composing verse. The melody of John Keats, a sonnet I focus on, is "Pre-winter Autumn" composed by John Keats. This sonnet gives one part of nature, and I will portray in detail how the innovation utilized by the artist permits me to ponder this subject. The title of this sonnet is Autumn. Fundamentally this is the importance of this sonnet. This sonnet centers around one of the fall and the four seasons. I will concentrate on the two strategies I utilized. John Keats 's Greek verses "John Keats' s sonnet" Greek ?? "" (counting life and workmanship). Keats utilizes Greek style as an image of life. He called Greek workmanship eternal, and passed on the message uncertainly. Walter J. Bate clarifies that the Sisobas container of Keats followed by craftsman's companion Haydon, the Townly jar of Townly Museum, or the Borghese jar of Louver are suggested by researchers. The sonnet "Acura of antiquated Greece" is a sonnet composed by John Keats as Carol. In its unique (Greek) structure, Acura is a painstakingly built sonnet that is utilized to mix completely intelligent and passionate ways and acclaim the occasion or person. Throughout the entire existence of British verse, Acura holds its motivation (greatness), however its structure is evolving. Carole contended by Keats' Big Yangko is one of '1819 Great Carols' composed by John Keats. This Orade comprises of six topics, "Night", "Despairing" or "Harvest time". Keats composed these six Carols in a single year. This melody will be distributed in a magazine named "History of Art" one year from now. Contrasted and numerous different sonnets, "Shuzi" is very not quite the same as subjects and styles. The vast majority of Keats 's verse contains sharp rhythms and passionate subjects, however "Qiu" is a peaceful and expressive sonnet about Keat' s seasons and associations with different seasons. In pre-winter verse to the buzzing about, Keats utilizes a basic, striking, visual and material picture to depict the scene of harvest time. With an adjustment in character and a moderate cadence, the clarification created a grand and fulfilling feeling.

Ready to attend?

Ready to join our block community of business leaders for four days of virtual sessions on driving developer happiness and boosting productivity?

Request a Quotation

Developing Cost Estimating Relationships

Sample Solution

Our Benefits

Our Services

Free Features

Ready to attend?

Comply today with Compliantpapers.com, at affordable rates