By Anonymous on Thursday, February 20, 2003 - 01:46 pm:
Assume that an RFP evaluation factor had three
subfactors of equal importance. The agency evaluation scheme
provides for scoring as follows:
Blue - exceptional
Green - acceptable
Yellow - not acceptable but susceptible to being made
acceptable.
Red - not acceptable - not susceptible.
Of the three subfactors scored for a particular proposal by a
panel of government reviewers, one was scored blue, one was
scored green and one was scored yellow. What would the overall
score be for the evaluation factor?
Suppose two subfactors were blue and one was green?
Suppose subfactor #1 was twice as important as subfactor #2 and
#3? Would the above answers change?
Do any of the agencies have any specific guidance on "rolling
up" colors or adjectives?
By Eric Ottinger on Thursday, February 20, 2003 - 01:48 pm:
Anon,
Yes.
Don't
Eric
By Vern Edwards on Thursday, February 20, 2003 - 02:02 pm:
Air Force FAR Supplement § 5315.305(a)(3)(A) says,
concerning technical evaluations:
"The Mission Capability subfactors shall be derived from
requirements or objective and threshold performance requirements
when used. Mission Capability ratings focus on the strengths and
proposal inadequacies of the offeror's proposal. Mission
capability shall be evaluated using the following color ratings.
Subfactor ratings shall not be rolled up to an overall color
rating. Through exchanges, the Government evaluators should
be able to obtain the necessary information from offerors with
interim Yellow/Marginal ratings to determine if the proposal
inadequacies have been satisfactorily addressed. Yellow/Marginal
ratings should be rare by the time of the final evaluation. Note
that if an offeror's proposal demonstrates a material failure to
meet a Government requirement, this is a deficiency in the
offeror's proposal."
Underlining added.
In adjectival or color rating schemes, there are no standard
arithmetical rules for combining subfactor ratings.
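To see why a mechanical rule is unsatisfying, here is a deliberately naive roll-up (a made-up sketch; the `SCALE` encoding and `naive_rollup` function are illustrative, not any agency's method). Scoring the scenarios from the original question, a blue/green/yellow proposal rolls up to the same "green" as a uniformly green one, suppressing exactly the weakness the evaluators flagged:

```python
# Hypothetical illustration only; no agency prescribes this encoding.
SCALE = {"blue": 4, "green": 3, "yellow": 2, "red": 1}

def naive_rollup(ratings, weights=None):
    """Weighted average of color ratings, mapped back to the nearest color."""
    if weights is None:
        weights = [1] * len(ratings)
    avg = sum(SCALE[r] * w for r, w in zip(ratings, weights)) / sum(weights)
    # Map the average back to the closest color on the scale.
    return min(SCALE, key=lambda c: abs(SCALE[c] - avg))

# The scenarios from the original question:
print(naive_rollup(["blue", "green", "yellow"]))             # green
print(naive_rollup(["blue", "green", "yellow"], [2, 1, 1]))  # green (subfactor #1 doubled)
print(naive_rollup(["green", "green", "green"]))             # green
```

All three profiles collapse to "green", even though the first two carry a "not acceptable" subfactor that the source selection authority would need to see.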
By Anonymous on Thursday, February 20, 2003 - 02:12 pm:
The Air Force guidance is to not roll up subfactors.
So what color would we give the factor in the questions I asked?
Is the AF saying that we should have leveled the contractors
during discussions so this is a non-issue? Frankly, the AF
guidance seems to border on gobbledygook.
By Eric Ottinger on Thursday, February 20, 2003 - 02:15 pm:
Vern,
Thanks for citing an appropriate authority.
Anon,
Source selection is a team effort. Everyone has a role to play.
The source selection authority should be the only person to
(implicitly) "roll-up" all of the factors.
There is a more subtle point. A significant "Yellow" weakness
may legitimately outweigh a number of "Blue" ratings.
The SSA is required to make a "rational" decision, not a
mechanical decision.
Eric
By Eric Ottinger on Thursday, February 20, 2003 - 02:33 pm:
Anon,
Apologies. Evaluation teams who try to roll up all of the
factors into a single color or score for the whole proposal, are
effectively making the decision for the SSA. This is not a good
idea, although it is a very natural human tendency.
I would note that the AF prohibition is specifically for the
"Mission Capability" factor, and only for the "sub-factors"
directly under the Mission Capability factor.
If you intend to roll sub-sub-factors into sub-factors or
sub-factors into factors other than Mission Capability, I would
say to determine the color of the factor in light of the
specific strengths and weaknesses and the definitions that go
with the colors. Don't try to roll up mechanically.
I would not say that two blues and a yellow must be green. It
might be yellow, if I think the weakness is really significant.
Eric
By Anonymous on Thursday, February 20, 2003 - 02:39 pm:
Eric,
Couldn't agree with you more re: "rational" vs "mechanical"
decisions.
I would also agree that the source selection authority has the
authority to use the input of evaluators in a manner he or she
sees fit.
However, you seem to be missing my point. AF evaluators are - to
the best of my knowledge - required to give the contracting
officer and/or other source selection authority a summary report
reflecting the scores of factors and subfactors and the
rationale supporting those scores. Under the scenarios I posed,
what score would they give the factor? Merely saying that the
subfactors should not be rolled up seems to evade the issue. If
you are not going to consider the scores of the subfactors, why
have them?
By Eric Ottinger on Thursday, February 20, 2003 - 02:54 pm:
Anon,
Glad to see that we are almost on the same sheet of music.
In the final analysis, you (and the SSA) look at the
discriminators. Add up all of the plusses and minuses and ask
whether the factor (or subfactor) looks blue, green or yellow.
The operative answer is that your evaluators should vote (or do
their consensus thing, whatever that is) the same way for the
higher level factor or subfactor as they would for the lower
level subfactor or sub-subfactor.
The decision memo should address the specific strengths and
weaknesses which support the decision, not the colors.
(I would say "rating" rather than "score." Score connotes
numerical.)
Eric
By Anonymous on Thursday, February 20, 2003 - 03:43 pm:
Eric,
I guess that is as clear as it is going to get.
The evaluators look at the subfactors and then determine an
overall "rating" (color) for the factor. In some cases two blues
and a green will equal a green and in other cases they will
equal a blue depending upon the consensus perspective of the
evaluators. Thanks all.
By Vern Edwards on Thursday, February 20, 2003 - 06:34 pm:
Anonymous:
The Air Force invented color-rating and I have to say that it
has never been an especially well thought-out scheme. So don't
feel badly if it doesn't entirely make sense to you. I was an
Air Force contracting officer when color rating became mandatory
in the early 1980s, and it didn't make a heck of a lot of sense
to us then, either. But you can make it work.
The essential information that an evaluation board must produce
is a set of findings about how well each proposal performed on
each of the evaluation factors at the lowest level of
evaluation factor subdivision. Thus, if the evaluation
factors are (foolishly) broken down into sub-subfactors, or even
lower levels, the crucially important information is how well
each proposal performed at the lowest level. It is at that level
of analysis that the differences among offerors must be
determined and at which nonprice-price tradeoffs must be made.
Rating and scoring are merely ways of summarizing more detailed
information. It is perfectly natural for a source selection
decision maker to ask an evaluation board to cut to the chase
and tell him or her how each proposal did overall.
Ratings and scores can be helpful in letting a decisionmaker
quickly see the big picture. But the decisionmaker's final
judgment must reflect an understanding and appreciation of the
specific differences among the proposals at the lowest level of
evaluation factor subdivision. Rating and scoring necessarily
entail the suppression of more detailed information, which is
probably why the Air Force prohibits the rolling up of subfactor
ratings; it may be that Air Force policymakers want to force
source selection authorities to look at the critical
information.
If you are using color rating and don't work for the Air Force,
then your agency may not have a rule against rolling up
subfactor ratings into factor level ratings, or into a summary
proposal rating. If so and if you want to roll up the ratings in
order to be able to present the big picture, then you must think
things through before deciding what ratings to assign at higher
factor levels, since there is no arithmetical (the GAO would say
"mechanical") way of computing weighted average factor
adjectival or color ratings based on adjectival or color
subfactor ratings.
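One way to see why there is no arithmetical answer (again, an invented illustration, not any real agency scheme): the outcome of averaging depends entirely on the arbitrary numbers assigned to the colors, and two equally defensible encodings can rank the same pair of proposals in opposite orders.

```python
# Two hypothetical numeric encodings of the colors; neither is prescribed anywhere.
linear  = {"blue": 4,  "green": 3, "yellow": 2, "red": 0}
penalty = {"blue": 10, "green": 9, "yellow": 3, "red": 0}  # punishes yellow hard

def score(ratings, scale):
    """Simple average under a given numeric encoding."""
    return sum(scale[r] for r in ratings) / len(ratings)

a = ["blue", "blue", "yellow"]   # strong overall, but with a real weakness
b = ["green", "green", "green"]  # uniformly acceptable

print(score(a, linear)  > score(b, linear))   # True:  A "wins"
print(score(a, penalty) > score(b, penalty))  # False: B "wins"
```

Since nothing in an adjectival scheme tells you which encoding is "right," the ranking produced by any such arithmetic is an artifact of the encoding, not of the proposals.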
Here is one piece of very important advice: Go ahead and use
ratings to get an overall sense of the differences among
proposals. But whatever you do, do not justify a source
selection decision on the basis of differences in ratings. In
fact, do not even mention ratings in your source selection
decision memorandum. Instead, describe the differences among
proposals in terms of specific findings about the differences in
their performance at the lowest level of evaluation factor
subdivision and explain and justify tradeoffs on the basis of
that information.
By Eric Ottinger on Friday, February 21, 2003 - 12:39 pm:
Vern,
Now it is clear that, in AF jargon, Anon is asking how
to roll up elements into subfactors (or sub-elements into
elements). In short, Anon was asking what color to put in the
block on the briefing slides. This is a reasonable question. I
believe we have given him the correct answer.
Since the 1980s the other services have followed the Air Force
lead and adopted color rating schemes. The Air Force must be
doing something right.
I am trying to conceptualize “nonprice-price tradeoffs” at the
lowest “level of analysis.” The price has to be the price for
the total contract. Hence, I can’t see how you intend to
trade-off price at the total contract level against
discriminators at the element or subfactor level.
Eric
By Vern Edwards on Friday, February 21, 2003 - 01:32 pm:
Eric:
I have explained how to make tradeoffs in many publications and
many times here at Wifcon, so I'm not going to repeat myself
here. Anyone interested in my views can search my Wifcon posts
or read my Source Selection Answer Book, pp. 293 - 299. I
hesitated before citing my own book, because I do not believe
in promoting my own publications at Wifcon, but I think my
answer therein will satisfy most readers.
I think that my views on rating and scoring are also well-known.
As to the Air Force color-rating system, I think it is one of
the most awkward rating systems in use, which is why, to the
best of my knowledge, only a few contracting offices have
adopted it. I do not believe that color-rating is widely-used. I
love the Air Force and owe it much, having been an Air Force
contracting official GS-5 through GS-15; but I wish for its own
sake that the Air Force would discard color-rating.
Best regards,
Vern
By Anonymous on Wednesday, February 26, 2003 - 04:56 pm:
Let me interject my attempt at clarification into the
"Vern vs. the world" dynamic. I adamantly agree with what I
understand Vern's position to be and will state my understanding
in my own terms.
The "adjectives", "scores", "ratings" and/or "colors" used in a
source selection evaluation scheme are merely summary labels
characterizing the information that is in fact the basis of the
evaluation -- i.e., the quality of information contained in the
proposals that shed light on the offeror's intention and ability
to perform the contract requirements.
Assigning a color or other designator is merely a way of quickly
summarizing how good or bad the proposal was in regard to the
specific evaluation criteria set forth in the solicitation.
"Rolling-up" the colors merely means presenting a similar
shorthand, summary "label" at a higher level (e.g., an overall
rating for "technical capability" vs. the ratings assigned for
each criterion set forth under technical capability). The higher
level color or other rating designator does NOT represent
balancing "two blues and a yellow" and coming out with light
blue. It represents synthesizing the underlying evaluation of
proposal information that led to the assigned ratings for each
evaluation criterion and balancing them in a way mindful of their
relative order of importance as stated in the solicitation
(presumed to be equally important if unstated).
If no proposal ends up being evaluated as the most superior for
non-price factors and best price/lowest cost, there has to be a
trade-off between offerors proposing different mixes of
cost/price and non-price qualities. The basis for such a
trade-off (and the written supporting documentation thereof)
must address the relative specific attributes of the proposals
constituting what used to be categorized as "strengths,
weaknesses and deficiencies" and whether the net cumulative
benefits of one offeror's proposal are worth paying a price
differential over the net cumulative benefits proposed by a
lower priced offeror. I would not base or defend a source
selection decision that boiled down to three blues and a green
being worth a 20% price differential amounting to $20 million
over three greens and a blue.
It's the information and analysis of the information that is
important, not the label assigned or even if a label is
assigned.
By joel hoffman on Wednesday, February 26, 2003 - 11:47 pm:
Well said Anon 4:56. Let me add that the primary
significance of delineating the relative importance of
subfactors is for assimilating the evaluation comments under
subfactors into the overall factor evaluation and for comparing
evaluations of individual factors. Thus, comments on a highly
important subfactor or factor should theoretically be of more
significance than comments related to a minor subfactor.
Many folks get hung up on individual subfactor ratings, when
determining an overall rating at the factor level. The "color"
or adjectival rating for a factor should simply be a byproduct
of the important aspect of the evaluation - the rollup comments
on all the subfactors concerning strengths, weaknesses,
deficiencies, etc. As stated above, one uses the subfactor
relative "weights" to keep each comment in perspective. Overall
factor ratings (colors, adjectives, points, etc.) - or subfactor
ratings - are simply a visual aid - a summary of the meat of the
evaluation. The comments are the meat. happy sails! joel hoffman
By Vern Edwards on Thursday, February 27, 2003 - 06:59 am:
Joel:
One point: "weights" in the sense of numerical expressions of
relative importance (e.g., "50 percent"), are of no use when
using adjectival scoring.
It makes no sense to use adjectival scoring and then say in the
RFP that Technical Factor No. 1 is worth "50 percent" of the
total technical rating, Factor No. 2 is "30 percent," and Factor
No. 3 is "20 percent."
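A quick sketch of the mismatch (illustrative only; the factor names and numbers are invented): percentage weights are multipliers, so they need numeric scores to act on. Applied to adjectives, the arithmetic simply has no meaning.

```python
# Hypothetical illustration: percentage weights need numbers to operate on.
weights = {"factor1": 0.5, "factor2": 0.3, "factor3": 0.2}

# With numeric scores, a weighted total is well defined:
numeric = {"factor1": 90, "factor2": 80, "factor3": 70}
total = sum(weights[f] * numeric[f] for f in weights)
print(total)  # 83.0

# With adjectival ratings, the weights have nothing to multiply:
adjectival = {"factor1": "Outstanding", "factor2": "Acceptable", "factor3": "Marginal"}
try:
    sum(weights[f] * adjectival[f] for f in weights)
except TypeError:
    print("0.5 * 'Outstanding' is undefined")
```

The only way to make "50 percent of Outstanding" compute is to smuggle in a numeric encoding, at which point the scheme is numerical scoring in disguise.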
By Anonymous on Thursday, February 27, 2003 - 04:34 pm:
The other services have not adopted the color rating
scheme -- at least the Navy hasn't.
By Vern Edwards on Thursday, February 27, 2003 - 05:08 pm:
None of the other services have mandated the use of
color rating, but offices within each of the four services and
the Defense Logistics Agency have tried it. Here are three GAO
decisions which describe the use of color rating by Navy
offices:
Wesley Medical Resources, B-261938, Nov. 20, 1995.
Allied Signal Aerospace Company, B-250822, Feb. 19, 1993.
Burnside-Ott Aviation Training Center, Inc.; Reflectone
Training Systems, Inc., B-233113, Feb. 15, 1989.
By joel hoffman on Friday, February 28, 2003 - 12:00 am:
Vern, I agree with you. I wasn't trying to mix
numerical weights with adjectival rating systems. I meant to
refer to relative importance of factors and subfactors. happy
sails! joel
By Anonymous on Friday, February 28, 2003 - 09:29 am:
FYI the Army has prohibited the use of color ratings.
And it wasn't because most color ratings had "blue" as the
highest color.
By Eric Ottinger on Friday, February 28, 2003 - 10:05 am:
Anon 9:29,
When did this happen?
Army Source Selection Guide; June 2001
“When using the tradeoff process, you evaluate the non-cost
portion(s) of the proposal and associated performance and
proposal risks using rating scales. These scales must be
included in the SSP and may consist of words, colors, or other
indicators, with the exception of numbers. (Numerical rating
systems appear to give more precise distinctions of merit, but
they may obscure the strengths, weaknesses, and risks that
support the numbers.)"
Could you be a little more forthcoming on the "because."
Eric
By Vern Edwards on Friday, February 28, 2003 - 11:17 am:
Eric:
Blue is the Air Force's color; that's why blue stands for
"exceptional" in the Air Force color-rating scheme. Green is the
Army's color. That's what Anon was referring to. I'm surprised
that you weren't aware of the Air Force's joke.
Vern
By Eric Ottinger on Friday, February 28, 2003 - 11:36 am:
Vern,
Anon says that the Army has prohibited color ratings. For all I
know, this may be true. But a link or a reference would be
helpful.
In any case, I doubt the Army has gone back to point scores.
I was aware of the Army's discomfort with Blue. Also, if the Air
Force has a four color system, the Army has to have a five color
system, which means that the Army's "Yellow" is not quite the
same as the Air Force's "Yellow."
So it goes.
Eric
By Eric Ottinger on Friday, February 28, 2003 - 11:44 am:
All,
I said that the other services had “adopted.” My intent was
simply to indicate that color rating schemes were hardly unique
to the Air Force at this point. I knew that the Army uses colors
and I thought (incorrectly) that some parts of the Navy had
adopted color rating by now. (My thanks to Vern for identifying
the three known instances where the Navy has used colors.) A
quick check of the CCH database indicates that DISA, NSA, GSA
and the Coast Guard also use colors, albeit colors may not be
mandated.
Other than the fact that the briefing slides will be more
colorful with a color rating scheme (pardon the pun), does
anyone want to argue that there is any fundamental difference
between colors (Blue, Green, Yellow, Red) and adjectival
(Outstanding, Acceptable, Marginal, Unacceptable)?
Here is a bit out of the AF Guide to demonstrate that colors and
adjectives are interchangeable: “ * BLUE (Exceptional) * GREEN
(Acceptable) * YELLOW (Marginal ) * RED (Unacceptable).”
Actually, Anon’s original question equated colors and
adjectives.
“Blue - exceptional
Green - acceptable
Yellow - not acceptable but susceptible to being made
acceptable.
Red - not acceptable - not susceptible.”
If Anon had asked how to roll-up an “Exceptional” and an
“Acceptable” and a “Not acceptable but susceptible to being made
acceptable,” my answer would not have been any different.
As far as I can tell, the only meaningful distinction is between
point scoring and adjectival/colors. I’ve used both. Both work.
Both have problems. I prefer colors. Point scores have the
appearance of giving a precise answer, which is usually
illusory. And adjectival/color schemes don’t even pretend to
give a precise answer, which makes some people nervous.
Eric
By Anon2U on Friday, February 28, 2003 - 06:13 pm:
I do not like points because it is too hard to justify
why one contractor got a 91 while another got an 89 and lost.
However, if colors are going to be related to adjective ratings
such as outstanding and excellent then why not just use the
adjective ratings. Why have to cross-relate what you mean by the
colors?
I use Outstanding, Excellent, Satisfactory and Unsatisfactory as
ratings and require the tech team to support each subfactor with
strengths and weaknesses. A lot of narrative, not just a couple
of one-line bullets. It is the narrative that I want the source
selection decision made on, not the one word rating.
By joel hoffman on Friday, February 28, 2003 - 08:56 pm:
Anon2u, point rating systems are supposed to work the
same way as adjectival. People often don't know or care that the
strengths, weaknesses, etc. actually determine the rating. I
think people only look at the "color", without understanding the
basis for the rating.
The adjective, color or point rating for a factor or subfactor
should simply fall out, based on the underlying criteria. The
"rating" is simply a summary of the meat of the evaluation.
happy sails! joel
By joel hoffman on Friday, February 28, 2003 - 09:01 pm:
Anon2u - I thought some more about what you said.
Nobody should ever use points alone in choosing the successful
contractor or in establishing the competitive range. An 89 is
essentially the same rating as a 91. One should never use points
alone to justify any decision. The total points are merely an
indicator of the quality of a proposal. happy sails! joel
By Vern Edwards on Saturday, March 01, 2003 - 10:35 am:
As far as I know, no agency has prohibited the use of
color rating. The color rating system is nothing more than an
adjectival rating system with a color chart visual presentation
device. Some people don't like it because preparation of the
color charts seems to them to be an unnecessary adjunct to the
adjectives.
The system was originally devised in the late 1960s or early
1970s to facilitate the presentation of complex information
during briefings to senior military and civilian officials --
the Secretary of the Air Force, assistant secretaries, general
officers and such -- in source selections of high dollar value.
The Air Force has since extended its application to source
selections of all types. Earlier versions of Air Force source
selection regulations permitted the use of numerical scoring at
the subfactor level or below, but prohibited the presentation of
numerical scores to the source selection authority.
Think about the current terror alert system. The colors red,
orange, yellow and so forth merely simplify more complex
expressions of official alarm about the current state of
affairs. The colors are useful, but not strictly necessary. They
don't convey details, which is something that bugs the heck out
of the news media -- they always want to know exactly what
prompted any change in status -- but they convey the main
message in terms that even most idiots can understand.
Got to go now. Need to buy some duct tape and plastic sheeting.
By Chuck Solloway on Monday, March 03, 2003 - 01:27 pm:
Last I heard, the Army folks were told that they must
use colors and that they should not even bother requesting a
deviation from this policy. Does Anon have a source for his/her
claim that this policy has changed?
Further, I have heard that some in the Navy are using colors. It
appears that the colors being used to equate to "Exceptional"
are:
Army -green
AF -blue
Navy -gold
By joel hoffman on Monday, March 03, 2003 - 01:38 pm:
In April of 2001, I believe it was Pete Aldridge's (?)
dictate that numerical ratings or weights would no longer be
allowed. It was a last minute revision to AFARS, put out as an
edict. Adjectival systems (including colors or other
descriptions) shall be used. happy sails! joel hoffman
By cherokee21 on Monday, March 03, 2003 - 02:12 pm:
Gee Vern, things still haven't changed much since the
late 50's it's still "duct and cover!"
By Vern Edwards on Monday, March 03, 2003 - 04:00 pm:
Army FAR Supplement 5115.304(b)(2)(iv) says:
"Evaluation factors, subfactors, and elements:... Must be
qualitative. Numerical weighting (i.e., assigning points or
percentages to evaluation factors and subfactors) is not an
authorized method of expressing the relative importance of these
factors and subfactors. Evaluation factors and subfactors must
be definable in readily understood qualitative terms (i.e.,
adjectival, colors, or other indicators, but not numbers) and
represent the key areas of importance to be considered in the
source selection process. The direction of this subparagraph is
not waivable, either on an individual or class basis, as an
AFARS deviation."
Note the confusion over "weighting" versus rating or scoring.
Note the goofy notion that adjectives and colors can be "readily
understood," (colors!) but not numbers.
By joel hoffman on Monday, March 03, 2003 - 07:36 pm:
I spoke with the case manager and the 1102 who was the
chairperson of the 2001 AFARS rewrite committee at different
times. According to both, the personal view of the (then) new
Undersecretary for Acquisition, Technology and Logistics
concerning rating systems was interjected by edict into the 2001
rewrite of AFARS, after the draft had been reviewed by the
field. The word was that's the way it was going to be, no
discussions or debates and there would be no waivers or
deviations from that policy.
Can't say that I blame him after reading protests showing how
badly misunderstood and poorly used the numerical system has been.
Apparently, people still think that a total score by itself (or,
at most, total category factor scores) is all that is
necessary to justify the selection. There were many cases where
a one- or two-point difference in scores was treated as a
significant difference. For instance, in an earlier post in
this thread, someone said "it is too hard to justify why one
contractor got a 91 while another got an 89 and lost." Au
contraire, it's not hard at all, if one makes a decent
cost-technical trade-off analysis, citing the advantages and
disadvantages of each proposal, rather than relying on point
scores. In fact, one can readily justify selecting the lower
scoring proposal the same way.
Many local and state governments, as well as private firms, use
"total scores" to make selections. I've seen some of those
rating schemes and bases of award. Oversimplifying numerical
rating systems is rampant. happy sails! joel hoffman
By Vern Edwards on Monday, March 03, 2003 - 08:09 pm:
Joel:
A couple of thoughts:
First, it's not a matter of misusing "the numerical
system." Agencies use a variety of numerical schemes which
differ in their structure and operation.
Second, agencies have made the same kinds of mistakes with
adjectives. In one case, NASA used both adjectives and numbers
and still screwed up! See: Engineering and Computation, Inc.,
B-261658, Oct. 16, 1995. The Army's prohibition on the use of
numbers hasn't prevented Army source selection officials from
screwing up with adjectives. See: Dyncorp International LLC,
B-289863, May 13, 2002 (a Corps of Engineers procurement --
protest sustained). See, too, Kathryn Huddleston and
Associates, Ltd., B-289453, March 11, 2002.
You can make good source selection decisions with the help of
any kind of rating or scoring system if you know what you're
doing -- numbers, adjectives, color ratings, stars, happy/sad
faces -- you name it. And if you don't know what you're doing,
then no system will prevent you from screwing up.
The answer is training, not rules.
Vern
By joel hoffman on Monday, March 03, 2003 - 08:53 pm:
You may be right, Vern. It seems that any system can
be screwed up. Training is preferred to "edicts" (Army's rules).
But training isn't an end-all either. It seems that we can train
until we're blue in the face (no pun intended) and some people
still won't or can't "get it". Gets frustrating sometimes. happy
sails!
joel
By Vern Edwards on Monday, March 03, 2003 - 09:18 pm:
Joel:
Yes, unfortunately, you're dead right about that.
Vern