By Anonymous on Thursday, February 20, 2003 - 01:46 pm:
Assume that an RFP evaluation factor had three
subfactors of equal importance. The agency evaluation scheme
provides for scoring as follows:
Blue - exceptional
Green - acceptable
Yellow - not acceptable but susceptible to being made
acceptable.
Red - not acceptable - not susceptible.
Of the three subfactors scored for a particular proposal by a
panel of government reviewers, one was scored blue, one was
scored green and one was scored yellow. What would the overall
score be for the evaluation factor?
Suppose two subfactors were blue and one was green?
Suppose subfactor #1 was twice as important as subfactor #2 and
#3? Would the above answers change?
Do any of the agencies have any specific guidance on "rolling
up" colors or adjectives?
By Eric Ottinger on Thursday, February 20, 2003 - 01:48 pm:
Anon,
Yes.
Don't
Eric
By Vern Edwards on Thursday, February 20, 2003 - 02:02 pm:
Air Force FAR Supplement § 5315.305(a)(3)(A) says,
concerning technical evaluations:
"The Mission Capability subfactors shall be derived from
requirements or objective and threshold performance requirements
when used. Mission Capability ratings focus on the strengths and
proposal inadequacies of the offeror's proposal. Mission
capability shall be evaluated using the following color ratings.
Subfactor ratings shall not be rolled up to an overall color
rating. Through exchanges, the Government evaluators should
be able to obtain the necessary information from offerors with
interim Yellow/Marginal ratings to determine if the proposal
inadequacies have been satisfactorily addressed. Yellow/Marginal
ratings should be rare by the time of the final evaluation. Note
that if an offeror's proposal demonstrates a material failure to
meet a Government requirement, this is a deficiency in the
offeror's proposal."
Underlining added.
In adjectival or color rating schemes, there are no standard
arithmetical rules for combining subfactor ratings.
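To see why a mechanical rule is unsatisfying, here is a deliberately naive roll-up (a made-up sketch; the `SCALE` encoding and `naive_rollup` function are illustrative, not any agency's method). Scoring the scenarios from the original question, a blue/green/yellow proposal rolls up to the same "green" as a uniformly green one, suppressing exactly the weakness the evaluators flagged:

```python
# Hypothetical illustration only; no agency prescribes this encoding.
SCALE = {"blue": 4, "green": 3, "yellow": 2, "red": 1}

def naive_rollup(ratings, weights=None):
    """Weighted average of color ratings, mapped back to the nearest color."""
    if weights is None:
        weights = [1] * len(ratings)
    avg = sum(SCALE[r] * w for r, w in zip(ratings, weights)) / sum(weights)
    # Map the average back to the closest color on the scale.
    return min(SCALE, key=lambda c: abs(SCALE[c] - avg))

# The scenarios from the original question:
print(naive_rollup(["blue", "green", "yellow"]))             # green
print(naive_rollup(["blue", "green", "yellow"], [2, 1, 1]))  # green (subfactor #1 doubled)
print(naive_rollup(["green", "green", "green"]))             # green
```

All three profiles collapse to "green", even though the first two carry a "not acceptable" subfactor that the source selection authority would need to see.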
By Anonymous on Thursday, February 20, 2003 - 02:12 pm:
The Air Force guidance is to not roll up subfactors.
So what color would we give the factor in the questions I asked?
Is the AF saying that we should have leveled the contractors
during discussions so this is a non-issue? Frankly, the AF
guidance seems to border on gobbledygook.
By Eric Ottinger on Thursday, February 20, 2003 - 02:15 pm:
Vern,
Thanks for citing an appropriate authority.
Anon,
Source selection is a team effort. Everyone has a role to play.
The source selection authority should be the only person to
(implicitly) "roll-up" all of the factors.
There is a more subtle point. A significant "Yellow" weakness
may legitimately outweigh a number of "Blue" ratings.
The SSA is required to make a "rational" decision, not a
mechanical decision.
Eric
By Eric Ottinger on Thursday, February 20, 2003 - 02:33 pm:
Anon,
Apologies. Evaluation teams who try to roll up all of the
factors into a single color or score for the whole proposal, are
effectively making the decision for the SSA. This is not a good
idea, although it is a very natural human tendency.
I would note that the AF prohibition is specifically for the
"Mission Capability" factor, and only for the "sub-factors"
directly under the Mission Capability factor.
If you intend to roll sub-sub-factors into sub-factors or
sub-factors into factors other than Mission Capability, I would
say to determine the color of the factor in light of the
specific strengths and weaknesses and the definitions that go
with the colors. Don't try to roll up mechanically.
I would not say that two blues and a yellow must be green. It
might be yellow, if I think the weakness is really significant.
Eric
By Anonymous on Thursday, February 20, 2003 - 02:39 pm:
Eric,
Couldn't agree with you more re: "rational" vs "mechanical"
decisions.
I would also agree that the source selection authority has the
authority to use the input of evaluators in a manner he or she
sees fit.
However, you seem to be missing my point. AF evaluators are - to
the best of my knowledge - required to give the contracting
officer and/or other source selection authority a summary report
reflecting the scores of factors and subfactors and the
rationale supporting those scores. Under the scenarios I posed,
what score would they give the factor? Merely saying that the
subfactors should not be rolled up seems to evade the issue. If
you are not going to consider the scores of the subfactors, why
have them?
By Eric Ottinger on Thursday, February 20, 2003 - 02:54 pm:
Anon,
Glad to see that we are almost on the same sheet of music.
In the final analysis, you (and the SSA) look at the
discriminators. Add up all of the plusses and minuses and ask
whether the factor (or subfactor) looks blue, green or yellow.
The operative answer is that your evaluators should vote (or do
their consensus thing, whatever that is) the same way for the
higher level factor or subfactor as they would for the lower
level subfactor or sub-subfactor.
The decision memo should address the specific strengths and
weaknesses which support the decision, not the colors.
(I would say "rating" rather than "score." Score connotes
numerical.)
Eric
By Anonymous on Thursday, February 20, 2003 - 03:43 pm:
Eric,
I guess that is as clear as it is going to get.
The evaluators look at the subfactors and then determine an
overall "rating" (color) for the factor. In some cases two blues
and a green will equal a green and in other cases they will
equal a blue depending upon the consensus perspective of the
evaluators. Thanks all.
By Vern Edwards on Thursday, February 20, 2003 - 06:34 pm:
Anonymous:
The Air Force invented color-rating and I have to say that it
has never been an especially well thought-out scheme. So don't
feel badly if it doesn't entirely make sense to you. I was an
Air Force contracting officer when color rating became mandatory
in the early 1980s, and it didn't make a heck of a lot of sense
to us then, either. But you can make it work.
The essential information that an evaluation board must produce
is a set of findings about how well each proposal performed on
each of the evaluation factors at the lowest level of
evaluation factor subdivision. Thus, if the evaluation
factors are (foolishly) broken down into sub-subfactors, or even
lower levels, the crucially important information is how well
each proposal performed at the lowest level. It is at that level
of analysis that the differences among offerors must be
determined and at which nonprice-price tradeoffs must be made.
Rating and scoring are merely ways of summarizing more detailed
information. It is perfectly natural for a source selection
decision maker to ask an evaluation board to cut to the chase
and tell him or her how each proposal did overall.
Ratings and scores can be helpful in letting a decisionmaker
quickly see the big picture. But the decisionmaker's final
judgment must reflect an understanding and appreciation of the
specific differences among the proposals at the lowest level of
evaluation factor subdivision. Rating and scoring necessarily
entail the suppression of more detailed information, which is
probably why the Air Force prohibits the rolling up of subfactor
ratings; it may be that Air Force policymakers want to force
source selection authorities to look at the critical
information.
If you are using color rating and don't work for the Air Force,
then your agency may not have a rule against rolling up
subfactor ratings into factor level ratings, or into a summary
proposal rating. If so and if you want to roll up the ratings in
order to be able to present the big picture, then you must think
things through before deciding what ratings to assign at higher
factor levels, since there is no arithmetical (the GAO would say
"mechanical") way of computing weighted average factor
adjectival or color ratings based on adjectival or color
subfactor ratings.
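One way to see why there is no arithmetical answer (again, an invented illustration, not any real agency scheme): the outcome of averaging depends entirely on the arbitrary numbers assigned to the colors, and two equally defensible encodings can rank the same pair of proposals in opposite orders.

```python
# Two hypothetical numeric encodings of the colors; neither is prescribed anywhere.
linear  = {"blue": 4,  "green": 3, "yellow": 2, "red": 0}
penalty = {"blue": 10, "green": 9, "yellow": 3, "red": 0}  # punishes yellow hard

def score(ratings, scale):
    """Simple average under a given numeric encoding."""
    return sum(scale[r] for r in ratings) / len(ratings)

a = ["blue", "blue", "yellow"]   # strong overall, but with a real weakness
b = ["green", "green", "green"]  # uniformly acceptable

print(score(a, linear)  > score(b, linear))   # True:  A "wins"
print(score(a, penalty) > score(b, penalty))  # False: B "wins"
```

Since nothing in an adjectival scheme tells you which encoding is "right," the ranking produced by any such arithmetic is an artifact of the encoding, not of the proposals.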
Here is one piece of very important advice: Go ahead and use
ratings to get an overall sense of the differences among
proposals. But whatever you do, do not justify a source
selection decision on the basis of differences in ratings. In
fact, do not even mention ratings in your source selection
decision memorandum. Instead, describe the differences among
proposals in terms of specific findings about the differences in
their performance at the lowest level of evaluation factor
subdivision and explain and justify tradeoffs on the basis of
that information.
By Eric Ottinger on Friday, February 21, 2003 - 12:39 pm:
Vern,
Now it is clear that, in AF jargon, Anon is asking how
to roll up elements into subfactors (or sub-elements into
elements). In short, Anon was asking what color to put in the
block on the briefing slides. This is a reasonable question. I
believe we have given him the correct answer.
Since the 1980s the other services have followed the Air Force
lead and adopted color rating schemes. The Air Force must be
doing something right.
I am trying to conceptualize “nonprice-price tradeoffs” at the
lowest “level of analysis.” The price has to be the price for
the total contract. Hence, I can’t see how you intend to
trade-off price at the total contract level against
discriminators at the element or subfactor level.
Eric
By Vern Edwards on Friday, February 21, 2003 - 01:32 pm:
Eric:
I have explained how to make tradeoffs in many publications and
many times here at Wifcon, so I'm not going to repeat myself
here. Anyone interested in my views can search my Wifcon posts
or read my Source Selection Answer Book, pp. 293 - 299. I
hesitated before citing my own book, because I do not believe
in promoting my own publications at Wifcon, but I think my
answer therein will satisfy most readers.
I think that my views on rating and scoring are also well-known.
As to the Air Force color-rating system, I think it is one of
the most awkward rating systems in use, which is why, to the
best of my knowledge, only a few contracting offices have
adopted it. I do not believe that color-rating is widely-used. I
love the Air Force and owe it much, having been an Air Force
contracting official GS-5 through GS-15; but I wish for its own
sake that the Air Force would discard color-rating.
Best regards,
Vern
By Anonymous on Wednesday, February 26, 2003 - 04:56 pm:
Let me interject my attempt at clarification into the
"Vern vs. the world" dynamic. I adamantly agree with what I
understand Vern's position to be and will state my understanding
in my own terms.
The "adjectives", "scores", "ratings" and/or "colors" used in a
source selection evaluation scheme are merely summary labels
characterizing the information that is in fact the basis of the
evaluation -- i.e., the quality of information contained in the
proposals that shed light on the offeror's intention and ability
to perform the contract requirements.
Assigning a color or other designator is merely a way of quickly
summarizing how good or bad the proposal was in regard to the
specific evaluation criteria set forth in the solicitation.
"Rolling-up" the colors merely means presenting a similar
shorthand, summary "label" at a higher level (e.g., an overall
rating for "technical capability" vs. the ratings assigned for
each criterion set forth under technical capability). The higher
level color or other rating designator does NOT represent
balancing "two blues and a yellow" and coming out with light
blue. It represents synthesizing the underlying evaluation of
proposal information that led to the assigned ratings for each
evaluation criterion and balancing them in a way mindful of their
relative order of importance as stated in the solicitation
(presumed to be equally important if unstated).
If no proposal ends up being evaluated as the most superior for
non-price factors and best price/lowest cost, there has to be a
trade-off between offerors proposing different mixes of
cost/price and non-price qualities. The basis for such a
trade-off (and the written supporting documentation thereof)
must address the relative specific attributes of the proposals
constituting what used to be categorized as "strengths,
weaknesses and deficiencies" and whether the net cumulative
benefits of one offeror's proposal are worth paying a price
differential over the net cumulative benefits proposed by a
lower priced offeror. I would not base or defend a source
selection decision that boiled down to three blues and a green
being worth a 20% price differential amounting to $20 million
over three greens and a blue.
It's the information and analysis of the information that is
important, not the label assigned or even if a label is
assigned.
By joel hoffman on Wednesday, February 26, 2003 - 11:47 pm:
Well said Anon 4:56. Let me add that the primary
significance of delineating the relative importance of
subfactors is for assimilating the evaluation comments under
subfactors into the overall factor evaluation and for comparing
evaluations of individual factors. Thus, comments on a highly
important subfactor or factor should theoretically be of more
significance than comments related to a minor subfactor.
Many folks get hung up on individual subfactor ratings, when
determining an overall rating at the factor level. The "color"
or adjectival rating for a factor should simply be a byproduct
of the important aspect of the evaluation - the rollup comments
on all the subfactors concerning strengths, weaknesses,
deficiencies, etc. As stated above, one uses the subfactor
relative "weights" to keep each comment in perspective. Overall
factor ratings (colors, adjectives, points, etc.) - or subfactor
ratings - are simply a visual aid - a summary of the meat of the
evaluation. The comments are the meat. happy sails! joel hoffman
By Vern Edwards on Thursday, February 27, 2003 - 06:59 am:
Joel:
One point: "weights" in the sense of numerical expressions of
relative importance (e.g., "50 percent"), are of no use when
using adjectival scoring.
It makes no sense to use adjectival scoring and then say in the
RFP that Technical Factor No. 1 is worth "50 percent" of the
total technical rating, Factor No. 2 is "30 percent," and Factor
No. 3 is "20 percent."
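A quick sketch of the mismatch (illustrative only; the factor names and numbers are invented): percentage weights are multipliers, so they need numeric scores to act on. Applied to adjectives, the arithmetic simply has no meaning.

```python
# Hypothetical illustration: percentage weights need numbers to operate on.
weights = {"factor1": 0.5, "factor2": 0.3, "factor3": 0.2}

# With numeric scores, a weighted total is well defined:
numeric = {"factor1": 90, "factor2": 80, "factor3": 70}
total = sum(weights[f] * numeric[f] for f in weights)
print(total)  # 83.0

# With adjectival ratings, the weights have nothing to multiply:
adjectival = {"factor1": "Outstanding", "factor2": "Acceptable", "factor3": "Marginal"}
try:
    sum(weights[f] * adjectival[f] for f in weights)
except TypeError:
    print("0.5 * 'Outstanding' is undefined")
```

The only way to make "50 percent of Outstanding" compute is to smuggle in a numeric encoding, at which point the scheme is numerical scoring in disguise.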
By Anonymous on Thursday, February 27, 2003 - 04:34 pm:
The other services have not adopted the color rating
scheme -- at least the Navy hasn't.
By Vern Edwards on Thursday, February 27, 2003 - 05:08 pm:
None of the other services have mandated the use of
color rating, but offices within each of the four services and
the Defense Logistics Agency have tried it. Here are three GAO
decisions which describe the use of color rating by Navy
offices:
Wesley Medical Resources, B-261938, Nov. 20, 1995.
Allied Signal Aerospace Company, B-250822, Feb. 19, 1993.
Burnside-Ott Aviation Training Center, Inc.; Reflectone
Training Systems, Inc., B-233113, Feb. 15, 1989.
By joel hoffman on Friday, February 28, 2003 - 12:00 am:
Vern, I agree with you. I wasn't trying to mix
numerical weights with adjectival rating systems. I meant to
refer to relative importance of factors and subfactors. happy
sails! joel
By Anonymous on Friday, February 28, 2003 - 09:29 am:
FYI the Army has prohibited the use of color ratings.
And it wasn't because most color ratings had "blue" as the
highest color.
By Eric Ottinger on Friday, February 28, 2003 - 10:05 am:
Anon 9:29,
When did this happen?
Army Source Selection Guide; June 2001
“When using the tradeoff process, you evaluate the non-cost
portion(s) of the proposal and associated performance and
proposal risks using rating scales. These scales must be
included in the SSP and may consist of words, colors, or other
indicators, with the exception of numbers. (Numerical rating
systems appear to give more precise distinctions of merit, but
they may obscure the strengths, weaknesses, and risks that
support the numbers.)"
Could you be a little more forthcoming on the "because."
Eric
By Vern Edwards on Friday, February 28, 2003 - 11:17 am:
Eric:
Blue is the Air Force's color; that's why blue stands for
"exceptional" in the Air Force color-rating scheme. Green is the
Army's color. That's what Anon was referring to. I'm surprised
that you weren't aware of the Air Force's joke.
Vern
By Eric Ottinger on Friday, February 28, 2003 - 11:36 am:
Vern,
Anon says that the Army has prohibited color ratings. For all I
know, this may be true. But a link or a reference would be
helpful.
In any case, I doubt the Army has gone back to point scores.
I was aware of the Army's discomfort with Blue. Also, if the Air
Force has a four color system, the Army has to have a five color
system, which means that the Army's "Yellow" is not quite the
same as the Air Force's "Yellow."
So it goes.
Eric
By Eric Ottinger on Friday, February 28, 2003 - 11:44 am:
All,
I said that the other services had “adopted.” My intent was
simply to indicate that color rating schemes were hardly unique
to the Air Force at this point. I knew that the Army uses colors
and I thought (incorrectly) that some parts of the Navy had
adopted color rating by now. (My thanks to Vern for identifying
the three known instances where the Navy has used colors.) A
quick check of the CCH database indicates that DISA, NSA, GSA
and the Coast Guard also use colors, albeit colors may not be
mandated.
Other than the fact that the briefing slides will be more
colorful with a color rating scheme (pardon the pun), does
anyone want to argue that there is any fundamental difference
between colors (Blue, Green, Yellow, Red) and adjectival
(Outstanding, Acceptable, Marginal, Unacceptable)?
Here is a bit out of the AF Guide to demonstrate that colors and
adjectives are interchangeable: “ * BLUE (Exceptional) * GREEN
(Acceptable) * YELLOW (Marginal ) * RED (Unacceptable).”
Actually, Anon’s original question equated colors and
adjectives.
“Blue - exceptional
Green - acceptable
Yellow - not acceptable but susceptible to being made
acceptable.
Red - not acceptable - not susceptible.”
If Anon had asked how to roll-up an “Exceptional” and an
“Acceptable” and a “Not acceptable but susceptible to being made
acceptable,” my answer would not have been any different.
As far as I can tell, the only meaningful distinction is between
point scoring and adjectival/colors. I’ve used both. Both work.
Both have problems. I prefer colors. Point scores have the
appearance of giving a precise answer, which is usually
illusory. And adjectival/color schemes don’t even pretend to
give a precise answer, which makes some people nervous.
Eric
By Anon2U on Friday, February 28, 2003 - 06:13 pm:
I do not like points because it is too hard to justify
why one contractor got a 91 while another got an 89 and lost.
However, if colors are going to be related to adjective ratings
such as outstanding and excellent then why not just use the
adjective ratings. Why have to cross-relate what you mean by the
colors?
I use Outstanding, Excellent, Satisfactory and Unsatisfactory as
ratings and require the tech team to support each subfactor with
strengths and weaknesses. A lot of narrative, not just a couple
of one-line bullets. It is the narrative that I want the source
selection decision made on, not the one word rating.
By joel hoffman on Friday, February 28, 2003 - 08:56 pm:
Anon2u, point rating systems are supposed to work the
same way as adjectival. People often don't know or care that the
strengths, weaknesses, etc. actually determine the rating. I
think people only look at the "color", without understanding the
basis for the rating.
The adjective, color or point rating for a factor or subfactor
should simply fall out, based on the underlying criteria. The
"rating" is simply a summary of the meat of the evaluation.
happy sails! joel
By joel hoffman on Friday, February 28, 2003 - 09:01 pm:
Anon2u - I thought some more about what you said.
Nobody should ever use points alone in choosing the successful
contractor or in establishing the competitive range. An 89 is
essentially the same rating as a 91. One should never use points
alone to justify any decision. The total points are merely an
indicator of the quality of a proposal. happy sails! joel
By Vern Edwards on Saturday, March 01, 2003 - 10:35 am:
As far as I know, no agency has prohibited the use of
color rating. The color rating system is nothing more than an
adjectival rating system with a color chart visual presentation
device. Some people don't like it because preparation of the
color charts seems to them to be an unnecessary adjunct to the
adjectives.
The system was originally devised in the late 1960s or early
1970s to facilitate the presentation of complex information
during briefings to senior military and civilian officials --
the Secretary of the Air Force, assistant secretaries, general
officers and such -- in source selections of high dollar value.
The Air Force has since extended its application to source
selections of all types. Earlier versions of Air Force source
selection regulations permitted the use of numerical scoring at
the subfactor level or below, but prohibited the presentation of
numerical scores to the source selection authority.
Think about the current terror alert system. The colors red,
orange, yellow and so forth merely simplify more complex
expressions of official alarm about the current state of
affairs. The colors are useful, but not strictly necessary. They
don't convey details, which is something that bugs the heck out
of the news media -- they always want to know exactly what
prompted any change in status -- but they convey the main
message in terms that even most idiots can understand.
Got to go now. Need to buy some duct tape and plastic sheeting.
By Chuck Solloway on Monday, March 03, 2003 - 01:27 pm:
Last I heard, the Army folks were told that they must
use colors and that they should not even bother requesting a
deviation from this policy. Does Anon have a source for his/her
claim that this policy has changed?
Further, I have heard that some in the Navy are using colors. It
appears that the colors being used to equate to "Exceptional"
are:
Army -green
AF -blue
Navy -gold
By joel hoffman on Monday, March 03, 2003 - 01:38 pm:
In April of 2001, I believe it was Pete Aldridge's (?)
dictate that numerical ratings or weights would no longer be
allowed. It was a last minute revision to AFARS, put out as an
edict. Adjectival systems (including colors or other
descriptions) shall be used. happy sails! joel hoffman
By cherokee21 on Monday, March 03, 2003 - 02:12 pm:
Gee Vern, things still haven't changed much since the
late 50's it's still "duct and cover!"
By Vern Edwards on Monday, March 03, 2003 - 04:00 pm:
Army FAR Supplement 5115.304(b)(2)(iv) says:
"Evaluation factors, subfactors, and elements:... Must be
qualitative. Numerical weighting (i.e., assigning points or
percentages to evaluation factors and subfactors) is not an
authorized method of expressing the relative importance of these
factors and subfactors. Evaluation factors and subfactors must
be definable in readily understood qualitative terms (i.e.,
adjectival, colors, or other indicators, but not numbers) and
represent the key areas of importance to be considered in the
source selection process. The direction of this subparagraph is
not waivable, either on an individual or class basis, as an
AFARS deviation."
Note the confusion over "weighting" versus rating or scoring.
Note the goofy notion that adjectives and colors can be "readily
understood," (colors!) but not numbers.
By joel hoffman on Monday, March 03, 2003 - 07:36 pm:
I spoke with the case manager and the 1102 who was the
chairperson of the 2001 AFARS rewrite committee at different
times. According to both, the personal view of the (then) new
Undersecretary for Acquisition, Technology and Logistics
concerning rating systems was interjected by edict into the 2001
rewrite of AFARS, after the draft had been reviewed by the
field. The word was that's the way it was going to be, no
discussions or debates and there would be no waivers or
deviations from that policy.
Can't say that I blame him after reading protests showing how
badly misunderstood and poorly used the numerical system has been.
Apparently, people still think that a total score by itself (or,
at most, total category factor scores) is all that is
necessary to justify the selection. There were many cases where
a one- or two-point difference in scores was treated as a
significant difference. For instance, in an earlier post in
this thread, someone said "it is too hard to justify why one
contractor got a 91 while another got an 89 and lost." Au
contraire, it's not hard at all, if one makes a decent
cost-technical trade-off analysis, citing the advantages and
disadvantages of each proposal, rather than relying on point
scores. In fact, one can readily justify selecting the lower
scoring proposal the same way.
Many local and state governments, as well as private firms, use
"total scores" to make selections. I've seen some of those
rating schemes and bases of award. Oversimplifying numerical
rating systems is rampant. happy sails! joel hoffman
By Vern Edwards on Monday, March 03, 2003 - 08:09 pm:
Joel:
A couple of thoughts:
First, it's not a matter of misusing "the numerical
system." Agencies use a variety of numerical schemes which
differ in their structure and operation.
Second, agencies have made the same kinds of mistakes with
adjectives. In one case, NASA used both adjectives and numbers
and still screwed up! See: Engineering and Computation, Inc.,
B-261658, Oct. 16, 1995. The Army's prohibition on the use of
numbers hasn't prevented Army source selection officials from
screwing up with adjectives. See: Dyncorp International LLC,
B-289863, May 13, 2002 (a Corps of Engineers procurement --
protest sustained). See, too, Kathryn Huddleston and
Associates, Ltd., B-289453, March 11, 2002.
You can make good source selection decisions with the help of
any kind of rating or scoring system if you know what you're
doing -- numbers, adjectives, color ratings, stars, happy/sad
faces -- you name it. And if you don't know what you're doing,
then no system will prevent you from screwing up.
The answer is training, not rules.
Vern
By joel hoffman on Monday, March 03, 2003 - 08:53 pm:
You may be right, Vern. It seems that any system can
be screwed up. Training is preferred to "edicts" (Army's rules).
But training isn't an end-all either. It seems that we can train
until we're blue in the face (no pun intended) and some people
still won't or can't "get it". Gets frustrating sometimes. happy
sails!
joel
By Vern Edwards on Monday, March 03, 2003 - 09:18 pm:
Joel:
Yes, unfortunately, you're dead right about that.
Vern