Dear All,
Let me add some more elements to the explanations by Christian.
1) The difficulty coefficients are capped at 75 points in total (to be distributed among all objectives/metrics; the experts need not distribute all 75 points). How these coefficients are distributed will depend on the experts' judgement of each objective. It is very unlikely that all 75 points will be assigned to a single metric, unless: 1) the experts judge all metrics but one to be trivial/unambitious, and 2) the only metric with non-zero difficulty points is judged extremely difficult/ambitious (such that it deserves all 75 points).
2) The improvement is not capped at 100%. Thus, "overperforming" is possible when achievement > target (or achievement < target, in the case where target < baseline).
When designing the scoring system we discussed this at length: we decided to let teams overperform for a number of reasons. Of course, setting a trivial target in order to overperform is not a good idea, since it would lead the experts to assign very low difficulty coefficients.
The template we provided was, in my view, quite clear about the above issues: capping of the achievement/improvement was never mentioned (the formulas for the improvement simply reported the cases A<B and A>=B), and it was clearly stated that at most 75 points are to be distributed across all objectives. However, if it was not clear enough, please accept my apologies.
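To make points 1) and 2) concrete, here is a minimal sketch in Python. It assumes the common linear improvement formula Improvement = (A - B) / (T - B) and a per-metric score of difficulty points times improvement; these are my assumptions for illustration, not a reproduction of the exact formulas in the template.

```python
def improvement(achievement: float, baseline: float, target: float) -> float:
    """Relative improvement of achievement A from baseline B toward target T.

    Assumed linear formula (A - B) / (T - B); no cap is applied, so values
    above 1.0 represent overperforming. The same expression also covers the
    case target < baseline, since T - B is then negative.
    """
    return (achievement - baseline) / (target - baseline)


def metric_score(difficulty_points: float, achievement: float,
                 baseline: float, target: float) -> float:
    """Hypothetical points earned on one metric:
    difficulty coefficient times (uncapped) improvement."""
    return difficulty_points * improvement(achievement, baseline, target)


# Example: a team put 30 of its 75 difficulty points on a metric with
# baseline 10 and target 20, and achieved 25 (overperforming the target):
print(metric_score(30, achievement=25, baseline=10, target=20))  # 45.0

# Example with target below baseline (smaller is better), e.g. error rate
# baseline 10, target 0, achievement 5 -> halfway to the target:
print(improvement(5, baseline=10, target=0))  # 0.5
```

Under these assumptions, overperforming a single metric can indeed yield more points than the difficulty coefficient assigned to it, which is why the experts' difficulty judgement (point 1) is the counterweight against trivial targets.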
3) Not only is splitting the 75 difficulty points a very difficult task; the whole evaluation of Stages II and III is quite innovative, and thus complex. But developing innovative approaches to comparative performance evaluation is a major objective of EuRoC. That is why we are going to ask all challengers for help in determining the difficulty points (more on this next week).
4) The general rules and procedures for evaluation were presented at the Benchmarking workshop in June 2015 and described in the document "Benchmarking Rules and Evaluation Procedures" (sent to all challengers). Moreover, I sent a clarification regarding the coolness factor in December 2015. The text of this e-mail is reported below:
"Coolness Factor:
Please note that the coolness of your freestyle is worth another 15 points, which you can earn in addition to the points you earn through the quantifiable objectives.
With regard to the coolness factor: as Prof. Behnke points out, this criterion is hard to quantify and the evaluation will be rather subjective.
The coolness factor will be judged solely by the independent experts on a scale from 0 to 15 points, using the videos you will shoot at the Challenge Host's facilities, the reports prepared by the Challenge Host and the Challenger Team, and the interviews with the Challenger Team.
As this criterion is the same for all teams, you don’t need to include it in the quantifiable objectives."
I hope this helps clarify all issues. Please feel free to ask if you need further clarification.
Please don't forget to send your revised deliverable for the quantifiable objectives of the freestyle task by Monday, March 28, in the evening. If you don't, we will assume that you are happy with the last version you sent us.
Best regards.
Fabrizio.
nunolau wrote: We would also like to have the complete specification of the Freestyle Evaluation Process.
We were surprised when we were informed that teams can actually have Achievements that are greater than one in a single metric, if they surpass the Target. In fact, we think this is a bad idea.
Now, it is not clear if the evaluation based on the quantifiable objectives/metrics will be saturated at 75 points or not.
Can a team theoretically achieve the 75 points from a single metric by obtaining an Achievement significantly higher than its Target? We were obliged to define at least 3 metrics, hence we find it strange that teams can get the maximum evaluation from a single metric, unless the maximum is unbounded (which is also strange, in our opinion).
The splitting of the 75 difficulty points is, in any case, a very difficult task, but we feel it is much more difficult if the Achievements can be greater than one.
We would also like to know how the coolness factor evaluation is merged into the final result, and whether there are any additional evaluation factors (interview? others?).
Could you provide a specific example on how the evaluation will be performed when a team obtains Achievements with a classification higher than 1?
Best,
TIMAIRIS