Google is accused of using novices to fact-check Gemini’s AI answers
There’s no denying that the AI still has some unreliable moments, but one would hope that at least its ratings are accurate. However, last week Google reportedly told contract workers evaluating Gemini not to skip any prompts, regardless of their expertise, TechCrunch reports based on internal guidelines it has seen. Google released a preview of Gemini 2.0 earlier this month.

Google has reportedly told GlobalLogic, an outsourcing company whose contractors evaluate AI-generated results, not to let reviewers skip prompts, even ones that are beyond their expertise. Previously, contractors could skip any question that was far outside their expertise – such as asking a doctor about the law. The guidelines stated: “If you do not have critical subject knowledge (e.g., programming, math) to evaluate this prompt, please skip this assignment.”

Now contractors have reportedly been instructed: “You should not skip prompts that require specific domain knowledge” and that they should “evaluate the parts of the prompt that you understand,” while adding a note that it is not a domain in which they have knowledge. The only cases where prompts can now be skipped are if a large portion of the information is missing or if they contain harmful content that requires special consent forms to evaluate.

One contractor aptly responded to the changes: “I thought the point of skipping was to increase accuracy by leaving it to someone better?”

Shortly after this article was first published, Google provided Engadget with the following statement: “Raters perform a variety of tasks across many different Google products and platforms. They provide valuable feedback not only on the content of the answers, but also on the style, format, and other factors. The ratings they provide do not directly influence our algorithms, but taken as a whole they are a helpful data point that helps us measure how well our systems are performing.”

A Google spokesperson also noted that the new language shouldn’t necessarily result in changes to Gemini’s accuracy, as raters are asked to specifically rate only the parts of the prompts that they understand. This could involve providing feedback on things like formatting issues, even if the reviewer doesn’t have specific expertise on the topic. The company also pointed to this week’s release of the FACTS Grounding benchmark, which can be used to check LLM responses to ensure “that they are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user requests.”

Update, December 19, 2024, 11:23 a.m. ET: This story has been updated with a statement from Google and more details about how its review system works.
