Don’t Ask AI Which Rocks You Can Lick

And other performance issues when it comes to language models and geology

By Julia Angwin

Sep 27, 2024

“What rocks can I lick?” is apparently a question geologists like Cate Larsen often get.

Her answer? Probably don’t lick any rocks, or minerals for that matter. “The only mineral you can and definitely should lick is halite, rock salt,” she said.

AI models, however, have their own strange ideas about when to put tongue to stone.

Larsen, also known as TikTok’s Groovy Geologist, used Proof’s AI testing software to run that question and other mineral-related inquiries through five leading AI models.

The project is part of an initiative at Proof, where we are making our investigative tools available to content creators to use for their own research in their own domains of expertise. Creators have used our tool to test AI responses to questions about Black History, software coding, and voting access.

Ingredients

Hypothesis

AI models will not provide up-to-date, accurate answers to questions commonly posed to geologists.

Sample size

A dozen frequently asked questions, including some pertaining to common misconceptions and geological conspiracy theories.

Techniques

Use Proof’s software to run queries on five leading AI models and check responses for accuracy, missing information, and relevance.

Key findings

Gemini consistently gave the most accurate, nuanced, and elaborate responses, while GPT-4’s brevity made it less useful to a person looking to understand scientific concepts.

Limitations

A small sample that is not necessarily representative of questions that geologists receive. It’s difficult to evaluate the helpfulness of a response.

Larsen found wide variation in the accuracy of the AI models’ responses. Even though Google made headlines when its AI tools wrongly suggested that people should eat a few rocks per day, Larsen said Google’s Gemini provided the most accurate responses in her tests.

For instance, when it came to rock licking, Gemini, Mistral’s Mixtral, and Anthropic’s Claude 3 generally recommended avoiding it, citing safety issues like “sharp edges” and “bacterial contamination” as deterrents.

OpenAI’s GPT-4, meanwhile, recommended cleaning rocks before tasting. And Meta’s Llama 3 listed several “safe to lick” options, including quartz and calcite, though it strongly recommended against licking mercury, arsenic, or uranium-rich rocks. OpenAI and Meta did not respond to Proof’s request for comment.

Larsen found similarly uneven quality when posing a variety of frequently asked geological questions to the models. She sat down with Proof to discuss her findings for the latest video in our Ingredients series.