South SFV Today

Sunday, December 22, 2024

AI's impact on survey integrity raises concerns among researchers

John Taylor, Professor of Economics at Stanford University and developer of the "Taylor Rule" for setting interest rates | Stanford University

Academics and researchers often use crowdsourcing platforms like Prolific or Amazon Mechanical Turk to recruit participants for large-scale surveys. These platforms pay participants, in cash or gift cards, to share demographic information and opinions. Prolific claims about 200,000 active users, vetted to ensure they are real people.

Despite this vetting process, there are indications that some participants may be using AI tools to complete survey questions. Janet Xu, an assistant professor at Stanford Graduate School of Business, noticed that certain responses appeared unusually polished and lacked the typical human snarkiness. This observation led her to investigate further with colleagues Simone Zhang from New York University and AJ Alvero from Cornell University.

Their study found that nearly one-third of Prolific users admitted to using large language models (LLMs) such as ChatGPT for at least some survey tasks. The research surveyed around 800 participants who had previously taken surveys on Prolific. About two-thirds said they had never used LLMs to answer open-ended questions, while a quarter acknowledged occasional use of AI assistants, primarily for help expressing their thoughts.

Concerns about authenticity were common among those who refrained from using AI tools. "So many of their answers had this moral inflection where it seems like [using AI] would be doing the research a disservice; it would be cheating," Xu noted.

The study also found demographic patterns in AI usage: newer users and those identifying as male, Black, Republican, or college-educated were more likely to report using AI writing assistance. Xu called the findings preliminary but significant, since uneven AI use across demographic groups could introduce bias into public opinion data.

To understand how human-crafted and AI-generated responses differ, the authors analyzed data from studies conducted before ChatGPT's release in November 2022, when open-ended answers were almost certain to be human-written. Those human responses typically contained more emotionally charged language than the neutral tone characteristic of LLMs.
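
To make that comparison concrete, here is a minimal TypeScript sketch of one lexicon-based way to measure emotionally charged language. The word list and the sample responses are invented for illustration; this is not the authors' actual analysis method.

```typescript
// Illustration only: score the rate of emotion-laden words in two sets of
// responses with a tiny hand-picked lexicon. The lexicon and the sample
// responses below are invented; the study's actual text analysis is not
// described in this article.
const EMOTION_WORDS = new Set([
  "love", "hate", "angry", "terrible", "awful", "amazing",
  "frustrating", "ridiculous", "annoying", "wonderful",
]);

// Fraction of a response's tokens that appear in the emotion lexicon.
function emotionRate(text: string): number {
  const tokens = text.toLowerCase().match(/[a-z']+/g) ?? [];
  if (tokens.length === 0) return 0;
  return tokens.filter((t) => EMOTION_WORDS.has(t)).length / tokens.length;
}

// Mean emotion rate across a set of responses.
const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Invented examples: a blunt human-style answer vs. a neutral LLM-style one.
const humanStyle = [
  "Honestly, the whole process was frustrating and kind of ridiculous.",
  "I love my neighborhood but the traffic is awful.",
];
const llmStyle = [
  "There are several factors to consider when evaluating this process.",
  "The neighborhood offers many amenities, though traffic is a concern.",
];

console.log("human-style:", mean(humanStyle.map(emotionRate)).toFixed(3));
console.log("LLM-style:", mean(llmStyle.map(emotionRate)).toFixed(3));
```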

Xu emphasized that while AI-generated responses might already exist in published studies, she does not believe they necessitate corrections or retractions yet. Instead, she suggests increased scrutiny on data quality by scholars and editors is warranted.

"We don’t want to make the case that AI usage is unilaterally bad or wrong," Xu said. She distinguished between scenarios where AI aids expression versus generating generic ideas—highlighting concerns over potential homogenization of human responses if overused.

Beyond academia, reliance on AI could skew perceptions in workplace diversity surveys by masking genuine issues with overly positive feedback.

The authors suggest discouraging LLM use through direct requests or technical measures such as blocking copying and pasting of text, as sketched below. They also advocate designing clearer survey questions, since confusing wording can push participants to seek outside help from tools like ChatGPT.
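
As one illustration of that countermeasure, here is a minimal sketch for a browser-based survey form. The element id "open-response" and the placeholder message are hypothetical, and blocking paste only raises the cost of copying text out of an AI chat window; it does not prevent retyping.

```typescript
// Minimal sketch, assuming a browser-based survey whose open-ended answer
// field is a textarea with the hypothetical id "open-response".
const field = document.getElementById("open-response") as HTMLTextAreaElement;

field.addEventListener("paste", (event: ClipboardEvent) => {
  event.preventDefault(); // discard the pasted text
  // Tell the respondent why, rather than failing silently (a real survey
  // would show a visible message; placeholder text only appears when empty).
  field.placeholder = "Please type your answer in your own words.";
});

// Dragging text into the field bypasses the paste event, so block that too.
field.addEventListener("drop", (event: DragEvent) => event.preventDefault());
```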

"A lot of the same general principles of good survey design still apply," Xu concluded, emphasizing their heightened importance today.
