The phrase is “vegetative electron microscopy”
And it looks more like a machine-translation error than anything else. Per the article, a dataset contained two instances of the phrase, created by bad OCR. Then, more recently, the bad phrase got tangled up with a near-typo: in Farsi, the words for “scanning” and “vegetative” are extremely similar. So when some Iranian authors used an LLM to translate their paper into English, it decided that since “vegetative electron microscopy” was apparently a valid term (it was in its training data, after all), that’s what they meant.
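For what it’s worth, the two Farsi words really are about one letter apart. A quick Python check, assuming the commonly cited forms from coverage of this story (I don’t read Farsi, so treat the words themselves as the article’s claim, not mine):

```python
# Commonly cited Farsi forms: "scanning" vs. "vegetative".
scanning = "روبشی"
vegetative = "رویشی"

# Compare character by character; they differ in exactly one letter
# (ب vs ی), which in print is largely a matter of dot placement.
diffs = [(a, b) for a, b in zip(scanning, vegetative) if a != b]
print(len(diffs), diffs)  # 1 [('ب', 'ی')]
```

An easy slip for a human typist, and an even easier one for a model choosing between two almost-identical strings.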
It’s not that entire papers were being invented from nothing by ChatGPT.
It’s been found in many papers though. Do they all have such excuses?
It probably is decently common to translate articles using ChatGPT, since translation is exactly the kind of thing a large language model gets used for, so that explanation does seem likely.
The lede is buried deep in this one. Yeah, these dumb LLMs got bad training data that persists to this day, but more concerning is the fact that some scientists are relying upon LLMs to write their papers. This is literally the way scientists communicate their findings to other scientists, lawmakers, and the public, and they’re using fucking predictive text like it has cognition and knows anything.
Sure, most (all?) of those papers got retracted, but those are just the ones that got caught. How many more are lurking out there with garbage claims fabricated by a chatbot?
Thankfully, science will inevitably sus those papers out eventually, as it always does, but it’s shameful that any scientist would be so fatuous as to put out a paper written by a dumb bot. You’re the experts. Write your own goddamn papers.
They were translating them, not actually writing them. Obviously it should have been caught by reviewers, but that’s not nearly as bad.
In some cases, it’s people who’ve done the research and written the paper who then use an LLM to give it a final polish. Often, it’s people who are writing in a non-native language.
Doesn’t make it good or right, but adds some context.
Sure, and I’m sympathetic to the baffling difficulties of English, but use Google Translate and ask someone who’s more fluent for help with the final polish (as a single suggestion). Trusting your work, trusting science to an LLM is lunacy.
It might be hard for them to find someone who is both fluent in English AND knows the field well enough to know that vegetative electron microscopy is not a thing. Most universities have one general translation help service, and science has a lot of weird, field-specific terms.
Google Translate uses the same approach as an LLM.
https://en.wikipedia.org/wiki/Google_Translate
https://en.wikipedia.org/wiki/Neural_machine_translation

So is DeepL:
https://en.wikipedia.org/wiki/DeepL_Translator
And before they used neural-network approaches, they used statistical approaches, which are subject to the same errors as a result of bad training data.
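Here’s a toy sketch of that failure mode. This is nothing like a real MT system, and the corpus entries are made up; it just shows how a frequency-based phrase table faithfully reproduces whatever garbage made it into the training data:

```python
from collections import Counter

# Toy statistical "translator": count which target phrase co-occurred
# with each source phrase in the training corpus, then always emit the
# most frequent one. If OCR corruption dominates the counts, it wins.
parallel_corpus = [
    ("src_phrase_1", "scanning electron microscopy"),
    # two corrupted training pairs are enough to outvote one clean pair:
    ("src_phrase_2", "vegetative electron microscopy"),
    ("src_phrase_2", "vegetative electron microscopy"),
    ("src_phrase_2", "scanning electron microscopy"),
]

phrase_table: dict[str, Counter] = {}
for src, tgt in parallel_corpus:
    phrase_table.setdefault(src, Counter())[tgt] += 1

def translate(src: str) -> str:
    # pick the highest-count target phrase for this source phrase
    return phrase_table[src].most_common(1)[0][0]

print(translate("src_phrase_2"))  # -> vegetative electron microscopy
```

Neural systems are a softer version of the same thing: they generalize better, but the objective is still “reproduce what the training data did.”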
> Thankfully, science will inevitably sus those papers out eventually, as it always does,
In the future, all search engines will have an option to ignore any results from 2022-20xx, the era of AI slop.
It’s the immediate takeaway I got from the headline, so I don’t feel like it’s buried deep.
Don’t use fucking AI to write scientific papers and the problem is solved. Wtf.
Let’s delve into the issue
So, all those research papers were written by AI? Huh.
No, they were not. AI was probably used for translation.