Unfortunately, this is not the blurb for a banned dystopic science fiction novel.

Librarians fear new penalties, even prison, as activists challenge books

It's not just Ron DeSantis' “War on Woke” in Florida that is interfering with core library work in the United States. Librarians in Missouri are also working in a climate of fear after their far-right state government passed vague laws that are ultimately aimed at criminalising queerness, and that criminalise basic library work as collateral damage.

Putting “Nothing About Us Without Us” into Practice in Small and Rural Libraries

Something more positive, this time from rural North America. A few case studies in the use of funding to improve accessibility in libraries, with examples of how libraries changed their original plans after consulting with community members with disabilities.

Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work

This is a pretty interesting preprint from December, looking at some of the more subtle problems with using large language models in the production of academic papers.

Due to the pressure put on scholars to publish, they may increasingly rely on automative solutions that are capable of producing fast yet potentially problematic papers

This is indeed a problem. I have thoughts on where those pressures come from, but that's for another time.

We find 104 scholarly works using our criteria and query (Author: “ChatGPT” OR “GPT-2” OR “GPT-3” OR “Generative AI” OR “Generative Pretrained Transformer” OR “OpenAI”).

Nice, simple. But also, “some people say”...
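For the curious, here's roughly what that selection criterion looks like if you imagine running it yourself over a pile of bibliographic records rather than through a search engine. To be clear, this is just an illustrative sketch – the record structure and field names are made up, and the authors ran this as a search query, not as code:

```python
# Illustrative sketch only: filter bibliographic records whose author list
# matches the paper's query terms. The record format here is hypothetical;
# the study ran this as a search-engine query, not as code like this.
AUTHOR_TERMS = {
    "ChatGPT",
    "GPT-2",
    "GPT-3",
    "Generative AI",
    "Generative Pretrained Transformer",
    "OpenAI",
}

def lists_ai_as_author(record: dict) -> bool:
    """Return True if any listed author matches one of the query terms."""
    return any(
        term.lower() in author.lower()
        for author in record.get("authors", [])
        for term in AUTHOR_TERMS
    )

# Hypothetical records, just to show the filter in action.
records = [
    {"title": "Example paper A", "authors": ["J. Smith", "ChatGPT"]},
    {"title": "Example paper B", "authors": ["A. Jones"]},
]

for record in filter(lists_ai_as_author, records):
    print(record["title"])  # -> Example paper A
```

Either way, the point is the same: the review only catches papers that list a generative AI tool as an author, which is exactly the limitation the authors flag next.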

We note that remnants of AI prompts left in manuscripts due to careless errors indicate that our systematic review leaves out articles that used a generative AI tool, but neglected to cite it as an author. We verify this claim via Google Scholar’s search engine, using the query as shown in Figure 2. We suspect that far more authors have used ChatGPT similarly, but will have edited out such apparent markers of direct AI use—which some may deem plagiarism, depending on journal policy.

A key point the authors make is that the norms of academia expect that any automation or computer model used in the research or the production of a paper should be openly available so that it can be verified and any suspected errors can be traced back to their source. But commercial LLMs are mostly black boxes:

if an AI-produced work is made public, yet the model used to generate that work is not, how do we judge the cause of the bias—is it attributable to the model or the user who provided a prompt to it?

Something I hadn't really considered before, and which is explored in the paper, is the concept of “inadvertent red teaming” – what happens when an LLM is given a prompt without enough context to produce a coherent answer. Even with a good prompt, LLMs are prone to just making stuff up (or “hallucinating”, as AI boosters like to sanitise it). With a prompt that provides inadequate context, however, they can easily go off the rails, and if researchers don't vet the output properly this can lead to highly problematic statements in papers.

A more subtle problem they identified is that if an LLM is given context about the prompter – even inadvertently, such as their name and position title in an ingested draft paper – it can bias the output in ways both obvious and subtle.

If you're interested in this sort of stuff, you can also read this paper desperately trying to make it ok to make up your research data. That seems to fly in the face of the foundational principles of science to me, but what would I know?


Libraries and Learning Links of the Week is published every week by Hugh Rundle.

Subscribe by following @fedi@lllotw.hugh.run on the fediverse or sign up for email below.