Libraries and Learning Links of the Week

We're done here

July 2, 2024

I started this weekly newsletter/blog/ActivityPub thingy for a few reasons.

My earlier attempt, Marginalia, had become shouty and attention-seeking. I wanted something more focussed: mostly links, minimal commentary. I also liked the idea of “forcing” myself to read at least a few work-related articles, posts or links each week, in order to have something to share.

Unfortunately, for the last several months this has started to come unstuck. I grew resentful of the thing I myself had created. I would start “cheating”, rifling through my Zotero library for the shortest thing I could read that was not completely unhinged. I wasn't proud of it, and began to doubt it had much value at all. It was making me stressed, and frankly creating a lot of negative feelings that weren't doing anything good.

So this is my last post here.

Of course, from time to time I will still want to share links to things I find interesting or important, perhaps with some commentary. If you think you'd be interested in a slightly wider range of things with likely lower but sporadic volume, you can subscribe to this via RSS or ActivityPub/Mastodon. Maybe I'll start blogging more again. We'll see.

Libraries and Learning Links of the Week is published every week by Hugh Rundle.

Subscribe by following @fedi@lllotw.hugh.run on the fediverse or sign up for email below.

Seeing like a mendacious datafied corporation

June 23, 2024

Stepping Down as Co-Chair of the National Information Standards Controlled Digital Lending Working Group

Jennie Rose Halperin on the bad faith and dishonesty of corporate publishers who deliberately wasted everyone's time in NISO's multi-year attempt to reach a consensus on controlled digital lending as a way everyone in the USA could comply with their copyright act.

One index, two publishers and the global research economy

An interesting historical overview of scholarly publishing and metrics.

It is little wonder that publishers seek to turn the blame on ‘predatory publishers’ or amplify media caricatures of Chinese ‘paper mills’. Fraud and malpractice get portrayed as an existential threat to the integrity of science, rather than the inevitable consequence of unequal resourcing and distorted reward structures.

Seeing Like a Data Structure

We’re now delegating the process of creating legibility to technology. Along the way, we’ve made it approximate: legible to someone or something else but not to the person who actually is in charge.

I'm not entirely sure about the conclusion, but this is a nice summary of the present way of being in the industrial world, and to some extent how we got here. It doesn't hurt that it uses one of my favourite books, Seeing Like a State, as the framework for thinking.

This Māori data centre won't be storing any fake journals

June 17, 2024

World-first data storage infrastructure solution built by Iwi Māori, for Iwi Māori

This rules.

Becoming a SoTL Scholar

The scholarship of teaching and learning (SoTL), a multidisciplinary field that focuses on systematic investigation in teaching and learning, is now over 30 years old. No longer just a grassroots movement of individual faculty committed to taking teaching and learning seriously, SoTL has become professionalized. Becoming a SoTL Scholar maps out what it looks like to be a SoTL scholar and how to get there by design.

How a widely used ranking system ended up with three fake journals in its top 10 philosophy list

I'm trying to be outraged by it, but universities bring this upon themselves:

Recently our philosophy faculty at Jagiellonian University in Kraków, like many institutions around the world, introduced a ranking of journals based on Elsevier’s Scopus database to evaluate the research output of its employees for awards and promotions. This database is also used by our institution in the hiring process.

Taxonomy and coding

June 10, 2024

On learning to code without mathematics

Jez Cope on the weird obsession many computer programming tutorials have with mathematics, and how it inhibits broader comfort and take-up of coding knowledge.

Program—PyCon AU 2024

Speaking of coding...

This year, PyCon AU is partnering with the Journal of Open Source Software (JOSS) to publish academic papers in association with PyCon AU submissions. Nice!

Defining Taxonomy Borders

Interesting piece on when to stop when creating a taxonomy from scratch.

(RAG and Boolean)

May 27, 2024

Retrieval Augmented Generation and academic search engines – some suggestions for system builders

Aaron Tay persists with his awful Blogger theme, but luckily his detailed expert posts are worth it. This one takes us through the ins and outs of scholarly search tools based on Retrieval Augmented Generation (RAG). This is the thing that drives the common “summary with sources” approach, which is sometimes claimed to be “free of hallucinations”. Tay explains why this isn't really true, and provides some interesting suggestions for how tools designed for academic researchers could be better designed.

Boolean is Dead AND I feel fine

Aaron Tay's post pairs well with this one from Mita Williams back in February. Here we get a crash course in the basics of LLM tokenisation, and an explanation of what Tay meant when he said boolean searching doesn't really work any more with AI-driven semantic search tools. Also, apparently DIALOG still exists??!!

Infrastructure – decentralised, centralised and recentralised

May 13, 2024

2024 Library Systems Report | American Libraries Magazine

Marshall Breeding's annual report is out for this year. It's basically a sober analysis, though Breeding's claim that there is robust competition at all levels of the library tech industry seems a little strong to me. Whilst it looks like perhaps there are multiple products in the LSP space, for example, OCLC is so far behind in the index space that there are really only two options for the knowledge graph sitting underneath these systems: Clarivate's Central Discovery Index, and the EBSCO Discovery Service. The impact of these huge indexes on library staffing and workflows is still playing out, but in public libraries it's already well established, and vendors are cashing in:

Last year, SirsiDynix introduced Outsourced System Administration Services, a new support model for select operations related to ILS administration. This premium service enables library workers to offload many routine tasks, such as loading bibliographic records, producing reports, and updating loan rules.

Watch for more deskilling I mean “offloading routine tasks”.

Decentralized Infrastructure for (Neuro)science

Jonny Saunders wrote and published this enormous treatise back in 2022 but I've only just managed to read it. It's pretty dense in parts, but if you're interested in some deep thinking about scholarly communication, it's definitely worth investing the time.

Only two links today because I'm sick, and it will take you a long time to read Jonny's piece.

On the distribution of word forms

May 6, 2024

Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?

Emily Bender's latest article with Chirag Shah is an interesting read for anyone interested in what librarians would call discovery tools, and what place generative AI might (but mostly doesn't) have in them. For those who insist that LLMs are generating “answers” based on “understanding” questions, Linguistics professor Bender has this to say:

Information is not knowledge, and that is even more true when the information is only information about the distribution of word forms.

As to the impact of “now with AI” web search:

An ecosystem is a collection of interdependent entities standing in relationship to each other. On the Web, one key type of relationship is that between information providers and information accessors. In this relationship, information accessors desire to find information sources they can trust; information providers desire to show themselves to be trustworthy. Synthetic media break these relationships of trust and trustworthiness, making it harder for people seeking to access information to find sources that are trustworthy—and eventually to be able to trust them even if they have found them.

Reimagining Cultural Heritage Data

This is a really interesting blog post from the Llyfrgell Genedlaethol Cymru (National Library of Wales) about using linked data for bilingual name authorities, via controlled synchronisation with Wikidata. It sounds cool, though the Shah/Bender paper above did remind me of the famous controversy when it was discovered that most of the articles in the Scots language Wikipedia were authored by an American teenager who didn't speak Scots. Small language communities need to be ever vigilant.

No one buys books

This is a fascinating article about what was learned about the mainstream commercial book industry from the Penguin Random House attempt to buy Simon & Schuster, and the subsequent US antitrust case. It seems that what many of us have suspected for a while is even worse: Not only do most authors never sell enough copies to earn anything beyond their advance (if they were lucky to get one), but even the big celebrities commanding bidding wars often don't pay them out. It's worth a read, even if it is somewhat depressing.

Safelinks are looting the small web

April 28, 2024

Kagi small web

I recently heard some unfortunate things about Kagi's lead developer, but you can use Kagi small web without contributing any funds to Kagi. You can check out their Small Web site with appropriately-old-timey-web-vibes and it will open a random “small web” site in an iframe immediately. You can scroll through random pages like this, or subscribe to the RSS feed, or access them in other ways.

While there is no single definition, “small web” typically refers to the non-commercial part of the web, crafted by individuals to express themselves or share knowledge without seeking any financial gain. This concept often evokes nostalgia for the early, less commercialized days of the web, before the ad-supported business model took over the internet (and we started fighting back!)

Kagi Small Web offers a fresh approach by promoting recently published content from the “small web.” We gather new content, published within the last week, from a handpicked list of blogs and surface it in multiple ways.

Safelinks are a fragile foundation for publishing

If you're reading this you are probably a librarian or tech person, so this blog post won't be news to you. And if you work in a large organisation you probably don't have any choice in how the email system is set up. But...

Here's my prediction. In the next five or so years, Microsoft is going to accidentally shut off *.safelinks.protection.outlook.com and a million copy-and-pasted links across the web are going to break.

The tl;dr on this is – if you're pasting a link from an email into something else, it's best to open the URL in a browser first, let it do all its redirects, (and remove all the tracking gunk), and then paste it into your document. Future readers will thank you.
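For anyone who would rather script that clean-up, here is a minimal sketch in Python. It rests on two assumptions: that the wrapped link carries the original address percent-encoded in a `url` query parameter, and that the short list of tracking parameters below is good enough for illustration. The function names are hypothetical, and unlike opening the link in a browser this does not follow any further redirects.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: an illustrative, incomplete set of tracking parameters.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid", "mc_cid", "mc_eid"}


def unwrap_safelink(link: str) -> str:
    """Return the wrapped target if `link` is a Safelinks URL, else `link` unchanged."""
    parts = urlsplit(link)
    host = (parts.hostname or "").lower()
    if not host.endswith(".safelinks.protection.outlook.com"):
        return link
    # parse_qs percent-decodes values, so the target comes back readable.
    targets = parse_qs(parts.query).get("url")
    return targets[0] if targets else link


def strip_tracking(link: str) -> str:
    """Drop query parameters that look like tracking identifiers."""
    parts = urlsplit(link)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    # Note: urlencode may normalise how the remaining values are escaped.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_link(link: str) -> str:
    """Unwrap a Safelinks URL (if it is one) and remove tracking parameters."""
    return strip_tracking(unwrap_safelink(link))
```

Passing a pasted link through `clean_link` before it goes into a document leaves ordinary URLs alone apart from removing the listed tracking parameters.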

As an aside, this also made me think about the DOI system and how if doi.org gets DDoS'd or fails for some other reason, most recent academic research will become a lot harder to locate.

They're looting the Internet

Ed Zitron sent this zinger out into the world a couple of weeks ago, and his follow up/companion piece (The man who killed Google Search) is decidedly more brutal. Not much in Zitron's post is new as such, but he lays out clearly what has happened to the “online experience” over the last 15 years and, to some extent, why.

We negotiate with Instagram or Facebook to see content from the people we chose to follow, because these platforms are no longer built to show us things that we want to see. We no longer “search” Google, but barter with a seedy search box to try and coax out a result that isn't either a search engine-optimized half-answer or an attempt to trick us into clicking an ad.

This is essentially the same thesis as Cory Doctorow's awkwardly-named “enshittification”, though I can't help but think both descriptions reveal some level of prior naivety:

The tradeoff was meant to be that these platforms would make creating and hosting this content easier, and help either surface it to a wider audience or to quickly get it to the people we cared about, all while making sure the conditions we created and posted it under were both interesting and safe for the user.

I mean sure, that was always the sales pitch from these companies. But there were people right at the very beginning who warned about the true nature of corporations and capitalism. It's certainly right and proper to point out that they are lying, making human societies demonstrably worse, and can't be trusted. But let's not buy in to the idea that this is a new development.

Intelligence, attention, and emotions on the Internet

April 22, 2024

Human intelligence: another abominable idea from the AI industry

Helen Beetham has been on fire lately, and this piece is particularly sharp. Beetham writes about how the “AI” industry has tried to redefine “human intelligence” in contrast to, or to justify, its idea of “machine intelligence”:

In the guise of starting from something self-evident, the project of ‘artificial intelligence’ in fact serves to define what ‘intelligence’ is and how to value it, and therefore how diverse people should be valued too. Educators have good reason to be wary of the concept of ‘intelligence’ at all, but particularly as a single scale of potential that people have in measurable degrees.

Beetham goes on to discuss the discourse we've all seen ramp up in the last year or two, where helpful “AI” will make us “more productive” and do our work for us while we supervise. She sees this for exactly what it is:

What these self-serving comparisons produce is a double bind for students and intellectual workers. Submit yourself to the pace, the productivity, the datafied routines of algorithmic systems, and at the same time ‘be more human’. Be more human, so that you can add value to the algorithms. Be more human, so more of your behaviour can be modelled and used to undercut your value. Be more human so that when AI fails to meet human needs, the work required to ‘fix’ it is clearly specified and can cheaply be fulfilled.

We've been here before. Fool me twice, shame on me.

Attention, moral skill, and algorithmic recommendation

This is a pretty interesting paper by two authors from ANU, in Philosophical Studies. They make the case for attention as a “moral skill”, and argue that how we pay attention is as important as whether, or on what, we do so.

Online platforms can direct us toward things we should not attend to just as easily as toward things we should. And even when they direct our attention to the right things, they may not do so in the right ways, to the right degrees, or for the right reasons.

I find their argument compelling and it seems to open up a lot of further interesting questions to explore. However, their conclusion was somewhat surprising and, to be honest, baffling. If AI-driven recommender systems are bad for our attention and moral health, according to these authors the solution is...more AI recommender systems, but built on generative AI running on your operating system. Fair to say it's not the conclusion I would have come to.

We Need To Rewild The Internet

On a somewhat similar theme, Maria Farrell and Robin Berjon explore rewilding as both a metaphor and a somewhat literal suggestion for repairing the dystopian disappointment that is the Internet – and more specifically the World Wide Web – in 2024.

I admit I was hooked with the early reference to James Scott's Seeing Like a State, one of the books that has most profoundly influenced how I think about the world, but just as I was intrigued by the idea of attention as a moral question, I wanted to know more about the Internet Oligarchy problem as an emotional one:

Rewilding the internet is more than a metaphor. It’s a framework and plan. It gives us fresh eyes for the wicked problem of extraction and control, and new means and allies to fix it. It recognizes that ending internet monopolies isn’t just an intellectual problem. It’s an emotional one. It answers questions like: How do we keep going when the monopolies have more money and power? How do we act collectively when they suborn our community spaces, funding and networks? And how do we communicate to our allies what fixing it will look and feel like?

What Farrell and Berjon are suggesting in this piece is some combination of legalist liberal-democratic power through enforcement of anti-monopoly laws, Lenin's concept of “dual power”, and anarchistic “building the new world in the shell of the old”. Not that they'd likely put it like that.

Unfortunately this is not the blurb for a banned dystopic science fiction novel

April 15, 2024

Librarians fear new penalties, even prison, as activists challenge books

It's not just Ron DeSantis' “War on Woke” in Florida that is interfering in core librarian work in the United States. Librarians in the state of Missouri are also working in a climate of fear after their far-right government passed vague laws that are ultimately aimed at criminalising queerness, and criminalise basic library work as collateral damage.

Putting “Nothing About Us Without Us” into Practice in Small and Rural Libraries

Something more positive, this time from rural North America. A few case studies in the use of funding to improve accessibility in libraries, with examples of how libraries changed their original plans after consulting with community members with disabilities.

Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work

This is a pretty interesting preprint from December, looking at some of the more subtle problems with using large language models in the production of academic papers.

Due to the pressure put on scholars to publish, they may increasingly rely on automative solutions that are capable of producing fast yet potentially problematic papers

This is indeed a problem. I have thoughts on where those pressures come from but that's for another time.

We find 104 scholarly works using our criteria and query (Author: “ChatGPT” OR “GPT-2” OR “GPT-3” OR “Generative AI” OR “Generative Pretrained Transformer” OR “OpenAI”).

Nice, simple. But also, “some people say”...

We note that remnants of AI prompts left in manuscripts due to careless errors indicate that our systematic review leaves out articles that used a generative AI tool, but neglected to cite it as an author. We verify this claim via Google Scholar’s search engine, using the query as shown in Figure 2. We suspect that far more authors have used ChatGPT similarly, but will have edited out such apparent markers of direct AI use—which some may deem plagiarism, depending on journal policy.

A key point the authors make is that the norms of academia expect that any automation or computer model used in the research or the production of a paper should be openly available so that it can be verified and any suspected errors can be traced back to their source. But commercial LLMs are mostly black boxes:

if an AI-produced work is made public, yet the model used to generate that work is not, how do we judge the cause of the bias—is it attributable to the model or the user who provided a prompt to it?

Something I hadn't really considered before that is explored in the paper is the concept of “inadvertent red teaming”, which refers to what happens when an LLM is given a prompt without enough context to produce a coherent answer. Even with a good prompt, LLMs are prone to just make stuff up (or “hallucinate”, as AI boosters like to sanitise it). With a prompt that provides inadequate context, however, they can easily go off the rails, and if researchers don't vet the output properly this can lead to highly problematic statements in papers.

A more subtle problem they identified is that if an LLM is given context about the prompter – even if this is inadvertent such as their name and position title in an ingested draft paper – it can bias the output in some obvious and some subtle ways.

If you're interested in this sort of stuff, you can also read this paper desperately trying to make it ok to make up your research data. Seems to fly in the face of the foundational principles of science to me, but what would I know?
