Libraries and Learning Links of the Week

Web links about libraries and learning, every week.

2024 Library Systems Report | American Libraries Magazine

Marshall Breeding's annual report is out for this year. It's basically a sober analysis, but Breeding's claim that there is robust competition at all levels of the library tech industry seems a little strong to me. Whilst there are arguably multiple products in the LSP space, for example, OCLC is so far behind in the index space that there are really only two options for the knowledge graph sitting underneath these systems: Clarivate's Central Discovery Index and the EBSCO Discovery Service. The impact of these huge indexes on library staffing and workflows is still playing out, but in public libraries it's already well established, and vendors are cashing in:

Last year, SirsiDynix introduced Outsourced System Administration Services, a new support model for select operations related to ILS administration. This premium service enables library workers to offload many routine tasks, such as loading bibliographic records, producing reports, and updating loan rules.

Watch for more deskilling, I mean, “offloading routine tasks”.

Decentralized Infrastructure for (Neuro)science

Jonny Saunders wrote and published this enormous treatise back in 2022 but I've only just managed to read it. It's pretty dense in parts, but if you're interested in some deep thinking about scholarly communication, it's definitely worth investing the time.


Only two links today because I'm sick, and it will take you a long time to read Jonny's piece.


Libraries and Learning Links of the Week is published every week by Hugh Rundle.

Subscribe by following @fedi@lllotw.hugh.run on the fediverse or sign up for email below.

Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?

Emily Bender's latest article with Chirag Shah is a worthwhile read for anyone interested in what librarians would call discovery tools, and what place generative AI might (but mostly doesn't) have in them. For those who insist that LLMs are generating “answers” based on “understanding” questions, linguistics professor Bender has this to say:

Information is not knowledge, and that is even more true when the information is only information about the distribution of word forms.

As to the impact of “now with AI” web search:

An ecosystem is a collection of interdependent entities standing in relationship to each other. On the Web, one key type of relationship is that between information providers and information accessors. In this relationship, information accessors desire to find information sources they can trust; information providers desire to show themselves to be trustworthy. Synthetic media break these relationships of trust and trustworthiness, making it harder for people seeking to access information to find sources that are trustworthy—and eventually to be able to trust them even if they have found them.

Reimagining Cultural Heritage Data

This is a really interesting blog post from the Llyfrgell Genedlaethol Cymru (National Library of Wales) about using linked data for bilingual name authorities, via controlled synchronisation with Wikidata. It sounds cool, though the Shah/Bender paper above did remind me of the famous controversy when it was discovered that most of the articles in the Scots-language Wikipedia were authored by an American teenager who didn't speak Scots. Small language communities need to be ever vigilant.

No one buys books

This is a fascinating article about what was learned about the mainstream commercial book industry from Penguin Random House's attempt to buy Simon & Schuster, and the subsequent US antitrust case. It seems that what many of us have suspected for a while is even worse than we thought: not only do most authors never sell enough copies to earn anything beyond their advance (if they were lucky enough to get one), but even the big celebrity books commanding bidding wars often don't earn out their advances. It's worth a read, even if it is somewhat depressing.


Kagi small web

I recently heard some unfortunate things about Kagi's lead developer, but you can use Kagi Small Web without contributing any funds to Kagi. Check out their Small Web site, with its appropriately old-timey web vibes, and it will immediately open a random “small web” site in an iframe. You can scroll through random pages like this, subscribe to the RSS feed, or access them in other ways.

While there is no single definition, “small web” typically refers to the non-commercial part of the web, crafted by individuals to express themselves or share knowledge without seeking any financial gain. This concept often evokes nostalgia for the early, less commercialized days of the web, before the ad-supported business model took over the internet (and we started fighting back!)

Kagi Small Web offers a fresh approach by promoting recently published content from the “small web.” We gather new content, published within the last week, from a handpicked list of blogs and surface it in multiple ways.

Safelinks are a fragile foundation for publishing

If you're reading this you are probably a librarian or tech person, so this blog post won't be news to you. And if you work in a large organisation you probably don't have any choice in how the email system is set up. But...

Here's my prediction. In the next five or so years, Microsoft is going to accidentally shut off *.safelinks.protection.outlook.com and a million copy-and-pasted links across the web are going to break.

The tl;dr on this is – if you're pasting a link from an email into something else, it's best to open the URL in a browser first, let it do all its redirects, remove all the tracking gunk, and then paste it into your document. Future readers will thank you.
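
That unwrapping step can also be done without visiting the link at all, because Safelinks wrappers carry the original destination in their `url=` query parameter. Here's a minimal Python sketch (standard library only; the function names and the sample URL are my own invention, and real Safelinks URLs carry extra parameters this ignores) that unwraps the wrapper and strips some common tracking parameters:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, parse_qsl, urlencode

def unwrap_safelink(url: str) -> str:
    """If `url` is an Outlook Safelinks wrapper, return the original
    destination URL it encodes; otherwise return `url` unchanged."""
    parts = urlsplit(url)
    if parts.hostname and parts.hostname.endswith("safelinks.protection.outlook.com"):
        target = parse_qs(parts.query).get("url")
        if target:
            return target[0]  # parse_qs already percent-decodes the value
    return url

# A non-exhaustive list of common tracking parameter prefixes.
TRACKING_PREFIXES = ("utm_", "mc_", "fbclid", "gclid")

def strip_tracking(url: str) -> str:
    """Drop common tracking query parameters (utm_*, fbclid, etc.)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(TRACKING_PREFIXES)]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Invented example of a wrapped, tracking-laden link.
wrapped = ("https://aus01.safelinks.protection.outlook.com/?"
           "url=https%3A%2F%2Fexample.org%2Fpost%3Futm_source%3Dnewsletter&data=05")
print(strip_tracking(unwrap_safelink(wrapped)))  # prints: https://example.org/post
```

Opening the link in a browser first remains the safer habit, since some publishers use redirect chains this sketch can't follow.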

As an aside, this also made me think about the DOI system: if doi.org gets DDoS'd or fails for some other reason, most recent academic research will become a lot harder to locate.

They're looting the Internet

Ed Zitron sent this zinger out into the world a couple of weeks ago, and his follow up/companion piece (The man who killed Google Search) is decidedly more brutal. Not much in Zitron's post is new as such, but he lays out clearly what has happened to the “online experience” over the last 15 years and, to some extent, why.

We negotiate with Instagram or Facebook to see content from the people we chose to follow, because these platforms are no longer built to show us things that we want to see. We no longer “search” Google, but barter with a seedy search box to try and coax out a result that isn't either a search engine-optimized half-answer or an attempt to trick us into clicking an ad.

This is essentially the same thesis as Cory Doctorow's awkwardly named “enshittification”, though I can't help but think both descriptions reveal some level of prior naivety:

The tradeoff was meant to be that these platforms would make creating and hosting this content easier, and help either surface it to a wider audience or to quickly get it to the people we cared about, all while making sure the conditions we created and posted it under were both interesting and safe for the user.

I mean sure, that was always the sales pitch from these companies. But there were people right at the very beginning who warned about the true nature of corporations and capitalism. It's certainly right and proper to point out that they are lying, making human societies demonstrably worse, and can't be trusted. But let's not buy in to the idea that this is a new development.


Human intelligence: another abominable idea from the AI industry

Helen Beetham has been on fire lately, and this piece is particularly sharp. Beetham writes about how the “AI” industry has tried to redefine “human intelligence” in contrast to, or to justify, its idea of “machine intelligence”:

In the guise of starting from something self-evident, the project of ‘artificial intelligence’ in fact serves to define what ‘intelligence’ is and how to value it, and therefore how diverse people should be valued too. Educators have good reason to be wary of the concept of ‘intelligence’ at all, but particularly as a single scale of potential that people have in measurable degrees.

Beetham goes on to discuss the discourse we've all seen ramp up in the last year or two, where helpful “AI” will make us “more productive” and do our work for us while we supervise. She sees this for exactly what it is:

What these self-serving comparisons produce is a double bind for students and intellectual workers. Submit yourself to the pace, the productivity, the datafied routines of algorithmic systems, and at the same time ‘be more human’. Be more human, so that you can add value to the algorithms. Be more human, so more of your behaviour can be modelled and used to undercut your value. Be more human so that when AI fails to meet human needs, the work required to ‘fix’ it is clearly specified and can cheaply be fulfilled.

We've been here before. Fool me twice, shame on me.

Attention, moral skill, and algorithmic recommendation

This is a pretty interesting paper by two authors from ANU, in Philosophical Studies. They make the case for attention as a “moral skill”, and argue that how we pay attention is as important as whether, or on what, we do so.

Online platforms can direct us toward things we should not attend to just as easily as toward things we should. And even when they direct our attention to the right things, they may not do so in the right ways, to the right degrees, or for the right reasons.

I find their argument compelling, and it seems to open up a lot of further interesting questions to explore. However, their conclusion was somewhat surprising and, to be honest, baffling. If AI-driven recommender systems are bad for our attention and moral health, according to these authors the solution is... more AI recommender systems, but built on generative AI running on your operating system. Fair to say it's not the conclusion I would have come to.

We Need To Rewild The Internet

On a somewhat similar theme, Maria Farrell and Robin Berjon explore rewilding as both a metaphor and a somewhat literal suggestion for repairing the dystopian disappointment that is the Internet – and more specifically the World Wide Web – in 2024.

I admit I was hooked by the early reference to James Scott's Seeing Like a State, one of the books that has most profoundly influenced how I think about the world, but just as the previous paper intrigued me with the idea of attention as a moral question, here I wanted to know more about the internet oligarchy problem as an emotional one:

Rewilding the internet is more than a metaphor. It’s a framework and plan. It gives us fresh eyes for the wicked problem of extraction and control, and new means and allies to fix it. It recognizes that ending internet monopolies isn’t just an intellectual problem. It’s an emotional one. It answers questions like: How do we keep going when the monopolies have more money and power? How do we act collectively when they suborn our community spaces, funding and networks? And how do we communicate to our allies what fixing it will look and feel like?

What Farrell and Berjon are suggesting in this piece is some combination of legalist liberal-democratic power through enforcement of anti-monopoly laws, Lenin's concept of “dual power”, and anarchistic “building the new world in the shell of the old”. Not that they'd likely put it like that.


Librarians fear new penalties, even prison, as activists challenge books

It's not just Ron DeSantis' “War on Woke” in Florida that is interfering in core librarian work in the United States. Librarians in the state of Missouri are also working in a climate of fear after their far-right government passed vague laws that are ultimately aimed at criminalising queerness, and criminalise basic library work as collateral damage.

Putting “Nothing About Us Without Us” into Practice in Small and Rural Libraries

Something more positive, this time from rural North America. A few case studies in the use of funding to improve accessibility in libraries, with examples of how libraries changed their original plans after consulting with community members with disabilities.

Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work

This is a pretty interesting preprint from December, looking at some of the more subtle problems with using large language models in the production of academic papers.

Due to the pressure put on scholars to publish, they may increasingly rely on automative solutions that are capable of producing fast yet potentially problematic papers

This is indeed a problem. I have thoughts on where those pressures come from but that's for another time.

We find 104 scholarly works using our criteria and query (Author: “ChatGPT” OR “GPT-2” OR “GPT-3” OR “Generative AI” OR “Generative Pretrained Transformer” OR “OpenAI”).

Nice, simple. But also, “some people say”...

We note that remnants of AI prompts left in manuscripts due to careless errors indicate that our systematic review leaves out articles that used a generative AI tool, but neglected to cite it as an author. We verify this claim via Google Scholar’s search engine, using the query as shown in Figure 2. We suspect that far more authors have used ChatGPT similarly, but will have edited out such apparent markers of direct AI use—which some may deem plagiarism, depending on journal policy.

A key point the authors make is that the norms of academia expect that any automation or computer model used in the research or the production of a paper should be openly available so that it can be verified and any suspected errors can be traced back to their source. But commercial LLMs are mostly black boxes:

if an AI-produced work is made public, yet the model used to generate that work is not, how do we judge the cause of the bias—is it attributable to the model or the user who provided a prompt to it?

Something I hadn't really considered before that is explored in the paper is the concept of “inadvertent red teaming”, which refers to what happens when an LLM is given a prompt without enough context to produce a coherent answer. Even with a good prompt, LLMs are prone to just make things up (or “hallucinate”, as AI boosters like to sanitise it). With a prompt that provides inadequate context, however, they can easily go off the rails, and if researchers don't vet the output properly this can lead to highly problematic statements in papers.

A more subtle problem they identified is that if an LLM is given context about the prompter – even if this is inadvertent such as their name and position title in an ingested draft paper – it can bias the output in some obvious and some subtle ways.

If you're interested in this sort of stuff, you can also read this paper desperately trying to make it ok to make up your research data. That seems to fly in the face of the foundational principles of science to me, but what would I know?


Growing Up: A Maturity Model for Open Education

This model from Pressbooks is an interesting way to think about where your institution is at with open education efforts.

I found it particularly helpful as a reminder that whilst a lot of the ideas held within open education are very old, as a conceptual model unto itself it is still quite new. At my work we run one of the oldest open book publishing operations in Australian universities, yet it's only ten years old. No wonder we feel like we're still working out what it is we're even trying to do...

Choice overload: Finding the right tool for the job (conference)

A great piece from Sae Ra Germaine about choosing a platform for online or hybrid conferences. I've spoken with Sae Ra a lot over the years about what matters in running conferences, and I really like her focus on the three important things: people to speak, people to participate, and somewhere for them to come together. One might argue that this could be simplified to “a community that wants to meet, and a place and time for them to meet”. A lot of conferences have become huge behemoths, with conference committees focussing on bells and whistles at great expense, but the basics are pretty simple.

Apple's Vision + The Cost of Forever

I share this not so much for the Apple part, but rather for the “cost of forever”. Dan Cohen (Dean of the Northeastern University Library) shares some numbers and some provocations about how to spend a mythical US$200 million. I guess all these figures are dependent on how we define “forever”.


A Mismatched Group of Items That I Would Not Find Particularly Interesting: Challenges and Opportunities with Digital Exhibits and Collections Labels

This week's first article comes from Evidence Based Library and Information Practice, and I admit I mostly chose to read it based on the amusing title. It turns out to be a great practical study of a problem that comes up a lot in my work: what to name menu items in a way that is neither too vague nor too specific and jargony. I'll be reading this one again to remind myself of the interesting combination of card sorts and surveys they used, and the simplicity of their study design (send out a link in regular newsletters that already go to the target audience).

Hidden Inequities of Access: Document Accessibility in an Aggregated Database

Next up, a horrifying article with an unfortunately vague title. By “horrifying” I mean the results, rather than the paper itself. The study is limited in scope and uses a random sample, but it seems very likely that the findings would be simple to replicate in other corpora.

This is a study of the accessibility (in the disability sense) of a selection of highly-cited journals from EBSCO’s Library & Information Source database. It's absolutely damning of a profession that claims to champion equitable access to information.

Fewer than half (48%) of the articles overall included an HTML format option.

Which is a pretty big problem since well-marked-up HTML is about the most accessible format you can use when it comes to screen readers. But it somehow gets worse:

Findings from the audit of PDF accessibility showed that 100% of the PDF articles (N= 120) from this study’s original sample failed the minimum standard of PDF/UA accessibility of containing a tagged structure.

That is, the entire sample of article PDFs was inaccessible without running them through third-party software to attempt remediation. As the authors state, this is not the fault of EBSCO but rather of the journal publishers. The irony of this article is that it is only available in PDF format – ITAL doesn't publish articles in HTML.
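
For context, the “tagged structure” the auditors looked for is declared in a PDF's document catalog: a tagged PDF sets /Marked to true in its /MarkInfo dictionary and carries a /StructTreeRoot pointing at the structure tree that screen readers rely on. A toy sketch of that check in Python (operating on an already-parsed catalog dictionary rather than a real file, so the PDF parsing library is left out; the sample data is invented):

```python
def looks_tagged(catalog: dict) -> bool:
    """Rough pre-check for tagging: a tagged PDF declares /MarkInfo
    with /Marked set to true, and carries a structure tree root."""
    mark_info = catalog.get("/MarkInfo") or {}
    return bool(mark_info.get("/Marked")) and "/StructTreeRoot" in catalog

# Invented sample catalogs, mimicking what a PDF parser would return.
tagged = {"/MarkInfo": {"/Marked": True}, "/StructTreeRoot": "<ref>"}
untagged = {}  # roughly the state of all 120 PDFs in the study's sample

print(looks_tagged(tagged), looks_tagged(untagged))  # prints: True False
```

Passing this check is only the minimum bar the study measured; full PDF/UA conformance also requires the tags to be complete and correct, which no quick structural test can verify.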

Taking Control of Our Data: A Discussion Paper on Indigenous Data Governance for Aboriginal and Torres Strait Islander People and Communities

Finally, an interesting paper from the Lowitja Institute. This helped clarify for me what is actually meant by “Indigenous Data Sovereignty”, “Indigenous Data Governance”, and how they are different. I really like the way this discussion paper is framed in part as a practical guide for local Aboriginal and Torres Strait Islander communities and organisations. The case studies and planning guides make the theory more tangible.


Ending profiteering from publicly-funded research

This is a mildly interesting and somewhat confused report from the Australia Institute. It's interesting in that the absurdity of scholarly publishing is now becoming a salient public policy debating point in Australia. Unfortunately the report itself offers up a fairly standard grab-bag of “solutions” that don't address the problems it accurately identifies, and don't address the incentives and pressures on researchers that drive the current system. It's all very well to identify that RELX and friends are “greedy” (they would say “maximising shareholder value”), but “greedy publishers” isn't much of an analysis of what drives the system.

Producing more but understanding less: The risks of AI for scientific research

May contain traces of the infamous “Giant rat dick” illustration. You have been warned.

Many AI tools reawaken the myth that there can be an objective standpoint-free science in the form of the “objective” AI. But these AI tools don't come from nowhere. They're not a view from nowhere. They're a view from a very particular somewhere. And that somewhere embeds the standpoint of those who create these AI tools: a very narrow set of disciplinary expertise—computer scientists, machine learning experts. Any knowledge we ask from these tools is reinforcing that single standpoint, but it's pretending as if that standpoint doesn't exist.

Indexing the information age / The birth of our system for describing web content

Over a weekend in 1995, a small group gathered in Ohio to unleash the power of the internet by making it navigable


Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles

From the Journal of Librarianship and Scholarly Communication. There are some limitations to this article, but these limitations mostly serve to identify the difficulties of discovery when it comes to archiving scholarly literature: the study didn't consider institutional archives, presumably because there's no easy way to know whether any given article is stored in one. The same problem would confront anyone wanting to find an archived copy of something when the DOI no longer resolves. The tl;dr is right there in the title (for which the authors should be congratulated).

Some things to consider when deciding whether to start building with “AI” in libraries and archives

Ed Summers said elsewhere that he got some pushback from colleagues for this short talk, but I'm personally very grateful that he published it, as it will probably form part of the basis for work I'll be doing in the first half of this year on a framework for assessing the various “AI” discovery tools that will increasingly infest librarianship and academia.

Call for Proposals for JLSC Special Issue: Open Access: Diverse Experiences and Expectations

I didn't mean to have two links related to JLSC but it just kind of happened. Anyway, proposals are due by 5 April.


LEARNING LESSONS FROM THE CYBER-ATTACK; British Library cyber incident review

On Friday/Saturday the British Library released a “lessons learned” report into the October ransomware attack that severely damaged their ability to function and is still affecting operations. Whilst understandably a lot of detail is missing, the report is a useful document for anyone interested in cultural memory institutions, government services, organisational cyber security, or public policy.

I've seen a few takes on this on Mastodon, mostly along the lines of blaming the BL for poor security practices and culture. The report admits that their culture was not as security-focussed as it should have been; however, I have a different view on this. I'll possibly provide my own lukewarm take in blog form at some point, but in essence I think this highlights a significant problem within all cultural institutions: a clash of organisational cultures and values between a short-term-focussed, future-looking information technology industry and a primarily long-term-focussed, past-looking knowledge management profession. I'm probably betraying my bias with how I describe the two, but one doesn't have to view one side of the relationship as inherently better or smarter to understand how this difference in outlook causes problems.

Artificial Intelligence Blog Series: Introducing Our AI Metadata Generator

Wait, did somebody say “tech bros”? In hindsight, we probably should be surprised that it took Ex Libris this long to release an “AI-generated catalogue records” product. They're being cautious with it initially, pitching it as an “enhancement” tool, but I'd say it's pretty clear where they want this to go.

I think it's significant that this product is aimed at the “Alma Community Zone” (i.e. library-created records) rather than vendor-supplied records. The latter are by far the most complained-about and the most likely to be completely borked “at scale”, as they like to say, but there's more long-term money in convincing libraries that they don't need to hire cataloguers any more.

Australia’s chief scientist takes on the journal publishers gatekeeping knowledge

Rounding out our week of depressing libraries and learning news, Chief Scientist Cathy Foley has come up with a great idea for increasing the profits of the North-Atlantic investment companies holding scientific knowledge ransom. According to the Grauniad she will “take on” the journal publishers by the cunning wheeze of offering them a big cheque to keep operating in exactly the same way but with more of the Australian public's money, paid to them directly instead of being laundered via universities. Apparently Elsevier couldn't say “yes” fast enough. Read to the end to get some good takes from people who actually understand what's at stake here and try to ignore Dr Foley's patronising comments about it being “threatening for some”. The entrenched interests here are the people whose pockets she wants to fill with your money.

