LLLotW 2022.11

AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability

This is a really interesting article about the shady ethical behaviour of... well, basically every machine learning company that exists. The crux of the argument is that commercial “AI” companies are abusing academic research arrangements in order to “datawash” inputs that are otherwise unavailable for commercial use.

This academic-to-commercial pipeline abstracts away ownership of data models from their practical applications, a kind of data laundering where vast amounts of information are ingested, manipulated, and frequently relicensed under an open-source license for commercial use.

And more personally for the author:

I was happy to let people remix and reuse my photos for non-commercial use with attribution, but that’s not how they were used. Instead, academic researchers took the work of millions of people, stripped it of attribution against its license terms, and redistributed it to thousands of groups, including corporations, military agencies, and law enforcement.

open/ed Review project

Last week I linked to the DOERS3 Open Education in Tenure and Promotion Case Studies. This page from the open/ed group assists with another common problem for open education advocates: a widespread belief amongst educators that “you get what you pay for” and therefore OERs cannot possibly be high quality.

A recent nationally representative survey of 2,144 faculty members in the United States found that “most faculty remain unaware of OER” (Babson Survey, 2014)... This same survey found that college professors rate “proven efficacy” and “trusted quality” as the two most important criteria for selecting teaching resources. Thus we believe that for OER to gain traction it is important to gather empirical research demonstrating its efficacy and quality... To this end, we have gathered articles that focus on the efficacy of OER or teacher/student perceptions of such resources in actual practice.

Shared Vocabularies Create Oceans of Opportunities

From plant taxonomy to disease classification, science depends on precise language and referencing. Finding evidence-based solutions to the grand societal challenges of this century requires that scientists use shared scientific concepts to pool their work. This enables them to aggregate vast amounts of data from multiple sources, often from multiple disciplines and domains, and from countries where differing languages are spoken. Clearly, unless a data collection is tagged using globally agreed terms, it cannot be part of the global web of information systems necessary for tackling challenges such as climate change.

An interesting piece from the ARDC. Read it, and then reflect on why “shared” vocabularies aren't always more desirable than localised knowledge systems.


Libraries and Learning Links of the Week is published every Thursday by Hugh Rundle. If you like email newsletters you might also like Marginalia, a monthly commentary on things I've read and listened to more broadly.