This is a guide brewed from conversations with initiatives in London
(UK), Berlin (DE), Prague (CZ), Philadelphia (USA), Rotterdam (NL),
Vienna (AT), lutruwita (Tas/AU), County Mayo (IE) and a community from
Middle America gathering on servers. This text can support you when
starting a permacomputing collective. We use fermenting as a metaphor.
It isn’t a strict recipe but more of a loose framework that can be
freely modified to suit local tastes and conditions. Many actions are
cyclical and can be seen as opportunities to revisit or re-purpose
later.
Located at 311 North 20th Street (at Wood Street), one block north of
the Ben Franklin Parkway, directly behind the Central Library on Logan
Square, and just five minutes from Routes 95 and 76, is a used bookstore
staffed by a team of paid and volunteer booksellers that offers gently
used, publicly donated books, records, and CDs at reduced prices. Now
the new home of Chris’ Kids Corner for children’s books and story hours,
the store also features an eclectic stock of fiction, non-fiction,
contemporary titles, classics, out-of-print books, first editions, art,
architecture, and crafts.
The AI Action Summit takes place in Paris from 6 to 11 February 2025.
It is an occasion to recall the existence of a toolkit for exposing the
blind spots in discourse about digital technology. Built around the
concept of the "digital unthought" (impensé numérique), these
interpretive tools are valuable for keeping a cool head in the face of
the irruption of generative AI into our daily lives.
We analyzed 470 open-source GitHub pull requests, including 320
AI-co-authored PRs and 150 human-only PRs, using CodeRabbit’s structured
issue taxonomy. Every finding was normalized to issues per 100 PRs and
we used statistical rate ratios to compare how often different types of
problems appeared in each group.
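The normalization and comparison described above are simple to sketch. The per-category counts below are hypothetical stand-ins, not CodeRabbit's actual data; the point is only to show how a rate ratio is formed:

```python
# Compare how often an issue category appears in AI-co-authored
# vs. human-only PRs, normalized to issues per 100 PRs.

def issues_per_100_prs(issue_count: int, pr_count: int) -> float:
    """Normalize a raw issue count to a per-100-PR rate."""
    return 100.0 * issue_count / pr_count

def rate_ratio(ai_issues: int, ai_prs: int,
               human_issues: int, human_prs: int) -> float:
    """Ratio > 1 means the category is more common in AI-co-authored PRs."""
    return (issues_per_100_prs(ai_issues, ai_prs)
            / issues_per_100_prs(human_issues, human_prs))

# Hypothetical counts for one issue category: 96 findings across the
# 320 AI-co-authored PRs, 30 across the 150 human-only PRs.
ai_rate = issues_per_100_prs(96, 320)     # 30.0 issues per 100 PRs
human_rate = issues_per_100_prs(30, 150)  # 20.0 issues per 100 PRs
print(rate_ratio(96, 320, 30, 150))       # 1.5: more common in AI PRs
```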
The results? Clear, measurable, and consistent with what many developers
have been feeling intuitively: AI accelerates output, but it also
amplifies certain categories of mistakes.
Father Mother Sister Brother is a 2025 comedy-drama anthology film
written and directed by Jim Jarmusch. It follows three estranged family
relationships in three different countries around the world, starring an
ensemble cast that includes Tom Waits, Adam Driver, Mayim Bialik,
Charlotte Rampling, Cate Blanchett, Vicky Krieps, Sarah Greene, Indya
Moore, and Luka Sabbat.
Columnar storage formats are the foundation for modern data analytics
systems. The proliferation of open-source file formats (e.g., Parquet,
ORC) allows seamless data sharing across disparate platforms. However,
these formats were created over a decade ago for hardware and workload
environments that are much different from today. Although these formats
have incorporated some updates to their specification to adapt to these
changes, not all deployments support those modifications, and too often
systems cannot overcome the formats’ deficiencies and limitations
without a rewrite. In this paper, we present the Future-proof File
Format (F3) project. It is a next-generation open-source file format
with interoperability, extensibility, and efficiency as its core design
principles. F3 obviates the need to create a new format every time a
shift occurs in data processing and computing by providing a data
organization structure and a general-purpose API to allow developers to
add new encoding schemes easily. Each self-describing F3 file includes
both the data and meta-data, as well as WebAssembly (Wasm) binaries to
decode the data. Embedding the decoders in each file requires minimal
storage (kilobytes) and ensures compatibility on any platform in case
native decoders are unavailable. To evaluate F3, we compared it against
legacy and state-of-the-art open-source file formats. Our evaluations
demonstrate the efficacy of F3’s storage layout and the benefits of
Wasm-driven decoding.
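The self-describing idea (data, metadata, and a decoder binary travelling in one file) can be illustrated with a toy length-prefixed container. This is purely a sketch of the concept, not F3's actual on-disk layout:

```python
import struct

# Toy self-describing file: metadata, an embedded decoder binary, and the
# data travel together, each preceded by a 4-byte little-endian length.
# NOT F3's real layout; it only illustrates "everything needed to read
# the file is inside the file".

def pack_self_describing(data: bytes, metadata: bytes,
                         decoder_wasm: bytes) -> bytes:
    parts = []
    for blob in (metadata, decoder_wasm, data):
        parts.append(struct.pack("<I", len(blob)))  # length prefix
        parts.append(blob)
    return b"".join(parts)

def unpack_self_describing(buf: bytes):
    out, off = [], 0
    for _ in range(3):
        (n,) = struct.unpack_from("<I", buf, off)
        off += 4
        out.append(buf[off:off + n])
        off += n
    metadata, decoder_wasm, data = out
    return metadata, decoder_wasm, data

f = pack_self_describing(b"\x01\x02\x03",
                         b'{"encoding":"toy-rle"}',  # invented metadata
                         b"\x00asm")                 # Wasm magic only
meta, wasm, data = unpack_self_describing(f)
print(meta)  # b'{"encoding":"toy-rle"}'
```

In the real design the embedded Wasm module would be instantiated by a runtime and asked to decode the data blob when no native decoder is available.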
Tsing and Zhou instead prefer to describe FUNGI: Anarchist Designers as
an exhibition of ‘anti-design’. In this show, fungi are not passive
building materials, but rather anarchic co-designers – unconscious and
uncontrollable – of a world that can only exist through alliances
between humans and other living things. The exhibition reveals how fungi
manifest themselves at different scales, from sick frogs and the
dishwasher in your kitchen, to hospital beds, termite mounds, the human
digestive system, coffee, banana and conifer plantations, and the
jungle.
Here’s what I keep coming back to: in any mature system, most of the
graph will be memories of memories. You ask me my favorite restaurants,
I think about it, answer, and now “that list I made” becomes its own
retrievable thing. Next time someone asks about dinner plans, I don’t
re-derive preferences from first principles. I remember what I concluded
last time. Psychologists say this is how human recall actually works;
you’re not accessing the original, you’re accessing the last retrieval.
Gets a little distorted each time.
In 2023, I returned to Taiwan for further social science research
focused on environmental activism in response to the pollution caused by
Formosa Plastics, one of the biggest plastics companies in the world,
headquartered in Taiwan. One thread of research upon my return extended
an initiative that I’ve come to call “Archiving Formosa Plastics”,
designed to study environmental justice and governance issues related to
the company across diverse settings, supporting various research,
advocacy, and teaching endeavours.
The project started after getting to know an anti-Formosa activist in
Texas named Diane Wilson. Since the Texas Formosa Plant started
operations in the late 1980s, Wilson has collected many kinds of
documents about the plant, eventually needing a large barn to store them
all. I had many questions about these documents: When and how did these
various documents gain and lose political value? Who would take care of
and steward these documents into the future?
A couple of days ago I wrote about ActivityPub becoming a W3C
Recommendation. This was one output of the Social Working Group, and the
blogpost was about my experiences, and most of my experiences were on my
direct work on ActivityPub. But the Social Working Group did more than
ActivityPub: on the same day it also published WebSub, a useful piece of
technology in its own right, one which among other things plays a
significant role in ActivityPub’s history (though it is not used by
ActivityPub itself). The group has also published several documents
which are not compatible with ActivityPub at all and yet appear to play
the same role. To outsiders this may appear confusing, but there are
reasons, which I will go into in this post.
We are part of a growing global effort to ensure innovative technologies
serve Indigenous communities and their environmental priorities. As AI
rapidly becomes even more ubiquitous, Western science should look to
Indigenous experts to guide the development of ethical AI tools for
conservation in ways that assert their own goals, priorities and
cautions.
Responsible AI that benefits Indigenous communities and conservation
must implement Indigenous data sovereignty principles, and determine how
and if Indigenous traditional ecological knowledge should be
incorporated into these systems. AI that is co-designed with Indigenous
partners, rather than for them, can make these technologies more
accessible, culturally appropriate, and aligned with community goals.
A critical review of Generative AI for Indigenous-focused organizations,
governments, and anyone who wants to better understand and protect their
rights and privacy.
Tramas is an action-oriented research project of the Decolonial Feminist
Coalition for Digital and Environmental Justice. By examining case
studies and testimonies of resistance across Latin America, we aim to
show how the digital technology production chain has a wide range of
socio-environmental impacts on communities and their territories.
All the “Discord-only” communities that currently exist will likely
disappear, however. And unlike all the dead phpBB forums that nobody
uses anymore but are still up for now, dead Discord communities will
never appear on the Wayback Machine. A decade or more of the internet’s
history gone. Sure, you can’t easily google search the Wayback Machine,
but at least it still exists somewhere. Discords? Truly lost forever.
Entire hobby communities that spent years of work discussing things,
answering questions, a wealth of information… gone.
Indigenous math isn’t just about numbers and equations, it involves
culture, spirituality and more. Math professor Edward Doolittle, a
Mohawk from Six Nations in Ontario, sees math as something embedded in
Creation itself. In his Hagey Lecture at the University of Waterloo, he
describes Indigenous mathematics as being grounded in cognition,
emotion, the physical world and community. Indigenizing math, Doolittle
hopes, will make it more approachable and meaningful to Indigenous
students — show them how entwined it is with everyday life and something
much bigger than ourselves.
The Man Who Knew Infinity is a 2015 British biographical drama film
about the Indian mathematician Srinivasa Ramanujan, based on the 1991
book of the same name by Robert Kanigel.
While AI is undeniably good at writing code, it remains poor at
architecting maintainable, distributable, and scalable systems. This is
where non-technical leaders who think they can fire their development
teams are making a significant mistake. Until we see the arrival of an
artificial intelligence that renders this entire discussion moot,
believing that technical expertise can be replaced by a prompt is a
strategic error. Building robust software still requires a human who
understands the underlying principles of the craft.
Now we are faced with the conundrum of figuring out which ones will
manage to reach broad adoption in the ecosystem. From where I’m standing
it seems easier to contribute the missing bits to Parquet and build
those systems on top. These formats prove that these techniques work. We
can get much better performance by applying new approaches. As a
community, we should take a hint and evolve what we have.
But like I said: the whenwords library contains no code. Instead,
whenwords contains specs and tests, specifically:
SPEC.md: A detailed description of how the library should behave and how it should be implemented.
tests.yaml: A list of language-agnostic test cases, defined as input/output pairs, that any implementation must pass.
INSTALL.md: Instructions for building whenwords, for you, the human.
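The input/output-pair approach can be exercised with a tiny harness. Everything below is invented for illustration (the case shapes, the `render` function, the outputs); the real behavior is whatever SPEC.md and tests.yaml actually define:

```python
# Minimal harness for language-agnostic input/output test cases.
# A real run would parse tests.yaml; these cases are made-up stand-ins.

CASES = [
    {"input": {"seconds": 0},   "output": "now"},
    {"input": {"seconds": 90},  "output": "in 2 minutes"},
    {"input": {"seconds": -60}, "output": "1 minute ago"},
]

def render(seconds: int) -> str:
    """A throwaway implementation, standing in for one built from SPEC.md."""
    if seconds == 0:
        return "now"
    minutes = round(abs(seconds) / 60)
    unit = "minute" if minutes == 1 else "minutes"
    return f"in {minutes} {unit}" if seconds > 0 else f"{minutes} {unit} ago"

# Any implementation, in any language, passes iff every pair matches.
failures = [c for c in CASES if render(**c["input"]) != c["output"]]
print(f"{len(CASES) - len(failures)}/{len(CASES)} cases passed")
```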
Bitcoin’s proof-of-work algorithm has a difficulty ratchet to keep production steady. Network difficulty adjusts approximately every two weeks, based on whether miners have been finding new blocks for the blockchain too quickly or too slowly.
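The ratchet itself is simple arithmetic: every 2016 blocks (about two weeks at one block per ten minutes), difficulty is rescaled by how far the actual timespan missed the target, with each adjustment clamped to a factor of four in either direction. A sketch:

```python
# Bitcoin-style difficulty retarget, run every 2016 blocks.
TARGET_TIMESPAN = 2016 * 10 * 60  # two weeks, in seconds

def retarget(difficulty: float, actual_timespan_s: float) -> float:
    """Blocks found too fast -> difficulty rises; too slow -> it falls.
    The per-adjustment change is clamped to 4x either way."""
    ratio = TARGET_TIMESPAN / actual_timespan_s
    ratio = max(0.25, min(4.0, ratio))
    return difficulty * ratio

# Blocks arriving 10% too fast over the period: difficulty tightens ~11%.
print(retarget(100.0, TARGET_TIMESPAN * 0.9))  # ~111.1
```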
Difficulty tightens when a higher bitcoin price is encouraging miners to join the market, that much makes sense. But when a lower bitcoin price is squeezing margins and pushing the least efficient miners out, shouldn’t difficulty fall?
Maybe difficulty lags behind the price? As said above, the ratchet only adjusts approximately every two weeks. Total hashrate offers a more dynamic measure of power deployed to the network, albeit an estimated one based on how long it’s taking to mine a block. Any change in market structure might show up there first.
And has it? Not really:
The first thing to note is that, depending upon power costs and where they are in the queue for the latest rigs from Bitmain, miners' margins vary greatly. So Elder is right that, if one of the two-week adjustments increases the difficulty, some of the least economic rigs should be switched off. In theory, if the adjustment decreases the difficulty, some of these idled rigs should be switched back on.
Eyeballing the graph, between early July and late October the price was between $110K and $120K, a 9% range. During that period the difficulty increased from around 140trn to around 155trn, about an 11% range. These aren't big changes, but it appears a bit strange that roughly flat price coincides with a steady-ish increase in difficulty.
Do miners with sunk costs keep running on negative margins in the hope of getting lucky? Are a handful of big miners, maybe advantaged by free power or whatever, keeping difficulty high to drive out competitors, either inadvertently or as part of some devious plan to centralise production and control the network? Or has mining become 55 per cent more efficient since last November?
I estimate that during that time Bitmain and its competitors shipped an additional 100EH/s to the miners, over and above the new rigs replacing obsolete ones. These leading-edge rigs would have been turned on immediately. During that time the hash rate rose from about 870EH/s to around 1120EH/s, or about 250EH/s. So around 150EH/s must have represented idled, less-efficient rigs being turned on. At the start of the period at most 85% of the rig fleet was working. At the end of the period, as the price started to fall, the most efficient miners had about 100EH/s more than they did at the beginning.
In the subsequent 2 months the price dropped from about $120K to around $85K and the hash rate dropped from around 1120EH/s to around 1060EH/s, or by 60EH/s. But in that time there was an incremental 50EH/s of new rigs. So 110EH/s of the 150EH/s from the originally idle fleet of uneconomic rigs were turned off. This is within the margin of error of my back-of-the-envelope estimates.
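The back-of-the-envelope arithmetic in the two paragraphs above can be checked mechanically. I take new-rig shipments of roughly 100EH/s in the first period, the figure consistent with the 85% and 250EH/s numbers; all figures are read off the graphs and so approximate:

```python
# First period: price roughly flat, hash rate 870 -> 1120 EH/s.
new_rigs_1 = 100            # leading-edge rigs shipped beyond replacements
rise_1 = 1120 - 870         # 250 EH/s total increase
reactivated = rise_1 - new_rigs_1
print(reactivated)                # 150 EH/s of idled rigs turned back on
print(870 / (870 + reactivated))  # ~0.85: at most 85% of fleet working

# Second period: price falls, hash rate 1120 -> 1060 EH/s.
new_rigs_2 = 50             # incremental new rigs still arriving
drop_2 = 1120 - 1060        # 60 EH/s net decrease
idled_again = drop_2 + new_rigs_2
print(idled_again)          # 110 of the 150 EH/s reactivated rigs went off
```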
My take is that what happened was what should have been expected. Elder's view was too simplistic, and he was misled by (a) the noise in the graph, and (b) the fact that the Y axis doesn't start at zero, making the noise much more evident.
Elder goes on to note an issue I've been writing about for a very long time:
Related, is it a worry that just three mining pools accounted for more than 45 per cent of this week’s block production? Given nearly all the hardware used across the network is Chinese-made, with Beijing-based Bitmain Technologies alone having an estimated market share of 82 per cent, when do concentration levels among a few organisations become a security concern?
Yes, this concentration has been a problem for Bitcoin's entire history, but everyone has decided to ignore it. As I write, on 6th January 2026, over the last three days Foundry USA and AntPool have controlled 51.2% of the hash power. As I explained at length in Sabotaging Bitcoin, there are practical difficulties facing an insider attack with 30% of the hash power, but with 51.2% things are much easier.
Bitcoin mining is quietly staging a comeback in China despite being banned four years ago, as individual and corporate miners exploit cheap electricity and a data center boom in some energy-rich provinces, according to miners and industry data.
China had been the world's biggest crypto mining country until Beijing banned all cryptocurrency trading and mining in 2021, citing threats to the country's financial stability and energy conservation.
After having seen its global bitcoin mining market share slump to zero as a result of the ban, China crept back to third place with a 14% share at the end of October, according to Hashrate Index, which tracks bitcoin mining activities.
This may be what the 80% of new data centers in China that sat empty because they were unsuitable for AI are being used for.
Stablecoins have long been pitched as crypto’s on-ramp. Swapping fiat money for a fiat-pegged stablecoin like Tether’s USDT or Circle’s USDC allows a trader to switch in and out of positions without having to touch tradfi.
Shouldn’t an on-ramp also work as an off-ramp? There’s not much evidence it does. In the six weeks or so when $1.2tn in value was drawn down from the cryptoverse, the market cap of USDT has increased by approximately $20bn:
This USDT graph is updated from the one Elder used. It shows a clear break in the upward trend on 24th October. In the 10 preceding weeks it increased by $17B (~10%) as Bitcoin traded between a low of $108K and a high on 6th October of almost $125K.
In the 10 weeks since it has increased by only $3B as Bitcoin dropped from $111K to a low of $84K on 22nd November, subsequently trading in a range from there to $94K.
Here is the updated USDC graph. On 24th October its market cap was around $76B; it is now around $75B. So, yes, Elder was right, it has been basically flat for 10 weeks, whereas in the preceding 10 weeks it had increased by $8B (~12%).
Had the trend of the previous 10 weeks continued, the market cap of (USDT+USDC) would have been $25B higher than it is. One way of looking at this would be that traders "withdrew" $25B or around 10% of the (USDT+USDC) market cap on 24th October. On this basis Elder is just wrong, the off-ramp has been quite busy.
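The $25B figure follows from extrapolating the two pre-break trends; figures are in $B and approximate, taken from the graphs above:

```python
# Had the 10-week pre-break trends continued for another 10 weeks:
usdt_trend, usdc_trend = 17.0, 8.0    # $B growth in the 10 weeks before 24 Oct
usdt_actual, usdc_actual = 3.0, -1.0  # $B change in the 10 weeks since

expected = usdt_trend + usdc_trend    # 25.0
actual = usdt_actual + usdc_actual    # 2.0
shortfall = expected - actual
print(shortfall)  # ~23, i.e. roughly $25B "withdrawn" relative to trend
```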
Even if we assume a large percentage of stablecoins are used for non-crypto things (sports betting, remittances, crimes), the recent issuance still looks at odds with the trend.
First, this contradicts the premise of Elder's question as I understand it. Second, most of the uses Elder cites involve a chain of transactions from fiat to stablecoin to fiat on one or more exchanges. These would not cause a net increase in demand for the stablecoin, which would come from and go back to the exchange's reserves.
Maybe demand is high because crypto traders have been parking money rather than seeking to withdraw it?
First, demand isn't high relative to the historic trend, it is now $25B lower. Second, even if we accept Elder's $20B number, this is peanuts relative to the $1.2T drop in aggregate cryptocurrency market cap.
More idle money in the system ought to be good news for the likes of Coinbase, which uses the promise of higher yields on USDC deposits to sell monthly subscription schemes. And how have Coinbase shares been doing?
The day Bitcoin hit the skids on the 8th, COIN closed at $387.27. By 11/20 it was down 38% while Bitcoin was down 47%. It can hardly be a surprise that COIN is highly correlated with Bitcoin.
What explains trash crash PTSD?
Next, Elder asks why cryptocurrencies haven't resumed their progress moonwards:
A popular argument among crypto commentators is that token prices are down because traders are still digesting one bad day in early October.
Reasons for the October drawdown go from banal (maybe the high-beta cryptosphere just amplified an equities pullback on US-China trade tensions?) to wonky (maybe it all cascaded from a weird synthetic stablecoin depegging on one marketplace?) to the darkly cynical (maybe the big sharp drop was to let bucket-shop crypto brokers close out customer positions they’d never actually bought?).
In anything to do with cryptocurrency, and definitely in this case, I'd go with "darkly cynical". He then asks:
If a market’s not deep, efficient or clean enough to digest a bit of one-day volatility, why get involved?
I'd agree with Elder on this. If we look at the log plot of Bitcoin's history, we can barely see the 47% drop in 6 weeks last October. We can just see, to take some recent examples, the 48% drop in 5 weeks in early 2020, and the 42% drop in 2 weeks in mid-2021. Traders who don't learn from history are doomed to repeat it. Of course, the volatility is precisely the thing that brings the traders to Bitcoin.
Crypto exchange-traded products have been haemorrhaging money all week. Spot ETP net redemptions yesterday were $1.14bn, including $901mn just from bitcoin ETPs, according to JPMorgan estimates. That’s the worst single day for net outflows since February.
With so much selling, you might expect to see an increase in bitcoin velocity, which measures the rate at which tokens move on the chain.
This graph covers a longer history than Elder's. Already back in 2021 Igor Makarov & Antoinette Schoar found that the "rate at which tokens move on the chain" was irrelevant:
90% of transaction volume on the Bitcoin blockchain is not tied to economically meaningful activities but is the byproduct of the Bitcoin protocol design as well as the preference of many participants for anonymity ... exchanges play a central role in the Bitcoin system. They explain 75% of real Bitcoin volume.
Velocity dropped sharply until November 2023, and then continued to drop gradually. November 2023 was the start of a major increase in Bitcoin's price, which coincided with a sustained increase in trading volume on exchanges.
It thus seems likely that the trend identified by Makarov & Schoar, that on-chain activity was largely confined to exchanges, increased between 2021 and 2023, and was then saturated. Elder's explanation is only part of the story:
Bitcoin velocity has been plummeting for years, for reasonable reasons. “Digital gold” overtook “internet money” as the preferred reason to hold, while derivatives like perpetual futures removed any need to faff around with the underlying asset.
It isn't just derivatives. Spot trading happens on exchanges. The blockchain is pretty much only used for inter-exchange netting transactions. So Elder's question misses the point:
Nevertheless, is it odd to have a sudden wave of selling that’s almost invisible
in the underlying asset? Bitcoin velocity has barely changed over the past
month, having bounced meekly off a record low in early October. Why?
Having a vast derivative market based off a much smaller spot market on exchanges based on a tiny set of transactions on the blockchain is a wonderful playground for traders, because it is easy to manipulate.
Spoiler: No. In this section Elder expresses skepticism about a piece by Alex “Crypto Alex” Saunders of Citigroup, suggesting that:
The halving cycle is a reason that long-time Bitcoin holders are nervous. We show the price performance in the years after halvings in Figure 3 with the second year showing weakness. These crypto winters have been associated with 80%+ drawdowns in the past as shown in Figure 4.
Elder doesn't need any help from me on this "chart necromancy". Halvenings primarily affect miners by roughly halving their income without a corresponding decrease in costs. This may force them to sell coins they have stashed, which at the margin may drive the price down. But with miners' income currently around $40M/day and recent volume peaks on major exchanges around $1.4B/day this is likely to have marginal impact.
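The marginal-impact claim above is a one-line ratio:

```python
miner_income_per_day = 40e6       # ~$40M/day at current prices
exchange_volume_per_day = 1.4e9   # ~$1.4B/day recent peak on major exchanges

# Even if miners dumped every coin they earned, the forced selling would
# be under 3% of peak exchange volume.
print(miner_income_per_day / exchange_volume_per_day)  # ~0.029
```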
This post was written by Emily Woehrle, who attended the 2025 DLF Forum as an Emerging Professionals Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. 2025 Emerging Professionals Fellowships were supported by a grant from MetaArchive.
Emily Woehrle is a Digital Content Librarian at the University of Toronto Libraries, where she supports a large-scale website renewal project and manages the library’s LibGuides service. She also works as a part-time librarian at the Toronto Public Library. With a background in non-profit communications and content management, Emily brings an interdisciplinary perspective to her work and looks forward to sharing experiences and learning from peers at the DLF Forum to advance sustainable, user-centered digital library practices.
Documentation as responsible digital stewardship
Attending the DLF Forum as one of the Emerging Professionals fellows was an incredibly positive experience that left me with new connections, new ideas, and a validating sense of solidarity. The Forum was only my second library conference, yet I felt immediately comfortable among people who get it and were ready to share and listen to each other’s experiences. It was also a joy to navigate the conference with my fellow Fellows.
Two presentations stood out to me over the course of the Forum, and neither focused on trendy hot topics. Instead, they highlighted the importance of documentation; the practical, behind-the-scenes work that keeps digital libraries and archives running by codifying tacit knowledge and establishing the workflows, structures, and guidelines that sustain digital library work.
In my current role, I’m coordinating a large-scale website consolidation project that requires my colleagues and me to build processes and governance structures impacting 20+ libraries and departments. Documentation has become essential to scaling the project and keeping everyone aligned. Over the past year, I’ve spent a lot of time thinking about the most effective ways to develop knowledge-management systems, why some workplace cultures prioritize them more than others, and what that means for long-term success.
The first session, “Agile Documentation Development for Digital Preservation Systems,” offered strategies to make documentation immediately useful, iterative, and collaborative. The presenters emphasized creating “minimum viable documents” that favor progress over perfection – start with something usable, then refine it over time. They also underscored how role clarity and interdepartmental culture shape the success of documentation efforts. This session helped me reframe documentation as a living tool whose maintenance must be built into our work rather than treated as an afterthought.
The second session, “Renaming Failure as ‘readme.files’: Lessons Learned from Early and Mid-career Archives Perspectives,” reminded me that unexpected challenges are inevitable and that they can serve as learning opportunities instead of being perceived as failures. The speaker spoke about the importance of recording “detours” as they happen and how documentation can play a key part in reflection on lessons learned. She also discussed the value of using documentation to close “open loops” when offboarding from a project, ensuring that future staff can build on past work rather than unknowingly duplicating it. It was a practical reminder that documenting failure isn’t about dwelling on mistakes; it’s about giving the next person a clearer path forward.
Taken together, these sessions reinforced that documentation is more than a checklist. It can be a form of care—not only for colleagues and users who will later take on or inherit the work, but also for the library systems that depend on it. Creating collaborative documentation is an often overlooked and undervalued core competency, yet it is fundamental to both project and organizational success. I left the Forum with a renewed commitment to integrating these approaches into my own knowledge-management practices and to advocating for clearer, more collaborative documentation across the teams I work with.
In the fall of 2025, I was presented with the exciting opportunity to teach CS 433/533: Web Security at Old Dominion University (ODU). This course was designed and previously taught by Dr. Michael L. Nelson. Having experienced this class firsthand as a student, I was thrilled to transition into the role of an instructor, bringing my unique perspective to the curriculum. Through this blog, I aim to share my journey and insights from teaching this course, with the hope that my experiences will serve as a useful resource for fellow colleagues and those venturing into teaching for the first time.
The goal of this course was to review common web security vulnerabilities and exploits, along with the defenses designed to counter them. Students explored topics such as browser security models, web application vulnerabilities, and various attack and defense mechanisms including injection, denial-of-service, cross-site scripting, and more. Alongside theoretical knowledge, students also gained hands-on experience with technologies like Git and GitHub, DOM and JavaScript, the command line interface (CLI), and Node.js.
Before the class commenced, one of the primary tasks involved setting up the course online where students could easily access materials and engage in communication. This platform is crucial, as it serves as the central point for sharing the syllabus, detailing office hours, distributing course materials, and facilitating ongoing dialogue with students. This course utilized GitHub primarily for assignments and sharing resources. While Google Groups were previously utilized effectively for communication, I transitioned to Canvas, the official learning management system of ODU. This decision leveraged a familiar environment, minimizing the learning curve for students and streamlining course interactions.
The first task on Canvas was crafting an updated syllabus that clearly conveyed the course's essence and objectives. It also laid out prerequisites, grading criteria, and ODU's plagiarism policies. It provided students with a clear roadmap of what to expect, including schedules and exam details. Additionally, I strongly recommend implementing a weekly Summary Schedule, an approach inspired by Dr. Michele Weigle's courses. This schedule detailed the topics to be discussed each week, listed any assignments for that week, and their respective due dates. This approach not only helped students gain a clear overview of the entire course, but it also helped me keep track of everything and stay on schedule. It also facilitated timely communication, as it reminded me when to send out weekly announcements, release assigned tasks, and mark due dates to ensure efficient grading.
One of the highlights of this class was the weekly discussion forum, which significantly enhanced communication among students. Initially, Dr. Nelson had developed an activity where students would retweet tweets related to web security weekly on Twitter/X, followed by in-class discussions. This approach was an excellent way for students to learn about current trends and major news in web security. However, as the class was asynchronous, I needed a different method to maintain this engagement. I utilized Canvas to create weekly discussion forums where students could earn points by sharing the latest stories or news related to web security. Each week, I observed students sharing intriguing news and actively reacting to and discussing each other's posts. This method was not only enjoyable, but it also fostered a collaborative learning environment, allowing us all to discover new information together each week.
Canvas Discussion Forum for students to post weekly updates on web security news
Next, it's essential to establish a communication channel with students. I used Canvas Announcements to send weekly messages that informed students about the week's focus, the materials they would need, links to lectures and slides, and any updates regarding assignments. If you have prepared materials in advance, Canvas allows you to schedule these announcements to be released at a later date. This approach ensures timely and organized communication, keeping students informed and engaged throughout the course.
As the saying goes, "All fingers are not equal," and this holds true for students as well. Each student learns at their own pace: some grasp concepts quickly, while others need a bit more time. I learned that it's crucial to offer flexibility and understanding to accommodate these differences. When I first noticed that some students were scarcely participating in class, I realized I needed to take proactive steps. Reaching out to struggling students via email, acknowledging their challenges, and inviting them to connect if they needed support proved to be an effective strategy. This simple gesture often served as an icebreaker, leading to increased attendance at office hours and more frequent email communication. I also discovered that many students in my class preferred office hours after standard working hours due to their job commitments. In response, I moved my regular office hours to after 5 PM to accommodate students' working schedules. Timely grading and constructive feedback also proved key to encouraging student development: prompt feedback allowed students to reflect on and improve their work over the semester. Additionally, offering extra credit opportunities gave students the chance to boost their grades and reinforce their understanding.
Reflecting on my experience teaching this course, I've come to appreciate the dynamic nature of education and the profound impact that thoughtful instructional design can have on students' learning experiences. Transitioning from student to instructor offered me a unique perspective, allowing me to tailor the course to address the diverse needs and learning styles of my students. Developing online resources, a detailed syllabus, and an interactive discussion forum highlighted the importance of clear communication. Building a supportive environment requires flexibility and empathy, acknowledging each student's unique challenges. Adjusting office hours and maintaining open communication ensured students felt supported, while consistent feedback and extra credit fostered growth and improvement. Overall, I had immense fun connecting with the students, understanding their perspectives, and learning alongside them. The collaborative learning atmosphere enriched not just their educational journey but also my teaching experience, reminding me how rewarding it can be to explore new ideas together.
I extend my heartfelt gratitude to Dr. Michael L. Nelson, Dr. Michele C. Weigle, and Dr. Steven J. Zeil for providing me with this invaluable experience. This course greatly benefited from Dr. Nelson's materials. I also appreciate Dr. Nelson and Dr. Weigle for their willingness to address any questions I had.
This post was written by Dorian McIntush, who attended the 2025 DLF Forum as an Emerging Professionals Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. 2025 Emerging Professionals Fellowships were supported by a grant from MetaArchive.
Dorian McIntush is the Open Scholarship and Data Resident Librarian at Washington and Lee University, where he supports faculty and students with digital research and open knowledge initiatives. He has a commitment to creating equitable access to knowledge and is particularly interested in exploring the environmental impact of digital technologies and open scholarship models that prioritize accessibility and long-term sustainability. Beyond his professional work, he enjoys tromping through Virginia’s hiking trails with his dog, taking on new knitting projects, and cooking interesting recipes for dinner.
When I stepped into my current role of Data and Open Scholarship Resident Librarian at Washington and Lee University in July of this year, I was also stepping for the first time into the world of academic libraries. I focused on public libraries during my MLIS and after graduation I worked at the DC Public Library. My new academic librarian position was also brand new at my institution and was constructed to be a librarian residency. This meant that I would have a lot of freedom to grow into and shape the role, but also no real history to use as a support and learn from. This was simultaneously a gift and a little daunting.
Coming from public libraries, I had grown used to thinking about access in very practical, immediate terms. Who has a library card? Who can physically get to our building? What barriers keep people from the resources they need? But the sessions at DLF pushed me to think about access in ways that felt more expansive.
Amber Dierking’s presentation on the Queer Liberation Library was a particular highlight for me. I’d already been a user and huge fan of QLL, but hearing Dierking talk about the work behind it reinforced everything I loved about their approach. QLL didn’t reinvent the wheel. They focused on using existing tools, keeping it simple, making it free. As someone building a role from scratch, the creative pragmatism of QLL felt like a blueprint I could make use of.
I was also drawn to Mariam Ismail’s presentation on the 23/54 Project. The work of preserving a community quilt through 3D scanning and building an interactive digital exhibit felt like a perfect example of what digital humanities could be at its best: deeply rooted in community, respectful of material culture, and genuinely expanding access rather than just digitizing for digitization’s sake. It made me think about the special collections and archives at my own institution and how we might engage descendant communities and students in similar ways.
The Data Advocacy for All Toolkit presentation tied these threads together for me in a way I didn’t expect. The team was talking about who gets to tell stories with data, who gets left out of those stories, and how we can teach people to use data ethically for social change. This toolkit offered a framework that felt aligned with my public library values, one that’s accessible, focused on equity, and designed to empower data users and learners.
DLF gave me permission to think big while starting small. I’m returning to W&L with a clearer sense of what this residency could become, not a replica of someone else’s role, but something shaped by the communities I serve and the values I bring from public libraries into this new academic space.
Hopefully this isn’t perceived as me caving, but I’m trying to redirect
my ire at the spread
of genAI into learning how to evaluate genAI, especially when
comparing it to existing non-genAI systems, but also between different
genAI solutions (models, etc).
I don’t consider myself a doomer
or a boomer, and see genAI as normal
technology that needs to be evaluated as a technology. I know this
is a broad area that overlaps somewhat with how the models themselves
are built (benchmarking), but if you have recommendations, please
let me know.
Eight courses from the Web Science and Digital Libraries (WS-DL) Group will be offered for Spring 2026. The classes will be a mixture of online synchronous, asynchronous, and f2f.
CS 431/531 Web Server Design, Mr. David Calano, Mondays, Wednesdays, & Fridays, 10:00-10:50. Topics: HTTP, REST (Representational State Transfer), HATEOAS
CS 432/532 Web Science, Ms. Nasreen Muhammad Arif, asynchronous. Topics: Python, R, D3, ML, and IR.
CS 620 Intro to Data Science & Analytics, Dr. Bhanuka Mahanama, Web, Tuesdays & Thursdays 6:00-7:15 and asynchronous (8-week session starting in October). Topics: Python, Pandas, NumPy, NoSQL, Data Wrangling, ML, Colab.
CS 625 Data Visualization, Dr. Bhanuka Mahanama, Tuesdays & Thursdays 1:30-2:45. Topics: Tableau, Python Seaborn, Matplotlib, Vega-Lite, Observable, Markdown, OpenRefine, data cleaning, visual perception, visualization design, exploratory data analysis, visual storytelling
CS 732/832 Human Computer Interaction, Dr. Bhanuka Mahanama, Tuesdays 4:30-7:10. Topics: Cognitive and social phenomena surrounding human use of computers.
CS 733/833 Natural Language Processing, Dr. Vikas Ashok, asynchronous. Topics: Language Models, Parsing, Word Embedding, Machine Translation, Text Simplification, Text Summarization, Question Answering
CS 734/834 Information Retrieval, Dr. Santosh Nukavarapu, Tuesdays 4:30-7:10. Topics: crawling, ranking, query processing, retrieval models, evaluation, clustering, machine learning.
The Fall 2026 semester is still undecided, but will likely be similar to the previous Fall semesters. Previous course offerings: F25, S25, F24, S24, F23, S23, F22, S22, F21, S21, F20, S20, F19, S19, and F18.
This text, part of the #ODDStories series, tells a story of Open Data Day‘s grassroots impact directly from the community’s voices. The event ‘Leveraging Open Data for Child Advocacy in a Polycrisis Context’ was successfully held on 6th March 2025 in Imo State, bringing together child rights, teachers, advocates, policymakers, data analysts, church leaders and...
This text, part of the #ODDStories series, tells a story of Open Data Day‘s grassroots impact directly from the community’s voices. On March 4, 2025, the YouthMappers Chapter at the Institute of Remote Sensing and GIS (IRS), Jahangirnagar University, proudly celebrated Open Data Day 2025 with an impactful event titled “Mapping Resilience: Harnessing Open Data...
I learned about Marimo a few months ago
from this
podcast episode and have been meaning to try it out. As a long-time
user of Jupyter notebooks, I was
interested to hear how Marimo was built (in part) to help solve the
problem of reproducibility
in data science and research software (Pimentel, Murta, Braganholo, & Freire,
2019). This is a problem I have had a lot of experience with,
especially when sharing Jupyter notebooks with others.
The key thing that Marimo brings to the Python notebook to improve
reproducibility is reactive execution. Marimo uses the Python
AST to
remember what cells depend on other cells, and when a change in one
requires the execution of another, it will go and update it for you.
Because of how they are created and edited, it’s very common for Jupyter
notebooks to get into an inconsistent state because of the order in
which cells are executed. This problem goes away with Marimo.
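The core idea of tracking cell dependencies can be sketched with the standard library alone. The toy cells and the `defs_and_refs`/`stale_after` helpers below are my own illustration of the technique, not Marimo's actual API or implementation:

```python
import ast

def defs_and_refs(src):
    """Return (names defined, names read) for one cell's source."""
    defined, referenced = set(), set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, (ast.Store, ast.Del)):
                defined.add(node.id)
            else:
                referenced.add(node.id)
    # A name both read and written inside a cell counts as defined there.
    return defined, referenced - defined

# Three toy cells: "b" reads the x that "a" defines, "c" reads b's y.
cells = {"a": "x = 1", "b": "y = x + 1", "c": "z = y * 2"}
meta = {name: defs_and_refs(src) for name, src in cells.items()}

def stale_after(changed):
    """Cells that must re-run, directly or transitively, after `changed` runs."""
    stale, frontier = set(), {changed}
    while frontier:
        provides = meta[frontier.pop()][0]
        for name, (_, reads) in meta.items():
            if name not in stale and reads & provides:
                stale.add(name)
                frontier.add(name)
    return stale

print(sorted(stale_after("a")))  # ['b', 'c']
```

Editing cell "a" marks both downstream cells stale, which is exactly the inconsistent-state problem that manual, out-of-order execution in Jupyter leaves unsolved.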
But, this aside, I thought I’d mention some somewhat superficial things
that I immediately liked about Marimo…bearing in mind I’ve only been
using it for one day.
Pandas dataframes appear as nicely formatted tables that can be easily
scrolled horizontally, without truncated values or elided columns.
I’ve gotten this to work in Jupyter notebooks in the past, but it always
requires some fiddling it seems, and Marimo does it out of the box.
Table columns are sortable, filterable and can be summarized.
You can page through large tables.
You can easily download tables.
You can commit your notebook to a git repository, and diffs in pull
requests make sense.
You can easily run the notebook from the command line.
You can embed tests in your notebooks, and run them separately.
Built-in basic charts (pie, bar, line, etc.) whose source you can view
and hand-craft if you want (they appear to use Altair).
The Department of Computer Science (CS) at Old Dominion University (ODU) hosted its annual Trick-or-Research event on October 31, 2025, blending academic exploration with Halloween-inspired creativity. Check out our previous Trick-or-Research blog posts for 2021, 2022, 2023, and 2024. Building on the success of previous years, this event brought together faculty, staff, and students to showcase the department’s cutting-edge research and encourage new collaborations. Designed especially to introduce undergraduate students to the wide range of research opportunities within the department, Trick-or-Research featured interactive lab tours, engaging demonstrations, and opportunities to network with professors and join research groups.
🎃 Trick or Research! | ODU CS Lab Tours 2025 👩💻👻 Tour CS labs, meet faculty & grad students, and win prizes!— Computer Science Graduate Society - ODU (@CSGS_ODU) October 13, 2025
Participants explored CS research labs in person at the E.V. Williams Engineering & Computational Sciences Building (ECSB) and Dragas Hall, with a virtual option via Gather.town for remote attendees. To add to the festive atmosphere, attendees were encouraged to show their Halloween spirit by dressing in costume, with opportunities to win CS swag and a special prize for the best costume. The combination of hands-on research experiences, creative expression, and community engagement made Trick-or-Research an event that was both academically enriching and fun. 🎃
The Web Science and Digital Libraries (WS-DL) group, led by Dr. Michael Nelson and Dr. Michele Weigle, welcomed visitors interested in web-focused research and digital preservation. Lab members discussed ongoing projects through posters and informal conversations, answering questions about research directions and opportunities for student involvement. The group’s research spans web archiving, web science, social media analysis, digital preservation, human-computer interaction, accessibility, information visualization, natural language processing, machine learning, artificial intelligence, and scholarly data mining. Visitors also had the opportunity to learn more about CS graduate courses and pathways into research.
Lab for Applied Machine Learning and Natural Language Processing Systems (LAMP-SYS)
The LAMP-SYS lab, led by Dr. Jian Wu, shared its work in applied machine learning and natural language processing. Visitors learned about ongoing research in areas such as entity extraction, mining electronic documents, and computational reproducibility in deep learning. Lab members discussed how machine learning and NLP techniques are applied to real-world problems using building blocks from information retrieval, digital libraries, and scholarly big data.
Neuro Information Retrieval and Data Science (NIRDS) Lab
The Neuro Information Retrieval and Data Science (NIRDS) lab, led by Dr. Sampath Jayarathna, engaged attendees with demonstrations and discussions centered on perception- and cognition-aware systems. Students shared examples of their research involving eye tracking, EEG, wearable sensors, and explained how these technologies are used to study user behavior and support real-world applications. The lab’s work emphasizes integrating psychophysiological signals with information retrieval and data science.
Accessible Computing Group
The Accessible Computing group, led by Dr. Vikas Ashok, introduced visitors to research focused on improving digital experiences for users with visual impairments. Lab members discussed intelligent interactive systems, accessible user interfaces, and image-to-speech technologies. Through demonstrations and conversations, the team highlighted how their work enhances web accessibility and human-computer interaction.
Bioinformatics Lab
The Bioinformatics Lab, led by Dr. Jing He, showcased research in computational biology and bioinformatics. Posters and discussions highlighted projects involving genomic data analysis, protein structure modeling, and 3D molecular imaging. Lab members also explained how machine learning techniques are applied to address challenges in medicine and health-related research.
High Performance Scientific Computing Team for Efficient Research Simulations (HiPSTERS)
The HiPSTERS group, led by Dr. Desh Ranjan and Dr. Mohammad Zubair, introduced students to interdisciplinary research in high-performance computing (HPC). Visitors learned about the group’s use of advanced mathematical methods and GPU programming to support large-scale simulations, including examples related to particle collider beam dynamics and scientific computing workflows.
Artificial Intelligence (AI) and Applications Research Group
The AI and Applications research group, led by Dr. Yaohang Li, shared ongoing work in artificial intelligence and machine learning. Through posters and discussions, visitors learned about projects involving machine learning-based physics event generation, particle production simulations, protein crystallization classification, financial data analysis, and generative models. The group connected with students interested in applying AI techniques across scientific and real-world domains.
Hands-On Lab
The Hands-On Lab, established by Ajay Gupta, presented research focused on building practical, end-to-end systems. Lab members discussed projects involving real-time monitoring using sensors and wearables, health-related data collection and analysis platforms, learning management portals, and mobile applications for medical and educational use. Visitors learned how these systems are designed, implemented, and evaluated in real-world settings.
Data Mining Lab
The Data Mining Lab, led by Dr. Lusi Li, shared research in data mining, machine learning, and optimization theory. Topics discussed included online machine learning, representation learning, transfer and multi-view learning, recommender systems, and explainable AI. Attendees explored how these techniques are applied to complex datasets across different application areas.
Internet Security Research Lab
The Internet Security Research Lab, led by Dr. Shuai Hao, presented research focused on networking and security. Lab members discussed measurement-driven and data-centric approaches to studying internet infrastructure, web security, privacy, and cybercrime. Visitors learned how empirical analysis and large-scale data studies are used to understand and address modern security challenges.
Wadduwage Lab
The Wadduwage Lab, led by Dr. Dushan N. Wadduwage, focuses on developing novel computational microscopy techniques that capture biological systems at their most information-rich representations while minimizing redundancy. Students learned how the lab integrates optics, machine learning, and signal processing to advance high-fidelity biomedical imaging and build trustworthy medical AI systems. Lab members discussed research in computational and differentiable microscopy, interpretable and reliable AI for medical decision-making, and advanced tracking algorithms that enable pinpoint-level object and particle tracking in microscopic environments. Through these conversations, visitors gained insight into how interdisciplinary methods are applied to solve real-world challenges in biomedical imaging.
Summary
The 2025 Trick-or-Research event once again highlighted the depth and diversity of research within ODU’s Computer Science department. By combining hands-on demonstrations, engaging conversations, and a festive hybrid format, the event provided students with valuable insight into research opportunities and pathways for involvement. Whether attending in person or virtually, participants left with a deeper understanding of the innovative work happening across CS. If you missed this year’s event, be sure to keep an eye out for the next Trick-or-Research! 🎃💻
This post was written by Darcy Ruppert, who attended the 2025 DLF Forum as a Public Library Worker Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. The 2025 Public Library Worker Fellowship was supported by Platinum sponsor AM Quartex.
Darcy Ruppert is an archivist and librarian living and working in the greater Seattle area. Since receiving their MLIS from the University of Washington in 2023, Darcy has worked in various museums, special collections, and community archives, specializing in the digital preservation of audiovisual collections. Their professional work has been defined by a commitment towards democratizing access to digital archives themselves and the tools of digital preservation. They are currently managing King County Library System’s Memory Lab, a Mellon Foundation-funded community oral history project with a mission to record, preserve, and share the stories of King County.
I am so grateful to have had the opportunity to attend my first DLF Forum as a 2025 Public Library Worker Fellow. In my current work as project manager for a new community oral history project based out of a large public library system, I often feel somewhat separate from the day-to-day of the rest of my organization. The work that I do connects to and is born out of the work of the library, but the unique nature of the program within my organization means that I am often working through problems and making decisions on my own. For this and many other reasons, it was refreshing to be surrounded by a community of my professional peers who are facing similar challenges and grappling with similar emergent questions, both practical and philosophical, within our field.
The first day’s opening plenary by Dr. Kay Coghill provided, for me, a grounding moment at the very beginning of the Forum. Dr. Coghill shared an entreaty for all of us, as digital library professionals and as human beings, to use our platforms, skills, and resources to mitigate harms to members of systemically marginalized groups in digital spaces. Their incisive talk made me think about my own positionality in digital spaces, and drove me to reflect on my own professional decisions and the cascading effects these (at times, seemingly small) decisions can have. In the work that we do, I think it’s easy to become complacent, to lose sight of the fact that we hold power to protect and, by the same token, harm, when we introduce new solutions without appropriate testing, community consultation, and evaluation. I think there’s a real danger in our field of focusing so much on integrating new technologies and providing “results” to our institutions that we may passively introduce real risks into the lives of our users and members of our greater community.
With Dr. Coghill’s talk framing the rest of the Forum for me, I think I appreciated the sessions I attended with a rejuvenated perspective towards care and stewardship. I thought about the race to adopt “industry standard” digital preservation tools, and what we lose when we fail to properly evaluate these tools for the unique needs of our organizations and user groups. At the University of Denver’s session on curating digital exhibits, I reflected on our ability as cultural heritage stewards to uplift and uncover marginalized voices rather than bend to dominant cultural narratives. At a session on AI-powered transcription, I considered the balance of risk and reward that is inherent to the world of LLMs. I’m thankful for all of the speakers who generously shared their work at the Forum, and for the opportunity to reflect on these important issues alongside my peers.
LibraryThing is pleased to sit down this month with Kelly Scarborough, who makes her authorial debut this month with Butterfly Games, a historical novel set in the Swedish royal court during the early 19th century. After working for two decades as a law firm partner and white-collar prosecutor, Scarborough returned to her interest in historical fiction and her love of writing, determined to tell stories about fascinating women who lived through challenging times. Scarborough sat down with Abigail this month to discuss her new book, due out later this month from She Writes Press.
Butterfly Games is based on a true story, and its heroine, Jacquette Gyldenstolpe, on a real person. Tell us a little bit about that story and how you discovered it. What made you feel that it needed to be retold?
Like so many turning points in my life, Butterfly Games began with a book. As a teenager, I fell in love with Désirée, Annemarie Selinko’s novel about Désirée Clary—the silk merchant’s daughter who was once engaged to Napoleon and later became Queen of Sweden. I read it over and over, fascinated by how a woman could be swept into history by forces she never chose.
Years later, during a difficult period in my life, that novel came back to me. I began researching Désirée’s descendants—the Bernadotte dynasty, which still reigns in Sweden today—and uncovered a world of political upheaval, fragile alliances, and private heartbreak. That’s when I stumbled across Jacquette Gyldenstolpe.
Jacquette appears in the historical record mostly as a scandal: a young countess who fell in love with Prince Oscar, the heir to the throne. But the more I read—letters, memoirs, court gossip—the more I realized how much of her story had been left untold. She wasn’t just a footnote in someone else’s rise to power. She was a young woman navigating impossible choices in a world where love could threaten a dynasty.
Once I found her, I couldn’t look away. I knew her story needed to be retold.
What kind of research did you need to do, while writing the book, and what were some of the most interesting things you learned in that process?
Can you see me smiling? I don’t think I’m capable of separating the research I needed to do from the research that simply called to me and took over my brain.
Over the course of several years, I spent more than eighty nights in Sweden, translated hundreds of handwritten letters, and built a chronology with more than five thousand entries to track who was where, with whom, and why. Jacquette’s world became a place I loved to inhabit. One day stands out above all others. I was granted special access to Finspång Castle, Jacquette’s childhood home—now a corporate headquarters, a place closed to the public. No photographs were allowed, so I took frantic notes on my phone as we walked through the women’s wing. In a sitting room, I noticed a small mother-of-pearl nécessaire—a sewing and writing box with tiny compartments for her most personal objects. It stopped me cold. My guide, a retired corporate executive who knew the house intimately, leaned in and whispered, “Jacquette’s.”
The box had been a gift from Jacquette’s husband, Carl Löwenhielm. That moment—imagining her hands opening it, choosing a needle or a quill knife—changed the direction of the book.
Suddenly, Jacquette wasn’t a scandal or a symbol. She was real.
Your book has been described as a good fit for admirers of Philippa Gregory and Allison Pataki. Did the work of these authors, or others, influence you when writing your story?
Absolutely—though in different ways. Philippa Gregory is a master of taking a story with a known, often tragic ending and making it feel suspenseful and intimate. I admire how she builds emotional momentum even when readers think they know what’s coming. Two of my favorites are The Kingmaker’s Daughter and her most recent novel, Boleyn Traitor.
Allison Pataki has also been influential, particularly in how she blends rigorous research with accessible storytelling. I love the smart, resourceful heroines she creates from women who otherwise might be lost to history. Her work reminds me that historical fiction can be immersive without being intimidating—and romantic without losing its seriousness. Both my book clubs loved Finding Margaret Fuller, and I did, too.
You’ve had a full career as a lawyer and prosecutor, before turning to writing. How has that work informed your writing and storytelling?
Don’t get me wrong, I had a lot to learn before writing a novel, but some of the things I loved about law proved useful for writing historical fiction. Law trained me to think in terms of evidence, motive, and connections. When you’re preparing a case, you assemble fragments—documents, testimony, inconsistencies—and shape them into a coherent narrative that persuades a jury.
Writing historical fiction isn’t so different. The facts matter deeply, but facts alone don’t tell a story. You have to decide what belongs at the center, what remains in the background, and where the emotional truth lives. My legal background also made me comfortable sitting with ambiguity. History is full of unanswered questions, and I don’t feel the need to resolve every one neatly. Sometimes what’s most compelling is what can’t be proven.
Tell us a little bit about your writing process. Do you have a particular routine—a time and place you like to write, a particular method? Do you plot your stories out ahead of time, or discover how they will unfold as you go along?
When the stars align, I retreat early in the day to the attic office of my nineteenth-century house in Connecticut, take my Shih Tzu upstairs with me, and leave the modern world behind. I wrote Butterfly Games in nine drafts. There was an outline, but I changed the plot in significant ways as I went along. For the sequel, I’m trying to be a little more disciplined. I started with an outline—but found myself getting too granular—so I switched to ninety old-fashioned index cards. Each card holds one scene: chapter number, date, setting, point-of-view character, and the scene’s pivot point. There’s barely room left for anything else, which forces clarity. I transcribed those cards into Scrivener, and now I’m writing. We’ll see how closely I stick to the plan.
What comes next? Are you working on any additional books?
Yes. Butterfly Games is the first novel in a planned series. The second book picks up after the events of the first and follows Jacquette and Oscar into a far more dangerous phase of their lives—when love has consequences, secrets carry weight, and survival requires choices that can’t be undone.
Tell us about your library. What’s on your own shelves?
My physical library is filled mostly with historical fiction, especially novels with complex, non-linear structures. I return again and again to Hamnet and The Marriage Portrait, as well as The Time Traveler’s Wife and Pure.
On a special shelf, I keep books connected to Jacquette’s world—like Désirée and The Queen’s Fortune—alongside more than a hundred antique Swedish memoirs and histories, many written by people who actually knew Jacquette.
What have you been reading lately, and what would you recommend to other readers?
For lovers of royal historical fiction, Boleyn Traitor is a must-read. I was also lucky enough to read an advance copy of It Girl, which I loved.
My favorite read last year was Broken Country—a deeply emotional novel with one of those intricate narrative structures that stays with you. In fact, I want to read it again.
A general strike is when working people refuse their labor until demands
are met • Research shows we need 3.5% of the population, or 11 million
Americans, to be successful • The STRIKE CARD below tracks our progress
so we all know when it’s time to strike •
Your Second Brain has little somatic metadata. It’s disembodied text
floating in a void, stripped of the rich contextual markers that would
actually help you remember and use it. When you try to retrieve
something from your Obsidian vault, you’re searching keywords. When you
try to retrieve something from your actual brain, you might think “that
thing I was reading when I was annoyed at that airport” and the whole
cluster of associated memories lights up.
We’ve been treating digital notes like they’re interchangeable with
mental notes, when they’re actually a much degraded format.
Here are the beginnings of a pattern language for user interface design.
These patterns drive the initial phases of design. They key off of
Story, a pattern from the early development language. The patterns are:
User Decision
Task
Task Window
I’m indebted to the user interface pattern mailing list and to Ward
Cunningham for the seeds for these patterns.
Earlier this year, we warned you to turn off Gemini on Android because
Google decided that its AI would have access to its users’ apps - even
if they had previously turned tracking for Gemini Apps Activity off. So
regardless of whether you told Google’s Gemini not to track you in the
past, its AI tool might be able to run tasks like send WhatsApp
messages, set timers, and even make calls on your Android device.
Worryingly, Google’s AI invasion does not stop there. Now, Google has
turned on settings like smart features, without your consent, and Gemini
can scan your Gmail emails, Photos, Drive, and other apps. In this
guide, we show you how to turn off Gemini on Android and all your Google
apps – Gmail, Chrome, Photos, and more. Protect your privacy and your
data from invasive AI tracking.
/e/OS is an open-source mobile operating system paired with carefully
selected applications. They form a privacy-enabled internal system for
your smartphone. And it’s not just claims: open-source means auditable
privacy. /e/OS has received academic recognition from researchers at the
University of Edinburgh and Trinity College Dublin.
The Information Technologies (IT) industry has an increasing carbon
footprint (2.1-3.9% of global greenhouse gas emissions in 2020),
incompatible with the rapid decarbonization needed to mitigate climate
change. Data centers hold a significant share due to their electricity
consumption amounting to 1% of the global electricity consumption in
2018. To reduce this footprint, research has mainly focused on energy
efficiency measures and use of renewable energy. While these works are
needed, they also convey the risk of rebound effects, i.e., a growth in
demand as a result of the efficiency gains. For this reason, it appears
essential to accompany them with sufficiency measures, i.e., a conscious
use of these technologies with the aim to decrease the total energy and
resource consumption. In this thesis, we introduce a model for data
centers and their users. In the first part, we focus on direct users,
interacting with the infrastructure by submitting jobs. We define five
sufficiency behaviors they can adopt to reduce their stress on the
infrastructure, namely Delay, Reconfig, Space Degrad, Time Degrad and
Renounce. We characterize these behaviors through simulation on
real-world inputs. The results allow us to classify them according to
their energy saving potential, impact on scheduling metrics and effort
required from users.
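The five behaviors lend themselves to a simple computational sketch. The following Python toy is not the thesis's actual model; every number and behavior effect below is an illustrative assumption. It models three of the behaviors (Delay, Space Degrad, Renounce) as transformations applied to a job before submission:

```python
# Toy sketch of user sufficiency behaviors, modeled as job transformations.
# All figures (core counts, runtimes, watts per core) are made up for
# illustration; the thesis characterizes these behaviors via simulation
# on real-world inputs.
jobs = [{"cores": 8, "runtime_h": 2.0}, {"cores": 16, "runtime_h": 1.0}]

def delay(job):
    # Delay: run the same job later (e.g., off-peak), same resource demand.
    return {**job, "start_offset_h": 6.0}

def space_degrad(job):
    # Space Degrad: use fewer cores; the job runs longer but, with
    # imperfect parallel scaling, consumes less total energy.
    return {**job, "cores": job["cores"] // 2,
            "runtime_h": job["runtime_h"] * 1.8}

def renounce(job):
    # Renounce: do not submit the job at all.
    return None

def energy_kwh(job, watts_per_core=10.0):
    return 0.0 if job is None else (
        job["cores"] * watts_per_core * job["runtime_h"] / 1000.0)

baseline = sum(energy_kwh(j) for j in jobs)
degraded = sum(energy_kwh(space_degrad(j)) for j in jobs)
assert degraded < baseline  # space degradation trades time for energy
```

The interesting trade-off the thesis quantifies is exactly this tension: each behavior saves energy at a different cost in scheduling metrics and user effort.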
Hello! This past fall, I decided to take some time to work on Git’s
documentation. I’ve been thinking about working on open source docs for
a long time – usually if I think the documentation for something could
be improved, I’ll write a blog post or a zine or something. But this
time I wondered: could I instead make a few improvements to the official
documentation?
This specification defines a Zstandard-based format for compressed WARC
files, as an alternative to the GZIP-based format defined in WARC/1.1
Annex D.
In general, Zstandard can produce significantly smaller files than GZIP
while also achieving faster compression and decompression. Zstandard
also offers a much wider range of compression levels, ranging from
extremely fast compression with a modest size reduction to extremely
slow compression with a much better reduction. For files containing many
small records, Zstandard dictionaries can be used to reduce file size
even further, while still permitting random access to individual
records.
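The random-access property works because each record is compressed independently, so a reader can seek straight to a record's offset without decompressing what precedes it. The sketch below illustrates the principle using the stdlib's gzip module (which is how WARC/1.1 Annex D does it with gzip members); the Zstandard-based format applies the same idea with independent zstd frames:

```python
import gzip
import io

# Two toy records standing in for WARC records.
records = [b"WARC/1.1 record one\r\n\r\n", b"WARC/1.1 record two\r\n\r\n"]

# Compress each record as an independent member and record its offset,
# as an index (e.g., a CDX file) would.
blob, offsets = io.BytesIO(), []
for rec in records:
    offsets.append(blob.tell())
    blob.write(gzip.compress(rec))

# Random access: seek to the second record's offset and decompress only
# from there, without touching the first record.
blob.seek(offsets[1])
assert gzip.decompress(blob.read()) == records[1]
```

With Zstandard, dictionaries recover the compression ratio that per-record framing would otherwise sacrifice on many small records.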
A scathing critique of proposals to geoengineer our way out of climate
disaster, by the bestselling authors of Overshoot
The world is crossing the 1.5°C global warming limit, perhaps exceeding
2°C soon after. What is to be done when these boundaries, set by the
Paris Agreement, have been passed? In the overshoot era, schemes
proliferate for muscular adaptation or for new technologies to turn the
heat down at a later date by removing CO2 from the air or blocking
sunlight. Such technologies are by no means safe; they come with immense
risks and provide an excuse for those who would prefer to avoid limiting
emissions in the present. But do they also hold out some potential? Can
the catastrophe be reversed, masked or simply adapted to once it is a
fact? Or will any such roundabout measures simply make things worse?
The Long Heat maps the new front lines in the struggle for a liveable
planet and insists on the climate revolution long overdue. In the end,
no technology can absolve us of responsibility.
Perennials are plants that have a life cycle of at least three years—as
opposed to annuals (plants that have to be grown from seed anew each
year because they have a single-year lifecycle) and biennials (plants
that spend a year growing and then another year producing seed and
dying—so they have a two-year life cycle). Some perennial plants are
herbaceous (meaning they have tender stalks that may die back in the
cold season, then grow again from the root) and others are woody
(meaning they have woody stalks that tend to continue growing above soil
year after year). In this zine we are mainly discussing edible
herbaceous perennials, but we will mention a few woody plants as well.
In this project, we attempt to establish a large-scale soundscape
monitoring network and characterize ecosystem-specific soundscapes by
separating sounds from geophonic, biological, and anthropogenic sources.
Based on information retrieval techniques, the acoustic data are
transformed into metrics that describe the quality of acoustic habitat,
the behavior of soniferous animals, and noise-generating activities. The
outcomes will allow managers and stakeholders to use soundscape
information to monitor the trends of marine ecosystems and perform
data-driven decision making in conservation management.
« Il neige. » ("It is snowing.") Independently of all that and of all
this self, snow has always been conversational; an element of our shared
conversations, because even in the days when it was not rare, even in
the days when it was expected, it came and arrived suddenly, most often
at night or early in the morning, behind the windows of our schools and
of our eyes still dazed with sleep. It would arrive and the whole
landscape would change. Snow is a mutation of the landscape and a hold
on nearly all our senses. One of the rare mutations of nature that we
are given to observe in its entirety.
Sorry, Baby is a 2025 American black comedy-drama film written and
directed by Eva Victor in their directing debut. It stars Victor, Naomi
Ackie, Louis Cancelmi, Kelly McCormack, Lucas Hedges, and John Carroll
Lynch. The film follows a reclusive college literature professor
struggling with depression following a sexual assault.
The film had its world premiere at the 2025 Sundance Film Festival on
January 27, where it won the Waldo Salt Screenwriting Award and received
widespread critical acclaim. It was released by A24 in selected theaters
in the United States on June 27, before expanding nationwide on July 25.
The film grossed $3.3 million worldwide against a production budget of
nearly $1.5 million. For their acting, Victor was nominated for Best
Actress in a Motion Picture – Drama at the 83rd Golden Globe Awards. For
their filmmaking, Victor won Best Directorial Debut from the National
Board of Review and was nominated for Best Original Screenplay at the
31st Critics’ Choice Awards.
Deep Learning with Python, Third Edition makes the concepts behind deep
learning and generative AI understandable and approachable. This
complete rewrite of the bestselling original includes fresh chapters on
transformers, building your own GPT-like LLM, and generating images with
diffusion models. Each chapter introduces practical projects and code
examples that build your understanding of deep learning, layer by layer.
The third edition is available here for anyone to read online, free of
charge.
Neural nets have an inclination for recipe crafting that resembles
rewriting languages, but they do not incur the cost of searching and
matching facts against a database; reagents in rules are wired directly
to one another, like state machines.
Now, that’s a ridiculous (and arrogant) statement to make, of course,
but it is an ideal that we on the htmx team are striving for.
In particular, we want to emulate these technical characteristics of
jQuery that make it such a low-cost, high-value addition to the toolkits
of web developers. Alex has discussed “Building The 100 Year Web
Service” and we want htmx to be a useful tool for exactly that use case.
Websites that are built with jQuery stay online for a very long time,
and websites built with htmx should be capable of the same (or better).
Going forward, htmx will be developed with its existing users in mind.
If you are an existing user of htmx—or are thinking about becoming
one—here’s what that means:
Diderot built the Encyclopédie because he believed that organizing
knowledge properly could change how people thought. He spent two decades
on it. He went broke. He watched collaborators quit and authorities try
to destroy his work. He kept going because the infrastructure mattered,
because how we structure the presentation of ideas affects the ideas
themselves.
We’re not going to get a better internet by waiting for platforms to
become less extractive. We build it by building it. By maintaining our
own spaces, linking to each other, creating the interconnected web of
independent sites that the blogosphere once was and could be again.
This is a great question, and one I have put a lot of thought into, even
going so far as to put “Built to last forever” on the landing page.
While drafting a lengthy reply I realised that I’d never articulated
the design philosophy of Bear to anyone bar friends.
So, without further ado, here are the choices I made in designing and
building Bear with regards to longevity…
Triptych is three simple proposals that make HTML much more expressive
in how it can make and handle network requests.
If you are a practical person, you could say it brings the best of htmx
(and other attribute-based page replacement libraries, like turbo and
unpoly) to HTML. For the more theoretically-inclined, it completes
HTML’s ability to do Representational State Transfer (REST) by making it
a sufficient self-describing representation for a much wider variety of
problem spaces.
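For a concrete picture of what attribute-based page replacement looks like today, here is a minimal htmx snippet; the `/search` endpoint and `#results` target are hypothetical. Triptych's proposals would make this kind of declarative request handling native to HTML:

```html
<!-- A search box whose button issues a POST and swaps the response
     into #results, all declared as attributes rather than JavaScript. -->
<script src="https://unpkg.com/htmx.org"></script>
<input id="q" name="q" type="search">
<button hx-post="/search" hx-include="#q" hx-target="#results">Search</button>
<div id="results"></div>
```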
I’m building Scour, a personalized content feed that sifts through noisy
feeds like Hacker News Newest, subreddits, and blogs to find great
content for you. It works pretty well – and it’s fast. Scour is written
in Rust and if you’re building a website or service in Rust, you should
consider using this “stack”.
After evaluating various frameworks and libraries, I settled on a couple
of key ones and then discovered that someone had written it up as a
stack. Shantanu Mishra described the same set of libraries I landed on
as the “mash 🥔 stack” and gave it the tagline “as simple as potatoes”.
This stack is fast and nice to work with, so I wanted to write up my
experience building with it to help spread the word.
TL;DR: The stack is made up of Maud, Axum, SQLx, and HTMX and, if you
want, you can skip down to where I talk about synergies between these
libraries. (Also, Scour is free to use and I’d love it if you tried it
out and posted feedback on the suggestions board!)
We have developed sophisticated safety and security measures to prevent
the misuse of our AI models. While these measures are generally
effective, cybercriminals and other malicious actors continually attempt
to find ways around them. This report details a recent threat campaign
we identified and disrupted, along with the steps we’ve taken to detect
and counter this type of abuse. This represents the work of Threat
Intelligence: a dedicated team at Anthropic that investigates real-world
cases of misuse and works within our Safeguards organization to improve
our defenses against such cases.
The XY problem is asking about your attempted solution rather than your
actual problem: for example, asking how to get the last three characters
of a filename when what you actually want is the file extension. This
leads to enormous amounts of wasted time and energy, both on the part of
people asking for help and on the part of those providing help.
The United States has become the world’s biggest bully, threatening any
country that doesn’t do as it demands with tariffs, and its tech
companies are taking full advantage by flexing their muscle and trying
to avoid effective regulation around the world. The drawbacks of our
dependence on US tech companies have become more obvious with every
passing year, but now there can hardly be any denying that where we can
pry ourselves away from them, we should make the effort to do so.
Break up with the Big Tech cloud in this unique 1 x 6hr live training
intensive where you are wholly supported in the process of building up
your own sovereign, self-hosted and private infrastructure – free from
AI, shady data-sharing agreements and prying eyes. Running atop
renewable energy on a European Internet backbone, the cloud platform is
both tightly locked-down and easy to maintain.
Once upon a time, I taught a course on AI in the Library in an iSchool. (Check out the reading list and syllabus, which are relevant context for this post.) I haven’t taught it for a while — I got busy with other things, and then ChatGPT dropped and immediately rendered my syllabus obsolete and I wasn’t up for overhauling it.
At least, that was why I wasn’t teaching it in 2022-2024. In 2025 and 2026, there are some pretty different reasons not to be adjunct faculty, particularly in a public university, if you don’t have to be. Like an enormous Project-2025-based, federally directed but decentralized system to dictate syllabi, destroy academic freedom, and fire professors who don’t toe the line. (Examples unfortunately abound, such as University of Oklahoma, Texas A&M, San José State University — the last of these being the university where I taught.)
And in the midst of all this, here’s the (excellent) Texas Tribune reporting on ChatGPT prompts used at Texas A&M to flag course content that might take a perspective on race and gender. And so, with a rising sense of nausea, I may as well see how my syllabus would fare in this brave new world.
First (and startlingly), the prompts only cover very high-level course metadata: course title, description, and learning objectives.1 The week-by-week descriptions and reading list, or indeed the syllabus as a whole, aren’t covered, so these prompts don’t reach week 2’s case study of an AI metadata generation prototype in a Black photographer’s archive, or week 3 on the broad and heavily researched field of (ironically) algorithmic bias.
And so, the results:
Course Title. Unsurprisingly, “AI in the Library” raises no issues for the state of Texas (though of course chatgpt takes a great many words to be unconcerned about it).
Course Description.
This course covers what artificial intelligence is and critical analysis of AI systems, in the context of real-world library and archival AI applications. Programming skills not required.
To my surprise, chatGPT is pretty okay with this:
Review of value-neutrality
Advocacy or ideological language: None identified. The description does not include prescriptive claims, affective outcomes, or ideological commitments.
Potential neutrality concerns: The phrase “critical analysis of AI systems” is standard academic language and does not, on its own, imply advocacy. However, “critical” can be interpreted broadly; specifying analytical methods or perspectives can improve clarity without reducing neutrality.
Overall assessment: The description is value-neutral, concise, and appropriate, though it is shorter than the requested 75–125 words and could benefit from greater methodological detail.
I was sure that “critical analysis” would get more pushback (how could we possibly want that in higher education in 2026??), but I guess mostly I’m supposed to be more unnecessarily wordy, and I could keep waving this IYKYK flag.
Learning Outcomes.
Upon successful completion of the course, students will be able to:
Understand and explain the basics of AI: both its underlying principles and common machine learning techniques.
Discuss realistic ways that AI can be a part of library services.
Critically analyze potential pitfalls of AI systems, including the role of their underlying data sets and their ramifications in society.
Y’all. Despite some honestly fair ways chatgpt pointed out that these verbs are squishy and might be replaced with ones that are sharper and/or lend themselves more obviously to assessment, I absolutely slipped “critically analyze…their ramifications in society” right past it:
“Critically analyze potential pitfalls…” — This is largely value-neutral. The phrase “ramifications in society” is acceptable, though specifying analytical contexts (e.g., professional or institutional settings) would improve precision.
Pretty sure any antagonistic human reading learning objective number 3 there would know what I was getting at, but I guess chatgpt still can’t replace all the jobs.
There’s also a prompt for “course justification”, but I was not asked to write one of these for my syllabus. The closest it gets is a “core competencies” section aligning the course with learning objectives for the overall iSchool program. However, these objectives are governed by the iSchool, not the individual professors. Therefore I’ll be leaving the course justification prompt aside.
“Ghost kitchens” are pop-up restaurants geared entirely toward food delivery. They typically rent space in traditional restaurants to prepare food, take orders online, and deliver them to the doorstep via delivery apps like DoorDash or Uber Eats. Ghost kitchens proliferated during the COVID pandemic, which for a time practically extinguished dine-in food service. Restaurants of all descriptions needed to restructure their operations to scale up food delivery as their main service model; ghost kitchens were the extreme example, with the entire service model built around delivery.
The story of ghost kitchens is one of a specific business sector—restaurants—retooling traditional operational structures and service models to meet changing conditions in the marketplace. Gary Pearce, Director, Academic Services in the Monash University Library, touched on a similar theme in a recent OCLC Research Library Partnership webinar, describing how the Library reimagined its operational and service models to scale up research support capacities and better address institutional needs and priorities. As with ghost kitchens, Monash sought to reimagine its services in response to changing imperatives—specifically, the need to deliver research support at scale, within the confines of prevailing budgetary limitations. This situation will surely resonate with other research libraries, and there is much to be learned from Monash’s experiences and innovative solutions.
Retooling operational structures and service models
Academic Services is one of three portfolios at Monash University Library. To address the need to scale research support services and align more closely with stakeholder needs, Academic Services shifted from a traditional liaison librarian model organized on disciplinary lines to a functional specialization approach based on library expertise. This change moved away from multiple teams providing duplicate services to specific disciplines, in favor of agile, project-based service teams that work across disciplines.
A key aspect of Monash’s approach is the creation of a new Library Business Partner role, whose chief responsibility is strategic relationship management with senior leadership in a specific academic area. The Library Business Partner serves as a conduit for two-way communication between the Library and its academic stakeholders: on the one hand, communicating library messaging to the academic unit, and on the other, gathering intelligence and feedback on the unit’s needs and mobilizing capacity within the Library’s service teams to address them.
Pearce provided a rich description of how this retooling of operational structures and service models was conceived and implemented. Here are a few of the themes that emerged from his discussion:
Acknowledging relationship management as a dedicated role: A key innovation was the creation of the Library Business Partner role to manage outreach and engagement with academic units. The Library Business Partner represents the entire Library and therefore can provide a comprehensive view of Library capacities, as well as expedite responses to stakeholder needs. Separating relationship management from service delivery facilitated a shift from a reactive, transactional model to a more proactive, two-way partnership.
Emphasizing a culture of agility: Building a service model that was both scalable and responsive led Monash to adopt an agile approach. Academic Services implemented a matrix organizational structure in which staff have fixed reporting lines with flexible membership across multiple service teams—including research support. Staff have the option to rotate across teams to deepen expertise and experience. Work is divided between “business as usual” work and project work, the latter of which can be scaled up or down as needs and resource availability dictate. While this new operational structure could pose challenges to long-standing professional identities tied to traditional service models, it also opens up new pathways to leverage existing areas of expertise and develop new ones.
Close attention to change management: The new operational structures were a significant departure from previous models. In recognition of this, the development and implementation processes were characterized by consultation, transparency, and communication, including a series of consultative visits to peer institutions facing similar challenges in adapting service development and delivery; presentations to stakeholder groups; regular updates to staff, along with clear milestones and timelines; open channels for questions and feedback; and planning for professional development needs.
These are just some of the key themes that provide the foundation for Monash Library’s story of transformation, scalability, and responsiveness.
Managing implementation is crucial
Pearce’s presentation elicited many questions from the audience in attendance. Collectively, the questions reflected a keen interest in the implementation aspect of the shift to new operational structures and service models, touching on issues like:
Stakeholder response and buy-in: how researchers and staff reacted to service model changes; channels for communication and feedback
Staffing implications: impact of restructuring on staffing counts; work allocations between “business as usual work” and project work; cross-training opportunities
Strategic relationship management: interest in the details of how the new Library Business Partner role works in practice
The audience’s interest in these topics highlights that a shift from a traditional/subject-focused service model to a functional/specialization model requires attention to both structural innovation and the staffing and stakeholder reactions to significant organizational change.
Additional reading
Pearce’s webinar intersects with several OCLC Research studies that complement some of the themes emerging from the service model transformation experience. First, check out our work on social interoperability, which we define as the creation and maintenance of working relationships between individuals and organizational units within an institution. Our report describes strategies and tactics that can help strengthen social interoperability skills—an essential element of roles like Monash’s Library Business Partner. In addition, OCLC Research’s forthcoming work on the Library Beyond the Library—an operational principle that emphasizes the importance of the library engaging with the broader institutional environment through strategic alignment, collaboration, and storytelling—connects with Monash’s ambitions to retool its service model to better align with institutional research needs and priorities.
Ready to dive deeper? Listen to the full recording
The webinar and subsequent Q&A offered a richly informative look behind the scenes of a major shift in operational structures and service models to better address the needs of stakeholders. If you didn’t have a chance to join us for the live webinar, please take some time to view the recording. Many thanks to Gary Pearce for sharing his perspective with all of us!
This post was written by Cláudia De Souza, who attended the 2025 DLF Forum as the Grassroots Archives and Cultural Heritage Workers Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR.
Cláudia De Souza is an associate professor at the College of Communication and Information at the University of Puerto Rico. She teaches in the Graduate Program in Information Science, focusing on information organization and retrieval, with particular emphasis on the analysis, evaluation, and design of digital libraries and archives. She is also the academic coordinator of the UPR Caribe Digital project, an initiative dedicated to advancing Digital Humanities research and scholarship in and for the Caribbean. She advocates for and fosters open access to knowledge, supports digital preservation, and promotes the dissemination of documentary heritage. Her work is driven by a commitment to enhancing the visibility and accessibility of information resources across the Insular Caribbean.
Attending the 2025 DLF Forum for the first time was an exceptional experience, especially as the sole fellow selected for the Grassroots Archives and Cultural Heritage Workers program. I arrived with high expectations, and the event exceeded them, offering a rich environment for learning, collaboration, and professional growth. I had the chance to meet and engage with attendees from a wide range of institutions and backgrounds, including students, emerging professionals, librarians, and archivists. I am especially grateful to the selection committee for choosing such a diverse group of fellows. This connection and dialogue with new voices exposed me to a diversity of approaches in professional and academic practice, and to the value of building collaborative networks.
One of the challenges was choosing among the multiple sessions happening simultaneously over the three days. I decided to focus on the topics most closely related to my work at the University of Puerto Rico and that I am most passionate about in library and information science: open access, metadata, information organization and retrieval, community archives, and the visibility of digital collections.
Among the new tools and approaches I explored, the Marriott Reparative Metadata Assessment Tool (MaRMAT) stood out: an open-source application designed to support reparative metadata evaluations and processes of repair and justice in information. It allows librarians to identify harmful, outdated, or otherwise problematic language in tabular metadata using pre-curated or custom lexicons. I definitely plan to explore its use in the work we are developing with community groups in Puerto Rico as part of the UPR Caribe Digital project. I was also inspired by the presentation of Krystyna Matusiak, which demonstrated how digital collections can be expanded through exhibits that highlight the stories of underrepresented communities. Digital curation not only extends the reach of archives but also provides new opportunities to tell stories in an inclusive and meaningful way, showing how information organization can impact representation and access to collective memory. I leave the Forum inspired and motivated to apply these insights as part of the Digital Humanities initiatives we are planning at my institution, where they will serve as a model to follow.
I am grateful for the opportunity to participate and look forward to continuing these conversations and collaborations. I also hope that next year I will have the chance to submit a presentation to share with others all that I have put into practice. See you at DLF 2026!
This post was written by Amaobi Otiji, who attended the 2025 DLF Forum as a Student Fellow. The views and opinions expressed in this blog post are solely those of the author and do not necessarily reflect the official policy or position of the Digital Library Federation or CLIR. 2025 Student Fellowships were supported by a grant from MetaArchive.
Amaobi Otiji is pursuing his Master of Information at Rutgers University concentrating in the Technology, Information, and Management pathway. Prior to entering this program, Amaobi earned a bachelor’s degree in history from Howard University and has worked in roles involving federal collections, both digitized and born-digital. His professional interests center on digital curation, metadata development, and exploring new approaches to preserving and sharing underrepresented histories. He is focused on increasing equitable access to information and helping to shape how emerging technologies influence our cultural memory. In his spare time, Amaobi enjoys playing baritone ukulele, attending live theater, and playing video games.
Digital Memory Work Across Regions and Histories: Reflections on Community-Driven Projects
As I attended this year’s DLF Forum, I kept returning to two themes that resonated with me across many different presentations: community engagement and the quiet connective work that builds the infrastructure for it. Two of the sessions I attended during the conference stood out to me in particular because they approached these ideas from different perspectives but drew on similar underlying practices for tailoring their projects to their communities’ needs. These sessions were about the HBCU Digital Library Trust and the Borderlands storytelling initiative. The two projects work with different kinds of communities, rooted in distinct histories spread across North America. Yet both demonstrated how effective digital stewardship can be when community engagement is built into the planning of a project rather than treated as a final step at the end.
The Historically Black Colleges and Universities (HBCU) Digital Library Trust session outlined a fantastic model centered on providing long-term support for HBCUs and their unique archival histories. Their emphasis on shared ownership reflected a deep understanding of these institutions and their historical challenges in navigating limited resources, public scrutiny, and a society that too often worked against them. What stood out to me in particular was how intentional their approach felt. Rather than expecting HBCUs to adapt to them, they adapted to the HBCUs by meeting them where they were. The Trust, hosted by the Atlanta University Center’s Woodruff Library, focused on building capacity in ways that supported institutional autonomy and reflected the needs of the communities they were trying to serve. Their model felt refreshing, informed, and grounded in culturally aware practices that support long-term institutional resilience.
The Borderlands storytelling session approached community engagement from another direction. Their work seemed shaped by the layered histories and cultural dynamics found in the U.S.-Mexico Borderlands and by the University of Arizona’s position as a Hispanic-Serving Institution (HSI) with ties to the region. Their presentation drew from the movements, identities, and complex narratives that define the region, as well as from the data-intensive methods they were using to support the work. They spoke about their efforts in mapping, visualization, and other forms of data storytelling, which were central to how researchers were interpreting that complexity. What especially stood out to me was how they treated “place” as more than a backdrop. It functioned as a structure that was allowed to shape the research itself and set the terms for how they partnered with their researchers. It felt rooted in the region in a way that kept the work responsive, so that it could move across research, teaching, and student engagement while still staying grounded in the histories and contexts that give it direction.
Across both of the sessions, I found myself thinking about how our community and networks shape our digital memory work every day. The HBCU Digital Library Trust and the Borderlands initiative each operate within distinct historical and cultural environments, yet they are undeniably interconnected through their commitment to engagement shaped by the histories and needs of the communities they serve. Together, these sessions were a great illumination of how digital memory work is always anchored somewhere and shaped by places, relationships, and shared histories that give it meaning. For those hoping to steward this work, our role is to listen closely enough that those anchors guide the paths we build.
This is the one thousandth post to this blog in the 212 months since the first post. That is an average of 4.7 posts per month, or just over one per week, which is my long-term goal for the roughly half my time that isn't taken up with grand-parenting.
The 1000 posts have gained over 6.88M page views, 7.6% of which were for my EE380 Talk. Less publicized but popular posts get around 30K page views, well above the 6.9K average.
The only one of these statistics that I care about is the goal of a post a week. Having an audience is nice when it happens, but that's not why I'm writing. I write for myself, to understand not necessarily to communicate. Despite this, I'd like to thank those who read and comment.
Win free books from the January 2026 batch of Early Reviewer titles! We’ve got 227 books this month, and a grand total of 2,976 copies to give out. Which books are you hoping to snag this month? Come tell us on Talk.
The deadline to request a copy is Monday, January 26th at 6PM EST.
Eligibility: Publishers do things country-by-country. This month we have publishers who can send books to Canada, the US, the UK, Australia, Belgium, Netherlands, Poland, Latvia, Lithuania, Luxembourg and more. Make sure to check the message on each book to see if it can be sent to your country.
Thanks to all the publishers participating this month!
Hi DLF Community! I’ve really enjoyed connecting with so many of you at the Forum and in Working Group meetings over the past few months. Every conversation, formal and informal, has left me inspired and grateful to be part of a community fueled by care, creativity, and collaboration. I’m excited to keep learning with you all and stay in touch. If you want to chat one-on-one, just send me an email anytime ([email protected]). Wishing you an encouraged, grounded start to 2026!
Closing Today, Call for Proposals: Digital Library Pedagogy Group (DLF Teach) has extended its toolkit CFP through end of day today; submit here.
Metadata Quality Benchmarks Announced: DLF-AIG Metadata Assessment Working Group (MWG) is pleased to announce the public release of benchmarks for metadata quality. Read about them on the DLF Blog.
Fellow Reflections: Starting tomorrow (January 6), look for reflections from the 2025 DLF Forum fellows on the DLF blog. Read past fellow reflections here.
Office closure: CLIR’s offices are closed on Monday, January 19, for Martin Luther King Jr. Day.
This month’s open DLF group meetings:
For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), bookmark the DLF Community Calendar. Meeting dates are subject to change. Can’t find the meeting call-in information? Email us at [email protected]. Reminder: Team DLF working days are Monday through Thursday.
DLF Digital Accessibility Working Group (DAWG): Tuesday, 1/6, 2pm ET / 11am PT
DLF Born-Digital Access Working Group (BDAWG): Tuesday, 1/6, 2pm ET / 11am PT
DLF AIG Cultural Assessment Working Group: Monday, 1/12, 1pm ET / 10am PT
AIG User Experience Working Group: Friday, 1/16, 11am ET / 8am PT
I’m a few days late with this post. Things have been busy with visiting family for the holidays, and we’ve been dealing with various house things since we moved. Of course, those are just excuses, since I’m managing to write this while I have a cold. Some people may have noticed that I didn’t write … Continue reading "2025 Blog Year in Review + Brief Update"
Developing a long-term illness, whether chronic or acute, is like being dropped into a country completely unfamiliar to you. You don’t know the language, the customs, the cuisine, the people. You feel alone, isolated, and totally out of your depth. Eventually, you start to learn the language, the customs. You find community, fellow travelers, people who can help you understand your new life better. It doesn’t stop being hard, but the learning curve becomes less steep and the isolation less intense.
However, unlike when you’re immersed in a new country and culture, you’re falling into this new place and experiencing that painfully slow acculturation while you’re trying to still live your regular life in parallel. You’re expected to be a good parent, partner, family member, friend, employee, housekeeper, bill-payer, etc. But you’re living in two different realities now and because of that, it’s easy to feel alienated from your regular life, especially if you don’t feel like you can bring that other part of yourself to your interactions at work, at home, or out with friends. The cognitive dissonance can be jarring.
It’s hard enough to live that double life, but adding in the vagaries of seeking out a diagnosis and often not being believed, not to mention coping with the symptoms themselves, can make life feel completely untenable. Before my autoimmune diagnosis, I spent more than a year seeing medical professionals who didn’t believe there was anything wrong with me other than the normal discomforts of aging. I kept asking doctors if they thought my symptoms could be autoimmune and was told “no” over and over again, though that felt wrong to me. One PA suggested that some of my symptoms might stem from anxiety since I have a history of anxiety (like I wouldn’t know at this point what anxiety feels like). Within five minutes of talking with the first rheumatologist I saw, after waiting five months for the appointment, he said to me “this doesn’t sound rheumatological at all.” Luckily he still did all the standard testing which showed that he was very wrong. But not being believed by so many doctors for so long stays with you. It leaves a scar. Every time I see a doctor now, I feel like I’m going to court and I’m ready to be cross-examined, to be picked apart. I’m a bundle of nerves.
And my experience is painfully common, especially for women, as poet Meghan O’Rourke writes in her amazing book about chronic illness, The Invisible Kingdom (a meditation on and journalistic exploration of chronic illness and how it is positioned in our social fabric):
And so it is a truth universally acknowledged among the chronically ill that a young woman in possession of vague symptoms like fatigue and pain will be in search of a doctor who believes she is actually sick. More than 45 percent of autoimmune disease patients, a survey by the Autoimmune Association found, “have been labeled hypochondriacs in the earliest stages of their illness.” Of the nearly one hundred women I interviewed, all of whom were eventually diagnosed with an autoimmune disease or another concrete illness, more than 90 percent had been encouraged to seek treatment for anxiety or depression by doctors who told them nothing physical was wrong with them. (p. 103)
Once I got on meds for my condition that finally started working (most first-line meds for autoimmune conditions take three months on average to produce any effects on the immune system), I thought I was past the worst of it. Other than occasional much smaller flares, I was essentially in remission. I learned my limits. I protected my spoons with my life. I kept my stress low. I felt like I had it figured out. I felt good even. And then I got sicker with new symptoms that have stolen so much from me, including my sleep. The past 10 months have honestly been a nightmare with a carousel of doctors who all have completely different theories of what is going on and with a condition that is constantly evolving so what they see in the moment isn’t the full picture. And each doctor I’ve seen hyperfocuses on something different and ignores every other aspect of my case. It’s all making me feel like I’m going crazy.
Each wrong diagnosis brought me to another country, another reality, another identity, that I lived in for a short while. And in each of these countries, I spent countless hours learning, learning, learning all I could, going down research and subreddit rabbit holes, and spending way too much money on products that did nothing because none of those diagnoses were correct. The dermatologist I see who specializes in autoimmune conditions at a research university seems to have given up even trying to diagnose me and she’s basically my last best hope in the state. She wouldn’t even give me differential diagnoses last time beyond “it’s clearly autoimmune given your systemic symptoms.” She’s the one who has put me on a serious immunosuppressant with debilitating side effects, which I guess at least shows she’s taking it seriously, but if she doesn’t know what she’s treating me for, how does she know that this drug that is making me feel terrible is even going to help? Atul Gawande said of doctors that “nothing is more threatening to who you think you are than a patient with a problem you cannot solve” (quoted in The Invisible Kingdom, page 209) and I feel that in how I’m treated at every appointment I go to. With the exception of the charlatan immunologist who was desperate to diagnose me with MCAS though there was no evidence in support of it, not one doctor has seemed at all interested in figuring out what this is – it’s felt more like a game of “not it.”
Of a similar liminal moment in her own illness, O’Rourke wrote, “in my illness I was moored in an unreachable northern realm, exiled to an invisible kingdom, and it made me angry. I wanted to rejoin the throngs. In dark moments I continued to wonder if the wrongness was me” (p. 99). And of course people will feel like that wrongness is them when we live in a culture that views chronic illness as some sort of weakness or something we caused through our own bad habits. Like O’Rourke, I feel both exiled from my world and forced to be in it at the same time, which is a unique form of torture. Going to work, pretending things are ok, doing my job, meeting deadlines, helping students, smiling, all the while my body is attacking itself, I’m barely sleeping, I’m spontaneously bleeding from my skin and under my skin, and I’m so itchy I sometimes have to wear gloves to bed or I’ll scratch myself raw in my sleep. You feel like you’re play-acting being yourself, being a person in the world, because you’re not really there anymore. And when you’re suffering, and don’t know what your illness is, and you feel abandoned, it’s easy to go down rabbit holes of self-loathing along with those rabbit holes of fruitless research that make you feel like an unhinged obsessive with a murder board and yarn. As Meghan O’Rourke wrote, “your sense of story is disrupted” (p. 259) and you feel like a stranger to yourself.
I started writing about slow librarianship long before I got really sick, but even then, I knew the importance of fostering a work culture where you can be a whole person. I knew how it felt to have a child and feel like you couldn’t prioritize family obligations over work ever (though working during family time? Totally ok, right?). In a workplace that encourages people to be whole people, workers feel like they can prioritize the things in their lives outside of work that are important – their caregiving responsibilities, their health, the people they love, etc. They feel like they can talk about these things – that they don’t make them liabilities. They can be vulnerable and real. And feeling like you can be vulnerable and real about who you are and how you’re doing means that you can also be vulnerable and real in your work, which makes us better employees who are energized to try new things.
I think a lot of people in positions of power might even want a culture like this, but very few actively create it. They might think that saying “take what time you need” when someone is facing illness is enough. But I think two pieces are missing from this. First, managers need to not only say “take what time you need” but work with their direct reports to address the work that would otherwise pile up. If you say “take all the time you need” but all the work with its stressors and deadlines is still there, you’re not really giving people space. Can you take good care of yourself while you watch the work pile up and up and up? How many of us have come back to work while still not fully recovered from an illness because of the work that was piling up or a class they needed to teach?
Also, managers need to model vulnerability, transparency, and being whole people themselves. If they put up a false front of strength, if they’re not willing to be vulnerable and human and real themselves, if they do not model transparency, there’s no way that others will feel safe doing so. I was lucky to have a boss in my first academic library job who was deeply human in her interactions with her employees, so I got to see what that looked like. And it was her humanity that engendered fierce loyalty in her employees – we all thought the world of her. Even when she made decisions that people didn’t like (which was rare as she really did take our insights to heart), she explained her thinking in a transparent way. Given my later experiences, her way of being feels vanishingly rare. I think a lot of managers feel like they need to project strength, not explain their decisions, not let their direct reports really know them as people with full lives, but I don’t think that’s true. A lot of managers operate out of a place of fear or insecurity, but my first academic library director was confident enough to be her full self, flaws and all.
In a culture where we don’t feel like we can bring ourselves fully to work, I don’t feel like I can talk about my illness. I feel selfish and weak for even considering it. Like, we all have shit going on, right? The world is pretty awful right now. People’s lives are complicated and messy and there’s probably a lot of suffering I know nothing about happening all around me. If they don’t talk about it, who am I to talk about it? I’m not special. While I’ve mentioned being sick at work in the context of being immunocompromised and needing to protect myself and not participate in large, crowded events, even that has felt really uncomfortable. Everyone should feel like they are important enough to their places of work and valued enough to bring up these things without feeling embarrassed or like they’re asking for “special treatment.” I read recently (can’t remember the source) that close to 60% of people with chronic illnesses have not told people at work about it. Imagine hiding something that is such a pivotal and ineluctable piece of many people’s identities and think about the double life that forces them to live.
At work I feel a lot of shame about being sick and I work more than I should given how I’ve been feeling (this is common). I feel like I need to mask how I’m really doing, that people don’t want to hear it. And it’s true that most don’t. There are three people at work I can talk to about my illness, but others seem so incredibly uncomfortable when I mention it, so I’ve learned to just pretend it’s all ok or say nothing. I know that some of my reticence and shame comes from my own internalized ableism as it’s the water we all swim in, but when I worked in that library where the culture encouraged vulnerability, humanity, and care, I remember how different it felt. How much less distance there was between the person I really was and the person I was at work.
In a meeting last year, our Dean was talking about making the next all-library meeting in-person only. Previously, they had always offered them hybrid, but she didn’t like that most people were choosing not to come in-person. And I totally get it, even though it sucks to always be the outlier. She wants it to be a team-building experience and that’s really hard to do when most people are participating from home with their cameras off. At that meeting, for the first time, I disclosed my illness in front of a bunch of people and talked about how important it is to always offer an online option for folks who are medically vulnerable or at least find ways to make indoor spaces safer for those who can’t afford to get sick. My boss then asked to meet with me to talk about how to make our spaces safer. I talked about airflow, encouraging and providing masks, and, during temperate months, having the meetings at places where windows and doors can remain open or even holding them outside (we’d had one meeting at a park a few years ago which had been the best one ever from both a health and team-building perspective). I suggested that we make the Winter meeting fully virtual since it’s the height of flu season and you can usually get people participating more in a fully virtual meeting than a hybrid one, where the online people feel like weird lurkers. I was thanked for my feedback and didn’t hear anything after that. This September, the all-library meeting was held in our campus library where the windows don’t open (albeit with a couple of HEPA filters scattered around, but I know we do have large college spaces where doors and windows can be opened because I went to an all-day union meeting in one where they did just that and we’d used that space for library meetings in the past) and our February meeting is going to be held in-person during the worst flu season in decades. 
Obviously, none of this was personal or intended to cause harm, but, at the same time, how should I feel under the circumstances? Clearly speaking out in that meeting, something that I had to really steel myself to do, had been pointless. Why would I ever bring it up or ask for anything again?
When we come back from winter break, people inevitably ask each other how it was, but do they really want to know? I think they want to hear “good,” “fun,” “restful,” etc. How do you talk about a “vacation” in which you spent most of it doubled over in pain after eating anything thanks to the toxic meds you have to take and your father-in-law was in the hospital dying the entire time? I wish that I felt I could share what an absolute shitshow it’s been, to feel like I could be a full person at work, but when you know no one really wants to hear it, when it just makes people uncomfortable, it becomes so much easier to smile and say “it was good!” and move on. Who wants to be a buzzkill?
Slow librarianship puts worker well-being over productivity and deadlines, allows workers to be whole people at work, and supports a culture of care. While a radical idea, this even makes good business sense because depleted and burned out workers have been shown to be a major drain on the organization and negatively impact the culture. If you’re a manager and you’re not actively fostering a culture where people can bring their whole selves to work, then you are fostering a culture where people do not feel safe being vulnerable and having needs. Workers, if you know a colleague is struggling with something (an illness, losing a loved one, a difficult caregiving situation, etc.) and you don’t check in to see how things have been going for them, you’re sending the message that you don’t want to hear about these things, that they make you uncomfortable, that they’re not appropriate to take to work. I think we’ve all probably been guilty of this at some point in our lives and maybe we even thought that not asking was the right thing to do. I can imagine some people think that asking is invasive or reminds the person that they are sick, but it is an expression of care. As Philip Hoover writes in his excellent Sick Times article entitled “You know someone with Long COVID. They need you to ask about it genuinely”:
Approach us with empathy and curiosity. Ask questions that show a sincere desire to understand. How are you feeling this week? works because it acknowledges the chronic, fluctuating nature of my health. Or a version of Nunez’s question, and one I’ve longed to be asked since I’ve been ill: What has this been like for you?
If our response is tough to hear, try not to smother it in optimism. And tread lightly. While some long-haulers may appear okay in public, much of our suffering occurs in private, shade-drawn rooms, across lonely afternoons, stuck in bed. When in doubt, remember that the act of asking never hurts — but never being asked certainly does.
While I don’t have Long COVID, this piece so perfectly encapsulates how I feel as someone with a mostly invisible chronic illness who would just love to be asked “how are you feeling this week?” instead of feeling like I have to pretend I’m okay. I told my manager at the start of Fall term that my autoimmune disease had become significantly worse and that I didn’t know what my capacity might look like going forward. She expressed sympathy, told me to take what time I needed, and never checked in with me after that. That September, I was taking over a very important committee chair role that was made enormously more time-consuming and onerous by the departure of the three colleagues most involved in supporting this work (two of whom had more than 20 years of institutional knowledge locked up in their heads). I didn’t feel like I had the leeway or support to let things drop as more and more kept piling up on my plate related to my chair role and it became clear that I was expected to do a lot of onboarding for the new people in this role even though I was new to my role and was no one’s manager. While I know my boss is extraordinarily busy, the message that not checking in with how I was doing sent me was very different from what I assume she’d wanted to convey. Checking in with a colleague or direct report seems like such a small thing, and it is in terms of the effort it requires, but the impact it can have in making someone feel cared for and less like they have to live a painful double life is enormous.
We have been at this location for over 20 years, formerly as Saigonese
Restaurant. The Wheaton area has changed dramatically over time, as
have our customers and their preferences. Our restaurant became more
widely known as a good place to enjoy Vietnamese cuisine. After
serious consideration, and with kind suggestions from our customers,
we rebranded to Pho and Banh Mi Saigonese.
Standard Ebooks is a volunteer-driven effort to produce a collection of
high quality, carefully formatted, accessible, open source, and free
public domain ebooks that meet or exceed the quality of commercially
produced ebooks. The text and cover art in our ebooks are already
believed to be in the U.S. public domain, and Standard Ebooks dedicates
its own work to the public domain, thus releasing the entirety of each
ebook file into the public domain. All the ebooks we produce are
distributed free of cost and free of U.S. copyright restrictions.
Every culture produces heroes that reflect its deepest anxieties. The
Greeks, terrified of both mortality and immortality, gave us Achilles.
The Victorians, haunted by social mobility, gave us the self-made
industrialist. And Silicon Valley, drunk on exponential curves and both
terrified and entranced by endless funding rounds, has given us the Hero
Developer: a figure who ships features at midnight, who “moves fast and
breaks things,” who transforms whiteboard scribbles into billion-dollar
unicorns through sheer caffeinated will.
We celebrate this person constantly. They’re on the front page of
TechCrunch et al. They keynote conferences. Their GitHub contributions
get screenshotted and shared like saintly relics.
Meanwhile, an unsung developer is updating dependencies, patching
security vulnerabilities, and refactoring code that the Hero Developer
wrote three years ago before moving on to their next “zero to one”
opportunity.
They will never be profiled in Wired.
But they’re doing something far more important than innovation.
No app did exactly what I needed, so I built my own personal finance
system using plain-text accounting principles and a powerful Python
library called Beancount. This post shows you how I handle imports,
investments, multi-currency, and a two-person view.
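For readers unfamiliar with plain-text accounting: the post's own
ledger isn't shown here, but a minimal Beancount file generally looks
like the sketch below, with accounts opened by date and every
transaction balancing to zero across its postings. The account names
and amounts are hypothetical, not taken from the author's setup.

```beancount
2026-01-01 open Assets:Checking          USD
2026-01-01 open Expenses:Food:Groceries  USD

2026-01-05 * "Grocer" "Weekly shopping"
  Expenses:Food:Groceries   42.50 USD
  Assets:Checking          -42.50 USD
```

Because the format is plain text, imports can be scripted, the ledger
can live in version control, and two people can review the same file
side by side.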
Today, Ruby 4.0 was released. What an exciting milestone for the
language!
This release brings some amazing new features like the experimental
Ruby::Box isolation mechanism, the new ZJIT compiler, significant
performance improvements for class instantiation, and promotions of Set
and Pathname to core classes. It’s incredible to see how Ruby continues
to thrive and be pushed forward 30 years after its first release.
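As a tiny illustration of one of those promotions: with Set as a core
class, code like the following needs no `require "set"` line. (This is
a minimal sketch; on recent 3.x releases Set was already autoloaded,
so the behavior is the same there.)

```ruby
# Set is a core class, so no `require "set"` is needed.
# Duplicate elements are collapsed automatically.
unique_tags = Set.new(%w[zjit box set zjit])
puts unique_tags.size # => 3
```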
To celebrate this release, I’m happy to announce that I’ve been working
on porting the Charmbracelet Go terminal libraries to Ruby, and today
I’m releasing a first version of them. What better way to make this Ruby
4.0 release a little more glamorous and charming?
I remembered that people have been working towards documenting the Mac
ROM startup tests and using them to diagnose problems, so I decided to
give it a shot and see if Apple’s Serial Test Manager could identify my
Performa’s issue. Where was the fault on this complicated board? Sure, I
could test a zillion traces by hand, but why bother when the computer
already knows what is wrong?
1925 was a watershed year for the recording industry. The Jazz Age was
in full swing and beginning in March, 1925, widespread adoption of
electrical recording meant greater fidelity and new realism for
recordings. DAHR documents over 11,000 recordings made in 1925, an
all-time high for the industry.
This is a screen recording of a 60 Minutes segment about the Centro de
Confinamiento del Terrorismo (CECOT) prison in El Salvador, which was
intended to air December 22, 2025 but was pulled at the last minute
for unclear reasons. Despite being pulled, it aired on Global-TV in
Canada anyway.
Cormac McCarthy, one of the greatest novelists America has ever produced
and one of the most private, had been dead for 13 months when I arrived
at his final residence outside Santa Fe, New Mexico. It was a stately
old adobe house, two stories high with beam-ends jutting out of the
exterior walls, set back from a country road in a valley below the
mountains. First built in 1892, the house was expanded and modernized in
the 1970s and extensively modified by McCarthy himself, who, it turns
out, was a self-taught architect as well as a master of literary
fiction.
I was invited to the house by two McCarthy scholars who were embroiled
in a herculean endeavor. Working unpaid, with help from other volunteer
scholars and occasional graduate students, they had taken it upon
themselves to physically examine and digitally catalog every single book
in McCarthy’s enormous and chaotically disorganized personal library.
They were guessing it contained upwards of 20,000 volumes. By
comparison, Ernest Hemingway, considered a voracious book collector,
left behind a personal library of 9,000.
uv is fast because of what it doesn’t do, not because of what language
it’s written in. The standards work of PEP 518, 517, 621, and 658 made
fast package management possible. Dropping eggs, pip.conf, and
permissive parsing made it achievable. Rust makes it a bit faster still.
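To make that standards work concrete: PEP 518 lets a project declare
its build backend statically, and PEP 621 moves project metadata into
`pyproject.toml`, so a resolver can read names, versions, and
dependencies without executing any setup code (and, with PEP 658,
without even downloading the full wheel). A minimal sketch, with a
hypothetical package name:

```toml
# PEP 518: declare the build backend statically
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

# PEP 621: static project metadata, readable without running code
[project]
name = "example-package"
version = "0.1.0"
dependencies = ["requests>=2.31"]
```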
The Post-Platform Digital Publishing Toolkit is an iterative digital and
print publication by Well Gedacht Publishing exploring how to overcome
the limitations of digital publishing on social media and other online
platforms, and advocates for self-hosted infrastructures and practices
for artists and artists’ book publishers. You can find the first
iteration of the print publication here.
Giambattista Vico (born Giovan Battista Vico /ˈviːkoʊ/; Italian:
[ˈviko]; 23 June 1668 – 23 January 1744) was an Italian philosopher,
rhetorician, historian, and jurist during the Italian Enlightenment. He
criticized the expansion and development of modern rationalism, finding
Cartesian analysis and other types of reductionism impractical to human
life, and he was an apologist for classical antiquity and the
Renaissance humanities, in addition to being the first expositor of the
fundamentals of social science and of semiotics. He is recognised as one
of the first Counter-Enlightenment figures in history.
Commercial robots have widespread and exploitable vulnerabilities that
can allow hackers to take over within hours or even minutes, according
to Chinese cybersecurity experts.
Security in the robotics industry is “riddled with holes,” said Xiao
Xuangan, who works at Darknavy, an independent cybersecurity research
and services firm based in Singapore and Shanghai. Xiao noted that when
testing low-level security issues in quadruped robots, his team gained
control of one of Deep Robotics’ Lite-series products in just an hour.
Better late than never, right? This review of WS-DL's 2024 is incredibly late, but some family concerns delayed my writing and then I never quite got back on track. We had quite a productive 2024, graduating a record three MS students and four PhD students.
Students and Faculty
We did not add or lose any faculty this year, but Dr. Jayarathna received tenure early! Congratulations to Sampath!
We graduated four PhD students:
Bathsheba Farrow defended her dissertation on 2024-06-28. Bathsheba already had a position with the Naval Surface Warfare Center, and will continue with them after her graduation.
Muntabir Hasan Choudhury defended his dissertation on 2024-11-06. Muntabir took a research fellow position at the Food and Drug Administration (FDA).
Congratulations to Dr. Bathsheba Farrrow @sheissheba for successfully defending her dissertation "A Microservices Approach to EEG Research in the Public Cloud" She is the very first student from the @NirdsLab research group @WebSciDL @oducs to defend the Ph.D. pic.twitter.com/K5wDRa7qu4
Congratulations to Dr. Yasith Jayawardana @yasithdev for successfully defending his doctoral dissertation "A Realtime Biosignal Processing Framework for Lab Scale Experimentation." Second Ph.D to come out of @NirdsLab research team @WebSciDL @oducs @ODUSCI. pic.twitter.com/zMYQ9Tq6KP
Dominik Soós defended his MS thesis and joined the PhD program.
Lesley Frew defended her MS thesis and joined the PhD program.
My student Dominik Soós successfully defended his Master's thesis today. His thesis title is "Who Wrote the Scientific News? Improving the Discernibility of LLMs to Human-Written Scientific News". Thanks @vikas_daveb @Meng_CS. Dominik will be a PhD student in Fall 2024 @WebSciDL pic.twitter.com/7yVXdoOT0w
To generate our annual publication list, we use our tool "Scholar Groups", which scrapes Google Scholar profiles and merges and deduplicates publications. As such, our list is limited by the accuracy of Google Scholar, which is pretty good but not perfect. It looks like for 2024 we published about 28 refereed conference papers, 12 journal articles, and one patent (congratulations, Dr. Ashok!).
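The deduplication step such a tool needs can be sketched roughly as
follows; this is a hypothetical illustration of merging by normalized
title, not Scholar Groups' actual code:

```python
import re

def normalize(title):
    """Collapse case, punctuation, and whitespace so near-identical
    titles from different Scholar profiles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def merge_publications(profiles):
    """Merge publication lists from several profiles, keeping the
    first occurrence of each normalized title."""
    seen, merged = set(), []
    for pubs in profiles:
        for pub in pubs:
            key = normalize(pub["title"])
            if key not in seen:
                seen.add(key)
                merged.append(pub)
    return merged
```

A scheme like this tolerates the small formatting differences (extra
whitespace, trailing punctuation) that Scholar profiles often
introduce, though it would still miss retitled versions of the same
paper.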
Conferences have mostly returned to f2f, but frequently there are still virtual/remote options. Below is a partial list of trip reports for the events where we presented our work:
In addition to the conferences and workshops listed above where we presented papers, we also had an array of presentations and outreach.
We hosted the third and final year of our NSF Disinformation Detection and Analytics Research Experiences for Undergraduates (REU) Program. The cohort of the third year was especially successful, as reflected in the mid-point and final presentations. As you can see from the list above, many of the REU projects resulted in publications.
We also had our 5th annual "Trick or Research" event. Dr. Jayarathna initiated the event five years ago and the entire department has embraced it.
Dr. Jayarathna gave an invited talk at Ocean Lakes High School (2024-10-16)
Good to see our friend Professor Herzog @maherzog during his annual visit @odu. We had a nice lunch at the Webb Center, got a chance to exchange some ideas, and met our @NirdsLab @WebSciDL students during the weekly lab meeting. pic.twitter.com/Ry0rr99KzJ
Our scholarly contributions are not limited to conventional publications or presentations: we also advance the state of the art through releasing software, data sets, and proof-of-concept services & demos. Some of the software, data sets, and services that we either initially released or made significant updates to in 2024 include:
Dr. Jayarathna was Senior Faculty Fellow, Office of Naval Research Summer Faculty Program, at Naval Surface Warfare Center Dahlgren Division, Dam Neck Activity during the summer
I am honored to be awarded the 2024 Provost's Outstanding Undergraduate Research Mentor of the Old Dominion University. Thanks President Hemphill and Provost Agho. @WebSciDL @oducs @ODUSCI @ODU pic.twitter.com/sAl2xiM2Dh
2024 was a strong year for us: four PhD and three MS students graduated, three new PhD students, and six PhD students advancing their status. One of our faculty members, Dr. Sampath Jayarathna, received tenure. We continued to publish in prestigious venues, with about 40 refereed publications. We helped generate just over $7M in new external funds, from six different grants. WS-DL continues to grow and thrive, and I am proud of all the members and alumni and their progress in 2024.
If you would like to join WS-DL, please get in touch. To get a feel for our recent activities, please review our previous WS-DL annual summaries (2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, and 2013) and be sure to follow us at @WebSciDL to keep up to date with publications, software releases, trip reports and other activities. We especially would like to thank all those who have publicly complimented some aspect of the Web Science and Digital Libraries Research Group, our members, and the impact of our research.
–Michael
Congratulations to all three Dr. J's from the @NirdsLab: Dr. Bathsheba Jackson @sheissheba, Dr. Gavindya Jayawardena @Gavindya2, and Dr. Yasith Jayawardana @yasithdev. @WebSciDL PhD Crush board updated, just waiting on the border of the alumni cloud until the paperwork is done!
After the defense, @lesley_elis continued @WebSciDL's tradition of students sharing a hometown meal with us. Lesley's Philly roots showed up in the cheesesteaks (made authentic with Cheez Whiz), @Tastykake, birch beer soda, and @GrittyNHL decorations. Congrats again, Lesley! 🎉🎓
I’m glad we’ve reached a new Public Domain Day, and that the works I’ve been featuring in my #PublicDomainDayCountdown, and many more, are now free to copy and reuse. I’ve been posting about works joining the public domain in the United States, which include sound recordings published in 1925, and other works published in 1930 that had maintained their copyrights. (Numerous works from 1930, and later, that had to renew their copyrights, and did not, were already in the public domain, though many of the best-known works did renew copyrights as required.) This is the eighth straight year Americans have seen a year’s worth of works join the public domain, after a 20-year freeze following a 1998 copyright extension.
I intend my countdown not just to celebrate the works joining the public domain, but also to celebrate what people have done with those works. In some posts, I note later creations based on those works. In nearly all my posts, I link to things that people have written about those works. Like the works themselves, those responses may have flaws or quirks, but I value them as human reactions to human creations. Whether they’re reviews, personal blog posts, professionally written essays, scholarly analyses, or Wikipedia articles, they’re created by people who encountered an interesting work and cared about it enough to craft a response to it and share it with the world. Those shared responses in turn pique my interest in the writers and the works.
It wasn’t always easy for me to find such responses online. Sometimes I’d go searching for responses to a promising-sounding work, and only find sales listings on e-commerce sites, social media posts not easily linkable or displayable without logging into a commercial platform, paywalled articles that many of my readers can’t view, or generic-sounding pages that read like they were generated by a large language model or a content farm, but not by anyone who I could clearly tell cared about or even read the work in question. Some works I initially hoped to feature got left off my countdown, replaced by other works where I could more readily link to an interesting response.
The people publishing the responses I link to are often swimming against a strong current online. Many online writing systems, including the one I've been writing these posts on, now urge their users to "improve" their posts by letting "AI" write them. Some writers may be tempted to allow it when facing an impending deadline, writer's block, or anxiety, even though the costs can include muffling one's own voice, signing onto falsehoods confidently stated by a stochastic text generator, or abusively exploiting existing content and services. Other writers may feel pushed to put their work behind paywalls or other access controls that make it less likely to be plagiarized or aggressively crawled by those same "AI" systems. And most writers, myself included, find it easy to dash off a quick short take on a social media platform, be quickly gratified by some "like"s, and then have it forgotten. It's harder to take the time to craft something longer and more thought-out that will be readable for years, and whose appreciation may take much longer to reach us. The easy alternatives can discourage people from devoting their time to better, more lasting creations.
As I’ve noted before, both copyright and the public domain serve important purposes in encouraging the creation, dissemination, sharing, and reuse of literature and art. One reason I write my public domain posts is to promote a better balance between them, particularly in encouraging shorter copyright lengths to benefit both original creators and the public. Similarly, as I’ve noted in another recent post, I value both human creation and automated processes, but I increasingly see a need to improve the balance between those as well, especially as some corporations aggressively push “generative AI”. While I appreciate many ways in which automation can help us create and manage our work, I treasure the humanity that people thoughtfully put into the creation of literature and art of all kinds, and the human responses that those creations elicit.
Today I’m thankful for all of the people, most no longer with us, who made the works that are joining the public domain today. I’m thankful for the new opportunities we have to share and build on those works now that they’re public domain. I’m thankful to all the people who have responded to those works, whether as brief reactions or as new works as ambitious as the works they respond to. And I hope you’ll keep making and sharing those responses with the world when you can. I look forward to reading them, and perhaps linking to them in future posts.
Happy Public Domain Day! You might hear people say that books published in 1930 have "fallen" into the US Public Domain, or, that they have lost copyright "protection". This is not quite correct. Rather, books published in 1930 have been FREED of copyright restrictions. They have ASCENDED into the public domain and into the embrace of organizations like Project Gutenberg. They now belong to ALL of us, and we need to take care of them for future generations.
On October 21, Project Gutenberg lost its longtime leader, Greg Newby, to pancreatic cancer. I had agreed to step up as Acting Executive Director so that Project Gutenberg could continue the mission that had become Greg's life work: to serve and preserve public domain books so that all of us can use and enjoy them without restrictions. Although I've been doing development work for Project Gutenberg for the past 8 years, I did not really understand what Greg's job entailed, or how many tasks he had been juggling. Three months in, I'm still discovering mysterious-to-me aspects of the organization. I've also been amazed at the dedication and talent of the many volunteers behind Project Gutenberg and our sister organization, Distributed Proofreaders. And at the large number of donors who make the organization financially viable and sustainable. So as of 2026, with your support, I'm continuing as Executive Director.
In the past three months Project Gutenberg has proven to be resilient; we took a heavy blow and managed to keep going. My top priority going forward is to make Project Gutenberg even more sustainable as well as resilient. In other words, my job is to be one runner in a relay race: take the baton and make sure I get it to the next runner. That's what we all have to do with public domain books, too. We want them to still be there in 50 years! Whether you're already a volunteer or booster, an avid reader, or just someone curious about what we do, I hope you'll help us pass that baton.
showed that (i) a successful block-reverting attack does not necessarily require ... a majority of the hash power; (ii) obtaining a majority of the hash power ... costs roughly 6.77 billion ... and (iii) Bitcoin derivatives, i.e. options and futures, imperil Bitcoin’s security by creating an incentive for a block-reverting/majority attack.
It is worth noting that they are not talking about profiting from double-spending. The Bitcoin blockchain transacts around $17B/day of nominal value in around 450K transactions (average ~$38K), but in 2021 Igor Makarov & Antoinette Schoar found that:
90% of transaction volume on the Bitcoin blockchain is not tied to economically meaningful activities but is the byproduct of the Bitcoin protocol design as well as the preference of many participants for anonymity ... exchanges play a central role in the Bitcoin system. They explain 75% of real Bitcoin volume.
Of course, just because they aren't "economically meaningful" doesn't mean they aren't worth attacking! The average block has ~3.2K transactions, so ~$121.6M/block. As a check, $121.6M × 144 blocks/day ≈ $17.5B. So recovering the cost of a 51% attack ($6.77B ÷ $121.6M ≈ 56 blocks) would require double-spending about nine hours' worth of transactions.
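The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope check using the post's figures (~$17B/day of nominal volume, ~450K transactions/day, 144 blocks/day, and a ~$6.77B cost for majority hash power); the per-block value comes out slightly below the post's rounded ~$121.6M because no intermediate rounding is done.

```python
# Back-of-the-envelope check of the double-spend recovery time.
DAILY_VOLUME_USD = 17e9     # nominal value transacted per day
DAILY_TXS = 450_000         # transactions per day
BLOCKS_PER_DAY = 144        # one block per ~10 minutes
ATTACK_COST_USD = 6.77e9    # estimated cost of majority hash power

avg_tx_usd = DAILY_VOLUME_USD / DAILY_TXS            # ~ $38K per transaction
txs_per_block = DAILY_TXS / BLOCKS_PER_DAY           # ~ 3,125 transactions
value_per_block = DAILY_VOLUME_USD / BLOCKS_PER_DAY  # ~ $118M per block

blocks_to_recover = ATTACK_COST_USD / value_per_block
hours_to_recover = blocks_to_recover / 6             # 6 blocks per hour

print(f"avg tx: ${avg_tx_usd:,.0f}")
print(f"value/block: ${value_per_block/1e6:.1f}M")
print(f"recovery: {blocks_to_recover:.0f} blocks, about {hours_to_recover:.1f} hours")
```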
I agree with their technical analysis of the attack, but I believe there would be significant difficulties in putting it into practice. Below the fold I try to set out these difficulties.
First, I should point out that I wrote about using derivatives to profit from manipulating Bitcoin's price more than three years ago in Pump-and-Dump Schemes. These schemes have a long history in cryptocurrencies, but they are not the attack involved here. I don't claim expertise in derivatives trading, so it is possible my analysis is faulty. If so, please point out the problems in a comment.
The key idea behind this strategy, called Selfish Mining, is for a pool to keep its discovered blocks private, thereby intentionally forking the chain. The honest nodes continue to mine on the public chain, while the pool mines on its own private branch. If the pool discovers more blocks, it develops a longer lead on the public chain, and continues to keep these new blocks private. When the public branch approaches the pool's private branch in length, the selfish miners reveal blocks from their private chain to the public.
...
We further show that the Bitcoin mining protocol will never be safe against attacks by a selfish mining pool that commands more than 1/3 of the total mining power of the network. Such a pool will always be able to collect mining rewards that exceed its proportion of mining power, even if it loses every single block race in the network. The resulting bound of 2/3 for the fraction of Bitcoin mining power that needs to follow the honest protocol to ensure that the protocol remains resistant to being gamed is substantially lower than the 50% figure currently assumed, and difficult to achieve in practice.
Given that the rule of thumb followed by most practitioners is to wait for 6 confirmations, a fork that goes 6 levels deep can very likely diminish the public’s trust in Bitcoin and cause a crash in its market price. It is also widely accepted that a prolonged majority attack (if it happens) would be catastrophic to the cryptocurrency and can cause its downfall.
But, as they lay out, this possibility is discounted:
The conventional wisdom in the blockchain community is to assume that such block-reverting attacks are highly unlikely to happen. The reasoning goes as follows:
Reverting multiple blocks and specifically double-spending a transaction that has 6 confirmations requires control of a majority of the mining power;
Having a majority of the mining power is prohibitively expensive and requires an outlandish investment in hardware;
Even if a miner, mining pool or group of pools does control a majority of the mining power, they have no incentive to act dishonestly and revert the blockchain, as that would crash the price of Bitcoin, which is ultimately not in their favor, since they rely on mining rewards denominated in BTC for their income.
Starting in late 2020, as shown in The Economist's graphic, the spot market in Bitcoin became dwarfed by the derivatives markets. In the last month $1.7T of Bitcoin futures traded on unregulated exchanges, and $6.4B on regulated exchanges. Compare this with the $1.8B of the spot market in the same month.
These huge futures markets enable Farokhnia & Goharshady's attack:
In short, an attacker can first use the Bitcoin derivatives market to short Bitcoin by purchasing a sufficient amount of put options or other equivalent financial instruments. She can then invest any of the amounts calculated above, depending on the timeline of the attack, to obtain the necessary hardware and hash power to perform the attack. If the attacker chooses to obtain a majority of the hash power, her success is guaranteed and she can revert the blocks as deeply as she wishes. However, she also has the option of a smaller upfront investment in hardware in exchange for longer wait times to achieve a high probability of success. In any case, as long as her earnings from shorting Bitcoin and then causing an intentional price crash outweighs her investments in hardware, there is a clear financial incentive to perform such an attack. The numbers above show that the annual trade volume in Bitcoin derivatives is more than three orders of magnitude larger than the required investment in hardware. Thus, it is possible and profitable to perform such an attack.
We only consider the cost of hardware at the time of writing. We assume the attacker is buying the hardware, rather than renting it and do not consider potential discounts on bulk orders.
We ignore electricity costs as they vary widely based on location.
The justification for the first assumption is that it keeps our analysis sound, i.e. we can only over-approximate the cost by making this assumption. As for the second assumption, we note that electricity costs are often negligible in comparison to hardware costs and that our main argument, i.e. the vulnerability of Bitcoin to majority attacks and block-reverting attacks, remains intact even if the estimates we obtain here are doubled. Indeed, as we will soon see, the trade volume of Bitcoin derivatives is more than three orders of magnitude larger than the numbers obtained here.
Goal
As Farokhnia & Goharshady stress, the success of a block-reverting attack is probabilistic, so the attacker needs to have a high enough probability of making a large enough profit to make up for the risk of failure.
My analysis thus assumes that the goal of the attacker is to have a 95% probability of earning at least double the cost of the attack.
Attacker
There are two different kinds of attackers with different sets of difficulties:
Outsiders: someone who has to acquire or rent sufficient hash power.
Insiders: someone or some mining pool who already controls sufficient hash power.
Farokhnia & Goharshady study the outsider case. Both kinds of attacker face practical problems in two areas:
Obtaining and maintaining for the duration of the attack sufficient hash power without detection.
Obtaining and maintaining for the duration of the attack a sufficient short position in Bitcoin without detection.
The short position must be maintained for the duration of the attack because success may come at any 10-minute block time, and there would not be time to obtain a large enough short position in ten minutes.
Hash Power
The outsider's problems are more complex than the insider's.
Outsider Attack
The outsider attacker requires three kinds of resource:
Mining rigs.
Power to run the rigs.
Data center space to hold the rigs.
Each of these is problematic, but assuming that the difficulties could be overcome, there is then the question of what it would cost to run the attack.
Mining rigs
Could they acquire mining rigs sufficient to provide 30% of the combined insider and outsider hash power, or ~43% of the pre-attack hash power?
How long would it take to acquire the rigs?
Would their acquisition of the rigs be detected?
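The relationship between the two percentages above is worth making explicit. Adding hash power equal to a fraction f of the pre-attack network gives the attacker a share of f/(1+f) of the combined post-attack hash power, so a 30% post-attack share requires adding ~43% of the old network:

```python
# If an attacker adds hash power equal to a fraction f of the pre-attack
# network, their share of the combined post-attack hash power is f / (1 + f).
# Conversely, reaching a target share s requires adding s / (1 - s).
f = 0.43
share = f / (1 + f)             # ~ 0.30 of the combined hash power
target = 0.30
needed = target / (1 - target)  # ~ 0.43 of the pre-attack hash power
print(f"adding {f:.0%} of the old network -> {share:.1%} of the new; "
      f"a {target:.0%} share needs {needed:.1%} added")
```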
Bitmain is estimated to have 82% of the market for mining rigs, and they either control or have very close relations with all the major mining pools, who thus have priority access to the latest rigs. Because rigs depreciate rapidly, position in the queue for rigs has a big impact on the profitability of mining. Bitmain is unlikely to give a new customer priority access to rigs.
Because the economic life of mining rigs is less than two years, the first part of Bitmain's production goes into maintaining the hash rate by replacing obsolete rigs. The second part goes into increasing the hash rate. If we assume that the outsider attacker could absorb the second part of Bitmain's production, how long would it take to get the necessary 43% of the previous hash power?
This can be estimated by examining the hash power through time graph. Over the last 3 years the hash rate has increased from about 240EH/s to about 1120EH/s, or about 24EH/s/month. Roughly, 82% of this is Bitmain's output, or about 20EH/s/month. The attacker needs 43% of the current hash rate, or about 482EH/s, or 24 months of the second part of Bitmain's production. At the current price for leading-edge rigs of $14.11/TH/s this would cost about $6.8B plus say $340M in interest at 5%, or $7.14B.
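The estimate above can be reproduced directly. The inputs are the post's figures: hash rate growth from ~240EH/s to ~1120EH/s over 3 years, Bitmain at ~82% of rig output, a target of 43% of the current rate, $14.11/TH/s for leading-edge rigs, and ~5% interest on the purchase cost:

```python
# Sketch of the rig-acquisition timeline and cost estimate.
growth_eh_per_month = (1120 - 240) / 36            # ~24 EH/s/month added
bitmain_eh_per_month = 0.82 * growth_eh_per_month  # ~20 EH/s/month from Bitmain
target_eh = 0.43 * 1120                            # ~482 EH/s needed

months = target_eh / bitmain_eh_per_month          # ~24 months of production
cost_usd = target_eh * 1e6 * 14.11                 # EH/s -> TH/s, ~$6.8B
interest_usd = 0.05 * cost_usd                     # ~5% carry, ~$340M

print(f"{months:.0f} months, ${cost_usd/1e9:.2f}B rigs + "
      f"${interest_usd/1e6:.0f}M interest")
```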
The lack of rigs to increase the hash rate over a period of much less than two years would clearly be detectable.
Power
The Cambridge Bitcoin Energy Consumption Index's current estimate is that the network consumes 22GW. The outside attacker would need 43% of this, or about 9.5GW, for the duration of the attack. For context, Meta's extraordinarily aggressive AI data center plans claim to bring a single 1GW data center online in 2026, and the first 2GW phase of their planned $27B 5GW Louisiana data center in 2030. The constraint on the roll-out is largely the lack of access to sufficient power. The attacker would need double the power Meta's Louisiana data center plans to have in 2030.
Access to gigawatts of power is available only on long-term contracts and only after significant delays.
Meta's 5GW Louisiana "Hyperion" data center's "footprint will be large enough to cover most of Manhattan", and the outsider attacker would need two of them. If Meta expects to take more than 5 years to build one of them, the outsider attacker is likely to need a decade.
Estimates for AI data centers are that 60% of the capital cost is the hardware and 40% everything else. Thus the "everything else" for Meta's $27B 5GW data center is $10.8B. "Everything else" for the attacker's two similar data centers would thus be $21.6B. Plus say 5 years of interest at 5% or $5.4B.
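The data center cost estimate above follows from the 60/40 hardware/everything-else split, taking Meta's $27B, 5GW Louisiana build as the template and assuming simple (not compound) interest over five years:

```python
# "Everything else" cost for the attacker's two 5GW data centers.
meta_total = 27e9                            # Meta's Louisiana build cost
other_fraction = 0.40                        # 40% of capex is non-hardware

per_dc_other = other_fraction * meta_total   # $10.8B per 5GW data center
attacker_other = 2 * per_dc_other            # two such sites: $21.6B
interest = attacker_other * 0.05 * 5         # 5 years at 5% simple: $5.4B

print(f"${attacker_other/1e9:.1f}B + ${interest/1e9:.1f}B interest")
```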
Operational cost
Ignoring the evident impossibility of the outsider attacker amassing the necessary mining rigs, power and data center space, what would the operational costs of the attack be?
It is hard to estimate the costs for power, data center space, etc., but an estimate can be based upon the cost to rent hash power, noting that in practice renting 43% of the total would be impossible, and guessing that those renting out hash power take a 30% margin. A typical rental fee would be $0.10/TH/day, so the cost might be $0.07/TH/day. The attack would have a 95% probability of needing 482EH/s over 34 days or less, so $516M or less.
Thus the estimated total cost for the hash power used in the attack would have a 95% probability of being no more than $7.66B. Plus about $27B in data center cost, which could presumably be repurposed to AI after the attack.
The insider attacker already controls the 30% of the hash power they need, so only the question of detection remains. The essence of the block-reverting attack is that the attacker mines in secret until they can publish a chain with 6 blocks following a target block. A reduction of 30% of the public hash rate over an average period of 17 days would clearly be detectable. The hash rate is noisy, but the graph shows that over the last year the largest drop was 16% from June 14th to 27th. There was one large drop in the hash rate, 51% between May 10th and July 1st 2021 as China cracked down on mining.
The insider's loss of income from the blocks they would otherwise have mined would have a 95% probability of being 4,590 BTC or less, or about $425M.
Short Position
Both kinds of attackers need to ensure that, when the attack succeeds, they have a large enough short position in Bitcoin to generate their expected return from the attack's decrease in the Bitcoin price. There are two possibilities:
When the attacker's chain is within one block of being the longest, they have ten minutes to purchase the shorts. There is unlikely to be enough liquidity in the market to accommodate this sudden demand, which in any case would greatly increase the price of the shorts. I will ignore this possibility in what follows.
At the start of the attack the attacker gradually accumulates sufficient shorts. Even assuming there were enough liquidity, and that the purchases didn't increase the price, the attacker has to bear both the cost of maintaining the shorts for the duration of the attack, and the risk of the market moving up enough to cause the position to be liquidated.
The success of a block-reverting attack on Bitcoin would have implications for other cryptocurrencies. It would likely increase the prices of Proof-of-Stake coins such as Ethereum, as being much more difficult to attack, and decrease the price of other Proof-of-Work coins. Derivatives on these coins might be included in the attacker's toolkit, but I will ignore this possibility as the open interest on these coins is smaller.
At the time of writing, the open interest of BTC options is a bit more than 20 billion USD. Thus, a malicious party performing the attack mentioned in this work would need to obtain a considerable amount of the available put contracts. This may lead to market disruptions whose analysis is beyond the scope of this work. This being said, if the derivatives market continues to grow and becomes much larger than it currently is, purchasing this amount of contracts might not even be detected.
There are two different kinds of market in which Bitcoin shorts are available:
Regulated exchanges such as the CME offering options on Bitcoin and stock exchanges with Bitcoin ETFs and Bitcoin treasury companies such as Strategy.
Unregulated exchanges such as Binance offering "perpetual futures" (perps) on Bitcoin.
Unregulated Exchanges
Patrick McKenzie's Perpetual futures, explained is a clear and comprehensive description of the derivative common on unregulated exchanges:
Instead of all of a particular futures vintage settling on the same day, perps settle multiple times a day for a particular market on a particular exchange. The mechanism for this is the funding rate. At a high level: winners get paid by losers every e.g. 4 hours and then the game continues, unless you’ve been blown out due to becoming overleveraged or for other reasons (discussed in a moment).
Consider a toy example: a retail user buys 0.1 Bitcoin via a perp. The price on their screen, which they understand to be for Bitcoin, might be $86,000 each, and so they might pay $8,600 cash. Should the price rise to $90,000 before the next settlement, they will get +/- $400 of winnings credited to their account, and their account will continue to reflect exposure to 0.1 units of Bitcoin via the perp. They might choose to sell their future at this point (or any other). They’ll have paid one commission (and a spread) to buy, one (of each) to sell, and perhaps they’ll leave the casino with their winnings, or perhaps they’ll play another game.
Where did the money come from? Someone else was symmetrically short exposure to Bitcoin via a perp. It is, with some very important caveats incoming, a closed system: since no good or service is being produced except the speculation, winning money means someone else lost.
So the exchange makes money from commissions, and from the spread against the actual spot price. The price of the perp is maintained close to the spot price by the "basis trade", traders providing liquidity by shorting the perp and buying the spot when the perp is above spot, and vice versa. Of course, the spot price itself may have been manipulated, for example by Pump-and-Dump Schemes.
Perp funding rates also embed an interest rate component. This might get quoted as 3 bps a day, or 1 bps every eight hours, or similar. However, because of the impact of leverage, gamblers are paying more than you might expect: at 10X leverage that’s 30 bps a day.
In a standard U.S. brokerage account, Regulation T has, for almost 100 years now, set maximum leverage limits (by setting minimums for margins). These are 2X at position opening time and 4X “maintenance” (before one closes out the position). Your brokerage would be obligated to forcibly close your position if volatility causes you to exceed those limits.
Although these huge amounts of leverage greatly increase the reward from a small market movement in favor of the position, they greatly reduce the amount the market has to move against the position before something bad happens. The first bad thing is liquidation:
One reason perps are structurally better for exchanges and market makers is that they simplify the business of blowing out leveraged traders. The exact mechanics depend on the exchange, the amount, etc, but generally speaking you can either force the customer to enter a closing trade or you can assign their position to someone willing to bear the risk in return for a discount.
Blowing out losing traders is lucrative for exchanges except when it catastrophically isn’t. It is a priced service in many places. The price is quoted to be low (“a nominal fee of 0.5%” is one way Binance describes it) but, since it is calculated from the amount at risk, it can be a large portion of the money lost. If the account’s negative balance is less than the liquidation fee, wonderful, thanks for playing and the exchange / “the insurance fund” keeps the rest, as a tip.
The bigger and faster the market move, the more likely the loss exceeds your collateral:
In the case where the amount an account is negative by is more than the fee, that “insurance fund” can choose to pay the winners on behalf of the liquidated user, at management’s discretion. Management will usually decide to do this, because a casino with a reputation for not paying winners will not long remain a casino.
But tail risk is a real thing. The capital efficiency has a price: there physically does not exist enough money in the system to pay all winners given sufficiently dramatic price moves. Forced liquidations happen. Sophisticated participants withdraw liquidity (for reasons we’ll soon discuss) or the exchange becomes overwhelmed technically / operationally. The forced liquidations eat through the diminished / unreplenished liquidity in the book, and the magnitude of the move increases.
Risk in perps has to be symmetric: if (accounting for leverage) there are 100,000 units of Somecoin exposure long, then there are 100,000 units of Somecoin exposure short. This does not imply that the shorts or longs are sufficiently capitalized to actually pay for all the exposure in all instances.
In cases where management deems paying winners from the insurance fund would be too costly and/or impossible, they automatically deleverage some winners.
So perhaps you understood, prior to a 20% move, that you were 4X leveraged. You just earned 80%, right? Ah, except you were only 2X leveraged, so you earned 40%. Why were you retroactively only 2X? That’s what automatic deleveraging means. Why couldn’t you get the other 40% you feel entitled to? Because the collective group of losers doesn’t have enough to pay you your winnings and the insurance fund was insufficient or deemed insufficient by management.
In theory, this can happen to the upside or the downside. In practice in crypto, this seems to usually happen after sharp decreases in prices, not sharp increases. For example, October 2025 saw widespread ADLing as (more than) $19 billion of liquidations happened, across a variety of assets.
How does this affect the outsider attacker? Let's assume that the attack has a 95% probability of costing no more than $7.5B and would reduce the Bitcoin price from $100K to $80K in a single 4-hour period. With 10X leverage this would generate $200K/BTC in gains.
The outsider wants to double the cost of the attack, so needs to short $15B/$180K BTC, or 83,333 BTC at 10X leverage for the duration of the attack. Establishing the position costs $8.333B. Assuming the BTC price is fixed at $100K until the attack succeeds, the funding rate is zero. But we have to assume that the attacker borrowed the $8.333B for the duration, so would pay interest, plus two commissions plus two spreads. I'll ignore these costs.
I will also ignore the fact that $83B is around 148% of the peak aggregated open interest in Bitcoin options over the past year of about $56B.
The way liquidation of a short works is that as the market moves up, the initial leverage increases. Each exchange will have a limit on the leverage it will allow so, allowing for the liquidation fee, if the leverage of the short position gets to this limit the exchange will liquidate it.
Move %   Leverage
  0        10
  1        11.1
  2        12.5
  3        14.3
  4        16.7
  5        20
  6        25
  7        33.3
  8        50
  9        100
The table shows the effect of increasing percentage moves against an initial 10X leveraged short. If we assume a short with an initial 10X leverage and an exchange limit of 50X was taken out on the first of each month of 2025, on 9 of the 12 months it would have been liquidated before the month was out. So Bitcoin is volatile enough that the attacker's short has a high probability of being liquidated before the attack succeeds. And note Binance's "nominal fee" of 0.5% for liquidating $83.33B, or $417M.
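The table's values follow from a simple formula: with initial leverage L0, an adverse move of m (as a fraction of the entry price) shrinks the short's equity to (1 - L0·m) of the margin, so the effective leverage becomes L0 / (1 - L0·m). A minimal sketch:

```python
# Effective leverage of a short position as the price moves against it.
def short_leverage(l0: float, move: float) -> float:
    """Leverage of a short opened at l0 after an adverse fractional move."""
    equity = 1 - l0 * move          # remaining equity per unit of margin
    if equity <= 0:
        raise ValueError("position wiped out")
    return l0 / equity

# Reproduce the table for an initial 10X short.
for pct in range(10):
    print(f"{pct}%: {short_leverage(10, pct / 100):.1f}X")
```

Note that a 10X short hits a 50X exchange limit after only an 8% adverse move, which is why the table stops just short of the 10% move that wipes the position out entirely.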
In the unlikely event that the attack succeeds early enough to avoid liquidation there would have been one of those "sharp decreases in prices" that cause ADL, so as a huge winner it would be essentially certain that the attacker would suffer ADL and most of the winnings needed to justify the attack would evaporate.
Regulated Exchanges
The peak open interest in Bitcoin futures on the Chicago Mercantile Exchange over the past year was less than $20B, so even if we add together both kinds of exchange, the peak open interest over the last year isn't enough for the attacker.
Conclusions
Neither an outsider nor an insider attack appears feasible.
Outsider Attack
An outsider attack seems infeasible because in practice:
They could not acquire 43% or more of the hash power.
Even if they could it would take so long as to make detection inevitable.
Even if they could and they were not detected, the high cost of the rigs makes the necessary shorts large relative to the open interest, and expensive to maintain.
These large shorts would need to be leveraged perpetual futures, bringing significant risks of loss of collateral through liquidation, and of the potential payoff being reduced through automatic de-leveraging.
The attacker would need more than the peak aggregate open interest in Bitcoin futures over the past year.
Insider Attack
The order-of-magnitude lower direct cost of an insider attack makes it appear less infeasible, but insiders have to consider the impact on their continuing mining business. If the assumed 20% drop in the Bitcoin price were sustained for a year, the cost to the miner controlling 30% of the hash rate would be about 15,750 BTC or nearly $1.5B making the total cost of the attack (excluding the cost of carrying the shorts) almost $2B.
The drop in Bitcoin's price and the smaller, lagging drop in network difficulty over the last three months has decreased miners' revenue by about 25%. In the medium term a further drop is in prospect. Sometime in April 2028 the regular Bitcoin halvening will occur, halving the income of miners in aggregate Bitcoin terms.
mining-company stocks are still flying, even with cryptocurrency prices in retreat. That's because these firms have something in common with the hottest investment theme on the planet: the massive, electricity-hungry data centers expected to power the artificial-intelligence boom. Some companies are figuring out how to remake themselves as vital suppliers to Alphabet, Amazon, Meta, Microsoft and other "hyperscalers" bent on AI dominance.
...
Miners often have to build new, specialized facilities, because running AI requires more-advanced cooling and network systems, as well as replacing bitcoin-mining computers with AI-focused graphics processing units. But signing deals with miners allows AI giants to expand faster and cheaper than starting new facilities from scratch.
...
Shares of Core Scientific quadrupled in 2024 after the company signed its first AI contract that February. The stock has gained 10% this year. The company now expects to exit bitcoin mining entirely by 2028.
I wonder why the date is 2028! As profit-driven miners use their buoyant stock prices to fund a pivot to AI, the hash rate and the network difficulty will decrease, making an insider attack less infeasible. The drop in their customers' income will likely encourage Bitmain to similarly pivot to AI, devoting an increasing proportion of its wafers to AI chips, especially given the Chinese government's goal of localizing AI.
A 30% miner whose rigs were fully depreciated might consider an insider attack shortly before the halvening as a viable exit strategy, since their future earnings from mining would be greatly reduced. But they would still be detected.
Counter-measures
Even if we assume the feasibility of both the hash rate and the short position aspects of the attack, it is still the case that, for example, an attack with 30% of the hash power and a 95% probability of success will, on average, last 17 days. It seems very unlikely that the coincidence over an extended period of a large reduction in the expected hash rate and a huge increase in short interest would escape attention from Bitcoin HODL-ers, miners and exchanges, not to mention Bitmain. What counter-measures could they employ?
The theoretically correct counter-measure would be to raise the 6-block finality criterion to the 24 blocks that corresponds to a pool with 30% of the hash power. But this would violate what people incorrectly believe is the revealed word of Satoshi. And Goharshady correctly pointed out in email that this is in any case impractical:
The 6-block rule is just a convention, there is no dial that can be turned.
Much of the access to the Bitcoin blockchain is via APIs that typically have the 6-block rule hard-coded in.
Many, typically low-value, transactions do not wait for even a single confirmation.
Even if it were possible, changing from a one-hour to a four-hour confirmation would have significant negative impacts on the Bitcoin ecosystem.
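For context, the 24-block figure comes from the attacker-success calculation in Section 11 of the Bitcoin whitepaper; here is a Python transcription of Nakamoto's C code (my transcription, not from the post):

```python
from math import exp, factorial

def attacker_success(q, z):
    """Probability that an attacker with fraction q of the hash power
    ever overtakes the honest chain from z confirmations behind
    (Nakamoto 2008, Section 11)."""
    p = 1.0 - q
    lam = z * (q / p)
    total = 1.0
    for k in range(z + 1):
        # Poisson probability the honest chain has extended by k blocks
        poisson = exp(-lam) * lam ** k / factorial(k)
        # minus the chance the attacker still catches up from z - k behind
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total
```

With q = 0.30, six confirmations still leave the attacker roughly a 13% chance of eventual success; the whitepaper's own table shows the probability first dropping below 0.1% at z = 24, which is where the 24-block figure comes from.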
In the case of an insider attack, the absence of a pool previously mining around one in three of all blocks would readily de-anonymize the attacker. Bitmain would necessarily be aware of the identity of an outside attacker. Although unregulated exchanges are notoriously poor at KYC/AML, the sums involved in the shorts are so large that they would be highly motivated to use the blockchain information to de-anonymize the attacker. Given their terms of service, and the lack of effective recourse, they would be able to hamstring the attack.
Acknowledgements
This post benefited greatly from insightful comments on a draft from Jonathan Reiter, Amir Kafshdar Goharshady and Joel Wallenberg, but the errors are all mine.
I don’t think any of us went into 2025 thinking this would be an easy year, and boy wasn’t it. Rather than restate our collective challenges, I’m going to stick to my own little life and keep it short with bullet points and pictures. I’ll also list as many positives as I can, because even in the darkest of times, there are sources of joy and reasons to be grateful.
Real life, in no particular order
(OK, there’s some order, let’s do the hard one first) My mom passed away in late October – our relationship was complicated, and so are my feelings, but I’m working through it all with a therapist
(Bittersweet) I adopted Mom’s green cheek conure, Tutu (which we jokingly spell “Teauxtu,” thanks to my brother), who bonded to me quickly and thoroughly, but who is currently plucking all of his feathers anyway — we’re working on it with his vet, and I’m hopeful we’ll get him past this (though I’ll love him anyway, even if he’s a naked bird forever; I cannot overstate how sweet a little guy he is!)
My brother got married to a really nice lady, and I’m so happy for them!
In early October, we moved from Midcoast Maine to Western NY, where there is a larger and more active covid-conscious community – because the economy was so unstable, it took months to sell our house in Maine, but we did finally succeed, the day before Christmas
We joined the mask bloc here for a D&D night and had a lot of fun – looking forward to getting involved in mask distribution and attending more events in 2026
I finally read all of the Murderbot Diaries series by Martha Wells, spurred on by the release of the TV show – then I listened to the audiobooks, and it became my comfort listen for the whole latter part of the year
I ran an accessible birding event here in Rochester, and I’m hoping to run more next year
I took American Sign Language 101 and 102, which I’ve really enjoyed. I’ll be retaking 102 in early 2026 (it was a rough autumn/winter), and then I hope to move on to 103 and 104.
I learned to darn (as in, mending damaged knitted items), and I’m working on expanding my mending skills repertoire – this is well-timed, since, besides his own feathers and our hair, Tutu loves to bite on shirts until they have holes in them
We saw the most vivid aurora we’ve ever seen, including during our time in Alaska
My aunt sent me her scone recipe, and I made so. many. scones. this year
Bird Buddies! (Outdoor birds captured with our feeder cameras.)
The three baby bluebirds in the video below brought us so much joy this spring! Here, they appear to have a little conference, to decide whether it’s time to fly away or not, before a starling shows up.
Bird Buddies who live indoors
Miscellany!
Work, in no particular order
I ran the first part of a major, potentially multi-year LibGuides accessibility remediation project (“potentially” because there’s a task force considering how important LibGuides are to us and our patrons, and maybe we will shift focus away from them or have a single accessibility remediator or … something I haven’t thought of, who knows?)
I co-ran a task force focused on accessibility training for library staff
We survived a university-wide “Cybersecurity Alignment” (or the first part of one?)
I learned the developer side of Figma (kind of, I still find parts of it baffling)
I learned a ridiculous amount about authentication to electronic resources (which would be really satisfying if I didn’t keep running into things I still don’t know)
Coming up in 2026, an incomplete list
I’ll take more ASL classes
I’ll mend some things
I’ll ride my bike more, because there are more safe places to do so where I live!
I’m hoping to work through more herbal studies and practice more of those skills – it may become vital in the near future
There will be an IT “alignment” at work mid-year – it could mean anything from “nothing” to “now I report to Central IT (who doesn’t care about my MLIS and might actually consider it a negative) instead of the Library (who doesn’t require an MLIS for my position, but knows it’s a strength, and also my team is so good, I don’t want to lose them, OK, yes, I am very stressed about this possibility)”
Our landlord will be selling our house, something I think (hope) he didn’t realize he’d be doing when we started renting here – we’ll get first right of refusal on buying it, and he’s willing to write us a lease for as long as we want one if we choose not to buy (NY law means the new owner will have to honor it), so while it’s a stressful situation, it’s not as bad as it could be
A valued colleague (everyone in my department is valued, truly, but his institutional knowledge is unmatched) will be retiring, and because of budget constraints, that will probably mean some shuffling of responsibilities semi-permanently
If we decide we aren’t moving (so, we decide to buy this house or to sign a longer lease), I’ll set up some garden beds and grow some things
I want this to be a thing, but financially, it may not: I really very much want to build or buy a small (like, Class B, or C at the highest) camper so that I can travel again
Ninety-five (or 100) years is a very long time for copyrights to last. But Olaf Stapledon saw a much longer future for us in Last and First Men (reviewed here and here). His book tells a story of successive human species over the next 2 billion years.
In January 2021, I began a journey that would span nearly five years, three children, countless late nights, and a singular focus: teaching machines to extract data from complex scientific tables with confidence—and to know when they're uncertain. On October 29, 2025, I successfully defended my dissertation titled "SCITEUQ: Toward Uncertainty-Aware Complex Scientific Table Data Extraction and Understanding" at Old Dominion University. This milestone represents not just the culmination of intensive research but a testament to perseverance, family support, and the power of focused determination.
Scientific tables are ubiquitous in research papers, containing critical experimental data, statistical results, and research findings. Yet extracting this data automatically from PDF documents remains surprisingly difficult. Unlike the simple, well-structured tables you might see on Wikipedia, scientific tables are complex beasts—featuring multi-level headers, merged cells, irregular layouts, and domain-specific notations that confound even state-of-the-art machine learning models.
But here's the real problem: existing methods don't tell you when they're wrong. They extract data with the same confidence whether they're processing a simple table or struggling with a complex one. For scientific applications where accuracy is paramount, this means researchers must manually verify every single extracted cell—a task that doesn't scale when you're dealing with thousands of tables.
The Research Challenge
My research addressed a fundamental question: How can we build systems that not only extract data from complex scientific tables but also quantify their uncertainty, allowing us to focus human verification effort only where it's needed?
To tackle this challenge, I formulated four research questions:
RQ1: What is the status of reproducibility and replicability of existing TSR models?
Before building something new, I needed to understand what already existed. I conducted the first systematic reproducibility and replicability study of 16 state-of-the-art Table Structure Recognition (TSR) methods. The results were sobering: only 8 of 16 papers made their code and data publicly available, and merely 5 had executable code. When I tested these methods on my newly created GenTSR dataset (386 tables from six scientific domains), none of the methods replicated their original performance. This highlighted a critical gap in the field. This work was published at ICDAR 2023: "A Study on Reproducibility and Replicability of Table Structure Recognition Methods."
RQ2: How do we quantify the uncertainties of TSR results?
To address this, I developed TTA-m, a novel uncertainty quantification pipeline that adapts Test-Time Augmentation specifically for TSR. Unlike vanilla TTA, my approach fine-tunes pre-trained models on augmented table images and employs ensemble-based methods to generate cell-level confidence scores. On the GenTSR dataset, TTA-m achieved an F1-score of 0.798, with over 80% accuracy for high-confidence predictions—enabling reliable automatic detection of extraction errors. This work was published at IEEE IRI 2024: "Uncertainty Quantification in Table Structure Recognition."
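The post doesn't spell out the ensembling step, but the general shape of TTA-based cell confidence can be sketched as follows (the function names and the cell representation are illustrative choices of mine, not the dissertation's actual API):

```python
def tta_cell_confidence(predict, image, augment_fns):
    """Run a table-structure-recognition model over several augmented
    views of a table image and score each predicted cell by how often
    it reappears across the ensemble.

    predict     -- callable returning a set of hashable cell tuples,
                   e.g. (row, col, text), for one view
    augment_fns -- list of callables producing augmented views
    Returns {cell: agreement in [0, 1]}.
    """
    views = [image] + [f(image) for f in augment_fns]
    votes = {}
    for view in views:
        for cell in predict(view):
            votes[cell] = votes.get(cell, 0) + 1
    n = len(views)
    return {cell: count / n for cell, count in votes.items()}
```

Cells that survive every augmentation get confidence 1.0; cells that appear in only some views get proportionally lower scores, which is what lets high-confidence predictions be trusted automatically.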
RQ3: How can we integrate uncertainties from TSR and OCR for holistic table data extraction?
I designed and implemented the TSR-OCR-UQ framework, which integrates table structure recognition (using TATR), optical character recognition (using PaddleOCR), and conformal prediction-based uncertainty quantification into a unified pipeline. The results were compelling: the accuracy improved from 53-71% to 83-97% for different complexity levels, with the system achieving 69% precision in flagging incorrect extractions and reducing manual verification labor by 53%. This work was published at ICDAR 2025: "Uncertainty-Aware Complex Scientific Table Data Extraction."
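The conformal-prediction step mentioned above can be illustrated with the standard split-conformal recipe — calibrate a nonconformity threshold on held-out data, then flag new extractions that score above it. This is a generic sketch of the technique, not the framework's actual scoring function:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.10):
    """Split conformal prediction: given nonconformity scores (e.g.
    1 - model confidence) from a held-out calibration set, return the
    threshold above which a new extraction is flagged for human review,
    giving roughly (1 - alpha) coverage on correct extractions."""
    n = len(cal_scores)
    # finite-sample-corrected quantile level
    q_level = np.ceil((n + 1) * (1.0 - alpha)) / n
    return np.quantile(cal_scores, min(q_level, 1.0), method="higher")
```

Everything below the threshold is accepted automatically; everything above it goes to a human — which is the mechanism behind claims like "reducing manual verification labor by 53%".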
RQ4: How well do LLMs answer questions about complex scientific tables?
To evaluate the QA capability of Large Language Models on scientific tables, I created SciTableQA, a benchmark dataset containing 8,700 question-answer pairs across 320 complex scientific tables from multiple domains. My evaluation revealed that while GPT-3.5 achieved 79% accuracy on cell selection TableQA tasks, performance dropped to 49% on arithmetic reasoning TableQA tasks—highlighting significant limitations of current LLMs when dealing with complex table structures and numerical reasoning. This work was published at TPDL 2025: "SciTableQA: A Question-Answering Benchmark for Complex Scientific Tables."
The SCITEUQ Framework
Putting it all together, SCITEUQ (Scientific Table Extraction with Uncertainty Quantification) represents a comprehensive solution to uncertainty-aware scientific table data extraction. The framework achieves state-of-the-art performance while providing essential uncertainty quantification capabilities that enable efficient human-in-the-loop verification.
Each component contributes to a more reliable approach.
Each of these papers faced initial rejection before ultimately being accepted. This taught me an invaluable lesson: rejection is not failure; it's an opportunity to refine and improve your work.
Industry Experience: From Azure to Alexa to Microsoft AI
While my research focused on scientific tables, my internships at Microsoft and Amazon broadened my perspective on applying machine learning at scale.
Microsoft (Summers 2022, 2023, 2025)
My first two summers at Microsoft were with the Azure team, where I worked on infrastructure optimization problems far from my research area. I developed an AI-human hybrid LLM-based multi-agent system for AKS Cluster configuration, reducing cluster type generation time from 2 weeks to 1 hour (link to blog post). I also designed ML anomaly detection systems on Azure Synapse that reduced hardware maintenance costs by over 20% and formulated new metrics for characterizing node interruption rates that decreased hardware downtime by 25% (link to blog post).
In Summer 2025, I joined the Microsoft AI team under the Bing organization, working on problems at the intersection of large-scale search and AI—which is what I'll be doing when I return to Microsoft full-time in January 2026.
Amazon (Summer 2024)
At Amazon, I worked with the Alexa Certification Technology team in California, where I drove 10% customer growth by designing LLM-based RAG systems with advanced prompt engineering techniques and increased the revenue by over 5% by developing LLM Agents on AWS to improve Alexa-enabled applications (link to blog post).
These internships, while not directly related to my dissertation research, taught me how to apply ML thinking to diverse industrial problems and to work effectively in large, complex organizations.
Balancing PhD Life with Family
Perhaps the most challenging aspect of my PhD journey had nothing to do with research—it was combining my studies with raising three young children. My youngest son, Daniel, was born just six months after I enrolled in the PhD program. Managing research deadlines, experimental runs, paper submissions, and the demands of parenting three boys (Paul, David, and Daniel) required discipline and sacrifice.
I developed a strict routine: work from 9 AM to 3 PM every day at my research lab, then pick up my kids from school and be fully present for them. This meant no late nights in the lab, no weekend marathons of coding—just consistent, focused work during designated hours. It wasn't always easy. Conference deadlines sometimes meant asking my wife, Olabisi, to take on even more, or my mother, Beatrice, to provide extra support. But this routine kept me grounded and taught me that quality of work matters more than quantity of hours.
The Defense
On October 27, 2025, I defended my dissertation before my committee:
Dr. Jian Wu (Advisor) - Old Dominion University
Dr. Yi He (Co-advisor) - College of William & Mary
Dr. Michael Nelson - Old Dominion University
Dr. Michele Weigle - Old Dominion University
Dr. Sampath Jayarathna - Old Dominion University
Their thoughtful feedback, probing questions, and constructive critiques throughout my PhD journey were instrumental in refining my research and pushing me to think deeper about the implications and limitations of my work.
Lessons Learned
Looking back on nearly five years of doctoral work, several lessons stand out:
1. Embrace Rejection as Refinement
Four of my papers were initially rejected. Each rejection stung, but each one ultimately led to a stronger paper. The review process, while sometimes frustrating, forced me to clarify my arguments, strengthen my experiments, and address weaknesses I hadn't noticed. My TPDL 2025 paper on SciTableQA went through two rounds of revisions, but the final version is significantly better than the original submission.
2. Establish Non-Negotiable Boundaries
My 9 AM to 3 PM schedule wasn't just convenient—it was essential for maintaining my sanity and my family relationships. While some might argue that PhD students need to work 80-hour weeks, I proved that focused, disciplined work during reasonable hours can produce quality research. Those boundaries also made me more efficient: when you only have six hours a day, you learn to prioritize ruthlessly.
3. Build for Reproducibility from Day One
My systematic study on TSR reproducibility taught me the hard way how difficult it is to reproduce other people's work. This experience shaped how I approached my own research. Every framework I built—TTA-m, TSR-OCR-UQ, SciTableQA—comes with comprehensive documentation, publicly available code, and clear instructions for replication. Future researchers shouldn't struggle to build upon my work the way I struggled with others'.
4. Choose Problems That Matter to You
I entered my PhD knowing I wanted to work on table extraction with uncertainty quantification, and I never wavered from that focus. This singular vision helped me navigate the inevitable setbacks and distractions that come with doctoral research. When experiments failed or papers got rejected, I could always return to the core question: How do we make scientific data extraction both accurate and trustworthy?
5. Internships Broaden Your Perspective
While my Microsoft and Amazon internships didn't directly contribute to my dissertation, they fundamentally shaped how I think about research. Working on production systems with millions of users taught me to think about scalability, robustness, and real-world constraints in ways that academic research rarely emphasizes. These experiences make me a better researcher because I can now evaluate my work not just on benchmark performance, but on whether it could actually be deployed at scale.
Looking Forward
In January 2026, I'll be joining Microsoft as a Data Scientist 2 with the Microsoft AI team at the Redmond campus in Washington state. My family and I are excited about this new chapter—moving from Norfolk, Virginia, to the Pacific Northwest, and transitioning from academic research to industry applications.
While I'll be working on different problems at Microsoft, the skills and mindset I developed during my PhD—rigorous experimentation, systematic evaluation, uncertainty quantification, and reproducible research—will continue to guide my work. I'm particularly excited about the opportunity to apply research-driven thinking to real-world problems at a scale that can impact millions of users.
Acknowledgments
This journey would have been impossible without extraordinary support:
To Dr. Jian Wu, my advisor, mentor, and guide—thank you for believing in my research vision, for pushing me to think bigger, and for your patience during the inevitable frustrations of doctoral research. Your mentorship has not only shaped my research but also my approach to solving complex problems.
To Dr. Yi He, my co-advisor at William & Mary, your expertise and thoughtful feedback greatly enriched this research. Thank you for your guidance and support throughout this journey.
To my dissertation committee—Drs. Michael Nelson, Michele Weigle, and Sampath Jayarathna—your constructive critiques and expert insights were essential in refining my ideas and strengthening this work.
To my colleagues in WSDL and LAMP-SYS, the collaborative environment, intellectual exchanges, and camaraderie made this journey both enriching and memorable.
To my wife, Olabisi—you walked beside me every step of this journey with unwavering devotion and love. Your patience during the long hours, your understanding through the challenges, and your constant encouragement when the path seemed difficult made this achievement possible. This accomplishment is as much yours as it is mine.
To my sons—Paul, David, and Daniel—you are my greatest blessings and my constant source of joy and motivation. I hope this work serves as an example that with dedication and faith, you too can achieve your dreams.
To God Almighty, who is the source of all wisdom and strength, I give thanks and praise.
Marlene Dietrich enjoyed success on stage and screen in 1920s Berlin, but became an international star in 1930. That year she came to the United States to star in Morocco alongside Gary Cooper. Her performance was nominated for an Academy Award. So was the direction by Josef von Sternberg, who also directed her in The Blue Angel and several other films. Morocco was inducted into the National Film Registry in 1992, and will be inducted into the public domain in 4 days. #PublicDomainDayCountdown
A gauge of risk on Oracle Corp.’s (ORCL) debt reached a three-year high in November, and things are only going to get worse in 2026 unless the database giant is able to assuage investor anxiety about a massive artificial intelligence spending spree, according to Morgan Stanley.
A funding gap, swelling balance sheet and obsolescence risk are just some of the hazards Oracle is facing, according to Lindsay Tyler and David Hamburger, credit analysts at the brokerage. The cost of insuring Oracle Corp.’s debt against default over the next five years rose to 1.25 percentage point a year on Tuesday, according to ICE Data Services.
The company borrowed $18 billion in the US high-grade market in September. Then in early November, a group of about 20 banks arranged a roughly $18 billion project finance loan to construct a data center campus in New Mexico, which Oracle will take over as tenant.
Banks are also providing a separate $38 billion loan package to help finance the construction of data centers in Texas and Wisconsin developed by Vantage Data Centers,
But notice that only $18B of this debt appears on Oracle's balance sheet. Despite that, their credit default swaps spiked and the stock dropped 29% in the last month.
Below the fold I look into why Oracle's and the other hyperscalers' desperate efforts to keep the vast sums they're borrowing off their books aren't working.
Part of the reason the market is unhappy started in mid-September with The Economist's The $4trn accounting puzzle at the heart of the AI cloud. It raised the issue that I covered in Depreciation, that the hardware that represents about 60% of the cost of a new AI data center doesn't last long. It took a while for the financial press to focus on the issue, but now they have.
The most recent one I've seen was triggered by the outage at the CME (caused by overheating in Chicago in November!). In AI Can Cook the Entire Market Now, Tracy Alloway posted part of the transcript of an Odd Lots podcast with Paul Kedrosky, pointing out a reason I didn't cover for why the GPUs in AI data centers depreciate quickly:
When you run using the latest, say, an Nvidia chip for training a model, those things are being run flat out, 24 hours a day, seven days a week, which is why they're liquid-cooled, they're inside of these giant centers where one of your primary problems is keeping them all cool. It's like saying ‘I bought a used car and I don't care what it was used for.’ Well, if it turns out it was used by someone who was doing like Le Mans 24 hours of endurance with it, that's very different even if the mileage is the same as someone who only drove to church on Sundays.
These are very different consequences with respect to what's called the thermal degradation of the chip. The chip's been run hot and flat out, so probably its useful lifespan might be on the order of two years, maybe even 18 months. There's a huge difference in terms of how the chip was used, leaving aside whether or not there's a new generation of what's come along. So it takes us back to these depreciation schedules.
73% of Ethereum miners have just given up: “About 10.6 million RTX 3070 equivalents have stopped mining since the merge.”
We strongly recommend that you do not hit eBay for a cheap video card, despite the listings reassuring you that this card was only used by a little old lady to play Minecraft on Sundays and totally not for crypto mining, and that you should ignore the burnt odor and the charred RAM. Unless you’re poor, and the card’s so incredibly cheap that you’re willing to play NVidia Roulette.
How well do miners treat their precious babies? “GPU crypto miners in Vietnam appear to be jet washing their old mining kit before putting the components up for sale.” There are real cleaning methods that involve doing something like this with liquid fluorocarbons — but the crypto miners seem to be using just water.
But this depreciation problem is only one part of why the market is skeptical of the hyperscalers technique for financing their AI data centers. The technique is called Conduit Debt Financing, and Les Barclays' Unpacking the Mechanics of Conduit Debt Financing provides an accessible explanation of how it works:
Conduit debt financing is a structure where an intermediary entity (the “conduit”) issues debt securities to investors and passes the proceeds through to an end borrower. The key feature distinguishing conduit debt from regular corporate bonds is that the conduit issuer has no substantial operations or assets beyond the financing transaction itself. The conduit is purely a pass-through vehicle, the debt repayment relies entirely on revenues or assets from the ultimate borrower.
Think of it this way: Company A wants to borrow money but doesn’t want that debt appearing on its balance sheet or affecting its credit rating. So it works with a conduit entity, Company B, which issues bonds to investors. Company B takes that capital and uses it to build infrastructure or acquire assets that Company A needs. Company A then enters into long-term lease or service agreements with Company B, and those payments service the debt. On paper, Company A is just a customer making payments, not a debtor owing bondholders.
The structure creates separation. The conduit issuer’s creditworthiness depends on the revenue stream from the end user, not on the conduit’s own balance sheet (because there isn’t really one). This is why conduit debt is often referred to as “pass-through” financing, the economics flow through the conduit structure to reach the underlying obligor.
Legal risks when things break: Substantive consolidation (court merges conduit with sponsor), recharacterization (lease treated as secured financing), and fraudulent transfer challenges. The structures haven’t been stress-tested yet because hyperscalers are wildly profitable. But if AI monetization disappoints or custom silicon undercuts demand, we’ll discover whether bondholders have secured claims on essential infrastructure or are functionally unsecured creditors of overleveraged single-purpose entities.
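Stripped of the legal detail, the pass-through arithmetic is simple. A toy year of conduit cash flows (all numbers invented for illustration) shows why the sponsor's books look so clean:

```python
def conduit_year(rent, coupon_rate, principal):
    """One illustrative year of a pass-through conduit: the tenant's
    lease payment arrives at the conduit and services the bonds; the
    remainder accrues to the conduit's equity holders. The tenant
    books only a rent expense -- no debt appears on its balance sheet."""
    interest = coupon_rate * principal
    residual = rent - interest
    return {
        "tenant_expense": rent,        # all the tenant reports
        "interest_paid": interest,     # what bondholders receive
        "residual_to_equity": residual # what the conduit's owners keep
    }
```

The bondholders' only protection is that rent stream: if the tenant walks, the conduit has essentially nothing else, which is exactly the "functionally unsecured creditors" worry in the quote above.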
Why would Meta finance this via the project finance markets? And why does it cost $6.5 billion more?
That’s how much more Meta is paying to finance this new AI data center using the project finance market versus what they could have paid had they used traditional corporate debt. So why on earth is this being called a win? And even crazier, why are other AI giants like Oracle and xAI looking to copy it?
The $6.5B is the total of the 1% extra interest above Meta's corporate bond rate over the 20 years.
If Conduit Debt Financing is a standard tool of project finance, why is Mr. Market unhappy with the hyperscalers' use of it? Jonathan Weil's somewhat less detailed look at Meta's $27B deal in AI Meets Aggressive Accounting at Meta’s Gigantic New Data Center reveals how they are pushing the envelope of GAAP (Generally Accepted Accounting Principles):
Construction on the project was well under way when Meta announced a new financing deal last month. Meta moved the project, called Hyperion, off its books into a new joint venture with investment manager Blue Owl Capital. Meta owns 20%, and funds managed by Blue Owl own the other 80%. Last month, a holding company called Beignet Investor, which owns the Blue Owl portion, sold a then-record $27.3 billion of bonds to investors, mostly to Pimco.
Meta said it won’t be consolidating the joint venture, meaning the venture’s assets and liabilities will remain off Meta’s balance sheet. Instead Meta will rent the data center for as long as 20 years, beginning in 2029. But it will start with a four-year lease term, with options to renew every four years.
This lease structure minimizes the lease liabilities and related assets Meta will recognize, and enables Meta to use “operating lease,” rather than “finance lease,” treatment. If Meta used the latter, it would look more like Meta owns the asset and is financing it with debt.
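Under ASC 842, a lessee must classify a lease as a finance lease if any one of five criteria is met; otherwise it is an operating lease. A checklist sketch (the booleans stand in for the judgment calls — "major part", "substantially all" — that the rules leave to management):

```python
def is_finance_lease(transfers_ownership,
                     purchase_option_reasonably_certain,
                     term_is_major_part_of_economic_life,
                     pv_is_substantially_all_of_fair_value,
                     asset_is_specialized):
    """ASC 842-10-25-2: the lease is a finance lease if ANY of the
    five criteria holds; if none holds, it is an operating lease."""
    return any([
        transfers_ownership,                    # ownership passes at end of term
        purchase_option_reasonably_certain,     # bargain purchase option
        term_is_major_part_of_economic_life,    # e.g. 4 years vs decades? No.
        pv_is_substantially_all_of_fair_value,  # PV of payments vs asset value
        asset_is_specialized,                   # no alternative use to lessor
    ])
```

On Weil's reading of Meta's facts, the short four-year initial term is what lets Meta argue the term and present-value criteria are unmet — which is precisely how the structure minimizes the lease liabilities Meta recognizes.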
Under GAAP, when would Meta be required to treat it as a finance lease?
The joint venture is what is known in accounting parlance as a variable interest entity, or VIE for short. That term means the ownership doesn’t necessarily reflect which company controls it or has the most economic exposure. If Meta is the venture’s “primary beneficiary”—which is another accounting term of art—Meta is required to consolidate it.
Under the accounting rules, Meta is the primary beneficiary if two things are true. First, it must have “the power to direct the activities that most significantly impact the VIE’s economic performance.” Second, it must have the obligation to absorb significant losses of the VIE, or the right to receive significant benefits from it.
Blue Owl has control over the venture’s board. But voting rights and legal form aren’t determinative for these purposes. What counts under the accounting rules is Meta’s substantive power and economic influence. Meta in its disclosures said “we do not direct the activities that most significantly impact the venture’s economic performance.” But the test under the accounting rules is whether Meta has the power to do so.
Does Meta receive "significant benefits"? Is it required to "absorb losses"?:
The second test—whether Meta has skin in the game economically—has an even clearer answer. Meta has operational control over the data center and its construction. It bears the risks of cost overruns and construction delays. Meta also has provided what is called a residual-value guarantee to cover bondholders for the full amount owed if Meta doesn’t renew its lease or terminates early.
The lease is notionally for 20 years but Meta can get out every four years. Is Meta likely to terminate early? In other words, how likely in 2041 is Meta to need an enormous 16-year-old data center? Assuming that the hardware has an economic life of 2 years, the kit representing about 60% of the initial cost would be 8 generations behind the state of the art. In fact 60% of the cost is likely to be obsolete by the first renewal deadline, even if we assume Nvidia won't actually be on the one-year cadence it has announced.
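The generation arithmetic behind that claim is a one-liner (the two-year hardware cadence and the 2025 build date are the post's assumptions, not mine):

```python
def generations_behind(build_year, check_year, gen_years=2):
    """How many hardware generations old the compute kit is at a given
    lease-renewal date, assuming a fixed generation cadence in years."""
    return (check_year - build_year) // gen_years

# Renewal decision points for a notional 20-year lease on a 2025-vintage
# build, with four-year terms starting in 2029
renewals = {year: generations_behind(2025, year)
            for year in range(2029, 2042, 4)}
```

By the first renewal in 2029 the kit is already two generations old; by the final one in 2041 it is eight generations behind.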
But what about the other 40%? It has a longer life, but not that long. The reason everyone builds new data centers is that the older ones can't deliver the power and cooling current Nvidia systems need. 80% of recent data centers in China are empty because they were built for old systems.
Today, Nvidia's rack systems are hovering around 140kW of compute capacity. But we've yet to reach a limit. By 2027, Nvidia plans to launch 600kW racks that pack 576 GPU dies into the space once occupied by just 32.
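A back-of-envelope sketch of what that generational jump implies, using only the figures quoted above (all numbers are the article's, not Nvidia specs I've verified independently):

```python
# Rough comparison of current vs. planned Nvidia rack generations,
# using the figures quoted in the text above.
current_rack_kw = 140   # today's rack power, per the article
future_rack_kw = 600    # planned 2027 rack power

dies_future = 576       # GPU dies per future rack footprint
dies_current = 32       # dies previously in the same space

density_increase = dies_future / dies_current       # 18x more dies
power_increase = future_rack_kw / current_rack_kw   # ~4.3x more power

print(f"Die density: {density_increase:.0f}x, power: {power_increase:.1f}x")
```

An 18x density increase at "only" ~4.3x the power also implies far more heat per unit volume, which is why existing power and cooling designs can't simply be retrofitted.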
Current data centers won't handle these systems - indeed how to build data centers that do is a research problem:
To get ahead of this trend toward denser AI deployments, Digital Realty announced a research center in collaboration with Nvidia in October.
The facility, located in Manassas, Virginia, aims to develop a new kind of datacenter, which Nvidia CEO Jensen Huang has taken to calling an AI factory, that consumes power and churns out tokens in return.
If the design of data centers for Nvidia's 2027 systems is only now being researched, how likely is it that Meta will renew the lease on a data center built for Nvidia's 2025 systems in 2041? So while the risk that Meta will terminate the lease in 2029 is low, termination before 2041 is certain. And thus so are residual-value guarantee payments.
How does the risk of non-renewal play out under GAAP?
Another judgment call: Under the accounting rules, Meta would have to include the residual-value guarantee in its lease liabilities if the payments owed are “probable.” That could be in tension with Meta’s assumption that the lease renewal isn’t “reasonably certain.”
If renewal is uncertain, the guarantee is more likely to be triggered. And if payment under the guarantee is probable, Meta would have to recognize the liability.
Ultimately, the fact pattern Meta relies on to meet its conflicting objectives strains credibility. To believe Meta’s books, one must accept that Meta lacks the power to call the shots that matter most, that there’s reasonable doubt it will stay beyond four years, and that it probably won’t have to honor its guarantee—all at the same time.
OpenAI explicitly requested federal loan guarantees for AI infrastructure in an October 27 letter to the White House—which refused the request, with AI czar David Sacks saying that at least 5 other companies could take OpenAI's place—directly contradicting CEO Sam Altman's public statements claiming the company doesn't want government support.
The 11-page letter, submitted to the Office of Science and Technology Policy, called for expanding tax credits and deploying "grants, cost-sharing agreements, loans, or loan guarantees to expand industrial base capacity" for AI data centers and grid components. The letter detailed how "direct funding could also help shorten lead times for critical grid components—transformers, HVDC converters, switchgear, and cables—from years to months."
Trump unveiled the “Genesis Mission” as part of an executive order he signed Monday that directs the Department of Energy and national labs to build a digital platform to concentrate the nation’s scientific data in one place.
It solicits private sector and university partners to use their AI capability to help the government solve engineering, energy and national security problems, including streamlining the nation’s electric grid, according to White House officials who spoke to reporters on condition of anonymity to describe the order before it was signed.
Mr. Sacks has offered astonishing White House access to his tech industry compatriots and pushed to eliminate government obstacles facing A.I. companies. That has set up giants like Nvidia to reap an estimated $200 billion in new sales.
Mr. Sacks has recommended A.I. policies that have sometimes run counter to national security recommendations, alarming some of his White House colleagues and raising questions about his priorities.
Mr. Sacks has positioned himself to personally benefit. He has 708 tech investments, including at least 449 stakes in companies with ties to artificial intelligence that could be aided directly or indirectly by his policies, according to a New York Times analysis of his financial disclosures.
His public filings designate 438 of his tech investments as software or hardware companies, even though the firms promote themselves as A.I. enterprises, offer A.I. services or have A.I. in their names, The Times found.
Mr. Sacks has raised the profile of his weekly podcast, “All-In,” through his government role, and expanded its business.
Steve Bannon, a former adviser to Mr. Trump and a critic of Silicon Valley billionaires, said Mr. Sacks was a quintessential example of ethical conflicts in an administration where “the tech bros are out of control.”
“They are leading the White House down the road to perdition with this ascendant technocratic oligarchy,” he said.
“The way this works,” an investor friend said to me this morning, “is that when Nvidia is about to miss their quarter, Jen Hsun calls David Sacks, who then gets this government initiative to place a giant order for chips that go into a warehouse.”
I obviously can’t confirm or deny that actually happened. My friend might or might not have been kidding. But either way the White House’s new Science and AI program, Genesis, announced by Executive Order on Monday, does seem to involve the government buying a lot of chips from a lot of AI companies, many of which are losing money.
And David Sacks’s turnaround from “read my lips, no AI bailout” (November 6) to his “we can’t afford to [let this all crash]” tweet (November 24) came just hours before the Genesis announcement.
I think the six companies Sacks was talking about are divided into two groups:
OpenAI, Anthropic and xAI, none of whom have a viable business model.
Meta, Google and Microsoft, all of whom are pouring the cash from their viable business models into this non-viable business.
This is the reason why the hyperscalers are taking desperate financial measures. They are driven by FOMO but they all see the probability that the debt won't be paid back. Where is the revenue to pay them back going to come from? It isn't going to come from consumers, because edge inference is good enough for almost all consumers (which is why 92% of OpenAI's customers pay $0). It isn't going to come from companies laying off hordes of low-paid workers, because they're low-paid.
So before they need to replace the hardware representing about 60% of the loan's value with the next generation in 2027, they need to find enterprise generative AI applications so wildly profitable for their customers that those customers will pay enough over the cost of running the applications to cover not just the payments on the loans but also another 30% of the loan value every year. For Meta alone this is around $30B a year!
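The implied arithmetic can be made explicit. These are inferences from the paragraph's own figures, not reported numbers: if 30% of the loan value per year works out to roughly $30B for Meta, the implied loan value is about $100B, of which about 60% is short-lived hardware.

```python
# Implied figures from the argument above (assumptions, not disclosures):
annual_coverage_fraction = 0.30   # share of loan value needed per year
meta_annual_coverage_b = 30       # ~$30B/year for Meta, per the article

implied_loan_value_b = meta_annual_coverage_b / annual_coverage_fraction
print(f"Implied loan value: ~${implied_loan_value_b:.0f}B")

# Of that, ~60% is hardware with a ~2-year economic life:
hardware_share = 0.60
hardware_value_b = hardware_share * implied_loan_value_b
print(f"Hardware needing replacement by ~2027: ~${hardware_value_b:.0f}B")
```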
And they need to be aware that the Chinese are going to kill their margins. Thanks to their massive investments in the "hoax" of renewable energy, power is so much cheaper in China that systems built with their less efficient chips are cost-competitive with Nvidia's in operation. Not to mention that the Chinese chip makers operate on much lower margins than Nvidia. Nvidia's chips will get better, and so will the Chinese chips. But power in the US will get more expensive, in part because of the AI buildout, and in China it will get cheaper.
If I were advising Xi, I’d counsel him to go for the jugular by engaging in AI-dumping, a repeat of their aughts steel-dumping playbook. It’s already underway — and working. Eighty percent of a16z startups use open-source Chinese models. Same story at Airbnb. China is registering performance similar to or better than the American LLM leaders, but with a fraction of the capex. Flooding the market with competitive, less-expensive AI models will put pressure on the margins and pricing power of the Mag 7, taking down a frighteningly concentrated S&P and likely sending the U.S., possibly the globe, into recession.
The average of the three US models' cost is $12.33. The average of the three Chinese models' cost is $1.36. The US models are 9 times more expensive, but they are nowhere near 9 times better.
This was only the second conference I have ever attended, with my first being ACM Hypertext in September, which focused on hypertext and web-based systems. Having that experience as a reference point made it easier to navigate HiPC, which also highlighted how different large international conferences can feel in terms of scale, structure, and research focus.
This trip can be broken down into two main parts: a short sightseeing visit to Delhi and Agra, followed by the conference in Hyderabad.
Delhi and Agra
The trip started with a couple of days in New Delhi. Since I had not been to this part of the world before, I wanted to take the opportunity to explore the city before the conference. Delhi is enormous, both geographically and culturally. Over the first two days of the trip, I ended up walking more than 50 kilometers.
During this time, I visited several UNESCO World Heritage sites, including the Taj Mahal, Agra Fort, and landmarks within Delhi such as Qutub Minar, which is the world's tallest brick minaret. While in Delhi, I also met up with a few friends from Fermilab. Although we didn't do any sightseeing together, we did manage to go out for dinner one evening, which was a nice break from traveling and a fun way to catch up before the conference started. Starting the trip with sightseeing was a great contrast to the dense technical program that followed.
HiPC 2025
I arrived in Hyderabad on December 17, one day before the main technical program began. HiPC 2025 turned out to be the largest conference I have attended so far, both in terms of attendance and breadth of topics covered, spanning high-performance computing, AI systems, and quantum computing.
Day 1: December 18, 2025
The main conference started on December 18. One notable statistic that stood out in the opening session was that only 29% of all submitted papers were accepted to the main proceedings. I was fortunate that Zeus was part of that small fraction.
The Day 1 keynote was given by Dr. Pratyush Kumar from Sarvam AI. His talk focused on what it actually takes to train large language models from scratch. He walked through the challenges of setting up compute and data infrastructure and shared lessons learned while building LLMs in practice.
One part I found especially interesting was his discussion of real-world applications, including examples where language models helped with educational videos with real-time audio in multiple languages, while keeping the same voice as the original speaker. Overall, the keynote gave a very practical view of LLM development, beyond just model architectures.
The rest of the day featured workshops and technical sessions covering HPC systems, AI, and education.
Day 2: December 19th, 2025
The second day was truly special, as it was the day I presented our work at the conference. The Day 2 keynote was delivered by Dr. Jasjeet Singh Bagla from IISER Mohali, who gave an overview of the history and evolution of high-performance scientific computing in India, starting from early academic efforts and moving toward current large-scale systems. He also discussed challenges faced by the scientific community, especially as we move toward exascale computing and increased use of AI/ML in scientific applications. A major point he emphasized was that effective use of HPC systems depends not just on hardware, but also on ease of use, documentation, maintenance, training, and user support.
A later speaker, from Q-CTRL, explained how the company's software helps unlock better performance across a wide range of applications, including logistics, scheduling, quantum machine learning, and automotive design. He also shared real-world deployments with industry partners such as Airbus, Mitsubishi, and Mazda.
Another interesting point he made was about the future direction of quantum computing, including integrating quantum systems into data centers and using smaller, more affordable quantum processors that can still deliver useful results when paired with strong, pre-validated software. Overall, the talk gave a clear and practical perspective on where quantum computing is today and how software will play a major role in its progress.
Particle Swarm Optimization to perform an initial global search, identify promising regions of the search space,
BFGS, a quasi-Newton method, for fast local convergence,
Automatic Differentiation (AD) to compute gradients accurately without requiring users to manually derive them,
and massively parallel GPU execution, where hundreds or thousands of independent optimizations run concurrently.
The algorithm operates in two phases. First, a small number of PSO iterations are used to improve the quality of the starting points. In the second phase, each particle independently invokes a BFGS optimization on the GPU, using forward-mode AD to compute gradients efficiently. Once sufficient convergence is reached, the threads use atomic operations to signal early stopping, synchronize, and terminate.
By running many independent optimizations in parallel on GPUs, Zeus achieves 10x--100x speedups over a perfectly parallel CPU implementation while also improving accuracy compared to existing GPU-based methods. One of the advantages of the parallel algorithm is that it is less sensitive to poor starting points, whereas for the sequential version, we must repeatedly restart until sufficient convergence is achieved.
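The two-phase strategy can be sketched on a CPU. This is an illustrative sketch, not Zeus itself: Zeus runs the refinements in parallel on a GPU with forward-mode AD and atomic early stopping, while here SciPy's BFGS with numerical gradients stands in, and all constants (particle count, PSO coefficients, iteration budget) are arbitrary choices for the demo.

```python
# Sketch of a two-phase PSO + BFGS global optimizer on the Rastrigin
# benchmark: PSO explores globally, then every particle's best point
# seeds an independent BFGS local refinement.
import numpy as np
from scipy.optimize import minimize

def rastrigin(x):
    # Classic multimodal benchmark; global minimum 0 at x = 0.
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
dim, n_particles = 2, 32
pos = rng.uniform(-5.12, 5.12, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([rastrigin(p) for p in pos])

# Phase 1: a handful of PSO iterations (global exploration).
for _ in range(20):
    gbest = pbest[np.argmin(pbest_val)]
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([rastrigin(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]

# Phase 2: an independent BFGS refinement from every particle's best point
# (sequential here; in Zeus these run concurrently on the GPU).
results = [minimize(rastrigin, p, method="BFGS") for p in pbest]
best = min(results, key=lambda r: r.fun)
print(f"best value {best.fun:.3e} at {np.round(best.x, 3)}")
```

Even this serial version shows why the hybrid helps: PSO alone converges slowly near a minimum, and BFGS alone gets trapped by whichever local basin its start point falls in.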
In the talk, I also discussed experimental results from both synthetic benchmark functions, such as the Rastrigin and Rosenbrock functions, and a real-world high-energy physics application. The example plot shows simulated data from proton-proton collisions at the Large Hadron Collider. When protons collide, their quarks and gluons produce sprays of particles called jets. When two jets are produced, their invariant mass can be reconstructed and fitted by minimizing a negative log likelihood. The pull distribution measures how far each data point is from the fit, in units of its expected uncertainty. A good fit should have pulls fluctuating around zero and mostly within ±2σ. This shows agreement between the simulated data and the model prediction. I also touched on current limitations, such as handling objectives with discontinuous derivatives, and outlined future work, including deeper levels of parallelism and improved stopping criteria.
Presenting this work felt especially meaningful because it tied together my internship experience at Fermilab and my growing interest in high-performance computing. It was rewarding to share our ideas with the community and see how the broader themes of the conference connected directly with our contribution.
Day 3: December 20th, 2025
The third and final day focused heavily on AI/ML topics, along with a very interesting keynote speaker, and concluded with a quantum computing workshop.
The Day 3 keynote was given by Dr. Christos Kozyrakis from Stanford University and NVIDIA Research. His talk focused on how AI workloads are shaping modern datacenter design. He argued that current AI systems often follow a supercomputing-style approach, which may not be the best fit as models continue to scale.
Instead, he made a case for scale-out AI systems, where efficiency and system-level design play a bigger role. One idea that stayed with me was his discussion of power and energy efficiency, especially the question of how much AI can realistically fit within a gigawatt of power.
Later in the day, I attended the Quantum Computing Workshop, which was one of the highlights of the conference for me. The workshop was particularly exciting because I will be taking the Quantum Computing course in Spring 2026, and I am interested in exploring how Zeus could be mapped into a hybrid classical-quantum optimization algorithm.
To close the workshop, a speaker from Fujitsu presented the current state of their quantum research, including ambitious plans toward a 1000-qubit machine. After the workshop, I had several valuable discussions with experts in the field. In particular, Dr. Anirban Pathak provided initial guidance on how my current algorithm could be adapted toward a hybrid classical-quantum approach.
Additionally, Aravind Ratnam pointed me to Q-CTRL's learning tutorials, which he recommended as an excellent hands-on resource for building a stronger foundation in quantum computing.
To close the conference, I attended the banquet, which featured a cultural program and an Indian Dinner at the rooftop restaurant.
Closing Thoughts
As only my second conference, HiPC 2025 was both intense and deeply rewarding. Compared to my first conference, I felt noticeably more confident presenting my work, asking questions, and engaging with researchers across different fields. At the same time, the experience reinforced a familiar lesson that conferences are just as much about people and conversations as they are about papers and talks.
I am grateful for the opportunity to present this work, for the feedback I received, and for the many discussions that will shape my future research directions. HiPC 2025 was an unforgettable experience, and I hope to return again.
Inspired by Tom Whitwell's 52 things I learned in 2022, I started my own list of things I learned in 2023 and repeated it last year.
Reaching the end of another year, it is time for Things I Learned In 2025. Part way through the year I had the brilliant idea of putting a learning at the bottom of my weekly newsletter, and that worked well until the middle of the year when I stopped publishing newsletter issues. So here is a half year of learnings.
What did you learn this year?
Let me know on Mastodon or Bluesky.
In Ethiopia, time follows the sun like nowhere else
Because Ethiopia is close to the Equator, daylight is pretty consistent throughout the year. So many Ethiopians use a 12-hour clock, with one cycle of 1 to 12 — from dawn to dusk — and the other cycle from dusk to dawn. Most countries start the day at midnight. So 7:00 a.m. in East Africa Time, Ethiopia's time zone, is 1:00 in daylight hours in local Ethiopian time. At 7:00 p.m., East Africa Time, Ethiopians start over again, so it's 1:00 on their 12-hour clock.
This could have easily gone in the Thursday Threads on time standards.
There are 12 hours of daylight, numbered 1 through 12.
Then 12 hours of night, numbered 1 through 12.
What could be easier?
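The mapping above is simple enough to express as a tiny converter. A minimal sketch, ignoring minutes and seasonal drift in sunrise time, and assuming the day cycle runs 7:00 to 18:00 standard time as described:

```python
# Convert a standard 24-hour clock hour to the Ethiopian 12-hour
# daylight-anchored clock: 7:00 standard -> 1 o'clock (day),
# 19:00 standard -> 1 o'clock (night).
def to_ethiopian(hour_24: int) -> tuple[int, str]:
    eth = (hour_24 - 7) % 12 + 1
    cycle = "day" if 7 <= hour_24 <= 18 else "night"
    return eth, cycle

print(to_ethiopian(7))    # (1, 'day')   7 a.m. is 1 o'clock in daylight
print(to_ethiopian(19))   # (1, 'night') 7 p.m. restarts the count
```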
A biographer embedded with the Manhattan Project influenced what we think about the atomic bomb
In early 1945, a fellow named Henry DeWolf Smyth was called into an office in Washington and asked if he would write this book that was about a new kind of weapon that the US was developing. The guy who had called him into his office, Vannevar Bush, knew that by the end of the year, the US was going to drop an atomic bomb that had the potential to end the war, but also that as soon as it was dropped, everybody was going to want to know what is this weapon, how was it made, and so forth. Smyth accepted the assignment. It was published by Princeton University Press about a week after the bomb was dropped. It explained how the US made the bomb, but it told a very specific kind of story, the Oppenheimer story that you see in the movies, where a group of shaggy-haired physicists figured out how to split the atom and fission, and all of this stuff. The thing is, the physics of building an atomic bomb is, in some respects, the least important part. More important, if you actually want to make the thing explode, is the chemistry, the metallurgy, the engineering that were left out of the story.
The quote above comes from the transcript of this podcast episode.
I've thought about this a lot in the past week as the Trump administration's flood-the-zone strategy overwhelms the senses.
In a valiant effort to cover everything that is news, I can't help but wonder about the lost perspective of what isn't being covered.
And I wonder where I can look to find that perspective.
The origin of the computer term "mainframe" comes from "main frame" — the 1952 name of an IBM computer's central processing section
Based on my research, the earliest computer to use the term "main frame" was the IBM 701 computer (1952), which consisted of boxes called "frames." The 701 system consisted of two power frames, a power distribution frame, an electrostatic storage frame, a drum frame, tape frames, and most importantly a main frame.
"Mainframe" is such a common word in my lexicon that it didn't occur to me that its origin was "main frame" — as in the primary frame to which everything else connected.
I've heard "frame" used to describe a rack of telecommunications equipment as well, but a quick Kagi search couldn't find the origins of the word "frame" from a telecom perspective.
It takes nearly 3¢ to make a penny, but almost 14¢ to make a nickel
FY 2024 unit costs increased for all circulating denominations compared to last year. The penny’s unit cost increased 20.2 percent, the nickel’s unit cost increased by 19.4 percent, the dime’s unit cost increased by 8.7 percent, and the quarter-dollar’s unit cost increased by 26.2 percent. The unit cost for pennies (3.69 cents) and nickels (13.78 cents) remained above face value for the 19th consecutive fiscal year.
I knew pennies cost the U.S. mint more than one cent to make, but I didn't realize that the cost of nickels is so much more out of whack.
I also learned a new word: seigniorage — the difference between the face value of money and the cost to produce it.
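Seigniorage makes a nice one-liner. Using the FY 2024 unit costs quoted above (face values and costs in cents; negative seigniorage means the Mint loses money on each coin):

```python
# Seigniorage per coin = face value minus production cost (cents),
# per the FY 2024 figures quoted above.
coins = {
    "penny":  (1, 3.69),
    "nickel": (5, 13.78),
}
for name, (face, cost) in coins.items():
    print(f"{name}: seigniorage {face - cost:+.2f} cents")
```

So the penny loses about 2.7 cents per coin and the nickel almost 8.8 — the nickel really is the more out-of-whack denomination.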
It is much harder to get to the Sun than it is to Mars
The Sun contains 99.8 percent of the mass in our solar system. Its gravitational pull is what keeps everything here, from tiny Mercury to the gas giants to the Oort Cloud, 186 billion miles away. But even though the Sun has such a powerful pull, it’s surprisingly hard to actually go to the Sun: It takes 55 times more energy to go to the Sun than it does to go to Mars.
I suppose the headline above needs some nuance.
It is easy to get to the Sun...just escape Earth's gravity and point yourself there.
It is hard to get to the Sun in a controlled way that means you won't burn up along the way.
There are now 23 Dark Sky Sanctuaries in the World
Rum, a diamond-shaped island off the western coast of Scotland, is home to 40 people. Most of the island — 40 square miles of mountains, peatland and heath — is a national nature reserve, with residents mainly nestled around Kinloch Bay to the east. What the Isle of Rum lacks is artificial illumination. There are no streetlights, light-flooded sports fields, neon signs, industrial sites or anything else casting a glow against the night sky. On a cold January day, the sun sets early and rises late, yielding to a blackness that envelops the island, a blackness so deep that the light of stars manifests suddenly at dusk and the glow of the moon is bright enough to navigate by.
The pictures that accompany this article from the New York Times are stunning (gift link).
And to think that there are only 23 places in the world that have reached this level of commitment to the environment.
Mexico has only one gun store for the entire country
Mexico notes that it is a country where guns are supposed to be difficult to get. There is just one store in the whole country where guns can be bought legally, yet the nation is awash in illegal guns sold most often to the cartels.
Plants reproduce by spreading little plant-like things
This is where pollen comes in. Like sperm, pollen contains one DNA set from its parent, but unlike sperm, pollen itself is actually its own separate living plant made of multiple cells that under the right conditions can live for months depending on the species... So this tiny male offspring plant is ejected out into the world, biding its time until it meets up with its counterpart. The female offspring of the plant, called an embryosac, which you're probably less familiar with since they basically never leave home. They just stay inside flowers. Like again, they're not part of the flower. They are a separate plant living inside the flower. Once the pollen meets an embryosac, the pollen builds a tube to bridge the gap between them. Now it's time for the sperm. At this point, the pollen produces exactly two sperm cells, which it pipes over to the embryosac,
which in the meantime has produced an egg that the sperm can meet up with. Once fertilized, that egg develops into an embryo within the embryosac, hence the name, then a seed and then with luck a new plant. This one with two sets of DNA.
—Pollen Is Not Plant Sperm (It’s MUCH Weirder), MinuteEarth, 7-Mar-2025
Pollen is not sperm...it is a separate living thing!
And it meets up with another separate living thing to make a seed!
Weird!
The video is only three and a half minutes long, and it is well worth checking out at some point today.
Most plastic in the ocean isn't from littering, and recycling will not save us
Littering is responsible for a very small percentage of the overall plastic in the environment. Based on this graph from the OECD, you can see littering is this teeny-tiny blue bar here, and mismanaged waste, not including littering, is this massive one at the bottom. Mismanaged waste includes all the things that end up either in illegal dump sites or burned in the open or in the rivers or oceans or wherever. The focus on littering specifically, it's an easy answer because obviously there's nothing wrong with discouraging people from littering, but it focuses on individual people's bad choices rather than systemic forces that are basically flushing plastic into the ocean every minute. Mismanaged waste includes everything that escapes formal waste systems. So they might end up dumped, they might end up burned, they might end up in the environment.
Contrary to popular belief, most plastic in the Great Pacific Garbage Patch stems from the fishing industry, with only a small fraction linked to consumer waste.
The video highlights that mismanaged waste, rather than individual littering, is the primary contributor to plastic pollution, with 82% of macroplastic leakage resulting from this issue.
It emphasizes the ineffectiveness of recycling as a solution, noting that less than 10% of plastics are currently recycled, and the industry has perpetuated the myth that recycling can resolve the plastic crisis.
Microplastics, which are increasingly recognized as a major problem, originate from various sources, including tires and paint, with new data suggesting that paint is a significant contributor.
"But where is everybody?!?" — the origins of Fermi's Paradox
The eminent physicist Enrico Fermi was visiting his colleagues at Los Alamos National Laboratory in New Mexico that summer, and the mealtime conversation turned to the subject of UFOs. Very quickly, the assembled physicists realized that if UFOs were alien machines, that meant it was possible to travel faster than the speed of light. Otherwise, those alien craft would have never made it here. At first, Fermi boisterously participated in the conversation, offering his usual keen insights. But soon, he fell silent, withdrawing into his own ruminations. The conversation drifted to other subjects, but Fermi stayed quiet. Sometime later, long after the group had largely forgotten about the issue of UFOs, Fermi sat up and blurted out: “But where is everybody!?”
This retelling of the Fermi Paradox comes from this story about why, despite the vastness of the universe, we have yet to encounter evidence of extraterrestrial civilizations.
Enrico Fermi famously posed the question, "Where is everybody?" suggesting a disconnect between the expectation of abundant intelligent life and the lack of observable evidence.
With this comes the Great Filter notion...proposing that there may be significant barriers preventing intelligent life from becoming spacefaring.
The article goes on to speculate where we are relative to the "Great Filter" — are we past it, or is it yet in front of us?
In other words, have we survived the filter or is our biggest challenge ahead of us?
The pronoun "I" was capitalized to distinguish it from similarly typeset letters
In fact, the habit of capitalizing “I” was also a practical adaptation to avoid confusion, back in the days when m was written “ııı” and n was written “ıı.” A stray “i” floating around before or after one of those could make the whole thing hard to read, so uppercase it went. And now it seems perfectly logical.
I'm not buying the opinion author's underlying premise (capitalizing “they” in writing when it refers to a nonbinary person), but the origins of why we capitalize "I" and not other pronouns are fascinating.
The word "scapegoat" originated in a 1530 bible translation
Early English Christian Bible versions follow the translation of the Septuagint and Latin Vulgate, which interpret azazel as "the goat that departs" (Greek tragos apopompaios, "goat sent out", Latin caper emissarius, "emissary goat"). William Tyndale rendered the Latin as "(e)scape goat" in his 1530 Bible. This translation was followed by subsequent versions up through the King James Version of the Bible in 1611: "And Aaron shall cast lots upon the two goats; one lot for the Lord, and the other lot for the scapegoat."
—Scapegoat, Wikipedia
Have you stared at a word and suddenly wondered about its origins?
This entry from the New York Times Flashback Quiz had me wondering about "scapegoat".
"scape" — "goat".
Why do we say that?
It comes from a phrase in the Bible where a goat is sent into the wilderness on the Day of Atonement as a symbolic bearer of the sins of the people — Leviticus 16:22, to be exact.
The translator coined the term from the interpretation of "the goat that departs" and "emissary goat" in that verse.
It was one of the first memes ever, a viral sensation that went mainstream back when people still used dial-up internet. Yet the cameraman behind “Leeroy Jenkins” still seems stupefied that anyone fell for it.
This bit of internet folklore was first posted on May 10, 2005, so this year marks its 20th anniversary.
I remember when this first came out, and I totally believed it was real until earlier this year.
Ammonium chloride is a slightly toxic chemical most notably found in “salmiak,” a salt licorice candy, which is popular in northern Europe. In a new study, researchers found that the compound triggers a specific proton channel called OTOP1 in sour taste receptor cells, which fulfills one of the key requirements to be considered a primary taste like sweet, salty, sour, bitter, and umami. Ammonium is commonly found in waste products and decaying organic matter and is slightly toxic, so it makes sense that vertebrates evolved a specific taste sensor to recognize it.
The Penn Libraries, where I work, has first editions of many of the works featured in my #PublicDomainDayCountdown. From today through Public Domain Day, the Libraries' social media will feature photos of some distinctive books from 1930.
Figure 1: (a) Difficulties in understanding posts due to missing or assumed context, (b) need for standardization of posts (Figure 1 in M.J. Ferdous et al.).
Introduction
Social media sites such as Reddit, Facebook, and YouTube play a central role in how people exchange ideas, debate current issues, and build communities, with threaded discussions serving as a key mechanism for interaction across these platforms. These platforms are designed around visual structures: nested replies, indentation, spacing, and visual grouping allow sighted users to quickly scan conversations, identify relationships between posts, and decide where to engage. For blind users who rely on screen readers, accessing these conversations involves a fundamentally different interaction model. As shown in the accompanying video, screen readers present content linearly, reading one element at a time while the user moves through posts and interface elements using keyboard commands such as Tab and Shift+Tab. This linear, element-by-element experience sits awkwardly against the visual threading of replies, creating a mismatch between how the conversation is visually structured and how it is revealed through speech. While this provides access to the text, it does not preserve the visual organization that conveys conversational structure.

Most accessibility efforts on the web focus on technical compliance, but compliance alone does not resolve these problems. For example, in Figure 1(a), a reply references prior context that appears earlier in the thread, but that relationship is not explicitly conveyed. A blind reader must navigate sequentially through multiple preceding posts to reconstruct the missing context, making it difficult to understand the intent of the reply without significant effort. Likewise, in Figure 1(b), posts written in an informal, non-standardized style are difficult for screen reader users to interpret aurally, increasing the need for clearer, more standardized wording to support comprehension. As a result, there is a gap between technical accessibility and conversational usability.
In this post, I discuss the findings from our recently published IJHCI paper, “Understanding Online Discussion Experiences of Blind Screen Reader Users,” in which we examined how blind screen reader users experience and navigate online discussions, focusing on challenges that extend beyond traditional notions of web accessibility.
Study Overview
Our findings are based on a qualitative interview study with 20 blind individuals who regularly participate in online discussions. Participants ranged in age from 30 to 60 and included both expert (8) and non-expert (12) screen reader users. All participants reported frequent use of platforms such as Reddit, Facebook, and YouTube for reading and contributing to discussions. We relied on in-depth, semi-structured interviews rather than log analysis or automated metrics. This approach allowed participants to describe their experiences, challenges, and strategies in their own words (Figure 2). The goal was to capture not only what difficulties occur, but also why they occur and how users adapt to them in practice.
Figure 2: Illustration of the interview study process (Figure 2 in M.J. Ferdous et al.).
Key Findings
1. Preference for Longer, Context-Rich Posts
A majority of participants (14 out of 20) reported a preference for longer discussion posts rather than short or minimal replies. This preference contrasts with common design assumptions that shorter content is always easier to consume. Participants explained that longer posts often restate context, clarify intent, and make explicit references to earlier points in the discussion. This reduces the need to navigate backward through a thread to recover missing information. Longer posts also help mitigate issues related to pronunciation errors, slang, or non-standard language, which can be difficult for screen readers to handle accurately. For these users, additional detail supports comprehension and reduces cognitive effort.
2. Difficulty Joining Ongoing Conversations
Nearly all participants (18 out of 20) described joining an already active discussion as particularly challenging. When a discussion has many replies, blind users often need to listen to a substantial portion of the thread before understanding the current state of the conversation. This process can take considerable time, especially when new posts continue to appear while the user is still catching up. Participants frequently described feeling out of sync with the conversation. By the time they felt confident enough to contribute, the discussion had often moved on, resulting in fewer responses to their comments. Two expert users reported fewer difficulties, attributing their success to extremely high speech rates and years of experience developing audio-based skimming strategies. However, these strategies require significant effort to acquire and are not representative of typical screen reader use.
3. Impact of Missing Context
All participants reported encountering posts that were difficult to interpret because they lacked sufficient context. Common examples included replies that referenced earlier comments without restating them, posts that relied on images or videos without description, and comments that implicitly addressed specific parts of an article without indicating which section was being discussed. When context is missing, blind users often attempt to navigate backward through the thread to locate the original reference. This process is time-consuming and cognitively demanding, and it does not always succeed. Participants noted that while searching backward, they sometimes forgot the original question or comment that prompted the search, further increasing confusion and frustration.
4. Limitations of a Single Screen Reader Voice
Some participants, particularly those with extensive screen reader experience, highlighted limitations related to how conversations sound. Listening to multi-person discussions through a single, uniform voice made it difficult to distinguish between speakers and reduced engagement. Participants noted that while basic access was available, the experience lacked the social cues present in face-to-face conversations or even audio chats. These users suggested that auditory differentiation, such as varying voice characteristics across speakers, could improve both comprehension and engagement. This finding suggests that once basic access barriers are addressed, experiential factors become increasingly important for long-term participation.
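To make the auditory-differentiation idea concrete, one simple approach is to derive stable but distinct voice settings for each speaker, so the same participant always sounds the same across a thread. The sketch below is my own illustration, not something described in the paper or implemented by any particular screen reader; the `voice_profile` helper and its parameter ranges are assumptions:

```python
import hashlib

def voice_profile(username: str) -> dict:
    """Map a username to stable, slightly varied TTS settings.

    Hashing the name means the same speaker always receives the same
    voice, while different speakers tend to get audibly different ones.
    """
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    pitch = 0.8 + (digest[0] / 255) * 0.4  # pitch multiplier in [0.8, 1.2]
    rate = 0.9 + (digest[1] / 255) * 0.2   # speaking-rate multiplier in [0.9, 1.1]
    return {"pitch": round(pitch, 3), "rate": round(rate, 3)}
```

A real implementation would feed these multipliers into a speech engine (for example via SSML prosody attributes), but the core idea is simply a deterministic speaker-to-voice mapping.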
Design Implications
The findings point to several opportunities for improving conversational usability for blind screen reader users. These opportunities focus on reducing cognitive load and improving contextual awareness rather than solely improving access to individual elements.
Thread summarization could help users quickly understand the main points of a discussion without listening to every post.
Context-aware navigation could allow users to follow specific reply chains or sub-conversations more easily.
Text normalization could convert slang, abbreviations, and informal language into more screen-reader-friendly forms while preserving original content.
Auditory differentiation could improve speaker identification and conversational flow in multi-participant discussions.
These directions build directly on participant feedback and reflect practical extensions of existing assistive technologies.
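Of these directions, text normalization is perhaps the easiest to sketch. The fragment below shows a minimal dictionary-based approach; the abbreviation list and the `normalize` helper are illustrative assumptions of mine, and a practical system would need a far larger lexicon and context-aware disambiguation:

```python
import re

# Small illustrative lexicon; a real system would need far more entries.
ABBREVIATIONS = {
    "idk": "I don't know",
    "imo": "in my opinion",
    "tbh": "to be honest",
    "btw": "by the way",
}

def normalize(text: str) -> str:
    """Expand known abbreviations so a screen reader speaks full words."""
    def expand(match: re.Match) -> str:
        word = match.group(0)
        # Only whole alphabetic tokens are checked, so "idk" inside a
        # longer word is left untouched.
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", expand, text)
```

For example, `normalize("idk, but imo this works")` expands both abbreviations while leaving ordinary words and punctuation intact, preserving the original message content as the participants requested.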
Conclusion
This study highlights that accessibility in online discussions involves more than making text readable by assistive technologies. For blind screen reader users, the primary challenge lies in understanding and participating in conversations that were designed around visual structure and rapid interaction. Addressing these challenges requires attention to conversational context, navigation, and cognitive effort. By examining the lived experiences of blind users, this work emphasizes the importance of designing discussion platforms that support not only access but also meaningful participation. As online discussions continue to shape public discourse, improving conversational usability is an essential step toward more inclusive digital spaces.