Science is like a …

Improving public understanding of science would be a good thing, right?

Right?

Not so much, argued professor of communication Leon Trachtman 40 years ago, because it’s difficult to communicate to the public “the real feel of scientific research, which is tentative, sometimes seeming to take two steps backwards for every one forward, ultimately imprecise in technique and measurement, and full of experimental error.”

What, then, is a good analogy for science that captures these challenges?

Science is like a jigsaw puzzle

In both science and jigsaws, you start with the low-hanging fruit, the places where it is easiest to make fast progress. Some bright red pieces, say, that all clearly go together. For a while, you make quick progress, until you don’t. So you jump elsewhere and repeat the process with a different subset of pieces.

Some parts of a jigsaw are satisfying. Others, like the sky pieces, are tedious but necessary. Sounds a lot like science. So is a jigsaw puzzle a good analogy?

Continue reading “Science is like a …”

If science had badges like StackExchange

Standard citation metrics are bland. The h-index? Boring. The total number of citations? Meh. The i10-index? Arbitrary. Imagine if, instead, there were a badge system like the one used by StackExchange.

The StackExchange Badge system

StackExchange has two types of reputation. The first is simple: you score 10 points whenever someone upvotes a question or answer you’ve written. The second is more unusual: you collect badges—gold, silver, and bronze—for achieving specific (and sometimes odd) things.

The reputation score is like a citation count. Both are fake internet points with no direct tangible value. By comparison, the badges are subjective, but they highlight that there are many possible ways that a question or answer can be helpful to the community—something a single number can’t do.

Take the Necromancer badge, which is given for answering an unanswered question 60+ days later. It encourages users to answer questions that might otherwise languish unanswered forever.

We could have a Necromancer badge for academic citations as well. The time limit would need to be longer, but the idea is the same. You could, say, get the Necromancer badge by citing an uncited paper that is more than five years old.
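As a sketch of how such a badge might be awarded (the five-year threshold is from the rule above; the function name and record fields are my own illustration, not any real metric):

```python
from datetime import date

def earns_necromancer(cited_pub_date, cited_prior_citations,
                      citing_pub_date, min_age_years=5):
    """A citation 'resurrects' a paper if the cited paper is at least
    min_age_years old and had never been cited before."""
    age_years = (citing_pub_date - cited_pub_date).days / 365.25
    return cited_prior_citations == 0 and age_years >= min_age_years

# An uncited 2015 paper cited in 2024 would earn the badge:
print(earns_necromancer(date(2015, 3, 1), 0, date(2024, 6, 1)))   # True
# A previously cited paper of the same age would not:
print(earns_necromancer(date(2015, 3, 1), 12, date(2024, 6, 1)))  # False
```

In practice the hard part wouldn't be the check itself but getting trustworthy citation-count histories to feed into it.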

Continue reading “If science had badges like StackExchange”

The 100-scientific-papers rule

By the time you’re ready to submit a scientific paper, you should have read 100 related papers.

Why 100? Well, that advice has no basis more reliable than my own meandering experience. It’s my take on what it takes these days to be well versed in a specific topic and its broader background.

A typical scientific paper these days includes 30–50 references. Personally, I’ve gone as low as 24 and as high as 77. Twenty years ago, these numbers would’ve been lower, perhaps half as many. But rather than dwell on issues of inflation of the academic coin, we’ll just stick with 30–50 papers as our rough guess for now.

By the time you’re writing your own paper, you should’ve read more papers than you cite. And if you do the math, I’m implying that you should read 2–3 papers for every one you cite. Explore the literature beyond its essentials, but only until you reach a point of diminishing returns. Reasonable advice, right?
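Spelled out, the arithmetic behind that ratio (using the 30–50 reference counts above and the 100-paper rule):

```python
# Reading-to-citing ratio implied by the 100-papers rule.
papers_read = 100
for refs_cited in (30, 50):
    ratio = papers_read / refs_cited
    print(f"{refs_cited} references: read {ratio:.1f} papers per citation")
# 30 references: read 3.3 papers per citation
# 50 references: read 2.0 papers per citation
```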

Continue reading “The 100-scientific-papers rule”

Jupyter Notebooks are gone from my scientific workflow

TL;DR: I’ve just learned that the text editor Sublime Text can display images within Markdown files. Gone therefore is my need to use Jupyter Notebooks.


I was never a true convert to Jupyter Notebooks. I used them for several years, and saw their appeal, but they just didn’t quite feel right to me.

Most complaints against Notebooks are technical ones: they’re awkward to version control, they’re hard to debug, and they promote poor programming practices. But these issues are tangential to my complaints against Notebooks, which are less concrete:

  • I’m always scrolling. It’s inefficient.
  • I don’t want to do work in a browser. Maybe it’s a weak reason, but I like keeping my scientific and programming tools separate from the browser.
  • Editing and navigating Notebooks feels clumsy. Maybe it’s a lack of practice, but I’d rather leverage the time I’ve invested in learning and setting up my text editor than spend time learning a bunch of new shortcuts specific to Notebooks.
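A minimal sketch of the notebook-free loop this implies: a plain script saves its figures to disk and appends image links to a Markdown file, which the editor then renders inline. (The file names and helper below are my own illustration, not a Sublime Text feature.)

```python
from pathlib import Path

def log_figure(md_path, image_path, caption):
    """Append a Markdown image link so the editor can preview it inline."""
    with open(md_path, "a") as md:
        md.write(f"\n![{caption}]({image_path})\n")

# In a real analysis script, a plotting library (e.g. matplotlib's
# savefig) would write results.png first; here we just record the link.
log_figure("results.md", "results.png", "Model fit vs. data")
print(Path("results.md").read_text())
```

The analysis stays in an ordinary, version-controllable script, and the Markdown file becomes the running lab notebook.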
Continue reading “Jupyter Notebooks are gone from my scientific workflow”

Non-scientific software that helps me get science done

This is a shout-out to all the software that helps my science happen despite not necessarily being developed for scientific purposes.

Fair warning, the list skews toward Linux programs since that’s what I use in my day-to-day work.

Tmux

I spend a lot of time at the command line. Or rather, command lines (note the plural). I often have four open at once. And I want to see all four at once, and jump back and forth between them all. Separate terminal windows or tabs don’t cut it. But Tmux does.

Here’s a pared-down example of how I might typically use Tmux: two panes, one for editing text and the other for exploring directories.
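For anyone wanting to reproduce that layout, here’s a sketch of the commands involved (the session name and the programs launched in each pane are arbitrary illustrations):

```shell
# Start a detached session, split it into two side-by-side panes,
# then put an editor in one and a plain shell in the other.
tmux new-session -d -s work              # detached session named "work"
tmux split-window -h -t work             # horizontal split: two panes
tmux send-keys -t work:0.0 'vim notes.md' C-m   # left pane: edit text
tmux send-keys -t work:0.1 'ls -l' C-m          # right pane: explore files
tmux attach -t work                      # jump in; switch panes with prefix + arrow
```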

Not gonna lie, Tmux is awkward to start with. The default keyboard shortcuts aren’t intuitive, simple things like copy/paste don’t necessarily work as you’d expect, and many online resources are outdated because older versions of Tmux used configuration commands that are no longer compatible.

But Tmux is well worth the learning curve.

Continue reading “Non-scientific software that helps me get science done”

Introductions in scientific papers can give warped and inflated perspectives

A direct and quantifiable impact on science to come out of my PhD was the 50-odd times that I brewed coffee for the department morning tea. Scientists turned up and got coffee; I got thanked for helping make that happen.

Despite its impact, brewing coffee is not listed on my CV. Instead, I have publications. Yet, compared to coffee, the direct impacts of these publications are hard to define.

Continue reading “Introductions in scientific papers can give warped and inflated perspectives”

Does your scientific paper smell?

In computer programming, code smells are “surface indications that usually correspond to deeper problems in the system”. Duplicated code is one example. Copying a code fragment into many different places is generally considered bad form; Don’t Repeat Yourself is a well-known principle of software development. However, duplicating code can be beneficial if, say, it makes the code easier to read and maintain.
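To make the duplication smell concrete, a toy example of my own (not from any particular codebase): the same unit conversion pasted into two functions, then factored out.

```python
# Smelly: the same conversion logic copy-pasted into two functions.
def report_temperature_smelly(celsius):
    return f"Today: {celsius * 9 / 5 + 32:.1f} F"

def log_temperature_smelly(celsius):
    return f"[log] temp={celsius * 9 / 5 + 32:.1f}F"

# DRY: one named helper, one place to fix if the formula is ever wrong.
def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

def report_temperature(celsius):
    return f"Today: {to_fahrenheit(celsius):.1f} F"

def log_temperature(celsius):
    return f"[log] temp={to_fahrenheit(celsius):.1f}F"

print(report_temperature(20))  # Today: 68.0 F
```

Both versions behave identically, which is the point: the smell isn’t a bug, just a surface sign of a maintenance problem waiting to happen.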

Although code smells are undesirable, “they are not technically incorrect and do not prevent the program from functioning.”

By this description, I’d argue that smells also exist in scientific papers. Hence, I’m proposing a few of these easy-to-spot (aka sniffable) features that may point to a deeper underlying issue.

Continue reading “Does your scientific paper smell?”

Don’t write about your scientific paper, just write it

Scientific writing is obsessed with other scientific writing and itself.

Phrases like ‘this paper’ and ‘this study’ are everywhere in scientific writing—which is not a problem per se. Used well, these phrases concisely differentiate the current study from others. Used poorly, they fill the word count without adding value for the reader.

Never, for example, start a Conclusion with ‘In this paper, we showed …’ or ‘The main conclusions of this paper are …’. The first few words of a Conclusion (any section, in fact) are precious. Don’t waste them reminding me that I’m reading a paper in which you’ve shown or concluded something. Tell me something profound—something about your science.

‘In this paper, we showed …’ is a signpost (aka metadiscourse). It’s writing about the writing. And it’s a main reason that so much of science writing, like any academic writing, is so boring.

Continue reading “Don’t write about your scientific paper, just write it”

Line graphs: the best and worst way to visualise data

Line graphs are the Swiss army knives of data visualisation. They can be almost anything… which is both good and bad.

Line graphs are slow to interpret

Many graphs serve one clear purpose. Take the five graphs below:

Even without labels, it’s clear what role each of these graphs serves:

  • Pie chart—components of a total
  • Thermometer—progress toward a goal amount
  • Speedometer—percentage of the largest possible value
  • Histogram—distribution of values
  • Box plot—statistical summaries of several datasets

In other words, if I’m presented with one of the graphs above, I have an immediate head start on interpreting it. If, instead, I’m presented with a line graph, I’m forced to first read the axis labels and limits.

Deciphering text is the slowest way to take in information. Shape is fastest, then colour, and only then text. This so-called Sequence of Cognition, popularised by Alina Wheeler, is something marketers need to know about.

Continue reading “Line graphs: the best and worst way to visualise data”