Staying afloat in a sea of literature


An informal history of how coders and researchers have been trying to answer one of academic life’s biggest questions — how can we stay on top of new publications?

I'm drowning

A regenerative medicine professor at King’s College London once told me that “A PhD student should be reading at least five papers a week, and expect to find that four of the five aren’t actually relevant to their work.” My personal experience is more akin to 24 of 25 papers not being directly relevant to my PhD thesis—though obviously I had to read them to make sure they weren’t.

The Internet has enabled instantaneous, low-cost dissemination of content, but with ever more information available and academic research becoming increasingly multidisciplinary, the problem becomes one of finding needles in ever-growing haystacks.

Researchers, junior and senior alike, are perpetually haunted by the worry that they may have missed new publications of importance to their own work. What’s being done to keep us afloat in this sea of information we find ourselves in?

Let's start from the '90s

For those of you lucky enough to have experienced the ’90s, you may remember the rom-com classics and era-defining boy bands, but also the popularisation of Internet usage and the advent of Google Search, which celebrated its 20th birthday this September.

Academics have, in some ways, been spoilt ever since by digital publication formats, electronic databases, and a concerted effort by innovators across the academic, publishing and technology communities to build increasingly sophisticated tools (400 and counting, according to this study by librarians Bianca Kramer and Jeroen Bosman of Utrecht University).

As I see it, over the last two decades the technologies for tackling this influx of information have evolved in several overlapping spurts.

Phase I: Structure all content

Think librarianship in overdrive.

Searchable digital repositories

Sophisticated keyword tagging

Phase II: Delegate the dumb tasks

What can I trust a computer to do without me?

Automatic indexing and refined search algorithms

The meteoric rise of Google Search can be attributed to:

The launch of Google Scholar in 2004 gave researchers the convenience of the Google search algorithm paired with a formidable content database of tens of millions of scholarly publications and legal cases, dwarfing its open-access predecessor, CiteSeer, the 1997 brainchild of American and Australian researchers at the NEC Research Institute.

Email alerts and RSS feeds

The digitisation of journals and literature repositories meant that it was now possible to set up RSS feeds to track new content matching your keywords of interest, or to subscribe to emails listing the latest tables of contents (eTOCs) of your favourite journals. A great help, until you move institutes and realise that you have to update the email address on over 50 journal eTOC subscriptions, one by one.
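To make the keyword-tracking idea concrete, here is a minimal sketch of what such an RSS filter does under the hood, assuming a plain RSS 2.0 feed. The feed contents, the example.org URLs and the helper name `matching_items` are all hypothetical; a real feed reader would download the feed (for instance with `urllib.request`) and would also need to handle Atom feeds, which use different element names and XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for a journal's "latest articles" feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal: latest articles</title>
    <item>
      <title>Gut microbiome composition and cardiovascular risk</title>
      <link>https://example.org/articles/1</link>
      <description>A cohort study of microbial metabolites.</description>
    </item>
    <item>
      <title>Catalysis on single-atom platinum surfaces</title>
      <link>https://example.org/articles/2</link>
      <description>Surface chemistry at the atomic scale.</description>
    </item>
  </channel>
</rss>"""


def matching_items(feed_xml, keywords):
    """Return (title, link) pairs for feed items whose title or
    description contains any of the keywords (case-insensitive)."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        # Simple substring match: a keyword anywhere in the text counts as a hit.
        if any(k in text for k in wanted):
            hits.append((title, item.findtext("link", default="")))
    return hits


if __name__ == "__main__":
    for title, link in matching_items(SAMPLE_FEED, ["microbiome", "cardiovascular"]):
        print(title, link)
```

Plain substring matching keeps the sketch short, but it is exactly why keyword alerts feel noisy: every item that mentions a term gets through, whether or not it is relevant to your work.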

Phase III: PC, PC, on the wall…

Citation graphs and metrics

Natural language-based recommendations


Where’s our bright AI future?

Though machine learning has advanced by leaps and bounds in the past decade, the most likely way to hear about a new, useful publication is still an ad hoc recommendation from a well-meaning colleague or collaborator.

Crowd recommendations

Sparrho, on the other hand, encourages the crowd to curate their own public collections of research articles (pinboards) and to write short summaries explaining why these papers belong together, tapping into the uniquely human ability to make unexpected connections between research in different fields. The result is a new way for experts and newcomers alike to explore the literature and hear directly from researchers.

But alas, unavoidable bias?

So, what now?


A little bit of advice to finish



by Sybil Wong

Partnerships @Synthace / biochemist / occasionally writes