Philosophy's Digital Future

How technology could transform academic research

Feb 12, 2024

Our current system for academic publishing strikes me as outdated. The ‘filter then publish’ model was designed for a non-digital world of high publication costs. Online publishing removes that constraint, enabling the shift to a superior ‘publish then filter’ model. What’s more: future advances in AI will make it easier to “map” our collective knowledge, identifying the most important contributions and highlighting gaps where more work is needed. Putting the two together yields a vision of a future academic system that seems far better suited to advancing our collective understanding than our current system.

Mapping the Literature

Imagine having access to an accurate synthesis of the academic literature, viewable at varying degrees of detail, mapping out everything from (a) the central positions in a debate, and the main arguments for and against each candidate position, to (z) the current status of the debate down to the n-th level of replies to replies to sub-objections. Such a comprehensive mapping would be far too much work for any human to do (though the high-level summaries of a debate offered in “survey” papers can be very helpful, they are inevitably far from complete, and may be tendentious). And current-generation LLMs don’t seem capable of reliably accurate synthesis. But presumably it’s just a matter of time. Within a decade or two (maybe much less), AIs could produce this mapping for us, situating (e.g.) every paper in the PhilPapers database according to its philosophical contributions and citation networks.

You could see at a glance where the main “fault lines” lie in a debate, and which objections remain unanswered. This opens up new ways to allocate professional esteem: incentivizing people to plug a genuine gap in the literature (or to generate entirely new branches), and not just whatever they can sneak past referees. This in turn could remedy the problem of neglected objections (and general lack of cross-camp engagement) that I’ve previously lamented, and encourage philosophical work that is more interesting and genuinely valuable.

Publish then filter

Suppose that your paper gets “added to the literature” simply by uploading it to PhilPapers. The PhilAI then analyzes it and updates the PhilMap accordingly. So far, no referees needed.

The crucial question for any academic system is how filtering works. Information is cheap. What we want is some way to identify the most valuable information: the papers of greatest philosophical merit (on any given topic) that are worth reading, assigning, and esteeming. Currently we rely on hyper-selective prestigious journals to do much of this filtering work for us, but I think they’re not very good at this task. Here I’ll suggest two forms of post-publication filtering that could better help us to identify worthwhile philosophy. (Though let me flag in advance that I’m more confident of the second.)

1. PhilMap influence

Right now, the main numerical measure of influence is citation counts. But this is a pretty terrible metric: an offhand citation is extremely weak evidence of influence,[1] and (in principle) a work could decisively settle a debate and yet secure no subsequent citations precisely because it was so decisive that there was nothing more to say.

An interesting question is whether the PhilAI could do a better job of measuring a contribution’s impact upon the PhilMap. One could imagine getting credit based upon measures of originality (being the first to make a certain kind of move in the debate), significance (productively addressing more central issues, rather than epicycles upon epicycles—unless, perhaps, a particular epicycle looked to be the crux of an entire debate), positive influence (like citation counts try to measure, but more contentful) and maybe even negative influence (if the AI can detect that a certain kind of “discredited” move is made less often following the publication of an article explaining why it is a mistake).

If the AI’s judgments are opaque, few may be inclined to defer to its judgments, at least initially. But perhaps it could transparently explain them. Or perhaps we would trust it more over time, as it amassed a reliable-seeming track record. Otherwise, if it’s no better than citation counts, we may need to rely more on human judgment (as we currently do). Still, there’s also room to improve our use of the latter, as per below.

2. Crowdsourcing peer evaluation

This part doesn’t require AI, just suitable web design. Let anyone write a review of any paper in the database, or perhaps even submit ratings without comments.[2] Give users options to filter or adjust ratings in various ways. Options could include, e.g., only counting professional philosophers, filtering by reviewer AOS (area of specialization), and calibrating for “grade inflation” (by adjusting downwards the ratings of those who routinely rate papers higher than other users do, and upwards for those who do the opposite) and “mutual admiration societies” (by giving less weight to reviews by philosophers that the author themselves tends to review unusually generously). Ease of adding custom filters (e.g. giving more weight to “reviewers like me” who share your philosophical tastes and standards) would provide users more options, over time, to adopt the evaluative filters that prove most useful.
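To make the “grade inflation” adjustment concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the function name `calibrate_ratings` and the data layout are hypothetical, and each reviewer's leniency is crudely estimated as the gap between their own average score and the overall average (a real filter would more plausibly compare reviewers on the same papers).

```python
from collections import defaultdict
from statistics import mean


def calibrate_ratings(ratings):
    """Return {paper: calibrated mean score}.

    `ratings` is a list of (reviewer, paper, score) tuples.
    Hypothetical helper: a sketch of one possible calibration, not a
    description of any existing PhilPapers feature.
    """
    if not ratings:
        return {}

    # Average score across every rating in the system.
    overall = mean(score for _, _, score in ratings)

    # Collect each reviewer's scores to estimate how lenient they are.
    by_reviewer = defaultdict(list)
    for reviewer, _, score in ratings:
        by_reviewer[reviewer].append(score)

    # Positive offset: this reviewer usually rates above the overall average.
    offset = {r: mean(scores) - overall for r, scores in by_reviewer.items()}

    # Subtract each reviewer's offset before averaging per paper.
    by_paper = defaultdict(list)
    for reviewer, paper, score in ratings:
        by_paper[paper].append(score - offset[reviewer])

    return {paper: mean(scores) for paper, scores in by_paper.items()}
```

With a generous reviewer and a harsh one rating the same two papers, the calibrated scores converge: both reviewers agree on the ranking, and the filter removes only their differing baselines.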

Then iterate. Reviews are themselves philosophical contributions that can be reviewed and rated. Let authors argue with their reviewers, and try to explain why they think the other’s criticisms are misguided. Or take the critiques on board and post an updated version of the paper, marking the old review as applying to a prior version, and inviting the referee to (optionally) update their verdict of the current version. (Filters could vary in how much weight they give to “outdated” ratings that aren’t confirmed to still apply to new versions, possibly varying depending on how others’ ratings of the two versions compare, or on whether third parties mark the review as “outdated” or “still relevant”.) Either way, the process becomes more informative (and so, one hopes, likely more accurate).[3]

Instead of journals, anyone (or any group) can curate lists of “recommended papers”.[4] The Journal of Political Philosophy was essentially just “Bob’s picks”, after all. There’s no essential reason for this curation role to be bundled with publication. As with journal prestige, curators would compete to develop reputations for identifying the best “diamonds in the rough” that others overlook. Those with the best track records would grow their followings over time, and skill in reviewing and curation, as revealed by widespread following and deference in the broader philosophical community, could be a source of significant professional esteem (like being a top journal editor today). Some kind of visible credit could go to the reviewers and curators who first signal-boost a paper that ends up being widely esteemed. (Some evaluative filters might seek to take into account reviewer track record in this way, giving less weight to those whose early verdicts sharply diverge, in either direction, from the eventual consensus verdicts.)
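One way a track-record filter of this kind might work is sketched below in Python. Everything here is an assumption for illustration: the names `track_record_weight` and `weighted_score` are hypothetical, the weighting formula (one divided by one plus the average gap from consensus) is just one simple choice, and reviewers with no history are given a neutral weight of 1.0.

```python
from statistics import mean


def track_record_weight(early_verdicts, consensus):
    """Weight a reviewer by how close their early scores were to consensus.

    `early_verdicts` maps paper -> the reviewer's early score;
    `consensus` maps paper -> the eventual consensus score.
    Divergence in either direction lowers the weight.
    """
    gaps = [
        abs(score - consensus[paper])
        for paper, score in early_verdicts.items()
        if paper in consensus
    ]
    if not gaps:
        return 1.0  # Assumption: no track record yet means a neutral weight.
    return 1.0 / (1.0 + mean(gaps))


def weighted_score(reviews, weights):
    """Weighted average of (reviewer, score) pairs; None if no reviews."""
    if not reviews:
        return None
    total = sum(weights.get(reviewer, 1.0) for reviewer, _ in reviews)
    weighted = sum(weights.get(reviewer, 1.0) * score for reviewer, score in reviews)
    return weighted / total
```

A reviewer whose early verdicts matched the eventual consensus keeps full weight, while one who was off by three points on average counts for a quarter as much. Different filters could of course choose a gentler or harsher penalty.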

One could also introduce academic prediction markets (e.g. about how well-regarded a paper will be in X years’ time) to incentivize better judgments.

PhilMap Evaluative Filters

Combining these two big changes: users could then browse an AI-generated “map” of the philosophical literature, using their preferred evaluative filters to highlight the most “valuable” contributions to each debate—and finding the “cutting edges” to which they might be most interested in contributing. This could drastically accelerate philosophical progress, as the PhilMap would update much faster than our current disciplinary “conventional wisdom”. It could also help researchers to avoid re-inventing the wheel, focusing instead on areas where more work is truly needed. So there seem clear epistemic benefits on both the “production” and “consumption” sides.

Summary of benefits

The entire system is free and open access.
Users can more easily find whatever valuable work is produced, and understand the big-picture “state of the debate” at a glance.
Valuable work is more likely to be produced, as researchers are given both (i) better knowledge of what contributions would be valuable, and (ii) better incentives to produce valuable work (since it is more likely to be recognized as such).
A small number of gatekeepers can’t unilaterally prevent valuable new work from entering “the literature”. (They also can’t prevent bad new work. But there’s no real cost to that, as the latter is easily ignored.)
It offers a more efficient review process, compared to the current system in which (i) papers might be reviewed by dozens of referees before finally being published or abandoned, and (ii) much of that reviewing work is wasted due to its confidential nature. My described system could solve the “refereeing crisis” (whereby too much work for too little reward currently results in undersupply of this vital academic work, and what is supplied is often of lower quality than might be hoped), thanks to its greater efficiency and publicity.[5]
It disincentivizes overproduction of low-quality papers. If publication is cheap, it ceases to count for much.
It pushes us towards a kind of pluralism of evaluative standards.[6] Currently, publishing a lot in top journals seems the main “measure” of professional esteem. But this is a terrible measure (and I say this as someone who publishes a lot in top journals!). Philosophers vary immensely in their evaluative standards, and it would be better to have a plurality of evaluative metrics (or filters) that reflected this reality. Different departments might value different metrics/filters, reflecting different conceptions of what constitutes good philosophy. If this info were publicly shared, it could help improve “matching” within the profession, further improving job satisfaction and productivity, and reducing “search costs” from people moving around to try to find a place where they really fit.

Objections

Are there any downsides sufficient to outweigh these benefits?

1. Incentivizing reviews

In response to a similar proposal from Heesen & Bright to shift to post-publication review, Hansson objects that “it is not obvious where that crowd [for crowd-sourced post-publication review] would come from”:

Anyone who has experience of editing knows how difficult it is to get scholars to review papers, even when they are prodded by editors. It is difficult to see how the number of reviews could increase in a system with no such prodding.
There is an obvious risk that the distribution of spontaneous post-publication reviews on sites for author-controlled publication will be very uneven. Some papers may attract many reviews, whereas others receive no reviews at all. It is also difficult to foresee what will happen to the quality of reviews. When you agree to review a paper for a journal in the current system, this is a commitment to carefully read and evaluate the paper as a whole and to point out both its positive and its negative qualities. It is not unreasonable to expect that spontaneous peer reviews in an author-controlled system will more often be brief value statements rather than thorough analyses of the contents.

An obvious solution would be to make submissions of one’s own work to the PhilMap cost a certain number of “reviewer credits”.[7] Reviews of a particular paper might earn diminishing credits depending on how many reviews it has already secured. And they might be subject to further quality-adjustments, based on automatic AI analysis and/or meta-crowdsourced up/down votes. Perhaps to earn credits, you need to “commit” to writing a review of an especially substantive and thorough nature. It would be worth putting thought into the best way to develop the details of the system. But I don’t see any insuperable problems here. Further, I would expect review quality to improve significantly given the reputational stakes of having your name publicly attached. (Current referees have little incentive to read papers carefully, and it often shows.)
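The credit mechanism could be sketched roughly as follows, in Python. The details are all assumptions made for illustration: the names `review_credit` and `CreditLedger` are hypothetical, the credit for a review is halved for each review the paper already has, the quality adjustment is a simple multiplier, and a submission costs two credits.

```python
def review_credit(existing_reviews, base=1.0, decay=0.5, quality=1.0):
    """Credit earned for reviewing a paper that already has `existing_reviews`.

    Credit shrinks geometrically with each prior review and is scaled by
    a quality multiplier (e.g. derived from up/down votes).
    """
    if existing_reviews < 0:
        raise ValueError("existing_reviews must be non-negative")
    return base * (decay ** existing_reviews) * quality


class CreditLedger:
    """Tracks reviewer credits and charges them for submissions."""

    def __init__(self, submission_cost=2.0):
        self.submission_cost = submission_cost
        self.balances = {}

    def record_review(self, reviewer, existing_reviews, quality=1.0):
        """Add the credit earned for one review and return it."""
        earned = review_credit(existing_reviews, quality=quality)
        self.balances[reviewer] = self.balances.get(reviewer, 0.0) + earned
        return earned

    def submit_paper(self, author):
        """Deduct the submission cost; return False if credits are short."""
        balance = self.balances.get(author, 0.0)
        if balance < self.submission_cost:
            return False
        self.balances[author] = balance - self.submission_cost
        return True
```

Under these assumed numbers, being the first to review a neglected paper earns a full credit and the second reviewer earns half, so reviewers are nudged toward papers that have not yet attracted attention.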

2. Transition feasibility

Another worry is simply how to get from here to there. I think the AI-powered PhilMap could significantly help with that transition. Currently, most PhilPapers entries are traditional publications. The PhilMap doesn’t require changing that. But if/as more people (and institutions) started using evaluative filters other than mere journal prestige, the incentive to publish in a journal would be reduced in favor of directly submitting to the PhilMap. And I’d certainly never referee for a journal again once a sufficiently well-designed alternative of this sort was available: I’d much rather contribute to a public review system — I positively enjoy writing critical blog posts, after all! If enough others felt similarly, it’s hard to see how journals could survive the competition.

Of course, this all depends upon novel evaluative metrics/filters proving more valuable than mere journal prestige, inspiring people to vote with their feet. I think journals suck, so this shouldn’t be difficult. But if I’m wrong, the radical changes just won’t take off as hoped. So it seems pretty low-risk to try it and see.

3. Other objections?

I’m curious to hear what other concerns one might have about the proposed system. There was some past discussion of Heesen & Bright’s proposal on Daily Nous, but I think my above discussion addresses the biggest concerns. I’ve also seen mention of a critical paper by Rowbottom, but my institution doesn’t provide access to the journal it’s in, and the author didn’t bother to post a pre-print to PhilPapers, so I can’t read their criticisms. (Further evidence that the current system is lousy!)

[1] For example, my most-cited paper (on ‘Fittingness’) gets mentioned a lot in passing, but ~zero substantial engagement, whereas I get the sense that ‘Value Receptacles’ and ‘Willpower Satisficing’ have done a lot more to change how others actually think about their respective topics. (And, indeed, I think the latter two are vastly better papers.)

[2] Either way, they should flag any potential conflicts of interest (e.g. close personal or professional connections to the author), and others should be able to raise flags when the reviewer themselves fails to do so. Mousing over the reviewer’s name could indicate relevant data about their track record, e.g. professional standing, average ratings that they give to others, etc.

[3] Arvan, Bright, & Heesen argue that formal jury theorems support this conclusion. I’m dubious of placing much weight on such arguments: too much depends on whether the background assumptions are actually satisfied. But their “replies to objections” section is worth reading!

[4] As with reviewers, curators would need to flag any conflicts of interest (but could do whatever they want subject to offering that transparency).

[5] The publicity might deter some grad students and precariously employed philosophers from offering critical reviews (e.g. of work by faculty who could conceivably be on their future hiring committee). But if fewer reviews are needed anyway, those from the securely employed may well suffice. The cowardly might also be mistaken in their assumptions: I’d expect good philosophers to think better of candidates who can engage intelligently (even if critically!) with their work. (But who knows how many people on hiring committees actually meet my expectations for “good philosophers”. Reality may disappoint.)

A second effect of the publicity might be that everyone would be less inclined to write scathingly negative reviews, for fear of making enemies. But that’s probably a good thing. Scathing negative reports are often stupid, and would benefit from having the writers be careful of their reputations. It should always be possible to write an appropriately negative review in such a way as to cause no embarrassment from having one’s name attached to it.

Alternatively, the software might offer some way to anonymize one’s review (subject to checks to ensure that one isn’t abusing anonymity to hide a conflict of interest). Different evaluative filters might then vary in how much weight they give to anonymous vs. named reviews.

[6] By this I mean a “descriptive” form of pluralism, i.e. about candidate standards. You don’t have to think the standards are all equal; but you should probably expect other philosophers to disagree with your philosophical values. So I think it’s appropriate to have a plurality of candidate standards available, from which we can argue about which is actually best, rather than pretending that our current measure is actually reliably measuring anything in particular, let alone any shared conception of philosophical merit. (Maybe it generates a shared sense of social status or prestige, which we all then value. But I take that to be a bad thing. It would be better for different subgroups to esteem different philosophers, who better merit it by the locally accepted standards. And for all this to be more transparent.)

[7] If we want to reduce the pressure on grad students and the tenuously employed, they could be awarded a limited number of free credits each year, allowing them to submit more and review less. Conversely, the price per submission for senior faculty could increase, reflecting expectations that tenured faculty should shoulder more of the reviewing “burden”.

8 Comments

Michael

Feb 12 (edited Feb 12)

I think you can look to computer science and especially machine learning research to see what "publish then filter" looks like in practice in a huge and well-funded area of science.

To give a concrete recent example: there is a new paper giving a technique called "Mamba" which is an alternative to the self-attention mechanisms in transformer neural networks. It's a highly-regarded piece of work that already has a lot of hype and publicity. As far as anyone can tell, it seems to have just been *rejected* from a top computer science publication venue. But the paper and peer reviews are all public, so people are free to argue that this was bad, and it has not stopped the technique from being influential (it has 43 citations in the 2 months so far since preprint) and gathering followup work before even being formally published.

https://openreview.net/forum?id=AL1fq05o7H

The system generally works reasonably well. Peer review in these fields is widely regarded as low quality and unreliable, but I believe this freewheeling culture, complemented by open source software, is a big reason why progress in machine learning has still been so rapid in the last decade.

On the other hand: it definitely does not disincentivize low quality papers. There is a lot of dreck.

And even when you have important, high-quality research, the quality of the actual *paper*, as a written product explaining & arguing, is observably much lower in these fields than in others with the traditional journal system. I think this is probably a good tradeoff for scientific fields where the paper is just a description of the real contribution. I think it might be bad for philosophy, where to some extent the argument itself is the contribution.
