Open, reliable, and useful evaluation

Open evaluation (and rating)

Traditional peer review is a closed process, with reviewers' and editors' comments and recommendations hidden from the public.

In contrast, our evaluations (along with authors' responses and evaluation manager summaries) are made public and easily accessible. We give each of these a separate DOI and work to make sure each enters the literature and bibliometric databases. We aim further to curate these, making it easy to see the evaluators' comments in the context of the research project (e.g., with sidebar/hover annotation).

Open evaluation is more useful:

  • to other researchers and students (especially those early in their careers). Seeing the dialogue helps them digest the research itself and understand its relationship to the wider field. It helps them understand the strengths and weaknesses of the methods and approaches used, and how much agreement there is over these choices. It gives an inside perspective on how evaluation works.

  • to people using the research, providing further perspectives on its value, strengths and weaknesses, implications, and applications.

Publicly posting evaluations and responses may also lead to higher-quality, more reliable evaluations. Evaluators can choose whether or not they wish to remain anonymous; there are pros and cons to each choice, but in either case, the fact that all the content is public may encourage evaluators to more fully and transparently express their reasoning and justifications. (And where they fail to do so, readers of the evaluation can take this into account.)

The fact that we are asking for evaluations and ratings of all the projects in our system—and not using "accept/reject"—should also drive more careful and comprehensive evaluation and feedback. At a traditional top-ranked journal, a reviewer may limit themselves to a few vague comments implying that the paper is "not interesting or strong enough to merit publication." This would not make sense within the context of The Unjournal.

More reliable, precise, and useful metrics

We do not "accept or reject" papers; we are evaluating research, not "publishing" it. But then, how do other researchers and students know whether the research is worth reading? How can policymakers know whether to trust it? How can it help a researcher advance their career? How can grantmakers and organizations know whether to fund more of this research?

As an alternative to the traditional measure of worth—asking, "what tier did a paper get published in?"—The Unjournal provides metrics: We ask evaluators to provide a specific set of ratings and predictions about aspects of the research, as well as aggregate measures. We make these public. We aim to synthesize and analyze these ratings in useful ways, as well as make this quantitative data accessible to meta-science researchers, meta-analysts, and tool builders.

Feel free to check out our ratings metrics and prediction metrics (these are pilot metrics; we aim to refine them).
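For meta-analysts and tool builders, the sketch below illustrates, in Python, one way quantified ratings with explicit uncertainty intervals could be represented and aggregated across evaluators. The field names, the 0–100 scale, and the aggregation rule are illustrative assumptions, not The Unjournal's actual data schema or methodology.

```python
# A minimal, hypothetical sketch: not The Unjournal's actual schema or
# aggregation method. Field names and the 0-100 scale are assumptions.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rating:
    category: str     # e.g., "overall assessment", "methods", "relevance"
    midpoint: float   # evaluator's best-guess score
    ci_lower: float   # lower bound of the evaluator's 90% credible interval
    ci_upper: float   # upper bound of the evaluator's 90% credible interval


def summarize(ratings: list[Rating]) -> dict:
    """Combine one category's ratings across evaluators: average the
    midpoints and report the widest bounds as a rough indication of
    individual uncertainty plus disagreement."""
    return {
        "mean_midpoint": mean(r.midpoint for r in ratings),
        "lowest_bound": min(r.ci_lower for r in ratings),
        "highest_bound": max(r.ci_upper for r in ratings),
        "n_evaluators": len(ratings),
    }


overall = [
    Rating("overall assessment", midpoint=78.0, ci_lower=65.0, ci_upper=88.0),
    Rating("overall assessment", midpoint=70.0, ci_lower=55.0, ci_upper=82.0),
]
print(summarize(overall))
```

In practice, aggregating such ratings would also need to account for evaluator calibration and correlated errors; this sketch only shows the shape of the data a reader or tool builder might work with.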

These metrics are separated into different categories designed to help researchers, readers, and users understand things like:

  • How much can one believe the results stated by the authors (and why)?

  • How relevant are these results for particular real-world choices and considerations?

  • Is the paper written in a way that is clear and readable?

  • How much does it advance our current knowledge?

We also request overall ratings and predictions of the credibility, importance, and usefulness of the work; these help benchmark the evaluations against each other and against the current "journal tier" system.

Even here, The Unjournal's metrics are precise in a way that "journal publication tiers" are not. There is no agreed-upon measure of exactly how journals rank (e.g., within economics' "top-5" or "top field journals"). More importantly, there is no clear measure of a particular paper's relative quality and trustworthiness within a given journal.

In addition, there are issues of lobbying, career concerns, and timing, discussed elsewhere, which make the "tiers" system less reliable. An outsider doesn't know, for example:

  • Was a paper published in a top journal because of a special relationship and connections? Was an editor trying to push a particular agenda?

  • Was it published in a lower-ranked journal because the author needed to get some points quickly to fill their CV for an upcoming tenure decision?

In contrast, The Unjournal requires evaluators to give specific, precise, quantified ratings and predictions (along with an explicit metric of the evaluator's uncertainty over these appraisals).

Of course, our systems will not solve all problems associated with reviews and evaluations: power dynamics, human weaknesses, and limited resources will remain. But we hope our approach moves in the right direction.

Better feedback

See also Mapping evaluation workflow.

Faster (public) evaluation

We want to reduce the time between when research is done (and a paper or other research format is released) and when other people (academics, policymakers, journalists, etc.) have a credible measure of "how much to believe the results" and "how useful this research is."

Here's how The Unjournal can do this.

  1. Early evaluation: We will evaluate potentially impactful research soon after it is released (as a working paper, preprint, etc.). We will encourage authors to submit their work for our evaluation, and we will directly commission the evaluation of work from the highest-prestige authors.

  2. Incentives: We pay evaluators and offer further incentives for timeliness (as well as for carefulness, thoroughness, communication, and insight). Evidence suggests that these incentives for promptness and other qualities are likely to work.

  3. Public evaluations and ratings: Rather than waiting years to see "what tier journal a paper lands in," the public can simply consult The Unjournal to find credible evaluations and ratings.

Can The Unjournal "do feedback to authors better" than traditional journals?

Maybe we can?

  • We pay evaluators.

  • The evaluations are public, and some evaluators sign them.

    • → Evaluators may be more motivated to be careful and complete.

On the other hand . . .

  • Because the evaluations are public, evaluators may default to being overly cautious in their criticism.

  • At standard journals, referees do want to impress editors, and often (but not always) leave very detailed comments and suggestions.
