WSTA 20 – Evaluation and Re-ranking

Hard to characterise the quality of a system’s results:

  • a subjective problem
  • the query is not the same as the underlying information need

human judgements: too expensive and slow

Automatic evaluation

  • Simplifying assumptions:
    • retrieval is ad-hoc (no prior knowledge of the user)
    • effectiveness based on relevance
      • relevant or irrelevant: binary or multiple grades
      • relevance of each doc is independent of the other docs in the ranking
  • Test collections:
    • Relevance judgements (qrels)
    • But not all docs have _qrels_ (the collection is too big to judge every document)
  • Relevance vector $R = \langle 1, 0, 0, 0, 1, \ldots \rangle$: how to map it to a single number? -> precision & recall (hard); see the sketch after this list:
    • Precision @ k
    • Average precision
    • Mean Average Precision (MAP)
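
A minimal sketch (not from the lecture) of mapping a binary relevance vector such as $R = \langle 1, 0, 0, 0, 1 \rangle$ to these numbers; note that this simplified AP divides by the number of relevant documents that appear in the ranking, whereas the standard definition divides by the total number of relevant documents in the qrels.

```python
def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant."""
    return sum(rels[:k]) / k

def average_precision(rels):
    """Mean of precision@k at each rank k where a relevant doc appears."""
    precisions = [precision_at_k(rels, k)
                  for k, rel in enumerate(rels, start=1) if rel]
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rels_per_query):
    """MAP: average precision, averaged over a set of queries."""
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)

R = [1, 0, 0, 0, 1]              # relevance vector for one query
print(precision_at_k(R, 3))      # 0.333...
print(average_precision(R))      # (1/1 + 2/5) / 2 = 0.7
print(mean_average_precision([R, [0, 1, 1]]))
```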

RANK-BIASED PRECISION

RBP formula: $\text{RBP} = (1 - p) \sum_{i \ge 1} r_i \, p^{\,i-1}$, where $r_i$ is the relevance of the document at rank $i$ and $p$ is the user's persistence (probability of continuing to the next result)

Patient user: p = 0.95; impatient user: p = 0.50
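
A small sketch of the formula above (my own code, not from the slides), showing how the persistence $p$ shifts weight between shallow and deep ranks:

```python
def rbp(rels, p=0.95):
    """Rank-biased precision: (1 - p) * sum_i r_i * p^(i-1),
    where p is the probability the user continues to the next rank."""
    return (1 - p) * sum(rel * p ** (i - 1) for i, rel in enumerate(rels, start=1))

R = [1, 0, 0, 0, 1]
print(rbp(R, p=0.95))  # patient user: deep results still contribute
print(rbp(R, p=0.50))  # impatient user: nearly all weight on rank 1
```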

EFFECTIVENESS IN PRACTICE:

  • Also look at query logs and click logs
  • Construct (learn) a similarity metric automatically from training data (queries, click data, documents)
  • Machine learning

Learning to rank

Training data $\{(x_i, r_i)\}$: learn to combine the features $x_i$ representing a query–document pair to predict the relevance label $r_i$
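
As an illustration only (the feature names and values below are invented), a minimal sketch: each query–document pair becomes a feature vector $x$, and a linear scoring function $s(x) = w \cdot x$ is fit against the relevance labels $r$.

```python
import numpy as np

# Hypothetical features per query-document pair,
# e.g. [BM25 score, query-title overlap, link-based prior].
X = np.array([
    [12.3, 0.8, 0.2],
    [ 4.1, 0.1, 0.9],
    [ 9.7, 0.5, 0.4],
    [ 1.2, 0.0, 0.1],
])
r = np.array([1.0, 0.0, 1.0, 0.0])   # relevance labels (qrels / clicks)

# Least-squares fit of a linear scoring function s(x) = w . x
w, *_ = np.linalg.lstsq(X, r, rcond=None)

scores = X @ w
print(np.argsort(-scores))           # documents re-ranked by learned score
```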

LEARNING TO RANK OBJECTIVES:

  • Point-wise objective
    • Ask the user: How relevant is $d_i$?
  • Pair-wise objective (given two docs; see the sketch after this list)
    • Ask the user: Which of these two documents is more relevant?
  • List-wise objective (output is a ranked list)
    • Ask the user: Rearrange this list
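
To make the pair-wise objective concrete, a RankNet-style sketch (my own, not code from the subject): given preference pairs where one document was judged or clicked over another, learn $w$ so the preferred document scores higher.

```python
import numpy as np

def pairwise_loss(w, x_pos, x_neg):
    """Logistic loss on the score margin: penalise w when the preferred
    document x_pos does not outscore x_neg under s(x) = w . x."""
    margin = w @ x_pos - w @ x_neg
    return np.log(1 + np.exp(-margin))

def pairwise_grad(w, x_pos, x_neg):
    """Gradient of the pairwise logistic loss with respect to w."""
    margin = w @ x_pos - w @ x_neg
    return -(x_pos - x_neg) / (1 + np.exp(margin))

# Hypothetical preference pairs: x_pos judged more relevant than x_neg.
pairs = [
    (np.array([12.3, 0.8, 0.2]), np.array([4.1, 0.1, 0.9])),
    (np.array([ 9.7, 0.5, 0.4]), np.array([1.2, 0.0, 0.1])),
]

w = np.zeros(3)
for _ in range(100):                 # plain gradient descent
    for x_pos, x_neg in pairs:
        w -= 0.1 * pairwise_grad(w, x_pos, x_neg)

print([round(pairwise_loss(w, xp, xn), 4) for xp, xn in pairs])
```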