
Please see the technical report for information on eye movements, the experimental setup, baseline methods, and references:

Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in Computer and Information Science, Report A82. 3 March 2005. [PDF]

Frequently Asked Questions

1. Why is the pupil diameter strangely large in some data files?

In some measurement runs the equipment reported a pupil diameter of around 17 mm for the test subject; this can be seen in the data files of Competition 2. The reported diameter is obviously too large: the human pupil is at most about 6 mm in diameter when the eye is fully adapted to the dark. In a lit room the pupil is smaller, typically of the order of 3 mm.

The reason for the anomalous pupil diameter measurements is unknown. However, measurements where the pupil diameter is below 6 mm should be reliable. The test set has no anomalous pupil readings.
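If you want to set the anomalous rows aside before computing pupil-based features, a filter along the following lines may help. This is only a sketch: this FAQ does not say which column holds the pupil diameter, so the column number is a hypothetical argument (COLUMN) that you must take from the dataset description, and the function name keep_plausible_pupil is our own. It also assumes the value is stored in millimetres.

```shell
# keep_plausible_pupil FILE COLUMN [MAX_MM]
# Print only the rows whose value in COLUMN is below MAX_MM (default 6).
# COLUMN is a hypothetical argument: look up the real pupil diameter
# column in the dataset description before relying on this.
keep_plausible_pupil() {
  awk -v col="$2" -v max="${3:-6}" '($col + 0) < max' "$1"
}
```

Called as keep_plausible_pupil FILE COLUMN, it prints the surviving rows unchanged, so the output can be redirected to a new file or piped into further awk commands.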

2. How is the "relevance" defined?

In general it is hard to define relevance. That is why we have constructed an experimental setting where it is known a priori.

In the Challenge we have an experimental setup where the test subject is first shown a question, followed by ten sentences. Five of the sentences are "relevant" to the question (they are on the same topic as the question) and five are irrelevant (they have no relation to the topic of the question). One of the relevant sentences is the correct answer to the question. The experimental setup is designed to resemble a real-life information retrieval scenario as closely as possible while at the same time retaining a controlled setup where the ground truth is known.

Thus, in the Challenge the meaning of "relevant" is defined in terms of this experimental setup. The objective of the Challenge is to find the best methods and features that can be used to predict the relevance from the eye movement measurements.

3. What are those lines in the data files about?

The following text describes Competition 1 and the training data set (c1_train.dat) in more detail. The datasets and their descriptions can be downloaded from the datasets page.

The data set consists of 10,936 rows (lines). Each row corresponds to a viewing of a word in an assignment.

The measurements consisted of 336 assignments. Each assignment contains 10 sentences (titles). Each sentence has a (known) relevance (0, 1 or 2). Each sentence contains many words. While the test subject viewed the assignment we recorded the eye movement trajectory, identified the viewed words, and appended a corresponding row to the training data set file for each viewed word. The rows contain the assignment number (1-336, column 2), the sentence (title) number (1-10, column 26) and the word number (from 1 to the number of words in the particular sentence, column 27). Each row also contains the classification label (relevance, 0-2, column 28) of the sentence (title) it belongs to.
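As a quick sanity check of this layout, the counts can be recomputed from the file itself. The sketch below assumes whitespace-separated columns numbered as described above; the function name summarize is our own.

```shell
# summarize FILE
# Count rows, distinct assignments (column 2) and distinct viewed
# sentences (column 2 together with column 26).
summarize() {
  awk '
    {
      rows++
      if (!($2 in seen_a)) { seen_a[$2] = 1; assignments++ }
      key = $2 SUBSEP $26
      if (!(key in seen_s)) { seen_s[key] = 1; sentences++ }
    }
    END {
      print "rows =", rows + 0
      print "assignments =", assignments + 0
      print "viewed sentences =", sentences + 0
    }
  ' "$1"
}
```

Running summarize c1_train.dat lets you compare the row and assignment counts with the figures quoted above. The number of viewed sentences can be smaller than ten per assignment, because a sentence that was never looked at produces no rows.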

You could in principle take all rows that correspond to a particular assignment number, and use the sentence and word numbers to plot a trajectory of the eye movements. The following simple awk script would print this trajectory:

awk '{print "assignment =",$2,"; sentence =",$26, \
"; word =",$27,"of",$25,"; label =",$28,"."}' < c1_train.dat

Take for example assignment 123:

assignment = 123 ; sentence = 4 ; word = 2 of 5 ; label = 1 .
assignment = 123 ; sentence = 2 ; word = 1 of 3 ; label = 0 .
assignment = 123 ; sentence = 1 ; word = 1 of 5 ; label = 1 .
assignment = 123 ; sentence = 3 ; word = 1 of 5 ; label = 1 .
assignment = 123 ; sentence = 2 ; word = 1 of 3 ; label = 0 .
assignment = 123 ; sentence = 4 ; word = 3 of 5 ; label = 1 .
assignment = 123 ; sentence = 4 ; word = 2 of 5 ; label = 1 .
assignment = 123 ; sentence = 4 ; word = 1 of 5 ; label = 1 .
assignment = 123 ; sentence = 4 ; word = 3 of 5 ; label = 1 .
assignment = 123 ; sentence = 5 ; word = 4 of 4 ; label = 0 .
assignment = 123 ; sentence = 4 ; word = 4 of 5 ; label = 1 .
assignment = 123 ; sentence = 3 ; word = 4 of 5 ; label = 1 .
assignment = 123 ; sentence = 3 ; word = 1 of 5 ; label = 1 .
assignment = 123 ; sentence = 5 ; word = 1 of 4 ; label = 0 .
assignment = 123 ; sentence = 6 ; word = 1 of 3 ; label = 0 .
assignment = 123 ; sentence = 5 ; word = 1 of 4 ; label = 0 .
assignment = 123 ; sentence = 5 ; word = 2 of 4 ; label = 0 .
assignment = 123 ; sentence = 5 ; word = 3 of 4 ; label = 0 .
assignment = 123 ; sentence = 8 ; word = 1 of 3 ; label = 0 .
assignment = 123 ; sentence = 7 ; word = 1 of 3 ; label = 0 .
assignment = 123 ; sentence = 9 ; word = 1 of 5 ; label = 1 .
assignment = 123 ; sentence = 10 ; word = 1 of 6 ; label = 2 .
assignment = 123 ; sentence = 9 ; word = 1 of 5 ; label = 1 .

In assignment 123 the test subject first read the first words of sentences 1-4, then read sentence 4 in more detail. The subject then browsed down, returning to sentence 5 for more careful reading. In assignment 123 the correct answer (label = 2) was the last sentence (sentence = 10). Notice that in this case it was enough for the test subject to read only the first word of the sentence containing the correct answer.
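To look at a single assignment such as the one above, the awk script can be restricted to rows with a given assignment number. The following is a minimal sketch wrapped in a shell function (the name trajectory is our own); it assumes, as the script above does, that column 25 holds the number of words in the sentence.

```shell
# trajectory FILE ASSIGNMENT
# Print the viewing trajectory of one assignment, in file order.
trajectory() {
  awk -v a="$2" '$2 == a {
    print "assignment =", $2, "; sentence =", $26,
          "; word =", $27, "of", $25, "; label =", $28, "."
  }' "$1"
}
```

For example, trajectory c1_train.dat 123 prints only the rows of assignment 123, in the same format as the listing above.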

You will notice that the test subjects typically did not view all words or sentences. On the other hand, some words or sentences were viewed many times.
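Repeated viewings can be made visible by counting how many rows each word has. The sketch below, under the same column assumptions as before (2 = assignment, 26 = sentence, 27 = word), prints one line per viewed word with its number of viewings; the function name view_counts is our own.

```shell
# view_counts FILE
# Output columns: assignment, sentence, word, number of viewings.
view_counts() {
  awk '{ n[$2 " " $26 " " $27]++ }
       END { for (k in n) print k, n[k] }' "$1" |
    sort -k1,1n -k2,2n -k3,3n
}
```

Words that were never viewed do not appear in the output at all, since they have no rows in the data file.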

Your task is to find the classification labels for the sentences of the test set. The number of sentences is the number of assignments multiplied by ten, since each assignment has 10 sentences.
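Because a label is needed for every sentence, including sentences that were never viewed, it can be useful to start from a table with exactly ten lines per assignment. The following sketch builds such a table with the number of viewings as a single illustrative feature. It assumes that the file uses the column layout described above (2 = assignment, 26 = sentence); check the dataset description for the test set. The function name sentence_table is our own.

```shell
# sentence_table FILE
# Output columns: assignment, sentence (1-10), number of rows (viewings).
# Sentences without any rows are listed with a count of 0.
sentence_table() {
  awk '{ seen[$2] = 1; views[$2 " " $26]++ }
       END {
         for (a in seen)
           for (s = 1; s <= 10; s++)
             print a, s, views[a " " s] + 0
       }' "$1" |
    sort -k1,1n -k2,2n
}
```

The output has the number of assignments multiplied by ten lines, one per sentence to be classified. The view count is only a placeholder feature, not one of the baseline methods of the technical report.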
