On the correctness of electronic documents: studying, finding, and localizing inconsistency bugs in PDF readers and files (ESEC/FSE 2018 - Journal-First)

Sun 4 - Fri 9 November 2018 Lake Buena Vista, Florida, United States

Who

Tomasz Kuchta, Thibaud Lutellier, Edmund Wong, Lin Tan, Cristian Cadar

Track

ESEC/FSE 2018 Journal-First

When

Thu 8 Nov 2018 15:30 - 15:52 at Horizons 5 - Testing II. Chair(s): Tevfik Bultan

Abstract

Electronic documents are widely used to store and share information such as bank statements, contracts, articles, maps, and tax information. Many different applications exist for displaying a given electronic document, and users rightfully assume that documents will be rendered similarly regardless of the application used. However, this is not always the case, and these inconsistencies, whether caused by bugs in the application or in the file itself, can become critical sources of miscommunication. In this paper, we present a study on the correctness of PDF documents and readers. We start by manually investigating a large number of real-world PDF documents to understand the frequency and characteristics of cross-reader inconsistencies, and find that such inconsistencies are common: 13.5% of PDF files are inconsistently rendered by at least one popular reader. We then propose an approach to detect and localize the source of such inconsistencies automatically. We evaluate our automatic approach on a large corpus of over 230K documents using 11 popular readers, and our experiments have detected 30 unique bugs in these readers and files. We also reported 33 bugs, some of which have already been confirmed or fixed by developers.

DOI

https://doi.org/10.1007/s10664-018-9600-2

Tomasz Kuchta

Thibaud Lutellier

Edmund Wong

Lin Tan

University of Waterloo

Canada

Cristian Cadar

Imperial College London

United Kingdom