Abstract: A key issue with Worst-Case Execution Time (WCET) analyses is evaluating the tightness and soundness of the results they produce. In the absence of a ground truth, i.e. the Actual WCET (AWCET), such evaluations rely on comparisons between different estimates or observed values. In this paper, we introduce a framework for evaluating measurement-based timing analyses. The framework uses abstract models of synthetic tasks to provide realistic execution-time data as input to the analyses, while ensuring that a corresponding AWCET can be computed. We demonstrate the effectiveness of the framework by evaluating the impact of imperfect structural coverage on an existing measurement-based probabilistic timing analysis.