Abstract: We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of $\varepsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we introduce a novel model-based approach that interleaves discovering new states from $s_0$ and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as $\widetilde{O}(L^5 S_{L+\varepsilon} \Gamma_{L+\varepsilon} A \varepsilon^{-2})$, where $A$ is the number of actions, $S_{L+\varepsilon}$ is the number of states that are incrementally reachable from $s_0$ in $L+\varepsilon$ steps, and $\Gamma_{L+\varepsilon}$ is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both $\varepsilon$ and $L$, at the cost of an extra $\Gamma_{L+\varepsilon}$ factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an $\varepsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem defined on the $L$-reachable states with minimum cost $c_{\min}$. Finally, we report preliminary empirical results confirming our theoretical findings.
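
To make the interleaving idea described above concrete, the following is a minimal Python sketch, not the authors' DisCo implementation: it alternates between refining an empirical model of the dynamics from rollouts out of $s_0$ and running finite-horizon value iteration on that model to compute goal-conditioned policies and mark newly discovered states as reachable. The toy chain environment, the 0.5 certification threshold, the random-exploration steps, and the sampling budget are illustrative assumptions, not the paper's actual construction or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain environment (hypothetical stand-in for the unknown MDP):
# action 1 tends to move right, action 0 tends to move left, with a 20% "slip".
S, A, L = 8, 2, 6          # states, actions, exploration horizon
s0 = 0                     # reference state

def step(s, a):
    """Sample one environment transition."""
    drift = 1 if a == 1 else -1
    if rng.random() < 0.8:
        return int(np.clip(s + drift, 0, S - 1))
    return s

# Empirical model of the dynamics, refined as samples come in.
counts = np.zeros((S, A, S))

def estimated_P():
    """Empirical transition probabilities; unvisited (s, a) pairs default to a
    self-loop so unexplored dynamics never certify reachability on their own."""
    n = counts.sum(axis=2, keepdims=True)
    P = counts / np.maximum(n, 1)
    unvisited = (n[:, :, 0] == 0)
    P[unvisited] = np.eye(S)[np.where(unvisited)[0]]
    return P

def goal_policy(P, goal):
    """Finite-horizon value iteration on the estimated model: returns a policy
    maximizing the probability of reaching `goal` within L steps, together with
    that probability when starting from s0."""
    V = np.zeros(S)
    V[goal] = 1.0
    pi = np.zeros((L, S), dtype=int)
    for h in reversed(range(L)):
        Q = P @ V              # (S, A): expected next-step value of each action
        Q[goal] = 1.0          # the goal is absorbing
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi, V[s0]

# Interleave (i) collecting samples along goal-conditioned rollouts from s0 with
# (ii) certifying newly discovered states as reachable under the refined model.
known = {s0}
for _ in range(300):           # sampling budget is an arbitrary illustration
    P = estimated_P()
    # (i) navigate toward an already-discovered state, then explore randomly nearby
    target = rng.choice(sorted(known))
    pi, _ = goal_policy(P, target)
    s = s0
    for h in range(L):
        a = pi[h, s]
        s_next = step(s, a)
        counts[s, a, s_next] += 1
        s = s_next
    for _ in range(3):
        a = rng.integers(A)
        s_next = step(s, a)
        counts[s, a, s_next] += 1
        s = s_next
    # (ii) mark states whose estimated reach probability within L steps is high enough
    P = estimated_P()
    for g in range(S):
        if g not in known:
            _, reach_prob = goal_policy(P, g)
            if reach_prob > 0.5:   # illustrative threshold, not the paper's criterion
                known.add(g)

print("states deemed reachable within L steps from s0:", sorted(known))
```

In this sketch the set of discovered states grows outward from $s_0$: states too far to be reached reliably within $L$ steps are never certified, which loosely mirrors the notion of incremental $L$-reachability in the abstract, while the actual algorithm's sampling rule, accuracy thresholds, and guarantees are those stated in the paper.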