Abstract: Automatically detecting cover songs requires robustness to several kinds of musical variation. Timbral variations can be accounted for at the feature level, but key and, most importantly, tempo variations have to be handled at the retrieval stage. To that end, most state-of-the-art approaches rely on exhaustive search with song-to-song matching methods, which fail to scale up. In this paper, we introduce a hybrid technique. It first retrieves the approximate nearest neighbors of each chroma descriptor of the query. In a second stage, temporal consistency is exploited to filter out spurious matches, thereby discarding irrelevant songs. Our method searches a dataset of 80 songs in about 1 s while achieving accuracy that is satisfactory compared to the best-performing state-of-the-art techniques.
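The two-stage idea (per-frame neighbor retrieval followed by a temporal consistency check) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. First, an exhaustive cosine-similarity search stands in for the approximate nearest-neighbor index. Second, temporal consistency is modeled by a hypothetical offset-voting scheme: each match votes for the time offset between database frame and query frame, and a song scores the size of its largest offset bin. Third, all function names (`build_index`, `knn`, `search`) are hypothetical. The sketch handles neither tempo nor key changes.

```python
import math
from collections import Counter, defaultdict


def normalize(v):
    """L2-normalize a chroma vector (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def build_index(db):
    """Flatten {song_id: [chroma frame, ...]} into (song, frame_time, unit vector) entries."""
    return [
        (song, t, normalize(frame))
        for song, frames in db.items()
        for t, frame in enumerate(frames)
    ]


def knn(query_vec, index, k):
    """Exhaustive k-NN by cosine similarity.

    Assumption: this brute-force scan stands in for the approximate
    nearest-neighbor search described in the abstract.
    """
    scored = [
        (sum(a * b for a, b in zip(query_vec, vec)), song, t)
        for song, t, vec in index
    ]
    scored.sort(key=lambda s: -s[0])
    return scored[:k]


def search(query_frames, index, k=5, top=3):
    """Stage 1: retrieve neighbors of each query frame.
    Stage 2 (hypothetical offset voting): keep only matches that agree
    on a common time offset, scoring each song by its best offset bin.
    """
    votes = defaultdict(Counter)
    for tq, frame in enumerate(query_frames):
        for _, song, tdb in knn(normalize(frame), index, k):
            votes[song][tdb - tq] += 1
    scores = {song: max(c.values()) for song, c in votes.items()}
    return sorted(scores.items(), key=lambda s: -s[1])[:top]
```

Offset voting rewards matches that line up in time while ignoring isolated frame-level hits. However, it implicitly assumes a constant tempo ratio of 1. Key invariance could be approximated by repeating the search with the query's 12 chroma bins circularly shifted, since a circular shift of a chroma vector corresponds to a transposition.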