Abstract : Bag-of-feature (BoF) models currently achieve state-of-the-art performance for action recognition. While such models do not explicitly account for people in video, person localization combined with BoF is expected to give further improvement for action recognition. The purpose of this paper is to validate this assumption and to quantify the improvements in action recognition expected from current and future person detectors. Given locations of people in video, we find that---somewhat surprisingly---background suppression leads only to a limited gain in performance. This holds for actions in both simple and complex scenes. On the other hand, we show how spatial locations of people enable to incorporate strong geometrical constraints in BoF models and in this way to improve the accuracy of action recognition in some cases. Our conclusions are validated with extensive experiments on three datasets with varying complexity, basic KTH, realistic UCF Sports and challenging Hollywood.