Benchmarks are particularly useful for characterizing algorithms and assessing how well they perform in different settings. Here I highlight the need for more standardized performance evaluation in inverse reinforcement learning.