Changes for training entropy model and correcting attention in local models (#25) 6ffeb66 unverified par-meta committed on Jan 17