Writing checkpoint-agnostic code means that if your code works for one checkpoint, it will also work with another checkpoint, as long as that checkpoint was trained for a similar task, even if the architecture is different.
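As a minimal sketch of what this looks like in practice, the helper below never names an architecture: it passes a checkpoint string to `AutoTokenizer` and `AutoModel`, which choose the concrete classes from the checkpoint's configuration. The function name `embed`, the mean-pooling step, and the two example checkpoint names are assumptions made for illustration and do not come from the original text.

```python
import torch
from transformers import AutoModel, AutoTokenizer


def embed(checkpoint: str, text: str) -> torch.Tensor:
    """Return a mean-pooled sentence vector from any encoder checkpoint.

    `embed` is a hypothetical helper written for this example.
    """
    # The Auto classes read the checkpoint's config and select the matching
    # tokenizer and model classes, so no architecture is hard-coded here.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # (batch, sequence_length, hidden_size) -> (batch, hidden_size)
    return outputs.last_hidden_state.mean(dim=1)


if __name__ == "__main__":
    # Example checkpoints (assumed for illustration; loading them requires
    # network access or a local cache). They use different architectures,
    # yet the same function handles both.
    for checkpoint in (
        "google-bert/bert-base-uncased",
        "distilbert/distilbert-base-uncased",
    ):
        vector = embed(checkpoint, "Checkpoint-agnostic code is convenient.")
        print(checkpoint, tuple(vector.shape))
```

Swapping in another encoder checkpoint should only require changing the string, provided the checkpoint was trained for a comparable task and its model output exposes `last_hidden_state`.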