Extremely high logits
#9, opened by Thomas2419
Hello, I've found that this model produces extremely high logits, and as a result the loss on new tasks runs into the millions, compared to BERT base, RoBERTa, DeBERTa, and other models I tested in exactly the same way as MobileBERT. Is this an intentional facet of MobileBERT? It seems to make fine-tuning new heads on top of the frozen model impossible due to instability.
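To illustrate the relationship I mean, here is a minimal sketch of how logit magnitude alone can push cross-entropy loss into the millions. The logit values are made up for illustration and are not measured from MobileBERT or any other model; `cross_entropy` is a hypothetical helper written for this example, not a library function.

```python
import math

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    # log-sum-exp with the max subtracted so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Hypothetical logits from an untrained head that favours the wrong class.
small_logits = [2.0, -1.0, 0.5]
# Same direction, but a million times larger in magnitude (assumed scale).
large_logits = [x * 1e6 for x in small_logits]
target = 1  # index of the correct class

loss_small = cross_entropy(small_logits, target)
loss_large = cross_entropy(large_logits, target)

print(f"loss at ordinary scale: {loss_small:.4f}")
print(f"loss at 1e6 scale:      {loss_large:.1f}")
```

When the prediction is wrong, the loss is roughly the gap between the largest logit and the target logit, so it grows linearly with the logit scale. Gradients with respect to a new head's weights also scale with the magnitude of the features feeding it, which would be consistent with the instability I am seeing.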