Language models can explain neurons in language models



We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations. We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.

