I’ve thrown roughly 4,000 comparable requests at the various Gemini models and found that every Gemini model since Gemini-1.5-pro-exp-0827 has been terrible at producing structured output consistently, and does not follow instructions well. Terrible at timestamps too.

The following could produce structured output consistently, without fail:

Gemini-1.5-pro-001
Gemini-1.5-pro-exp-0827 (since deprecated)

Every single one of the following models was/is unreliable:

Gemini-exp-1121
Gemini-exp-1206
Gemini-2.0-flash-exp
Gemini-2.0-flash-exp-thinking-1219
Gemini-2.0-flash-exp-thinking-0121
Gemini-2.0-pro-exp-0205

The “thinking” models have improved the output, but I’m still regularly getting really poor performance. For example, if I ask for a really simple output with XML tags, it will output 20 records correctly, e.g.:

<topictitle>Topic title</topictitle>

And then it will randomly alter the closing XML tag on, for example, the 21st record — the closing tag comes out misspelled.

I have:

given the model examples of what is correct
given the model specific examples of what is not correct (including the exact mistakes it currently outputs)
changed the way I’ve worded the prompt / reiterated
asked the model to think step by step (where it will tell me it will check for common mistakes, but still outputs them)

Probably time for me to move on. I’ve given Google enough loyalty, and I doubt there’s anyone reading the feedback or responding to it anyway.
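For anyone hitting the same failure mode, a minimal sketch of how to catch it before it corrupts a pipeline: a regex check that flags any record whose closing tag doesn’t match its opening tag. The tag name and the sample records below are hypothetical, not from my actual prompts, and this only handles flat single-tag records, not nested XML.

```python
import re

# Hypothetical sample output: two well-formed records and one with a
# misspelled closing tag, mimicking the failure mode described above.
records = [
    "<topictitle>Model comparison notes</topictitle>",
    "<topictitle>Prompting strategies</topictitle>",
    "<topictitle>Structured output tips</topictitl>",  # closing tag misspelled
]

# Match a single <tag>...</tag> pair and capture both tag names.
TAG_PAIR = re.compile(r"^<(\w+)>(.*)</(\w+)>$", re.DOTALL)

def check_record(record: str) -> bool:
    """Return True only if the record is one well-formed <tag>...</tag> pair
    with matching opening and closing tag names."""
    m = TAG_PAIR.match(record.strip())
    return bool(m) and m.group(1) == m.group(3)

bad = [i for i, r in enumerate(records) if not check_record(r)]
print(bad)  # indices of malformed records -> [2]
```

Running a check like this per record at least lets you retry or discard the occasional broken record instead of trusting the model to self-correct.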
