I’ve thrown roughly 4,000 comparable requests at the various Gemini models and found that every Gemini model since Gemini-1.5-pro-exp-0827 has been terrible at producing structured output consistently, and does not follow instructions well. Terrible at timestamps too.

The following could produce structured output consistently, without fail:

Gemini-1.5-pro-001
Gemini-1.5-pro-exp-0827 (since deprecated)

Every single one of the following models was/is unreliable:

Gemini-exp-1121
Gemini-exp-1206
Gemini-2.0-flash-exp
Gemini-2.0-flash-exp-thinking-1219
Gemini-2.0-flash-exp-thinking-0121
Gemini-2.0-pro-exp-0205

The “thinking” models have improved the output, but I’m still regularly getting really poor performance. For example, if I ask for a really simple output with XML tags, it will output 20 records correctly, e.g.:

<topictitle>Topic title</topictitle>

And then it will randomly alter the closing XML tag on, for example, the 21st record — the closing tag comes out misspelled.

I have:

given the model examples of what is correct
given the model specific examples of what is not correct (including the exact mistakes it currently outputs)
changed the way I’ve worded the prompt / reiterated
asked the model to think step by step (where it will tell me it will check for common mistakes, but still outputs them)

Probably time for me to move on. I’ve given Google enough loyalty, and I doubt there’s anyone reading the feedback or responding to it anyway.
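For anyone hitting the same failure mode, a minimal sketch of how to catch it before it corrupts a pipeline: a regex check that flags any record whose closing tag doesn’t match its opening tag. The tag name and the sample records below are hypothetical, not from my actual prompts, and this only handles flat single-tag records, not nested XML.

```python
import re

# Hypothetical sample output: two well-formed records and one with a
# misspelled closing tag, mimicking the failure mode described above.
records = [
    "<topictitle>Model comparison notes</topictitle>",
    "<topictitle>Prompting strategies</topictitle>",
    "<topictitle>Structured output tips</topictitl>",  # closing tag misspelled
]

# Match a single <tag>...</tag> pair and capture both tag names.
TAG_PAIR = re.compile(r"^<(\w+)>(.*)</(\w+)>$", re.DOTALL)

def check_record(record: str) -> bool:
    """Return True only if the record is one well-formed <tag>...</tag> pair
    with matching opening and closing tag names."""
    m = TAG_PAIR.match(record.strip())
    return bool(m) and m.group(1) == m.group(3)

bad = [i for i, r in enumerate(records) if not check_record(r)]
print(bad)  # indices of malformed records -> [2]
```

Running a check like this per record at least lets you retry or discard the occasional broken record instead of trusting the model to self-correct.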
