Which model became your default over the year?

None — that was the point. The whole experiment was to NOT have a default. Twelve months in, the question "which model should I use for this draft?" feels more answerable than it did at the start, and the answer changes more often than I expected.

Did the writing get better?

Hard to say. It got faster. The drafts that came out of multi-model workflows were more measured and less repetitive than single-model drafts, but the improvements are subtle. The biggest change wasn't in the writing — it was in how much of the thinking happened before any model touched it.

What was the biggest surprise?

How much the differences between models vary by genre. The same writer using the same prompt template produces noticeably different work with Claude vs GPT vs Gemini. I'd expected differences in code or research; I hadn't expected them in essay drafts.

What I learned writing with three AI models for a year

For the past twelve months I've written with three flagship AI models — Claude 4.7, GPT-5, and Gemini 2.5 Pro — switching per question instead of defaulting to one. This is the year-end note: what worked, what didn't, what changed, and the patterns I'm taking into year two of the practice. None of this is a controlled experiment; treat it as one writer's notes, not as data.

What stayed true across all twelve months

The first draft is where the thinking happens. Models are too helpful at first-draft writing. Letting them produce the draft means the thinking happened in their tokens, not in mine. I wrote my way out of this pattern early and kept writing first drafts myself for the rest of the year.

The right model varies by question more than by writer. I started the year thinking I had a "right model for me". I ended it thinking I had a right model for each question type. Long-doc analysis → Claude. Marketing copy → GPT-5. Vision-text extraction → Gemini. Argument essays → Claude. Punchy headlines → GPT-5. This stayed consistent across model versions; the brands trade places quarterly, but the category-level assignments did not change.
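
For anyone who wants to write the assignments down rather than keep them in their head, here is a minimal sketch of the mapping as a lookup table. The category names and model labels come from the list above; the names ROUTES and pick_model are hypothetical, and nothing here calls a real model API.

```python
from typing import Optional

# Hypothetical routing table. Categories and model labels are taken from
# the essay; the structure and names are illustrative assumptions.
ROUTES = {
    "long-doc analysis": "Claude",
    "argument essay": "Claude",
    "marketing copy": "GPT-5",
    "punchy headline": "GPT-5",
    "vision-text extraction": "Gemini",
}


def pick_model(task_type: str) -> Optional[str]:
    """Return the model assigned to a task type, or None if unassigned.

    Returning None instead of a fallback model is deliberate: an unknown
    task type should force a conscious choice, not a silent default.
    """
    return ROUTES.get(task_type.strip().lower())


if __name__ == "__main__":
    print(pick_model("Marketing copy"))
    print(pick_model("poetry"))
```

The table is keyed by question type, not by model, so a version bump means editing one value and leaving the categories alone.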

Branching changed how I revised. Once I had a tool that let me branch, I stopped seeing revision as "find the right wording and commit". I started seeing it as "try two wordings and compare". The reversion-to-the-mean fear that kept me from trying alternate phrasings shrank when both phrasings could live side by side.

What didn't last

Persona-loaded system prompts. "You are a senior editor at the New Yorker with twenty years of experience." Did nothing. I stopped using personas by month three and the writing didn't change.

Long instruction lists. I had a 1200-token system prompt at the start of the year. It contained edge cases I never actually hit. I cut it to ~250 tokens by month nine and the writing didn't change in any way I could detect.

Defaulting to one model for "general" use. I tried this twice — first with Claude for two months, then with GPT-5 for one. Both times I noticed I was using the wrong model for several tasks per week. Now I default to nothing; the per-message picker is my actual default.

What I added

The reverse outline as a default revision step. Started using it around month six. It catches problems I'd otherwise ship.

Voice examples in the system prompt instead of voice descriptions. Changed everything for getting AI rewrites that sound like me.

A list of phrases the AI tries to insert that I almost always cut. "Imagine if you could...", "in today's fast-paced world": the same phrases come back regardless of model. Listing them in my system prompt as anti-patterns saved me revision time.
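
The same list can double as a check on a finished draft. This is a rough sketch, assuming a plain-text draft and simple case-insensitive matching; the two phrases are the ones quoted above, and CUT_PHRASES and find_cut_phrases are made-up names, not part of any tool.

```python
import re
from typing import List, Tuple

# The two phrases quoted in the essay; extend with your own.
CUT_PHRASES = [
    "imagine if you could",
    "in today's fast-paced world",
]


def find_cut_phrases(draft: str) -> List[Tuple[str, int]]:
    """Return (phrase, character offset) pairs, ordered by position."""
    # Lowercase for case-insensitive matching and map curly apostrophes
    # to straight ones; both steps preserve character offsets.
    normalized = draft.lower().replace("\u2019", "'")
    hits = []
    for phrase in CUT_PHRASES:
        for match in re.finditer(re.escape(phrase), normalized):
            hits.append((phrase, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = "In today's fast-paced world, imagine if you could write faster."
    for phrase, offset in find_cut_phrases(sample):
        print(f"{offset}: {phrase}")
```

It only flags exact phrases, so it complements the anti-pattern list in the system prompt and does not replace a read-through.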

The model-by-model notes

Claude 4.7: Best for long-form essay work, including this kind of meta-reflection. Restraint is the dominant trait — Claude rarely tries to upsell you a stronger metaphor than you wrote. This is exactly what you want most of the time and exactly what you don't want when you need a punchier draft.

GPT-5: Best when I needed an injection of energy. GPT-5 will push you toward a stronger verb, a more declarative sentence, a punchier opener. Often too much; with restraint constraints in the system prompt, very useful for marketing-adjacent writing.

Gemini 2.5 Pro: Less reach for essay work than the other two. Where it shone was research with vision (screenshots, charts, scanned pages) — and I underused it in writing-specific contexts because of that initial impression.

What I'm doing differently in year two

Less "ask the AI"; more "draft, then ask the AI". The asking-after-drafting habit produced better writing more often. The asking-before habit produced work I had to fight to make my own.

Continuing the multi-model practice with one shared instruction set. (See the two-minute setup.)

Adding voice work to the mix — voice-to-text drafts that then get refined in chat. The friction reduction is meaningful for capturing thoughts that disappear if I have to type them.

Closing

The piece this most resembles is a yearly running-log post — not a conclusion, more an inventory. AI writing tools shifted my practice in ways that are still settling. The patterns above are what's held up for me. They may not hold up for you; writing practices are personal. But the meta-pattern — "switch per question, keep your judgment in the loop, write the first draft yourself" — has been load-bearing for the whole year and I'd recommend trying it before committing to anything else.

The thinking is yours; the models do the typing. Year one taught me the rule. Year two is about getting better at applying it.
