Issue 012 · automationdiary.com
27 · 05 · 2026

I tested 4 AI note systems for a year. I kept one.

A practitioner's one-year teardown of AI-augmented note systems for client work — and the decision framework I'd give my past self.

Every “best AI note-taking app” article has the same problem: it was written by someone who used each tool for an afternoon. An afternoon tells you about the onboarding flow. It tells you nothing about whether the system still earns its place in your week when you’re tired, behind, and prepping for a client call in twenty minutes. That second test is the only one that matters, and it takes months, not afternoons.

So this isn’t a roundup. Over the past year I ran four AI-augmented note systems through real consulting work — client meetings, research, deliverable prep, the lot — and at the end I deliberately collapsed back to one. What follows is which one, why, and the decision framework underneath it, because the framework will outlast every specific tool named here.

I’ll name the framework up front so you can read the rest through it. A note system for professional work has to win on four things, in this order of importance: retrieval, trust, friction, and compounding. Most tools optimise the wrong one.

The four contenders

I won’t pretend these were the only four in existence — they were the four that were plausibly good enough to live with. Each got at least two months as my primary system, used for actual billable work, not test notes.

  • A dedicated AI meeting-notes tool — the kind that joins your calls, transcribes, and auto-summarises.
  • Notion with AI — the all-in-one workspace, with its AI layer doing summarisation and Q&A over my pages.
  • Obsidian plus an AI plugin — local markdown files, with a model wired in for synthesis on demand.
  • A general AI assistant + a plain folder of text files — the deliberately low-tech option, included as a control.

Each one is genuinely good at something. That’s exactly why the decision is hard, and why “which is best” is the wrong question. The right question is “best at what, for whom, measured over how long” — which is the framework.

Criterion one: retrieval (the one everyone underweights)

Here is the test that separated the field, and almost nobody applies it: six months after you take a note, can you get the insight back out?

Capture is easy. Every tool captures. The entire value of a note system for a senior professional is what happens half a year later, when a problem you half-remember solving for a former client resurfaces with a new one. If the system can surface that prior thinking in seconds, it just made you look like you have twenty years of pattern recognition on tap. If it can’t, you have an expensive archive of things you’ll never read again.

The dedicated meeting-notes tool failed this hardest, which surprised me, because its capture was the best of the four. Transcripts piled up beautifully and then sat in a silo, organised by meeting date — the one dimension I never search by. I search by theme, by problem, by client situation. The tool that was best at recording was worst at remembering.

Obsidian won this decisively, for an unglamorous reason: plain markdown plus links plus local search means everything I’ve ever written is one query away, and the AI layer reasons over my own past synthesis rather than over raw transcripts. Retrieval isn’t a feature it bolts on — it’s the substrate.
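To make "one query away" concrete, here is a minimal sketch of a theme search over a folder of markdown notes. It is not Obsidian's own search and not the plugin's behaviour, only an illustration of why plain files are easy to query by problem rather than by date. The function name `search_vault`, the file names, and the note contents are all hypothetical.

```python
from pathlib import Path
import tempfile


def search_vault(vault_dir, query):
    """Return (file name, matching line) pairs for a case-insensitive query.

    Hypothetical helper: scans every .md file under vault_dir.
    """
    needle = query.lower()
    hits = []
    for path in sorted(Path(vault_dir).rglob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if needle in line.lower():
                hits.append((path.name, line.strip()))
    return hits


# Throwaway demo vault with made-up notes (illustrative only).
with tempfile.TemporaryDirectory() as vault:
    Path(vault, "2025-01-client-a.md").write_text(
        "Pricing pushback: reframed scope instead of discounting.\n",
        encoding="utf-8",
    )
    Path(vault, "2025-06-client-b.md").write_text(
        "Kickoff notes. Nothing on fees yet.\n",
        encoding="utf-8",
    )
    demo_hits = search_vault(vault, "pricing")

for name, line in demo_hits:
    print(f"{name}: {line}")
```

The query is a theme ("pricing"), and the meeting date in the file name plays no part in finding the note, which is the dimension mismatch described above.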

Criterion two: trust (or: can you stake a client relationship on it?)

The second filter is brutal and simple. Would you put this system’s output in front of a client without re-checking every line? For me the honest answer, across all four, was no — and that answer reshaped how I use all of them.

But there are degrees of no. The auto-summarising tools — the meeting-notes app and Notion AI — produced clean, confident, plausible summaries that were subtly wrong often enough that I couldn’t trust any of them without a full re-read. And a summary you have to fully re-read against the source hasn’t saved you the work; it’s added a verification step. That’s worse than no summary.

The systems where I stayed closer to the raw material — Obsidian and the plain-files control — were paradoxically more trustworthy, precisely because they did less unsupervised interpretation. The model helped me think; it didn’t quietly think for me and present the result as fact. For client-facing work, where being confidently wrong is a reputational event, that distinction is everything.

The most dangerous AI output isn’t the one that’s obviously wrong. It’s the one that’s clean, confident, and wrong in the third sentence — because that’s the one you forward without reading.

Criterion three: friction (the silent killer)

Friction is what kills systems that win on paper. A tool can be powerful and lose simply because using it correctly requires more discipline than you reliably have on a bad day.

Notion AI is the cautionary tale here. On paper it’s the strongest all-rounder — capture, organise, summarise, and query, all in one place. In practice, the all-in-one nature meant that keeping it useful required keeping it tidy, and a busy quarter is exactly when tidiness collapses. The system degraded precisely when I needed it most. Power that depends on your good behaviour isn’t power you can count on.

The plain-files control had the opposite profile: almost no power, almost no friction. It never broke because there was nothing to break. It lost on other criteria, but it taught me the real lesson — the best system isn’t the most capable one, it’s the most capable one you’ll still use when you’re underwater. That’s a much lower ceiling than the marketing implies, and a much more useful one.

Criterion four: compounding (does year two beat year one?)

The last criterion is the one that justifies the whole exercise. Does the system get more valuable the longer you use it, or does it just accumulate?

Accumulation is not compounding. The meeting-notes tool accumulated — more transcripts, no more insight. Notion accumulated until the accumulation itself became the maintenance burden. Compounding means each new note makes the existing ones more useful, usually through connection: this links to that, this pattern echoes that one, and the web of links becomes a thinking instrument in its own right.

Only the Obsidian setup genuinely compounded, because linking is native and the AI reasons across the links. By month ten it wasn’t a notes app. It was an externalised second brain that occasionally told me something about my own past work I’d forgotten I knew. Nothing else came close. Compounding sits last in the ranking because it only matters once a system has passed the first three tests, but over a career it is the criterion with the largest payoff.
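One way to picture "each new note makes the existing ones more useful" is a backlink index: every link written in a new note adds an inbound pointer to an old one. The sketch below assumes Obsidian-style `[[wikilinks]]` (with optional `|alias` or `#heading` suffixes) and uses made-up note names and contents. `build_backlinks` is a hypothetical helper, not part of Obsidian or of any plugin.

```python
import re
from collections import defaultdict

# Capture the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_backlinks(notes):
    """Map each linked-to note name to the set of notes that link to it."""
    backlinks = defaultdict(set)
    for name, text in notes.items():
        for target in WIKILINK.findall(text):
            backlinks[target.strip()].add(name)
    return backlinks


# Illustrative notes only; names and contents are invented.
notes = {
    "Note 1": "Scope creep usually starts in the kickoff call.",
    "Note 250": "Same pattern as [[Note 1]], this time in a retainer.",
    "Note 500": "Third sighting of [[Note 1|the kickoff pattern]]; see also [[Note 250]].",
}

backlinks = build_backlinks(notes)
for target in sorted(backlinks):
    print(target, "<-", sorted(backlinks[target]))
```

In this toy vault, "Note 1" never changes, yet it gains two inbound references as later notes are written. An archive sorted by meeting date gains nothing comparable.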

The verdict, and the framework that produced it

I kept Obsidian with an AI plugin. Not because it has the best AI (it doesn’t), but because it won the two criteria whose value grows with time, retrieval and compounding, while staying honest on trust and low on friction. The tools with the flashiest AI lost on the things that actually matter at the one-year mark.

But the tool is not the takeaway. The framework is. When you evaluate any AI note system — including ones that didn’t exist when I wrote this — rank them in this order:

  1. Retrieval — can you get insight out six months later, by the dimension you actually search?
  2. Trust — can output go to a client without a full re-check? (If no, does it at least keep you close to the source?)
  3. Friction — will you still use it correctly on your worst week?
  4. Compounding — does note 500 make note 1 more valuable?
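Read strictly, "rank them in this order" is a lexicographic comparison: a later criterion only breaks ties on the earlier ones. The sketch below shows that reading. The 1-to-3 scale, the system names, and every score are placeholders invented for illustration, not measurements from the year described here, and a softer weighted reading would be equally defensible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Placeholder 1-3 scores; higher is better on every field."""
    retrieval: int
    trust: int
    friction: int      # higher means LESS friction
    compounding: int


def rank(candidates):
    """Order system names by the four criteria, compared in priority order."""
    def key(name):
        s = candidates[name]
        return (s.retrieval, s.trust, s.friction, s.compounding)
    return sorted(candidates, key=key, reverse=True)


# Hypothetical systems with invented scores.
candidates = {
    "system_x": Scores(retrieval=3, trust=2, friction=1, compounding=1),
    "system_y": Scores(retrieval=3, trust=1, friction=3, compounding=3),
    "system_z": Scores(retrieval=2, trust=3, friction=3, compounding=3),
}

ranked = rank(candidates)
print(ranked)
```

Note that `system_z` has the highest total score and still ranks last, because it loses on retrieval. That is the practical difference between ranking criteria and merely adding them up.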

Most reviews get this backwards: they lead with AI features (a friction-and-novelty story) and never mention retrieval or compounding at all. That’s why most reviews are useless for professional work. They’re scoring the test drive, not the ten-year ownership.

The honest summary: the best AI note system for client work is the boring one that remembers, doesn’t lie, survives your worst week, and gets smarter with age. The AI is the least interesting part of that sentence — which is the most useful thing I learned all year.


Next in Tool Decisions → My entire consulting AI stack, and what each one costs me.

Companion template · from this article

The Prospect Intake + Tension Brief

The exact Obsidian template and the full prompt chain — the structured intake note, the contrast prompt, and the provocation checklist. Drop it into your vault and run your next prospect through it in forty minutes.

Disclosure: Some links on this site are affiliate links — if you buy through them I may earn a commission, at no cost to you. I only recommend tools I actually run, and I tell you when something I tried didn't make the cut. That's the whole promise here.