lognote.

the fine print

Everything lognote does, in one place.

The homepage covers the headline. This page covers the rest: every setting, every recovery path, every silent piece of machinery that makes the product work. Read it cover-to-cover or jump to a section.

basics

The basics

Recording, the status pill, what lands in your note.

Starting and stopping a recording

The Obsidian plugin gives you a mic ribbon icon in the left sidebar (click to toggle), plus three command-palette entries: Lognote: Start recording, Lognote: Stop recording, and Lognote: Toggle recording. Assign keyboard shortcuts under Settings → Hotkeys → search “lognote” — there’s no default chord because we don’t want to step on whatever you’ve already bound.

Outside Obsidian, lognote-record-start <path-to-note.md> and lognote-record-stop (installed in ~/.local/bin/ by setup.sh) do the same thing. The PATH wrappers are how you’d wire lognote into any editor that runs shell commands, or just record from a terminal without an editor at all.

Status bar pill

While a recording is active, the plugin shows ● recording 02:14 in Obsidian’s bottom-right status bar. Click it to stop. The pill polls $LOGNOTE_STATE_DIR/recording-state every second and verifies the capture process is still alive via process.kill(pid, 0), so if the capture binary crashes or you kill it from a shell, the pill clears within a second — it never lies about state.

The pill only shows while Obsidian is in the foreground. For a global indicator that’s visible from anywhere, see the menu-bar-helper entry.
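The liveness check behind the pill can be sketched in Python. The plugin itself uses Node's process.kill(pid, 0); os.kill(pid, 0) is the POSIX equivalent. The state-file layout below (a file holding only the capture pid) and the helper names are assumptions for illustration, not lognote's documented format.

```python
import os
from typing import Optional


def pid_is_alive(pid: int) -> bool:
    """Signal 0 delivers nothing; it only checks that the pid can be signalled."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the capture binary is gone
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def read_pid(state_file: str) -> Optional[int]:
    """Assumed format: the state file contains just the capture pid as text."""
    try:
        with open(state_file) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def is_recording(state_file: str) -> bool:
    """True only if a state file exists and the pid it names is still running."""
    pid = read_pid(state_file)
    return pid is not None and pid_is_alive(pid)
```

Calling is_recording() once per second gives the behavior described above: a crashed or killed capture process makes the next poll return False, so the indicator clears.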

What lands in your note

When a recording finishes, lognote replaces the pending marker in your note with two things: a summary block (TL;DR, action items, decisions, open questions, topics discussed, notable quotes — sections are omitted, never left empty) and a <details>-wrapped full transcript that renders as click-to-expand. Below both, a small footer (_Summarized by **provider** (model)_) records which backend produced the summary, so you can spot-check what auto-mode picked.

If summarization is disabled (SUMMARIZE_ENABLED=0), the transcript still lands as a plain section with no summary block above it — see the skip-summaries entry.

If summarization is enabled but fails (bad API key, provider unreachable, model returns nothing), the transcript still lands wrapped in the same <details> block, but the summary section is replaced with a visible ⚠️ failure block showing the provider, error class, and a pointer to the retry command. The <audio>.summary.failed sidecar is written alongside so the plugin’s scan and retry commands can find it later. See retry-failed-summary for how to recover.

me / others speaker labels

The two-track recording structure gives you free 2-label diarization: anything captured through the mic is tagged me, anything captured from system audio (Zoom, Teams, browser, etc.) is tagged others. Consecutive segments from the same speaker get grouped under one block in the transcript, with timestamps in _[mm:ss]_ format.

This is a v1 limitation worth knowing: every remote participant in a Zoom call gets pooled under others. Per-individual labeling (separating remote speakers from each other) is on the roadmap, not in the product today. If you need to attribute a specific quote to a specific person, the timestamp + your memory is the current workflow.
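The grouping and timestamp format described in this entry can be sketched as follows. The segment shape (dicts with speaker, start, text) and the function names are assumptions for illustration; the real renderer may lay blocks out differently.

```python
from itertools import groupby


def fmt_ts(seconds: float) -> str:
    """Render an offset in seconds as the _[mm:ss]_ form used in transcripts."""
    minutes, secs = divmod(int(seconds), 60)
    return f"_[{minutes:02d}:{secs:02d}]_"


def group_by_speaker(segments):
    """Collapse consecutive same-speaker segments into (speaker, lines) blocks."""
    blocks = []
    for speaker, run in groupby(segments, key=lambda s: s["speaker"]):
        lines = [f'{fmt_ts(s["start"])} {s["text"]}' for s in run]
        blocks.append((speaker, lines))
    return blocks
```

groupby only merges adjacent items, so a speaker who talks, is interrupted, and talks again produces two separate blocks, which matches the "consecutive segments" wording above.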

Marker-based insertion

When you start recording, lognote writes a unique <!-- TRANSCRIPT_PENDING_<ts>_<pid> --> HTML comment at your cursor. The comment is invisible in preview but the post-transcribe script uses it to find the exact insertion point — so you can scroll, edit, navigate to other notes, even quit Obsidian, and the transcript still lands where you originally hit record.

If you’ve deleted the marker by the time transcription finishes, the transcript appends to the end of the note instead. If you’ve deleted the note itself, it falls back to inbox/ — see the inbox-fallback entry.
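A minimal Python sketch of the replace-or-append behavior described above. The function names are hypothetical and the timestamp format passed to make_marker is an assumption; only the marker's overall shape comes from this page.

```python
def make_marker(ts: str, pid: int) -> str:
    """Build the pending-transcript comment; the <ts> format is an assumption."""
    return f"<!-- TRANSCRIPT_PENDING_{ts}_{pid} -->"


def land_transcript(note_text: str, marker: str, block: str) -> str:
    """Replace the marker if it survived; otherwise append to the end of the note."""
    if marker in note_text:
        return note_text.replace(marker, block, 1)
    separator = "" if note_text.endswith("\n") else "\n"
    return f"{note_text}{separator}\n{block}\n"
```

Because the marker embeds both a timestamp and a pid, two recordings started in the same note get distinct markers, so each transcript can find its own insertion point.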

Frontmatter and metadata

For live recordings, lognote doesn’t touch your note’s frontmatter — it inserts at the marker and leaves everything above and below alone. The transcript block opens with a ## 🎙️ Transcript heading and an _Audio: [[<absolute-path-to-m4a>]]_ wikilink so you can click through to the source audio at any time. The summary footer records the provider and model name.

Notes produced by lognote-import and lognote-resplit are different: they create new notes from scratch and write YAML frontmatter at the top (title, source / source_audio, source_range, imported_at, generated_by) so you can tell at a glance where the note came from and which audio it traces back to.

Inbox fallback

If you delete (or rename, or otherwise lose track of) the note between hitting record and transcription finishing, lognote drops the result in $LOGNOTE_NOTES_DIR/inbox/transcript-<timestamp>.md instead of failing the run. A macOS notification fires so you know it landed there rather than where you originally pointed it.

The inbox directory is created on demand, so you don’t need to set it up. If you find yourself with a backlog there, it usually means a workflow problem worth fixing — but the work product is never lost.

settings

Settings & config

Every knob you can turn, and where your secrets live.

Settings tab

Open Settings → lognote inside Obsidian to configure the plugin. The detected fields up top — Lognote repo path and Vault directory — are read-only and resolved automatically (the repo path follows the plugin’s dist symlink; the vault path comes from Obsidian itself). They’re shown for sanity-checking, not editing.

The configurable fields:

  • Provider dropdown — pick the LLM backend (auto, openai, anthropic, azure, ollama, local-mlx). The credential fields below the dropdown change to match what the selected provider needs; values for the unselected providers are kept on disk so switching back doesn’t lose them. See the provider-auto-detection entry for what auto does.
  • MLX model — the HuggingFace repo id for the Whisper transcription model. The default mlx-community/whisper-large-v3-turbo is a good balance of speed and quality.
  • Local model override — pin a specific HF repo for local-mlx summarization, bypassing the auto-switch by transcript length. See the model-override entry.
  • Summarize on/off — turn off to skip the LLM step entirely; transcripts still land.
  • Audio retention days — how long raw .m4a files are kept before opportunistic cleanup. Default 90, 0 disables cleanup.

Everything in this tab maps 1:1 to env vars the bash CLIs honor (LLM_PROVIDER, OPENAI_API_KEY, MLX_MODEL, LOGNOTE_AUDIO_RETENTION_DAYS, etc.), so plugin settings and CLI invocations behave identically. The sensitive bits — API keys — are written outside the vault: the plugin keeps them in ~/.config/lognote/plugin-secrets.json (mode 0600), and setup.sh writes the mirror copy in secrets.sh at the repo root from the interactive prompts. Plugin edits update plugin-secrets.json only (not secrets.sh), so if you change provider creds in Settings → lognote and then invoke the bash CLIs directly (without going through the plugin), re-run ./setup.sh --reconfigure to refresh secrets.sh. Keeping API keys outside the vault is deliberate — Obsidian Sync, iCloud, and git should never see them.
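For readers wiring up their own tooling, here is one way to write a secrets file with mode 0600, sketched in Python. The plugin itself is not written in Python, and the JSON layout and function name here are assumptions; only the path and the 0600 mode come from the text above.

```python
import json
import os


def write_secrets(path: str, secrets: dict) -> None:
    """Write secrets as JSON, readable and writable by the owner only."""
    os.makedirs(os.path.dirname(path), mode=0o700, exist_ok=True)
    # Create with 0600 from the start so the file is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(secrets, f, indent=2)
    # Also tighten a pre-existing file that had looser permissions.
    os.chmod(path, 0o600)
```

Usage would look like write_secrets(os.path.expanduser("~/.config/lognote/plugin-secrets.json"), {"OPENAI_API_KEY": "..."}), with the key name shown purely as an example.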

providers

AI providers

Your choice. Local-mlx by default; OpenAI, Anthropic, Azure, Ollama if you want them.

Provider auto-detection

LLM_PROVIDER=auto (the default) probes the available providers in this priority order: OpenAI → Anthropic → Azure → Ollama → local-mlx. The first one that reports itself configured (cloud providers check for their respective API keys; Ollama checks the local endpoint; local-mlx checks that mlx_lm is importable and the default 3B model is cached) gets the job.

The priority order reflects a simple bet: if you’ve paid for a cloud key, you probably want it used; if you’re running Ollama locally, that’s faster than spinning up MLX; local-mlx is the always-available free fallback. You can override the auto-pick at any time by selecting a specific provider in Settings — the dispatcher always honors an explicit choice.
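The dispatch rule above reduces to "explicit choice wins, otherwise first configured provider in priority order". A minimal sketch, assuming each provider exposes a zero-argument probe that returns True when it is configured (the probes mapping and function name are hypothetical):

```python
from typing import Callable, Mapping, Optional

# Priority order as documented for LLM_PROVIDER=auto.
PRIORITY = ["openai", "anthropic", "azure", "ollama", "local-mlx"]


def pick_provider(
    requested: str, probes: Mapping[str, Callable[[], bool]]
) -> Optional[str]:
    """Return the provider to use, or None if auto finds nothing configured."""
    if requested != "auto":
        return requested  # an explicit choice is always honored
    for name in PRIORITY:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    return None
```

Probes run lazily and in order, so a configured OpenAI key means the Ollama endpoint and the MLX import check are never touched.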

Local MLX (on-device summaries)

The local-mlx provider runs entirely on-device via mlx-lm. No API key, no network round-trip, no cost — and the audio-stays-local promise extends to the summary step too. setup.sh pre-downloads the default 3B model (mlx-community/Llama-3.2-3B-Instruct-4bit, ~2 GB) during install so the first auto-detect hit doesn’t stall on a download. Pass --skip-llm-download if you’d rather defer that.

The model auto-switches by transcript length: under ~3k tokens uses the 3B (already on disk), 3k–10k bumps to Llama-3.1-8B (~5 GB, lazy-downloaded on first need), over 10k bumps to Qwen2.5-14B (~8 GB). The lazy download announces itself on stderr (“Long meeting detected — downloading 5GB summary model…”) so you understand the one-time pause. To skip the auto-switch and pin a specific model, see the model-override entry.
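The bucket selection can be sketched as a small function. The thresholds are stated above only approximately ("~3k"), so the exact boundary handling here is an assumption, and the 8B and 14B buckets are returned as family names because their full repo ids are not given on this page. The override parameter stands in for LLM_MODEL_OVERRIDE.

```python
from typing import Optional

DEFAULT_3B = "mlx-community/Llama-3.2-3B-Instruct-4bit"  # repo id given above


def pick_local_model(token_count: int, override: Optional[str] = None) -> str:
    """Choose a summary model by transcript length unless one is pinned."""
    if override:
        return override  # pinned model bypasses the length buckets
    if token_count < 3_000:
        return DEFAULT_3B
    if token_count <= 10_000:
        return "Llama-3.1-8B"  # family name only; exact repo id not stated here
    return "Qwen2.5-14B"  # family name only; exact repo id not stated here
```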

Cloud providers (OpenAI, Anthropic, Azure, Ollama)

If you have a cloud LLM account, lognote will use it. All four backends are BYOK — lognote doesn’t proxy through any service of its own. Put credentials into Settings → lognote (or directly into ~/.config/lognote/env / secrets.sh if you’re CLI-only):

  • OpenAI: OPENAI_API_KEY, optional OPENAI_MODEL (default gpt-4o-mini). The general-purpose default; fast and cheap.
  • Anthropic: ANTHROPIC_API_KEY, optional ANTHROPIC_MODEL (default claude-haiku-4-5). Best summaries we’ve measured, slightly slower.
  • Azure OpenAI: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT, optional AZURE_OPENAI_API_VERSION (default 2024-10-21). For users on Azure tenants.
  • Ollama: OLLAMA_ENDPOINT (default http://localhost:11434), optional OLLAMA_MODEL. Local server, OpenAI-compatible API. If you don’t set OLLAMA_MODEL, lognote picks the first model ollama list returns and logs that choice on stderr.

The system prompt is fixed across providers (in lib/providers/base.py) so the summary shape is the same no matter which backend ran — only the model’s interpretation differs.

Pinning a specific local model

By default, local-mlx picks its model by transcript length (3B / 8B / 14B buckets). To pin a single model regardless of meeting size, set LLM_MODEL_OVERRIDE to any compatible HuggingFace repo id — either through the Local model override field in Settings, or by adding export LLM_MODEL_OVERRIDE=... to ~/.config/lognote/env.

Useful when you’ve benchmarked a specific quant and want consistent output, when you’ve already downloaded a bigger model and want it used for everything, or when you want to test a non-default model family (Mistral, Qwen variants, etc.) without modifying the source.

Skipping summaries entirely

Flip Summarize on/off in Settings (or set SUMMARIZE_ENABLED=0 in ~/.config/lognote/env) and lognote skips the LLM call entirely. The transcript still lands in your note, just without the structured summary block above it and without the collapsed <details> wrapper.

Reasons to do this: you process transcripts elsewhere, you don’t trust LLM summaries for the kind of conversation you’re recording, you’re saving cloud-API cost, or you’re on a machine where local-mlx is too slow and you don’t want the wait. Summarization can be re-enabled at any time; it’s a runtime flag, not a build-time decision.

recovery

When meetings go sideways

Runaway recordings, interrupted fragments, failed summaries, orphan markers: all recoverable.

Resplit a multi-meeting recording

Sometimes you forget to stop recording between meetings and end up with one giant .m4a that spans two or three of them. The Lognote: Split a recording… command opens a modal that proposes candidate split points (detected from long silences in the transcript), lets you adjust them, and produces one note per resulting bucket — each with its own summary, transcript, and frontmatter pointing back to the source audio.

The work runs against the existing transcript JSON sidecar when possible, so a resplit doesn’t re-transcribe. If the sidecar is missing it transcribes once and reuses the result across buckets. From the CLI: bin/lognote-resplit candidates <audio.m4a> proposes splits, bin/lognote-resplit apply <manifest.json> applies them.

Recording joiner

Sometimes a recording gets interrupted (force quit, reboot, power loss) and you end up with a couple of .m4a fragments instead of one file. bin/lognote-join <fragment1> <fragment2> ... stitches them back into one dual-track recording and runs the normal transcribe → summarize → land-in-your-vault pipeline on the result. You don’t lose the meeting; you just get one note instead of one per fragment.

Pass --target-note <path> to land the joined transcript into a specific existing note (replacing a TRANSCRIPT_PENDING_JOINED marker, or appending if none); without it, the result goes to inbox/. The Obsidian plugin’s Lognote: Join recordings… command wraps the same engine with a UI for picking fragments visually.

Retry a failed summary

Summarization is treated as best-effort — if the provider is unreachable, the key is wrong, or the model returns nothing, the transcript still gets inserted into your note. The failure is recorded in a <audio>.summary.failed JSON sidecar next to the .m4a, with the error class, error message, target note, and marker id.

bin/lognote-retry-summary --list shows pending failures (one stanza per sidecar so you can eyeball which need credential fixes versus network retries). bin/lognote-retry-summary --run retries each: it regenerates the transcript markdown from the existing .transcript.json sidecar (no re-transcription), runs summarize.py against it, and patches the original note in place — replacing the ⚠️ summary-failed block with a real summary + collapsed-transcript layout. Successful retries delete their sidecar. The Obsidian plugin gives you two commands for the same loop without leaving the editor: Lognote: Show failed summaries (no retry) lists what’s pending, and Lognote: Retry failed summaries runs the CLI from inside Obsidian and surfaces the result via Notice.

Orphan marker recovery

If the transcription pipeline is killed mid-flight (reboot, force-quit, OOM, kill -9), the <!-- TRANSCRIPT_PENDING_<ts>_<pid> --> marker stays embedded in your note with nothing to replace it. bin/lognote-recover scans your vault for those orphans and matches each to the most likely audio file in $LOGNOTE_AUDIO_DIR by comparing the marker’s embedded timestamp to the audio’s filename timestamp, within ±60s tolerance (override via LOGNOTE_RECOVER_TOLERANCE_SEC).

bin/lognote-recover (or --list) shows the matches without doing anything; bin/lognote-recover --run re-invokes the transcription pipeline for each matched orphan. The tool never auto-runs — --run is always explicit. The Obsidian plugin’s Lognote: Scan for orphan recording markers command surfaces the same list.
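The timestamp matching can be sketched as a nearest-neighbor lookup with a tolerance. This assumes both timestamps have already been parsed to seconds since the epoch; the parsing step, the tie-breaking rule, and the function name are hypothetical.

```python
from typing import Mapping, Optional


def match_orphan(
    marker_ts: float,
    audio_ts_by_path: Mapping[str, float],
    tolerance_sec: float = 60.0,
) -> Optional[str]:
    """Return the audio file whose timestamp is closest to the marker's,
    or None if nothing falls within the tolerance."""
    best_path = None
    best_delta = None
    for path, ts in audio_ts_by_path.items():
        delta = abs(ts - marker_ts)
        if delta > tolerance_sec:
            continue
        if best_delta is None or delta < best_delta:
            best_path, best_delta = path, delta
    return best_path
```

The tolerance_sec default mirrors the ±60s window above; LOGNOTE_RECOVER_TOLERANCE_SEC would feed that parameter.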

External transcript import

If someone else recorded the meeting — Zoom’s auto-transcript, Teams, Otter, a copy-pasted plain-text transcript, a VTT/SRT subtitle file — bin/lognote-import <file> runs it through the same format → summarize → land-in-your-vault flow that a live recording would use. Format is auto-detected; override with --format vtt|srt|otter|plain|json if the sniffer misfires (rare, but possible with copy-pasted text that happens to look JSON-ish).

By default the result lands in inbox/. Use --target-note <path> to insert into an existing note (replaces the first <!-- MARKER --> it recognizes, or appends to EOF), or --output <path> to write to an arbitrary destination. --archive-original copies the source file into <vault>/_archive/ with cross-link frontmatter so the original survives even if you later delete the produced note. The Obsidian plugin’s Lognote: Import external transcript… command wraps the same engine.

capture

Capture behaviors to expect

The quiet machinery: silence watcher, clamshell guard, device-switching, disk-full.

Auto-stop on silence

A silence watcher tails the audio capture log and tracks the last “loud” sample on each track. When both the mic and system-audio tracks have been silent for AUTO_STOP_SILENCE_SECONDS (default 300s / 5 min, set to 0 to disable), it fires record-stop and transcription kicks in normally. You get a macOS notification 60s before the auto-stop (“make a sound to keep recording”) so you can intervene if you’re just thinking.

The watcher handles wake-from-sleep correctly: if the Mac suspends mid-recording, the audio binary stops emitting ticks while wall-clock keeps advancing — naively diffing now vs. last-loud would fire auto-stop on wake. Instead, any gap of 60s+ between ticks is treated as a wake event and the silence countdown restarts from zero. Silence threshold and warning interval are tunable; see lib/config.sh.
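The wake-aware countdown can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, ticks are assumed to carry a wall-clock time plus a loud/silent flag per track, and the 60s warning notification is left out.

```python
class SilenceWatcher:
    """Tracks time since the last loud sample, resetting across sleep gaps."""

    def __init__(self, silence_limit: float = 300.0, wake_gap: float = 60.0):
        self.silence_limit = silence_limit  # AUTO_STOP_SILENCE_SECONDS; 0 disables
        self.wake_gap = wake_gap            # tick gap treated as wake-from-sleep
        self.last_tick = None
        self.last_loud = None

    def on_tick(self, now: float, mic_loud: bool, system_loud: bool) -> bool:
        """Feed one capture tick; return True when auto-stop should fire."""
        if self.silence_limit <= 0:
            return False
        if self.last_tick is None or now - self.last_tick >= self.wake_gap:
            # First tick, or the Mac slept: restart the countdown from zero.
            self.last_loud = now
        self.last_tick = now
        if mic_loud or system_loud:
            self.last_loud = now
        return now - self.last_loud >= self.silence_limit
```

Without the gap check, a laptop that slept for an hour would see now minus last-loud exceed the limit on the very first tick after waking and stop the recording immediately.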

Clamshell guard

There’s a macOS quirk where the CoreAudio process tap captures only zeros when the default output device is the built-in MacBook speakers and the lid is closed. Audio is still audible (mirrored to the external display’s speakers or wherever), but the tap sees silence — so you’d record a full meeting of mic-only audio and only discover the missing system track at transcript time.

record-start detects this combination at launch (ioreg AppleClamshellState + the Swift binary’s --current-output inspection) and refuses to start, printing the available alternative outputs in the same message. Switch the output device via Control Center → Sound, or open the lid, and try again. The check costs nothing when the lid is open — no recording session is set up to perform it.

Output device switching mid-recording

The aggregate device that powers system-audio capture is bound to a specific clock master (the current default output device). If you change outputs mid-recording — pair AirPods, plug in a USB DAC, switch to an external display’s speakers — the original clock master goes away and audio would drop on the floor.

The Swift binary watches kAudioHardwarePropertyDefaultOutputDevice and, on a change, tears down the aggregate + IOProc and rebuilds them against the new clock master without interrupting the session. The same process tap is reused, so no audio is lost across the transition. From your perspective, the recording just keeps going.

Disk-full notification

The Swift capture binary watches AVAssetWriter.append() for failures during a recording. After sustained losses (typical cause: $LOGNOTE_AUDIO_DIR ran out of space) it inspects the writer’s NSError, and if the message indicates out-of-disk it fires a single macOS notification: Disk full — audio write failed. Recording may be losing audio. Free space and stop the recording.

The notification is deduped per recording session — one alert, not a spam stream. Free space and stop the recording manually; the partial .m4a up to the failure point is still valid and goes through the normal transcribe pipeline on stop.

Audio retention cleanup

At the start of every recording, lognote sweeps $LOGNOTE_AUDIO_DIR for .m4a files older than LOGNOTE_AUDIO_RETENTION_DAYS (default 90) and deletes them along with their sidecars (.transcript.json, .markers.json, .silence.json). Notes that reference the audio via [[wikilink]] are never touched — the wikilink just becomes a broken link, which Obsidian shows as red so you know the source is gone.

Set LOGNOTE_AUDIO_RETENTION_DAYS=0 to disable cleanup if you’d rather hold on to everything (or run your own pruning). The cleanup is opportunistic, not scheduled — it only runs when you start a new recording, so a long stretch without recording won’t churn through old files.
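A minimal sketch of the sweep, under two assumptions that this page does not confirm: file age is taken from the modification time, and sidecars are named by swapping the .m4a extension for the sidecar suffix. The function name is hypothetical.

```python
import time
from pathlib import Path

SIDECAR_SUFFIXES = (".transcript.json", ".markers.json", ".silence.json")


def sweep_old_audio(audio_dir, retention_days, now=None):
    """Delete .m4a files older than retention_days, plus their sidecars.
    Returns the names of removed files; retention_days <= 0 disables cleanup."""
    if retention_days <= 0:
        return []
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86_400
    removed = []
    for m4a in sorted(Path(audio_dir).glob("*.m4a")):
        if m4a.stat().st_mtime >= cutoff:
            continue  # still inside the retention window
        # Assumed naming: meeting.m4a -> meeting.transcript.json, and so on.
        victims = [m4a] + [m4a.with_name(m4a.stem + s) for s in SIDECAR_SUFFIXES]
        for path in victims:
            if path.exists():
                path.unlink()
                removed.append(path.name)
    return removed
```

Only the audio directory is scanned, which is why notes in the vault are left alone and simply end up with a broken wikilink.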

transcription

Transcription behaviors to expect

Track-split diarization, hallucination filtering, cross-track dedup.

Track-split diarization

mlx-whisper doesn’t do speaker diarization natively, so lognote leans on the m4a’s structure: track 0 is system audio (everyone else, captured via the CoreAudio process tap), track 1 is your mic. Each track is extracted with ffmpeg -map 0:a:<N>, transcribed independently by mlx-whisper, tagged with speaker: "me" or speaker: "others", and the two streams are interleaved by start timestamp before format-transcript.py renders the result.

See the me/others entry for the v1 limitation: remote participants in a Zoom call all pool under others. The silence-hallucination-filter and cross-track-dedup-and-vad entries cover the two passes that clean up the artifacts this approach introduces (Whisper hallucinations on silent tracks, loudspeaker bleed on the mic track).
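The extract, tag, and interleave steps can be sketched as follows. The ffmpeg -map 0:a:<N> selector is from the text above; the rest of the argument list, the segment shape, and the function names are assumptions for illustration.

```python
def extract_track_cmd(src: str, track: int, dst: str) -> list:
    """Build an ffmpeg argv that copies out one audio track.
    Track 0 is system audio (others); track 1 is the mic (me)."""
    return ["ffmpeg", "-y", "-i", src, "-map", f"0:a:{track}", dst]


def interleave_tracks(mic_segments, system_segments):
    """Tag each track's segments with a speaker label, then merge by start time."""
    tagged = [dict(seg, speaker="me") for seg in mic_segments]
    tagged += [dict(seg, speaker="others") for seg in system_segments]
    # sorted() is stable, so simultaneous starts keep mic before system audio.
    return sorted(tagged, key=lambda seg: seg["start"])
```

Passing the argv to subprocess.run(..., check=True) would perform the extraction; it is built as a list here so no shell quoting is involved.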

Silence-hallucination filter

Whisper is known to invent text on silent audio — “Thank you.”, “Thanks for watching.”, “All right.”, “Bye.” — particularly when a chunk is near-silent. Before track-split, the mic and system streams were mixed so a hallucination on one track was masked by real audio on the other. Now that each track is transcribed independently, silent stretches surface those phantoms directly under whichever speaker label that track represents.

lib/hallucination_filter.py catches them with a heuristic: a curated list of common hallucinated phrases, combined with suspicious-duration and no-speech-probability gates so genuine short utterances (“All right, let’s begin.”) aren’t dropped. The filter runs on each track before tagging and interleaving, so the merged transcript you see in your note is already clean.
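A rough sketch of this kind of heuristic, not the contents of lib/hallucination_filter.py. The phrase list is just the four examples quoted above, both thresholds are invented for illustration, and "suspicious duration" is interpreted here as a short phrase stretched over an implausibly long span. The no_speech_prob field is the per-segment value Whisper reports.

```python
KNOWN_PHANTOMS = {"thank you.", "thanks for watching.", "all right.", "bye."}


def is_probable_hallucination(seg, min_no_speech_prob=0.6, max_plausible_sec=4.0):
    """Flag a segment only if it is a known phantom phrase AND a gate trips."""
    text = seg["text"].strip().lower()
    if text not in KNOWN_PHANTOMS:
        return False  # longer or different utterances are never dropped
    duration = seg["end"] - seg["start"]
    high_no_speech = seg.get("no_speech_prob", 0.0) >= min_no_speech_prob
    stretched = duration > max_plausible_sec  # a two-word phrase should be brief
    return high_no_speech or stretched


def filter_track(segments):
    """Drop probable hallucinations from one track before tagging."""
    return [s for s in segments if not is_probable_hallucination(s)]
```

Matching the whole segment text rather than a substring is what keeps "All right, let's begin." safe, and the gates keep a genuine, confidently detected "Bye." at the end of a call.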

Cross-track dedup and mic VAD

If you record a Zoom call with the audio coming out of your laptop speakers (not headphones), the mic captures the speaker output too — so Whisper transcribes the same speech twice, once on the cleaner others track and once as bleed on the me track. Two filters handle this:

The text-similarity dedup pass compares every me segment against others segments within ±5s (Whisper drifts a few seconds between tracks on long recordings). When the Szymkiewicz–Simpson overlap coefficient is ≥0.7, the me segment is dropped as a duplicate. Tunable via LOGNOTE_DEDUP_SIMILARITY_THRESHOLD / LOGNOTE_DEDUP_TIME_WINDOW; disable entirely with LOGNOTE_DEDUP_CROSS_TRACK=0.

The mic VAD pass is a belt-and-suspenders second filter for residual bleed that escapes the text-similarity heuristic. It uses ffmpeg silencedetect to find silence intervals on the mic track, then drops any me segment whose audio range was ≥70% silent — if your mic was silent during a span, anything Whisper transcribed there has to be acoustic leakage. Disable with LOGNOTE_MIC_VAD_ENABLED=0.
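Both filters can be sketched in a few lines. The overlap coefficient is |A ∩ B| / min(|A|, |B|) over word sets; the tokenizer, the choice to compare segment start times for the ±5s window, and all function names are assumptions. The silence intervals are assumed to be already parsed from ffmpeg silencedetect output as non-overlapping (start, end) pairs.

```python
import re


def tokens(text: str) -> set:
    """Lowercased word set; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def overlap_coefficient(a: str, b: str) -> float:
    """Szymkiewicz-Simpson overlap: shared words over the smaller set's size."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def drop_bleed(me, others, threshold=0.7, window=5.0):
    """Drop me segments that duplicate a nearby others segment."""
    kept = []
    for m in me:
        duplicate = any(
            abs(m["start"] - o["start"]) <= window
            and overlap_coefficient(m["text"], o["text"]) >= threshold
            for o in others
        )
        if not duplicate:
            kept.append(m)
    return kept


def silent_fraction(start, end, silences):
    """Fraction of [start, end] covered by non-overlapping silence intervals."""
    duration = end - start
    if duration <= 0:
        return 0.0
    covered = sum(
        max(0.0, min(end, s_end) - max(start, s_start))
        for s_start, s_end in silences
    )
    return min(1.0, covered / duration)


def drop_silent_mic(me, silences, max_silent=0.7):
    """Drop me segments whose audio span was mostly silence on the mic track."""
    return [
        m for m in me
        if silent_fraction(m["start"], m["end"], silences) < max_silent
    ]
```

Dividing by the smaller set is what makes the overlap coefficient suit bleed detection: a partial, muffled copy on the mic track still scores high against the fuller sentence on the others track.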

permissions

Permissions

Two TCC grants on first run, and how to re-test them.

Microphone and System Audio Recording permissions

The first time you hit record, macOS shows two click-Allow TCC prompts, both labeled Lognote: one for Microphone (the obvious one) and one for System Audio Recording (gates the CoreAudio process tap that captures audio from other apps). Two separate prompts because macOS treats them as distinct TCC services with different threat models — there’s no single combined grant.

Both prompts come from Info.plist usage descriptions on the bundled Lognote.app. Re-signing the bundle with the same identifier (com.shariqh.lognote.recorder) preserves prior grants across rebuilds, so setup.sh upgrades don’t make you re-grant. To re-test the permission flow:

tccutil reset Microphone   com.shariqh.lognote.recorder
tccutil reset AudioCapture com.shariqh.lognote.recorder

Note the tccutil service name for System Audio Recording is AudioCapture, not SystemAudioCapture or kTCCServiceAudioCapture — verified empirically on macOS 26.4.

install

Install, upgrade, uninstall

One command up, one command down. No drift.

One-command setup

./setup.sh is the single entry point. It verifies prereqs (Apple Silicon only — Intel is hard-rejected), brew-installs python@3.11 / ffmpeg / node if missing, builds the .venv with the pinned mlx-whisper + openai versions, builds the Swift Lognote.app bundle (preserving its bundle ID so prior TCC grants survive), builds the menu bar helper, builds the Obsidian plugin, pre-downloads the ~2 GB local-mlx default model, installs the ~/.local/bin/lognote-record-{start,stop} PATH wrappers, and runs an interactive config block for vault path and LLM provider credentials.

Re-running ./setup.sh is an upgrade — every step is idempotent. If setup.sh itself changed in the pull, it auto-re-execs the new version, so a single invocation always applies the latest install flow. Useful flags: --non-interactive (skip prompts, CI-friendly), --reconfigure (re-prompt even if config exists), --skip-pull (don’t git-pull, useful on feature branches), --skip-llm-download (defer the ~2 GB model fetch).

Distribution channels

Three ways to get lognote on your machine:

  • Homebrew: brew tap shariqh/tools && brew install lognote. The formula clones to ~/dev/lognote and runs setup.sh. Future updates: brew upgrade lognote.
  • One-line installer: curl -fsSL https://raw.githubusercontent.com/shariqh/lognote/main/install.sh | bash. Equivalent to git clone + setup.sh. Override the destination with LOGNOTE_INSTALL_DIR=...; pass setup flags via LOGNOTE_INSTALL_FLAGS="--non-interactive".
  • Git clone: git clone git@github.com:shariqh/lognote.git ~/dev/lognote && cd ~/dev/lognote && ./setup.sh. The most direct path; recommended if you want to track a branch or read the source before installing.

All three converge on setup.sh, so install behavior is identical regardless of which channel you used. While the repo is private, the Homebrew and curl paths need a GitHub token with repo scope (gh auth login once, then re-run) — this caveat disappears when the repo goes public.

Uninstall and reinstall

./setup.sh --uninstall walks through every integration lognote installed on your machine — PATH wrappers, plugin symlink in the vault, state directory, menu bar app, brew-installed dependencies, the cloned repo itself — and asks for confirmation on each one before removing it. Pass --non-interactive to auto-confirm every prompt (use carefully).

./setup.sh --reinstall is sugar for “uninstall non-interactively, then install” — useful when something’s wedged and you want to start from a clean slate. Crucially, neither flag ever touches your audio recordings or vault notes. Uninstall removes the tooling that produced them; the content itself stays where it is.

that's the lot

Missing something you expected to see? Email hello@lognote.dev. The handbook tracks the product, so if it's here it ships, and if it ships it should be here.