Voice input and dictation: how to turn speech into text in Obsidian and beyond

In the era of digital notes, voice input is becoming more and more in demand: speech lets you quickly capture thoughts "on the go", without being distracted by typing.

This is especially relevant for productivity and personal work – for example, when you’re listening to a lecture or reflecting on what you’ve read and don’t want to lose a fleeting idea.

Below we’ll go through the popular dictation methods, their pros and cons, typical use cases and integrations with Obsidian.

A comparison of voice-input tools (summary)

Tool / service	Ease of use	Platform	Integration with Obsidian
iOS dictation	very easy (keyboard)	iPhone/iPad	built-in Clipboard, Shortcuts
Android dictation	very easy (keyboard)	Android	interceptor apps, Shortcuts
Third-party keyboards	medium (need to install)	Android/iOS (Gboard/Yandex)	not directly
Telegram (bots/premium)	easy (send a voice message)	iOS/Android/Web (Telegram)	via the Telegram Sync plugin
Siri/Shortcuts	easy (universal)	iOS/macOS	Shortcuts for Obsidian URI
Windows dictation	very easy (Win+H)	Windows 10/11	works in any app
macOS dictation	easy (Fn twice)	macOS	works in any field
Google Docs Voice	easy (Chrome Tools)	Windows/macOS/Linux (Chrome)	export to Markdown via Copy
Extensions (Voice In)	easy (Chrome ext.)	Any OS (Chrome)	input in any browser field
Apps (Speechnotes, etc.)	easy (start)	Windows/macOS/Android	copy/transfer manually
Otter.ai	medium (registration)	Web/iOS/Android	can export the text
Whisper (CLI/service)	medium (command line)	Any (Python/CLI)	via scripts or plugins
NotebookLM (Google)	medium (web service)	Web	none directly, you can feed it text
Whisper API (plugin)	easy (install, API)	Obsidian	works inside Obsidian (community plugin)
GPT Assistant (plugin)	easy (install, API)	Obsidian	generates answers based on notes
Telegram Sync (plugin)	easy (bot+token)	Obsidian	saves voice messages and text

📱 Mobile input

1️⃣ Built-in features:

On smartphones and tablets, most keyboards have a microphone button — just touch it and start speaking.

The system detects the end of the phrase on its own, or you can press “Done”.

Voice input on iOS and Android supports many languages and the usual punctuation (say “period”, “comma”, etc.).
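
To make the idea of punctuation commands concrete, here is a minimal Python sketch that replaces spoken command words with marks in text that has already been transcribed. The command table and the function name `apply_spoken_punctuation` are assumptions for illustration only; real keyboards do this internally, support more commands, and are smarter about context (this sketch would also replace “period” in “a period of time”).

```python
import re

# Hypothetical command table; real dictation engines support many more commands.
SPOKEN_MARKS = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation commands with the marks they stand for."""
    # Longer commands first, so "question mark" is matched as one unit.
    commands = sorted(SPOKEN_MARKS, key=len, reverse=True)
    pattern = re.compile(
        r"\s*\b(" + "|".join(map(re.escape, commands)) + r")\b",
        re.IGNORECASE,
    )
    # The leading \s* removes the space before the command word.
    result = pattern.sub(lambda m: SPOKEN_MARKS[m.group(1).lower()], text)
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        result,
    )


print(apply_spoken_punctuation("buy milk comma then call mom period it is late period"))
# prints: Buy milk, then call mom. It is late.
```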

✔️ Pros:

always at hand,
requires no third-party apps,
well suited for quick notes.

🔴 Cons:

you need to fix the punctuation or format the text by voice;
phrases are taken literally, so you get one continuous stream of text with no division into sections.

One member of our chat noted:

“You won’t see a note in an audio file – you can’t skim it with your eyes, you can’t find what you need”.

That is, when dictating straight into text you lose the audio recording, but you get text that you can search and link.

2️⃣ Third-party keyboards (Gboard, Yandex, etc.).

The Google keyboard (Gboard) and others also let you dictate by voice.

Setup:

add the layout you need and press the microphone on the keyboard.
They support additional voice commands and are often more focused on accuracy (Yandex.Keyboard, for example, is optimised for Russian).
It’s simple and familiar to most users, although technically there’s no tight integration with Obsidian here – you’ll have to copy the text.

3️⃣ Telegram bots and messages

On the go you can send a voice message to yourself or to a bot in Telegram. We covered earlier how to sync a Telegram bot that sends text or audio straight into your Obsidian vault; see the article on Obsidian sync methods.

🎙️ If you have “Transcription of voice messages” enabled in Telegram Premium, a “Text” button will appear under each voice message: tapping it gives us a ready transcript.

In the Obsidian developers’ chat they suggest another scenario:

record a message on iPhone (e.g. with the “Voice Memos” app),
forward it to Telegram,
the bot/plugin automatically transcribes it.

For example, the Telegram Sync plugin can automatically save the text of voice messages (with the paid Premium transcription) into note files. This approach is convenient if you already actively use Telegram for notes.
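
Whatever tool does the transcription, the last step of this pipeline is the same: the text lands in a Markdown file in the vault. Below is a minimal Python sketch of that step. It is not how the Telegram Sync plugin is implemented; the helper name `append_to_daily_note` and the one-note-per-day layout are assumptions for illustration.

```python
from datetime import datetime
from pathlib import Path


def append_to_daily_note(vault: Path, transcript: str, when: datetime) -> Path:
    """Append a timestamped transcript line to the daily note for `when`."""
    # Assumed layout: one note per day in the vault root, named YYYY-MM-DD.md.
    note = vault / f"{when:%Y-%m-%d}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier entries from the same day.
    with note.open("a", encoding="utf-8") as f:
        f.write(f"- {when:%H:%M} {transcript.strip()}\n")
    return note
```

Calling it twice on the same day produces a single note with two bullet lines, which keeps voice captures searchable and linkable like any other text.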

4️⃣ Shortcuts (Siri, Google Assistant).

iOS and Android have voice assistants (Siri, Google Assistant).

You can, for example, create Shortcuts:

add a dictation action (“Dictate Text”) to a Shortcut, so that with a button press or by voice (“Siri, dictate an Obsidian note”) you immediately record text into a note.
There are also community solutions for Obsidian: with the Advanced URI plugin you can trigger the creation of a new note and the insertion of text.
For Android there’s a similar scheme: you can ask Google Assistant to “Take a note [text]”, and then move it into Obsidian.
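
To show what such a Shortcut ultimately opens, here is a minimal Python sketch that builds a note-creating link. It uses Obsidian’s core `obsidian://new` action with the `vault`, `name` and `content` parameters rather than the Advanced URI plugin, whose parameters differ. The vault name, note name and the function `build_new_note_uri` are made up for the example.

```python
from urllib.parse import quote


def build_new_note_uri(vault: str, name: str, content: str) -> str:
    """Build an obsidian://new URI that creates a note with the given content."""
    params = {"vault": vault, "name": name, "content": content}
    # Percent-encode every value; safe="" also encodes "/" and "&".
    query = "&".join(
        f"{key}={quote(value, safe='')}" for key, value in params.items()
    )
    return f"obsidian://new?{query}"


print(build_new_note_uri("Notes", "Voice idea", "Call the client & send the draft"))
# prints: obsidian://new?vault=Notes&name=Voice%20idea&content=Call%20the%20client%20%26%20send%20the%20draft
```

In a Shortcut the dictated text would take the place of `content`, and the resulting URL would be passed to an “Open URLs” step.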

💻 Desktop input (on the computer)

1️⃣ Google Docs – Dictation

In the Chrome browser, open Google Docs,
choose Tools → Voice typing in the menu.
A microphone will appear: press it and speak.
Google’s speech recognition supports many languages (e.g. Russian, ru-RU), so speech is usually transcribed with good accuracy.

✔️ Pros:

often works even through a poor microphone,
handles punctuation,
free (up to a limit).

🔴 Cons:

only in the browser,
you need internet, and you then have to copy the text into Obsidian (or sync via your own method).

A simple example:

you dictate a whole draft of an article or a lecture, and then transfer the result into Markdown.

2️⃣ Dictation in macOS

On a Mac, in System Settings → Keyboard → Dictation, enable the feature and choose your language.

Dictation starts with a double press of Fn (or another assigned key).
After that you can dictate any notes.

✔️ Advantages:

works in all apps (including Obsidian),
supports the commands “comma”, “period” and some actions (delete that, new paragraph, etc.).

One user noted:

in macOS you can achieve very accurate recognition: the main thing is to choose the right language in the settings.

🔴 Cons:

also requires an internet connection by default (macOS has an offline mode, “Enhanced Dictation”, but it handles some languages worse);
dictation has to be started explicitly, with the shortcut or the voice command “Start dictating”.

3️⃣ Built-in Windows voice input.

Windows 10/11 has a dictation system: just place the cursor in a text field and press the combination Win+H.

A voice-input window will appear; speak, and the words will appear in the document. Windows 11 officially supports dictation in many languages.

✔️ Advantages:

works in any app with a text field, including Obsidian.

🔴 Disadvantages:

sometimes places periods and commas incorrectly,
to “stop” you have to say “Stop listening” or press a button.

🤖 Specialised apps

1️⃣ Speechnotes

(an online notepad and Android app)

About the app:

Speechnotes uses Google’s technology and supports many languages. On the site or in the app you can dictate a note right away – everything is saved automatically.

✔️ Pros:

a focus on dictation (there are punctuation commands, auto-correction),
can be used for free.

🔴 Cons:

you have to copy the text from the browser into Obsidian.

The Voice In – Speech-To-Text extension for Chrome works in a similar way: it adds voice input to any site.

According to the developers, Voice In lets you “dictate without a keyboard” on more than 10,000 sites, including Google Docs, Gmail, ChatGPT, etc. It’s very convenient if you often type from the browser: you speak into any input field.

2️⃣ Whisper (CLI and services).

OpenAI Whisper is a free, open-source model for transcribing audio. It supports many languages and copes well with noisy recordings and accents.

You can run Whisper on your own machine (there’s a Python package, Docker images and whisper.cpp for offline use) or use third-party services (e.g. servers based on Faster-Whisper).
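
As a rough illustration of the local route, here is a minimal Python sketch using the `openai-whisper` package (`whisper.load_model` and `model.transcribe`). The function names, the file paths and the choice of the `base` model are assumptions for the example; larger models are more accurate but slower, and the package needs ffmpeg installed.

```python
from pathlib import Path


def save_transcript(text: str, note_path: str) -> str:
    """Write the transcript to a Markdown file and return the cleaned text."""
    cleaned = text.strip()
    Path(note_path).write_text(cleaned + "\n", encoding="utf-8")
    return cleaned


def transcribe_to_note(audio_path: str, note_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with a local Whisper model and save it as a note."""
    # Imported here so the module loads even where openai-whisper is not installed.
    import whisper  # pip install openai-whisper (also needs ffmpeg on the PATH)

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # the language is detected automatically
    return save_transcript(result["text"], note_path)
```

A call such as `transcribe_to_note("memo.m4a", "Vault/Inbox/memo.md")` (both paths hypothetical) would leave a plain Markdown note that Obsidian picks up like any other file.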

3️⃣ Wispr Flow – an AI dictation keyboard

Wispr Flow is an app for iOS, macOS and Windows that turns your speech into text in any input field, including Obsidian, browsers, messengers, IDEs, email clients and other tools.

✔️ Pros:

Works everywhere: Obsidian, Telegram, VS Code, Gmail, etc.
Roughly 4–5 times faster than typing by the vendor’s own figures (Flow promises ~220 words/min vs ~45 typed)
Processes speech locally + AI commands + auto-editing of text
They promise solid privacy: the data isn’t used to train AI unless you explicitly enable it

🔴 Cons / notes:

The installation weighs about 800 MB, uses ≈ 8% CPU even when idle, and constantly runs in the background (adds itself to autostart)
Users complain about the intrusion into context menus, the monitoring of apps (Firefox/Chrome), the lack of transparency about data collection
There are security questions: it’s unknown how and what exactly is transmitted, there are no clear boundaries

⭐ In Obsidian there are plugins and scripts available: for example, the Whisper API plugin lets you record right in a note or upload an audio file, and it creates a transcript.

It’s a powerful method: you just speak, and the text appears automatically.

Just keep in mind that large audio recordings take longer to process.

Whisper’s advantage is that it is multilingual and can work offline (if you install a local model).

Combine approaches?

For example, you can combine quick voice capture on the go with more careful dictation when the precise wording matters.
Some find it more convenient to write by hand, while others find that their thoughts flow best when spoken aloud.

Experiment: try different plugins and services to find your balance of speed and accuracy.

The main thing is for your voice to become a helper in your notes, not a hindrance.

What experts and tech blogs advise

If you believe the reviews, the list looks like this:

Rev — the top for transcription quality
Dragon Anywhere — insanely accurate, but expensive
Descript — perfect for video and podcasts
Google Voice Typing / Word dictation — for the lazy, but fast
Speechnotes / Braina Pro — underrated, but convenient

What we choose, in the Obsidian & Mind Club

🥇 Wispr Flow — insanely fast, almost like a thought

“It’s inserted into any program. Works like magic”.
“I write in Obsidian by voice and don’t stress”.
— Club members

Pros: universal, works everywhere, awesomely accurate
Cons: eats up resources, constantly in the background, touches browsers

🥈 Telegram Sync + Whisper or Premium dictation

“Dictated — got a note. Simple. Convenient.”

Pros: native, mobile, integration with Obsidian
Cons: Telegram isn’t always stable, takes getting used to

🥉 Whisper CLI / API / plugins

“Whisper is ChatGPT, only for sound. It works even when everything else is lagging”.

Pros: works offline, accuracy is top-notch
Cons: requires skills, not for beginners

🏅 Siri Shortcuts + Obsidian URI

“Said it — it was created”.
“Privacy to the max. All thoughts — inside the device”.

Pros: autonomous, no internet, customisable
Cons: iOS only, you need to set up a Shortcut


Elton Labs
