
This is material from an earlier video — the interfaces and features of AI services may have changed since. The idea and the approach still hold.

🧠 What if you had an assistant that instantly understood text, photos, PDFs and even video? That's Gemini.


What kind of model is it?

Gemini is Google’s family of multimodal models. Unlike many other AI models, it was trained on text, images, video and audio together from the start.

That makes it especially flexible at understanding and analysing information.

🧠 Multimodality — what does it mean?

You can upload to the chat:

  • a PDF (a lecture, report, book),
  • a screenshot or a photo of a whiteboard,
  • a link to a YouTube video,
  • or just ask a question out loud.

Gemini will combine it all, understand it, and give you a meaningful answer.


🔍 How is Gemini different from ChatGPT?

| Parameter | Gemini 2.5 Pro | ChatGPT (GPT-4) |
| --- | --- | --- |
| Multimodality | ✅ yes | ⚠️ limited |
| Context (tokens) | up to 1,000,000 | up to 128,000 |
| Speed | Fast (Flash model) | Depends on plan |
| Google integration | Full (Drive, Gmail) | ❌ none |
| Working with video and PDF | ✅ works great | ⚠️ partial |
| Conversational language | 🤔 average | ✅ good |

Bottom line: Gemini suits you better if you work with documents, video, presentations and Google’s cloud.
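
To get a feel for what the context numbers in the comparison mean in practice, here is a minimal sketch that estimates whether a document fits. The limits are the ones from the comparison; the "about 4 characters per token" rule is a rough assumption for English text, not an exact tokenizer, and the function names are made up for this illustration.

```python
# Context limits as listed in the comparison (in tokens).
CONTEXT_LIMITS = {
    "Gemini 2.5 Pro": 1_000_000,
    "ChatGPT (GPT-4)": 128_000,
}

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by language and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from the character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str) -> dict[str, bool]:
    """Report, per model, whether the estimated size is within the limit."""
    tokens = estimate_tokens(text)
    return {model: tokens <= limit for model, limit in CONTEXT_LIMITS.items()}


if __name__ == "__main__":
    # A stand-in for a long report: 2,000,000 characters.
    report = "word " * 400_000
    print(estimate_tokens(report))  # 500000
    print(fits_in_context(report))
    # {'Gemini 2.5 Pro': True, 'ChatGPT (GPT-4)': False}
```

With these assumptions, a report of about two million characters fits into the larger window in one go, while the smaller one would need the text split into parts.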


📥 Example prompt for analysing a PDF:

You are a marketing expert. Read this PDF and:
1. Find the 3 main problems.
2. Suggest 2 solutions.
3. Make a table with a brief summary.
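
If you reuse this kind of prompt often, it can help to treat it as a template: a role, a source, and a numbered list of tasks. The sketch below is one possible way to assemble it; `build_prompt` and its parameters are hypothetical names for this illustration, and it only builds the text, it does not send anything to Gemini.

```python
def build_prompt(role: str, source: str, tasks: list[str]) -> str:
    """Assemble a role line followed by a numbered list of tasks."""
    header = f"You are {role}. Read this {source} and:"
    steps = [f"{i}. {task}" for i, task in enumerate(tasks, start=1)]
    return "\n".join([header, *steps])


if __name__ == "__main__":
    prompt = build_prompt(
        role="a marketing expert",
        source="PDF",
        tasks=[
            "Find the 3 main problems.",
            "Suggest 2 solutions.",
            "Make a table with a brief summary.",
        ],
    )
    # Prints the same prompt as the example above.
    print(prompt)
```

Swapping the role or the task list gives you a new prompt with the same structure, for example a lawyer reading a contract or a teacher reading a lecture.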

🧭 When to use Gemini?

| Situation | What I use |
| --- | --- |
| Quickly analyse a YouTube video | ✅ Gemini |
| Working with Gmail, Drive, Docs | ✅ Gemini |
| Writing creative text | 🤔 ChatGPT is better |
| Automating spreadsheets | ✅ Gemini |
| Building a game or a quiz | ✅ Gemini (Canvas Mode) |
| Translation or grammar fixes | ✅ Gemini / ChatGPT |

📎 What’s next?

In the upcoming articles:

  • 🎮 How to build educational games with Gemini

  • 🧩 How NotebookLM works and why you need it

  • 📊 How to make habit trackers and visualisations in Google Sheets

**All the prompts, templates and cases are in the Prompts section.**

