AI News: These Google Updates Are Dividing People

Matt Wolfe — Future Tools | Google IO 2026

URL: YouTube
Channel: Matt Wolfe — Future Tools
Event: Google IO 2026
Date: 2026-05-23

Tags: AI News Google IO 2026 Gemini AI Agents Video Generation Matt Wolfe

14 Key Takeaways

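  1. Gemini 3.5 Flash = a speed and cost leap: intelligence on par with Gemini 3.1 Pro and Claude Opus 4.7, more than 3x faster than GPT-5.5 and Claude Opus, priced at $1.50/M input tokens and $9/M output tokens
  2. Gemini Omni = any input to video: world-knowledge grounding, natural-language video editing, personal avatars, and 10-second video generation
  3. Gemini Spark = Google's answer to OpenClaw: runs on Google Cloud, has access to Gmail/Calendar/Drive, and supports MCP connectors (Canva, OpenTable, Instacart)
  4. Google's strategy shift: less focus on benchmarks, more on "how do we make AI actually useful?", aimed at everyday consumers and small businesses
  5. Avatar feature: selfie video → personal AI avatar → insert it into any video scene
  6. Project Genie + Street View: turns real-world locations into interactive AI game worlds
  7. Google AI glasses finally have a ship window: Google × Gentle Monster/Warby Parker, audio version first, in the fall
  8. SynthID watermark standard gains competitor support: OpenAI, ElevenLabs, Kakao, and others are adopting it
  9. Spotify-UMG deal: AI remixes pay royalties to the original artists; the music industry's first formal embrace of AI remixes
  10. Cursor Composer 2.5: performance close to Opus 4.7 at under $1 per task (vs. roughly $11 for Opus)
  11. OpenAI ChatGPT personal finance: connects to bank accounts via Plaid → gives advice based on real financial data → privacy risk raises concerns
  12. Andrej Karpathy joins Anthropic: OpenAI co-founder → Tesla → OpenAI → now Anthropic
  13. Boston Dynamics robot: can carry an entire fridge; automation for the last step of logistics
  14. Google's product lineup overlaps too much: Flow, Veo, and Omni all make video; Imagen and Nano Banana both make images; the user experience keeps getting more confusing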

Google IO 2026: Major Announcements

1. Gemini 3.5 Flash

Dimension | Data
Intelligence | On par with Gemini 3.1 Pro / Claude Opus 4.7
Speed | More than 2x faster than 3.1 Pro; more than 3x faster than GPT-5.5 / Claude Opus
Input price | $1.50/M tokens
Output price | $9/M tokens
Strongest area | Agentic benchmarks (ahead of Anthropic and OpenAI)
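
To make the price gap concrete, here is a minimal sketch that applies the per-million-token prices quoted in the video to one hypothetical workload. It takes the Flash input price as $1.50 per million tokens (an assumption; adjust it if the real figure differs), and the `PRICES` table, the `job_cost` helper, and the workload size are illustrative names and numbers, not anything from Google's documentation.

```python
# USD per million tokens as quoted in the video (unverified; treat as assumptions).
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),   # (input, output)
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the quoted per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 2_000_000, 500_000):.2f}")
```

At these prices the same workload costs $7.50 on Flash, $22.50 on Opus, and $25.00 on GPT-5.5, so the savings come from both the input and the output side.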

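2. Gemini Omni (video model)

Core capabilities: world-knowledge grounding, natural-language video editing, personal avatars generated from a selfie video, and 10-second video generation.

How it differs from Veo:

Model | Positioning
Veo | Pure text-to-video model
Gemini Omni | Interleaved multimodal input plus video editing; goes well beyond text-to-video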

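3. Gemini Spark (Google's agent)

Positioning: Google's answer to OpenClaw/Hermes, but it runs on Google Cloud (no dependence on a local machine, so it keeps running even when your computer is off)

Capabilities: works across Gmail, Google Calendar, and Google Drive; produces inbox summaries; compiles information into a Google Sheet; supports MCP connectors (Canva, OpenTable, Instacart) and uploaded skills.

CTO Karey interview (safety design):
Emails the agent sends to you: no approval needed. Emails to anyone external: require your approval. Any action that touches the outside world on your behalf: you stay in control.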
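
The approval rule described in the interview can be sketched as a small gate in front of the agent's actions. This is only an illustration of the idea, not Gemini Spark's actual design: `EmailAction`, `needs_approval`, `run`, and the `approve` callback are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EmailAction:
    owner: str       # the user the agent is acting for
    recipient: str   # where the email is going
    body: str


def needs_approval(action: EmailAction) -> bool:
    """Email to the agent's own user passes; anything sent to someone else is gated."""
    return action.recipient.lower() != action.owner.lower()


def run(action: EmailAction, approve: Callable[[EmailAction], bool]) -> str:
    """Send the email, asking the user first when the action reaches outside."""
    if needs_approval(action) and not approve(action):
        return "blocked"
    return "sent"


# Hypothetical usage: the user declines every approval request.
deny_all = lambda action: False
print(run(EmailAction("me@example.com", "me@example.com", "Inbox summary"), deny_all))
print(run(EmailAction("me@example.com", "bob@example.org", "Hello"), deny_all))
```

The point of the design is that the default is conservative: an outbound action only goes through when the user's callback says yes.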

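4. Other notable announcements

Product | Details
Universal Cart | Shopping cart that works across merchants; payments are routed to each merchant at checkout; price insights plus compatibility checks (e.g., a warning that a CPU and motherboard are incompatible)
Agent Payments Protocol | Lets AI agents pay on the user's behalf, with an adjustable trust level
Antigravity 2.0 | New IDE that looks like OpenAI Codex; built-in code generation
Gemini for Science | Suite of science tools integrating 30+ life-science databases (UniProt, AlphaFold, AlphaGenome)
Project Genie + Street View | Turns real map locations into interactive AI games (e.g., a monkey on roller skates near Petco Park in San Diego)
AI Glasses | Google × Gentle Monster/Warby Parker; audio version ships first, in the fall; a competitor to the Meta Ray-Bans
SynthID | Watermarking standard for AI-generated content; OpenAI, ElevenLabs, and Kakao are adopting it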
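
The Universal Cart behaviors described in the video (one cart, payments routed per merchant, a lowest-price-in-30-days note, and a CPU/motherboard compatibility warning) can be sketched roughly as follows. Everything here is an assumption made for illustration: `CartItem`, `split_payments`, `compatibility_warnings`, `is_30_day_low`, the socket-matching rule, and the sample data are hypothetical and do not reflect Google's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CartItem:
    merchant: str
    name: str
    price_cents: int                            # integer cents avoid float rounding
    attrs: dict = field(default_factory=dict)   # e.g. {"kind": "cpu", "socket": "AM5"}


def split_payments(items):
    """Total owed to each merchant, so one checkout can fan out into many payments."""
    totals = defaultdict(int)
    for item in items:
        totals[item.merchant] += item.price_cents
    return dict(totals)


def compatibility_warnings(items):
    """Flag CPU/motherboard pairs whose sockets differ (a deliberately simple rule)."""
    cpus = [i for i in items if i.attrs.get("kind") == "cpu"]
    boards = [i for i in items if i.attrs.get("kind") == "motherboard"]
    return [
        f"{cpu.name} ({cpu.attrs['socket']}) does not fit {board.name} ({board.attrs['socket']})"
        for cpu in cpus
        for board in boards
        if cpu.attrs["socket"] != board.attrs["socket"]
    ]


def is_30_day_low(current_cents, last_30_days_cents):
    """True when the current price is at or below every price seen in the window."""
    return all(current_cents <= past for past in last_30_days_cents)


# Hypothetical cart spanning two merchants.
cart = [
    CartItem("shop-a.example", "CPU X", 29900, {"kind": "cpu", "socket": "AM5"}),
    CartItem("shop-b.example", "Board Y", 18900, {"kind": "motherboard", "socket": "LGA1700"}),
    CartItem("shop-a.example", "HDMI cable", 1200),
]
print(split_payments(cart))
print(compatibility_warnings(cart))
print(is_30_day_low(29900, [31900, 30500, 29900]))
```

An agent checking out under the Agent Payments Protocol would presumably sit on top of something like `split_payments`, with the trust level deciding whether the per-merchant totals need a human sign-off.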

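Non-Google News

Andrej Karpathy Joins Anthropic

35:26 Career path: OpenAI co-founder → Tesla (Autopilot) → OpenAI → his own projects (AI education) → Anthropic

"He is the leading voice in LLMs, with 2.6 million followers on X, and anything he posts goes viral"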

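Spotify + UMG: AI Remix Licensing Deal

Spotify Personal Podcasts

Generates a personalized, private podcast from a prompt, drawing on your Spotify taste profile plus world knowledge. Example: "daily city updates + concert info for the artists I like"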

Cursor Composer 2.5

Dimension | Data
Terminal Bench | On par with Opus 4.7
SWE-bench Multilingual | Ahead of GPT-5.5
Cost | <$1/task (vs. ~$11 for Opus)

⚠️ OpenAI ChatGPT Personal Finance: Privacy Concerns

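Connects to real bank accounts via Plaid → gives advice based on actual financial data

Concern: OpenAI would know a user's spending habits, investment portfolio, and travel history. Could that data end up with advertisers?
"If OpenAI knows all my spending habits, my available funds, where I invest, and where I travel, what does that mean for ad targeting?"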

Elon Musk vs. Sam Altman Trial

Elon Musk lost. The court ruled that the statute of limitations had expired and that Musk had no substantive case.

Granola Meetings

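Before a meeting: automatically researches attendees' LinkedIn profiles and generates a briefing. After the meeting: automatic notes and a summary.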

Boston Dynamics Robot

Can carry an entire fridge. The open question: "Are the drinks in the fridge still cold? Clearly this fridge isn't plugged in."

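Matt Wolfe's Summary

"This Google IO felt less like they were chasing benchmarks and more like: how do we actually make this useful? How do we bake it into more products? How do we make it usable for everyday people and small businesses? How can we be more proactive? How can we make things just happen?"

30:16
"Google is in the 'throw spaghetti at the wall and see what sticks' phase. Six different Google tools can build a website; three can make video; two can make images. The user experience keeps getting more confusing."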

32:02

Discussion Questions

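  1. Google's product overlap: is it an opening for competitors (Bing?)?
  2. OpenAI connecting to bank accounts: is the privacy risk worth it?
  3. The Spotify-UMG deal: can the AI-music royalty model be replicated in other content domains?
  4. Gemini Spark vs. OpenClaw/Hermes: what are the trade-offs of running in the cloud versus running locally?
  5. Google IO's "usefulness turn": what does it mean for startups building at the AI application layer?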

Full Transcript (Selected)

0:00 Here's the AI news you might have missed this week. Quick spoiler alert, we're going to talk a lot about Google today. This week was Google IO and I did get to attend in person and see the keynote live.
0:36 They announced the Gemini 3.5 family of models, but the one that we actually got during Google IO is 3.5 Flash, which is their smaller, faster model. So, it's not the most intelligent model in the world, but it's faster and cheaper than their full-fledged Gemini 3.5 Pro. Where it really stands out is its speed to intelligence ratio. This new 3.5 Flash is pretty much on par with 3.1 Pro and Claude Opus 4.7 in terms of intelligence, but more than double as fast as 3.1 Pro and more than three times as fast as GPT 5.5 and Claude Opus. So it's a lot faster model for being similarly as smart. The input price for 3.5 Flash is $1.50 and an output price of $9 per million tokens. So $1.50 and $9 versus Claude Opus 4.7 being $5 input and $25 output and OpenAI's GPT 5.5 being $5 per million input tokens and $30 per million output tokens.
2:53 The second one that's more impressive is the new Gemini Omni model. This model is designed to create anything from any input. Right now you can input video and it will understand and edit that video for you. But in the future, you're going to be able to input audio, input images, and output any of those as well. We really think about it as the next step on our journey towards world models that can understand and generate anything. Right now you can put any modality into the model as input — images, videos, text, audio — and generate video. But in the future, we look for the model to support also image outputs and audio outputs as well as a path towards Gemini to be like truly multimodal.
4:04 The other thing that's really impressive about this Gemini Omni model is that it's grounded in world knowledge. So if you're using Nano Banana, you can say generate an image about how quantum computing works and it will generate an infographic type image, but it'll do the research and then bake in what it found into the image. Well, Omni kind of does that as well, but with videos.
6:50 There's this AI startup called Victor that plugs directly into your Slack so it can get stuff done for you and your team. Instead of typing a generic prompt like, "Help me find the latest AI news from the past week," Victor can actually go into my documents, look at past outlines and research to see what I'm looking for, search the internet, and generate an outline that I can actually use for my next video. Since Victor is already connected to the platforms you're already using, it can also do stuff like pull data directly from Google Ads, analyze performance, build a report, send it to your team in Slack, and even follow up on it all on its own.
12:36 Another big announcement that came out of Google this week was Gemini Spark. And this is Google's answer to like OpenClaw and Hermes, and it's them putting an agent out into the world. Instead of prompting things and getting responses back, it will actually go out and do actions on your behalf. Unlike Hermes and OpenClaw, which will run locally on your computer or on a VPS on a cloud computer somewhere, this one runs entirely on Google's servers. So it will always continue to run even if your own computers aren't online.
13:51 Since this has access to your Gmail, your Google calendar, and anything that's in your Google ecosystem, it can connect all those together and take actions with those tools. So some examples on their website: it'll give you an inbox summary from your email. It'll do things like look across your emails and your Google calendar and your Google Drive and then pull information together and put it all into a Google sheet for you.
14:48 If you've already built out skills for your OpenClaw or Hermes or Claude Code or Codex, are we going to be able to pull those over? And they said you can upload skills.
15:25 Everyone will have a different threshold there. It is more about designing a system where users are feeling in control for the kind of control that they want. And the agent there are of course like there are sort of like commonly accepted there are some safety thresholds but you don't want the agent crossing without you ever approving thing. For example right now we are fine with the agent sending you email right because it's your agent. But like if it needs to send an email to someone external it needs to ask for permission or if it needs to do anything that touches external world a little bit on your behalf you should be in control.
20:14 Google is apparently working on a new mode where instead of showing you a list of links, they're going to show you AI responses at the top. And then below that, there might be some links. The real question is going to be how is Google balancing the AI response with actually organic links and incentivizing people to actually click away from Google?
22:16 Google introduced what they're calling Universal Cart. It's a shopping cart that actually works across merchants and across services. You could be browsing the web looking for different products. You find one product on one Shopify store, you add it to your cart. You go to a different store and find a different product on a different Shopify store or on a completely different website, target.com or whatever. You find something you like there, you add it to your cart there and you just browse the web and you're adding stuff to your universal cart. And when you're done, you check out and then it just kind of sends the payments to all of the places for you. This cart has intelligent features like it'll show things like this is the lowest price it's seen in 30 days. It might suggest you use your Amex card to get 3% cash back. And it will actually double check and confirm if it thinks you're making a bad choice. For example, the CPU and motherboard that are in your cart are not compatible.
23:29 They also created the Agent Payments Protocol, which is going to allow your AI agents to go and check out in these carts on your behalf. And they're making it so you can kind of adjust the level of trust you have for your agent to go buy things on your behalf.
25:41 I've always thought Project Genie was fun. It's where you could sort of prompt almost like a video game world that you can interact with for like 60 seconds at a time. Well, this new one, you can actually do the same thing, but put it in real locations using Street View from Google. So for our character, let's do a monkey on roller skates. And look at this. We've got our monkey on roller skates skating around near Petco Park in San Diego. And it's actually grounded by real world map data. I mean, just three years ago I was playing with ModelScope and Zeroscope and I'd generate a monkey on roller skates and you could barely tell what it even was. And here we are. Fast forward just a couple years, now I have a monkey on roller skates skating around in downtown San Diego.
27:47 We finally have something that's going to get into our hands and we kind of know when it's going to get in our hands. Google is partnering with Gentle Monster and Warby Parker to put their version of AI glasses out. There's going to be ones that have the camera and the audio, very similar to the Meta Ray-Bans, and then there's going to be the ones that have the display. The audio version is going to launch first, and they're coming in fall. We still don't know when the ones with the display are coming. You can do a lot of the stuff that you can do with the Meta Ray-Bans on these glasses, but they almost sound like they're more designed to be on all of the time and sort of giving you things in your ear, like when you get a new text message.
29:13 They announced that they're going to make it easier to understand how content was created and edited. Their SynthID embeds imperceptible signals into AI generated content. But they actually announced that other companies are going to adopt this standard, including competitor OpenAI, ElevenLabs, Kakao, and more companies to come.
30:16 If I was to sum up Google IO and what they were trying to accomplish this year, it felt less like they were trying to benchmark-max and try to get the best absolute model on all of the benchmarks this time around and more like they're hunting for how do we actually make this useful? How do we bake this into more products and find more ways for people to actually use these AI tools without having to think about it? How can we be more proactive? How can we provide like daily briefings to people automatically? How can we create agents so things just happen on people's behalf?
31:54 There's six different Google tools right now that have AI baked into them that I can go to to build my website. When it comes to videos, you've got Flow and you've got Veo and you've got Gemini Omni. When it comes to images, Google had Imagen and then they had Nano Banana. When it comes to coding, you can generate code over in AI Studio or you can use Antigravity or it'll now generate code straight in AI search or you could do the code over on the Gemini app if you want. Like there's so many different platforms and there's a lot of like overlapping use cases on a lot of these platforms. And I think where to go to do certain things is sort of getting convoluted and confusing. And I think that's what Google's going to have to figure out.
35:26 This is interesting because Andrej Karpathy was one of the original co-founders of OpenAI. He helped develop LLMs as we know them. At one point, he was poached away from OpenAI and went to Tesla and helped develop a bunch of the AI over at Tesla. He left Tesla, came back to OpenAI, worked at OpenAI for a couple more years, and then decided to leave OpenAI to focus on his own projects. And to hear that he now joined Anthropic is just kind of crazy because of all of his history over at OpenAI.
36:45 Spotify and Universal Music Group announced a licensing agreement for fan-made covers and remixes. This is like the music industry getting on board with AI and saying, "We're going to let our artists be remixed with AI." It does say it will open up additional revenue streams, meaning if you remix an artist, that artist is actually going to get paid for those streams of songs that were remixed.
37:52 You're actually going to be able to use AI to ask questions about podcasts. Users can ask Spotify questions about the podcast they're listening to or watching and get answers in real time. But they're also letting you create your own AI generated podcast, sort of competing with something like NotebookLM. They're releasing something called Personal Podcasts, which is an experience that lets listeners generate and schedule short private audio episodes tailored to their interests and their listening habits. So all you need to do is write a prompt and Spotify generates personalized private audio based on your input. It draws on world knowledge and your Spotify taste profile to create something relevant and uniquely yours.
39:29 If you're a fan of Cursor, they released a new model that's really impressive this week called Composer 2.5. When it comes to coding, it's pretty on par with state-of-the-art models, performing about as good as Opus 4.7 on Terminal Bench. On SWE-bench Multilingual, it's about on par with Opus and even better than GPT 5.5. But the biggest deal is the cost here. Composer 2.5 is almost as good as Opus 4.7 on max, almost as good as GPT 5.5, but Opus is way up here at like $11. GPT 5.5 is like I don't know $4.50 or something. And Composer 2.5 is way down here at like less than a dollar per task. So it's nearly as good as these other models, just way way cheaper to use.
42:11 Boston Dynamics showed their robot that can pick up and carry a fridge around. So instead of asking the robot to go to the fridge and grab you a drink, this robot goes to the fridge and will bring you the whole dang fridge. It'll just walk it right over to you.