SSPAI Morning Brief: Google Launches Gemma 4 Open Models, Zhipu Unveils GLM-5V-Turbo Multimodal AI, and More Tech News


SSPAI Editorial Team

Morning Brief

  1. Google releases Gemma 4 open-source model series
  2. Zhipu unveils GLM-5V-Turbo multimodal model
  3. Google to require all Wear OS watch apps to support 64-bit
  4. China Radio and Television Association Actors Committee issues statement on AI face-swapping and voice cloning infringement
  5. News Worth a Quick Look

Google releases Gemma 4 open-source model series

On April 2, Google announced the new-generation open-source model family Gemma 4, positioning it as one of the most capable open-source model lineups to date. Built on the Gemini technology stack, the series emphasizes “intelligence per parameter” and the ability to run locally.

Gemma 4 comes in four variants: E2B, E4B, 26B MoE, and 31B Dense, covering deployment needs from mobile devices to high-performance GPUs. Among them, the 31B model ranks among the top three open-source models on the Arena AI leaderboard, while the 26B model ranks sixth, outperforming some models with roughly 20× more parameters.

In terms of technical capabilities, Gemma 4 supports up to a 256K context window (128K for edge-side models) and offers multimodal processing, allowing inputs such as images, videos, and audio. The model natively supports function calling, structured JSON output, and system instructions, making it suitable for agent workflow development while strengthening code generation capabilities. Gemma 4 is released under the Apache 2.0 open-source license and is compatible with mainstream toolchains such as Hugging Face, Ollama, and vLLM, supporting deployment on both local devices and cloud environments.
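Since Gemma 4 is said to support function calling and structured JSON output through runtimes like Ollama and vLLM, a request for it would follow the familiar OpenAI-style chat schema. The sketch below is a hedged illustration only: the model tag `gemma4:26b`, the `get_weather` tool, and the exact field names are assumptions, not confirmed API details.

```python
import json

def build_tool_call_request(model, user_msg):
    """Assemble a chat request that declares one callable tool and asks
    for structured JSON output (OpenAI-style schema, as commonly exposed
    by runtimes such as Ollama or vLLM -- field names are assumptions)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "format": "json",  # ask the runtime for structured JSON output
    }

payload = build_tool_call_request("gemma4:26b", "What's the weather in Taipei?")
print(json.dumps(payload, indent=2))
```

A model with native function-calling support would respond to such a request with a tool-call object (name plus JSON arguments) rather than free-form text, which is what makes it usable in agent workflows.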

Google stated that Gemma 4 supports more than 140 languages and targets use cases across Android devices, IoT, and scientific research, aiming to further drive the adoption of AI in mobile and edge computing environments. Source


Zhipu unveils GLM-5V-Turbo multimodal model

On April 2, Zhipu introduced the vision-language model GLM-5V-Turbo, aiming to address the trade-off between visual understanding and code generation performance.

The model adopts a native multimodal fusion design, using the CogViT visual encoder to directly process images, videos, and complex document layouts. Combined with a Multi-Token Prediction (MTP) architecture, it improves inference efficiency and long-code generation capabilities, supporting up to a 200K context window. To avoid the “seesaw effect” between visual and programming capabilities, the model is trained with joint reinforcement learning across more than 30 tasks, achieving balanced performance in STEM reasoning, visual grounding, video analysis, and tool use.

GLM-5V-Turbo is deeply optimized for agent scenarios, with key integrations into OpenClaw and Claude Code workflows. It can generate code based on visual inputs and perform UI interactions. Benchmark results across CC-Bench-V2, ZClawBench, and ClawEval indicate strong performance in multimodal programming, GUI interaction, and multi-step task execution. Source


Google to require all Wear OS watch apps to support 64-bit

On April 2, Google announced that it will extend its long-standing 64-bit app transition policy from Android mobile to the Wear OS smartwatch platform, requiring developers to provide 64-bit versions of their apps starting in September.

Beginning this September, all new Wear OS apps and updates that include native code must provide both 32-bit and 64-bit versions when submitted to the Play Store; apps that fail to meet this requirement will not be accepted through the Play Console. For now, support for existing 32-bit apps remains unchanged, meaning devices that rely on 32-bit processors or ship with a 32-bit build of Wear OS will continue to run these apps normally. Source


China Radio and Television Association Actors Committee issues statement on AI face-swapping and voice cloning infringement

In response to the increasing number of infringement cases involving AI face-swapping, voice cloning, manipulation of film and television materials, and the unauthorized scraping of actors’ images and audio for model training, the Actors Committee of the China Radio and Television Social Organizations Federation has issued a statement emphasizing that performers legally hold rights to their likeness, voice, and artistic image. It states that no individual or entity may collect, use, or distribute such content without written authorization. It also points out that even if labeled as “non-commercial” or “for public benefit,” activities such as AI face imitation, voice mimicry, or face-swapped short dramas involving specific actors still constitute infringement and carry legal liability.

The statement further calls on short video, livestreaming, and film platforms to strengthen content review mechanisms, conduct comprehensive investigations, and remove infringing works. AI technology platforms are also required to verify authorization for training materials. The Actors Committee stated it will initiate ongoing infringement monitoring and rights protection efforts, while supporting the development of AI technologies under compliant conditions and advocating for a unified authorization and revenue-sharing mechanism. Source


News Worth a Quick Look

  • According to Korean media outlet Etnews, Samsung plans to continue using M13 material OLED panels in its upcoming Galaxy Z Fold 8, Z Flip 8, and a new “wide foldable” device scheduled for release in the second half of the year. Since debuting with the Galaxy S24 series, M13 materials have been used across multiple flagship generations, including the Z Fold 6/Flip 6, S25 series, Z Fold 7/Flip 7, and the standard and Plus versions of the Galaxy S26 released in February this year. The S26 Ultra, by contrast, upgrades to M14 materials. Source
  • On April 2, Google announced upgrades to its $20-per-month AI Pro subscription. Cloud storage has been increased from 2 TB to 5 TB; Gemini capabilities have been further enhanced to pull contextual information from Gmail and the web for downstream tasks. Gemini can also summarize emails and proofread messages before sending. Additionally, the subscription offers an annual plan priced at $200. Source
  • Leaker KeplerL2 posted on the NeoGAF forum on March 31, claiming that Sony’s PlayStation 6 handheld (codename Project Canis) will surpass Microsoft’s current Xbox Series S in both traditional rasterization and ray tracing performance. In terms of core specifications, current reports suggest the device will use a 3nm process from TSMC, with a chip size of just 135 mm². It is said to feature 4 Zen 6c cores and 2 low-power Zen 6 cores, paired with 16 RDNA 5 compute units and up to 24GB of LPDDR5X memory. Source
