
Reflections on AI Frontiers

Today, I happened to watch an interview with OpenAI's former Chief Scientist Bob McGrew (Chinese version), which sparked some new ideas. I'd like to share some recent thoughts, mainly revolving around the themes discussed in the interview.

The Limits of LLMs

The first major topic is still LLM. The opening question was: "What are its limits, or has it encountered any bottlenecks?" Bob's response focused on two aspects: computational power and reinforcement learning.

I've mentioned the role and importance of reinforcement learning in many previous blogs. I understand it as the learning of "rules," which are very abstract and involve deduction and adaptation. On this note, I'd add that I've never really paid attention to DPO because I don't think it's comparable to PPO; they operate at different levels, and I agree more with PPO's design philosophy. Indeed, DPO has often been regarded as falling outside reinforcement learning proper.

Regarding computational power, Bob mentioned o1 but didn't elaborate on it; he just noted that pre-training requires significantly increased computational resources (more GPUs, more data). This is undeniable: the scaling law remains effective.

However, when discussing o1 later on, Bob provided new insights for me.

Previously, I believed o1 was only applicable in limited scenarios because its goal is to solve complex problems (in fact, Bob thinks so too; he said most people other than programmers don't encounter a need for o1 in their daily work). We know complex problems can often be broken down into a series of subtasks. This is somewhat similar to Agents, and Bob also sees it this way: he thinks o1 involves not just problem-solving but also planning and execution. "Agent" is an overused concept, and it doesn't fully capture o1 either. We agree on this point.

What I hadn't considered was the reasoning direction (possibly because I haven't truly used o1 myself). Bob mentioned that GPT-4o has a few seconds of thinking time, while o1 ranges from 30 seconds to several minutes, or even hours or days. He calls this change "expansion"; it essentially postpones "learning" to inference time. We can view this process as the model supplementing its own context. This aligns with Altman's earlier interview remark that prompts would become obsolete (also noted at the end of Chapter 1 of the Butterfly Book). The value of o1 lies in its extensibility: it begins to think more rather than just remember. Its combination with reinforcement learning might be an overlooked yet potentially revolutionary mix.

Applications of LLMs

Returning to practical applications, there's quite a bit to cover here, starting with Bob's perspective.

An LLM seems like a hammer capable of performing numerous tasks; however, reliability is the pressing concern. We want AI assistance without errors or task deviations, which in certain circumstances could lead to severe consequences. And according to the reliability rule of thumb Bob cites, improving reliability by one "9" requires a tenfold increase in computation cost.
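The "9s" rule above can be sketched numerically. This is a hedged illustration of the rule of thumb, not an exact law; the `base_cost` normalization and the `compute_cost` function are my own hypothetical framing:

```python
# Illustration of the rule of thumb mentioned above (an assumption, not an
# exact law): each extra "9" of reliability costs roughly 10x more compute.

def compute_cost(nines: int, base_cost: float = 1.0) -> float:
    """Relative compute cost to reach `nines` nines of reliability.

    `base_cost` is the (hypothetical) cost of reaching the first nine (90%).
    """
    return base_cost * 10 ** (nines - 1)

for n in range(1, 5):
    reliability = 1 - 10 ** (-n)  # 0.9, 0.99, 0.999, 0.9999
    print(f"{reliability:.4%} reliable -> {compute_cost(n):>6.0f}x cost")
```

Under this assumption, going from 99% to 99.99% reliable multiplies the compute bill by a hundred, which is why "almost always right" and "dependable" are economically very different products.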

Beyond reliability comes the complexity of real-world context: not merely externally provided knowledge, but the entire internal environment, including colleagues, projects, codebases, past attempts, likes and dislikes, all of which need to be integrated into the data the model can draw on.

Additionally, computer agents, akin to next-generation PCs, may quietly alter interaction modes by handling myriad complex tasks. They could be designed generically to complete default jobs and then be customized after purchase. Where such agents hit unsolvable issues, programmatic methods will still be necessary; overall, in Bob's view, multiple methodologies will coexist for the long term.

Having covered Bob's views, let's discuss mine.

Reliability lacks much discussion at the moment. The current trend is to incorporate humans into the process: the model continuously outputs results, a human confirms them and supplies feedback, and execution continues. This will inevitably spawn new products and applications, gradually overturning existing designs and re-centering them around LLMs. The transition from old to new will take a long time, but the larger trend is unchanged.

Next, complex context. Undertaking the tasks described above, I immediately recalled a recent annual review article on intelligent development at Alibaba. It highlighted the difficulty of automating highly vertical demands: when descriptions are insufficiently understood, large-model implementation is hindered, and ironically the models end up better suited to non-descriptive scenarios. This aligns with Bob's notion of real-world contextual complexity. Additional issues include prematurely raising user expectations in scenarios the technology cannot yet achieve (with negative impact), unresolved engineering and business contexts, ignoring the human element, and insufficient platform information integration that fails to consolidate user actions and intentions. All of this illustrates where future applications should focus: infrastructure and interaction will undergo a lengthy reform process of gradual elimination and reintegration, habit adjustments, and adaptations in how we think, a comprehensive transformation driven by large models.

Finally, there's no avoiding computer agents. They are undoubtedly a major trend, with everything being updated around large models, but I want to emphasize the coexistence of multiple methodologies. When ChatGPT emerged, I sensed a career crisis for algorithm engineers, and my perspective hasn't changed: a minimal number of algorithm positions will remain to oversee alternative methods, with fewer roles advancing AGI research. Current LLM capabilities can address eighty percent of problems, occasionally ninety percent; relying on them alone for the remaining ten percent is evidently insufficient. In my own workplace experience, even the strongest LLMs cannot resolve all cases, and considering efficiency and cost, usage may be limited to that eighty percent for now. This trend will persist for an extended duration; factoring in how slowly traditional enterprises transition, it could span decades.

Multimodal

Since early 2024, the field has been transitioning from text to multimodal. Image and voice fusion is relatively mature; video is notoriously immature and expensive. Nonetheless, I'm optimistic, predicting AI will create the films we want within two years. My personal exposure is limited, so this optimism may backfire ;D

I've primarily been engaged with voice, which gained noticeable momentum in the third and fourth quarters, especially following OpenAI's omni release. It's an emerging direction, and I confidently anticipate explosive growth this year. Voice differs from other interactions: it frees the hands and alters established modes. Recall the earlier discussion of interaction: transformative changes trigger reactions and generate new products and applications. I personally favor this direction!

The multimodal trend is undoubtedly unstoppable!

Embodied Intelligence

LLMs have reactivated robotics: brains are valuable assets. The biggest difference with embodied intelligence is its physical involvement in our lives; with a little imagination, the implications are staggering. Contrary to the hype, I'm pessimistic about household robots: mechanical arms pose significant, potentially fatal safety risks. Conversely, retail and workplace applications look promising, with meaningful progress within five years; my personal belief is a shorter timeframe, possibly two to three years. Prior independent research had already achieved considerable success, and now, possessing brains, the field is ready for takeoff. Perhaps 2026 will be the inaugural year of embodied intelligence. Natural interaction still awaits multimodal to take flight, so improvements there may take slightly longer.

Other Interesting Points

Lastly, let me record some interesting viewpoints:

AI automates specific tasks, whereas jobs comprise multiple tasks, the majority of which contain unautomatable elements. Domains that demand infinite patience, such as pricing and consulting, are well suited to automation.

Excellent scientists share a common trait: perseverance, viewing problems as conquerable goals and enduring years to resolve them (see Easter Egg 1 below).

There will be no sudden AGI moment; progress is fractal, with ever-increasing automation, and the future will feel ordinary. Reasoning capability is the final challenge on the road to human-level intelligence; the remaining challenges are ones of scalability, such as system and hardware optimization and data, which are fundamentally questions of expansion.

Intelligence is transitioning from a scarce resource to something ubiquitous and free. Proactivity, the ability to initiate actions and determine which questions and projects are worthwhile, will become the scarce production factor, difficult to solve no matter how advanced the models become. A tension exists between specific guidance and open creation: who fills the creative gaps and determines what the final work looks like?

Wherever human involvement is possible, consider replacing the repetitive, painful single-task assignments repeated hundreds of times (see Easter Egg 2 below).

Easter Egg 1:

Bob mentions what excellent scientists have in common:

I'll always remember the story of Aditya Ramesh, the inventor of DALL-E. To prove that neural networks have creativity beyond simple memory recombination, he attempted to generate an image that did not exist in the training set: a pink panda skating on ice. He worked at it persistently for eighteen months to two years. I recall that about a year in, Ilya presented the latest generation result, an indistinct image, pointing and saying: look, the top pixels are pink and the bottom ones white, although barely visible. Aditya persevered. I immediately imagined the equivalent scenario at home: he'd likely be packing his bags, haha.

Easter Egg 2:

OpenAI made a difficult but crucial decision: closing exploratory projects such as the robotics and gaming teams to focus on language models and generative modeling, including the multimodal work that is integral to them. An admirable trade-off!