Posts tagged "Agents"

🔗 Agents

I came across Chip Huyen’s blog via Twitter today. Her blog is great and has a lot of content I’m going to read. I’ll start with this post on Agents.

Chip defines Agents as:

An agent is anything that can perceive its environment and act upon that environment. Artificial Intelligence: A Modern Approach (1995) defines an agent as anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.

She then explains that the interaction between an agent and the environment is the key to the agent’s success. If you build a web scraping agent, the internet is the environment. If you build an agent that can play Minecraft, the Minecraft world is the environment. And so on.

Planning is the heart of an agent, and in the blog, Chip provides a lot of detail on it. Planning is decoupled from execution. The plan requires that the model understand the user’s intent and break the task down into smaller subtasks. You can do a lot to improve the likelihood of success: asking for human feedback, writing great system prompts, giving better descriptions of the tools available to the agent, using a stronger model, or even fine-tuning a model.
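To make that decoupling concrete, here’s a minimal plan-then-execute sketch in Python. The `call_llm` helper is a hypothetical placeholder for whatever model API you’re using; the structure is the point: plan first, get human sign-off, then execute the subtasks separately.

```python
# A minimal plan-then-execute sketch. `call_llm` is a hypothetical
# placeholder for your model provider's API; swap in a real client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your LLM provider here")


def plan(task: str) -> list[str]:
    # Planning step: ask the model to break the task into subtasks.
    response = call_llm(
        f"Break this task into a short numbered list of subtasks:\n{task}"
    )
    return [line.strip() for line in response.splitlines() if line.strip()]


def run(task: str) -> str:
    subtasks = plan(task)

    # Human feedback on the plan, before any execution happens.
    print("Proposed plan:")
    for step in subtasks:
        print(" ", step)
    if input("Proceed? (y/n) ").strip().lower() != "y":
        return "Plan rejected; revise the task and try again."

    # Execution is decoupled: each subtask runs as its own LLM call.
    results = [call_llm(f"Complete this subtask: {step}") for step in subtasks]
    return "\n\n".join(results)
```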

To me it seems that, with the current state of LLMs, the most effective way to improve the planning aspect of an agent is to ask for human feedback. There may be verticals, like coding or math, where the plan is easier for an LLM to generate, but for long tasks, I think asking for human feedback is the best way to improve the agent. I wonder if we’ll ever see the reasoning models (o1, o3, etc.) allow users to adjust the plan on the fly. I know that Google’s Gemini 1.5 Deep Research already has a “refine” feature that allows you to adjust the plan.

The post is great, and the blog looks promising. I’ll be back to Chip’s writing!

🔗 Building effective agents

“Agent” is an overloaded term in AI. I found this article to be particularly helpful in understanding the current definitions of agentic systems, agents, and workflows. Plus it included a survey of some popular workflow architectures. To me, it all starts to look a lot like the microservices architectures that Amazon helped popularize over the past decade.

One principle to follow when building agentic systems (and software in general)…

When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.

Their definitions…

  • Agentic systems - the parent term for an entire app or system
  • Workflows - systems where LLM calls are orchestrated through predefined code paths
  • Agents - LLMs dynamically direct their own processes and tool usage (contrast sketched below)
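The split clicked for me when I thought about who owns the control flow. Here’s a rough sketch of the contrast, with `call_llm` and `search_web` as hypothetical placeholders (not anything from the article): in the workflow, my code fixes the sequence of calls; in the agent, the model’s own output decides what happens next.

```python
def call_llm(prompt: str) -> str: ...   # hypothetical placeholder
def search_web(query: str) -> str: ...  # hypothetical placeholder


def summarize_workflow(doc: str) -> str:
    # Workflow: the orchestration lives in *my* code; the sequence
    # of LLM calls is fixed before anything runs.
    summary = call_llm(f"Summarize this document:\n{doc}")
    return call_llm(f"Rewrite the summary for a general audience:\n{summary}")


def research_agent(task: str, max_steps: int = 5) -> str:
    # Agent: the model's output directs its own process and tool use.
    transcript = task
    for _ in range(max_steps):
        decision = call_llm(
            f"{transcript}\n"
            "Reply with 'SEARCH: <query>' to use the search tool, "
            "or 'DONE: <answer>' when finished."
        )
        if decision.startswith("DONE:"):
            return decision[len("DONE:"):].strip()
        query = decision[len("SEARCH:"):].strip()
        transcript += f"\nObservation: {search_web(query)}"
    return "Step limit reached without an answer."
```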

Their walkthrough of common workflow patterns was interesting as well. They walk through prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer. I think as reasoning models like o1 become more powerful, we’ll start to see these workflows become less important. Prompt chaining is the simplest of the bunch (rough sketch below); then let’s look at evaluator-optimizer:
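A rough sketch of prompt chaining, again with a hypothetical `call_llm` placeholder: each call’s output feeds the next call’s input, and a cheap programmatic check (a gate) sits between steps to stop the chain early if an intermediate result looks off.

```python
def call_llm(prompt: str) -> str: ...  # hypothetical placeholder


def draft_post(topic: str) -> str:
    # Step 1: generate an outline.
    outline = call_llm(f"Write a brief outline for a blog post about {topic}.")

    # Gate: a cheap programmatic check between steps; stop the chain
    # early if the intermediate output looks off.
    if len(outline.splitlines()) < 3:
        raise ValueError("Outline too thin to expand; stopping the chain.")

    # Step 2: expand the outline into a draft.
    draft = call_llm(f"Expand this outline into a full draft:\n{outline}")

    # Step 3: polish the draft.
    return call_llm(f"Edit this draft for clarity and tone:\n{draft}")
```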

[Figure: Evaluator-Optimizer workflow diagram]

This is just one LLM call generating content or a response while another LLM call provides feedback or an evaluation, in a loop. It’s useful when we have clear evaluation criteria.
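Here’s what that loop might look like, a minimal sketch assuming the same hypothetical `call_llm` helper: one call generates, a second call grades the result against explicit criteria, and the generator revises with that feedback until the evaluator passes it or a retry cap is hit.

```python
def call_llm(prompt: str) -> str: ...  # hypothetical placeholder


def evaluator_optimizer(task: str, criteria: str, max_rounds: int = 3) -> str:
    draft = call_llm(task)
    for _ in range(max_rounds):
        # Evaluator call: grade the draft against explicit criteria.
        verdict = call_llm(
            "Evaluate the response against the criteria.\n"
            f"Criteria:\n{criteria}\n\nResponse:\n{draft}\n"
            "Reply 'PASS' or 'FAIL: <specific feedback>'."
        )
        if verdict.startswith("PASS"):
            return draft
        # Optimizer call: revise the draft using the feedback.
        draft = call_llm(
            f"{task}\n\nPrevious attempt:\n{draft}\n"
            f"Reviewer feedback:\n{verdict}\n\nRevise accordingly."
        )
    return draft  # best effort after max_rounds
```

The dog-training prompt below is a natural fit for this: its five evaluation criteria would slot straight into `criteria`.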

As reasoning models improve, more applications will move from workflows to agents. For models like o1 that do multi-step reasoning, one of the first patterns I expect them to internalize is the iterative loop above. Let’s try it with Google Gemini’s new 2.0 Flash Thinking Experimental:

Prompt 💭

Generate a day-by-day training plan for a reactive dog exhibiting specific behaviors: lunging, barking at strangers. The plan should incorporate positive reinforcement techniques and gradually desensitize the dog to triggers over 4 weeks. Each day’s plan must include:
  • A clearly defined goal for that day.
  • Detailed instructions for the owner, including step-by-step exercises.
  • A checklist to measure the dog’s progress.

Evaluation Criteria:

1. Clarity: Are the daily goals and instructions written in a way that is easy to understand for a non-expert dog owner?
2. Feasibility: Are the exercises practical for an average dog owner to implement with minimal equipment or prior training experience?
3. Progressiveness: Does the plan show a clear progression from simple to complex tasks aligned with the dog’s desensitization process?
4. Effectiveness: Based on canine training principles, is the plan likely to improve the dog’s behavior when executed consistently?
5. Empathy: Does the tone and approach demonstrate understanding of the challenges faced by both the dog and the owner?

Thinking 🧠

This is a stripped-down version of Gemini’s thought process.

1. Deconstruct the Request
2. Establish Foundational Principles: Positive Reinforcement, Classical Conditioning, Operant Conditioning, Gradual Exposure, Consistency is Key, Manage the Environment, Listen to Your Dog, Patience and Understanding
3. Structure the 4 Weeks: Week 1: Foundation Building, Week 2: Controlled Exposure (Distance), Week 3: Controlled Exposure (Closer), Week 4: Generalization and Maintenance
4. Design Daily Goals
5. Develop Detailed Instructions
6. Create Progress Checklists
7. Incorporate Evaluation Criteria Throughout
8. Refine and Review
9. Add Important Considerations
10. Self-Correction Example during the Process
    - Initial Thought, Correction, Revised Approach

Check it out! This thinking process is actually a perfect example of how our brains bounce between coming up with ideas and checking if they’re any good. Like, there’s this awesome moment where they think “hey, maybe let’s throw the dog right into meeting strangers” but then the evaluation part of their brain kicks in with “whoa, hold up, that’s way too intense for a scared pup.” The whole thing is basically a dance between brainstorming cool training ideas and then reality-checking them against what actually works for reactive dogs and their humans - it’s like having an excited idea machine and a practical bouncer working together in your head to build something that actually makes sense.

The parallel between AI workflow architectures and the evolution of microservices hints at a deeper pattern in software engineering - we repeatedly solve the problem of complexity by breaking it down into smaller, specialized components that communicate in structured ways. Additionally, as models like OpenAI’s o1 (or even o3) become more sophisticated at multi-step reasoning, we may see a shift back toward unified systems. This raises an interesting question: Is the current trend toward complex workflow architectures a permanent evolution in AI system design, or just a temporary solution until single models become capable enough to handle these tasks autonomously? The answer could fundamentally reshape how we build AI systems in the coming years.