Agentic AI represents the next evolutionary step beyond generative AI, enabling autonomous systems with enhanced reasoning and interaction capabilities to tackle complex tasks. However, this progress introduces significant challenges in communication reliability, goal management, and system design. Recent research reveals critical limitations and proposes innovative solutions like ADAS and Google MASS to optimize agent performance.
Emergent Behavior Risks
As agentic systems scale, they exhibit unprogrammed behaviors arising from agent interactions. While beneficial for novel solutions, these behaviors risk oscillations, inefficiencies, or harmful outputs. Robust guardrails, including role-based access controls, resource limits, and validation rules, are essential to maintain alignment with organizational goals. Without these, systems may deviate unpredictably during execution.
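As a rough illustration of the three guardrail types named above, the following minimal sketch combines a role-based allow-list, a per-run call budget, and one validation rule. The `Guardrail` class, the tool names, and the refund threshold are all hypothetical and chosen only for illustration; a production system would load such policies from configuration.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    """Hypothetical policy object: role permissions plus a call budget."""
    allowed_tools: dict[str, set[str]]  # role -> tools that role may call
    max_calls: int                      # resource limit for one run
    calls_made: int = 0

    def check(self, role: str, tool: str, payload: dict) -> tuple[bool, str]:
        # Role-based access control: reject tools outside the allow-list.
        if tool not in self.allowed_tools.get(role, set()):
            return False, f"role '{role}' may not call '{tool}'"
        # Resource limit: stop once the budget for this run is spent.
        if self.calls_made >= self.max_calls:
            return False, "call budget exhausted"
        # Validation rule (illustrative threshold): cap refund amounts.
        if tool == "issue_refund" and payload.get("amount", 0) > 100:
            return False, "refund exceeds validation limit"
        self.calls_made += 1  # only approved calls consume budget
        return True, "ok"


guard = Guardrail(
    allowed_tools={"support": {"lookup_order", "issue_refund"}},
    max_calls=2,
)
print(guard.check("support", "issue_refund", {"amount": 500}))
```

Every tool call an agent proposes would pass through `check` before execution, so a rejected call never reaches the underlying system.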
Multi-Turn Conversation Failures
Studies from Microsoft and Salesforce demonstrate a 39% average performance drop when AI assistants handle multi-turn conversations. Key issues include:
- Premature conclusions: Agents fixate on early assumptions without course-correcting.
- Information neglect: Critical mid-conversation details are overlooked.
- Unreliability spike: Performance inconsistency increases by 112% compared to single-turn tasks.
Even state-of-the-art models (GPT-4o, Claude 3.7, Gemini 2.5) show 30-40% degradation in extended dialogues. Technical tweaks like temperature reduction fail to resolve these issues; only upfront information delivery mitigates errors.
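One way to picture upfront information delivery is to merge requirements that arrived across several turns into a single, fully specified request before calling the model. The sketch below is an assumption about how that could look, not the procedure used in the study; `consolidate_turns` and the prompt wording are hypothetical.

```python
def consolidate_turns(turns: list[str]) -> str:
    """Merge requirements revealed over several user turns into one request."""
    seen: set[str] = set()
    requirements: list[str] = []
    for turn in turns:
        text = turn.strip()
        # Skip empty turns and case-insensitive repeats of earlier ones.
        if text and text.lower() not in seen:
            seen.add(text.lower())
            requirements.append(text)
    bullet_list = "\n".join(f"- {item}" for item in requirements)
    return "Complete the task using ALL of the requirements below:\n" + bullet_list


prompt = consolidate_turns([
    "Write a SQL query for total revenue",
    "Only include 2024 orders",
    "Group the result by region",
])
print(prompt)
```

The model then sees every constraint at once instead of having to revise an early answer as details trickle in.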
Goal Dilution in Subgoal Breakdown
When decomposing objectives into subgoals, agents frequently lose coherence. Salesforce identifies this as "jagged intelligence": inconsistent execution in which agents excel at isolated tasks but fail in integrated workflows. For example:
- Subgoals may conflict without centralized oversight.
- Agents overlook interdependencies between subtasks.
- Partial solutions compound errors across workflow stages.
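The failure modes listed above suggest making subgoal dependencies explicit and checking them centrally before any agent starts work. The following is a minimal sketch under that assumption, using Python's standard `graphlib` module; the function `plan_subgoals`, the subgoal names, and the notion of a subgoal "writing" a resource are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def plan_subgoals(deps: dict[str, set[str]], writes: dict[str, str]) -> list[str]:
    """Return an execution order, rejecting conflicting or circular plans.

    deps maps each subgoal to the subgoals it depends on.
    writes maps a subgoal to the single resource it produces.
    """
    # Conflict check: two subgoals must not produce the same resource.
    owners: dict[str, str] = {}
    for goal, resource in writes.items():
        if resource in owners:
            raise ValueError(
                f"'{goal}' and '{owners[resource]}' both write '{resource}'"
            )
        owners[resource] = goal
    # Interdependency check: order subgoals so prerequisites run first.
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        raise ValueError(f"circular subgoal dependency: {err.args[1]}") from err


order = plan_subgoals(
    deps={
        "fetch_account": set(),
        "draft_email": {"fetch_account"},
        "send_email": {"draft_email"},
    },
    writes={"fetch_account": "account", "draft_email": "email_body"},
)
print(order)
```

A check like this catches only structural problems. It does not detect subgoals whose outputs are individually valid but semantically inconsistent.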
Salesforce Research: Building Trustworthy Agents
Salesforce addresses these challenges through three pillars:
- Foundational Research
  - SIMPLE Benchmark: 225 reasoning questions quantifying LLM jaggedness.
  - SFR-Embedding Models: State-of-the-art text/code embeddings improving RAG accuracy.
- Guardrails and Testing
  - CRMArena: Simulates real CRM scenarios to evaluate agent reliability.
  - SFR-Guard Models: Enforce policy compliance and toxicity detection.
- Workflow Integration
  - Agents iteratively refined via customer feedback loops to ensure enterprise-grade consistency.
Automated Design Solutions
ADAS: Self-Improving Agents
Automated Design of Agentic Systems (ADAS) enables meta-agents to autonomously design, test, and refine specialized agents. Its iterative process:
- Generates candidate agents for a task.
- Simulates human-like feedback on correctness/efficiency.
- Refines code through debugging and optimization.
ADAS-discovered agents outperform manual designs and transfer seamlessly across models (e.g., GPT-3.5 to Claude). Crucially, they identify novel patterns like chained reasoning steps for complex problem-solving.
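The generate, evaluate, and refine cycle can be sketched as a small search loop that keeps an archive of earlier attempts. This is a toy illustration, not the ADAS implementation: `meta_agent_search`, `toy_propose`, and `toy_evaluate` are hypothetical stand-ins, and in a real system the proposal step would be an LLM writing agent code and the evaluation step would run that code on benchmark tasks.

```python
from typing import Callable

Archive = list[tuple[str, float]]  # (agent description, score) pairs


def meta_agent_search(
    propose: Callable[[Archive], str],
    evaluate: Callable[[str], float],
    iterations: int,
) -> tuple[str, float]:
    """Run a propose/evaluate loop and return the best agent found."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    archive: Archive = []
    for _ in range(iterations):
        candidate = propose(archive)        # generate a candidate agent
        score = evaluate(candidate)         # feedback on its quality
        archive.append((candidate, score))  # later proposals build on this
    return max(archive, key=lambda item: item[1])


def toy_propose(archive: Archive) -> str:
    # Stand-in for an LLM call: extend the best design seen so far.
    if not archive:
        return "answer"
    best, _ = max(archive, key=lambda item: item[1])
    return best + " -> verify"


def toy_evaluate(agent: str) -> float:
    # Stand-in for benchmark accuracy: reward up to two verification steps.
    return min(agent.count("verify"), 2) / 2


print(meta_agent_search(toy_propose, toy_evaluate, iterations=4))
```

Because each proposal can see the whole archive, later candidates can combine or extend earlier designs rather than starting from scratch.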
Google MASS: Optimizing Multi-Agent Systems
Multi-Agent System Search (MASS) is a three-stage framework optimizing prompts and topologies:
| Stage | Function | Impact |
| --- | --- | --- |
| Block-Level Prompt | Tunes individual agent instructions | Boosts local task aptitude |
| Topology Search | Identifies optimal agent connections | Reduces redundant interactions |
| Workflow-Level Tune | Refines prompts for global coordination | Enhances cross-agent collaboration |
MASS outperforms prior frameworks (AFlow, ADAS) by 6-8% in reasoning, QA, and coding tasks. It reduces computational costs by focusing on high-impact design spaces and enables real-time adjustments for dynamic environments.
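The three stages can be mirrored in a toy staged search: tune each agent's prompt locally, pick a topology with those prompts fixed, then re-tune the prompts jointly under the chosen topology. The sketch below is an assumption-laden simplification rather than Google's algorithm; `staged_search`, `toy_score`, the agent names, the prompt strings, and the topology labels are all hypothetical, and the exhaustive search in the last stage is only feasible because the example is tiny.

```python
from itertools import product
from typing import Callable

Prompts = dict[str, str]  # agent name -> prompt text


def staged_search(
    prompt_options: dict[str, list[str]],
    topologies: list[str],
    score: Callable[[Prompts, str], float],
) -> tuple[Prompts, str]:
    agents = list(prompt_options)
    # Stage 1 (block level): tune one agent at a time on a default topology.
    prompts = {agent: options[0] for agent, options in prompt_options.items()}
    for agent in agents:
        prompts[agent] = max(
            prompt_options[agent],
            key=lambda p: score({**prompts, agent: p}, topologies[0]),
        )
    # Stage 2 (topology): choose how agents connect, prompts held fixed.
    topology = max(topologies, key=lambda t: score(prompts, t))
    # Stage 3 (workflow level): re-tune all prompts jointly for that topology.
    combos = product(*(prompt_options[agent] for agent in agents))
    best = max(
        (dict(zip(agents, combo)) for combo in combos),
        key=lambda candidate: score(candidate, topology),
    )
    return best, topology


def toy_score(prompts: Prompts, topology: str) -> float:
    # Made-up scoring: the critic prompt only matters in a debate topology.
    total = 1.0 if prompts["solver"] == "think step by step" else 0.0
    total += 1.0 if topology == "debate" else 0.0
    if topology == "debate" and prompts["critic"] == "list concrete errors":
        total += 1.0
    return total


options = {
    "solver": ["answer directly", "think step by step"],
    "critic": ["be brief", "list concrete errors"],
}
print(staged_search(options, ["chain", "debate"], toy_score))
```

In this toy setup the first stage cannot tell the two critic prompts apart, because they score the same on the default topology. The final joint pass is what finds the better critic prompt once the debate topology has been selected, which illustrates why a workflow-level stage can add value after local tuning.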
Conclusion
Agentic AI’s potential is tempered by communication fragility and goal-management flaws. Salesforce’s research provides critical tools for benchmarking and securing agents, while ADAS and MASS represent paradigm shifts in automated design. These innovations move us toward systems where agents collaboratively adapt to complexity without sacrificing reliability. However, native multi-turn reliability remains essential for enterprise-scale adoption.
Relevant Sources and related Posts
- Generative to Agentic AI: Survey, Conceptualization, and Challenges
- Beyond Rules: Agentic AI Orchestration and the Dawn of Emergent Intelligence
- AI chatbots become dramatically less reliable in longer conversations, new study finds
- New Othello experiment supports the world model hypothesis for large language models
- How LLMs can automatically design agentic systems
- Google AI Introduces Multi-Agent System Search MASS: A New AI Agent Optimization Framework for Better Prompts and Topologies
- Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
- Video: Google’s Multi-Agent System Search
Play Podcast
https://soundcloud.com/digital-age-switzerland/agentic-ai-advancements-challenges-in-communication-optimizatio?si=96e22a04caa7453a88393b0a3689f44b&utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing



