Is your AI agent smarter than your cat? How to avoid AI BS

Ryan Rapp
August 22, 2025
6 min read

CatGPT - Creating intelligent agents with today's AI models

Bon-chan, the CatGPT co-author of this blog article. (Our 12-week-old kitten.)

Yann LeCun, a renowned AI researcher, once argued that AI is not yet as smart as the typical house cat, writing:

"It will take years for [AI] to get as smart as cats, and more years to get as smart as humans, let alone smarter (don't confuse the superhuman knowledge accumulation and retrieval abilities of current LLMs with actual intelligence). It will take years for them to be deployed and fine-tuned for efficiency and safety as they are made smarter and smarter." Dr. Yann LeCun, May 2024

The idea that AI may not be as intelligent as we hoped resonates with business leaders who haven't seen the outcomes they expected. Despite $200 billion in global AI investment by 2025, over 80% of businesses report no meaningful bottom-line impact. While it's easy to blame AI models' erroneous outputs, the underlying issue is usually a failure to match the right AI and data to the right problems.

Back to our cat analogy: Cat-level intelligence may not sound impressive, but cats excel in their specialized domains, whether catching mice or, in my cat's case, perfectly timing a pounce onto my laptop during important video calls. Like cats, AI systems thrive with the right training and environment. The challenge lies in knowing when you're asking your digital cat to implausibly pick winning stocks versus catch profitable digital mice.

Would you trust your CatGPT to pick winning stocks?

Detecting when your CatGPT is out of its depth is harder than it might seem. AI's ability to generate plausible outputs from poor inputs creates a dangerous illusion: what once would have triggered immediate attention as clearly missing or erroneous now gets "beautified," hiding critical quality issues. This makes it hard to identify when systems need repair, and even harder for customers to trust your AI-enabled product.

This framework helps you evaluate when your CatGPT is ready to hunt, when it needs more training, and when you should call in a human instead.

Should I Let My CatGPT Handle This Use Case?

Use-Case Evaluation Framework

First, consider the problem your CatGPT is tackling:

| Factor | High AI Suitability | Low AI Suitability | Questions to Ask |
| --- | --- | --- | --- |
| Error Tolerance | Low-stakes outcomes (brainstorming, first drafts) | High-stakes decisions (medical diagnosis, financial advice) | What happens if the AI gets this wrong? Can I afford false positives/negatives? |
| Manual Effort | High effort without AI (writing unit tests, data organization) | Simple tasks (basic calculations, straightforward processes) | Would this take significant human time and resources to complete manually? |
| Human Safety | Psychologically taxing work (content moderation, spam filtering) | Direct human safety risks | Does this task expose humans to harmful content or dangerous conditions? |
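
To make this concrete, here's a minimal sketch of how the three factors might be turned into a quick screening score. Everything here, the `UseCase` fields, the 1-to-5 scales, and the thresholds, is an illustrative assumption, not a prescribed methodology.

```python
from dataclasses import dataclass

# Illustrative sketch only: the factor names mirror the table above,
# and the scoring thresholds are arbitrary assumptions for demonstration.

@dataclass
class UseCase:
    name: str
    error_tolerance: int    # 1 = high-stakes, 5 = low-stakes
    manual_effort: int      # 1 = trivial by hand, 5 = very costly by hand
    human_safety_gain: int  # 1 = no benefit, 5 = shields humans from taxing work

def ai_suitability(case: UseCase) -> str:
    """Rough traffic-light rating based on the three factors above."""
    if case.error_tolerance <= 2:
        return "red: stakes too high for unsupervised AI"
    score = case.error_tolerance + case.manual_effort + case.human_safety_gain
    if score >= 11:
        return "green: strong candidate for AI"
    return "yellow: pilot with human review"

print(ai_suitability(UseCase("draft unit tests", 4, 5, 3)))   # green
print(ai_suitability(UseCase("medical diagnosis", 1, 4, 2)))  # red
```

The hard cutoff on error tolerance reflects the table's logic: no amount of saved manual effort justifies unsupervised AI in high-stakes decisions.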

AI Quality Assessment Framework

Second, consider the model's capabilities (e.g. training data), the context, and other AI quality factors:

| Quality Indicator | Green Light | Yellow Light | Red Light | Key Questions |
| --- | --- | --- | --- | --- |
| Benchmarking | 90%+ accuracy on similar tasks | 70-90% accuracy | <70% accuracy | Have you tested AI performance on comparable problems? |
| Data Relevance | Direct, recent, relevant context | Some extrapolation needed | Significant gaps/speculation required | How much must the AI guess beyond the provided data? |
| Data Quality | Clean, accurate, validated | Some errors/inconsistencies | Major inaccuracies/conflicts | Does your data contain errors that AI will perpetuate? |
| Training Recency | Recent training data | 6-12 months old | >1 year old for dynamic fields | Is the AI's knowledge current for your domain? |
| Bias Risk | Objective, quantifiable tasks | Some subjective elements | Highly subjective/biased domain | Could biased training data skew results for your use case? |
| Data Encoding | Preserves crucial information | Minor information loss | Loses critical details | Does the AI's processing method preserve what matters most? |
| Sample Size | Large, representative dataset | Adequate but limited | Insufficient training examples | Does the AI have enough examples to make reliable predictions? |
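
As one example of putting the Benchmarking row into practice, here's a small sketch of a quality gate. The 90% and 70% thresholds come straight from the table; the evaluation set and how you count "correct" are assumptions you'd supply for your own domain.

```python
# Illustrative quality gate: thresholds mirror the Benchmarking row above.
# The held-out evaluation set and accuracy metric are up to you.

def benchmark_light(correct: int, total: int) -> str:
    """Map accuracy on a comparable, held-out task to a traffic light."""
    accuracy = correct / total
    if accuracy >= 0.90:
        return "green"
    if accuracy >= 0.70:
        return "yellow"
    return "red"

# Example: 43 of 50 held-out cases answered correctly -> 86% -> "yellow"
print(benchmark_light(43, 50))
```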

Common AI Biases to Consider

Third, be careful of AI biases that can lead to highly refined but ultimately incorrect output:

| Bias Type | Description | Example | Mitigation Strategy |
| --- | --- | --- | --- |
| Semantic Aesthetics | Over-represents things that "look random" | Random number generation favors "random-seeming" digits | Use external validation/testing |
| Positivity Bias | Overly optimistic without contrary evidence | Rating content highly without performance data | Provide balanced examples and benchmarks |
| Safety Filtering | Built-in restrictions may limit functionality | Conservative responses in sensitive domains | Consider specialized models for specific use cases |
| Training Data Bias | Reflects biases present in source material | Underrepresentation of certain groups/perspectives | Audit training data sources and supplement as needed |
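
The mitigation for semantic aesthetics, external validation, can be surprisingly simple. Here's a hypothetical sketch that checks whether model-generated "random" digits actually look uniform, using a hand-rolled chi-square statistic; the sample digits are a stand-in for output you'd collect from your own model.

```python
from collections import Counter

# Illustrative external check for the "semantic aesthetics" bias above:
# if a model's "random" digits were truly uniform, each digit would
# appear in roughly 10% of draws.

def chi_square_uniform(digits: str) -> float:
    """Chi-square statistic of observed digit counts vs. a uniform draw."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))

model_digits = "7374737279" * 20  # stand-in for 200 model-generated digits
stat = chi_square_uniform(model_digits)
# Critical value for 9 degrees of freedom at p = 0.05 is about 16.9.
print(f"chi-square = {stat:.1f}, biased: {stat > 16.9}")
```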

Training Your CatGPT: Preparing Data for Success

Once you've selected an appropriate use case, data set, and model, the next step is to carefully implement your AI by structuring your data.

Context is Critical

The Foundation: Sufficient instructions, examples, and sample size can make the difference between AI output that is incorrect or harmful and output that is accurate and beneficial.

| Context Element | Why It Matters | Best Practice |
| --- | --- | --- |
| Clear Instructions | AI interprets ambiguous prompts unpredictably | Provide specific, detailed instructions with desired format and constraints |
| Representative Examples | AI learns patterns from your examples | Include diverse, high-quality examples that cover edge cases |
| Adequate Sample Size | Small samples lead to unreliable patterns | Ensure sufficient examples for AI to identify robust patterns |
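
Here's a hypothetical sketch of how those three context elements might come together in a single prompt. The classification task, labels, and examples are invented for illustration.

```python
# Illustrative prompt assembly: each section maps to one row of the
# context table above. The task, format, and examples are hypothetical.

INSTRUCTIONS = (
    "Classify each support ticket as 'billing', 'bug', or 'other'. "
    "Respond with only the label, lower-case, no punctuation."
)

# Representative examples, including an edge case (a ticket that
# mentions money but is actually a bug report).
EXAMPLES = [
    ("I was charged twice this month.", "billing"),
    ("The app crashes when I open settings.", "bug"),
    ("The invoice page returns a 500 error.", "bug"),  # edge case
]

def build_prompt(ticket: str) -> str:
    shots = "\n".join(f"Ticket: {t}\nLabel: {l}" for t, l in EXAMPLES)
    return f"{INSTRUCTIONS}\n\n{shots}\n\nTicket: {ticket}\nLabel:"

print(build_prompt("Why did my subscription price go up?"))
```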

Structure Your Inputs

The Pipeline: AI that creates final insights or conclusions benefits immensely from a carefully crafted pipeline of signal extraction and search capability.

Key Principles:

  • Layer your processing: Don't ask AI to do everything at once
  • Validate intermediate steps: Check that signal extraction is working correctly
  • Maintain data lineage: Track how inputs transform into outputs
  • Build in quality checks: Validate data quality at each pipeline stage (see the sketch below)
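
A minimal sketch of such a layered pipeline, with invented stage names and a toy signal, might look like this:

```python
# Illustrative layered pipeline: each stage is small and validated so
# failures surface early. Function names and checks are assumptions,
# not a prescribed architecture.

def extract_signals(raw_docs: list[str]) -> list[dict]:
    """Stage 1: pull structured signals out of raw text."""
    return [{"source": d, "mentions_price": "price" in d.lower()}
            for d in raw_docs]

def validate_signals(signals: list[dict]) -> list[dict]:
    """Stage 2: intermediate quality check before any conclusions."""
    assert all("source" in s for s in signals), "lost data lineage"
    return signals

def summarize(signals: list[dict]) -> str:
    """Stage 3: only now draw a conclusion, from validated signals."""
    hits = sum(s["mentions_price"] for s in signals)
    return f"{hits} of {len(signals)} documents discuss pricing"

docs = ["Pricing page is confusing", "Login fails on mobile"]
print(summarize(validate_signals(extract_signals(docs))))
```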

Scope Management for Custom AI Solutions

The Risk: Creating an AI that produces seemingly sharp output encourages applying it everywhere, but there's an ongoing cost of ensuring accuracy for each use case.

Spreading too thin creates "data quality debt": the accumulated cost of maintaining accuracy across many use cases, which can quickly exceed your project budget.

Vendor Selection Guidelines

When working with AI product vendors, transparency is your best indicator of a reliable partner.

Remember: Vendors promising everything are often selling the dream more than a usable product.

Implementation Guidelines

Before Deploying AI:

  • Define success metrics: What specific outcomes indicate the AI is working?
  • Establish error boundaries: What level of mistakes can you tolerate?
  • Create feedback loops: How will you monitor and improve performance?
  • Plan for failure: What's your backup plan when AI falls short? (See the sketch after this list.)
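
As a sketch of what error boundaries and a failure plan can look like in code, here's a hypothetical router that falls back to human review; the 2% error budget and 0.8 confidence cutoff are arbitrary assumptions.

```python
# Illustrative error boundary with a human fallback. The threshold
# values, metric names, and review queue are assumptions.

ERROR_BUDGET = 0.02  # tolerate at most 2% mistakes in production

def route(prediction: str, confidence: float,
          recent_error_rate: float) -> str:
    """Send low-confidence or over-budget cases to a human reviewer."""
    if recent_error_rate > ERROR_BUDGET or confidence < 0.8:
        return f"human_review_queue <- {prediction!r}"
    return f"auto_approved <- {prediction!r}"

print(route("billing", confidence=0.95, recent_error_rate=0.01))  # auto
print(route("billing", confidence=0.95, recent_error_rate=0.05))  # human
```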

Red Flags to Avoid:

  • Asking AI for insights without supporting data
  • Using AI for decisions beyond your risk tolerance
  • Deploying AI without testing on similar use cases
  • Ignoring data quality issues before implementation
  • Expecting AI to solve problems that require domain expertise

Conclusion

AI is a powerful tool, but like training a cat, its value depends on thoughtful application and realistic expectations. The organizations seeing meaningful returns from their AI investments share common traits: they carefully evaluate use cases, maintain high data quality standards, and deploy AI where its success is measurable, repeatable, and scalable.

Ready to boost your productivity?

Try SecondBrain and experience the power of AI-driven productivity tools.