Is your AI agent smarter than your cat? How to avoid AI BS

Ryan Rapp
August 22, 2025
6 min read

CatGPT - Creating intelligent agents with today's AI models

Bon-chan, the CatGPT co-author of this blog article. (Our 12-week-old kitten.)

Yann LeCun, a renowned AI researcher, once argued that AI is not yet as smart as the typical house cat, writing:

"It will take years for [AI] to get as smart as cats, and more years to get as smart as humans, let alone smarter (don't confuse the superhuman knowledge accumulation and retrieval abilities of current LLMs with actual intelligence). It will take years for them to be deployed and fine-tuned for efficiency and safety as they are made smarter and smarter." Dr. Yann LeCun, May 2024

The idea that AI may not be as intelligent as we hoped resonates with business leaders who haven't seen the outcomes they expected. Despite $200 billion in global AI investment by 2025, over 80% of businesses report no meaningful bottom-line impact. While it's easy to blame AI models' erroneous outputs, the underlying issue is usually a failure to match the right AI and data to the right problems.

Back to our cat analogy: Cat-level intelligence may not sound impressive, but cats excel in their specialized domains, whether catching mice or, in my cat's case, perfectly timing a pounce onto my laptop during important video calls. Like cats, AI systems thrive with the right training and environment. The challenge lies in knowing when you're asking your digital cat to implausibly pick winning stocks versus catch profitable digital mice.

Would you trust your CatGPT to pick winning stocks?

Detecting when your CatGPT is out of its depth is harder than it might seem. AI's ability to generate plausible outputs from poor inputs creates a dangerous illusion: what once would have triggered immediate attention as clearly missing or erroneous now gets "beautified," hiding critical quality issues. This makes it hard to identify when systems need repair, and even harder for customers to trust your AI-enabled product.

This framework helps you evaluate when your CatGPT is ready to hunt, when it needs more training, and when you should call in a human instead.

Should I Let My CatGPT Handle This Use Case?

Use-Case Evaluation Framework

First, consider the problem your CatGPT is tackling:

| Factor | High AI Suitability | Low AI Suitability | Questions to Ask |
| --- | --- | --- | --- |
| Error Tolerance | Low-stakes outcomes (brainstorming, first drafts) | High-stakes decisions (medical diagnosis, financial advice) | What happens if the AI gets this wrong? Can I afford false positives/negatives? |
| Manual Effort | High effort without AI (writing unit tests, data organization) | Simple tasks (basic calculations, straightforward processes) | Would this take significant human time and resources to complete manually? |
| Human Safety | Psychologically taxing work (content moderation, spam filtering) | Direct human safety risks | Does this task expose humans to harmful content or dangerous conditions? |
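
To make this concrete, here's a minimal sketch of how the three factors might be turned into a quick screening score. Everything here, the `UseCase` fields, the 1-to-5 scales, and the thresholds, is an illustrative assumption, not a prescribed methodology.

```python
from dataclasses import dataclass

# Illustrative sketch only: the factor names mirror the table above,
# and the scoring thresholds are arbitrary assumptions for demonstration.

@dataclass
class UseCase:
    name: str
    error_tolerance: int    # 1 = high-stakes, 5 = low-stakes
    manual_effort: int      # 1 = trivial by hand, 5 = very costly by hand
    human_safety_gain: int  # 1 = no benefit, 5 = shields humans from taxing work

def ai_suitability(case: UseCase) -> str:
    """Rough traffic-light rating based on the three factors above."""
    if case.error_tolerance <= 2:
        return "red: stakes too high for unsupervised AI"
    score = case.error_tolerance + case.manual_effort + case.human_safety_gain
    if score >= 11:
        return "green: strong candidate for AI"
    return "yellow: pilot with human review"

print(ai_suitability(UseCase("draft unit tests", 4, 5, 3)))   # green
print(ai_suitability(UseCase("medical diagnosis", 1, 4, 2)))  # red
```

The hard cutoff on error tolerance reflects the table's logic: no amount of saved manual effort justifies unsupervised AI in high-stakes decisions.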

AI Quality Assessment Framework

Second, consider the model's capabilities (e.g. training data), the context, and other AI quality factors:

| Quality Indicator | Green Light | Yellow Light | Red Light | Key Questions |
| --- | --- | --- | --- | --- |
| Benchmarking | 90%+ accuracy on similar tasks | 70-90% accuracy | <70% accuracy | Have you tested AI performance on comparable problems? |
| Data Relevance | Direct, recent, relevant context | Some extrapolation needed | Significant gaps/speculation required | How much must the AI guess beyond the provided data? |
| Data Quality | Clean, accurate, validated | Some errors/inconsistencies | Major inaccuracies/conflicts | Does your data contain errors that AI will perpetuate? |
| Training Recency | Recent training data | 6-12 months old | >1 year old for dynamic fields | Is the AI's knowledge current for your domain? |
| Bias Risk | Objective, quantifiable tasks | Some subjective elements | Highly subjective/biased domain | Could biased training data skew results for your use case? |
| Data Encoding | Preserves crucial information | Minor information loss | Loses critical details | Does the AI's processing method preserve what matters most? |
| Sample Size | Large, representative dataset | Adequate but limited | Insufficient training examples | Does the AI have enough examples to make reliable predictions? |
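
As one example of putting the Benchmarking row into practice, here's a small sketch of a quality gate. The 90% and 70% thresholds come straight from the table; the evaluation set and how you count "correct" are assumptions you'd supply for your own domain.

```python
# Illustrative quality gate: thresholds mirror the Benchmarking row above.
# The held-out evaluation set and accuracy metric are up to you.

def benchmark_light(correct: int, total: int) -> str:
    """Map accuracy on a comparable, held-out task to a traffic light."""
    accuracy = correct / total
    if accuracy >= 0.90:
        return "green"
    if accuracy >= 0.70:
        return "yellow"
    return "red"

# Example: 43 of 50 held-out cases answered correctly -> 86% -> "yellow"
print(benchmark_light(43, 50))
```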

Common AI Biases to Consider

Third, be careful of AI biases that can lead to highly refined but ultimately incorrect output:

| Bias Type | Description | Example | Mitigation Strategy |
| --- | --- | --- | --- |
| Semantic Aesthetics | Over-represents things that "look random" | Random number generation favors "random-seeming" digits | Use external validation/testing |
| Positivity Bias | Overly optimistic without contrary evidence | Rating content highly without performance data | Provide balanced examples and benchmarks |
| Safety Filtering | Built-in restrictions may limit functionality | Conservative responses in sensitive domains | Consider specialized models for specific use cases |
| Training Data Bias | Reflects biases present in source material | Underrepresentation of certain groups/perspectives | Audit training data sources and supplement as needed |
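
The mitigation for semantic aesthetics, external validation, can be surprisingly simple. Here's a hypothetical sketch that checks whether model-generated "random" digits actually look uniform, using a hand-rolled chi-square statistic; the sample digits are a stand-in for output you'd collect from your own model.

```python
from collections import Counter

# Illustrative external check for the "semantic aesthetics" bias above:
# if a model's "random" digits were truly uniform, each digit would
# appear in roughly 10% of draws.

def chi_square_uniform(digits: str) -> float:
    """Chi-square statistic of observed digit counts vs. a uniform draw."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))

model_digits = "7374737279" * 20  # stand-in for 200 model-generated digits
stat = chi_square_uniform(model_digits)
# Critical value for 9 degrees of freedom at p = 0.05 is about 16.9.
print(f"chi-square = {stat:.1f}, biased: {stat > 16.9}")
```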

Training Your CatGPT: Preparing Data for Success

Once you've selected an appropriate use case, data set, and model, the next step is to carefully implement your AI by structuring your data.

Context is Critical

The Foundation: Sufficient instructions, examples, and sample size can make the difference between AI output that is incorrect or harmful and output that is accurate and beneficial.

| Context Element | Why It Matters | Best Practice |
| --- | --- | --- |
| Clear Instructions | AI interprets ambiguous prompts unpredictably | Provide specific, detailed instructions with desired format and constraints |
| Representative Examples | AI learns patterns from your examples | Include diverse, high-quality examples that cover edge cases |
| Adequate Sample Size | Small samples lead to unreliable patterns | Ensure sufficient examples for AI to identify robust patterns |
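
Here's a hypothetical sketch of how those three context elements might come together in a single prompt. The classification task, labels, and examples are invented for illustration.

```python
# Illustrative prompt assembly: each section maps to one row of the
# context table above. The task, format, and examples are hypothetical.

INSTRUCTIONS = (
    "Classify each support ticket as 'billing', 'bug', or 'other'. "
    "Respond with only the label, lower-case, no punctuation."
)

# Representative examples, including an edge case (a ticket that
# mentions money but is actually a bug report).
EXAMPLES = [
    ("I was charged twice this month.", "billing"),
    ("The app crashes when I open settings.", "bug"),
    ("The invoice page returns a 500 error.", "bug"),  # edge case
]

def build_prompt(ticket: str) -> str:
    shots = "\n".join(f"Ticket: {t}\nLabel: {l}" for t, l in EXAMPLES)
    return f"{INSTRUCTIONS}\n\n{shots}\n\nTicket: {ticket}\nLabel:"

print(build_prompt("Why did my subscription price go up?"))
```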

Structure Your Inputs

The Pipeline: AI that creates final insights or conclusions benefits immensely from a carefully crafted pipeline of signal extraction and search capability.

Key Principles:

  • Layer your processing: Don't ask AI to do everything at once
  • Validate intermediate steps: Check that signal extraction is working correctly
  • Maintain data lineage: Track how inputs transform into outputs
  • Build in quality checks: Validate data quality at each pipeline stage (see the sketch below)
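
A minimal sketch of such a layered pipeline, with invented stage names and a toy signal, might look like this:

```python
# Illustrative layered pipeline: each stage is small and validated so
# failures surface early. Function names and checks are assumptions,
# not a prescribed architecture.

def extract_signals(raw_docs: list[str]) -> list[dict]:
    """Stage 1: pull structured signals out of raw text."""
    return [{"source": d, "mentions_price": "price" in d.lower()}
            for d in raw_docs]

def validate_signals(signals: list[dict]) -> list[dict]:
    """Stage 2: intermediate quality check before any conclusions."""
    assert all("source" in s for s in signals), "lost data lineage"
    return signals

def summarize(signals: list[dict]) -> str:
    """Stage 3: only now draw a conclusion, from validated signals."""
    hits = sum(s["mentions_price"] for s in signals)
    return f"{hits} of {len(signals)} documents discuss pricing"

docs = ["Pricing page is confusing", "Login fails on mobile"]
print(summarize(validate_signals(extract_signals(docs))))
```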

Scope Management for Custom AI Solutions

The Risk: Creating an AI that produces seemingly sharp output encourages applying it everywhere, but there's an ongoing cost of ensuring accuracy for each use case.

Spreading too thin creates "data quality debt": the accumulated cost of maintaining accuracy across many use cases, which can quickly exceed your project budget.

Vendor Selection Guidelines

When working with AI product vendors, transparency is your best indicator of a reliable partner.

Remember: Vendors promising everything are often selling the dream more than a usable product.

Implementation Guidelines

Before Deploying AI:

  • Define success metrics: What specific outcomes indicate the AI is working?
  • Establish error boundaries: What level of mistakes can you tolerate?
  • Create feedback loops: How will you monitor and improve performance?
  • Plan for failure: What's your backup plan when AI falls short? (See the sketch after this list.)
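
As a sketch of what error boundaries and a failure plan can look like in code, here's a hypothetical router that falls back to human review; the 2% error budget and 0.8 confidence cutoff are arbitrary assumptions.

```python
# Illustrative error boundary with a human fallback. The threshold
# values, metric names, and review queue are assumptions.

ERROR_BUDGET = 0.02  # tolerate at most 2% mistakes in production

def route(prediction: str, confidence: float,
          recent_error_rate: float) -> str:
    """Send low-confidence or over-budget cases to a human reviewer."""
    if recent_error_rate > ERROR_BUDGET or confidence < 0.8:
        return f"human_review_queue <- {prediction!r}"
    return f"auto_approved <- {prediction!r}"

print(route("billing", confidence=0.95, recent_error_rate=0.01))  # auto
print(route("billing", confidence=0.95, recent_error_rate=0.05))  # human
```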

Red Flags to Avoid:

  • Asking AI for insights without supporting data
  • Using AI for decisions beyond your risk tolerance
  • Deploying AI without testing on similar use cases
  • Ignoring data quality issues before implementation
  • Expecting AI to solve problems that require domain expertise

Conclusion

AI is a powerful tool, but like training a cat, its value depends on thoughtful application and realistic expectations. The organizations seeing meaningful returns from their AI investments share common traits: they carefully evaluate use cases, maintain high data quality standards, and deploy AI where its success is measurable, repeatable, and scalable.

Ready to boost your productivity?

Try SecondBrain and experience the power of AI-driven productivity tools.