Understanding AI and how it’s evolving is now part of every CEO’s job.
You don’t need to build the models, but you do need to choose the right tools, ask the right questions, and avoid costly missteps. That means having a clear understanding of how AI systems work and where they're likely to break down.
This article draws from our white paper, Real Estate AI: A CEO’s Guide to What Matters Now. It tackles the most important AI questions facing real estate leaders and lays out a clear path to adoption. Download the full guide.
What Makes Up the AI Stack: A Breakdown
Think of AI systems as having four layers, each serving a specific purpose. Understanding this stack helps you better adopt AI in your workflows and make smarter decisions.

1. Applications Layer: What You Actually Interact With
This is the layer your team uses: tools for tasks like lease abstraction, underwriting, tenant engagement, and memo generation. The quality of this experience depends on how well the layers beneath are built. Strong applications are fast and intuitive. Weak ones are slow, confusing, and break down when exposed to complex workflows.
2. Frameworks & Tooling Layer: The Hidden Infrastructure
This layer equips developers with the infrastructure needed to turn raw models into reliable enterprise software. It covers things like prompt management, retrieval pipelines, logging, and permissions. You won’t interact with it directly, but it’s what makes AI products functional, secure, and scalable.
This layer determines whether an AI application can:
Handle multiple AI operations together reliably
Pull information from your systems without breaking
Control who sees what and log all interactions
Handle hundreds of users without slowing down
Give you visibility into what the AI is actually doing
Companies like LangChain, LlamaIndex, and Hugging Face build tools that power this layer. But as a CEO, what matters isn't the specific tools; it's whether your vendor has invested in making their infrastructure robust.
3. Models Layer: The "Brain" of the System
Models like GPT-4, Claude, and Gemini are trained on vast amounts of data to process language, identify patterns, and produce outputs. Models serve as the core intelligence behind AI systems.
But here's what most people miss: the model alone isn't the product. Think of it like a car engine. Even the most powerful engine won’t deliver a great drive if the rest of the car is badly built.
The Leading Models With Their Strengths
Different models excel in different areas. The best AI systems are designed to use the right model for each task or even combine multiple models in a single workflow.
GPT-4 (OpenAI): Best-in-class for general-purpose performance across reasoning, coding, and language tasks. Fast and more affordable than earlier variants, but still relatively expensive compared to open-source alternatives.
Claude Sonnet 4.0 (Anthropic): High factual accuracy, improved reasoning, and strong safety features. Supports large context windows and excels in retrieval-augmented generation and structured outputs. Built for enterprise deployment with enhanced security controls.
Gemini (Google): Strong multi-modal capabilities (text, vision, and code) with deep integration into Google Workspace. Rapidly evolving and competitive in both consumer and enterprise settings.
Grok (xAI): Elon Musk-backed model designed for real-time applications, trained on Twitter data. Early-stage but evolving quickly within the X ecosystem.
DeepSeek: A fast, cost-efficient open-weight model suitable for custom tuning and on-premise use. However, its backing by Chinese entities may pose data security or compliance risks for some organizations.
Mistral and LLaMA: Top-performing open-source models offering flexibility and customization, but they require significant engineering to scale safely.
Here’s a quick look at how the top AI models rank on the GPQA benchmark:

4. Compute Layer: The Hardware of the System
This is the hardware that makes everything possible. It includes advanced computer chips (GPUs) and cloud infrastructure that train and run large models. Major players include NVIDIA, AWS, and Google Cloud Platform.
Why This Matters for CEOs
GPU costs directly impact AI pricing. As compute gets cheaper, AI tools become more accessible.
Better infrastructure means faster AI responses, which impacts user experience.
High demand for compute can cause delays or outages in AI services.
Some advanced chips face export restrictions, affecting global AI development.
How AI Models Are Evaluated
AI models vary widely in their strengths and limitations, depending on how they’re used. These are the main criteria vendors and buyers rely on to assess them:
Accuracy
This measures how often the model produces useful, correct outputs. But "accuracy" isn't binary; it depends heavily on the task and context.
How It's Measured
Task success rate across standardized benchmarks
Human evaluation of real-world outputs
Comparison against expert human performance
Domain-specific testing (e.g., lease abstraction accuracy)
Context Window
This determines how much information the model can consider at once.
Larger context windows (128K+ tokens) allow for more complex tasks but cost more to run. Small context windows (4K-8K tokens) are good for simple Q&A and basic summaries, while medium context windows (32K-64K tokens) can handle full lease documents or small reports in a single prompt.
How It's Measured
Maximum token count
Ability to retain key context in long tasks
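To make the token numbers above concrete, here's a rough sketch of how you might size a document against a context window. The four-characters-per-token heuristic and the page-size figure are illustrative assumptions, not any provider's actual tokenizer; exact counts require a tokenizer library.

```python
# Rough token estimate for sizing documents against a context window.
# The 4-characters-per-token heuristic is an approximation (assumption),
# not any provider's real tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_window(text: str, window_tokens: int) -> bool:
    return estimate_tokens(text) <= window_tokens

# A 60-page lease at an assumed ~3,000 characters per page:
lease_text = "x" * (60 * 3000)
print(estimate_tokens(lease_text))         # ~45,000 tokens
print(fits_in_window(lease_text, 8_000))   # False: too big for a small window
print(fits_in_window(lease_text, 64_000))  # True: fits a medium window
```

The point of the sketch: a single long lease can blow past a small context window entirely, which is why window size is a practical buying criterion rather than a spec-sheet detail.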
Factuality / Grounding
This measures how well the model stays aligned with real data rather than making up plausible-sounding but incorrect information (hallucinations).
How It's Measured
Percentage of answers that match source material
Hallucination rate
Reasoning Ability
This evaluates how well the model handles multi-step logic, maintains consistency, and works through complex problems.
In real estate, this includes tasks like working through complex waterfall calculations and understanding the implications of zoning changes on development plans.
How It's Measured
Pass rate on logic-heavy tasks
Internal consistency of outputs
Speed and Cost
This includes the time and cost to run each task, which directly impacts whether the AI model is practical for your use case.
How It's Measured
Latency (in seconds)
Cost per 1,000 tokens or per API call
Total cost of ownership
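The cost criterion above lends itself to simple back-of-the-envelope math. Here is a minimal sketch using placeholder per-token prices rather than real vendor rates; the workload numbers are invented for illustration.

```python
# Back-of-the-envelope cost comparison for a document-processing workload.
# All prices and volumes are illustrative placeholders, not vendor rates.
def monthly_cost(docs_per_month: int, tokens_per_doc: int,
                 price_per_1k_tokens: float) -> float:
    total_tokens = docs_per_month * tokens_per_doc
    return total_tokens / 1000 * price_per_1k_tokens

# 2,000 leases a month at ~40K tokens each:
premium = monthly_cost(2000, 40_000, 0.01)   # hypothetical premium model
budget = monthly_cost(2000, 40_000, 0.001)   # hypothetical cheaper model
print(f"premium: ${premium:,.0f}/mo, budget: ${budget:,.0f}/mo")
```

A 10x price difference per token compounds directly at volume, which is why routing routine tasks to cheaper models is a common cost lever.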
Tool Use / Agentic Capabilities
This measures whether the model can complete multi-step tasks, call external tools, or take actions beyond just generating text.
How It's Measured
Task completion rate for agents
Success rate across steps
Customizability
This reflects how well the model can be tailored to your specific workflows, terminology, and data. However, it’s important to note that more customization usually means higher costs and longer implementation times. Over-customization can also make systems hard to maintain.
How It's Measured
Ability to fine-tune or integrate your content
Performance on custom tasks

Why Architecture Matters More Than the Model Itself
Here's the most important insight for CEOs: the model you choose today may not be the best one six months from now. This field is moving too fast. New models are released frequently, each with different strengths, costs, and capabilities.
That's why the smartest companies aren't betting on any one model; they're betting on model-agnostic architectures.
Keep these factors in mind when assessing a vendor or internal solution:
Avoid vendor lock-in that limits your ability to switch models or providers as better options emerge.
Choose modular systems that let you upgrade or replace models without reworking your entire stack.
Don’t tie your roadmap to a single provider’s product decisions, pricing, or availability.
Look for multi-provider compatibility so you can work with OpenAI, Anthropic, Google, or others interchangeably.
Ensure there’s a fallback plan in case your provider changes pricing or deprecates a key capability.
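As a rough illustration of what a model-agnostic architecture looks like in practice, here is a minimal sketch: application code depends on one small interface, and providers plug in behind it. The provider classes and the `complete()` signature are invented for illustration, not a real SDK.

```python
# Minimal sketch of a model-agnostic layer: application code talks to one
# interface, and providers plug in behind it. Class names and complete()
# are illustrative assumptions, not any vendor's actual SDK.
from typing import Protocol

class ModelProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider:
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"  # real code would call the OpenAI API

class AnthropicProvider:
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"  # real code would call the Anthropic API

class LeaseAbstractor:
    """Application logic depends only on the interface, never on a vendor."""
    def __init__(self, provider: ModelProvider):
        self.provider = provider

    def summarize(self, lease_text: str) -> str:
        return self.provider.complete(f"Summarize this lease: {lease_text}")

# Swapping vendors is a one-line change, not a rewrite:
app = LeaseAbstractor(OpenAIProvider())
print(app.summarize("..."))
app.provider = AnthropicProvider()
print(app.summarize("..."))
```

This is the structural question to put to vendors: can they show you where the model boundary sits in their stack, and how they would swap what's behind it?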
Key AI Terms Every CEO Should Know
AI terminology can be confusing. Here's a set of essential terms to help you cut through the noise, ask better questions, and choose the right vendors.

Model: The "brain" of the AI system that generates content or decisions. Think of it as the engine that powers everything else.
Prompt: The input or instruction you give the model (e.g., summarize this lease or identify key terms in this document). The quality of prompts significantly affects output quality.
Token: A chunk of text (about ¾ of a word) that AI models read and generate. Token limits define how much content a model can process at once. Most providers price their services based on token usage.
Context Window: The maximum amount of information a model can consider at once. Larger context windows enable more complex tasks but cost more to run.
Hallucination: When the model generates confident-sounding but incorrect information. This is a critical concern for business applications where accuracy matters.
Grounding: Connecting the model's responses to real, verifiable data sources. Well-grounded systems reduce hallucinations by anchoring responses to your actual documents and data.
RAG (Retrieval-Augmented Generation): A technique that allows AI to search through your specific documents and data to improve accuracy. Instead of relying on training data alone, RAG systems pull relevant information from your files in real-time.
Agent: An AI system that can take multiple steps to complete a task, like researching, analyzing, writing, and sending a report. Agents can use tools, make decisions, and complete complex workflows with minimal human oversight.
Fine-tuning: Training a model on your company's specific data, terminology, or style to improve performance on your particular use cases. This is more expensive but can significantly improve results.
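To make the RAG and grounding definitions above concrete, here is a toy sketch: retrieve the most relevant document, then anchor the prompt to it. Real systems use embeddings and vector search; the keyword-overlap scoring and sample lease snippets here are stand-ins for illustration.

```python
# Toy sketch of retrieval-augmented generation (RAG): find the most
# relevant document, then ground the prompt in it. The keyword-overlap
# retriever and sample leases are illustrative; real systems use
# embeddings and vector search.
documents = {
    "lease_101": "Tenant pays base rent of $42/sq ft with 3% annual escalations.",
    "lease_205": "Landlord covers CAM charges; renewal option at market rate.",
}

def retrieve(question: str) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents.values(),
               key=lambda doc: len(q_words & set(doc.lower().split())))

def build_grounded_prompt(question: str) -> str:
    context = retrieve(question)
    return f"Answer using only this source:\n{context}\n\nQuestion: {question}"

# The model now sees the actual lease clause instead of guessing from
# its training data, which is how grounding reduces hallucinations.
print(build_grounded_prompt("What are the annual rent escalations?"))
```

The key takeaway for a buyer: ask vendors what the model is grounded in and how retrieval is scoped to your own documents and permissions.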
How to Move Ahead With Confidence
AI fluency is now a leadership skill. It's not about technical depth; it's about steering your company with clarity in a fast-moving landscape and choosing AI partners that evolve with your needs.
Ultimately, the decision comes down to building in-house, buying the services of a third-party vendor, or co-building with Stackpoint.
Building internally is a long, expensive road. You need deep expertise in machine learning, systems design, data infrastructure, and organizational change. Even well-funded teams underestimate how hard this is to get right. In fast-moving industries, that delay often costs more than it’s worth.
Buying can work, but only if the product truly solves your needs. There's a huge gap between legacy products with AI features bolted on and platforms built around AI. The former may offer small efficiencies, while the latter can fundamentally change how your team works.
However, if off-the-shelf tools don’t address your pain points, partner with Stackpoint to co-design a product you’ll love. We work with experienced operators to identify overlooked problems and build new, AI-native solutions around them. Let’s connect.