AI agent ROI statistics range widely depending on workflow type and deployment scale. McKinsey's 2023 analysis of 63 AI use cases found customer operations delivers 20–45% productivity gains when AI is deployed in production. IBM IBV's 2024 enterprise research found organizations scaling AI across multiple functions achieve 3.5× higher revenue growth than early-stage deployers. Workflow fit and pre-defined success criteria explain most of the variance between the top and bottom of that range.

20–45%. That is McKinsey's estimated productivity improvement for customer operations when AI is deployed in production: not piloted, not tested in isolation, but running against live workflows.[¹] It is one of the highest function-level figures in McKinsey's 2023 analysis of 63 AI use cases. It is also the most cited, most contested, and most misread number in the AI ROI conversation.

The figure describes what AI agents produce when deployed against well-fitted workflows with defined success criteria. It does not describe average outcomes across all deployments. Understanding the difference between the range and the average determines whether AI agent ROI statistics are useful planning inputs — or just headline numbers that look good in a business case.

What the major AI agent ROI studies actually measure

Five research organizations publish AI ROI data with enough methodology detail to be worth citing in a business case. Each measures a different layer of the same deployment pattern.

McKinsey Global Institute, "The economic potential of generative AI" (2023) analyzed 63 AI use cases across industries and modeled the productivity and cost impact of deploying AI across specific business functions. The analysis maps which activities within each function are candidates for AI automation, estimates the proportion of time spent on those activities, and projects the impact assuming deployment succeeds against the right task types.[¹] McKinsey's function-level figures represent modeled potential in well-suited deployments — not averages across all implementations.

IBM Institute for Business Value (2024) surveys enterprise organizations on AI outcomes and segments them by deployment maturity: beginners (proof-of-concept), developers (limited production), scalers (AI running across multiple functions), and transformers (AI integrated into core business processes).[²] IBM IBV's revenue growth differential of 3.5× compares scalers to beginners — it measures the compounding effect of deploying AI across multiple workflows, not the return from a single implementation.

Deloitte "State of Generative AI in the Enterprise" (Q4 2024) surveys organizations on ROI realization. The Q4 2024 edition found that 79% of organizations that had fully scaled at least one GenAI implementation met or exceeded their ROI expectations.[³] Organizations that only ran pilots reported meaningful ROI at substantially lower rates — consistent with the IBM maturity data.

PwC AI Agent Survey (2025) covers 308 US business executives with active AI agent deployments. Among active deployers, 57% report cost savings, 66% report productivity gains, and 55% report faster decision-making.[⁴] These figures represent active deployers — not the full population of organizations that attempted deployment.

Dell'Acqua et al., Harvard Business School/Wharton (2023) ran a controlled experiment with 758 BCG consultants.[⁵] Of the five sources, it is the most methodologically rigorous benchmark for professional services AI: a randomized study, not a survey. The result was 25.1% faster task completion with 40% higher quality output. This is a task-level figure from a controlled environment, not a full-workflow ROI number.

McKinsey's function-level productivity figures describe outcomes when AI is deployed against well-fitted workflows with defined success criteria — not averages across all implementations. IBM IBV's revenue growth differential describes organizations that scaled past a single deployment, not those running one pilot.

| Source | Study | What it measures | Year |
| --- | --- | --- | --- |
| McKinsey Global Institute | Economic potential of generative AI | Modeled productivity impact by function and use case | 2023 |
| IBM Institute for Business Value | CEO's Guide to Generative AI | ROI by deployment maturity stage | 2024 |
| Deloitte | State of Generative AI in the Enterprise | ROI realization rates, enterprise survey | Q4 2024 |
| PwC | AI Agent Survey | Self-reported outcomes, 308 US executives | 2025 |
| Dell'Acqua et al. (HBS/Wharton) | Field experiment | Task-level productivity in knowledge work | 2023 |

AI agent ROI by business function

McKinsey's function-level analysis remains the most granular breakdown of AI ROI by business area in published research. The analysis covers six major functions and provides ranges that reflect deployment quality — well-fitted, production deployments at the upper end, looser deployments at the lower.

Customer operations: 20–45% productivity improvement. Customer operations is among the highest-ROI functions in McKinsey's analysis because it has the most concentrated version of the characteristics that produce high AI returns: high task volume, structured inputs, measurable output, repeated pattern.[¹] For service businesses where client communication is a significant proportion of senior staff time, this function maps directly to the highest-volume workflows: follow-up, inquiry response, document collection, status updates.

Document and content processing: 25–50% time reduction. Document processing — intake, extraction, generation, review — has the same structural characteristics. Inputs are structured (forms, records, contracts). Outputs are defined (a summarized document, a drafted response, a populated CRM record). Agents perform these tasks at comparable accuracy to humans for the high-frequency, low-judgment subset — which is where most of the volume sits.

Software engineering: 25–50% coding task time reduction. GitHub Copilot research found developers in a controlled experiment completed a coding task 55% faster with AI assistance, slightly above the top of McKinsey's modeled range.[⁶] Software engineering was among the first professionally benchmarked domains for AI assistance, which is why the figures are precise. The task structure is well-defined: fixed inputs, defined outputs, clear correctness criteria.

Marketing and sales: 5–15% revenue uplift. Marketing and sales AI delivers smaller productivity percentages but is measured differently — revenue impact rather than task completion speed. A 5–15% revenue uplift from faster lead response, more consistent follow-up, and better personalization is a different return category than productivity gains in operations. For a business generating $1M per year, a 10% revenue uplift is $100,000.
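The revenue arithmetic above can be written as a small helper. This is a minimal sketch: the function name `revenue_uplift_range` and its parameters are illustrative assumptions, and the 5–15% default is simply the range quoted in this section, not a forecast for any particular business.

```python
def revenue_uplift_range(annual_revenue, low_pct=5.0, high_pct=15.0):
    """Return (low, high) added revenue in dollars for a percentage uplift range.

    The default 5-15% range is the marketing and sales figure quoted in the
    article; the function itself is a hypothetical helper.
    """
    return (annual_revenue * low_pct / 100, annual_revenue * high_pct / 100)


# The article's example: a business generating $1M per year.
low, high = revenue_uplift_range(1_000_000)
print(f"5-15% uplift on $1M: ${low:,.0f} to ${high:,.0f}")

# The single 10% case from the text.
ten_pct, _ = revenue_uplift_range(1_000_000, low_pct=10, high_pct=10)
print(f"10% uplift on $1M: ${ten_pct:,.0f}")
```

Reporting the low and high ends together, rather than a single midpoint, keeps the business case honest about the width of the range.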

R&D and product development: 10–15% productivity improvement. Drug discovery, materials research, and engineering are the primary use cases. The percentage is lower than customer operations because AI assists a narrower slice of a more judgment-intensive workflow — and because the measurable output (a compound, a material spec) is harder to attribute to a single tool.

Supply chain management: 3–5% inventory cost reduction. Supply chain AI impact is measured in inventory cost terms. Lower percentage, different metric — reflecting the primary cost category in that function.

[Figure: horizontal bar chart of AI agent impact ranges across six business functions: customer operations 20–45%, document processing 25–50%, software engineering 25–50%, R&D 10–15%, marketing and sales 5–15% revenue uplift, supply chain 3–5% cost reduction. Source: McKinsey Global Institute, 2023.]
Caption: McKinsey's function-level analysis. Customer operations and document processing produce the highest returns for service businesses; both share high volume, structured inputs, and a measurable output.

What IBM IBV found about deployment scale

IBM IBV's 2024 research identified a threshold effect that single-workflow benchmarks miss. Organizations classified as "scalers" — deploying AI across three or more integrated business functions — showed 3.5× higher revenue growth and 5.4× higher profitability improvement compared to organizations at the "beginner" stage.[²]

The compounding effect has a structural explanation. An agent handling client follow-up uses the CRM and email system. An agent handling document collection uses the same CRM. An agent handling status updates uses the same data. Each agent deployed against the same integration infrastructure inherits the connections built for previous agents. The marginal cost of the third agent is lower than the marginal cost of the first — while the combined output grows disproportionately.
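The structural explanation above can be illustrated with a toy cost model. Everything numeric here is a hypothetical assumption chosen for illustration (the cost units, the `document_store` system, the fixed per-agent logic cost); none of it comes from IBM IBV. The sketch only shows the mechanism: an integration is paid for once, so later agents that reuse it cost less to add.

```python
# Hypothetical build costs in arbitrary units. Illustrative only, not IBM IBV data.
INTEGRATION_COST = {"crm": 10, "email": 6, "document_store": 8}
AGENT_LOGIC_COST = 5  # assumed fixed cost of each agent's own workflow logic

# Systems each agent needs, listed in deployment order.
AGENTS = [
    ("client_follow_up", {"crm", "email"}),
    ("document_collection", {"crm", "email", "document_store"}),
    ("status_updates", {"crm", "email"}),
]


def marginal_costs(agents):
    """Return [(agent_name, marginal_cost)], charging each integration only once."""
    built = set()
    costs = []
    for name, needs in agents:
        new_integrations = needs - built  # only connections not yet built
        cost = AGENT_LOGIC_COST + sum(INTEGRATION_COST[s] for s in new_integrations)
        costs.append((name, cost))
        built |= needs
    return costs


for name, cost in marginal_costs(AGENTS):
    print(f"{name}: {cost}")
```

Under these assumed numbers the first agent costs 21 units, the second 13, and the third 5, because the third reuses every connection already built.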

IBM IBV's maturity stages, with the revenue growth profile at each:

| Stage | Definition | Revenue growth vs. beginner |
| --- | --- | --- |
| Beginner | Proof-of-concept, limited testing | Baseline |
| Developer | 1–2 production deployments, narrow scope | 1.5–2× baseline |
| Scaler | AI running across 3+ integrated functions | 3.5× baseline |
| Transformer | AI embedded in core business model | Distinct category |

For service businesses, the practical equivalent of IBM's "scaler" stage is three workflow agents running against a shared integration layer: client communication, document coordination, and reporting or lead management. At that scope, the combined time recovery and efficiency gain is large enough to appear in revenue figures — not because the agents generate revenue directly, but because they redirect senior professional time from coordination overhead to billable work.

For a decision framework on how to sequence multiple deployments and build toward that scope, see running multiple AI agents.

Organizations scaling AI across three or more functions see 3.5× higher revenue growth than those still at the proof-of-concept stage.

Deloitte's enterprise survey: who realizes ROI and why

Deloitte's Q4 2024 "State of Generative AI in the Enterprise" survey identified a consistent pattern in which organizations realize ROI and which do not.[³]

Organizations that had fully deployed at least one GenAI system — past pilot, running in production against live data — reported meeting or exceeding ROI expectations at a rate of 79%. Organizations still at the pilot or exploration stage reported meaningful ROI at substantially lower rates.

The finding aligns with IBM IBV's maturity data and explains the wide variance in self-reported AI ROI across the industry. The variance is not primarily explained by which tool was used, which vendor was engaged, or how large the organization was. Deployment stage explains the gap. Organizations that committed to production (real data, real integrations, active workflows) realized ROI at high rates. Organizations that stayed in controlled tests realized it at substantially lower rates.

Deloitte's secondary finding: organizations that defined measurable ROI criteria before implementation began were significantly more likely to report positive results than those that defined criteria after. This is consistent with a measurement problem, not a performance problem. An agent performing correctly against undefined success criteria looks identical to one that is underperforming — both generate activity, neither generates a verifiable result.

Why AI agent ROI spans a factor of 10

AI agent ROI figures from research range from near-zero (pilots that never reached production) to 5–10× returns (multi-workflow deployments with pre-defined success criteria). Two variables explain most of the spread.

Workflow fit. The McKinsey function-level data describes a specific structural profile for high-ROI deployments: high task volume, structured and repeatable inputs, a measurable output, and low requirement for contextual judgment on each individual task. Customer operations and document processing fit this profile closely. Advisory work, strategic analysis, and client relationship management fit it poorly. Deploying an agent against a poorly fitted workflow produces real but substantially smaller gains, typically in the 5–15% range, well below the 20% lower bound of McKinsey's customer operations figure.

For the framework on identifying which workflows in your business are structurally suited for agent deployment, see which workflows to automate first.

Pre-defined success criteria. Deloitte's finding — that organizations with pre-defined ROI criteria are significantly more likely to report positive results — is a measurement effect and a management effect combined. Measurement: an agent running correctly against undefined success criteria cannot be distinguished from one that is degrading. Management: organizations that define success criteria before deployment make different build decisions — they scope tighter, instrument more, and review outcomes earlier. The agent built when success criteria are defined from day one is a better-scoped agent than the one built without them.
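One way to make "defined before deployment" concrete is to write the criteria down as data before the build starts. The sketch below is a hypothetical illustration: the `SuccessCriterion` class, the metric names, and the baseline and target values are all assumptions, not something prescribed by Deloitte. It shows the measurement point from the paragraph above: a metric with no observation cannot be called a success or a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A single pre-defined target. All fields are illustrative assumptions."""
    metric: str
    baseline: float
    target: float
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


def review(criteria, observations):
    """Map each metric to 'met', 'not met', or 'unmeasured'."""
    results = {}
    for c in criteria:
        if c.metric not in observations:
            # No instrumentation: the agent's performance is unknowable here.
            results[c.metric] = "unmeasured"
        elif c.is_met(observations[c.metric]):
            results[c.metric] = "met"
        else:
            results[c.metric] = "not met"
    return results


# Hypothetical criteria written before go-live.
criteria = [
    SuccessCriterion("median_response_hours", baseline=24.0, target=4.0,
                     higher_is_better=False),
    SuccessCriterion("documents_collected_on_time_pct", baseline=60.0, target=85.0),
]

# Only one metric was instrumented after launch.
observed = {"median_response_hours": 3.5}
print(review(criteria, observed))
```

The "unmeasured" result is the useful one in practice: it flags a gap in instrumentation at the first review rather than months later.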

For a framework on defining measurable success criteria before go-live, see how to know if your AI agent is working.

[Figure: three stat cards. 79%: Deloitte Q4 2024, organizations that fully deployed GenAI met or exceeded ROI expectations. 3.5×: IBM IBV 2024, revenue growth differential for organizations scaling AI across multiple functions. 20–45%: McKinsey 2023, productivity improvement in customer operations with AI deployed in production.]
Caption: Three research organizations, three different measurements, one consistent pattern: production deployment in the right workflow category produces measurable, repeatable returns.

How to use these statistics in a business case

The five research sources provide different inputs to a business case depending on what the case needs to establish.

For projecting impact on a specific workflow: Use McKinsey's function-level figures. If the target workflow is customer operations (follow-up, inquiry response, status updates), the applicable productivity improvement range is 20–45%. If document processing, use 25–50% time reduction. If marketing or sales (lead response, nurture sequences), apply the 5–15% revenue uplift to current inbound revenue.

For building a multi-workflow business case: Use IBM IBV's scale data. The 3.5× revenue growth differential describes the compounding effect of deploying across three or more integrated workflows. The figure is relevant for organizations planning a sequence of two or more agent implementations over 12–18 months.

For assessing whether production deployment is worth the move from piloting: Use the Deloitte finding. The data shows 79% of organizations that committed to production met or exceeded ROI expectations. The risk of production deployment is lower than the continued cost of remaining in pilot mode — where meaningful ROI realization rates are substantially lower.

For defining success metrics: Use PwC's deployment outcomes as the benchmark. Among active deployers, 57% report cost savings and 66% report productivity gains. An implementation plan without a mechanism to measure whether it lands in either category needs one before the build begins.

For the full ROI calculation methodology — adding speed returns, accuracy returns, and time-saved calculations into a single year-1 ROI figure — see how to measure AI agent ROI for a service business.
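As a first-pass input to the workflow projection described above, the function-level ranges can be turned into hours recovered per year. This is a rough sketch under stated assumptions: the function names, the 48 working weeks, and the 30 hours per week are hypothetical, and treating a "productivity improvement" of p as a time saving of p / (1 + p) for the same output is one conservative reading, not McKinsey's own method. The ranges are the ones quoted in this article.

```python
# Ranges quoted in this article (percent), tagged by how each is expressed.
FUNCTION_RANGES = {
    "customer_operations": ("productivity_gain", 20, 45),
    "document_processing": ("time_reduction", 25, 50),
}


def time_saved_fraction(kind, pct):
    """Convert a quoted percentage into a fraction of current hours saved."""
    rate = pct / 100
    if kind == "time_reduction":
        return rate
    # Assumption: the same output at (1 + rate) times the productivity
    # takes 1 / (1 + rate) of the original time.
    return rate / (1 + rate)


def hours_recovered_per_year(function, weekly_hours, weeks_per_year=48):
    """Return (low, high) hours recovered per year for one workflow."""
    kind, low_pct, high_pct = FUNCTION_RANGES[function]
    annual_hours = weekly_hours * weeks_per_year
    return (
        annual_hours * time_saved_fraction(kind, low_pct),
        annual_hours * time_saved_fraction(kind, high_pct),
    )


# Hypothetical input: 30 staff hours per week spent on each workflow.
for fn in FUNCTION_RANGES:
    low, high = hours_recovered_per_year(fn, weekly_hours=30)
    print(f"{fn}: {low:.0f} to {high:.0f} hours per year")
```

Multiplying the resulting hours by a loaded hourly rate gives a dollar range, which is only a starting estimate until the workflow is measured in production.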

Frequently asked questions

What ROI do AI agents deliver on average? AI agents that reach production deliver an average 171% ROI, with U.S. service businesses averaging 192%, according to 2026 research.[⁷] This average masks a wide distribution: organizations deploying against well-fitted workflows with pre-defined success criteria sit significantly above the average. The McKinsey and IBM IBV data are both consistent with production deployment in the right workflow category producing returns well above that mean.

Which business functions produce the highest AI agent ROI? McKinsey's 2023 analysis of 63 AI use cases found customer operations delivers the highest consistent return — 20–45% productivity improvement with AI in production. Document and content processing delivers 25–50% time reduction. Both share high volume, structured inputs, measurable output, and low per-task judgment requirement.

How does AI agent ROI differ between customer operations and marketing? Customer operations delivers 20–45% productivity gains, among the highest of any major function by percentage in McKinsey's analysis. Marketing and sales delivers 5–15% revenue uplift, measured differently as revenue impact. For service businesses, customer operations (client communication, follow-up, document collection) is typically the higher-return function because it has a higher proportion of structured, repeatable tasks.

How does deployment scale affect AI agent ROI? IBM IBV's 2024 research found organizations deploying AI across three or more integrated functions achieve 3.5× higher revenue growth than organizations at the early deployment stage. The compounding effect comes from shared integration infrastructure: each additional workflow agent shares the system connections built for previous agents, reducing marginal build cost while multiplying combined output.

Notes

  1. McKinsey & Company, "The economic potential of generative AI: The next productivity frontier," McKinsey Global Institute, June 2023.
  2. IBM Institute for Business Value, "CEO's guide to generative AI: Scale or stall," IBM IBV, 2024.
  3. Deloitte, "State of Generative AI in the Enterprise Q4 2024," Deloitte Insights, 2024.
  4. PwC, "AI Agent Survey," PwC US, 2025.
  5. Fabrizio Dell'Acqua et al., "Navigating the Jagged Technological Frontier," Harvard Business School Working Paper, 2023.
  6. GitHub, "Research: Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness," GitHub Research, September 2022.
  7. Master of Code, "AI ROI: Why Only 5% of Enterprises See Real Returns in 2026," Master of Code Research, 2026.