Hermes handles customer order queries, delivery follow-ups, and supplier communication for ecommerce operators on Shopify and Gmail. Inbox volume scales with order volume — Hermes breaks that relationship. Skills build from every completed task, improving draft quality on the operator's specific products and customer types the longer Hermes runs.

Peak season or not, the pattern is the same: inbox volume tracks order volume, and both land on the same operator. Orders ship late, customers email, suppliers need chasing — all of it lands in the same inbox used to run the rest of the business. Hermes handles the message layer: drafting responses to order queries, queuing supplier follow-ups, flagging unresolved complaints — each draft surfaced for review before anything sends.

Ecommerce operators handle 30+ customer messages daily

Gorgias's 2023 Ecommerce Customer Service Benchmark reports the average Shopify store handles over 1,000 customer support conversations per month, or roughly 33 per day.[¹] For an operator responding personally at 3–5 minutes per response, that is about one and a half to just under three hours of daily inbox work.

That number tracks order volume directly. A store growing from £300K to £600K annual revenue does not keep its message count flat: it doubles the delivery questions, tracking requests, return queries, and restocking chasers. The work scales with the orders.

At 33 messages per day, responding personally at 3–5 minutes each is 100–165 minutes of daily inbox work — before picking orders, managing suppliers, or handling the rest of the business. At 60 messages per day, the inbox alone consumes half the working day. The message volume is not the exception; it is the baseline of running a growing store.
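The arithmetic behind these figures can be checked directly. The sketch below is illustrative only: the function name `daily_inbox_minutes` is hypothetical, and the 3–5 minute range per response is the one quoted above.

```python
from typing import Tuple


def daily_inbox_minutes(messages_per_day: int,
                        min_minutes: float = 3.0,
                        max_minutes: float = 5.0) -> Tuple[float, float]:
    """Return the (low, high) minutes per day spent replying personally."""
    return messages_per_day * min_minutes, messages_per_day * max_minutes


for volume in (33, 60):
    low, high = daily_inbox_minutes(volume)
    print(f"{volume} messages/day: {low:.0f} to {high:.0f} minutes")
```

At 33 messages this gives 99 to 165 minutes; at 60 messages it gives 180 to 300 minutes, or three to five hours.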

Hiring a customer service assistant transfers the inbox load but adds coordination overhead: handoffs, context-sharing, approval loops before anything sends to a customer. The message volume problem becomes a management problem. Neither version scales cleanly past £1M annual revenue.

Figure (before-and-after diagram of stacked message cards): The same messages. A different role in them: composing vs. reviewing.

How Hermes handles customer order queries and follow-up

Hermes connects to the Shopify store and the operator's Gmail account. For an incoming order query — a delivery question, a return request, a product issue — Hermes reads the relevant order data from Shopify, drafts a response matching the store's policies and tone, and surfaces the draft for review. The draft arrives pre-populated with the order details — the customer's name, order number, last status update, and relevant policy reference — so the operator can approve or adjust in a single read rather than looking up each detail separately.

The draft arrives in Slack or Gmail, depending on which channel the operator uses for approvals. The operator approves, edits, or dismisses. Nothing sends without a sign-off.

Hermes does not auto-reply to customers. It handles the assembly of each response; the decision to send stays with the operator.
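The review step described above amounts to an approval gate: a draft carries the order details, and nothing can be sent until the operator signs off. The following is a minimal sketch of that idea, not Hermes's actual implementation. Every name in it (`Draft`, `ReviewStatus`, `send`, and the field names) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DISMISSED = "dismissed"


@dataclass
class Draft:
    # Order details the draft is pre-populated with (hypothetical field names).
    customer_name: str
    order_number: str
    last_status: str
    policy_reference: str
    body: str
    status: ReviewStatus = ReviewStatus.PENDING

    def approve(self, edited_body: Optional[str] = None) -> None:
        """Operator sign-off; an optional edit replaces the drafted body."""
        if edited_body is not None:
            self.body = edited_body
        self.status = ReviewStatus.APPROVED

    def dismiss(self) -> None:
        """Operator rejects the draft."""
        self.status = ReviewStatus.DISMISSED


def send(draft: Draft, outbox: List[str]) -> None:
    """Append the draft body to the outbox only if the operator approved it."""
    if draft.status is not ReviewStatus.APPROVED:
        raise PermissionError("draft has not been approved by the operator")
    outbox.append(draft.body)


outbox: List[str] = []
draft = Draft("Sam", "#1001", "In transit", "Delivery policy",
              "Hi Sam, order #1001 is in transit.")
try:
    send(draft, outbox)          # still pending, so this is refused
except PermissionError:
    pass
draft.approve(edited_body="Hi Sam, order #1001 is on its way.")
send(draft, outbox)              # approved, so this goes through
```

Putting the check inside `send` rather than in the calling code means a pending or dismissed draft cannot reach a customer by accident.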

For follow-up sequences — the customer who hasn't responded to a shipping update, the order that went out without a confirmation — Hermes queues the follow-up at the configured interval and surfaces it for review. The timing and template are set once; Hermes monitors and drafts against them.

Figure (hub diagram: a single Hermes instance connected to Shopify, Gmail, Slack, and supplier email): One Hermes deployment connects order data, customer communication, and supplier follow-up, with no separate setup per channel.

How Hermes handles supplier and fulfilment follow-up

Supplier communication is less frequent than customer queries but higher-stakes. A restock confirmation missing for 48 hours has inventory implications. An order past its estimated ship date needs chasing before the customer notices.

Hermes watches for supplier communication gaps: messages awaiting a reply past a defined window, restock confirmations due but not received, fulfilment statuses that haven't updated in the expected timeframe. When a gap appears, Hermes drafts the follow-up and queues it for review.
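One of these checks, a message awaiting a reply past a defined window, can be sketched as a simple comparison of timestamps. This is an illustration under stated assumptions rather than Hermes's own logic: `SupplierThread` and `overdue_threads` are hypothetical names, and the 48-hour default is borrowed from the restock example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SupplierThread:
    supplier: str
    subject: str
    last_outbound: datetime            # last message the store sent
    last_inbound: Optional[datetime]   # last supplier reply, if any


def overdue_threads(threads: List[SupplierThread],
                    now: datetime,
                    reply_window: timedelta = timedelta(hours=48)) -> List[SupplierThread]:
    """Return threads still awaiting a supplier reply past the window."""
    overdue = []
    for thread in threads:
        awaiting_reply = (thread.last_inbound is None
                          or thread.last_inbound < thread.last_outbound)
        if awaiting_reply and now - thread.last_outbound > reply_window:
            overdue.append(thread)
    return overdue


now = datetime(2024, 3, 4, 9, 0)
threads = [
    SupplierThread("Supplier A", "Restock confirmation",
                   last_outbound=now - timedelta(hours=60), last_inbound=None),
    SupplierThread("Supplier B", "Box order",
                   last_outbound=now - timedelta(hours=60),
                   last_inbound=now - timedelta(hours=50)),
    SupplierThread("Supplier C", "Ship date",
                   last_outbound=now - timedelta(hours=12), last_inbound=None),
]
for thread in overdue_threads(threads, now):
    print(f"Queue follow-up draft for {thread.supplier}: {thread.subject}")
```

Only Supplier A is flagged here: Supplier B replied after the store's last message, and Supplier C is still inside the window.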

The draft surfaces in Slack or Gmail with the relevant order or stock context attached — the operator reviews and approves in one read. Hermes does not require the operator to track which supplier needed chasing or when.

The table below shows the message types a Hermes ecommerce deployment handles and the time comparison per response.

| Message type | What Hermes does | Before Hermes | After Hermes |
| --- | --- | --- | --- |
| Delivery query | Reads Shopify order data; drafts response matching store policies | 5–10 min per response | 30 sec to review and approve |
| Return or refund request | Reads order and return policy; drafts response per store rules | 5–10 min | 30–60 sec to approve |
| Order confirmation follow-up | Queues at configured interval if no confirmation; surfaces for approval | Often missed | Automated queue; 30 sec review |
| Supplier restock chase | Monitors restock windows; drafts follow-up when overdue | Ad hoc, often forgotten | Systematic; 1 min to review |
| Shipping delay notification | Reads delay data from fulfilment; drafts proactive customer update | Often reactive | Proactive; 30 sec approval |
| Product issue response | Drafts response based on product category and issue type | 10–15 min | 1–2 min to edit specifics |

Order volume doubles. The time spent on customer messages doesn't have to.

How Skills build from ecommerce-specific patterns over time

Each completed task — an approved customer response, a sent supplier follow-up — adds a Skill object to Hermes's library. The Skill records the task type, the inputs, the approach used, and the outcome. On the next similar task, Hermes applies the Skill.

For an ecommerce operator, Skills build against the store's specific products, complaint patterns, and supplier relationships. In week one, Hermes handles return requests at a baseline level — structurally correct, needing edits for tone and product-specific detail. By month three, Skills built from 80–100 real completed responses reflect the store's actual return policy language, preferred tone for delivery complaints, and supplier communication style.

The draft quality in month three is not the same as week one. Hermes improves on what the operator actually sends — not on generic customer service templates.

Skills also encode negative examples. Every response the operator dismisses or significantly edits updates the Skill with what not to do — a response that was too formal for the store's tone, a resolution that exceeded the return policy, a supplier chase that used incorrect product terminology. Both approved responses and dismissed ones contribute to the Skill. The operator's corrections in week one are not wasted time; they are the most impactful input the Skill receives.
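A rough way to picture this is a record that keeps both kinds of example per task type. The sketch below is a simplified assumption, not the real Skill format. The names `Skill`, `record`, `positive_examples`, and `negative_examples` are hypothetical; it treats any edit as significant and leaves out the inputs and approach a Skill is described as recording.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Skill:
    task_type: str
    positive_examples: List[str] = field(default_factory=list)  # what the operator sent
    negative_examples: List[str] = field(default_factory=list)  # drafts rejected or rewritten

    def record(self, drafted: str, sent: Optional[str]) -> None:
        """Store the outcome of one reviewed draft.

        sent is None when the draft was dismissed, equal to drafted when it
        was approved unchanged, and different when the operator edited it.
        """
        if sent is None:
            self.negative_examples.append(drafted)
        elif sent == drafted:
            self.positive_examples.append(sent)
        else:
            # An edit yields a pair: the draft to avoid and the version to follow.
            self.negative_examples.append(drafted)
            self.positive_examples.append(sent)


skill = Skill("return_request")
skill.record("Dear Sir or Madam, ...", sent="Hi Jo, ...")        # edited
skill.record("Hi Jo, your refund is on its way.",
             sent="Hi Jo, your refund is on its way.")            # approved as drafted
skill.record("We can refund outside the return window.", sent=None)  # dismissed
```

In this sketch an edited draft contributes twice, which is one way to read the point that early corrections carry the most information.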

Figure (performance curve: Hermes task accuracy increasing from week two through month three): Skills accumulate from every completed customer and supplier task. Edge case handling improves through months two and three.

What a Hermes ecommerce deployment covers on day one

A Hermes ecommerce deployment starts with three connections: the Shopify store (for order data and fulfilment status), the operator's Gmail account (for customer and supplier communication), and Slack (for the approval workflow). These are configured before the first workflow goes live.

On day one, Hermes handles configured workflows at a baseline level. A return request draft is structurally correct but needs editing for tone and the operator's specific return window. A supplier follow-up is in the right format but needs review before sending. The output is usable from day one. Skills start building from every completed task.

By month two, Hermes handles the store's most common query types without editing. By month three, edge cases — partial orders, international shipping queries, product-specific complaint patterns — are handled more precisely. The operator's role shifts from composing to reviewing.

What the first month looks like: the operator reviews every draft in weeks one and two and approves, edits, or dismisses each one. The correction rate is at its highest in this period, but every correction feeds the Skill. By week three, common query types (delivery questions, standard return requests) start requiring minimal edits. By the end of month one, the operator's review time for the most common message types drops from 3 minutes per response to 30 seconds. Month two extends this to a wider set of query types. Month three covers most of the store's real query variation, built from real completed responses, not templates.

The practical output of a Hermes deployment for an ecommerce operator is not a different inbox — it is the same inbox handled in 30–40 minutes of review time per day instead of 2–3 hours of composition time. The operator does not need more hours; they need the hours spent on what drives revenue, not what responds to it.

For a full overview of Hermes and how it works, see what is Hermes. For running Hermes across multiple platforms from a single deployment, see the Hermes setup guide.

What to get right in context definition for an ecommerce deployment

Context definition is where the quality of Hermes's output is determined before the first real customer message arrives. For an ecommerce operator, context definition requires four specific inputs per workflow type.

Return and refund policy language. Hermes needs the exact wording the store uses for return windows, refund timelines, and exceptions. Not a summary — the actual policy language used in customer communication. A Skill built from policy language that matches the store's actual communications will produce responses that don't contradict what customers have already been told.

Complaint escalation threshold. Which complaint types get a standard response and which ones get escalated to the operator for personal handling? A single missing item gets the standard resolution. An order that has been lost three times requires operator judgment. Writing the threshold into the context definition means the operator does not have to make that call afresh at review time.

Tone examples per context. The tone for a routine delivery query is different from the tone for a complaint escalation. Five examples of each — real responses the operator has sent and is satisfied with — calibrate the Skill faster than any description of the desired tone.

Supplier communication templates. Supplier communication is lower-volume but higher-consequence. A restock chase that gets the product name wrong or uses the wrong order reference creates supplier confusion. Real examples of successful supplier follow-ups are the most useful input.
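The four inputs above can be pictured as a single configuration record with a completeness check. This is a sketch under assumptions, not a documented Hermes format. `ContextDefinition`, its field names, and the tone-context keys are hypothetical, the policy text is a placeholder, and the values 5 and 3 come from the examples in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextDefinition:
    return_policy_text: str        # exact customer-facing wording, not a summary
    escalation_threshold: int      # incidents on one order before escalation
    tone_examples: Dict[str, List[str]] = field(default_factory=dict)
    supplier_examples: List[str] = field(default_factory=list)

    def missing_inputs(self, examples_per_tone: int = 5) -> List[str]:
        """List the inputs that still need filling in before go-live."""
        missing = []
        if not self.return_policy_text.strip():
            missing.append("return_policy_text")
        for context in ("routine_delivery_query", "complaint_escalation"):
            if len(self.tone_examples.get(context, [])) < examples_per_tone:
                missing.append(f"tone_examples[{context}]")
        if not self.supplier_examples:
            missing.append("supplier_examples")
        return missing

    def needs_operator(self, incidents_on_order: int) -> bool:
        """True when a complaint should go to the operator for personal handling."""
        return incidents_on_order >= self.escalation_threshold


context = ContextDefinition(
    return_policy_text="(placeholder) exact return policy wording goes here",
    escalation_threshold=3,
    tone_examples={"routine_delivery_query": ["example reply"] * 5},
)
print(context.missing_inputs())
print(context.needs_operator(1), context.needs_operator(3))
```

With this example data, the check reports the complaint-escalation tone examples and the supplier examples as still missing, and only the order with three incidents is routed to the operator.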

What Hermes does not handle for an ecommerce operator

Returns and refund approvals. Hermes drafts the response to a return request. Whether to approve the return — especially for edge cases — stays with the operator. Hermes can surface the case with relevant context; the approval decision is human.

Carrier and logistics disputes. A missing order that requires filing a claim with a carrier, or a dispute about liability between the store and its fulfilment partner, requires operator judgment. Hermes can draft the communication once the decision is made.

Customs and international compliance. Customer queries about customs charges, import duties, or international shipping restrictions involve variables Hermes cannot reliably evaluate without operator input.

New supplier relationships. An initial outreach to a supplier the store has not worked with before requires the operator's judgment about how to position the relationship. Once the relationship is established and the operator has communicated with the supplier several times, Hermes builds a Skill from those interactions and handles routine supplier follow-ups from the established pattern.

How Hermes compares to other tools ecommerce operators evaluate

Ecommerce operators evaluating Hermes typically come from three directions: a dedicated support platform like Gorgias or Zendesk, a virtual assistant arrangement, or manual inbox management. Gorgias and Zendesk handle ticket routing and macro-based responses; they require the operator to write response templates for every scenario upfront and do not adapt from completed interactions. A VA produces context-aware responses but at per-hour cost and capacity limits that scale with order volume. Hermes differs on the adaptation layer: rather than applying pre-written macros, it drafts responses by reading the actual order data in Shopify and builds Skills from completed interactions, so response quality on a store's specific products and complaint patterns improves over time without requiring manual template maintenance.

Frequently asked questions

How does Hermes help ecommerce operators manage customer messages?

Hermes connects to Shopify and Gmail, reads order data for incoming queries, and drafts responses matching the store's policies and tone. Each draft surfaces for operator approval before sending. Skills build from every completed response, improving accuracy on the store's specific products, complaint types, and customer patterns over time.

Does Hermes send customer replies automatically?

No. Every draft Hermes produces surfaces for human review before sending. The operator approves, edits, or dismisses each response. Nothing sends without a sign-off.

Can Hermes handle both customer communication and supplier follow-up?

Yes. Hermes watches for customer order queries and supplier communication gaps simultaneously, drafting responses and follow-ups for each type. Both draft types surface for operator review before anything sends.

How long before Hermes improves on a store's specific communication patterns?

Common query types (delivery questions, return requests, tracking updates) are typically handled accurately within the first two to four weeks. Edge cases specific to the store's product range and supplier relationships improve through months two and three as Skills accumulate from real completed tasks.

Does Hermes handle messages across multiple storefronts or brands?

Yes, from a single deployment. A Hermes instance can be configured with separate context definitions per storefront or brand (different tone, different return policies, different supplier contacts) and handles each storefront according to the relevant context. The Skills stay separate per configuration, so a return policy Skill for Brand A does not bleed into Brand B's responses.

What happens when a customer sends a message that Hermes cannot handle correctly?

Hermes drafts the best available response and routes it for operator review with relevant context attached. If the operator decides the draft does not fit, they can dismiss it and compose manually. That manual response then becomes an example pair for the Skill, so the next similar message is handled better. Because nothing sends without a sign-off, an incorrect draft does not reach the customer unreviewed; uncertain cases are surfaced for human resolution.

Notes

  1. Gorgias, Ecommerce Customer Service Benchmark Report, 2023. https://www.gorgias.com/blog/ecommerce-customer-service-benchmark

For a full guide to deploying Hermes and writing the context definitions that set Skill quality from day one, see the Hermes setup guide. For a full explanation of how Hermes builds Skills from completed tasks and why context definition quality determines long-term output, see how Hermes learns. For a broader view of what AI agents can and cannot handle, see what AI agents are actually bad at.