Wayfair: Driving AI Progress, Without Everyone ‘Dropping Their Day Jobs’

By Steven Melendez | July 7, 2023

Wayfair, the Boston-based online furniture retailer, has long used artificial intelligence and machine learning to optimize its online marketing and the way it presents products to site visitors. Now, the company is also researching how generative AI technologies like GPT can assist in customer service, sales, and crafting code. Chief Technology Officer Fiona Tan spoke about Wayfair’s experiences using artificial intelligence and managing human oversight of the technology.

Can you tell me a bit about how AI got started at Wayfair?

Fiona Tan, Chief Technology Officer, Wayfair

Customer acquisition, making the right bids for ads, search engine marketing, and especially product listing ads, was probably the area that was the genesis of the machine learning experience at Wayfair. The way we’ve been thinking about AI and ML is really a framework that sort of says, what data do we have? And how much risk is there? Based on that, we would think about how much we can optimize.

So think about the marketing use case as an example. It’s almost 100 percent automated with AI and machine learning, in terms of what drives the bids in the ad auctions that we participate in. And the reason we were able to go whole hog on that, early on, is that from a risk perspective, if I bid too much, I pay too much. That’s something we feel that the risk was easier to manage and go forward with. We probably started that close to a decade ago.

Some of the other areas that we started investing in and getting good at were more around understanding the products, and making sure that we got the product information right. So if we’re using machine learning data to say, “Hey, this is real wood or this is faux wood,” those were areas that we leaned into. But you still need to have some degree of humans in the loop to make sure that it’s actually correct, because the reputational risk of saying it’s real wood when it’s not, that’s a bit more of a risk.

And marketing, pricing, search, personalization, customer understanding, product understanding are all areas we’ve applied machine learning pretty heavily [to]. We’ve also done some work with computer vision, which sort of augments what we do in machine learning, because a lot of it’s around looking at imagery and being able to…tell if this is the same as something else: one couch versus another couch. Is it the same? Is it similar?

As far as the marketing side, how did you decide it was ready to operate with less human supervision? Were there things you were watching out for?

… You usually employ a solution and check to first see, how good is it? How far off is it? And usually the metrics that we would use when we’re figuring out the models, you usually have an outcome metric: I want to spend this much, and I want to get this particular result, and you see how close the algorithms get to it. And then over time, you build confidence and you go, “OK, I think I can trust the models for that.”

Then, you might go into another area and have A/B tests of fully automated, and also potentially humans in the loop, or a more heuristic model. You can even have a fully human-based model. And you figure out over time, how do they compare, and then you get more comfortable with one versus the other. Because with machine learning—AI in general—there is some issue around explainability. It’s a bit of a black box. So a lot of times as you get more and more data, the model is “making decisions” for you. I think Wayfair, [since the company has] grown up in the digital age, was always much more of a tech-forward company. There was a little bit more openness to the sort of black box nature of machine learning, compared to my previous role at Walmart, for example.

So it depends on the business mindset, and how comfortable people are with a little bit more of a black box.

You were talking about using computer vision to classify furniture. How did that get started and what exactly is it used for?

In the field that we are in with furniture and with couches, you don’t have brands. You get data from suppliers, but some of these are suppliers that are small, in terms of how much information they have, or how technically savvy they are. You get a mishmash of data…

So if you look at it from the computer vision part, it’s trying to ensure that as we’re getting products coming in, we have an ability to figure out how close they are to each other. I don’t have any global codes for these products. It’s not AAA, six-pack Duracell batteries, where I know exactly what it is.

If I get two couches, and they may be coming in from different suppliers, I need to know if they’re exactly the same couch, or if [they’re] similar. When you go to the site and you look for a couch, I want to make sure that you see a variety of couches. If I show you all the couches that are exactly the same, even though they may be from different suppliers, it’s not a great customer experience.

And if A and B are similar, and you live in an area where couch A is a lot closer to you, versus someone else’s case…I can show you couch A, and when [another person does] a search, I can show couch B, and it ends up becoming a win-win. So my ability to understand that these two couches are similar is also useful.
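The same-versus-similar distinction described above can be sketched in a few lines. This is only an illustration, not Wayfair's method: it assumes each product image has already been turned into a numeric embedding vector by some vision model, and the thresholds and the names `classify_pair` and `diversify` are hypothetical.

```python
import math

# Hypothetical cutoffs, chosen only for illustration.
SAME_THRESHOLD = 0.98
SIMILAR_THRESHOLD = 0.85


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_pair(a, b):
    """Label two products as 'same', 'similar', or 'different'."""
    score = cosine_similarity(a, b)
    if score >= SAME_THRESHOLD:
        return "same"
    if score >= SIMILAR_THRESHOLD:
        return "similar"
    return "different"


def diversify(results, embeddings):
    """Keep only the first listing from each group of 'same' products."""
    kept = []
    for item in results:
        if all(classify_pair(embeddings[item], embeddings[k]) != "same"
               for k in kept):
            kept.append(item)
    return kept


# Toy two-dimensional embeddings standing in for real image features.
embeddings = {"couch_a": [1.0, 0.0], "couch_b": [1.0, 0.0], "couch_c": [0.9, 0.4]}
print(diversify(["couch_a", "couch_b", "couch_c"], embeddings))
```

In this toy example, couch_b is dropped as a duplicate of couch_a, while couch_c is kept because it is only similar. A "similar" label could then feed the location-based choice Tan describes.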

You mentioned some cases where it’s still helpful to have humans in the loop. What are some good examples there?

If you’re going to make any decisions about whether something is real wood or faux wood, for example, having humans in the loop to just double-check where there is potential reputational risk. And if you look at some of the generative AI options right now, that’s another area where we want to make sure that we are able to leverage some of the benefits that it gives us.

I’ll give you an example of one of the areas that we’re doing a proof-of-concept, and we’ll probably fast track it. [That] is customer service. That’s an area where it’s pretty well known that it’s probably a good application of the ChatGPT sort of generative text technology. But in that case, we still have the customer service agents in the loop, so it doesn’t go directly to the customer. The technology will come up with potential text, and the customer service agent can then respond back to the customer, but the agent still has the ability to double-check, [and] make sure it sounds good. We train it on our return policies and some of the things that we might do if, say, the customer is asking for a $20 extra discount. So we give it some of that information, but it’s still up to the service agent to correct it, use it, or not use it.

Another one is as a tool for our sales agents. Once again, if the customer is asking questions about products, etc., it can provide some helpful ways to respond. And then sales agents can…make the decision on exactly how they want to respond. Another example is around code generation. In that case, too, you always have the developer in the loop. And assuming that we’re comfortable with the legal aspects of the code generation, which we’re still working through, you still have a developer that can test and make sure that it’s actually okay before it goes out into deployment. So those are the areas where we feel like there’s a lot of promise with generative AI options. We just want to make sure that we have a way to control the deployment.

You mentioned running trials, and that you might fast track the new AI. What does that whole process look like?

We’ve put together a little task force that is looking at all the generative AI technologies. ChatGPT is obviously the most famous of them all, where you’re generating human intelligible text. There’s also image generation, which is a potential interest area. So we’ve put together this task force. It has some of our machine learning data science experts. We have somebody from the legal team involved. Marketing copy is another area where there’s good ability to use generative AI, so we have a cross-functional team that’s…looking at the applications where we believe that would be a really good ROI…

There is such a buzz that everybody and their dog is looking for ways to use it. And sometimes it makes sense. And other times it may not make sense. We sent out messages to our teams as well, just saying we’re doing this and just sort of making sure we’re keeping our teams up to date in terms of where we are, and in terms of getting the OK from a legal perspective to be able to use it. …You want to make sure that you also don’t get everybody so excited about using this tech that you lose sight of all the core technology work that you have to do. We’re in the middle of a big technology transformation; I still need to make sure that the team is working on that.

[Generative AI is] more of a kind of augmentation to our AI journey. It’s not that suddenly everything is generative AI.

How are you going about testing the generative AI technology? Are you using it with actual customers, and actual marketing copy, and so forth?

I think with the coding pieces, we will identify a team that will look at one code generation capability versus another versus not. So making sure we have a way to A/B or A/B/C test.

And then in other cases, when we have humans in the loop, they can obviously provide that level of checks, but, say, for the pilot around customer service, we also do a blind test, where you have for certain cases the customer service agent generating a response, and in other cases, you have the ChatGPT-like capability with the service agent in the middle, and you compare and see how well it’s doing.
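A blind comparison of this kind could be set up roughly as follows. This is a minimal sketch under stated assumptions, not a description of Wayfair's pilot: the arm names, the 1-to-5 rating scale, and the helpers `blind_sample` and `summarize` are all hypothetical.

```python
import random
from statistics import mean


def blind_sample(responses, seed=0):
    """Shuffle (arm, text) pairs and hide the arm from reviewers.

    Returns the blinded items as (item_id, text) plus a private key
    mapping item_id back to the arm that produced it.
    """
    rng = random.Random(seed)
    shuffled = list(responses)
    rng.shuffle(shuffled)
    blinded = [(i, text) for i, (_, text) in enumerate(shuffled)]
    key = {i: arm for i, (arm, _) in enumerate(shuffled)}
    return blinded, key


def summarize(ratings, key):
    """Average the reviewer scores per arm once the key is revealed."""
    by_arm = {}
    for item_id, score in ratings.items():
        by_arm.setdefault(key[item_id], []).append(score)
    return {arm: mean(scores) for arm, scores in by_arm.items()}


# Hypothetical responses: some written by an agent alone, some drafted
# by a generative model and then reviewed by an agent.
responses = [
    ("agent_only", "You can return the item within 30 days."),
    ("ai_assisted", "Happy to help! Returns are accepted within 30 days."),
    ("agent_only", "Let me check on that order for you."),
    ("ai_assisted", "I can look into that order right away."),
]
blinded, key = blind_sample(responses)

# Reviewers would score each blinded item without seeing its arm;
# the scores below are made up purely to show the aggregation step.
ratings = {item_id: 4 for item_id, _ in blinded}
print(summarize(ratings, key))
```

Keeping the key separate from the blinded items is the point of the design: reviewers judge the text alone, and the arms are only compared after scoring is finished.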

Then there’s another use case where it’s going to be more customer-facing. And with that one, I think we’ll test it off the main Wayfair site to a smaller audience. …If you play around with ChatGPT, as an example, it takes a little bit of time to come back with something. If you’re depending on it on a website, where generally speaking site speed is probably the biggest factor in terms of how usable a site is and impact on conversion for an e-commerce site, you make sure that you don’t introduce something that’s fairly novel, but may end up slowing down your site. So I think that’s part of it—making sure that it’s ready for prime time for as big of an audience as we have.

…I think we’re being pretty pragmatic about it, making sure we go as fast as we can without running into issues from an enterprise risk perspective.

Key insights…
• Seeking perfect explainability from AI models will likely slow your ability to roll out AI-enabled offerings.
• Look for a balance between core technology improvement/modernization needs and generative AI experimentation.
• Consider assembling a cross-functional task force to explore different use cases for generative AI, taking into account the potential return on investment and risk.
• Keep a human in the loop for quality control in the early stages of AI testing.