Former Google designer reveals what’s behind AI models like Gemini

  • Google has launched Gemini 2.0, taking it one step closer to creating a universal personal assistant.
  • A former Gemini conversation designer talked about chatbot design best practices.
  • He said Google’s AI products and its search engine have problems with self-cannibalization.

Google launched its Gemini 2.0 model this week, promising more “agentic” AI to bring people a version of a universal personal assistant.

When Google released Gemini 1.0 last December, it wanted to compete with OpenAI’s ChatGPT. Gemini quickly changed the way users experienced Google itself, from providing an overview of search engine results to its NotebookLM product, which can turn written notes into a spoken podcast. Version 2.0 has features such as “Deep Research,” which allows Gemini to search the Internet for information and generate reports.

As AI assistants become more human-like in their execution, the engineers and designers who build them must grapple with questions of responsibility and tone. For example, some AI chatbots may refuse to provide answers on potentially sensitive topics.

Business Insider spoke with Kento Morita, a former Google Gemini conversation designer and Japanese-American actor and comedian.

Morita previously worked on designing conversation flows for Amazon Alexa and Google Gemini, particularly focusing on developing a Japanese persona for the AI. He provided insights into how AI chatbot designers think about efficiently providing information to users and the challenge Google faces in balancing its search engine and AI products.

The following has been edited for length and clarity.

Business Insider: How are “tones” designed for sensitive topics for AI?

Kento Morita: Whenever we get a question that’s potentially sensitive, it goes through a kind of checklist: Is this political? Is this sexual in nature? Does this generate something that is counterfactual, and when? If the answer is yes, it goes through a review process, because ultimately all of these companies have their logo next to the answers they provide. Similar to Warren Buffett’s rule of thumb, we should be happy to see the answer on the front page of The New York Times or The Washington Post the next day, and we should be proud of it.

The most important question we need to answer is: Is it productive for their bottom line to attribute this answer to Google or ChatGPT or anyone?

If it is not, we do what is known as punting. We just say, “Sorry, I can’t help with an answer like that.” It’s a balancing act. There are some issues we don’t want to touch with a ten-foot pole, but there are things we do want to answer, like election night coverage, when everyone will be wondering what’s happening.

We want to ensure that more people stay on our website by answering more questions. There is always a tension in these companies between wanting to answer as many questions as possible, which all LLMs can do, and needing to weigh whether an answer will lead to negative press or provide potentially dangerous information. There are lots of discussions with the legal department, with the marketing team, and with sales. It’s an ongoing conversation about how we want to approach this.

It is always a question of what priorities should be set.

There is also a problem of cannibalization of a market.

One of Google’s biggest products is search. What does deploying Gemini mean for the search business? It is an ongoing existential question.

Compared with companies like Google, companies like Perplexity AI might actually have an advantage here, I would say, because they’re all about building one product and making that product really well. They do not run into self-cannibalization problems. I think there are really interesting and really bold things happening at companies that aren’t affiliated with a big corporation. I think that’s only natural.

Google has moved Gemini under the DeepMind organization. I really don’t know why they did that, but as a (former) employee and also as a person who has been following Google for a long time, I find it interesting that they are consolidating many of their AI groups under one organization, especially in light of the antitrust litigation that is currently going on against Google and the discussion they are having with the Justice Department about whether or not to break up Google. If they do break it up, I think they will at least have a conversation about what kind of split would make sense. And I think it makes perfect sense that Gemini is part of an AI organization and not a search organization.

We are used to using Google search with ads at the top. Now Gemini has arrived. It may not be the most recent result, but it is a change.

The Google Search team is made up of brilliant engineers. Their North Star goal is to provide relevant and accurate search results, and that has always been their goal. And then, enter ads. Then enter the Google Shopping results. Then bring in Gemini. All of these other factors within the organization affect the design of the Google.com website.

I wouldn’t be surprised if many of the engineers and people who have been working on Google Search for the longest time are very frustrated. That being said, I also wouldn’t be surprised if they welcomed the idea of breaking up the company so they can focus on what they love to do, which is providing good search results.

Can you tell me a little about the history of adding footnotes to chatbots and whether that was a conscious decision? How have hallucinations changed how chatbots respond today?

Even with Google Assistant and Amazon Alexa, if you asked a factual question it would immediately say, “According to Wikipedia, blah blah blah,” or “According to XYZ, blah blah blah.” Back then it was quite difficult to convince people that this was a good idea. The reason is that, from a conversational perspective, when people ask, “When was XYZ invented?” they don’t really want to hear that XYZ was invented in 1947, according to Wikipedia. They just want to hear the answer. Getting an answer quickly is considered a virtue of design. Google has put so much time and effort into making the time it takes to display search results as short as possible. So it is in Google’s DNA to get the answer to the customer as quickly as possible.

We had to advocate for footnotes. What really won them over was the idea that the moment you attribute information to a website, you can offload responsibility for the accuracy of that information to that website.

So if I say XYZ according to Wikipedia, I am no longer responsible for whether what I say is correct or not. I can simply shift that responsibility to Wikipedia. And when people started asking tough questions about anti-Semitism or conspiracy theories, being able to say “according to XYZ, this appears to be the case” allows us to distance ourselves from that statement, which is very, very useful when it comes to Google’s brand image.

When you have something called “Google Assistant” and it says this happened, you can’t help but associate Google with what it’s talking about. So this distancing language allows us to take less responsibility for the information presented. I think that ethos has stuck, and that kind of reasoning has been really useful in convincing people in those companies to cite sources. Companies like Perplexity AI actually have more freedom to talk about more controversial topics because they footnote everything so explicitly.

They don’t have to editorialize anything, which is a big advantage, especially when it comes to controversial and sensitive topics.

Explainability is something that is talked about a lot in the LLM field. For many people, LLMs feel like a black box, as if you type in some text and other text gets spat out. But ultimately it is a prediction machine. It was very, very important to add guardrails and editorialize the content design around this black box, which is a prediction engine, especially around sensitive information.

If Google Gemini and other AI cite sources, is it still a prediction machine?

There’s this thing called RAG (retrieval-augmented generation). I think what they are doing is indexing sources like AP News and Reuters higher, to give those sources and the information they contain greater weight. When the LLM gets more information from them, there’s a mapping mechanism in the background that allows them to say, “We’re using RAG to call Reuters or AP News to get their information.” I don’t think that part is prediction. It’s much more hard-coded.

On some topics, such as abortion, AI chatbots take on a caring tone, such as when they ask, “Do you have any concerns?” This is a significant change in tone.

That’s one of the things I’m most proud of. During the development of Google Assistant, we spoke to mental health professionals and people who provide these services and asked them: whenever words about suicide or self-harm came up, if we gave users a hotline number, number one, would that be helpful? And number two, what is the best language for doing that? We were very careful in talking to all of these resources.

I personally spoke to Japanese resources and Japanese hotline providers and we translated these messages. It took a lot of time, but we tried to make sure that every user, including users who are thinking about self-harm, gets the best possible information.

When it comes to abortion, that fits into the same content strategy framework: how do we make sure that people who are looking for information about abortion get it in a way that is safe and that ultimately helps them live the life they want? When I was at Google, we were able to fulfill our mission of collecting the world’s information and making it as useful and accessible as possible for everyone.

Ultimately there will be a democratization of these engines. Every company will have a pretty decent LLM at some point in the next five to 10 years. The difference between going to X or ChatGPT or Google or Alexa or whatever will be in the packaging.

The more these tech companies treat people like people and make robots speak in a human way, the more successful I think they will be in the long run.