2025-7-23

How Mercari Built Socrates, Its Gen AI Analysis Tool, in Just One Month

On April 18, 2025, Mercari released a new internal data analysis tool that leverages generative AI. The tool is called Socrates. It gained significant popularity within the company soon after release, assisting an average of 500 employees per week. Remarkably, it took only one month from conception to launch.

In this article, Mercari’s AI Implementation Officer Gomi Hayakawa (@gomichan) facilitates a discussion with the members who led the project about the background behind the development of Socrates and the potential it holds. Joining from the project side are Business Intelligence (BI) Product Team manager Kentaro Kobayashi (@kobaken), engineer Naofumi Yamada (@na0), and Shuichi Iida (@Shuichi), who directed the entire data product.

This talk also touches upon broader topics related to development and organizational structure.

Featured in this article

Kentaro Kobayashi

After graduating from university, Kentaro began his career in 2019 and worked on data management for domestic and international services. He then joined Mercari in 2023. His previous experience at Mercari includes improving the infrastructure for A/B testing and leading the Basic Tables project. He is currently focused on making the data organization more AI-Native. He has a master’s degree from the Graduate School of Economics, The University of Tokyo.

Naofumi Yamada

Naofumi joined Mercari in 2022 after gaining experience as a software engineer, analyst, and data engineer at three other companies. At Mercari, he has worked on migrating outdated data and developing Basic Tables. He currently oversees the development of Mercari’s AI agent Socrates.

Shuichi Iida

Shuichi worked at DeNA and DeNA San Francisco before joining Mercari in 2014. After participating in the launch of Mercari in the US at Mercari US, he returned to Japan in 2020 and transferred to Mercari Japan. He currently leads the entire data product team. He has a Ph.D. from the Graduate School of Mathematical Sciences, The University of Tokyo. Shuichi spent a total of 7.5 years in Silicon Valley.

Autonomously analyzing data and deriving hypotheses—the power of agentic AI

—Congratulations on releasing Socrates! Could you give us a rundown of its features?

@kobaken: Sure. Socrates is a tool that analyzes Mercari’s internal data with a conversation-style interface. Even without specialized knowledge of SQL or Mercari-specific data, members can easily visualize data, identify and explore trends, derive hypotheses, and create analysis reports, all within a single tool.

—We have a few other internal analysis tools at Mercari. What would you say are the differences between those tools and Socrates?

@na0: Previous internal tools were created based on the idea of eliciting a single response from an LLM. If you ask one of these tools to write an SQL query to calculate Mercari’s monthly user count, it will complete the task and respond in a single step.

On the other hand, Socrates has been implemented as an AI agent, meaning that it has much broader search capabilities than other tools. It doesn’t just give one answer in one step—it starts an investigation based on the user’s understanding of the issue, collects data, suggests hypotheses, and conducts further investigations based on those hypotheses.
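The contrast @na0 describes can be illustrated with a minimal Python sketch. This is not Socrates' actual implementation: `run_query`, `single_shot`, `run_agent`, and `scripted_planner` are hypothetical names, the table names are invented, and the planner is a scripted stand-in for what would be an LLM call in a real system.

```python
from typing import Callable

# An action is either {"type": "query", "hypothesis": ..., "sql": ...}
# or {"type": "finish", "report": ...}. This shape is an assumption.
Action = dict


def run_query(sql: str) -> list[dict]:
    """Hypothetical stand-in for a warehouse call; returns canned rows."""
    return [{"month": "2025-03", "users": 100}, {"month": "2025-04", "users": 80}]


def single_shot(question: str, llm: Callable[[str], str]) -> str:
    """Single-response style: one request in, one answer out."""
    return llm(question)


def run_agent(question: str,
              plan_next: Callable[[str, list[dict]], Action],
              max_steps: int = 5) -> tuple[str, list[dict]]:
    """Agent style: pick a hypothesis, query, feed the result back, repeat."""
    history: list[dict] = []
    for _ in range(max_steps):
        action = plan_next(question, history)
        if action["type"] == "finish":
            return action["report"], history
        rows = run_query(action["sql"])
        history.append({"hypothesis": action["hypothesis"],
                        "sql": action["sql"],
                        "rows": rows})
    return "Step limit reached without a conclusion.", history


def scripted_planner(question: str, history: list[dict]) -> Action:
    """Stands in for the LLM: two investigation steps, then a report."""
    steps = [
        ("Overall usage changed",
         "SELECT month, users FROM monthly_users"),
        ("The change is concentrated in one segment",
         "SELECT segment, users FROM monthly_users_by_segment"),
    ]
    if len(history) < len(steps):
        hypothesis, sql = steps[len(history)]
        return {"type": "query", "hypothesis": hypothesis, "sql": sql}
    return {"type": "finish", "report": f"Investigated {len(history)} hypotheses."}


report, history = run_agent("Why did monthly users change?", scripted_planner)
print(report)  # Investigated 2 hypotheses.
```

The `max_steps` cap is one example of a minimal control mechanism: it bounds the loop without dictating which hypotheses the agent may pursue.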

—So rather than being an AI that just completes specific tasks, Socrates thinks more like an autonomous agent that can solve problems on its own. What were the key technological considerations when developing this agentic approach?

@na0: We give Socrates broadly scoped goal prompts and tools, while setting only the minimum guard rails (control mechanisms) necessary. We continuously analyze usage data to improve Socrates’ prompts and guard rails, which increases the chances that users will find the answers they’re looking for.

—So, you didn’t want to overly control hallucinations, and instead prioritized its ability to freely analyze and derive hypotheses.

@na0: Yes. Trying to prevent all hallucinations—and there are many different types—is likely to limit Socrates’ scope of thought. In the early stages of development, we focused more on finding hallucinations than controlling them. We did this to prompt users to act when a hallucination occurs. Socrates’ user interface (UI) allows users to see and verify all executed queries and processes. So, for instance, if Socrates says that it has executed a query but it actually hasn’t, both the user and a reviewer can see this on the UI.

—Oftentimes, strong guard rails can prevent an LLM from generating responses that go beyond human expectations. For Socrates, you seem to have designed it in such a way that accurately captures the limitations and potential of LLMs to maximize what Socrates can do.
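The idea of surfacing hallucinations rather than suppressing them, by letting users compare what the agent claims against what was actually executed, can be sketched as follows. This is an assumption-laden illustration, not Mercari's code: `ExecutionLog`, `normalize`, and `unverified_claims` are hypothetical names, and the string comparison is deliberately rough.

```python
from dataclasses import dataclass, field


def normalize(sql: str) -> str:
    """Collapse whitespace and lowercase the text for a rough comparison.

    Note: this also lowercases string literals, so it is only suitable
    for matching, not for re-executing the query.
    """
    return " ".join(sql.split()).lower()


@dataclass
class ExecutionLog:
    """Records every query that was actually run, in order."""
    executed: list[str] = field(default_factory=list)

    def run(self, sql: str) -> None:
        # A real tool would send the SQL to the warehouse here.
        self.executed.append(normalize(sql))


def unverified_claims(claimed: list[str], log: ExecutionLog) -> list[str]:
    """Return queries the model says it ran that do not appear in the log."""
    ran = set(log.executed)
    return [sql for sql in claimed if normalize(sql) not in ran]


log = ExecutionLog()
log.run("SELECT COUNT(*) FROM orders")

claims = ["select count(*)  from orders", "SELECT COUNT(*) FROM refunds"]
print(unverified_claims(claims, log))  # ['SELECT COUNT(*) FROM refunds']
```

Showing the log next to the answer leaves the final judgment with the user or reviewer, which matches the approach of prompting people to act when a hallucination occurs.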

Building data infrastructure and enabling rapid project kick-off with an agile team

—Next, could you tell me about the background behind development? Why did you decide to release Socrates at this particular time?

@kobaken: A new value, “Move Fast,” was added to Mercari’s values last year, which made us more conscious of the speed of our decision-making and implementation of those decisions. To act fast, data plays an essential role in providing quantitative support. However, complex data analysis requires SQL skills and expertise in analysis design. Since we needed to rely on data analysts for some complex analysis, I was concerned that decision-making could slow down when the data analysis team’s bandwidth was limited.

On our BI Product Team, we started experimenting with different approaches to address these challenges last year. One notable example is generating queries using LLMs, but accuracy issues prevented this from being a practical solution. As we were experimenting with options, the LLMs we were using were updated and started showing signs of being able to generate queries of a decent quality, provided that we simultaneously worked on organizing our metadata. Thanks to this technological advancement, we started full-scale development in mid-March.

—So, the technology had to catch up to the idea. How long did it take from development to launch?

@kobaken: It took around one month. We first started discussing development at the start of March. By the middle of March, we had released a prototype and begun gathering feedback from members. We made improvements based on that feedback, and released Socrates in mid-April.

—How were you able to launch so quickly?

@Shuichi: One reason was that, while the BI Product Team belongs to the Product Division, we were able to operate independently like a task force. Our team can handle planning, design, development, and internal marketing, so there was no need to collaborate across teams, and development could progress entirely within the team.

@na0: Before the kickoff meeting with @Shuichi, who approached us about developing Socrates, I put together a web-based mockup. This mockup gave the meeting participants a feel for the general behavior and accuracy of Socrates on their own computers, and I think that encouraged the decision to start development. I actually made the mockup just two days after hearing the proposal. One of the main reasons I was able to produce the mockup so quickly was that the LLM technology had evolved and we had more mature frameworks for developing agents.

—Shuichi, what did you think of @na0’s mockup?

@Shuichi: I had already been experimenting with building analytical tools using LLMs, but the accuracy was not yet at a technically feasible level. When I used the mockup in the meeting, I could see that the accuracy had indeed improved. That convinced me to move ahead with the project right away.

@kobaken: Another major factor for starting development so quickly was that, even before the conception of Socrates, we had made progress in organizing Mercari’s data. At Mercari, we use something called “Basic Tables,” which organize high-quality intermediate table groups to streamline analysis related to the Marketplace business. We deliberately limited the data that Socrates can reference to those tables to achieve high-quality responses from early on.
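Restricting an agent to a curated set of tables could look something like the sketch below. It is an assumption, not a description of how Socrates enforces the limit: the table names are invented, `referenced_tables` and `disallowed_tables` are hypothetical helpers, and the regular expression is a naive approximation that a production system would replace with a proper SQL parser (for example, it would misread `EXTRACT(YEAR FROM created_at)` as a table reference).

```python
import re

# Hypothetical allowlist standing in for the curated Basic Tables.
ALLOWED_TABLES = {"basic.users", "basic.orders"}

# Naive pattern: the identifier that follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+`?([\w.\-]+)`?", re.IGNORECASE)


def referenced_tables(sql: str) -> set[str]:
    """Return the lowercased table names that appear after FROM or JOIN."""
    return {name.lower() for name in TABLE_REF.findall(sql)}


def disallowed_tables(sql: str) -> set[str]:
    """Return referenced tables that are not on the allowlist."""
    return referenced_tables(sql) - ALLOWED_TABLES


sql = "SELECT * FROM basic.users u JOIN raw.events e ON u.id = e.user_id"
print(disallowed_tables(sql))  # {'raw.events'}
```

A check like this would run before a generated query is executed, so a query that reaches beyond the curated tables can be rejected or sent back to the model for revision.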

—So, past efforts proved a valuable asset. @na0, from your perspective as a developer, what do you think helped this project progress so quickly?

@na0: I think the fact that the vision for Socrates was clearly articulated from the start helped the project members share a common understanding of what was required.

I created the mockup so quickly because I was driven by the desire to first turn our dream into reality. Rather than having discussions to make the dream bigger, we prepared a minimum viable product and asked other members to use it to gather feedback. Doing things this way made it easier for us to see exactly what we needed to improve. I believe that focusing on building something that works first is very important for lean (efficient and well-balanced) AI development.

Afterward, we received feedback from analysts and made improvements, gradually expanding the release scope with each improvement. This allowed us to grow the user base, extend functionality, and enter a positive feedback loop. Many people had high hopes for the tool, and that also spurred us along.

—I’m curious about the reason for asking for feedback on prompts to improve accuracy. Some people believe that as LLMs become smarter, prompt engineering will gradually become less important. Personally, I think that prompts are a critical component to get the LLM to generate responses that go beyond human expectations. What is your opinion on prompt engineering?

@na0: We analyze Socrates’ usage data and regularly improve its prompts and tools based on issues it was unable to solve. I think prompts are still an effective way to improve the accuracy of LLM responses. While it’s true that technical prompt engineering has become less important, it remains crucial to help LLMs understand context-based actions.

One specific challenge we faced was teaching Socrates internal terminology. We tried two different approaches: simply adding the terms to the prompt, and implementing a glossary tool to teach Socrates when the terms should be used. For terms that have the same meaning regardless of division or team, we included those in the initial prompt. For terms where meaning varies depending on division, we needed Socrates to use the glossary tool rather than the prompt. So, I feel this shows how the design of the entire system—prompts included—is becoming more important.

@kobaken: This was especially true during the period leading up to release. We made a conscious decision to focus on implementing core features first, planning to establish operational rules and guard rails later based on actual usage. So, in that sense as well, our approach of driving development through prompt refinement was a good fit for this project.
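The split @na0 describes, with company-wide terms in the initial prompt and division-dependent terms behind a glossary tool, can be sketched roughly as follows. Every term, definition, and function name here is invented for illustration and does not reflect Mercari's actual glossary or tooling.

```python
# All terms and definitions below are invented for illustration.
GLOBAL_TERMS = {
    "GMV": "Total value of completed transactions in a period.",
}

# division -> term -> meaning; the same term may differ by division.
DIVISION_GLOSSARY = {
    "marketing": {"active user": "A user who opened the app in the period."},
    "product": {"active user": "A user who listed or bought an item in the period."},
}


def build_system_prompt(base_prompt: str) -> str:
    """Append company-wide terms, whose meaning never varies, to the prompt."""
    lines = [f"- {term}: {meaning}" for term, meaning in sorted(GLOBAL_TERMS.items())]
    return base_prompt + "\n\nGlossary (same meaning company-wide):\n" + "\n".join(lines)


def lookup_term(term: str, division: str) -> str:
    """Tool the agent calls when a term's meaning depends on the division."""
    meaning = DIVISION_GLOSSARY.get(division.lower(), {}).get(term.lower())
    if meaning is None:
        return f"No definition of '{term}' for division '{division}'; ask the user."
    return meaning


print(build_system_prompt("You are a data analysis assistant."))
print(lookup_term("Active User", "Marketing"))
print(lookup_term("Active User", "Product"))
```

Keeping ambiguous terms out of the prompt means the model has to resolve them in context by calling the tool, instead of silently picking one division's definition.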

Devising an internal adoption strategy like a real product launch

—I heard that Socrates received significant feedback after its launch. How many members currently use Socrates?

@kobaken: Socrates is currently used by around 500 members per week. Other company-wide dashboards are used by around 100 people per week, so Socrates has gained traction beyond our expectations. What’s really interesting is that we’re not just seeing high user numbers—the company is actually experiencing some unique benefits from AI-powered tools. For instance, one user told us they normally hesitate to ask data analysts questions, but they feel completely comfortable asking Socrates anything.

—I think many members struggle with promoting new internal tools. What do you think contributed to such widespread adoption among employees?

@kobaken: A major factor was clearly stating what we wanted Socrates to achieve in a centralized document from the conceptual stage, and sharing that information with the team and relevant stakeholders to gather everyone’s opinions. We received both positive and negative feedback, and that was crucial for determining exactly which functions Socrates needed. Another significant factor was making an internal announcement at the time of release and collecting logs from users who tried using Socrates. In the beginning, Socrates only had core functions. We received feedback wishing that it could do more, which led us to develop and release frequent feature additions and improvements. This steadily grew our user base and gained us more “fans” who were surprised at how much it could do.

@Shuichi: After release, we kept track of how people were using Socrates each week and pinpointed some “heavy users.” Currently, we gather specific feedback from them and have them try out new features first to make improvements. We are also considering how to identify key individuals, such as executives, who can drive adoption, and thinking about how best to encourage them to use it.

—So, treating it like an external product launch, even though it’s an internal tool, really helped drive quick adoption.

The key to data organization lies in providing metadata and designing tables

—Moving away from Socrates now, I’d like to learn more about development and operation of the organization as a whole. First, let’s talk about data organization. You mentioned that Basic Tables are input into Socrates. Does organizing data for AI differ from organizing it for humans?

@kobaken: In order to have an AI create accurate queries, you have to provide metadata for the data. Previously, the primary users of our tools were data analysts or those with a certain level of familiarity with data, so we only needed to provide the metadata for the Basic Tables in a spreadsheet. However, with Socrates, employees who are not as familiar with data specifications also reference this data. Therefore, it becomes important to make the data more LLM-friendly, such as by adding column descriptions in BigQuery, to ensure the LLM can consistently generate high-quality output.
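To illustrate why column descriptions matter to an LLM, the sketch below renders a table's schema as prompt context and flags columns that still lack a description. It uses plain Python objects in place of a real BigQuery schema; the `Column` class, both helper functions, and the example table and columns are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    type: str
    description: str = ""


def render_table_context(table: str, columns: list[Column]) -> str:
    """Render schema plus descriptions as text to include in an LLM prompt."""
    lines = [f"Table {table}:"]
    for col in columns:
        note = col.description.strip() or "(no description)"
        lines.append(f"  {col.name} {col.type}: {note}")
    return "\n".join(lines)


def missing_descriptions(columns: list[Column]) -> list[str]:
    """List columns the LLM would have to guess the meaning of."""
    return [col.name for col in columns if not col.description.strip()]


# Hypothetical table and columns, not Mercari's actual schema.
columns = [
    Column("order_id", "STRING", "Unique ID of a completed purchase."),
    Column("amt", "INT64"),
]
print(render_table_context("basic.orders", columns))
print(missing_descriptions(columns))  # ['amt']
```

A report like `missing_descriptions` gives the data team a concrete backlog: a column such as `amt` is readable to an analyst who knows the table, but leaves the model to guess the unit and meaning.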

—Do you plan to make Basic Tables more AI-friendly moving forward?

@na0: We don’t intend to create data that is optimized for Socrates to the point of compromising usability. However, we may prepare an interface specifically for AI, which would include Socrates. We aim to achieve a good balance that maintains ease of use for both us and the end user.

—Many companies are now exploring ways of organizing data that can empower both humans and AI. I look forward to seeing what we can come up with that is uniquely Mercari.

Socrates as the launch pad for AI-default operations, organizations, and workstyle

—What’s next for Socrates, and what challenges do you foresee?

@kobaken: We want Socrates to do more than simple analysis based on accumulated product data. We want to make it able to analyze the events behind the data, and even come up with hypotheses by itself. On a more technical level, we want it to be able to analyze data while also taking into account information like business metadata, which includes product feature releases and campaign calendars, so that it can support a broader range of inquiries.

We also want to expand its user base. Our current users are primarily product managers and data analysts, and we want to reach marketers and leadership as well.

—What do you see as a potential bottleneck moving forward?

@kobaken: If too much data is input into the model, that data becomes noise and may lower the quality of the output. We will need to control the balance between quality and convenience by defining aggregation criteria and assigning priority to the data that will be referenced. Also, an increase in users means a wider variety of feature requests, so we want to develop a growth strategy that considers overall optimization.

—What is the organizational outlook for the BI Product Team?

@kobaken: As more and more members use Socrates, I feel that our approach to identifying and solving problems has changed significantly from what it was before the release. For example, in the past, our team operated mainly based on consultations or requests regarding issues faced by data users. However, as Socrates becomes smarter and its reach expands, those issues are less likely to become apparent. Specifically, I think there will be fewer discussions about such things as creating intermediate tables because of poor query performance, which used to occur frequently.

In terms of identifying issues from a broader perspective and creating an impact that spans the organization, each member will be expected to take more ownership and initiative. As a manager, I want to oversee the team while appropriately assessing and accepting risks to maximize the impact of our activities across the organization.

—I think all companies are currently trying to find ways to create a development framework that is compatible with ever-evolving AI technology.

@Shuichi: It would be great if we could have a flexible division of roles, where instead of thinking, “This is a development task, so we have to ask an engineer,” we can say, “The BI team can build this in-house, so let’s get it done ourselves.” As we start integrating more AI, I expect that the definitions of individual jobs, and the hiring criteria for those jobs, will fundamentally change. Ultimately, the only tasks left for humans might be organizing the data input at the entry point and implementing measures or taking responsibility at the exit point.

—I hope these changes will spur on discussions about management tailored to AI.

@Shuichi: Me too. For example, if a company operates on a quarterly cycle to determine its policies, it may not be able to keep up with technological changes. We started the development of Socrates because of an update to Gemini, and we couldn’t have predicted that. Without a framework that lets managers on the ground make swift decisions, I think it’s only going to get harder to keep up with technology.

@kobaken: Moving forward, we want to continue improving Socrates so that it can further contribute to our business growth, all while keeping these kinds of updates to our organizational structure and management in mind.

Text: Fumiaki Sato　Photos: Tomohiro Takeshita
