
How Mercari’s Visual Recommendation Technology Earned Recognition at a Top Tech Conference

2025-11-26


Do you know about Mercari’s visual similarity-based recommendations feature? Mercari’s paper on improving this feature, “Improving Visual Recommendation on E-commerce Platforms Using Vision-Language Models,” was selected as a spotlight presentation at RecSys 2025, the premier global conference on recommender systems. Given the conference’s low acceptance rate, this is a remarkable achievement that demonstrates Mercari’s AI technology is competitive at a global level.

This feat was by no means accomplished overnight. The feature began as a rapid two-engineer project and was then refined by a different team, who added improvements that elevated it to international acclaim.

For this article, we talked to the five members who worked on the visual similarity-based recommendations feature, tracing the journey from the stories behind the early stages of development, through the challenges of the improvement project, to the drafting of the paper that has gained international recognition. We also see how Mercari’s engineering culture creates value across teams.

Featured in this article

  • Yusuke Shido (@shido)

    Yusuke joined Mercari in 2019 after graduating from university with a major in computer science and machine learning. As a machine learning engineer in the Trust and Safety (TnS) domain, he worked on areas such as the detection of prohibited items and served as a tech lead. Yusuke then began working on improving the logic for Mercari’s recommendation system, overseeing the baseline implementation of the first version of the visual similarity-based recommendations feature.

  • Yuta Ueno (@wakit)

    Yuta joined Mercari in 2022. Yuta first worked on developing an item recommendation and search feature for our B2C service. He then became a member of the Recommendation Team, and now focuses on improving the user experience of our app’s home screen and item detail screen. Yuta worked on the first version of the visual similarity-based recommendations feature as both PM and engineer.

  • Yuki Yada (@arr0w)

    Yuki joined Mercari as a new grad to work as a machine learning engineer in April 2024. As a student, he worked on research related to machine learning applications and interned at a number of companies as a frontend and machine learning engineer. At Mercari, he has worked as a machine learning engineer in various capacities, such as for the on-demand work service Mercari Hallo, our AI-focused team Eliza, and most recently the Recommendation Team. For the visual similarity-based recommendations feature improvement project, he is primarily in charge of model implementation and evaluation.

  • Sho Akiyama (@akiyamasho)

    After working as an ML engineer, full-stack engineer, and mobile application engineer, among other roles, Sho joined Mercari in February 2024. Sho served as the Tech Lead for this feature improvement project, leading the overall development as an ML/full-stack engineer on Mercari’s AI team, Eliza. He is currently an engineering manager.

  • Ryo Watanabe (@naberyo)

    Ryo joined Mercari as an ML engineer in April 2024. As a student, his research focus was image generation. Ryo was an intern at Mercari before he joined as a new grad to work as an AI/ML engineer on the Recommendation Team. He is in charge of implementing and monitoring A/B testing, and contributed to the research paper for the visual similarity-based recommendations feature.

Two engineers, two weeks: the fast-tracked initial release 

―First, tell us how the first version of the visual similarity-based recommendations feature came about. What issue were you trying to solve?

@wakit: At the time, not much emphasis was being placed on actively developing the recommendation feature users see on the item detail page. Teams were focused more on other pages, like the home screen. There was a lot of room for improvement, but no one had put their hand up to work on it. I decided to look into it myself, researching other companies’ apps and drafting plans for improvement. My goal was to implement a feature that would give users a new experience that wasn’t available at the time: recommending similar items based on images.

―Did you face any especially challenging issues during development?

@wakit: One thing was that resources were limited. One of the team’s OKRs (Objectives and Key Results) committed us to releasing features in two to three weeks, which is an incredibly tight schedule. On top of that, only @shido and I really had the time to develop the feature.

@shido: We really didn’t have much time at all. But, thankfully, the technology available at the time had evolved a lot compared to when I first attempted something similar in 2018. Back then, I was in charge of a team of engineers working on the image search feature, and development took around six months. By 2024, cloud services such as Google Cloud Platform had become more advanced, and it had become significantly easier to build image search systems. Thanks to that, we were able to release the feature in a short period of time.
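
As a rough illustration of what such a system involves, here is a minimal sketch of embedding-based visual similarity search using the faiss library. The embeddings, dimensions, and index choice are assumptions for illustration only, not details of Mercari’s production pipeline.

```python
# Minimal sketch of embedding-based visual similarity search.
# Hypothetical illustration -- not Mercari's production pipeline.
import faiss
import numpy as np

def build_index(item_embeddings: np.ndarray) -> faiss.Index:
    """Index L2-normalized embeddings so inner product equals cosine similarity."""
    faiss.normalize_L2(item_embeddings)  # in-place L2 normalization
    index = faiss.IndexFlatIP(item_embeddings.shape[1])
    index.add(item_embeddings)
    return index

def similar_items(index: faiss.Index, query: np.ndarray, k: int = 10):
    """Return IDs and scores of the k items visually closest to the query."""
    query = query.reshape(1, -1).astype(np.float32)
    faiss.normalize_L2(query)
    scores, ids = index.search(query, k)
    return ids[0], scores[0]

# Toy usage with random placeholder embeddings; in practice these would
# come from an image encoder (e.g., MobileNet features).
embeddings = np.random.rand(1000, 512).astype(np.float32)
index = build_index(embeddings)
neighbor_ids, scores = similar_items(index, embeddings[0])
```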

@wakit: We focused not only on developing the feature quickly, but also on producing stable results and reducing the cost of machine learning and serving (system provisioning) during implementation.

―What results did you see post-release? And what was the reaction like?

@shido: We achieved some impactful results, with purchase rates through recommendations surging by about 150%. With this new feature, we could recommend items that were in different categories but had a similar appearance and feel, which is something that is impossible to do with just text information. I think this led users to make a lot more new discoveries.

@wakit: We also received positive feedback on social media, with many users reporting that they enjoyed the new experience the feature provided on Mercari.

Passing the baton to another team for improvements

―Not long after the initial release, you started working on improvements to this feature. What led to this?

@arr0w: At the time, @akiyamasho and I were on Eliza (a team dedicated to promoting AI/LLMs), and the team was looking for its next big challenge. Lo and behold, we came across a post on Slack by @shido announcing the implementation and results of the first version of the visual similarity-based recommendations feature that his team had released. It was a eureka moment for me and perfectly timed because we were exploring the feasibility of A/B testing for image searches at the time. @akiyamasho and I immediately went to @shido and the others to ask them to let us conduct A/B tests. 

@akiyamasho: We were also facing an issue: we had been working with the Search Team to fine-tune our model and improve its performance through additional training, but we hadn’t come across any specific applications for the technology. When I saw this feature, I was convinced it was the perfect use case.

@wakit: With so few people involved in the initial development, we were able to collaborate and share information easily, which enabled us to initiate the improvement project right away. What made the project feasible was the extent to which the technology had evolved and our ability to work seamlessly across teams.

Offline evaluation for higher accuracy

―So, after Eliza took on the improvement project, did you face any technical challenges?

@arr0w: The model used in the initial version was called “MobileNet.” Its lightweight design allowed it to operate at low cost, which was appealing. @shido and his team had already validated the visual similarity approach in the initial version through A/B testing, so we thought we might achieve even greater improvements with a higher-performing model. We adopted a newer, more powerful model called “SigLIP” and ran A/B tests to compare the two models’ performance.
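
For concreteness, here is a minimal sketch of how image embeddings can be extracted with a publicly available SigLIP checkpoint via the Hugging Face transformers library. The checkpoint name and surrounding code are illustrative assumptions, not the team’s actual implementation.

```python
# Sketch: extracting image embeddings with a public SigLIP checkpoint.
# Checkpoint and usage are illustrative, not Mercari's actual setup.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

MODEL_ID = "google/siglip-base-patch16-224"  # public checkpoint, for illustration
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed_image(path: str) -> torch.Tensor:
    """Return an L2-normalized SigLIP embedding for a single item image."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)

# The resulting vectors can be dropped into the same nearest-neighbor
# index as before, so only the encoder changes between experiments.
```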

―What was one thing that was particularly important during the project?

@akiyamasho: One thing that helped us a lot was that we already had access to user tap logs because the initial feature had already been released.

―Did these tap logs influence the validation method you used?

@naberyo: Yes, because of those logs we were able to perform “offline evaluations.” This is a method of pseudo-validating a model’s accuracy using past log data before conducting an actual A/B test, that is, an online evaluation. The first team had set up a robust mechanism to collect all the logs, so we were able to start A/B testing with a fairly good idea of what results we would get with the SigLIP model.

@shido: I’m so glad we decided to incorporate logging from the start of development. (laughs)
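
The idea can be sketched as follows, assuming a simplified (viewed item, tapped item) log format and a hit-rate metric; the team’s actual evaluation protocol may differ.

```python
# Sketch: offline evaluation of a new ranking model against tap logs.
# Hypothetical log format and metric -- the actual protocol may differ.
def hit_rate_at_k(tap_logs, rank_fn, k: int = 10) -> float:
    """Share of logged taps where the tapped item appears in the model's
    top-k recommendations for the item the user was viewing."""
    hits = 0
    for viewed_item, tapped_item in tap_logs:
        top_k = rank_fn(viewed_item, k)  # candidate IDs from the new model
        hits += tapped_item in top_k
    return hits / len(tap_logs)

# Toy usage: evaluate a placeholder ranker on two logged (viewed, tapped) pairs.
logs = [("item_1", "item_7"), ("item_2", "item_3")]
toy_ranker = lambda item_id, k: [f"item_{i}" for i in range(k)]
print(hit_rate_at_k(logs, toy_ranker))  # 1.0 here: both taps are in the top 10
```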

―What results did the new model produce?

@naberyo: We got great results. The tap-through rate increased by about 50% and the purchase rate by about 14%, which was higher than we were expecting.

One of Mercari’s strengths is its high number of monthly active users (MAU), currently around 23 million.

Leveraging real-world service achievements in an academic setting

―Those are really great results. Why did you decide to publish the results in an academic paper?

@arr0w: The SigLIP model we used was still new at the time, and hardly any use cases or research papers had been published about its application in a production environment. Also, vision-language models, which use both images and language as input, were starting to gain attention in the academic world. Using a vision-language model in a large-scale production environment like that of Mercari provided some unique insights, so we thought it would be beneficial to publish our results for others to reference.

@naberyo: We first decided to submit our paper to MIRU, a Japanese conference in the field of visual recognition and understanding, to see how it would fare under academic review. We chose MIRU because it offers rigorous peer review from researchers and students in the field, and we wanted to gather feedback ahead of the upcoming RecSys conference.

―What feedback did you get from MIRU?

@naberyo: MIRU’s peer review was quite harsh: we received feedback saying that the idea behind the model already existed and lacked novelty. But we also gained confidence that the quality of our online evaluation experiments met a certain academic standard.

@arr0w: I think MIRU’s peer review was spot-on. To be honest, we didn’t have much time to prepare the paper for MIRU—less than a month—so we didn’t have high expectations. I was really surprised when they accepted our submission.

―So, your experience with MIRU readied you for your application to RecSys.

@shido: RecSys is the world’s leading international conference in the field of recommendation systems. It not only values academic achievements but also emphasizes practical applications in business through its “Industrial Track” category. Based on our feedback from MIRU, we decided to change our strategy and instead highlight that we conducted A/B testing on a massive scale for a service with about 23 million MAU.

@arr0w: A lot of academic research uses offline evaluation with published datasets, so implementation is often limited to controlled user testing. Research that introduces a new technology into an actual product at the scale of Mercari’s business, and that also quantitatively demonstrates its effectiveness, is extremely rare on a global scale. I think that aspect of our paper was what garnered such high praise from the RecSys reviewers. In fact, we received the highest evaluation from all reviewers and were selected for an oral presentation. We were so pleased!

―RecSys features big tech companies from all around the world. What was it like being at the event?

@naberyo: I got the impression that all the companies presenting at RecSys were focused on producing novel technology. LLMs and LLM agents, in particular, were trending topics.

@arr0w: All the sessions included the word “LLM” in the title, and all the workshops about generative AI were fully booked. People were very interested in those two topics.

@wakit: I noticed that both engineers and their managers were actively participating in sessions to stay current with the latest trends. You could feel the enthusiasm that has been sweeping across the entire industry lately. Compared to the other presentations, I think ours stood out because it focused on a fundamental topic, visual recommendation, rather than a trendy one like LLMs, while also demonstrating practical implementation and results at a scale of about 23 million MAU. I think this made people view our presentation in a different light. Engineers from a wide range of companies came to our poster session to ask questions. Some said they were facing similar problems in their own countries, and others wanted to know what we did to improve the technology. We ended up having some passionate discussions with engineers who work in the same field.

Mercari as the ideal lab for engineers

―Lastly, what message would you give any engineers or students reading this article?

@naberyo: Mercari provides an environment where you can try out any idea you have right away. We have a big product with around 23 million MAU, and a wealth of data and technology that supports that product. Above all, the sheer number of our users makes Mercari the ideal engineering environment.

@wakit: One major draw is that engineers have a high degree of autonomy. Mercari is a large company but has a culture that encourages swift decision-making like a startup. We can take an idea for something we want to implement and start developing without losing time to approval processes and the like. Also, while speed is crucial in academia, being at Mercari allows us to spend time creating things that have no precedent. I enjoy investing my own efforts into improving a product that I personally enjoy.

@arr0w: We have so many members who are highly motivated to write papers and contribute to academia, and management fairly evaluates and supports these efforts, which fosters an encouraging culture.

@shido: Machine learning research often ends within the confines of a computer, but at Mercari, our research outcomes can directly improve the experience of over 20 million users and have a significant impact on the business. I can’t think of a more exciting place to be an engineer!

Photographer: Tomohiro Takeshita
