Jointly developed by Google and Mercari: a large language model with visual capabilities. What humans gain and lose.

【Google explains how it gives "vision" to large language models and shows a demo created in cooperation with Mercari】


・Google has published an article explaining how the Large-scale Visual Model (LVM), which gives a large language model (LLM) "vision," works, and has released a demo of it.

・As the "Mercari" in its name suggests, the demo was built using Mercari product data.

・The search does not use product "titles," "descriptions," or "tags"; it relies solely on AI analysis of the images.

・It can find suitable products even from queries that seem hard to search for, such as "handmade accessories with black and white beads," "cups with pictures of dancing people on them," or "cups in the Google logo colors."

・This is "like giving vision to a large language model."
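The bullets above describe a search that matches a text query against product photos alone. One common way to build that kind of search, offered here only as an assumed sketch and not as Google's or Mercari's actual implementation, is to map images and text into the same vector space and rank products by cosine similarity. The embeddings, item IDs, and function names below are hypothetical placeholders; a real system would get the vectors from a multimodal embedding model.

```python
import math

# Hypothetical toy embeddings (assumption, not real data). In a real system an
# image-embedding model would compute these from product photos alone, with no
# titles, descriptions, or tags involved.
PRODUCT_IMAGE_EMBEDDINGS = {
    "item_001": [0.9, 0.1, 0.0],
    "item_002": [0.1, 0.8, 0.3],
    "item_003": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a query such as
# "a cup with a picture of a dancing person on it", assumed to come from a
# text encoder that shares the image encoder's vector space.
QUERY_EMBEDDING = [0.05, 0.75, 0.35]


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding, index, top_k=2):
    """Rank indexed items by similarity to the query; return the top_k (id, score) pairs."""
    scored = [
        (item_id, cosine_similarity(query_embedding, vector))
        for item_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for item_id, score in search(QUERY_EMBEDDING, PRODUCT_IMAGE_EMBEDDINGS):
        print(f"{item_id}: {score:.3f}")
```

With these toy vectors, item_002 ranks first because its vector points in nearly the same direction as the query, even though no words are attached to any item. Cosine similarity compares direction rather than magnitude; production systems usually replace this linear scan with an approximate nearest-neighbor index.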



The points above are quoted from the article.




What humans lose as large language models evolve


Large language models (hereafter, LLMs) such as ChatGPT have attracted attention for their ability to exchange information (words) as if they understood human language, as though they were holding a conversation.


And now it seems that an LLM being jointly developed by Google and Mercari has acquired visual capabilities: it understands not only words but also images, and uses them to search for products.


As shown above, in the past a search for "a cup with a picture of a dancing person on it" would have found nothing unless the listing contained words such as "dancing" and "cup."

With the newly developed LVM, however, the product comes up as a hit even when there are no such words, only a photo!
In other words, it is as if the LLM understands the content of the photo and searches based on it. It's amazing.


When LLMs such as ChatGPT appeared on the scene, I imagined that many things would become more convenient and efficient. At the same time, I also thought: "If we can exchange information this easily, we humans will no longer need to think the way we used to. In other words, won't our ability to think weaken?"

The development of automobiles has weakened our legs.
The development of calculators has weakened our mental arithmetic.

It is similar to these.

When something becomes more convenient, we usually lose some of our abilities. (No wonder: because it has become convenient, we stop using them.)


After reading this time's article about the LVM, which has now gained vision as well, I thought about this even more.


With vision added,

"looking for a marriage partner (or lover)" (e.g. on social networking sites),

or "looking for a tourist spot with scenery that everyone likes,"

and so on,

it may become possible to reach the objective (the search goal) more concretely and efficiently than ever before.


If that happens,

I'm sure all detours and wasted effort will be eliminated.

Because if you do what the LLM says, you are guaranteed a certain degree of correctness. (This assumes that the LLM has reached a high degree of perfection.)


And if that happens,
for example,

"I was badly hurt by my relationship with that person, but it was a good experience."

“I got lost, but I saw a beautiful view that I never expected.”

And so on,

We will lose such coincidences.




In other words,

as LLMs are developed and perfected,

I imagine, to put it in an exaggerated way, that humans will lose not only the "ability to think" but also "chance" and, albeit indirectly, "opportunities to be moved." (Having no detours or waste is efficient, but it may lead to a narrower way of life.)

Of course,

It is good that LLMs make so many things more efficient and convenient.


Everything has both good and bad, positive and negative, yin and yang.


The greater the power, the more important it is to be aware of this.


Plus or minus?

In this post, I thought and wrote from the perspective of what people are losing as LLMs make things more convenient.


See you then.



In a world of convenience, there is also demand for, and business in, "deliberately doing things the inconvenient way." From this perspective, in the distant future, services and businesses may appear that deliberately make people think and let them enjoy coincidence and randomness.



