Once again, ChatGPT developer OpenAI has picked a pointed moment to present its innovations. On Monday, one day before Google’s I/O developer conference, the new AI model GPT-4o was shown to the public for the first time. The next day, Google served up a sizeable helping of AI of its own.
The OÖN provides an overview of the latest developments and of how the two companies partly take their cues from each other.
GPT-4o: The “omni model”
There are several innovations for the more than 100 million users who, according to OpenAI, work with ChatGPT. Voice input already existed, but the software needed a noticeable pause to process and answer queries. The new “flagship model” GPT-4o is meant to make interaction between humans and machines “much more natural and much simpler”. In terms of capability, this interaction is supposed to stay at the level of GPT-4, only much faster.
In recent years, OpenAI has focused on improving the underlying AI. Now, for the first time, a big step forward is being taken in user-friendliness, said chief technology officer Mira Murati at the online presentation.
“GPT-4o combines reasoning across voice, text and image recognition,” Murati continued. OpenAI therefore also speaks of an “omni model”, which explains the “o” in the name. Users can upload photos and documents, and the software can additionally analyze the live image from a smartphone camera. GPT-4o draws on all of these sources and evaluates the information together.
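For readers with a developer background: this multimodal input is also exposed through OpenAI’s publicly documented API. Below is a minimal sketch in Python, assuming the official openai client library, an API key in the environment, and a purely hypothetical image URL; it illustrates the public chat endpoint, not OpenAI’s internal setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixing text and an image, answered by the gpt-4o model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this photo?"},
                {
                    "type": "image_url",
                    # Hypothetical URL; any reachable image works here.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```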
The software can pick up on emotions in a user’s voice and express emotion in its own output. In a demonstration, ChatGPT made up a bedtime story and read it aloud. The software could be interrupted mid-sentence and asked to put more drama into its voice or to speak like a robot; on request, ChatGPT even sang the last sentence. The functions of GPT-4o are available in 50 languages, even for free users. Paying customers simply get higher usage limits.
OpenAI CEO Sam Altman wrote after the presentation that it was the best way of using a computer he had ever experienced: “It feels like the AI you see in movies. And it still surprises me a little that it’s real.” The presentation had been preceded by rumors that OpenAI might challenge Google with an AI-supported search engine. There was no mention of that on Monday, but Murati closed with a reference to the “next big thing” that OpenAI wants to present “soon”.
More AI when googling
After the GPT-4o announcements, the bar was set high for Google. After all, OpenAI’s software has the potential to become a smarter version of voice assistants such as Siri, Alexa or the Google Assistant. Google did not need to be asked twice and made its own AI announcements at the I/O developer conference on Tuesday.
Google’s counterpart is called Gemini, an AI model that accepts queries not only as text but also as images. This is already possible on smartphones from Google and Samsung. Under the motto “We do the Googling for you”, this capability is now being integrated into further Google services. The newest member of the Gemini family, Gemini 1.5 Flash, is billed as its fastest and most efficient model to date.
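For developers, Gemini 1.5 Flash is likewise reachable through Google’s public generative AI API. A minimal sketch in Python, assuming the official google-generativeai package, a placeholder API key, and a hypothetical local image file:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder; substitute a real key

# Model name follows Google's published identifier for Gemini 1.5 Flash.
model = genai.GenerativeModel("gemini-1.5-flash")

# A mixed text-and-image query, as described above.
image = Image.open("photo.jpg")  # hypothetical local file
response = model.generate_content(["What is shown in this picture?", image])

print(response.text)
```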
A new element of Google Search is called “AI Overviews”. In the future, googling will thus be more reminiscent of interacting with a chatbot: in a separate overview area, Google attempts to answer the search query directly, and only below that do the familiar links to other websites follow. According to Google, a test phase showed that this kind of search increases usage and, with it, user satisfaction. The new AI-powered search launches first in English in the US, but is supposed to reach Europe “in the foreseeable future”.
Like ChatGPT, Gemini is also expected to be able to analyze uploaded files in the future. For paying customers, Google has an update under the title Gemini Live: users of the paid Gemini Advanced plan will be able to hold conversations with the AI assistant on their mobile devices. Similar to GPT-4o, they can choose from several “natural-sounding” voices and interrupt Gemini’s answers to ask follow-up questions. The feature is expected to arrive “in the coming months”.
Source: Nachrichten