The Week that Was in Generative AI

This week has seen a flurry of major announcements in the generative AI space, highlighting the rapid advancements and diverse applications of this transformative technology.

Google Cloud unveiled several significant updates at its Cloud Next 2024 event. The highlight was the introduction of Gemini 1.5 Pro, now available in public preview. This model showcases enhanced performance, especially in long context understanding, and supports multimodal capabilities, enabling innovative applications across various industries.

Additionally, Google announced new AI-powered features for Google Workspace and expanded access to its AI models via Vertex AI, which now supports over 1 million developers.

Google is rolling out AI-generated overviews to US search results. These overviews provide quick, summarized answers to complex queries by piecing together information from multiple sources. Users can adjust the language and detail level in these overviews to suit their needs.

The process will enable users to “ask your most complex questions, with all the nuances and caveats you have in mind, all in one go,” rather than breaking a question into multiple searches, according to a Google blog post.

This feature is powered by a new customized Gemini model, which enhances multi-step reasoning and planning capabilities. Global rollout will follow later.

OpenAI introduced GPT-4o (the “o” stands for “omni”), a model designed to handle multimodal inputs including text, images, and audio.

GPT-4o integrates text, image, and audio processing in a single model. This allows for more versatile applications, such as analysing images and providing detailed textual responses, as well as generating human-like speech from text.

OpenAI has introduced new fine-tuning capabilities for GPT-4o, which will enable independent developers to incorporate GPT-4o into other products. This includes the ability to create custom actions and integrate external APIs to enhance the functionality of the AI assistant.

OpenAI also introduced GPT-4 Turbo, a new version of its language model that offers a larger 128K-token context window and lower pricing. This model also supports image inputs, enabling use cases like generating captions and analysing real-world images.

ChatGPT now supports voice interactions and image inputs. The voice capability, powered by a new text-to-speech model, allows users to generate human-like audio. The image feature enables users to discuss images and use a drawing tool for more detailed interactions. These capabilities are being gradually rolled out to ensure safety and effectiveness.

Amazon Web Services (AWS) launched Amazon Bedrock Studio, a new generative AI development environment, in public preview. This platform is designed to accelerate the creation of AI applications by offering tools for rapid prototyping and deployment. AWS also expanded its Titan family of models with the new Titan Text Premier, enhancing capabilities for text-based generative AI applications.

A new study by the IBM Institute for Business Value found that surveyed CEOs are facing workforce, culture and governance challenges as they act quickly to implement and scale generative AI across their organizations.

The annual global study of 3,000 CEOs from over 30 countries and 26 industries found that 64% of those surveyed say succeeding with generative AI will depend more on people’s adoption than the technology itself. However, 61% of respondents say they are pushing their organization to adopt generative AI more quickly than some people are comfortable with.

Also this week in AI, eight prominent US newspapers, including the New York Daily News, Chicago Tribune and Orlando Sentinel, sued OpenAI and Microsoft for copyright infringement relating to the companies' use of generative AI tech. They, like The New York Times in its ongoing lawsuit against OpenAI, accuse OpenAI and Microsoft of scraping their IP without permission or compensation to build and commercialize generative models such as GPT-4.

“We’ve spent billions of dollars gathering information and reporting news at our publications, and we can’t allow OpenAI and Microsoft to expand the big tech playbook of stealing our work to build their own businesses at our expense,” Frank Pine, the executive editor overseeing the newspapers, said in a statement.