Amazon announces Nova, a new family of multimodal AI models

At its re:Invent conference on Tuesday, Amazon Web Services (AWS), Amazon’s cloud computing division, announced a new family of multimodal generative AI models that it calls Nova.

There are four text-generating models in total: Micro, Lite, Pro and Premier. Micro, Lite and Pro will be available to AWS customers on Tuesday, while Premier will arrive in early 2025, Amazon CEO Andy Jassy said on stage.

In addition, there is an image generation model, Nova Canvas, and a video generation model, Nova Reel. Both also launched on AWS this morning.

“We have continued to work on our own frontier models,” Jassy said, “and those frontier models have made tremendous progress in the last four to five months. And we thought if we got value out of them, you would probably get value out of them.”

The text-generating Nova models, which are optimized for 15 languages (but mainly English), vary greatly in size and performance.

Micro can only take in and output text, but it offers the lowest latency of the group: it processes text and generates responses the fastest.

Lite can process image, video and text input reasonably quickly. Pro offers a balanced combination of accuracy, speed and cost for a range of tasks. And Premier is the most powerful and designed for complex workloads.

Like Lite, Pro and Premier can analyze text, images and videos. All three are good for tasks like digesting documents and summarizing charts, meetings, and diagrams. However, AWS positions Premier more as a “teacher” model for building fine-tuned custom models than as a model to be used on its own.

Micro has a context window of 128,000 tokens, meaning it can process up to roughly 100,000 words. Lite and Pro have context windows of 300,000 tokens, which is equivalent to about 225,000 words, 15,000 lines of computer code, or 30 minutes of footage.
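For readers who want to sanity-check those conversions, here is a minimal sketch of the arithmetic. It assumes a flat ratio of about 0.75 English words per token, which is what the 300,000-token figure above implies; the real ratio depends on the tokenizer and the language, and the helper names `approx_words` and `fits` are purely illustrative, not part of any AWS tool.

```python
# Assumption: roughly 0.75 English words per token, the ratio implied by
# "300,000 tokens is about 225,000 words". Real ratios vary by tokenizer.
WORDS_PER_TOKEN = 0.75

# Context window sizes quoted in the article, in tokens.
CONTEXT_WINDOWS = {"Micro": 128_000, "Lite": 300_000, "Pro": 300_000}


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def fits(word_count: int, model: str) -> bool:
    """Rough check of whether a document of word_count words fits a model's window."""
    return word_count <= approx_words(CONTEXT_WINDOWS[model])


if __name__ == "__main__":
    for model, tokens in CONTEXT_WINDOWS.items():
        print(f"{model}: {tokens:,} tokens is roughly {approx_words(tokens):,} words")
```

Under that assumption, Micro's window works out to about 96,000 words, in line with the "roughly 100,000" figure, and a 200,000-word manuscript would fit in Lite or Pro but not in Micro.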

In early 2025, certain Nova models’ context windows will be expanded to support over 2 million tokens, AWS says.

Jassy claims the Nova models are among the fastest in their class – and among the most cost-effective to run. They are available in AWS Bedrock, Amazon’s AI development platform, where they can be fine-tuned on text, images, and video and distilled for improved speed and greater efficiency.

“We optimized these models to work with proprietary systems and APIs, making it much easier for you to perform multiple orchestrated automated steps – agent behavior – with these models,” Jassy added. “So I think these are very compelling.”

Canvas and Reel are AWS’ strongest play in generative media yet.

Canvas allows users to create and edit images using prompts (for example, to remove backgrounds) and provides controls for the color schemes and layouts of the generated images. Reel, the more sophisticated of the two models, creates videos up to six seconds long from prompts or, optionally, reference images. Reel allows users to adjust camera movement to create videos with pans, 360-degree rotations, and zooms.

Reel is currently limited to six-second videos (which take about three minutes to generate), but a version that can create two-minute videos is “coming soon,” according to AWS.

AWS Nova Canvas: According to AWS, Canvas can generate images in different styles and extend existing images or insert objects into scenes. **Photo credit:** AWS

Jassy emphasized that both Canvas and Reel have “built-in” controls for responsible use, including watermarks and content moderation. “[We are trying to limit] the generation of harmful content,” he said.

AWS explained the safeguards in a blog post, saying that Nova is “expanding its security measures to combat the spread of misinformation, child sexual abuse material, and chemical, biological, radiological or nuclear risks.” However, it is not clear what this means in practice – or what forms these measures take.

AWS also remains vague about exactly what data it uses to train all of its generative models. The company previously told TechCrunch only that it was a combination of proprietary and licensed data.

Only a few providers willingly disclose such information. They view training data as a competitive advantage and therefore keep it – and the associated information – top secret. Training data details are also a potential source of intellectual property lawsuits, another disincentive to reveal much.

In lieu of transparency, AWS offers an indemnity policy that covers customers in the event that one of its models regurgitates (i.e., spits out a mirror copy of) a potentially copyrighted still image.

So what’s next for Nova? According to Jassy, AWS is working on a speech-to-speech model – a model that takes speech and outputs a transformed version of it – for the first quarter of 2025 and an “any-to-any” model for around mid-2025.

AWS re:Invent 2024 Nova. **Photo credit:** Frederic Lardinois/TechCrunch

The speech-to-speech model will also be able to interpret verbal and non-verbal cues such as tone and cadence and deliver natural, “human-like” voices, according to Amazon. As for the any-to-any model, it will theoretically support applications from translators to content editors to AI assistants.

Of course, that assumes it doesn’t suffer any setbacks.

“You can input text, voice, images or video and output text, voice, images or video,” Jassy said of the any-to-any model. “This is the future in which frontier models are built and consumed.”

This article originally appeared on TechCrunch at https://techcrunch.com/2024/12/03/amazon-announces-nova-a-new-family-of-multimodal-ai-models/
