Suno’s upgraded AI music generator is technically impressive, but still soulless

When it’s not trying to fend off lawsuits from major record labels, Suno is still working on refining its AI music creation tool. The latest model, Suno v5, is an obvious technical improvement over its previous version, v4.5+. But it still can’t seem to escape the bland emptiness that pervades most AI art.

There are some across-the-board upgrades in audio quality that are undeniable, like fewer artifacts and clearer separation between instruments. Some tracks produced using v4.5+ can smush all the melodic parts together in a way where the lines between guitar, bass, and synth are muddy at best. But with v5, the mixes are much cleaner.

During a demo, Henry Phipps, a Suno product manager, pointed to a song we had the model generate that included a flute-like synth with what sounded like a ping-pong delay effect on it: “I’ve never heard that before in previous models… what that says to me is that the model understands that this is an isolated sound that’s being affected and needs to be reproduced faithfully in different parts of the stereo field.” Since Suno isn’t actually applying effects in the traditional sense, this means the model is identifying a particular instrument and approximating the sound of a stereo delay because it’s decided that is what it should sound like.

Suno also claims that v5 has a better understanding of genre, though that claim seems questionable from my testing. With some of my prompts like “modern avant R&B with glitchy, but funky drums, atmospheric melodic parts, and breathy vocals,” neither v5 nor v4.5+ seemed to be the clear winner in delivering what I had in mind (mostly Kelela’s Take Me Apart). They both got close, giving me downtempo tracks with some moody synths, but they lacked the weirdness I was hoping for.

Suno couldn’t quite figure out what I was looking for with “early ’90s lo-fi indie rock recorded on a 4-track cassette recorder with off key vocals and slightly out of tune guitars” either, and v5 was definitely more off target. Despite everything I tried, I could not get Suno to spit out anything that sounded even remotely like Pavement. The loose slacker noise pop I associate with Slanted and Enchanted was nowhere to be found. Instead, I got bombastic “indie” rock with chunky riffs and clean driving power chords. Suno v5 kept serving up songs that sounded more like Arctic Monkeys than anything released before the turn of the century.

Similarly, in my testing, v5 seemed to struggle with era- or decade-specific prompts at times. When I asked for “late 1970s krautrock,” v4.5+ basically nailed it outside of the vocals (more on that later). But v5 often delivered ’80s-tinged synthpop and tracks that were distinctly more modern sounding, even if they had some of that classic krautrock DNA.

What I will say is that the arrangements that Suno’s v5 model creates are much more complex. Compared to v4.5+, there are more one-off musical flourishes that keep things from getting too repetitive and more varied song structures. Where v4.5+ is usually content to stick with a basic verse-chorus-verse structure (with a bridge tacked on for good measure), v5 would often have pre- or post-chorus sections, multiple bridges or breakdowns, and generally build over the course of a track, offering more of an arc than just distinct sections.

It also occasionally delivered interesting results when remixing existing tracks. I uploaded a song from an EP I released a few years back (which probably should have tripped its copyright filter) and look, I’m not going to lie, I kind of liked the way it transcribed parts of my guitar solo into a recurring synth motif and turned my big chord pads into driving arpeggios.

But what was missing in all of these covers of my song that I asked Suno to create was the raw, lo-fi nature of the track that I recorded in my living room at 3AM about six years ago. And that’s kind of a running theme here. While Suno can mimic some of the superficial features of an old recording or a human performance like tape hiss or breaths, it always feels inauthentic.

Phipps admits that he hasn’t heard the vocal model recreate the unique imperfections of a real human performance. In its early messaging about v5, Suno touted its “emotionally rich vocals” and “human-like emotional depth,” but that phrasing is now absent from any public-facing materials. Instead, the company has chosen to describe the vocals as “natural, authentic,” chalking the change up to a “stylistic choice.”

But even that feels like a stretch. While, yes, compared to v4.5+ the vocals feel more human, they’re still stiff. Phipps explained that “when we perceive a vocal out of Suno [v4.5] to be emotionally flat, I think it’s because it’s just missing some detail that gives it that edge,” and that the higher fidelity of the v5 model delivers that detail.

It’s hard to argue with the technical aspects of that claim — vocal performances are more detailed — but they’re still all painfully generic. Every rock vocal ends up sounding like Imagine Dragons or Mumford and Sons, every R&B song like a sleepwalking Adele or a charmless Ariana Grande.

There are no edges to any of the Suno vocals. Everything is bathed in reverb, layered with harmonies, and perfectly on pitch. Even if you explicitly tell it not to do these things, the model just ignores you. I asked v5 for an “unprocessed emotional solo A cappella female vocal performance with no reverb, no harmonies, no effects, just dry vocals.” The two songs it delivered were bathed in reverb, included additional vocalists harmonizing with the first, and one even had what sounded like a bass accompaniment. (Though, it may have been a voice approximating a bass.) But Phipps wasn’t surprised. The “models don’t yet understand descriptions of specific effects and recording techniques. The way the vocal is performed is most influenced by the lyrics and the general mood,” he said.

So, I fed Suno lyrics that were just different enough from the Rolling Stones’ “Gimme Shelter” to avoid getting flagged for copyright infringement. At first blush, the result seemed to have all the elements that make the original so devastating: a powerful female vocalist shouting over a full, bluesy arrangement. But it had all of the emotional impact of a dentistry textbook.

When I listen to “Gimme Shelter,” it’s the way Merry Clayton’s voice cracks as she belts out “rape and murder” during the bridge that causes me to choke up. It’s Robert Smith’s completely out-of-tune warble that conveys the desperation in “Why Can’t I Be You” and the tangible exhaustion in Kurt Cobain’s breath right before he delivers the last line in “Where Did You Sleep Last Night” that tells you this is a man struggling with real demons.

In general, trying to make Suno sound “bad” (out of tune, raw, off key, sloppy) was futile. For all the company’s talk about how “natural” the new model’s vocals sound, it lacks the imperfections that often carry the emotional weight of a performance. Suno’s virtual vocalists still sound detached. Model v5 might understand that a particular lyric should be sad, but it has no actual emotional connection to the words, because it’s a pile of code, not an artist.
