Apple showed off how it tested and developed Audio Mix and 4K 120fps slow-motion video for the iPhone 16 Pro.
Hi, I'm in one of Apple's testing labs for audio. As you can probably hear, my audio sounds a little different. This is probably one of the quietest places at Apple, if not in the world. I got rare access to the audio and video labs that Apple uses to test and calibrate the iPhone 16's video and audio. Join me and let's take a look.

For years, the iPhone's ability to record videos with excellent image quality has been a high-water mark when it comes to phones. But just as competitors like Samsung have started to catch up, Apple released the iPhone 16 Pro, which takes video capture to another level by bringing regular video and slow motion to parity in terms of image quality and dynamic range. And while how things look on your phone's screen is important, a video's audio is arguably its most important attribute. Thankfully, I got to speak with several of the engineers behind all of this while getting rare behind-the-scenes access to some of the labs Apple uses to calibrate the audio and video on the iPhone 16.

My first stop was an anechoic chamber that Apple uses to calibrate the microphones on the iPhone 16. OK, so let's talk about this room. First, notice the floor: it looks like I'm standing on a bunch of chicken wire. It's not really chicken wire, but it looks that way. What it does is keep anything from vibrating. The floor is not connected to the ground; it's actually isolated from the building and its foundation. So if I'm standing on it, my vibrations are minimized and don't get picked up by the audio testing. Essentially, this room has no echoes. Listen to me clap. Here's what a clap sounds like in a sound-deadened place like this. Ready? I don't know if you can hear that on the microphone. There's no echo whatsoever. It's so eerie. Apple achieves this level of quietness with a lot of foam. I'm gonna close the door. Notice how thick this door is. Look at the size of these. Some people say you can actually hear your own heartbeat, it's so quiet in here.

So what is Apple doing with this anechoic chamber? Well, the challenge Apple has is to take something that fits inside your pocket and have it capture pristine sound and play it back the way you heard it, or even better. I spoke with Apple's Ruchir Dave and Francesca Sweet to find out how Apple went about this. We wanted to enable spatial audio capture on our iPhones to get this immersive capture of the sound field you're in, so when you relive those memories, it hopefully reminds you of where you took those videos. But developing it so that it can work in all these different soundscapes and maintain the amazing robustness we have on the iPhone, that was the task at hand.

The iPhone 16 has four tiny microphones, but to get them to pick up sound like a much larger microphone, Apple had to do some clever engineering, which started with testing what the mics actually pick up. And that brings us back to the anechoic chamber for a chime test. So what's going on here is there's a series of speakers on an arc, and they're playing a chime, and Apple is testing how the iPhone's microphones pick up that chime. Then the iPhone rotates a couple more degrees, the chimes play again, and it keeps doing this until it has gone around a full 360 degrees.
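To make that chime test a little more concrete, here's a rough sketch of what a rotate-and-measure sweep like that could look like. To be clear, this is not Apple's test code: the turntable step size, the simulated mic response and the per-mic orientations below are all stand-ins I made up for illustration.

```python
# Toy sketch of a rotate-and-measure microphone directivity sweep.
# NOT Apple's test setup: the step size, the cardioid-style response
# model and the noise level are assumptions made for illustration.
import math
import random

STEP_DEG = 5            # hypothetical turntable step between chime playbacks
NUM_MICS = 4            # the iPhone 16 has four microphones

def simulated_mic_level(mic_index: int, azimuth_deg: float) -> float:
    """Stand-in for 'play the chime, record this mic, measure its level'.
    Models each mic as a slightly rotated cardioid-like pickup plus noise."""
    offset = mic_index * 90.0                      # fake per-mic orientation
    theta = math.radians(azimuth_deg - offset)
    gain = 0.5 * (1.0 + math.cos(theta))           # cardioid pattern
    return 20.0 * math.log10(max(gain, 1e-3)) + random.uniform(-0.5, 0.5)

def measure_directivity() -> dict[int, list[tuple[float, float]]]:
    """Rotate through 360 degrees, logging (azimuth, level_dB) for each mic."""
    responses: dict[int, list[tuple[float, float]]] = {m: [] for m in range(NUM_MICS)}
    for azimuth in range(0, 360, STEP_DEG):        # turntable position
        for mic in range(NUM_MICS):
            level_db = simulated_mic_level(mic, azimuth)
            responses[mic].append((float(azimuth), level_db))
    return responses

if __name__ == "__main__":
    table = measure_directivity()
    for mic, points in table.items():
        loudest = max(points, key=lambda p: p[1])
        print(f"mic {mic}: loudest at {loudest[0]:.0f} deg ({loudest[1]:.1f} dB)")
```

The real chamber obviously measures actual recordings rather than a formula, but the loop structure (rotate a few degrees, play the chime, log every mic's response, repeat for the full circle) is the part of the process the test above mimics.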
By figuring out how exactly each mic on the iPhone hears sound, Apple is able to use software and machine learning to make the mics act like different kinds of microphones, like a cardioid mic or a lavalier mic, depending on the situation. Yeah, I mean, if you think about the setup we have right now, ideally most content creators want to put the microphone right on the subject. You'd put a lapel mic on me, but that's not an option on the iPhone; the mic goes where the product is. We still want to enable that feature as if you'd recorded it on a lapel mic. So the placement of the microphones is extremely critical, and then, with the final placement we end up with, we use machine learning algorithms as well as our tuning chains to come up with that signature sound you'd get even with a lapel mic. And we've been doing this development over a number of years, so much of the machine learning capability we have today is built on years of experience and expertise.

But the testing doesn't stop there. The next stop is a lab where Apple does comparative playback testing to help tune the audio. All right, I'm in a really cool area. This is for Apple's perceptual tests for audio, and I'm going to listen to a loop of different audio and compare which one I like better and which one I don't. Apple has different users take this test, and it helps them tune the audio for their different devices. So let's see what happens. What you can't hear are the video clips being played back. Each clip has two audio tracks, and I switch back and forth between them and rate the audio: bad, fair, good, excellent, and so on. Apple has a number of testers do this and uses the results to help tune how the iPhone records and plays back audio. We also talked about not just tuning those products and comparing them to multiple products out there, but also making sure that our tuning works in all sorts of different scenarios. And I think what's important to add is that it's about the users and how they experience the device. It's not just Ruchir with his golden ear sitting in there and dictating how it should sound. We really want to make sure that anyone who's taking advantage of this feature is going to appreciate it and enjoy it. It really is about our users and making sure it's something they can take advantage of really easily.

So how does all this work for a feature like Audio Mix, which lets iPhone 16 owners change how the audio in a video sounds? Well, Ruchir explained the idea behind the development of Audio Mix: the iPhone gets used in all sorts of different scenarios, soundscapes and environments, and we wanted to give our users the flexibility to capture the sound they'd like in those scenarios, rather than hard-coding what we think they should capture. That's why we came up with these different audio mixes: to give you that studio feel when you use the studio mix, or that cinematic feel with your dialogue front and center and a really nicely balanced ambience to go with it. It's about giving creative freedom to our users while maintaining simplicity.
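For a sense of what a "mix" means once the hard part is done, here's a minimal sketch. The actual separation of dialogue from ambience is where the machine learning does the heavy lifting and isn't shown here; the snippet simply assumes those two stems already exist, and the preset names and gain values are my own illustrative guesses, not Apple's tuning.

```python
# Minimal sketch of the idea behind preset audio "mixes": once dialogue
# and ambience have been separated (the hard, ML-driven part, not shown),
# a mix is just a different balance of the two stems. Preset names and
# gain values are illustrative assumptions, not Apple's actual settings.
import numpy as np

SAMPLE_RATE = 48_000

# Hypothetical presets: (dialogue_gain, ambience_gain) as linear factors.
PRESETS = {
    "standard":  (1.0, 1.0),    # leave the capture roughly as-is
    "studio":    (1.0, 0.15),   # dry, close-mic "studio" feel
    "cinematic": (1.2, 0.6),    # dialogue front and center, balanced ambience
}

def apply_mix(dialogue: np.ndarray, ambience: np.ndarray, preset: str) -> np.ndarray:
    """Blend separated dialogue and ambience stems according to a preset."""
    d_gain, a_gain = PRESETS[preset]
    mixed = d_gain * dialogue + a_gain * ambience
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed    # avoid clipping

if __name__ == "__main__":
    t = np.linspace(0, 1, SAMPLE_RATE, endpoint=False)
    dialogue = 0.5 * np.sin(2 * np.pi * 220 * t)                     # stand-in "voice"
    ambience = 0.3 * np.random.default_rng(0).standard_normal(SAMPLE_RATE)
    for name in PRESETS:
        out = apply_mix(dialogue, ambience, name)
        print(f"{name:>9}: RMS = {np.sqrt(np.mean(out**2)):.3f}")
```

The point of the sketch is the design choice the engineers describe: keep the user-facing control as simple as a handful of named balances, while the complexity lives in how the stems were captured and separated in the first place.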
Of course, Ruchir also pointed out that you don't need to be a video nerd, my words not his, to have good audio in your iPhone videos. We're not expecting every user to go in and edit their videos and change the sliders. If you shot it the way we intended it to be shot, it should sound amazing. And if you still want to change it the way you'd like, you have full freedom. Ruchir's team does so much work, so much engineering, and there's so much intelligence behind the studio-quality mics and the spatial capture that allow us to do things like Audio Mix. But it's surfaced in the Photos app in a way that makes it really easy to use and really intuitive, and that lets any user take advantage of these really powerful technologies.

My next and last stop is the video validation lab. If the anechoic chamber was about minimizing noise and stimuli, the theater is just about the opposite. Francesca introduced me to Sean Yang, who's on the team behind video playback, to explain how they use the theater to tune it. For every iPhone, we calibrate the display on the factory floor to make sure the color is accurate, the brightness uniformity is good and the peak brightness matches spec. There are many things we actually do at the factory so that every customer, when they buy an iPhone, gets the same display. And when we play a video, we also look at the ambient environment by reading the ambient light sensor, so we adapt the video playback to the environment you're playing the video in.

Now, I'm not able to show off every corner of the theater, but imagine having your very own Dolby Atmos theater to watch the videos you recorded on your iPhone. We use this theater to tune the video playback experience so that when you play back a video in a dark room, in an office environment or even under the sun, you get the same perceptual experience you'd get watching it in the theater. I got to see how the theater screen mimics what playback on the iPhone 16 Pro's screen looks like. And that includes one of my favorite new features on the iPhone 16 Pro: its ability to record and play back 4K 120-frames-per-second slow-motion video. 4K 120, it's a massive amount of data; if you think about it, it's 1 billion pixels per second, right? And we actually had to enhance our A18 Pro SoC significantly, adding the Apple Camera Interface to let us read this much data with minimal latency. We also had to enhance the image signal processor (ISP), the Apple Neural Engine and the GPU to be able to process this data in real time with the highest quality our users have come to expect from Apple. And I think what's important is you can't really fake it, right? If you're capturing at 120 frames per second and slowing it down to half speed or quarter speed, you really are going to see every single frame in its full detail, so any kind of artifacts or things like that are going to come through.
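For context on that 1-billion-pixels-per-second figure, here's the back-of-the-envelope math, assuming standard 4K UHD frames of 3,840 x 2,160. The bit depth used for the bandwidth estimate at the end is just an assumption for illustration, not a published spec.

```python
# Back-of-the-envelope check of the "1 billion pixels per second" figure
# for 4K 120 fps capture. Assumes 4K UHD (3840 x 2160) frames; the
# 10-bit-per-channel figure below is an assumption, not a spec.
WIDTH, HEIGHT, FPS = 3840, 2160, 120

pixels_per_frame = WIDTH * HEIGHT                   # 8,294,400 pixels
pixels_per_second = pixels_per_frame * FPS          # ~995 million pixels/s

# Rough uncompressed bandwidth if each pixel carried three 10-bit
# color channels (illustrative only).
bits_per_pixel = 3 * 10
gigabits_per_second = pixels_per_second * bits_per_pixel / 1e9

print(f"{pixels_per_frame:,} pixels per frame")
print(f"{pixels_per_second:,} pixels per second")   # ~1 billion, as quoted
print(f"~{gigabits_per_second:.1f} Gbit/s uncompressed (illustrative)")
```

Run it and the pixel count lands at roughly 995 million per second, which is where the "1 billion pixels per second" line comes from, before any compression or processing even enters the picture.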
Like the testing that goes into tuning the audio, Apple has people share their feedback on videos, so it's not just one person's opinion about how videos should look when played back on an iPhone. We actually have a group of experts at Apple look at the video, and whenever we have a difference of opinion, we get them together and we debate. Oftentimes you need to make some sort of tradeoff, so we consult the many experts within Apple to make sure the video comes through at the highest quality, no matter where you play it.

As I reflect on the labs I saw and the engineers I met, I keep coming back to something Francesca told me at the end. I think we just want to reinforce that the simplicity of the design doesn't mean there isn't an incredible technical depth behind it. Of course, we try to make it as seamless as possible for a user to interact with these really powerful tools that allow you to manipulate the audio or video, and to surface them in a way that's really accessible, but there is an inordinate amount of engineering work that goes into making them that simple.

At the end of the day, whether it's Apple, Samsung, Google or others, a lot goes into the video and audio features on our phones. But the next time you're filming a short film with your iPhone, or just recording a quick video of your kids playing in the living room, it's wild to think about all the time, effort and testing Apple has spent to make either one look and sound its best. Thank you for watching.