Wednesday morning, Microsoft showed off a project they’ve been working on for seven years, an augmented reality headset called Project HoloLens.

The vision is ambitious: they want to fundamentally change the way people interact with computers by building a pair of glasses that can fluidly mix virtual and real content together in the user’s physical space. This is like virtual reality technology, but fundamentally more powerful. Furthermore, they want to do all the processing locally on the glasses: no computer, no phone, no cables. They’re even launching a special version of Windows just for the new hardware. This is the next stage in technological evolution for all those AR games you installed on your phone that one time and haven’t touched since.

Their time frame is even more ambitious than their goals: they want to ship developer kits this spring, and the consumer product “during the Windows 10 timeframe”. Here’s the pitch.

All of this sounds great, but I admit to a fairly high degree of skepticism.

The technologies Microsoft is using have serious, fundamental challenges, and so far Microsoft has been very tight-lipped about how (or if) they’ve solved them. If they haven’t solved them, then their goal of shipping within a year is very concerning. The last thing VR and AR need is a big company shipping another half-baked product like the Kinect. Remember the Project Natal demo from 2009?

Without further ado, here are the five most important things I’d like to know about the HoloLens.

Is This a Light Field Display?

To understand this one, we have to look a little deeper into 3D and how it works. To get the sensation of a real, tangible 3D world, our brains integrate many different kinds of information. We get depth cues about the world in three primary ways:

  • Stereo depth: the disparity between what each of our two eyes sees. Faking this is how 3D movies work.
  • Motion parallax: subtle motions of our head and torso give us additional depth cues for objects that are farther away.
  • Optical focus (accommodation): when we focus on something, the lenses of our eyes physically deform until it comes into focus; near-field objects require more deformation, which provides depth information about what we’re looking at.
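To put rough numbers on the first and third cues, here is a small Python sketch. It assumes an interpupillary distance of 63 mm (a typical adult value, not anything specific to HoloLens) and computes the vergence angle between the two eyes’ lines of sight, which is the geometry stereo depth depends on, alongside the focusing demand on the eye’s lens in diopters (reciprocal meters).

```python
import math

# Assumption: a typical adult interpupillary distance (IPD) of about 63 mm.
IPD_M = 0.063


def vergence_angle_deg(distance_m: float, ipd_m: float = IPD_M) -> float:
    """Angle between the two eyes' lines of sight when both fixate a point
    straight ahead at distance_m. Stereo depth relies on this geometry."""
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))


def accommodation_diopters(distance_m: float) -> float:
    """Optical power the eye's lens must add to focus at distance_m.
    Diopters are reciprocal meters; optical infinity needs 0 D."""
    return 0.0 if math.isinf(distance_m) else 1.0 / distance_m


for d in (0.3, 1.0, 3.0, 10.0):
    print(f"{d:5.1f} m: vergence {vergence_angle_deg(d):5.2f} deg, "
          f"focus demand {accommodation_diopters(d):4.2f} D")
```

Under these assumptions the vergence angle changes by several degrees between 0.3 m and 1 m but by less than a degree between 3 m and 10 m, which is one reason stereo is most informative up close and other cues matter more for distant objects.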

Optical focus is easy to check out for yourself: close one eye and hold your thumb up in front of a wall across the room. Then shift your focus from your thumbnail to the surface behind it. When you look past your thumb, it goes out of focus, because the lens of your eye is now less deformed and can no longer focus the light coming from your thumb.

VR headsets like the Oculus Rift provide the first two cues extremely accurately, but not the last, which works out surprisingly well: our eyes default to relaxing completely, since the optics focus the images as though the light were coming from infinitely far away. The missing optical focus cue is unrealistic, but it usually isn’t distracting. You can still have very cool gaming experiences without it.

In augmented reality, the problem is different, because you have to mix light from real and virtual objects. The light from the real world will naturally be focused at a variety of depths. The virtual content, however, will all be focused at a fixed, artificial distance dictated by the optics, probably at infinity. Virtual objects won’t look like they’re really part of the scene: they’ll be out of focus when you look at real things at the same depth, and vice versa. It won’t be possible to move your eye fluidly across the scene while keeping it in focus, as you normally do. The conflicting depth cues will be confusing at best, and sickening at worst.
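The size of that conflict can be expressed in diopters. The sketch below assumes a single fixed focal plane at optical infinity (the guess above, not a confirmed spec) and compares it with the focus demand of real objects at a few distances. The 0.3 D tolerance is a hypothetical round number standing in for the eye’s depth of field, which varies between people and with pupil size.

```python
import math


def diopters(distance_m: float) -> float:
    """Focus demand in diopters (1 / meters); optical infinity is 0 D."""
    return 0.0 if math.isinf(distance_m) else 1.0 / distance_m


def focus_mismatch(real_m: float, display_focus_m: float = math.inf) -> float:
    """Diopter gap between a real object at real_m and virtual content placed
    at the same apparent depth but optically focused at display_focus_m.
    Assumption: the headset has one fixed focal plane (infinity by default)."""
    return abs(diopters(real_m) - diopters(display_focus_m))


# Hypothetical tolerance standing in for the eye's depth of field.
TOLERANCE_D = 0.3

for d in (0.5, 1.0, 2.0, 5.0):
    gap = focus_mismatch(d)
    verdict = "visibly mismatched" if gap > TOLERANCE_D else "probably tolerable"
    print(f"object at {d} m: {gap:.2f} D off ({verdict})")
```

With these assumptions, anything within about a meter is off by a diopter or more, which is exactly where hands and tabletop content live.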

To fix this, you need something called a light field display. Light field displays use an array of tiny lenses to emit light focused at many depths simultaneously. This allows the user to focus naturally on the display and, for augmented reality, solves the problem described above.

There is, however, a catch: light field displays essentially map a single 2D screen onto a three-dimensional light field, which means that each “depth pixel” the user perceives (at a particular focal depth in the scene) is actually made up of light from many pixels on the underlying display. The finer-grained the depth you want to portray, the more resolution you have to give up.

Generally, light fields need about an eight-fold reduction in resolution (pixel count) to give adequate depth precision. The best microdisplays available have a resolution of about 1080p. Assuming one high-end microdisplay driving each eye, that would make the effective resolution of Microsoft’s headset only about 500 x 500 pixels per eye, even less than the Oculus Rift DK1 (640 x 800 per eye). If the display has a wide field of view, virtual objects will be incomprehensible blobs of pixels. If it doesn’t, immersion will suffer proportionately. We never actually get to see through the lenses (only computer re-creations of what the user is seeing), so we have no idea what the user experience is really like.
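The arithmetic behind the 500 x 500 estimate is easy to check. This sketch assumes the rough eight-fold reduction in pixel count applied to a 1920 x 1080 microdisplay per eye, then shows how thinly those pixels would be spread at a few hypothetical fields of view; none of these numbers are HoloLens specifications.

```python
import math


def light_field_resolution(width_px: int, height_px: int, reduction: int = 8) -> int:
    """Side length of the square image left per eye if the light field spends
    `reduction` display pixels on every perceived pixel (the rough
    eight-fold figure from the text)."""
    return math.isqrt(width_px * height_px // reduction)


def pixels_per_degree(pixels_across: int, fov_deg: float) -> float:
    """Average angular resolution across the horizontal field of view."""
    return pixels_across / fov_deg


side = light_field_resolution(1920, 1080)   # one 1080p microdisplay per eye
dk1_per_eye = 640 * 800                     # Rift DK1: 1280 x 800 panel, split per eye
print(f"light field: about {side} x {side} = {side * side} px per eye")
print(f"Rift DK1:    {dk1_per_eye} px per eye")
for fov in (30, 45, 90):                    # hypothetical fields of view
    print(f"{fov:2d} deg FOV: {pixels_per_degree(side, fov):.1f} px/deg")
```

Roughly 509 pixels across a 90-degree view works out to under six pixels per degree, which is where the “blobs of pixels” worry comes from; narrowing the field of view raises the density but shrinks the window onto the virtual scene.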

It’s possible that Microsoft has come up with some novel solution to this problem, to allow the use of a light field display without the resolution tradeoff. However, Microsoft’s been extremely cagey about their display technology, which makes me suspect that they haven’t. Here’s the best explanation we’ve got so far (from the WIRED demo).

To create Project HoloLens’ images, light particles bounce around millions of times in the so-called light engine of the device. Then the photons enter the goggles’ two lenses, where they ricochet between layers of blue, green and red glass before they reach the back of your eye.

This description could mean practically anything (although, in fairness to Microsoft, the hardware did impress WIRED, even if the article was light on details).

We won’t know more for sure until Microsoft starts to release technical specs, probably months from now. On a further, nitpicky note: is it really necessary to drown the project in this much marketing-speak? The dedicated processor they’re using for head tracking is called a “holographic processor,” and the images are called “holograms,” for no particular reason. The product is fundamentally cool enough that it really doesn’t need to be gilded like this.

Is the Tracking Good Enough?

The Project HoloLens headset has a wide-field-of-view depth camera mounted on it (like the Kinect), which it uses to figure out where the headset is in space by trying to line up the depth image it’s seeing with its model of the world, composited from past depth images. Here’s their live demo of the headset in action.

The tracking is impressive considering that it uses no markers or other cheats, but even in that video (under heavily controlled conditions), you can see a certain amount of wobble: the tracking is not completely stable. That’s to be expected: this sort of inside-out tracking is extremely hard.
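Microsoft hasn’t said exactly how the headset matches new depth images against its world model. A standard technique for this kind of alignment is iterative closest point (ICP), used for example in the KinectFusion research system. The toy 2D version below is only an illustration under that assumption: it pairs each scanned point with its nearest model point, solves in closed form for the rotation and translation that best line the pairs up, and repeats. Real systems work in 3D with noisy sensors and far more points, which is where the wobble creeps in.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping matched points src onto dst
    (closed-form 2D Procrustes). Returns (theta, tx, ty)."""
    n = len(src)
    sx = sum(x for x, _ in src) / n
    sy = sum(y for _, y in src) / n
    dx = sum(x for x, _ in dst) / n
    dy = sum(y for _, y in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def nearest(point, model):
    """Closest model point to `point` (brute force; fine for a toy)."""
    px, py = point
    return min(model, key=lambda m: (m[0] - px) ** 2 + (m[1] - py) ** 2)


def icp(scan, model, iterations=20):
    """Iterative closest point: match each scan point to its nearest model
    point, solve for the best rigid motion, apply it, and repeat.
    Returns the aligned points and the overall (theta, tx, ty) found."""
    current = list(scan)
    for _ in range(iterations):
        matches = [nearest(p, model) for p in current]
        current = transform(current, *best_rigid_2d(current, matches))
    # The accumulated motion is itself rigid, so solve for it in one step.
    return current, best_rigid_2d(scan, current)


# Toy world model (an L-shaped wall corner) and a scan of it taken from a
# slightly rotated and shifted head pose.
model = [(float(x), 0.0) for x in range(5)] + [(0.0, float(y)) for y in range(1, 4)]
scan = transform(model, 0.05, 0.1, -0.05)
aligned, pose = icp(scan, model)
print("recovered correction (theta, tx, ty):", pose)
```

Because the toy pose error is small relative to the point spacing, the first round of nearest-neighbor matches is already correct and the alignment snaps into place; with larger motions or sensor noise, wrong matches make ICP converge slowly or to the wrong pose.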

However, the big lesson from the various Oculus Rift prototypes is that tracking accuracy matters a lot. Jittery tracking is merely annoying when it affects a few objects in a largely stable real world, but in scenes like the Mars demo from their concept video, where almost everything you’re seeing is virtual, imprecise tracking could lead to a lack of “presence” in the virtual scene, or even simulator sickness. Can Microsoft get the tracking up to the standard set by Oculus (sub-millimeter tracking accuracy and less than 20 ms total latency) by their shipping date at the end of this year?

Here’s Michael Abrash, a VR researcher who has worked for both Valve and Oculus, on the problem:

[B]ecause there’s always a delay in generating virtual images, […] it’s very difficult to get virtual and real images to register closely enough so the eye doesn’t notice. For example, suppose you have a real Coke can that you want to turn into an AR Pepsi can by drawing a Pepsi logo over the Coke logo. If it takes dozens of milliseconds to redraw the Pepsi logo, every time you rotate your head the effect will be that the Pepsi logo will appear to shift a few degrees relative to the can, and part of the Coke logo will become visible; then the Pepsi logo will snap back to the right place when you stop moving. This is clearly not good enough for hard AR.
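Abrash’s Pepsi-can example comes down to simple arithmetic: while your head turns, a world-locked overlay lags by roughly head speed times latency. The sketch assumes a constant rotation speed and no motion prediction or last-moment reprojection (techniques real headsets may use to hide latency); the 100 degrees per second head turn and the half-meter can distance are illustrative assumptions.

```python
import math


def registration_error_deg(head_speed_deg_s: float, latency_ms: float) -> float:
    """Angle by which a world-locked overlay trails the real object while the
    head turns at a constant speed, if the displayed image is latency_ms old.
    Assumes no motion prediction and no late reprojection."""
    return head_speed_deg_s * latency_ms / 1000.0


def error_on_object_cm(error_deg: float, distance_m: float) -> float:
    """Apparent sideways offset, in cm, of the overlay on an object distance_m away."""
    return 100.0 * distance_m * math.tan(math.radians(error_deg))


HEAD_SPEED = 100.0  # deg/s, an illustrative moderate head turn (assumption)
for latency in (20, 30, 50):
    err = registration_error_deg(HEAD_SPEED, latency)
    print(f"{latency} ms: {err:.1f} deg, "
          f"{error_on_object_cm(err, 0.5):.1f} cm on a can 0.5 m away")
```

Under these assumptions, 30 ms of latency puts the logo 3 degrees off, roughly 2.6 cm on the can, and even the 20 ms Oculus target still leaves 2 degrees of error during the turn.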

Can the Display Draw Black?

Another issue alongside focal depth and tracking has to do with drawing dark colors. Adding more light to a scene is relatively simple, using beam splitters. Taking light out is a lot harder. How do you selectively darken parts of the real world?  Putting up a selectively transparent LCD screen won’t cut it, since it can’t always be at the correct focus to block what you’re looking at. The optical tools to solve this problem, unless Microsoft has invented them secretly, simply don’t exist.

This matters, because for a lot of the applications Microsoft is showing off (like watching Netflix on your wall), the headset really needs to be able to remove the light coming from the wall; otherwise your movie will always have a visible stucco pattern overlaid on it. Without that ability, imagery can’t block out real objects in the scene, and the headset’s usefulness will depend heavily on the ambient lighting. Back to Michael Abrash:

[S]o far nothing of the sort has surfaced in the AR industry or literature, and unless and until it does, hard AR, in the SF sense that we all know and love, can’t happen, except in near-darkness.

That doesn’t mean AR is off the table, just that for a while yet it’ll be soft AR, based on additive blending […] Again, think translucent like “Ghostbusters.” High-intensity virtual images with no dark areas will also work, especially with the help of regional or global darkening – they just won’t look like part of the real world.
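Additive blending is easy to model per pixel. In the sketch below, `occlusion` is a hypothetical per-pixel blocking factor: 0 corresponds to the purely additive case described above, and 1 to the perfect light blocker that doesn’t exist yet. Intensities are linear values in [0, 1], and the wall color is made up for illustration.

```python
def see_through_pixel(real: float, virtual: float, occlusion: float = 0.0) -> float:
    """Light reaching the eye for one color channel of an optical see-through
    display. real, virtual: linear intensities in [0, 1].
    occlusion: fraction of real-world light blocked at this pixel
    (0 = purely additive; 1 = a hypothetical perfect per-pixel blocker)."""
    return min(1.0, real * (1.0 - occlusion) + virtual)


def see_through_rgb(real_rgb, virtual_rgb, occlusion=0.0):
    """Apply see_through_pixel to each of the R, G, B channels."""
    return tuple(see_through_pixel(r, v, occlusion) for r, v in zip(real_rgb, virtual_rgb))


wall = (0.8, 0.75, 0.7)        # a bright, lit wall
black_frame = (0.0, 0.0, 0.0)  # a dark movie frame
print("additive:", see_through_rgb(wall, black_frame))
print("with a perfect blocker:", see_through_rgb(wall, black_frame, occlusion=1.0))
```

With `occlusion=0`, a black movie frame over the bright wall delivers exactly the wall’s light to the eye: virtual content can only ever add light, never subtract it.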

What About Occlusion?

“Occlusion” is the term for what happens when one object passes in front of another and stops you from seeing what’s behind it. For virtual scenery to feel like a tangible part of the world, real objects need to occlude virtual objects: if you hold your hand up in front of a piece of virtual imagery, you shouldn’t be able to see it through your hand. Because the headset has a depth camera, this is actually possible. But watch the live demo again:

By and large, they carefully control the camera angles to avoid real objects passing in front of virtual ones. However, when the demonstrator interacts with the Windows menu, you can see that her hand doesn’t occlude it at all. If this is beyond the reach of their technology, that’s a very bad sign for the viability of their consumer product.
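In principle, real-over-virtual occlusion is a per-pixel depth test: draw a virtual pixel only where it is closer than the real surface the depth camera measured. A minimal sketch, assuming depth maps already registered to the viewer’s eye and given in meters; in practice, getting clean, low-latency depth at the edges of a moving hand is the hard part, and this sketch does not address it.

```python
import math


def occlude_virtual(real_depth, virtual_depth, virtual_color):
    """Per-pixel depth test between the real scene and virtual content.
    real_depth: depth-camera distances in meters, assumed already registered
        to the eye's viewpoint.
    virtual_depth: distance of virtual content per pixel (math.inf = none).
    Returns virtual_color with every hidden or empty pixel set to None,
    meaning "draw nothing here and let the real world show through"."""
    return [
        [c if v < r else None for r, v, c in zip(r_row, v_row, c_row)]
        for r_row, v_row, c_row in zip(real_depth, virtual_depth, virtual_color)
    ]


# Tiny 2 x 3 example: a wall at 2 m, a hand at 0.4 m covering one pixel,
# and a virtual menu floating at 1 m everywhere except the last top pixel.
real = [[2.0, 2.0, 2.0],
        [2.0, 0.4, 2.0]]
menu_depth = [[1.0, 1.0, math.inf],
              [1.0, 1.0, 1.0]]
menu_color = [["menu"] * 3, ["menu"] * 3]
for row in occlude_virtual(real, menu_depth, menu_color):
    print(row)
```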

And speaking of that UI…

Is This Really the Final UI?

The UI Microsoft showed off in their demo videos seems to use some combination of gaze and hand tracking to control a cursor in the virtual scene, with voice commands for selecting between options. This has two major drawbacks: it makes you look like the little kid in The Shining who talks to his finger, but more importantly, it represents a fundamentally flawed design paradigm.
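For reference, the gaze half of such a cursor scheme amounts to casting a ray from the head along the view direction and finding where it hits a surface; a voice command then acts on whatever the cursor lands on. This is a minimal sketch assuming the surface is a flat plane, such as a wall holding a virtual window; the function names and coordinates are illustrative, not Microsoft’s API.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaze_cursor(head_pos, gaze_dir, plane_point, plane_normal):
    """Point where the gaze ray (head_pos + t * gaze_dir, t >= 0) hits a plane,
    or None if the ray is parallel to it or pointing away from it."""
    denom = _dot(gaze_dir, plane_normal)
    if abs(denom) < 1e-9:
        return None
    offset = [p - h for p, h in zip(plane_point, head_pos)]
    t = _dot(offset, plane_normal) / denom
    if t < 0:
        return None
    return tuple(h + t * d for h, d in zip(head_pos, gaze_dir))


# Head 1.6 m above the floor, looking straight at a wall 3 m ahead (the wall is
# the plane z = -3, with its normal facing back toward the user).
print(gaze_cursor((0.0, 1.6, 0.0), (0.0, 0.0, -1.0), (0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))
```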

Historically, the best user interfaces have been ones that bring physical intuitions about the world into the virtual world. The mouse brought clicking, dragging, and windows. Touch interfaces brought swipe to scroll and pinch to zoom. Both were critical in making computers more accessible and useful to the general population, because they were fundamentally more intuitive than what came before.

VR and AR give you a lot more freedom as a designer: you can place UI elements anywhere in 3D space and have users interact with them naturally, as though they were physical objects. A huge number of obvious metaphors suggest themselves. Touch a virtual UI element to select it. Pinch it to pick it up and move it. Slide it out of the way to store it temporarily. Crush it to delete it. You can imagine building a user interface that’s so utterly intuitive that it requires no explanation: something your grandmother can instantly pick up, because it’s built on a foundation of basic physical intuitions that everyone builds up over a lifetime of interacting with the world. Take a minute, and listen to this smart person describe what immersive interfaces could be.
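These physical metaphors are not hard to express in code once hand tracking is good enough. Here is a minimal sketch of “pinch it to pick it up”, assuming a hypothetical hand tracker that reports thumb-tip and index-fingertip positions in meters each frame; the 2 cm and 4 cm thresholds are illustrative guesses. The gap between the press and release thresholds (hysteresis) keeps tracking jitter from dropping or re-grabbing the object on every frame.

```python
import math


class PinchDetector:
    """Turns fingertip positions from a (hypothetical) hand tracker into
    pick-up / release events, with hysteresis between the two thresholds."""

    def __init__(self, press_m: float = 0.02, release_m: float = 0.04):
        self.press_m = press_m      # fingertips closer than this: start pinch
        self.release_m = release_m  # fingertips farther than this: end pinch
        self.pinching = False

    def update(self, thumb_tip, index_tip):
        """Feed one frame of fingertip positions; returns an event or None."""
        gap = math.dist(thumb_tip, index_tip)
        if not self.pinching and gap < self.press_m:
            self.pinching = True
            return "pick_up"
        if self.pinching and gap > self.release_m:
            self.pinching = False
            return "release"
        return None


detector = PinchDetector()
for gap in (0.05, 0.015, 0.03, 0.05):
    print(gap, detector.update((0.0, 0.0, 0.0), (gap, 0.0, 0.0)))
```

The 0.03 m frame sits between the two thresholds, so the object stays held instead of flickering between grabbed and dropped.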

In other words, it seems obvious (to me) that an immersive user interface should be at least as intuitive as the multitouch interfaces the iPhone pioneered for 2D screens. Building an interface around manipulating a VR “mouse” is a step backward, and exposes either deep shortcomings in their hand tracking technology or a fundamental misunderstanding of what’s interesting about this new medium. Either way, it’s a very bad sign for this product’s chances of being more than a colossal, Kinect-scale flop.

Hopefully, Microsoft has time to get feedback on this and do a better job.  As an example, here’s an interface designed by one hobbyist for the Oculus Rift DK2 and the Leap Motion. An immersive UI designed by a large company should be at least this good.

A Sign of Things to Come

On the whole, I’m extremely skeptical of Project HoloLens. I’m very glad that a company with Microsoft’s resources is investigating this technology, but I’m concerned that they’re trying to rush a product out without solving some critical underlying technical issues or nailing down a good UI paradigm. The HoloLens is a sign of things to come, but that doesn’t mean the product itself is going to provide a good experience to consumers.

Image Credit: courtesy of Microsoft

  1. FutureJunky
    October 27, 2016 at 11:52 pm

    Do we have an update for this article?

  2. Anaryl
    April 1, 2015 at 1:21 pm

    Whilst there are lots of pertinent points raised here - almost all of it is conjecture. Saying "If they are using this means this is not a good sign" is pretty much speculation.

  3. Anonymous
    January 28, 2015 at 12:14 am

    So you wrote about something you did not have hands-on use of and guessed about stuff

    • Mark
      January 28, 2015 at 12:47 am

      This article contained more useful technical information than any of the breathless PR nonsense I've read elsewhere. Try finding anyone else discussing the need for occlusion and light blocking.

  4. Mattski
    January 26, 2015 at 2:18 am

    Watched it with my ten-year-old, who said, "I want that!" But she reserved her greatest excitement for the dog at the end of the ad.

  5. transmitthis
    January 25, 2015 at 11:00 pm

    I would have thought a top 5 question would be asking about the field of view the device covers.
    as seen here

  6. Chris
    January 25, 2015 at 5:41 pm

    I am a 17 year old, I love to code (C#) but lack the ideas for new projects. Seeing this made me so happy, I'll be buying one whether it is flawed or not; I would love to be one of the first consumers to own one and can't wait to start developing for it.

    • Aaron
      February 13, 2016 at 3:20 am

      You might want a formal education first. Otherwise, good luck! :)

  7. John
    January 24, 2015 at 6:17 pm

    The oculus rift really really sucks. Extremely low resolution, computer issues, poor content, etc

    • Andre Infante
      January 25, 2015 at 1:12 am

      The resolution certainly isn't what everyone would like it to be, but it is developer hardware for now - expecting it to have a rich stock of content and run flawlessly is maybe asking a little much.

      In any case, I have a Rift sitting an inch from my hand right now, and I think that, for all its (many) rough edges, it does offer an experience that nothing else comes close to for now.

  8. alms for palmer
    January 24, 2015 at 12:47 am

    How about you act like a journalist, go get answers, then print those instead of the clickbait?

    • cegli
      January 24, 2015 at 2:40 am

      I would assume because Microsoft is not taking questions on the matter. This article isn't click-bait. The questions the author brings up are very fair, and are similar to the ones I had when I watched the presentation.

  9. DonGateley
    January 23, 2015 at 8:43 pm

    Very, very impressive article but a couple of nits. Joe Belfiore said it knows exactly where in the image your eyes are looking. If detection is accurate enough with both eyes then light field is not needed at all. All the effects you talk about can be simulated and more importantly foveated rendering can reduce both bandwidth to and internal to the device and the computational burden while increasing apparent resolution. Radically in each case. My guess is that the projector is micro-mirror because with that the array resolution need not be uniform and the high resolution part of it can be steered to where you are looking. There be magic there.

    The accuracy of the tracking should be as accurate as the sensors possibly allow given the amount of horsepower and the dedicated processor that they say is in the device. Given that data turn around between device and host should be reduced by all the local processing, it has the potential for much greater stability, much lower latency and much higher foveal frame rate than the Oculus Rift.

    The problem of selective transmission (black) is an intriguing one and I anxiously await the answer to that. I think that accurate eye tracking is going to make it easier to solve like it does everything else. For VR, however, an add on mask that keeps out those pesky ambient photons should suffice to make it equivalent to the Rift in that regard. That's a price I'd most willingly pay to get the total immersion that VR requires.

    Everything else seems to be well in place to achieve all the things that Abrash and Carmack have to say about immersion and presence. I have a feeling the HoloLens revelation was an earthquake at OculusBook. Not to mention at some of the smaller outfits trying to move into the space.

    Occlusion is only a compute power problem and it sounds like they are throwing the works at that.

    Your observations on UI are great food for thought.

    • Andre Infante
      January 23, 2015 at 10:20 pm

      Unfortunately, eye-tracking doesn't resolve the optical focus problem by itself. You still need to be able to display different parts of the scene at different focal depths (although you might be able to cheat, I suppose, if you could alter the focus of the entire scene to match the depth of whatever's in your foveas). Even being able to rapidly alter the focal depth of your display, though, is not an easy optics problem.

      Vis-a-vis foveated rendering, unfortunately, as matters stand, you need to be rendering at a very high resolution for the overhead of operating two sets of stereo cameras to be worth it in terms of the performance tradeoff. It likely isn't practical for mobile hardware this generation, even if eye tracking technology were good enough.

      On tracking: unfortunately, tracking improvements probably have exponentially diminishing returns with processing power, and (indeed) may require less noisy sensors. I assume Microsoft is using Time of Flight technology, as they did with the Kinect. The fact that it's not accurate enough to mask out hands and other occluding objects is not a good sign.

      Since the article was written, a few reviews have come out which state that the device has a very low FOV (about 45 degrees), making it more plausible that they're using a relatively low resolution light field.

    • Mark
      January 28, 2015 at 1:56 am

      The presenter says "Windows knows EXACTLY where you are looking" but it's pretty clear that in this demo the focal point is always dead in the center of the FOV. Look at the clumsy way the blocky menu is navigated. At least in this iteration there is no indication that eye tracking is being performed at all.

  10. piyush
    January 23, 2015 at 6:09 pm

    skepticism apart, what they showed in the live demo is quite impressive even if it is in highly controlled conditions. still in the demo itself she had to click multiple times for the device to recognise. plus the concerns the author raised in the article definitely suggest that microsoft is rushing this product.

    • Planet Killer
      March 17, 2015 at 6:32 am

      Is seven years really rushing a product?

  11. dragonmouth
    January 23, 2015 at 3:05 pm

    Microsoft has been living in a state of augmented reality since its founding, totally oblivious to the needs and wants of its users.

    • F. Floss
      January 25, 2015 at 8:44 am

      Uh...considering how AR simply adds things into the real world, you seem to be suggesting that Microsoft is giving its consumers exactly what they want and more, which I assume is the direct opposite of what you were trying to express...just sayin'.

    • dragonmouth
      January 25, 2015 at 1:09 pm

      That is your understanding of the term "augmented reality." Drugs and power also "augment" reality, giving one an unrealistic view of it. Since its inception, Microsoft has been convinced that they are God's gift to the world, the best thing since sliced bread, that their s**t don't stink. If you only narrowly focus on one or two products, all those convictions may seem to be true.

    • F. Floss
      January 26, 2015 at 5:19 am

      Actually, intoxication (such as through alcohol or power) doesn't enhance reality, rather, it distorts or even detracts from reality so no, not really. Essentially, what you are saying is Microsoft doesn't pay attention to their consumers' needs. However, saying they are in a state of augmented reality (that is, enhanced reality) doesn't correctly express that. Why not just say it directly? It's much less confusing for the passing reader.
