How to create a premium WebAR Experience, by Merlin Studio

Merlin Studio's Wizard Overlord CTO Servin

The Post


First things first
SLAM Engines
8thWall
Object recognition
Physics and object recognition
Creating our own tech stack
Custom Permission flow
Custom Loader
Tiered experience
Headless CMS
Custom hosting and CI/CD
Automatic Texture handling
3D Models and best practices
What’s next
Conclusion & TLDR;

First things first

WebAR, or web-based augmented reality, lets people open and share AR experiences over the internet without installing an app. That makes AR more accessible and user-friendly, which can lead to wider adoption across many industries.

In marketing and advertising, WebAR has the potential to be a game-changer, with 74% of users finding AR more attention-grabbing than traditional marketing. It can be used at various stages of the customer journey to increase engagement and interaction, ultimately leading to better results.

Overall, WebAR’s impact is significant in terms of accessibility, user engagement and its potential in marketing and advertising.

At Merlin Studio, we’re constantly creating frameworks to tackle our clients’ needs. Agencies and businesses often get in touch when they need digital expertise to create multi-language, headless and more extensive WebAR projects than some integrated solutions offer.
By building these WebAR experiences, we’ve tackled many of the issues that come up along the way.

Merlin's 2023 Recap

SLAM Engines

WebAR is an emerging technology that is handled differently by every browser.

On Android, browsers like Chrome already let us start WebXR “immersive-ar” sessions backed by ARCore, but iOS sadly doesn’t. Safari on iOS could use Apple’s ARKit, but the support isn’t there yet.
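To decide at runtime which path a device can take, a page can ask the WebXR Device API whether an “immersive-ar” session is supported. The sketch below is an illustration, not our production code: it uses a minimal structural type instead of the full WebXR typings (`@types/webxr`), and `pickArBackend` and the `"slam-fallback"` label are hypothetical names for “use a third-party SLAM engine instead”.

```typescript
// Minimal structural type for the one WebXR Device API method used here.
// In a browser this object is `navigator.xr`; full typings ship in @types/webxr.
interface XRSystemLike {
  isSessionSupported(mode: "inline" | "immersive-vr" | "immersive-ar"): Promise<boolean>;
}

// Hypothetical labels for the two rendering paths.
type ArBackend = "webxr" | "slam-fallback";

// Resolves to "webxr" when the browser can start an immersive-ar session
// (e.g. Chrome on an ARCore-capable Android phone), else to the SLAM fallback.
async function pickArBackend(xr: XRSystemLike | undefined): Promise<ArBackend> {
  // Safari on iOS does not expose navigator.xr by default, so xr is undefined there.
  if (xr === undefined) {
    return "slam-fallback";
  }
  try {
    const supported = await xr.isSessionSupported("immersive-ar");
    return supported ? "webxr" : "slam-fallback";
  } catch {
    // The check itself can reject, e.g. when a permissions policy blocks XR.
    return "slam-fallback";
  }
}

// Browser usage (not executed here):
// const backend = await pickArBackend((navigator as unknown as { xr?: XRSystemLike }).xr);
```

Passing the `xr` object in as a parameter keeps the decision testable outside a browser.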

An interesting development is Variant Launch’s App Clip solution: a website opens an App Clip, a “fraction” of an app, which in turn opens an ARKit-enabled browser.

Silly? Yes.
iOS doesn’t allow us to hook into ARKit. What do we do now?

We use external software like 8thWall for its SLAM (simultaneous localization and mapping) engine. It essentially does 3D triangulation and position calculations using your camera feed and your phone’s motion sensors, like the gyroscope. Simple, right?

A SLAM engine estimates the position of an element in 3D space and continuously tracks your movement to keep the element’s position and scale correct.
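The core idea can be sketched in a few lines: the anchored element keeps a fixed position in world space, and every frame the engine’s updated camera pose is used to work out where that point sits relative to the camera. This simplified sketch assumes a camera that only rotates around the vertical axis (yaw); a real SLAM engine tracks a full six-degrees-of-freedom pose, and `toCameraSpace` is a hypothetical helper.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Simplified camera pose: position in world space plus rotation around the
// vertical Y axis (yaw, in radians). Real SLAM engines report full 3D rotation.
interface CameraPose { position: Vec3; yaw: number; }

// Express a fixed world-space anchor in camera space (camera looks down -Z).
// This applies the inverse camera transform: subtract the camera position,
// then rotate by -yaw around Y.
function toCameraSpace(anchor: Vec3, pose: CameraPose): Vec3 {
  const dx = anchor.x - pose.position.x;
  const dy = anchor.y - pose.position.y;
  const dz = anchor.z - pose.position.z;
  const c = Math.cos(pose.yaw);
  const s = Math.sin(pose.yaw);
  return { x: c * dx - s * dz, y: dy, z: s * dx + c * dz };
}

// The anchor itself never moves; only the camera pose changes per frame.
const logoAnchor: Vec3 = { x: 0, y: 0, z: -2 };
const cameraPoses: CameraPose[] = [
  { position: { x: 0, y: 0, z: 0 }, yaw: 0 },  // start: logo 2 units straight ahead
  { position: { x: 0, y: 0, z: -1 }, yaw: 0 }, // stepped 1 unit forward: logo 1 unit ahead
];
for (const pose of cameraPoses) {
  console.log(toCameraSpace(logoAnchor, pose));
}
```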

This allows us to create WebAR experiences anyway even though we don’t have access to the native ARKit solution. A quick fix, if you like.

This SLAM engine has some issues: sudden movement can cause stuttering or glitches, and people can very easily “place” the experience way out in the distance.

Luckily, a custom solution we created can hook into the minimum and maximum distance at which an experience can be placed. This makes the experience less prone to human error and, most importantly, makes sure our clients’ logos are large enough.
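The idea behind such limits can be sketched as follows. This is not 8thWall’s API or the actual implementation, just an illustration under stated assumptions: the user’s tap yields a point on the floor, and that point is pulled toward or pushed away from the camera until its horizontal distance lies between a minimum and a maximum. `clampPlacement` and the 1 and 5 unit limits are hypothetical.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Hypothetical placement limits, in scene units (e.g. metres).
const MIN_DISTANCE = 1;
const MAX_DISTANCE = 5;

// Pull a floor placement point toward or away from the camera so its
// horizontal (XZ) distance lies within [minDist, maxDist]. The point keeps
// its height (y), so it stays on the detected floor.
function clampPlacement(
  camera: Vec3,
  candidate: Vec3,
  minDist: number = MIN_DISTANCE,
  maxDist: number = MAX_DISTANCE,
): Vec3 {
  const dx = candidate.x - camera.x;
  const dz = candidate.z - camera.z;
  const dist = Math.sqrt(dx * dx + dz * dz);
  if (dist === 0) {
    // Tap directly below the camera: fall back to world -Z at minDist.
    return { x: camera.x, y: candidate.y, z: camera.z - minDist };
  }
  const scale = Math.min(Math.max(dist, minDist), maxDist) / dist;
  return { x: camera.x + dx * scale, y: candidate.y, z: camera.z + dz * scale };
}

// A tap 20 units away gets pulled back to the 5-unit maximum.
console.log(clampPlacement({ x: 0, y: 1.5, z: 0 }, { x: 0, y: 0, z: -20 }));
```

Keeping the direction of the tap and only adjusting its distance means the experience still appears where the user pointed, just at a sensible size.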

8thWall

8thWall is a great tool for WebAR. It allows us to deliver WebAR that truly works.

8thWall also has a huge set of tools, object recognition software and examples to work from.

Their work has been a pillar for every high-end WebAR product you can currently experience, as their SLAM engine is undefeated.

The downsides of 8thWall are its hosting costs and API structure. It’s not easy to use when you want to innovate within the WebAR space: the API is set up to be as easy as possible, but when you’re doing more advanced work, it tends to hold you back more than move you forward.

Nonetheless, we’ve found a way to integrate 8thWall in a way that doesn’t affect our product too much. We can use our own tested way of working while using the best 8thWall has to offer.

Pretty neat.

Object recognition

With object recognition, it’s possible to give your WebAR experience another layer of depth.

Services like 8thWall give you some object recognition options, like hand or image tracking. This way you can place 3D digital elements over or beside physical elements.

Google’s MediaPipe allows us to classify images, objects, hand landmarks, gestures, faces and more.

These tools are built with machine learning, and we can creatively implement them in our AR experiences.

For instance, to tease a WebAR experience, we previously used image recognition to power a nationwide ABRI campaign with bus stop posters. This created an out-of-home entry point that welcomed people into the experience: they scan the QR code, keep their phone pointed at the poster and watch the world come to life.

Physics and object recognition

Physics?! Yes!

We can add simulated physics to projected elements on your screen. As a nerd, I think that’s pretty cool.

Think of dropping balls on the floor, throwing things or interacting with elements.
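At its core, a physics simulation advances every object by a small time step each frame. Below is a minimal sketch of a ball dropped onto the floor, using semi-implicit Euler integration and a simple restitution (bounciness) factor. Dedicated physics engines do this, and much more, for real scenes; `stepBall` and the constants are hypothetical.

```typescript
// Vertical state of a ball: height above the floor (m) and velocity (m/s).
interface Ball { y: number; vy: number; }

const GRAVITY = -9.81;   // m/s^2, pointing down
const RESTITUTION = 0.6; // hypothetical bounciness: share of speed kept per bounce
const FLOOR_Y = 0;

// Advance the ball by dt seconds using semi-implicit Euler integration:
// update the velocity first, then move with the new velocity.
function stepBall(state: Ball, dt: number): Ball {
  let vy = state.vy + GRAVITY * dt;
  let y = state.y + vy * dt;
  if (y < FLOOR_Y) {
    // Floor contact: snap onto the floor and reverse the velocity, losing energy.
    y = FLOOR_Y;
    vy = -vy * RESTITUTION;
  }
  return { y, vy };
}

// Drop a ball from 1.5 m and simulate 3 seconds at 60 steps per second.
let ball: Ball = { y: 1.5, vy: 0 };
let bounced = false;
let peakAfterBounce = 0;
for (let i = 0; i < 180; i++) {
  ball = stepBall(ball, 1 / 60);
  if (ball.y === FLOOR_Y) {
    bounced = true;
  } else if (bounced) {
    peakAfterBounce = Math.max(peakAfterBounce, ball.y);
  }
}
console.log(`Peak height after bouncing: ${peakAfterBounce.toFixed(2)} m`);
```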

Physics gets even better when combined with object recognition. For instance, we can use a hand recognition machine learning model to replace your hand with a 3D model of a hand. That hand can then interact with the scene’s 3D elements, meaning you can throw, swipe, pick up or sweep 3D elements in your view.
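A simple building block for picking things up with a tracked hand is pinch detection. Hand-tracking models such as MediaPipe’s Hand Landmarker return 21 landmarks per hand, where index 0 is the wrist, 4 the thumb tip, 8 the index fingertip and 9 the middle-finger knuckle. The sketch below treats the hand as pinching when the two fingertips are close relative to the hand’s size; the landmarks are passed in as plain objects rather than through the MediaPipe API, and the threshold and function names are hypothetical.

```typescript
interface Landmark { x: number; y: number; z: number; } // z is a relative depth estimate

// Landmark indices in MediaPipe's 21-point hand model.
const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;
const MIDDLE_FINGER_MCP = 9;

// Hypothetical threshold: pinch when the fingertips are closer than 25% of the
// wrist-to-middle-knuckle length, so the check works at any distance from the camera.
const PINCH_RATIO = 0.25;

function distance(a: Landmark, b: Landmark): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

function isPinching(hand: Landmark[], ratio: number = PINCH_RATIO): boolean {
  if (hand.length < 21) return false; // not a complete hand
  const handSize = distance(hand[WRIST], hand[MIDDLE_FINGER_MCP]);
  if (handSize === 0) return false;
  return distance(hand[THUMB_TIP], hand[INDEX_FINGER_TIP]) / handSize < ratio;
}
```

While a pinch is active, the nearest 3D element could follow the fingertip; releasing the pinch would hand it back to the physics simulation with the hand’s velocity.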

The current way of interacting is a bit odd, since it requires you to look through your screen while moving your arm. New tech like Apple’s Vision Pro will make this feel more natural. It looks like the Vision Pro will support WebAR, meaning we can soon open up a new way of navigating the world.

Creating our own tech stack

At Merlin Studio we use frameworks and packages like NextJS, React Three Fiber and Three.js.

These tools help us create modular 3D scenes in a stateful way. As exhilarating as that may sound, it means that we can easily reuse, reorder and manage our 3D worlds. “Stateful” is a way of programming in which we define a certain state that in turn shapes the world, instead of having to programmatically update every affected element.
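The difference can be sketched without any framework: the scene is a pure function of a small state object, so changing the state (say, switching worlds or tiers) is enough for every dependent element to follow. React Three Fiber applies the same idea to Three.js objects through React components; the types, world names and particle counts below are hypothetical and framework-free.

```typescript
type WorldId = "intro" | "forest" | "finale"; // hypothetical world names

interface ExperienceState {
  world: WorldId;
  tier: 1 | 2 | 3; // 1 = most detail
  soundOn: boolean;
}

interface SceneDescription {
  visibleModels: string[];
  particleCount: number;
  ambientAudio: boolean;
}

// Hypothetical per-world content.
const MODELS: Record<WorldId, string[]> = {
  intro: ["logo"],
  forest: ["logo", "trees", "fox"],
  finale: ["logo", "fireworks"],
};

// Pure derivation: the same state always yields the same scene, so nothing
// has to be updated element by element when the state changes.
function describeScene(state: ExperienceState): SceneDescription {
  const baseParticles = state.world === "finale" ? 3000 : 500;
  return {
    visibleModels: MODELS[state.world],
    // Lower-detail tiers get proportionally fewer particles.
    particleCount: Math.round(baseParticles / state.tier),
    ambientAudio: state.soundOn,
  };
}

const before = describeScene({ world: "forest", tier: 1, soundOn: true });
const after = describeScene({ world: "finale", tier: 2, soundOn: true });
console.log(before.visibleModels, after.particleCount);
```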

Integrating 8thWall’s API with our frameworks was a bit of a pain, but worth it.

Just like with a Headless CMS, we can now integrate anything we could in a normal website while profiting from 8thWall’s integrations.

In the past, we’ve integrated email collection forms, embedded and integrated e-commerce and added user logins.

Custom Permission flow

8thWall has “out-of-the-box” components for things like loading screens, permission pop-ups and explanatory overlays.

We’ve chosen to recreate these components because this allows us to rapidly change flows in the future when needed. On top of that, 8thWall’s default screens are purple, carry their logo and aren’t particularly well designed.

Since we don’t know what’s happening in the 8thWall code, we can’t add custom behavior to it. For an enterprise-grade WebAR experience, we want all the flexibility we can get.

Our custom permission flow checks whether you’ve given permission before and, if so, skips straight to the experience. It has custom behavior for both Android and iOS and allows us to preload the experience. This gives us a solid 5-10 seconds of preloading, making the time to load very short.
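A stripped-down version of that decision logic might look like the sketch below. It is an illustration under our own assumptions: the storage key, step names and `planPermissionFlow` are hypothetical, and the stored flag only lets us skip our own explanatory screens, since the browser may still decide to prompt again. On iOS, motion sensors sit behind Safari’s `DeviceOrientationEvent.requestPermission()`, which has to be triggered from a user gesture such as a tap.

```typescript
type Platform = "ios" | "android" | "other";

type FlowStep =
  | "preload-assets"   // start downloading while the user reads and taps
  | "show-intro"       // our own branded explanation screen
  | "request-motion"   // iOS: a tap that calls DeviceOrientationEvent.requestPermission()
  | "request-camera"   // camera prompt (getUserMedia)
  | "start-experience";

// Minimal storage interface; in the browser, window.localStorage fits it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const GRANTED_KEY = "webar-permissions-granted"; // hypothetical key

function planPermissionFlow(platform: Platform, store: KeyValueStore): FlowStep[] {
  // Preloading always comes first so it overlaps with the permission screens.
  const steps: FlowStep[] = ["preload-assets"];
  const returning = store.getItem(GRANTED_KEY) === "true";
  if (!returning) steps.push("show-intro");
  // Assumption: iOS motion access may not carry over between visits, so iOS
  // always gets one tap that (re)calls DeviceOrientationEvent.requestPermission().
  if (platform === "ios") steps.push("request-motion");
  if (!returning) steps.push("request-camera");
  steps.push("start-experience");
  return steps;
}

// Call once every prompt in the flow has been accepted.
function rememberGrant(store: KeyValueStore): void {
  store.setItem(GRANTED_KEY, "true");
}
```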

The seconds you spend waiting for a website to load are seconds of your life you’re not getting back. That’s why we try to keep them to a minimum.

Custom Loader

8thWall offers things like loading screens out of the box. These purple overlays use 8thWall's colors and fonts and include their logo.

We wanted a custom solution that allowed us to have control over what we were loading and when. This would allow us to show users relevant content faster and more efficiently. Our custom loader also gives people relevant information they may need during the experience, like how to enable sound or ask for help.

We had an experience running four different “scenes” or “worlds” with different elements, each with its own loader and memory management.
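Per-scene loading can be modelled as a small progress tracker: each scene registers its assets with an expected size, loaders report bytes as they arrive, and the loading screen reads a single value between 0 and 1. This is a hypothetical, framework-free sketch; in a Three.js setup, the updates would come from the loaders’ progress callbacks.

```typescript
// Tracks loading progress for one scene ("world"), weighted by asset size.
class SceneLoader {
  readonly sceneId: string;
  private readonly totals = new Map<string, number>();
  private readonly loaded = new Map<string, number>();

  constructor(sceneId: string) {
    this.sceneId = sceneId;
  }

  // Declare an asset up front with its expected size in bytes.
  register(url: string, totalBytes: number): void {
    this.totals.set(url, totalBytes);
    this.loaded.set(url, 0);
  }

  // Report bytes received so far, e.g. a ProgressEvent's `loaded` value.
  update(url: string, loadedBytes: number): void {
    const total = this.totals.get(url);
    if (total === undefined) {
      throw new Error(`Unknown asset for scene ${this.sceneId}: ${url}`);
    }
    this.loaded.set(url, Math.min(loadedBytes, total));
  }

  // Overall progress from 0 to 1; a scene without assets counts as loaded.
  progress(): number {
    let total = 0;
    let done = 0;
    this.totals.forEach((bytes, url) => {
      total += bytes;
      done += this.loaded.get(url) ?? 0;
    });
    return total === 0 ? 1 : done / total;
  }

  isReady(): boolean {
    return this.progress() >= 1;
  }
}

// One loader per world, so the first world can start while the others download.
const intro = new SceneLoader("intro");
intro.register("/models/logo.glb", 2000000); // hypothetical paths and sizes
intro.register("/textures/sky.webp", 500000);
intro.update("/models/logo.glb", 1000000);
console.log(`intro: ${Math.round(intro.progress() * 100)}%`);
```

Weighting by bytes rather than by file count keeps the progress bar from jumping when one large model finishes after many small textures.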

Tiered experience

Using our own framework, we created performance “tiers” for different phones.

Flagship phones are wildly more powerful than the phone you had 5 years ago. The thing is, everyone uses a different device which means that we need to make sure our users get a good experience even when they’re on a slightly older device.

iOS, for instance, has a VRAM limit of 50MB. This means that we can only show so many sparkles and explosions until the Safari browser crashes without warning.

We’ve built a system that detects your phone’s capabilities and assigns a tier. For instance, an iPhone 12 gets tier 3 while an iPhone 15 gets tier 1. One small problem is that iOS doesn’t let us see exactly which phone you are using, which is nonetheless a great privacy feature.
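A capability-based first guess can be sketched as a simple score. The signals are real browser APIs (`MAX_TEXTURE_SIZE` via `gl.getParameter`, `navigator.hardwareConcurrency`, and the Chromium-only `navigator.deviceMemory`), but the thresholds and scoring below are hypothetical and would need tuning against real devices. Because iOS hides the exact model, a score like this can only provide a starting tier.

```typescript
// Signals a page can read without knowing the exact phone model.
interface DeviceSignals {
  maxTextureSize: number;      // gl.getParameter(gl.MAX_TEXTURE_SIZE)
  hardwareConcurrency: number; // navigator.hardwareConcurrency (logical cores)
  deviceMemoryGb?: number;     // navigator.deviceMemory: Chromium only, coarse value
}

type Tier = 1 | 2 | 3; // 1 = most detail, 3 = least

// Hypothetical scoring; thresholds must be tuned against real test devices.
function assignTier(s: DeviceSignals): Tier {
  let score = 0;
  if (s.maxTextureSize >= 16384) score += 2;
  else if (s.maxTextureSize >= 8192) score += 1;
  if (s.hardwareConcurrency >= 6) score += 1;
  // Missing deviceMemory (e.g. on Safari) neither helps nor hurts the score.
  if (s.deviceMemoryGb !== undefined && s.deviceMemoryGb >= 4) score += 1;

  if (score >= 3) return 1;
  if (score >= 1) return 2;
  return 3;
}
```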

For cases like the older iPhone XR or 12, we created monitoring software that raises or lowers the tier based on the amount of lag you have during your experience.

Is the website hitting a lag spike, or has it been running slower for a certain period? Then we downgrade the experience, showing you less detailed textures, fewer vertices and fewer particles.
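The monitoring loop can be sketched as an exponential moving average of frame times with hysteresis: a severe lag spike drops one tier immediately, a sustained slowdown drops one tier after about a second, and the tier only climbs back after a much longer stretch of smooth frames, so quality doesn’t flip back and forth. The 30 fps target, thresholds and frame counts are hypothetical; in a browser, frame durations would come from successive `requestAnimationFrame` timestamps.

```typescript
type Tier = 1 | 2 | 3; // 1 = most detail, 3 = least

class AdaptiveTierMonitor {
  tier: Tier;
  // Hypothetical tuning values.
  private readonly targetMs = 1000 / 30;   // aim for roughly 30 fps
  private readonly spikeMs = 250;          // one frame this long counts as a lag spike
  private readonly smoothing = 0.1;        // weight of the newest frame in the average
  private readonly framesToDowngrade = 30; // roughly 1 s of sustained slowness
  private readonly framesToUpgrade = 300;  // roughly 10 s of smooth frames
  private avgMs: number;
  private slowFrames = 0;
  private fastFrames = 0;

  constructor(initialTier: Tier) {
    this.tier = initialTier;
    this.avgMs = this.targetMs;
  }

  // Feed one frame duration in ms; returns the possibly updated tier.
  sample(frameMs: number): Tier {
    this.avgMs += this.smoothing * (frameMs - this.avgMs);
    if (frameMs >= this.spikeMs) {
      this.shift(1); // severe lag spike: reduce detail immediately
      return this.tier;
    }
    if (this.avgMs > this.targetMs * 1.2) {
      this.slowFrames++;
      this.fastFrames = 0;
    } else if (this.avgMs < this.targetMs * 0.8) {
      this.fastFrames++;
      this.slowFrames = 0;
    } else {
      this.slowFrames = 0;
      this.fastFrames = 0;
    }
    if (this.slowFrames >= this.framesToDowngrade) {
      this.shift(1);
    } else if (this.fastFrames >= this.framesToUpgrade) {
      this.shift(-1);
    }
    return this.tier;
  }

  // +1 moves to a lower-detail tier, -1 to a higher-detail one (clamped to 1..3).
  private shift(direction: 1 | -1): void {
    this.tier = Math.min(3, Math.max(1, this.tier + direction)) as Tier;
    this.slowFrames = 0;
    this.fastFrames = 0;
  }
}
```

The render loop would then read `monitor.tier` to choose texture sizes, vertex counts and particle budgets.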

Using tiers means we can reduce textures, detail and the number of particles in the scenes to make sure a wide range of devices can run the WebAR experience smoothly.

Headless CMS

“Hi Merlin, is it possible for us to show different content or elements for every country? We’re locked in by legal restrictions”. This is a frequent request. Different countries, different rules or even names. We use a Headless CMS with dynamic languages so you can add up to 10 languages and locales with different content for free.

A Headless CMS is a Content Management System in which you can write copy and host images and videos in different locales, while the website itself is built separately. The difference from something like (regular) WordPress is that WordPress also creates the “template”, which in turn gives you a cluttered, insecure website.

Since we hooked up our framework to the WebAR solution, we can easily hook up a Headless CMS as well. Because we use modern React frameworks like Vercel’s NextJS, we can also hook into APIs easily.

A headless approach allows you to use the latest frameworks, be picky as to what’s included and be independent since your website doesn’t fully rely on it.

Our framework gets all the data from the CMS once while compiling everything during the “build” step.

We hooked up our framework to DatoCMS to host up to 10 locales on different URLs. Dato allows us to write modular content, create content boundaries for our client and use different assets and copy for every language.
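Per-locale content with a fallback can be sketched as below: each field is looked up in the requested locale first and falls back to a default locale when that field isn’t localized, so a market can override only what its legal rules require. The data shape, locale codes and `resolveCopy` are hypothetical and not DatoCMS’s API; in practice the CMS would supply the per-locale values during the build step.

```typescript
// Hypothetical shape: each field holds one value per locale code.
type Localized<T> = Partial<Record<string, T>>;

interface CampaignCopy {
  headline: Localized<string>;
  legalNotice: Localized<string>;
}

// Requested locale first, then the default locale, else undefined.
function pick<T>(field: Localized<T>, locale: string, fallback: string): T | undefined {
  return field[locale] ?? field[fallback];
}

function resolveCopy(copy: CampaignCopy, locale: string, fallback = "en") {
  return {
    headline: pick(copy.headline, locale, fallback),
    legalNotice: pick(copy.legalNotice, locale, fallback),
  };
}

const copy: CampaignCopy = {
  headline: { en: "Step into the story", nl: "Stap in het verhaal" },
  // Only the German market needs a different legal notice.
  legalNotice: { en: "Terms apply.", de: "Es gelten die Teilnahmebedingungen." },
};

console.log(resolveCopy(copy, "de"));
```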


Custom hosting and CI/CD

As we’re using a modern, innovative framework, we can host our project on Vercel, Netlify or Google Cloud.

Hosting a website worldwide with 10 locales can be quite a challenge. There are a few things to keep in mind:

Load balancing
CDN
Caching
Multi-locale deploying
Worldwide servers for fast delivery

We use a combination of Docker, Google Cloud and the Cloudflare CDN to cache and serve content and to limit calls to the CMS.

A headless CMS can be quite expensive when not handled well. Our CDN handles all the traffic, so our CMS doesn’t overcharge us with a usage bill. On top of that, we use custom solutions to host the CMS videos ourselves and avoid bandwidth issues.
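Much of that effect comes down to cache headers. A minimal sketch, assuming three hypothetical asset classes: fingerprinted build files can be cached for a year, while CMS-driven pages get a short shared-cache lifetime (`s-maxage`) plus `stale-while-revalidate`, so the CDN answers most requests itself. The lifetimes are assumptions, and how a given CDN honors these directives depends on its configuration.

```typescript
type AssetKind = "hashed-static" | "media" | "cms-page"; // hypothetical classes

// Cache-Control value per asset class (lifetimes are hypothetical).
function cacheControlFor(kind: AssetKind): string {
  switch (kind) {
    case "hashed-static":
      // File names change on every build, so the content never changes in place.
      return "public, max-age=31536000, immutable";
    case "media":
      // Browsers keep media for a day; shared caches (the CDN) for a week.
      return "public, max-age=86400, s-maxage=604800";
    case "cms-page":
      // max-age=0 makes browser copies stale at once; the CDN reuses its copy for
      // 5 minutes and may serve a stale one for up to a day while it refetches.
      return "public, max-age=0, s-maxage=300, stale-while-revalidate=86400";
    default: {
      const unreachable: never = kind;
      throw new Error(`Unknown asset kind: ${String(unreachable)}`);
    }
  }
}
```

With `s-maxage=300`, each CDN node asks the origin for a given page at most roughly once every five minutes, about 12 times an hour, no matter how many visitors it serves. That is how a CDN in front of the origin keeps calls to the CMS flat when traffic spikes.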

In simple terms, this means that wherever you are in the world, you get a fast connection, and our clients don’t get unexpectedly high hosting bills.

So, feel free to go viral.

Automatic Texture handling

We use a CDN that caches all our textures, images, content and models. On top of that, our CDN solution can convert the textures of our 3D models to other formats, reducing their size.

Textures are essentially just PNG, JPG, WebP or AVIF images which are stretched over a 3D model. Textures can get big fast, so this is a very welcome solution.

The combination of textures, 3D points, particles and regular website elements all contribute to the performance and loading time of the website.

So having a CDN make sure everything is delivered as fast as possible is pretty magical.

3D Models and best practices

WebAR is powered by a few things:

SLAM Engine (or native ARKit)
A website + user interface
3D Models
Interaction

3D Models are often modeled in programs like Blender or Cinema4D.

The 3D Designers we work with need to follow guidelines and strict export rules to create usable and performant models for us to use. A small checklist for performance would be:

As few vertices as possible
4K textures (which we reduce and convert)
glTF or GLB format
WebGL2-compatible vertex and fragment shaders
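Parts of that checklist can be automated before a model ever reaches the scene. The sketch below checks the export format, a vertex budget and the 4K texture ceiling; the stats would come from whatever tooling inspects the exported file, and the `ModelStats` shape and the 100,000-vertex budget are hypothetical. Shader compatibility needs a real WebGL2 context to verify, so it’s left out.

```typescript
interface ModelStats {
  fileName: string;
  vertexCount: number;
  textures: Array<{ name: string; width: number; height: number }>;
}

interface ModelBudget {
  maxVertices: number;
  maxTextureSize: number; // 4096 for "4K" textures
}

// Hypothetical default budget.
const DEFAULT_BUDGET: ModelBudget = { maxVertices: 100000, maxTextureSize: 4096 };

// Returns a readable list of problems; an empty list means the model passes.
function checkModel(stats: ModelStats, budget: ModelBudget = DEFAULT_BUDGET): string[] {
  const problems: string[] = [];
  const lower = stats.fileName.toLowerCase();
  if (!lower.endsWith(".glb") && !lower.endsWith(".gltf")) {
    problems.push(`${stats.fileName}: export as glTF (.gltf) or GLB (.glb)`);
  }
  if (stats.vertexCount > budget.maxVertices) {
    problems.push(`${stats.fileName}: ${stats.vertexCount} vertices exceeds the budget of ${budget.maxVertices}`);
  }
  for (const tex of stats.textures) {
    if (tex.width > budget.maxTextureSize || tex.height > budget.maxTextureSize) {
      problems.push(`${tex.name}: ${tex.width}x${tex.height} is larger than ${budget.maxTextureSize}px`);
    }
  }
  return problems;
}
```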

3D Models can get big fast. Large models with high-quality textures can quickly fill the VRAM (graphics memory) of your device. We reduce texture size on certain tiers to make sure the device can handle the whole experience. Scenes that combine animated UI, particles, smoke or billboard effects, 3D models and animations can require a huge amount of memory from your device.
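A quick calculation shows why textures dominate memory. What matters on the GPU is not the PNG or JPG file size but the decoded image: an uncompressed RGBA texture takes 4 bytes per pixel, and a full mipmap chain adds roughly a third on top. The sketch below computes this and picks the largest square power-of-two size that fits a given budget; the 32 MiB budget is hypothetical, and GPU-compressed texture formats would change the numbers.

```typescript
// Bytes of GPU memory for an uncompressed RGBA8 texture (4 bytes per pixel),
// optionally including the full mipmap chain down to 1x1.
function textureBytes(width: number, height: number, withMipmaps = true): number {
  let total = 0;
  let w = width;
  let h = height;
  for (;;) {
    total += w * h * 4;
    if (!withMipmaps || (w === 1 && h === 1)) break;
    w = Math.max(1, Math.floor(w / 2));
    h = Math.max(1, Math.floor(h / 2));
  }
  return total;
}

// Largest square texture size (halving from maxSize, assumed a power of two)
// whose mipmapped footprint fits the budget; falls back to 1x1 if nothing fits.
function maxTextureSizeForBudget(budgetBytes: number, maxSize = 4096): number {
  let size = maxSize;
  while (size > 1 && textureBytes(size, size) > budgetBytes) {
    size /= 2;
  }
  return size;
}

const MiB = 1024 * 1024;
console.log((textureBytes(4096, 4096) / MiB).toFixed(1)); // "85.3": one 4K texture with mipmaps
console.log(maxTextureSizeForBudget(32 * MiB));           // 2048 under a 32 MiB budget
```

Halving a texture’s width and height cuts its memory to a quarter, which is why lowering texture resolution on lower tiers frees memory so effectively.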

What’s next

As I write this, it’s 2024 and Apple will release the Vision Pro in 3 weeks. It’s one of many devices that will create a simulated layer on top of the physical world. They will probably reinvent “AR” and give it a new name, just like Meta did with the “metaverse”.

WebAR should get access to iOS’s ARKit soon, which will let us tap into the “native” AR systems.
At that point, 8thWall will become unnecessary for us, and we can create even more detailed WebAR experiences and showcases.

We will soon be able to combine AR with LLMs (large language models) and AI image recognition to talk to Siri or Google Assistant with simulated vision.
“Hey Google, how can I repair this broken faucet?” You’ll be guided visually and aurally (I had to look that one up).

AI can also help us advance campaigns and advertising. We can use AI for UGC (user-generated content), or create personalized experiences with text-to-3D model programs.
Commerce might get virtual extensions, as storefronts will be able to recognize clothing through your camera and give you all the information you need while you walk through the store.

Our experiences will get an enormous boost from the introduction of WebGPU, wider WebGL2 support and new developments in web graphics.

Conclusion & TLDR;

We craft enterprise WebAR experiences independently or together with agencies.
Everybody likes a TLDR; and a bullet point list, so here’s how we do it: