"A playful app that brings life into a lifeless 'face' by placing AR speech balloons in the virtual world based on GPS."



“Bringing life into a lifeless face!”

BOOM! is a playful AR product that captures users' thoughts and feelings, applies them to lifeless faces in the city such as posters and statues, and shares them with other people. It detects human face shapes of any kind (faces in posters, pictures, drawings, sculptures, statues, etc.) and places a speech balloon containing the user's thoughts next to the detected face. It uses face detection technology to find the face, and speech-to-text technology to recognize the user's speech and convert it into text.
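As a rough illustration of the face detection step, the sketch below runs Vision's face rectangle request on a single camera frame and returns the bounding boxes it finds. The function name `detectFaces` and the choice to pass in one `CVPixelBuffer` at a time are assumptions made for illustration, not the app's actual code.

```swift
import Vision
import CoreVideo

/// Hypothetical helper: returns the normalized bounding boxes of all faces
/// found in one frame, or an empty array if detection fails.
func detectFaces(in pixelBuffer: CVPixelBuffer,
                 using handler: VNSequenceRequestHandler) -> [CGRect] {
    let request = VNDetectFaceRectanglesRequest()
    do {
        // Runs the request synchronously on the given frame.
        try handler.perform([request], on: pixelBuffer)
    } catch {
        return []
    }
    let faces = request.results as? [VNFaceObservation] ?? []
    // boundingBox is normalized (0...1) with the origin at the bottom-left.
    return faces.map { $0.boundingBox }
}
```

Because Vision only looks for face-like shapes, the same call works on a printed poster or a statue as well as on a living person, which is what the concept relies on.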


Concept Background

This concept grew out of our daily life in New York City, a hub for art and culture. Every day we are surrounded by hundreds of human faces, whether they are part of posters and pictures on the wall, statues in the park, or paintings and sculptures in the museum, and whether they are alive or lifeless. Those faces, especially the inanimate ones, inspired us to think about the potential for communication between people in the city by augmenting those faces.



Today, we are surrounded by a number of faces. No matter where we are in the world, if we have a small smart device, we can log into “facebook”, which is a book with faces, “instagram”, or “snapchat” and encounter many faces: faces that we see every day, faces that we have seen a couple of times, or faces that we have never seen in our lives.

  Lifeless Faces in Reality World


  Living Faces in Digital World


These faces in the digital world are flattened 2D images of living people in the real world. They are placed within a rectangular screen, and as we scroll down the feed we see those faces, unconsciously assuming that the people behind them are breathing somewhere in the real world. In this way, we confront and consume countless faces of living people, as well as their small notes, every day.

What would it look like if there were a virtual world where lifeless faces could communicate with us? What if posters, statues, sculptures, and mannequins could talk? What if there were a spacetime version of “facebook”?

User Scenario.jpg


To have fun with the lifeless faces around us, people need to find them in objects like posters or statues and bring them to life by giving them the ability to speak. We held a brainstorming session to examine what motivates people to interact with these lifeless faces (why), the different ways they could interact with the faces (how), and what messages people would want to communicate through the faces (what).


Finally, we decided to create a mobile app that uses AR technology to capture moments or write whatever comes to mind in the virtual world, next to objects with lifeless faces in the real world. We chose AR because it gives users far more freedom in the design, content, and composition of their speech balloons than traditional methods such as physical stickers.



The AR speech balloons live in a virtual space that users can see only through their mobile devices, which serve as magic windows. Moreover, we wanted the interaction to be as seamless and fluid as possible, so that users find the app simple and handy to use without much time or effort. Thus, we decided to implement voice recognition technology to convert speech into text automatically.


In short, users can augment "any object with a human face shape" and bring it to life by capturing their thoughts and leaving them in an AR speech bubble, simply by talking to their mobile device.




import UIKit
import ReplayKit
import AVFoundation
import Vision
import Speech
import Photos
import SwiftGifOrigin

Speech Recognition(Swift)

    // Speech recognizer setup (properties of the view controller)
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    var regRequest: SFSpeechAudioBufferRecognitionRequest?
    var regTask: SFSpeechRecognitionTask?
    let avEngine = AVAudioEngine()

    // Ask for speech recognition permission; the BOOM button is enabled only when authorized.
    func getPermissionSpeechRecognizer() {
        SFSpeechRecognizer.requestAuthorization { status in
            // The callback may arrive off the main thread, so switch to it before touching the UI.
            DispatchQueue.main.async {
                switch status {
                case .authorized:
                    self.boomButton.isEnabled = true
                case .denied, .notDetermined, .restricted:
                    self.boomButton.isEnabled = false
                @unknown default:
                    self.boomButton.isEnabled = false
                }
            }
        }
    }

Face Detection(Swift)

    // Vision framework objects
    let faceDetection = VNDetectFaceRectanglesRequest()
    let faceLandmarks = VNDetectFaceLandmarksRequest()
    // Despite their names, these two are sequence request handlers that run the requests above.
    let faceLandmarksDetectionRequest = VNSequenceRequestHandler()
    let faceDetectionRequest = VNSequenceRequestHandler()

    override func viewDidLoad() {
        super.viewDidLoad()

        // Size the overlay views and keep them hidden at startup.
        bubbleMarker.frame = CGRect(x: 0, y: 0, width: 340, height: 320)
        txtSpeech.frame = CGRect(x: 0, y: 0, width: 250, height: 120)
        lipMarker.frame = CGRect(x: 0, y: 0, width: 200, height: 1200)
        lipMarker.image = UIImage.gif(name: "particle5")
        bubbleMarker.isHidden = true
        txtSpeech.isHidden = true
        lipMarker.isHidden = true

        // The BOOM button stays disabled until speech recognition is authorized.
        boomButton.isEnabled = false
        cameraButton.layer.cornerRadius = cameraButton.frame.size.height / 3
        cameraButton.layer.masksToBounds = true
        txtSpeech.text = ""
        speechRecognizer?.delegate = self
        // (excerpt ends here)
    }
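Vision reports face bounding boxes in normalized coordinates with the origin at the bottom-left, while UIKit frames use points with the origin at the top-left. A minimal sketch of how a detected face could be mapped to a balloon frame is given here. The function names and the placement rule (upper right of the face, clamped to the screen) are assumptions for illustration, as is the premise that the camera image fills the view. Only the 340 x 320 balloon size comes from the code above.

```swift
import Foundation

/// Converts a normalized box (origin bottom-left, values in 0...1)
/// into view coordinates (origin top-left, points).
func viewRect(forNormalized box: CGRect, in viewSize: CGSize) -> CGRect {
    let width = box.size.width * viewSize.width
    let height = box.size.height * viewSize.height
    let x = box.origin.x * viewSize.width
    // Flip the vertical axis: Vision measures y from the bottom edge.
    let y = (1 - box.origin.y - box.size.height) * viewSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

/// Hypothetical placement rule: put the balloon at the upper right of the face,
/// then clamp it so that it stays fully on screen.
func balloonFrame(forFace face: CGRect, balloonSize: CGSize, in viewSize: CGSize) -> CGRect {
    let desiredX = face.origin.x + face.size.width
    let desiredY = face.origin.y - balloonSize.height / 2
    let maxX = max(viewSize.width - balloonSize.width, 0)
    let maxY = max(viewSize.height - balloonSize.height, 0)
    let x = min(max(desiredX, 0), maxX)
    let y = min(max(desiredY, 0), maxY)
    return CGRect(x: x, y: y, width: balloonSize.width, height: balloonSize.height)
}

// Example with made-up numbers: an 800 x 1200 point view and one detected face.
let screen = CGSize(width: 800, height: 1200)
let face = viewRect(forNormalized: CGRect(x: 0.25, y: 0.5, width: 0.25, height: 0.25), in: screen)
let balloon = balloonFrame(forFace: face, balloonSize: CGSize(width: 340, height: 320), in: screen)
print(face, balloon)
```

The resulting rectangle could then be assigned to `bubbleMarker.frame` before the balloon is shown. Clamping keeps the balloon readable when the face sits near a screen edge, at the cost of possibly overlapping the face.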




180710_Boom_UX Flow.jpg




Case 1. Single User

Case 2. Multi User


Multi User with Screen View