This is a quick guide for developers who want to leverage platform development tools to build VR apps, starting with just a laptop and a Meta Quest.

Here's what the final thing looks like:


Here's the GitHub repo containing all the code - https://github.com/CMU-SV-Ronith/sem_3pi

Introduction to VR and AI-Generated NPCs

Virtual Reality (VR) is transforming how we interact with digital environments, offering immersive experiences that blur the lines between the virtual and real world. One of the most exciting advancements in VR is the integration of Artificial Intelligence (AI), especially in creating Non-Player Characters (NPCs) that can interact with users in more human-like ways. This blog post delves into the creation of an interactive AI interviewer app in VR, highlighting the technological synergy of VR and AI.

Choosing the Right Engine

When embarking on VR app development, selecting the right engine is crucial. Two prominent engines are Unity and Unreal. After extensive research and testing, Unity emerged as the preferable choice for several reasons:

  1. Ease of Learning and Accessibility: Unity uses C#, known for its simpler syntax and automatic memory management. This choice makes Unity accessible, especially for beginners and indie developers, unlike Unreal Engine's complex C++.
  2. Asset Store and Community Support: Unity boasts an expansive Asset Store, providing numerous resources that speed up development. Coupled with an active community, developers have access to extensive support and shared knowledge.
  3. Versatility in 2D and 3D Development: Unity's proficiency in both 2D and 3D development offers flexibility, allowing developers to work on various project types without switching engines.
  4. Platform Support and Integration: Unity's broad platform support, including PCs, consoles, mobile devices, and VR systems, is vital for developers targeting diverse audiences.
  5. Strong Documentation and Educational Resources: Unity’s comprehensive documentation and learning resources are invaluable for both beginners and experienced developers.

Making API Calls in C# with Code-Walkthrough

We write all our app logic in C# inside Unity. Here's a short video on integrating your own C# scripts into a Unity app - https://www.youtube.com/watch?v=lgUIx75fJ_E

This was used for the Speech Analysis and Text Analysis functions.

To integrate AI into our VR app, we need to make API calls. This is achieved through UnityEngine.Networking, using coroutines for HTTP(S) communication. The process involves sending GET and POST requests, waiting for responses, and updating the UI to reflect the results. We utilize AWS for backend services and the Dolby Media Speech Analytics API for speech analysis.
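Before walking through the app's actual connectors, here is the basic shape of that coroutine pattern. This is a minimal sketch; the class name and endpoint URL are placeholders, not part of the app's code:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class ApiClient : MonoBehaviour
{
    // Minimal GET coroutine: fire the request, yield until it completes, then act on the result.
    private IEnumerator GetJson(string url)
    {
        using (UnityWebRequest request = UnityWebRequest.Get(url))
        {
            yield return request.SendWebRequest(); // suspends the coroutine, not the main thread

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Request failed: " + request.error);
            }
            else
            {
                Debug.Log("Response: " + request.downloadHandler.text);
                // Coroutines resume on the main thread, so it is safe to update the UI here
            }
        }
    }

    private void Start()
    {
        StartCoroutine(GetJson("https://example.com/api")); // placeholder endpoint
    }
}
```

The same pattern - build a request, `yield return SendWebRequest()`, then branch on `request.result` - underlies every connector below.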

The AnalyzeText coroutine sends a user's transcript to our backend for processing, while GetAnalysisResults retrieves the analysis. The AnalyseSpeech coroutine sends audio files for speech analysis, and UploadFileToS3 handles the uploading of files to AWS S3.

References for setting up AWS in the app - link, video link

API connectors in the app

        [System.Serializable]
        private class TextAnalysisRequest
        {
            public string prompt;
        }

        private IEnumerator AnalyzeText(string transcript)
        {
            // JsonUtility cannot serialize anonymous types, so we use a [Serializable] class
            var requestJson = new TextAnalysisRequest { prompt = transcript };

            var request = new UnityWebRequest(baseUrl + "/analyzeText", "POST")
            {
                uploadHandler = new UploadHandlerRaw(System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestJson))),
                downloadHandler = new DownloadHandlerBuffer()
            };

            request.SetRequestHeader("Content-Type", "application/json");

            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Error: " + request.error);
            }
            else
            {
                string responseText = request.downloadHandler.text;
                // Display responseText in the UI
            }
        }
        private IEnumerator GetAnalysisResults()
        {
            var request = UnityWebRequest.Get(baseUrl + "/getResults");

            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Error: " + request.error);
            }
            else
            {
                // AnalysisResponse is a [Serializable] class mirroring the Dolby response JSON
                var responseJson = JsonUtility.FromJson<AnalysisResponse>(request.downloadHandler.text);
                var analysisObject = new
                {
                    Loudness = responseJson.processed_region.audio.speech.details[0].loudness.measured,
                    Confidence = responseJson.processed_region.audio.speech.details[0].sections[0].confidence,
                    Quality = responseJson.processed_region.audio.speech.details[0].quality_score,
                    LongestMonologue = responseJson.processed_region.audio.speech.details[0].longest_monologue
                };
                // Display analysisObject values in the UI
            }
        }
        [System.Serializable]
        private class SpeechAnalysisRequest
        {
            public string input;
        }

        private IEnumerator AnalyseSpeech(string fileUrl)
        {
            // JsonUtility cannot serialize anonymous types, so we use a [Serializable] class
            var requestJson = new SpeechAnalysisRequest { input = fileUrl };

            var request = new UnityWebRequest(baseUrl + "/analyseSpeech", "POST")
            {
                uploadHandler = new UploadHandlerRaw(System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestJson))),
                downloadHandler = new DownloadHandlerBuffer()
            };

            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("x-api-key", "YOUR_API_KEY"); // Replace with your actual API key

            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Error: " + request.error);
            }
            else
            {
                string jobId = request.downloadHandler.text;
                // Store jobId for later use or handle it as needed
            }
        }



        private async Task<string> UploadFileToS3(string filePath, string bucketName)
        {
            // Hardcoded credentials (never ship real keys in a client build)
            string awsAccessKeyId = "YOUR_AWS_ACCESS_KEY_ID"; // Replace with your access key
            string awsSecretAccessKey = "YOUR_AWS_SECRET_ACCESS_KEY"; // Replace with your secret key
            AWSCredentials credentials = new BasicAWSCredentials(awsAccessKeyId, awsSecretAccessKey);
            AmazonS3Client s3Client = new AmazonS3Client(credentials, Amazon.RegionEndpoint.USEast1); // Use your bucket's region

            try
            {
                // Create a PutObject request
                PutObjectRequest putRequest = new PutObjectRequest
                {
                    BucketName = bucketName,
                    FilePath = filePath,
                    Key = Path.GetFileName(filePath),
                    CannedACL = S3CannedACL.PublicRead // Set the file to be publicly accessible
                };

                PutObjectResponse response = await s3Client.PutObjectAsync(putRequest);

                if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
                {
                    string fileUrl = $"https://{bucketName}.s3.amazonaws.com/{Path.GetFileName(filePath)}";
                    Debug.Log("File uploaded successfully. URL: " + fileUrl);
                    return fileUrl; // Return the URL
                }
                else
                {
                    Debug.LogError("Failed to upload file. HTTP Status Code: " + response.HttpStatusCode);
                    return null; // Return null if upload failed
                }
            }
            catch (AmazonS3Exception e)
            {
                Debug.LogError("Error encountered on server. Message:'" + e.Message + "'");
                return null; // Return null on exception
            }
            catch (Exception e)
            {
                Debug.LogError("Unknown error encountered on server. Message: '" + e.Message + "'");
                return null; // Return null on exception
            }
        }


Function for chaining all the API calls together

        private IEnumerator UploadAndAnalyze(string localPath)
        {
            // Upload file and wait for the result
            Task<string> uploadTask = UploadFileToS3(localPath, "recontact-temp-recording-bucket");
            yield return new WaitUntil(() => uploadTask.IsCompleted);

            if (uploadTask.Exception != null)
            {
                Debug.LogError("Upload failed: " + uploadTask.Exception);
                yield break;
            }

            string fileUrl = uploadTask.Result;
            StartCoroutine(AnalyseSpeech(fileUrl));

            // Simple fixed wait for the analysis job to finish; polling the job status would be more robust
            yield return new WaitForSeconds(20);

            // Get analysis results
            StartCoroutine(GetAnalysisResults());

            // Get transcription and analyze text
            var req = new CreateAudioTranscriptionsRequest
            {
                FileData = new FileData() { Data = File.ReadAllBytes(localPath), Name = "audio.wav" },
                Model = "whisper-1",
                Language = "en"
            };

            var transcriptionTask = openai.CreateAudioTranscription(req);
            yield return new WaitUntil(() => transcriptionTask.IsCompleted);

            if (transcriptionTask.Exception != null)
            {
                Debug.LogError("Transcription failed: " + transcriptionTask.Exception);
                yield break;
            }

            var res = transcriptionTask.Result;
            StartCoroutine(AnalyzeText(res.Text));
        }

Creating NPCs with Inworld.ai and Trigger Words

NPCs are given life using tools like Inworld.ai, which allows for the creation of characters with specific goals and triggers. For instance, an NPC can be programmed to give a task when a user says a particular phrase. This adds an interactive layer to the VR experience, making it more engaging and realistic.

Setting up these interactions involves defining goals for the character, specifying trigger phrases, and programming the responses and actions that follow from user input.
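To illustrate the core idea, the trigger-phrase mechanism can be sketched in plain C#. This is a simplified, hypothetical stand-in - the real Inworld.ai SDK configures goals and triggers through its own character setup, not this class:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of trigger-phrase routing; names here are illustrative only.
public class NpcTriggerRouter
{
    private readonly Dictionary<string, Action> triggers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Register a phrase and the NPC action it should fire
    public void AddTrigger(string phrase, Action onTriggered) => triggers[phrase] = onTriggered;

    // Check the user's utterance against all registered trigger phrases
    public bool Handle(string utterance)
    {
        foreach (var entry in triggers)
        {
            if (utterance.IndexOf(entry.Key, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                entry.Value(); // run the NPC's scripted response or action
                return true;
            }
        }
        return false; // no trigger matched; fall through to the AI-driven dialogue
    }
}
```

An interviewer NPC might register a phrase like "I'm ready" to start the interview task, while anything unmatched is handled by the AI model's free-form dialogue.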

Reference video - https://www.youtube.com/watch?v=FVDoMnkw4rY

Complete Code and Overall Walkthrough

The complete code combines all these elements into a cohesive VR app. It demonstrates the integration of various APIs, handling of voice and text inputs, and interaction with NPCs. The code is structured to ensure seamless interaction within the VR environment, enabling users to experience a realistic and interactive AI interview.


This guide offers a window into the intricate process of developing an interactive AI interviewer app in VR. By combining Unity's versatile development environment with powerful APIs and AI-driven NPCs, developers can create engaging and immersive VR experiences. The future of VR and AI in app development is undoubtedly bright, offering endless possibilities for innovation and user engagement.

Photo Credits: DALL·E 3, by OpenAI