Mozart
Multimodal 3D UI for Conceptual Modeling
Problem
According to a survey, 80 percent of people want to use their computers to model or create in order to visualize their imagination; however, the difficult UI of such tools prevents them from doing so. Even to model something simple, the user must navigate an obtrusive set of icons, toolbars, and rarely used features. Current 3D CAD software demands expert-level proficiency, creating a barrier between conceptual intent and digital expression for the vast majority of potential users.
Approach
We propose a modeling interface that brings 3D visualization to the common layperson who wants to rapidly visualize their imagination. The aim is to support natural expression with as few restrictions as possible, freeing CAD users from tedious command buttons and menu items. We explored both the hardware and software aspects of the interface, specifically the use of intuitive speech commands and multitouch gestures on an inclined interactive surface.
The initial TUIO and OSC touch-based integration was developed during Google Summer of Code. Touch+Speech multimodal fusion was subsequently implemented with Sriganesh Madhvanath at HP Labs, combining simultaneous gesture and voice input into a unified command stream for 3D object manipulation.
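The page does not include implementation details, but the touch pipeline can be pictured as follows. This is a minimal sketch, assuming a TUIO tracker broadcasting 2D-cursor bundles over OSC on the default UDP port 3333 and the third-party python-osc package; the handler names, the speech hook, and the fusion rule are illustrative placeholders rather than the project's actual code.

```python
# Minimal sketch of the touch + speech fusion described above.
# Assumptions: a TUIO tracker sending /tuio/2Dcur bundles on UDP port 3333
# and the third-party `python-osc` package; names are illustrative only.
import queue
import threading

from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

commands = queue.Queue()        # unified command stream consumed by the 3D engine
last_speech = {"verb": None}    # most recent recognized speech keyword, e.g. "extrude"

def on_speech(verb):
    """Hook for the speech recognizer: remember the last spoken command."""
    last_speech["verb"] = verb

def on_tuio_cursor(address, *args):
    """TUIO 1.1 2Dcur profile: 'set' messages carry (session_id, x, y, dx, dy, accel)."""
    if not args or args[0] != "set":
        return                  # skip 'alive', 'fr' and 'source' bookkeeping messages
    session_id, x, y = args[1], args[2], args[3]
    verb = last_speech["verb"] or "move"   # bare touch defaults to a move gesture
    commands.put({"verb": verb, "cursor": session_id, "x": x, "y": y})

dispatcher = Dispatcher()
dispatcher.map("/tuio/2Dcur", on_tuio_cursor)

server = BlockingOSCUDPServer(("0.0.0.0", 3333), dispatcher)
threading.Thread(target=server.serve_forever, daemon=True).start()

# e.g. the recognizer reports "extrude"; subsequent touch strokes become extrusions
on_speech("extrude")
```

The idea mirrors the unified command stream described above: touch supplies continuous spatial parameters while the most recent speech keyword supplies the verb, so a single drag can mean "move", "extrude", or "scale" without visiting a menu.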
Team: Anirudh Sharma, Sriganesh Madhvanath, Ankit Shekhawat, Mark Billinghurst.
Study Results
A within-subjects user study was conducted to compare the multimodal (MM) interface — combining speech and multitouch — against a multitouch-only (MT) baseline across two 3D modeling tasks of increasing complexity.
Participants
Twelve participants (8 male, 4 female, ages 20–29) took part, none with prior experience in 3D modeling software. All completed both conditions in counterbalanced order.
Task Completion Time
| Metric | Result |
|---|---|
| Condition comparison | Multitouch-only (MT) vs. multimodal (MM) |
| Statistical test | One-way ANOVA |
| Significance | No significant difference (p > 0.05) |
Task completion times were comparable across both conditions, indicating that the addition of speech input did not slow users down despite introducing a new modality.
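For reference, a sketch of how that comparison would be computed with SciPy is below; `mt_times` and `mm_times` stand for the twelve per-participant completion times and are placeholders, not data from the study.

```python
# Sketch of the completion-time analysis: one-way ANOVA across the two
# conditions. The argument lists are placeholders for the measured
# per-participant completion times; no study data is reproduced here.
from scipy import stats

def compare_completion_times(mt_times, mm_times):
    """Return (F, p) for a one-way ANOVA over MT vs. MM completion times (seconds)."""
    f_stat, p_value = stats.f_oneway(mt_times, mm_times)
    return f_stat, p_value
```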
Error Rate (Undo Count)
| Metric | Result |
|---|---|
| Task 2 (complex modeling) | Significantly fewer errors for MM |
| Statistical test | Paired t-test, t(11) = 3.07, p = 0.005 |
In the more complex modeling task, participants made significantly fewer errors (measured by undo count) when using the multimodal interface, suggesting that speech commands reduced accidental or imprecise touch inputs.
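The corresponding error analysis is a paired test, since the same twelve participants used both interfaces; again, the argument lists in this sketch are placeholders for the recorded undo counts, not study data.

```python
# Sketch of the Task 2 error analysis: paired t-test on per-participant undo
# counts, pairing each participant's MT count with their MM count.
from scipy import stats

def compare_undo_counts(mt_undos, mm_undos):
    """Return (t, p) for a paired t-test over undo counts in the MT and MM conditions."""
    t_stat, p_value = stats.ttest_rel(mt_undos, mm_undos)
    return t_stat, p_value
```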
Subjective Workload (NASA TLX)
| Dimension | MT (Multitouch Only) | MM (Multimodal) |
|---|---|---|
| Frustration | Higher | Lower |
| Physical demand | Higher | Lower |
NASA TLX ratings revealed that participants experienced higher frustration and physical demand in the multitouch-only condition. The multimodal interface reduced both dimensions by offloading selection and parameterization commands to speech.
User Preference
9 of 12 participants (75%) preferred the multimodal interface over multitouch-only, citing more natural interaction and reduced reliance on on-screen menus as primary reasons.