In this paper, we propose a method for estimating the position of a user holding a microphone in an indoor environment using digital watermarking for audio signals. The proposed method utilizes detection strengths, which are calculated during the detection of spread-spectrum-based watermarks. We construct a model of detection strengths that takes into account the delays and attenuation of the watermarked signals emitted from multiple loudspeakers, among other factors. The user position is estimated in real time using the model. The experimental results indicate that the user position is estimated with a root mean squared error of 1.3 m on average when the user is static. We also demonstrate that the proposed method successfully estimates the user position even when the user moves.
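
As an illustration of how a model of detection strengths can be turned into a position estimate, the following minimal sketch fits an observed vector of detection strengths (one per loudspeaker) to a modeled one by grid search. It is not the model used in this paper: the loudspeaker layout `SPEAKERS`, the 1/distance attenuation law, the omission of delays, the room size, and the function names `model_strengths` and `estimate_position` are all assumptions made for the example.

```python
import math

# Hypothetical loudspeaker positions in metres (assumed, not from the paper).
SPEAKERS = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (6.0, 6.0)]


def model_strengths(pos, speakers=SPEAKERS, min_dist=0.1):
    """Modeled detection strength per loudspeaker at position `pos`.

    Assumption: strength decays as 1/distance (attenuation only);
    delays and other factors are deliberately left out of this sketch.
    """
    x, y = pos
    return [
        1.0 / max(math.hypot(x - sx, y - sy), min_dist)
        for sx, sy in speakers
    ]


def estimate_position(observed, speakers=SPEAKERS, size=6.0, step=0.25):
    """Grid search over a square room of side `size` (metres).

    Returns the grid point whose modeled strengths have the smallest
    squared error with respect to the observed strengths.
    """
    best_pos, best_err = None, float("inf")
    n = int(round(size / step))
    for i in range(n + 1):
        for j in range(n + 1):
            cand = (i * step, j * step)
            modeled = model_strengths(cand, speakers)
            err = sum((o - m) ** 2 for o, m in zip(observed, modeled))
            if err < best_err:
                best_pos, best_err = cand, err
    return best_pos


# Usage: simulate noise-free strengths at a known point, then recover it.
observed = model_strengths((2.0, 4.5))
estimate = estimate_position(observed)
```

In this noise-free setting the search recovers the simulated grid point exactly. With measured detection strengths the estimate would depend on noise, on the grid resolution, and on how well the assumed attenuation law matches the room.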