Voice Service Technical Specifications
1. Introduction
Facephi Voice Service is a REST API service written in C++. You send it audio files to process and receive the result of the voice recognition process. The service exposes one endpoint to enroll a new voice and another to authenticate a voice.
2. Hardware requirements
| | Minimum requirement | Recommended requirement |
|---|---|---|
| CPU | 2 cores supporting the SSE4.2 instruction set extension, >= 2 GHz | 16 cores, AVX2 ISA support |
| RAM | 4 GB | 8 GB |
| Disk | 4 GB | SSD 4 GB |
| Network | 100 Mbps | 1 Gbps |
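On a Linux host, the CPU rows above can be checked against the feature flags the kernel reports in `/proc/cpuinfo`, where SSE4.2 appears as `sse4_2` and AVX2 as `avx2`. The following is a minimal sketch of that check; the function names `cpu_flags` and `check_cpu` are illustrative and not part of the service.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough: all cores normally report the same set.
            return set(value.split())
    return set()


def check_cpu(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the minimum (SSE4.2) and recommended (AVX2) extensions are present."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "minimum_sse4_2": "sse4_2" in flags,
        "recommended_avx2": "avx2" in flags,
    }


# Shortened sample of what /proc/cpuinfo might contain (made up for illustration).
sample = "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu sse4_2 avx\n"
print(check_cpu(sample))
```

On a real host, pass the contents of `/proc/cpuinfo` instead of `sample`. The check covers only the instruction set extensions, not core count or clock speed.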
3. Software requirements
- Linux x86_64 (Ubuntu 24.04 or higher) with Docker 24.0 or higher.
or
- Windows 10 x64 with Docker 24.0 or higher.
4. Enrollment requirements
Enrollment requires three recordings of the same user pronouncing a secret phrase. The recordings must meet the following minimum requirements:
| Minimum requirements for enrollment | Values |
|---|---|
| Audio length | > 700 ms |
| Speech relative length (*) | > 0.55 |
| Signal to noise ratio (SNR) (**) | > 8 dB |
_(*) Speech Relative Length = Speech Duration / Audio Duration_
_(**) Our recommended speaker distance is 30 cm, a natural distance when using a hand-held device._
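As an illustration of how a client might pre-check a recording before sending it, the sketch below applies the three enrollment minimums and the Speech Relative Length formula. It assumes the durations and SNR have already been measured by some other means; `RecordingStats` and `enrollment_issues` are hypothetical names, not part of the service API.

```python
from dataclasses import dataclass

# Minimum enrollment requirements, taken from the table above.
MIN_AUDIO_MS = 700
MIN_SPEECH_RELATIVE_LENGTH = 0.55
MIN_SNR_DB = 8.0


@dataclass(frozen=True)
class RecordingStats:
    """Hypothetical container for the measurements of one recording."""
    audio_ms: float   # total audio duration
    speech_ms: float  # portion of the audio detected as speech
    snr_db: float     # estimated signal-to-noise ratio


def speech_relative_length(stats: RecordingStats) -> float:
    """Speech Duration / Audio Duration; 0.0 for an empty recording."""
    return stats.speech_ms / stats.audio_ms if stats.audio_ms > 0 else 0.0


def enrollment_issues(stats: RecordingStats) -> list[str]:
    """Name every minimum requirement the recording fails (empty list means OK)."""
    issues = []
    if stats.audio_ms <= MIN_AUDIO_MS:
        issues.append("audio length")
    if speech_relative_length(stats) <= MIN_SPEECH_RELATIVE_LENGTH:
        issues.append("speech relative length")
    if stats.snr_db <= MIN_SNR_DB:
        issues.append("snr")
    return issues
```

For example, a 2000 ms recording with 1000 ms of speech has a speech relative length of 0.5, so it would be flagged even if its SNR were well above 8 dB.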
There should only be one person speaking during the recording. To verify that the enrollment was carried out by a single person, the individual biometric templates created from the three recordings are compared.
| Minimum requirements for enrollment | Threshold |
|---|---|
| If the probability of a match is less than the similarity threshold, the record is rejected and a new recording is requested. | 0.55 |
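The single-speaker check can be pictured as a pairwise comparison of the three templates. The sketch below assumes that every pair is compared and that one pair below the threshold is enough to reject. The source states only that the templates are compared, so the pairing strategy is an assumption. `match_probability` stands in for the real template comparison and is hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

SIMILARITY_THRESHOLD = 0.55  # from the table above


def enrollment_is_consistent(
    templates: Sequence[T],
    match_probability: Callable[[T, T], float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Return True if no pair of templates scores below the threshold.

    Assumption: all pairs are compared; the real service may combine
    the three templates differently.
    """
    return all(
        match_probability(a, b) >= threshold
        for a, b in combinations(templates, 2)
    )


# Toy stand-in: templates are plain numbers and similarity falls with distance.
def toy_match(a: float, b: float) -> float:
    return 1.0 - abs(a - b)


print(enrollment_is_consistent([0.1, 0.2, 0.3], toy_match))  # all pairs similar
print(enrollment_is_consistent([0.1, 0.2, 0.9], toy_match))  # one outlier recording
```

With three recordings there are three pairs to compare, so a single recording from a different speaker lowers two of the three scores.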
5. Authentication requirements
| Minimum requirements for authentication | Values |
|---|---|
| Audio length | > 700 ms |
| Speech relative length | > 0.55 |
| Signal to noise ratio (SNR) | > 3 dB |
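To give the SNR thresholds a concrete meaning, the sketch below uses the standard definition of SNR in decibels, 10 · log10(signal power / noise power). How the service itself estimates SNR is not specified here, so treat this as a generic illustration; the function names are hypothetical.

```python
import math

MIN_AUTH_SNR_DB = 3.0  # authentication minimum from the table above


def snr_db(signal_power: float, noise_power: float) -> float:
    """Standard SNR definition in decibels, from average signal and noise power."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def min_power_ratio(threshold_db: float) -> float:
    """Signal-to-noise power ratio that corresponds to a threshold in dB."""
    return 10.0 ** (threshold_db / 10.0)


def meets_auth_snr(signal_power: float, noise_power: float) -> bool:
    """True if the SNR is strictly above the authentication minimum."""
    return snr_db(signal_power, noise_power) > MIN_AUTH_SNR_DB
```

Under this definition, the 3 dB authentication minimum corresponds to speech that carries roughly twice the power of the background noise, while the 8 dB enrollment minimum corresponds to roughly 6.3 times the noise power.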
6. Metrics
Voice biometric validation is commonly applied over two channels: microphones and telephone lines.
Extracted metrics for microphone use case (new version noctua).
| Threshold | FAR (%) | FRR (%) |
|---|---|---|
| 0.5 | 0.17 | 3.32 |
Extracted metrics for telephone use case.
| Threshold | FAR (%) | FRR (%) |
|---|---|---|
| 0.5 | 1 | 9.12 |
FAR (False Acceptance Rate) is the probability that the system will incorrectly accept an impostor as a legitimate user.
FRR (False Rejection Rate) is the probability that the system will incorrectly reject a legitimate user.
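The two definitions can be made concrete by counting errors over scored trials at a fixed threshold. The sketch below assumes a comparison is accepted when its score is at or above the threshold. The scores are made up for illustration and are unrelated to the data behind the tables above.

```python
from typing import Sequence


def far_frr(
    genuine_scores: Sequence[float],
    impostor_scores: Sequence[float],
    threshold: float,
) -> tuple[float, float]:
    """Return (FAR %, FRR %) at the given threshold.

    genuine_scores: scores of legitimate users compared with their own template.
    impostor_scores: scores of impostors compared with someone else's template.
    """
    false_accepts = sum(score >= threshold for score in impostor_scores)
    false_rejects = sum(score < threshold for score in genuine_scores)
    far = 100.0 * false_accepts / len(impostor_scores)
    frr = 100.0 * false_rejects / len(genuine_scores)
    return far, frr


# Made-up scores, for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.6, 0.2, 0.3, 0.05]
print(far_frr(genuine, impostor, threshold=0.5))  # (20.0, 25.0)
```

Raising the threshold lowers FAR and raises FRR, so the threshold is a trade-off between security and convenience.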
7. Liveness detection
| Minimum requirements for liveness detection | Values |
|---|---|
| Voice Speech Length for Replay Attack Detection | > 1000 ms |
| Voice Speech Length for Voice Clone Attack Detection | > 3000 ms |
| Signal to noise ratio (SNR) | > 10 dB |

| Recommended thresholds for liveness detection | Threshold |
|---|---|
| Liveness validation will be considered successful when the value is higher than the threshold. | 0.5 |
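Taken together, the liveness tables suggest a two-step decision: first confirm that the audio meets the minimums for the attack type being checked, then compare the liveness value with the threshold. The sketch below is one way a client might express this. Returning `None` for unusable input is an assumption, and the function names are hypothetical.

```python
from typing import Optional

# Values from the tables above.
MIN_SPEECH_MS_REPLAY = 1000
MIN_SPEECH_MS_VOICE_CLONE = 3000
MIN_SNR_DB = 10.0
LIVENESS_THRESHOLD = 0.5


def input_is_usable(speech_ms: float, snr_db: float, voice_clone_check: bool) -> bool:
    """Check the minimum speech length and SNR for the requested detection."""
    min_speech_ms = MIN_SPEECH_MS_VOICE_CLONE if voice_clone_check else MIN_SPEECH_MS_REPLAY
    return speech_ms > min_speech_ms and snr_db > MIN_SNR_DB


def liveness_passed(
    score: float,
    speech_ms: float,
    snr_db: float,
    voice_clone_check: bool = False,
) -> Optional[bool]:
    """Return True or False for the liveness decision, or None if the audio
    does not meet the minimum requirements (an assumption of this sketch)."""
    if not input_is_usable(speech_ms, snr_db, voice_clone_check):
        return None
    # Successful only when the value is strictly higher than the threshold.
    return score > LIVENESS_THRESHOLD
```

For instance, 1500 ms of speech is enough for replay attack detection but not for voice clone attack detection, so the same recording can be usable for one check and not the other.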