Choose the right plan to fit your organization's needs.
Our flexible pricing options cater to different business sizes and requirements, ensuring you have access to the right level of voice anonymization technology at a cost-effective price.
Data volume: 100h
Full integration support
Single Language
Non-commercial use

Data volume: 1 000h
Shared access to the API
Pre-built AI models
8h engineering support
Single Language

Data volume: 10 000h
Priority access to the API
Pre-built AI models
16h engineering support
Multiple Languages
Voice + Content

Data volume: Unlimited
Fine-tuned AI models
Multiple Languages
Voice + Content



Pilot Package
Includes:
Security & DPA onboarding, pipeline configuration, 1 calibration cycle, documentation of processing & deletion workflow, and gold-standard QC on a 1% sample.
Unit Rates
Frequently asked questions
Functionality
What is biometric anonymisation?
Biometric anonymisation is the process of removing or altering the unique voice characteristics that can identify a person, such as vocal timbre, pitch patterns, rhythm, and other biometric markers. Instead of masking or distorting the audio, the voice is transformed into a new, natural-sounding voice that cannot be linked back to the original speaker.
This ensures that the content of the speech remains fully usable while the identity, privacy, and safety of the speaker are completely protected, meeting strict standards like non-linkability, non-singling-out, and non-inference.
How many languages do you support?
Our solution currently supports English, French, German, Spanish and Italian. If your preferred language is not in this list, please reach out to us: we can add a new language in just one day!
What file formats do you support?
We support all the popular audio formats. Please refer to our documentation for a list of supported formats.
Does it work in real time, for example on streaming audio?
Version 4.0 supports faster-than-real-time audio processing, but it does not include native streaming capabilities.
The real-time factor (RTF) is below 0.5× in Mini mode and about 0.75× in Advanced mode.
Users who require streaming must implement their own streaming architecture, embedding API requests within their chosen workflow.
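One common way to approximate streaming on top of a request/response API is to cut the incoming audio into fixed-length chunks and send each chunk as its own request. The sketch below only illustrates the arithmetic of that approach. The function names are hypothetical, nothing here calls the actual API, and the latency estimate ignores network and queueing overhead.

```python
from typing import Iterator, Tuple


def chunk_bounds(total_seconds: float, chunk_seconds: float) -> Iterator[Tuple[float, float]]:
    """Yield consecutive (start, end) windows, in seconds, that cover the audio."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        yield (start, end)
        start = end


def keeps_up(rtf: float) -> bool:
    """A chunked pipeline can only keep pace with live audio if RTF is below 1."""
    return rtf < 1.0


def min_latency_seconds(chunk_seconds: float, rtf: float) -> float:
    """Lower bound on delay: record the whole chunk, then process it.

    Assumption: processing time is chunk length times RTF, with no network overhead.
    """
    return chunk_seconds * (1 + rtf)


if __name__ == "__main__":
    print(list(chunk_bounds(25, 10)))
    print(min_latency_seconds(10, 0.75))
```

Shorter chunks lower the delay but increase the number of requests, so the chunk length is a trade-off you would tune for your own workflow.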
Does the anonymisation work for children's voices?
No, our solution does not yet work reliably for children's voices. This is an active area of research at Nijta, and we are partnering with renowned EdTech providers to build a robust solution for children's voices.
Can the age and gender of the speaker be preserved after anonymisation?
Yes. The anonymised output supports explicit control of:
- Age group: young adult, middle-aged adult, senior adult, same, or random
- Gender: same, opposite, specific (male/female), or random
This allows you to customise the target voice identity.
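To make these options concrete, here is a minimal sketch of how a client might validate the choices before building a request. The field names (`age_group`, `gender`) and the string values are assumptions made for illustration. Please check the documentation for the actual parameter names.

```python
from dataclasses import dataclass

# Allowed values mirror the options listed above; the spellings are hypothetical.
AGE_GROUPS = {"young_adult", "middle_aged_adult", "senior_adult", "same", "random"}
GENDERS = {"same", "opposite", "male", "female", "random"}


@dataclass(frozen=True)
class TargetVoice:
    """Desired age group and gender of the anonymised voice."""

    age_group: str = "same"
    gender: str = "same"

    def __post_init__(self) -> None:
        # Reject unknown values early, before any request is sent.
        if self.age_group not in AGE_GROUPS:
            raise ValueError(f"unknown age group: {self.age_group!r}")
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender: {self.gender!r}")

    def to_params(self) -> dict:
        """Return the options as a plain dictionary (hypothetical request fields)."""
        return {"age_group": self.age_group, "gender": self.gender}


if __name__ == "__main__":
    print(TargetVoice(age_group="senior_adult", gender="opposite").to_params())
```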
Can the original emotion of the speaker be preserved after anonymisation?
Yes. Version 4.0 provides Prosody & Emotion Preservation, maintaining natural emotional expression and intonation. You can also specify the desired emotion in the output.
Can non-verbal cues of the original speaker, such as speaking pace, pronunciation and intonation, be preserved after anonymisation?
Yes. The anonymised speech retains natural prosody, rhythm, pacing, and general pronunciation. Only speaker-specific biometric features are removed.
Can the anonymisation filter profane language?
This is an active area of research. We are working with a large group to filter profane language in live calls.
Performance
What is the accuracy of your solution?
According to the latest documentation (Version 4.0):
Speech Quality Accuracy
- MOS (Mean Opinion Score): 4.15
High naturalness and clarity in anonymised output.
Automatic Speech Recognition Accuracy
- WER reduction:
  - Mini mode: −58%
  - Advanced mode: −62%
Indicates improved intelligibility for downstream transcription.
Privacy / Biometric Anonymisation Metrics
- Equal Error Rate (EER): 42%
- UAR (Unweighted Average Recall): 38%
- WER after anonymisation: 2.7
These metrics indicate strong degradation of speaker-identifiability while preserving speech intelligibility.
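For readers unfamiliar with WER (word error rate), the sketch below shows the standard textbook definition: the word-level edit distance between a reference transcript and a recognised transcript, divided by the number of reference words. It is a generic illustration, not a description of how the figures above were measured, and the function name is our own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```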
What is the processing time of your solution?
Processing speed depends on the anonymisation mode:
- Mini mode: < 0.5× RTF
- Advanced mode: ~ 0.75× RTF
In benchmark tests (e.g., a 5 min 49 sec multilingual audio file with two speakers, diarisation + code-switch enabled), biometric anonymisation took ~8.35 minutes.
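As a rough guide, the real-time factor is the processing time divided by the audio duration, so an RTF below 1 means the audio is processed faster than it plays. The helper names below are illustrative only, and the estimate covers processing alone, without upload or queueing time.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate processing time for a recording from a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    one_hour = 3600
    print(estimated_processing_seconds(one_hour, 0.5))   # Mini mode, upper bound
    print(estimated_processing_seconds(one_hour, 0.75))  # Advanced mode, approximate
```

With these figures, one hour of audio would take under 30 minutes in Mini mode and about 45 minutes in Advanced mode.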
What is the maximum size of audio files that can be sent to the API?
The maximum allowed size is 10 MB. Files larger than this are rejected.
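To avoid rejected uploads, you can check the file size on your side before sending. The sketch below assumes that "10 MB" means 10,000,000 bytes. If the limit is actually 10 MiB (10,485,760 bytes), adjust the constant. The second helper estimates how much uncompressed PCM audio fits under the limit, ignoring container headers. All names are hypothetical.

```python
import os

# Assumption: decimal megabytes. Change this if the API uses 10 * 1024 * 1024.
MAX_UPLOAD_BYTES = 10 * 1000 * 1000


def fits_upload_limit(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a payload of this size is within the limit."""
    return size_bytes <= limit_bytes


def file_fits_upload_limit(path: str) -> bool:
    """Check a file on disk against the limit before uploading it."""
    return fits_upload_limit(os.path.getsize(path))


def max_pcm_seconds(sample_rate_hz: int, bytes_per_sample: int = 2,
                    channels: int = 1, limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Seconds of raw PCM audio that fit under the limit (headers ignored)."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return limit_bytes / bytes_per_second


if __name__ == "__main__":
    # 16 kHz, 16-bit, mono PCM
    print(max_pcm_seconds(16000))
```

Longer recordings would need to be split or sent in a compressed format that the API accepts.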
How many concurrent requests can the API process without degrading the processing time?
One request at a time per user. Sending several requests simultaneously may slow processing down.
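If your application is multi-threaded, one simple way to respect this limit is to guard every API call with a semaphore so that only one request is in flight at a time. The sketch below is a generic pattern. `SerializedClient` and the `send` callable are hypothetical stand-ins for whatever function actually performs your request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class SerializedClient:
    """Wrap a request function so that calls never overlap."""

    def __init__(self, send: Callable[[Any], Any], max_in_flight: int = 1) -> None:
        self._send = send
        # A semaphore with one slot lets a single request through at a time.
        self._slot = threading.Semaphore(max_in_flight)

    def submit(self, payload: Any) -> Any:
        # Other threads block here until the current request has finished.
        with self._slot:
            return self._send(payload)


if __name__ == "__main__":
    # Stand-in for a real API call: it just echoes the payload.
    client = SerializedClient(lambda payload: f"processed {payload}")
    with ThreadPoolExecutor(max_workers=4) as pool:
        for result in pool.map(client.submit, ["a.wav", "b.wav", "c.wav"]):
            print(result)
```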
Can the customer fine-tune the models hosted on their site?
No. Model fine-tuning is not supported, even for on-premise deployments.
Installation
Does it work as SaaS or on-premise?
We provide both SaaS and on-premise solutions.
What are the computational requirements for hosting the on-premise solution?
Minimum recommended configuration:
- OS: Ubuntu 20.04 or later
- RAM: 32 GB minimum
- Disk: 100 GB minimum
- GPU: NVIDIA L40
What concrete measures are taken to ensure the security of the SaaS solution?
The SaaS platform is built following industry-standard security practices, including secure development processes aligned with ISO 27000 guidelines, and is deployed on a cloud infrastructure with advanced security controls for data protection, access management, and operational security.