Sarvam Provider
The Sarvam AI Provider is a library developed to integrate with the AI SDK. This library brings Speech to Text (STT) capabilities to your applications, allowing for seamless interaction with audio and text data.
Setup
The Sarvam provider is available in the sarvam-ai-provider
module. You can install it with:
pnpm add sarvam-ai-provider
Provider Instance
First, get your Sarvam API Key from the Sarvam Dashboard.
Then initialize Sarvam
in your application:
import { createSarvam } from 'sarvam-ai-provider';
const sarvam = createSarvam({ headers: { 'api-subscription-key': 'YOUR_API_KEY', },});
The api-subscription-key
needs to be passed in headers. Consider using
YOUR_API_KEY
as environment variables for security.
- Transcribe speech to text
import { experimental_transcribe as transcribe } from 'ai';import { readFile } from 'fs/promises';
await transcribe({ model: sarvam.transcription('saarika:v2'), audio: await readFile('./src/transcript-test.mp3'), providerOptions: { sarvam: { language_code: 'en-IN', }, },});
Features
Changing parameters
- Change language_code
providerOptions: { sarvam: { language_code: 'en-IN', }, },
language_code
specifies the language of the input audio and is required for
accurate transcription. • It is mandatory for the saarika:v1
model (this
model does not support unknown
). • It is optional for the saarika:v2
model. • Use unknown
when the language is not known; in that case, the API
will auto‑detect it. Available options: unknown
, hi-IN
, bn-IN
, kn-IN
,
ml-IN
, mr-IN
, od-IN
, pa-IN
, ta-IN
, te-IN
, en-IN
, gu-IN
.
- with_timestamps?
providerOptions: { sarvam: { with_timestamps: true, },},
with_timestamps
specifies whether to include start/end timestamps for each
word/token. • Type: boolean • When true, each word/token will include
start/end timestamps. • Default: false
- with_diarization?
providerOptions: { sarvam: { with_diarization: true, },},
with_diarization
enables speaker diarization (Beta). • Type: boolean • When
true, enables speaker diarization. • Default: false
- num_speakers?
providerOptions: { sarvam: { with_diarization: true, num_speakers: 2, },},
num_speakers
sets the number of distinct speakers to detect (only when
with_diarization
is true). • Type: number | null • Number of distinct
speakers to detect. • Default: null