Sarvam Provider

The Sarvam AI Provider is a library that integrates Sarvam AI with the AI SDK. It adds Speech to Text (STT) transcription to your applications, turning audio input into text.

Setup

The Sarvam provider is available in the sarvam-ai-provider package. Install it with your package manager of choice:

pnpm add sarvam-ai-provider
npm install sarvam-ai-provider
yarn add sarvam-ai-provider

Provider Instance

First, get your Sarvam API key from the Sarvam Dashboard.

Then initialize Sarvam in your application:

import { createSarvam } from 'sarvam-ai-provider';

const sarvam = createSarvam({
  headers: {
    'api-subscription-key': 'YOUR_API_KEY',
  },
});

The key must be passed in the api-subscription-key header. For security, avoid hard-coding it: load the value that replaces YOUR_API_KEY from an environment variable instead.
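One way to follow this advice is a small helper that reads the key from the environment and builds the headers object for createSarvam. This is a minimal sketch: the helper name sarvamHeaders and the variable name SARVAM_API_KEY are illustrative assumptions, not part of sarvam-ai-provider.

```typescript
// Hypothetical helper: neither sarvamHeaders nor SARVAM_API_KEY comes from
// sarvam-ai-provider; both names are assumptions for illustration.
type Env = Record<string, string | undefined>;

// Builds the headers object passed to createSarvam, reading the key from
// the given environment instead of hard-coding it in source.
function sarvamHeaders(env: Env, varName = "SARVAM_API_KEY"): Record<string, string> {
  const key = env[varName];
  if (!key) {
    // Fail fast so a missing key is reported immediately.
    throw new Error(`Environment variable ${varName} is not set`);
  }
  return { "api-subscription-key": key };
}
```

In Node.js you could then write createSarvam({ headers: sarvamHeaders(process.env) }). Throwing on a missing key means a misconfigured deployment fails at startup rather than when the first request is sent.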

  • Transcribe speech to text
import { experimental_transcribe as transcribe } from 'ai';
import { readFile } from 'fs/promises';

// `sarvam` is the provider instance created with createSarvam above.
const result = await transcribe({
  model: sarvam.transcription('saarika:v2'),
  audio: await readFile('./src/transcript-test.mp3'),
  providerOptions: {
    sarvam: {
      language_code: 'en-IN',
    },
  },
});

console.log(result.text); // the transcribed text

Features

Changing parameters

  • Change language_code
providerOptions: {
  sarvam: {
    language_code: 'en-IN',
  },
},

language_code specifies the language of the input audio. Whether it is required depends on the model:
  • It is mandatory for the saarika:v1 model, which does not support unknown.
  • It is optional for the saarika:v2 model.
  • With saarika:v2, use unknown when the language is not known; the API will then auto-detect it.
  • Available options: unknown, hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN.
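The model-dependent rules above can be checked before calling the API. The sketch below is a hypothetical client-side helper (checkLanguageCode is not exported by sarvam-ai-provider); it encodes only the rules and codes listed in this section, and the Sarvam API remains the authority on what it accepts.

```typescript
// Language codes listed in this document for the language_code option.
const SARVAM_LANGUAGE_CODES = [
  "unknown", "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
  "od-IN", "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN",
] as const;

type SarvamLanguageCode = (typeof SARVAM_LANGUAGE_CODES)[number];

// Hypothetical pre-flight check: saarika:v1 needs an explicit code other
// than "unknown"; saarika:v2 accepts an omitted code or "unknown".
function checkLanguageCode(model: string, code?: string): SarvamLanguageCode | undefined {
  // Reject codes that are not in the documented list.
  if (code !== undefined && !(SARVAM_LANGUAGE_CODES as readonly string[]).includes(code)) {
    throw new Error(`Unsupported language_code: ${code}`);
  }
  // saarika:v1 cannot auto-detect, so it needs a concrete language.
  if (model === "saarika:v1" && (code === undefined || code === "unknown")) {
    throw new Error("saarika:v1 requires a language_code other than 'unknown'");
  }
  return code as SarvamLanguageCode | undefined;
}
```

For example, checkLanguageCode('saarika:v2') returns undefined (the option can be omitted), while checkLanguageCode('saarika:v1', 'unknown') throws.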

  • with_timestamps?
providerOptions: {
  sarvam: {
    with_timestamps: true,
  },
},

with_timestamps specifies whether to include start and end timestamps for each word or token.
  • Type: boolean
  • Default: false

  • with_diarization?
providerOptions: {
  sarvam: {
    with_diarization: true,
  },
},

with_diarization enables speaker diarization (Beta), which attributes segments of the audio to individual speakers.
  • Type: boolean
  • Default: false

  • num_speakers?
providerOptions: {
  sarvam: {
    with_diarization: true,
    num_speakers: 2,
  },
},

num_speakers sets the number of distinct speakers to detect. It applies only when with_diarization is true.
  • Type: number | null
  • Default: null
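Because num_speakers only makes sense together with with_diarization, it can help to build providerOptions through one function that applies the documented defaults and rejects inconsistent combinations. A minimal sketch, assuming the option names shown in this section; sarvamProviderOptions is a hypothetical helper, not part of sarvam-ai-provider, and the positive-integer check on num_speakers is an added assumption.

```typescript
// Option shape described in this section; the type name is hypothetical.
type SarvamTranscriptionOptions = {
  language_code?: string;
  with_timestamps?: boolean; // default: false
  with_diarization?: boolean; // default: false (Beta)
  num_speakers?: number | null; // default: null
};

type JsonPrimitive = string | number | boolean | null;

function sarvamProviderOptions(
  opts: SarvamTranscriptionOptions = {},
): { sarvam: Record<string, JsonPrimitive> } {
  // Apply the documented defaults before validating.
  const withDiarization = opts.with_diarization ?? false;
  const numSpeakers = opts.num_speakers ?? null;

  if (numSpeakers !== null) {
    if (!withDiarization) {
      throw new Error("num_speakers requires with_diarization: true");
    }
    // Assumption: a speaker count must be a positive integer.
    if (!Number.isInteger(numSpeakers) || numSpeakers < 1) {
      throw new Error(`num_speakers must be a positive integer, got ${numSpeakers}`);
    }
  }

  // Forward only the fields the caller actually set.
  const sarvam: Record<string, JsonPrimitive> = {};
  for (const [key, value] of Object.entries(opts)) {
    if (value !== undefined) sarvam[key] = value;
  }
  return { sarvam };
}
```

The result can be passed to transcribe, e.g. providerOptions: sarvamProviderOptions({ language_code: 'en-IN', with_diarization: true, num_speakers: 2 }). Only the fields you set are forwarded, so unset options fall back to the API defaults.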

References