Configuration

All configuration is done through environment variables in the .env file.

Variable         Required     Description
GEMINI_API_KEY   Yes          Google AI API key (powers Gemma 3 27B primary)
GROQ_API_KEY     Recommended  Groq API key (Llama 3.3 70B fallback)
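
For example, a minimal .env looks like this (placeholder values, not real keys):

GEMINI_API_KEY=your-google-ai-api-key
GROQ_API_KEY=your-groq-api-key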

The LLM fallback chain uses cross-provider redundancy so quota limits on one provider don’t cascade:

  1. Gemma 3 27B via Google (primary, GEMINI_API_KEY)
  2. Llama 3.3 70B via Groq (fallback, GROQ_API_KEY)

If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. Because each provider uses a separate API key, their quotas are completely independent.
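Conceptually, the chain is just an ordered list of provider calls tried in sequence. The sketch below illustrates that pattern; callGemma and callGroq are assumed helper names, not the project's actual implementation:

// Illustrative sketch only: callGemma and callGroq are assumed helpers,
// not the project's actual function names.
declare function callGemma(prompt: string): Promise<string>;
declare function callGroq(prompt: string): Promise<string>;

const providers = [callGemma, callGroq]; // primary first, fallback second

async function analyzeWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;
  for (const call of providers) {
    try {
      return await call(prompt); // first provider that answers wins
    } catch (err) {
      lastError = err; // timeout, rate limit, or malformed response: try next
    }
  }
  throw lastError; // every provider failed; caller handles the final fallback
}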

Provider  Model          RPM    RPD     TPM  Cost
Google    Gemma 3 27B    30     14,400  15K  Free
Groq      Llama 3.3 70B  1,000  14,400  12K  Free

(RPM = requests per minute, RPD = requests per day, TPM = tokens per minute)

Both providers reject requests once a limit is hit and never auto-charge, so you cannot accidentally incur costs.

For the latest limits, see each provider's official rate-limit documentation.

Rate limiting is configured in src/routes/api/analyze/+server.ts:

const RATE_LIMIT = {
  maxPerMinute: 10,
  maxPerDay: 200
};

Adjust these values based on your expected traffic and API key limits.
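
The simplest way to enforce limits like these is a fixed-window counter. The following is a sketch of that approach under assumed names (allowRequest, the in-memory Map), not the actual code in the route handler:

// Minimal in-memory fixed-window limiter (sketch; the real route may differ).
const RATE_LIMIT = { maxPerMinute: 10, maxPerDay: 200 };

type Window = { minuteStart: number; minuteCount: number; dayStart: number; dayCount: number };
const windows = new Map<string, Window>();

function allowRequest(clientId: string, now = Date.now()): boolean {
  const w = windows.get(clientId) ??
    { minuteStart: now, minuteCount: 0, dayStart: now, dayCount: 0 };
  if (now - w.minuteStart >= 60_000) { w.minuteStart = now; w.minuteCount = 0; } // new minute window
  if (now - w.dayStart >= 86_400_000) { w.dayStart = now; w.dayCount = 0; }      // new day window
  if (w.minuteCount >= RATE_LIMIT.maxPerMinute || w.dayCount >= RATE_LIMIT.maxPerDay) {
    windows.set(clientId, w);
    return false; // over either limit: reject the request (e.g. HTTP 429)
  }
  w.minuteCount++;
  w.dayCount++;
  windows.set(clientId, w);
  return true;
}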

Each provider has its own timeout. Vercel Fluid Compute is enabled by default and allows up to 300 seconds on the Hobby plan:

// Gemma: 90s, Groq: 30s → worst case total: 120s
const PROVIDER_TIMEOUTS_MS = [90_000, 30_000];

Gemma 3 27B typically takes 30-45 seconds for the full scoring prompt but can spike under load, so the 90s timeout leaves generous headroom. Groq usually responds in under a second; its 30s timeout is a safety margin. If both providers fail, the system falls back to rule-based scoring on the client side.
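
A common way to enforce a per-provider timeout is an AbortController armed by a timer. This is a sketch of that pattern under assumed names (callProvider and its signature are hypothetical), not the project's exact code:

// Sketch: give each provider its own timeout, falling through on failure.
const PROVIDER_TIMEOUTS_MS = [90_000, 30_000]; // Gemma, then Groq

declare function callProvider(index: number, prompt: string, signal: AbortSignal): Promise<string>;

async function withTimeout(index: number, prompt: string): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), PROVIDER_TIMEOUTS_MS[index]);
  try {
    return await callProvider(index, prompt, controller.signal); // aborts if the timer fires first
  } finally {
    clearTimeout(timer);
  }
}

async function analyze(prompt: string): Promise<string | null> {
  for (let i = 0; i < PROVIDER_TIMEOUTS_MS.length; i++) {
    try {
      return await withTimeout(i, prompt);
    } catch {
      // timeout or provider error: move on to the next provider
    }
  }
  return null; // both failed: caller falls back to rule-based scoring client-side
}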