
Configuration

All configuration is done through environment variables in the .env file. At least one provider must be configured (Gemini, Groq, or Ollama); the route returns 503 otherwise.

| Variable | Required | Description |
| --- | --- | --- |
| GEMINI_API_KEY | One of these | Google AI API key (Gemma 3 27B) |
| GROQ_API_KEY | One of these | Groq API key (Llama 3.3 70B) |
| OLLAMA_BASE_URL | One of these | Base URL of a local Ollama daemon (e.g. http://127.0.0.1:11434) |
| OLLAMA_MODEL | Optional | Ollama model tag, defaults to llama3.2. Use any tag from ollama list. |
| OLLAMA_API_KEY | Optional | Bearer token sent as Authorization: Bearer {key} on every Ollama request. Only needed if your Ollama is behind auth. |

The LLM chain is assembled from whichever providers are configured in the environment. The order is fixed:

  1. Ollama (OLLAMA_BASE_URL), local first when configured
  2. Gemma 3 27B via Google (GEMINI_API_KEY)
  3. Llama 3.3 70B via Groq (GROQ_API_KEY)

If a provider fails (timeout, rate limit, malformed response), the system automatically tries the next one. Because each provider uses a separate credential, their quotas are completely independent. Self-hosters who want a fully offline scanner should set only OLLAMA_BASE_URL and leave the cloud keys unset.
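The ordering and fallback described above can be sketched roughly as follows. This is an illustration and not the project's actual code: `ProviderId`, `buildProviderChain`, and `runChain` are hypothetical names, and treating blank values as unset for every variable is an assumption.

```typescript
// Hypothetical names throughout; only the env variable names come from the docs.
type ProviderId = "ollama" | "gemini" | "groq";
type Env = Record<string, string | undefined>;

// Returns the configured providers in the fixed order: Ollama, Gemini, Groq.
function buildProviderChain(env: Env): ProviderId[] {
  // Assumption: blank or whitespace-only values count as "not set".
  const isSet = (name: string): boolean => (env[name] ?? "").trim() !== "";
  const chain: ProviderId[] = [];
  if (isSet("OLLAMA_BASE_URL")) chain.push("ollama");
  if (isSet("GEMINI_API_KEY")) chain.push("gemini");
  if (isSet("GROQ_API_KEY")) chain.push("groq");
  return chain;
}

// Tries each provider in turn; any thrown error moves on to the next one.
async function runChain<T>(
  chain: ProviderId[],
  call: (id: ProviderId) => Promise<T>
): Promise<{ provider: ProviderId; result: T }> {
  if (chain.length === 0) {
    // Corresponds to the 503 case: nothing is configured.
    throw new Error("no provider configured");
  }
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return { provider, result: await call(provider) };
    } catch (err) {
      lastError = err; // timeout, rate limit, malformed response, ...
    }
  }
  throw lastError;
}
```

With only `OLLAMA_BASE_URL` set, the chain has a single entry, so a failed local call has no cloud provider to fall back to.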

For privacy-first deployments where every byte of the resume stays on your machine:

# install ollama from https://ollama.com and pull a model
ollama pull llama3.2
# in your .env (or as shell vars before pnpm dev):
OLLAMA_BASE_URL=http://127.0.0.1:11434
OLLAMA_MODEL=llama3.2
# leave GEMINI_API_KEY / GROQ_API_KEY unset for offline-only mode

The Ollama path sets format: 'json' on the request, so the model returns strict JSON without prompt-engineering tricks. The first scan is slow on commodity hardware (60-120s for llama3.2:3b on a typical laptop); subsequent scans of the same resume hit the in-memory result cache and return in <100ms. Bigger models produce noticeably better suggestions but take longer.
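As a rough sketch, a JSON-mode request to Ollama could be built like this. Ollama's `/api/generate` endpoint and its `format` and `stream` fields are real, but whether the project calls this endpoint or `/api/chat` is an assumption, and `buildOllamaRequest` is a hypothetical helper.

```typescript
// Shape of a non-streaming, JSON-mode request body for Ollama's /api/generate.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  format: "json";
  stream: false;
}

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildOllamaRequest(baseUrl: string, model: string | undefined, prompt: string) {
  const body: OllamaGenerateRequest = {
    // Falls back to the documented default tag when OLLAMA_MODEL is unset or blank.
    model: model !== undefined && model.trim() !== "" ? model.trim() : "llama3.2",
    prompt,
    format: "json",
    stream: false,
  };
  return {
    // Strip trailing slashes so "http://host:11434/" does not yield "//api/generate".
    url: `${baseUrl.replace(/\/+$/, "")}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed on as `fetch(req.url, req.init)`.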

The /api/analyze response includes _provider: "ollama-{model}" so you can confirm requests are landing locally and not falling back to a cloud key you forgot to remove.
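A small check along these lines can confirm that a scan stayed local. Only the `ollama-{model}` prefix comes from the text above; `servedLocally` is a hypothetical helper and the rest of the response shape is not assumed.

```typescript
// Only the _provider field is inspected; other response fields are ignored.
interface AnalyzeResponseLike {
  _provider?: string;
}

// True when the response reports an Ollama model as the provider.
function servedLocally(response: AnalyzeResponseLike): boolean {
  return typeof response._provider === "string" && response._provider.startsWith("ollama-");
}
```

For example, `servedLocally({ _provider: "ollama-llama3.2" })` is true, while any other provider string or a missing field gives false.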

Vanilla ollama serve on 127.0.0.1 has no authentication, which is fine for a local-only setup. If your Ollama lives behind a reverse proxy that requires a bearer token, or you’re pointing at a hosted Ollama-compatible endpoint (OpenWebUI, LiteLLM, OpenRouter’s Ollama-compatible routes, a Cloudflare-tunneled daemon with a service token, etc.), set OLLAMA_API_KEY and the request will include Authorization: Bearer {key} on every call:

# in your .env
OLLAMA_BASE_URL=https://ollama.your-domain.tld
OLLAMA_MODEL=llama3.2
OLLAMA_API_KEY=sk-your-proxy-token

The header is only attached when the env var is non-empty, so leaving it unset keeps the request shape identical to the local-only setup. Empty or whitespace-only values are treated as not set so a stray OLLAMA_API_KEY= line in .env does not produce a malformed Authorization: Bearer header that the proxy would reject.
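The conditional header can be sketched as below. `ollamaHeaders` is a hypothetical helper, and sending the trimmed token rather than the raw value is an assumption.

```typescript
// Builds request headers for Ollama; Authorization is added only for a non-blank key.
function ollamaHeaders(apiKey: string | undefined): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  const token = apiKey?.trim();
  // Undefined, empty, and whitespace-only values all skip the header entirely.
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}
```

A stray `OLLAMA_API_KEY=` line therefore produces the same headers as leaving the variable out.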

How users sign in (or whether they sign in at all) is a separate choice from the LLM provider, and it’s also driven by environment variables. ATS Screener supports three modes, picked automatically:

  • Anonymous: leave Firebase and LDAP unset. The scanner is open and history is local. This is the default.
  • Firebase: set the PUBLIC_FIREBASE_* variables for Google / email sign-in and synced history.
  • Active Directory: set LDAP_URL for on-premise AD sign-in.
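The automatic selection might look something like this sketch. `pickAuthMode` is a hypothetical name, and two details are assumptions the docs do not state: that LDAP takes precedence when both are configured, and that Firebase mode requires all six variables.

```typescript
type AuthMode = "anonymous" | "firebase" | "active-directory";
type Env = Record<string, string | undefined>;

const FIREBASE_VARS = [
  "PUBLIC_FIREBASE_API_KEY",
  "PUBLIC_FIREBASE_AUTH_DOMAIN",
  "PUBLIC_FIREBASE_PROJECT_ID",
  "PUBLIC_FIREBASE_STORAGE_BUCKET",
  "PUBLIC_FIREBASE_MESSAGING_SENDER_ID",
  "PUBLIC_FIREBASE_APP_ID",
] as const;

function pickAuthMode(env: Env): AuthMode {
  const isSet = (name: string): boolean => (env[name] ?? "").trim() !== "";
  // Assumption: LDAP wins if both LDAP and Firebase are configured.
  if (isSet("LDAP_URL")) return "active-directory";
  // Assumption: a partial Firebase config falls back to anonymous mode.
  if (FIREBASE_VARS.every(isSet)) return "firebase";
  return "anonymous";
}
```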

See Authentication for the full comparison and the Active Directory guide for AD setup. The Firebase variables are listed below.

# self-host without firebase: leave every PUBLIC_FIREBASE_* var unset (the default).
# self-host with firebase: set all six.
PUBLIC_FIREBASE_API_KEY=...
PUBLIC_FIREBASE_AUTH_DOMAIN=your-project.firebaseapp.com
PUBLIC_FIREBASE_PROJECT_ID=your-project
PUBLIC_FIREBASE_STORAGE_BUCKET=your-project.appspot.com
PUBLIC_FIREBASE_MESSAGING_SENDER_ID=1234567890
PUBLIC_FIREBASE_APP_ID=1:1234567890:web:abc
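
| Provider | Model | RPM (requests/min) | RPD (requests/day) | TPM (tokens/min) | Cost |
| --- | --- | --- | --- | --- | --- |
| Google | Gemma 3 27B | 30 | 14,400 | 15K | Free |
| Groq | Llama 3.3 70B | 1000 | 14,400 | 12K | Free |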

Both providers block at their limits and never auto-charge. You cannot accidentally incur costs.

For the latest limits, see each provider's official rate-limit documentation.

Rate limiting is configured in src/routes/api/analyze/+server.ts:

const RATE_LIMIT = {
  maxPerMinute: 10, // requests allowed per minute
  maxPerDay: 200 // requests allowed per day
};

Adjust these values based on your expected traffic and API key limits.
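To show how the two limits interact, here is a minimal in-memory fixed-window limiter. It is a sketch only: the route's real implementation may differ, and `FixedWindowLimiter` and the per-client key are hypothetical.

```typescript
const RATE_LIMIT = { maxPerMinute: 10, maxPerDay: 200 };

interface Counter {
  windowStart: number; // ms timestamp at which the current window began
  count: number; // requests recorded in the current window
}

class FixedWindowLimiter {
  private perMinute = new Map<string, Counter>();
  private perDay = new Map<string, Counter>();

  // Returns true and records the request if the client is under both limits.
  allow(clientId: string, now: number = Date.now()): boolean {
    const minute = this.current(this.perMinute, clientId, now, 60_000);
    const day = this.current(this.perDay, clientId, now, 86_400_000);
    if (minute.count >= RATE_LIMIT.maxPerMinute || day.count >= RATE_LIMIT.maxPerDay) {
      return false; // rejected requests are not counted
    }
    minute.count += 1;
    day.count += 1;
    return true;
  }

  // Fetches the client's counter, starting a fresh window once the old one expired.
  private current(map: Map<string, Counter>, key: string, now: number, windowMs: number): Counter {
    let counter = map.get(key);
    if (counter === undefined || now - counter.windowStart >= windowMs) {
      counter = { windowStart: now, count: 0 };
      map.set(key, counter);
    }
    return counter;
  }
}
```

Because the counters live in process memory, they reset on restart and are not shared between instances.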

Each provider has its own timeout. Vercel Fluid Compute is enabled by default and allows up to 300 seconds on the Hobby plan:

// Gemma: 90s, Groq: 30s → worst case total: 120s
const PROVIDER_TIMEOUTS_MS = [90_000, 30_000];

Gemma 3 27B typically takes 30-45 seconds for the full scoring prompt but can spike under load. The 90s timeout gives generous headroom. Groq responds in under 1 second but gets 30s for safety. If both providers fail, the system falls back to rule-based scoring on the client side.
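A per-provider timeout can be sketched with `Promise.race`, alongside the worst-case arithmetic from the comment above. `withTimeout` and `worstCaseMs` are hypothetical helpers, not the project's code.

```typescript
const PROVIDER_TIMEOUTS_MS = [90_000, 30_000]; // Gemma, Groq

// Rejects if `work` has not settled within `ms` milliseconds.
// The underlying work is not cancelled; only its result is ignored.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Worst case when every provider in the chain runs to its full timeout.
const worstCaseMs = (timeouts: readonly number[]): number =>
  timeouts.reduce((sum, ms) => sum + ms, 0);
```

With the values above, `worstCaseMs(PROVIDER_TIMEOUTS_MS)` is 120,000 ms, which stays within the 300-second ceiling mentioned earlier.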