Behind the scenes
What happens when you search
You type a company name. Seconds later it is either in the index, with its sources, or you get a plain reason why not. Here is the whole journey, in five steps.
01
We check you're real
A quick, invisible check makes sure a real person is asking, not a bot or a flood of traffic.
Vercel BotID · Firestore fixed-window rate limit, 8/min per IP
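For engineers, the fixed-window limit above can be sketched roughly as follows. This is a minimal illustration, not the production code: the Map is an in-memory stand-in for the Firestore store, and allowRequest, WindowState, WINDOW_MS and LIMIT are hypothetical names.

```typescript
// Fixed-window rate limiter: at most LIMIT requests per IP per window.
// Assumption: the Map stands in for the Firestore rate-limit store.
type WindowState = { windowStart: number; count: number };

const WINDOW_MS = 60_000; // one minute
const LIMIT = 8; // 8 requests per minute per IP, as stated above

function allowRequest(
  store: Map<string, WindowState>,
  ip: string,
  nowMs: number,
): boolean {
  // Align every timestamp to the start of its fixed one-minute window.
  const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
  const state = store.get(ip);

  // First request from this IP, or the previous window has ended: restart the count.
  if (!state || state.windowStart !== windowStart) {
    store.set(ip, { windowStart, count: 1 });
    return true;
  }

  // Window is full: reject without incrementing.
  if (state.count >= LIMIT) return false;

  state.count += 1;
  return true;
}
```

Against a shared store such as Firestore, the read and the increment would presumably need to happen in one transaction so that concurrent requests cannot both slip under the limit.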
02
Maybe we already know
If the company is already listed, or we looked it up recently, you get the answer instantly.
Firestore lookups cache · 30-day TTL · keyed by normalized name
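A rough sketch of the cache key and the freshness check, under stated assumptions: the exact normalization rules are not given here, so normalizeName below is a hypothetical version (lowercase, punctuation stripped, whitespace collapsed), and CachedLookup is an illustrative shape rather than the real document schema.

```typescript
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day TTL, as stated above

// Hypothetical normalization so that "Acme, Inc." and "ACME Inc" share a key.
// Note: this simple version drops non-ASCII characters.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // punctuation becomes whitespace
    .trim()
    .replace(/\s+/g, "-"); // collapse whitespace runs into single hyphens
}

// Illustrative cache entry; field names are assumptions.
type CachedLookup = { key: string; cachedAtMs: number };

// An entry is usable only while it is younger than the TTL.
function isFresh(entry: CachedLookup, nowMs: number): boolean {
  return nowMs - entry.cachedAtMs < TTL_MS;
}
```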
03
Exa reads the live web
For a new company, Exa searches the open web and hands back the six most relevant pages, with their actual text, so we are reading real sources, not guessing. The classifier may only cite pages Exa returned.
Exa Search API (type: auto) + Contents API · text, highlights, summaries · 6 results
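The rule that the classifier may only cite returned pages can be enforced with a simple allow-list check. The sketch below is an assumption about how that might look: SourcePage and filterCitations are hypothetical names, and the real check may compare URLs more leniently.

```typescript
// Illustrative shape for a page handed back by the search step.
type SourcePage = { url: string; text: string };

// Keep only the citations whose URL exactly matches a returned page.
function filterCitations(cited: string[], pages: SourcePage[]): string[] {
  const allowed = new Set(pages.map((page) => page.url));
  return cited.filter((url) => allowed.has(url));
}
```

Any URL the model produces that was not among the search results is dropped, so a hallucinated source cannot reach the published entry.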
04
Gemini weighs the evidence
Google's Gemini reads those pages and decides one thing: is this company's mission locked in by how it is governed?
Gemini 3.1 Flash-Lite on Vertex AI · temperature 0 · structured JSON (responseJsonSchema)
05
Publish, or explain
Confident and well-sourced? The company joins the index automatically, with a little confetti. If not, we show exactly why, or offer a quick human review.
confidence ≥ 0.65 · Next.js Cache Tags · revalidateTag('companies-list')
Outcome A
Published
Added to the index with its sources, ready to share.
Outcome B
Explained or queued
A clear reason it does not qualify, or a path to human review.
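The publish decision can be pictured as a small pure function. Only the 0.65 threshold and the two outcomes come from the description above; the Verdict shape, the field names, and the way "explained" is separated from "review" are assumptions made for illustration.

```typescript
// Hypothetical shape of the classifier's structured output.
type Verdict = {
  qualifies: boolean;
  confidence: number; // 0..1
  sources: string[];
  reason: string;
};

type Outcome =
  | { kind: "published"; sources: string[] }
  | { kind: "explained"; reason: string }
  | { kind: "review"; reason: string };

const PUBLISH_THRESHOLD = 0.65; // threshold stated above

function decide(verdict: Verdict): Outcome {
  const confident = verdict.confidence >= PUBLISH_THRESHOLD;

  // Outcome A: confident, qualifying, and backed by at least one source.
  if (verdict.qualifies && confident && verdict.sources.length > 0) {
    return { kind: "published", sources: verdict.sources };
  }

  // Outcome B, assumed split: a confident "no" is explained,
  // anything uncertain or unsourced goes to human review.
  if (!verdict.qualifies && confident) {
    return { kind: "explained", reason: verdict.reason };
  }
  return { kind: "review", reason: verdict.reason };
}
```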
What it runs on
Exa
exa-js · Search API (type: auto) + Contents API: text, highlights, summaries
Google Gemini
@google/genai · gemini-3.1-flash-lite · Vertex AI (global) · structured output
Vercel
Functions on Fluid Compute · BotID · Cache Tags / ISR
Next.js · React
Next.js 16 App Router · React 19 · SSG + ISR · Zod + Ajv validation
Google Firestore
firebase-admin · REST transport · nam5 · cache + rate-limit store
PostHog
posthog-js · autocapture · reverse-proxied via /ingest
Notes for engineers
- Firestore runs over REST (preferRest) to dodge the gRPC cold-start hang on serverless Functions.
- The Gemini call sets the @google/genai httpOptions.timeout (a real fetch abort) and retries once, so a dead keep-alive socket can't wedge a warm instance.
- Scoped cache tags: a publish revalidates only companies-list, not every company page.
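The timeout-and-retry-once pattern from the second note can be sketched generically. This is not the SDK's own mechanism: withTimeoutAndRetry is a hypothetical helper that uses a standard AbortController, and it only helps if the wrapped call actually honors the abort signal.

```typescript
// Run `call` with an abort-based timeout and retry exactly once on failure.
// Assumption: `call` rejects when its AbortSignal fires (as fetch does).
async function withTimeoutAndRetry<T>(
  call: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 0; attempt < 2; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await call(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the retry
    } finally {
      clearTimeout(timer); // never leave a timer pending on a warm instance
    }
  }

  // Both attempts failed: surface the most recent error.
  throw lastError;
}
```

A caller might wrap a request as withTimeoutAndRetry((signal) => fetch(url, { signal }), 10_000), where the 10-second budget is an arbitrary example value.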
Architecture overview · informational.
