How it works
One app, nine colleges.
What you're using is one application that quietly serves nine different colleges. Each LACCD campus — Mission, City, East LA, Harbor, Pierce, Southwest, Trade-Tech, Valley, West — has its own branded chatbot, its own voice line, and its own catalog of answers. But under the hood, it's a single backend that figures out which campus you're on from the URL and tailors everything from there.
Three doorways
Students reach the assistant three ways. All three call into the same backend services, so the answers stay consistent whether you type or talk.
- Web chat — the chat panel on the campus website.
- Live voice agent — click the mic and talk to the assistant in real time.
- Phone — one shared number; an IVR menu routes callers to the right campus.
Where the data comes from
Behind the chat are five data sources, each chosen for what it's actually good at.
1. Neo4j knowledge graph (rebuilt on deploy) — The structured spine. Holds course prerequisites, corequisites, advisories, GE area placement, UC and CSU transferability, course renumberings, programs, and instructors. Rebuilt from disk after every deploy: ASSIST data extracts, parsed campus catalog PDFs, an eLumen scrape, and a hand-curated CCN renumbering map.
2. ASSIST.org API (live) — For any "does this transfer to UCLA?" question, we hit ASSIST directly, so the answer reflects the current articulation agreement, not a stale snapshot.
3. LACCD class schedule API (live) — When a student asks "what sections are open this term?" or "is there a zero-textbook-cost option?", we pull current sections in real time.
4. Program Mapper & eLumen (live) — Degree and certificate requirements, plus the official course-of-record catalog.
5. Gemini File Search (indexed) — A per-campus retrieval store holding the crawled campus website plus the catalog as markdown. Catches everything the structured sources don't: "where's the financial aid office," "what are the library hours."
Why the knowledge graph matters
Most AI chatbots are pure semantic search over text — they retrieve passages from documents and let the model write an answer. That works for "what time does the library open." It breaks down on anything that crosses two facts that aren't in the same paragraph. The Neo4j knowledge graph is what makes the harder questions answerable.
The graph holds about 20,000 courses across all nine LACCD campuses, plus their programs and instructors, with typed relationships between them — not text, but facts:
- STAT C1000 satisfies IGETC Area 2A
- MATH 227 is the same as STAT C1000 (~2,700 renumbering edges like this)
- MATH 261 transfers to UC and CSU
- Computer Science AS requires MATH 261, CS 101, …
You query these like a database. You don't retrieve and hope.
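For a sense of what that looks like in code, here's a minimal sketch of a fact lookup using the official Python neo4j driver — the node labels, relationship name, and connection details are illustrative assumptions, not the production schema:

```python
from neo4j import GraphDatabase

# Hypothetical connection details and schema — illustrative only.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def is_uc_transferable(code: str) -> bool:
    """Fact lookup: the transferability edge either exists or it doesn't."""
    query = """
    MATCH (c:Course {code: $code})-[:TRANSFERS_TO]->(:System {name: 'UC'})
    RETURN count(*) > 0 AS transferable
    """
    with driver.session() as session:
        return session.run(query, code=code).single()["transferable"]

print(is_uc_transferable("MATH 261"))  # True only if the edge is in the graph
```

There's no passage to interpret: the query returns a boolean derived from an edge that was written by the seed pipeline, or it returns nothing.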
What pure-RAG chatbots can't do — and the graph can
RAG only — "What's the prereq chain to Calculus II?" → returns the Calc II catalog page, mentions the immediate prereq, stops there.
With the graph — walks four edges: Calc II → Calc I → Trig → Algebra → Pre-Algebra. Returns the actual chain in milliseconds.
RAG only — "Which IGETC Area 2A courses are offered this spring at Mission?" → no single document has the answer. The model hallucinates or punts.
With the graph — joins GE area + live schedule + campus filter in one query. Returns the actual offered sections.
RAG only — "Is MATH 227 transferable?" → the catalog still lists it under the old code. The renumbered version (STAT C1000) is in a different document. They don't share keywords; RAG sees them as unrelated.
With the graph — an explicit same-as edge connects the old and new codes. A question about either resolves correctly.
RAG only — "Is BIOLOGY C100 UC-transferable?" → retrieves a page that mentions transferability somewhere; the model infers. Sometimes wrong, always confident.
With the graph — the transferability edge either exists or it doesn't. No inference, no hallucination on facts.
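The prereq-chain walk in the first row is a single variable-length path query rather than four separate retrievals. A sketch under the same assumptions as the earlier snippet (REQUIRES_PREREQ and the code property are invented names):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def prereq_chain(code: str) -> list[str]:
    """Walks prereq edges to any depth and returns the longest chain found,
    e.g. Calc II -> Calc I -> Trig -> Algebra -> Pre-Algebra."""
    query = """
    MATCH path = (c:Course {code: $code})-[:REQUIRES_PREREQ*]->(p:Course)
    RETURN [n IN nodes(path) | n.code] AS chain
    ORDER BY length(path) DESC
    LIMIT 1
    """
    with driver.session() as session:
        record = session.run(query, code=code).single()
        return record["chain"] if record else [code]  # no prereqs at all
```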
How the graph lifts every other data source
The graph isn't one more competing source — it's a structural backbone that the other four sources, and the course sequencer, all lean on:
- ASSIST.org articulation — A student asks about MATH 227, but ASSIST's table only lists the renumbered STAT C1000. The graph resolves the old code to the new one before hitting ASSIST, so the live agreement actually comes back. Without it, the API returns "no agreement found" and the student walks away thinking the course doesn't transfer.
- Live class schedule — When the schedule API returns sections, each row gets enriched inline with GE area, transferability, and prereqs from the graph. The student sees "MATH 261 — meets IGETC 2A, UC and CSU transferable, prereq MATH 260" instead of just a time and a room.
- Program Mapper & eLumen — A degree's required-courses list becomes a navigable plan: the graph tells you which courses are currently offered, which transfer, and what their prereq chains look like.
- Gemini File Search (RAG) — Even when the answer comes from a website crawl, if the snippet mentions a course code, the graph expands it: current name, prior names, transferability, GE placement. A stale catalog page still produces a correct answer.
- Course sequencing — The pace-track sequencer (full / momentum / part-time = 15 / 18 / 9 units per term) walks prereq depth across the graph to schedule courses without violating dependencies, as sketched below. There's no way to do this with text retrieval alone.
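Here's a sketch of the core of that sequencer — a topological pass with greedy unit packing. The unit caps come from the pace tracks above; the function shape and course data are assumptions:

```python
from collections import defaultdict

PACE_UNITS = {"full": 15, "momentum": 18, "part-time": 9}  # units per term

def sequence(courses: dict[str, int], prereqs: list[tuple[str, str]], pace: str):
    """Order courses so nothing precedes its prereqs, then pack each term up
    to the pace track's unit cap. `courses` maps code -> units; `prereqs`
    holds (prereq, course) pairs pulled from the graph."""
    cap = PACE_UNITS[pace]
    deps = defaultdict(set)
    for before, after in prereqs:
        if before in courses and after in courses:  # ignore courses outside the plan
            deps[after].add(before)

    done: set[str] = set()
    terms: list[list[str]] = []
    remaining = dict(courses)
    while remaining:
        # Courses whose prereqs are all satisfied by earlier terms.
        ready = [c for c in remaining if deps[c] <= done]
        if not ready:
            raise ValueError("cycle in prerequisite data")
        term, units = [], 0
        for c in sorted(ready, key=lambda c: -courses[c]):  # biggest first
            if units + courses[c] <= cap:
                term.append(c)
                units += courses[c]
        if not term:  # a single course exceeds the cap; schedule it alone
            term = [min(ready, key=lambda c: courses[c])]
        terms.append(term)
        done.update(term)
        for c in term:
            del remaining[c]
    return terms
```

The production sequencer presumably layers corequisites and term availability on top; the sketch shows only the dependency-ordering core that text retrieval can't replicate.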
The shorthand: pure-RAG chatbots are reading comprehension over a haystack. The graph turns the LACCD catalog into a queryable database of relationships, and the live APIs and the RAG layer use it as a translator and a fact-checker. That's why a question like "I took MATH 227 last year — does that count toward my UCLA transfer for psychology?" — which crosses renumbering, articulation, and major requirements — actually gets answered correctly here.
What's AI, what's deterministic, and how we prevent hallucinations
AI shows up in exactly two places, both at the edges of the system. Everything load-bearing — the facts, the routing, the resolution, the citations — is deterministic code reading from authoritative sources. That separation is the design, and it's what makes the answers trustworthy.
Deterministic (no AI)
- Intent classification — keyword patterns route the question; the model doesn't pick its own tools.
- Course code resolution — renumbering walker traverses graph edges, not guesses.
- Knowledge graph queries — Cypher against Neo4j; the edge exists or it doesn't.
- Live API calls — ASSIST, class schedule, Program Mapper, eLumen return whatever the source returned.
- Course sequencing — pure algorithms (topological sort, pace math) over graph data.
- Citations — URLs and source IDs come from the handlers, never from the model.
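A minimal sketch of that keyword-routing style — the patterns and intent names here are illustrative stand-ins, not the production rule set:

```python
import re

# Illustrative patterns only — the production rules are richer.
INTENT_PATTERNS = [
    ("articulation", re.compile(r"\btransfer(?:able|s)?\b|\barticulat", re.I)),
    ("prereq",       re.compile(r"\bprereq|\bcorequisite|\badvisor(?:y|ies)\b", re.I)),
    ("schedule",     re.compile(r"\bsection|\bopen\b|\bthis (?:term|spring|fall)\b", re.I)),
    ("program",      re.compile(r"\bdegree|\bcertificate|\bmajor\b", re.I)),
]

def classify(question: str) -> list[str]:
    """Return every matching intent; unmatched questions fall through to RAG."""
    intents = [name for name, pattern in INTENT_PATTERNS if pattern.search(question)]
    return intents or ["general_info"]

print(classify("Is MATH 227 transferable to UCLA?"))  # ['articulation']
```

Because routing is a table of patterns rather than a model decision, the same question always reaches the same handler.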
AI (Gemini)
- Composition — takes the structured context the handlers returned and writes the final natural-language answer.
- RAG fallback — for unstructured questions ("library hours," "where's financial aid"), retrieves campus-website snippets and answers from them.
- Voice — speech-to-speech via Gemini Live, calling the same deterministic handlers as chat for facts.
Five mechanisms that keep the answers grounded
- Ground first, generate second. The deterministic handlers run before Gemini sees the question. The model writes prose over already-correct facts — it isn't asked to recall from training data. When a handler returns nothing, the prompt says so explicitly and the model is instructed to admit it doesn't know.
- The graph is a fact-checker for RAG. When a website snippet mentions a course code, the orchestrator pulls graph facts about that course (current name, transferability, GE area) and adds them to the prompt as authoritative. A stale catalog page that says "MATH 227" gets corrected to "STAT C1000" inline.
- Routing picks the authoritative source. ASSIST.org articulation runs before the knowledge-graph router on transfer questions, because the live agreement is more authoritative than the cached graph. Each question type is matched to the source that's actually authoritative for it.
- Citations trace back to real documents. The model doesn't invent URLs. Every cited link came from a handler that retrieved a real document or a real graph row. If you click a citation, it goes to the actual source the system used.
- Failure mode is "I don't know," not improvisation. The system prompt and the handler-empty path both push the model toward declining gracefully. Transient model errors trigger a clarifying question instead of a guess. There's also an OpenRouter tier-2 model as a budget fallback — same prompting, same grounding, just a different generator.
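A sketch of what "ground first, generate second" looks like at the prompt boundary — the wording is hypothetical, and the real per-campus system prompts aren't shown here:

```python
def build_grounded_prompt(question: str, handler_context: list[str]) -> str:
    """The model only writes prose over context the handlers already verified.
    An empty context is stated outright, not papered over."""
    if handler_context:
        facts = "\n".join(f"- {fact}" for fact in handler_context)
        grounding = f"Answer ONLY from these verified facts:\n{facts}"
    else:
        grounding = ("No authoritative source returned data for this question. "
                     "Say you don't know and suggest who the student can ask.")
    return f"{grounding}\n\nStudent question: {question}"
```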
The honest edges
A few places where hallucination risk isn't fully eliminated, because no system can promise zero:
- The RAG fallback is still RAG. If a campus website is stale, the answer to a website-only question can be stale.
- Long compositions can drift on edge details when summarizing many items at once.
- Out-of-scope questions ("should I major in nursing?") are advice, not retrieval — grounding can't apply because there's nothing to ground in.
How a question flows through the system
When a question comes in, an orchestrator reads it, classifies the intent, and routes to the right service. The service returns structured context. Gemini composes the answer, cites the source, and hands it back.
- Entry layer. Web panel, voice agent, and phone IVR all hand off to the same backend. Voice runs in a separate process but calls the same services as chat — a question answered by typing matches the same question answered by speaking.
- Campus resolution. The URL the student arrived on (e.g. lamc-chat.johnnyrobot.ai) maps to a campus config object that carries colors, sample questions, system prompt, retrieval store, voice agent, and phone number. Everything downstream reads from that single object.
- Intent classification. The orchestrator scans the question for transfer, prereq, schedule, program, articulation, sequence, instructor, and general-info patterns — then picks one or more handlers.
- Specialized handlers. Each of the five data sources has its own service. They return structured context (course rows, articulation tables, live section listings, retrieval snippets) rather than free text.
- Composition. Gemini takes the structured context plus the campus's system prompt and writes the final answer with inline citations.
- Fallback. If the day's Gemini budget is exhausted, an OpenRouter tier-2 model takes over so the chat stays up.
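A sketch of the campus-resolution step, assuming a Python config registry — only the lamc-chat.johnnyrobot.ai hostname appears on this page, and the config fields and values are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class CampusConfig:
    """One object per campus; everything downstream reads from it."""
    name: str
    colors: dict[str, str] = field(default_factory=dict)
    sample_questions: list[str] = field(default_factory=list)
    system_prompt: str = ""
    retrieval_store: str = ""
    voice_agent: str = ""
    phone_number: str = ""

# Hypothetical registry — one entry per campus hostname.
CAMPUSES: dict[str, CampusConfig] = {
    "lamc-chat.johnnyrobot.ai": CampusConfig(name="Los Angeles Mission College"),
    # ...eight more entries, one per LACCD campus
}

def resolve_campus(hostname: str) -> CampusConfig:
    """Single lookup at request time; no per-campus branching anywhere else."""
    return CAMPUSES[hostname]
```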
The data pipeline behind the graph
The knowledge graph isn't crawled at runtime — it's rebuilt from disk so answers are deterministic and fast. The seed pipeline runs once after every deploy.
- ASSIST.org extracts — official articulation data refreshed on the cadence ASSIST publishes.
- Catalog PDFs — each campus's annual catalog, parsed once and synced into the graph.
- eLumen scrape — official course-of-record details: units, descriptions, hours.
- CCN renumbering map — California's statewide course renumbering, so a question about an old course number resolves to the new one.
- Website crawls + catalog markdown — feed the per-campus retrieval store; this is what answers everything that isn't in the structured sources.
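A sketch of one loader in that seed pipeline — the file paths, property names, and SAME_AS relationship are assumptions; the point is the shape: read from disk, merge into the graph:

```python
import json
from pathlib import Path

from neo4j import GraphDatabase

SEED_DIR = Path("data/seed")  # hypothetical on-disk layout
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def seed_renumbering_map() -> None:
    """One loader of several: writes same-as edges from the hand-curated
    CCN map so old course numbers resolve to their new codes."""
    rows = json.loads((SEED_DIR / "ccn_renumbering.json").read_text())
    with driver.session() as session:
        for row in rows:
            session.run(
                "MERGE (old:Course {code: $old}) "
                "MERGE (new:Course {code: $new}) "
                "MERGE (old)-[:SAME_AS]->(new)",
                old=row["old_code"], new=row["new_code"],
            )
```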
The live APIs — ASSIST, LACCD class schedule, Program Mapper, eLumen — are queried on demand at chat time, never cached past their freshness window, so transfer agreements and section availability always reflect the current state.
Nine schools, one codebase.
Answers grounded in live institutional data instead of guesswork.
Try the LAMC assistant →