
How We Built a High-Performance, Low-Cost Multi-Agent AI Education Consultant

In the rapidly evolving world of EdTech, finding the right university can still feel like navigating a labyrinth blindfolded. Mountains of contradictory information, outdated course catalogs, application deadlines that vary by country, scholarship requirements buried in PDF footnotes: it's overwhelming. At LQ Digital, our AI/Product Department witnessed this firsthand. We heard the same story repeatedly: "I just want to know if this university is right for me, but I can't afford to manage my time for a consultation. I don't know how to book one, and if I have to stand in a long line, my whole day is wasted."

The Vision: A 24/7 Digital Counselor


We decided to build the Multi-Agent Chatbot because we didn't want to create just another bot that replies to your FAQs. Our goal was a scalable, multi-agent AI consultant that provides personalized academic guidance, one that genuinely understands a student’s unique situation and respects their time.

Imagine Ayush, a high school senior in the bustling city of Kathmandu. It’s 2 AM, and he’s scrolling through endless websites trying to find an engineering program that fits his budget. In the traditional world, Ayush would have to spend his afternoon traveling to an office, waiting in a long line, and hoping a consultant has an opening. With our consultation firm, that "waiting room" doesn't exist.

We envisioned an AI that doesn't just answer questions, but asks the right questions back. We built a system that:

  • Detects Confusion: If a student is unsure, the AI pivots the conversation naturally to provide clarity instead of a data dump.
  • Proactive Guidance: It leads the user step by step from a general interest (Science) to a specific degree (Physiotherapy).
  • Smart Scheduling: Most importantly, it bridges the gap to human expertise by booking consultations around the user’s own availability. No more wasting time in lines: the AI captures the intent and sets the stage for a seamless follow-up.

Why This is a Game-Changer


    By removing the physical and temporal barriers, we’ve created a system that:

  • Values Time: Ayush gets accurate data on tuition and IELTS requirements in seconds, not hours, without having to visit a consultancy to understand them.
  • Contextual Memory: The bot keeps track of who he is and what he has asked across the session, making the interaction feel like a continuous consultation.
  • Frictionless Booking: By offering to book a consultation at the exact moment the student finds a relevant course, we eliminate the "long line" fatigue entirely.

The Tech Stack Under the Hood


    To achieve this vision, we quickly realized that traditional "monolithic" chatbot architectures wouldn't cut it. You know the type: one massive prompt trying to do everything, getting confused, going off-topic, and hallucinating university names that don't exist. We needed something smarter, something that could actually think through complex multi-step queries.

    That's when we embraced an Agentic Framework, where instead of one "brain" trying to do everything, we built a team of specialized AI agents that can collaborate:

  • The Brain: Google Gemini 2.5 Flash-Lite – We chose this model specifically because it's optimized for both speed and long-context reasoning. It can remember the entire conversation thread, understand context from ten messages ago, and still respond quickly. For students who might be chatting on mobile with spotty connections, speed matters.
  • The Conductor: LangGraph – This is where the magic happens. LangGraph orchestrates 4-5 "agent hops" per student message. All of this happens in seconds, creating what feels like a thoughtful conversation with an expert counselor.
  • The Memory: PostgreSQL with pgvector – We store thousands of university courses, programs, and requirements as vector embeddings, so the agents can retrieve them by meaning rather than by exact keywords.
  • The Engine: FastAPI + Python 3.13 – Deployed on Google Cloud Run, giving us the flexibility to scale up or down based on actual demand rather than guessing.
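
To make the idea of "agent hops" concrete, here is a minimal, library-free sketch of one student message passing through a chain of specialized agents. It is an illustration under stated assumptions, not the production LangGraph graph: the agent names (`route`, `clarify`, `retrieve`, `advise`, `offer_booking`), the state fields, and the tiny in-memory catalog are all hypothetical, and plain functions stand in for the model and database calls.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory catalog standing in for the course database.
CATALOG = [
    {"course": "BSc Physiotherapy", "field": "science"},
    {"course": "BEng Civil Engineering", "field": "engineering"},
]

State = dict  # shared state that every agent reads from and writes to


def route(state: State) -> Optional[str]:
    """Decide which agent runs next, using a naive keyword check."""
    text = state["message"].lower()
    state["field"] = next(
        (f for f in ("science", "engineering") if f in text), None
    )
    state["hops"].append("route")
    # Unknown interest: ask a question back instead of dumping data.
    return "retrieve" if state["field"] else "clarify"


def clarify(state: State) -> Optional[str]:
    state["reply"] = "Which subject area interests you most?"
    state["hops"].append("clarify")
    return None  # end of this turn


def retrieve(state: State) -> Optional[str]:
    state["courses"] = [
        c["course"] for c in CATALOG if c["field"] == state["field"]
    ]
    state["hops"].append("retrieve")
    return "advise"


def advise(state: State) -> Optional[str]:
    state["reply"] = "You could look at: " + ", ".join(state["courses"])
    state["hops"].append("advise")
    return "offer_booking"


def offer_booking(state: State) -> Optional[str]:
    state["reply"] += ". Would you like to book a consultation?"
    state["hops"].append("offer_booking")
    return None  # end of this turn


AGENTS: Dict[str, Callable[[State], Optional[str]]] = {
    "route": route,
    "clarify": clarify,
    "retrieve": retrieve,
    "advise": advise,
    "offer_booking": offer_booking,
}


def run(message: str, max_hops: int = 5) -> State:
    """Pass one message through the agents until one of them ends the turn."""
    state: State = {"message": message, "hops": []}
    current: Optional[str] = "route"
    while current is not None and len(state["hops"]) < max_hops:
        current = AGENTS[current](state)
    return state
```

Calling `run("I like engineering")` walks through four hops and ends with a booking offer, while a vague message such as `run("not sure yet")` stops after two hops with a clarifying question. The `max_hops` cap is one simple way to keep a misrouted conversation from looping.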

Performance Testing: When Theory Meets Reality


    Building the system was one thing. Proving it could handle real-world traffic was another. We've all seen AI demos that fall apart the moment ten people try to use them simultaneously. We didn't want to be that team.

    So we got ruthless with testing. Using Locust, we simulated hundreds of concurrent students all asking complex questions at the same time, the kind of usage spike you'd see when application deadlines approach or when a university posts new scholarship opportunities.

    We set up scenarios mimicking real student behavior: some asking simple questions, others having long multi-turn conversations, a few uploading documents, some switching languages mid-conversation. We wanted to see where it would break.

    The results surprised even us:

  • Stability: 0% Failure Rate – Over hundreds of concurrent requests, not a single timeout or crash. The system gracefully handled the load without breaking a sweat.
  • Intelligence: A median of 1,004 tokens per request – This number tells an important story. It's substantial enough to show our agents are giving thoughtful, detailed responses (not just "here's a link, good luck"), but compact enough to suggest they stay focused and on-topic rather than rambling.
  • Latency: P50 of 3.21 seconds – Now, we know what you're thinking: "3.2 seconds? My API responds in 200ms!" But here's the thing: our chatbot isn't just fetching data from a database. It's reasoning. It's querying external systems, comparing options, formulating personalized advice. And here's something we discovered in user testing: that 3.2 seconds actually creates a more natural, conversational rhythm. When you ask a human counselor a complex question, they pause to think. They might say "hmm, let me consider your situation..." That brief moment signals that real thinking is happening. Instant responses can feel robotic. Our 3.2 seconds feels thoughtful.
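
For readers unfamiliar with these metrics, the sketch below shows how a failure rate and a P50 (median) latency can be derived from raw per-request samples. The `summarize` helper and the sample values are hypothetical and made up for illustration; they are not the measurements reported above, and a load-testing tool such as Locust computes these statistics for you.

```python
from statistics import median
from typing import Dict, List, Tuple


def summarize(samples: List[Tuple[float, bool]]) -> Dict[str, float]:
    """Summarize (latency_seconds, succeeded) samples from a load test."""
    if not samples:
        raise ValueError("no samples to summarize")
    latencies = [latency for latency, _ in samples]
    failures = sum(1 for _, succeeded in samples if not succeeded)
    return {
        # P50: half of the requests finished at or below this latency.
        "p50_seconds": median(latencies),
        # Share of requests that timed out or returned an error.
        "failure_rate": failures / len(samples),
    }


# Made-up samples, for illustration only.
samples = [(2.9, True), (3.1, True), (3.3, True), (4.0, True), (3.2, True)]
report = summarize(samples)
```

The median is a useful headline number because a few slow outliers barely move it, but it is worth reading it alongside higher percentiles such as P95, which describe the experience of the slowest requests.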

The Financial "Aha!" Moment: Why Serverless Wins (And Why We Almost Got It Wrong)


    Here's where things got really interesting. Like most teams, our initial instinct was to spin up a Virtual Private Server (VPS). It's what everyone does, right? Rent a server, install your app, you're done. Simple, predictable monthly cost.

    But then we ran the numbers, and we had our "wait, this doesn't make sense" moment.

    Our app is IO-bound, not CPU-heavy. Most of the time, our server is just waiting. Waiting for Gemini to finish processing. Waiting for database queries to return. Waiting for external APIs to respond. The CPU is sitting there, twiddling its thumbs, consuming electricity and our budget, doing absolutely nothing 85-90% of the time.
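
A small experiment illustrates what IO-bound means in practice. In this sketch, `asyncio.sleep` stands in for waiting on a model, database, or external API; the function names and the numbers (50 requests, 0.1 seconds each) are assumptions chosen for the demonstration, not figures from our system.

```python
import asyncio
import time
from typing import Tuple


async def handle_request(wait_seconds: float) -> None:
    # Stand-in for awaiting a model, database, or external API response.
    await asyncio.sleep(wait_seconds)


async def serve(n_requests: int, wait_seconds: float) -> None:
    # All requests wait concurrently on a single thread.
    await asyncio.gather(
        *(handle_request(wait_seconds) for _ in range(n_requests))
    )


def measure(n_requests: int = 50, wait_seconds: float = 0.1) -> Tuple[float, float]:
    """Return (wall_clock_seconds, cpu_seconds) for serving all requests."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    asyncio.run(serve(n_requests, wait_seconds))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure()
    print(f"wall clock: {wall:.3f}s, CPU time: {cpu:.3f}s")
```

Handled one after another, 50 waits of 0.1 seconds would take about 5 seconds. Run concurrently, they finish in roughly the time of a single wait, and the CPU time is typically a small fraction of the wall-clock time. That gap is the idle capacity an always-on server still charges for.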

    With a traditional VPS, we'd be paying for a machine running 24/7, even when:

  • Students are sleeping (midnight to 6 AM local time)
  • It's Sunday afternoon and everyone's with family
  • Application season is over and traffic drops by 70%

    That's like renting a bus to drive you to work even though you only commute twice a week, which is financially absurd.

    Why we fell in love with Google Cloud Run:


  • Pay-as-you-go Billing – This was the game-changer. We only pay while a student's message is actually being processed. Not when they're reading responses. Not when they're thinking. Not when they're asleep. Just when our code is running. It's like having a taxi meter instead of a car lease.
  • Zero-to-Hero Scaling – Imagine a university launches a marketing campaign, and suddenly 10,000 students hit our chatbot simultaneously. On a VPS? We'd crash spectacularly. Our team would get emergency alerts at 3 AM, scrambling to spin up more servers while students see error messages and lose trust. With Cloud Run? It automatically spins up new instances in seconds. No downtime. It handles it. Then scales back down when traffic subsides.
  • No "DevOps Tax" – This one's harder to quantify but equally important. With Cloud Run, SSL certificates renew automatically. Health checks are built-in. Deployments happen with a single command. Updates are zero-downtime. Our developers spend time building features students actually want, not firefighting server issues at 2 AM or debugging why the load balancer isn't distributing traffic correctly. That saved time translates directly into better product development.
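
The taxi-meter comparison can be turned into simple arithmetic. The sketch below is a deliberately simplified model: the function names are hypothetical, the prices used in the example are placeholders rather than real Cloud Run or VPS rates, and it ignores concurrency, memory tiers, per-request fees, and minimum instances. The only number taken from this article is the 3.21-second median request time.

```python
def pay_per_use_cost(
    requests_per_month: int,
    seconds_per_request: float,
    price_per_busy_second: float,
) -> float:
    """Monthly cost when billed only for time spent handling requests."""
    return requests_per_month * seconds_per_request * price_per_busy_second


def break_even_requests(
    flat_monthly_price: float,
    seconds_per_request: float,
    price_per_busy_second: float,
) -> float:
    """Monthly request volume at which pay-per-use equals a flat-rate server."""
    if seconds_per_request <= 0 or price_per_busy_second <= 0:
        raise ValueError("durations and prices must be positive")
    return flat_monthly_price / (seconds_per_request * price_per_busy_second)


# Placeholder prices, chosen only to make the arithmetic easy to follow.
FLAT_MONTHLY_PRICE = 20.0       # hypothetical always-on server, per month
PRICE_PER_BUSY_SECOND = 0.0001  # hypothetical pay-per-use rate
MEDIAN_SECONDS = 3.21           # P50 latency reported in this article

usage_cost = pay_per_use_cost(10_000, MEDIAN_SECONDS, PRICE_PER_BUSY_SECOND)
threshold = break_even_requests(
    FLAT_MONTHLY_PRICE, MEDIAN_SECONDS, PRICE_PER_BUSY_SECOND
)
```

Below the break-even volume, paying per busy second is cheaper than a flat monthly fee; above it, an always-on machine can start to win. Plugging in your own provider's current rates is the only way to know which side of that line a given workload falls on.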

Conclusion


    Our journey with the Multi-Agent Chatbot has taught us something crucial: the barrier to entry for sophisticated, helpful AI is no longer technology or cost; it's architecture and intent. Our goal was never just to automate messages and replies. It was to make life easier for students planning their futures abroad, by saving them time and making it simpler to book a consultation. We recognized that for a student, time is the most precious currency.

    You don't need massive budgets. You don't need a team of 50 engineers. You do need:

  • The right model for the job (Gemini Flash-Lite over expensive alternatives)
  • The right logic framework (LangGraph for agent orchestration)
  • The right infrastructure (Cloud Run for intelligent scaling)
  • And most importantly, the right purpose: building something that genuinely helps people and makes their lives easier.

    We've built a system that doesn't replace human counselors; it amplifies them. It handles the initial questions, the data retrieval, the 2 AM panic searches, freeing human experts to focus on the complex, deeply personal guidance that only humans can provide.

    And we've proven you can do it for the cost of a pizza.

    The next generation of students deserves better than information overwhelm and expensive gatekeepers. They deserve a digital counselor that's always there, always patient, always working to help them find their path.

    That's exactly what we built. And we're just getting started.

    © 2026 Langquang. All rights reserved.
