2026

Streaming AI Chat Assistant with Voice Input and Tool Execution

Product Feature · Periskope

Built a context-aware AI assistant for Periskope using Gemini and SSE, enabling support agents to automate CRM actions and summarize WhatsApp chats via real-time streaming and voice.

Streaming
SSE
Gemini
Voice Input
React
TypeScript
BullMQ
Platform: Web · API · AI Infrastructure
Client: Periskope
My Role
AI Engineer
Full-Stack Engineer

The Friction of Context Switching

In a high-volume WhatsApp support environment, agents spend a disproportionate amount of time oscillating between the chat window and the CRM. They need to summarize long threads, create tickets, and retrieve historical context—all while maintaining a fast response time. I built the Periskope Chat Assistant to collapse this workflow into a single, context-aware command center.

The goal wasn't just to build a chatbot, but an 'agentic' sidekick that has full visibility into the active conversation. It needed to be fast, feel instant, and be capable of taking real-world actions like creating notes or fetching external context without the user ever leaving the dashboard.

Context is king: The assistant is injected with the last 20 messages of the active WhatsApp thread as dynamic system context before every request.
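A minimal sketch of how that dynamic context injection might look. The message shape and prompt wording are illustrative assumptions, not Periskope's actual schema:

```typescript
// Hypothetical shape for a WhatsApp thread message.
interface ThreadMessage {
  sender: string;
  text: string;
}

const CONTEXT_WINDOW = 20; // last N messages injected per request

// Build the dynamic system context from the tail of the active thread.
function buildSystemContext(thread: ThreadMessage[]): string {
  const recent = thread.slice(-CONTEXT_WINDOW);
  return [
    "You are assisting a support agent. Recent WhatsApp thread:",
    ...recent.map((m) => `${m.sender}: ${m.text}`),
  ].join("\n");
}
```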

Architecting for Real-Time Delivery

To make the AI feel responsive, I opted for Server-Sent Events (SSE) instead of traditional REST or WebSockets. SSE provides a unidirectional stream from the server to the client, which is perfect for LLM token streaming. This avoids the 'waiting for completion' spinner, letting agents start reading the first word of a summary while the rest is still being generated.
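On the wire, SSE is just newline-delimited text, which makes the client side simple. A sketch of the parsing step (event names like `token` are illustrative, not the production schema):

```typescript
// A parsed Server-Sent Event.
interface SSEEvent {
  event: string;
  data: string;
}

// Parse a raw SSE chunk, which may contain several events
// separated by blank lines per the SSE spec.
function parseSSEChunk(chunk: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  for (const block of chunk.split("\n\n")) {
    let event = "message"; // SSE default event type
    let data = "";
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data += line.slice(5).trim();
    }
    if (data) events.push({ event, data });
  }
  return events;
}
```

The UI appends each `token` event's data to the visible message as it arrives, which is what makes the first word appear almost immediately.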

On the backend, the Chat Assistant handler manages a complex lifecycle. It maintains a 30-minute session TTL to keep conversations coherent without bloating the database. We use Google Gemini via Vertex AI, which provides the low-latency performance required for a smooth streaming experience.

The Tool Calling Loop

The real power of this assistant lies in its ability to call tools. I implemented a tool-calling loop that allows the model to decide if it needs more information or needs to perform an action. For example, if a user asks 'Did we resolve his previous issue?', the model can invoke `get_previous_messages` before answering.

I limited this loop to 10 iterations per message to prevent 'infinite thinking' loops and runaway API costs. Each tool call is rendered in the UI with a distinct state, allowing the agent to see exactly which systems the AI is interacting with in real-time.
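The bounded loop can be sketched as follows. The `generate` and `executeTool` signatures are stand-ins for the real Gemini and tool-dispatch interfaces:

```typescript
// One model turn: either final text or a batch of tool calls.
interface ModelTurn {
  text?: string;
  toolCalls?: { name: string; args: Record<string, unknown> }[];
}

const MAX_TOOL_ITERATIONS = 10; // guard against runaway loops and API costs

async function runToolLoop(
  generate: (history: unknown[]) => Promise<ModelTurn>,
  executeTool: (name: string, args: Record<string, unknown>) => Promise<unknown>,
  history: unknown[],
): Promise<string> {
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const turn = await generate(history);
    // No tool calls means the model has produced its final answer.
    if (!turn.toolCalls?.length) return turn.text ?? "";
    // Feed each tool result back so the model can react to it.
    for (const call of turn.toolCalls) {
      const result = await executeTool(call.name, call.args);
      history.push({ role: "tool", name: call.name, result });
    }
  }
  throw new Error(`Tool loop exceeded ${MAX_TOOL_ITERATIONS} iterations`);
}
```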

Max iterations: 10 per turn
Session TTL: 30 minutes
Message limit: 50 per session

Integrating Voice and Transcription

Many support agents work in fast-paced environments where typing isn't always efficient. I added a voice input layer using the browser's MediaRecorder API. The audio is captured, base64-encoded, and sent to a transcription endpoint before the text is passed to the LLM agent.

Handling base64 audio required careful memory management on the frontend to prevent crashes during long recordings. Once transcribed, the text is treated as a standard prompt, triggering the same tool-calling and streaming pipeline used for text input.
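One concrete piece of that memory management: naively spreading a large recording into `String.fromCharCode(...)` blows the call stack, so the encoding has to happen in slices. A sketch of that approach (the chunk source, `MediaRecorder`, is browser-only; the encoder itself is plain TypeScript):

```typescript
// Merge recorded audio chunks and base64-encode them incrementally,
// avoiding the call-stack blowup of String.fromCharCode(...bigArray)
// on long recordings.
function encodeAudioChunks(chunks: Uint8Array[]): string {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const merged = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    merged.set(c, offset);
    offset += c.length;
  }
  // Encode in 8 KB slices so we never spread a huge array onto the stack.
  let binary = "";
  const SLICE = 8192;
  for (let i = 0; i < merged.length; i += SLICE) {
    binary += String.fromCharCode(...merged.subarray(i, i + SLICE));
  }
  return btoa(binary);
}
```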

Voice input reduced the time to create complex CRM tickets by approximately 40% compared to manual data entry.

Engineering Challenges: State and Latency

One of the biggest hurdles was managing state between the streaming response and tool execution. If a tool call takes 3 seconds, the stream pauses, which can look like a network failure. I implemented intermediate 'status' tokens in the SSE stream (e.g., `[ACTION: fetch_context]`) to tell the UI to show a loading indicator for that specific tool.
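Server-side, interleaving those status events is straightforward. A sketch, assuming the handler yields raw SSE frames (event names are illustrative):

```typescript
// Format a single SSE frame.
function formatSSE(event: string, data: string): string {
  return `event: ${event}\ndata: ${data}\n\n`;
}

// Wrap a token stream, emitting a status event first when a tool is
// running so the UI can show a per-tool loading indicator instead of
// an apparent network stall.
async function* streamWithStatus(
  tokens: AsyncIterable<string> | Iterable<string>,
  toolName?: string,
): AsyncGenerator<string> {
  if (toolName) {
    yield formatSSE("status", JSON.stringify({ action: toolName }));
  }
  for await (const t of tokens) yield formatSSE("token", t);
  yield formatSSE("done", "{}");
}
```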

Memory management was another concern. With a 30-minute session TTL and a 50-message limit, I had to ensure the backend wasn't leaking memory from abandoned SSE connections. We implemented a robust heartbeat and cleanup routine that terminates inactive streams and clears the associated session buffers.
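The reaping side of that routine can be sketched as a periodic sweep over a session map. The `Session` shape and `close()` semantics here are assumptions for illustration:

```typescript
// Hypothetical session record tracked by the backend.
interface Session {
  lastActivity: number; // epoch millis of last client activity
  close(): void;        // terminates the underlying SSE stream
}

const SESSION_TTL_MS = 30 * 60 * 1000; // matches the 30-minute session TTL

// Sweep the session map, closing and deleting anything past its TTL.
// Returns the ids that were reaped (useful for logging).
function reapIdleSessions(sessions: Map<string, Session>, now: number): string[] {
  const reaped: string[] = [];
  for (const [id, session] of sessions) {
    if (now - session.lastActivity > SESSION_TTL_MS) {
      session.close();      // terminate the SSE stream
      sessions.delete(id);  // free the session buffer
      reaped.push(id);
    }
  }
  return reaped;
}
```

In production this would run on an interval alongside an SSE heartbeat (periodic comment frames) so proxies don't silently drop idle connections.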

chatAssistantHandler.ts

```typescript
async function handleToolCalls(calls: ToolCall[]): Promise<void> {
  for (const call of calls) {
    if (call.name === 'create_ticket') {
      const result = await ticketService.create(call.args);
      // Iterative feedback loop: pushing the result back into the
      // conversation allows Gemini to 'react' to it on the next turn.
      assistant.pushFeedback(result);
    }
  }
}
```

Reflections and Future Improvements

If I were to rebuild this today, I would move toward a more robust RAG (Retrieval-Augmented Generation) pipeline for fetching customer history. Currently, the tool-calling approach works well, but a vector database would allow the assistant to search across thousands of historical conversations more efficiently.

Overall, the transition from a simple chatbot to a tool-augmented assistant has significantly boosted agent productivity. It shifted the AI's role from a 'writer' to an 'operator' that understands the context of the business and the specific needs of the customer conversation.