This week a lot of research went into the context engine and the AI portion of the app. To make the AI services scalable and robust from the get-go, six services were built across two Kubernetes clusters. First up, the CPU cluster hosts everything the LLM engine needs that doesn't require a GPU. Five services were assigned to this cluster, namely:
- audio-processor (STT)
- redis (short-term memory)
- supabase (LLM-optimized persistence store based on PostgreSQL)
- tts
- webhook
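To make the flow between these services concrete, here is a minimal sketch of one conversational turn moving through the CPU-cluster services and out to the LLM. The function names, payload shapes, and in-memory list standing in for redis are all hypothetical placeholders, not the actual deployment's API:

```python
# One turn of the voice pipeline: webhook audio in -> STT -> memory ->
# LLM -> TTS audio out. Each service is injected as a plain callable so
# the wiring is visible; the real versions would be network calls.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class VoiceTurn:
    stt: Callable[[bytes], str]       # audio-processor: speech -> text
    llm: Callable[[List[str]], str]   # GPU cluster: context -> reply
    tts: Callable[[str], bytes]       # tts: text -> audio
    memory: List[str] = field(default_factory=list)  # stand-in for redis

    def handle(self, audio_in: bytes) -> bytes:
        text = self.stt(audio_in)             # 1. transcribe the user's speech
        self.memory.append(f"user: {text}")   # 2. push to short-term memory
        reply = self.llm(self.memory)         # 3. ask the LLM with full context
        self.memory.append(f"assistant: {reply}")
        return self.tts(reply)                # 4. synthesize the spoken reply

# Toy stand-ins for the real services, just to show the data flow:
turn = VoiceTurn(
    stt=lambda audio: audio.decode(),
    llm=lambda ctx: f"echo of {len(ctx)} messages",
    tts=lambda text: text.encode(),
)
audio_out = turn.handle(b"hello")
```

Keeping each stage behind a plain callable like this also makes it easy to swap a stage out (say, a different STT engine) without touching the rest of the pipeline.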
On the other, GPU cluster, the selected LLM is Mistral, a fine-tuned model trained on multimodal conversations. It is empathic and patient enough to handle the nuances of human speech.
It supports barge-in and interruptions: both looping in a human when required and interruptions from the user, which are natural in human conversation.
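In a setup like this, barge-in typically boils down to cancelling TTS playback the moment the STT side detects incoming speech. A minimal sketch, assuming a chunked audio stream and a shared event flag (all names here are hypothetical, not the app's real interfaces):

```python
import threading
from typing import Iterable, List

def play_with_barge_in(chunks: Iterable[bytes],
                       interrupted: threading.Event,
                       sink: List[bytes]) -> bool:
    """Stream TTS audio chunk by chunk, aborting as soon as the user
    (or a human operator) barges in. Returns True if playback finished."""
    for chunk in chunks:
        if interrupted.is_set():   # speech detected: stop talking immediately
            return False
        sink.append(chunk)         # stand-in for writing to the audio device
    return True

# Simulate a user interrupting after the second chunk is played:
interrupted = threading.Event()
played: List[bytes] = []

def tts_stream():
    yield b"chunk-1"
    yield b"chunk-2"
    interrupted.set()              # barge-in flagged by the STT side
    yield b"chunk-3"

finished = play_with_barge_in(tts_stream(), interrupted, played)
```

Checking the flag once per chunk keeps the reaction latency bounded by the chunk duration, which is why TTS output is usually streamed in small pieces rather than as one clip.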
This is unknown territory, where research and connecting the dots between these microservices have been instrumental in getting a functioning end-to-end workflow. I am planning to complete e2e testing this week as I close out my third sprint.
The plan is to demo this initial test next week.