Standard LLMs rely on prompt engineering to fix problems (hallucinations, poor responses, missing information) that originate in the backend architecture. If the backend (corpus processing) is properly built from the ground up, a single meaningful prompt can return a full, comprehensive answer, without multiple prompts, rewording your query, going through a chat session, or prompt engineering. In this article, I explain how to do it, focusing on enterprise corpora. The strategy relies on four principles (the last one is illustrated with a short sketch after the list):
➡️ Exact and augmented retrieval
➡️ Showing full context in the response
➡️ Enhanced UI with option menu
➡️ Structured response as opposed to long text
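To make the last principle concrete, here is a minimal sketch of what a structured response carrying full retrieval context might look like. The field names (`summary`, `items`, `related_topics`, `suggested_queries`) and the `render` helper are my own illustration, not part of xLLM.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievedItem:
    """One exact-match retrieval result, with its source shown in full."""
    source: str   # e.g. document title or URL within the corpus
    section: str  # sub-section the match came from
    text: str     # the retrieved passage, returned verbatim

@dataclass
class StructuredResponse:
    """Structured answer: labeled sections instead of one long block of text."""
    summary: str                                              # short direct answer
    items: list[RetrievedItem] = field(default_factory=list)  # full context shown
    related_topics: list[str] = field(default_factory=list)   # UI option menu
    suggested_queries: list[str] = field(default_factory=list)

def render(resp: StructuredResponse) -> str:
    """Render the structured response as labeled sections."""
    lines = [f"Answer: {resp.summary}", "", "Context:"]
    for it in resp.items:
        lines.append(f"- [{it.source} / {it.section}] {it.text}")
    if resp.related_topics:
        lines.append("")
        lines.append("Related topics: " + ", ".join(resp.related_topics))
    return "\n".join(lines)
```

The option-menu fields (related topics, suggested queries) let the UI surface follow-ups directly, so the user is not forced into a multi-turn chat session to refine the answer.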
Standard LLMs are trained to predict the next token or missing tokens. This requires deep neural networks (DNNs) trained on billions or even trillions of tokens, as highlighted by Jensen Huang, CEO of Nvidia, in his keynote talk at the GTC conference earlier this year. Yet a 10-trillion-token corpus covers all plausible string combinations many times over; the vast majority of it is noise. After all, most people have a vocabulary of about 30,000 words. But this massive training is necessary to prevent DNNs from getting stuck in sub-optimal configurations due to vanishing gradients and other issues.
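A quick back-of-envelope calculation shows the redundancy. The numbers below are my own illustration, under the crude assumption that every token is a word drawn from a 30,000-word vocabulary; they are not figures from this article.

```python
# Back-of-envelope: how redundant is a 10-trillion-token corpus
# relative to a 30,000-word vocabulary?
vocab_size = 30_000
corpus_tokens = 10 * 10**12           # 10 trillion tokens

distinct_bigrams = vocab_size ** 2    # 9e8 possible word pairs
avg_bigram_repeats = corpus_tokens / distinct_bigrams

print(f"Possible bigrams:       {distinct_bigrams:.1e}")
print(f"Avg repeats per bigram: {avg_bigram_repeats:,.0f}")
# ~11,000 occurrences per possible word pair on average:
# short-range patterns are seen over and over during training.
```

Even under this crude assumption, every possible word pair appears thousands of times on average: most of the corpus is repetition rather than new signal.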
What if you could do with a million times less? With mere millions of tokens rather than trillions? After all, predicting the next token is a task only remotely related to what modern LLMs are used for. Its history is tied to text auto-completion, guessing missing words, autocorrect, and so on, developed initially for tools such as BERT. It is like training a plane to operate efficiently on the runway, but never to fly. It also entices LLM vendors to charge clients by token usage, with little regard to ROI.
Our approach is radically different. We use neither DNNs nor GPUs. It is as different from standard AI as it is from classical NLP and machine learning. Its origins are similar to those of other tools we built, including NoGAN, our alternative to GAN for tabular data synthesis. NoGAN, a fast technology with no DNN, runs a lot faster than GAN-based synthesizers and delivers much better results, even in real time. The output quality is assessed with our ground-breaking evaluation metric, which captures important defects missed by all other benchmarking tools.
In this article, I highlight the unique components of xLLM, our new architecture for the enterprise.