bouvyd’s Twitter Archive—№ 2,103

🧵 I've been experimenting with #RAG (Retrieval-Augmented Generation) to query my codebase, hoping to boost a local #LLM like #Llama or #CodeGemma with code knowledge. But honestly? The results have been underwhelming...
Permalink On twitter.com 2024 Aug 20 Mood +5 🙂

…in reply to @bouvyd
When I dig into the output of a #vectorstore query, it's clear why: the results rarely capture the structure of the code, just fragments that sit "close" to my question in embedding space. 🤔
Permalink On twitter.com 2024 Aug 20 Mood +1 🙂

…in reply to @bouvyd
#RAG might be great for unstructured text (like PDFs), but when it comes to code? I'm starting to think it needs a different approach. Code is inherently structured—especially with OOP (classes, attributes, methods, etc.).
Permalink On twitter.com 2024 Aug 20 Mood +5 🙂

…in reply to @bouvyd
So here’s a thought: What if we used a structured system instead? Imagine a relational database storing all the details of your codebase, and the LLM queries it directly via a specialized interface. 🗂️
On twitter.com 2024 Aug 20 Mood 0
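
To make the idea a bit more concrete, here is a minimal sketch, assuming the codebase is Python and the database is SQLite. The table names (`classes`, `methods`) and the helpers `index_source` and `methods_of` are hypothetical, made up for illustration; they are not from any existing tool. It walks the syntax tree with the standard `ast` module and stores one row per class and per method, so a question like "which methods does this class have?" becomes a plain SQL query.

```python
import ast
import sqlite3

# Hypothetical schema: one row per class, one row per method.
SCHEMA = """
CREATE TABLE classes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    bases TEXT NOT NULL,
    lineno INTEGER NOT NULL
);
CREATE TABLE methods (
    id INTEGER PRIMARY KEY,
    class_id INTEGER NOT NULL REFERENCES classes(id),
    name TEXT NOT NULL,
    args TEXT NOT NULL,
    lineno INTEGER NOT NULL
);
"""


def index_source(conn, source):
    """Parse Python source and record its classes and their methods."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        # ast.unparse (Python 3.9+) turns base-class expressions back into text.
        bases = ", ".join(ast.unparse(base) for base in node.bases)
        cur = conn.execute(
            "INSERT INTO classes (name, bases, lineno) VALUES (?, ?, ?)",
            (node.name, bases, node.lineno),
        )
        class_id = cur.lastrowid
        # Only direct children of the class body count as its methods.
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Positional parameters only; enough for a sketch.
                args = ", ".join(arg.arg for arg in item.args.args)
                conn.execute(
                    "INSERT INTO methods (class_id, name, args, lineno) "
                    "VALUES (?, ?, ?, ?)",
                    (class_id, item.name, args, item.lineno),
                )
    conn.commit()


def methods_of(conn, class_name):
    """Return (method name, parameter list) pairs for a class, in source order."""
    return conn.execute(
        "SELECT m.name, m.args FROM methods m "
        "JOIN classes c ON c.id = m.class_id "
        "WHERE c.name = ? ORDER BY m.lineno",
        (class_name,),
    ).fetchall()


# Tiny stand-in for a real codebase.
SAMPLE = '''
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return "..."


class Dog(Animal):
    def speak(self):
        return "Woof"
'''

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_source(conn, SAMPLE)
print(methods_of(conn, "Animal"))
```

The "specialized interface" for the LLM could then be a handful of functions like `methods_of`, exposed as tools, rather than free-form SQL. That part is left open here; attributes, call graphs, and other languages would each need more tables and a different parser.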

…in reply to @bouvyd
This way, the LLM could tap into the full class structure, answer high-level questions, and truly "understand" the code. When I paste an entire class into the AI's context, it gets it perfectly. So why not make this process programmatic?
Permalink On twitter.com 2024 Aug 20 Mood +3 🙂
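
One rough way to automate the "paste the whole class" step, again assuming Python source: look the class up by name in the syntax tree and return its complete definition, instead of whatever fragments a similarity search happens to surface. The helpers `class_source` and `build_prompt` are hypothetical names for this sketch, and the prompt wording is only a placeholder.

```python
import ast


def class_source(source, class_name):
    """Return the full source text of the named class, or None if absent.

    Note: the segment starts at the `class` keyword, so decorators
    placed above the class are not included.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return ast.get_source_segment(source, node)
    return None


def build_prompt(source, class_name, question):
    """Put the whole class into the context, followed by the question."""
    body = class_source(source, class_name)
    if body is None:
        raise LookupError(f"class {class_name!r} not found")
    return f"Here is the full class:\n\n{body}\n\nQuestion: {question}"


# Tiny stand-in for a real codebase.
SAMPLE = '''
class Animal:
    def speak(self):
        return "..."


class Dog(Animal):
    def speak(self):
        return "Woof"
'''

print(build_prompt(SAMPLE, "Dog", "What does speak() return?"))
```

The design choice is that retrieval is keyed on code structure (a class name) rather than on embedding distance, so the model always sees a complete unit of code.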

…in reply to @bouvyd
I've looked around for solutions like this but haven’t found much... Has anyone out there stumbled upon something similar? Would love to hear your thoughts! 💡
Permalink On twitter.com 2024 Aug 20 Mood +9 🙂