[D] Is this close enough to be usable? Need your inputs: Automated RAG testing tool. AI Data Pipelines for Real-World Production (Part 3)

Snoo-bedooo
April 15, 2025
reddit

Hey there, Redditors!

I'm back with the latest installment on creating dependable AI data pipelines for real-world production.

If you've been following along, you know I'm on a mission to move beyond the "thin OpenAI wrapper" trend and tackle the challenges of building robust data pipelines.

With 18 months of hands-on experience and many user interviews, I realized that, given the probabilistic nature of these systems, we need better testing:

1. Test as you build
AI moves fast, and we've realized that building systems without testing them along the way is not an optimal design choice. By the time your product ships, it might already be using outdated technology. So, what's the lesson here? Embrace change, test as you go, and be prepared to change pace.
2. No Best Practices Yet for RAGs
In this rapidly evolving landscape, there are no established best practices. You'll need to make educated bets on tools and processes, knowing that things will change. With the RAG testing tool, I tried to make it possible to test many potential parameter combinations automatically.
3. Testing Frameworks
If your generative AI product doesn't have users giving feedback, then you are building in isolation. I used Deepeval to generate test sets, and it will soon support synthetic test set generation.
4. Infographics only go so far
AI researchers and data scientists, while brilliant, end up in a loop of chasing promotional content on Twitter. New approaches get promoted through new content pieces, but ideally we need something more than simple tracing yet less than full-fledged analytics. To do this, I stored test outputs in Postgres and created a Superset instance to visualize the results.
5. Bridging the Gap between VectorDBs
There's a noticeable number of vector DBs. To ensure smooth product development, we need to be able to switch to the best-performing one, especially since user interviews signal that they might start deteriorating after loading 50 million rows.
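
To make points 2 and 5 a bit more concrete, here is a rough sketch of what sweeping parameter combinations over a swappable vector store could look like. This is not the code from the repo. The `VectorStore` interface, the toy `KeywordOverlapStore` backend, the `run_grid` helper, the hit-rate metric, and the sample data are all made up for illustration, and a real run would plug in adapters for actual vector DBs and a proper eval framework.

```python
from dataclasses import dataclass
from itertools import product
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface each backend adapter would implement."""

    def add(self, doc_id: str, text: str) -> None: ...
    def search(self, query: str, top_k: int) -> list[str]: ...


class KeywordOverlapStore:
    """Toy in-memory stand-in for a real vector DB; ranks by shared words."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def search(self, query: str, top_k: int) -> list[str]:
        q = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(q & set(self.docs[d].lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


def chunk(text: str, size: int) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


@dataclass
class RunResult:
    backend: str
    chunk_size: int
    top_k: int
    hit_rate: float  # share of test queries whose expected doc was retrieved


def run_grid(corpus, test_cases, backends, chunk_sizes, top_ks):
    """Evaluate every (backend, chunk_size, top_k) combination."""
    results = []
    for (name, factory), size, k in product(backends.items(), chunk_sizes, top_ks):
        store = factory()  # fresh store per combination
        for doc_id, text in corpus.items():
            for n, piece in enumerate(chunk(text, size)):
                store.add(f"{doc_id}#{n}", piece)
        hits = sum(
            any(cid.split("#")[0] == expected for cid in store.search(q, k))
            for q, expected in test_cases
        )
        results.append(RunResult(name, size, k, hits / len(test_cases)))
    return results


# Made-up sample data, only to show the shape of a run.
corpus = {
    "postgres": "Postgres stores test outputs in tables that dashboards can query",
    "superset": "Superset draws charts from stored results for visual comparison",
}
test_cases = [
    ("where are test outputs stored", "postgres"),
    ("which tool draws charts", "superset"),
]
results = run_grid(corpus, test_cases, {"toy": KeywordOverlapStore}, [4, 8], [1, 2])
for r in sorted(results, key=lambda r: r.hit_rate, reverse=True):
    print(r)
```

Each `RunResult` is one flat row, so in a setup like the one in point 4 the rows could go into a Postgres table and be charted from there. Swapping vector DBs then only means adding another entry to the `backends` dict.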

Github repo is here

Next steps:
I have questions for you:

  1. What variables do you change when building RAGs?
  2. Which retrieval strategies should I add to the solution? (parent-child, etc.)
  3. How can I improve it in general?
  4. Is anyone interested in a leaderboard for best parameter configs?

Check out the blog post:

Link to part 3

Remember to give this post an upvote if you found it insightful!
And also star our GitHub repo!
