
Hey there, Redditors!

I'm back with the latest installment on creating dependable AI data pipelines for real-world production.

If you've been following along, you know I'm on a mission to move beyond the "thin OpenAI wrapper" trend and tackle the challenges of building robust data pipelines.

After 18 months of hands-on experience and many user interviews, I realized that, given the probabilistic nature of these systems, we need better testing:

1. Test as you build
The world of AI is a fast-moving one, and we've realized that building a system without testing it along the way is not a sound design choice. By the time your product ships, it might already be using outdated technology. So, what's the lesson here? Embrace change, test as you go, and be prepared to change direction.
2. No Best Practices Yet for RAGs
In this rapidly evolving landscape, there are no established best practices. You'll need to make educated bets on tools and processes, knowing that things will change. With the RAG testing tool, I tried to make it possible to test many potential parameter combinations automatically.
3. Testing Frameworks
If your generative AI product doesn't have users giving feedback, then you are building in isolation. I used Deepeval to generate test sets, and they will soon support synthetic test set generation.
4. Infographics only go so far
AI researchers and data scientists, while brilliant, end up in a loop of chasing promotional content on Twitter. New approaches get promoted through new content pieces, but what we actually need is something more than simple tracing and less than full-fledged analytics. To do this, I stored test outputs in Postgres and created a Superset instance to visualize the results.
5. Bridging the Gap between VectorDBs
There's a noticeable number of Vector DBs. To ensure smooth product development, we need to be able to switch to the best-performing one, especially since user interviews signal that they might start deteriorating after loading 50 million rows.
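
To make point 2 a bit more concrete, here is a minimal sketch of what "testing many parameter combinations automatically" could look like. It is not the code from the repo. The parameter names (`chunk_size`, `chunk_overlap`, `top_k`, `vector_store`), their values, and the `dummy_evaluate` scorer are all assumptions made up for illustration; a real scorer would run the RAG pipeline against a test set.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List

# Hypothetical search space: these names and values are illustrative
# assumptions, not the parameters the actual tool uses.
PARAM_GRID: Dict[str, list] = {
    "chunk_size": [256, 512],
    "chunk_overlap": [0, 64],
    "top_k": [3, 5],
    "vector_store": ["store_a", "store_b"],
}


def iter_configs(grid: Dict[str, list]) -> Iterator[dict]:
    """Yield one dict per combination in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def run_grid(grid: Dict[str, list], evaluate: Callable[[dict], float]) -> List[dict]:
    """Score every configuration and return the rows sorted best-first."""
    rows = [{**cfg, "score": evaluate(cfg)} for cfg in iter_configs(grid)]
    return sorted(rows, key=lambda row: row["score"], reverse=True)


def dummy_evaluate(cfg: dict) -> float:
    """Stand-in scorer (assumption); replace with a real pipeline evaluation."""
    return cfg["top_k"] / cfg["chunk_size"]


if __name__ == "__main__":
    for row in run_grid(PARAM_GRID, dummy_evaluate)[:3]:
        print(row)
```

Each result is a flat dict, so it could be written as one row per run to a results table (for example the Postgres store from point 4). Treating the vector store as just another grid axis is one way to compare backends side by side, as in point 5.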

GitHub repo is here

Next steps:
I have questions for you:

What variables do you change when building RAGs?
Which strategies should I add to the solution? (parent-child etc.)
How can I improve it in general?
Is anyone interested in a leaderboard for best parameter configs?

Check out the blog post:

Link to part 3

Remember to give this post an upvote if you found it insightful!
And also star our GitHub repo!

[D] Is this close enough to be usable? Need your inputs: Automated RAG testing tool. AI Data Pipelines for Real-World Production (Part 3)
