DSPy Unleashed: We Built a Self-Improving System That Teaches Anything to Anyone

How we're using DSPy to create an autonomous education engine that gets smarter with every question it generates

Oct 03, 2025

CONTEXT

Remember my last article where I wondered if DSPy could solve real problems? Well, I’ve been building something massive that I can’t fully reveal yet (proprietary magic), but oh my - DSPy isn’t just solving problems, it’s enabling an entirely new category of AI systems.

We’re talking tens of thousands of questions. Self-improving pipelines. Agentic workflows that learn from their own outputs. All optimized for Arabic education with some seriously spiky opinions about how learning should work.

Spoiler: This isn’t about prompt engineering anymore. It’s about systems that evolve.

Useful or Not: Declarative Self-improving Python

Stanislav Huseletov

August 13, 2025

I was asked a question: how can Declarative Self-improving Python (DSPy) be used to create optimized prompts and pipelines that can replace our existing ways of doing things?

What We Actually Built (and Why DSPy Was Perfect)

Imagine an AI system that doesn’t just generate educational content - it understands what makes great teaching, learns from every interaction, and gets better autonomously. That’s what DSPy enabled us to build.

Traditional AI works like this: Generate content → Hope it’s good → Manual review → Maybe improve prompts.

Our DSPy system? Generate → Validate → Learn → Optimize → Repeat... automatically, forever. Every single piece of content makes the system smarter. Not through retraining models (that’s so 2023), but through DSPy’s compilation and optimization of the prompts and few-shot examples each step uses. The system literally discovers better patterns for teaching as it runs.
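To make the loop above concrete, here is a minimal sketch in plain Python. It is not our production code and it does not call DSPy. The names `SelfImprovingGenerator`, `stub_generate`, and `stub_validate` are hypothetical, and the "model" is a stub. The "Optimize" step is reduced to the simplest possible form: validated outputs are kept as demonstrations for later runs. In a real DSPy setup that step would be an optimizer run over the accumulated examples.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Question:
    topic: str
    text: str


@dataclass
class SelfImprovingGenerator:
    # generate(topic, demos) stands in for a model call (hypothetical signature).
    generate: Callable[[str, List[Question]], Question]
    # validate(question) returns True when the output is good enough to keep.
    validate: Callable[[Question], bool]
    demos: List[Question] = field(default_factory=list)
    max_demos: int = 4

    def run(self, topic: str) -> Optional[Question]:
        candidate = self.generate(topic, self.demos)  # Generate
        if not self.validate(candidate):              # Validate
            return None
        # Learn: keep the validated output, dropping the oldest demo if full.
        self.demos = (self.demos + [candidate])[-self.max_demos:]
        return candidate


def stub_generate(topic: str, demos: List[Question]) -> Question:
    # Toy stand-in for an LM: it only reports how many demos it was shown.
    return Question(topic, f"What is {topic}? (demos seen: {len(demos)})")


def stub_validate(question: Question) -> bool:
    return "?" in question.text and len(question.text) < 200


engine = SelfImprovingGenerator(stub_generate, stub_validate)
first = engine.run("fractions")
second = engine.run("ratios")
```

The second call sees one demonstration that the first call produced, which is the whole point of the loop: later runs start from what earlier runs got right.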

Here’s the beautiful part about DSPy - it’s incredibly versatile. We’re running it on top of RAG systems, plugging in custom models like Falcon H1-34B (an Arabic-specialized beast of a model), and DSPy just... handles it. Want to swap from GPT-4 to a local Falcon deployment? Change one line. Want to add RAG retrieval to any step? DSPy’s got you. This flexibility meant we could optimize for Arabic education without being locked into Western-centric models.
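A rough illustration of why the swap is cheap: the pipeline code depends only on a "prompt in, text out" interface, so the model becomes a configuration value. The sketch below is plain Python with stub models. The registry, the labels, and the `explain` function are assumptions made up for this example. In DSPy itself the equivalent move is constructing a different `dspy.LM(...)` and passing it to `dspy.configure(lm=...)`.

```python
from typing import Callable, Dict, List, Optional

LM = Callable[[str], str]               # prompt in, completion out
Retriever = Callable[[str], List[str]]  # query in, passages out


def gpt4_stub(prompt: str) -> str:
    return f"[gpt-4] {prompt}"


def falcon_stub(prompt: str) -> str:
    return f"[falcon-h1-34b] {prompt}"


# Hypothetical registry; the keys are illustrative labels, not real endpoints.
MODELS: Dict[str, LM] = {"gpt-4": gpt4_stub, "falcon-h1-34b": falcon_stub}

ACTIVE_MODEL = "falcon-h1-34b"  # the one line that changes when swapping models


def explain(concept: str, lm: LM, retrieve: Optional[Retriever] = None) -> str:
    prompt = f"Explain {concept} to an 8th grader."
    if retrieve is not None:
        # Optional RAG step: prepend retrieved passages to the prompt.
        prompt = "Context: " + " | ".join(retrieve(concept)) + "\n" + prompt
    return lm(prompt)


answer = explain("algebra", MODELS[ACTIVE_MODEL])
```

Passing a `retrieve` function adds retrieval to that one step without touching the model choice, and changing `ACTIVE_MODEL` changes the model without touching the step.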

The Arabic education angle matters here. Most EdTech treats Arabic as an afterthought - translate English content and call it done. That’s nonsense. Arabic education has its own pedagogical traditions, its own ways of explaining concepts, its own cultural contexts that make learning stick. We built this into the system’s DNA. Not as hardcoded rules, but as learned behaviors that DSPy optimizes for. And with Falcon H1-34B understanding Arabic natively, we’re not fighting against the model - we’re working with it.

The DSPy Breakthroughs That Changed Everything

Most DSPy examples you see optimize a single prompt or step. We found you can chain stages where each one has its own retrieval strategy, its own validation criteria, its own optimization path. This isn’t RAG anymore. It’s something new. Each stage learns independently but contributes to a coherent whole.
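Here is one way to picture stages that each carry their own retrieval and validation. This is a sketch under heavy assumptions, not our architecture: the `Stage` class, the two toy stages, and the `CURRICULUM` and `HINTS` dictionaries are invented for illustration, and the `transform` functions stand in for model calls. In DSPy each stage would more naturally be a `dspy.Module` with its own `forward` method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str
    retrieve: Callable[[str], List[str]]        # stage-specific retrieval
    transform: Callable[[str, List[str]], str]  # stands in for a model call
    validate: Callable[[str], bool]             # stage-specific acceptance check

    def __call__(self, text: str) -> str:
        context = self.retrieve(text)
        output = self.transform(text, context)
        if not self.validate(output):
            raise ValueError(f"stage {self.name!r} rejected its output")
        return output


def run_pipeline(stages: List[Stage], seed: str) -> str:
    text = seed
    for stage in stages:
        text = stage(text)
    return text


# Toy knowledge sources (invented for illustration).
CURRICULUM: Dict[str, str] = {"fractions": "adding fractions with unlike denominators"}
HINTS: Dict[str, str] = {"fractions": "Find a common denominator first."}


def lookup(source: Dict[str, str]) -> Callable[[str], List[str]]:
    # Build a simple keyword retriever over one source.
    return lambda text: [value for key, value in source.items() if key in text]


draft = Stage(
    name="draft",
    retrieve=lookup(CURRICULUM),
    transform=lambda topic, ctx: f"Question on {topic}: {'; '.join(ctx)}",
    validate=lambda out: out.startswith("Question on"),
)
hint = Stage(
    name="hint",
    retrieve=lookup(HINTS),
    transform=lambda text, ctx: f"{text}. Hint: {ctx[0]}" if ctx else text,
    validate=lambda out: "Hint:" in out,
)

result = run_pipeline([draft, hint], "fractions")
```

Because each stage owns its retriever and its validator, a failure is attributed to a specific stage by name, which is what makes it possible to improve stages independently.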

Here’s what blew my mind - DSPy compilation isn’t just about accuracy. It’s about discovering the shortest path to excellence. Our system now generates content with:

30% fewer tokens (massive cost savings)
40% faster execution
Higher quality than our hand-crafted prompts

And it found these optimizations itself. We just gave it examples of good outputs.

The real magic is this pattern we call Accuracy Mode vs Production Mode. Build a perfect example with expensive models and extensive validation. Then compile that behavior into a fast, cheap production system running on Falcon H1. You get 95% of the quality at 30% of the cost. But here’s the kicker - the production system keeps learning. Every run potentially improves the next.
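The Accuracy Mode vs Production Mode idea can be sketched as a teacher/student handoff. The code below is a toy in plain Python, not our implementation: `bootstrap_demos`, `compile_student`, and the stub teacher and student are hypothetical names, and both "models" are simple functions. DSPy's `BootstrapFewShot` optimizer works along broadly similar lines, with a teacher program producing demonstrations that are filtered by a metric and handed to the student.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (input topic, accepted output)


def bootstrap_demos(
    teacher: Callable[[str], str],
    metric: Callable[[str, str], bool],
    topics: List[str],
    max_demos: int = 4,
) -> List[Demo]:
    """Accuracy mode: run the expensive teacher, keep outputs that pass the metric."""
    demos: List[Demo] = []
    for topic in topics:
        output = teacher(topic)
        if metric(topic, output):
            demos.append((topic, output))
        if len(demos) >= max_demos:
            break
    return demos


def compile_student(
    student: Callable[[str, List[Demo]], str], demos: List[Demo]
) -> Callable[[str], str]:
    """Production mode: freeze the accepted demos into the cheap student."""
    return lambda topic: student(topic, demos)


def teacher(topic: str) -> str:
    # Stand-in for a slow, expensive model that already writes hints.
    return f"What is {topic}? Hint: recall the definition of {topic}."


def student(topic: str, demos: List[Demo]) -> str:
    # Stand-in for a cheap model: it adds a hint only if its demos show one.
    question = f"What is {topic}?"
    if any("Hint:" in output for _, output in demos):
        question += f" Hint: recall the definition of {topic}."
    return question


def has_hint(topic: str, output: str) -> bool:
    return topic in output and "Hint:" in output


demos = bootstrap_demos(teacher, has_hint, ["fractions", "ratios", "decimals"], max_demos=2)
production = compile_student(student, demos)
```

Taking the figures above at face value, 95% of the quality at 30% of the cost works out to roughly 3.2 times more quality per unit of spend (0.95 / 0.30 ≈ 3.17).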

Look, here’s what matters: We’re getting better results than just throwing prompts at GPT-4 and hoping for the best. Why? Because DSPy lets us control every single step. Not just the generation - the validation, the scaffolding, the image creation, everything. It’s not a black box anymore. You send a prompt to GPT and you get... something. Maybe it’s good, maybe not. With DSPy? We know it’s good because we validated it six different ways, and if it wasn’t good, the system fixed it automatically.

The system found teaching patterns we never programmed. It discovered that certain explanation structures work better for Arabic learners. It identified question types that maximize engagement. All autonomously. Every output goes through multiple validation stages, but failed validations become training data. The system learns from its mistakes in real-time.
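A minimal sketch of "validate several ways, fix automatically, keep the failures" might look like this. Everything here is an assumption for illustration: the two validators, the `generate_validated` helper, and the stub generator that repairs its draft when it receives feedback. A real system would have more checks and a real model behind `generate`.

```python
from typing import Callable, List, Optional, Tuple

Validator = Callable[[str], Optional[str]]  # returns an error message, or None if OK


def ends_with_question_mark(draft: str) -> Optional[str]:
    return None if draft.endswith("?") else "must end with a question mark"


def is_short(draft: str) -> Optional[str]:
    return None if len(draft.split()) <= 20 else "must be 20 words or fewer"


def generate_validated(
    generate: Callable[[str, List[str]], str],
    validators: List[Validator],
    topic: str,
    max_attempts: int = 3,
) -> Tuple[str, int, List[Tuple[str, List[str]]]]:
    failure_log: List[Tuple[str, List[str]]] = []  # rejected drafts and reasons
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        draft = generate(topic, feedback)
        errors = [msg for msg in (check(draft) for check in validators) if msg]
        if not errors:
            return draft, attempt, failure_log
        failure_log.append((draft, errors))  # kept as future training data
        feedback = errors                    # fed into the next attempt
    raise RuntimeError(f"no valid draft for {topic!r} after {max_attempts} attempts")


def stub_generate(topic: str, feedback: List[str]) -> str:
    # Toy stand-in for a model that repairs its draft when told what was wrong.
    if any("question mark" in msg for msg in feedback):
        return f"How would you explain {topic}?"
    return f"Explain {topic}"


draft, attempts, log = generate_validated(
    stub_generate, [ends_with_question_mark, is_short], "fractions"
)
```

The returned log pairs each rejected draft with the reasons it failed, so the failures are available later as negative examples or as input to an optimizer run.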

What This Means (And What We Can’t Say Yet)

We’re not translating Western pedagogy. We’re building something that understands Arabic educational philosophy from the ground up. Mixed-script mathematics. Cultural examples that resonate. Explanations that follow Arabic rhetorical patterns. This isn’t localization. It’s educational system redesign.

Traditional systems give everyone the same content, maybe filtered by grade. Our system generates content adapted to specific curricula, cultural contexts, and learning styles. In real-time. At scale. We don’t manually review and improve prompts anymore. The system identifies its own weaknesses and fixes them. It’s like having a team of curriculum designers working 24/7, except they never sleep, and they learn from every single output.

There’s so much more I can’t share - the specific architectures, the optimization techniques, the ways we made DSPy do things it wasn’t designed for. But I can tell you this: We’re generating educational content at a scale and quality that shouldn’t be possible. Tens of thousands of perfectly crafted questions. Each one validated, scaffolded, and culturally appropriate. And it’s getting better every day. Automatically.

This isn’t about replacing teachers. It’s about giving every student access to personalized, culturally relevant, constantly improving educational content. A system that knows exactly how to explain algebra to an Arabic-speaking 8th grader. Content that adapts to local curricula while maintaining global standards. Questions that get more effective based on actual learning outcomes.

DSPy isn’t just another AI framework. It’s the key to building systems that improve themselves. We’re done with static AI - the future is systems that evolve.

Follow for technical deep dives once we can share more.

Emin

Oct 3

This is interesting, because current government curriculum materials are boring. It’s difficult to motivate kids to learn when the mandated books are so dull; if they tried to make them even duller, I don’t know how they’d pull it off. I love the concept, but I’m not sure about the details for this DSPy. Is this UAE-only? Who is it for—edtech software developers building a product, educational institutions, or students? Is this an app ready to use, or a backend for building an app?

Oct 8

If you are interested in learning more about this DSPy everyone is talking about: https://open.substack.com/pub/trilogyai/p/useful-or-not-dspy
