This is an autopost blog, friends. We try to provide all the latest sports, news, and new updates for you.
Thursday, June 5, 2025
Show HN: Create LLM graders and run evals in JavaScript with one file https://ift.tt/0knyjTt
Hi HN!

Run it: OPENROUTER_API_KEY="sk" npx bff-eval --demo

We built a tool that helps people take LLM outputs and easily grade (eval) them, so you know how good an assistant response is.

We've built a number of LLM apps, and while we could ship decent tech demos, we were disappointed with how they'd perform over time. We worked with a few companies who had the same problem, and found that building prompts and evals scientifically is far from a solved problem... writing these things feels more like directing a play than coding.

Inspired by Anthropic's Constitutional AI concepts and amazing software like DSPy, we're setting out to make fine-tuning prompts, not models, the default approach to improving quality, using actual metrics and structured debugging techniques.

Our approach is pretty simple: you feed it a JSONL file with inputs and outputs, pick the models you want to test against (via OpenRouter), and then use an LLM-as-grader file in JS that figures out how well your outputs match the original queries.

If you're starting from scratch, we've found TDD is a great approach to prompt creation... start by asking an LLM to generate synthetic data, then act as the first judge and score it yourself, then create a grader and keep refining it until its scores match your ground-truth scores.

If you're building LLM apps and care about reliability, I hope this will be useful! Would love any feedback. The team and I are lurking here all day and happy to chat. Or hit me up directly on WhatsApp: +1 (646) 670-1291

We have much bigger plans long-term, but we wanted to start with this simple (and hopefully useful!) tool.

https://ift.tt/QDwMlVI June 5, 2025 at 09:50PM
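To make the "refine the grader until it matches your ground truth" loop concrete, here is a minimal sketch in plain JavaScript. It is not bff-eval's API: the JSONL field names (input, output, humanScore), the -3 to 3 score range, and the helpers parseJsonl, stubGrader, and compareToGroundTruth are all assumptions made up for illustration. The grader is a canned stand-in so the example runs offline; a real one would call an LLM.

```javascript
// Hypothetical sample format: one JSON object per line. The field names
// (input, output, humanScore) and the -3..3 scale are assumptions for this
// sketch, not bff-eval's actual schema.
const jsonl = `
{"input": "What is 2 + 2?", "output": "4", "humanScore": 3}
{"input": "Capital of France?", "output": "Lyon", "humanScore": -3}
{"input": "Name a primary color.", "output": "Red, probably.", "humanScore": 2}
`;

// Parse JSONL: drop blank lines, then parse each remaining line as JSON.
function parseJsonl(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Stand-in for an LLM grader. It returns canned scores keyed by the output
// text, and 0 for anything it has not seen.
function stubGrader(sample) {
  const canned = { "4": 3, "Lyon": -3, "Red, probably.": 1 };
  return canned[sample.output] ?? 0;
}

// Compare grader scores with human scores and report where they diverge.
function compareToGroundTruth(samples, grader, tolerance = 0) {
  const rows = samples.map((sample) => {
    const graderScore = grader(sample);
    const diff = Math.abs(graderScore - sample.humanScore);
    return { input: sample.input, humanScore: sample.humanScore, graderScore, diff };
  });
  const totalDiff = rows.reduce((sum, row) => sum + row.diff, 0);
  return {
    meanAbsError: totalDiff / rows.length,
    disagreements: rows.filter((row) => row.diff > tolerance),
  };
}

const samples = parseJsonl(jsonl);
const report = compareToGroundTruth(samples, stubGrader);

console.log("mean absolute error:", report.meanAbsError.toFixed(2));
console.log("disagreements:", report.disagreements.map((row) => row.input));
```

The disagreement list is the useful part of the loop: each entry is a case where the grader's rubric and your own judgment differ, which tells you what to change in the grader prompt before re-running.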