Thursday, July 2, 2026

Show HN: Piggy – lazy senior dev mode for AI agents (80–94% less code) https://ift.tt/NnjstZG

Show HN: Piggy – lazy senior dev mode for AI agents (80–94% less code) https://ift.tt/bi3PHOK July 3, 2026 at 12:59AM

Show HN: A provider-agnostic agent loop built on ports and adapters https://ift.tt/SkjxNKF

Show HN: A provider-agnostic agent loop built on ports and adapters I work on agent infra at Featherless. This is MIT and works with any OpenAI-compatible endpoint, not just ours. I kept rebuilding the same loop: call model, run tools, feed results back, stop. Every framework I tried either owned the UI, owned the control flow, or dragged a dependency tree. So I pulled the loop out and put every piece behind an interface: memory, model, tools, stop condition. The loop depends only on the interfaces. It never writes to a screen. It emits one typed event stream, so a trace is just data, and you render it however you want. The landing page scrubs one run and rebuilds a CLI, a DOM timeline, and raw JSONL from the same stream. One dependency (zod). Same build runs in Node, Bun, Deno, and a browser tab. Every seam is tested in isolation with deterministic doubles, no network. Why not the Vercel AI SDK, pi, or LangGraph: AI SDK owns more of the surface and has been awkward with self-hosted tool calling. pi is a great coding-agent toolkit but it's shaped around being a coding agent and ships a TUI. LangGraph is a heavier graph framework. This is the layer under all of those: the bare loop you'd build any of them on. Happy to be told where the seams are wrong. If anyone finds any problems let me know this field moves at break neck speed so let me know if I am missing anything. https://ift.tt/6hECm2k July 3, 2026 at 12:52AM

Show HN: Inkwell – An RSS reader for e-ink devices https://ift.tt/TCNyu5j

Show HN: Inkwell – An RSS reader for e-ink devices https://ift.tt/DHSyVBX July 2, 2026 at 09:08PM

Show HN: ctx – Search the coding agent history already on your machine https://ift.tt/FGkAil5

Show HN: ctx – Search the coding agent history already on your machine Coding agents don't have long-term memory. But you do have months of full-fidelity agent transcripts stored on your machine. A simple solution that goes a long way: ingest those transcripts and logs into a structured SQLite database, then search them with ranked text match. Everything is fully local and doesn't require anything fancy like a graph database or hosted memory service. This is the idea behind ctx, a Rust CLI that handles the ingestion and searching. We give our agents a skill that tells them to reference past sessions before working in an area. Usually we do this through an "Agent History Research Subagent" whose job is just to prepare a short brief covering any relevant history before the task begins. A real example: sometimes our test suite runs would fail because disk was full on the runner. The correct approach was to run the cleanup runbook, but the root cause of the failure was not clear to the agents, so they would think it was a test regression and go down the wrong rabbit hole debugging. When the agent searched history, it realized this failure had been encountered before and found the right workaround immediately. That got the agent onto the right cleanup path, and later we improved the log output so the same failure would be clearer next time. It's a boring story, but it's real agent productivity. Another nice use case is quickly generating session transcripts for sharing. You can exclude the noisy intermediate messages, so the transcript shows the important parts of the session more cleanly. Try attaching a session transcript to your next PR so your teammate and their agent can review the provenance and prompting behind the change. If you're up for an additional challenge, ask your agent to "exhaustively review all agent history in this repo and find where the SDLC is struggling or isn't agent-native". Using past sessions to recursively improve the agentic SDLC is a loop that we're using a lot today. If you try it out, please let us know what you think! https://ift.tt/XrnplYJ July 2, 2026 at 09:28PM

Wednesday, July 1, 2026

Show HN: Searchable directory of 22k+ products from worker-owned co-ops https://ift.tt/61wyXHb

Show HN: Searchable directory of 22k+ products from worker-owned co-ops https://ift.tt/v2Aya0R July 2, 2026 at 02:17AM

Show HN: Z-Jail – A 130 KB Linux sandbox-C99 with 7 defense layers and zero deps https://ift.tt/QZ5n6wC

Show HN: Z-Jail – A 130 KB Linux sandbox-C99 with 7 defense layers and zero deps https://ift.tt/QulTK1E July 2, 2026 at 12:48AM

Show HN: QR code renderer in a TrueType font https://ift.tt/GhjbSnx

Show HN: QR code renderer in a TrueType font In the "Libre Barcode Project" discussion yesterday, 1bpp asked: "Is anyone willing to sacrifice their sanity for the sake of implementing a QR renderer as TTF hinting code?" Yes. I had some tokens to burn and was curious... turns out, it's possible. This was put together by a mix of Gemini, GPT, and Claude (depending on which usage limits kept running out). https://qr.jim.sh/ June 28, 2026 at 06:07AM

Tuesday, June 30, 2026

Show HN: Shot-scraper video tool for recording YAML-defined webapp feature demos https://ift.tt/5Si3Fsp

Show HN: Shot-scraper video tool for recording YAML-defined webapp feature demos https://ift.tt/kDyf7Ic June 30, 2026 at 10:28PM

Monday, June 29, 2026

Show HN: Fleet – a local-first console for managing Dockerized Hermes AI Agents https://ift.tt/oftlOpU

Show HN: Fleet – a local-first console for managing Dockerized Hermes AI Agents https://ift.tt/PEwmhkK June 30, 2026 at 02:01AM

Show HN: The UNESCO Tsunami Warning Emails Are Gone https://ift.tt/drnACw2

Show HN: The UNESCO Tsunami Warning Emails Are Gone This key piece of tsunami warning and safety was discontinued this morning and evidently there's no way to get it back. :/ https://ift.tt/UbNvXEd June 29, 2026 at 11:36PM

Sunday, June 28, 2026

Show HN: Use-zerostack – delegate any task to a lightweight coding agent https://ift.tt/PluCqiK

Show HN: Use-zerostack – delegate any task to a lightweight coding agent https://ift.tt/BerTGPY June 29, 2026 at 01:03AM

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch https://ift.tt/HBuk6Yf

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch Hi everyone, I started working on nanoeuler after the ban of anthropic's fable because my ambition and dream is to work in the AI field in anthropic. The two interesting reasons that led me to create nanoeuler were (1) interfacing with llm does not mean understanding how they are composed and (2), working on llm with a very low-level layer to understand the correlation between parameters and data and growth of the model and how the GPU works and how some layers can be optimized. So I started working on it with a research aspect by making nanoeuler grow more and more but doing one step after another starting from Shakespeare.txt and understanding what a text generation model understands at 23 million parameters. For example, nanoeuler at that number had understood that Name: started a line and wrote that line with sense. I wrote everything in CUDA because I wanted to not use any intermediary between the model in training and inference and what it had to do. Then the use of SFT and much more, even if in small ways, were really useful to understand the various step to make an llm like a chatbot.Any feedback, help, or suggestions are absolutely welcome! https://ift.tt/aVdXS2O June 29, 2026 at 01:08AM

Show HN: Caliper – pass@k reliability testing for Claude Code and Codex skills https://ift.tt/qgayukA

Show HN: Caliper – pass@k reliability testing for Claude Code and Codex skills Skills for Claude Code and Codex are hard to test. What I mean by hard is that there's no standard way to do it. You evaluate the skill once on something, it looks like it works. You publish it. Then the new super model releases (GLM 5.2 anyone?), it will quietly break for some part, and you won't find out until your users complain. I also faced the same problem, so I tried to build something lightweight to stop doing that. Caliper. It's a local and lightweight harness that runs a skill k times in isolated environments and gives you a pass@k score (How much times it succeeded in these k times). As a non-deterministic technology, you can't just say "it worked once". You need to answer how much it passed in k times. You define success in a YAML spec. I picked YAML to keep a schema and make it still readable for a human. You either use a LLM judge, a Python assertion, or both: Here's an simple evaluation example with a JSON extraction, so you write this in a YAML file: tasks: - name: Extracts action items as clean JSON prompt: "Read /tmp/transcript.txt and write the action items to /tmp/actions.json." expect: "A valid JSON array where every item has owner, task, due. No markdown fences." assert: | import json items = json.load(open("/tmp/actions.json")) assert isinstance(items, list) assert all({"owner","task","due"} <= i.keys() for i in items) Then with the CLI, you'll run it: caliper run extract-actions.eval.yaml --k 5 --baseline What's cool about the --baseline flag is that it will re-runs everything without the skill, so you can see whether the skill is doing the work or the base agent was going to pass anyway: ID Task k(5) pass@k task-1 Extracts action items as JSON 5/5 100% PASS With skill 100% No skill 60% Delta +40% Most models know how to get the JSON right most of the time (JSON extraction was solved by 2 years old already). But that's it, "most of the time" is the bug. That delta shows how the skill actually helped. (It's sometimes 0%, sometimes -100%!) I also created two skills you can get started right away with your favorite harness, e.g. Claude Code, Codex or Pi: - evaluate-skill: run and manage evals without leaving your workflow - grill-skill: reads your SKILL.md, interviews you about what "good" looks like, writes a 3-task spec (happy path, edge case, adversarial), and runs it You can install the skill with the command: npx skills@latest add edonadei/caliper I for now support claude-code, codex, pi, claude-api, openai-api. You can run the agent and the judge as separate backends, so you can run a skill on one and judge with another. GitHub: https://github.com/edonadei/caliper PyPI: https://pypi.org/project/caliper-eval/ Of course, it's a first step. I think the autorater layer can be vastly improved, more handholding to create and iterate on evaluation specs, supporting more harness, why not including this layer into a self-improvement bigger system? If you're also building agentic evaluations, I'm genuinely interested to hear how you are handling that. https://github.com/edonadei/caliper June 28, 2026 at 11:12PM

Saturday, June 27, 2026

Show HN: Starglyphs - A constellation puzzle game based on Euler paths https://ift.tt/9jCuPHo

Show HN: Starglyphs - A constellation puzzle game based on Euler paths I am a big Dragon Age fan and sunk hundreds of hours into Inquisition. It had this minigame called astrariums where you had to solve these shapes based on constellation guides by tracing stars. I'm a hobby game dev and wondered if I could procedurally generate these puzzles so they were always solvable. Turns out you can, so I built a space puzzle game around it with a colorful aesthetic. I released it in web form here but I'm currently working on getting it on Steam and mobile. https://starglyphs.com June 28, 2026 at 03:20AM

Show HN: Adrafinil – keep a lid-closed Mac awake only while agents work https://ift.tt/uYUrhoj

Show HN: Adrafinil – keep a lid-closed Mac awake only while agents work A month ago there was a wave of posts and tweets about engineers walking around cafes and parks with their MacBooks propped half-open, as fully closing the lid forces sleep that stops their AI agents. Some people made snarky comments about using tmux or Amphetamine, and some defended their choice with “but I only need it sometimes, and forgetting to disable Amphetamine and finding my laptop discharged in my bag is worse.” This is a solution to this problem. Unlike caffeinate, it will prevent your MacBook from sleeping even with the lid closed, with no external power or display, using pmset disablesleep 1. Unlike other sleep-preventing apps, Adrafinil only activates when there’s an agent actively doing something. It detects agent activity through hooks it installs into Claude Code, Codex, and others. To reassure you it’s working, the app shows the active status in the menu bar, and it plays a chime when you close the lid. Once the agent is done, Adrafinil detects it and lets the laptop go to sleep by setting pmset disablesleep back to 0. It will also let it sleep in case of overheating. And if you want to manually toggle it, you can install an optional MCP and tell your agent to keep the MacBook awake for a specific time. It has four binaries, one of which is a root helper exposing a single setSleepBlocked call. All the logic and policy live in the unprivileged parts. They’re all notarized, and the app is fully open source (MIT). https://ift.tt/6YeD5Em June 28, 2026 at 02:04AM

Show HN: Wind particles on Mapbox from a single EXIF JPEG https://ift.tt/xidvVYu

Show HN: Wind particles on Mapbox from a single EXIF JPEG https://ift.tt/tqXv0HP June 27, 2026 at 11:46PM

Show HN: A Living Neural Web in HTML5 Canvas https://ift.tt/gTivKnV

Show HN: A Living Neural Web in HTML5 Canvas https://techoreon.github.io/verpad/canvas-playground.html June 27, 2026 at 10:05PM

Friday, June 26, 2026

Show HN: TBD, a Mac-native CLI-forward coding agent multiplexer https://ift.tt/2I3TKB7

Show HN: TBD, a Mac-native CLI-forward coding agent multiplexer Inspired by Conductor, dmux, claude-squad, agent-deck, and Git Tower ## What makes it different: (Aside from GUI) A core tenet is -- everything a user can do manually, must be exposed via CLI for agents/automation Best paired with something that lets agents in different worktrees talk to each other (e.g. https://ift.tt/HTjYahr ) ## Background: I used and loved Conductor for months starting around January, but hit some persistent issues that made me realize that a core tool that I'm actively using for most of my waking hours sits too close to my skin to produce itches that I can't scratch myself After realizing I needed to switch to something hackable, I went through a few week-ish long trials of dmux, claude-squad, and agent-deck. They were all great, but I then realized I really didn't want to memorize keyboard shortcuts, and I've managed to put off learning how to drive tmux for over a decade, didn't want to end that streak XD So TBD happened in March. In the months since, it's gotten stable enough to the point where a few former and current colleagues have switched to using it as their daily drivers as well. It's been kind of like a fun little club house we contribute to The architecture is a daemon that handles the bulk of state management and actual work, and CLI and GUI clients as two interfaces. Users go through GUI, LLMs and scripts go through CLI. It works best for Claude Code (our shared daily drivers) but two of us also use Codex on the side, so there's some basic support there as well The only way to run it is to clone and build from source, partially b/c I imagine the main appeal is for people who need to hack on the thing they're using (but also b/c didn't want to shell out for an Apple dev license) I think it's now a good enough starting point for similarly minded folks to use as a base to fork and build your own variants, tailored to your own workflows https://ift.tt/Pmz4Fkp June 26, 2026 at 10:29PM

Show HN: Mantis, A self-hosted LLM gateway https://ift.tt/9uEkyB0

Show HN: Mantis, A self-hosted LLM gateway Hey HNers - Riz here. I got together with a few guys and we built an LLM gateway. It's designed for small teams working on early-stage products, and can be deployed to AWS using a single command (i.e. `mantis deploy`). It's self-hosted, and is designed to belong to you. https://ift.tt/tLYE9eq June 27, 2026 at 12:45AM

Show HN: Puzzle with Strangers. A free multiplayer jigsaw https://ift.tt/es4avwt

Show HN: Puzzle with Strangers. A free multiplayer jigsaw I built this over the last few days. Me and handful of friends are successfully hooked. I recently went to a — for lack of a better word – social/collaborative performance at an art gallery in Berlin where a group of artists filled a huge industrial hall with wooden 10x10cm cubes for people to build structures with. It was beautiful how universal the concept of playing with wooden blocks is and how ephemeral the structures were, people of all ages were put back into a childlike play. The thought about what kind of games need zero explanation stuck with me and i built an anonymous multiplayer jigsaw. We've already spent hours in there and you're invited now as well. Hope you enjoy. https://ift.tt/okCpys9 June 26, 2026 at 10:17PM

Show HN: Piggy – lazy senior dev mode for AI agents (80–94% less code) https://ift.tt/NnjstZG

Show HN: Piggy – lazy senior dev mode for AI agents (80–94% less code) https://ift.tt/bi3PHOK July 3, 2026 at 12:59AM