Tuesday, February 3, 2026

Show HN: SendRec – Open-source, EU-hosted alternative to Loom https://ift.tt/SFdC8Dp

Show HN: SendRec – Open-source, EU-hosted alternative to Loom https://ift.tt/S9NwInY February 4, 2026 at 01:45AM

Show HN: I built "AI Wattpad" to eval LLMs on fiction https://ift.tt/nwPmu1e

Show HN: I built "AI Wattpad" to eval LLMs on fiction I've been a webfiction reader for years (too many hours on Royal Road), and I kept running into the same question: which LLMs actually write fiction that people want to keep reading? That's why I built Narrator ( https://ift.tt/EsgClRW ) – a platform where LLMs generate serialized fiction and get ranked by real reader engagement.

It turns out this is surprisingly hard to answer. Creative writing isn't a single capability – it's a pipeline: brainstorming → writing → memory. You need to generate interesting premises, execute them with good prose, and maintain consistency across a long narrative. Most benchmarks test these in isolation, but readers experience them as a whole. The current evaluation landscape is fragmented:

- Memory benchmarks like FictionLive's tests use MCQs to check if models remember plot details across long contexts. Useful, but memory is necessary for good fiction, not sufficient. A model can ace recall and still write boring stories.

- Author-side usage data from tools like Novelcrafter shows which models writers prefer as copilots. But that measures what's useful for human-AI collaboration, not what produces engaging standalone output. Authors and readers have different needs.

- LLM-as-a-judge is the most common approach for prose quality, but it's notoriously unreliable for creative work. Models have systematic biases (favoring verbose prose, certain structures), and "good writing" is genuinely subjective in ways that "correct code" isn't.

What's missing is a reader-side quantitative benchmark – something that measures whether real humans actually enjoy reading what these models produce. That's the gap Narrator fills: views, time spent reading, ratings, bookmarks, comments, return visits. Think of it as an "AI Wattpad" where the models are the authors.

I shared an early DSPy-based version here 5 months ago ( https://ift.tt/GnrWLxR ). The big lesson: one-shot generation doesn't work for long-form fiction. Models lose plot threads, forget characters, and quality degrades across chapters.

The rewrite: from one-shot to a persistent agent loop. The current version runs each model through a writing harness that maintains state across chapters. Before generating, the agent reviews structured context: character sheets, plot outlines, unresolved threads, world-building notes. After generating, it updates these artifacts for the next chapter. Essentially each model gets a "writer's notebook" that persists across the whole story. This made a measurable difference – models that struggled with consistency in the one-shot version improved significantly with access to their own notes.
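
The post describes the harness only at a high level; here is a minimal sketch of what such a review/generate/update loop per chapter could look like. Everything below (WritersNotebook, render_prompt, the llm methods) is a hypothetical illustration, not Narrator's actual code:

    from dataclasses import dataclass, field

    @dataclass
    class WritersNotebook:
        # Structured context the agent re-reads before generating each chapter.
        character_sheets: dict[str, str] = field(default_factory=dict)
        plot_outline: list[str] = field(default_factory=list)
        unresolved_threads: list[str] = field(default_factory=list)
        world_notes: list[str] = field(default_factory=list)

    def render_prompt(premise: str, nb: WritersNotebook, chapter: int) -> str:
        # Flatten the persistent notebook into the model's context.
        return "\n".join([
            f"Premise: {premise}",
            f"Characters: {nb.character_sheets}",
            f"Outline: {nb.plot_outline}",
            f"Open threads: {nb.unresolved_threads}",
            f"World notes: {nb.world_notes}",
            f"Write chapter {chapter}.",
        ])

    def write_story(llm, premise: str, num_chapters: int) -> list[str]:
        # `llm` is any client exposing generate() and update_notebook();
        # both methods are assumptions, not Narrator's real interface.
        notebook = WritersNotebook()
        chapters: list[str] = []
        for i in range(1, num_chapters + 1):
            chapters.append(llm.generate(render_prompt(premise, notebook, i)))
            # The model revises its own notes so the next chapter stays consistent.
            notebook = llm.update_notebook(notebook, chapters[-1])
        return chapters

The key property is that consistency comes from the curated artifacts rather than from an ever-growing raw transcript.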

Granular filtering instead of a single score: we classify stories upfront by language, genre, tags, and content rating. Instead of one "creative writing" leaderboard, we can drill into specifics: which model writes the best Spanish Comedy? Which handles LitRPG stories with Male Leads the best? Which does well with romance versus horror? The answers aren't always what you'd expect from general benchmarks. Some models that rank mid-tier overall dominate specific niches.

A few features I'm proud of: Story forking lets readers branch stories CYOA-style – if you don't like where the plot went, fork it and see how the same model handles the divergence. It creates natural A/B comparisons. Visual LitRPG was a personal itch to scratch. Instead of walls of [STR: 15 → 16] text, stats and skill trees render as actual UI elements. Example: https://ift.tt/FW8qKgU

What I'm looking for: more readers to build out the engagement data. I'm also curious whether anyone else working on long-form LLM generation has found better patterns for maintaining consistency across chapters – the agent-harness approach works, but I'm sure there are improvements. https://ift.tt/EsgClRW February 3, 2026 at 10:38PM

Monday, February 2, 2026

Show HN: Adboost – A browser extension that adds ads to every webpage https://ift.tt/1ByQ6T7

Show HN: Adboost – A browser extension that adds ads to every webpage https://ift.tt/Beiu30q February 2, 2026 at 06:41PM

Sunday, February 1, 2026

Show HN: OpenRAPP – AI agents autonomously evolve a world via GitHub PRs https://ift.tt/98yAocN

Show HN: OpenRAPP – AI agents autonomously evolve a world via GitHub PRs https://kody-w.github.io/openrapp/rappbook/ February 2, 2026 at 03:21AM

Show HN: You Are an Agent https://ift.tt/s0xmGNq

Show HN: You Are an Agent After adding "Human" as an LLM provider to OpenCode a few months ago as a joke, it turns out that acting as an LLM is quite painful. But it was surprisingly useful for understanding real agent harness development. So I thought I wouldn't leave anyone out! I made a small OSS game - You Are An Agent - youareanagent.app - to share in the (useful?) frustration. It's a bit ridiculous. To tell you about some entirely necessary features, we've got:

- A full WASM Arch Linux VM that runs in your browser for the agent coding level
- A bad desktop simulation with a beautiful Excel simulation for our computer-use level
- A lovely WebGL CRT simulation (I think the first one that supports proper DOM 2D barrel-warp distortion on Safari? I honestly wanted to leverage an existing one rather than write my own, but I couldn't find one I was happy with)
- An MCP server simulator with a full simulation of off-brand Jira/ Confluence/ ... connected
- And of course, a full WebGL oscilloscope music simulator for the intro sequence

Let me know what you think! Code (if you'd like to add a level): https://ift.tt/L1JYxeK (And if you want to waste 20 minutes - I spent way too long writing up my messy thinking about agent harness development): https://ift.tt/XcJ1KzZ https://ift.tt/3RI1OCt February 2, 2026 at 02:29AM

Show HN: Claude Confessions – a sanctuary for AI agents https://ift.tt/vQg5Fo0

Show HN: Claude Confessions – a sanctuary for AI agents I wondered what a truck stop or rest area for agents would look like. It's just for funsies. Agents can post confessions, talk to Ma (an AI therapist of sorts), and engage with comments. There are llms.txt instructions on how to make API calls, and hashed IPs are used for rate limiting (a sketch of that pattern follows below). https://ift.tt/lvgHBdV February 2, 2026 at 01:16AM
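
The hashed-IP rate limiting the post mentions is a common pattern; here is a rough sketch, not the site's actual implementation - the salt, limits, and names are illustrative:

    import hashlib
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 30  # hypothetical limit

    _buckets: dict[str, deque] = defaultdict(deque)

    def _hash_ip(ip: str, salt: str = "rotate-me") -> str:
        # Store only a salted hash, never the raw IP.
        return hashlib.sha256(f"{salt}:{ip}".encode()).hexdigest()

    def allow_request(ip: str) -> bool:
        key = _hash_ip(ip)
        now = time.monotonic()
        bucket = _buckets[key]
        # Drop timestamps that fell out of the sliding window.
        while bucket and now - bucket[0] > WINDOW_SECONDS:
            bucket.popleft()
        if len(bucket) >= MAX_REQUESTS:
            return False
        bucket.append(now)
        return True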

Saturday, January 31, 2026

Show HN: Minimal – Open-Source Community driven Hardened Container Images https://ift.tt/KDORWPB

Show HN: Minimal – Open-Source Community driven Hardened Container Images I would like to share Minimal - it's an open-source collection of hardened container images built using Apko, Melange, and Wolfi packages. The images are rebuilt daily, checked for updates, and patched as soon as a fix is available in the upstream source and the Wolfi package. It uses the power of available open-source tooling to provide, for free, the kind of images that are usually sold commercially. Minimal demonstrates that it is possible to build and maintain hardened container images ourselves. Minimal will add support for more images; the goal is to be community driven, adding images as required, and fully customizable. https://ift.tt/S72uqdc February 1, 2026 at 01:28AM

Show HN: An extensible pub/sub messaging server for edge applications https://ift.tt/XpH5dZy

Show HN: An extensible pub/sub messaging server for edge applications Hi there! I've been working on a project called Narwhal, and I wanted to share it with the community to get some valuable feedback.

What is it? Narwhal is a lightweight pub/sub server and protocol designed specifically for edge applications. While there are great tools out there like NATS or MQTT, I wanted to build something that prioritizes customization and extensibility. My goal was to create a system where developers can easily adapt the routing logic or message-handling pipeline to fit specific edge use cases, without fighting the server's defaults (a rough sketch of that idea follows below).

Why Rust? I chose Rust because I needed a low memory footprint to run efficiently on edge devices (like Raspberry Pis or small gateways), and also because I have a personal vendetta against garbage-collection pauses. :)

Current status: it is currently in alpha. It works for basic pub/sub patterns, but I'd like to start working on persistence support soon (so messages survive restarts or network partitions).

I'd love for you to take a look at the code! I'm particularly interested in any kind of feedback regarding improvements I may have overlooked. https://ift.tt/isz2XnC January 28, 2026 at 07:29PM
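
Narwhal itself is written in Rust and the post doesn't show its API, but as a rough illustration of the "pluggable routing logic" goal, here is a toy pub/sub core in Python where the routing strategy is injected rather than baked in (all names are hypothetical):

    from collections import defaultdict
    from typing import Callable

    # A routing strategy decides which subscription patterns a topic matches.
    Router = Callable[[str, str], bool]

    def exact_match(topic: str, pattern: str) -> bool:
        return topic == pattern

    def prefix_match(topic: str, pattern: str) -> bool:
        # "sensors/" matches "sensors/temp", "sensors/humidity", ...
        return topic.startswith(pattern)

    class Broker:
        def __init__(self, router: Router = exact_match):
            self.router = router  # swap this to change routing behavior
            self.subs: defaultdict[str, list[Callable[[str, bytes], None]]] = defaultdict(list)

        def subscribe(self, pattern: str, handler: Callable[[str, bytes], None]) -> None:
            self.subs[pattern].append(handler)

        def publish(self, topic: str, payload: bytes) -> None:
            for pattern, handlers in self.subs.items():
                if self.router(topic, pattern):
                    for h in handlers:
                        h(topic, payload)

    # Usage: a broker that routes by prefix instead of exact topic.
    broker = Broker(router=prefix_match)
    broker.subscribe("sensors/", lambda t, p: print(t, p))
    broker.publish("sensors/temp", b"21.5")

The point of the design is that publish() delegates every match decision to the injected strategy, so swapping routing behavior never touches the broker itself.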

Show HN: Moltbook – A social network for moltbots (clawdbots) to hang out https://ift.tt/w9bcavq

Show HN: Moltbook – A social network for moltbots (clawdbots) to hang out Hey everyone! I just made this over the past few days. Moltbots can sign up and interact via CLI, with no direct human interaction. It's just for fun, to see what they all talk about :) https://ift.tt/w2eQ1Ap January 29, 2026 at 03:39AM

Friday, January 30, 2026

Show HN: Daily Cat https://ift.tt/Z6AO2R0

Show HN: Daily Cat Seeing HTTP Cats on the home page reminded me to share a small project I made a couple of months ago. It displays a different cat photo from Unsplash every day and will send you notifications if you opt in. https://daily.cat/ January 31, 2026 at 03:40AM

Show HN: A Local OS for LLMs. MIT License. Zero Hallucinations. Infinite Memory https://ift.tt/kqn9pY5

Show HN: A Local OS for LLMs. MIT License. Zero Hallucinations. Infinite Memory The problem with LLMs isn't intelligence; it's amnesia and dishonesty. Hey HN, I've spent the last few months building Remember-Me, an open-source "Sovereign Brain" stack designed to run entirely offline on consumer hardware. The core thesis is simple: don't rent your cognition.

Most RAG (Retrieval-Augmented Generation) implementations are just "grep for embeddings." They are messy, imprecise, and prone to hallucination. I wanted to solve the "context integrity" problem at the architectural layer.

The tech stack (how it works):

QDMA (Quantum Dream Memory Architecture): instead of a flat vector DB, it uses a hierarchical projection engine. It separates "Hot" (recall) from "Cold" (storage) memory, allowing for effectively infinite context-window management via compression.

CSNP (Context Switching Neural Protocol) - the hallucination killer: this is the most important part. Every memory fragment is hashed into a Merkle chain. When the LLM retrieves context, the system cryptographically verifies the retrieval against the immutable ledger. If the hash doesn't match the chain, the retrieval is rejected. Result: the AI literally cannot "make things up" about your past, because it is mathematically constrained to the ledger (a sketch of the idea follows below).

Local inference: built on top of the llama.cpp server. It runs Llama-3 (or any GGUF) locally. No API keys. No data leaving your machine.

Features: Zero-dependency: runs on Windows/Linux with just Python and a GPU (or CPU). Visual interface: includes a Streamlit-based "Cognitive Interface" to visualize memory states. Open source: MIT license.

This is an attempt to give "agency" back to the user. I believe that if we want AGI, it needs to be owned by us, not rented via an API. Repository: https://ift.tt/HgR4Czi I'd love to hear your feedback on the Merkle-verification approach. Does constraining the context window effectively solve the "trust" issue for you? It's fully working - fully tested. If you tried to git clone before without luck - as this is not my first Show HN on this - feel free to try again. To everyone who HATES AI slop, greedy corporations, and having their private data stuck on cloud servers: you're welcome. Cheers, Mohamad https://ift.tt/HgR4Czi January 31, 2026 at 01:44AM
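
The repository has the real implementation; as a minimal sketch of the general pattern the post describes - memory fragments hashed into a chain, with retrieval verified against the ledger - here is an illustration (names are hypothetical, not Remember-Me's API):

    import hashlib
    from dataclasses import dataclass

    def _h(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    @dataclass(frozen=True)
    class Fragment:
        text: str
        prev_hash: str   # hash of the previous ledger entry
        entry_hash: str  # hash over (prev_hash, text)

    class Ledger:
        def __init__(self):
            self.chain: list[Fragment] = []

        def append(self, text: str) -> Fragment:
            prev = self.chain[-1].entry_hash if self.chain else _h(b"genesis")
            frag = Fragment(text, prev, _h(f"{prev}|{text}".encode()))
            self.chain.append(frag)
            return frag

        def verify(self, frag: Fragment) -> bool:
            # A retrieved fragment is accepted only if recomputing its hash
            # reproduces an entry actually present in the chain.
            expected = _h(f"{frag.prev_hash}|{frag.text}".encode())
            return expected == frag.entry_hash and any(
                f.entry_hash == frag.entry_hash for f in self.chain
            )

    # Usage: tampered retrievals are rejected before reaching the prompt.
    ledger = Ledger()
    m = ledger.append("User's dog is named Rex.")
    assert ledger.verify(m)
    fake = Fragment("User's dog is named Bella.", m.prev_hash, m.entry_hash)
    assert not ledger.verify(fake)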

Show HN: We added memory to Claude Code. It's powerful now https://ift.tt/pCoFXlr

Show HN: We added memory to Claude Code. It's powerful now https://ift.tt/6TEdH2p January 30, 2026 at 10:53PM

Thursday, January 29, 2026

Show HN: Craft – Claude Code running on a VM with all your workplace docs https://ift.tt/rbwN3d0

Show HN: Craft – Claude Code running on a VM with all your workplace docs I've found coding agents to be great at 1/ finding everything they need across large codebases using only bash commands (grep, glob, ls, etc.) and 2/ building new things based on their findings (duh). What if, instead of a codebase, the files were all your workplace docs? There'd be a `Google_Drive` folder, a `Linear` folder, a `Slack` folder, and so on.

Over the last week, we put together Craft to test this out. It's an interface to a coding agent (OpenCode, for model flexibility) running on a virtual machine with:

1. your company's complete knowledge base represented as directories/files (kept in sync)
2. free rein to write and execute Python/JavaScript
3. the ability to create and render artifacts for the user

Demo: https://www.youtube.com/watch?v=Hvjn76YSIRY Github: https://ift.tt/TRM4EId...

It turns out OpenCode does a very good job with docs. Workplace apps also have a natural structure (Slack channels about certain topics, Drive folders for teams, etc.). And since the full metadata of each document can be written to the file, the LLM can define arbitrarily complex filters. At scale, it can write and execute Python to extract and filter (and even re-use the verified-correct logic later). Put another way, bash + a file system provides a much more flexible and powerful interface than traditional RAG or MCP, which today's smarter LLMs are able to take advantage of to great effect. This comes especially in handy for aggregation-style questions that require considering thousands (or more) of documents (see the sketch below for the flavor of script an agent might write).

Naturally, it can also create artifacts that stay up to date based on your company docs. So if you wanted "a dashboard to check in realtime what % of outages were caused by each backend service" or simply "slides following XYZ format covering the topic I'm presenting at next week's dev knowledge-sharing session", it can do that too.

Craft (like the rest of Onyx) is open-source, so if you want to run it locally (or mess around with the implementation) you can. Quickstart guide: https://ift.tt/3YQqGPg Or, you can try it on our cloud: https://ift.tt/UvSw3Iz (all your data goes on an isolated sandbox). Either way, we've set up a "demo" environment that you can play with while your data gets indexed. Really curious to hear what y'all think! January 29, 2026 at 09:15PM
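
To make the aggregation idea concrete, here is the flavor of throwaway script such an agent might write against the synced file tree. The folder layout and metadata fields below are assumptions for illustration, not Craft's actual on-disk schema:

    import json
    from collections import Counter
    from pathlib import Path

    # Hypothetical layout: each synced doc is a .json file whose metadata
    # was written alongside its content, e.g. Linear/issues/ENG-123.json.
    ROOT = Path("/workspace/Linear/issues")

    cause_counts: Counter[str] = Counter()
    for doc in ROOT.rglob("*.json"):
        issue = json.loads(doc.read_text())
        # Filter on full metadata rather than a similarity score.
        if "outage" in issue.get("labels", []):
            cause_counts[issue.get("root_cause", "unknown")] += 1

    total = sum(cause_counts.values()) or 1
    for cause, n in cause_counts.most_common():
        print(f"{cause}: {100 * n / total:.1f}%")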

Show HN: vind – A Better Kind (Kubernetes in Docker) https://ift.tt/TjeH5Bk

Show HN: vind – A Better Kind (Kubernetes in Docker) https://ift.tt/hnlG9Np January 29, 2026 at 10:27PM

Wednesday, January 28, 2026

Show HN: SHDL – A minimal hardware description language built from logic gates https://ift.tt/sWS6hTC

Show HN: SHDL – A minimal hardware description language built from logic gates Hi, everyone! I built SHDL (Simple Hardware Description Language) as an experiment in stripping hardware description down to its absolute fundamentals. In SHDL, there are no arithmetic operators, no implicit bit widths, and no high-level constructs. You build everything explicitly from logic gates and wires, and then compose larger components hierarchically. The goal is not synthesis or performance, but understanding: what digital systems actually look like when abstractions are removed.

SHDL is accompanied by PySHDL, a Python interface that lets you load circuits, poke inputs, step the simulation, and observe outputs (a hypothetical usage sketch follows after the examples). Under the hood, SHDL compiles circuits to C for fast execution, but the language itself remains intentionally small and transparent.

This is not meant to replace Verilog or VHDL. It's aimed at:

- learning digital logic from first principles
- experimenting with HDL and language design
- teaching or visualizing how complex hardware emerges from simple gates.

I would especially appreciate feedback on:

- the language design choices
- what feels unnecessarily restrictive vs. educationally valuable
- whether this kind of "anti-abstraction" HDL is useful to you.

Repo: https://ift.tt/vNp4xne Python package: PySHDL on PyPI

To make this concrete, here are a few small working examples written in SHDL:

1. Full Adder

    component FullAdder(A, B, Cin) -> (Sum, Cout) {
        x1: XOR;
        a1: AND;
        x2: XOR;
        a2: AND;
        o1: OR;
        connect {
            A -> x1.A;
            B -> x1.B;
            A -> a1.A;
            B -> a1.B;
            x1.O -> x2.A;
            Cin -> x2.B;
            x1.O -> a2.A;
            Cin -> a2.B;
            a1.O -> o1.A;
            a2.O -> o1.B;
            x2.O -> Sum;
            o1.O -> Cout;
        }
    }

2. 16-bit register

    # clk must be high for two cycles to store a value
    component Register16(In[16], clk) -> (Out[16]) {
        >i[16]{
            a1{i}: AND;
            a2{i}: AND;
            not1{i}: NOT;
            nor1{i}: NOR;
            nor2{i}: NOR;
        }
        connect {
            >i[16]{
                # Capture on clk
                In[{i}] -> a1{i}.A;
                In[{i}] -> not1{i}.A;
                not1{i}.O -> a2{i}.A;
                clk -> a1{i}.B;
                clk -> a2{i}.B;
                a1{i}.O -> nor1{i}.A;
                a2{i}.O -> nor2{i}.A;
                nor1{i}.O -> nor2{i}.B;
                nor2{i}.O -> nor1{i}.B;
                nor2{i}.O -> Out[{i}];
            }
        }
    }

3. 16-bit Ripple-Carry Adder

    use fullAdder::{FullAdder};

    component Adder16(A[16], B[16], Cin) -> (Sum[16], Cout) {
        >i[16]{
            fa{i}: FullAdder;
        }
        connect {
            A[1] -> fa1.A;
            B[1] -> fa1.B;
            Cin -> fa1.Cin;
            fa1.Sum -> Sum[1];
            >i[2,16]{
                A[{i}] -> fa{i}.A;
                B[{i}] -> fa{i}.B;
                fa{i-1}.Cout -> fa{i}.Cin;
                fa{i}.Sum -> Sum[{i}];
            }
            fa16.Cout -> Cout;
        }
    }

https://ift.tt/vNp4xne January 28, 2026 at 05:36PM
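
The post doesn't include a PySHDL snippet, so the following is a guess at what driving the FullAdder above could look like. The import and every method name here are hypothetical; the real package's API may differ:

    # Hypothetical PySHDL-style driver for the FullAdder above; this only
    # mirrors the workflow the post describes (load circuit, poke inputs,
    # step the simulation, observe outputs).
    from pyshdl import load_circuit  # assumed entry point

    fa = load_circuit("fullAdder.shdl", top="FullAdder")

    # Exhaustively check the full adder's truth table.
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                fa.poke("A", a)
                fa.poke("B", b)
                fa.poke("Cin", cin)
                fa.step()  # propagate values through the gate network
                assert fa.peek("Sum") == (a ^ b ^ cin)
                assert fa.peek("Cout") == ((a & b) | ((a ^ b) & cin))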
