Thursday, August 28, 2025

Show HN: A private, flat monthly subscription for open-source LLMs https://ift.tt/q2m8hdw

Show HN: A private, flat monthly subscription for open-source LLMs Hey HN! We've run our privacy-focused open-source inference company for a while now, and we're launching a flat monthly subscription similar to Anthropic's. It should work with Cline, Roo, KiloCode, Aider, etc — any OpenAI-compatible API client should do. The rate limits at every tier are higher than the Claude rate limits, so even if you prefer using Claude it can be a helpful backup for when you're rate limited, for a pretty low price. Let me know if you have any feedback! https://ift.tt/PxrRW43 August 29, 2025 at 12:33AM

Show HN: Knowledgework – AI Extensions of Your Coworkers https://ift.tt/F6SPXoJ

Show HN: Knowledgework – AI Extensions of Your Coworkers Hey HN! We’re building Knowledgework.ai, which creates AI clones of your coworkers that actually know what they know. It's like having a version of each teammate that never sleeps, never judges you for asking "dumb" questions, and responds instantly. As a SWE at Amazon, I constantly faced two frustrations: 1. Getting interrupted on Slack all day with questions I'd already answered 2. Waiting hours (or days) for responses when I needed information from teammates When you compare this to the UX of an AI chatbot, humans start to look pretty inconvenient! It’s a bit of a wild take, but it’s really been reflected in my conversations with dozens of engineers, and especially juniors: people would rather spend 20 minutes wrestling with an unreliable AI than risk looking ignorant or wasting their coworkers’ time. One of my early users actually tried the product and told me she’s a bit worried her coworkers would prefer talking to her AI extension over talking to her! Here’s how it works: It’s a desktop app (mac only right now) that captures screenshots every 5 seconds while you work. It uses a bespoke, ultra-long context vision model (OCR isn’t enough, and generic models are far too expensive!) to understand what you're doing and automatically builds a searchable, hyperlinked knowledge base (wiki) of everything you work on - code you write, bugs you fix, decisions you make, or anything else you do on a computer that could be useful to you or your team’s productivity in the future. Even if you just turn on Knowledgework for ~30 mins while working on a personal project, I think you’ll find what it produces to be really interesting — something I’ve learned is that we tend to underestimate the extent of the valuable information we produce every day that is just ephemeral and forgotten. There’s also some really great opportunities surrounding quantified self and reflection — just ask it how you could have been more productive yesterday or how you could come across better in your meetings. The real value comes when your teammates can query your "Extension" - an AI agent that has access to all (only what you choose to share) of your captured work context. Imagine your coworker is on vacation, but you can still ask their Extension: "I'm trying to deploy a new Celery worker. It's gossiping but not receiving tasks. Have you seen this before?" We’ve spent a great deal of effort on optimizing for privacy as a priority; not just in terms of encryption and data security, but in terms of modulating what your Extension will divulge in a relationship appropriate way, and how you can configure this. By default, nothing is shared. In a team setting, you can choose to share your Extension with particular individuals. You can, in a fine-grained manner, grant and revoke access to portions of your time, or if you are on a tight-knit team, you can just leave it to AI to decide what makes sense to be accessed. This is the area we’re most excited to get feedback on, so we’re really aiming this launch at small, tight knit teams who care about speed and productivity at all costs who use Macs, Slack, Notion, and are all on Claude Code Max plans. We’re also working on SOC II type 2 compliance and can do on-prem, although on-prem will be quite expensive. If you’re curious about on-prem or additional certifications, I’d love to chat - griffin@knowledgework.ai. Check it out here: https://ift.tt/RQOltZ8 We’ve opened it up today for anyone to install and use for free. If you’re seeing this after Thursday 8/28, we’ll likely have put back the code wall — but we’d be happy to give codes to anyone who reaches out to griffin@knowledgework.ai https://ift.tt/RQOltZ8 August 29, 2025 at 12:11AM

Show HN: Persistent Mind Model (PMM) – Update: an model-agnostic "mind-layer" https://ift.tt/YEyz26K

Show HN: Persistent Mind Model (PMM) – Update: an model-agnostic "mind-layer" A few weeks ago I shared the Persistent Mind Model (PMM) — a Python framework for giving an AI assistant a durable identity and memory across sessions, devices, and even model back-ends. Since then, I’ve added some big updates: - DevTaskManager — PMM can now autonomously open, track, and close its own development tasks, with event-logged lifecycle (task_created, task_progress, task_closed). - BehaviorEngine hook — scans replies for artifacts (e.g. Done: lines, PR links, file references) and uto-generates evidence events; commitments now close with confidence thresholds instead of vibes. - Autonomy probes — new API endpoints (/autonomy/tasks, /autonomy/status) expose live metrics: open tasks, commitment close rates, reflection contract pass-rate, drift signals. - Slow-burn evolution — identity and personality traits evolve steadily through reflections and “drift,” rather than resetting each session. Why this matters: Most agent frameworks feel impressive for a single run but collapse without continuity. PMM is different: it keeps an append-only event chain (SQLite hash-chained), a JSON self-model, and evidence-gated commitments. That means it can persist identity and behavior across LLMs — swap OpenAI for a local Ollama model and the “mind” stays intact. In simple terms: PMM is an AI that remembers, stays consistent, and slowly develops a self-referential identity over time. Right now the evolution of it "identity" is slow, for stability and testing reasons, but it works. I’d love feedback on: What you’d want from an “AI mind-layer” like this. Whether the probes (metrics, pass-rate, evidence ratio) surface the right signals. How you’d imagine using something like this (personal assistant, embodied agent, research tool?). https://ift.tt/zchFrTO August 29, 2025 at 12:04AM

Wednesday, August 27, 2025

Show HN: Cross-device copy/paste and 5 MB file transfer (E2E, no signup) https://ift.tt/fRPgyNn

Show HN: Cross-device copy/paste and 5 MB file transfer (E2E, no signup) A browser-only way to copy/paste text and send small files between devices. • No accounts, join via code/QR • AES-256 E2E in the device • 5 MB file limit FAQ: https://ift.tt/bJ1Exfd https://ift.tt/9zKM5CE August 27, 2025 at 09:13PM

Tuesday, August 26, 2025

Show HN: Smooth – Faster, cheaper browser agent API https://ift.tt/RJHU2ZY

Show HN: Smooth – Faster, cheaper browser agent API Hey there HN! We're Antonio and Luca, and we're excited to introduce Smooth, a state-of-the-art browser agent that is 5x faster and 7x cheaper than Browser Use ( https://ift.tt/RzmvVfs ). We built Smooth because existing browser agents were slow, expensive, and unreliable. Even simple tasks could take minutes and cost dollars in API credits. We started as users of Browser Use, but the pain was obvious. So we built something better. Smooth is 5x faster, 7x cheaper, and more reliable. And along the way, we discovered two principles that make agents actually work. (1) Think like the LLM ( https://ift.tt/xj5I489 ). The most important thing is to put yourself in the shoes of the LLM. This is especially important when designing the context. How you present the problem to the LLM determines whether it succeeds or fails. Imagine playing chess with an LLM. You could represent the board in countless ways - image, markdown, JSON, etc. Which one you choose matters more than any other part of the system. Clean, intuitive context is everything. We call this LLM-Ex. (2) Let them write code ( https://ift.tt/UOVe1LA ) Tool calling is limited. If you want agents that can handle complex logic and manipulate objects reliably, you need code. Coding offers a richer, more composable action space. Suddenly, designing for the agent feels more like designing for a human developer, which makes everything simpler. By applying these two principles religiously, we realized you don't need huge models to get reliable results. Small, efficient models can get you higher reliability while also getting human-speed navigation and a huge cost reduction. How it works: 1. Extract: we look at the webpage and extract all relevant elements by looking at the rendered page. 2. Filter and Clean: then, we use some simple heuristics to clean up the webpage. If an element is not interactive, e.g. because a banner is covering it, we remove it. 3. Recursively separate sections: we use several heuristics to represent the webpage in a way that is both LLM-friendly and as similar as possible to how humans see it. We packaged Smooth in an easy API with instant browser spin-up, custom proxies, persistent sessions, and auto-CAPTCHA solvers. Our goal is to give you this infrastructure so that you can focus on what's important: building great apps for your users. Before we built this, Antonio was at Amazon, Luca was finishing a PhD at Oxford, and we've been obsessed with reliable AI agents for years. Now we know: if you want agents to work reliably, focus on the context. Try it for free at https://ift.tt/HBjTN3x Docs are here: https://ift.tt/DvjfBCY Demo video: https://youtu.be/18v65oORixQ We'd love feedback :) https://www.smooth.sh/ August 26, 2025 at 08:35PM

Show HN: Ubon – a solution for the "You're absolutely right" debugging dread https://ift.tt/nIziHo9

Show HN: Ubon – a solution for the "You're absolutely right" debugging dread I used Claude Code heavily while trying to launch an app while being quite sick and my mental focus was not at its best. So I relied 'too much' on Claude Code, and my Supabase keys slipped in a 'hidden' endpoint, causing some emails to be leaked. After some deep introspection, and thinking about the explosion of Lovable, Replit, Cursor, Claude Code vibe-coded apps, I thought about what's the newest newest and most dreadful pain points in the dev arena right now. And I came up with the scenario of debugging some non-obvious errors, where your AI of choice will reply "You're absolutely right! Let me fix that", but never nailing what's wrong in the codebase. So I built Ubon for the last week, listing thoroughly all the pain points I have experienced myself as a software engineer (mostly front-end) for 15 years. Ubon catches the stuff that slips past linters - hardcoded API keys, broken links, missing alt attributes, insecure cookies. The kind of issues that only blow up in production. And now I can use Ubon by adding it to my codebase ("npx ubon scan .", or simply telling Claude Code "install Ubon before commiting"), and it will give outputs that either a developer or an AI agent can read to pinpoint real issues, pinpointing the line and suggested fix. It's open-source, free to use, MIT licensed, and I won't abandon it after 7 days, haha. My hope is that it can become part of the workflow for AI agents or as a complement to linters like ESlint. It makes me happy to share that after some deep testing, it works pretty well. I have tried with dozens of buggy codebases, and also simulated faulty repos generated by Cursor, Windsurf, Lovable, etc. to use Ubon on top of them, and the results are very good. Would love feedback on what other checks would be useful. And if there's enough demand, I am happy to give online demos to get traction of users to enjoy Ubon. https://ift.tt/bleFB57 August 26, 2025 at 10:57PM

Monday, August 25, 2025

Show HN: Stop saving your scans on 3rd party servers https://ift.tt/CAHS6Qi

Show HN: Stop saving your scans on 3rd party servers Hi HN, I built DocsOrb to solve a simple but stressful problem (and my own problem too since many years!): keeping track of important documents like passports, rental contracts, and insurance papers. Too often they're scattered across folders, emails, or piles at home... and you only realize it when you urgently need them. DocsOrb helps you: > Scan documents with auto-crop and enhancements (mobile camera or file upload) > Organize them around life's "moments" (travel, housing, insurance, etc.) > Search quickly using Key Information > AI extracts Key Information so the most important details are always at your fingertips > Export or share in one tap > AI Bulk organize: load up multiple images from your Photos to automatically organize them as documents, put them in the right folders, extract Key Information and also suggest a recommended name and description. Everything stays on your device by default, with optional cloud backup if you want it. Privacy-first, so you're always in control. Tech-wise: it's built with Nuxt + Capacitor, Supabase for structured storage, and a custom scanning flow (to avoid pricey SDK lock-ins). I'd love your feedback: > Does this flow make sense to you? > What's missing in how you manage important documents? > Any suggestions before I go full blast on Marketing? https://docsorb.com/ August 26, 2025 at 06:06AM

Show HN: I built an AI trip planner https://ift.tt/hnqNSDj

Show HN: I built an AI trip planner https://milotrips.com August 26, 2025 at 02:39AM

Show HN: RAG-Guard: Zero-Trust Document AI https://ift.tt/cQVmwdM

Show HN: RAG-Guard: Zero-Trust Document AI Hey HN, I wanted to share something I’ve been working on: *RAG-Guard*, a document AI that’s all about privacy. It’s an experiment in combining Retrieval-Augmented Generation (RAG) with AI-powered question answering, but with a twist — your data stays yours . Here’s the idea: you can upload contracts, research papers, personal notes, or any other documents, and RAG-Guard processes everything locally in your browser. Nothing leaves your device unless you explicitly approve it. ### How It Works - * Zero-Trust by Design*: Every step happens in your browser until you say otherwise. - * Local Document Processing*: Files are parsed entirely on your device. - * Local Embeddings*: We use [all-MiniLM-L6-v2]( https://ift.tt/tN6WRkJ... ) via Transformers.js to generate embeddings right in your browser. - * Secure Storage*: Documents and embeddings are stored in your browser’s encrypted IndexedDB. - * Client-Side Search*: Vector similarity search happens locally, so you can find relevant chunks without sending anything to a server. - * Manual Approval*: Before anything is sent to an AI model, you get to review and approve the exact chunks of text. - * AI Calls*: Only the text you approve is sent to the language model (e.g., Ollama). No tracking. No analytics. No “training on your data.” ### Why I Built This I’ve been fascinated by the potential of RAG and AI-powered question answering, but I’ve always been uneasy about the privacy trade-offs. Most tools out there require you to upload sensitive documents to the cloud, where you lose control over what happens to your data. With RAG-Guard, I wanted to see if it was possible to build something useful without compromising privacy. The goal was to create a tool that respects your data and puts you in control. ### Who It’s For If you’re someone who works with sensitive documents — contracts, research, personal notes — and you want the power of AI without the risk of unauthorized access or misuse, this might be for you. ### What’s Next This is still an experiment, and I’d love to hear your thoughts. Is this something you’d use? What features would make it better? You can check it out here: [ https://mrorigo.github.io/rag-guard/ ] Looking forward to your feedback! https://ift.tt/D6mE35B August 26, 2025 at 03:12AM

Show HN: I built an image-based logical Sudoku Solver https://ift.tt/sna0DuP

Show HN: I built an image-based logical Sudoku Solver https://ift.tt/GnfUjlR August 26, 2025 at 12:09AM

Sunday, August 24, 2025

Show HN: I Built a XSLT Blog Framework https://ift.tt/B3eIad7

Show HN: I Built a XSLT Blog Framework A few weeks ago a friend sent me grug-brain XSLT (1) which inspired me to redo my personal blog in XSLT. Rather than just build my own blog on it, I wrote it up for others to use and I've published it on GitHub https://ift.tt/OcH1Kuf (2) Since others have XSLT on the mind, now seems just as good of a time as any to share it with the world. Evidlo@ did a fine job explaining the "how" xslt works (3) The short version on how to publish using this framework is: 1. Create a new post in HTML wrapped in the XML headers and footers the framework expects. 2. Tag the post so that its unique and the framework can find it on build 3. Add the post to the posts.xml file And that's it. No build system to update menus, no RSS file to update (posts.xml is the rss file). As a reusable framework, there are likely bugs lurking in CSS, but otherwise I'm finding it perfectly usable for my needs. Finally, it'd be a shame if XSLT is removed from the HTML spec (4), I've found it quite eloquent in its simplicity. (1) https://ift.tt/s46JEyU (2) https://ift.tt/OcH1Kuf (3) https://ift.tt/j4CAK30 (4) https://ift.tt/1y3QWm6 (Aside - First time caller long time listener to hn, thanks!) https://ift.tt/R7U5G8c August 24, 2025 at 11:08PM

Show HN: Komposer, AI image editor where the LLM writes the prompts https://ift.tt/gZOkMXH

Show HN: Komposer, AI image editor where the LLM writes the prompts A Flux Kontext + Mistral experiment. Upload an image, and let the AIs do the rest of the work. https://www.komposer.xyz/ August 25, 2025 at 12:36AM

Saturday, August 23, 2025

Show HN: LoadGQL – a CLI for load-testing GraphQL endpoints https://ift.tt/QMPet6l

Show HN: LoadGQL – a CLI for load-testing GraphQL endpoints Hi HN I’ve been working with GraphQL for a while and always felt the tooling around load testing was lacking. Most tools either don’t support GraphQL natively, or they require heavy setup/config. So I built *LoadGQL* — a single-binary CLI (written in Go) that lets you quickly stress-test a GraphQL endpoint. *What it does today (v1.0.0):* - Run queries against any GraphQL endpoint (no schema parsing required) - Reports median & p95 latency, throughput (RPS), and error rate - Supports concurrency, duration, and custom headers - Minimal and terminal-first by design *Roadmap:* p50/p99 latency, output formats (JSON/CSV), multiple query files. Landing page: [ https://ift.tt/CZ1uPTi ]( https://ift.tt/CZ1uPTi ) I’d love feedback from the HN community: - What metrics matter most to you for GraphQL performance? - Any sharp edges you’d expect in a GraphQL load tester? Thanks for checking it out! https://ift.tt/O5Edpg8 August 24, 2025 at 07:00AM

Show HN: I built aibanner.co to stop spending hours on marketing banners https://ift.tt/LfD0WUP

Show HN: I built aibanner.co to stop spending hours on marketing banners https://www.aibanner.co August 24, 2025 at 05:57AM

Show HN: Python library for fetching/storing/streaming crypto market data https://ift.tt/zcZX52K

Show HN: Python library for fetching/storing/streaming crypto market data https://ift.tt/cEemxVI August 23, 2025 at 09:51PM

Friday, August 22, 2025

Show HN: My First Game Made with My Homemade Engine https://ift.tt/SexoW3h

Show HN: My First Game Made with My Homemade Engine https://reprobate.site/ August 23, 2025 at 03:03AM

Show HN: JavaScript-free (X)HTML Includes https://ift.tt/ORfc12Z

Show HN: AICF – a tiny "what changed" feed for AI/RAG (v0.1 minimal core) https://ift.tt/Qihw7g8

Show HN: AICF – a tiny "what changed" feed for AI/RAG (v0.1 minimal core) I’m proposing AICF (AI Changefeed) — a minimal, web-native way for sites to expose append-only change events. Instead of crawlers or RAG systems re-embedding everything, they can refresh only the sections that changed. Discovery: a /.well-known/ai-changefeed JSON points to a feed. Feed: an append-only NDJSON file with just 4 required fields (id, action, url, time) plus optional hints (anchor, checksum, note). Goal: cut wasted crawling/embedding while keeping docs/pricing/policy pages fresh for AI/agents. Spec & examples here: https://ift.tt/p7L3fxG Would love feedback: is the minimal core (anchors only, no chunks/vectors/push yet) the right starting point? Would you use this in your docs/RAG stack? https://ift.tt/p7L3fxG August 23, 2025 at 01:46AM

Show HN: CopyMagic – The smartest clipboard manager for macOS https://ift.tt/ky6upd4

Show HN: CopyMagic – The smartest clipboard manager for macOS It’s been one month since I launched CopyMagic, a smarter clipboard manager for macOS that makes sure you never lose anything you copy. Instead of digging through endless items, you can type things like “URL from Slack”, “flight information”, or “crypto rate” and it instantly finds what you meant. It’s all completely offline and privacy-first (we don’t even track analytics). https://copymagic.app August 23, 2025 at 12:58AM

Thursday, August 21, 2025

Show HN: Playing Piano with Prime Numbers https://ift.tt/57qGj3T

Show HN: Playing Piano with Prime Numbers I decided to turn prime numbers into a mini piano and see what kind of music they could make. Inspired by: https://ift.tt/bDqy1jw Github: https://ift.tt/ATIOiSq https://ift.tt/uYbnzZU August 18, 2025 at 08:44PM

Show HN: PHP-fts – Full-text search engine in pure PHP, no extensions https://ift.tt/wgSBiJP

Show HN: PHP-fts – Full-text search engine in pure PHP, no extensions https://ift.tt/WpBoNzV May 7, 2026 at 01:58AM