edge-lm cuts Gemma 4 to 1.4GB, PMB adds 94.5% recall, Claude Code MCP grows

Sunday, 7 June 2026

edge-lm compresses Gemma 4 models by 7x to 1.4GB while preserving instruction following and tool use for edge deployment. PMB offers local-first persistent memory for AI coding agents, with 94.5% recall and 70ms p50 latency, across Claude Code, Cursor, and Codex.

🎭Multi-model finance drama emerges

huggingface·2 min read

A hackathon project runs an emergent-economy simulation in which each agent uses a different lab's small model, creating heterogeneous market behaviour.

Takeaway

Interesting proof-of-concept for agent diversity. Using different models per agent creates more realistic emergent behaviour than single-model approaches.

Multimodal·Research

🤔AI cautionist seeks sanity checks

r/ExperiencedDevs

ExperiencedDevs discussion on using AI as a basic sanity check even when cautious about over-reliance, and on finding the balance between the two.

Divisive·AI Workflows·Dev Tools

📱edge-lm cuts Gemma 4 to 1.4GB

GitHub

TheStageAI compresses Gemma 4 models by 7x to 1.4GB while preserving instruction following, tool use, and world knowledge for on-device deployment.

Bigger Picture

Compression Reality Check

The 7x compression claim for Gemma 4 comes with a caveat: quality is reported as preserved on three axes (instruction following, tool use, and world knowledge) but may degrade elsewhere. Still, 1.4GB fits mobile constraints.

Top Voted

Python·Gemini·Edge

🗄️Lance format boosts multimodal AI

GitHub

Open lakehouse format for multimodal AI offering 100x faster random access than Parquet, with vector indexing and data versioning built in.

Trending

Rust·Database·Vector DB

🔧SkillOpt optimises prompts locally

GitHub

Text-space optimiser that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits and validation loops.

Deep Dive

Python·Agents·Local AI

Yesterday's Sentiment/Energised

Edge AI Gets Production-Ready

edge-lm delivers the compression breakthrough mobile AI needed, while PMB solves persistent memory with impressive benchmarks. MCP adoption accelerates.

🔍Hindsight Architecture tackles agent memory

r/LangChain

Hackathon approach to long-term AI agent memory using hindsight-based retrieval patterns. Discussed in LangChain community.

LangChain·Agents·AI Workflows

🔌bricks-mcp-open covers 105 tools

GitHub

Comprehensive MCP server for Bricks Builder with 100+ tools for pages, templates, styles, SEO, and content management via Claude Code and Cursor.

Trending

PHP·MCP·Claude

Learn/Multiple Mentions

What is MCP in AI development today?

MCP (Model Context Protocol) is Anthropic's standard for connecting AI models to external tools and data sources. Multiple projects like PMB, bricks-mcp-open, and totem show it's becoming the go-to way to give Claude and other models access to databases, APIs, and services.

Claude·Integration
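
To make the MCP idea concrete, here is a minimal sketch of the kind of message a client sends when it asks a server to run a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `search_memory` and its `query` argument are hypothetical, chosen only for illustration; they are not taken from PMB, bricks-mcp-open, or totem.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "search_memory", {"query": "auth refactor"})
print(json.dumps(request, indent=2))
```

A real client would also handle initialisation and tool discovery before this step; the sketch only shows the shape of a single call.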

🧠Memory-first beats model-first

Dev.to·2 min read

Backboard's coding harness, running on open models, beats frontier models by prioritising memory and routing over raw model capability. R-CLI is in open beta.

Takeaway

Build persistence into your AI workflows instead of chasing the latest model. A 92% Terminal Bench score with cheaper models suggests memory can matter more than raw model capability.

Under The Radar·AI Workflows·Cost Optimisation·Tool Comparison

🖥️DS4 Control adds macOS menu bar

GitHub

macOS frontend for DeepSeek V4 with model selection, resource monitoring, and local agentic coding integration for Claude Code and Pi.

Trending

Swift·DeepSeek R1·Local AI

📝Context loss handling strategies

r/cursor

Cursor community discusses practical approaches to maintaining context between AI coding sessions and avoiding repetitive explanations.

Cursor·AI Workflows

🧠PMB adds 94.5% recall memory

GitHub

Local-first persistent memory for AI coding agents with 94.5% LoCoMo recall and 70ms p50 latency. Works across Claude Code, Cursor, and Codex via MCP.

Deep Dive

Python·MCP·Claude
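
For readers unsure what the two headline numbers measure, here is a small sketch of how a recall rate and a p50 (median) latency are typically computed. The data is made up for illustration, and this is not PMB's benchmark harness; the blurb does not describe how the LoCoMo evaluation was run.

```python
import statistics


def recall(hit_flags):
    """Fraction of queries for which the expected memory was retrieved."""
    return sum(hit_flags) / len(hit_flags)


def p50(latencies_ms):
    """Median latency: half of the requests finish at or below this value."""
    return statistics.median(latencies_ms)


# Made-up sample data, not PMB measurements.
hits = [True, True, False, True]
latencies = [62.0, 70.0, 75.0, 68.0, 140.0]

print(f"recall = {recall(hits):.1%}")
print(f"p50 latency = {p50(latencies)} ms")
```

Note that a median hides slow outliers such as the 140 ms request above, which is why p95 or p99 figures are often reported alongside p50.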

🔄Incremental context acquisition

r/LLMDevs

Alternative to repository indexing: let AI acquire context incrementally rather than front-loading everything into embeddings.

AI Workflows·LLM Ops

⚙️Continuum ships agent runtime

GitHub

Production-grade Python framework for autonomous AI agents with multi-LLM routing, persistent memory, MCP tools, and durable workflows.

Trending

Python·Agents·MCP

⌚totem bridges Whoop data to MCP

GitHub

48 MCP tools giving Claude full read/write access to Whoop fitness data via the private iOS API. Recovery, sleep, strain, and HRV tracking.

Trending

TypeScript·MCP·Claude

🤖Claude Code Action gains features

GitHub

Anthropic's GitHub Action for Claude Code gains intelligent mode detection, structured outputs, and progress tracking. Growing traction with devs.

Trending

TypeScript·Anthropic·Claude

📚ML Specialisation solutions

GitHub

Complete solutions and notes for Andrew Ng's Machine Learning Specialisation on Coursera. The repository is gaining momentum.

Trending·Research

Learn/Core Concept

How does model compression actually work?

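Model compression reduces AI model size through techniques like quantisation (storing weights at lower precision), pruning (removing weights that contribute little), and distillation (training a smaller model to imitate a larger one), while aiming to keep most of the original performance. edge-lm demonstrates this by shrinking Gemma 4 by 7x, from roughly 10GB to just 1.4GB, while preserving instruction following and tool use capabilities.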

Quantisation·Distillation
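
As a rough illustration of the quantisation idea, the sketch below maps a handful of floating-point weights to 8-bit integers with one shared scale factor, then converts them back to see how much precision is lost. This is a generic textbook scheme with made-up weights, assumed here for illustration; it is not edge-lm's actual method, which the blurb does not describe.

```python
def quantise_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantise(quantised, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantised]


# Made-up weights, for illustration only.
weights = [0.50, -1.27, 0.03, 0.91]
quantised, scale = quantise_int8(weights)
restored = dequantise(quantised, scale)

# The rounding error per weight is bounded by half the scale step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantised, scale, max_error)
```

Each weight now needs one byte instead of the two or four used by common floating-point formats, at the cost of a small rounding error per weight. Real systems usually quantise per channel or per block and may combine this with pruning or distillation.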
