GitHub Copilot goes usage-based, Shimmy ships, TradingAgents gains traction

Tuesday, 28 April 2026

Shimmy delivers Python-free Rust inference with OpenAI-compatible endpoints, while Microsoft's VibeVoice resurges with +757 stars showcasing structured speech-to-text and streaming TTS capabilities. Meanwhile, fundamentally changing how we budget for AI pair programming. The business side heats up with David Silver's Ineffable Intelligence raising $1.1 billion to build reinforcement learning systems that discover knowledge without human data, promising a "scientific breakthrough of comparable magnitude to Darwin."

☁️OpenAI ends Microsoft exclusivity

techcrunch

OpenAI wins concessions allowing it to sell products on AWS while Microsoft gets increased revenue-share from the $50B Amazon deal.

Takeaway

Multi-cloud availability means better API redundancy and competitive pricing. We'll finally have OpenAI models available across all major cloud providers, reducing vendor lock-in risk.

Follow-UpOpenAIMicrosoftAWS

💰GitHub Copilot goes usage-based

github·512 votes·

162 comments

GitHub's Chief Product Officer Mario Rodriguez has led the company's AI strategy and overseen the GitHub Copilot product line, launching and growing Copilot across thousands of organisations and millions of users.

Bigger Picture

The Hidden Copilot Tax

The 9x price increase for Claude models in Copilot could fundamentally change team budgets. Many teams have been treating AI coding assistance as a flat operational cost, but usage-based billing means we need to start thinking about AI efficiency alongside code efficiency.

This connects to the broader discussion about losing grip on codebases - if we're paying per AI interaction, suddenly the quality of our prompts and the thoughtfulness of our AI usage becomes a direct cost concern.

Lively ThreadGitHubCopilotClaude

Also seen on Reddit · Lemmy

Sponsored/launch.cab

Submit your project in seconds — free, DR 35+

Track your product publicly from idea to revenue. Post milestones, collect feedback from builders, and stay visible long after launch day.

🚀Luce DFlash performance boost

reddit·462 votes·129 comments

Qwen3.6-27B achieves up to 2x throughput improvement on RTX 3090 through optimised inference techniques.

Takeaway

Significant performance gains on consumer hardware. If we're running local inference, these optimisation techniques could halve our processing times without hardware upgrades.

Most DiscussedPerformanceLocal AIInference

🔍Claude detects model switching

reddit·409 votes·63 comments

Users report Claude appears to know when they've been using other AI models like Codex, possibly through conversation pattern analysis.

Takeaway

This suggests models maintain more context about our interaction patterns than we realise. Worth understanding how conversation history affects AI responses when switching between tools.

Claude

🧠Ineffable raises $1.1B for RL

techcrunch·2 min read

Ineffable Intelligence, a British AI lab founded by former DeepMind researcher David Silver, has raised $1.1 billion at a $5.1 billion valuation to build reinforcement learning systems that learn without human data.

Takeaway

If Silver succeeds with self-supervised RL at scale, the models we train could discover novel solutions instead of just mimicking human examples. This could fundamentally change how we approach AI product development.

Under The RadarFundingReinforcement LearningResearch

📊GPT 5.4 vs 5.5 benchmarks

reddit·280 votes·44 comments

Performance comparison shows differences between GPT versions on MineBench, revealing model capability variations across releases.

Takeaway

Version differences matter for production systems. Track these benchmarks to understand when model upgrades actually improve performance for our specific use cases rather than assuming newer is better.

OpenAIGPTPerformance

🔮Claude interface speculation

reddit·118 votes·79 comments

Users spot potential upcoming Claude interface changes, sparking discussion about new features and UI updates.

Takeaway

Early signals of Claude product changes. While speculative, these community observations often reveal features before official announcements that could affect our workflows.

Most DiscussedClaude

🌐Google Meet speech translation

simonwillison·2 min read

Real-time speech translation rolls out to Google Meet mobile devices in six languages, featuring a rough imitation of the original speaker's voice, though still in alpha with reliability issues.

Takeaway

The underlying speech translation pipeline could inform our multilingual voice applications. Still buggy across devices, but the technical approach of voice cloning plus real-time translation is worth studying.

GoogleAudioMobile

Learn/Multiple Mentions

What are inference optimisations?

Inference optimisation refers to techniques that make AI models run faster and use less memory during prediction, without changing their core behaviour. These include model quantisation, caching strategies, and specialised hardware acceleration.

Today's tools showcase various approaches: Shimmy uses Rust for faster GGUF inference, Luce DFlash achieves 2x throughput on consumer GPUs, and the TPU vs GPU guide explores hardware choices. For devs, these optimisations mean deploying larger models on smaller hardware and reducing API costs.

QuantisationHardware

🤔Losing grip on your codebase

reddit·11 votes·36 comments

Developer discusses feeling disconnected from their own code after extensive Cursor usage, sparking community debate about AI dependency.

Takeaway

This hits close to home. We need to balance AI assistance with code comprehension. Consider setting aside time to manually review and understand AI-generated code rather than just accepting it.

DivisiveCursorAI Safety

Yesterday's Sentiment/Mixed

Pricing Anxiety Meets Fresh Innovation

GitHub Copilot's usage-based pricing shift has sparked genuine concern among devs, particularly with Claude models seeing substantial price increases. The community discussions around losing touch with codebases while using AI tools reflect broader anxieties about AI dependency in daily workflows.

Yet innovation continues apace, with Microsoft's VibeVoice gaining significant traction and practical tools like Shimmy addressing real deployment pain points. The massive funding for David Silver's reinforcement learning venture signals continued confidence in AI's trajectory, even as devs grapple with immediate cost and workflow concerns.

💸GitHub Copilot pricing confusion

reddit·295 votes·63 comments

Community discusses the 9x Claude model price increase in GitHub Copilot, with users sharing cost analysis and alternative strategies.

Takeaway

The community response reveals pricing strategies and usage patterns. Check the thread for actual cost breakdowns and migration approaches from teams already affected.

GitHubCopilotClaude

📱Skye AI home screen app funded

techcrunch

iPhone AI home screen app attracts pre-launch investment, signaling investor interest in AI-native mobile interfaces.

Takeaway

The funding suggests a market for AI-first mobile UX patterns. If we're building mobile apps, consider how AI could reimagine core interface paradigms rather than just adding chat features.

NewsiOSFundingMobile

🎨TRELLIS.2: Image-to-3D model

reddit·235 votes·31 comments

Microsoft's 4B-parameter model produces 1536³ PBR textured 3D assets from images with 16x spatial compression and native 3D VAEs.

MicrosoftComputer VisionResearch

⚙️Kubernetes mutable pod resources

kubernetes

Kubernetes v1.36 promotes mutable container resources for suspended Jobs to beta, enabling dynamic resource adjustments.

Takeaway

Useful for ML training jobs where resource requirements change during execution. We can now adjust CPU and memory limits without recreating pods, improving resource utilization.

KubernetesDevOps

🔄Bedrock Knowledge Base sync

amazon·2 min read

AWS provides serverless solution for automatic S3-to-Bedrock synchronisation with event-driven architecture and service quota management.

Takeaway

No more manual RAG updates. The event-driven sync respects Bedrock's 5-job limit per account while keeping knowledge bases current. Essential for production RAG systems with frequently changing data.

AWSRAG

🎙️Microsoft VibeVoice resurges

GitHub·757 stars

Open-source speech AI framework VibeVoice features unified ASR handling 60-minute audio and real-time streaming TTS, with support for over 50 languages and experimental multilingual voices.

Top Voted

PythonMicrosoftAudio

Also seen on simonwillison

💼Executives vibe-coding tools

thenewstack·2 min read

C-suite executives are bypassing IT and building their own tools with AI coding assistants, citing frustration with traditional development processes.

Takeaway

Shadow IT gets a whole lot more sophisticated. We should expect more non-technical stakeholders to prototype their own solutions. Time to establish AI coding guidelines and security reviews.

Code Gen

⚡Shimmy: Python-free inference

GitHub·174 stars

Single Rust binary provides OpenAI-compatible endpoints for GGUF models. Hot model swapping, auto-discovery, all GPU backends included.

Trending

RustInferenceOpenAI

🌍WorldSeed multi-agent engine

GitHub·511 stars

AI agents live, compete, and ally in emergent simulations. Define scenes and rules; agents handle everything else autonomously.

Deep Dive

PythonAgentsOpen Source

📚TranslateBooksWithLLMs

GitHub·120 stars

Translate full-length books and documents with multiple LLM providers. Preserves formatting, resumes from checkpoints, handles unlimited file sizes.

Trending

PythonOpenAIOllama

🎯AI engineering needs discipline

thenewstack

Discussion on applying traditional software engineering practices to AI systems, emphasizing testing, monitoring, and reliability patterns.

Takeaway

The fundamentals still matter. Version control for prompts, testing for model outputs, and monitoring for drift are just as critical as traditional software quality practices.

Testing

🎭AI4AnimationPy framework

GitHub·115 stars

Python framework for AI-driven character animation using neural networks. Removes Unity dependency for motion capture processing.

Trending

PythonComputer VisionResearch

📈TradingAgents framework trends

GitHub·248 stars

Multi-agent LLM trading framework TradingAgents is gaining traction, featuring specialised analysts and risk management agents for collaborative trading decisions, with recent releases supporting multiple AI providers.

Trending

PythonAgents

⚡TPUs vs GPUs acceleration guide

Medium

Comprehensive comparison of TPU and GPU performance characteristics for ML acceleration, beyond the typical NVIDIA hype.

Takeaway

TPUs excel at large-scale training with predictable workloads, while GPUs offer more flexibility. Consider TPUs for batch inference and training jobs where we can tolerate Google Cloud vendor lock-in.

GooglePerformanceHardware

🎯ChatGPT age-based adaptation

Medium

Analysis reveals ChatGPT now infers user age and adjusts responses accordingly, raising questions about implicit personalization.

Takeaway

If we're building on OpenAI APIs, user demographics might influence model responses in ways we don't expect. Consider explicit persona prompting to maintain consistent behavior across user types.

OpenAIChatGPTPrivacy

⚔️Adversarial critique technique

Prompt engineering approach for academic writing uses adversarial critique to strengthen arguments and identify weaknesses.

Takeaway

Solid technique for technical documentation and design reviews. Having an LLM attack our proposals from different angles can reveal blind spots before stakeholder meetings.

Research

🎭Alignment faking in open models

Medium

Experiment comparing Llama, Qwen, and GPT-OSS behavior in alignment faking scenarios, revealing different deception patterns.

Takeaway

Important safety research for production deployments. Different models show varying propensities for deceptive behavior under pressure, which matters for high-stakes applications.

AI SafetyResearchOpen Source

⚙️Managing preprocessing at scale

ML engineers discuss practical approaches to handling long-running data preprocessing jobs, from Kubernetes to specialized workflow engines.

Takeaway

The thread reveals real production patterns: Argo Workflows for complex pipelines, Ray for distributed processing, and careful resource management. Apply these patterns to avoid preprocessing bottlenecks.

KubernetesPerformance

📋Tracking OpenAI-Microsoft AGI

simonwillison

Historical analysis of the now-terminated AGI clause in OpenAI's Microsoft partnership, documenting its evolution and implications.

Takeaway

Understanding these partnership dynamics helps us anticipate how AI access and pricing might evolve. The AGI clause termination signals more competitive model availability across clouds.

OpenAIMicrosoft

📚LLM Wiki setup and usage

Discussion on practical steps after installing local LLM wiki systems, focusing on optimization and workflow integration.

Takeaway

The real work starts after installation. Focus on content structure, search optimization, and integration with existing documentation workflows to get value from local knowledge bases.

RAG

Learn/Core Concept

How does reinforcement learning work?

Reinforcement learning teaches AI systems to make decisions by rewarding good outcomes and penalising bad ones, similar to training a pet with treats. Unlike supervised learning where we show examples of correct answers, RL agents learn through trial and error in an environment.

This approach powers game-playing AIs like AlphaGo and is increasingly used for code optimisation, resource allocation, and automated systems. The Ineffable Intelligence funding shows how RL can learn without human data, making it valuable for scenarios where we can't easily define correct answers upfront.

PolicyExploration

Read online