Releases, announcements, and updates.
## VoiceVibeCode v0.11.0 **System Requirements:** macOS 15.0+ (Sequoia), Apple Silicon **Download:** `VoiceVibeCode-0.11.0-arm64.dmg` — signed + notarized ### What's New - **Local Speaker Verification (Voiceprint)**: Enroll your voice once from the new **Voiceprint** settings tab. When enabled, VoiceVibeCode only transcribes audio that matches your voiceprint — nearby coworkers or background voices are ignored. All inference runs locally via FluidAudio/CoreML; no audio leaves your Mac. - **Voiceprint-aware tmux auto-Return**: In tmux mode, after a pause-based segment is inserted, the 3-second auto-Return timer is now only cancelled when the detected voice activity is verified as **your** voice. A colleague speaking nearby will no longer prevent the automatic Return. - **Voiceprint settings**: New settings page with enable toggle, similarity threshold slider, 5-second enrollment recording, re-enroll/clear controls, and a live similarity test tool. ### Improvements - ASR segments that fail voiceprint verification are discarded before Whisper/Qwen transcription, saving local inference time. - Voiceprint model is downloaded on first use from HuggingFace and cached locally under `~/.cache/fluidaudio/Models/`. ### Bug Fixes - Fixed an issue where any audio activity (including background voices) would cancel the tmux 3-second auto-Return timer. ### Notes - Voiceprint is disabled by default. Open Settings → Voiceprint to enroll. - If you change microphones or your voice changes significantly, re-enroll your voiceprint for best accuracy. - This build is signed with Developer ID Application: Kehong Liu (5MFYYSM9G3) and notarized by Apple.
View on GitHub →## VoiceVibeCode v0.10.13 **System Requirements:** macOS 15.0+ (Sequoia), Apple Silicon **Download:** `VoiceVibeCode-0.10.13-arm64.dmg` — signed + notarized ### What's New - **Smarter tmux auto-Return**: When VoiceVibeCode detects a tmux environment and a segment is inserted because of a natural pause, it now waits for **3 seconds of silence after the text appears** before pressing Return. If you keep speaking, the timer is cancelled and no Return is sent, so you can chain commands without accidental submissions. - **Chinese "回车" trailing command**: Saying "回车" at the end of an utterance now immediately sends a Return, just like "Enter" / "开干吧" / "请回答". - **Microphone Guard**: Added an Audio settings tab where you can define a prioritized list of preferred microphones. VoiceVibeCode will follow your priority order when multiple input devices are connected, without changing the system default microphone. - **Smoother microphone HUD**: Fixed the real-time microphone level glow from jittering during recording. - **More reliable clipboard paste**: Serialized pasteboard save/set/paste/restore operations to prevent clipboard state cross-contamination. - **Better quiet-speech handling**: Lowered silence/RMS thresholds so soft-spoken trailing words are no longer dropped. - **Qwen3-ASR offline models**: Local cached Qwen3-ASR models now load correctly with `offlineMode: true`, removing the previous network dependency. ### Bug Fixes - Fixed a Swift 6 concurrency warning in the new tmux auto-Return timer. ### Notes - This build is signed with Developer ID Application: Kehong Liu (5MFYYSM9G3) and notarized by Apple.
View on GitHub →## VoiceVibeCode v0.10.12 ### Fix - Fixed tmux environment detection failing when tmux was installed via Homebrew - detectTmuxEnvironment() now uses the app's own tmux path lookup (tmuxBinPath) - Auto-Return after pause-based segmentation now works reliably in tmux workflows ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.11 ### New Feature - Auto-press Return after pause-based segmentation when the active target is inside tmux - When you stop speaking for the configured pause duration (default 3s) in a tmux/terminal pane, the transcribed segment is inserted and Return is automatically sent - Useful for submitting commands to Claude Code / Codex via voice without saying 'Enter' ### Notes - Only triggers for pause-based segmentation; Space key, mute, and max-window segmentation still insert text without auto-Return - The pause duration is controlled by Settings → Advanced → Pause Segmentation Duration ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## NovaMLX v1.0.8 **System Requirements:** macOS 15.0+ (Sequoia), Apple Silicon **Download:** `NovaMLX-1.0.8-arm64.dmg` — signed + notarized SHA-256: `c870a363deff793e8b8ca07c8b7858ed4b6a295e1569931878c054d4bc183fe3`
View on GitHub →## VoiceVibeCode v0.10.10 ### Fix - Recognize Chinese voice command '回车' / '回车。' as a trailing Enter command - Works alongside the existing English 'Enter' command - Examples: 'submit, 回车', '提交,回车', '回车。' ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.9 ### Important Fix - Local Qwen3-ASR models no longer try to access the network when weights are already cached - Forces offline-mode loading when local weights exist, preventing TLS/network errors from blocking startup - Downloading a new model still uses the network - Once cached, models work fully offline (e.g. on a plane) ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.7 ### UI Improvements - Microphone glow is more responsive: animation removal is now done once on first real audio instead of on every 64 Hz callback - Raised glow gain (2.0 → 2.4) so quiet speech is more visible - Added a 12 pt right inset to floating status text for balanced spacing ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.6 ### UI Adjustment - Added a 28 pt left inset to floating status loading text so it no longer crowds the spinner icon ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.5 ### Fix - Fixed occasional rapid microphone-glow jitter while preserving real-time volume feedback - Removed the overly aggressive audio-level smoothing from previous attempts - Disable Core Animation implicit actions on glow-layer updates to prevent overlapping animations at 64 Hz ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.4 ### Fix - Fixed microphone glow becoming unresponsive to volume changes after smoothing - Switched to attack/release audio-level smoothing: rises quickly when volume increases, falls slowly when it drops - Keeps real-time volume feedback while suppressing noise-floor jitter ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.3 ### Fix - Fixed rapid microphone-glow self-oscillation jitter - Applied exponential moving average smoothing to real-time audio level - Reset smoothing value when entering recording state ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.2 ### Fix - Fixed unrelated clipboard content being pasted along with speech output - Serialized all clipboard save/set/paste/restore operations - Restored original clipboard synchronously after paste instead of after a 0.5 s async delay - Added clipboard-state verification logs ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.1 ### Fix - Fixed trailing words or phrases being dropped when speaking quietly - Lowered stop-recording RMS threshold (0.01 → 0.005) - Lowered pause-segmentation silence threshold (0.02 → 0.01) - Lowered mid-segment RMS threshold (0.015 → 0.008) ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.10.0 ### New Feature: In-App Microphone Priority - Added a new **Audio** settings tab - Add and reorder your preferred microphones in a drag-to-sort list - The app automatically picks the first currently available microphone from the list - Works across device plug/unplug and location changes (home to office) - Does **not** change the macOS system default microphone; only affects VoiceVibeCode's own recording ### Changes - Menu-bar Microphone submenu now shows the microphone currently used by the app - System-tray menu has a direct "Open Audio Settings" item ### Fixes - Device list did not refresh automatically when microphones were plugged or unplugged - Priority list was unexpectedly cleared after restart ### System Requirements macOS 15.0+ (Sequoia), Apple Silicon
View on GitHub →## VoiceVibeCode v0.9.3 ### New - **Proactive Recognition (prefetch)** — starts ASR at 0.5s of silence and reuses the result if you keep quiet past the 1s segment trigger. Lower perceived latency. Toggle in the menu bar (off by default). ### Changed - Removed "Playback Last Recording" menu item (declutter). ### Fixed - Empty-string transcription outputs no longer leak through after segmentation. - Localization polish for 9 languages (en/zh-Hans/zh-Hant/ja/ko/fr/de/es/ru). --- **System Requirements:** macOS 15.0+ (Sequoia), Apple Silicon **Download:** `VoiceVibeCode-0.9.3-arm64.dmg` — signed + notarized
View on GitHub →## VoiceVibeCode v0.9.1 **System Requirements:** macOS 15.0+ (Sequoia), Apple Silicon **Download:** `VoiceVibeCode-0.9.1-arm64.dmg` — signed + notarized
View on GitHub →## What's New in v0.9.0 ### Volcengine (火山引擎) Cloud ASR - **New ASR engine**: Volcengine 火山引擎 now available as a cloud-based speech recognition option - **Flash API**: Ultra-fast single-request transcription via `volc.bigasr.auc_turbo` - **Auto fallback**: Falls back to submit+query mode if Flash is unavailable - **No model download needed**: Cloud-based — enter your App Key + Access Key and start immediately - **Test Connection**: Verify your credentials before use ### No More Auto-Download - Whisper and Qwen3 models **no longer auto-download** when selected - **Explicit Download button** for each model in Settings - **Cancel button** on each active download — stop and clean up anytime - **Multiple simultaneous downloads** tracked in a unified list ### Bug Fixes - Error popups replaced with inline Settings messages (no more annoying dialogs) - Fixed `package.sh` symlink failure on repeated builds (`ln -sf`) - Fixed Qwen3-ASR duplicate `RecognitionError` enum conflict ### Settings UI Improvements - Unified model download rows for all engines (Whisper, Qwen3, Volcengine) - Progress bars with percentage + cancel button per download - Green check / red status indicators for model availability
View on GitHub →## What's New - **Two-stage voice command detection**: Regex always runs first, LLM as fallback for ASR drift. Both stages have full command coverage. - **Expanded tmux voice synonyms**: 任务/工作 as 窗口 equivalents, ordinal patterns (第N个/项), action prefixes (看/看看/去看), execution triggers (开始干吧, do it now/please) - **Auto-submit phrases**: 请你先理解一下, etc. — preserves original text + appends Enter - **Ctrl+Cmd+V paste-last**: Now works in terminal, browser, and Word. PTT mode pastes full recording (all segments joined), Toggle mode pastes last segment only. - **Fixed muted glow stuck at max brightness** after long mute periods ## Bug Fixes - Segment path now updates `lastRecognizedText` for paste-last support - Mute transition properly stops simulated audio timer and resets glow to baseline
View on GitHub →## Bug Fixes - **Fixed glow/audio freeze during long recordings** — the floating mic indicator would stop glowing and audio level tracking would freeze during extended recording sessions. Root cause: voice activity timestamps were not updated while a segment was being processed, causing false "long silence" triggers that cascaded into broken segmentation. Voice activity is now always tracked regardless of processing state. - **Fixed text loss after watchdog recovery** — when the audio engine died and was automatically recovered by the watchdog, any buffered audio was discarded during restart. The watchdog now saves accumulated audio before restarting the engine and processes it after recovery, so no spoken text is lost. - **Fixed potential hang on stop** — when stopping recording while a segment was still being processed (Whisper/LLM), the stop handler would wait indefinitely. Now has a 30-second safety timeout that force-resumes if processing gets stuck. ## New Features - **LLM-based voice command detection** — when LLM post-processing is enabled, voice commands (tmux, edit shortcuts, execution intent) are now detected by the LLM via structured JSON output instead of regex. This provides better accuracy, handles Whisper misrecognitions, and correctly distinguishes between executing a command and talking about it. Falls back to regex if LLM JSON parsing fails 3 times. - **tmux last-window command** — new voice command to switch to the previously active tmux window: "上一个窗口" / "返回窗口" / "last window" / "previous window" - **Anti-false-positive command detection** — the LLM now only recognizes commands when the user is requesting immediate execution, not when describing or mentioning the command in conversation. ## Languages - All 9 languages updated with new voice command descriptions
View on GitHub →## VoiceVibeCode v0.8.0 ### Voice-Controlled Line Editing - Say "行首/行尾" (or English "home/end") to send Ctrl+A/E — jump to start/end of line in terminal - Say "删到行首/删到行尾/全删" to send Ctrl+U/K/W — delete text using readline shortcuts - Say "粘贴/恢复" to send Ctrl+Y — yank/paste back deleted text - Handles Whisper homophone misrecognition (首/手, 行/航, 到/道) ### Tmux Voice Control - Say "tmux 1/2/3" or "去窗口一/二" to switch tmux windows - Say "去面板一/二" to switch tmux panes - Works with Chinese, English, and Japanese voice commands - Uses tmux CLI (not key simulation) for reliable switching ### Smart Execution Intent - Say "开干吧/开始吧/回车吧" at the end of your sentence → text is inserted + Enter is pressed - Say the trigger word alone → just presses Enter (no text) - Self-referential guard: if you're *talking about* the trigger phrase, it won't fire - Works in Chinese, English, Japanese, and Korean ### Settings Toggles - Three new toggles in Settings → General: Tmux Voice Control, Line Editing Shortcuts, Execution Intent - Each toggle shows the list of recognized voice commands in your UI language - All toggles default to ON ### UI Improvements - Floating mic window: AI processing animation layers behind the microphone icon - Menu bar: SF Symbols icons replace emoji for professional look - Option+M: mute/unmute without stopping recording (audio flushes before muting) ### Bug Fixes - Fix Chinese IME intercepting voice-to-text output (now uses clipboard paste) - Fix tmux commands failing on Apple Silicon (Homebrew path resolution) - Fix execution intent not detecting Chinese triggers when language is set to "auto" - Fix standalone trigger words not pressing Enter ### Full Changelog https://github.com/cnshsliu/VoiceVibeCode/compare/v0.7.0...v0.8.0
View on GitHub →- refactor: remove temporary language override, simplify Option+number to permanent language switch - chore: bump version to 0.7.0
View on GitHub →# VoiceVibeCode v0.5.0 We're excited to announce VoiceVibeCode v0.5.0 — a major update focused on the first-time user experience and visual identity. ## What's New ### Permission Guide (First Launch) New 3-step onboarding flow that walks you through granting **Accessibility**, **Input Monitoring**, and **Microphone** permissions. Each step auto-detects when you've granted the permission and advances automatically. No more guessing what's missing. ### App Icon VoiceVibeCode now has its own icon — a retro microphone on a blue-purple gradient with sound wave arcs. Looks great in the Dock and Finder. ### Settings Window in Dock The Settings window now appears in the Dock and CMD+TAB, making it easy to find your way back after switching apps. ## Bug Fixes - Permission guide no longer blocks macOS system security dialogs behind its own window - Permission guide window now closes correctly when clicking "Done" - Removed unnecessary file system path scanning that triggered extra macOS privacy prompts ## Download - `VoiceVibeCode-0.5.0-arm64.dmg` — Apple Silicon (M1/M2/M3/M4) - Requires macOS 15.0 (Sequoia) or later
View on GitHub →# VoiceVibeCode v0.4.0 ## What's New - **Stable Developer ID signed & notarized release** — First official production-ready build with proper Apple Developer ID signing and notarization. - Full hardened runtime + entitlements support. - Improved reliability for real-time voice transcription + LLM post-processing. - Better floating window and hotkey handling. - Multiple LLM provider support (Claude, custom endpoints, etc.). - Local WhisperKit model support with download-on-first-run experience. ## System Requirements - macOS 15.0 (Sequoia) or later - Apple Silicon (arm64) — Intel builds available on request ## Installation 1. Download `VoiceVibeCode-0.4.0-macOS-arm64.dmg` 2. Open the DMG and drag VoiceVibeCode to your Applications folder 3. Launch the app — it will download the base Whisper model on first run (~74MB) ## Notes This build has been signed with: `Developer ID Application: Kehong Liu (5MFYYSM9G3)` And has passed Apple's notarization process. ## Previous Work This release represents the culmination of extensive development across core features including: - Real-time local speech recognition (WhisperKit) - Intelligent LLM post-processing and correction - Seamless text insertion via simulated keystrokes - Customizable hotkeys and floating status window - Support for multiple languages and post-processing strategies For more details, see the project repository.
View on GitHub →## What's New in v0.3.0 ### Smart Context Engine - **Browser context**: Captures visible viewport text (including input fields) when typing in Safari/Chrome/Edge - **Word/Pages/TextEdit context**: Captures text around cursor position for document editing - **tmux context**: Uses `tmux capture-pane` for real-time terminal content (no longer reads JSONL files in tmux) - **Frontmost-app dispatch**: Context source is determined by the active application — no priority chains ### Hotkey Overhaul - **Unified tap + PTT**: Both styles work simultaneously — double-click Option (tap) or hold Option+Space (PTT) - **Language slots**: ⌥1/2/3/4 for temporary language, ⌃⌥1/2/3/4 for permanent switch - **No mode setting needed**: Auto-detected based on user action ### Clipboard Preservation - User clipboard content is automatically saved and restored after each paste (500ms delay) - Works correctly with segmented output — clipboard always returns to user's original content ### Pause Segmentation Improvements - Segment history carried into LLM post-processing for contextual awareness - RMS noise rejection: silent segments are discarded before Whisper (prevents hallucination garbage) - Enhanced hallucination detection: parenthesized noise words, repeated patterns ### Settings & Reliability - Resilient per-field JSON decoding — a single field type mismatch no longer nukes all settings - Corrupted config files are backed up before reset - LLM prompts now include confusable tech term pairs (Color↔Cluster, Server↔Service, etc.) ### Localization - All 9 languages updated for new hotkey descriptions
View on GitHub →UI: shorten LLM sidebar label to "Post-processing"
View on GitHub →## VoiceVibeCode v0.1.1 ### Bug Fixes - Fixed floating window LLM error text invisible (z-order fix) - Fixed DISPATCH_SYNC deadlock crash during text insertion - Fixed recording failures silently ignored - Fixed tmux detection spawning failing subprocesses repeatedly ### Performance - Replaced all UI-blocking `usleep()` calls - Added 5s timeout for subprocess runner - Extended clipboard restore timer for slow apps ### UI & Polish - New About tab with version, website, download links - Localized 10 hardcoded strings (Chinese/English) - Unified Settings tab styling and hotkey formatting - VoiceOver accessibility support - Removed browser JavaScript injection (clipboard paste is universal) **Requires:** macOS 15.0+ (Apple Silicon) **Website:** https://novamlx.ai/vvc/
View on GitHub →## What's New in v1.1.0 ### New Features - **Python SDK** (`sdk/python/`) — full client library with admin, streaming, tool calling examples and tests - **Tokenhub integration** — types + menu bar page - **Audio transcription** (`/v1/audio/transcriptions`) — Qwen3-ASR via Swift/MLX - **Image generation** (`/v1/images/generations`) — SDXL-Turbo via Swift/MLX - **Modelfile system** — user-authored model recipes with system prompt and sampling overrides - **Per-request `keep_alive`** — override model TTL per request - **Harmony streaming** — GPT-OSS channel-aware format - **`reasoning_effort` parameter** — OpenAI-standard thinking budget control - **Logprobs support** — `logprobs` and `top_logprobs` (OpenAI standard) - **Auto-load coordinator** — SSE keep-alive for cold model loads - **Nova capabilities** — `nova.capabilities` exposed on `/v1/models` ### Logging Overhaul - **Rotating log files** — keeps last 5 rotated copies instead of truncating - **Runtime log level** — `GET/PUT /admin/api/log-level` admin endpoint - **Spam reduction** — SSE, RunLoop, generate noise demoted to debug - **Module prefix convention** — all logs use `[Module]` format (Engine, SSE, Auth, etc.) - **AuthClient fix** — replaced per-call file I/O with `os.Logger` ### Infrastructure - **E2E model test suite** — `Scripts/test-all-models.sh` (load → 4 API tests → unload per model) - **Architecture doc** — comprehensive `architecture.md` with module deep dives, request lifecycle, diagnostic playbook - **Updated docs** — CHANGELOG, DEVELOPMENT.md, features.md, features.zh-CN.md with corrected ports and new sections - **`.gitignore`** — added build artifact patterns, vendors, .grok ### Bug Fixes - Tool message mapping preserves `tool_calls` and `tool_call_id` (OpenAI + Anthropic) - Streaming `prompt_tokens` plumbed through to usage stats - Benign macOS memory pressure warnings demoted from WARN to DEBUG - SSE `finished(nil)` demoted from WARN to DEBUG ### Full Changelog - feat(api): add reasoning_effort parameter (OpenAI standard) - feat(api): add logprobs and top_logprobs support (OpenAI standard) - feat(api): add auto-load coordinator with SSE keep-alive for cold loads - docs: consolidate TODOs into TODO.md and retire P1.1 (VRAM recovery) - test(scheduler): chaos tests + assertions for race regression coverage - test(api): tool message mapping edge cases (OpenAI + Anthropic) - feat(api): expose nova.capabilities on /v1/models - chore(log): demote benign macOS .warning pressure to debug - fix(api): plumb prompt_tokens through to streaming usage - chore(log): demote SSE finished(nil) WARN to DEBUG - feat: audio transcription, image generation, modelfiles, keep_alive, harmony streaming - feat: logging overhaul, tokenhub, Python SDK, docs update
View on GitHub →## What's New OpenAI API compatibility improvements: - **`reasoning_effort` parameter** — Maps OpenAI's `reasoning_effort` (low/medium/high) to native thinking token budgets. Works with all thinking-capable models (Qwen3, DeepSeek-R1, etc.). - **`logprobs` + `top_logprobs`** — Full log probability support matching the OpenAI standard. Returns token log probabilities and top alternatives per token when requested. Both additions bring NovaMLX closer to drop-in Ollama parity for tools that depend on these fields. **Full Changelog**: https://github.com/cnshsliu/novamlx/compare/v1.0.8...v1.0.9
View on GitHub →## What's Changed ### Thinking-Budget Enforcement (closes T5–T8 at temp=0) - **New `ThinkingBudgetProcessor`** — production-grade thinking-budget enforcement for reasoning models at `temperature=0`. When models like Qwen3.6 greedy-decode complex prompts, they can lock into chain-of-thought phase and exhaust `max_tokens` without producing response content. The processor counts generated tokens and forces close-marker emission when budget is exceeded — same primitive as vLLM/SGLang/llama.cpp `max_thinking_tokens`. - **Smart default**: `min(1024, max(256, maxTokens/2))` at temp=0 for thinking models. Opt-out via `thinking_budget=0`, custom via `thinking_budget=N`. - **`isImplicitThinkingModel` rewrite** — fixed mis-classification of Qwen3.6 as explicit-thinking when it's actually implicit (chat template injects `<think\n`). 4-step decision tree now correctly handles Qwen3.6, DeepSeek-R1, and non-thinking models. - **Extended `ComposedLogitProcessor`** to 4-slot chain: `penalty → grammar → turnStop → thinkingBudget`. ### VLM LogitProcessor Chain Fix - Both VLM paths now thread `LogitProcessor` chain (penalty + TurnStop + hallucination detection). Previously VLM requests silently bypassed repetition penalty and multi-token hallucination detection. ### Strict-FSM JSON Logit Processor - Rejects raw control chars (`\n`, `\r`, `\t`) inside JSON strings at both FSM step and mask precompute levels. Escaped sequences allowed. 15 unit tests. ### Chat Template Library & TokenMaskBuilder - Chat template format detection library with confidence scoring. `TokenMaskBuilder` cache for efficient logit masking. ### DeepSeek-V4 Lite Test Suite - 7 tests covering model registration, family routing, chat template detection, indexer contract, and family guard. ### ThinkingParser Regression Suite - 13 tests covering Qwen3.6 explicit markers, streaming chunk-boundary fuzz, mixed markers, implicit-detection fixtures, and `enable_thinking=false` regex behavior. ### Build & Infra - `build.sh` now runs idempotent post-build sync (Mach-O UUID comparison + codesign). Bypass with `NOVAMLX_SKIP_DIST_SYNC=1`. ### Full Changelog - feat: strict-FSM JSON logit processor, chat template library, TokenMaskBuilder cache, thinking detection overhaul - feat: VLM LogitProcessor chain, ThinkingParser regression tests, build.sh sync, GUI models path - test: DeepSeek-V4 lite regression suite (7 tests) - docs: close §2.10 DeepSeek-V4 lite test suite in todo.markdown - feat: ThinkingBudgetProcessor + isImplicitThinkingModel rewrite, close §2.12
View on GitHub →## What's New in v1.0.2 ### Safe Inference Defaults - **Default frequencyPenalty=0.5** prevents repetition collapse in small quantized models without requiring user parameters - **Default maxTokens lowered to 2048** prevents runaway generation for bare API requests - **Fixed temperature=0 override bug** explicit temp=0 was being overridden to 0.6 ### FusedBatchScheduler Improvements - **Frequency penalty in fused decode loop** GPU-based scatter_add penalty prevents repetition collapse - **Accumulated batch decode** fixes whitespace stripping in SentencePiece tokenized models - **Control token filtering** protocol tokens no longer leak into output ### Other Changes - Agent-aware context scaling with ClientDetector - N-gram speculative decoding in FusedBatchScheduler - ProcessMemoryEnforcer for memory pressure handling - OCROptimizer with model-specific sampling overrides - UI overhaul, deadlock fix
View on GitHub →**Full Changelog**: https://github.com/cnshsliu/novamlx/compare/v1.0.1...v1.0.0
View on GitHub →