What's New in Privacy AI
Stay up to date with the latest features and improvements
Version 1.5.4
Features
- Rethink Button A new "Rethink" button allows you to resume the AI's thinking process without re-executing tool calls. This is useful when you want the AI to continue reasoning from where it left off without triggering the same tools again. It is perfect for exploring alternative interpretations, or extending analysis without incurring additional API costs from repeated tool executions.
- Min_P Sampling Parameter Support Privacy AI now supports the Min-P (Minimum Probability) sampling parameter for compatible providers including OpenRouter and DeepSeek. Min-P offers an advanced alternative to Top-P (nucleus sampling) by filtering tokens based on a minimum probability threshold relative to the most likely token. This gives you fine-grained control over output quality and diversity—higher values (e.g., 0.1) produce more focused and consistent outputs, while lower values (e.g., 0.01) allow greater creative diversity.
- Interleaved Thinking Support Privacy AI now supports interleaved thinking (reasoning expansion) for advanced reasoning models from multiple providers, such as DeepSeek (deepseek-reasoner), OpenRouter (minimax-m2, deepseek-v3.2, gpt-5.2, claude-opus-4, kimi-k2-thinking), MiniMax (minimax-m2), Kimi/Moonshot (kimi-k2-thinking), and Google Gemini (gemini-3-pro-preview). This feature preserves the model's internal reasoning process across multi-turn conversations and tool calls, improving reasoning continuity, multi-step problem solving, complex task planning, and decision-making quality. The app automatically detects and handles provider-specific reasoning formats (reasoning_content, reasoning, reasoning_details) and provides helpful error messages when models require this setting to be enabled. Enable Interleaved Thinking in chat settings for immediate use, or in model settings for persistent configuration across all future chats.
- Claude API Interleaved Thinking Support Claude 3.7 and Claude 4+ models now support native interleaved thinking during multi-turn tool call conversations. When enabled via the existing "Include Thinking in History" setting, Privacy AI preserves Claude's internal reasoning blocks (with signatures) across tool execution chains using the interleaved-thinking-2025-05-14 beta header. This completes comprehensive interleaved thinking support across all major AI providers: DeepSeek, Gemini, Kimi, OpenRouter, MiniMax, and Claude—ensuring consistent reasoning behavior regardless of which API you use.
- VCard and ICS Support Privacy AI now supports reading and converting contact files (.vcf) and calendar files (.ics) into structured markdown. Share vCard files to view contacts with names, emails, phone numbers, addresses, and other details in an organized format. Import iCalendar files to see events, tasks, and journal entries with dates, locations, attendees, and reminders—perfect for analyzing schedules or exporting calendar data for AI assistance.
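As a rough illustration of the Min-P sampling described above: a token survives filtering only if its probability is at least min_p times the top token's probability. This is a minimal sketch of the mechanism, not Privacy AI's or any provider's actual sampler.

```python
def min_p_filter(probs, min_p):
    """Keep only tokens with probability >= min_p * max(probs)."""
    threshold = min_p * max(probs.values())
    return {tok: p for tok, p in probs.items() if p >= threshold}

probs = {"the": 0.50, "a": 0.30, "zebra": 0.02}
print(min_p_filter(probs, 0.1))   # cutoff 0.05 drops the unlikely "zebra"
print(min_p_filter(probs, 0.01))  # cutoff 0.005 keeps all three tokens
```

This is why higher min_p values yield more focused output (fewer surviving candidates) while lower values leave room for diversity.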
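The interleaved-thinking support above amounts to carrying the provider's reasoning field forward with each assistant message in the history. A minimal sketch, assuming the three provider field names listed in the notes (the helper itself is hypothetical):

```python
# Provider-specific reasoning field names mentioned in the release notes.
REASONING_KEYS = ("reasoning_content", "reasoning", "reasoning_details")

def preserve_reasoning(message: dict) -> dict:
    """Copy an assistant reply into history, keeping any reasoning field."""
    kept = {"role": message["role"], "content": message["content"]}
    for key in REASONING_KEYS:
        if key in message:
            kept[key] = message[key]
    return kept

reply = {"role": "assistant", "content": "42", "reasoning_content": "6 * 7 = 42"}
history = [preserve_reasoning(reply)]
```

Dropping these fields between turns is what breaks reasoning continuity on multi-step tool chains; preserving them keeps the model's chain of thought intact.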
Improvements
- iOS 26.2 Platform Update Privacy AI is now built on iOS 26.2, bringing the latest platform optimizations, security enhancements, and system-level performance improvements to the app. This ensures compatibility with the newest Apple features and provides a more stable foundation for AI operations.
- Improved Tool Call Display Messages with multiple tool calls are now easier to read with automatic expand/collapse functionality. When there are more than 3 tool calls, only the latest 3 are shown initially with a subtle button to expand and view all calls. The toggle uses smooth spring animations and a minimal gray design that doesn't distract from your conversation flow.
- Consistent Citation Card Citation preview cards in the Sources section now display with uniform dimensions (160pt × 160pt), ensuring a clean and consistent visual layout. All citation cards now maintain the same width and height for a more polished reading experience.
- llama.cpp Engine Upgrade (b7332 to b7402) Upgraded the local inference engine with 67 commits of improvements. This update brings an optimized Metal backend for Mamba and SSM (State Space Model) architectures on iOS, delivering better performance on iPhone and iPad. Added support for cutting-edge vision models including Qwen3VL and KimiVL, plus improved CLIP architecture handling for enhanced multimodal capabilities. Threadpool stability has been significantly improved for more reliable multi-threading during long conversations. Also includes a critical YaRN fix for extended context models (128K+), ensuring accurate position encoding at extreme context lengths.
- Tabbed Remote Model Configuration The Remote Model configuration view now uses a clean tabbed interface with three focused sections: Model, Pricing, and Settings. This organizes the many configuration options into logical groups, making it easier to find and modify specific settings.
- Default Model Setting The default model selection has been relocated from the Siri setting page to the system setting page for better organization. All setting views now feature improved UI consistency with a unified theme design across the entire app, providing a more polished and intuitive user experience.
- Chat List Swipe Action Added bottom padding to the chat list to prevent the floating action button from overlapping with swipe-to-delete and swipe-to-pin actions on the last chat item. This ensures you can always access all swipe actions without interference, especially on smaller iPhone screens.
- Quick Ask Keyboard Handling The Quick Ask view now features improved keyboard management: tap anywhere on the response area to dismiss the keyboard, and the text input field uses a more consistent component with proper height sizing for a smoother interaction experience.
- Redesigned Sharing System The share extension has been completely redesigned to support as many file types as possible. New support includes ICS calendar files and vCard contact files, allowing you to share calendar events and contact information directly to Privacy AI for AI-powered analysis and organization.
- Article Retrieval from X Privacy AI can now retrieve and process content from X.com article URLs, converting it into clean, readable markdown. Perfect for analyzing social media content, trending topics, or specific posts with AI-powered insights.
- Enhanced Tool Search with Improved UI Tool searching now supports both tool names and descriptions for better discoverability. The tool toggle UI has been enhanced with concise, human-friendly descriptions that explain what each tool does and when to use it, making it easier to understand and manage your AI tools at a glance.
- Markdown Rendering Optimization Streaming markdown content now renders more smoothly with optimized text processing. The app automatically normalizes list indentation and cleans up formatting during streaming, ensuring consistent and readable text display as AI responses appear in real-time.
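The list-indentation normalization mentioned above can be approximated as snapping each bullet line's leading spaces to a two-space grid. This is a sketch under assumed rules, not the app's actual implementation:

```python
import re

def normalize_list_indent(line: str) -> str:
    """Snap a bullet line's leading spaces to a two-space indent grid."""
    m = re.match(r"^( *)([-*+] )", line)
    if not m:
        return line  # not a bullet line; leave untouched
    depth = len(m.group(1)) // 2
    return "  " * depth + line.lstrip()

print(normalize_list_indent("   - item"))  # three spaces snap to "  - item"
```

Normalizing as tokens stream in keeps nested lists rendering consistently even when the model emits ragged indentation.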
Bug Fixes
- Image Sharing Restored Fixed an issue where users were unable to share images from other apps to Privacy AI. The share extension now correctly handles images and allows users to send photos and screenshots to the AI for analysis, restoring a core workflow capability.
- Clear Chat Messages Fixed an issue where users were unable to clear all messages in a chat. The clear function now works properly and removes all messages as expected, making it easier to start fresh conversations or clean up chat history.
- OpenAI API Compliance Fixed an issue where the 'parallel_tool_calls' parameter was being sent to the OpenAI API even when no tools were present in the request. This parameter is now only included when tools are actually used, ensuring compliance with OpenAI's latest API requirements and preventing unnecessary parameter transmission.
- Markdown Theme Fixed an issue where the markdown color scheme picker always defaulted to light mode when opening settings, regardless of the current system appearance. The color scheme selector now correctly initializes based on your current theme (light or dark mode), ensuring the displayed setting matches your actual viewing experience.
- Reader Context Menu Actions Fixed an issue where Reader context menu actions (Copy, Select & Edit, Share, Read Aloud) failed to work after migrating to the new cache structure.
- Whisper Model Retry Failure Fixed an issue where users encountered errors when retrying failed Whisper model downloads. The app now properly cleans up partial download folders before attempting a retry. This ensures users can successfully download Whisper models after network interruptions or other failures.
- Duplicated Model Settings Fixed an issue where changes made to a duplicated remote model could not be saved. When you duplicate a model and modify its settings, the changes now persist correctly to the database and appear immediately when you reopen the model detail view.
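The OpenAI API compliance fix above boils down to conditional parameter inclusion. A hedged sketch (the request builder is hypothetical; only the parameter name comes from the notes):

```python
def build_request(model, messages, tools=None):
    """Attach tool-related parameters only when tools are present."""
    body = {"model": model, "messages": messages}
    if tools:
        body["tools"] = tools
        body["parallel_tool_calls"] = True  # omitted entirely when no tools
    return body

req = build_request("gpt-4o", [{"role": "user", "content": "hi"}])
print("parallel_tool_calls" in req)  # False: parameter not sent without tools
```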
Version 1.5.3
Features
- ZIP Archive and Multi-Sheet Excel Support The Reader now supports opening ZIP archives and displaying multiple files within. Excel files with multiple sheets are properly handled, allowing you to switch between sheets seamlessly. This is powered by the new ProcessedContentCollection system that handles multiple content items from a single source, making it easy to work with complex documents and compressed archives without manual extraction.
- OpenRouter Protocol Support Privacy AI now includes dedicated OpenRouter protocol support with native integration for OpenRouter-specific features. Now you can configure web search capabilities with the :online suffix, choose between native or Exa search engines, and control max results and context size—all through a dedicated settings panel. OpenRouter models are tagged with an "openrt" indicator for easy identification. This extends beyond standard OpenAI compatibility to unlock OpenRouter's full feature set while maintaining the same debugging and inspection tools you're familiar with.
- Media Discovery Tools Three new search tools for books, podcasts, and games. Search books via Google Books with NYTimes bestseller rankings. Discover podcasts through iTunes/PodcastIndex with episode transcripts. Browse Steam store for games, pricing, deals, and news. All tools work without setup and include URL citations for easy reference.
- Talk to Document The Reader now includes a built-in chat interface that lets you have AI-powered conversations directly with your documents. Switch between "Read" and "Talk" tabs using a segmented picker at the top of the screen. In Talk mode, select any AI model and ask questions—the full document content is automatically linked to your chat session, giving the AI complete context to provide accurate, document-aware answers. Perfect for analyzing long reports, researching articles, or extracting insights from technical documentation without leaving the reading experience.
- Quick Actions Long-press the Privacy AI app icon to access Quick Actions. Tap "Quick Ask" to open a streamlined interface for asking questions with real-time streaming responses. Quick Ask uses the same default model configured for Siri in Settings. Perfect for quick queries on the go and will not pollute your chat history.
- Duplicate Model You can now duplicate any remote model configuration with a single tap. All settings are copied from the original: API key, server URL, system prompt, tools configuration, and parameters. The duplicated model opens immediately for editing, making it easy to create variations.
- Chat Organization Conversations are now grouped by year and month (e.g., "December 2024", "November 2024") for easier navigation. Each month section includes a delete button that lets you remove all chats from that period at once, making it simple to manage and clean up old conversations.
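The year-and-month grouping above can be sketched with a date-keyed dictionary; the chat fields here are hypothetical stand-ins, not the app's data model:

```python
from collections import OrderedDict
from datetime import date

def group_by_month(chats):
    """Group (title, created_date) pairs into 'Month Year' sections."""
    groups = OrderedDict()
    for title, created in chats:
        key = created.strftime("%B %Y")  # e.g. "December 2024"
        groups.setdefault(key, []).append(title)
    return groups

chats = [("Trip plan", date(2024, 12, 3)), ("Recipes", date(2024, 11, 21))]
print(list(group_by_month(chats)))  # ['December 2024', 'November 2024']
```

Because each section key maps to a list of chats, a per-month delete button only needs to remove everything under one key.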
Improvements
- Collapsible Message Previews Chat messages are now automatically collapsed to approximately 5 lines for a cleaner reading experience and faster scrolling through conversation history. Tap "Show more" to expand any message and see its full content. The most recent message always stays expanded so you can immediately see the latest response without extra taps.
- llama.cpp Engine Upgrade (b7222 to b7332) Upgraded the local llama.cpp engine with critical stability improvements for iOS devices. This update introduces a Metal residency-set keep-alive mechanism that significantly improves GPU memory stability during long inference sessions, preventing memory eviction on iOS. Also includes a CUDA Flash Attention FP16 overflow fix, a Mach-O build fix for iOS/macOS compatibility, and support for the new Rnj-1 model. Expect more reliable performance during extended conversations with local models.
- Expanded File Type Support Privacy AI now supports 30+ additional file types for direct reading and analysis. You can now open subtitle files (SRT, VTT), markup formats (XML, YAML), code files, vector graphics (SVG), and audio files (FLAC). Code files are displayed with proper syntax recognition, making it easier to share and analyze source code, configuration files, and technical documentation directly within the app.
- Rich Preview Cards for Citation Links Reference and citation links in markdown now render as portrait-style preview cards. Each card displays the website favicon, domain name, and multi-line title for better readability. Smart caching ensures previews load quickly on subsequent views, making it easier to identify and navigate to source materials without leaving your conversation.
- Enhanced AI Model Pricing Data AI model price data now includes detailed capability information and metadata for supported models. The pricing view displays model capabilities such as attachment/vision support, extended reasoning, tool calling, structured output, and temperature control. Additional metadata includes knowledge cutoff dates, release dates, open weights status, and dedicated cache pricing fields—helping you make more informed decisions when selecting models for your workflows.
- Optimized Rating Prompt Logic The in-app review system now uses smarter milestone-based triggers for better user engagement.
- New Siri Shortcut 'Ask Privacy AI' Added a new App Shortcut 'Ask Privacy AI' that allows you to quickly ask questions directly from Siri or the Shortcuts app.
- Enhanced HTML to Markdown Conversion Improved HTML to Markdown conversion to better preserve heading content and list items. Web pages and shared content now convert more accurately, resulting in cleaner and more readable Markdown output.
- Table and Image Indexing in Reader Tables and images in markdown documents now appear in the Reader's outline navigation using level 6 headers (######). This makes it easier to jump directly to specific tables or images when navigating long documents, improving the overall reading and reference experience.
- Auto-Saved Message Drafts Chat view now automatically saves un-submitted messages, so your draft text is restored when you return to the same chat. The app saves your input text as you type and restores it when you reopen the chat.
Bug Fixes
- Memory Import Infinite Loop Fixed an infinite loop issue that could occur when importing chat messages into memory. The system now creates a clean model instance for memory evaluation instead of reusing the current chat model, preventing circular references and ensuring stable memory operations.
- Streaming Stop Control Fixed an issue where streaming text would continue to grow after pressing the chat's stop button. Streaming responses now halt immediately when you tap stop, ensuring instant control over AI generation and preventing unwanted text accumulation.
- Crash with Long Paragraphs Fixed a crash that occurred when loading very long markdown paragraphs. The app now properly splits large content into smaller sections to prevent stack overflow during layout calculation, ensuring smooth and stable reading of lengthy documents.
- Message Resending Context Loss Fixed a bug where the system prompt was not properly restored when resending a query, causing the AI to lose context and respond incorrectly. When you resend a message, the AI now maintains the same system instructions, ensuring consistent and accurate responses.
- Shared Content Processing Fixed an issue where sharing content with both a text query and file attachment would send the query immediately before the attachment finished processing. Text and attachments now send together after processing completes, ensuring the AI receives your complete message.
Version 1.5.2
Features
- File Processing Mode in Sharing UI When sharing files to Privacy AI, you can now choose how the content is handled before sending it to the model. Send as is preserves the original file format. Convert to Markdown extracts text and converts it into lightweight Markdown to reduce token usage—recommended for most workflows. Your last selection is remembered across sessions, so you don’t need to reselect it each time.
- HuggingFace API Privacy AI now supports the HuggingFace Inference API. Use your HuggingFace token to access 5,000+ models with the same debugging experience available for the OpenAI and Claude protocols. All outgoing requests are captured by the built-in inspector, and the integration works seamlessly with mitmproxy and Charles Proxy for complete end-to-end API debugging.
- Nous Research Provider Nous Research is now available as a new remote provider. Connect directly to their API to access additional cutting-edge models and integrate them into your workflows just like other cloud providers.
- Markdown Code Block Syntax Code blocks in Markdown now render with syntax highlighting for 40+ programming languages. AI-generated code is easier to read and automatically adapts to light and dark mode for a consistent viewing experience.
- Max Tool Calls Control You can now set a maximum number of tool calls per response in each model’s settings. This gives you precise control over the trade-off between thorough reasoning and cost/performance. Higher limits (10–20) allow deeper research flows with multi-step tool use; lower limits (3–6) prioritize faster and more economical responses. This setting is saved per model and preserved inside chat configurations, with a default value of 6.
- Network Traffic Inspection via mitmproxy All protocols—OpenAI, Claude API, HuggingFace API, and OpenAI Responses API—now support full network traffic inspection through mitmproxy. This makes it easy to diagnose protocol issues, inspect raw request/response payloads, and understand how data flows through each provider. Enable proxy mode under Settings > Proxy.
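The per-response cap from Max Tool Calls Control can be sketched as a loop guard around the tool-calling cycle. Everything here (the model and tool stand-ins) is hypothetical; only the default of 6 comes from the notes:

```python
def run_turn(ask_model, run_tool, max_tool_calls=6):
    """Loop model <-> tool until a final answer or the call cap is hit."""
    calls = 0
    reply = ask_model(None)
    while reply.get("tool") and calls < max_tool_calls:
        calls += 1
        reply = ask_model(run_tool(reply["tool"]))
    return reply, calls

# Stand-in model that keeps requesting a tool; the cap stops the loop.
state = {"n": 0}
def fake_model(_):
    state["n"] += 1
    return {"tool": "search"} if state["n"] <= 10 else {"text": "done"}

reply, calls = run_turn(fake_model, lambda t: "result")
print(calls)  # 6: execution stops at the default cap
```

A higher cap lets deep research flows run longer; a lower cap trades depth for speed and cost, exactly the trade-off the setting exposes.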
Improvements
- llama.cpp Engine Upgrade (b7140 to b7222) This release brings a major performance upgrade to the local inference engine. The Metal backend now supports Flash Attention with head size 48, improving memory efficiency and speed. ARM CPU performance for Q4_K quantized models has been significantly boosted through optimized i8mm and dotprod instructions—resulting in faster inference across all Apple Silicon devices (M1/M2/M3/M4 Macs and iPhone 13+). Added support for the Ministral-3-3B model and improved compatibility with LFM2-VL vision models. The engine now also supports model-embedded sampling parameters, enabling GGUF files to specify their own recommended generation settings for more consistent outputs.
- Accurate Content Detection The attachment previewer now detects the true document type after import and conversion. When sharing documents, spreadsheets, audio, video, or subtitle files, the previewer automatically identifies the resulting format (Markdown, plain text, JSON, CSV, etc.) and displays the correct viewer.
- Large Markdown File Rendering The Reader now handles large Markdown documents with significantly improved performance. Scrolling is smoother, memory usage is reduced, and multi-chapter books, long articles, and technical documentation load more efficiently.
Bug Fixes
- Action Extension Content Handling Fixed an issue where the Action Extension failed to process certain shared content types. Screenshots (PNG), images, text snippets, and other file formats now attach reliably when shared to Privacy AI. Previously, sharing screenshots could result in empty attachments.
- Attachment Preview Improvements Fixed a UI freeze that could occur when previewing large documents. Text now loads promptly and scroll performance is noticeably smoother, especially for large chat histories or extensive Markdown files.
- PDF Outline Position Accuracy Fixed a bug in the PDF-to-Markdown converter where section headers appeared at the bottom of the document instead of in their correct positions. The outline extraction logic has been rewritten from a recursive approach to a simpler, more reliable loop structure that respects page boundaries and preserves the original hierarchy.
Version 1.5.1
Features
- Claude API Protocol Support Privacy AI now supports the Claude API protocol, enabling direct connections to Claude models, Z.ai GLM4.6, and Kimi K2 Thinking with full support for multimodal inputs (text, images, PDFs), tool calling, and extended thinking modes. The built-in protocol inspector captures all API requests, responses, and SSE streaming events in real-time, making it easy to debug API interactions and verify behavior. Compatible with Anthropic's official endpoint and any third-party providers supporting the Claude API standard. If you're already a Z.ai or Kimi Code Plan member, you can use your existing API key to connect their endpoints and chat at no extra cost.
- Max Tokens Parameter Support Added support for the max_tokens parameter for both local and remote models. You can now precisely control the maximum number of tokens generated in each response through the model settings interface. This parameter is fully integrated with the model configuration system, and works seamlessly with all protocols.
- Multiple Named Endpoints for API Keys API keys now support multiple named endpoints, allowing you to configure different endpoint URLs under a single API key. This is perfect for services that expose multiple endpoints with the same authentication (e.g., GLM4.6 / Kimi K2 Coding Plan). Each endpoint can have its own name and URL, making it easy to organize and switch between different configurations while reusing the same API credentials. Manage all your endpoints directly in the API Key detail view for streamlined configuration.
- Local and iCloud Sync You can now sync chats and data between local storage and iCloud Drive with dedicated sync buttons in Settings. The sync operation skips existing files to prevent overwrites, provides real-time progress tracking with file counts and current file names, and shows detailed completion statistics. Two-way sync is supported—transfer from local to iCloud or from iCloud to local, making it easy to migrate your data or consolidate chats from different storage locations. And Clean Local and Clean iCloud options are available with double-confirmation warnings to help manage storage space safely.
- Storage Management A new Storage Management section in Settings provides detailed visibility into your app's storage usage. View file counts and total sizes for each storage category. Each category displays its purpose and current storage footprint, with one-tap delete buttons to quickly free up space. Directory information refreshes automatically when switching between local and iCloud storage, and deletion requires confirmation to prevent accidental data loss. This makes it easy to identify large directories and clean up storage without leaving the app.
- Function Call Rate Control Remote model settings now include a Function Call Wait Time parameter that allows you to specify a delay (0-30 seconds) between consecutive tool calls. This helps prevent API rate limiting issues when using models with function calling capabilities. The workflow automatically waits the configured number of seconds between each tool execution, displaying a status message to keep you informed. This is especially useful for providers with strict rate limits or when executing multiple sequential tool calls in complex workflows.
- Subtitle File Support Privacy AI now supports SRT and VTT subtitle files when sharing content to the app. Both SubRip (.srt) and WebVTT (.vtt) subtitle formats are automatically recognized and processed as plain text, making it easy to analyze video captions, translate subtitles, or extract dialogue from media files. Simply share a subtitle file to Privacy AI from Files, Safari, or any other app, and the content will be ready for AI analysis.
- History-Aware Attachment Retrieval Chat history now pulls in only the most relevant excerpts from attached files for your current question, keeping context lean while still recalling what matters.
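The Function Call Rate Control above inserts a configurable pause between consecutive tool executions. A hedged sketch (names hypothetical; the 0-30 second range comes from the notes):

```python
import time

def execute_tools(tool_calls, run_tool, wait_seconds=0):
    """Run tool calls in order, pausing between consecutive executions."""
    results = []
    for i, call in enumerate(tool_calls):
        if i > 0 and wait_seconds > 0:
            time.sleep(wait_seconds)  # the app shows a status message here
        results.append(run_tool(call))
    return results

print(execute_tools(["a", "b"], str.upper, wait_seconds=0))  # ['A', 'B']
```

Note the delay only applies between calls, so a single tool call runs with no added latency.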
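Treating subtitles as plain text, as the SRT/VTT support above describes, essentially means dropping cue indices and timestamps. A minimal sketch for the SRT case (illustrative, not the app's parser):

```python
def srt_to_text(srt: str) -> str:
    """Strip SRT cue numbers and timestamp lines, keeping only dialogue."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue  # blank separator, cue index, or timestamp line
        kept.append(line)
    return " ".join(kept)

srt = "1\n00:00:01,000 --> 00:00:03,000\nHello there.\n\n2\n00:00:04,000 --> 00:00:06,000\nGeneral Kenobi."
print(srt_to_text(srt))  # Hello there. General Kenobi.
```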
Improvements
- Smarter Web Search Result Ranking Web search results are now sorted by relevance using a combined keyword and semantic scoring algorithm. Links that better match your search query appear first, improving the quality and accuracy of search-based AI responses.
- Protocol Type Protection in Chat Settings Protocol type can no longer be changed from within chat settings to prevent configuration inconsistencies. When accessing model settings from an active chat, a red warning message indicates that protocol changes require creating a new chat from Remote Model settings. This safeguard ensures that each chat maintains its original protocol configuration, preventing unexpected behavior when switching between OpenAI Chats API, OpenAI Responses API, and Claude API protocols.
- Language-Aware Context Calculation Privacy AI now uses a centralized, language-aware system for calculating context and token usage. The app automatically detects content language (English, Chinese, Japanese, Korean, code, or mixed) and applies appropriate character-to-token ratios for more accurate estimates. Context allocation has been optimized with adaptive strategies based on model size. Small models reserve 25% for output, while larger models (32K+) reserve only 8%, maximizing available conversation history. All context calculations now use a single, tested implementation across the entire app, eliminating inconsistencies and providing more reliable context management for both local and remote models.
- llama.cpp Engine Upgrade (b7091 → b7140) Major performance boost for iOS and Mac devices with ARM64 i8mm optimizations—Q4_K quantized models now run 15-30% faster on iPhone 13+ and M1+ devices through hardware-accelerated matrix operations. Added support for RND1 diffusion language models and enhanced Metal backend stability for macOS 11+. Includes critical security fixes for grammar parsing overflow vulnerabilities and improved numerical accuracy for triangle solver operations. This upgrade delivers faster inference, better stability, and broader model compatibility for on-device AI.
- Protocol Indicator on Model Logos Each model logo in chat views and model selection screens now displays a protocol type indicator badge. The badge shows different tags for different API protocols: 'oai' for OpenAI (green), 'resp' for OpenAI Responses (purple), and 'claude' for Claude API (pink). For local models, the badge displays the inference engine type (e.g., 'gguf' for llama.cpp, 'mlx' for MLX). This visual indicator helps you quickly identify which protocol and engine each model uses, making it easier to understand your model configuration at a glance.
- Secure API Key Input API key text fields are now hidden by default with an eye icon toggle button on the trailing edge. Tap the eye to reveal or hide your API key for secure entry and verification. This protects your sensitive credentials from shoulder surfing while still allowing quick visibility checks when needed.
- macOS Keyboard Shortcut Support macOS now supports Shift+Enter to submit messages, matching the existing behavior on iPhone and iPad with Bluetooth keyboards. This provides a consistent keyboard-driven workflow across all platforms, allowing you to press Enter for newlines and Shift+Enter to send messages.
- Modernized Text Editor and Selector UI The text editor and selector view has been completely redesigned with a contemporary iOS interface. The new design features a clean segmented control for switching between Edit and View modes, a modern search interface with inline match navigation, and streamlined toolbars with system-standard buttons. The Done button has been moved to the navigation bar trailing position, character counters provide real-time feedback, and animated copy confirmations enhance the user experience.
- Long-Text RAG in Preparation Long messages get the same relevance-ranked retrieval as chat history, so only the useful parts of lengthy text are included before sending.
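The combined keyword-plus-semantic ranking described under Smarter Web Search Result Ranking can be sketched as a weighted blend of two scores. The 0.5/0.5 weighting and the toy semantic scores are assumptions, not the app's values:

```python
def rank_results(query, results, semantic_scores):
    """Sort results by a blend of keyword overlap and semantic similarity."""
    q_terms = set(query.lower().split())
    def score(r):
        terms = set(r.lower().split())
        keyword = len(q_terms & terms) / max(len(q_terms), 1)
        return 0.5 * keyword + 0.5 * semantic_scores.get(r, 0.0)
    return sorted(results, key=score, reverse=True)

results = ["swift concurrency guide", "cooking pasta"]
scores = {"swift concurrency guide": 0.9, "cooking pasta": 0.1}
print(rank_results("swift concurrency", results, scores))
```

Blending the two signals lets exact-term matches and semantically related pages both surface near the top.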
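Language-aware context calculation, as described above, applies different character-to-token ratios per detected language. The ratios below are rough illustrative assumptions, not Privacy AI's calibrated values:

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count using a language-dependent chars/token ratio."""
    cjk = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
    if cjk > len(text) // 2:
        ratio = 1.5   # CJK text: roughly 1.5 characters per token (assumed)
    else:
        ratio = 4.0   # English prose: roughly 4 characters per token (assumed)
    return max(1, round(len(text) / ratio))

print(estimate_tokens("Hello, how are you today?"))  # 25 chars / 4 -> 6
```

A single shared estimator like this is what lets context budgeting stay consistent across local and remote models.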
Bug Fixes
- Long Document Processing Fixed a bug where AI responses for intermediate chunks were not displayed in the chat when processing long books or documents. The chunk processing workflow now correctly creates message entries and forces UI updates for each chunk, ensuring all AI analysis results are visible as the document is being processed progressively.
- Model Lifecycle Stability Improved stability when switching between models or when memory is reclaimed during extended sessions. The app now gracefully handles model recycling events without throwing errors, preventing unexpected interruptions during active chats.
- Model Parameter Persistence Fixed an issue where context parameters and sampling parameters were not consistently synchronized between the model configuration and active chat state. Model settings now properly propagate to all workflow components, ensuring accurate token calculations and consistent generation behavior throughout the conversation.
- PDF Content Deduplication Fixed duplicate lines when importing PDF attachments.
Version 1.4.8
Improvements
- llama.cpp Engine Upgrade (b7091) Updated to b7091 with improved Metal shader optimizations, faster argsort operations, and accelerated conv2d for better performance on iOS devices. This release adds support for new architectures such as AfmoeForCausalLM (Afmoe MoE models) and introduces new operators (CUMSUM, SOFTPLUS, TRI, SOLVE_TRI) required by hybrid and advanced reasoning models. iOS builds also benefit from enhanced ARM feature detection and improved numerical stability for more reliable inference.
- Sharing to Privacy AI Refined the UI layout to make the Share-to-Privacy-AI interface more compact and easier to use. Added a new Refresh button to reload the model list instantly.
- Enhanced Keyboard Management in Model Configuration All text fields and text editors in model configuration screens now include a convenient Done button to dismiss the keyboard. For numeric keyboards without a return key, a smart toolbar appears with a dismiss button. This enhancement applies across all model detail views.
- Improved HTML to Markdown Conversion The HTML-to-Markdown engine has been upgraded with significantly better noise filtering, especially tuned for X (Twitter) posts. You can now share X post URLs directly to Privacy AI, and the content will be loaded and converted into clean, readable Markdown for analysis.
- Smarter Web Search with Reference Links Web search results now include automatically extracted reference links appended to the end of AI responses, making it easier to verify information and explore original sources.
- MCP Marketplace UI Optimization Each MCP server’s card layout has been optimized so descriptions can be fully displayed. On iPad, cards now maintain a consistent height across multi-column layouts.
Bug Fixes
- Missing Local Models Fixed an issue where downloaded local models were not displayed when sharing content to Privacy AI.
Version 1.4.7
Features
- Direct Video Input Support
Privacy AI now supports OpenRouter’s native video input protocol. You can send video files directly to supported multimodal models—such as Gemini 2.5 Flash, Flash Lite, Pro, and others—without any manual pre-processing. The app handles encoding and upload automatically, offering a seamless video-to-AI workflow.
- Upgraded TTS/ASR Engine
sherpa-onnx has been upgraded to 1.12.15, adding support for newer TTS models like MatchTTS and expanding available voice options for higher-quality speech synthesis.
Improvements
- MLX Engine Upgrade
Updated to the latest MLX Swift framework with improved vision-model handling and greater stability. Qwen3-VL models now process images more reliably thanks to refined sanitization. Memory usage is optimized on both iOS and macOS, and the new context API enables more flexible prompt construction for advanced RAG workflows.
- llama.cpp Engine Upgrade (b7032)
Major performance improvements for Apple Silicon, including Metal 4 Tensor API acceleration on M5-class devices, ARM64 SVE optimizations, and hybrid context shifting for better memory efficiency. This update also includes KV-cache optimizations, async buffer-retention fixes, and enhanced support for A19 devices. Expect 20–40% faster inference on supported hardware with improved GGUF stability.
- WhisperKit Audio Transcription Boost
WhisperKit now uses automatic compute-unit selection, model prewarming, improved VAD chunking, and smarter decoding heuristics. Short audio (<15s) transcribes 10–20% faster, while long audio (>3 min) uses VAD to reduce memory pressure and improve reliability.
- Whisper Model UI Refresh
The Whisper Models screen has been redesigned with more model variants and clearer descriptions to help you choose the right model for your workflows.
- Improved Whisper Model Downloading
Whisper model downloads now run in the background, resume automatically after interruptions, and display detailed progress information for a more reliable setup experience.
- Modernized About & Acknowledgements UI
The About screen has been redesigned, with all third-party libraries updated to reflect their latest versions and descriptions.
Bug Fixes
- Partial Responses
Fixed a bug that caused truncated responses from some Gemini-family models on OpenRouter.
- Audio Transcription
Fixed an issue where audio and video files stored in iCloud Drive could not be read correctly on iPhone or iPad during transcription.
Version 1.4.5
Features
- MCP Marketplace (Pro) Discover and install pre-configured MCP servers in one tap. Powered by MCPRouter, the new Marketplace is a unified directory of integrations—time zones, web automation, image generation, data analysis, and more. Just enter your MCPRouter API key, search by name, and install any server instantly. All endpoints, headers, and auth keys are auto-configured—no manual setup required.
- llama.cpp Engine Refresh (b6962) Major CPU performance upgrade for iOS and macOS with ARM64 chunking and Flash Attention optimizations. Adds support for new vision models including Qwen3-VL, CogVLM, and Janus Pro, delivering enhanced multimodal reasoning and lower memory usage on mobile-class devices.
- Interleaved Thinking Support Privacy AI now supports interleaved thinking for advanced reasoning models such as Minimax M2, Claude Sonnet 4, and OpenAI GPT-5 (Thinking series). When enabled in model settings, the AI's internal reasoning is preserved across conversation turns, allowing continuous chains of thought and self-reflection for multi-step tasks. This enables truly agentic behavior where models can plan, execute, verify, and retry autonomously—transforming simple Q&A into end-to-end problem-solving workflows.
- OpenAI Responses API Support Privacy AI now supports the OpenAI Responses API protocol, OpenAI's next-generation API that enables stateless multi-turn conversations, so the full chat history no longer needs to be resent on every request. The Responses API also introduces conversation objects for automatic context management, reusable prompt templates with variable substitution, and native prompt caching for faster, cheaper repeated queries. It is compatible with OpenAI and any third-party provider supporting the Responses API standard (including LM Studio). The new protocol inspector provides real-time visibility into all HTTP requests, responses, and SSE streaming events—perfect for debugging custom endpoints and verifying API behavior.
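Conceptually, a Responses API call sends only the new input plus a reference to the previous response. The sketch below illustrates the request shape; the field names follow OpenAI's published Responses API, while the helper function itself is purely illustrative:

```python
# Illustrative sketch of a Responses API request body. Unlike Chat
# Completions, follow-up turns reference the previous response by ID
# instead of replaying the entire message history.

def build_responses_request(model, user_input, previous_response_id=None):
    """Assemble a minimal request body for POST /v1/responses."""
    body = {"model": model, "input": user_input}
    if previous_response_id:
        # The server restores prior context from this ID, so the full
        # chat history is not resent with every request.
        body["previous_response_id"] = previous_response_id
    return body

first = build_responses_request("gpt-4o-mini", "Hello!")
follow_up = build_responses_request(
    "gpt-4o-mini", "Now answer in French.", previous_response_id="resp_123"
)
```

The same shape works against any third-party server that implements the Responses API standard; only the base URL changes.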
Improvements
- Streamlined MCP Authorization MCP configuration now includes a dedicated Authorization toggle with a secure API-key field. When enabled, Privacy AI automatically formats Bearer tokens in headers, removing manual string composition and reducing connection errors.
- Clearer Protocol Diagnostics Refined HTTP 404/429 messages now show whether a model is temporarily unavailable or rate-limited, with actionable guidance for each case.
- File Processing UX Enhancements Real-time progress bars appear when analyzing images or documents for OCR and text extraction, providing clear feedback through every processing stage.
- Multi-Image Selection Support You can now select multiple images at once from the photo picker (up to 10 images). All selected images are processed sequentially and attached to your message, streamlining workflows that require analyzing or comparing multiple photos.
- Collapsible Similar Models View The Similar Models section in Model Details now features a collapse/expand toggle. It starts collapsed by default to save space and can be expanded with one tap to compare pricing and alternatives.
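The MCP Authorization toggle above essentially composes the Bearer header for you. A minimal sketch of that formatting (the helper name is hypothetical, not Privacy AI's actual code):

```python
def bearer_header(api_key):
    """Format an Authorization header from a raw API key, trimming
    whitespace and avoiding a double "Bearer " prefix when the user
    pastes a value that already contains one."""
    token = api_key.strip()
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    return {"Authorization": f"Bearer {token}"}

headers = bearer_header("  sk-demo-123 ")  # {"Authorization": "Bearer sk-demo-123"}
```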
Bug Fixes
- macOS Stability
Fixed a crash when opening Model Settings on Mac. The Model Detail screen now loads reliably across all platforms.
- Local Model Resource Cleanup
Resolved a memory leak when unsupported GGUF or MLX models failed to load. Retries no longer cause crashes or stale allocations.
- Super Siri Integration
Restored full compatibility—Super Siri now works seamlessly with MLX, GGUF, Apple Intelligence, and all remote providers.
- Model Price Updating
Model pricing now refreshes instantly when you switch models in the Model View.
Version 1.4.2
Features
- New Vercel Provider
We’ve added Vercel (vercel.com) as a new official provider. Vercel offers some of the most cost-efficient AI models available today. Free-tier users receive $5 in credits every 30 days for use with AI Gateway models — a great way to explore their ecosystem at no cost. All Vercel model pricing data has been fully refreshed within Privacy AI.
- MLX Vision Models Now Support Tool Calling
Qwen3-VL, Gemma 3, Qwen 2 VL, Qwen 2.5 VL, and SmolVLM 2 now support tool calling. These vision-language models can act as text-to-text engines, enabling seamless integration with Privacy AI’s local tools and workflows.
Improvements
- Enhanced GGUF Model Downloads The GGUF download flow has been aligned with the MLX model process for a consistent user experience. You can now view both the primary GGUF model file and its optional CLIP component (if available). Added Verify and Refresh options ensure all required files are complete before initiating a chat.
- llama.cpp Engine Upgrade (b6871)
Upgraded the internal llama.cpp engine from b6778 to b6871.
- KV-cache memory alignment adjustments: removes over-padding and aligns context/buffer ordering, significantly reducing wasted RAM on mobile-class devices.
- New model support: adds compatibility for LightOnOCR-1B and BailingMoeV2, expanding local model choices.
Version 1.4.1
Features
- Memory Support
Introduces the long-awaited Memory feature, allowing you to add, edit, and delete memories that persist across all models. Switching between models no longer resets your personal context—your AI now truly remembers you. All memory data are stored securely on-device and optionally synced via iCloud for seamless continuity. Full privacy, full control.
Improvements
- llama.cpp Upgrade (b6804)
This major upgrade delivers substantial performance gains on iOS and macOS through optimized Metal backends and ARM NEON acceleration. It adds Flash Attention F32 support, non-padded KV cache for better memory efficiency, and faster normalization operations. Also includes new math ops, optimizer APIs, and fixes for memory leaks and build issues. Expect faster inference, lower memory use, broader model compatibility—including IBM Granite Hybrid—and improved overall stability for on-device AI.
- MLX Engine Update
Updated to support the latest Qwen3 VL and Gemma 3 models for multimodal and text-generation tasks.
- MLX Download Verification
MLX models often include multiple component files. The model-detail view now offers a Verify button to confirm all files are correctly downloaded; once verified, the Create Chat button becomes available automatically.
Bug Fixes
- DeepSeek Stability
Fixed an issue where chatting with DeepSeek models could cause unexpected crashes.
- MLX Model Validation
Fixed a crash that could occur when attempting to load incomplete or corrupted MLX models.
Version 1.3.3
Improvements
- MLX Engine Upgrade
Upgraded the MLX engine, adding support for Falcon H1, Qwen3 embeddings, Granite Hybrid MoE, and LiquidAI LFM2MoE models for broader local compatibility and faster inference on Apple Silicon.
- Updated Recommended MLX Models
Replaced the Granite 4.0 H Tiny (7B) model with the lighter Granite 4.0 H Micro model—optimized for iPhones where 7B models may exceed memory limits. Added LFM2 2.6B MLX to the recommended list for improved multimodal performance.
- API Detail View Enhancements
The API configuration view now displays all available endpoints, letting you see exactly which URLs are used to connect to each remote server.
Bug Fixes
- Hugging Face Repository ID Resolution
Fixed an issue where Privacy AI failed to correctly extract the Hugging Face repository ID from certain downloaded MLX models.
- Chat Creation During Downloads
Fixed a bug where already-downloaded models could not be used to start new chats while another model was still downloading.
Version 1.3.2
Features
MLX Model Integration
Privacy AI now supports the MLX model engine, enabling both text and vision models to run locally. You can directly download models from Hugging Face by entering a repository ID and access token (if required).
The new download manager adds resume-on-failure, background downloading, and a model integrity verifier for reliable large-model transfers.
MLX models now fully support local tool calls and MCP tool calls, just like GGUF and remote API models.
Best of all, MLX model support is included in the Free Plan.
Clipboard and Drag-and-Drop Enhancements
The Chat and Reader editors now allow pasting images, videos, and files directly from the clipboard. On iPad, you can also drag and drop files or images into the chat editor — a smoother, more natural way to attach media.
Improvements
llama.cpp Upgrade
Updated llama.cpp from b6558 → b6692, adding compatibility for new models such as Qwen3 Reranker and LiquidAI LFM2-2.6B, plus broader quantization format support.
Text Rendering Refinement
Improved text layout and rendering during “thinking” mode for better readability and consistency across devices.
Smarter Model Switching
Switching local models now filters to show only models of the same type — MLX, GGUF, or Apple Intelligence — for a cleaner experience.
Bug Fixes
Thinking Text Display
Fixed an issue where “thinking” models occasionally displayed incomplete or cut-off lines when collapsed.
Thinking Mode Persistence
Fixed a bug where the local model’s thinking-mode setting was not properly saved or applied to new chat sessions.
Single Local Model Enforcement
Resolved a critical issue where MLX and GGUF models could run simultaneously, potentially causing memory overflow.
Version 1.2.4
Improvements
GLM-4.6 Integration
Added support for the latest GLM-4.6 model under the Z.ai provider, giving users access to improved performance and reasoning capabilities.
IBM Model Branding
Introduced official IBM logos for the Granite series and other IBM models, making it easier to visually identify them in the model list.
Enhanced iPad Experience
Improved the remote model configuration UI on iPad with a cleaner layout and better usability for larger screens.
Bug Fixes
- Remote Parameter Saving
Fixed an issue where key parameters—including temp, seed, top_p, repeat_penalty, presence_penalty, frequency_penalty, and reasoningEffort—were not being saved correctly for remote models. These settings are now preserved as expected.
Version 1.2.3
Improvements
- Refined switch model behavior
The switch model button now switches between models from the same provider instead of toggling between local and remote APIs.
Bug Fixes
Consistent model picker display
Fixed an issue where the model picker in chat did not consistently show both remote and local models.
Removed duplicate switch button
Eliminated the extra switch model button in the chat UI. Users can now rely on the Clone and Fork Chat features for a better way to continue a conversation with another model.
Version 1.2.2
Features
Free Plan launched
Most core features are now permanently free, including Local Models, Apple’s on-device Foundation Model, iCloud sync, natural language chat, Reader, 25+ built-in tools (Search, News, Stocks, Weather, Health, Email, Calendar, etc.), conversation cloning, advanced export, audio/video transcription, Siri & Shortcuts, and more.
Only advanced features—Cloud models, MCP, and custom API providers—require a Pro Plan subscription.
Configurable text rendering refresh rate
Control how often text refreshes under throttle to prevent UI lag and battery drain with very fast models.
llama.cpp engine upgrade
Upgraded to version b6558 with support for Liquid AI series models.
Model settings copy on chat clone
When cloning or forking a chat with the same model, all settings (temperature, top_p, top_k, context, etc.) are preserved.
Improvements
- Optimized typewriter effect
Optimized the typing effect with dynamic throttling and caching for smoother screen updates and longer battery life.
- Auto-scroll enabled by default
Replies now scroll automatically when new text appears.
- Updated default model parameters
Set Temperature to 0.3 (from 0.6) and Top P to 1.0 (from 0.9) based on community feedback for better defaults. Fully customizable.
- Streaming support for Apple Foundation Model
The on-device model now streams responses for faster interaction.
Bug Fixes
Correct model categorization
Fixed an issue where the Apple on-device Foundation Model was mistakenly treated as a cloud model.
Renaming imported models
Fixed a bug that prevented local or imported models from changing their titles.
Version 1.1.33
Feature
- Apple On-Device Foundation Model Added Apple’s on-device Foundation Model to the local model list. Integrated with 25 built-in tools (search, email, news, and more) and extended support for external MCP tool calling.
Improvement
- Customizable think tags Users can now define their own start and end tags for self-hosted or cloud models.
Bug Fix
- Resolved system prompt issue Fixed a bug where system prompts for self-hosted and cloud models could not be saved.
Version 1.1.32
Features
- iOS 26 Support Integration with the latest iOS 26 UI and Apple’s on-device Foundation Model.
- Natural Talk UI A new hands-free interface that lets you talk naturally with any model.
- Update Highlights The “What’s New” screen now appears automatically after each app update.
- Smarter History Rebuilt history handling now supports binary attachments (images, PDFs, and more) directly in chat history, so you can reference them later.
- Richer Previews Attachments now preview Microsoft Office, Apple iWork, PDF, video, and audio files.
- New Image Models Added support for OpenAI’s 'gpt-image-1' and HuggingFace’s 'black-forest-labs/FLUX.1-dev'.
- TTS Model Support Added OpenAI protocol text-to-speech endpoints, including 'gpt-4o-tts' and 'gpt-4o-mini-tts'.
- Universal Preview Tool Open and export AI-generated PDFs, images, Office 365 documents, iWork files, videos, and audio.
Improvements
- OS Requirement Raised minimum requirement to iOS 18.6, keeping APIs aligned with current and previous iOS versions.
- Model Selection Prevents starting a remote chat session without choosing a model.
- File Compatibility Improved binary file handling (e.g., PDFs) with full support for OpenAI-compatible servers like OpenRouter. Raw PDFs can now be uploaded directly.
- Protocol Inspector Limited displayed text to 2,000 characters to avoid freezes from large Base64-encoded requests.
- Token Management Smarter history trimming to stay within each model’s token limits.
- API Pricing Added pricing data for TTS, image generation, and transcription models, including hybrid cost methods (per minute, per image, etc.).
- Tool Calls Improved UI for tool calls, which are now saved and reloaded with chats.
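As a rough illustration of the history trimming mentioned above, the oldest turns are dropped until an estimated token count fits the model's limit. This is a simplified sketch with a crude chars/4 estimator; the app's actual heuristic is not documented:

```python
def trim_history(messages, token_limit,
                 estimate=lambda text: max(1, len(text) // 4)):
    """Drop the oldest messages until the estimated total fits the limit.

    `estimate` is a stand-in tokenizer using ~4 characters per token;
    real tokenizers will produce different counts.
    """
    kept = list(messages)
    while kept and sum(estimate(m) for m in kept) > token_limit:
        kept.pop(0)  # discard the oldest turn first
    return kept
```

Real implementations typically also reserve headroom for the system prompt and the model's reply before applying the limit.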
Bug Fixes
- Audio Playback Fixed an issue where the mini player bar stopped playback when scrolling through chat.
- Image Queries Fixed a bug where refreshing an image query ignored updated parameters (count and size).
Version 1.1.31
Features
- Super Siri Tool Control – Configure tools when invoking Siri. Enable web search or calendar management directly by voice.
- Smarter Local TTS – Pause and resume local text-to-speech from the current position.
- Upgraded Speech Engine – Updated sherpa-onnx to 1.12.11 for better pause handling at punctuation.
- Large File Processing – Oversized files are now split into chunks with preserved context, enabling summaries, translations, and more at any scale.
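Chunked processing with preserved context, as described above, is commonly implemented as overlapping windows, where each chunk carries the tail of its predecessor. A generic sketch, not the app's exact algorithm:

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into chunks of `chunk_size` characters that overlap
    by `overlap` characters, so each chunk retains the tail of the
    previous one for context continuity."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Summaries of the per-chunk results can then be merged in a final pass, which is what makes arbitrarily large inputs tractable.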
Improvements
- Mini TTS Player – A new toolbar player replaces the blocking dialog, letting you read while listening. It also highlights the current text being spoken.
- Multi-File Import – Import multiple files into a single chat for richer workflows.
- Smarter Image Text Extraction – When enabled, extracted text is added as the alt attribute in Markdown image links, improving clarity for AI.
Bug Fixes
- Markdown attachments now render in the Markdown viewer instead of plain text.
- URL submissions now extract images according to Reader settings (previously all images were included).
- Fixed X.com link handling: sharing or opening X posts in Reader now extracts the full post content.
Version 1.1.30
Features
Multiple Attachments in Chats
All models now support multiple attachments in conversations. You can upload one or more files of different types, such as documents, spreadsheets, or images, into a single chat. Privacy AI will process all of them together, giving you richer context and more accurate results.
Process Multiple URLs at Once
Instead of pasting links one by one, you can now include multiple URLs in a single message. Privacy AI automatically fetches and processes all of them, merges the content, and sends it to the AI model for unified analysis.
Latest Office 365 Excel Support
Reader and chat now support the newest Office 365 XLSX file formats. Even the most recent Excel spreadsheets can be converted directly into clean Markdown for further reading, summarization, or analysis.
Smarter Reader Settings
Two new configuration options, “JS Check Interval” and “JS Idle Threshold”, let you control how Privacy AI handles JavaScript execution when loading web pages. This fine-tuning improves accuracy and efficiency when converting remote web content into Markdown.
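These two settings map naturally onto a poll interval and a quiet period: the page counts as ready once no script-driven DOM changes have been observed for the idle threshold. A conceptual sketch only; the app's internals are not documented:

```python
import time

def wait_for_js_idle(snapshot, check_interval, idle_threshold, timeout=10.0):
    """Poll `snapshot()` (e.g. a hash of the rendered DOM) every
    `check_interval` seconds; return True once the snapshot stays
    unchanged for `idle_threshold` seconds, or False on timeout."""
    last = snapshot()
    quiet_since = time.monotonic()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(check_interval)
        current = snapshot()
        if current != last:
            # The page is still mutating; restart the quiet timer.
            last, quiet_since = current, time.monotonic()
        elif time.monotonic() - quiet_since >= idle_threshold:
            return True
    return False
```

A shorter interval reacts faster but costs more CPU; a longer threshold is safer for pages that load content in bursts.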
Interactive AI Images
AI-generated images are no longer static. You can now rotate, flip, zoom in, or zoom out within the app, making it easier to explore details and adjust perspective.
New Image Editor
A built-in Image Editor lets you draw entirely new images or make edits to existing ones. Use Apple Pencil or your fingers to sketch, highlight, or modify an image, then send it directly into a chat for AI processing.
Edit Sent Messages
Messages you’ve already sent can now be edited. When you make changes, all attachments linked to the original message are automatically preserved and copied to the updated message, saving time and preventing accidental data loss.
Context Usage Indicator
A new indicator shows live token usage in each chat. If the connected server supports usage reporting, Privacy AI will display accurate values; otherwise, it estimates them for you. This helps you track conversation length and know when it’s time to compress history or start a new chat.
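When the server does not report usage, a common fallback is a characters-per-token heuristic along these lines (purely illustrative; the app's actual estimator is unspecified):

```python
def estimate_tokens(text):
    """Rough token estimate: about 4 characters per token for English
    prose. Real tokenizers (BPE, SentencePiece) diverge from this,
    especially for code and non-Latin scripts."""
    return max(1, round(len(text) / 4))
```

Such estimates are good enough for a usage gauge, but exact billing always relies on the provider's reported counts.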
Improvements
Friendly Error Messages
Technical errors are now automatically reformatted into plain, human-readable messages, so you can quickly understand what went wrong without needing to parse raw logs.
Upgraded Inspector
The Inspector tool is more powerful than ever. You can now export complete logs, copy them in full, or select and copy partial segments. This makes debugging model calls and protocol flows faster and more convenient.
Smarter Remote Error Handling
When remote AI servers return errors, Privacy AI will now reformat them into simple, explanatory messages rather than showing dense technical stack traces.
Better Message Editing
Edited messages now automatically retain their original attachments. This ensures that no files are lost when you make corrections or updates.
Optimized Token Allocation
The history management algorithm has been redesigned. It maximizes the amount of past conversation that can be preserved while maintaining safe margins for the current query and response, improving both reliability and model performance.
Improved File Upload Dialog
The file chooser now includes clear descriptions of each processing mode. These explain the technical details (such as BASE64 encoding or local vs. remote handling), the token cost implications, and the best situations for each option—helping you make smarter decisions.
Provider Memory
When you create a new remote AI model, Privacy AI now remembers the last provider you used. This removes the need to reselect your preferred provider each time, saving clicks and setup time.
Transparent Subscription UI
The subscription interface now shows detailed explanations of how free trials work, including timing and conditions, so you can make informed decisions before committing.
Faster Siri & Shortcuts
The AI workflow for Siri and Shortcuts has been streamlined. This stateless mode now runs more efficiently, improving performance and responsiveness in automations.
Bug Fixes
- Resolved an issue where Siri and Shortcuts could not connect to AI models if the application was not already running in the background.
Version 1.1.29
Features
- OpenRouter Image Generation: Generate images directly through OpenRouter. Try the free Gemini 2.5 Flash Image Preview model today.
- Smarter File Import for Media: When importing videos or images, a new dialog lets you configure dimensions. The app automatically scales the shorter side or uses iOS APIs to extract text, saving tokens and costs. You can also choose to send files unchanged to remote AI.
- Flexible File Handling: For non-media files, you can upload them as-is or convert them to Markdown locally. Markdown conversion reduces cost but may trade off some accuracy.
- iCloud Model Sync: Download a GGUF model once and use it across all your devices with the same iCloud account. Models load automatically in the background—no need for multiple downloads.
- Reader Enhancements: Added support for capturing photos directly in the reader and selecting processing methods before importing documents, images, or videos. The reader UI has also been refreshed.
- Expanded Self-Hosted Server Support: Now works with llama.cpp, vLLM, LocalAI, and Jan AI.
- HTML in Code Blocks: Code blocks containing HTML can now be rendered and captured as screenshots for easy sharing.
- Rich Chat History: Chat messages now preserve attachments and generated media, creating a more complete conversation record.
- Feature Overview: Added a section that lists all features and highlights of the app for easier discovery.
- Improved Model Pricing Display – Added new fields and improved the UI of model price information shown in the Remote Service and API Keys views.
Improvements
- Updated llama.cpp to b6301 for improved performance and compatibility.
- Remote Services now remember the collapsed/expanded status for each category.
Bug Fixes
- Fixed an issue where newly installed apps sometimes failed to display local AI models in the selection list.
- Fixed an issue where the application failed to refresh the local cache after remote configuration changes.
Version 1.1.26
Features
- Local vision models on device. You can now run vision-capable GGUF models locally via our llama.cpp mtmd integration (for example, Qwen2.5-VL 3B Instruct). Add images to a chat without sending data to the cloud.
- Added our first vision model, "Qwen2.5-VL 3B Instruct."
- Added "gemma-3-270m" to the recommended model list.
Improvements
- Faster and more reliable downloads. Local model downloads are quicker and more stable; we also fixed an issue that could start extra download workers when the app moved between background and foreground.
- Clearer remote model pricing. The price panel now shows "Cache Read (per 1M tokens)" and "Cache Write (per 1M tokens)" so you can estimate costs more accurately when KV cache is used.
- Easier ways to contact us. In Feedback, you can open your default email app or DM our X account. There's also an optional email field if you want a reply. Because the app has no user accounts, we can only respond if you leave contact info.
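With the cache read/write prices above quoted per 1M tokens, a single request's cost can be estimated by splitting the prompt into fresh and cache-read portions. A sketch with hypothetical prices:

```python
def request_cost(prompt_tokens, cached_tokens, output_tokens,
                 input_price, cache_read_price, output_price):
    """Estimate USD cost of one request; all prices are per 1M tokens.
    `cached_tokens` is the part of the prompt served from KV cache at
    the cheaper cache-read rate."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * input_price
            + cached_tokens * cache_read_price
            + output_tokens * output_price) / 1_000_000
```

For example, at hypothetical rates of $3/1M input, $0.30/1M cache read, and $15/1M output, a 10k-token prompt with 8k tokens cached and a 1k-token reply comes to about $0.0234.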
Bug Fixes
- Model list refresh. Newly downloaded local models now appear immediately when you switch models inside a chat.
Version 1.1.23
Improvements
- llama.cpp Upgrade – Updated from b5950 to b6131 for better GPU performance on Apple devices, more efficient KV cache handling, and support for new models: GLM-4.5, SmallThinker, Qwen3-Embedding, and Hunyuan Dense.
- Faster Large Model Imports – Refactored the GGUF file processor to handle 4B+ models with the latest llama.cpp engine, importing quickly without memory overflows.
- Perplexity API Update – Added support for the latest models: sonar, sonar-pro, sonar-deep-research, sonar-reasoning, and sonar-reasoning-pro. Model names can now be edited in Perplexity API settings. (Tip: If you already use the Perplexity API, remove it from the API list and restart the app to apply changes.)
- TTS Settings Enhancement – The TTS settings view now displays your device's CPU core count to help you choose the optimal thread count.
- Clearer X.com API Key Guide – Added more detailed instructions in Settings for using your X.com API key.
- Tool Description Improvements – Updated the search_contact local tool description so AI knows when and how to use it. Related tools like send_email and send_sms now rely on it for recipient lookup. For example: "Send my wife an SMS and tell her I love her" will first trigger search_contact to find the recipient, then create the message.
- Better HuggingFace Downloads – Model downloads now run in the background and can resume after interruptions. A prompt will notify you when the download is complete—ideal for large models like Qwen3-4B-Thinking/Instruct-2507.
- Tool Search Bar – Added a search bar for local and MCP tools to quickly find what you need as the tool list grows.
- UI Tips Added – Added quick usage tips for API Key Management, Tools, and Remote Services to help new users get started faster.
Bug Fixes
- Search bar in Remote Services now filters models correctly.
- Removed the "Add" menu from Local Tools, since adding external tools is no longer supported.
- Resolved a crash when sending SMS with the send_sms tool.
Version 1.1.21
Features
- Offline Text-to-Speech – An offline TTS model, Kokoro-82M (53 distinct voice styles), is now bundled directly in the app. Any AI reply or Reader article can be spoken aloud entirely on-device, so nothing is sent to the cloud and there are no per-character fees. You can also export the generated audio (M4A, WAV, AIFF) to Files, AirDrop, or any media player for later listening.
- API Server Templates and Cloning – You can duplicate an existing server profile—such as the HuggingFace preset—to create custom endpoints in seconds. All models, headers, tokens, and endpoint settings are copied automatically. You only adjust what differs (base URL, model path, etc.). This is ideal when a provider exposes multiple endpoints under a single API key or when you run several private vLLM clusters.
- Built-in GitHub Provider – GitHub has been added to the list of built-in API providers.
- Flexible Model Selection for Forked or Cloned Chats – When you fork an existing conversation or clone it into a new thread, you can now change the underlying model—local or remote—before the next message is generated. The original chat remains intact, and the new branch inherits the full context and tool settings while letting you compare answers or continue with a faster or cheaper engine.
Bug Fixes
- The file-import button that disappeared inside chats is back.
- Manually entered model names now save correctly in API Key configuration screen.
Version 1.1.17
Features
- Built-in API Access to z.ai – We've embedded native support for https://api.z.ai/api/paas/v4/. You can now call Z.AI services directly from Privacy AI with your token.
- Major Siri Integration Upgrade – Local Models Now Work with Siri: You can now trigger local models using Siri voice commands. This is made possible by enhanced performance of llama.cpp-based models. Faster AI Replies for Siri: Adjusted prompt logic helps AI respond within Siri's strict time limits (typically under 8 seconds). For best results, use fast-response models—avoid long-thinking agents.
- Search Tool Now Has Speed/Balance/Quality Modes – Choose your preferred mode for the searching_tool. "Speed" mode delivers up to 3× faster results, perfect for quick lookups.
Improvements
- Perplexity Model Deprecation – The outdated r1-1776 model from Perplexity has been removed. Please switch to sonar-reasoning or sonar-reasoning-pro for continued access via OpenRouter.
- Expanded OpenRouter Protocol Compatibility – Improved protocol handling ensures better performance and compatibility with the latest OpenRouter models and backends.
- Improved Thinking Text View – When thinking models produce long outputs, the text now scrolls smoothly with a visible scrollbar—no more UI lockups on lengthy thoughts.
- Code Block UI Stability – Markdown rendering has been optimized to remain responsive even when AI outputs include very large code blocks (1000+ lines). No more freezing or slowdowns in the chat UI.
Version 1.1.16
Improvements
- Fixed API Endpoint Save Bug – Resolved an issue where custom API base URLs were not being saved properly in the API settings screen.
- Improved URL Input Experience – Disabled automatic capitalization for URL fields to prevent input errors when entering API endpoints.
- Streamlined API Server Creation – You can now create a new API server directly from the API Key detail view—making it faster to configure your remote models.
- Enhanced Launch Feedback – Added detailed progress indicators during app startup to show what's being synced or initialized.
Version 1.1.6
Improvements
- Upgraded to llama.cpp b5950 with expanded local models support – Added support for the following new local GGUF models: Menlo_Lucy, SmolLM3, OpenReasoning-Nemotron-1.5B.
- Comprehensive rebuild and optimization of llama.cpp for iOS – Rebuilt the entire iOS build system for llama.cpp, generating a smaller and faster xcframework optimized for Apple devices. The Swift wrapper has been completely rewritten to improve memory handling and inference throughput. Benchmark results show a ~30% performance boost in local model prediction on Apple chips.
- Improved YouTube subtitle handling – The YouTube caption downloader now automatically falls back to the default subtitle track when the English caption is unavailable, improving compatibility with non-English content.
- Enhanced Markdown conversion for specific blog websites in Reader – Reader now better supports various blog layouts and formats, producing cleaner, more accurate Markdown output for improved content processing and analysis.
Version 1.1.5
Features
- Groq Cloud Support – You can now connect directly to the Groq API inference service.
Improvements
- Chat Prompt Sync with iCloud – Each chat now syncs its prompt properly across devices via iCloud.
- Faster First-Time Setup on New Devices – We've optimized iCloud sync performance when launching the app for the first time on a new device. Chats, settings, and models now load faster and more reliably.
- Improved Database Readiness for iCloud Devices – All critical data — including API keys and remote server configs — is now synced before the app starts. We've also added a "Refresh" button to manually trigger sync if needed.
Version 1.1.4
Bug Fixes
- Fixed a bug that prevented users from switching between text-only and text-to-image chat modes without creating a new session.
- Resolved a serious performance issue on iPad that caused scrolling to drop below 10 FPS in some cases.
- Added a new cache management system that significantly improves app launch speed.
Version 1.1.3
Features
- Scanned PDF OCR – Extract text from scanned PDFs — fully offline, no cloud involved.
- Moonshot API Integration – Now natively supports Moonshot models such as Kimi-K2.
- Parallel Conversations – iPhone: run 8 AI chats at once. iPad: up to 12. Seamless multitasking.
Improvements
- iPad Split View Enhanced – Smarter layout adaptation when multitasking on iPad.
- iCloud Key Fix – Resolved API key sync issues across your Apple devices.
- Chat Launch Boosted – Chats open faster than ever.
- LiquidAI Ready – Upgraded llama.cpp to support Liquid series models for offline use.
Version 1.1.2
Improvements
- Qwen Model Optimization – Switched from 1.7B to 0.6B for better performance on older devices, with strong summarization and tool execution still intact
- YouTube Caption Summarization – Quickly summarize videos using available captions
- Improved Reading – Remembers your last reading position and features a smoother outline view
- llama.cpp Upgrade – Updated to b5846, with support for Baidu's ERNIE-4.5 models
Bug Fixes
- Photo Sharing Fixes – Sharing images into Privacy AI now works reliably across apps
Version 1.1.1
Features
- HuggingFace Integration – Connect to any Inference Endpoint with your token
- Polymarket Tool – Analyze real-time prediction markets for research and strategy
- Statistical Toolkit – Run advanced Bayesian and frequentist analysis with any tool-capable model
- MCP Upgrade – Now supports Authorization headers for secure remote access
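The MCP upgrade above means a remote server can require a bearer token before accepting connections. A minimal sketch of what attaching that header looks like on the client side (the endpoint URL and token are placeholder assumptions, not the app's configuration format):

```python
from urllib.request import Request

# Hypothetical remote MCP endpoint and token; substitute your own values.
MCP_URL = "https://mcp.example.com/sse"
TOKEN = "your-secret-token"

def build_mcp_request(url: str, token: str) -> Request:
    """Attach a Bearer token so the remote MCP server can authenticate the client."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})

req = build_mcp_request(MCP_URL, TOKEN)
print(req.get_header("Authorization"))
```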
Version 1.1.0
Improvements
- Upgraded OpenAI Protocol Support – Updated to the latest OpenAI-compatible protocol version, significantly improving compatibility and responsiveness with services like Perplexity, Gemini, Anthropic, Mistral, and xAI.
- Faster Web Search Tool – The search_web tool has been completely rewritten, resulting in a 60% boost in search speed and responsiveness.
- llama.cpp Core Upgrade – Updated to b5760, bringing full support for the latest Gemma 3n open-source model for faster and smarter on-device AI performance.
- Fork a Chat Anytime – Now you can instantly fork any conversation into a new thread—preserving all previous messages and tool settings for seamless exploration.
- Improved Academic Search Accuracy – The search_arxiv tool has been fine-tuned for more accurate academic paper search, delivering better results from the arXiv database.
Ready to Experience Privacy AI?
Download now and take control of your AI experience with complete privacy.
Download Privacy AI