How DailyVox actually works
A deep technical breakdown of the on-device AI architecture behind private voice journaling. No cloud. No APIs. No compromises.
Architecture Overview
Every piece of data in DailyVox flows through a pipeline that runs entirely on the device. There are zero network calls for AI processing. Here is the full system architecture.
The key constraint: data never leaves the device for processing. Transcription happens on the Neural Engine. NLP runs locally. The Digital Twin model is computed and stored in Core Data. The only optional network path is Apple's encrypted iCloud sync, which the user can disable.
On-Device AI Stack
DailyVox uses eight Apple frameworks to build a full AI pipeline without any third-party dependencies or server-side processing.
Speech Framework
SFSpeechRecognizer — the primary transcription engine in v1.0–1.x. Converts spoken audio to text entirely on-device.
- requiresOnDeviceRecognition = true ensures zero network transmission
- Input: AAC audio at 44.1 kHz sample rate via AVAudioEngine
- Supports 60+ languages with on-device models
- Real-time partial results during recording for live feedback
- Runs on Apple Neural Engine for efficient battery usage
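The on-device guarantee comes down to one flag on the recognition request. A minimal sketch — function and variable names here are illustrative, not DailyVox's actual API:

```swift
import Speech

// Sketch: on-device-only transcription of a recorded audio file.
// `transcribe(url:)` is a hypothetical helper, not DailyVox's real code.
func transcribe(url: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.supportsOnDeviceRecognition else { return }

    let request = SFSpeechURLRecognitionRequest(url: url)
    // The privacy guarantee: recognition fails rather than falling back to the network.
    request.requiresOnDeviceRecognition = true
    request.shouldReportPartialResults = true  // live feedback while recording

    recognizer.recognitionTask(with: request) { result, _ in
        if let result { print(result.bestTranscription.formattedString) }
    }
}
```

With `requiresOnDeviceRecognition` set, an unsupported locale or missing model produces an error instead of a silent round-trip to Apple's servers.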
SpeechAnalyzer
iOS 26+ — next-gen transcription. Apple's next-generation speech recognition framework, replacing SFSpeechRecognizer in v2.0.
- Significantly faster recognition with lower latency
- Native long-form audio support without session timeouts
- No user setup required (no permission prompts for on-device)
- Volatile results for instant partial transcription feedback
- Built for sustained recording sessions — ideal for voice journaling
NaturalLanguage Framework
NLTagger — the core NLP engine that extracts meaning from transcribed text. Runs multiple analysis passes per entry.
- Sentiment scoring: sentence-level valence from -1.0 to +1.0
- Named Entity Recognition: people, places, organizations, dates
- Part-of-Speech tagging: verb density, adjective richness, pronoun patterns
- Language identification: auto-detect entry language for multilingual users
- All tag schemes run locally using on-device CoreML models
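The sentiment pass above can be sketched in a few lines. NLTagger reports the -1.0 to +1.0 valence as a string tag that must be parsed; the function name is illustrative:

```swift
import NaturalLanguage

// Sketch: sentence-level valence scoring with NLTagger's sentiment scheme.
func sentenceValences(for text: String) -> [Double] {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    var scores: [Double] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .sentence,
                         scheme: .sentimentScore,
                         options: []) { tag, _ in
        // The score arrives as a raw string like "-0.6".
        if let raw = tag?.rawValue, let score = Double(raw) {
            scores.append(score)
        }
        return true
    }
    return scores
}
```

The same tagger, re-run with the `.nameType` and `.lexicalClass` schemes, covers the NER and part-of-speech passes.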
NLEmbedding
512-dim word/sentence vectors. Generates dense vector representations of journal entries for semantic search and clustering (v1.3+).
- 512-dimensional sentence embeddings per journal entry
- Cosine similarity for semantic search ("entries where I felt conflicted")
- K-means clustering to discover hidden thematic groupings
- Foundation for RAG retrieval layer in v2.0
- Vectors persisted in Core Data alongside entry text
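Once each entry has a stored vector, semantic search reduces to cosine similarity. A pure-Swift sketch — in the app the vectors would come from NLEmbedding, but here they are plain arrays:

```swift
// Sketch: the cosine-similarity ranking behind semantic search.
// Vectors are stored as [Double] alongside each entry in Core Data.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count)
    let dot = zip(a, b).map(*).reduce(0, +)
    let magA = (a.map { $0 * $0 }.reduce(0, +)).squareRoot()
    let magB = (b.map { $0 * $0 }.reduce(0, +)).squareRoot()
    guard magA > 0, magB > 0 else { return 0 }
    return dot / (magA * magB)
}

// Rank stored entry vectors against a query vector, best match first.
func rank(query: [Double], entries: [(id: Int, vector: [Double])]) -> [Int] {
    entries
        .sorted { cosineSimilarity(query, $0.vector) > cosineSimilarity(query, $1.vector) }
        .map(\.id)
}
```

A brute-force scan like this is fine at journal scale (thousands of 512-dim vectors), which is why no external vector store is needed.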
Foundation Models
iOS 26 — 3B on-device LLM. Apple's on-device large language model, enabling conversational Digital Twin interactions in v2.0.
- LanguageModelSession for multi-turn conversation with transcript memory
- Tool calling: Twin autonomously queries Core Data via custom Tool protocol
- @Generable macro for type-safe structured outputs (mood reports as Swift structs)
- streamResponse() for real-time streaming chat UI
- Dynamic instructions from DigitalTwinEngine for personality-matched responses
- Requires iPhone 15 Pro or later. Entire pipeline on-device.
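A rough sketch of how these pieces fit together, based on the API names above — the FoundationModels framework is new, so exact signatures may differ and this should be read as a hedged outline, not the shipping code:

```swift
import FoundationModels

// Hypothetical structured output: the model fills a typed Swift struct.
@Generable
struct MoodReport {
    var summary: String
    var valence: Double
}

func askTwin(_ question: String, instructions: String) async throws {
    // `instructions` would come from DigitalTwinEngine for tone matching.
    let session = LanguageModelSession(instructions: instructions)

    // Type-safe generation via the @Generable macro.
    let response = try await session.respond(to: question, generating: MoodReport.self)
    print(response.content.summary)
}
```

Because the session keeps a transcript, follow-up questions on the same `session` carry conversational memory without any server-side state.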
Core Data + CloudKit
NSPersistentCloudKitContainer — local-first persistence with optional encrypted cloud sync across devices.
- NSPersistentCloudKitContainer wraps SQLite with CloudKit sync
- Local-first: app works fully offline, syncs when available
- AIState entity stores all Digital Twin models as Codable JSON
- iCloud sync is optional and uses Apple's encrypted infrastructure
- User can disable sync entirely — data stays on-device only
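The "sync is optional" behavior falls out of how the store is configured. A sketch, assuming a model named "DailyVox" (illustrative):

```swift
import CoreData

// Sketch: a local-first stack with optional CloudKit sync.
let container = NSPersistentCloudKitContainer(name: "DailyVox")

// Disabling sync: drop the CloudKit options so the store is purely local SQLite.
if let description = container.persistentStoreDescriptions.first {
    description.cloudKitContainerOptions = nil  // user turned sync off
}

container.loadPersistentStores { _, error in
    if let error { fatalError("Store failed to load: \(error)") }
}
```

With `cloudKitContainerOptions` set to `nil`, the same container behaves as a plain `NSPersistentContainer`: the app keeps working offline and nothing mirrors to iCloud.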
CryptoKit
AES-256-GCM encryption — authenticated encryption for backup exports and sensitive data at rest.
- AES-256-GCM authenticated encryption for backup files
- User-provided passphrase for backup key derivation
- Encrypted JSON export format for device migration
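A sketch of the backup path: derive a 256-bit key from the passphrase, then seal with AES-GCM. Function names are illustrative, and a production key derivation would use a slow, salted KDF; HKDF is shown here as the CryptoKit-native option:

```swift
import CryptoKit
import Foundation

// Sketch: passphrase-derived key + AES-256-GCM for an encrypted JSON export.
func encryptBackup(_ json: Data, passphrase: String, salt: Data) throws -> Data {
    let key = HKDF<SHA256>.deriveKey(
        inputKeyMaterial: SymmetricKey(data: Data(passphrase.utf8)),
        salt: salt,
        outputByteCount: 32)                       // 256-bit key
    let sealed = try AES.GCM.seal(json, using: key)
    return sealed.combined!                        // nonce + ciphertext + auth tag
}

func decryptBackup(_ blob: Data, passphrase: String, salt: Data) throws -> Data {
    let key = HKDF<SHA256>.deriveKey(
        inputKeyMaterial: SymmetricKey(data: Data(passphrase.utf8)),
        salt: salt,
        outputByteCount: 32)
    let box = try AES.GCM.SealedBox(combined: blob)
    return try AES.GCM.open(box, using: key)       // throws if tampered
}
```

GCM's authentication tag means a tampered or wrongly-keyed backup fails to decrypt rather than yielding garbage.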
LocalAuthentication
Face ID / Touch ID — biometric authentication to protect access to journal entries.
- Face ID and Touch ID support via LAContext
- Biometric keys stored in Secure Enclave
- App lock with configurable auto-lock timeout
- Fallback to device passcode when biometrics unavailable
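The lock screen reduces to one LAContext policy check. A sketch with an illustrative helper name:

```swift
import LocalAuthentication

// Sketch: gate the journal behind Face ID / Touch ID, with passcode fallback.
func unlockJournal(completion: @escaping (Bool) -> Void) {
    let context = LAContext()
    var error: NSError?

    // .deviceOwnerAuthentication allows passcode fallback;
    // .deviceOwnerAuthenticationWithBiometrics would demand biometrics only.
    guard context.canEvaluatePolicy(.deviceOwnerAuthentication, error: &error) else {
        completion(false)
        return
    }
    context.evaluatePolicy(.deviceOwnerAuthentication,
                           localizedReason: "Unlock your journal") { success, _ in
        DispatchQueue.main.async { completion(success) }
    }
}
```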
Digital Twin Engine
The DigitalTwinEngine is a custom personality modeling system that builds a multi-dimensional profile of the user from their voice journal entries. It does not use any external models or APIs. The entire model is computed from NLTagger output and stored as serialized JSON in Core Data's AIState entity.
The engine consists of four interconnected models.
CommunicationStyle
How the user expresses themselves. Updated with each entry.
- Type-Token Ratio (vocabulary richness)
- Expressiveness score (0.0 - 1.0)
- Directness score (0.0 - 1.0)
- Formality score (0.0 - 1.0)
- Signature words + frequency map
- Average sentence length
- Pronoun usage patterns (I vs we)
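Two of these metrics are simple enough to sketch directly. The real engine would tokenize with NLTagger rather than a naive split, and the function names are illustrative:

```swift
// Sketch: Type-Token Ratio — unique words over total words, a proxy for
// vocabulary richness.
func typeTokenRatio(_ text: String) -> Double {
    let tokens = text.lowercased()
        .split { !$0.isLetter }
        .map(String.init)
    guard !tokens.isEmpty else { return 0 }
    return Double(Set(tokens).count) / Double(tokens.count)
}

// Sketch: fraction of first-person-singular pronouns among I/we pronoun uses.
func iVsWeRatio(_ text: String) -> Double {
    let tokens = text.lowercased().split { !$0.isLetter }.map(String.init)
    let i = tokens.filter { ["i", "me", "my"].contains($0) }.count
    let we = tokens.filter { ["we", "us", "our"].contains($0) }.count
    guard i + we > 0 else { return 0 }
    return Double(i) / Double(i + we)
}
```

Tracked over time, a shifting I-vs-we ratio is one of the cheapest signals of changing social focus in the entries.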
EmotionalSignature
The user's emotional baseline and patterns over time.
- Valence baseline (positive/negative)
- Arousal baseline (energy level)
- Dominance baseline (control feeling)
- Morning vs evening mood patterns
- Weekday vs weekend patterns
- Trigger topics with correlation scores
- Emotional volatility index
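The volatility index can be read as the standard deviation of per-entry valence around the baseline. A minimal sketch, with an illustrative name:

```swift
// Sketch: emotional volatility as the population standard deviation of
// valence scores. Higher means mood swings more widely between entries.
func volatilityIndex(valences: [Double]) -> Double {
    guard valences.count > 1 else { return 0 }
    let mean = valences.reduce(0, +) / Double(valences.count)
    let variance = valences
        .map { ($0 - mean) * ($0 - mean) }
        .reduce(0, +) / Double(valences.count)
    return variance.squareRoot()
}
```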
PersonalKnowledgeGraph
A network of people, places, and topics with emotional weights.
- NER-extracted entities (person, place, org)
- Emotional weight per entity (-1.0 to +1.0)
- Mention frequency over time
- Co-occurrence relationships
- Entity-mood correlation tracking
- Topic clusters from entity groupings
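One way to keep an entity's emotional weight current is an exponential moving average over the valence of entries that mention it. A sketch — the struct shape and the 0.2 smoothing factor are illustrative, not DailyVox's actual constants:

```swift
// Sketch: a knowledge-graph node whose emotional weight drifts toward the
// valence of recent mentions, clamped to the -1.0…+1.0 range.
struct EntityNode {
    var name: String
    var mentions: Int = 0
    var emotionalWeight: Double = 0   // -1.0 … +1.0

    mutating func record(mentionValence: Double, alpha: Double = 0.2) {
        mentions += 1
        emotionalWeight = (1 - alpha) * emotionalWeight + alpha * mentionValence
        emotionalWeight = min(1, max(-1, emotionalWeight))
    }
}
```

The moving average makes recent feelings about a person count more than old ones, which matches how relationships actually evolve in a journal.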
TwinPredictions
Forecasts based on temporal pattern analysis.
- Day-of-week mood forecasting
- Time-of-day emotional patterns
- Trend direction (improving/declining)
- Seasonal pattern detection
- Trigger anticipation from schedule
- Confidence scores per prediction
Storage model: All four models are Swift Codable structs serialized to JSON and stored in a single Core Data entity called AIState. This means the entire personality model can be loaded in a single fetch, updated incrementally, and synced across devices as a single atomic object. No external database. No vector store (until v1.3). Just Core Data.
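The single-fetch design works because everything is plain Codable. A simplified round-trip sketch — the field names are illustrative reductions of the four models:

```swift
import Foundation

// Sketch: the personality model as one Codable payload, serialized to JSON
// for storage in the AIState entity's data field.
struct TwinModel: Codable, Equatable {
    struct Style: Codable, Equatable { var typeTokenRatio: Double }
    struct Emotion: Codable, Equatable { var valenceBaseline: Double }
    var style: Style
    var emotion: Emotion
}

func roundTrip(_ model: TwinModel) throws -> TwinModel {
    let data = try JSONEncoder().encode(model)   // written to AIState
    return try JSONDecoder().decode(TwinModel.self, from: data)
}
```

Because the whole model is one JSON blob, CloudKit syncs it atomically: devices never see a half-updated personality.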
Privacy Architecture
Privacy is not a feature of DailyVox. It is the architectural constraint that every technical decision is built around. The system is designed so that private data physically cannot leave the device for processing.
Zero Network Processing
Every AI operation runs on the device's Neural Engine. Speech transcription uses requiresOnDeviceRecognition = true. NLTagger runs locally. The Digital Twin model is computed and stored in Core Data. There are no API calls, no cloud functions, no telemetry on journal content.
No Third-Party SDKs
DailyVox contains zero third-party dependencies for core functionality. No analytics SDKs. No crash reporting that sends journal content. No ad networks. The only external code is Google Analytics on the website (not in the app) and Apple's own frameworks.
Apple's Privacy Nutrition Label
DailyVox carries Apple's "Data Not Collected" privacy label on the App Store. This is the strictest category — it means the app does not collect any data linked or unlinked to the user's identity.
Encryption and Authentication
CryptoKit AES-256-GCM encrypts all backup exports. Secure Enclave stores biometric authentication keys. LocalAuthentication gates app access behind Face ID or Touch ID. iCloud sync, when enabled, uses Apple's encrypted CloudKit infrastructure with end-to-end encryption.
Cloud AI Journal vs DailyVox
| | Typical Cloud AI Journal | DailyVox |
|---|---|---|
| Audio processing | Sent to cloud servers | On-device Neural Engine |
| AI model location | Remote API (OpenAI, etc.) | Apple on-device models |
| Text analysis | Cloud NLP service | NLTagger (local) |
| Data storage | Company servers | Core Data (SQLite on device) |
| Account required | Yes (email, password) | No |
| Third-party SDKs | Analytics, crash, ads | None |
| Privacy label | "Data Linked to You" | "Data Not Collected" |
| Works offline | No | Yes, fully |
| Subscription | $5-15/month | Free |
| Who can read your journal | Company, employees, subprocessors | Only you |
Technical Roadmap
Where DailyVox has been, what's being built now, and where it's going. Each version adds a layer to the on-device AI stack.
Voice Journaling + On-Device AI
Core voice journaling with fully on-device transcription, NLP analysis, encrypted storage, biometric lock, widgets, and Siri Shortcuts.
Digital Twin + Personality Model
Custom DigitalTwinEngine building a multi-dimensional personality model. Communication style tracking, emotional baseline with time patterns, entity knowledge graph with emotional weights, and temporal mood forecasting.
Ask Your Twin + Social Sharing
TwinChatView with pattern-matched query system. ShareablePersonalityCardView renders cards at 3x for Instagram Stories and Twitter/X. Review prompts via SKStoreReviewController at milestone entries.
Semantic Search + Proactive Insights
NLEmbedding for 512-dim sentence embeddings. Custom cosine similarity vector search index. Statistical anomaly detection (z-score deviations from emotional baseline). K-means clustering on embedding space. Foundation for v2.0 RAG architecture.
Multi-Language + Wrist Capture
String Catalogs for UI localization (Hindi, Spanish, Japanese, German). WatchKit companion app with WatchConnectivity for iPhone sync. Quick voice entry from wrist with complications.
Apple Foundation Models + Tool Calling + SpeechAnalyzer
Apple's on-device 3B-parameter Foundation Model. Twin becomes a real chatbot with multi-turn LanguageModelSession. Tool calling protocol lets the Twin autonomously query Core Data — fetch entries by topic/date/mood, retrieve personality data, surface mood patterns. @Generable for type-safe structured outputs. streamResponse() for real-time chat. DigitalTwinEngine feeds user's communication style into session instructions for tone matching (~75% accuracy). SpeechAnalyzer replaces SFSpeechRecognizer. Requires iPhone 15 Pro+. Zero network calls, zero API costs.
LoRA Fine-Tuning — Twin Learns to Sound Like You
Apple's Foundation Models Adapter Training toolkit for Low-Rank Adaptation. Export 100-1,000 entries as JSONL training data. Train a personal adapter on Mac (32GB+ Apple Silicon) — original model weights stay frozen, only small adapter matrices trained. ~160MB adapter delivered via Background Assets. Loaded via SystemLanguageModel(adapter:). The Twin doesn't just know your data — it sounds like you. Learns sentence structure, emotional vocabulary, punctuation habits, hesitation patterns. ~95% tone accuracy vs ~60% with instructions alone. Training data never leaves user's Mac.
A True Digital Version of You
The end state: an on-device AI that acts like you, speaks like you, responds like you — built from years of journal entries. Full RAG architecture: NLEmbedding vector retrieval + Foundation Model generation. Personal LoRA adapter for your voice. Tool calling for autonomous data access. Session transcript management with context condensation. The Twin understands what you said, when, how you felt, who you were talking about, and what patterns repeat. Encrypted. On your device. Exportable only by you. Digital self-preservation.
Research Context
DailyVox exists at the intersection of on-device LLMs, personal AI, and mental health technology. Several recent research papers explore adjacent ideas.
What makes DailyVox different: No existing paper covers DailyVox's specific approach — building a private Digital Twin from voice journal data using on-device NLP (NLTagger, NLEmbedding) combined with Apple's Foundation Models framework. The combination of voice-first input, personality modeling from NER/sentiment analysis, and on-device LLM generation with tool calling for autonomous data retrieval is a novel architecture. DailyVox is, to our knowledge, the first app to attempt this full pipeline privately on-device.
Open Source
DailyVox is open source. The full codebase — including the DigitalTwinEngine, all NLP processing, the Core Data stack, and the SwiftUI interface — is available on GitHub.
We believe that privacy-critical software should be auditable. If you claim data never leaves the device, people should be able to verify that claim by reading the code.
Build with us
DailyVox is open source and contributions are welcome. Whether it's improving the Digital Twin engine, adding language support, or building the Foundation Models integration — there's room to shape the future of private AI journaling.
View on GitHub