How DailyVox actually works

A deep technical breakdown of the on-device AI architecture behind private voice journaling. No cloud. No APIs. No compromises.

Architecture Overview

Every piece of data in DailyVox flows through a pipeline that runs entirely on the device. There are zero network calls for AI processing. Here is the full system architecture.

INPUT
  Microphone --> AVAudioEngine (AAC 44.1kHz)
        |
        v
TRANSCRIPTION
  SFSpeechRecognizer (requiresOnDeviceRecognition = true)
  (iOS 26: SpeechAnalyzer replaces this)
        |
        v
NLP ANALYSIS
  NLTagger --> Sentiment | NER | POS | Language ID
        |
        v
PERSONALITY MODEL
  DigitalTwinEngine
    |--> CommunicationStyle (TTR, formality, directness)
    |--> EmotionalSignature (valence, arousal, dominance)
    |--> PersonalKnowledgeGraph (entities + emotional weights)
    |--> TwinPredictions (temporal patterns, forecasts)
        |
        v
STORAGE
  Core Data --> NSPersistentCloudKitContainer
        |                  |
        v                  v
  Local SQLite       iCloud (optional, encrypted)
        |
        v
PRESENTATION
  SwiftUI Views --> WidgetKit | AppIntents (Siri)

The key constraint: data never leaves the device for processing. Transcription happens on the Neural Engine. NLP runs locally. The Digital Twin model is computed and stored in Core Data. The only optional network path is Apple's encrypted iCloud sync, which the user can disable.

On-Device AI Stack

DailyVox uses eight Apple frameworks to build a full AI pipeline without any third-party dependencies or server-side processing.


Speech Framework

SFSpeechRecognizer

The primary transcription engine in v1.0-1.x. Converts spoken audio to text entirely on-device.

  • requiresOnDeviceRecognition = true ensures zero network transmission
  • Input: AAC audio at 44.1kHz sample rate via AVAudioEngine
  • Supports 60+ languages with on-device models
  • Real-time partial results during recording for live feedback
  • Runs on Apple Neural Engine for efficient battery usage
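A minimal sketch of how the on-device constraint is enforced with the standard Speech framework APIs (authorization flow and error handling elided for brevity):

```swift
import Speech

// Sketch: building a recognition request that can only run on-device.
func makeOnDeviceRequest(for audioFile: URL) -> SFSpeechURLRecognitionRequest? {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.supportsOnDeviceRecognition else { return nil }

    let request = SFSpeechURLRecognitionRequest(url: audioFile)
    // The crucial line: the request fails rather than fall back to the network.
    request.requiresOnDeviceRecognition = true
    request.shouldReportPartialResults = true   // live feedback while recording
    return request
}
```

With `requiresOnDeviceRecognition` set, recognition errors out on unsupported locales instead of silently routing audio to Apple's servers.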

SpeechAnalyzer

iOS 26+ — next-gen transcription

Apple's next-generation speech recognition framework, replacing SFSpeechRecognizer in v2.0.

  • Significantly faster recognition with lower latency
  • Native long-form audio support without session timeouts
  • No user setup required (no permission prompts for on-device)
  • Volatile results for instant partial transcription feedback
  • Built for sustained recording sessions — ideal for voice journaling

NaturalLanguage Framework

NLTagger

The core NLP engine that extracts meaning from transcribed text. Runs multiple analysis passes per entry.

  • Sentiment scoring: sentence-level valence from -1.0 to +1.0
  • Named Entity Recognition: people, places, organizations, dates
  • Part-of-Speech tagging: verb density, adjective richness, pronoun patterns
  • Language identification: auto-detect entry language for multilingual users
  • All tag schemes run locally using on-device CoreML models
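The sentiment and NER passes above can be sketched with NLTagger directly; this is a reduced illustration, not the app's full analysis pipeline:

```swift
import NaturalLanguage

// Sketch: one sentiment pass and one named-entity pass over a transcript.
func analyze(_ text: String) -> (sentiment: Double, people: [String]) {
    let tagger = NLTagger(tagSchemes: [.sentimentScore, .nameType])
    tagger.string = text

    // Sentiment: NLTagger reports a score in -1.0...+1.0 as a string raw value.
    let (sentimentTag, _) = tagger.tag(at: text.startIndex,
                                       unit: .paragraph,
                                       scheme: .sentimentScore)
    let sentiment = Double(sentimentTag?.rawValue ?? "0") ?? 0

    // NER: collect person names for the knowledge graph.
    var people: [String] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: [.omitWhitespace, .omitPunctuation, .joinNames]) { tag, range in
        if tag == .personalName { people.append(String(text[range])) }
        return true
    }
    return (sentiment, people)
}
```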

NLEmbedding

512-dim word/sentence vectors

Generates dense vector representations of journal entries for semantic search and clustering (v1.3+).

  • 512-dimensional sentence embeddings per journal entry
  • Cosine similarity for semantic search ("entries where I felt conflicted")
  • K-means clustering to discover hidden thematic groupings
  • Foundation for RAG retrieval layer in v2.0
  • Vectors persisted in Core Data alongside entry text
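A sketch of the embedding-plus-cosine-similarity pattern, assuming `NLEmbedding.sentenceEmbedding` (available from iOS 14):

```swift
import NaturalLanguage

// Sketch: embed two entries and compare them with cosine similarity.
func similarity(_ a: String, _ b: String) -> Double? {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english),
          let va = embedding.vector(for: a),
          let vb = embedding.vector(for: b) else { return nil }

    let dot  = zip(va, vb).map(*).reduce(0, +)
    let magA = va.map { $0 * $0 }.reduce(0, +).squareRoot()
    let magB = vb.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (magA * magB)
}
```

NLEmbedding also offers a built-in `distance(between:and:)` helper; computing cosine by hand as above is what lets the persisted vectors be searched without re-embedding stored entries.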

Foundation Models

iOS 26 — 3B on-device LLM

Apple's on-device large language model, enabling conversational Digital Twin interactions in v2.0.

  • LanguageModelSession for multi-turn conversation with transcript memory
  • Tool calling: Twin autonomously queries Core Data via custom Tool protocol
  • @Generable macro for type-safe structured outputs (mood reports as Swift structs)
  • streamResponse() for real-time streaming chat UI
  • Dynamic instructions from DigitalTwinEngine for personality-matched responses
  • Requires iPhone 15 Pro or later. Entire pipeline on-device.
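A heavily hedged sketch of the v2.0 flow, based on Apple's FoundationModels framework as announced for iOS 26 — exact signatures may differ in the shipping SDK, and `MoodReport` is an illustrative type, not the app's schema:

```swift
import FoundationModels

// Sketch: a type-safe structured output via the @Generable macro.
@Generable
struct MoodReport {
    @Guide(description: "One-sentence summary of today's mood")
    var summary: String
    var valence: Double
}

// Sketch: a single guided-generation turn with personality-matched instructions.
func askTwin(_ question: String, styleInstructions: String) async throws -> MoodReport {
    // Instructions are composed dynamically from DigitalTwinEngine output.
    let session = LanguageModelSession(instructions: styleInstructions)
    let response = try await session.respond(to: question, generating: MoodReport.self)
    return response.content
}
```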

Core Data + CloudKit

NSPersistentCloudKitContainer

Local-first persistence with optional encrypted cloud sync across devices.

  • NSPersistentCloudKitContainer wraps SQLite with CloudKit sync
  • Local-first: app works fully offline, syncs when available
  • AIState entity stores all Digital Twin models as Codable JSON
  • iCloud sync is optional and uses Apple's encrypted infrastructure
  • User can disable sync entirely — data stays on-device only
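The local-first toggle above boils down to one store-description option. A minimal sketch, assuming the model file is named "DailyVox":

```swift
import CoreData

// Sketch: Core Data stack with optional CloudKit mirroring.
func makeContainer(syncEnabled: Bool) -> NSPersistentContainer {
    let container = NSPersistentCloudKitContainer(name: "DailyVox")
    if !syncEnabled, let description = container.persistentStoreDescriptions.first {
        // Clearing the CloudKit options keeps the SQLite store purely local.
        description.cloudKitContainerOptions = nil
    }
    container.loadPersistentStores { _, error in
        if let error { fatalError("Store failed to load: \(error)") }
    }
    return container
}
```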

CryptoKit

AES-256-GCM encryption

Authenticated AES-256-GCM encryption for backup exports and sensitive data at rest.

  • AES-256-GCM authenticated encryption for backup files
  • User-provided passphrase for backup key derivation
  • Encrypted JSON export format for device migration
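The seal/open round trip looks roughly like this with CryptoKit. Note the hedge in the comments: deriving the key straight from a SHA-256 hash of the passphrase is an illustration only — a real export would use a proper KDF (PBKDF2/HKDF) with a stored salt:

```swift
import CryptoKit
import Foundation

// Sketch: AES-256-GCM for a backup blob. The passphrase-to-key step is
// simplified; a bare hash is NOT a production-grade key derivation.
func sealBackup(_ json: Data, passphrase: String) throws -> Data {
    let key = SymmetricKey(data: SHA256.hash(data: Data(passphrase.utf8)))
    let box = try AES.GCM.seal(json, using: key)
    return box.combined!   // nonce + ciphertext + auth tag in one blob
}

func openBackup(_ blob: Data, passphrase: String) throws -> Data {
    let key = SymmetricKey(data: SHA256.hash(data: Data(passphrase.utf8)))
    let box = try AES.GCM.SealedBox(combined: blob)
    return try AES.GCM.open(box, using: key)   // throws if tampered or wrong key
}
```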

LocalAuthentication

Face ID / Touch ID

Biometric authentication to protect access to journal entries.

  • Face ID and Touch ID support via LAContext
  • Biometric keys stored in Secure Enclave
  • App lock with configurable auto-lock timeout
  • Fallback to device passcode when biometrics unavailable
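A minimal sketch of the app-lock gate, using the standard LocalAuthentication flow (the `.deviceOwnerAuthentication` policy gives the passcode fallback for free):

```swift
import LocalAuthentication

// Sketch: unlock the journal with Face ID / Touch ID, falling back to passcode.
func unlockJournal() async -> Bool {
    let context = LAContext()
    var error: NSError?
    guard context.canEvaluatePolicy(.deviceOwnerAuthentication, error: &error) else {
        return false   // no biometrics and no passcode set
    }
    do {
        return try await context.evaluatePolicy(.deviceOwnerAuthentication,
                                                localizedReason: "Unlock your journal")
    } catch {
        return false   // user cancelled or authentication failed
    }
}
```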

Digital Twin Engine

The DigitalTwinEngine is a custom personality modeling system that builds a multi-dimensional profile of the user from their voice journal entries. It does not use any external models or APIs. The entire model is computed from NLTagger output and stored as serialized JSON in Core Data's AIState entity.

The engine consists of four interconnected models.

CommunicationStyle

How the user expresses themselves. Updated with each entry.

  • Type-Token Ratio (vocabulary richness)
  • Expressiveness score (0.0 - 1.0)
  • Directness score (0.0 - 1.0)
  • Formality score (0.0 - 1.0)
  • Signature words + frequency map
  • Average sentence length
  • Pronoun usage patterns (I vs we)
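Type-Token Ratio, the first metric above, is simply unique words over total words. A sketch with naive whitespace tokenization (the app would tokenize with NLTokenizer):

```swift
// Sketch: vocabulary richness as unique tokens / total tokens.
func typeTokenRatio(_ text: String) -> Double {
    let tokens = text.lowercased()
        .split(whereSeparator: { !$0.isLetter })
        .map(String.init)
    guard !tokens.isEmpty else { return 0 }
    return Double(Set(tokens).count) / Double(tokens.count)
}
```

"the cat saw the cat" has five tokens and three distinct words, so its TTR is 0.6; richer vocabulary pushes the ratio toward 1.0.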

EmotionalSignature

The user's emotional baseline and patterns over time.

  • Valence baseline (positive/negative)
  • Arousal baseline (energy level)
  • Dominance baseline (control feeling)
  • Morning vs evening mood patterns
  • Weekday vs weekend patterns
  • Trigger topics with correlation scores
  • Emotional volatility index
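One hypothetical way to maintain a baseline like the valence figure above is an exponential moving average, so recent entries matter more than old ones. This is an illustrative sketch, not the engine's actual update rule:

```swift
// Sketch: a per-user baseline updated as an exponential moving average.
struct Baseline {
    private(set) var value: Double = 0
    private var count = 0

    mutating func update(with sample: Double, alpha: Double = 0.1) {
        count += 1
        // First sample seeds the baseline; later samples blend in at weight alpha.
        value = count == 1 ? sample : alpha * sample + (1 - alpha) * value
    }
}
```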

PersonalKnowledgeGraph

A network of people, places, and topics with emotional weights.

  • NER-extracted entities (person, place, org)
  • Emotional weight per entity (-1.0 to +1.0)
  • Mention frequency over time
  • Co-occurrence relationships
  • Entity-mood correlation tracking
  • Topic clusters from entity groupings
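A node in such a graph can be sketched as a small Codable value; field names here are illustrative, not the app's actual schema:

```swift
// Sketch: one knowledge-graph node with the attributes listed above.
struct EntityNode: Codable {
    let name: String
    let kind: String                  // "person", "place", "org"
    var emotionalWeight: Double       // -1.0 ... +1.0, from co-occurring sentiment
    var mentionCount: Int
    var coOccurrences: [String: Int]  // entity name -> shared-entry count
}
```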

TwinPredictions

Forecasts based on temporal pattern analysis.

  • Day-of-week mood forecasting
  • Time-of-day emotional patterns
  • Trend direction (improving/declining)
  • Seasonal pattern detection
  • Trigger anticipation from schedule
  • Confidence scores per prediction
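Day-of-week forecasting with a confidence score can be as simple as a per-weekday mean whose confidence grows with sample count. A hypothetical sketch in that spirit:

```swift
// Sketch: mean valence per weekday (1 = Sunday ... 7 = Saturday) with a
// sample-size-based confidence, capped at 1.0.
func forecast(byWeekday samples: [(weekday: Int, valence: Double)])
    -> [Int: (mean: Double, confidence: Double)] {
    var result: [Int: (mean: Double, confidence: Double)] = [:]
    for day in 1...7 {
        let vs = samples.filter { $0.weekday == day }.map { $0.valence }
        guard !vs.isEmpty else { continue }
        let mean = vs.reduce(0, +) / Double(vs.count)
        let confidence = min(1.0, Double(vs.count) / 10.0)
        result[day] = (mean, confidence)
    }
    return result
}
```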

Storage model: All four models are Swift Codable structs serialized to JSON and stored in a single Core Data entity called AIState. This means the entire personality model can be loaded in a single fetch, updated incrementally, and synced across devices as a single atomic object. No external database. No vector store (until v1.3). Just Core Data.
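The round trip is plain Codable; `TwinSnapshot` below is an illustrative stand-in for the four real models:

```swift
import Foundation

// Sketch: the single-blob persistence pattern — one JSON Data value maps
// to one attribute on the AIState entity.
struct TwinSnapshot: Codable {
    var expressiveness: Double
    var valenceBaseline: Double
}

func saveToAIState(_ snapshot: TwinSnapshot) throws -> Data {
    try JSONEncoder().encode(snapshot)
}

func loadFromAIState(_ blob: Data) throws -> TwinSnapshot {
    try JSONDecoder().decode(TwinSnapshot.self, from: blob)
}
```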

Privacy Architecture

Privacy is not a feature of DailyVox. It is the architectural constraint that every technical decision is built around. The system is designed so that private data physically cannot leave the device for processing.

Zero Network Processing

Every AI operation runs on the device's Neural Engine. Speech transcription uses requiresOnDeviceRecognition = true. NLTagger runs locally. The Digital Twin model is computed and stored in Core Data. There are no API calls, no cloud functions, no telemetry on journal content.

No Third-Party SDKs

DailyVox contains zero third-party dependencies for core functionality. No analytics SDKs. No crash reporting that sends journal content. No ad networks. The app links only against Apple's own frameworks; the sole third-party code anywhere in the project is Google Analytics on the marketing website, never in the app itself.

Apple's Privacy Nutrition Label

DailyVox carries Apple's "Data Not Collected" privacy label on the App Store. This is the strictest category — it means the app does not collect any data linked or unlinked to the user's identity.

Encryption and Authentication

CryptoKit AES-256-GCM encrypts all backup exports. Secure Enclave stores biometric authentication keys. LocalAuthentication gates app access behind Face ID or Touch ID. iCloud sync, when enabled, uses Apple's encrypted CloudKit infrastructure with end-to-end encryption.

Cloud AI Journal vs DailyVox

                            Typical Cloud AI Journal            DailyVox
Audio processing            Sent to cloud servers               On-device Neural Engine
AI model location           Remote API (OpenAI, etc.)           Apple on-device models
Text analysis               Cloud NLP service                   NLTagger (local)
Data storage                Company servers                     Core Data (SQLite on device)
Account required            Yes (email, password)               No
Third-party SDKs            Analytics, crash, ads               None
Privacy label               "Data Linked to You"                "Data Not Collected"
Works offline               No                                  Yes, fully
Subscription                $5-15/month                         Free
Who can read your journal   Company, employees, subprocessors   Only you

Technical Roadmap

Where DailyVox has been, what's being built now, and where it's going. Each version adds a layer to the on-device AI stack.

Shipped — v1.0

Voice Journaling + On-Device AI

Core voice journaling with fully on-device transcription, NLP analysis, encrypted storage, biometric lock, widgets, and Siri Shortcuts.

SFSpeechRecognizer · NLTagger · Core Data · CryptoKit · WidgetKit · AppIntents

Shipped — v1.1

Digital Twin + Personality Model

Custom DigitalTwinEngine building a multi-dimensional personality model. Communication style tracking, emotional baseline with time patterns, entity knowledge graph with emotional weights, and temporal mood forecasting.

DigitalTwinEngine · CommunicationStyle · EmotionalSignature · PersonalKnowledgeGraph · TwinPredictions

Building Now — v1.2

Ask Your Twin + Social Sharing

TwinChatView with pattern-matched query system. ShareablePersonalityCardView renders cards at 3x for Instagram Stories and Twitter/X. Review prompts via SKStoreReviewController at milestone entries.

TwinChatView · ShareablePersonalityCardView · SKStoreReviewController · UIActivityViewController · ImageRenderer

Next — v1.3

Semantic Search + Proactive Insights

NLEmbedding for 512-dim sentence embeddings. Custom cosine similarity vector search index. Statistical anomaly detection (z-score deviations from emotional baseline). K-means clustering on embedding space. Foundation for v2.0 RAG architecture.
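The z-score check described above flags entries whose valence deviates sharply from the user's history. A self-contained sketch:

```swift
// Sketch: statistical anomaly detection against the emotional baseline.
// An entry is anomalous when it sits more than `threshold` sample standard
// deviations away from the historical mean.
func isAnomalous(_ value: Double, history: [Double], threshold: Double = 2.0) -> Bool {
    guard history.count > 1 else { return false }
    let mean = history.reduce(0, +) / Double(history.count)
    let variance = history.map { ($0 - mean) * ($0 - mean) }.reduce(0, +)
        / Double(history.count - 1)
    let std = variance.squareRoot()
    guard std > 0 else { return false }   // flat history: nothing to deviate from
    return abs(value - mean) / std > threshold
}
```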

NLEmbedding · Cosine Similarity · Anomaly Detection · K-Means Clustering

v1.4 — Localization + Apple Watch

Multi-Language + Wrist Capture

String Catalogs for UI localization (Hindi, Spanish, Japanese, German). WatchKit companion app with WatchConnectivity for iPhone sync. Quick voice entry from wrist with complications.

String Catalogs · WatchKit · WatchConnectivity · Complications

v2.0 — Conversational Twin (iOS 26)

Apple Foundation Models + Tool Calling + SpeechAnalyzer

Apple's on-device 3B-parameter Foundation Model. Twin becomes a real chatbot with multi-turn LanguageModelSession. Tool calling protocol lets the Twin autonomously query Core Data — fetch entries by topic/date/mood, retrieve personality data, surface mood patterns. @Generable for type-safe structured outputs. streamResponse() for real-time chat. DigitalTwinEngine feeds user's communication style into session instructions for tone matching (~75% accuracy). SpeechAnalyzer replaces SFSpeechRecognizer. Requires iPhone 15 Pro+. Zero network calls, zero API costs.

Foundation Models · LanguageModelSession · Tool Calling · @Generable · streamResponse() · SpeechAnalyzer

v2.5 — Train Your Twin

LoRA Fine-Tuning — Twin Learns to Sound Like You

Apple's Foundation Models Adapter Training toolkit for Low-Rank Adaptation. Export 100-1,000 entries as JSONL training data. Train a personal adapter on Mac (32GB+ Apple Silicon) — original model weights stay frozen, only small adapter matrices trained. ~160MB adapter delivered via Background Assets. Loaded via SystemLanguageModel(adapter:). The Twin doesn't just know your data — it sounds like you. Learns sentence structure, emotional vocabulary, punctuation habits, hesitation patterns. ~95% tone accuracy vs ~60% with instructions alone. Training data never leaves user's Mac.

LoRA Adapters · Adapter Training · JSONL Export · Background Assets · ~160MB Adapter

v3.0 — The Vision

A True Digital Version of You

The end state: an on-device AI that acts like you, speaks like you, responds like you — built from years of journal entries. Full RAG architecture: NLEmbedding vector retrieval + Foundation Model generation. Personal LoRA adapter for your voice. Tool calling for autonomous data access. Session transcript management with context condensation. The Twin understands what you said, when, how you felt, who you were talking about, and what patterns repeat. Encrypted. On your device. Exportable only by you. Digital self-preservation.

Full RAG · Personal LoRA · Vector Retrieval · Context Condensation · Secure Enclave · Digital Self-Preservation

Research Context

DailyVox exists at the intersection of on-device LLMs, personal AI, and mental health technology. Several recent research papers explore adjacent ideas.

"Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning"
2026 — Efficient fine-tuning under mobile memory constraints
"MoPHES: On-device LLMs for Mobile Psychological Health"
2025 — Using on-device LLMs for psychological health applications
"PocketLLM: On-Device Fine-Tuning for Personalized LLMs"
2024 — Personal model adaptation on mobile hardware
"PLMM: Personal Large Language Models on Mobile Devices"
2023 — Architecture for personal LLMs running on phones

What makes DailyVox different: No existing paper covers DailyVox's specific approach — building a private Digital Twin from voice journal data using on-device NLP (NLTagger, NLEmbedding) combined with Apple's Foundation Models framework. The combination of voice-first input, personality modeling from NER/sentiment analysis, and on-device LLM generation with tool calling for autonomous data retrieval is a novel architecture. DailyVox is, to our knowledge, the first app to attempt this full pipeline privately on-device.

Open Source

DailyVox is open source. The full codebase — including the DigitalTwinEngine, all NLP processing, the Core Data stack, and the SwiftUI interface — is available on GitHub.

We believe that privacy-critical software should be auditable. If you claim data never leaves the device, people should be able to verify that claim by reading the code.

Build with us

DailyVox is open source and contributions are welcome. Whether it's improving the Digital Twin engine, adding language support, or building the Foundation Models integration — there's room to shape the future of private AI journaling.

View on GitHub

Try DailyVox

Free. Private. No account needed. All AI runs on your device.

Download Free