How DailyVox actually works

A deep technical breakdown of the on-device AI architecture behind private voice journaling. No cloud. No APIs. No compromises.

Architecture Overview

Every piece of data in DailyVox flows through a pipeline that runs entirely on the device. There are zero network calls for AI processing. Here is the full system architecture.

INPUT
  Microphone --> AVAudioEngine (AAC 44.1kHz)
        |
        v
TRANSCRIPTION
  SFSpeechRecognizer (requiresOnDeviceRecognition = true)
  (iOS 26: SpeechAnalyzer replaces this)
        |
        v
NLP ANALYSIS
  NLTagger --> Sentiment | NER | POS | Language ID
        |
        v
PERSONALITY MODEL
  DigitalTwinEngine
    |--> CommunicationStyle (TTR, formality, directness)
    |--> EmotionalSignature (valence, arousal, dominance)
    |--> PersonalKnowledgeGraph (entities + emotional weights)
    |--> TwinPredictions (temporal patterns, forecasts)
        |
        v
STORAGE
  Core Data --> NSPersistentCloudKitContainer
        |                  |
        v                  v
  Local SQLite       iCloud (optional, encrypted)
        |
        v
PRESENTATION
  SwiftUI Views --> WidgetKit | AppIntents (Siri)

The key constraint: data never leaves the device for processing. Transcription happens on the Neural Engine. NLP runs locally. The Digital Twin model is computed and stored in Core Data. The only optional network path is Apple's encrypted iCloud sync, which the user can disable.

On-Device AI Stack

DailyVox uses eight Apple frameworks to build a full AI pipeline without any third-party dependencies or server-side processing.


Speech Framework

SFSpeechRecognizer

The primary transcription engine in v1.0-1.x. Converts spoken audio to text entirely on-device.

  • requiresOnDeviceRecognition = true ensures zero network transmission
  • Input: AAC audio at 44.1kHz sample rate via AVAudioEngine
  • Supports 60+ languages with on-device models
  • Real-time partial results during recording for live feedback
  • Runs on Apple Neural Engine for efficient battery usage
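A minimal sketch of how the on-device constraint is enforced with the standard Speech framework APIs (authorization flow and error handling elided for brevity):

```swift
import Speech

// Sketch: building a recognition request that can only run on-device.
func makeOnDeviceRequest(for audioFile: URL) -> SFSpeechURLRecognitionRequest? {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.supportsOnDeviceRecognition else { return nil }

    let request = SFSpeechURLRecognitionRequest(url: audioFile)
    // The crucial line: the request fails rather than fall back to the network.
    request.requiresOnDeviceRecognition = true
    request.shouldReportPartialResults = true   // live feedback while recording
    return request
}
```

With `requiresOnDeviceRecognition` set, recognition errors out on unsupported locales instead of silently routing audio to Apple's servers.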

SpeechAnalyzer

iOS 26+ — next-gen transcription

Apple's next-generation speech recognition framework, replacing SFSpeechRecognizer in v2.0.

  • Significantly faster recognition with lower latency
  • Native long-form audio support without session timeouts
  • No user setup required (no permission prompts for on-device)
  • Volatile results for instant partial transcription feedback
  • Built for sustained recording sessions — ideal for voice journaling

NaturalLanguage Framework

NLTagger

The core NLP engine that extracts meaning from transcribed text. Runs multiple analysis passes per entry.

  • Sentiment scoring: sentence-level valence from -1.0 to +1.0
  • Named Entity Recognition: people, places, organizations, dates
  • Part-of-Speech tagging: verb density, adjective richness, pronoun patterns
  • Language identification: auto-detect entry language for multilingual users
  • All tag schemes run locally using on-device CoreML models
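The sentiment and NER passes above can be sketched with NLTagger directly; this is a reduced illustration, not the app's full analysis pipeline:

```swift
import NaturalLanguage

// Sketch: one sentiment pass and one named-entity pass over a transcript.
func analyze(_ text: String) -> (sentiment: Double, people: [String]) {
    let tagger = NLTagger(tagSchemes: [.sentimentScore, .nameType])
    tagger.string = text

    // Sentiment: NLTagger reports a score in -1.0...+1.0 as a string raw value.
    let (sentimentTag, _) = tagger.tag(at: text.startIndex,
                                       unit: .paragraph,
                                       scheme: .sentimentScore)
    let sentiment = Double(sentimentTag?.rawValue ?? "0") ?? 0

    // NER: collect person names for the knowledge graph.
    var people: [String] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: [.omitWhitespace, .omitPunctuation, .joinNames]) { tag, range in
        if tag == .personalName { people.append(String(text[range])) }
        return true
    }
    return (sentiment, people)
}
```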

NLEmbedding

512-dim word/sentence vectors

Generates dense vector representations of journal entries for semantic search and clustering (v1.3+).

  • 512-dimensional sentence embeddings per journal entry
  • Cosine similarity for semantic search ("entries where I felt conflicted")
  • K-means clustering to discover hidden thematic groupings
  • Foundation for RAG retrieval layer in v2.0
  • Vectors persisted in Core Data alongside entry text
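A sketch of the embedding-plus-cosine-similarity pattern, assuming `NLEmbedding.sentenceEmbedding` (available from iOS 14):

```swift
import NaturalLanguage

// Sketch: embed two entries and compare them with cosine similarity.
func similarity(_ a: String, _ b: String) -> Double? {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english),
          let va = embedding.vector(for: a),
          let vb = embedding.vector(for: b) else { return nil }

    let dot  = zip(va, vb).map(*).reduce(0, +)
    let magA = va.map { $0 * $0 }.reduce(0, +).squareRoot()
    let magB = vb.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (magA * magB)
}
```

NLEmbedding also offers a built-in `distance(between:and:)` helper; computing cosine by hand as above is what lets the persisted vectors be searched without re-embedding stored entries.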

Foundation Models

iOS 26 — 3B on-device LLM

Apple's on-device large language model, enabling conversational Digital Twin interactions in v2.0.

  • LanguageModelSession for multi-turn conversation with transcript memory
  • Tool calling: Twin autonomously queries Core Data via custom Tool protocol
  • @Generable macro for type-safe structured outputs (mood reports as Swift structs)
  • streamResponse() for real-time streaming chat UI
  • Dynamic instructions from DigitalTwinEngine for personality-matched responses
  • Requires iPhone 15 Pro or later. Entire pipeline on-device.
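A heavily hedged sketch of the v2.0 flow, based on Apple's FoundationModels framework as announced for iOS 26 — exact signatures may differ in the shipping SDK, and `MoodReport` is an illustrative type, not the app's schema:

```swift
import FoundationModels

// Sketch: a type-safe structured output via the @Generable macro.
@Generable
struct MoodReport {
    @Guide(description: "One-sentence summary of today's mood")
    var summary: String
    var valence: Double
}

// Sketch: a single guided-generation turn with personality-matched instructions.
func askTwin(_ question: String, styleInstructions: String) async throws -> MoodReport {
    // Instructions are composed dynamically from DigitalTwinEngine output.
    let session = LanguageModelSession(instructions: styleInstructions)
    let response = try await session.respond(to: question, generating: MoodReport.self)
    return response.content
}
```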

Core Data + CloudKit

NSPersistentCloudKitContainer

Local-first persistence with optional encrypted cloud sync across devices.

  • NSPersistentCloudKitContainer wraps SQLite with CloudKit sync
  • Local-first: app works fully offline, syncs when available
  • AIState entity stores all Digital Twin models as Codable JSON
  • iCloud sync is optional and uses Apple's encrypted infrastructure
  • User can disable sync entirely — data stays on-device only
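The local-first toggle above boils down to one store-description option. A minimal sketch, assuming the model file is named "DailyVox":

```swift
import CoreData

// Sketch: Core Data stack with optional CloudKit mirroring.
func makeContainer(syncEnabled: Bool) -> NSPersistentContainer {
    let container = NSPersistentCloudKitContainer(name: "DailyVox")
    if !syncEnabled, let description = container.persistentStoreDescriptions.first {
        // Clearing the CloudKit options keeps the SQLite store purely local.
        description.cloudKitContainerOptions = nil
    }
    container.loadPersistentStores { _, error in
        if let error { fatalError("Store failed to load: \(error)") }
    }
    return container
}
```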

CryptoKit

AES-256-GCM encryption

Authenticated AES-256-GCM encryption for backup exports and sensitive data at rest.

  • AES-256-GCM authenticated encryption for backup files
  • User-provided passphrase for backup key derivation
  • Encrypted JSON export format for device migration
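The seal/open round trip looks roughly like this with CryptoKit. Note the hedge in the comments: deriving the key straight from a SHA-256 hash of the passphrase is an illustration only — a real export would use a proper KDF (PBKDF2/HKDF) with a stored salt:

```swift
import CryptoKit
import Foundation

// Sketch: AES-256-GCM for a backup blob. The passphrase-to-key step is
// simplified; a bare hash is NOT a production-grade key derivation.
func sealBackup(_ json: Data, passphrase: String) throws -> Data {
    let key = SymmetricKey(data: SHA256.hash(data: Data(passphrase.utf8)))
    let box = try AES.GCM.seal(json, using: key)
    return box.combined!   // nonce + ciphertext + auth tag in one blob
}

func openBackup(_ blob: Data, passphrase: String) throws -> Data {
    let key = SymmetricKey(data: SHA256.hash(data: Data(passphrase.utf8)))
    let box = try AES.GCM.SealedBox(combined: blob)
    return try AES.GCM.open(box, using: key)   // throws if tampered or wrong key
}
```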

LocalAuthentication

Face ID / Touch ID

Biometric authentication to protect access to journal entries.

  • Face ID and Touch ID support via LAContext
  • Biometric keys stored in Secure Enclave
  • App lock with configurable auto-lock timeout
  • Fallback to device passcode when biometrics unavailable
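A minimal sketch of the app-lock gate, using the standard LocalAuthentication flow (the `.deviceOwnerAuthentication` policy gives the passcode fallback for free):

```swift
import LocalAuthentication

// Sketch: unlock the journal with Face ID / Touch ID, falling back to passcode.
func unlockJournal() async -> Bool {
    let context = LAContext()
    var error: NSError?
    guard context.canEvaluatePolicy(.deviceOwnerAuthentication, error: &error) else {
        return false   // no biometrics and no passcode set
    }
    do {
        return try await context.evaluatePolicy(.deviceOwnerAuthentication,
                                                localizedReason: "Unlock your journal")
    } catch {
        return false   // user cancelled or authentication failed
    }
}
```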

Digital Twin Engine

The DigitalTwinEngine is a custom personality modeling system that builds a multi-dimensional profile of the user from their voice journal entries. It does not use any external models or APIs. The entire model is computed from NLTagger output and stored as serialized JSON in Core Data's AIState entity.

The engine consists of four interconnected models.

CommunicationStyle

How the user expresses themselves. Updated with each entry.

  • Type-Token Ratio (vocabulary richness)
  • Expressiveness score (0.0 - 1.0)
  • Directness score (0.0 - 1.0)
  • Formality score (0.0 - 1.0)
  • Signature words + frequency map
  • Average sentence length
  • Pronoun usage patterns (I vs we)
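Type-Token Ratio, the first metric above, is simply unique words over total words. A sketch with naive whitespace tokenization (the app would tokenize with NLTokenizer):

```swift
// Sketch: vocabulary richness as unique tokens / total tokens.
func typeTokenRatio(_ text: String) -> Double {
    let tokens = text.lowercased()
        .split(whereSeparator: { !$0.isLetter })
        .map(String.init)
    guard !tokens.isEmpty else { return 0 }
    return Double(Set(tokens).count) / Double(tokens.count)
}
```

"the cat saw the cat" has five tokens and three distinct words, so its TTR is 0.6; richer vocabulary pushes the ratio toward 1.0.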

EmotionalSignature

The user's emotional baseline and patterns over time.

  • Valence baseline (positive/negative)
  • Arousal baseline (energy level)
  • Dominance baseline (control feeling)
  • Morning vs evening mood patterns
  • Weekday vs weekend patterns
  • Trigger topics with correlation scores
  • Emotional volatility index
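One hypothetical way to maintain a baseline like the valence figure above is an exponential moving average, so recent entries matter more than old ones. This is an illustrative sketch, not the engine's actual update rule:

```swift
// Sketch: a per-user baseline updated as an exponential moving average.
struct Baseline {
    private(set) var value: Double = 0
    private var count = 0

    mutating func update(with sample: Double, alpha: Double = 0.1) {
        count += 1
        // First sample seeds the baseline; later samples blend in at weight alpha.
        value = count == 1 ? sample : alpha * sample + (1 - alpha) * value
    }
}
```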

PersonalKnowledgeGraph

A network of people, places, and topics with emotional weights.

  • NER-extracted entities (person, place, org)
  • Emotional weight per entity (-1.0 to +1.0)
  • Mention frequency over time
  • Co-occurrence relationships
  • Entity-mood correlation tracking
  • Topic clusters from entity groupings
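A node in such a graph can be sketched as a small Codable value; field names here are illustrative, not the app's actual schema:

```swift
// Sketch: one knowledge-graph node with the attributes listed above.
struct EntityNode: Codable {
    let name: String
    let kind: String                  // "person", "place", "org"
    var emotionalWeight: Double       // -1.0 ... +1.0, from co-occurring sentiment
    var mentionCount: Int
    var coOccurrences: [String: Int]  // entity name -> shared-entry count
}
```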

TwinPredictions

Forecasts based on temporal pattern analysis.

  • Day-of-week mood forecasting
  • Time-of-day emotional patterns
  • Trend direction (improving/declining)
  • Seasonal pattern detection
  • Trigger anticipation from schedule
  • Confidence scores per prediction
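Day-of-week forecasting with a confidence score can be as simple as a per-weekday mean whose confidence grows with sample count. A hypothetical sketch in that spirit:

```swift
// Sketch: mean valence per weekday (1 = Sunday ... 7 = Saturday) with a
// sample-size-based confidence, capped at 1.0.
func forecast(byWeekday samples: [(weekday: Int, valence: Double)])
    -> [Int: (mean: Double, confidence: Double)] {
    var result: [Int: (mean: Double, confidence: Double)] = [:]
    for day in 1...7 {
        let vs = samples.filter { $0.weekday == day }.map { $0.valence }
        guard !vs.isEmpty else { continue }
        let mean = vs.reduce(0, +) / Double(vs.count)
        let confidence = min(1.0, Double(vs.count) / 10.0)
        result[day] = (mean, confidence)
    }
    return result
}
```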

Storage model: All four models are Swift Codable structs serialized to JSON and stored in a single Core Data entity called AIState. This means the entire personality model can be loaded in a single fetch, updated incrementally, and synced across devices as a single atomic object. No external database. No vector store (until v1.3). Just Core Data.
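The round trip is plain Codable; `TwinSnapshot` below is an illustrative stand-in for the four real models:

```swift
import Foundation

// Sketch: the single-blob persistence pattern — one JSON Data value maps
// to one attribute on the AIState entity.
struct TwinSnapshot: Codable {
    var expressiveness: Double
    var valenceBaseline: Double
}

func saveToAIState(_ snapshot: TwinSnapshot) throws -> Data {
    try JSONEncoder().encode(snapshot)
}

func loadFromAIState(_ blob: Data) throws -> TwinSnapshot {
    try JSONDecoder().decode(TwinSnapshot.self, from: blob)
}
```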

Privacy Architecture

Privacy is not a feature of DailyVox. It is the architectural constraint that every technical decision is built around. The system is designed so that private data physically cannot leave the device for processing.

Zero Network Processing

Every AI operation runs on the device's Neural Engine. Speech transcription uses requiresOnDeviceRecognition = true. NLTagger runs locally. The Digital Twin model is computed and stored in Core Data. There are no API calls, no cloud functions, no telemetry on journal content.

No Third-Party SDKs

DailyVox contains zero third-party dependencies for core functionality. No analytics SDKs. No crash reporting that sends journal content. No ad networks. The app links only against Apple's own frameworks; the sole third-party code anywhere in the project is Google Analytics on the marketing website, never in the app itself.

Apple's Privacy Nutrition Label

DailyVox carries Apple's "Data Not Collected" privacy label on the App Store. This is the strictest category — it means the app does not collect any data linked or unlinked to the user's identity.

Encryption and Authentication

CryptoKit AES-256-GCM encrypts all backup exports. Secure Enclave stores biometric authentication keys. LocalAuthentication gates app access behind Face ID or Touch ID. iCloud sync, when enabled, uses Apple's encrypted CloudKit infrastructure with end-to-end encryption.

Cloud AI Journal vs DailyVox

                            Typical Cloud AI Journal            DailyVox
Audio processing            Sent to cloud servers               On-device Neural Engine
AI model location           Remote API (OpenAI, etc.)           Apple on-device models
Text analysis               Cloud NLP service                   NLTagger (local)
Data storage                Company servers                     Core Data (SQLite on device)
Account required            Yes (email, password)               No
Third-party SDKs            Analytics, crash, ads               None
Privacy label               "Data Linked to You"                "Data Not Collected"
Works offline               No                                  Yes, fully
Subscription                $5-15/month                         Free
Who can read your journal   Company, employees, subprocessors   Only you

Technical Roadmap

Where DailyVox has been, what's being built now, and where it's going. Each version adds a layer to the on-device AI stack.

Shipped — v1.0

Voice Journaling + On-Device AI

Core voice journaling with fully on-device transcription, NLP analysis, encrypted storage, biometric lock, widgets, and Siri Shortcuts.

SFSpeechRecognizer · NLTagger · Core Data · CryptoKit · WidgetKit · AppIntents

Shipped — v1.1

Digital Twin + Personality Model

Custom DigitalTwinEngine building a multi-dimensional personality model. Communication style tracking, emotional baseline with time patterns, entity knowledge graph with emotional weights, and temporal mood forecasting.

DigitalTwinEngine · CommunicationStyle · EmotionalSignature · PersonalKnowledgeGraph · TwinPredictions

Building Now — v1.2

Ask Your Twin + Social Sharing

TwinChatView with pattern-matched query system. ShareablePersonalityCardView renders cards at 3x for Instagram Stories and Twitter/X. Review prompts via SKStoreReviewController at milestone entries.

TwinChatView · ShareablePersonalityCardView · SKStoreReviewController · UIActivityViewController · ImageRenderer

Next — v1.3

Semantic Search + Proactive Insights

NLEmbedding for 512-dim sentence embeddings. Custom cosine similarity vector search index. Statistical anomaly detection (z-score deviations from emotional baseline). K-means clustering on embedding space. Foundation for v2.0 RAG architecture.
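The z-score check described above flags entries whose valence deviates sharply from the user's history. A self-contained sketch:

```swift
// Sketch: statistical anomaly detection against the emotional baseline.
// An entry is anomalous when it sits more than `threshold` sample standard
// deviations away from the historical mean.
func isAnomalous(_ value: Double, history: [Double], threshold: Double = 2.0) -> Bool {
    guard history.count > 1 else { return false }
    let mean = history.reduce(0, +) / Double(history.count)
    let variance = history.map { ($0 - mean) * ($0 - mean) }.reduce(0, +)
        / Double(history.count - 1)
    let std = variance.squareRoot()
    guard std > 0 else { return false }   // flat history: nothing to deviate from
    return abs(value - mean) / std > threshold
}
```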

NLEmbedding · Cosine Similarity · Anomaly Detection · K-Means Clustering

v1.4 — Localization + Apple Watch

Multi-Language + Wrist Capture

String Catalogs for UI localization (Hindi, Spanish, Japanese, German). WatchKit companion app with WatchConnectivity for iPhone sync. Quick voice entry from wrist with complications.

String Catalogs · WatchKit · WatchConnectivity · Complications

v2.0 — Conversational Twin (iOS 26)

Apple Foundation Models + Tool Calling + SpeechAnalyzer

Apple's on-device 3B-parameter Foundation Model. Twin becomes a real chatbot with multi-turn LanguageModelSession. Tool calling protocol lets the Twin autonomously query Core Data — fetch entries by topic/date/mood, retrieve personality data, surface mood patterns. @Generable for type-safe structured outputs. streamResponse() for real-time chat. DigitalTwinEngine feeds user's communication style into session instructions for tone matching (~75% accuracy). SpeechAnalyzer replaces SFSpeechRecognizer. Requires iPhone 15 Pro+. Zero network calls, zero API costs.

Foundation Models · LanguageModelSession · Tool Calling · @Generable · streamResponse() · SpeechAnalyzer

v2.5 — Train Your Twin

LoRA Fine-Tuning — Twin Learns to Sound Like You

Apple's Foundation Models Adapter Training toolkit for Low-Rank Adaptation. Export 100-1,000 entries as JSONL training data. Train a personal adapter on Mac (32GB+ Apple Silicon) — original model weights stay frozen, only small adapter matrices trained. ~160MB adapter delivered via Background Assets. Loaded via SystemLanguageModel(adapter:). The Twin doesn't just know your data — it sounds like you. Learns sentence structure, emotional vocabulary, punctuation habits, hesitation patterns. ~95% tone accuracy vs ~60% with instructions alone. Training data never leaves user's Mac.

LoRA Adapters · Adapter Training · JSONL Export · Background Assets · ~160MB Adapter

v3.0 — The Vision

A True Digital Version of You

The end state: an on-device AI that acts like you, speaks like you, responds like you — built from years of journal entries. Full RAG architecture: NLEmbedding vector retrieval + Foundation Model generation. Personal LoRA adapter for your voice. Tool calling for autonomous data access. Session transcript management with context condensation. The Twin understands what you said, when, how you felt, who you were talking about, and what patterns repeat. Encrypted. On your device. Exportable only by you. Digital self-preservation.

Full RAG · Personal LoRA · Vector Retrieval · Context Condensation · Secure Enclave · Digital Self-Preservation

Research Context

DailyVox exists at the intersection of on-device LLMs, personal AI, and mental health technology. Several recent research papers explore adjacent ideas.

"Memory-Efficient Structured Backpropagation for On-Device LLM Fine-Tuning"
2026 — Efficient fine-tuning under mobile memory constraints
"MoPHES: On-device LLMs for Mobile Psychological Health"
2025 — Using on-device LLMs for psychological health applications
"PocketLLM: On-Device Fine-Tuning for Personalized LLMs"
2024 — Personal model adaptation on mobile hardware
"PLMM: Personal Large Language Models on Mobile Devices"
2023 — Architecture for personal LLMs running on phones

What makes DailyVox different: No existing paper covers DailyVox's specific approach — building a private Digital Twin from voice journal data using on-device NLP (NLTagger, NLEmbedding) combined with Apple's Foundation Models framework. The combination of voice-first input, personality modeling from NER/sentiment analysis, and on-device LLM generation with tool calling for autonomous data retrieval is a novel architecture. DailyVox is, to our knowledge, the first app to attempt this full pipeline privately on-device.

Open Source

DailyVox is open source. The full codebase — including the DigitalTwinEngine, all NLP processing, the Core Data stack, and the SwiftUI interface — is available on GitHub.

We believe that privacy-critical software should be auditable. If you claim data never leaves the device, people should be able to verify that claim by reading the code.

Build with us

DailyVox is open source and contributions are welcome. Whether it's improving the Digital Twin engine, adding language support, or building the Foundation Models integration — there's room to shape the future of private AI journaling.

View on GitHub

Try DailyVox

Free. Private. No account needed. All AI runs on your device.

Download Free