Features

How TokKong Runs AI Entirely on Your iPhone

Running AI on a phone seemed impossible a few years ago. Here's how TokKong leverages Apple's hardware to deliver powerful AI features without any internet connection.

TokKong Team
6 min read
on-device AI · Apple Neural Engine · Whisper · local LLM · technical

The On-Device AI Revolution

Until recently, AI meant cloud computing. Voice assistants, transcription services, and text generation all required sending data to powerful remote servers.

That's changing. Modern smartphones have specialized hardware that can run AI models locally. TokKong takes advantage of this to deliver transcription and text processing without ever connecting to the cloud.

Here's how it works.

Apple's AI Hardware

The Neural Engine

Since the A11 Bionic chip in 2017, every iPhone has included a Neural Engine: dedicated hardware designed specifically for machine learning operations. The latest chips have dramatically increased its capabilities:

  • A17 Pro (iPhone 15 Pro): 35 trillion operations per second
  • M-series chips (iPad Pro, Mac): Even faster performance

The Neural Engine excels at the mathematical operations neural networks require: matrix multiplications and tensor operations at scale.

Core ML Framework

Apple's Core ML framework provides a bridge between AI models and the underlying hardware. When TokKong uses Core ML:

  1. Models are converted to Apple's optimized format
  2. The framework automatically uses the best available hardware
  3. Operations route to Neural Engine, GPU, or CPU as appropriate
  4. Memory is managed efficiently for mobile constraints

This lets TokKong run sophisticated AI models with better battery life and performance than generic implementations.
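The routing in step 3 can be pictured as a priority fallback: try the fastest unit first, and fall back when an operation isn't supported there. Here is a minimal sketch in Python; the names and the support table are illustrative assumptions, not Core ML's internal dispatcher, which Apple does not expose.

```python
# Illustrative sketch of compute-unit fallback. The support table is
# hypothetical; Core ML's real dispatcher is internal to the framework.
def pick_compute_unit(op, available=("neural_engine", "gpu", "cpu")):
    """Return the first (fastest) available unit that supports the operation."""
    supported = {
        "neural_engine": {"matmul", "conv2d", "softmax"},  # common NN ops
        "gpu": {"matmul", "conv2d", "softmax", "custom_layer"},
        "cpu": None,  # None means "supports every op"
    }
    for unit in available:
        ops = supported[unit]
        if ops is None or op in ops:
            return unit
    raise ValueError(f"no compute unit supports {op}")

print(pick_compute_unit("matmul"))        # handled by the fastest unit
print(pick_compute_unit("custom_layer"))  # falls back past the Neural Engine
```

The key design point is that the fallback is automatic: app code asks for an operation, not a device, which is what lets the same model run efficiently across chip generations.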

Two AI Systems in TokKong

TokKong uses two distinct AI systems:

1. Whisper for Transcription

OpenAI's Whisper is a speech recognition model trained on 680,000 hours of multilingual audio. It handles:

  • Speech-to-text conversion
  • Automatic language detection
  • Multilingual transcription
  • Handling various accents and audio quality

TokKong runs Whisper models optimized for Apple hardware. The model processes audio in chunks, converting speech to text in near real-time on recent devices.

2. LLMs for Text Processing

Large Language Models (LLMs) power TokKong's text processing features:

  • Summarization
  • Translation
  • Reformatting
  • Question answering about transcripts

TokKong supports several models:

Phi-3: Microsoft's efficient small model, good for faster processing

Gemma: Google's lightweight model, balanced performance

Llama: Meta's model, available in various sizes

These models are downloaded once and run entirely locally.

Technical Deep Dive

Audio Processing Pipeline

When you record in TokKong:

  1. Audio Capture: the iOS audio framework captures audio at 16 kHz, the sample rate Whisper expects
  2. Preprocessing: Audio is normalized and chunked into segments
  3. Feature Extraction: Mel spectrogram generation for Whisper input
  4. Inference: Whisper model processes spectrograms to tokens
  5. Decoding: Tokens convert to text with timestamp alignment
  6. Post-processing: Text is cleaned and formatted

This happens in a background thread while you continue recording.
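The chunking step (step 2) can be sketched in a few lines of Python. This is a simplified illustration, assuming 16 kHz mono samples and Whisper's fixed 30-second windows; TokKong's actual pipeline runs in native code.

```python
# Sketch of splitting captured audio into fixed-length windows for Whisper.
SAMPLE_RATE = 16_000   # samples per second
CHUNK_SECONDS = 30     # Whisper processes fixed 30-second windows

def chunk_audio(samples, chunk_seconds=CHUNK_SECONDS, rate=SAMPLE_RATE):
    """Split a list of samples into fixed-length segments; zero-pad the last."""
    size = chunk_seconds * rate
    chunks = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        chunk = chunk + [0.0] * (size - len(chunk))  # pad to a full window
        chunks.append(chunk)
    return chunks

# 70 seconds of silence -> three 30-second windows (the last one zero-padded)
chunks = chunk_audio([0.0] * (70 * SAMPLE_RATE))
print(len(chunks), len(chunks[0]))
```

Each window then goes through feature extraction (step 3) independently, which is what makes streaming, near real-time transcription possible.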

Model Quantization

Running large AI models on mobile devices requires optimization. TokKong uses quantized models:

  • Full precision: 32 bits per parameter (too large for phones)
  • FP16: 16 bits per parameter (smaller, good quality)
  • INT8: 8 bits per parameter (much smaller, slightly less accurate)
  • INT4: 4 bits per parameter (smallest, optimized for mobile)

Quantization reduces memory requirements by 4-8x with minimal quality loss.
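The memory savings follow directly from the bit widths above. A back-of-envelope calculation (using Phi-3 Mini's roughly 3.8 billion parameters as the example):

```python
# Model size in GB = parameters x bits per parameter / 8 bits per byte.
def model_size_gb(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / 1e9

params = 3.8e9  # roughly Phi-3 Mini's parameter count
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {model_size_gb(params, bits):.1f} GB")
```

At full precision the model is far too large for a phone (~15 GB); at INT4 it fits in about 2 GB, which matches the download sizes listed later in this article.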

Memory Management

iPhones have limited RAM compared to servers:

  • iPhone 15 Pro: 8GB
  • Standard iPhones: 6GB
  • Compare to servers: 64GB+

TokKong manages memory by:

  • Loading model weights on-demand
  • Streaming inference (processing chunks, not entire files at once)
  • Releasing memory when models aren't active
  • Using Apple's memory-mapped file features

This lets large models run on devices with limited memory.
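Memory-mapped files are the crucial trick in that list: the operating system pages weight data in on demand instead of loading the whole file into RAM. The idea is the same on any platform; here is a small Python demonstration using the standard library (the "weights" file is fake, purely for illustration):

```python
# Demonstration of on-demand access to a large file via memory mapping.
import mmap
import os
import struct
import tempfile

# Write a small fake "weights" file of 1,000 float32 values (0.0 .. 999.0).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<1000f", *range(1000)))

with open(path, "rb") as f:
    # The OS pages data in as it is touched; nothing is read up front.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reading weight 500 touches one page, not the whole file.
    (w,) = struct.unpack_from("<f", mm, 500 * 4)
    print(w)
    mm.close()
```

For a multi-gigabyte model file, this means only the layers currently being evaluated need to occupy physical memory, which is how a 4 GB model can run on a device with 6 GB of RAM.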

Performance Expectations

Transcription Speed

On recent iPhones (A14 chip and newer):

  • 1 minute of audio transcribes in ~10-30 seconds
  • Longer files transcribe proportionally faster, since model loading is a one-time cost
  • Speed varies with model size selection

Older devices work but take longer.
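A convenient way to express these numbers is the real-time factor: processing time divided by audio length, where lower is better and anything under 1.0 is faster than real time. A quick illustration:

```python
# Real-time factor: processing seconds per second of audio (lower is better).
def realtime_factor(processing_seconds, audio_seconds):
    return processing_seconds / audio_seconds

# 1 minute of audio transcribed in ~20 seconds:
print(realtime_factor(20, 60))  # ~0.33, i.e. roughly 3x faster than real time
```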

LLM Processing Speed

For text generation (summarization, reformatting):

  • Phi-3 (smallest): ~10-20 tokens/second
  • Larger models: 5-15 tokens/second
  • Depends on output length requested

A paragraph summary generates in a few seconds; longer outputs take proportionally more time.
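The arithmetic behind that estimate is simple: generation time is output length divided by throughput. For example, at Phi-3's ~15 tokens/second:

```python
# Estimated generation time from output length and token throughput.
def generation_seconds(n_tokens, tokens_per_second):
    return n_tokens / tokens_per_second

# A ~100-token paragraph summary at ~15 tokens/second:
print(round(generation_seconds(100, 15), 1))  # under ten seconds
```

This is also why requesting a shorter output (a bullet summary instead of a full rewrite) is the easiest way to speed up LLM features.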

Battery Impact

AI processing is computationally intensive. During active transcription or text generation:

  • Battery drain increases noticeably
  • Approximately 5-10% per hour of continuous use
  • Device may warm slightly

Idle with models loaded has minimal impact.

Storage Requirements

Model Sizes

Whisper models:

  • Tiny: ~75MB
  • Base: ~150MB
  • Small: ~500MB

LLM models:

  • Phi-3 Mini (quantized): ~2GB
  • Gemma 2B (quantized): ~1.5GB
  • Llama 7B (quantized): ~4GB

Managing Storage

TokKong downloads models on-demand. You can:

  • Choose which models to keep
  • Delete models to free space
  • Re-download when needed

Total storage for full functionality: 3-6GB depending on model choices.
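The totals above are just sums over your model selection. For example, one common combination (sizes taken from the lists above, in GB):

```python
# Adding up one typical model selection (approximate sizes in GB).
whisper = {"tiny": 0.075, "base": 0.15, "small": 0.5}
llm = {"phi3_mini": 2.0, "gemma_2b": 1.5, "llama_7b": 4.0}

# Whisper Small plus Phi-3 Mini: a compact but fully functional setup.
total = whisper["small"] + llm["phi3_mini"]
print(f"{total:.1f} GB")
```

Swapping in Llama 7B instead pushes the total toward the upper end of the 3-6 GB range.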

Privacy Architecture

No Network Calls

TokKong's AI features make zero network requests:

  • Model files are self-contained
  • No telemetry or analytics
  • No "call home" for any feature
  • Works in airplane mode

Data Flow

When you transcribe audio:

  1. Audio stays in app memory
  2. Processed by local Whisper model
  3. Text output stored locally
  4. Nothing transmitted anywhere

When you process text:

  1. Transcript feeds to local LLM
  2. Processing happens in device RAM
  3. Output displayed and stored locally
  4. Your prompts and results remain private

Comparison with Cloud AI

Cloud Advantages

  • More powerful models available
  • Faster processing on slow devices
  • No storage requirements
  • Continuous model improvements

Local Advantages

  • Complete privacy
  • Works offline
  • No ongoing costs
  • No service dependencies
  • No processing limits

TokKong prioritizes privacy and independence. For users who value those qualities, local processing is well worth the trade-offs.

The Future of On-Device AI

Apple continues to improve AI hardware:

  • Neural Engine gets faster each chip generation
  • Memory increases enable larger models
  • Metal optimizations improve GPU utilization

As hardware improves, on-device AI will handle increasingly complex tasks. What requires a data center today may run on your phone tomorrow.

TokKong's architecture is designed to take advantage of these improvements. As better models become practical for mobile, the app can adopt them while maintaining its privacy-first approach.

Summary

TokKong's on-device AI relies on:

  1. Apple's Neural Engine: Specialized hardware for ML operations
  2. Core ML: Optimized framework for Apple hardware
  3. Whisper: State-of-the-art speech recognition model
  4. Quantized LLMs: Compressed language models for mobile
  5. Careful engineering: Memory management and optimization

The result is professional-quality transcription and AI text processing that never needs the cloud. Your voice and your words stay on your device, processed by AI running in your pocket.

Technology that seemed impossible a few years ago is now practical, private, and in your hands.


Try TokKong Free

Experience offline transcription and AI-powered text processing on your iPhone, iPad, or Mac.

Download for iOS
