
The Ollama for Mobile

On-Device AI, Shared Across Your Mobile Apps.

Altio runs an on-device AI inference service on Android. It exposes private REST and Server-Sent Events (SSE) endpoints over localhost so multiple apps can share text generation and audio transcription without sending inference data off-device.

Preview

See Altio running on-device.

A short Android demo showing the local service flow, client interaction, and inference running without a cloud API.

The problem

Embedding local inference independently in every app does not scale.

Each app that bundles its own model runtime also inherits distribution, storage, memory, and scheduling concerns.

Model distribution

Every app has to solve model download, versioning, and update strategy for artifacts that can reach multiple gigabytes.

Duplicate artifacts

The same LLM or transcription model can be stored repeatedly across apps on the same device.

RAM pressure

Multiple embedded runtimes compete for memory and accelerator access, increasing the chance of process death or degraded UX.

Runtime ownership

Each app inherits scheduling, session isolation, crash recovery, native runtime updates, and device-specific edge cases.

The solution

Altio moves local inference behind a shared Android service boundary.

One service owns model downloads, loading, scheduling, and inference. Client apps integrate through localhost HTTP and SSE.

Single model cache

Download and update models once, then expose them to authorized local clients through the shared service.

One loaded runtime

Keep model residency and accelerator use centralized instead of letting apps load competing copies.

Shared control plane

Move lifecycle, scheduling, and recovery logic into one process with a stable local API boundary.

Consistent behavior

Expose the same capabilities, streaming semantics, and operational constraints to every client app.

Key features

A local inference layer with explicit API and runtime boundaries.

Local transport

Altio binds to 127.0.0.1 and serves private REST and SSE endpoints without routing prompts, audio, or outputs through a cloud API.

Text and audio jobs

The current runtime supports local text generation and MP3 transcription paths, with LiteRT powering the first backend.

Bearer-token access

Clients authenticate with a token controlled by the device owner, so a local process is not trusted merely because it runs on the same device.
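
A minimal sketch of how a client might assemble an authenticated loopback request. The `LocalRequest` type, the `buildLocalRequest` function, the `/v1/sessions` path, and the JSON `Accept` value are hypothetical names chosen for illustration; only the 127.0.0.1 binding, the bearer-token scheme, and the SSE transport come from the description above.

```kotlin
// Hypothetical value type: the URL and headers a client would send.
data class LocalRequest(val url: String, val headers: Map<String, String>)

// Builds a loopback-only request description with a bearer token.
// `path` is an assumed endpoint path; Altio's real routes may differ.
fun buildLocalRequest(
    port: Int,
    path: String,
    token: String,
    stream: Boolean = false
): LocalRequest {
    require(port in 1..65535) { "port out of range: $port" }
    require(path.startsWith("/")) { "path must start with '/'" }
    require(token.isNotBlank()) { "bearer token must not be blank" }

    val headers = mapOf(
        // Standard HTTP bearer-token header.
        "Authorization" to "Bearer $token",
        // text/event-stream is the SSE media type; JSON is an assumption.
        "Accept" to if (stream) "text/event-stream" else "application/json"
    )
    // Always target the loopback address, never an external host.
    return LocalRequest("http://127.0.0.1:$port$path", headers)
}
```

Keeping the host fixed inside the helper means a caller cannot accidentally send the token to a non-loopback address.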

Android service integration

A background service owns model management, active port state, request handling, and operational logs.

Client integration

Integrate with Android IPC plus HTTP/SSE.

Native SDKs are planned, but the protocol is intentionally simple: discover the loopback port, create a session, submit jobs, and stream responses.

Discover the loopback port

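import android.content.Context
import android.net.Uri

// Returns the service's active loopback port, or null when the
// provider is unavailable or returns no row.
fun resolvePort(context: Context): Int? {
  val uri = Uri.parse(
    "content://app.altio.service.port/port"
  )

  return context.contentResolver
    .query(uri, null, null, null, null)
    ?.use { cursor -> // use {} closes the cursor when done
      if (cursor.moveToFirst()) {
        cursor.getInt(
          cursor.getColumnIndexOrThrow("port")
        )
      } else {
        null
      }
    }
}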

Stream text generation

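val client = AiServiceClient(
  port = discoveredPort,
  bearerToken = token
)
// Create a session bound to a model, then submit a generation job.
val sessionId = client.createSession("gemma-2b-it")
val jobId = client.generate(
  sessionId,
  "Explain gravity simply."
)

// Print tokens as they are streamed back from the service.
client.streamTokens(jobId).collect { token ->
  print(token)
}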
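
For clients that talk to the SSE endpoint directly instead of through a client wrapper, the stream has to be parsed into events. The following is a minimal sketch of the standard SSE line format (`event:` and `data:` fields, blank-line dispatch, `:` comment lines). The `SseEvent` type and `parseSse` function are hypothetical names, and the event names Altio actually emits are not specified here.

```kotlin
// One dispatched server-sent event. "message" is the SSE default type.
data class SseEvent(val event: String, val data: String)

// Lazily parses already-split lines of an SSE stream into events.
fun parseSse(lines: Sequence<String>): Sequence<SseEvent> = sequence {
    var eventType = "message"
    val data = StringBuilder()
    var hasData = false

    for (line in lines) {
        when {
            // A blank line dispatches the buffered event, if any.
            line.isEmpty() -> {
                if (hasData) yield(SseEvent(eventType, data.toString()))
                eventType = "message"
                data.setLength(0)
                hasData = false
            }
            // Lines starting with ':' are comments (often keep-alives).
            line.startsWith(":") -> Unit
            else -> {
                val colon = line.indexOf(':')
                val field = if (colon == -1) line else line.substring(0, colon)
                var value = if (colon == -1) "" else line.substring(colon + 1)
                // A single leading space after the colon is not part of the value.
                if (value.startsWith(" ")) value = value.substring(1)
                when (field) {
                    "event" -> eventType = value
                    "data" -> {
                        // Multiple data lines are joined with newlines.
                        if (hasData) data.append('\n')
                        data.append(value)
                        hasData = true
                    }
                    // Other fields (id, retry) are ignored in this sketch.
                }
            }
        }
    }
}
```

On Android, the input could come from `BufferedReader.lineSequence()` over the HTTP response body, so tokens are yielded as soon as each event's terminating blank line arrives.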

Transcribe audio locally

val transcript = client.transcribe(
  sessionId = sessionId,
  audio = audioMp3Bytes
)

println(transcript)
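
Because the transcription path currently takes MP3 input, a client may want to sanity-check the bytes before submitting them. This is a rough sketch, not part of Altio's API: `looksLikeMp3` is a hypothetical helper that only inspects the first bytes for an ID3v2 tag or an MPEG audio frame sync, so it can reject obvious mismatches but does not validate the whole file.

```kotlin
// Heuristic check: does the buffer start like an MP3 file?
fun looksLikeMp3(bytes: ByteArray): Boolean {
    if (bytes.size < 3) return false

    // ID3v2 metadata tags begin with the ASCII bytes "ID3".
    val hasId3Tag = bytes[0] == 'I'.code.toByte() &&
        bytes[1] == 'D'.code.toByte() &&
        bytes[2] == '3'.code.toByte()

    // An MPEG audio frame header begins with 11 set bits:
    // 0xFF followed by a byte whose top three bits are set.
    val b0 = bytes[0].toInt() and 0xFF
    val b1 = bytes[1].toInt() and 0xFF
    val hasFrameSync = b0 == 0xFF && (b1 and 0xE0) == 0xE0

    return hasId3Tag || hasFrameSync
}
```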

Reference demo app

The :demo module exercises streaming chat, transcription flows, job monitoring, health checks, and HTTP logging against the local service.

Project status

Current implementation status.

What is the current state of the project?

Altio is an active Android prototype with a LiteRT-LM backend, localhost HTTP/SSE surface, port discovery, demo client, and ongoing work on scheduling and runtime backends.

What are the current runtime constraints?

The LiteRT path currently assumes one active inference session loaded in memory at a time while mobile GPU/runtime support matures.
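
To illustrate what a one-session-at-a-time constraint implies for clients, here is a minimal sketch of a single-slot holder that evicts the resident model before loading a different one. `SingleSlotRuntime` and its `load`/`unload` callbacks are hypothetical and are not Altio's implementation; they only model the constraint described above.

```kotlin
// Holds at most one loaded model. Requesting a different model
// unloads the current one first, so only one is resident at a time.
class SingleSlotRuntime<M : Any>(
    private val load: (String) -> M,
    private val unload: (M) -> Unit
) {
    private var activeId: String? = null
    private var active: M? = null

    @Synchronized
    fun acquire(modelId: String): M {
        val current = active
        // Reuse the resident model when the same id is requested.
        if (current != null && activeId == modelId) return current

        // Evict before loading so two models never coexist in memory.
        if (current != null) unload(current)
        active = null
        activeId = null

        val loaded = load(modelId)
        active = loaded
        activeId = modelId
        return loaded
    }
}
```

Under this model, alternating requests between two models pays a full unload and reload on every switch, which is why clients sharing the service benefit from agreeing on a model where they can.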

What is the licensing model?

Altio is AGPL-3.0-or-later, with repository documentation covering copyleft obligations and commercial licensing options.

How should contributors engage?

Contributions go through the public repository process and require the project Contributor License Agreement.