Perimattic
LocalAI logo

Managed LocalAI Hosting

AI & ML

OpenAI-compatible local AI inference API

LocalAI is a drop-in OpenAI API replacement for running LLMs, image generation, and audio models locally. ManageStacks deploys LocalAI with GPU acceleration and optimized model storage.

About LocalAI

LocalAI is a free, open-source alternative to OpenAI that acts as a drop-in replacement REST API compatible with the OpenAI API specification. It runs LLMs, generates images, creates audio transcriptions, and produces embeddings entirely on local hardware without requiring a GPU, though GPU acceleration is fully supported.

LocalAI supports a broad range of model families including LLaMA, Mistral, Stable Diffusion, and Whisper. It provides a single API endpoint that mimics the OpenAI interface, making it straightforward to migrate existing applications from cloud AI services to self-hosted inference.

Key Features

  • OpenAI-compatible REST API for drop-in replacement
  • Support for LLMs, image generation, and audio models
  • CPU and GPU inference with automatic optimization
  • Model gallery for one-click model downloads
  • Text embeddings and vector generation
  • Function calling and grammar-constrained output

How ManageStacks Helps

ManageStacks deploys LocalAI with pre-configured GPU drivers, persistent model storage, and monitoring dashboards. Run a private OpenAI-compatible API without managing CUDA dependencies or container orchestration.

Frequently Asked Questions

Can I use LocalAI as a drop-in replacement for OpenAI on ManageStacks?+
Yes. LocalAI exposes an OpenAI-compatible API, so you can point any application that uses the OpenAI SDK to your ManageStacks-hosted LocalAI endpoint by changing the base URL.
Does ManageStacks provide GPU support for LocalAI?+
ManageStacks provisions LocalAI on GPU-enabled infrastructure with CUDA drivers pre-installed. CPU-only deployments are also available for embedding and smaller model workloads.
How do I add new models to LocalAI on ManageStacks?+
You can download models from the built-in model gallery via the API, or upload custom GGUF and safetensors files to the persistent model storage that ManageStacks provisions.