Deploying AI Models on Formation
Welcome to Formation's AI Model deployment guide! This section provides everything you need to deploy, host, and monetize custom AI models on the Formation decentralized network.
What are Formation AI Models?
Formation AI models are containerized AI inference services that:
- Serve Inference Requests: Provide AI model inference via OpenAI-compatible APIs
- Scale Automatically: Distribute across Formation's network for high availability and performance
- Earn Revenue: Generate income through usage-based billing for inference requests
- Integrate Seamlessly: Work with existing applications through standard API interfaces
Model Deployment Overview
Formation supports any AI model that can be containerized and exposed as an HTTP service:
```
┌─────────────────────────────────────────┐
│             Formation Model             │
├─────────────────────────────────────────┤
│       OpenAI-Compatible API Layer       │
│  ├── POST /v1/chat/completions          │
│  ├── POST /v1/completions               │
│  ├── GET  /v1/models                    │
│  └── GET  /health                       │
├─────────────────────────────────────────┤
│          Model Inference Layer          │
│  ├── Model Loading & Initialization     │
│  ├── Request Processing                 │
│  └── Response Generation                │
├─────────────────────────────────────────┤
│       Formation Integration Layer       │
│  ├── Usage Metrics Tracking             │
│  ├── Resource Management                │
│  └── Error Handling & Logging           │
└─────────────────────────────────────────┘
```
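Because every Formation model exposes this same OpenAI-compatible surface, existing OpenAI client code can talk to a deployed model with little more than a base-URL change. A minimal sketch, assuming a model already deployed at a placeholder URL:

```python
# Minimal client call against a deployed Formation model.
# BASE_URL and the model ID are placeholders for your own deployment.
import requests

BASE_URL = "http://your-model-endpoint:8080"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "your-model-id",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```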
Quick Start
1. Choose Your Model Type
Formation supports various model types and frameworks:
- 🤖 Language Models: GPT-style models, instruction-tuned models, chat models
- 🎨 Image Generation: Stable Diffusion, DALL-E style models, image-to-image
- 🔊 Audio Models: Speech-to-text, text-to-speech, audio generation
- 👁️ Vision Models: Image classification, object detection, OCR
- 🧬 Specialized Models: Code generation, scientific models, domain-specific AI
2. Deployment Path
Prepare your model → create an OpenAI-compatible API wrapper → containerize → register with Formation → deploy and monitor. Each step is detailed in the Deployment Guide below.
3. Time Investment
- Simple Model Wrapper: 1-2 hours
- Custom Model Integration: 2-4 hours
- Complex Multi-Modal Model: 4-8 hours
Documentation Structure
📖 Model Requirements
Essential Reading - Technical specifications and API requirements
- OpenAI-compatible API requirements
- Required endpoints (`/v1/chat/completions`, `/v1/models`)
- Authentication handling
- Usage metrics reporting
🚀 Deployment Guide
Step-by-Step Process - From model to production service
- Containerization best practices
- Resource requirements specification
- Registration with `form-state` (`/models/create`)
- Testing inference endpoints
💡 Examples
Working Code - Complete, runnable model deployments
- Language model deployment examples
- Image generation model examples
- Custom model integration patterns
Core Model Concepts
OpenAI API Compatibility
Formation models must implement OpenAI-compatible endpoints to ensure seamless integration with existing applications:
Required Endpoints
```
POST /v1/chat/completions   # Chat-based completions
POST /v1/completions        # Text completions
GET  /v1/models             # List available models
GET  /health                # Health check
```
Optional Endpoints
```
POST /v1/embeddings             # Text embeddings
POST /v1/images/generations     # Image generation
POST /v1/audio/transcriptions   # Speech-to-text
```
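For reference, here is the minimal response shape a chat-completions implementation should return. This is a sketch of the common OpenAI envelope; production responses usually also carry `id`, `object`, `created`, and `model` fields:

```python
# Minimal OpenAI-style chat-completion response body (sketch).
# Real responses usually also include "id", "object", "created", and "model".
chat_completion_response = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello! How can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 7,
        "total_tokens": 12,
    },
}
```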
Model Lifecycle
1. Development Phase
- Prepare your trained model
- Create OpenAI-compatible API wrapper
- Implement required endpoints
- Add usage tracking and metrics
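Because billing is usage-based, the wrapper should record token counts for every request. The exact reporting interface is specified in Model Requirements; purely as an illustration, a hypothetical in-process tracker a Flask wrapper could call after each generation:

```python
# Hypothetical usage tracker (illustration only); the real reporting
# interface is defined in Formation's Model Requirements.
import threading
import time

class UsageTracker:
    def __init__(self):
        self._lock = threading.Lock()
        self.records = []

    def record(self, prompt_tokens: int, completion_tokens: int):
        # Thread-safe append so concurrent request handlers can share one tracker
        with self._lock:
            self.records.append({
                "timestamp": time.time(),
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            })

usage = UsageTracker()
# In a request handler, after generating a response:
#   usage.record(prompt_tokens=len(inputs[0]), completion_tokens=num_generated)
```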
2. Containerization Phase
- Package model and dependencies
- Optimize container size and startup time
- Configure resource requirements
- Set up health checks
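A health check is most useful when it distinguishes a container that is still loading weights from one that is ready to serve. A minimal sketch, assuming a Flask wrapper that loads the model in a background thread (names are illustrative):

```python
# Readiness-aware health check (sketch): report 503 while weights are
# still loading so traffic is not routed to the container prematurely.
import threading
from flask import Flask, jsonify

app = Flask(__name__)
model_ready = threading.Event()

def load_model():
    # ... load model weights here (placeholder) ...
    model_ready.set()

threading.Thread(target=load_model, daemon=True).start()

@app.route('/health', methods=['GET'])
def health():
    if not model_ready.is_set():
        return jsonify({"status": "loading"}), 503
    return jsonify({"status": "healthy"}), 200
```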
3. Deployment Phase
- Register model with Formation (a hypothetical registration call follows this list)
- Deploy across network nodes
- Configure pricing and billing
- Set up monitoring and alerts
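Registration goes through the form-state API's `/models/create` endpoint (see the Deployment Guide). Purely as an illustration, a hypothetical registration call; the field names, host, and authentication here are assumptions, and the actual schema is defined by form-state:

```python
# Hypothetical registration request (illustration only): the payload
# fields and host are assumptions, not the form-state schema.
import requests

FORM_STATE_URL = "http://your-form-state-host"  # placeholder

payload = {
    "model_id": "my-org/my-model",        # assumed field name
    "name": "My Custom Model",            # assumed field name
    "endpoint": "http://my-model:8080",   # assumed field name
}

resp = requests.post(f"{FORM_STATE_URL}/models/create", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```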
4. Operation Phase
- Handle inference requests
- Monitor performance metrics
- Track usage and revenue
- Scale based on demand
Popular Model Frameworks
Formation supports models built with any framework. Here are examples of popular frameworks and how to deploy them:
1. Hugging Face Transformers
```python
# Hugging Face model with OpenAI-compatible API
from transformers import AutoTokenizer, AutoModelForCausalLM
from flask import Flask, request, jsonify
import torch

app = Flask(__name__)

# Load model and tokenizer
model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    data = request.json
    messages = data.get('messages', [])

    # Convert messages to prompt
    prompt = "\n".join([f"{msg['role']}: {msg['content']}" for msg in messages])

    # Generate response
    inputs = tokenizer.encode(prompt, return_tensors='pt')
    with torch.no_grad():
        outputs = model.generate(inputs, max_length=150,
                                 pad_token_id=tokenizer.eos_token_id)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return jsonify({
        "choices": [{
            "message": {
                "role": "assistant",
                "content": response
            }
        }],
        "usage": {
            "prompt_tokens": len(inputs[0]),
            "completion_tokens": len(outputs[0]) - len(inputs[0]),
            "total_tokens": len(outputs[0])
        }
    })

@app.route('/v1/models', methods=['GET'])
def list_models():
    return jsonify({
        "data": [{
            "id": model_name,
            "object": "model",
            "owned_by": "formation"
        }]
    })

@app.route('/health', methods=['GET'])
def health():
    return jsonify({"status": "healthy", "framework": "huggingface"})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
2. vLLM (High-Performance Inference)
```python
# vLLM model deployment with OpenAI API
from vllm import LLM
from vllm.entrypoints.openai.api_server import app
import uvicorn

# Initialize vLLM model
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

# vLLM automatically provides OpenAI-compatible endpoints;
# we only add a Formation-specific health endpoint
@app.get("/health")
async def health():
    return {"status": "healthy", "framework": "vllm"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)
```
3. Ollama Integration
```python
# Ollama model with Formation integration
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

OLLAMA_BASE_URL = "http://localhost:11434"

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    data = request.json
    messages = data.get('messages', [])

    # Convert to Ollama format
    prompt = messages[-1]['content'] if messages else ""

    # Call Ollama API
    response = requests.post(f"{OLLAMA_BASE_URL}/api/generate", json={
        "model": "llama2",
        "prompt": prompt,
        "stream": False
    })
    result = response.json()

    return jsonify({
        "choices": [{
            "message": {
                "role": "assistant",
                "content": result.get("response", "")
            }
        }],
        "usage": {
            "total_tokens": len(result.get("response", "").split())
        }
    })

@app.route('/v1/models', methods=['GET'])
def list_models():
    # Get available Ollama models
    response = requests.get(f"{OLLAMA_BASE_URL}/api/tags")
    models = response.json().get("models", [])
    return jsonify({
        "data": [{"id": model["name"], "object": "model"} for model in models]
    })

@app.route('/health', methods=['GET'])
def health():
    return jsonify({"status": "healthy", "framework": "ollama"})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
4. Custom PyTorch/TensorFlow Models
```python
# Custom model with OpenAI-compatible wrapper
import torch
import torch.nn as nn
from flask import Flask, request, jsonify

app = Flask(__name__)

class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Your custom model architecture
        self.layers = nn.Sequential(
            nn.Linear(768, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 50257)  # Vocab size
        )

    def forward(self, x):
        return self.layers(x)

# Load your trained model
model = CustomModel()
model.load_state_dict(torch.load('model.pth'))
model.eval()

@app.route('/v1/completions', methods=['POST'])
def completions():
    data = request.json
    prompt = data.get('prompt', '')

    # Your custom inference logic (placeholder)
    with torch.no_grad():
        # Process prompt and generate response
        response = f"Generated response for: {prompt}"

    return jsonify({
        "choices": [{
            "text": response
        }],
        "usage": {
            "prompt_tokens": len(prompt.split()),
            "completion_tokens": len(response.split()),
            "total_tokens": len(prompt.split()) + len(response.split())
        }
    })

@app.route('/health', methods=['GET'])
def health():
    return jsonify({"status": "healthy", "framework": "pytorch"})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```
Development Best Practices
Performance Optimization
- Model Quantization: Use INT8/FP16 to reduce memory usage (FP16 loading example after this list)
- Batch Processing: Handle multiple requests efficiently
- Caching: Cache frequently used model outputs
- GPU Utilization: Optimize GPU memory and compute usage
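As a concrete example of quantization, loading a Hugging Face causal LM in FP16 roughly halves its weight memory compared to FP32 (`device_map="auto"` additionally requires the `accelerate` package):

```python
# Load a causal LM in half precision to roughly halve weight memory.
# device_map="auto" places layers on available devices; needs `accelerate`.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium",
    torch_dtype=torch.float16,
    device_map="auto",
)
```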
Resource Management
- Memory Efficiency: Monitor and optimize memory usage
- Startup Time: Minimize container startup time
- Health Checks: Implement robust health monitoring
- Graceful Shutdown: Handle shutdown signals properly
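For graceful shutdown, trap SIGTERM so in-flight requests can finish before the container exits. A minimal sketch:

```python
# Handle SIGTERM so the container can stop taking new work and drain
# in-flight requests instead of being killed mid-response.
import signal
import sys

shutting_down = False

def handle_sigterm(signum, frame):
    global shutting_down
    shutting_down = True  # request handlers can check this and refuse new work
    # ... wait for in-flight requests to drain (placeholder) ...
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
```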
API Compatibility
- OpenAI Standards: Follow OpenAI API specifications exactly
- Error Handling: Return proper HTTP status codes and error messages (see the sketch after this list)
- Rate Limiting: Implement request rate limiting
- Input Validation: Validate all incoming requests
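For error handling, returning failures in the OpenAI error envelope lets standard client SDKs parse them. A minimal Flask sketch:

```python
# Return errors in the OpenAI-style envelope with a proper status code.
from flask import Flask, jsonify

app = Flask(__name__)

@app.errorhandler(400)
def bad_request(err):
    return jsonify({
        "error": {
            "message": str(err),
            "type": "invalid_request_error",
            "code": "bad_request",
        }
    }), 400
```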
Monetization Strategies
Pricing Models
- Per-Token: Charge based on input/output tokens (worked example after this list)
- Per-Request: Fixed price per inference request
- Per-Minute: Time-based pricing for long-running inference
- Tiered Pricing: Different rates for different model sizes/capabilities
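To make per-token billing concrete, here is the arithmetic with made-up rates (actual pricing is whatever you configure at deployment):

```python
# Per-token billing arithmetic with hypothetical example rates.
PRICE_PER_1K_PROMPT = 0.0005      # hypothetical rate per 1K prompt tokens
PRICE_PER_1K_COMPLETION = 0.0015  # hypothetical rate per 1K completion tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens / 1000) * PRICE_PER_1K_PROMPT + \
           (completion_tokens / 1000) * PRICE_PER_1K_COMPLETION

print(request_cost(1200, 400))  # 0.0012
```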
Revenue Optimization
- Efficient Inference: Faster inference = more requests = more revenue
- Model Specialization: Specialized models command premium pricing
- Quality Service: High-quality outputs improve customer retention
- Resource Optimization: Lower hosting costs increase profit margins
Getting Started Checklist
Ready to deploy your first Formation model? Follow this checklist:
- Read Model Requirements - Understand API specifications
- Prepare Your Model - Ensure model is trained and ready
- Create API Wrapper - Implement OpenAI-compatible endpoints
- Add Usage Tracking - Implement metrics for billing
- Containerize Model - Create Docker container
- Test Locally - Verify all endpoints work correctly (smoke-test sketch after this checklist)
- Follow Deployment Guide - Deploy to Formation network
- Monitor Performance - Track metrics and optimize
- Scale and Iterate - Improve based on usage patterns
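For the local testing step, a quick smoke test against a container running on localhost (the port is whatever your wrapper listens on):

```python
# Smoke test: confirm the health and model-listing endpoints respond
# before registering the container with Formation.
import requests

BASE = "http://localhost:8080"

health = requests.get(f"{BASE}/health", timeout=10)
assert health.status_code == 200, health.text

models = requests.get(f"{BASE}/v1/models", timeout=10)
assert models.status_code == 200, models.text
print("OK:", [m["id"] for m in models.json()["data"]])
```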
Support and Resources
Documentation
- Model Requirements: Technical specifications and API requirements
- Deployment Guide: Step-by-step deployment process
- Examples: Working model deployment examples
Tools and APIs
- form-pack: Container building and packaging
- form-state API: Model registration and management
- form-vmm API: Instance deployment and monitoring
Community
- Model Marketplace: Discover and share model implementations
- Performance Benchmarks: Compare model performance and efficiency
- Best Practices: Learn from experienced model deployers
Next Steps
Choose your next action based on your experience level:
🆕 New to Model Deployment
Start with Model Requirements to understand the fundamentals
🔧 Ready to Deploy
Jump to Examples for working deployment templates
🚀 Ready for Production
Follow the Deployment Guide for step-by-step instructions
💰 Focus on Revenue
Check out Monetization Strategies for pricing optimization
Ready to make your AI models globally accessible? Let's deploy your first Formation model!