GPU Acceleration
Configure Veriafy for maximum performance with GPU support
Supported GPUs
NVIDIA (CUDA)
- RTX 3060 or higher (recommended)
- CUDA 11.8 or higher
- cuDNN 8.6 or higher
- 8 GB+ VRAM recommended
Apple Silicon
- M1, M2, M3, M4 chips
- Metal Performance Shaders
- macOS 12.0 or higher
- Automatic optimization
Installation
NVIDIA GPU
pip install veriafy[gpu]
Apple Silicon
pip install veriafy  # GPU enabled by default on Apple Silicon
Enable GPU
CLI
# Enable GPU globally
veriafy config set gpu true
# Or per-command
veriafy classify image.jpg --model veriafy/nsfw --gpu
Python SDK
from veriafy import Veriafy
# Enable GPU on initialization
client = Veriafy(gpu=True)
# Check GPU status
print(f"GPU available: {client.gpu_available}")
print(f"GPU name: {client.gpu_name}")
print(f"VRAM: {client.gpu_memory_mb} MB")
Performance Comparison
| Operation | CPU | GPU (RTX 4090) | Speedup |
|---|---|---|---|
| Single image | 15ms | 2ms | 7.5x |
| Batch (1000 images) | 12s | 0.8s | 15x |
| Video (1 min) | 8s | 0.5s | 16x |
| Model training (10k vectors) | 45min | 3min | 15x |
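The Speedup column is the CPU time divided by the GPU time, with both converted to the same unit. A minimal sketch that reproduces the column from the timings in the table above; the helper names `to_seconds` and `speedup` are illustrative and are not part of Veriafy:

```python
# Unit multipliers for the duration strings used in the table above.
# "ms" and "min" come before "s" so the longer suffix is matched first.
UNITS = (("ms", 1e-3), ("min", 60.0), ("s", 1.0))


def to_seconds(text: str) -> float:
    """Convert a duration such as '15ms', '0.8s' or '45min' to seconds."""
    for suffix, factor in UNITS:
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized duration: {text!r}")


def speedup(cpu: str, gpu: str) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    return to_seconds(cpu) / to_seconds(gpu)


# CPU and GPU timings copied from the table above.
ROWS = {
    "Single image": ("15ms", "2ms"),
    "Batch (1000 images)": ("12s", "0.8s"),
    "Video (1 min)": ("8s", "0.5s"),
    "Model training (10k vectors)": ("45min", "3min"),
}

if __name__ == "__main__":
    for name, (cpu, gpu) in ROWS.items():
        print(f"{name}: {speedup(cpu, gpu):.1f}x")
```

The same helper can be pointed at timings measured on your own hardware, which will differ from the RTX 4090 figures listed here.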
Multi-GPU Support
from veriafy import Veriafy
# Automatic multi-GPU distribution
client = Veriafy(gpu=True, gpu_ids=[0, 1, 2, 3])
# Process with automatic load balancing
results = client.classify_batch(
    files=large_file_list,
    model="veriafy/classifier",
    batch_size=256,  # Larger batches for multi-GPU
)
# Or manually control GPU assignment
with client.gpu_context(gpu_id=0):
    result1 = client.classify(file1, model="veriafy/model1")
with client.gpu_context(gpu_id=1):
    result2 = client.classify(file2, model="veriafy/model2")
Docker with GPU
# Run with NVIDIA GPU
docker run --gpus all -d -p 8080:8080 veriafy/veriafy:latest-gpu
# Docker Compose
services:
  veriafy:
    image: veriafy/veriafy:latest-gpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - VERIAFY_GPU=1
      - CUDA_VISIBLE_DEVICES=0,1
Troubleshooting
GPU not detected
Run veriafy doctor to diagnose. Ensure CUDA drivers are installed and nvidia-smi works.
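The nvidia-smi check can also be scripted, for example in a deployment health check. The sketch below is independent of Veriafy: it only looks for nvidia-smi on PATH and asks it for the name and total memory of each GPU. The function name `list_nvidia_gpus` is illustrative.

```python
import shutil
import subprocess
from typing import List


def list_nvidia_gpus() -> List[str]:
    """Return one 'name, total memory' line per GPU reported by nvidia-smi.

    An empty list means nvidia-smi is not on PATH or did not run
    successfully, which usually points at a driver installation problem.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        proc = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if proc.returncode != 0:
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        for gpu in gpus:
            print(gpu)
    else:
        print("No NVIDIA GPU visible to nvidia-smi")
```

If this returns an empty list, fix the driver installation first; Veriafy is unlikely to detect a GPU that nvidia-smi cannot see.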
Out of memory
Reduce batch_size, or set VERIAFY_GPU_MEMORY_FRACTION=0.8 to limit VRAM usage.
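Reducing batch_size can be automated by retrying with a smaller batch whenever a batch fails for lack of memory. The sketch below is a generic pattern, not a Veriafy feature. The function name `classify_with_backoff` is hypothetical, and because this page does not say which exception Veriafy raises when VRAM runs out, the exception type is a parameter with `MemoryError` as a placeholder default.

```python
from typing import Callable, List, Sequence, Type


def classify_with_backoff(
    files: Sequence[str],
    classify_batch: Callable[[Sequence[str]], List],
    batch_size: int = 256,
    oom_error: Type[BaseException] = MemoryError,  # assumption: placeholder type
) -> List:
    """Process files in batches, halving the batch size after each OOM error."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List = []
    start = 0
    while start < len(files):
        chunk = files[start : start + batch_size]
        try:
            results.extend(classify_batch(chunk))
        except oom_error:
            if batch_size == 1:
                raise  # a single file does not fit; nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with a smaller batch
        start += len(chunk)
    return results
```

Here `classify_batch` is any callable that takes a list of files and returns one result per file. With the SDK it could be a small wrapper around client.classify_batch, though that wiring is an assumption and has not been verified against the SDK.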
Slow first inference
The first run compiles CUDA kernels. Use veriafy warmup to pre-compile them for your hardware.
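To confirm that a slow first request is a one-time compilation cost rather than a steady-state problem, time several consecutive calls and compare the first with the rest. A minimal sketch using only the standard library; `time_calls` is an illustrative helper, and `fake_inference` stands in for a real classification call.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Call fn `runs` times and return each call's wall-clock time in ms."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    state = {"warm": False}

    def fake_inference() -> None:
        # Stand-in for a real call: only the first invocation is slow.
        if not state["warm"]:
            time.sleep(0.05)
            state["warm"] = True

    timings = time_calls(fake_inference)
    print(f"first call: {timings[0]:.1f} ms")
    print(f"later calls (max): {max(timings[1:]):.1f} ms")
```

If the first timing is much larger than the others, the overhead is a one-time startup cost, which is what warming up ahead of time is meant to absorb.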