Classification Models
How ML models classify Veriafy Vectors without seeing content
Model Architecture
Veriafy classification models are neural networks trained to predict categories from Veriafy Vectors. They never see the original content — only the compact hash and embedding representation.
Typical Architecture
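The layer layout is not spelled out at this point, so the following is only a minimal sketch of what a small feed-forward classifier over vectors could look like. It borrows the `hidden_layers=[512, 256]` setting from the training example later on this page. The 64-dimensional input, the ReLU activations, the three outputs, and the helper names `build_mlp` and `forward` are assumptions made for illustration, not the Veriafy implementation.

```python
import math
import random


def build_mlp(input_dim, hidden_layers, output_dim, seed=0):
    """Create one (weights, biases) pair per layer with random initial weights."""
    rng = random.Random(seed)
    sizes = [input_dim, *hidden_layers, output_dim]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)  # He-style initialisation for ReLU layers
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, vector):
    """Run one vector through the network and return the raw output scores."""
    activations = list(vector)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        # ReLU on hidden layers; the output layer stays linear (logits).
        # Dropout only applies during training, so it is a no-op here.
        activations = outputs if index == last else [max(0.0, o) for o in outputs]
    return activations


# Hypothetical 64-dimensional input standing in for a Veriafy Vector.
layers = build_mlp(input_dim=64, hidden_layers=[512, 256], output_dim=3)
logits = forward(layers, [0.1] * 64)
```

The network only ever receives a list of numbers, which is the point of the design: nothing in the forward pass requires access to the original file.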
Model Types
Binary Classifier
Simple yes/no classification for single categories like "is NSFW" or "is fraud".
Multi-class Classifier
Assigns content to one of several mutually exclusive categories.
Multi-label Classifier
Assigns multiple tags to content (e.g., "violence" AND "weapons").
Anomaly Detector
Identifies unusual content that doesn't match known patterns.
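The four model types differ mainly in how the final scores are turned into a decision. The sketch below illustrates that difference with plain Python. The function names, the 0.5 default threshold, the `spam` label, and the centroid-distance anomaly score are illustrative assumptions; the approach Veriafy models actually use may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_decision(logit, threshold=0.5):
    """Binary classifier: one score, one yes/no answer."""
    return sigmoid(logit) >= threshold


def multiclass_decision(logits, classes):
    """Multi-class: softmax makes the classes compete, so exactly one wins."""
    probs = softmax(logits)
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best]


def multilabel_decision(logits, labels, threshold=0.5):
    """Multi-label: each label has its own sigmoid, so several can fire."""
    return [label for label, logit in zip(labels, logits)
            if sigmoid(logit) >= threshold]


def anomaly_score(vector, centroid):
    """Anomaly detection, one simple variant: distance from the centroid
    of known-good vectors. Larger means more unusual."""
    return math.dist(vector, centroid)


print(binary_decision(2.0))
print(multiclass_decision([0.1, 0.2, 3.0], ["safe", "suggestive", "explicit"]))
print(multilabel_decision([2.0, 1.5, -3.0], ["violence", "weapons", "spam"]))
print(anomaly_score([3.0, 4.0], [0.0, 0.0]))
```

Softmax probabilities sum to 1, which suits mutually exclusive categories. Independent sigmoids do not, which is what allows "violence" and "weapons" to be assigned together.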
Training Process
Models are trained on Veriafy Vectors with corresponding labels. The training data never contains raw content — only vectors and their classifications.
# Training workflow example
from veriafy import Veriafy
from veriafy.training import Trainer

client = Veriafy()

# training_data is assumed to be an iterable of (file_path, label) pairs
# prepared by the caller.

# Generate vectors from labeled data (done locally)
vectors = []
for file_path, label in training_data:
    vector = client.extract_vector(file_path)
    vectors.append((vector, label))

# Train model on vectors only
trainer = Trainer(
    model_type="binary_classifier",
    hidden_layers=[512, 256],
    dropout=0.3,
)
model = trainer.train(
    vectors=vectors,
    epochs=100,
    batch_size=32,
)

# Export model (no training data included)
model.save("my-classifier.veriafy")

Privacy During Training
Training data is converted to Veriafy Vectors locally before training. The training process only sees vectors — original files are never uploaded or shared, even during model development.
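One way to keep this separation explicit is to write the extracted vectors and labels to their own dataset file, so that only that file ever leaves the machine holding the originals. The sketch below assumes a vector can be serialised as a list of numbers and uses JSON Lines as the storage format. Both are assumptions for illustration, and `write_vector_dataset` and `read_vector_dataset` are hypothetical helpers, not part of the Veriafy SDK.

```python
import json
import os
import tempfile


def write_vector_dataset(samples, out_path):
    """Write (vector, label) pairs as JSON Lines.

    Only the vector and its label are stored; no file path or raw
    content is written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as handle:
        for vector, label in samples:
            record = {"vector": list(vector), "label": label}
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_vector_dataset(in_path):
    """Load (vector, label) pairs back for training."""
    samples = []
    with open(in_path, encoding="utf-8") as handle:
        for line in handle:
            record = json.loads(line)
            samples.append((record["vector"], record["label"]))
    return samples


# Round trip with two made-up samples.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    written = write_vector_dataset(
        [([0.1, 0.2], "safe"), ([0.9, 0.8], "explicit")], path
    )
    restored = read_vector_dataset(path)
```

Because each record carries only `vector` and `label`, anyone who receives the dataset can train on it without learning which files it came from.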
Model Output
Classification results include probabilities and recommended actions:
{
  "vector_id": "v_8f3a2b1c4d5e6f7a",
  "model": "veriafy/nsfw-classifier",
  "model_version": "2.1.0",
  "categories": {
    "safe": 0.02,
    "suggestive": 0.05,
    "explicit": 0.93
  },
  "confidence": 0.93,
  "action": "block",
  "processing_time_ms": 2.4,
  "thresholds_used": {
    "flag": 0.5,
    "block": 0.9
  }
}

Performance
| Metric | Value |
|---|---|
| Inference latency (CPU) | < 5 ms |
| Inference latency (GPU) | < 1 ms |
| Batch throughput (GPU) | 10,000+ vectors/sec |
| Model size (typical) | 10-50 MB |
| Memory usage | ~100 MB per model |
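The figures above can be turned into rough capacity estimates. The sketch below is plain arithmetic on the table values, treating the throughput as a lower bound and the CPU latency as an upper bound. The function names and the one-million-vector workload are illustrative, and real numbers will depend on hardware and model.

```python
GPU_VECTORS_PER_SEC = 10_000  # table: batch throughput, stated as a minimum
CPU_LATENCY_MS = 5.0          # table: CPU inference latency, stated as a maximum
MEMORY_PER_MODEL_MB = 100     # table: approximate memory per loaded model


def gpu_batch_seconds(num_vectors):
    """Upper bound on GPU batch time at the stated minimum throughput."""
    return num_vectors / GPU_VECTORS_PER_SEC


def cpu_sequential_seconds(num_vectors):
    """Upper bound on single-threaded CPU time, one vector at a time."""
    return num_vectors * CPU_LATENCY_MS / 1000.0


def resident_memory_mb(num_models):
    """Approximate memory needed to keep several models loaded."""
    return num_models * MEMORY_PER_MODEL_MB


workload = 1_000_000
print(gpu_batch_seconds(workload))       # 100.0
print(cpu_sequential_seconds(workload))  # 5000.0
print(resident_memory_mb(4))             # 400
```

On these figures, a million vectors take at most about 100 seconds in GPU batches, against up to roughly 83 minutes when classified one at a time on a single CPU thread.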