Model Fine-tuning
Train custom classification models on your Veriafy Vectors
Overview
Fine-tuning allows you to create custom classification models tailored to your specific use case. The training process only uses Veriafy Vectors — your original files are never accessed or stored.
Privacy Guarantee
Training data is converted to Veriafy Vectors locally before any training begins. The model learns from vectors only — it's mathematically impossible to reconstruct original content from the training process.
Prepare Training Data
First, generate Veriafy Vectors from your labeled training data:
from veriafy import Veriafy
from veriafy.training import Dataset
client = Veriafy()
# Prepare labeled data
training_data = [
    ("invoice_001.pdf", "legitimate"),
    ("invoice_002.pdf", "fraudulent"),
    ("invoice_003.pdf", "legitimate"),
    # ... more samples
]
# Generate vectors (files processed locally)
dataset = Dataset()
for file_path, label in training_data:
    vector = client.extract_vector(file_path)
    dataset.add(vector, label)
# Save dataset (contains only vectors, not files)
dataset.save("fraud_detection_dataset.veriafy")
Train the Model
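Before training, it can help to confirm that the labels are reasonably balanced and that each class has enough samples. The sketch below is plain Python and does not use the Veriafy API. `check_label_balance`, `min_per_class`, and `max_ratio` are hypothetical names, and the 3x imbalance ratio is an illustrative threshold rather than a Veriafy default. The 100-sample default follows the minimum this page recommends for basic models.

```python
from collections import Counter


def check_label_balance(labels, min_per_class=100, max_ratio=3.0):
    """Return a list of warnings about the label distribution.

    Both thresholds are illustrative assumptions, not Veriafy settings.
    """
    counts = Counter(labels)
    warnings = []

    # Flag every class that falls below the minimum sample count.
    for label, n in sorted(counts.items()):
        if n < min_per_class:
            warnings.append(
                f"class '{label}' has only {n} samples (< {min_per_class})"
            )

    # Flag a large gap between the biggest and smallest class.
    if counts:
        largest, smallest = max(counts.values()), min(counts.values())
        if largest / smallest > max_ratio:
            warnings.append(
                f"imbalance: largest class is {largest / smallest:.1f}x the smallest"
            )
    return warnings


# Example: 400 legitimate invoices against 40 fraudulent ones.
labels = ["legitimate"] * 400 + ["fraudulent"] * 40
for warning in check_label_balance(labels):
    print(warning)
```

In practice the labels would come from the same list used to build the dataset, for example `[label for _, label in training_data]`.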
from veriafy.training import Dataset, Trainer, ModelConfig
# Load dataset
dataset = Dataset.load("fraud_detection_dataset.veriafy")
# Split into train/val
train_data, val_data = dataset.split(0.8)
# Configure model
config = ModelConfig(
    model_type="binary_classifier",
    hidden_layers=[512, 256, 128],
    dropout=0.3,
    learning_rate=0.001,
)
# Train
trainer = Trainer(config)
model = trainer.train(
    train_data=train_data,
    val_data=val_data,
    epochs=100,
    batch_size=32,
    early_stopping=True,
    patience=10,
)
# Evaluate
metrics = model.evaluate(val_data)
print(f"Accuracy: {metrics['accuracy']:.2%}")
print(f"F1 Score: {metrics['f1']:.2%}")
print(f"AUC-ROC: {metrics['auc_roc']:.2%}")
Model Types
- binary_classifier: Yes/no classification (for example, fraud/not fraud)
- multiclass_classifier: Mutually exclusive categories
- multilabel_classifier: Multiple tags per item
- anomaly_detector: Identify unusual patterns
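To make the difference between the three classifier types concrete, here is a minimal sketch of how raw model scores are commonly turned into outputs for each one. It is plain Python and illustrates the general technique only: the function names, the 0.5 threshold, and the example classes and tags are all assumptions, and nothing here describes how Veriafy implements these model types internally. Anomaly detection is left out because it scores deviation from normal data rather than assigning labels.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binary_decision(score, threshold=0.5):
    """Binary: one score in, one yes/no answer out."""
    return sigmoid(score) >= threshold


def multiclass_decision(scores, classes):
    """Multiclass: one score per class, exactly one class wins."""
    probs = softmax(scores)
    return classes[probs.index(max(probs))]


def multilabel_decision(scores, tags, threshold=0.5):
    """Multilabel: each tag is thresholded independently, so any number can apply."""
    return [t for t, s in zip(tags, scores) if sigmoid(s) >= threshold]


print(binary_decision(2.0))
print(multiclass_decision([0.1, 2.3, -1.0], ["invoice", "receipt", "contract"]))
print(multilabel_decision([1.5, -0.7, 0.2], ["urgent", "signed", "scanned"]))
```

The practical consequence is in the labels: a multiclass dataset needs exactly one label per sample, while a multilabel dataset may attach zero or more tags to each sample.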
Export and Deploy
# Export model
model.save("my-fraud-detector.veriafy")
# Use locally
from veriafy import Veriafy
client = Veriafy()
client.load_model("my-fraud-detector.veriafy")
result = client.classify("new_invoice.pdf", model="my-fraud-detector")
# Or publish to marketplace
from veriafy.marketplace import publish
publish(
    model_path="my-fraud-detector.veriafy",
    name="Invoice Fraud Detector",
    description="Detects fraudulent invoices with 99% accuracy",
    tier="pro",   # free, pro, business, enterprise
    price=0.001,  # per classification
)
Best Practices
- Balanced datasets: Ensure similar numbers of samples per class
- Minimum samples: At least 100 samples per class for basic models
- Validation split: Always hold out 20% for validation
- Early stopping: Prevent overfitting with patience-based stopping
- Versioning: Track model versions for A/B testing
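Patience-based early stopping, as mentioned above, can be sketched in a few lines. This is a generic illustration in plain Python, not the logic behind the `early_stopping` and `patience` options of `trainer.train`, which this page does not describe. The `EarlyStopper` class, its `min_delta` parameter, and the simulated loss values are hypothetical.

```python
class EarlyStopper:
    """Signal a stop once validation loss has failed to improve
    for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember it and reset the counter.
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Simulated validation losses: steady improvement, then a plateau.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64]

stopper = EarlyStopper(patience=3)
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.should_stop(loss):
        print(f"stopping at epoch {epoch}, best val loss {stopper.best_loss:.2f}")
        break
```

With these values the loop stops at epoch 6, after three epochs that fail to beat the best loss of 0.60. A larger `patience` tolerates longer plateaus at the cost of extra training time.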