Model Fine-tuning

Train custom classification models on your Veriafy Vectors

Overview

Fine-tuning lets you create custom classification models tailored to your specific use case. Training uses only Veriafy Vectors: your original files are read locally to extract vectors, but the training process itself never accesses them, and they are not stored in the training dataset.

Privacy Guarantee

Training data is converted to Veriafy Vectors locally before any training begins. The model learns from vectors only — it's mathematically impossible to reconstruct original content from the training process.

Prepare Training Data

First, generate Veriafy Vectors from your labeled training data:

from veriafy import Veriafy
from veriafy.training import Dataset

client = Veriafy()

# Prepare labeled data
training_data = [
    ("invoice_001.pdf", "legitimate"),
    ("invoice_002.pdf", "fraudulent"),
    ("invoice_003.pdf", "legitimate"),
    # ... more samples
]

# Generate vectors (files processed locally)
dataset = Dataset()

for file_path, label in training_data:
    vector = client.extract_vector(file_path)
    dataset.add(vector, label)

# Save dataset (contains only vectors, not files)
dataset.save("fraud_detection_dataset.veriafy")
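Here is a minimal sketch of what a vectors-only dataset could contain. `VectorDataset` is a hypothetical stand-in for `veriafy.training.Dataset`, and JSON is an assumed storage format used only for illustration. The real `.veriafy` format is not documented here. Each record is a list of floats plus a label, with no file path or file contents. The `class_counts` helper is also hypothetical. It makes it easy to check class balance (see Best Practices) before training.

```python
import json
import os
import tempfile
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VectorDataset:
    """Hypothetical stand-in for veriafy.training.Dataset: stores (vector, label) pairs only."""
    vectors: list = field(default_factory=list)
    labels: list = field(default_factory=list)

    def add(self, vector, label):
        # Only the numeric vector and its label are kept; no file path or file bytes.
        self.vectors.append([float(x) for x in vector])
        self.labels.append(label)

    def class_counts(self):
        # Per-class sample counts, useful for checking balance before training.
        return dict(Counter(self.labels))

    def save(self, path):
        # Assumed JSON layout, for illustration only.
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"vectors": self.vectors, "labels": self.labels}, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return cls(vectors=data["vectors"], labels=data["labels"])


if __name__ == "__main__":
    ds = VectorDataset()
    ds.add([0.12, -0.40, 0.88], "legitimate")
    ds.add([0.95, 0.10, -0.33], "fraudulent")
    path = os.path.join(tempfile.mkdtemp(), "example_dataset.json")
    ds.save(path)
    print(VectorDataset.load(path).class_counts())  # {'legitimate': 1, 'fraudulent': 1}
```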

Train the Model

from veriafy.training import Dataset, Trainer, ModelConfig

# Load dataset
dataset = Dataset.load("fraud_detection_dataset.veriafy")

# Split into training (80%) and validation (20%) sets
train_data, val_data = dataset.split(0.8)

# Configure model
config = ModelConfig(
    model_type="binary_classifier",
    hidden_layers=[512, 256, 128],
    dropout=0.3,
    learning_rate=0.001,
)

# Train
trainer = Trainer(config)
model = trainer.train(
    train_data=train_data,
    val_data=val_data,
    epochs=100,
    batch_size=32,
    early_stopping=True,
    patience=10,
)

# Evaluate
metrics = model.evaluate(val_data)
print(f"Accuracy: {metrics['accuracy']:.2%}")
print(f"F1 Score: {metrics['f1']:.2%}")
print(f"AUC-ROC: {metrics['auc_roc']:.3f}")
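The training example reads three metrics from `model.evaluate`: `accuracy`, `f1` and `auc_roc`. The sketch below computes the standard definitions of these metrics for a binary classifier. It takes true labels (1 = positive class, e.g. "fraudulent") and predicted scores, and is meant only to show what the numbers mean. It is not Veriafy's implementation, and the 0.5 decision threshold is an assumption. Accuracy and F1 depend on that threshold; AUC-ROC does not. AUC-ROC is the probability that a randomly chosen positive is scored above a randomly chosen negative, so it is usually reported as a value between 0 and 1.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, F1 and AUC-ROC for binary labels (1 = positive class)."""
    # Threshold the scores to get hard predictions (0.5 is an assumed default).
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUC-ROC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc_roc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "f1": f1, "auc_roc": auc_roc}


if __name__ == "__main__":
    print(binary_metrics([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6]))
    # {'accuracy': 0.5, 'f1': 0.5, 'auc_roc': 0.75}
```

The pairwise AUC loop is O(P × N) in the number of positives and negatives. That is fine for illustration, but sort-based implementations scale better.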

Model Types

  • binary_classifier: Yes/no classification (fraud/not fraud)
  • multiclass_classifier: Mutually exclusive categories
  • multilabel_classifier: Multiple tags per item
  • anomaly_detector: Identify unusual patterns
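The four model types differ mainly in how raw model outputs become a prediction. The sketch below follows common conventions:

- `binary_classifier`: a single logit passed through a sigmoid and thresholded.
- `multiclass_classifier`: a softmax over mutually exclusive classes.
- `multilabel_classifier`: an independent sigmoid per label.
- `anomaly_detector`: a score compared against a threshold.

The `decode` function, its arguments and the 0.5 default threshold are hypothetical. Veriafy's models may expose their outputs differently.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(model_type, raw, labels=None, threshold=0.5):
    """Hypothetical mapping from raw model scores to a prediction, per model type."""
    if model_type == "binary_classifier":
        # One logit -> one probability -> yes/no.
        return sigmoid(raw) >= threshold
    if model_type == "multiclass_classifier":
        # One logit per class; classes are mutually exclusive, so exactly one wins.
        probs = softmax(raw)
        return labels[probs.index(max(probs))]
    if model_type == "multilabel_classifier":
        # Independent sigmoid per label; any number of labels can be on.
        return [lab for lab, z in zip(labels, raw) if sigmoid(z) >= threshold]
    if model_type == "anomaly_detector":
        # Raw value is an anomaly score; flag it when it exceeds the threshold.
        return raw > threshold
    raise ValueError(f"unknown model_type: {model_type}")


if __name__ == "__main__":
    print(decode("binary_classifier", 2.0))  # True
    print(decode("multiclass_classifier", [0.1, 2.5, 0.3],
                 labels=["invoice", "receipt", "contract"]))  # receipt
    print(decode("multilabel_classifier", [1.2, -0.7, 0.4],
                 labels=["urgent", "duplicate", "foreign_currency"]))  # ['urgent', 'foreign_currency']
    print(decode("anomaly_detector", 0.92, threshold=0.8))  # True
```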

Export and Deploy

# Export model
model.save("my-fraud-detector.veriafy")

# Use locally
client = Veriafy()
client.load_model("my-fraud-detector.veriafy")

result = client.classify("new_invoice.pdf", model="my-fraud-detector")

# Or publish to marketplace
from veriafy.marketplace import publish

publish(
    model_path="my-fraud-detector.veriafy",
    name="Invoice Fraud Detector",
    description="Detects fraudulent invoices with 99% accuracy",
    tier="pro",  # free, pro, business, enterprise
    price=0.001,  # per classification
)
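The `price` argument is a per-classification charge, so total cost scales with volume. This sketch assumes simple linear pricing with no volume discounts, minimums or platform fees. The currency is not specified here, and `estimate_cost` is a hypothetical helper. It uses `Decimal` so that small prices such as 0.001 are multiplied exactly rather than accumulating floating-point error.

```python
from decimal import Decimal


def estimate_cost(price_per_classification, classifications):
    """Hypothetical linear cost estimate: no discounts, minimums or fees assumed."""
    # Build the Decimal from a string so 0.001 stays exact (Decimal(0.001) would not).
    price = Decimal(str(price_per_classification))
    return price * classifications


if __name__ == "__main__":
    for volume in (1_000, 100_000, 1_000_000):
        # prints 1.000, 100.000 and 1000.000 respectively
        print(f"{volume:>9,} classifications -> {estimate_cost(0.001, volume)}")
```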

Best Practices

  • Balanced datasets: Ensure similar numbers of samples per class
  • Minimum samples: At least 100 samples per class for basic models
  • Validation split: Always hold out 20% for validation
  • Early stopping: Prevent overfitting with patience-based stopping
  • Versioning: Track model versions for A/B testing
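Patience-based early stopping, as requested with `trainer.train(..., early_stopping=True, patience=10)`, typically works like this. Track the best validation loss seen so far, and stop once it has not improved for `patience` consecutive epochs. The `EarlyStopping` class below is a minimal, framework-free sketch of that rule. The `min_delta` parameter is an assumption and does not appear in the Veriafy API. It is also not documented here whether Veriafy's `Trainer` monitors validation loss or another metric.

```python
class EarlyStopping:
    """Sketch of patience-based early stopping on a validation loss (lower is better)."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # assumed: minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


if __name__ == "__main__":
    # Synthetic validation losses: improvement stops after epoch 3.
    losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58]
    stopper = EarlyStopping(patience=3)
    for epoch, loss in enumerate(losses):
        if stopper.step(epoch, loss):
            # prints: stopped at epoch 6, best epoch 3 (loss 0.55)
            print(f"stopped at epoch {epoch}, best epoch {stopper.best_epoch} (loss {stopper.best_loss})")
            break
```

Many training loops also restore the weights from `best_epoch` after stopping. Whether Veriafy's `Trainer` does this is not documented here.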

Veriafy - Universal File Classification Platform