Veriafy Vectors

Understanding the irreversible hash representation that makes privacy-preserving classification possible

What is a Veriafy Vector?

A Veriafy Vector is a compact, irreversible mathematical representation of a file. It captures the semantic and perceptual features of content without storing the content itself.

Key Properties

  • Irreversible: Cannot be converted back to the original file
  • Compact: Roughly 80x to 4,000x smaller than the source file, depending on file type
  • Semantic: Preserves meaning for classification
  • Deterministic: Same input always produces the same vector

Vector Components

Each Veriafy Vector consists of two main components:

Perceptual Hash

A content-based fingerprint that captures structural features while remaining resilient to minor modifications. The algorithm depends on the file type:

  • PDQ for images (256-bit)
  • TMK for video (temporal)
  • Chromaprint for audio
  • SimHash for text
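
Resilience to minor modifications usually means that two similar files produce hashes differing in only a few bits, so hashes are compared by Hamming distance rather than by equality. The sketch below illustrates that comparison for hex-encoded 256-bit hashes. The function names, the sample hash value, and the threshold of 31 bits are assumptions chosen for illustration, not documented Veriafy settings.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_duplicate(hash_a: str, hash_b: str, max_distance: int = 31) -> bool:
    """Treat two hashes as a match when few bits differ.

    max_distance is an illustrative threshold, not a Veriafy setting.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


# Made-up 256-bit value: 64 hex characters.
original = "f8a3b2c1d4e5f6a7" * 4
# Change the last hex digit from 7 (0111) to 6 (0110): exactly one bit flips.
edited = original[:-1] + "6"

print(hamming_distance(original, edited))   # 1
print(is_near_duplicate(original, edited))  # True
```

An exact-match comparison would treat the two hashes above as unrelated, whereas the distance-based check still pairs them.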

Semantic Embedding

A dense vector representation that captures the meaning and context of the content using neural network encoders:

  • CLIP for images (512-dim)
  • SBERT for text (768-dim)
  • AudioCLIP for audio
  • Custom embeddings for code
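
Dense embeddings are typically compared with cosine similarity: vectors pointing in similar directions represent content with similar meaning. The following is a minimal sketch of that comparison in plain Python. The helper names and the 4-dimensional toy vectors are assumptions for illustration; real embeddings would have 512 or 768 dimensions as listed above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    a, b = l2_normalize(a), l2_normalize(b)
    # For unit-length vectors, the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional embeddings (made-up values).
photo = [0.6, 0.8, 0.0, 0.0]
cropped_photo = [0.8, 0.6, 0.0, 0.0]
unrelated = [0.0, 0.0, 1.0, 0.0]

print(cosine_similarity(photo, cropped_photo))  # approximately 0.96
print(cosine_similarity(photo, unrelated))      # 0.0
```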

Vector Structure

A Veriafy Vector is represented as a JSON object. The // comments in the example are annotations for the reader and are not part of the payload:

{
  "version": "1.0",
  "vector_id": "v_8f3a2b1c4d5e6f7a",
  "file_type": "image",
  "extractor": "pdq_clip",
  "created_at": "2025-01-08T12:00:00Z",
  "components": {
    "perceptual_hash": {
      "algorithm": "pdq",
      "value": "f8a3b2c1d4e5f6a7...",  // 256-bit hex
      "quality": 0.95
    },
    "semantic_embedding": {
      "model": "clip-vit-b32",
      "dimensions": 512,
      "value": [0.123, -0.456, ...]  // normalized float32
    }
  },
  "metadata": {
    "file_size_category": "medium",  // not exact size
    "aspect_ratio_bucket": "landscape",  // not exact ratio
    "duration_bucket": null  // for video/audio
  }
}
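
A consumer of this structure may want to sanity-check a vector before using it. The sketch below shows one way to do that, assuming that a 256-bit PDQ hash is encoded as 64 hex characters and that "normalized" means unit L2 norm; both are interpretations of the example above, not confirmed Veriafy rules. The validate_vector function and the shortened 4-dimensional sample payload are hypothetical.

```python
import json
import math

HEX_DIGITS = set("0123456789abcdefABCDEF")

# Toy payload: 4 dimensions instead of 512 so the example stays readable.
SAMPLE = """
{
  "version": "1.0",
  "file_type": "image",
  "components": {
    "perceptual_hash": {"algorithm": "pdq", "value": "%s", "quality": 0.95},
    "semantic_embedding": {
      "model": "clip-vit-b32",
      "dimensions": 4,
      "value": [0.5, 0.5, 0.5, 0.5]
    }
  }
}
""" % ("f8a3b2c1d4e5f6a7" * 4)


def validate_vector(doc: dict) -> list:
    """Return a list of problems found in a parsed vector (empty if none)."""
    problems = []
    phash = doc["components"]["perceptual_hash"]
    emb = doc["components"]["semantic_embedding"]

    # Assumption: a 256-bit PDQ hash is written as 64 hex characters.
    if phash["algorithm"] == "pdq":
        value = phash["value"]
        if len(value) != 64 or not set(value) <= HEX_DIGITS:
            problems.append("pdq value is not 64 hex characters")

    # The declared dimension count should match the actual array length.
    if len(emb["value"]) != emb["dimensions"]:
        problems.append("embedding length does not match 'dimensions'")

    # Assumption: 'normalized' means the embedding has unit L2 norm.
    norm = math.sqrt(sum(x * x for x in emb["value"]))
    if not math.isclose(norm, 1.0, rel_tol=1e-3):
        problems.append("embedding is not unit length")

    return problems


print(validate_vector(json.loads(SAMPLE)))  # []
```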

Why Irreversibility Matters

The mathematical properties of Veriafy Vectors guarantee that the original content cannot be reconstructed:

  1. Hash Collision Space: PDQ produces 256 bits from millions of pixels. Vastly many distinct images map to each hash value, so there is no unique inverse.
  2. Embedding Compression: CLIP compresses an image to 512 floats. The dimensionality reduction is lossy by design.
  3. No Raw Features: Unlike some ML systems, Veriafy doesn't store intermediate features that could leak information.
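
The first point is a counting (pigeonhole) argument, and its scale can be made concrete with a little arithmetic. The sketch below assumes a small 64x64 RGB image with 8 bits per channel; the function name and image size are illustrative. It counts how many distinct images share each 256-bit hash value on average, which shows why no unique inverse exists.

```python
def average_preimage_bits(width: int, height: int, channels: int = 3,
                          bits_per_channel: int = 8, hash_bits: int = 256) -> int:
    """Return log2 of the average number of images per hash value.

    There are 2**image_bits possible images and 2**hash_bits possible
    hashes, so on average 2**(image_bits - hash_bits) images share a hash.
    """
    image_bits = width * height * channels * bits_per_channel
    return image_bits - hash_bits


# Even a tiny 64x64 RGB thumbnail has 98,304 bits of raw pixel data.
exponent = average_preimage_bits(64, 64)
print(f"about 2^{exponent} images per hash value")  # about 2^98048 images per hash value
```

A real photograph has far more pixels than this thumbnail, so the exponent only grows.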

Impossible Operations

From a Veriafy Vector, you cannot: view the image, play the audio, read the document, or extract any recognizable portion of the original content. This is guaranteed by mathematics, not policy.

Compression Ratios

File Type        Typical Size   Vector Size   Compression
Image (JPEG)     2 MB           4 KB          500x
Video (1 min)    50 MB          12 KB         4,000x
PDF Document     500 KB         6 KB          80x
Audio (3 min)    5 MB           8 KB          600x
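
The compression column follows directly from dividing the typical file size by the vector size. The sketch below reproduces that arithmetic, assuming decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes); the names are illustrative. The computed ratios land close to the rounded figures in the table.

```python
# (file type, typical size in bytes, vector size in bytes), decimal units assumed
ROWS = [
    ("Image (JPEG)", 2_000_000, 4_000),
    ("Video (1 min)", 50_000_000, 12_000),
    ("PDF Document", 500_000, 6_000),
    ("Audio (3 min)", 5_000_000, 8_000),
]


def compression_ratio(source_bytes: int, vector_bytes: int) -> float:
    """Return how many times smaller the vector is than the source file."""
    return source_bytes / vector_bytes


for name, source_bytes, vector_bytes in ROWS:
    print(f"{name}: {compression_ratio(source_bytes, vector_bytes):,.0f}x")

# Raw payload of an image vector before any JSON overhead:
# 256-bit hash (32 bytes) plus 512 float32 values (4 bytes each).
raw_image_vector_bytes = 256 // 8 + 512 * 4
print(raw_image_vector_bytes)  # 2080
```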
