Veriafy Vectors
Understanding the irreversible hash representation that makes privacy-preserving classification possible
What is a Veriafy Vector?
A Veriafy Vector is a compact, irreversible mathematical representation of a file. It captures the semantic and perceptual features of content without storing the content itself.
Key Properties
- Irreversible: Cannot be converted back to the original file
- Compact: Typically 80x to 4,000x smaller than the source file, depending on file type (see Compression Ratios)
- Semantic: Preserves meaning for classification
- Deterministic: Same input always produces the same vector
Vector Components
Each Veriafy Vector consists of two main components:
Perceptual Hash
A content-based fingerprint that captures structural features while being resilient to minor modifications. A different algorithm is used for each file type:
- PDQ for images (256-bit)
- TMK for video (temporal)
- Chromaprint for audio
- SimHash for text
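Fingerprints of this kind are usually compared by Hamming distance, where a small distance suggests near-duplicate content. The sketch below illustrates the idea for text with a toy 64-bit SimHash. The whitespace tokenizer, the use of MD5 as the per-token hash, and the function names `simhash` and `hamming_distance` are all assumptions made for illustration, not Veriafy's actual extractor.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Toy SimHash over whitespace tokens (illustrative assumption only).

    `bits` must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    counts = [0] * bits
    for token in text.lower().split():
        # Hash each token to a stable integer; MD5 is used only for determinism.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set where the votes for 1 outnumber the votes for 0.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because every token contributes equally to the vote, changing a few words in a long text tends to flip only a few bits, which is the resilience to minor modifications described above.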
Semantic Embedding
A dense vector representation that captures the meaning and context of the content using neural network encoders:
- CLIP for images (512-dim)
- SBERT for text (768-dim)
- AudioCLIP for audio
- Custom embeddings for code
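Dense embeddings are commonly compared with cosine similarity. The following is a minimal sketch of that comparison, assuming embeddings are plain lists of floats. The helper names `l2_normalize` and `cosine_similarity` are hypothetical, and the source does not state which similarity measure Veriafy uses.

```python
import math
from typing import Sequence


def l2_normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of dimensions")
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    # For unit vectors the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a_unit, b_unit))
```

When embeddings are already stored normalized, the normalization step is redundant and the dot product alone is enough.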
Vector Structure
A Veriafy Vector is represented as a JSON object (the `//` comments are annotations for the reader, not part of the payload):
{
  "version": "1.0",
  "vector_id": "v_8f3a2b1c4d5e6f7a",
  "file_type": "image",
  "extractor": "pdq_clip",
  "created_at": "2025-01-08T12:00:00Z",
  "components": {
    "perceptual_hash": {
      "algorithm": "pdq",
      "value": "f8a3b2c1d4e5f6a7...", // 256-bit hex
      "quality": 0.95
    },
    "semantic_embedding": {
      "model": "clip-vit-b32",
      "dimensions": 512,
      "value": [0.123, -0.456, ...] // normalized float32
    }
  },
  "metadata": {
    "file_size_category": "medium", // not exact size
    "aspect_ratio_bucket": "landscape", // not exact ratio
    "duration_bucket": null // for video/audio
  }
}
Why Irreversibility Matters
The mathematical properties of Veriafy Vectors guarantee that the original content cannot be reconstructed:
1. Hash Collision Space: PDQ produces 256 bits from millions of pixels. An enormous number of distinct images map to the same hash, so there is no unique inverse.
2. Embedding Compression: CLIP compresses an image to 512 floats. The dimensionality reduction is lossy by design.
3. No Raw Features: Unlike some ML systems, Veriafy doesn't store intermediate features that could leak information.
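The collision-space point is a counting (pigeonhole) argument, and it can be made concrete with a little arithmetic. The sketch below assumes uncompressed images of a fixed resolution; the function name and the 64x64 RGB example are illustrative choices, not figures from Veriafy.

```python
def log2_avg_preimages(width: int, height: int, channels: int = 3,
                       bit_depth: int = 8, hash_bits: int = 256) -> int:
    """Base-2 logarithm of the average number of images per hash value.

    There are 2**image_bits possible images and at most 2**hash_bits hash
    values, so on average at least 2**(image_bits - hash_bits) images share
    each hash value.
    """
    image_bits = width * height * channels * bit_depth
    if image_bits < hash_bits:
        raise ValueError("image carries fewer bits than the hash")
    return image_bits - hash_bits


# Even a tiny 64x64 RGB image has 98,304 bits of pixel data, so each 256-bit
# hash value is shared by about 2**98048 possible images on average.
exponent = log2_avg_preimages(64, 64)
```

With that many candidates per hash value, the hash alone cannot identify which image produced it.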
Impossible Operations
From a Veriafy Vector, you cannot: view the image, play the audio, read the document, or extract any recognizable portion of the original content. This is guaranteed by mathematics, not policy.
Compression Ratios
| File Type | Typical Size | Vector Size | Compression (approx.) |
|---|---|---|---|
| Image (JPEG) | 2 MB | 4 KB | 500x |
| Video (1 min) | 50 MB | 12 KB | 4,000x |
| PDF Document | 500 KB | 6 KB | 80x |
| Audio (3 min) | 5 MB | 8 KB | 600x |
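The ratios in the table can be reproduced from its two size columns. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes). It also estimates the raw payload of the image components from the figures given earlier (a 256-bit hash and 512 float32 values). The names `compression_ratio` and `raw_component_bytes` are hypothetical.

```python
KB = 1_000        # assumption: decimal kilobytes
MB = 1_000_000    # assumption: decimal megabytes

# (typical source size, vector size) in bytes, taken from the table above.
SIZES = {
    "Image (JPEG)": (2 * MB, 4 * KB),
    "Video (1 min)": (50 * MB, 12 * KB),
    "PDF Document": (500 * KB, 6 * KB),
    "Audio (3 min)": (5 * MB, 8 * KB),
}


def compression_ratio(source_bytes: int, vector_bytes: int) -> float:
    """How many times smaller the vector is than the source file."""
    return source_bytes / vector_bytes


def raw_component_bytes(hash_bits: int = 256, embedding_dims: int = 512,
                        bytes_per_float: int = 4) -> int:
    """Bytes needed for the hash plus the embedding, excluding JSON overhead."""
    return hash_bits // 8 + embedding_dims * bytes_per_float


ratios = {name: compression_ratio(src, vec) for name, (src, vec) in SIZES.items()}
```

Under these assumptions the exact ratios are 500, about 4,167, about 83, and 625, which the table rounds down. The raw image components come to 2,080 bytes, so the remainder of the 4 KB image vector would be metadata and encoding overhead.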