ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation

AI Safety Fundamentals: Alignment - Podcast by BlueDot Impact

This paper presents a technique for scanning neural-network-based AI models to determine whether they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when given regular inputs, but misclassify to a specific output label when the input is stamped with a special pattern called a trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determini...
