Fraud detection via behavioral sequence embedding

Abstract

Fraud detection is often compared to finding a needle in a haystack. It remains challenging because fraudulent acts are buried in massive amounts of normal behavior, and the true intent behind an act may be disguised in any single snapshot. Fraudulent incidents, however, usually unfold over consecutive time steps to gain illegal benefits, so a complete behavioral sequence offers clues that a single snapshot of behavior does not. Fraudulent behavior may also involve multiple parties, so the interaction patterns between sources and targets can help distinguish fraudulent acts from normal behavior. In this paper, we therefore model the attributed behavioral sequences generated from consecutive behaviors in order to capture their sequential patterns; sequences that deviate from these patterns can be flagged as fraudulent. Considering the characteristics of behavioral sequences, we propose a novel model, NHA-LSTM, which augments the traditional LSTM with a modified forget gate that takes into account the time interval between consecutive time steps. Furthermore, we design a self-historical attention mechanism to capture long-term dependencies, which helps identify repeated or cyclical behavior. In addition, we propose an enhanced network embedding method, FraudWalk, which constructs embeddings for the nodes in the interaction network while accounting for higher-order interactions and specific time constraints, in order to reveal potential group fraud. The node embeddings, along with the feature vectors, are fed into the model to capture the interactions between sources and targets. To validate the effectiveness of the sequential behavior embeddings, we run experiments on a real-world telecommunication dataset with prediction and classification tasks based on the learned embeddings. The experimental results show that the learned embeddings can better identify fraudulent behavior. Finally, we visualize the weights of the attention mechanism to provide a rational interpretation of human behavioral patterns.
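
The abstract does not give the equations of the modified forget gate, so the following is only a minimal sketch of one plausible way to make a forget gate depend on the time interval between consecutive steps. It is not the NHA-LSTM formulation. The function name `time_aware_forget_gate`, the weights `w_x`, `w_h`, `w_t`, the bias `b`, and the choice of `log(1 + delta_t)` as the interval feature are all assumptions made for illustration, and the hidden state is reduced to a single unit to keep the example short.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def time_aware_forget_gate(x_t, h_prev, delta_t, w_x, w_h, w_t, b):
    """Hypothetical forget gate for a single hidden unit.

    A standard LSTM forget gate computes sigmoid(w_x . x_t + w_h . h_prev + b).
    Here an extra term w_t * log(1 + delta_t) is added (an assumption, not the
    paper's formula), so the elapsed time since the previous step can shift
    how much of the old cell state is retained.
    """
    pre_activation = sum(w * x for w, x in zip(w_x, x_t))
    pre_activation += sum(w * h for w, h in zip(w_h, h_prev))
    pre_activation += w_t * math.log1p(delta_t)  # interval-dependent term
    return sigmoid(pre_activation + b)


# Illustrative values only: a negative w_t makes long gaps lower the gate,
# so more of the previous cell state is forgotten after a long pause.
x_t = [1.0, 0.5]
h_prev = [0.2]
w_x, w_h, w_t, b = [0.3, -0.1], [0.5], -0.4, 0.1

gate_short_gap = time_aware_forget_gate(x_t, h_prev, 1.0, w_x, w_h, w_t, b)
gate_long_gap = time_aware_forget_gate(x_t, h_prev, 3600.0, w_x, w_h, w_t, b)
print(gate_short_gap, gate_long_gap)
```

With a negative `w_t`, the gate value after a long gap is smaller than after a short one, which is the qualitative behavior one would expect from an interval-aware gate. In a trained model the sign and magnitude of this weight would be learned rather than fixed by hand.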