A leading provider of email migration and Office 365 reporting solutions approached us with a request to extend the functionality of its email services. The client had collected a large amount of customer data that had to be examined for anomalies (unusual, suspicious, or atypical events) in user sessions. The actionable intelligence model built on the results of the anomaly analysis was to be integrated into the client’s email services system as a new service that would help the client’s customers predict, above all, cases of fraud and data theft.
The solution was implemented in Python and delivered as a Jupyter Notebook for convenient visualization, which also allows the prototype’s Python code to be reused for further development. A Python script pulled the data directly from Kibana and processed it on the fly, so no intermediate data storage is required.
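The case study does not show how the records were pulled. As a minimal sketch, assuming the sessions live in an Elasticsearch index behind Kibana and arrive as a standard search response (documents under `hits.hits[*]._source`), the response can be flattened straight into an in-memory pandas DataFrame with no intermediate storage. The helper `hits_to_frame` and the field names (`user`, `timestamp`, `session.duration_s`, `session.ip`) are hypothetical.

```python
import pandas as pd


def hits_to_frame(response: dict) -> pd.DataFrame:
    """Flatten an Elasticsearch-style search response into a DataFrame (hypothetical helper)."""
    # Each hit stores the indexed document under "_source".
    docs = [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]
    # Nested objects become dotted column names, e.g. "session.duration_s".
    frame = pd.json_normalize(docs)
    if "timestamp" in frame.columns:
        frame["timestamp"] = pd.to_datetime(frame["timestamp"], utc=True)
    return frame


# Hypothetical response body with the standard hits.hits[*]._source layout.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"user": "alice", "timestamp": "2020-01-01T08:00:00Z",
                         "session": {"duration_s": 340, "ip": "10.0.0.1"}}},
            {"_source": {"user": "bob", "timestamp": "2020-01-01T03:12:00Z",
                         "session": {"duration_s": 12, "ip": "10.0.0.9"}}},
        ]
    }
}

sessions = hits_to_frame(sample_response)
print(sessions)
```

Keeping the data in a DataFrame from the start means the detectors can consume it directly, which is the "no intermediate layer" property the case study describes.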
The Intellica team created several types of models, which fall into two classes: “anomaly detection” and “novelty detection”. To find the best way to detect anomalies, we applied Machine Learning algorithms such as One-Class SVM, Isolation Forest, Elliptic Envelope, Local Outlier Factor, an ensemble of models, feature bagging with Isolation Forest, dimensionality reduction and clustering, and neural networks. All of these models were implemented and compared in terms of performance and compliance with the project requirements.
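A comparison of the kind described above can be sketched with the scikit-learn implementations of four of the listed detectors. This is an illustration on synthetic data, not the project’s code: the three session features, the injected outliers, and the `contamination` setting are assumptions, and recall on injected anomalies stands in for whatever performance metric was actually used.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-in for per-session features (e.g. duration, volume, hour of day).
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 3))
outliers = rng.uniform(low=-6.0, high=6.0, size=(15, 3))
X = StandardScaler().fit_transform(np.vstack([normal, outliers]))
y_true = np.r_[np.zeros(len(normal)), np.ones(len(outliers))]  # 1 = injected anomaly

contamination = len(outliers) / len(X)
detectors = {
    "One-Class SVM": OneClassSVM(nu=contamination, gamma="scale"),
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=0),
    "Elliptic Envelope": EllipticEnvelope(contamination=contamination, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
}

results = {}
for name, model in detectors.items():
    # scikit-learn outlier detectors return +1 for inliers and -1 for outliers.
    labels = model.fit_predict(X)
    flagged = (labels == -1).astype(int)
    recall = flagged[y_true == 1].mean()  # share of injected anomalies that were caught
    results[name] = recall
    print(f"{name:22s} flagged={flagged.sum():3d} recall={recall:.2f}")
```

The same estimators also serve for novelty detection: fit them on sessions believed to be clean and call `predict` on new ones. `LocalOutlierFactor` supports this only when constructed with `novelty=True`.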
The result was an ensemble of the best-performing models, which flags anomalies. In the second step, these flagged anomalies were fed as input to a neural network. This let the neural network determine the significance of the input variables on its own and made anomaly prediction more flexible. In the future, the neural network can be further trained on data from new sources where examples of known fraud or anomaly cases are available.
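The two-step pipeline can be sketched under stated assumptions: the ensemble is taken to be a majority vote of three scikit-learn detectors, and scikit-learn’s `MLPClassifier` stands in for the neural network, whose architecture the case study does not describe. The data, the vote threshold, and the layer sizes are hypothetical. Permutation importance is used only to inspect which inputs the trained network relies on.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.inspection import permutation_importance
from sklearn.neighbors import LocalOutlierFactor
from sklearn.neural_network import MLPClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Synthetic sessions: 400 ordinary ones plus 20 scattered unusual ones.
X = np.vstack([rng.normal(0.0, 1.0, size=(400, 4)),
               rng.uniform(-8.0, 8.0, size=(20, 4))])
contamination = 20 / len(X)

# Step 1: each detector votes (True = anomaly); a session is flagged
# when at least two of the three detectors agree (majority vote).
votes = np.column_stack([
    IsolationForest(contamination=contamination, random_state=0).fit_predict(X) == -1,
    OneClassSVM(nu=contamination, gamma="scale").fit_predict(X) == -1,
    LocalOutlierFactor(contamination=contamination).fit_predict(X) == -1,
])
ensemble_flag = (votes.sum(axis=1) >= 2).astype(int)

# Step 2: the ensemble's flags become training labels for a small neural
# network, which learns its own weighting of the input features.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X, ensemble_flag)

# The network returns an anomaly probability for new sessions.
new_sessions = np.array([[0.1, -0.2, 0.3, 0.0], [6.0, -6.0, 6.0, -6.0]])
proba = net.predict_proba(new_sessions)[:, 1]

# Permutation importance shows how strongly each input feature drives the network.
importance = permutation_importance(net, X, ensemble_flag, n_repeats=5, random_state=0)
print(proba.round(3), importance.importances_mean.round(3))
```

Training the network on the ensemble’s output, rather than on raw labels, fits the situation where no confirmed fraud cases exist yet; once labelled cases become available, they can replace or supplement `ensemble_flag` as the training target.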
Technologies and skills
Expertise: Data Processing, Machine Learning, AI, Outlier Detection, generating actionable data-based insights
Outlier Detection and Novelty Detection: One-Class SVM, Isolation Forest, Elliptic Envelope, Dimensionality reduction and clustering with t-SNE, Bagging, and Neural Networks
Data source: Kibana (Elasticsearch). No intermediate data storage is required.
The solution: implemented in Python and delivered as a Jupyter Notebook for convenient visualization, which allows the prototype’s Python code to be reused for further development.
Main libraries: NumPy, SciPy, Pandas, Matplotlib, mpl_toolkits, Basemap, scikit-learn (sklearn), and others.
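The “Bagging” entry most likely refers to feature bagging, mentioned earlier as feature bagging with Isolation Forest. A minimal sketch of the standard scheme, assuming it was configured this way: each round trains an Isolation Forest on a random subset of between ⌊d/2⌋ and d−1 of the d features, and the per-session anomaly scores are averaged. The helper `feature_bagging_scores`, the number of rounds, and the data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def feature_bagging_scores(X, n_rounds=10, random_state=0):
    """Average Isolation Forest anomaly scores over random feature subsets.

    Higher returned values mean more anomalous. Requires at least 2 features.
    """
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    scores = np.zeros(X.shape[0])
    for i in range(n_rounds):
        # Subset size is drawn uniformly from floor(d/2) .. d-1 (upper bound exclusive).
        size = rng.integers(n_features // 2, n_features)
        subset = rng.choice(n_features, size=size, replace=False)
        model = IsolationForest(random_state=random_state + i).fit(X[:, subset])
        # score_samples is higher for normal points, so negate it.
        scores += -model.score_samples(X[:, subset])
    return scores / n_rounds


# Hypothetical data: 500 sessions with 6 features; the first 5 sessions
# are shifted in only 3 of the 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:5, :3] += 6.0
scores = feature_bagging_scores(X)
print(np.argsort(scores)[-5:])  # indices of the five highest-scoring sessions
```

Averaging over feature subsets can help surface sessions that are unusual in only a few features, a signal that a single detector trained on all features at once may dilute.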
The solution developed by the Intellica team gives the client the following business advantages:
- A 360-degree view of user behaviour in the cloud service.
- The algorithm receives data from Kibana and analyses it immediately. There is no need to configure intermediate layers for data transfer and storage, or to spend extra time and money on intermediate data storage.
- Analysing user behaviour patterns makes it possible to identify user groups, build marketing strategies based on their typical behaviour, personalize services, and raise user engagement and customer satisfaction.
- Analysing atypical behaviour (anomaly detection) makes it possible to identify suspicious cases and to predict and prevent fraud, account hacking, misuse of services, etc.
- The solution can be applied to analyse information such as potentially suspicious user activity, the risk profile of a particular user or tenant, and related information.
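A per-user or per-tenant risk profile could be derived by aggregating session-level anomaly scores. The sketch below assumes the model emits one `anomaly_score` per session in the range 0 to 1; the column names, the 0.8 threshold, the helper `risk_profile`, and the chosen statistics are hypothetical.

```python
import pandas as pd

# Hypothetical per-session output of the anomaly model: one row per session.
sessions = pd.DataFrame({
    "tenant": ["acme", "acme", "acme", "globex", "globex", "globex"],
    "user":   ["u1",   "u1",   "u2",   "u3",     "u3",     "u4"],
    "anomaly_score": [0.10, 0.92, 0.05, 0.40, 0.85, 0.20],
})
THRESHOLD = 0.8  # hypothetical cut-off for treating a session as suspicious
sessions["suspicious"] = sessions["anomaly_score"] >= THRESHOLD


def risk_profile(df, by):
    """Summarize session scores per group, riskiest groups first."""
    return (df.groupby(by)
              .agg(sessions=("anomaly_score", "count"),
                   suspicious=("suspicious", "sum"),
                   max_score=("anomaly_score", "max"),
                   mean_score=("anomaly_score", "mean"))
              .assign(suspicious_share=lambda t: t["suspicious"] / t["sessions"])
              .sort_values("max_score", ascending=False))


user_risk = risk_profile(sessions, ["tenant", "user"])
tenant_risk = risk_profile(sessions, "tenant")
print(tenant_risk)
```

Ranking by the maximum score surfaces tenants with even a single highly anomalous session; sorting by `suspicious_share` instead would favour tenants with persistent, lower-grade anomalies.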