Anomaly Detection ETL

Tuesday, March 1, 2022
2 min read

Background

In late 2021, a telecommunications company with a market cap measured in billions of dollars retained my employer to make better use of an anomaly detection service, which identifies anomalous results in previously ingested data. The client's team used Databricks as a central component of its analytics platform and asked that the consulting team build on it as well.

Solution

My team designed a process that accomplished the following:

  1. Automated ingestion of data via structured API calls
  2. Transformation of the data in PySpark, executed as a Databricks job
  3. Loading of the results to external storage for consumption by a Power BI dashboard (a sketch of the full pipeline follows this list)
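
Below is a minimal sketch of what such a pipeline can look like as a Databricks notebook job, assuming a token-authenticated REST endpoint that returns JSON records. The URL, field names, credentials, and storage path are hypothetical placeholders rather than the client's actual configuration.

    # Hypothetical sketch of the ingest -> transform -> load flow.
    # The endpoint, schema, and output path below are placeholders.
    import requests
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # 1. Ingest: pull results from the (hypothetical) anomaly detection API.
    response = requests.get(
        "https://api.example.com/anomalies",              # placeholder URL
        params={"start": "2022-01-01", "end": "2022-01-31"},
        headers={"Authorization": "Bearer <token>"},      # token supplied at runtime
        timeout=30,
    )
    response.raise_for_status()
    records = response.json()                             # assumed: a list of JSON records

    # 2. Transform: clean and filter the records with PySpark.
    df = spark.createDataFrame(records)
    anomalies = (
        df.withColumn("detected_at", F.to_timestamp("detected_at"))
          .filter(F.col("is_anomaly"))
          .select("metric_name", "detected_at", "observed_value", "expected_value")
    )

    # 3. Load: write to external storage for the Power BI dashboard to read.
    (anomalies.write
        .mode("overwrite")
        .parquet("abfss://analytics@example.dfs.core.windows.net/anomalies/"))

Writing Parquet to external storage is only one option; the same job could just as well write a Delta table or push to a database, depending on what the dashboard connects to.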

Lessons Learned

  1. Parameterization of API calls can work, depending on the authentication method the API supports (see the sketch after this list)
  2. Databricks is a very powerful tool and something to focus on as I begin my career
  3. There is a very thin line that separates
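
On the first point: a Databricks job can take the query window as job parameters and pull credentials from a secret scope instead of hard-coding them, which works smoothly for token- or key-based authentication but less so for interactive schemes. The widget names, secret scope, and endpoint below are hypothetical.

    # Hypothetical example of a parameterized API call in a Databricks notebook.
    # dbutils is available implicitly inside Databricks; the widget names,
    # secret scope, and URL are placeholders.
    import requests

    start_date = dbutils.widgets.get("start_date")   # supplied as job parameters
    end_date = dbutils.widgets.get("end_date")
    api_key = dbutils.secrets.get(scope="etl", key="anomaly-api-key")

    # Bearer-token auth parameterizes cleanly; schemes that need a human in the
    # loop (e.g. browser-based OAuth consent) are much harder to automate.
    response = requests.get(
        "https://api.example.com/anomalies",         # placeholder URL
        params={"start": start_date, "end": end_date},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    response.raise_for_status()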

Process

I was first brought into the ISLE project through my experience as a teaching assistant for 36-200: Reasoning with Data in July 2017. After I had spent two semesters as a teaching assistant, Philipp brought me on as a research assistant to write relevant functionality for the platform.

The development process revolved around our goal of making ISLE a first-class data analytics platform with every functionality an analyst would need. Building on stdlib-js, Philipp and I developed both back-end implementations of and front-end interfaces to common statistical functionality. The new features were inspired by leading statistical tools, such as R and Minitab, and by the situations analysts would actually encounter.

I focused on developing the report editor. Starting from an open-source Markdown editor, I enhanced it and integrated it into the larger data explorer component. The report editor gave students the ability to document their analyses, drag and drop graphs and tables, format their findings using an interface inspired by Microsoft Word, and export the results as posters or reports.

Outcomes

Skills Used

  • JavaScript
  • Node.js
  • Statistical Pedagogy
  • Software Engineering
  • Networking and Signal Processing
  • Integration of Feedback

Resources