Anomaly Detection ETL

Tuesday, March 1, 2022

Background

In late 2021, a telecommunications company with a market capitalization in the billions of dollars retained my employer to make better use of an anomaly detection service, which flags anomalous results in previously ingested data. The client's team was already using Databricks as a central component of its analytics platform and asked the consulting team to use it as well.

Solution

My team designed a process that accomplished the following:

  1. Automated ingestion of data via structured API calls
  2. Data transformation operations, written in PySpark and executed as a Databricks job
  3. Data loaded into external storage for consumption by a Power BI dashboard
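The three steps above can be sketched end to end as extract, transform and load functions. The project's transformations were written in PySpark and run as a Databricks job. This sketch uses only the Python standard library so it runs without a Spark cluster. The response shape is hypothetical (a `results` array with `timestamp`, `metric`, `value` and `is_anomaly` fields), and so are the sample values, because the source does not describe the service's payload. An in-memory buffer stands in for the external storage that Power BI read from.

```python
import csv
import io
import json
from typing import Iterable

# Hypothetical API response: field names and values are made up for
# illustration; the real service's payload is not described in the source.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"timestamp": "2021-11-01T00:00:00Z", "metric": "latency_ms", "value": 42.0, "is_anomaly": False},
        {"timestamp": "2021-11-01T00:05:00Z", "metric": "latency_ms", "value": 910.5, "is_anomaly": True},
        {"timestamp": "2021-11-01T00:10:00Z", "metric": "dropped_calls", "value": 17.0, "is_anomaly": True},
    ]
})

# Columns exposed to the dashboard (assumed, not from the source).
FIELDS = ["timestamp", "metric", "value"]


def extract(raw_json: str) -> list[dict]:
    """Step 1: parse the API response body into a list of records."""
    return json.loads(raw_json)["results"]


def transform(records: Iterable[dict]) -> list[dict]:
    """Step 2: keep only flagged anomalies and project the reporting columns."""
    return [
        {field: rec[field] for field in FIELDS}
        for rec in records
        if rec.get("is_anomaly")
    ]


def load(rows: list[dict], sink: io.TextIOBase) -> None:
    """Step 3: write rows as CSV, a format Power BI can import."""
    writer = csv.DictWriter(sink, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for external storage
    load(transform(extract(SAMPLE_RESPONSE)), buffer)
    print(buffer.getvalue())
```

In PySpark, the filter and projection in `transform` would correspond roughly to `DataFrame.filter` and `DataFrame.select`. The load step would be handled by a `DataFrame.write` call in whatever format the dashboard's storage expects.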

Lessons Learned

  1. Parameterization of API calls can work, depending on the authentication method the API supports
  2. Databricks is a very powerful tool and something to focus on as I begin my career
  3. There is a very thin line that separates
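The first lesson can be illustrated with a minimal sketch of parameterized API calls. It assumes the service accepts a static credential that can be attached to every request, such as an API key header or a bearer token. The endpoint URL, the `x-api-key` header name and the query parameters are all hypothetical. The requests are built but never sent, so the example runs offline.

```python
from dataclasses import dataclass
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real service's URL is not given in the source.
BASE_URL = "https://anomaly-api.example.com/v1/results"


@dataclass
class ApiKeyAuth:
    """Static API key sent in a header (header name is an assumption)."""
    key: str
    header: str = "x-api-key"

    def apply(self, headers: dict) -> None:
        headers[self.header] = self.key


@dataclass
class BearerAuth:
    """Pre-issued bearer token sent in the Authorization header."""
    token: str

    def apply(self, headers: dict) -> None:
        headers["Authorization"] = f"Bearer {self.token}"


def build_request(params: dict, auth) -> Request:
    """Build (but do not send) a GET request for one parameter set."""
    headers = {"Accept": "application/json"}
    auth.apply(headers)
    # Sort parameters so the same inputs always produce the same URL.
    url = f"{BASE_URL}?{urlencode(sorted(params.items()))}"
    return Request(url, headers=headers, method="GET")


def build_requests(param_grid: list[dict], auth) -> list[Request]:
    """One request per parameter set, e.g. one per metric or date window."""
    return [build_request(p, auth) for p in param_grid]
```

Authentication sits behind a small `apply` method, so the parameter grid stays the same when the authentication method changes. A flow that needs an interactive login could not be plugged in this way. That is one plausible reading of why parameterization depends on the authentication method.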

Process

Outcomes

Skills Used

  • Statistical Pedagogy
  • Software Engineering
  • Networking and Signal Processing
  • Integration of Feedback

Resources