Video tutorials

Videos, some accompanied by additional materials, with insights on building analytics with Power BI, PySpark and SQL. All resources can also be accessed through Matt's GitHub repos and YouTube channel.

Ingest into a Fabric Lakehouse using Power Query and visualize information with notebooks

We can do anything we like in Microsoft Fabric: even ingest into our Lakehouse using Power Query (a.k.a. Dataflows Gen2) and then perform our calculations and visualizations in a PySpark notebook.


Text Analytics for a Pizza Shop

This is a series of tutorials with accompanying report and notebook resources. In the relevant YouTube videos, I showcase how to perform sentiment analysis, text translation and key phrase extraction using Cognitive Services, Power BI and Azure Synapse Analytics. The end result is a Power BI report, intended to be used by the pizzeria owner to understand why sales change, based on customer feedback and the sentiment it indicates. The report, as well as the intermediate resources used to follow the tutorials, can be downloaded from the relevant GitHub repo.

1) Intro

This is the introduction to the tutorial series. The meat and potatoes are in the subsequent videos, as well as in the materials on GitHub.

2) Provision Azure Resources

With an Azure account as a prerequisite, the first step is to provision the Azure Synapse Analytics and Cognitive Services resources.

3) Spark notebook in Synapse

This step shows how to use Cognitive Services from within Synapse Analytics to perform automatic text translation, key phrase extraction and sentiment analysis. A text file serves as the input to the Spark notebook demonstrated in the video.
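As a rough sketch of what such a call involves under the hood: the Text Analytics sentiment endpoint accepts a JSON batch of documents, authenticated with a resource key. The helper below only assembles the request (the endpoint, key and texts are placeholders, and in the video the call is made through the Synapse integration rather than raw REST):

```python
# Sketch: assembling a sentiment-analysis request for the Cognitive Services
# Text Analytics REST API (v3.1). Endpoint and key below are placeholders --
# the tutorial itself drives this through Synapse Analytics.

def build_sentiment_request(endpoint: str, key: str, texts: list) -> dict:
    """Return the URL, headers and JSON body for a sentiment call."""
    return {
        "url": f"{endpoint}/text/analytics/v3.1/sentiment",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        "body": {
            # Each document needs an id, a language and the text itself.
            "documents": [
                {"id": str(i), "language": "en", "text": t}
                for i, t in enumerate(texts, start=1)
            ]
        },
    }

req = build_sentiment_request(
    "https://my-resource.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                       # placeholder key
    ["The pizza was great!", "Delivery took forever."],
)
```

The assembled request could then be sent with any HTTP client; the response scores each document as positive, neutral or negative.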

4) Creating the Power BI dataset

Next, we connect Power BI to the Azure Synapse workspace and Azure Data Lake. In addition to retrieving the output of the previous step, we call Cognitive Services from within Power BI and set up dataflows and incremental refresh. The end result is a Power BI dataset that can be used for reporting and visualizations.

5) Creating the report

We then work on the Power BI dataset to create the report's visuals and produce the final result.

6) Epilogue

Wrap-up and some thoughts on the value brought to organizations by the Cloud.

Download the resources from my GitHub repo.


Notebook tutorials

Slowly Changing Dimensions in Delta Lake

A tutorial in Jupyter Notebook format on how to create a Type 2 SCD in Spark / Databricks, using PySpark and Spark SQL.
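The notebook does this with PySpark and a Delta Lake MERGE, but the core Type 2 idea can be sketched in plain Python: when a tracked attribute changes, expire the current row and append a new current version. Column names here are illustrative, not the notebook's actual schema:

```python
from datetime import date

# Minimal sketch of Type 2 SCD logic in plain Python (the notebook itself
# uses PySpark and a Delta Lake MERGE). Each dimension row carries validity
# dates and a current-row flag; "key" and "attr" are illustrative columns.

def apply_scd2(dim_rows, incoming, today):
    """Expire changed current rows and append new versions."""
    current = {r["key"]: r for r in dim_rows if r["is_current"]}
    for new in incoming:
        old = current.get(new["key"])
        if old is None:
            # Brand-new key: insert as the current row.
            dim_rows.append({**new, "valid_from": today,
                             "valid_to": None, "is_current": True})
        elif old["attr"] != new["attr"]:
            # Attribute changed: close the old row, add a new version.
            old["valid_to"] = today
            old["is_current"] = False
            dim_rows.append({**new, "valid_from": today,
                             "valid_to": None, "is_current": True})
    return dim_rows

dim = [{"key": 1, "attr": "Athens", "valid_from": date(2020, 1, 1),
        "valid_to": None, "is_current": True}]
dim = apply_scd2(dim, [{"key": 1, "attr": "Patras"}], date(2023, 5, 1))
# dim now holds the expired "Athens" row plus a current "Patras" row.
```

In Delta Lake, the expire-and-insert step maps onto a single MERGE statement with matched and not-matched clauses, which is what the notebook walks through.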


Convert status logs to availability per day

A tutorial in Jupyter Notebook format on how to calculate the duration of service station outages within a day from status log files, in Spark / Databricks, using PySpark and Spark SQL.
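The essence of the calculation can be sketched in plain Python: sort the status changes, pair each DOWN with the next UP, and split any outage interval that crosses midnight across the days it touches. The notebook implements the same idea at scale in PySpark / Spark SQL; the event shape below is illustrative:

```python
from datetime import datetime, timedelta

# Sketch of the core calculation in plain Python (the notebook uses PySpark
# and Spark SQL). Given status-change events, compute outage seconds per day,
# splitting intervals that cross midnight.

def outage_seconds_per_day(events):
    """events: list of (timestamp, status) tuples, status 'UP' or 'DOWN'."""
    events = sorted(events)
    totals = {}
    down_since = None
    for ts, status in events:
        if status == "DOWN" and down_since is None:
            down_since = ts
        elif status == "UP" and down_since is not None:
            # Walk day by day from the start of the outage to its end,
            # attributing each slice to the calendar day it falls on.
            cur = down_since
            while cur < ts:
                next_midnight = datetime.combine(
                    cur.date() + timedelta(days=1), datetime.min.time())
                end = min(ts, next_midnight)
                totals[cur.date()] = (totals.get(cur.date(), 0.0)
                                      + (end - cur).total_seconds())
                cur = end
            down_since = None
    return totals

log = [
    (datetime(2023, 1, 1, 22, 0), "DOWN"),
    (datetime(2023, 1, 2, 1, 30), "UP"),
]
totals = outage_seconds_per_day(log)
# One outage spanning midnight: 2 hours attributed to Jan 1,
# 1.5 hours to Jan 2.
```

Daily availability then follows by dividing the remaining uptime seconds by 86,400 for each day.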


Complex metrics: Power BI vs. SQL

This notebook demonstrates the productivity gains of using DAX in Power BI (or SSAS) to calculate complex metrics, instead of T-SQL.