I am an amateur cook so I always subscribe to the youtube channels of professional chefs. I need the Step 1–2–3 guide to make sure my dish is delicious. I guess there is value to have a tour guide when you hike the Himalayas. Because I have written a series of articles on data science, here I organize them along the data science journey. Either you are a data scientist or a project manager, this guide will help you deliver a successful outcome.


Once your machine learning model is built, you are ready to deploy it for real use. Do you know there are requirements that you should have considered early in data pre-processing and modeling? I call them the modeling through the lens of model deployment. The failure to incorporate those requirements may result in irreversible errors. The purpose of this article is two fold: It will show you how to incorporate those requirements to support an “error-free” prediction app; it also will show you how easy it is to deploy your model with Python Streamlit. So read on! …


I have written a series of articles on the techniques for machine learning modeling with extremely imbalanced target data, and in such case the ROC curve is not a sensitive measure. However, I believe it will be helpful to showcase a model prediction example. In this article I will show you

  • How to construct the ROC curve from the confusion matrix,
  • Why the ROC curve is not sensitive enough when the target is extremely imbalanced, and
  • How to read a Precision-recall Curve.

I also build the Python code snippets for those of you who may be interested.

(1) What Is…


Time series modeling connects the dots in the starry night; time series forecasting extends the dreams as the breeze brushes the stars into trails. You and I, standing in awe, look at the blue summer night.

In Part I and II of the series “Time Series with Zillow’s Luminaire”, I have walked you through the data exploration and model specification Steps, now we are ready for modeling!


In “Monitor Your Machine Learning Model Performance” I demonstrated what to monitor for the performance of your machine learning model. In this article I will show you both the design concept for a monitoring dashboard, and how easy it can be built with Python Streamlit Dashboard. If you are an R programmer, you may find the sister article “Build an R Shiny Dashboard to Monitor Your Model Performance” helpful.

Why Python Streamlit?

I love Streamlit after I tried it out to deploy my models. I like its smart code and its neat interface. My productivity seems boosted ten-fold (or at…


Before you train your time series model, you need to specify the model. What is the frequency of the time series? Where are the change points? Do we need a logarithmic transformation? Should some of the data points be truncated? Does it need flags for holidays? Should you apply the Fourier transformation, with what frequency? Overall, what is the best modeling algorithm for the particular time series? These are not trivial questions, isn’t it?

Luminaire offers the capacity to search through these questions for the optimal specifications so you can “hand off” to Luminaire. All you need to do is…


Last year we all worked from home due to the COVID-19 pandemic. I was working on my work laptop for company projects, and the personal laptop for machine learning hobbies. After several months, I realized the remote working life may become longer. The immediate need is a big computer monitor. So I bought a 27-inch monitor for $170. Nice! But I really do not want a bulky personal desktop in the future. Why don’t I try mini-PC or a Raspberry Pi? Will a mini-PC get me the same or better performance? …


Suppose you wear an iWatch to monitor your heart rate. You run for a quarter mile, walk for ten minutes, then run for another quarter mile. The heart rate data will show a cluster of high heart rate, following by a cluster of low heart rate, then back to high rate like Figure 1. When a data analyst analyzes the heart rate data, the changes in the time series reveal there are changes in the underlying activities. Similarly, the heart rate of a patient in an Intensive Care Unit (ICU) is monitored. It is critical to detect any changes in…


I sit at my local beach in the midnight, enjoying the big moon casting sparkling lights on the ocean. The wavy moonlight path looks to me like a time series path, not always smooth but traceable. It shows me tranquility and serendipity.

I wrote a few articles on time series forecasting and anomaly detection. The Luminaire by the Zillow Tech Hub is the next one that I want to write an in-depth introduction. Are you doing time series forecasting and outlier detection now or in the near future? After reading this article, you will be running your time series model…


The local beach is not far from where I live, so sometimes I go there to enjoy my solitude. This day I was meditating on my tomorrow’s lecture on Kalman Filter. “It could be a challenging concept for some students”, I talked to myself. I sit on a large rock, felt the gentle breeze, and soaked in the warm sunset. I watched the seagulls flying over the majestic sunset, casting fuzzy reflections on the sparkling ocean. Suddenly I have an indescribable joyful moment — the line of the flying seagull in the sky and the fuzzy reflections on the ocean

Dr. Dataman

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store