Chris Kuo/Dr. Dataman

4.3K Followers

Published in

Dataman in AI

·Pinned

Handbook of Anomaly Detection: With Python Outlier Detection — (1) Introduction

Anomaly detection identifies rare events that deviate significantly from the majority of the data. These rare events do not conform to well-defined behavior. They are also called outliers, noise, novelties, or exceptions. Rare events can harm business operations and result in a significant…

Data Science

16 min read

Published in

Dataman in AI

·Pinned

Explain Your Model with the SHAP Values

Better Interpretability Leads to Better Adoption. Is your highly trained model easy to understand? A sophisticated machine learning algorithm can usually produce accurate predictions, but its notorious “black box” nature does not help adoption at all. Think about this: If you ask me to swallow a black pill without telling me…

Machine Learning

13 min read

Published in

Dataman in AI

·Pinned

Transfer Learning for Image Classification — (2) Pre-trained Image Models

Image classification is the task of recognizing the content of an image. It is also called image recognition. Computer scientists have been innovative in extracting meaning from images. The history of the field is fascinating, though most people don’t know much about it. For this reason, I am going to tell you the stories of innovation…

Data Science

13 min read

Published in

Dataman in AI

·Pinned

The SHAP Values with H2O Models

Many machine learning algorithms are complicated and not easy to understand, even though they achieve an impressive level of accuracy. As humans, we must be able to fully understand how decisions are being made so that we can trust the decisions of AI systems. We need ML models to…

Data Science

9 min read

Published in

Dataman in AI

·Pinned

Top Data Science Interview Questions and Answers

You receive a data science interview opportunity from your dream company. You have surveyed many articles of the top-50-questions type but still feel uncertain. Since there are already many similar articles, why do I dare to add another to this crowded topic? In this article, I rewrite many ordinary answers…

Data Science

18 min read

May 21

Understand ROUGE

In supervised learning, we use R-squared, ROC, precision-recall, or F-score to evaluate performance during model training. How is a Large Language Model evaluated? Large Language Models are Transformer-based models built on complex neural networks and fundamentally follow a supervised learning framework. They still apply the typical train-test-validation data split. The language…

Large Language Models

7 min read

May 9

Large Language Model Datasets

Why can Large Language Models (LLMs) answer questions, write book reports, draft notes, or summarize a document? An important reason is the data that they were trained or fine-tuned on. This post helps you understand the datasets that are widely used and well known in the LLM community. While you…

GPT

18 min read

Published in

Dataman in AI

·Jan 20

The Intuitions for the Discrete Distributions: Bernoulli, Binomial, Beta, Dirichlet Distributions

Machine learning uses many discrete distributions, such as the Bernoulli, Binomial, and Multinomial distributions, to solve problems. Two related continuous distributions, the Beta and Dirichlet distributions, are less well known but widely used in data science. The Dirichlet distribution is especially important in Natural Language Processing (NLP). The…

Data Science

16 min read

Published in

Dataman in AI

·Oct 9, 2022

Handbook of Anomaly Detection: With Python Outlier Detection — (11) XGBOD

In Chapter 1, we discussed how supervised learning can better target known outliers, while unsupervised learning can explore new types of outliers. Can we take advantage of both supervised and unsupervised learning? …

Data Science

10 min read

Published in

Dataman in AI

·Oct 9, 2022

Handbook of Anomaly Detection: With Python Outlier Detection — (6) OCSVM

Classification problems are often solved using supervised learning algorithms such as Random Forest, Support Vector Machine, Logistic Regression, and so on. Supervised learning algorithms require a known target to build a model. However, it is often the case that we only see normal data patterns but not rare events. The…

Data Science

10 min read

The Dataman articles are my reflections on data science and my teaching notes at Columbia University: https://sps.columbia.edu/faculty/chris-kuo
