Orange is a visual programming environment for data science and machine learning projects. In Orange, users drag and drop LEGO-like components to build a complete solution for their projects, covering data manipulation and visualization as well as model building, training, and validation. This post walks through developing and comparing different models for binary classification of normal versus pneumonia chest X-ray images in Orange.

Figure 1. Orange Screenshot — widget (left), workflow (middle), image viewer (right).

Here is a brief summary of the data, models, and results:

  • 1341 normal chest X-ray images and 3875 pneumonia chest X-ray images;
  • 66% for training and 34% for validation;
  • Inception v3 for feature extraction and Multi-Layer Perceptron for feature classification together…
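To make the 66%/34% split concrete, here is a minimal sketch in plain Python. It assumes the split is done per class (stratified) with a fixed random seed; the post does not say how Orange sampled the images, so `stratified_split` and its parameters are illustrative names only.

```python
import random

def stratified_split(labels, train_fraction=0.66, seed=0):
    """Split indices into train/validation sets, class by class.

    Assumption: splitting each class separately keeps the
    normal/pneumonia ratio similar in both subsets.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(val_idx)

# Class counts taken from the list above.
labels = ["normal"] * 1341 + ["pneumonia"] * 3875
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), "training images,", len(val_idx), "validation images")
```

The feature vectors produced by Inception v3 would then be indexed with `train_idx` and `val_idx` before fitting and scoring the classifier.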

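This post lists ten datasets for AI applications in healthcare, each with a problem description and a solution outline. The data types include numeric and text tabular data, time series data, natural language data, image data, audio data, and graph network data.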

Doctor Strange

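1. Heart disease dataset

  • Reference link:
  • Problem description: decide whether a patient has heart disease based on 13 indicators such as age, sex, chest pain type, resting blood pressure, cholesterol level, …
  • Data: a single numeric tabular file, 297 records, 14 columns; with heart disease: 137, without heart disease: 160
  • Solution outline: supervised learning, binary classification
  • Related software: Scikit-Learn, XGBoost, Keras, PyTorch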
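As a quick sanity check on the class counts above (137 with heart disease, 160 without), the sketch below computes the accuracy of a trivial classifier that always predicts the majority class. Any model trained on this dataset should beat that number; the variable names are illustrative only.

```python
# Class counts quoted in the dataset description above.
counts = {"heart disease": 137, "no heart disease": 160}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)

# Accuracy obtained by always predicting the majority class.
baseline_accuracy = counts[majority_label] / total

print(f"{total} records; always predicting '{majority_label}' "
      f"gives accuracy {baseline_accuracy:.3f}")
```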

2. Stroke dataset

  • Reference link

Figure 1. Supervised Learning, Semi-Supervised Learning, and Unsupervised Learning. [credit]

Semi-supervised learning is an approach to machine learning that uses both labeled and unlabeled data to solve a problem. It falls between supervised learning (which works with labeled data) and unsupervised learning (which works with unlabeled data). [wiki]

Let’s take image classification under supervised learning and semi-supervised learning as an example.

  • Image classification under supervised learning trains a machine learning model or a deep learning model using labeled images, then evaluates the model by comparing its predictions on a test dataset against the true labels.
  • Image classification under semi-supervised learning trains a machine learning model or a deep learning model…
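One common way to use unlabeled data is pseudo-labeling (self-training). The sketch below is a toy illustration only, not necessarily the method used in this post: it assumes 2-D feature vectors and a nearest-centroid classifier, and it skips the confidence threshold that practical implementations usually apply. All function names are hypothetical.

```python
def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(points, labels):
    """Compute one centroid per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: centroid(ps) for y, ps in groups.items()}

def predict(centroids, p):
    """Return the label of the nearest centroid."""
    return min(centroids, key=lambda y: sq_dist(centroids[y], p))

def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    """Fit on labeled data, pseudo-label the unlabeled data, refit."""
    x, y = list(labeled_x), list(labeled_y)
    centroids = fit_centroids(x, y)
    for _ in range(rounds):
        pseudo = [predict(centroids, p) for p in unlabeled_x]
        centroids = fit_centroids(x + list(unlabeled_x), y + pseudo)
    return centroids

# Toy data: two labeled points and five unlabeled points.
labeled_x = [(0.0, 0.0), (4.0, 4.0)]
labeled_y = ["a", "b"]
unlabeled_x = [(0.5, 0.2), (1.0, 0.8), (3.5, 4.2), (3.0, 3.1), (0.2, 1.0)]

model = self_train(labeled_x, labeled_y, unlabeled_x)
print(predict(model, (1.0, 1.0)), predict(model, (3.0, 3.5)))
```

With image data, the 2-D tuples would be replaced by feature vectors from a trained network, and the centroid classifier by whatever model is being trained.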

Figure 1. Barking and Meowing (cartoon, artist)

After presenting studies of clustering and visualizing a dog/cat image dataset, it is worth examining tasks for a dog/cat audio dataset. This post explores some design issues in developing and evaluating machine learning approaches for audio classification on the Audio Cats and Dogs dataset on Kaggle. Because this is a small and imbalanced audio dataset (164 WAV files of cats and 113 WAV files of dogs, with durations between 0.9 and 17.9 seconds), this post focuses on three major issues in audio classification: feature extraction, data augmentation, and model selection. We will also compare and contrast the performance of four…
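As a rough illustration of data augmentation for audio, here is a minimal sketch of two simple waveform transforms, a circular time shift and additive Gaussian noise. These are assumptions chosen for illustration; the excerpt does not say which augmentations the post actually uses, and the function names are hypothetical.

```python
import random

def time_shift(samples, shift):
    """Circularly shift a waveform to the right by `shift` samples."""
    shift %= len(samples)
    if shift == 0:
        return list(samples)
    return samples[-shift:] + samples[:-shift]

def add_noise(samples, noise_level, rng):
    """Add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_level) for s in samples]

rng = random.Random(0)

# A tiny made-up waveform standing in for a real WAV clip.
clip = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
augmented = add_noise(time_shift(clip, 2), noise_level=0.01, rng=rng)

# Extra dog clips needed if augmentation were used to match the cat count
# (an assumption; balancing is only one possible use of augmentation).
extra_dog_clips = 164 - 113
print(len(augmented), "samples;", extra_dog_clips, "extra dog clips needed")
```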

Deep Blue vs Garry Kasparov in 1997 and AlphaGo vs Lee Sedol in 2016 were two critical moments in the history of artificial intelligence. The first was a six-game chess match, which Deep Blue won 3.5–2.5. The second was a five-game Go match, which AlphaGo won 4–1. The two matches were nearly twenty years apart. However, they share one interesting coincidence: both computers made their most amazing moves on Move 37 in Game 2. Figure 1 shows the positions in the games.

Figure 1. 18 images from MNIST as the normal data and 2 images from Fashion-MNIST as the anomaly data.

Image anomaly detection appears in many real-life applications, for example, examining abnormal conditions in medical images or identifying product defects on an assembly line. In this post, we set up our own case to explore the process of image anomaly detection using a convolutional autoencoder under the paradigm of unsupervised learning. Here is a brief summary of the data, the task, the solution, and the evaluation criteria.
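Autoencoder-based anomaly detection usually scores each image by its reconstruction error and flags the images with unusually large errors. The sketch below shows only that scoring step, under stated assumptions: the error values are made-up toy numbers (18 "normal" and 2 "anomalous", mirroring Figure 1), and the mean-plus-two-standard-deviations threshold is one common heuristic, not necessarily the criterion used in this post.

```python
import statistics

def mse(original, reconstructed):
    """Mean squared error between an image and its reconstruction,
    both given as flat lists of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def flag_anomalies(errors, threshold):
    """Return indices of images whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]

# Made-up reconstruction errors for illustration only.
errors = [0.010] * 18 + [0.090, 0.120]

# Heuristic threshold (an assumption): mean + 2 * standard deviation.
threshold = statistics.mean(errors) + 2 * statistics.pstdev(errors)

flagged = flag_anomalies(errors, threshold)
print("flagged image indices:", flagged)
```

In a real pipeline, each entry in `errors` would come from calling `mse` on an input image and the autoencoder's output for that image.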

a case study on object detection in image recognition

My last post, “Exploring OpenCV’s Deep Learning Object Detection Library”, reviewed SSD/MobileNet and YOLOv2 under the OpenCV 3.4.1 Deep Neural Network Module for object detection. It also showed some examples of detections by these two models (see Figure 1 and Figure 2).

Figure 1.

a case study on classifying surface defects in hot-rolled steel strips

Automated Optical Inspection is commonly used in the electronics and manufacturing industries to detect defects in products or components during production. Conceptually, common deep learning practices for image classification, object detection, and semantic segmentation could all be applied to Automated Optical Inspection. Figure 1 shows some common tasks in image recognition, and Figure 2 shows some examples of surface defects in steel parts for cross-reference.

Figure 1. Different tasks in image recognition. [source]

Will different models create different results? (Mission Impossible 6)

Deep learning for object detection in images and video has recently become more accessible to practitioners and programmers. One reason for this trend is the introduction of new software libraries, for example, the TensorFlow Object Detection API, the OpenCV Deep Neural Network Module, and ImageAI. These libraries have one thing in common: they all integrate many deep-learning object-detection models. As a result, their users can access many pre-trained models and pick the one that best meets their needs. However, evaluating different models (even within the same library) might not be an easy task.

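[2019.06 Update] The text below was written in October 2018. Here are some developments on this topic since the post was published: 1. AIdea held a technical sharing session in December 2018, and the video is here for reference; 2. Many schools have used this problem as a course project, and the link is here for reference; 3. AIdea has listed the original problem as a long-running open competition, so the accuracy has kept climbing since. A reminder to anyone working on this problem: in real production-line applications, the over-rejection (overkill) rate and the miss (escape) rate may matter more than accuracy. A similar situation may arise when interpreting medical images, so in practice the evaluation metrics need to be chosen carefully.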

[2018.10 Original] Last month I took part in an online data science and machine learning competition on Automated Optical Inspection (AOI) …
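The 2019.06 update above points out that the over-rejection rate (過檢率) and the miss rate (漏檢率) can matter more than accuracy on a production line. The sketch below shows how the three can diverge. The definitions used here are assumptions, since factories define these rates in different ways: over-rejection rate is taken as the fraction of good parts flagged as defective, and miss rate as the fraction of defective parts passed as good. The counts are made up for illustration.

```python
def inspection_metrics(tp, fp, fn, tn):
    """Compute metrics from a confusion matrix where 'positive' means defective.

    tp: defective parts correctly rejected
    fp: good parts wrongly rejected (over-rejection)
    fn: defective parts wrongly passed (missed defects)
    tn: good parts correctly passed
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "over_rejection_rate": fp / (fp + tn),
        "miss_rate": fn / (fn + tp),
    }

# Made-up counts: 1000 inspected parts, 50 of them defective.
metrics = inspection_metrics(tp=40, fp=20, fn=10, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the accuracy looks high while one in five defective parts still slips through, which is the kind of gap the update warns about.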

