<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8" />
		<meta name="viewport" content="width=device-width" />
		<title>My static Space</title>
		<link rel="stylesheet" href="style.css" />
		<link href="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&amp;display=swap" rel="stylesheet">
		<link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;600;700&amp;display=swap" rel="stylesheet">
	</head>
	<body>
      <div class="container">
        <div>
          <h1>MLTest</h1>
          <p>
            This is a demo of MLTest on the dataset
            <a href="https://huggingface.co/datasets/marmal88/skin_cancer"><code>marmal88/skin_cancer</code></a>.
            Each image is labeled with one of three kinds of skin cancer.
          </p>
          <p>
            Five models were tested on this dataset: two Swin Transformer variants, ViT, ResNet, and BEiT. The test results for each model can
            be inspected in the dashboard below.
          </p>
          <p>
            <b>Performance tests</b>:
            to measure how well each model performs, we compute common performance metrics such as accuracy,
            precision, recall, and F1 score.
          </p>
          <p>
            <b>Failure clusters</b>:
            these clusters show where the model is failing and can be inspected in the "Failure Clusters" tab.
            They are detected automatically
            for different combinations of metadata.
            For example, the BEiT model has significantly lower accuracy on images of cancers on the back with class label <code>0</code>.
          </p>
          <p>
            <b>Robustness</b>: these tests help ML developers evaluate how well their model performs under different conditions,
            such as varying levels of brightness, compression, and other types of interference.
          </p>
          <p>
            The following robustness tests were enabled for this test case:
          </p>
          <ul>
            <li>Brightness</li>
            <li>CompressImage</li>
            <li>Contrast</li>
            <li>DarkSpots</li>
            <li>GaussianBlur</li>
            <li>GaussianNoise</li>
            <li>Glare</li>
            <li>GlassBlur</li>
            <li>HorizontalFlip</li>
            <li>MedianBlur</li>
            <li>MotionBlur</li>
            <li>OilSpots</li>
            <li>Perspective</li>
            <li>VerticalFlip</li>
          </ul>


          <p>
            The full list of transforms supported by MLTest can be found in the <a href="https://docs.lakera.ai/configuration/robustness">documentation</a>.
          </p>
          <p>
            <b>Fairness tests</b>: TODO
          </p>
        </div>
        
        <iframe src="https://hf.lakera.ai/search"></iframe>
      </div>
	</body>
</html>