[ { "page_content": "Text: SGIENTIFICREP\n\nOPEN\n\nAtypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder\n\nGeraldine Dawson ${ }^{1}$, Kathleen Campbell², Jordan Hashemi ${ }^{1,3}$, Steven J. Lippmann@4, Valerie Smith ${ }^{4}$, Kimberly Carpenter ${ }^{1}$, Helen Egger ${ }^{5}$, Steven Espinosa ${ }^{3}$, Saritha Vermeer ${ }^{1}$, Jeffrey Baker ${ }^{6}$ \\& Guillermo Sapiro ${ }^{3,7}$\n\nEvidence suggests that differences in motor function are an early feature of autism spectrum disorder (ASD). One aspect of motor ability that develops during childhood is postural control, reflected in the ability to maintain a steady head and body position without excessive sway. Observational studies have documented differences in postural control in older children with ASD. The present study used computer vision analysis to assess midline head postural control, as reflected in the rate of spontaneous head movements during states of active attention, in 104 toddlers between 16-31 months of age (Mean $=22$ months), 22 of whom were diagnosed with ASD. Time-series data revealed robust group differences in the rate of head movements while the toddlers watched movies depicting social and nonsocial stimuli. Toddlers with ASD exhibited a significantly higher rate of head movement as compared to non-ASD toddlers, suggesting difficulties in maintaining midline position of the head while engaging attentional systems. The use of digital phenotyping approaches, such as computer vision analysis, to quantify variation in early motor behaviors will allow for more precise, objective, and quantitative characterization of early motor signatures and potentially provide new automated methods for early autism risk identification.\n\n\nContext: Early motor signatures in autism spectrum disorder, specifically atypical postural control in toddlers, as detected through computer vision analysis.", "metadata": { "doc_id": "Dawson_0", "source": "Dawson" } }, { "page_content": "Text: Although the core symptoms of autism spectrum disorder (ASD) are defined by atypical patterns of social interaction and the presence of stereotyped and repetitive behaviors and interests, evidence suggests that differences in motor function are also an important early feature of autism. Motor delays could contribute to early hallmark autism symptoms, including difficulties in orienting to name involving the eyes and head turns, coordinating head and limb movements involved in gaze following and other joint attention behavior, such as pointing. Teitelbaum et al. ${ }^{1}$ found that atypical movements (e.g. shape of mouth, patterns of lying, righting, sitting) were present by 4-6 months of age in infants later diagnosed with ASD. Another study of videotapes taken of infants 12-21 weeks of age detected lower levels of positional symmetry among infants later diagnosed with ASD ${ }^{2}$ suggesting atypical development of cerebellar pathways that control balance and symmetry. Six-month-old infants who later were diagnosed with ASD tend to exhibit head lag when pulled to sit, reflecting early differences in motor development ${ }^{3}$. A study of home videos taken between birth and six months of age found that some infants who were later diagnosed with ASD showed postural stiffness, slumped posture, and/or head lag ${ }^{4}$. 
Other motor symptoms observed in infants later diagnosed with ASD include fluctuating muscle tone ${ }^{5}$ and oral-motor abnormalities, such as insufficient opening of the mouth in anticipation of the approaching spoon during feeding ${ }^{6}$. Longitudinal research with very low birth weight infants revealed that infants who are later diagnosed with ASD had poorer ability in maintaining midline position of the head at 9-20 weeks of age ${ }^{7}$. The authors used visual\n\n\nContext: Early motor differences in autism spectrum disorder (ASD) and their potential contribution to core autism symptoms.", "metadata": { "doc_id": "Dawson_1", "source": "Dawson" } }, { "page_content": "Text: [^0] [^0]: ${ }^{1}$ Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA. ${ }^{2}$ University of Utah, Salt Lake City, Utah, USA. ${ }^{3}$ Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA. ${ }^{4}$ Department of Population Health Sciences, Duke University, Durham, North Carolina, USA. ${ }^{5}$ NYU Langone Child Study Center, New York University, New York, New York, USA. ${ }^{6}$ Department of Pediatrics, Duke University, Durham, NC, USA. ${ }^{7}$ Departments of Biomedical Engineering, Computer Science, and Mathematics, Duke University, Durham, NC, USA. Correspondence and requests for materials should be addressed to G.D. (email: geraldine.dawson@duke.edu)\n\ninspection to classify head position in each video frame to yield a measure of midline head position and number of changes in position.\n\nThe development of postural control is an index of neuromuscular reactions to the motion of body mass in order to retain stability. Previous studies have documented the developmental progression of the ability to maintain an upright posture that is accompanied by decreases in postural sway^{8}. Several studies with older children with ASD have documented deficiencies in postural control, reflected in the presence of postural sway, which is accentuated when children with ASD are viewing arousing stimuli, including complex multi-sensory and social stimuli^{9--11}. Less is known about the presence of postural sway in young children with ASD.\n\n\nContext: Affiliations of authors and contact information, followed by a discussion of postural control development and its relevance to autism spectrum disorder.", "metadata": { "doc_id": "Dawson_2", "source": "Dawson" } }, { "page_content": "Text: Studies of motor and other behaviors in young children have typically relied on subjective and labor-intensive human coding to rate and measure behavior. The recent use of digital phenotyping approaches, such as computer vision analysis (CVA) of videotaped recordings of behavior, has allowed for automated, precise and quantitative measurement of subtle, dynamic differences in motor behavior. We reported previously on a result using CVA to more precisely measure toddlers' orienting response to a name call, noting that, compared to toddlers without ASD, toddlers with ASD oriented less frequently; when they did orient, their head turn was a full second slower, on average^{12}. Such differences in motor speed would likely not be detected with the naked eye during a typical clinical evaluation. 
Anzulewicz et al.^{13} used smart tablet computers with touch-sensitive screens and embedded inertial movement sensors to record movement kinematics and gesture forces in 3--6-year-old children with and without ASD. Children with ASD used greater force and faster and larger gesture kinematics. Machine learning analysis of the children's motor patterns classified the children with ASD with a high level of accuracy. In another study using automated methods, differences in head movement dynamics were found between 2.5--6.5-year-old children with and without ASD while they watched movies of social and nonsocial stimuli. Children with ASD showed more frequent head turning, especially while watching social stimuli^{14}. The authors suggested that the children with ASD might be using head movement to modulate their arousal while watching social stimuli. Wu et al.^{15} used electromagnetic sensors to analyze continuous movements at a millisecond time scale in older children with ASD versus typical development. They applied a triangular smoothing algorithm to the 3D positional raw movement data that preserved the local speed fluctuations. They found that individuals with ASD exhibited significantly more “sensorimotor noise” when compared\n\n\nContext: Methods for analyzing motor behaviors in autism spectrum disorder (ASD) are evolving beyond traditional subjective human coding.", "metadata": { "doc_id": "Dawson_3", "source": "Dawson" } }, { "page_content": "Text: algorithm to the 3D positional raw movement data that preserved the local speed fluctuations. They found that individuals with ASD exhibited significantly more “sensorimotor noise” when compared to individuals with typical development.\n\n\nContext: Computer vision analyses of movement data in individuals with ASD.", "metadata": { "doc_id": "Dawson_4", "source": "Dawson" } }, { "page_content": "Text: The present study used CVA to characterize head movements that did not involve spontaneous or volitional orienting or turning away from the stimuli. Rather, we were interested in subtler midline head movements that are more likely related to postural stability. The study compared the behaviors of toddlers with ASD versus those without ASD while the children watched a series of dynamic movies involving different types of stimuli, including stimuli of both a social and nonsocial nature. While the children watched the movies, their head movements were automatically detected and tracked using landmarks on the participant's face. The goal of this analysis was to quantify the rate of spontaneous head movements and to determine whether there were differences in this motor feature between young children with and without ASD.\n\nMethods\n\nParticipants. Participants were 104 children between 16--31 months of age (Mean = 22 months). Exclusionary criteria included known vision or hearing deficits, lack of exposure to English at home, and/or caregivers who did not speak and read English sufficiently for informed consent. Twenty-two of the children had autism spectrum disorder. The non-ASD comparison group comprised 74 typically developing children and 8 children with language delay or developmental delay of clinical significance sufficient to qualify for speech or developmental therapy. Participants in the comparison group had a mean age of 21.91 months (SD = 3.78) and those in the ASD group had a mean age of 26.19 months (SD = 4.07). 
Ethnic/racial composition of the ASD and comparison groups, respectively, was 59% and 45% white, 13% and 14% African American, 6% and 5% Asian, and 22% and 36% multi-racial/other. The percentage of males was 77% in the ASD group and 59% in the comparison group.\n\n\nContext: Within the \"Results\" section, describing the study's methodology for analyzing head movements in toddlers with and without ASD while watching movies.", "metadata": { "doc_id": "Dawson_5", "source": "Dawson" } }, { "page_content": "Text: Participants were recruited from primary care pediatric clinics by a research assistant, by referral from their physician, and by community advertisement. All caregivers/legal guardians of participants gave written, informed consent, and the study protocol was approved by the Duke University Health System Institutional Review Board. Methods were carried out in accordance with institutional, state, and federal guidelines and regulations.\n\nDiagnostic Assessments. Diagnostic evaluations to confirm ASD were based on the Autism Diagnostic Observation Schedule-Toddler (ADOS-T) and were conducted by a licensed psychologist or a trained, research-reliable examiner overseen by a licensed psychologist^{16}. The mean ADOS-T score was 18.81 (SD = 4.20). The mean IQ based on the Mullen Scales of Early Learning Composite Score for the ASD group was 63.58 (SD = 25.95). Developmental and/or language delay was determined based on the Mullen Scales (>1 SD below the mean in overall learning composite or receptive/expressive language).\n\n\nContext: Study methods, including participant recruitment, consent procedures, and diagnostic assessments.", "metadata": { "doc_id": "Dawson_6", "source": "Dawson" } }, { "page_content": "Text: Stimuli. A series of stimuli, consisting of brief movies, was shown on a smart tablet while the child sat on a caregiver's lap. The tablet was placed on a stand approximately 3 feet away from the child to prevent the child from touching the screen. The stimuli consisted of a series of brief developmentally-appropriate movies designed to elicit positive affect and engage the child's attention. The movies consisted of cascading bubbles, a mechanical bunny, animal puppets interacting with each other, and a split screen showing on one side a woman singing nursery rhymes and on the other side dynamic, noise-making toys. The lengths of the movies were 30 seconds (Bubbles), 60 seconds (Rhymes), and ∼70 seconds (Bunny and Puppets). Each movie was shown once except for Bubbles, which was shown at the beginning and end of the series. The entire series of movies lasted 5 minutes. Examples of the stimuli and experimental setup are presented in Fig. 1 and described in two previous publications^{17}. Examples of clips from the movies are provided in the Supplementary Material. During three of the movies, the examiner, standing behind the child, called the child's name. A failure to orient to name is an early\n\n\nContext: Experimental methods and materials used in the study, specifically detailing the stimuli presented to children during the study.", "metadata": { "doc_id": "Dawson_7", "source": "Dawson" } }, { "page_content": "Text: img-0.jpeg\n\nFigure 1. iPad movie task and facial landmark detection: (A) Two examples of facial landmark points detected by CVA and estimated head pose (indicated by the three arrows). The landmarks colored in red are the inner left, inner right, and central nose landmarks that are used for head movement computation. 
The left example depicts the landmarks and head pose of a participant engaged with the movie stimuli, while in the right example, the participant is looking away. Both states are automatically detected. (B) Example frames from the movie stimuli. Each row displays a frame from the corresponding movie stimulus shown in the columns (from left to right): Bubbles (30 seconds, two repetitions), Bunny (66 seconds), Rhymes (60 seconds), and Puppet show (68 seconds). symptom of autism, and the results of our analysis of orienting have previously been published^{12}. However, all segments when children looked away from the movie, including to orient to name, as well as all 5-second segments following the name-call stimulus, were automatically removed from the present analyses. Specifically, in order to remove any influence on head movement due to the child orienting when his or her name was called, we removed the time window starting at the cue for the name-call prompt (a subtle icon used to prompt the examiner to call the name) through the point where 75% of the audible name calls actually occurred, plus 150 frames (5 seconds). Since previous studies have shown that orienting tends to occur within a few seconds after a name call, this eliminated segments influenced by the name call.\n\nParents were asked to attempt to keep the child seated in their lap, but to allow the child to get off their lap if the child became too distressed to stay seated. Researchers stopped the task for one child due to crying. Researchers restarted the task for three participants due to noncompliance.\n\n\nContext: Methods section describing the iPad movie task, facial landmark detection, and data exclusion procedures.", "metadata": { "doc_id": "Dawson_8", "source": "Dawson" } }, { "page_content": "Text: Computer Vision Analysis. The frontal camera in the tablet recorded video of the child's face throughout the experiment at 1280 × 720 spatial resolution and 30 frames per second. The fully automatic CVA algorithm detects and tracks 49 facial landmarks on the child's face (see Fig. 1)^{18} and estimates head pose angles relative to the camera by computing the optimal rotation parameters between the detected landmarks and a 3D canonical face model^{19}. For each video frame the algorithm outputted the 2D positional coordinates of the facial landmarks and 3 head pose angles: yaw (left-right), pitch (up-down), and roll (tilting left-right). The yaw head pose angle was used to determine the frames when the child was engaged with the movie stimuli, with frames exhibiting a yaw magnitude of less than 20° considered to indicate that the child was engaged.\n\n\nContext: Methods section describing the computer vision analysis performed on video recordings of children's faces during an experiment.", "metadata": { "doc_id": "Dawson_9", "source": "Dawson" } }, { "page_content": "Text: Following previous work^{17}, to quantify head movement when the child is engaged (less than 20° yaw), per-frame pixel-wise displacements of 3 central facial landmarks were computed and normalized with respect to the child's eye width; thus, head movement was measured as a (normalized) proportion of the child's eye width per frame. The pixel-wise displacements of the central facial landmarks depend on the child's distance from the camera in the tablet. 
Although the tablet was placed approximately 3 feet away from the child at the start of the experiment, the child is free to move throughout the experiment, thus affecting the magnitude of landmark displacements (when the child is near the camera, the pixel displacements are larger than when the child makes the same movement farther from the camera). Normalizing the displacements with respect to the eye width diminishes this distance-to-camera dependency. More formally, the head movement between frame n and n-1 is defined as the average Euclidean displacement of the central nose, left inner eye, and right inner eye landmarks (see Fig. 1) normalized by a ±half-second windowed average, centered around frame n, of the Euclidean distances between the inner left and right eye landmarks,\n\n$$ \\frac{\\overline{d_{n-1, n}}}{\\overline{w_{n-15, n+15}}} $$\n\nwhere $\\overline{d_{n-1, n}}$ is the average landmark displacement of the three central landmarks between frame n and n-1, and $\\overline{w_{n-15, n+15}}$ is the average Euclidean distance between the left and right eye landmarks when the child is engaged between a half-second (15 frames) before and after frame n.\n\n\nContext: Methods section, describing the specific technique used to quantify head movement in children during tablet engagement.", "metadata": { "doc_id": "Dawson_10", "source": "Dawson" } }, { "page_content": "Text: Results evaluating the validity of the CVA methods, which rely on landmark identification and tracking on the face, have been previously published. One study demonstrated high reliability between the automatic methods and an expert human rater of head movements, with agreement between the computer and the expert clinical rater occurring 92.5% of the time and interrater reliability based on Cohen's kappa = 0.75^{20}. A second study compared the automatic classification based on landmarks to human coders for head movement, demonstrating inter-rater reliability based on an intraclass correlation coefficient (ICC) = 0.89^{17}. Other papers report high reliability between CVA and human coding for head turning in response to name (ICC = 0.84)^{12} and positive affective expression (happy; ICC = 0.90 and 0.89 for ASD and non-ASD toddlers)^{21}.\n\nThe original dataset consisted of frame-by-frame measurements of head movement, with observations for each 1/30^{th} of a second. Owing to privacy and consent considerations as well as backend design, groups interested in direct use of the data can access it via collaboration with the authors; the data will be stored in a separate partition at Duke University. In order to prepare the data for statistical analysis, we first aggregated the movement measurements by calculating the head movement rate, defined as the moving sum of the cumulative frame-by-frame movement measurements for each 10-frame period (representing 1/3 of a second). If any individual frames within a 10-frame set were set to missing, such as when the facial landmarks were not visible or during the name-call period, the moving sum was also set to missing. Outliers were addressed by Winsorizing to the 95^{th} percentile prior to aggregation.\n\nAll statistical analyses were performed separately for each of the movie stimuli. 
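To make the metric concrete, below is a minimal Python sketch of how the normalized per-frame head movement and its 10-frame moving sum described above could be computed from per-frame CVA outputs (landmark coordinates and yaw angles). The function name, input layout, and the exact placement of the Winsorizing step are illustrative assumptions, not the authors' implementation.

import numpy as np

FPS = 30                # camera frame rate reported in the Methods
HALF_WIN = FPS // 2     # +/- half a second (15 frames) for the eye-width average
YAW_LIMIT = 20.0        # degrees; |yaw| < 20 is treated as "engaged"

def head_movement_rate(nose, left_eye, right_eye, yaw, window=10):
    """Normalized per-frame head movement and its 10-frame moving sum.

    nose, left_eye, right_eye: (N, 2) pixel coordinates of the central nose
    and inner-eye landmarks; yaw: (N,) head pose angles in degrees.
    Frames where the child is not engaged are treated as missing (NaN).
    """
    n_frames = len(yaw)
    engaged = np.abs(yaw) < YAW_LIMIT

    # d_bar: average Euclidean displacement of the three central landmarks
    # between consecutive frames.
    landmarks = np.stack([nose, left_eye, right_eye], axis=1)    # (N, 3, 2)
    step = np.linalg.norm(np.diff(landmarks, axis=0), axis=2)    # (N-1, 3)
    disp = np.concatenate([[np.nan], step.mean(axis=1)])
    disp[~engaged] = np.nan

    # w_bar: eye width averaged over a +/- half-second window of engaged frames.
    eye_width = np.linalg.norm(left_eye - right_eye, axis=1)
    eye_width[~engaged] = np.nan
    w_bar = np.array([np.nanmean(eye_width[max(0, n - HALF_WIN):n + HALF_WIN + 1])
                      for n in range(n_frames)])

    movement = disp / w_bar                         # proportion of eye width per frame
    movement = np.minimum(movement,                 # Winsorize to the 95th percentile
                          np.nanpercentile(movement, 95))

    # Head movement rate: 10-frame (1/3 s) moving sum; any missing frame in
    # the window makes the sum missing as well.
    rate = np.full(n_frames, np.nan)
    for n in range(window - 1, n_frames):
        chunk = movement[n - window + 1:n + 1]
        rate[n] = np.nan if np.isnan(chunk).any() else chunk.sum()
    return rate

Setting non-engaged frames to NaN makes every 10-frame window that touches a look-away or name-call segment drop out of the aggregated series, mirroring the missing-data rule described above.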
To visualize the time series, we calculated and plotted the median head movement rate as well as the 1^{st} and 3^{rd} quartiles at each 1/3 second time interval for both ASD and non-ASD children.\n\n\nContext: Methods and Results section, describing the data analysis techniques and findings related to head movement measurements.", "metadata": { "doc_id": "Dawson_11", "source": "Dawson" } }, { "page_content": "Text: Unadjusted and adjusted rate ratios for the association between ASD diagnosis and the rate of head movement in each 1/3 second time interval were estimated using a generalized linear mixed log-gamma regression model. Adjusted estimates controlled for ethnicity/race (white; other), age (in months), and sex (male; female). To account for potential within-subject correlations due to repeated measurement, we included a random intercept for each participant.\n\nResults\n\nThe time series data depicting the rate of head movement, defined as the distance traveled per 1/3 second (10 video frames), for the ASD and non-ASD groups are shown in Fig. 2.\n\nBased on a generalized linear mixed regression model with a log link and gamma distribution (adjusting for ethnicity/race, age, and sex), significant associations between diagnostic group (ASD versus non-ASD) and rate of head movement were found during all movies except for Bubbles 2, the last movie. For Bubbles 2, the shorter duration might have reduced the power to detect an effect; there was nevertheless a trend toward a group difference in the same direction as for all other movies. Results of the analysis are shown in Table 1.\n\nRobust group differences in the rate of head movement were evident during 4 out of 5 of the movies. For example, the rate of head movement among participants with ASD was 2.22 times that of non-ASD participants during the Bunny movie, after adjusting for age, ethnicity/race, and sex (95% Confidence Interval 1.60, 3.07). The rate ratio was higher for all movies that had animated and more complex stimuli (Bunny, Puppets, Rhymes and Toys), as compared to the less complex Bubbles videos.\n\n\nContext: Statistical analysis of head movement rate differences between ASD and non-ASD groups during movie viewing.", "metadata": { "doc_id": "Dawson_12", "source": "Dawson" } }, { "page_content": "Text: Although the LD/DD group was too small to conduct independent analyses of that group, as a sensitivity analysis, the 8 participants with LD/DD were removed from the main regression model and the associations were re-estimated, as shown in Table 2. Overall, the results are consistent with those reported in the main analysis; in fact, the associations are slightly stronger when the LD/DD group is removed from the non-ASD group.\n\nDiscussion\n\n\nContext: Sensitivity analysis regarding the impact of including individuals with intellectual disability/developmental delay in the study's results.", "metadata": { "doc_id": "Dawson_13", "source": "Dawson" } }, { "page_content": "Text: The present study adds to a large and growing body of literature indicating that differences in early motor development are an important feature of ASD. We found highly significant differences in postural control, reflected in differences in the rate of spontaneous movement of the head between toddlers with ASD versus those without ASD. Using an automated, objective approach, we analyzed data composed of video-frame-level measurements of head movements, with observations for each 1/30^{th} of a second, and created 10-frame moving sums to capture movement. 
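As a rough illustration of how adjusted rate ratios of this kind can be obtained, the sketch below fits a gamma regression with a log link to synthetic long-format data (one row per child per 1/3-second interval) and exponentiates the ASD coefficient. A GEE with an exchangeable working correlation is used as a simple stand-in for the random-intercept generalized linear mixed model reported here, and all column names and simulated values are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated long-format data standing in for the real measurements.
rng = np.random.default_rng(0)
n_child, n_obs = 50, 60
subject = np.repeat(np.arange(n_child), n_obs)
asd = np.repeat(rng.integers(0, 2, n_child), n_obs)      # 1 = ASD, 0 = non-ASD
age_mo = np.repeat(rng.uniform(16, 31, n_child), n_obs)  # age in months
male = np.repeat(rng.integers(0, 2, n_child), n_obs)
white = np.repeat(rng.integers(0, 2, n_child), n_obs)
mu = np.exp(-2.0 + 0.6 * asd)                            # higher mean rate for ASD
rate = rng.gamma(shape=2.0, scale=mu / 2.0)              # strictly positive response
df = pd.DataFrame(dict(subject=subject, rate=rate, asd=asd,
                       age_mo=age_mo, male=male, white=white))

# Gamma family with a log link, clustered by participant; the exchangeable
# working correlation approximates the within-subject dependence that the
# paper handles with a random intercept.
model = smf.gee("rate ~ asd + age_mo + male + white",
                groups="subject", data=df,
                family=sm.families.Gamma(link=sm.families.links.Log()),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()

# Exponentiating the ASD coefficient gives the adjusted rate ratio
# (ASD vs non-ASD) and its confidence interval on the ratio scale.
rr = np.exp(result.params["asd"])
ci = np.exp(result.conf_int().loc["asd"])
print(f"adjusted rate ratio: {rr:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")

A true random-intercept gamma GLMM (for example via R's glmmTMB or lme4) would match the reported model more closely; the GEE form is shown only to keep the example self-contained in Python.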
Time-series data revealed group differences in the rate of head movement across all movies representing a wide range of stimuli, such as bubbles, a hopping bunny, and a woman singing a nursery rhyme paired with dynamic toys. An increase in the rate of head movement observed in young children with ASD during states of engaged attention might indicate underlying differences in the ability to maintain midline postural control and/or atypical engagement of attentional systems in young toddlers with ASD. These movements were not defined by spontaneous looking away from the stimulus, as was reported by Martin et al.^{14}. Rather, they were characterized by a failure to keep the head in a still midline position while viewing the movie. This is distinct from the feature studied by Martin et al., which was characterized by greater yaw angular displacement and greater yaw and roll angular velocity, was primarily present during the presentation of social stimuli, and might reflect sensory modulation. The movements we describe in this paper may be similar to those described in previous studies of postural sway in older children with ASD, as well as in school-aged children with attention deficit hyperactivity disorder (ADHD) by Heiser et al.^{22}. Heiser et al. used infrared motion analysis to record head movements during a continuous performance task and found that boys with ADHD moved their head 2.3 times as far as typically-developing boys performing the same task. In a\n\n\nContext: Findings regarding differences in postural control and head movements in toddlers with ASD, compared to typically developing toddlers, and relating these findings to previous research on ADHD.", "metadata": { "doc_id": "Dawson_14", "source": "Dawson" } }, { "page_content": "Text: analysis to record head movements during a continuous performance task and found that boys with ADHD moved their head 2.3 times as far as typically-developing boys performing the same task. In a study of siblings of children with ASD, Reiersen and colleagues^{23} found that siblings who have impaired motor coordination, features of attention deficit hyperactivity disorder (ADHD), or both are much more likely to have ASD than are other siblings. They suggest\n\n\nContext: Research on motor differences in autism and related conditions.", "metadata": { "doc_id": "Dawson_15", "source": "Dawson" } }, { "page_content": "Text: img-1.jpeg\n\nFigure 2. Time series of head movement rate, measured as the distance traveled per 1/3 second (10 video frames), by ASD diagnosis. Solid lines are the median values at each time point. Bands represent the first and third quartiles at each time point. Blank sections represent name calls, which were removed from this analysis.\n\nMovie | Unadjusted Rate Ratio (95\\% Confidence Interval) for ASD vs non-ASD | P-value | Adjusted Rate Ratio (95\\% Confidence Interval) for ASD vs non-ASD | P-value\nVideo Bubbles 1 | $1.46(1.09,1.97)$ | 0.011 | $1.53(1.10,2.12)$ | 0.012\nVideo Bunny | $2.13(1.60,2.85)$ | $<0.0001$ | $2.22(1.60,3.07)$ | $<0.0001$\nVideo Puppets | $2.08(1.50,2.88)$ | $<0.0001$ | $2.30(1.60,3.31)$ | $<0.0001$\nVideo Rhymes and Toys | $2.37(1.77,3.16)$ | $<0.0001$ | $2.45(1.78,3.39)$ | $<0.0001$\nVideo Bubbles 2 | $1.52(1.08,2.14)$ | 0.018 | $1.43(0.97,2.10)$ | 0.070\n\nTable 1. Unadjusted and adjusted rate ratios for the associations between diagnostic group and rate of head movement. 
that identification of nonspecific traits that can amplify risk for ASD, such as attention and motor differences, could allow for earlier identification and targeted therapy that modify these traits and potentially reduce later risk for ASD.\n\nDelays and differences in sensorimotor development have been noted across the lifespan in individuals with ASD from early infancy through adulthood ${ }^{24}$. For example, Lim et al. showed that postural sway and attention demands of postural control were larger in adults with ASD than in typically developed adults ${ }^{25}$. Morris et al. found that adults with ASD did not use visual information to control standing posture, in contrast to adults without ASD ${ }^{26}$. Brain imaging studies suggest that atypical motor function in autism may be related to increased\n\nTable 2. Adjusted rate ratios for the associations between diagnostic group and rate of head movement after removing LD/DD participants.\n\n\nContext: Results of a study examining head movement rates in children with and without ASD, presented alongside figures and tables detailing statistical analysis.", "metadata": { "doc_id": "Dawson_16", "source": "Dawson" } }, { "page_content": "Text: Table 2. Adjusted rate ratios for the associations between diagnostic group and rate of head movement after removing LD/DD participants.\n\nsensitivity to proprioceptive error and a decreased sensitivity to visual error, aspects of motor learning dependent on the cerebellum^{27}. Atypical presentation of motor functions of the cerebellum has been noted in children with ASD as young as 14 months of age. Esposito et al. identified significant differences in gait pattern, reflected in postural asymmetry, in toddlers with ASD as compared to those without ASD^{28}.\n\nThe sample of toddlers with ASD was recruited from primary pediatric care where children suspected of having autism were then evaluated using gold-standard diagnostic methods. Although this method of recruitment increases the likelihood of obtaining a more representative population-based sample, it also results in a comparison group of toddlers without ASD that is much larger than the ASD sample. Because the sample of ASD toddlers in this study was relatively small, it will be important to replicate these findings with a larger group of children. A larger sample would also provide the statistical power to examine whether differences in postural control exist based on individual characteristics of children with ASD, such as age, sex, and co-morbid intellectual disability and/or ADHD.\n\n\nContext: Discussion of findings related to postural control and gait in toddlers with ASD, acknowledging limitations and future research directions.", "metadata": { "doc_id": "Dawson_17", "source": "Dawson" } }, { "page_content": "Text: Previous analyses of motor differences associated with ASD often have required labor-intensive coding of patterns of behavior that are recognizable by the naked eye. Moreover, such studies typically use a “top down” approach in which specific behaviors of interest are defined and then rated by more than one person (for reliability assessments). The use of digital phenotyping offers multiple advantages over previous methods that rely on human coding, namely, the ability to automatically and objectively measure dynamic features of behavior on a spatiotemporal scale that is not easily perceptible to the naked eye. 
Because digital approaches are scalable, they also allow for collection of larger data sets that can be analyzed using machine learning. We anticipate that the use of digital phenotyping will reveal a number of objective biomarkers, such as the head movements described in this report, which can be used as early risk indices and targets for intervention. By combining multiple features that reflect different aspects of sensorimotor function, including patterns of facial expression, orienting, midline head movements, reaching behavior, and others, it might be possible to create a reliable, objective, and automated risk profile for ASD and other neurodevelopmental disorders.\n\nReferences\n\nTeitelbaum, P., Teitelbaum, O., Nye, J., Fryman, J. \\& Maurer, R. G. Movement analysis in infancy may be useful for early diagnosis of autism. Proc Natl Acad Sci USA 95, 13982--13987 (1998).\n\nEsposito, G., Venuti, P., Maestro, S. \\& Muratori, F. An exploration of symmetry in early autism spectrum disorders: analysis of lying. Brain \\& development 31, 131--138, https://doi.org/10.1016/j.braindev.2008.04.005 (2009).\n\nFlanagan, J. E., Landa, R., Bhat, A. \\& Bauman, M. Head lag in infants at risk for autism: a preliminary study. The American journal of occupational therapy: official publication of the American Occupational Therapy Association 66, 577--585, https://doi.org/10.5014/ajot.2012.004192 (2012).\n\n\nContext: The authors discuss the advantages of using digital phenotyping and machine learning to objectively measure behavioral patterns for early autism risk assessment, contrasting it with traditional, labor-intensive coding methods.", "metadata": { "doc_id": "Dawson_18", "source": "Dawson" } }, { "page_content": "Text: Zappella, M. et al. What do home videos tell us about early motor and socio-communicative behaviours in children with autistic features during the second year of life--An exploratory study. Early human development 91, 569--575, https://doi.org/10.1016/j.earlhumdev.2015.07.006 (2015).\n\nDawson, G., Osterling, J., Meltzoff, A. N. \\& Kuhl, P. Case Study of the Development of an Infant with Autism from Birth to Two Years of Age. Journal of applied developmental psychology 21, 299--313, https://doi.org/10.1016/s0193-3973(99)00042-8 (2000).\n\nBrisson, J., Warreyn, P., Serres, J., Foussier, S. \\& Adrien, J. L. Motor anticipation failure in infants with autism: a retrospective analysis of feeding situations. Autism: the international journal of research and practice 16, 420--429, https://doi.org/10.1177/1362361311423385 (2012).\n\nGima, H. et al. Early motor signs of autism spectrum disorder in spontaneous position and movement of the head. Experimental brain research 236, 1139--1148, https://doi.org/10.1007/s00221-018-5202-x (2018).\n\nHytonen, M., Pyykko, I., Aalto, H. \\& Starck, J. Postural control and age. Acta oto-laryngologica 113, 119--122 (1993).\n\nGhanouni, P., Memari, A. H., Gharibzadeh, S., Eghlidi, J. \\& Moshayedi, P. Effect of Social Stimuli on Postural Responses in Individuals with Autism Spectrum Disorder. J Autism Dev Disord 47, 1305--1313, https://doi.org/10.1007/s10803-017-3032-5 (2017).\n\nMinshew, N. J., Sung, K., Jones, B. L. \\& Furman, J. M. Underdevelopment of the postural control system in autism. Neurology 63, 2056--2061 (2004).\n\nGouleme, N. et al. Postural Control and Emotion in Children with Autism Spectrum Disorders. Translational neuroscience 8, 158--166, https://doi.org/10.1515/tnsci-2017-0022 (2017).\n\nCampbell, K. et al. 
Computer vision analysis captures atypical attention in toddlers with autism. Autism, 1362361318766247, https://doi.org/10.1177/1362361318766247 (2018).\n\n\nContext: References related to motor development, postural control, and atypical attention in autism.", "metadata": { "doc_id": "Dawson_19", "source": "Dawson" } }, { "page_content": "Text: Campbell, K. et al. Computer vision analysis captures atypical attention in toddlers with autism. Autism, 1362361318766247, https://doi.org/10.1177/1362361318766247 (2018).\n\nAnzulewicz, A., Sobota, K. \\& Delafield-Butt, J. T. Toward the Autism MotorSignature: Gesture patterns during smart tablet gameplay identify children with autism. Scientific reports 6, 31107, https://doi.org/10.1038/srep31107 (2016).\n\nMartin, K. B. et al. Objective measurement of head movement differences in children with and without autism spectrum disorder. Molecular autism 9, 14, https://doi.org/10.1186/s13229-018-0198-4 (2018).\n\nWu, D., Jose, J. V., Nurnberger, J. I. \\& Torres, E. B. A Biomarker Characterizing Neurodevelopment with applications inAutism. Scientific reports 8, 614, https://doi.org/10.1038/s41598-017-18902-w (2018).\n\nGotham, K., Risi, S., Pickles, A. \\& Lord, C. The Autism Diagnostic Observation Schedule: revised algorithms for improved diagnostic validity. J Autism Dev Disord 37, 613-627, https://doi.org/10.1007/s10803-006-0280-1 (2007).\n\nHashemi, J. et al. In Proceedings of the EAI International Conference on Wireless Mobile Communication and Healthcare. MobiHealth (2015).\n\nDe La Torre, F. IntraFace. Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition Workshops (2015).\n\nDementhon, D. D. L.D. Model-based object pose in 25 lines of code. International Journal of Computer Vision 15, 123-141 (1995).\n\nHashemi, J. et al. Computer vision tools for low-cost and noninvasive measurement of autism-related behaviors in infants. Autism Res Treat. 2014, 933686, https://doi.org/10.1155/2014/935686 (2014).\n\nHashemi, J. et al Computer vision analysis for quantification of autism risk behaviors. IEEE Transactions on Affective Computing, 1-1 (2018).\n\n\nContext: References cited in the article \"Computer vision analysis captures atypical attention in toddlers with autism.\"", "metadata": { "doc_id": "Dawson_20", "source": "Dawson" } }, { "page_content": "Text: Hashemi, J. et al Computer vision analysis for quantification of autism risk behaviors. IEEE Transactions on Affective Computing, 1-1 (2018).\n\nHeiser, P. et al. Objective measurement of hyperactivity, impulsivity, and inattention in children with hyperkinetic disorders before and after treatment with methylphenidate. European child \\& adolescent psychiatry 13, 100-104, https://doi.org/10.1007/s00787-004-0365-3 (2004).\n\nReiersen, A. M., Constantino, J. N. \\& Todd, R. D. Co-occurrence of motor problems and autistic symptoms in attention-deficit/ hyperactivity disorder. J Am Acad Child Adolesc Psychiatry 47, 662-672, https://doi.org/10.1097/CHL0b013e31816bf88 (2008).\n\nCook, J. L., Blakemore, S. J. \\& Press, C. Atypical basic movement kinematics in autism spectrum conditions. Brain: a journal of neurology 136, 2816-2824, https://doi.org/10.1093/brain/awt208 (2013).\n\nLim, Y. H. et al. Effect of Visual Information on Postural Control in Adults with Autism Spectrum Disorder. J Autism Dev Disord, https://doi.org/10.1007/s10803-018-3634-6 (2018).\n\nMorris, S. L. et al. 
Differences in the use of vision and proprioception for postural control in autism spectrum disorder. Neuroscience 307, 273-280, https://doi.org/10.1016/j.neuroscience.2015.08.040 (2015).\n\nMarko, M. K. et al. Behavioural and neural basis of anomalous motor learning in children with autism. Brain 138, 784-797, https:// doi.org/10.1093/brain/awu394 (2015).\n\nEsposito, G., Venuti, P., Apicella, F. \\& Muratori, F. Analysis of unsupported gait in toddlers with autism. Brain \\& development 33, 367-373, https://doi.org/10.1016/j.braindev.2010.07.006 (2011).\n\nAcknowledgements\n\n\nContext: References cited in the article.", "metadata": { "doc_id": "Dawson_21", "source": "Dawson" } }, { "page_content": "Text: Acknowledgements\n\nFunding for this work was provided by NICHD 1P50HD093074, Duke Department of Psychiatry and Behavioral Sciences PRIDe award, Duke Education and Human Development Initiative, Duke-Coulter Translational Partnership Grant Program, National Science Foundation, and the Department of Defense. Some of the stimuli used for the movies were created by Geraldine Dawson, Michael Murias, and Sara Webb at the University of Washington. We gratefully acknowledge the editorial assistance of Elizabeth Sturdivant and the participation of the children and families in this study.\n\nAuthor Contributions\n\nG.D. and G.S. were responsible for conceptualizing and drafting the manuscript. All other authors reviewed and contributed to the manuscript. G.S. and J.H. were responsible for carrying-out the computer vision analyses. S.L. and V.S. were responsible for the statistical analyses. G.D., K.C., K.C. and S.V. were responsible for collection of the data and diagnostic confirmations. All authors were responsible for the design of the study and/or data analyses. K.C. and H.E. performed this work while employed at Duke University,\n\nAdditional Information\n\nSupplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-35215-8. Competing Interests: Geraldine Dawson is on the Scientific Advisory Boards of Janssen Research and Development, Akili, Inc., LabCorps, and Roche Pharmaceutical Company, has received grant funding from Janssen Research and Development, L.L.C. and PerkinElmer, speaker fees from ViaCord, and receives royalties from Guilford Press and Oxford University Press. Geraldine Dawson and Guillermo Sapiro are affiliated with DASIO, LLC. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.\n\n\nContext: This section details funding sources, acknowledgements, author contributions, and additional information related to the research paper.", "metadata": { "doc_id": "Dawson_22", "source": "Dawson" } }, { "page_content": "Text: Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. (c) The Author(s) 2018\n\n\nContext: Concluding remarks regarding licensing and copyright information.", "metadata": { "doc_id": "Dawson_23", "source": "Dawson" } }, { "page_content": "Text: Research and Applications\n\nMachine learning approach for early detection of autism by combining questionnaire and home video screening\n\nHalim Abbas, ${ }^{1}$ Ford Garberson, ${ }^{1}$ Eric Glover, ${ }^{2}$ and Dennis P Wall ${ }^{1,3,4}$ ${ }^{1}$ Cognoa Inc., Palo Alto, CA, USA www.linkedin.com/in/halimabbas, ${ }^{2}$ eric_g@ericglover.com, ${ }^{3}$ Department of Pediatrics, Stanford University, Stanford, CA, USA, ${ }^{4}$ Department of Biomedical Data Science, Stanford University, Stanford, CA, USA\n\nCorrespondence to: Cognoa Inc., Palo Alto, CA, USA; halim@cognoa.com Received 19 September 2017; Revised 16 March 2018; Editorial Decision 25 March 2018; Accepted 2 April 2018\n\nAbstract\n\n\nContext: Introduction to a study using machine learning to detect autism by combining questionnaire and home video screening, presented within a larger paper detailing the research and applications.", "metadata": { "doc_id": "Abbas_2018_0", "source": "Abbas_2018" } }, { "page_content": "Text: Correspondence to: Cognoa Inc., Palo Alto, CA, USA; halim@cognoa.com Received 19 September 2017; Revised 16 March 2018; Editorial Decision 25 March 2018; Accepted 2 April 2018\n\nAbstract\n\nBackground: Existing screening tools for early detection of autism are expensive, cumbersome, time- intensive, and sometimes fall short in predictive value. In this work, we sought to apply Machine Learning (ML) to gold standard clinical data obtained across thousands of children at-risk for autism spectrum disorder to create a low-cost, quick, and easy to apply autism screening tool. Methods: Two algorithms are trained to identify autism, one based on short, structured parent-reported questionnaires and the other on tagging key behaviors from short, semi-structured home videos of children. A combination algorithm is then used to combine the results into a single assessment of higher accuracy. To overcome the scarcity, sparsity, and imbalance of training data, we apply novel feature selection, feature engineering, and feature encoding techniques. We allow for inconclusive determination where appropriate in order to boost screening accuracy when conclusive. The performance is then validated in a controlled clinical study. Results: A multi-center clinical study of $n=162$ children is performed to ascertain the performance of these algorithms and their combination. We demonstrate a significant accuracy improvement over standard screening tools in measurements of AUC, sensitivity, and specificity. Conclusion: These findings suggest that a mobile, machine learning process is a reliable method for detection of autism outside of clinical settings. A variety of confounding factors in the clinical analysis are discussed along with the solutions engineered into the algorithms. 
Final results are statistically limited and will benefit from future clinical studies to extend the sample size.\n\nKey words: supervised machine learning, autism spectrum disorder, diagnostic techniques and procedures, mobile applications\n\nINTRODUCTION\n\n\nContext: The abstract and introduction of a research paper detailing the development and validation of machine learning algorithms for autism screening.", "metadata": { "doc_id": "Abbas_2018_1", "source": "Abbas_2018" } }, { "page_content": "Text: Key words: supervised machine learning, autism spectrum disorder, diagnostic techniques and procedures, mobile applications\n\nINTRODUCTION\n\nDiagnosis within the first few years of life dramatically improves the outlook of children with autism, as it allows for treatment while the child's brain is still rapidly developing. ${ }^{1,2}$ Unfortunately, autism is typically not diagnosed earlier than age 4 in the United States, with approximately $27 \\%$ of cases remaining undiagnosed at age $8 .{ }^{3}$ This delay in diagnosis is driven primarily by a lack of effective screening tools and a shortage of specialists to evaluate at-risk children. The use of higher accuracy screening tools to prioritize children to be seen by specialists is therefore essential.\n\nMost autism screeners in use today are based on questions for the parent or the medical practitioner, that produce results by comparing summed answer scores to predetermined thresholds. Notable examples are the Modified Checklist for Autism in Toddlers, Revised (M-CHAT), ${ }^{4}$ a checklist-based screening tool for autism that is intended to be administered during developmental screenings for children between the ages of 16 and 30 months, and the Child Behavior Checklist (CBCL). ${ }^{5}$ Both are parent-completed screening tools. For both instruments, responses to each question are summed with each question given equal weighting, and if the total is above a pre-determined threshold the child is considered to be at high risk of\n\nautism. In the case of CBCL there are multiple scales based upon different sets of questions corresponding to different conditions. The \"Autism Spectrum Problems\" scale of CBCL is used when comparing its performance to the performances of our algorithms in this paper.\n\n\nContext: The introduction to a research paper on improving autism screening tools, specifically discussing existing parent-completed screening tools like M-CHAT and CBC.", "metadata": { "doc_id": "Abbas_2018_2", "source": "Abbas_2018" } }, { "page_content": "Text: In this paper, we present two new machine learning screeners that are reliable, cost-effective, short enough to be completed in minutes, and achieve higher accuracy than existing screeners on the same age span as existing screeners. One is based on a short questionnaire about the child, which is answered by the parent. The other is based on identification of specific behaviors by trained analysts after watching two or three short videos of the child within their natural environment that are captured by parents using a mobile device.\n\nThe parent questionnaire screener keys on behavioral patterns similar to those probed by a standard autism diagnostic instrument, the Autism Diagnostic Interview - Revised (ADI-R). ${ }^{6}$ This clinical tool consists of an interview of the parent with 93 multi-part questions with multiple choice and numeric responses which are delivered by a trained professional in a clinical setting. 
While this instrument is considered a gold-standard, and gives consistent results across examiners, the cost and time to administer it can be prohibitive in a primary care setting. In this paper, we present our approach to using clinical ADI-R instrument data to create a screener based on a short questionnaire presented directly to parents without supervision.\n\n\nContext: Introducing the two new machine learning screener approaches: a parent questionnaire and a video-based analysis, and comparing the parent questionnaire to the Autism Diagnostic Interview - Revised (ADI-R).", "metadata": { "doc_id": "Abbas_2018_3", "source": "Abbas_2018" } }, { "page_content": "Text: The video screener keys on behavioral patterns similar to those probed in another diagnostic tool, the Autism Diagnostic Observation Schedule (ADOS). ${ }^{7}$ ADOS is widely considered a gold standard and is one of the most common behavioral instruments used to aid in the diagnosis of autism. ${ }^{8}$ It consists of an interactive and structured examination of the child by trained clinicians in a tightly controlled setting. ADOS is a multi-modular diagnostic instrument, with different modules for subjects at different levels of cognitive development. In this paper, we present our approach to mining ADOS clinical records, with a focus on younger developmental age, to create a video-based screener that relies on an analyst evaluating short videos of children filmed by their parents at home.\n\n\nContext: Autism diagnostic tools and the development of a video-based screener.", "metadata": { "doc_id": "Abbas_2018_4", "source": "Abbas_2018" } }, { "page_content": "Text: The use of behavioral patterns commonly probed in ADI-R and ADOS scoresheets as inputs to train autism screening classifiers was introduced, studied, and clinically validated in previous work. ${ }^{9-12}$ There are several new aspects in this paper. First, the algorithms detailed in the present study have been designed to be more accurate and more robust against confounding biases between training and application data. Next, this paper focuses considerable attention on the impact of confounding factors on machine learning algorithms in this context. Examples of these confounding biases will be discussed below and highlighted in Table 2. Labeled data usually originates from tightly controlled clinical environments and is, hence, clean but sparse, unbalanced, and of a different context to the data available when applying the screening techniques in a less formal environment. This paper also presents a combination between the algorithms for a more powerful single screener. Lastly, this paper generalizes the algorithms to be non-binary, sometimes resulting in an \"inconclusive\" determination when presented with data from more challenging cases. This allows higher screening accuracy for those children who do receive a conclusive screening, while still presenting a clinically actionable inconclusive outcome in the more challenging cases.\n\nThese classifiers of this paper were applied to screen children in a clinical study using the Cognoa ${ }^{13}$ App. To date, Cognoa has been used by over 250000 parents in the US and internationally. The majority of Cognoa users are parents of young children between 18 and\n\n30 months. 
The clinical study consisted of 162 at-risk children who had undergone full clinical examination and received a clinical diagnosis at a center specialized in neurodevelopmental disorders.\n\nMETHODS\n\n\nContext: Clinical application of autism screening classifiers and a clinical study using the Cognoa App.", "metadata": { "doc_id": "Abbas_2018_5", "source": "Abbas_2018" } }, { "page_content": "Text: METHODS\n\nIt is not feasible to amass large training sets of children who have been evaluated by the mobile screeners and who also have received a professional medical diagnosis. Our approach is to start with historical medical instrument records of previously diagnosed subjects, and use those as training data for screeners that will rely on information acquired outside the clinical setting. Expected performance degradation from applying the algorithms into a less controlled setting would result in inaccurate screeners if conventional machine learning methods were used. Much of this paper outlines the details of creative machine learning methods designed to overcome this challenge and create reliable screeners in this setting.\n\nTraining data were compiled from multiple repositories of ADOS and ADI-R score-sheets of children between 18 and 84 months of age including Boston Autism Consortium, Autism Genetic Resource Exchange, Autism Treatment Network, Simons Simplex Collection, and Vanderbilt Medical Center. Since such repositories are highly imbalanced with very few non-autistic patients, the controls across the datasets were supplemented with balancing data obtained by conducting ADI-R interviews by a trained clinician on a random sample of children deemed at low risk for autism from Cognoa's user base. For both algorithms a smaller set of optimal features was selected using methods that will be discussed below. Details about the final selected features are given in the Supplementary Material.\n\n\nContext: Methods: Data compilation and feature selection for training autism screener algorithms.", "metadata": { "doc_id": "Abbas_2018_6", "source": "Abbas_2018" } }, { "page_content": "Text: The clinical validation sample consists of 230 children who presented to one of three autism centers in the United States between 18 and 72 months of age. All participants were referred through the clinics' typical referral program process, and only those with English-speaking parents were considered for the study. The three clinical centers were approved on a multisite IRB (project number 2202803). Every child received an ADOS as well as standard screeners like M-CHAT and CBCL as appropriate, and a diagnosis was ultimately ascertained by a licensed health care provider. For 162 of those children, the parents also used their mobile devices to complete the short parental questionnaire and submit the short videos required for the screeners discussed in this paper. The sample breakdown by age group and diagnosis for both the training and clinical validation datasets is shown in Table 1.\n\nApproach\n\nWe trained two independent ML classifiers and combined their outputs into a single screening assessment. The parent questionnaire classifier was trained using data from historical item-level ADI-R score-sheets with labels corresponding to established clinical diagnoses. The video classifier was trained using ADOS instrument scoresheets and diagnostic labels. In each case, progressive sampling was used to verify sufficient training volume as detailed in the Supplementary Materials. 
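One simple way to implement the progressive-sampling check mentioned above is to train on increasing fractions of the data and confirm that validation performance has plateaued. The sketch below does this with a random forest on synthetic data; the feature matrix, class balance, and model settings are placeholders rather than the study's actual data or configuration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in for encoded instrument features and diagnostic labels.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.3, 0.7],
                           random_state=0)

# Progressive sampling: fit on growing subsets and watch the validation AUC.
sizes, train_auc, valid_auc = learning_curve(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
    scoring="roc_auc",
)

for n, tr, va in zip(sizes, train_auc.mean(axis=1), valid_auc.mean(axis=1)):
    print(f"{n:5d} training samples: train AUC {tr:.3f}, validation AUC {va:.3f}")

If the mean validation AUC stops improving as the training fraction grows, the available training volume can be treated as sufficient for that classifier.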
Multiple machine learning algorithms, including ensemble techniques, were evaluated on the training data. A number of algorithms performed well; Random Forests were chosen because of their robustness against overfitting.\n\nADI-R and ADOS instruments are designed to be administered by trained professionals in highly standardized clinical settings and typically take hours. In contrast, our screening methods are deliberately\n\n\nContext: Clinical validation of the autism screening methods.", "metadata": { "doc_id": "Abbas_2018_7", "source": "Abbas_2018" } }, { "page_content": "Text: Table 1. Dataset Breakdown by Age Group and Condition Type for Each of the Sources of Training Data and for the Clinical Validation Sample. The Negative Class Label Includes Normally Developing (i.e. neurotypical) Children as Well as Children with Developmental Delays and Conditions other than Autism\n\nAge (years) | Condition | Classification type | Questionnaire training (number of samples) | Video training (number of samples) | Clinical validation (number of samples)\n$<4$ | Autism | $+$ | 414 | 1445 | 84\n$<4$ | Other condition | - | 133 | 231 | 18\n$<4$ | Neurotypical | - | 74 | 308 | 3\n$\\geq 4$ | Autism | $+$ | 1885 | 1865 | 37\n$\\geq 4$ | Other condition | - | 154 | 133 | 11\n$\\geq 4$ | Neurotypical | - | 26 | 277 | 9\n\nTable 2. Differences Between Training and Application Environments. These Differences are Expected to Cause Bias that Cannot be Captured by Cross-validation Studies\n\n\nContext: Dataset characteristics and environmental differences impacting model bias.", "metadata": { "doc_id": "Abbas_2018_8", "source": "Abbas_2018" } }, { "page_content": "Text: Table 2. Differences Between Training and Application Environments. These Differences are Expected to Cause Bias that Cannot be Captured by Cross-validation Studies\n\nAspect | Training Setting | Application Setting\nSource | ADI-R and ADOS instrument administered by trained professionals during clinical evaluations | Short parent questionnaires displayed on smartphone, and behavior tagging by analysts after observing two or three 1-minute home videos uploaded by parents\nProctor | Highly trained medical professionals | Parents answering the questionnaires are untrained, and the analysts evaluating the home videos are only minimally trained. As a result, their answers may not be as consistent, objective, or reliable\nSetting | Clinic setting with highly standardized and semi-structured interactions | At home. Not possible to recreate the structured clinical environment, resulting in an undesired variability of the output signals. Subjects might also behave differently at the clinic than at home, further amplifying the bias\nDuration | The ADI-R can take up to 4 hours to complete; the ADOS can take up to 45 minutes of direct observation by trained professionals | Under 10 minutes to complete the parent questionnaire, and a few minutes of home video. As a result, some symptoms and behavioral patterns might be present but not observed. Also causes big uncertainty about the severity and frequency of observed symptoms\nQuestionnaires | Sophisticated language involving psychological concepts, terms, and subtleties unfamiliar to nonexperts | Simplified questions and answer choices result in less nuanced, noisier inputs\n\ndesigned to be administered at home by parents without expert supervision, and to take only minutes to complete. This change of environment causes significant data degradation and biases, resulting in an expected loss of screening accuracy. For each classifier, we present mindful adjustments to ML methodology to mitigate these issues. 
These biases and efforts to mitigate them are discussed below.\n\n\nContext: Discussion of biases introduced by differences between the training and application environments for autism detection models.", "metadata": { "doc_id": "Abbas_2018_9", "source": "Abbas_2018" } }, { "page_content": "Text: Differences Between Training and Application Environments\n\nThe screeners are trained on historical patient records that correspond to controlled, lengthy clinical examinations, but applied via web or mobile app aimed at unsupervised parents at home. Table 2 details the various mechanisms by which confounding biases may consequently creep into the application data. Note that inaccuracies introduced by such biases cannot be probed by cross- validation or similar analysis of the training data alone.\n\nHyperparameter Optimization\n\nFor each parental questionnaire and video model that will be discussed below, model hyperparameters were tuned with a bootstrapped grid search. In all cases, class labels were used to stratify the folds, and (age, label) pairs were used to weight-balance the samples. More details can be found in the Supplementary Materials.\n\nParent Questionnaire\n\nMultiple model variants representing incremental improvements over a generic ML classification approach are discussed below.\n\nGeneric ML Baseline Variant\n\nA random forest was trained over the ADI-R instrument data. Each of the instrument's 155 data columns was treated as a categorical variable and one-hot encoded. The subject's age and gender were included as features as well. Of the resulting set of features, the top 20 were selected using feature-importance ranking in the decision forest.\n\nRobust Feature Selection Variant\n\n\nContext: Methods section, describing model variants and feature selection techniques.", "metadata": { "doc_id": "Abbas_2018_10", "source": "Abbas_2018" } }, { "page_content": "Text: Robust Feature Selection Variant\n\nDue to the small size and sparsity of the training dataset, generic feature selection was not robust, and the selected features (along with the performance of the resulting model) fluctuated from run to run due to the stochastic nature of the learner's underlying bagging approach. Many ADI-R questions are highly correlated, leading to multiple competing sets of feature selection choices that were seemingly equally powerful during training, but which had different performance characteristics when the underlying sampling bias was exposed via full bootstrapped cross-validation. This resulted in a wide performance range of the variant of the Generic ML baseline method as shown in Table 3.\n\nTable 3. Performance of Increasingly Effective Classifier Variants Based on the Training Data for the Parent Questionnaire. Results in the Top Table are Based on Cross-validated Training Performance. 
Results in the Bottom Table (Available Only for Variants Using the Optimally Selected Features) are Based on Actual Clinical Results\n\n\nContext: Challenges in feature selection due to limited training data and correlated questions.", "metadata": { "doc_id": "Abbas_2018_11", "source": "Abbas_2018" } }, { "page_content": "Text: Columns: AUC (all ages / $<4$ years / $\geq 4$ years); Sensitivity (all ages / $<4$ years / $\geq 4$ years); Specificity (all ages / $<4$ years).\nTraining scenario:\nGeneric ML baseline: AUC 0.932-0.950 / 0.928-0.953 / 0.928-0.953; Sensitivity 0.976-0.982 / 0.975-0.984 / 0.975-0.984; Specificity 0.628-0.645 / 0.625-0.648\nRobust feature selection variant: AUC 0.958 / 0.958 / 0.958; Sensitivity 0.982 / 0.982 / 0.982; Specificity 0.624 / 0.624\nAge silo variant: AUC 0.953 / 0.939 / 0.961; Sensitivity 0.962 / 0.939 / 0.977; Specificity 0.777 / 0.774\nSeverity-level feature encoding variant: AUC 0.965 / 0.950 / 0.974; Sensitivity 0.962 / 0.912 / 0.993; Specificity 0.748 / 0.833\nAggregate features variant: AUC 0.972 / 0.987 / 0.963; Sensitivity 0.992 / 0.988 / 0.994; Specificity 0.754 / 0.894\nWith inconclusive allowance [up to $25 \%$]: AUC 0.991 / 0.997 / 0.983; Sensitivity 1.000 / 1.000 / 1.000; Specificity 0.939 / 0.977\nApplication scenario:\nAge silo variant: AUC 0.62 / 0.68 / 0.54; Sensitivity 0.65 / 0.62 / 0.52; Specificity 0.48 / 0.46\nSeverity-level feature encoding variant: AUC 0.67 / 0.69 / 0.64; Sensitivity 0.64 / 0.62 / 0.58; Specificity 0.48 / 0.46\nAggregate features variant: AUC 0.68 / 0.73 / 0.68; Sensitivity 0.68 / 0.69 / 0.65; Specificity 0.57 / 0.62\nWith inconclusive allowance [up to $25 \%$]: AUC 0.72 / 0.72 / 0.73; Sensitivity 0.70 / 0.72 / 0.67; Specificity 0.67 / 0.71\n\nRobust feature selection overcame that limitation using a two-step approach. First, a 100-count bootstrapped feature selection was run, with a weight-balanced $90 \%$ random sample selected in each iteration. The top 20 features were selected each time, and a rank-invariant tally was kept for the number of times each feature made it to a top-20 list. Next, the top 30 features in the tally were kept as candidates and all other features were discarded. A final feature-selection run was used to pick the best subset of these candidate features. This approach was found to be more robust to statistical fluctuations, usually selecting the same set of features when run multiple times. A minimal subset of maximally performant features was chosen and locked for clinical validation, totaling 17 features for the younger children and 21 features for the older children. Details about these selected features are available in the Supplementary Material.\n\nAge Silo Variant\n\n\nContext: Performance metrics (AUC, Sensitivity, Specificity) for different machine learning approaches and scenarios.", "metadata": { "doc_id": "Abbas_2018_12", "source": "Abbas_2018" } }, { "page_content": "Text: Age Silo Variant\n\nThis variant built upon the improvements of the robust feature selection method by exploiting the dichotomy between pre-phrasal and fully-phrasal language capability in at-risk children. Language development is significant in this domain because it is known to affect the way autism presents, and consequently the kinds of behavioral clues to look for in order to screen for it.\n\nThis variant achieved better performance by training separate classifiers for children in the younger and older age groups of Table 1. The age dichotomy of $<4, \geq 4$ was chosen to serve as the best proxy for language ability. Feature selection, model parameter-tuning, and cross-validation were run independently for each age group classifier. Before siloing by age group, the classifier was limited to selecting features that work well across children of both developmental stages.
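A schematic sketch of the two-step robust feature selection described above is shown below (not the authors' implementation): it uses synthetic stand-in data, and the sample weighting and final "best subset" search are simplified. In the age-siloed variant, the same procedure is simply run separately within each age group.

```python
# Minimal sketch of two-step robust feature selection: bootstrap the top-20 lists,
# tally how often each feature appears, keep the 30 most frequent candidates,
# then run one final selection over those candidates only.
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def top_k_features(X, y, columns, k=20, seed=0):
    forest = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1][:k]
    return [columns[i] for i in order]

def robust_select(X, y, columns, n_rounds=100, subsample=0.9):
    rng = np.random.default_rng(0)
    tally = Counter()
    n = len(y)
    for r in range(n_rounds):
        idx = rng.choice(n, size=int(subsample * n), replace=False)    # 90% random sample
        tally.update(top_k_features(X[idx], y[idx], columns, seed=r))  # rank-invariant tally
    candidates = [f for f, _ in tally.most_common(30)]                 # step 2: 30 candidates
    keep = [i for i, c in enumerate(columns) if c in candidates]
    return top_k_features(X[:, keep], y, [columns[i] for i in keep], k=20)

# Synthetic data standing in for one-hot encoded instrument answers:
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(400, 80)).astype(float)
y = (X[:, 0] + X[:, 3] + rng.random(400) > 1.5).astype(int)
print(robust_select(X, y, [f"item_{i}" for i in range(80)]))
```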
Siloing enabled the classifiers to specialize in the features that are most developmentally appropriate within each age group.\n\nSeverity-level Feature Encoding Variant\n\n\nContext: Within a section describing different variants of a machine learning model for autism risk assessment, focusing on improvements to feature selection and model training.", "metadata": { "doc_id": "Abbas_2018_13", "source": "Abbas_2018" } }, { "page_content": "Text: Severity-level Feature Encoding Variant\n\nBuilding upon the age-siloed method above, this variant achieved better performance by replacing one-hot feature encoding with a more context-appropriate technique. One-hot encoding does not distinguish between values that correspond to increasing levels of severity of a behavioral symptom and values that do not convey a clear concept of severity. This is especially troublesome since a typical ADI-R instrument question includes answer choices of both types. For example, ADI-R question 37, which focuses on the child's tendency to confuse and mix up pronouns, allows for answer codes 0, 1, 2, 3, 7, 8, and 9. Among those choices, 0 through 3 denote increasing degrees of severity in pronominal confusion, while 7 denotes any other type of pronominal confusion not covered in 0-3, regardless of severity. Codes 8 and 9 denote the non-applicability of the question (for example, to a child still incapable of phrasal speech) or the lack of an answer (for example, if the question was skipped), respectively. When coding the answers to such questions, generic one-hot encoding would allow for non-symptomatic answer codes to be selected as screening features based on phantom correlations present in the dataset.\n\n\nContext: Methods for improving autism detection model performance.", "metadata": { "doc_id": "Abbas_2018_14", "source": "Abbas_2018" } }, { "page_content": "Text: Severity-level encoding converts all answer codes that do not convey a relevant semantic concept to a common value, thereby reducing the chance of useless feature selection and reducing the number of features to choose from. In addition, severity-level encoding condenses the signal according to increasing ranges of severity. For example, the encoding of ADI-R question 37 would map its responses to new features with a value of 1 in the following cases (all other new features would be zero): 0 → "0"; 1 → "1"; 2 → ["1", "2"]; 3 → ["1", "2", "3"]; 7 → "7"; 8 and 9 → None. This more closely resembles the way medical practitioners interpret such answer choices, and helps alleviate the problem of sparsity over each of the one-hot encoded features in the dataset.\n\nAggregate Features Variant\n\nBuilding upon the severity-level encoding method above, this variant achieved better performance by incorporating aggregate features such as the minimum, maximum, and average severity level, as well as the number of answer choices by severity level, across the questions corresponding to the 20 selected features. These new features were especially helpful due to the sparse, shallow, and wide nature of the training set, whereupon any semantically meaningful condensation of the signal can be useful to the trained classifier.\n\nInconclusive Results Variant\n\nChildren with more complex symptom presentation are known to pose challenges to developmental screening.
These children often screen as false positives or false negatives, resulting in an overall degradation of screening accuracy that is observed by all standard methods and has become acceptable in the industry. Given that our low-cost instruments do not rely on sophisticated observations to differentiate complex symptom cases, our approach was to avoid assessing them altogether, and to try instead to spot and label them as \"inconclusive.\"\n\n\nContext: Feature engineering techniques used to improve autism detection model performance.", "metadata": { "doc_id": "Abbas_2018_15", "source": "Abbas_2018" } }, { "page_content": "Text: Building upon the method including feature engineering, two methods to implement this strategy were devised. The first was to train a binary classifier with a continuous output score, then replace the cutoff threshold with a cutoff range, with values within the cutoff range considered inconclusive. A grid search was used to determine the optimal cutoff range representing a tradeoff between inconclusive determination rate and accuracy over conclusive subjects. The second approach was to train and cross-validate a simple binary classifier, label the correctly and incorrectly predicted samples as conclusive or inconclusive respectively, and then build a second classifier to predict whether a subject would be incorrectly classified by the first classifier. At runtime, the second classifier was used to spot and label inconclusives. The conclusives were sent for classification by a third, binary classifier trained over the conclusive samples only. Both methods for labeling inconclusive results yielded similar performance. Therefore, the simpler method of using a threshold range in the machine learning output was used to report inconclusive results for this paper.\n\nThe inconclusive rate is a configurable model parameter that controls the tradeoff between coverage and accuracy. Throughout this paper, the inconclusive rate for this variant was set to $25 \\%$.\n\nVideo\n\nThe second of our two-method approach to autism screening is an ML classifier that uses input answers about the presence and severity of target behaviors among subjects. This information was provided by an analyst upon viewing two or three 1-minute home videos of children in semi-structured settings that are taken by parents on their mobile phones. The classifier was trained on item-level data from two of the ADOS modules (module 1: preverbal, module 2: phrased speech) and corresponding clinical diagnosis.\n\n\nContext: Methods for handling inconclusive autism screening results and the video-based machine learning classifier.", "metadata": { "doc_id": "Abbas_2018_16", "source": "Abbas_2018" } }, { "page_content": "Text: Two decision forest ML classifiers were trained corresponding to each ADOS module. For each classifier, 10 questions were selected using the same robust feature selection method, and the same allowance for inconclusive outcomes was made as for the parental questionnaire classifier. Each model was independently parameter-tuned with a bootstrapped grid search. Class labels were used to stratify the cross-validation folds, and (age, label) pairs were used to weightbalance the samples.\n\nProblems related to the change of environment from training to application are especially significant in the case of video screening because ADOS involves a 45 minute direct observation of the child by experts, whereas our screening was based on unsupervised short home videos. 
Specifically, we expect the likelihood of inconclusive or unobserved behaviors and symptoms to be much higher in the application than in the training data, and the assessed level of severity or frequency of observed symptoms to be less reliable in the application than in the training data. The following improvements were designed to help overcome these limitations.\n\nPresence of Behavior Encoding\n\nTo minimize potential bias from a video analyst misreading the severity of a symptom in a short cell phone video, this encoding scheme improves feature reliability at the expense of feature information content by collapsing all severity gradations of a question into one binary value representing the presence vs absence of the behavior or symptom in question. Importantly, a value of 1 denotes the presence of behavior, regardless of whether the behavior is indicative of autism or of normalcy. This rule ensures that a value of 1 corresponds to a reliable observation, whereas a 0 does not necessarily indicate the absence of a symptom but possibly the failure to observe the symptom within the short window of observation.\n\nMissing Value Injection to Balance the Nonpresence of Features for the Video Screener Training Data\n\n\nContext: Methods for training video-based classifiers for autism risk assessment.", "metadata": { "doc_id": "Abbas_2018_17", "source": "Abbas_2018" } }, { "page_content": "Text: Missing Value Injection to Balance the Nonpresence of Features for the Video Screener Training Data\n\nWhile collapsing severity gradations into a single category overcomes noisy severity assessment, it does not help with the problem of a symptom not present or unnoticeable in a short home video. For this reason, it is important that the learning algorithm treat a value of 1 as semantically meaningful, and a value of 0 as inconsequential. To this end, we augmented the training set with duplicate samples that had some feature values flipped from 1 to 0 . The injection of 0 s was randomly performed with probabilities such that the sample-weighted ratio of positive to negative samples for which the value of any particular feature is 0 is about $50 \\%$. Such ratios ensure that the trees in a random forest will be much less likely to draw conclusions from the absence of a feature.\n\nCombination\n\nIt is desirable to combine the questionnaire and video screeners to achieve higher accuracy. However, the needed overlapping training set was not available. Instead, the clinical validation dataset itself was used to train the combination model.\n\nThe numerical responses of each of the parent questionnaire and video classifiers were combined using L2-regularized logistic regression, which has the advantage of reducing the concern of overfitting, particularly given the logistic model has only three free parameters. Bootstrapping and cross -validation studies showed that any overfitting that may be present from this procedure is not detectable within statistical limitations. Since each of the individual methods was siloed by age, separate combination algorithms were trained per age group silo. For each combination algorithm, optimal inconclusive output criteria were chosen using the logistic regression response, using the same techniques as for the parental questionnaire and video classifiers. 
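As an illustration of this combination step, the sketch below (not the authors' code) fits an L2-regularized logistic regression over the two classifier responses and applies an inconclusive band around the decision threshold. The synthetic scores and the band limits are assumptions for the example; in practice a separate combiner would be fit per age silo, with the band tuned to the target inconclusive rate.

```python
# Minimal sketch of combining the questionnaire and video classifier scores with
# L2-regularized logistic regression, plus an "inconclusive" band on the response.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_combiner(q_scores, v_scores, labels):
    X = np.column_stack([q_scores, v_scores])       # two inputs -> three free parameters
    return LogisticRegression(penalty="l2", C=1.0).fit(X, labels)

def combined_screen(model, q_score, v_score, band=(0.4, 0.6)):
    p = model.predict_proba([[q_score, v_score]])[0, 1]
    if band[0] <= p <= band[1]:
        return "inconclusive"
    return "positive" if p > band[1] else "negative"

# Illustrative use with synthetic clinical-validation-style responses:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
q = labels * 0.4 + rng.random(200) * 0.6            # questionnaire classifier response
v = labels * 0.5 + rng.random(200) * 0.5            # video classifier response
model = fit_combiner(q, v, labels)
print(combined_screen(model, q_score=0.8, v_score=0.7))
```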
The performance characteristics of the overall screening process compared to standard alternative screeners are shown below.\n\nRESULTS\n\n\nContext: Addressing missing data and combining questionnaire and video screeners for autism detection.", "metadata": { "doc_id": "Abbas_2018_18", "source": "Abbas_2018" } }, { "page_content": "Text: RESULTS\n\nParent Questionnaire Performance on Training Data\n\nBootstrapped cross-validation performance metrics for the optimally parameter-tuned version of each of the variants of the parental\n\nTable 4. Performance Comparisons Between Various Algorithms on Clinical Data\n\nBase model Model from this paper AUC improvement Mean recall improvement 2012 publication Questionnaire $0.07,[-0.03,0.17]$ $0.1,[0.02,0.17]$ M-CHAT Questionnaire $0.01,[-0.11,0.12]$ $0.06,[-0.04,0.17]$ CBCL Questionnaire $0.06,[-0.04,0.17]$ $0.11,[0.03,0.2]$ 2012 publication Questionnaire \\& video $0.16,[0.07,0.25]$ $0.12,[0.04,0.2]$ M-CHAT Questionnaire \\& video $0.08,[-0.03,0.19]$ $0.1,[-0.01,0.21]$ CBCL Questionnaire \\& video $0.15,[0.04,0.26]$ $0.14,[0.04,0.24]$ 2012 publication Questionnaire + inconclusive $0.16,[0.02,0.28]$ $0.09,[-0.02,0.2]$ M-CHAT Questionnaire + inconclusive $-0.01,[-0.39,0.31]$ $0.08,[-0.18,0.29]$ CBCL Questionnaire + inconclusive $0.15,[0.01,0.29]$ $0.11,[-0.02,0.24]$ 2012 publication Questionnaire \\& video + inconclusive $0.21,[0.1,0.32]$ $0.19,[0.1,0.28]$ M-CHAT Questionnaire \\& video + inconclusive $0.09,[-0.05,0.23]$ $0.15,[0.04,0.27]$ CBCL Questionnaire \\& video + inconclusive $0.2,[0.09,0.32]$ $0.2,[0.09,0.31]$ Questionnaire Questionnaire \\& video $0.09,[0.02,0.15]$ $0.03,[-0.04,0.09]$ Questionnaire Questionnaire + inconclusive $0.09,[-0.01,0.17]$ $-0.0,[-0.09,0.08]$ Questionnaire Questionnaire \\& video + inconclusive $0.14,[0.06,0.23]$ $0.09,[0.01,0.17]$ Q. and video Questionnaire \\& video + inconclusive $0.06,[0.01,0.11]$ $0.06,[0.0,0.13]$\n\n\nContext: Performance comparisons of various algorithms on clinical data, including AUC and mean recall improvements.", "metadata": { "doc_id": "Abbas_2018_19", "source": "Abbas_2018" } }, { "page_content": "Text: Each row evaluates the improvement of one of the algorithms from this paper over a \"Base model\" algorithm for the AUC metric, and for the average between the autism and the non-autism recalls at a response threshold point that achieves approximately $80 \\%$ sensitivity. Negative values would represent a worsening of performance for a given algorithm compared to the base model. Both average values of the improvements and [ $5 \\%, 95 \\%$ ] confidence intervals are reported. Algorithms that are labeled \"inconclusive\" allow up to $25 \\%$ of the most difficult samples to be discarded from the metric evaluation. Note that the M-CHAT instrument is intended for use on younger children. Therefore, older children were excluded when preforming comparisons to M-CHAT in this table. questionnaire are reported in the top of Table 3. The results for baseline variant are reported as a range rather than a single value, because the unreliability of generic feature selection leads to different sets of features selected from run to run, with varying performance results.\n\nParents of children included in the clinical study answered short, age-appropriate questions chosen using the robust feature selection method discussed above. The clinical performance metrics for each of the classification variants that build upon that feature selection scheme are shown in the bottom of Table 3. 
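For reference, the two quantities reported in Table 4 above can be computed as in the following sketch (not the authors' evaluation code): the AUC, and the mean of the autism and non-autism recalls at the ROC operating point closest to 80% sensitivity. The labels and scores here are synthetic placeholders.

```python
# Minimal sketch of the Table 4 metrics: AUC, and mean recall (sensitivity and
# specificity averaged) at a threshold giving approximately 80% sensitivity.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def auc_and_mean_recall(labels, scores, target_sensitivity=0.80):
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    i = int(np.argmin(np.abs(tpr - target_sensitivity)))  # operating point nearest 80% sensitivity
    sensitivity, specificity = tpr[i], 1.0 - fpr[i]
    return auc, (sensitivity + specificity) / 2.0

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 300)
scores = labels * 0.5 + rng.random(300) * 0.7
print(auc_and_mean_recall(labels, scores))
```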
The difference in performance between the training and validation datasets is driven by the differences that are emphasized in Table 2. See below and the results of Table 4 for a discussion of the statistical significance of these results.\n\nROC curves in Figure 1 show how our parent questionnaire classification approach outperforms some of the established screening tools like M-CHAT and CBCL on the clinical sample. Since clinical centers are usually interested in screening tools with a high sensitivity, we have drawn shaded regions between $70 \%$ and $90 \%$ sensitivity to aid the eye.\n\nCombination Screening Performance on Clinical Data\n\n\nContext: Results of a study comparing the performance of different autism screening algorithms.", "metadata": { "doc_id": "Abbas_2018_20", "source": "Abbas_2018" } }, { "page_content": "Text: Combination Screening Performance on Clinical Data\n\nROC curves in Figure 2 show how combining the questionnaire and video classifiers into a single assessment further boosted performance on the clinical study sample. When up to $25 \%$ of the most challenging cases are allowed to be determined inconclusive, the performance on the remaining cases is shown in Figure 3. Note that the ROC curves in these figures for M-CHAT contain only younger children (mostly under four years of age) because this instrument is not intended for older children. A same-sample comparison between M-CHAT and the ML screeners can be seen in the age-binned figures (Figures 4 and 5).\n\nResults for Young Children\n\nYoung children are of particular interest given the desire to identify autism as early as possible. Results restricted to children less than four years old are shown in Figures 4 and 5.\n\nStatistical Significance\n\nFor the training data, sample sizes are large enough that statistical limitations are minimal. However, results reported for the clinical data have significant statistical limitations. In this section we compare the performance of the screening algorithms on the clinical data that we have discussed in this paper: (1) the questionnaire-based algorithm of ${ }^{13}$, (2) M-CHAT, (3) CBCL, (4) the questionnaire-based algorithm of this paper, and (5) the combined questionnaire plus video algorithm of this paper. Direct comparisons in performance between many of these algorithms are reported along with statistical significances in Table 4.\n\nDISCUSSION\n\n\nContext: Results and analysis of the screening algorithms' performance on clinical data, including comparisons with existing tools and age-specific findings.", "metadata": { "doc_id": "Abbas_2018_21", "source": "Abbas_2018" } }, { "page_content": "Text: DISCUSSION\n\nWe have introduced a novel machine learning algorithm based on a parental questionnaire and another based on short home videos recorded by parents and scored by a minimally trained analyst. We have discussed pitfalls such as data sparsity, and the mixed ordinal and categorical nature of the questions in our training data. We have also identified several important confounding factors that arise from differences between the training and application settings of the algorithms. We have shown novel feature encoding, feature selection, and feature aggregation techniques to address these challenges, and have quantified their benefits. We have shown the benefits of allowing some subjects with lower certainty output from the algorithms to be classified as inconclusive.
We have also shown the benefits of combining the results of the two algorithms into a single determination.\n\nBy specializing the machine learning models on a dichotomy of age groups, we found that the screener for younger children capitalized on non-verbal behavioral features such as eye contact, gestures, and facial expressions, while the screener for older children focused more on verbal communication and interactions with other children. For more details please refer to the Supplementary Material.\n\nThe methods and resulting improvements shown in this paper are expected to translate well into other clinical science applications\n\nimg-0.jpeg\n\nFigure 1. ROC curves on the clinical sample for various questionnaire based autism screening techniques, ordered from the least to most sophisticated. Note that unlike Figures 2 through 3 and 4, 168 children are included in this sample (six children did not have videos available).\n\nimg-1.jpeg\n\nFigure 2. ROC curves on the clinical sample for the questionnaire and the video based algorithms, separately and in combination. The established screening tools MCHAT and CBCL are included as baselines.\n\nimg-2.jpeg\n\n\nContext: Discussion of novel machine learning algorithms for autism screening, including challenges, solutions, and potential applications.", "metadata": { "doc_id": "Abbas_2018_22", "source": "Abbas_2018" } }, { "page_content": "Text: img-2.jpeg\n\nFigure 3. ROC curves on the clinical sample for the questionnaire and the video based algorithms, separately and in combination. Inconclusive determination is allowed for up to $25 \\%$ of the cases. The established screening tools MCHAT and CBCL are included as baselines.\n\nimg-3.jpeg\n\nFigure 4. ROC curves on the clinical results for children under four years of age, for the questionnaire and the video based algorithms, as well as the combination. Comparisons with the established (nonmachine learning) screening tools MCHAT and CBCL are also shown.\n\nimg-4.jpeg\n\nFigure 5. ROC curves on the clinical results for children under four years of age, for the questionnaire and the video based algorithms, as well as the combination, restricted to the children who were not determined to have an inconclusive outcome (tuned to have at most $25 \\%$ allowed to be inconclusive). Comparisons with the established (nonmachine learning) screening tools MCHAT and CBCL are also shown. including screening for cognitive conditions such as dementia for the elderly and physical conditions such as concussions in adults. Further, we expect that these methods would apply well to any other survey based domain in which the application context is different from the training context.\n\nSignificant further improvements may be possible. Initial studies have identified probable improvements to the machine learning methodology as well as improved methods for handling the biases between the training data and application settings. A new clinical trial with larger sample sizes is underway that will make it possible to validate new improvements resulting from these studies as well as to improve confidence in the high performance of our algorithms.\n\nCONCLUSION\n\n\nContext: Clinical trial results and performance of machine learning algorithms for autism screening.", "metadata": { "doc_id": "Abbas_2018_23", "source": "Abbas_2018" } }, { "page_content": "Text: CONCLUSION\n\nMachine learning can play a very important role in improving the effectiveness of behavioral health screeners. 
We have achieved a significant improvement over established screening tools for autism in children as demonstrated in a multi-center clinical trial. We have also shown some important pitfalls when applying machine learning in this domain, and quantified the benefit of applying proper solutions to address them.\n\nFUNDING\n\nThis research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.\n\nCOMPETING INTERESTS\n\nAll authors are affiliated with Cognoa Inc. in an employment and/or advisory capacity.\n\nCONTRIBUTORS\n\nAll listed authors contributed to the study design as well as the drafting and revisions of the paper. All authors approve of the final version of the paper to be published and agree to be accountable for all aspects of the work.\n\nSUPPLEMENTARY MATERIAL\n\nSupplementary material is available at Journal of the American Medical Informatics Association online.\n\nREFERENCES\n\nDurkin MS, Maenner MJ, Meaney FJ. Socioeconomic inequality in the prevalence of autism spectrum disorder: evidence from a U.S. crosssectional study. PLoS One 2010; 5 (7): e11551.\n\nChristensen DL, Baio J, Braun KV, et al. Prevalence and characteristics of autism spectrum disorder among children aged 8 years—autism and developmental disabilities monitoring network, 11 sites, United States, 2012. MMWR Surveill Summ 2016; 65 (3): 1-23.\n\nZwaigenbaum L, Bryson S, Lord C, et al. Clinical assessment and management of toddlers with suspected autism spectrum disorder: insights from studies of high-risk infants. Pediatrics 2009; 123 (5): $1383-91$.\n\nBernierMao RA, Yen J. Diagnosing autism spectrum disorders in primary care. Practitioner 2011; 255 (1745): 27-30.\n\nAchenbach TM, Rescorla LA. Manual for the ASEBA School-Age Forms o Profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, \\& Families. 2001.\n\n\nContext: Concluding remarks of a research paper on improving autism screening using machine learning.", "metadata": { "doc_id": "Abbas_2018_24", "source": "Abbas_2018" } }, { "page_content": "Text: Achenbach TM, Rescorla LA. Manual for the ASEBA School-Age Forms o Profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, \\& Families. 2001.\n\nLord C, Rutter M, Le Couteur A. Autism diagnostic interview-revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord 1994; 24 (5): 659-85.\n\nLord C, Rutter M, Goode S, et al. Autism diagnostic observation schedule: a standardized observation of communicative and social behavior. J Autism Dev Disord 1989; 19 (2): 185-212.\n\nLord C, Petkova E, Hus V, et al. A multisite study of the clinical diagnosis of different autism spectrum disorders. Arch Gen Psychiatry 2012; 69 (3): $306-13$.\n\nWall DP, Dally R, Luyster R, et al. Use of artificial intelligence to shorten the behavioral diagnosis of autism. PLoS One 2012; 7 (8): e43855.\n\nDuda M, Kosmicki JA, Wall DP, et al. Testing the accuracy of an observation-based classifier for rapid detection of autism risk. Transl Psychiatry 2014; 4 (8): e424.\n\nDudaDaniels MJ, Wall DP. Clinical evaluation of a novel and mobile autism risk assessment. J Autism Dev Disord 2016; 46 (6): 1953-1961.\n\nFusaro VA, Daniels J, Duda M, et al. The potential of accelerating early detection of autism through content analysis of youtube videos. PLoS One 2014; 16;9 (4): e93533.\n\nCognoa, Inc. Palo Alto: CA. 
https://www.cognoa.com/.\n\n\nContext: List of references cited in the paper.", "metadata": { "doc_id": "Abbas_2018_25", "source": "Abbas_2018" } }, { "page_content": "Text: Behavior and interaction imaging at 9 months of age predict autism/intellectual disability in high-risk infants with West syndrome\n\nLisa Ouss ${ }^{1}$, Giuseppe Palestra ${ }^{2}$, Catherine Saint-Georges ${ }^{2,3}$, Marluce Leitgel Gille ${ }^{1}$, Mohamed Afshar ${ }^{4}$, Hugues Pellerin ${ }^{2}$, Kevin Bailly ${ }^{2}$, Mohamed Chetouani ${ }^{2}$, Laurence Robel ${ }^{1}$, Bernard Golse ${ }^{1}$, Rima Nabbout ${ }^{5}$, Isabelle Desguerre ${ }^{5}$, Mariana Guergova-Kuras ${ }^{4}$ and David Cohen ${ }^{2,3}$\n\nAbstract\n\nAutomated behavior analysis are promising tools to overcome current assessment limitations in psychiatry. At 9 months of age, we recorded 32 infants with West syndrome (WS) and 19 typically developing (TD) controls during a standardized mother-infant interaction. We computed infant hand movements (HM), speech turn taking of both partners (vocalization, pause, silences, overlap) and motherese. Then, we assessed whether multimodal social signals and interactional synchrony at 9 months could predict outcomes (autism spectrum disorder (ASD) and intellectual disability (ID)) of infants with WS at 4 years. At follow-up, 10 infants developed ASD/ID (WS+). The best machine learning reached $76.47 \\%$ accuracy classifying WS vs. TD and $81.25 \\%$ accuracy classifying WS+ vs. WS-. The 10 best features to distinguish WS+ and WS- included a combination of infant vocalizations and HM features combined with synchrony vocalization features. These data indicate that behavioral and interaction imaging was able to predict ASD/ ID in high-risk children with WS.\n\nIntroduction\n\n\nContext: Machine learning applications to predict autism/intellectual disability in high-risk infants using behavioral and interaction imaging.", "metadata": { "doc_id": "22_Ouss_ASD_0", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Introduction\n\nBehavior and interaction imaging is a promising domain of affective computing to explore psychiatric conditions ${ }^{1-3}$. Regarding child psychiatry, many researchers have attempted to identify reliable indicators of neurodevelopmental disorders (NDD) in high-risk populations (e.g., siblings of children with autism) during the first year of life to recommend early interventions ${ }^{4,5}$. However, social signals and any alterations of them are very difficult to identify at such a young age ${ }^{6}$. In addition, exploring the quality and dynamics of early interactions is a complex endeavor. It usually requires (i) the perception and integration of multimodal social signals and (ii) an understanding of how two\n\n[^0]interactive partners synchronize and proceed in turn taking ${ }^{7,8}$. Affective computing offers the possibility to simultaneously analyze the interaction of several partners while considering the multimodal nature and dynamics of social signals and behaviors ${ }^{9}$. To date, few seminal studies have attempted to apply social signal processing to mother-infant interactions with or without a specific condition, and these studies have focused on speech turns (e.g., Jaffe et al. ${ }^{10}$ ), motherese ${ }^{11}$, head movements ${ }^{12}$, hand movements ${ }^{13}$, movement kinematics ${ }^{2}$, and facial expressions ${ }^{3}$. 
Here, we focused on West syndrome (WS), a rare epileptic encephalopathy with early onset (before age 1 year) and a high risk of NDD outcomes, including one-third of WS children showing later autism spectrum disorder (ASD) and/or intellectual disability (ID). We recruited 32 infants with WS and 19 typically developing (TD) controls to participate in a standardized early mother-infant\n\n\nContext: Application of affective computing to analyze mother-infant interactions, specifically focusing on West syndrome (WS) and its potential link to autism spectrum disorder (ASD) and intellectual disability (ID).", "metadata": { "doc_id": "22_Ouss_ASD_1", "source": "22_Ouss_ASD" } }, { "page_content": "Text: [^1] [^0]: Correspondence: Lisa Ouss (lisa.ouss@aphp.fr) or David Cohen (david.cohen@aphp.fr) ${ }^{1}$ Service de Psychiatrie de l'Enfant, AP-HP, Hôpital Necker, 149 rue de Sèvres, 75015 Paris, France ${ }^{2}$ Institut des Systèmes Intelligents et de Robotique, CNRS, UMR 7222, Sorbonne Université, 4 Place Jussieu, 75252 Paris Cedex, France Full list of author information is available at the end of the article\n\n[^1]: (c) The Author(s) 2020\n\nOpen Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.\n\ninteraction protocol and followed infants with WS to assess outcomes at 4 years of age. We aim to explore whether multimodal social signals and interpersonal synchrony of infant-mother interactions at 9 months could predict outcomes.\n\nMaterials and methods\n\nDesign, participants, and clinical measures\n\n\nContext: Introduction to a research article investigating the prediction of outcomes for infants with Williams syndrome using multimodal social signals and interpersonal synchrony analysis of infant-mother interactions.", "metadata": { "doc_id": "22_Ouss_ASD_2", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Materials and methods\n\nDesign, participants, and clinical measures\n\nWe performed a prospective follow-up study of infants with WS ${ }^{14}$. The Institutional Review Board (Comité de Protection des Personnes from the Groupe-Hospitalier Necker Enfants Malades) approved the study, and both parents gave written informed consent after they received verbal and written information on the study. They were asked to participate to a follow-up study to assess outcome of WS taking into account development, early interaction, genetics and response to pharmacological treatment ${ }^{14}$. The study was conducted from November 2004 to March 2010 in the Neuro-pediatrics Department Center for Rare Epilepsia of Necker Enfants-Malades Hospital, Paris. Of the 41 patients screened during the study period, we enrolled all but two cases $(N=39)$ with WS. 
Seven patients dropped out before the age of 3 leading to a sample of 32 patients with detailed follow-up data. Typical developing infants $(N=19)$ were recruited from Maternal and Infant Prevention institutions, in pediatric consultations, or by proxy.\n\n\nContext: Study design and participant recruitment for a prospective follow-up study of infants with West syndrome (WS) and a comparison group of typically developing infants.", "metadata": { "doc_id": "22_Ouss_ASD_3", "source": "22_Ouss_ASD" } }, { "page_content": "Text: To assess neurodevelopmental outcomes, we focused on ID and ASD. ID was assessed through the Brunet-Lézine Developmental Examination, performed for all children at the age of 3 years. The Brunet-Lézine Developmental Examination estimates a developmental quotient (DQ) based upon normative data available for 3-year-old French toddlers ${ }^{15}$. The diagnosis of autism was based upon several measurements and an expert assessment that was blind to other variables: (i) At 3 years of age, all parents completed the Autism Diagnostic InterviewRevised (ADI-R) to assess autism signs by dimensions and developmental delay ${ }^{16}$. (ii) At 2 and 3 years of age, all patients were assessed with the Children's Autism Rating Scale (CARS) ${ }^{17}$. (iii) An expert clinician (LR) who was blind to child history assessed autism and ID from 20-min videotapes of child/mother play at 2 years of age. Finally, diagnoses of ASD and/or ID at age 4 were based upon a consensus approach using direct assessment of the child by a clinician with expertise in autism (LO) as well as by clinical information from the CARS, ADI-R, and DQ.\n\nVideo recordings\n\nInfant-mother interactions were assessed between 9 and 12 months of age during a play session (Fig. 1). Two synchronized cameras (face and profile; Fig. S1A) recorded the movements in two dimensions while the infant was sitting in a baby chair. Audio interactions were also\n\nimg-0.jpeg\n\nFig. 1 Pipeline of our machine learning approach to classify WS vs. TD.\n\n\nContext: Methods section describing the assessment of intellectual disability and autism spectrum disorder in a study of infant-mother interactions.", "metadata": { "doc_id": "22_Ouss_ASD_4", "source": "22_Ouss_ASD" } }, { "page_content": "Text: img-0.jpeg\n\nFig. 1 Pipeline of our machine learning approach to classify WS vs. TD.\n\nrecorded. The standardized situation encompassed three sequences of 3 min : (sequence 1) free play after instructing the mother to interact \"as usual\" without any toy; (sequence 2) free play using the help of a toy (Sophie the giraffe); (sequence 3) mother singing to her baby. Due to the position of the baby chair on the floor and the mother's seated position, the mother was positioned slightly higher in all of the recordings. The mother's indicated position was on the left of the child as shown on the picture, but exceptions were sometimes observed during the recordings. For infant hand movement (HM) features, 1 min was extracted from each 3-min video and all recordings, according to two criteria: the child's hands should be visible for at least part of the sequence (e.g., the mother is not leaning on the child), and the minute represented the greatest amount of interaction between the mother and the child. For audio and speech turntaking computing, we only used the 3-min audio recording of sequence 1.\n\nVision computing (Fig. 
S1B, vision computing panel)\n\n\nContext: A description of the methodology used to collect data for a machine learning approach to classify West Syndrome (WS) versus typical development (TD).", "metadata": { "doc_id": "22_Ouss_ASD_5", "source": "22_Ouss_ASD" } }, { "page_content": "Text: To process infant hand movements (HM), we used the methods developed in Ouss et al. ${ }^{13}$. Here, we summarize the successive steps to calculate HM features. In step 1 (hand trajectory extraction and data processing), the twodimensional coordinates of the hand were extracted from each of the video recordings by tracking a wristband on the right hand (yellow in Fig. S1A, video-audio recording panel). The tracking framework comprised three steps: prediction, observation, and estimation as proposed in ref. ${ }^{18}$. As the hand motion was highly nonlinear, we developed an approach using a bootstrap-based particle filter with a first-order model to address abrupt changes in direction and speed ${ }^{19,20}$. To address hand occlusion, we implemented an approach combining tracking with detection by adding a boolean variable to the state vector associated with each particle ${ }^{18}$. Each extracted trajectory consisted of 1500 pairs of $x$ and $y$ coordinates ( 25 frames per second, generating 1500 pairs of coordinates in the 60 s ; see Fig. S1 left panel, vision computing). The frames where the hand was not visible were clearly indicated in each trajectory as missing coordinates for these time points. To account for differences in the camera zoom parameters, the trajectories obtained were normalized using a fixed reference system present in the settings of each video recording. The normalization was performed on all trajectories, and $95 \\%$ of the normalization factors ranged between 0.8 and 1.22 with a few outlier trajectories that required greater correction. Forty-one percent of the trajectories required $<5 \\%$ correction. Although the recordings between the two cameras were synchronized and in principle allowed 3D reconstruction of the trajectory, the accumulation of missing data prevented such reconstruction. However, 2D motion capture with appropriately defined movement descriptors can be powerful for detecting clinically relevant changes ${ }^{21}$, thereby justifying the independent analysis of the\n\n\nContext: Methods for analyzing infant hand movements using video recordings and particle filters.", "metadata": { "doc_id": "22_Ouss_ASD_6", "source": "22_Ouss_ASD" } }, { "page_content": "Text: However, 2D motion capture with appropriately defined movement descriptors can be powerful for detecting clinically relevant changes ${ }^{21}$, thereby justifying the independent analysis of the 2D-trajectory videos (see Fig. S1B, vision computing, 2 d panel on the left). In step 2, the descriptors of the HM were calculated from the planar trajectories (Fig. S1B, table shown in the vision computing panel). Descriptors covered those already reported in the literature as important in characterizing infants' $\\mathrm{HM}^{21}$. (1) To describe the space explored by the hand, we calculated the maximum distance observed on the two axes (xRange, yRange) and the standard deviation of the X and Y coordinates observed during the 60 s (xSd, ySd). We also calculated the maximum distance between any two points of the trajectory using the FarthestPair java library (http://algs4.cs. princeton.edu/code/) (Fig. S1B, vision computing panel, red line in the third panel from the left). 
(2) To evaluate HM dynamics, we calculated the velocity and acceleration. (3) Also related to HM dynamics, we calculated HM pauses defined as part of the trajectory in which the velocity was lower than a specific threshold for a minimum duration of 4 s . (4) Finally, the curvature of the trajectories was calculated using a standard definition of the curvature $(\\kappa)$ of plane curves in Cartesian coordinates as $\\gamma(t)=(x(t), y(t))$. The curvature calculated at each point of the trajectory is presented in the right panel of Fig. S1B (video computing), where the first 1.2 s of the trajectory are plotted and the associated calculated curvatures at each point (and respective time, indicated on the axis) are presented as columns.\n\n\nContext: The document describes a study using video analysis to detect early signs of autism in infants, and this chunk details the specific methods used to analyze 2D hand movements (hand movements, or HM) extracted from video.", "metadata": { "doc_id": "22_Ouss_ASD_7", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Audio computing (Fig. S1C, audio computing)\n\nWe extracted two types of audio social signals from the audio channel of the mother-infant interaction: speech turn taking (STT) and motherese. For STT extraction, we followed the methods developed by Weisman et al. ${ }^{22}$ and Bourvis et al. ${ }^{23}$ (Fig. S1, audio computing). First, we used ELAN to segment the infants' and mothers' speech turns and annotate the dialog acts. Mothers' audio interactions were categorized as mother vocalization (meaningful vocalizations, laugh, singing, animal sounds) or other noise (clap hands, snap fingers or snap the tongue, mouth noise, etc.). Similarly, infants' audio production was defined as infant vocalization (babbling vocalizations, laugh, and cry) or atypical vocalization (other noise such as \"rale\"). The infant's and mother's utterances were labeled by two annotators (blind to group status). Cohen's kappa between the two annotators was calculated for each dyad, each task and each item of the grid. For all items, the kappa values were between 0.82 and 1. From the annotation, we extracted all the speech turns of the infant and the mother. A speech turn is a continuous stream of speech with $<150 \\mathrm{~ms}$ of silence. We\n\n\nContext: Methods for analyzing audio signals from mother-infant interactions.", "metadata": { "doc_id": "22_Ouss_ASD_8", "source": "22_Ouss_ASD" } }, { "page_content": "Text: obtained a list of triples: speaker label (infant or mother), start time, and duration of speech turn. From these triples, we also deduced the start time and duration of the time segments when the mother or the infant were not speaking (pauses). Therefore, we extracted Mother Vocalizations; Mother Other Noise; Infant Vocalizations; Infant Atypical Vocalizations; Mother Pauses; Infant Pauses. We also extracted three dyadic features: (1) Silence defined as sequences of time during which neither participant was speaking for more than 150 ms ; (2) Overlap Ratio defined as the duration of vocalization overlaps between mothers and infants divided by the duration of the total interaction. This ratio measures the proportion of interactional time in which both participants were simultaneously vocalizing; (3) Infant Synchrony Ratio defined as the number of infants' responses to their mother's vocalization within a time limit of 3 s divided by the number of mother vocalizations during the time paradigm. 
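Purely as an illustration (not the authors' pipeline), the dyadic quantities defined above can be derived from the annotated (speaker, start, duration) triples roughly as follows. The example turns are invented, and the 150 ms minimum-duration criterion for silences is ignored in this simplified sketch.

```python
# Minimal sketch of dyadic speech-turn features: overlap ratio, total silence,
# and the infant synchrony ratio with a 3-second response window.
def interval_union(intervals):
    """Total length covered by a set of (start, end) intervals."""
    total, cur = 0.0, None
    for start, end in sorted(intervals):
        if cur is None or start > cur[1]:
            if cur is not None:
                total += cur[1] - cur[0]
            cur = [start, end]
        else:
            cur[1] = max(cur[1], end)
    return total + (cur[1] - cur[0] if cur else 0.0)

def dyadic_features(turns, total_duration, response_window=3.0):
    """turns: list of (speaker, start, duration), speaker in {'infant', 'mother'}."""
    mother = [(t, t + d) for s, t, d in turns if s == "mother"]
    infant = [(t, t + d) for s, t, d in turns if s == "infant"]
    overlap = sum(max(0.0, min(m1, i1) - max(m0, i0))
                  for m0, m1 in mother for i0, i1 in infant)
    silence = total_duration - interval_union(mother + infant)  # ignores the 150 ms minimum
    answered = sum(any(m1 <= i0 <= m1 + response_window for i0, _ in infant)
                   for _, m1 in mother)
    return {"overlap_ratio": overlap / total_duration,
            "silence_time": silence,
            "infant_synchrony_ratio": answered / len(mother) if mother else 0.0}

# Invented example: two mother turns and two infant turns in a 10 s window.
turns = [("mother", 0.0, 2.0), ("infant", 2.5, 1.0), ("mother", 5.0, 1.5), ("infant", 5.5, 0.5)]
print(dyadic_features(turns, total_duration=10.0))
```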
The 3 -s window was based on the available literature on synchrony ${ }^{7,24}$. From the mother vocalizations, we also computed affective speech analysis, as previous work has shown that motherese may shape parent-infant interactions ${ }^{25}$. The segments of mother vocalizations were analyzed using a computerized classifier for categorization as \"motherese\" or \"non-motherese/other speech\" initially developed to analyze home movies ${ }^{11}$. The system exploits the fusion of two classifiers, namely, segmental and suprasegmental ${ }^{26}$. Consequently, the utterances are characterized by both segmental (Mel frequency cepstrum coefficients) and suprasegmental/prosodics (e.g., statistics with regard to fundamental frequency, energy, and duration) features. The detector used the GMM (Gaussian mixture model) classifier for both segmental and suprasegmental features ( $M$, number of Gaussians for the GMM Classifier: $M=12$ and 15 , respectively, and $\\lambda=$ weighting coefficient used in the equation fusion:\n\n\nContext: Analysis of vocal and non-vocal behaviors during parent-infant interactions, including feature extraction of vocalizations, pauses, and synchrony.", "metadata": { "doc_id": "22_Ouss_ASD_9", "source": "22_Ouss_ASD" } }, { "page_content": "Text: for both segmental and suprasegmental features ( $M$, number of Gaussians for the GMM Classifier: $M=12$ and 15 , respectively, and $\\lambda=$ weighting coefficient used in the equation fusion: $\\lambda=0.4$ ). For the purpose of the current study, we explored the performance of our motherese classifier in French mothers. We analyzed 200 sequences from French mothers ( 100 motherese vs. 100 other speech) that were blindly validated by two psycholinguists. We calculated the Intraclass correlation (ICC) between the two raters (the expert and the algorithm) and found a good and very significant ICC (ICC $=$ $0.79(95 \\%$ CI: $0.59-0.90), p<0.001$ ). This level of prediction made it suitable for further analysis of the entire data set. Based on this automatic detection of motherese, we created two subclasses for mother vocalizations: motherese vs. non-motherese. Two variables were derived: Motherese Ratio (duration of motherese vocalization/ duration of interaction) and Non-motherese Ratio (duration of non-motherese vocalization/duration of interaction). We also derived two synchrony ratios: Synchrony\n\n\nContext: Evaluation of a motherese classifier in French mothers, including validation and derivation of variables like Motherese Ratio and Synchrony.", "metadata": { "doc_id": "22_Ouss_ASD_10", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Motherese Ratio and Synchrony Non-motherese Ratio, which reflect the ratio of time during which the infant vocalizes in response to his/her mother motherese and other speech (non-motherese).\n\nPrediction of the outcome using machine learning\n\nThe pipeline of our approach is shown in Fig. 1. First, a data quality analysis was performed to ensure the validity of the data. As expected, all data were available for audio analysis. However, a substantial proportion of the data were discarded due to video recording or vision computing issues. We finally kept 18 video recordings for the WS and 17 videos for the TD groups. Second, given the number of features ( 21 infant HM for each camera and each sequence; 16 STT) compared with the data set ( 32 WS and 19 TD), we reduced our data using principal component analysis (PCA). Third, we tested several algorithms to classify WS vs. 
TD based on the whole data set available for both vision and audio computing features (leave one out) (Table S1). The best algorithm was decision stump ${ }^{27}$. All results presented here are based on the classification with a decision stump algorithm. We also analyzed WS with ID/ASD (WS+) vs. WS without ID/ ASD (WS-). For each classification, we also extracted a confusion matrix and explored which individual features contributed the most to a given classification using Pearson correlations.\n\nResults\n\nTable S2 summarizes the demographic and clinical characteristics of children with WS. At follow-up, 10 infants out of 32 children with WS developed ASD/ID (WS+). Eight children had ASD and ID, whereas 2 had only ID. As expected, all variables related to ASD and ID were significantly different in WS+ compared with WS-.\n\n\nContext: Machine learning classification of Williams Syndrome (WS) infants predicting Autism Spectrum Disorder (ASD)/Intellectual Disability (ID) outcomes, using audio and video features.", "metadata": { "doc_id": "22_Ouss_ASD_11", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Figure 2a summarizes the best classification models using the decision stump algorithm (leave one out). As shown, multimodal classification outperformed unimodal classification to distinguish WS and TD. Therefore, we only used the multimodal approach to classify WS+ vs. WS-. The best model reached $76.47 \\%$ accuracy classifying WS vs. TD and $81.25 \\%$ accuracy classifying WS+ vs. WS- based on multimodal features extracted during early interactions. Interestingly, the confusion matrices (Fig. 2b) show that when classifying WS vs. TD, all errors came from TD being misclassified as WS $(N=12)$; when classifying WS+ vs. WS-, most errors came from WS+ being misclassified as WS- $(N=5)$. Table 1 lists the best features for each multimodal classification based on the Pearson correlation values. The best features to distinguish WS and TD included four infant HM features, 1 mother audio feature. In contrast, the best features to distinguish WS+ and WS- included a combination of infant vocalization features $(N=2)$,\n\nimg-1.jpeg\n\nFig. 2 Machine learning classification of WS vs. TD and WS+ vs. WS- based on uni- and multimodal features extracted during early infant-mother interaction.\n\nTable 1 Best features for classification (based on significant Pearson's correlation between feature and class).\n\n\nContext: Machine learning classification results of infant-mother interactions to distinguish between typically developing (TD) infants and those at risk for autism (WS), further differentiating those exhibiting early signs (WS+) versus those without (WS-).", "metadata": { "doc_id": "22_Ouss_ASD_12", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Table 1 Best features for classification (based on significant Pearson's correlation between feature and class).\n\nFeature characteristics Pearson $\\boldsymbol{r}$ $\\boldsymbol{p}$-value West vs. Typical developing Ratio of all maternal audio intervention during free interaction Audio, mother 0.35 0.012 Total number of infant HM pauses (side view camera) during free interaction Video, infant 0.34 0.014 Total number of infant HM pauses (side view camera) when the mother is singing Video, infant 0.32 0.023 Vertical amplitude of the giraffe (front view camera) Video, infant -0.30 0.032 Movement acceleration max (side view camera) during free interaction Video, infant 0.29 0.034 West with ASD/ID vs. 
West without ASD/ID Total number of all infant vocalization during free interaction Audio, infant -0.56 $<0.001$ Synchrony ratio (infant response to mother) Audio, synchrony -0.55 $<0.001$ Ratio of all infant vocalization during free interaction Audio, infant -0.55 0.001 Motherese synchrony ratio (infant response to motherese) Audio, synchrony -0.54 0.002 Non-motherese synchrony ratio (infant response to non-motherese) Audio, synchrony -0.48 0.005 HM acceleration SD (front view camera) during the giraffe interaction Video, infant -0.46 0.008 HM acceleration max (side view camera) during the giraffe interaction Video, infant -0.45 0.01 HM velocity SD (front view camera) during the giraffe interaction Video, infant -0.43 0.014 Curvature max (side view camera) during the giraffe interaction Video, infant -0.37 0.039 Relative time spent motionless (pause) (front view camera) during free interaction Video, infant 0.36 0.04\n\nHM hand movement, ASD autism spectrum disorder, ID intellectual disability, SD standard deviation.\n\nsynchrony vocalization features $(N=3)$ and infant HM features $(N=5)$, the last of which showed lower correlation scores.\n\nDiscussion\n\n\nContext: A table presenting the best features for classifying infants based on Pearson correlation analysis, followed by a discussion of vocalization and hand movement features.", "metadata": { "doc_id": "22_Ouss_ASD_13", "source": "22_Ouss_ASD" } }, { "page_content": "Text: synchrony vocalization features $(N=3)$ and infant HM features $(N=5)$, the last of which showed lower correlation scores.\n\nDiscussion\n\nTo the best of our knowledge, this is the first study to apply multimodal social signal processing to mother-infant interactions in the context of WS. Combining speech turns and infant HM during an infant-mother interaction at 9 months significantly predicted the development of ASD or severe to moderate ID at 4 years of age in the high-risk children with WS. Confusion matrices showed that the classification errors were not random, enhancing the interest of the computational method proposed here. In addition, the best contributing features for the performed classifications differed when classifying WS vs. TD and WS+ vs. WS-. Infant HMs were the most significant features to distinguish WS versus TD, probably reflecting the motor impact due to acute WS encephalopathy. For classifying WS+ vs. WS-, the contribution of infant audio features and synchrony features became much more relevant combined with several HM features.\n\n\nContext: A study using multimodal social signal processing to predict ASD or ID in high-risk children with West syndrome (WS).", "metadata": { "doc_id": "22_Ouss_ASD_14", "source": "22_Ouss_ASD" } }, { "page_content": "Text: We believe that the importance of synchrony and reciprocity during early interactions is in line with recent studies that have investigated the risk of ASD or NDD during the first year of life from home movies (e.g., refs. ${ }^{11,24}$ ), from prospective follow-up of high-risk infants such as siblings (e.g., refs. ${ }^{4,28}$ ) or infants with WS (e.g., ref. ${ }^{14}$ ), and from prospective studies assessing tools to screen risk for autism (e.g., ref. ${ }^{29}$ ). In the field of ASD, synchrony, reciprocity, parental sensitivity, and emotional engagement are now proposed as targets of early interventions ${ }^{30}$, which could prevent early interactive vicious circles. 
Parents of at-risk infants try to compensate for the lack of interactivity of their child by modifying their stimulation, thereby sometimes reinforcing the dysfunctional interactions ${ }^{24}$. Early identification of these interactive targets is especially useful among babies with neurological comorbidities because delays in developmental milestones and impairments in early social interactions are not sufficient to predict ASD.\n\nSimilarly, we believe that the importance of HM in distinguishing WS vs. TD on the one hand, and WS+ vs. WS- on the other, is also in line with the studies that investigated the importance of non-social behaviors for assessing the risk of ASD or NDD during the first year of life. For example, studying home movies, Purpura et al. found more bilateral HM and finger movements in infants who later developed ASD ${ }^{31}$. Similarly, several prospective follow-up studies of high-risk siblings ${ }^{32-35}$ or retrospective studies on home movies ${ }^{36,37}$ reported a specific atypical motor repertoire in infants with ASD.\n\n\nContext: Early identification of interactive targets and atypical motor behaviors in infants at risk for ASD or NDD, drawing on research from home movies, sibling studies, and screening tools.", "metadata": { "doc_id": "22_Ouss_ASD_15", "source": "22_Ouss_ASD" } }, { "page_content": "Text: In ASD, early social signals have previously been assessed with automatized and computational procedures, focusing on eye tracking at early stages ${ }^{38-40}$, vocal productions ${ }^{41}$, and analysis of the acoustics of first utterances or cry episodes ${ }^{42}$, but none was done in an interactive setting. Our study proposed a paradigm shift from the assessment of infant behavior to the dyadic assessment of interactions, as previously achieved in retrospective approaches using home movies ${ }^{24}$. The aim is not to implement studies of social signal processing in routine clinical work but rather to decompose clinical intuitions and signs and validate the most relevant cues of these clinical features. From clinical work, and back to the clinic, social signal processing is a rigorous step to help clinicians better identify and assess early targets of interventions.\n\nGiven the exploratory nature of both our approach and method, our results should be interpreted with caution, taking into account strengths (prospective follow-up, automatized multimodal social signal processing, and ecological standardized assessment) and limitations. These limitations include (1) the overall sample size, knowing that WS is a rare disease; (2) the high rate of missing data during video recording due to the ecological conditions of the infant-mother interaction (mothers interposing between the camera and the infant); and (3) the final sample size of WS+ $(N=10)$, which limited the power of the machine learning methods.\n\nWe conclude that the method proposed here, combining multimodal automatized assessment of social signal processing during early interaction with infants at risk for NDD, is a promising tool to decipher clinical features that remain difficult to identify and assess.
In the context of WS, we showed that this method, which we propose to label 'behavioral and interaction imaging', was able to significantly predict the development of ASD or ID at 4 years of age in high-risk children who had WS and were assessed at 9 months of age.\n\nAcknowledgements\n\n\nContext: A study investigating automated social signal processing during early infant-mother interactions to predict autism spectrum disorder (ASD) and intellectual disability (ID) in children with West syndrome (WS).", "metadata": { "doc_id": "22_Ouss_ASD_16", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Acknowledgements\n\nThe authors thank all of the patients and families who participated in this study. The study was funded by the EAD5 foundation (PILE), by the Agence Nationale de la Recherche (ANR-12-SAMA-006-1) and the Groupement de Recherche en Psychiatrie (GDR-3557). It was partially performed in the Labex SMART (ANR-11-LABX-65), which is supported by French state funds and managed by the ANR in the Investissements d'Avenir program under reference ANR-11-IDEX-0004-02. The sponsors had no involvement in the study design, data analysis, or interpretation of the results.\n\nAuthor details\n\n${ }^{1}$ Service de Psychiatrie de l'Enfant, AP-HP, Hôpital Necker, 149 rue de Sèvres, 75015 Paris, France. ${ }^{2}$ Institut des Systèmes Intelligents et de Robotique, CNRS, UMR 7222, Sorbonne Université, 4 Place Jussieu, 75252 Paris Cedex, France. ${ }^{3}$ Département de Psychiatrie de l'Enfant et de l'Adolescent, AP-HP, Hôpital Pitié-Salpêtrière, 47-83, Boulevard de l'Hôpital, 75651 Paris Cedex 13, France. ${ }^{4}$ Ariana Pharmaceuticals, Research Department, Paris, France. ${ }^{5}$ Service de Neuropédiatrie, AP-HP, Hôpital Necker, 136, Rue de Vaugirard, 75015 Paris, France\n\nConflict of interest\n\nThe authors declare that they have no conflict of interest.\n\nPublisher's note\n\nSpringer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.\n\nSupplementary Information accompanies this paper at https://doi.org/10.1038/s41398-020-0743-8.\n\nReceived: 7 December 2019 Revised: 13 January 2020 Accepted: 16 January 2020 Published online: 03 February 2020\n\nReferences\n\nSpodenkewicz, M. et al. Distinguish self- and hetero-perceived stress through behavioral imaging and physiological features. Prog. Neuropsychopharmacol. Biol. Psychiatry 82, 107-114 (2018).\n\nLeclere, C. et al. Interaction and behaviour imaging: a novel method to measure mother-infant interaction using video 3D reconstruction. Transl. Psychiatry 6, e816 (2016).\n\n\nContext: This chunk details acknowledgements, funding sources, author affiliations, conflict of interest declarations, and references for a research paper on interaction and behavior imaging of mother-infant interaction.", "metadata": { "doc_id": "22_Ouss_ASD_17", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Leclere, C. et al. Interaction and behaviour imaging: a novel method to measure mother-infant interaction using video 3D reconstruction. Transl. Psychiatry 6, e816 (2016).\n\nMessinger, D. S., Mahoor, M. H., Chow, S. M. \\& Cohn, J. F. Automated measurement of facial expression in infant-mother interaction: a pilot study. Infancy 14, 285-305 (2009).\n\nWan, M. W. et al. Parent-infant interaction in infant siblings at risk of autism. Res Dev. Disabil. 33, 924-932 (2012).\n\nRogers, S. J. et al.
Autism treatment in the first year of life: a pilot study of infant start, a parent-implemented intervention for symptomatic infants. J. Autism Dev. Disord. 44, 2981-2995 (2014).\n\nZwaigenbaum, L., Bryson, S. \\& Garon, N. Early identification of autism spectrum disorders. Behav. Brain Res 251, 133-146 (2013).\n\nFeldman, R. Parent-infant synchrony and the construction of shared timing: physiological precursors, developmental outcomes, and risk conditions. J. Child Psychol. Psychiatry 48, 329-354 (2007).\n\nDelaherche, E. et al. Interpersonal synchrony: a survey of evaluation methods across disciplines. IEEE Trans. Affect Comput 3, 349-365 (2012).\n\nVinciarelli, A., Pantic, M. \\& Bourlard, H. Social signal processing: survey of an emerging domain. Image Vis. Comput 27, 1743-1759 (2009).\n\nJaffe, J., Beebe, B., Feldstein, S., Crown, C. L. \\& Jasnow, M. D. Rhythms of dialogue in infancy: coordinated timing in development. Monogr. Soc. Res Child Dev. 66, 1-132 (2001).\n\nCohen, D. et al. Do parentese prosody and fathers' involvement in interacting facilitate social interaction in infants who later develop autism? PLoS ONE 8, e61402 (2013).\n\nHammal, Z., Cohn, J. F. \\& Messinger, D. S. Head movement dynamics during play and perturbed mother-infant interaction. IEEE Trans. Affect Comput. 6, 361-370 (2015).\n\nOuss, L. et al. Developmental trajectories of hand movements in typical infants and those at risk of developmental disorders: an observational study of kinematics during the first year of life. Front Psychol. 9, 83 (2018).\n\n\nContext: Methods for analyzing mother-infant interaction using video and automated analysis.", "metadata": { "doc_id": "22_Ouss_ASD_18", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Ouss, L. et al. Taking into account infant's engagement and emotion during early interactions may help to determine the risk of autism or intellectual disability in infants with West syndrome. Eur. Child Adolesc. Psychiatry 23, 143-149 (2014).\n\nJosse, D. Le marssel BLR-C, Brunet-Lézine Révisé: Echelle de Developpement Psychomoteur de la Première Enfance (EAP, Paris, 1997).\n\nLord, C., Rutter, M. \\& Le Couteur, A. Autism diagnostic interview-revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J. Autism Dev. Disord. 24, 659-685 (1994).\n\nSchopler, E., Reichler, R. J., DeVellis, R. F. \\& Daly, K. Toward objective classification of childhood autism: Childhood Autism Rating Scale (CARS). J. Autism Dev. Disord. 10, 91-103 (1980).\n\nCzyz, J., Ristic, B. \\& Macq, B. A color-based particle filter for joint detection and tracking of multiple objects. In Proceedings (ICASSP '05) IEEE International\n\n\nContext: References cited in a research article on infant engagement and autism risk.", "metadata": { "doc_id": "22_Ouss_ASD_19", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Conference on Acoustics, Speech, and Signal Processing (IEEE, Philadelphia, PA, 2005). 19. Hue, C. Méthodes Séquentielles de Monte Carlo pour le Filtrage non Linéaire Multi-Objets dans un Environnement Bruité: Applications au Pistage Multi-Cibles et à la Trajectographie d'Entités dans des Séquences d'Images 2D. PhD Thesis, Université de Rennes I, Rennes, France (2003). 20. Isard, M. \\& Blake, A. Condensation-conditional density propagation for visual tracking. Int J. Comput Vis. 29, 5-28 (1998). 21. Marcroft, C., Khan, A., Embleton, N. D., Trerell, M. \\& Plotz, T. 
Movement recognition technology as a method of assessing spontaneous general movements in high risk infants. Front Neurol. 5, 284 (2014). 22. Weisman, O. et al. Dynamics of non-verbal vocalizations and hormones during father-infant interaction. IEEE Trans. Affect Comput 7, 337-345 (2016). 23. Bounis, N. et al. Pre-linguistic infants employ complex communicative loops to engage mothers in social exchanges and repair interaction ruptures. R. Soc. Open Sci. 5, 170274 (2018). 24. Saint-Georges, C. et al. Do parents recognize autistic deviant behavior long before diagnosis? Taking into account interaction using computational methods. PLoS ONE 6, e22393 (2011). 25. Saint-Georges, C. et al. Motherese in interaction at the cross-road of emotion and cognition? (A systematic review). PLoS ONE 8, e78103 (2013). 26. Mahdhaoui, A. et al. Computerized home video detection for motherese may help to study impaired interaction between infants who become autistic and their parents. Int J. Methods Psychiatr. Res. 20, e6-e18 (2011). 27. Iba, W. \\& Langley, P. Induction of one-level decision trees, in Machine Learning: Proceedings of the Ninth International Workshop (eds Sleeman, D. \\& Edwards, P.) 233-240 (Morgan Kaufmann, San Mateo, CA, 1992). 28. Wan, M. W. et al. Quality of interaction between at-risk infants and caregiver at 12-15 months is associated with 3-year autism outcome. J. Child Psychol. Psychiatry 54, 763-771 (2013). 29. Oliac, B. et al. Infant and dyadic\n\n\nContext: A list of references cited in a document concerning autism research and related technologies.", "metadata": { "doc_id": "22_Ouss_ASD_20", "source": "22_Ouss_ASD" } }, { "page_content": "Text: of interaction between at-risk infants and caregiver at 12-15 months is associated with 3-year autism outcome. J. Child Psychol. Psychiatry 54, 763-771 (2013). 29. Oliac, B. et al. Infant and dyadic assessment in early community-based screening for autism spectrum disorder with the PREAUT grid. PLoS ONE 12, e0188831 (2017). 30. Green, J. et al. Parent-mediated intervention versus no intervention for infants at high risk of autism: a parallel, single-blind, randomised trial. Lancet Psychiatry 2, 133-140 (2015). 31. Purpura, G. et al. Bilateral patterns of repetitive movements in 6- to 12-month-old infants with autism spectrum disorders. Front Psychol. 8, e1168 (2017). 32. Loh, A. et al. Stereotyped motor behaviors associated with autism in high-risk infants: a pilot videotape analysis of a sibling sample. J. Autism Dev. Disord. 37, 25-36 (2007). 33. Morgan, L., Wetherby, A. M. \\& Barber, A. Repetitive and stereotyped movements in children with autism spectrum disorders late in the second year of life. J. Child Psychol. Psychiatry 49, 826-837 (2008). 34. Elison, J. T. et al. Repetitive behavior in 12-month-olds later classified with autism spectrum disorder. J. Am. Acad. Child Adolesc. Psychiatry 53, 1216-1224 (2014). 35. Wolff, J. J. et al. Longitudinal patterns of repetitive behavior in toddlers with autism. J. Child Psychol. Psychiatry 55, 945-953 (2014). 36. Phagava, H. et al. General movements in infants with autism spectrum disorders. Georgian Med. N. 156, 100-105 (2008). 37. Libertus, K., Sheperd, K. A., Ross, S. W. \\& Landa, R. J. Limited fine motor and grasping skills in 6-month-old infants at high risk for autism. Child Dev. 85, 2218-2231 (2014). 38. Bedford, R. et al. Precursors to social and communication difficulties in infants at-risk for autism: gaze following and attentional engagement. J. Autism Dev. Disord. 
42, 2208-2218 (2012). 39. Elsabbagh, M. et al. What you see is what you get: contextual modulation of face scanning in typical and atypical development. Soc. Cogn. Affect Neurosci. 9, 538-543\n\nContext: Studies examining early indicators and interventions for autism spectrum disorder.", "metadata": { "doc_id": "22_Ouss_ASD_21", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Disord. 42, 2208-2218 (2012). 39. Elsabbagh, M. et al. What you see is what you get: contextual modulation of face scanning in typical and atypical development. Soc. Cogn. Affect Neurosci. 9, 538-543 (2014). 40. Jones, W. \\& Klin, A. Attention to eyes is present but in decline in 2-6-month-old infants later diagnosed with autism. Nature 504, 427-431 (2013). 41. Paul, R., Fuerst, Y., Ramsay, G., Chawarska, K. \\& Klin, A. Out of the mouths of babes: vocal production in infant siblings of children with ASD. J. Child Psychol. Psychiatry 52, 588-598 (2011). 42. Sheinkopf, S. J., Iverson, J. M., Rinaldi, M. L. \\& Lester, B. M. Atypical cry acoustics in 6-month-old infants at risk for autism spectrum disorder. Autism Res. 5, 331-339 (2012).\n\n\nContext: Research on early behavioral markers and atypical development in infants at high risk for autism spectrum disorder.", "metadata": { "doc_id": "22_Ouss_ASD_22", "source": "22_Ouss_ASD" } }, { "page_content": "Text: Robot-Assisted Autism Spectrum Disorder Diagnostic Based on Artificial Reasoning\n\nAndrés A. Ramírez-Duque ${ }^{1}$ $\\cdot$ Anselmo Frizera-Neto ${ }^{1}$ $\\cdot$ Teodiano Freire Bastos ${ }^{1}$\n\nAbstract\n\nAutism spectrum disorder (ASD) is a neurodevelopmental disorder that affects people from birth, whose symptoms appear in the early developmental period. The ASD diagnosis is usually performed through several sessions of behavioral observation, exhaustive screening, and manual behavior coding. The early detection of ASD signs in naturalistic behavioral observation may be improved through Child-Robot Interaction (CRI) and technology-based tools for automated behavior assessment. Robot-assisted tools using CRI theories have been of interest in intervention for children with Autism Spectrum Disorder (CwASD), elucidating faster and more significant gains from the diagnosis and therapeutic intervention when compared to classical methods. Additionally, using computer vision to analyze the child's behaviors and automated video coding to summarize the responses would help clinicians to reduce the delay of ASD diagnosis. In this article, a CRI system to enhance the traditional tools for ASD diagnosis is proposed. The system relies on computer vision and an unstructured and scalable network of RGBD sensors built upon the Robot Operating System (ROS) and machine learning algorithms for automated face analysis.
Also, a proof of concept is presented, with the participation of three typically developing (TD) children and three children at risk of ASD.\n\nKeywords Child-Robot interaction $\\cdot$ Autism spectrum disorder $\\cdot$ Convolutional neural network $\\cdot$ Robot reasoning model $\\cdot$ Statistical shape modeling\n\n1 Introduction\n\n\nContext: This is an article presenting a robot-assisted diagnostic tool for Autism Spectrum Disorder (ASD) utilizing Child-Robot Interaction (CRI), computer vision, and machine learning.", "metadata": { "doc_id": "1_Ramırez-Duque__0", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Keywords Child-Robot interaction $\\cdot$ Autism spectrum disorder $\\cdot$ Convolutional neural network $\\cdot$ Robot reasoning model $\\cdot$ Statistical shape modeling\n\n1 Introduction\n\nResearch in Child-Robot Interaction (CRI) aims to provide the necessary conditions for the interaction between a child and a robotic device, taking into account some fundamental features, such as the child's neurophysical and physical condition and the child's mental health [1]. That is how Robot-Assisted Therapies (RAT) using CRI theories have been of interest as an intervention for CwASD, elucidating faster and more significant gains from the therapeutic intervention when compared to traditional therapies [2-4].\n\nASD is a neurodevelopmental disorder that affects people from birth, and its symptoms are found in the early\n\n[^0]developmental period. Individuals with ASD exhibit persistent deficits in social communication, social interaction and repetitive patterns of behavior, interests, or activities [5]. Some of the ASD signs may be observed before the age of 10 months, although a reliable diagnosis can only be performed at 18 months of age, according to [6], or 24 months according to [7].\n\nThe use of computer vision to analyze the child's behaviors, and automated video coding to summarize the interventions, can help the clinicians to reduce the delay of ASD diagnosis, providing the CwASD with access to early therapeutic interventions. In addition, CRI-based intervention can transform traditional diagnosis methods through a robotic device that systematically elicits the child's behaviors that exhibit ASD signs [8].\n\nSome of the first systems developed to assist ASD therapists and perform diagnosis based on robotic devices have primarily been open-loop and remotely operated systems. However, these approaches are unable to provide autonomous feedback to enhance the interaction [9-11].\n\n[^0]: Andrés A. Ramírez-Duque aaramírezd@gmail.com\n\n1 Universidade Federal do Espírito Santo, Av. Fernando Ferrari, 514 (29075-910), Vitoria, Brazil\n\n\nContext: A research paper introducing a system for Child-Robot Interaction (CRI) aimed at assisting in the diagnosis and therapy of Autism Spectrum Disorder (ASD).", "metadata": { "doc_id": "1_Ramırez-Duque__1", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: [^0]: Andrés A. Ramírez-Duque aaramírezd@gmail.com\n\n1 Universidade Federal do Espírito Santo, Av. Fernando Ferrari, 514 (29075-910), Vitoria, Brazil\n\nNevertheless, different systems are able to modify the behavior of the robot according to environmental interactions and the child's response, using closed-loop and artificial cognition approaches [12-16]. These systems have been hypothesized to offer technological mechanisms for supporting more flexible and potentially more naturalistic interaction [17].
In fact, the literature reports that automatically modulating the robot's social behaviors according to specific scenarios has a strong effect on the child's social behavior [12]. However, despite the growing positive evidence, this technology has rarely been applied to ASD diagnosis specifically.\n\nThis work aims to present a robot-assisted framework using an artificial reasoning module to assist clinicians with the ASD diagnostic process. The framework is composed of a responsive robotic platform, a flexible and scalable vision sensor network, and an automated face analysis algorithm based on machine learning models. In this research we take advantage of some neural models available as open-source projects to build a completely new pipeline algorithm for global recognition and tracking of the child's face among the many faces present in a typical unstructured clinical intervention, in order to estimate the child's visual focus of attention over time. The proposed system can be used in different behavioral analysis scenarios typical of an ASD diagnostic process. In order to illustrate the feasibility of the proposed system, an experimental trial to assess joint-attention behavior is presented in this paper, employing an in-clinic setup (unstructured environment).\n\n\nContext: A research paper describing a robot-assisted framework for assisting clinicians with Autism Spectrum Disorder (ASD) diagnosis, including an experimental trial assessing joint-attention behavior.", "metadata": { "doc_id": "1_Ramırez-Duque__2", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The main contributions of this paper are: (i) the development of a new artificial reasoning module upon a flexible and scalable ROS-based vision system using state-of-the-art machine learning neural models; (ii) the proposal and implementation of a supervised CRI (child-robot interaction) based on an open source social robotic platform to enhance the traditional tools for ASD diagnosis using an in-clinic setup protocol. To the best of our knowledge, there are no open source projects available for face analysis based on a multi-camera approach using ROS with the characteristics described in our research.\n\n2 Related Work\n\nRecent research has shown the acceptance and efficiency of technologies used as auxiliary tools for therapy and teaching of individuals with ASD [18-21]. Such technologies may also be useful for people surrounding ASD individuals (therapists, caregivers, family members). For example, the use of artificial vision systems to measure and analyze the child's behavior can lead to alternative screening and monitoring tools that help the clinicians to get feedback on the effectiveness of the intervention [22].\n\nAdditionally, social robots have great potential to aid in the diagnosis and therapy of children with ASD [18, 23]. A higher degree of control, prediction and simplicity may be achieved in interactions with robots, impacting directly on frustration and reducing the anxiety of these individuals [24].\n\nWith respect to the use of computer vision techniques, previous studies have already analyzed the child's behaviors, such as visual attention, eye gaze, eye contact, smile events, and visual exploration, using cameras and eye trackers [25, 26] and RGBD cameras [27, 28]. These studies have shown the potential of vision systems in improving the behavioral coding in ASD therapies.
However, these studies did not implement CRI techniques to enhance the intervention.\n\nContext: Within the "Introduction" section, following a description of the paper's contributions and preceding a review of related work.", "metadata": { "doc_id": "1_Ramırez-Duque__3", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: On the other hand, studies about how CwASD respond to a robot mediator compared to a human mediator have been reported, such as intervention scenarios with imitation games [29, 30], telling stories [9] and free play tasks [12, 31]. These works used features such as proxemics, body gestures, visual contact and eye gaze as behavioral descriptors, whereas the behavior analysis was estimated using manual video coding.\n\n\nContext: Research on robot mediators for children with Autism Spectrum Disorder (CwASD) and comparisons to human mediators.", "metadata": { "doc_id": "1_Ramırez-Duque__4", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Researchers at Vanderbilt University published a series of studies showing an experimental protocol to assess joint attention (JA) tasks, defined as the capacity for coordinated orientation of two people toward an object or event [6]. The protocol consisted of directing the attention of the child towards objects located in the room through adaptive prompts [32]. Bekele et al. inferred the participant's eye gaze from the head pose, which was calculated in real-time by an IR camera array [17]. In their later works, Zheng et al. and Warren et al. used a commercial eye tracker to estimate the children's eye gaze around the robot and manual behavioral coding for global evaluation [10, 33]. However, eye tracker devices require pre-calibration and may limit the movement of the individual. The results of these works showed that the robot attracted children's attention and that CwASD completed all JA tasks. Nevertheless, developing JA tasks is more difficult with a robot than with humans [10]. Anzalone et al. developed a CRI scenario using the NAO robot to perform JA tasks, in which the authors used an RGBD camera to estimate only body and head movements. The results showed that the JA performance of children with ASD was similar to the performance of TD children when interacting with the human mediator; however, with a robot mediator, the children with ASD presented a lower performance than the TD children, i.e., the children with ASD needed more social cues to complete the task [34]. Chevalier et al. analyzed in their study features such as proprioceptive and visual integration in CwASD, using an RGBD sensor to record the intervention sessions and manual behavior coding to analyze the participants' performance [35]. In none of the previous works was a closed-loop subsystem\n\nimplemented to provide some level of artificial cognition to enable automated robot behavior.\n\n\nContext: Review of existing research on joint attention tasks with robots, particularly for children with Autism Spectrum Disorder (ASD).", "metadata": { "doc_id": "1_Ramırez-Duque__5", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: implemented to provide some level of artificial cognition to enable automated robot behavior.\n\nIn contrast with the aforementioned research, other works implemented automated face analysis and artificial cognition through a robot mediator and computer vision, analyzing the child's engagement [36, 37], emotion recognition capability $[13,15,38]$ and the child's intentions [14, 16].
In these works, two different strategies were implemented, the most common being based on a mono-camera approach using an external RGB or RGBD sensor $[15,36,37]$ or on-board RGB cameras mounted on the robotic platform [13, 16]. Other strategies are based on a highly structured environment composed of an external camera plus an on-board camera [38] or a network of vision sensors attached to a small table [14]. These strategies based on multi-camera methods improve the system's performance, but remain constrained in relation to desired features such as flexibility, scalability, and modularity. Thus, despite the potential that these techniques have shown, achieving automated analysis of the child's behavior in a naturalistic way in unstructured clinical setups, with robots that interact accordingly, remains a challenge in CRI.\n\n3 System Architecture Overview\n\nThe ROS system used in this work is a flexible and scalable open framework for writing modular robot-centered systems. Similar to a computer operating system, ROS manages the interface between robot hardware and software modules and provides common device drivers, data structures and tool-based packages, such as visualization and debugging tools. In addition, ROS uses an interface definition language (IDL) to describe the messages sent between processes or nodes; this feature facilitates multi-language development (C++, Python and Lisp) [39].\n\n\nContext: A review of existing research on child-robot interaction (CRI) and automated child behavior analysis using robots.", "metadata": { "doc_id": "1_Ramırez-Duque__6", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The overall system developed here was built using a node graph architecture, taking advantage of the principal ROS design criteria. As with ROS, our system consists of a number of nodes for local video processing together with robot behavior estimation, distributed across a number of different hosts and connected at runtime in a peer-to-peer topology. The inter-node connection is implemented as a handshake over the XML-RPC protocol, along with a web-socket communication for the robot's web-based node (/ONO_node, see Fig. 1). The node structure is flexible, scalable and can be dynamically modified, i.e., each node can be started and left running throughout an experimental session or resumed and connected to the others at runtime. In addition, from a general perspective, any robotic platform with web-socket communication can be integrated. The developed system is composed of two interconnected modules as shown in Fig. 1: an artificial reasoning module and a CRI-channel module. The module architectures are detailed in the following subsections.\n\n3.1 Architecture of Reasoning Module\n\nIn this module, a distributed architecture for local video processing is implemented. The data of each RGBD sensor in the multi-camera system are processed by two nodes, the first of which is a driver-level node and the second a processing node. The driver ${ }^{1}$ node transforms the streaming data of the RGBD sensor into the ROS message format. The driver sends the data through a specialized transport provided by plugins to publish images in compressed representations, while the receptor node only sees sensor_msgs/Image messages. The data processing node executes the face analysis algorithm. This node uses an image_transport subscriber and a ROS package called CvBridge to turn the data into an image format supported by typical computer vision algorithms.
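To make the data flow of this processing node concrete, the following is a minimal, hypothetical rospy-based sketch that subscribes to the driver's sensor_msgs/Image topic, converts it with CvBridge, and republishes a head pose as nav_msgs/Odometry (the publication step described next). The topic names and the estimate_head_pose() placeholder are assumptions for illustration, not the authors' code.

```python
#!/usr/bin/env python
# Sketch of a per-camera processing node (Section 3.1): subscribe to the
# driver's image stream, run a face-analysis step, republish head pose as
# nav_msgs/Odometry. Topic names and estimate_head_pose() are placeholders.
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from nav_msgs.msg import Odometry

bridge = CvBridge()
pose_pub = None

def estimate_head_pose(frame):
    # Placeholder for the CLNF-based face analysis described in Section 5;
    # returns a position (x, y, z) and an orientation quaternion.
    return (0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0)

def image_callback(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    position, quaternion = estimate_head_pose(frame)
    odom = Odometry()
    odom.header.stamp = msg.header.stamp
    odom.header.frame_id = msg.header.frame_id
    odom.pose.pose.position.x, odom.pose.pose.position.y, odom.pose.pose.position.z = position
    (odom.pose.pose.orientation.x, odom.pose.pose.orientation.y,
     odom.pose.pose.orientation.z, odom.pose.pose.orientation.w) = quaternion
    pose_pub.publish(odom)

if __name__ == "__main__":
    rospy.init_node("face_analysis_node")
    pose_pub = rospy.Publisher("head_pose", Odometry, queue_size=10)
    rospy.Subscriber("camera/image_raw", Image, image_callback)
    rospy.spin()
```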
Later, the same node publishes the head pose and eye gaze direction by means of a ROS navigation message defined as nav_msgs/Odometry.\n\nContext: System architecture and design details.", "metadata": { "doc_id": "1_Ramırez-Duque__7", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: An additional node, hosted on the most powerful workstation, carries out the fusion of all navigation messages generated in the local processing stage. In addition to the fusion, this node computes the visual focus of attention (VFOA) and publishes it as a std_msgs/Header, in which the time stamp and the target name of the VFOA estimation are registered.\n\n3.2 Architecture of CRI-Channel\n\nThe system proposed here has two bidirectional communication channels: a robot device and a web-based application to interact with both the child and the therapist. The robot device can interact with the CwASD by executing different physical actions, such as facial expressions, upper limb poses, and verbal communication. Thus, according to the child's performance, the reasoning module can modify the robot's behavior through automatic gaze shifting, changing the facial expression and providing sound rewards. The client-side application was developed to allow the therapist to control and register all steps of the intervention protocol. This interface was also used to supervise and control the robot's behavior and to offer feedback to the therapist about the child's performance throughout the intervention. This App has two channels of communication for interacting with the reasoning module. The first connection uses a websocket protocol and the RosBridge_suite package to support the interpretation of ROS messages, as well as JSON-based commands, in ROS. The second one uses a ROS module\n\n[^0] [^0]: ${ }^{1}$ Tools for using the Kinect One (Kinect V2) in ROS, https://github.com/code-iai/iai_kinect2.\n\nFig. 1 Node graph architecture of the proposed ROS-based system. The system is composed of two interconnected modules, an artificial reasoning module and a CRI-channel module. The ONO web server has two ways of bidirectional communication: a websocket and a standard ROS Subscriber\n\nimg-0.jpeg\n\ndeveloped in the server-side application to directly run a ROS node and communicate with standard ROS publishers and subscribers.\n\n\nContext: Within a ROS-based system for Child-Robot Interaction (CRI) with children with Autism Spectrum Disorder (CwASD), this chunk describes a node responsible for data fusion, visual focus of attention (VFOA) computation, and publishing VFOA estimations.", "metadata": { "doc_id": "1_Ramırez-Duque__8", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: img-0.jpeg\n\ndeveloped in the server-side application to directly run a ROS node and communicate with standard ROS publishers and subscribers.\n\n4 The Robotic Platform ONO\n\nThe CRI is implemented through the open source platform for social robotics (OPSORO), ${ }^{2}$ which is a promising and straightforward system developed for face-to-face communication, composed of a low-cost modular robot called ONO (see Fig. 2) and web-based applications [40]. Some of the most important requirements and characteristics that make ONO interesting for this CRI strategy are explained in the following sections.\n\n4.1 Appearance and Identity\n\nThe robot is covered in foam and fabric to have a more inviting and huggable appearance to the children.
The robot has an oversized head to make its facial expressions more prominent and to highlight their importance for communication and emotional interaction. As a consequence of its size and pose, children can interact with the robot at eye height when the robot is placed on a table.\n\nThe robot ONO does not have a predefined identity; the only element conceived in advance is its name. Unlike other robots that have well-defined identities, such as Probo [9] or Kaspar [41], in this work ONO's identity is built with the participation of the child through a co-creation process. For this reason, a neutral appearance is initially used. In the\n\n[^0]intervention, the therapist can provide the child with clothes and accessories to define the identity of ONO.\n\n4.2 Mechanics Platform\n\nAs the initial design of ONO is composed only of the actuated face, in this work it was necessary to provide ONO with some body language. For this purpose, motorized arms were designed and implemented.\n\n\nContext: A description of the robotic platform ONO, used for a child-robot interaction (CRI) system, including its appearance, mechanics, and adaptability for co-creation of identity.", "metadata": { "doc_id": "1_Ramırez-Duque__9", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The new design of ONO has a fully actuated face and two actuated arms, giving a total of 17 Degrees of Freedom (DOF). ONO is able to perform facial expressions and nonverbal cues, such as waving, shaking hands and pointing towards objects, moving its arms (2 DOF x 2), eyes (2 DOF x 2), eyelids (2 DOF x 2), eyebrows (1 DOF x 2), and mouth (3 DOF). The robot also has a sound module that allows explicit positive feedback as well as reinforcement through playing words, conversations and other sounds.\n\n4.3 Social Expressiveness\n\nIn order to improve social interaction with a child, ONO is able to exhibit different facial expressions. ONO's expressiveness is based on the Facial Action Coding System (FACS) developed in [42]. Each DOF that composes ONO's face is linked with a set of Action Units (AU) defined by the FACS, and each facial expression is determined by specific AU values. The facial expressions are represented as a 2D vector $f e=(v, a)$ in the emotion circumplex model defined by valence and arousal [9]. In this context, the basic facial expressions are specified on a unit circle, where the neutral expression corresponds to the origin of the space $f e_{0}=(0,0)$. The relation between the DOF positions and the AU values is resolved through a lookup table algorithm using a predefined configuration file [40].\n\n[^0]: ${ }^{2}$ Open Source Platform for Social Robotics (OPSORO) http://www.opsoro.com.\n\nFig. 2 ONO robot, developed through the open source platform for social robotics (OPSORO)\n\nimg-1.jpeg\n\n4.4 Adaptability and Reproducibility\n\n\nContext: Description of the ONO robot's design, capabilities, and social expressiveness within a larger document about social robotics.", "metadata": { "doc_id": "1_Ramırez-Duque__10", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Fig. 2 ONO robot, developed through the open source platform for social robotics (OPSORO)\n\nimg-1.jpeg\n\n4.4 Adaptability and Reproducibility\n\nThe application of the Do-It-Yourself (DIY) concept is the principal feature of ONO's design, which facilitates its dissemination and use in research areas other than engineering, such as health care.
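Before moving on, the Section 4.3 mapping from a circumplex point $fe=(v, a)$ to Action Unit targets can be illustrated with a small sketch. The lookup table entries and the intensity-scaling rule below are assumptions for illustration only, not OPSORO's actual configuration file.

```python
# Sketch of the valence/arousal expression mapping from Section 4.3.
# A facial expression is a point fe = (valence, arousal); the neutral face
# sits at the origin. The AU values below are hypothetical placeholders.
import math

# Hypothetical lookup table: basic expressions on the unit circle -> AU targets.
EXPRESSION_TABLE = {
    "happy":   {"fe": (1.0, 0.0),  "aus": {"AU12": 1.0, "AU6": 0.8}},
    "excited": {"fe": (0.7, 0.7),  "aus": {"AU12": 0.9, "AU5": 0.6}},
    "sad":     {"fe": (-1.0, 0.0), "aus": {"AU15": 1.0, "AU1": 0.5}},
    "neutral": {"fe": (0.0, 0.0),  "aus": {}},
}

def aus_for_expression(valence, arousal):
    """Pick the tabled expression closest to fe=(valence, arousal) and
    scale its AU values by the distance from the neutral origin."""
    intensity = min(1.0, math.hypot(valence, arousal))
    best = min(
        EXPRESSION_TABLE.values(),
        key=lambda e: math.hypot(e["fe"][0] - valence, e["fe"][1] - arousal),
    )
    return {au: value * intensity for au, value in best["aus"].items()}

print(aus_for_expression(0.8, 0.2))  # mostly "happy" AUs, scaled by intensity
```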
These characteristics allow anyone to build ONO without specialized engineering knowledge. Additionally, it is possible to replicate ONO without the need for high-end components or manufacturing machines [40]. The electronic system is based on a Raspberry Pi single-board computer combined with a custom OPSORO module with circuitry to control up to 32 servos, drive speakers and touch sensors. Any sensor or actuator compatible with the embedded communication protocols (UART, I2C, SPI) implemented on the Raspberry Pi can be used by this platform.\n\n4.5 Control and Autonomy\n\nWith the information delivered by the automated reasoning module, it was possible to automate ONO's behavior, so that the robot can infer and interpret the child's intentions and react more accurately to the actions performed, thus enabling a more efficient and dynamic interaction with ONO. In this work, the automated ONO behavior is partially implemented, i.e., the framework can modify some physical actions of ONO using the feedback information about the child's behavior. The actions suitable to be modified are gaze shifts toward the child at specific events, changing from a neutral to a positive facial expression when the child looks toward the target, and providing sound rewards. Also, an Aliveness Behavior Module (ABM) is implemented to improve the CRI, which consists of blinking the robot's eyes and changing its arms among some predefined poses. Also, the robot can be manually operated through a remote controller hosted in the client-side application.\n\n5 Reasoning Module: Machine Learning Methods for Child's Face Analysis\n\n\nContext: Description of the ONO robot's design, functionality, and control system, including its adaptability, reasoning module, and machine learning methods for child face analysis.", "metadata": { "doc_id": "1_Ramırez-Duque__11", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: 5 Reasoning Module: Machine Learning Methods for Child's Face Analysis\n\nThe automated child's face analysis consists of monitoring nonverbal cues, such as head and body movements, head pose, eye gaze, visual contact and visual focus of attention. In this work, a pipeline algorithm is implemented using machine learning neural models for face analysis. The chosen methods were developed using state-of-the-art trained neural models, available from Dlib ${ }^{3}$ [43] and OpenFace ${ }^{4}$ [44]. Some modifications, such as making the neural model an attribute of the ROS node class and evaluating it in each topic callback, were needed to run the neural models inside a common ROS node.\n\nThe algorithm proposed for the child's face analysis involves face detection, recognition, segmentation and tracking, landmark detection and tracking, head pose, eye gaze and visual focus of attention (VFOA) estimation. In addition, the architecture proposed here also implements new methods for asynchronous matching and fusion of all local data, visual focus of attention estimation based on a Hidden Markov Model (HMM), and a direct connection with the CRI-channel to influence the robot's behaviors. A scheme of the pipeline algorithm is shown in Fig. 3.\n\n[^0] [^0]: ${ }^{3}$ Dlib C++ Library http://dlib.net/. ${ }^{4}$ An Open Source Facial Behavior Analysis toolkit, https://github.com/TadasBaltrusaitis/OpenFace.\n\nFig.
3 Pipeline algorithm of the automated child's face analysis\n\nimg-2.jpeg\n\n5.1 Child's Face Detection and Recognition\n\nThe in-clinic setup requires differentiating the child's face from the other faces detected in the scene. For this reason, a face recognition process was also implemented in this work. First, face detection is executed to initialize the face recognition process and, subsequently, the landmark detection. In this work, both detection and recognition are implemented using deep learning models, which are described in this section.\n\n\nContext: This section details the machine learning methods used for automated child's face analysis, including face detection, recognition, and tracking, within a larger research paper on child-robot interaction.", "metadata": { "doc_id": "1_Ramırez-Duque__12", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: In the detection process, a Convolutional Neural Network (CNN) based face detector with Max-Margin Object Detection (MMOD) as the loss layer is used [45]. The CNN consists first of a block composed of three downsampling layers, which apply convolutions with a $5 \\times 5$ filter size and $2 \\times 2$ stride to reduce the image to one eighth of its original size and generate a feature map with 16 dimensions. The result is then processed by one more block, composed of four convolutional layers, to produce the final output of the network. The first three layers of the last block have a $5 \\times 5$ filter size and $1 \\times 1$ stride, whereas the last layer has only one channel and a $9 \\times 9$ filter size. The values in the last channel are large when the network thinks it has found a face at a particular location. All of the convolutional blocks above are implemented with two additional layers between the convolutional layers, a pointwise linear transformation and Rectified Linear Units (ReLU), which apply the non-saturating activation function $f(x)=\\max (0, x)$. The training dataset used to create the model is composed of 6975 faces and is available at Dlib's homepage. ${ }^{5}$\n\nThe face recognition algorithm used in this work is inspired by the deep residual model from [46]. The\n\n[^0]residual network (ResNet) model developed by He et al. reformulates the convolutional layers to learn residual functions $F(x):=H(x)-x$ with reference to the layer inputs $x$, instead of learning unreferenced functions. In the practical implementation, the previous formulation means inserting shortcut connections, which turn the network into its residual counterpart [46]. The CNN model then transforms each detected face into a 128-D vector space in which images from the same person are close to each other, while faces from different people are far apart. Finally, the faces are classified as the child's face, the caregiver's face and the therapist's face.\n\n\nContext: Technical details of the face detection and recognition algorithms used in the research.", "metadata": { "doc_id": "1_Ramırez-Duque__13", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Both the detection and recognition CNN models were implemented and trained following [43] and released in Dlib 19.6.\n\n5.2 Face Analysis, Landmarks, Head Pose and Eye Gaze\n\nThis work uses the technique for landmark detection, head pose and eye gaze estimation developed by Baltrušaitis et al., named Conditional Local Neural Fields (CLNF) [47]. This technique is an extension of the Constrained Local Model (CLM) algorithm using specialized local detectors or patch experts.
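The detection and recognition stage of Section 5.1 can be sketched with Dlib's published pretrained models (the MMOD CNN detector and the ResNet-based 128-D face embedder). The model file names below are Dlib's standard released files; the 0.6 distance threshold and the enrollment dictionary are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the Section 5.1 detection/recognition stage using Dlib's
# pretrained models (MMOD CNN detector + ResNet 128-D face embedder).
# Model file paths, the threshold and the enrollment dict are assumptions.
import numpy as np
import dlib

detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
embedder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def identify_faces(rgb_image, known_descriptors, threshold=0.6):
    """Return one label per detected face ('child', 'caregiver', 'therapist'
    or 'unknown'), by nearest 128-D descriptor in Euclidean distance."""
    labels = []
    for detection in detector(rgb_image):
        shape = shape_predictor(rgb_image, detection.rect)
        descriptor = np.array(embedder.compute_face_descriptor(rgb_image, shape))
        # Nearest enrolled identity (known_descriptors: {label: 128-D vector}).
        label, dist = min(
            ((name, np.linalg.norm(descriptor - ref)) for name, ref in known_descriptors.items()),
            key=lambda item: item[1],
        )
        labels.append(label if dist < threshold else "unknown")
    return labels
```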
The CLNF model consists of a statistical shape model, which is learned from data examples and is parametrized by $m$ components of linear deformation to control the possible shape variations of the non-rigid objects [48]. Approaches based on CLM [49, 50] and CLNF [47] model the object appearance in a local fashion, i.e., each feature point has its own appearance model to describe the amount of misalignment.\n\nCLNF-based landmark detection consists of three main parts: the shape model, the local detectors or patch experts, and the fitting algorithm, which are detailed below.\n\n[^0]: ${ }^{5}$ http://dlib.net/files/data/dlib_face_detection_dataset-2016-09-30.tar.gz.\n\n5.2.1 Shape Model\n\nThe CLNF technique uses a linear model, called a Point Distribution Model (PDM), to describe non-rigid deformations. The PDM is used to estimate the likelihood of the shapes being in a specific class, given a set of feature points [48]. This is important for model fitting and shape recognition.\n\n\nContext: Methods for face analysis, including landmark detection, head pose estimation, and eye gaze estimation, are described within the broader section on face analysis techniques.", "metadata": { "doc_id": "1_Ramırez-Duque__14", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The shape of a face that has $n$ landmark points can be described as $X=\\left[X_{1}, X_{2}, \\ldots, X_{n}, Y_{1}, Y_{2}, \\ldots, Y_{n}, Z_{1}, Z_{2}, \\ldots, Z_{n}\\right]$, and the class that describes a valid instance of a face using the PDM can be represented as $X=\\tilde{X}+\\Phi q$, where $\\tilde{X}$ is the mean shape of the face, $\\Phi$ describes the principal deformation modes of the shape, and $q$ represents the non-rigid deformation parameters. Both $\\tilde{X}$ and $\\Phi$ are learned automatically from labeled data using Principal Component Analysis (PCA). The probability density of the instances within the shape class is expressed as a zero-mean Gaussian with covariance matrix $\\Lambda=\\operatorname{diag}\\left(\\left[\\lambda_{1} ; \\ldots ; \\lambda_{m}\\right]\\right)$ evaluated at $q$: $p(\\mathbf{q})=\\mathcal{N}(q ; 0 ; \\Lambda)=\\frac{1}{\\sqrt{(2 \\pi)^{m}|\\Lambda|}} \\exp \\left\\{-\\frac{1}{2} q^{T} \\Lambda^{-1} q\\right\\}$\n\nOnce the model is defined, it is necessary to place the 3D PDM in an image space. The following equation is used to transform from 3D space to image space using a weak perspective projection [49]: $x_{i}=s \\cdot R_{2 D} \\cdot\\left(\\tilde{X}_{i}+\\Phi_{i} q\\right)+t$, where $\\tilde{X}_{i}=\\left[\\tilde{x}_{i}, \\tilde{y}_{i}, \\tilde{z}_{i}\\right]^{T}$ is the mean value of the $i^{\\text {th }}$ landmark. The instance of the face in an image is, therefore, controlled using the parameter vector $\\mathbf{p}=[s, w, t, q]$, where $q$ represents the local non-rigid deformation, $s$ is a scaling term, $w$ is the rotation term that controls the $2 \\times 3$ matrix $R_{2 D}$, and $t$ is the translation term.\n\nThe global parameters are used to estimate the head pose with reference to the camera space using an orthographic camera projection and solving the Perspective-n-Point (PnP) problem with respect to the detected landmarks. The PDM used in [44] was trained on two public datasets [51, 52].
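A compact NumPy sketch of the two equations above (the PDM instance $X=\tilde{X}+\Phi q$ and the weak perspective projection) may help fix the notation; all array contents are illustrative placeholders, with dimensions chosen to match the 34 non-rigid modes mentioned next and a hypothetical 68-landmark face shape.

```python
# NumPy sketch of the PDM equations above: X = X_mean + Phi @ q, followed by
# x_i = s * R2D @ (X_i + Phi_i @ q) + t. All array values are illustrative.
import numpy as np

def pdm_instance(X_mean, Phi, q):
    """X_mean: (3n,) mean shape [X1..Xn, Y1..Yn, Z1..Zn]; Phi: (3n, m); q: (m,)."""
    return X_mean + Phi @ q                      # (3n,) deformed 3-D shape

def project_weak_perspective(X, s, R2D, t):
    """X: (3n,) shape; s: scale; R2D: (2, 3) rotation rows; t: (2,) translation."""
    n = X.shape[0] // 3
    pts3d = X.reshape(3, n).T                    # (n, 3) landmark coordinates
    return s * (pts3d @ R2D.T) + t               # (n, 2) image-plane landmarks

# Toy example: n = 68 landmarks (assumed), m = 34 non-rigid modes.
n, m = 68, 34
X_mean, Phi, q = np.zeros(3 * n), np.random.randn(3 * n, m) * 0.01, np.zeros(m)
landmarks = project_weak_perspective(pdm_instance(X_mean, Phi, q),
                                     s=1.0, R2D=np.eye(2, 3), t=np.zeros(2))
```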
This results in a model with 34 non-rigid (principal mode) and 6 rigid shape parameters.\n\n5.2.2 Patch Experts\n\n\nContext: 3D face modeling using Principal Component Analysis (PCA) and Perspective-n-Point (PnP) problem solving.", "metadata": { "doc_id": "1_Ramırez-Duque__15", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: 5.2.2 Patch Experts\n\nThe patch experts scheme is the main novelty implemented in the CLNF model. The new Local Neural Field (LNF) patch expert takes advantage of the non-linear relationship between pixel values and the patch response maps. The LNF captures two kinds of spatial characteristics between pixels: similarity and sparsity [47].\n\nThe LNF patch expert can be interpreted as a three-layer perceptron with a sigmoid activation function followed by a weighted sum of the hidden layers. It is also similar to the first layer of a Convolutional Neural Network [44]. The new LNF patch expert is able to learn from multiple illuminations and retain accuracy. This becomes important when creating landmark detectors and trackers that are expected to work in unseen environments and on unseen people.\n\nThe learning and inference process is developed using a gradient-based optimization method to help in finding locally optimal model parameters faster and more accurately.\n\nIn the CLNF model implemented in [44], 28 sets of LNF patch experts in total were trained for seven views and four scales. The framework uses patch experts specifically trained to recognize the eyelids, iris and pupil, in order to estimate the eye gaze [44].\n\n5.2.3 Fitting Algorithm\n\n\nContext: Details of the CLNF model's novel patch experts and fitting algorithm.", "metadata": { "doc_id": "1_Ramırez-Duque__16", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: 5.2.3 Fitting Algorithm\n\nFor each new image or video frame, the fitting algorithm of the CLNF-based landmark detection process attempts to find the values of the local and global deformable model parameters $\\mathbf{p}$ that minimize the following function [49]: $\\mathcal{E}(\\mathbf{p})=\\mathcal{R}(\\mathbf{p})+\\sum_{i=1}^{n} \\mathcal{D}_{i}\\left(x_{i} ; \\mathcal{I}\\right)$, where $\\mathcal{R}$ is a regularization term that penalizes unlikely shapes, which depends on the shape model, and $\\mathcal{D}_{i}$ represents the misalignment of the $i^{\\text {th }}$ landmark in the image $\\mathcal{I}$, which is a function of both the parameters $\\mathbf{p}$ and the patch experts. From a probabilistic point of view, solving (5) is equivalent to maximizing the a posteriori probability (MAP) of the deformable model parameters $\\mathbf{p}$: $p\\left(\\mathbf{p} \\mid\\left\\{l_{i}=1\\right\\}_{i=1}^{n}, \\mathcal{I}\\right) \\propto p(\\mathbf{p}) \\prod_{i=1}^{n} p\\left(l_{i}=1 \\mid x_{i}, \\mathcal{I}\\right)$, where $l_{i} \\in\\{1,-1\\}$ is a discrete random variable indicating whether the $i^{\\text {th }}$ landmark is aligned or misaligned, $p(\\mathbf{p})$ is the prior probability of the deformable parameters $\\mathbf{p}$, and $p\\left(l_{i}=1 \\mid x_{i}, \\mathcal{I}\\right)$ is the probability of a landmark being aligned at a particular pixel location $x_{i}$, which is quantified from the response maps created by the patch experts. Therefore, the last term in (6) represents the joint probability of the patch expert response maps.\n\nThe MAP problem is solved using an optimization strategy designed specifically for CLNF fitting called non-uniform\n\nregularized landmark mean shift (NU-RLMS) [47], which uses a two-step process.
The first step evaluates each of the patch experts around the current landmark using a Gaussian Kernel Density Estimator (KDE). The second step iteratively updates the model parameters to maximize (6).\n\nContext: Technical details of the algorithm used for landmark detection.", "metadata": { "doc_id": "1_Ramırez-Duque__17", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The NU-RLMS uses an expectation-maximization algorithm, where the E-step involves evaluating the posterior probability over the candidates, and the M-step finds the parameter update through the mean shift vector $\\mathbf{v}$. The mean shift vector points in the direction where the feature point should go, but the motion is restricted by the statistical shape model and by $\\mathcal{R}(\\mathbf{p})$. This interpretation leads to the new update function: $\\underset{\\Delta \\mathbf{p}}{\\operatorname{argmin}}\\left\\{\\|J \\Delta \\mathbf{p}-\\mathbf{v}\\|_{W}^{2}+r\\|\\mathbf{p}+\\Delta \\mathbf{p}\\|_{\\bar{\\Lambda}^{-1}}^{2}\\right\\}$, where $r$ is a regularization term, $J$ is the Jacobian, which describes how the landmark locations change with infinitesimal changes of the parameters $\\mathbf{p}$, $\\bar{\\Lambda}^{-1}=\\operatorname{diag}\\left(\\left[0 ; 0 ; 0 ; 0 ; 0 ; 0 ; \\lambda_{1}^{-1} ; \\ldots ; \\lambda_{m}^{-1}\\right]\\right)$, and $W$ allows for weighting of the mean-shift vectors. Non-linear least squares leads to the following update rule: $\\Delta \\mathbf{p}=-\\left(J^{T} W J+r \\bar{\\Lambda}^{-1}\\right)^{-1}\\left(r \\bar{\\Lambda}^{-1} \\mathbf{p}-J^{T} W \\mathbf{v}\\right)$. To construct $W$, the performance of the patch experts on training data is used.\n\n5.3 Data Fusion\n\nThe fusion of the local results for the head pose estimation is done by applying a consensus-over-rotations algorithm [53]. This algorithm consists of calculating the weighted average pose between each camera's estimate and the estimates of its immediate sensor neighbors using the axis-angle representation. The local pose is penalized by two weights: the alignment confidence of the landmark detection procedure and the Mahalanobis distance between the head pose and a neutral pose.\n\n5.4 Field of View (FoV) and Visual Focus of Attention (VFOA)\n\nThe VFOA estimation model is implemented as a dynamic Bayesian network through a Hidden Markov Model (HMM). The model assumes a specific set of child's attention attractors or targets $\\mathbb{F}$. The estimation process decodes the sequence of the child's head poses $H_{t}=\\left(H_{t}^{\\text {yaw }}, H_{t}^{\\text {pitch }}\\right) \\in \\mathbb{R}^{2}$ in terms of VFOA states $F_{t} \\in \\mathbb{F}$ at time $t$ [54]. The probability distribution of the head poses with reference to a given VFOA target is represented by a Gaussian distribution, whereas the transitions among these targets are represented by the transition matrix $A$.
The HMM equations can then be written as follows:\n\n$$ \\begin{aligned} P\\left(H_{t} \\mid F_{t}=f, \\mu_{t}^{h}\\right) & =\\mathcal{N}\\left(H_{t} \\mid \\mu_{t}^{h}(f), \\Sigma_{H}(f)\\right) \\\\ p\\left(F_{t}=f \\mid F_{t-1}=\\hat{f}\\right) & =A_{f \\hat{f}} \\end{aligned} $$\n\n\nContext: A description of a dynamic Bayesian network model used to estimate a child's visual focus of attention, referencing a specific publication.", "metadata": { "doc_id": "1_Ramırez-Duque__19", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The Gaussian covariances are defined manually to reflect target sizes and head pose estimation variability. Moreover, the Gaussian mean corresponding to each specific target, $\\mu_{t}^{h}$, is calculated through a gaze model that sets this parameter as a fixed linear combination of the target direction and the head reference direction [55]: $\\mu_{t}^{h}(f)=\\alpha \\star \\mu_{t}(f)+\\left(1_{2}-\\alpha\\right) \\star R_{t}$, where $\\star$ denotes the component-wise product, $1_{2}=(1,1)$, $\\alpha=\\left(\\alpha^{\\text {yaw }}, \\alpha^{\\text {pitch }}\\right)=(0.7,0.5)$ are adjustable constants that describe the fraction of the gaze shift that corresponds to the child's head rotation, $\\mu_{t} \\in\\left(\\mathbb{R}^{2}\\right)^{K}$ contains the directions of the given $K$ targets, and $R_{t} \\in \\mathbb{R}^{2}$ represents the reference direction, which is the average head pose over a time window $W^{R}$: $R_{t}=\\frac{1}{W^{R}} \\sum_{i=t-W^{R}}^{t} H_{i}$. This assumption reflects the body orientation behavior of a child, who tends to orient himself/herself towards the set of gaze targets so that it is more comfortable to rotate the head towards the different targets [55]. Finally, for the estimation of the VFOA sequence, a classic HMM Viterbi algorithm is implemented [54].\n\n6 Case Study\n\nFor the case study, the vision system is composed of three Kinect V2 sensors. Each sensor is connected to a workstation equipped with an Intel Core i5 family processor and a GeForce GTX GPU board (two workstations with a GTX960 board, and one workstation with a GTX580 board). All workstations are connected through a local area network synchronized using the NTP protocol. ${ }^{6}$ The sensors were intrinsically and extrinsically calibrated through a conventional calibration process using a standard black-and-white chessboard. ${ }^{7}$\n\n6.1 In-clinic Setup\n\nA multidisciplinary team of psychologists, doctors and engineers developed a case study using a psychology room equipped with a unidirectional mirror to perform behavioral\n\n\nContext: Technical details of a vision system and algorithm for visual focus of attention (VFOA) estimation, including Gaussian covariance definition, gaze modeling, and a case study setup using Kinect V2 sensors.", "metadata": { "doc_id": "1_Ramırez-Duque__20", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: 6.1 In-clinic Setup\n\nA multidisciplinary team of psychologists, doctors and engineers developed a case study using a psychology room equipped with a unidirectional mirror to perform behavioral\n\n[^0] [^0]: ${ }^{6}$ Network Time Protocol Homepage, http://www.ntp.org. ${ }^{7}$ Tools for using the Kinect One (Kinect V2) in ROS, https://github.com/code-iai/iai_kinect2.\n\nFig. 4 Representation of the intervention room of the in-clinic setup\n\nimg-3.jpeg\n\nobservation appropriately. The room was prepared with a table and three chairs: one for the child, another for the caregiver and a third one for the therapist.
The robot was placed on the table, and the following toys, a helicopter, a truck and a train, were attached to the room's walls. The RGBD sensors were located close to the walls, and no additional camera was placed on the robot or the table, so as not to attract the child's attention. A representation of the intervention room of the in-clinic setup is shown in Fig. 4.\n\n6.2 Intervention Protocol\n\nIn this work, a technology-based system was used as a tool in various stages of the ASD diagnostic process. The framework can be implemented to extract different behavioral features to be assessed, e.g., eye contact, stereotyped movements of the head, concentration and excessive interest in objects or events. However, for the scope of this research, a specific clinical setup intervention to assess Joint Attention (JA) behaviors is presented. The intervention aims to evaluate the capacity for JA, which can be divided into three classes: initiation of joint attention (IJA), responding to joint attention bids (RJA), and initiation of request behavior (IRB) [6]. The therapist guides the intervention at all times and leverages the robot device as an alternative channel of communication with the child; for this reason, both the specialist and the robot remained in the room during the intervention. The children were accompanied throughout the session by a caregiver who was instructed not to help the child in the execution of the\n\n\nContext: Clinical setup and intervention protocol for assessing Joint Attention behaviors in children with ASD.", "metadata": { "doc_id": "1_Ramırez-Duque__21", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Fig. 5 The child's nonverbal cues elicited by the CRI: looking towards the therapist, looking towards the robot, pointing, and self-occlusion\n\nFig. 6 Performance of the child's face analysis pipeline for the case study. Face detection and recognition, landmarks detection, head pose and eye gaze estimation were executed\n\ntasks. The exercise developed aimed to direct the attention of the child towards objects located in the room through stimuli such as looking, pointing and speaking. The stimuli were generated first only by the therapist and later just by the robot.\n\n6.3 Subjects\n\nThree children without a confirmed ASD diagnosis, but with evidence of risk factors, and three typically developing (TD) children as the control group participated in the experiments. All volunteers participated with their parents' consent; they were five boys (3 ASD, 2 TD) and one TD girl, between 36 and 48 months of age. Each volunteer participated in one single session. The goal was to analyze the baseline of the child's behavior, establish differences in the behavioral reaction between TD and ASD children to stimuli generated through the CRI, and leverage the novelty effect raised by the robot mediator.\n\n7 Results and Discussion\n\nThe child's nonverbal cues elicited by the CRI can be observed in Fig. 5. Some examples of children's behavior tagged to perform the behavioral coding are shown in the six pictures. The tagged behaviors were: to look towards an object, towards the robot, and towards the therapist, to point, to respond to a prompt from both mediators, and self-occlusion. Typical occlusion problems, such as occlusion by hair, hands and the robot, were detected.\n\nThe performance of video processing in the proof of concept session is reported in Fig. 6.
In the case study sessions, the child's face detection and recognition, the\n\nFig. 7 Evolution over time of the child's head/neck rotation (yaw rotation) for the TD group\n\n\nContext: Results of a child-robot interaction study analyzing nonverbal cues and face analysis performance in children at risk for ASD.", "metadata": { "doc_id": "1_Ramırez-Duque__22", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Fig. 7 Evolution over time of the child's head/neck rotation (yaw rotation) for the TD group\n\nFig. 8 Evolution over time of the child's head/neck rotation (yaw rotation) for a TD volunteer and VFOA estimation results\n\nlandmarks detection, head pose and eye gaze estimation for different viewpoints are shown in Fig. 3. The recognition process was able to detect all faces in the session successfully in most cases.\n\nThe child's head pose was captured throughout the session and analyzed automatically to estimate the evolution over time of the child's head pose and the VFOA. Throughout the session, the child's neck right/left rotation movement was predominant (Yaw axis), while the neck flexion/extension (Pitch axis) and neck R/L lateral flexion movements (Roll axis) remained approximately constant. The Yaw rotation of the TD children group is reported in Fig. 7. The vertical light blue stripe indicates the intervention period with the therapist-mediator, and the vertical light green stripe represents the period with the robot-mediator. The continuous blue line represents the raw data recorded, and the continuous red line describes the average data trend. From the observation of the three plots, the TD children started the intervention looking towards the robot; evidently, the robot was a naturalistic attention attractor. Subsequently, when the therapist began the protocol by explaining the tasks, the children's attention shifted towards the therapist. The children maintained this behavior until the therapist introduced the robot-mediator. In this transition, the children's behaviors, such as RJA and IJA toward the therapist, were observed. Once the therapist changed the mediation to the robot, the children turned their attention to the robot and the objects in the room.\n\n\nContext: Analysis of child's head and neck rotation during interventions with a therapist and a robot-mediator.", "metadata": { "doc_id": "1_Ramırez-Duque__23", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: RJA and IJA toward the therapist were observed. Once the therapist changed the mediation to the robot, the children turned their attention to the robot and the objects in the room.\n\nA more detailed analysis of one of the TD volunteers is shown in Fig. 8. Plot (A) shows the overall intervention session; plots (B) and (C) are a zoom of the periods with the therapist and the robot mediator, respectively. The color convention in the three plots of Fig. 8 describes the results generated by the automated estimation of VFOA. From these scenarios, some essential aspects already emerge. In the therapist-mediator interval, the child responded to the JA task using only one repetition for all prompt levels. The child's RJA behavior was in accordance with the protocol, i.e., the child looked towards the therapist to wait for instructions, rapidly searched for the target, and then looked again toward the therapist (Color sequence: light blue - yellow - light blue - orange - light blue - red). This behavior was the same for all prompts.
In contrast, with the robot-mediator, the child did not look toward the robot between indications at consecutive targets (Color sequence: light green - yellow - orange - red - orange - yellow). This happened because, in the protocol, both mediators executed the instructions in the same order, and\n\n\nContext: Analysis of child attention shifts during a therapeutic intervention using a robot mediator.", "metadata": { "doc_id": "1_Ramırez-Duque__24", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Fig. 9 Evolution over time of the child's head/neck rotation (yaw rotation) for an ASD group\n\nthe child memorized the commands and the objects' positions until the robot mediator interval. This fact did not affect the intervention's aim, as the robot mediator succeeded in eliciting the child's behaviors of RJA and IJA. In addition, as highlighted in plot (A) in Fig. 8, when the session finalized and the robot mediator said goodbye, again, RJA and IJA behaviors were perceived. The pictures (a-d) show these events: first the child said goodbye to the robot, then he looked at the therapist to confirm that the session had ended and looked again towards the robot, and finally the child took the robot's hand.\n\n\nContext: Analysis of child behavior during a robot-mediated intervention for children with Autism Spectrum Disorder.", "metadata": { "doc_id": "1_Ramırez-Duque__25", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: From the analysis of the three TD volunteers, the same reported behaviors were perceived. However, the analysis of the children in the ASD group showed different behavior patterns concerning comfort, visual contact and the novelty stimulus effect during the sessions. The evolution over time of the child's head/neck rotation (yaw rotation) for an ASD group is shown in Fig. 9. On the one hand, the three children in the ASD group maintained more visual contact with the robot compared to the therapist and exhibited more interest in the robot platform compared to the TD children. However, the performance of the children in the JA activities did not improve significantly when the robot executed the prompt. On the other hand, the clinicians reported that in all cases the first visual contact toward them occurred at the instant the robot entered the scene and started interacting, i.e., the ONO mediation elicited behaviors of IJA towards the therapist. In addition, the CwASD exhibited less discomfort regarding the session from the first moment when the robot initiated mediation in the room and, in some cases, showed verbal and non-verbal pro-social behaviors. These facts did not arise with the TD children, because their first visual contact with the therapist occurred when they entered the room. Additionally, TD children showed the ability to divide their attention between the robot and the therapist from the beginning to the end of the intervention, exhibiting comfort at every moment. The behavior modulation of CwASD is observed in Fig. 9.
Before the period with the robot-mediator the children exhibited discomfort (unstable movements of their head), and after this period, the head movement tended to be more stable.\n\n\nContext: A study on robot-mediated joint attention (JA) interventions for children with Autism Spectrum Disorder (ASD) and typically developing (TD) children, detailing observed behavioral differences between the groups.", "metadata": { "doc_id": "1_Ramırez-Duque__26", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: The novelty of a robot-mediator at a diagnostic session can be analyzed as an additional stimulus of the CRI. Accordingly, in this case study the children of the ASD group showed more behavior modification (attention and comfort) produced by the robot interaction at the beginning of the CRI, which remained until the end of the session. On the other hand, the children of the TD group responded to the novelty effect of the robot mediator from the time the child entered the room and saw the robot until the beginning of the therapist's presentation. Therefore, despite the novelty of the stimulus effect, these stimuli did not seem to affect the social interaction between the TD children and the therapist; in contrast, they seemed to enhance the CwASD's social interaction with the therapist throughout the intervention.\n\nThese results are impressive, since they show the potential of the CRI intervention to systematically elicit differences between the behavior patterns of TD and ASD children. We identified RJA and IJA toward the therapist at the beginning of the intervention, at the transition from therapist to robot mediator, and at the end, for all TD children. In contrast, we only identified IJA towards the therapist at the transition between mediators for the ASD children. This fact shows a clear difference in behavior patterns between CwASD and TD children, which can be analyzed using a JA task protocol. In fact, these pattern differences can be used as evidence to improve ASD diagnosis.\n\n8 Conclusions\n\n\nContext: Results and conclusions of a study on child-robot interaction during diagnostic sessions with children with Autism Spectrum Disorder (ASD) and typically developing (TD) children.", "metadata": { "doc_id": "1_Ramırez-Duque__27", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: 8 Conclusions\n\nThis work presented a Robot-Assisted tool to assist and enhance the traditional practice of ASD diagnosis. The designed framework combines a vision system with the automated analysis of nonverbal cues in addition to a robotic platform, both developed upon open-source projects. This research contributes to the state of the art with an innovative, flexible and scalable architecture capable of automatically registering events of joint attention and patterns of visual contact before and after a robot-based mediation, as well as the patterns of behavior related to comfort or discomfort throughout the ASD intervention.\n\nIn addition, an artificial vision pipeline based on a multi-camera approach was proposed. The vision system performs face detection, recognition and tracking, landmark detection and tracking, and head pose, gaze and visual focus of attention estimation, with its performance considered suitable for use in a conventional ASD intervention. At least one camera captured the child's face in each sample frame.
Furthermore, the feedback information about the child's performance was successfully used to modulate the supervised behavior of ONO, improving the performance of the CRI and the visual attention of the children. Regarding the VFOA estimation, the algorithm was able to estimate the target within the FoV recurrently in different situations. Also, the robot was able to react according to the estimation. However, the algorithm only failed when occlusion by the child's hands occurred. On the other hand, the occlusion by the therapist and the robot was compensated using the multi-camera approach. The child's face recognition system proved to be essential for analyzing the child's behavior in the clinical setup implemented in this work, which required the caregiver's presence in the room.\n\n\nContext: Concluding remarks of a research paper detailing a robot-assisted tool for Autism Spectrum Disorder (ASD) diagnosis, including performance of the vision system and robot interaction.", "metadata": { "doc_id": "1_Ramırez-Duque__28", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Despite the limited number of children in this study, the preliminary results of this case study showed the feasibility of identifying and quantifying differences in the patterns of behavior of TD children and CwASD elicited by the CRI intervention. Through the proof of concept, the system's ability to improve the traditional tools used in ASD diagnosis is evidenced here. As future work, a study is recommended to replicate the protocol proposed in this paper with ten CwASD and ten TD children. Another suggestion is to quantify other kinds of behaviors in addition to those assessed in this paper, such as verbal utterance patterns, physical and emotional engagement, and object or event preferences, and to gather more evidence to improve the assistance to therapists in ASD diagnosis processes.\n\nAcknowledgements This work was supported by the Google Latin America Research Awards (LARA) program. The first author's scholarship was supported in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.\n\nDisclosure statement No potential conflict of interest was reported by the authors.\n\nReferences\n\nBelpaeme, T., Baxter, P.E., de Greeff, J., Kennedy, J., Read, R., Looije, R., Neerincx, M., Baroni, I., Zelati, M.C.: Child-Robot interaction: perspectives and challenges. In: 5th International Conference, ICSR 2013, pp. 452-459. Springer International Publishing, Bristol (2013)\n\nDiehl, J.J., Schmitt, L.M., Villano, M., Crowell, C.R.: The clinical use of robots for individuals with autism spectrum disorders: A critical review. Res. Autism Spectr. Disord. 6(1), 249-262 (2012)\n\nScassellati, B., Admoni, H., Maja, M.: Robots for use in autism research. Annu. Rev. Biomed. Eng. 14(1), 275-294 (2012)\n\nPennisi, P., Tonacci, A., Tartarisco, G., Billeci, L., Ruta, L., Gangemi, S., Pioggia, G.: Autism and social robotics: A systematic review (2016)\n\n\nContext: Concluding remarks, acknowledgements, disclosure statement, and references section of a research paper on child-robot interaction for autism spectrum disorder diagnosis.", "metadata": { "doc_id": "1_Ramırez-Duque__29", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Pennisi, P., Tonacci, A., Tartarisco, G., Billeci, L., Ruta, L., Gangemi, S., Pioggia, G.: Autism and social robotics: A systematic review (2016)\n\nAmerican Psychiatric Association: DSM-5 diagnostic classification.
In: Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association, 5 (2013)\n\nEggebrecht, A.T., Elison, J.T., Feczko, E., Todorov, A., Wolff, J.J., Kandala, S., Adams, C.M., Snyder, A.Z., Lewis, J.D., Estes, A.M., Zwaigenbaum, L., Botteron, K.N., McKinstry, R.C., Constantino, J.N., Evans, A., Hazlett, H.C., Dager, S., Paterson, S.J., Schultz, R.T., Styner, M.A., Gerig, G., Das, S., Kostopoulos, P., Schlaggar, B.L., Petersen, S.E., Piven, J., Pruett, J.R.: Joint attention and brain functional connectivity in infants and toddlers. Cerebral Cortex 27(3), 1709-1720 (2017)\n\nSteiner, A.M., Goldsmith, T.R., Snow, A.V., Chawarska, K.: Disorders in infants and toddlers. J. Autism Dev. Disord. 42(6), $1183-1196$ (2012)\n\nBelpaeme, T., Baxter, P.E., Read, R., Wood, R., Cuayáhuitl, H., Kiefer, B., Racioppa, S., Kruijff-Korbayová, I., Athanasopoulos, G., Enescu, V., Looije, R., Neerincx, M., Demiris, Y., RosEspinoza, R., Beck, A., Canamero, L., Hielle, A., Lewis, M., Baroni, I., Nalin, M., Cosi, P., Paci, G., Tesser, F., Sommavilla, G., Humbert, R.: Multimodal child-robot interaction: building social bonds. Journal of Human-Robot Interaction 1(2), 33-53 (2012)\n\nVanderborght, B., Simut, R., Saldien, J., Pop, C., Rusu, A.S., Pintea, S., Lefeber, D., David, D.O.: Using the social robot probo as a social story telling agent for children with ASD. Interact. Stud. 13(3), 348-372 (2012)\n\nWarren, Z.E., Zheng, Z., Swanson, A.R., Bekele, E., Zhang, L., Crittendon, J.A., Weitlauf, A.F., Sarkar, N.: Can robotic interaction improve joint attention skills? J. Autism Dev. Disord. 45(11), 3726-3734 (2015)\n\n\nContext: References cited in a systematic review of autism and social robotics.", "metadata": { "doc_id": "1_Ramırez-Duque__30", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Wood, L.J., Dautenhahn, K., Lehmann, H., Robins, B., Rainer, A., Syrdal, D.S.: Robot-mediated interviews: Do robots possess advantages over human interviewers when talking to children with special needs? Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8239 LNAI, 54-63 (2013)\n\nFeil-Seifer, D., Mataric, M.J.: b3IA A control architecture for autonomous robot-assisted behavior intervention for children with Autism Spectrum Disorders. In: ROMAN 2008 The 17th IEEE International Symposium on Robot and Human Interactive Communication, pp. 328-333 (2008)\n\nLeo, M., Del Coco, M., Carcagni, P., Distante, C., Bernava, M., Pioggia, G., Palestra, G.: Automatic emotion recognition in Robot-Children interaction for ASD treatment. In: Proceedings of the IEEE International Conference on Computer Vision, 2015Febru(c), pp. 537-545 (2015)\n\nEsteban, P.G., Baxter, P.E., Belpaeme, T., Billing, E., Cai, H., Cao, H.-L., Coeckelbergh, M., Costescu, C., David, D., De Beir, A., Fang, Y., Ju, Z., Kennedy, J., Liu, H., Mazel, A., Pandey, A., Richardson, K., Senft, E., Thill, S., Van De Perre, G., Vanderborght, B., Vernon, D., Hui, Y., Ziemke, T.: How to build a supervised autonomous system for Robot-Enhanced therapy for children with autism spectrum disorder. Paladyn Journal of Behavioral Robotics 8(1), 18-38 (2017)\n\nPour, A.G., Taheri, A., Alemi, M., Ali, M.: Human-Robot facial expression reciprocal interaction platform: case studies on children with autism. Int. J. Soc. Robot. 10(2), 179-198 (2018)\n\nFeng, Y., Jia, Q., Wei, W.: A control architecture of RobotAssisted intervention for children with autism spectrum disorders. J. 
Robot. 2018, 12 (2018)\n\nBekele, E., Crittendon, J.A., Swanson, A., Sarkar, N., Warren, Z.E.: Pilot clinical application of an adaptive robotic system for young children with autism. Autism: The International Journal of Research and Practice 18(5), 598-608 (2014)\n\n\nContext: A list of related research publications on robot-assisted interventions for children with autism spectrum disorder.", "metadata": { "doc_id": "1_Ramırez-Duque__31", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Huijnen, C.A.G.J., Lexis, M.A.S., Jansens, R., de Witte, L.P.: Mapping robots to therapy and educational objectives for children with autism spectrum disorder. J. Autism Dev. Disord. 46(6), 2100-2114 (2016)\n\nAresti-Bartolome, N., Begonya, G.-Z.: Technologies as support tools for persons with autistic spectrum disorder: s systematic review. Int. J. Environ. Res. Public Health 11(8), 7767-7802 (2014)\n\nBoucenna, S., Narzisi, A., Tilmont, E., Muratori, F., Pioggia, G., Cohen, D., Mohamed, C.: Interactive technologies for autistic children: a review. Cogn. Comput. 6(4), 722-740 (2014)\n\nGrynsepan, O., Patrice, L., Weiss, T., Perez-Diaz, F., Gal, E.: Innovative technology-based interventions for autism spectrum disorders: a meta-analysis. Autism 18(4), 346-361 (2014)\n\nRehg, J.M., Rozga, A., Abowd, G.D., Goodwin, M.S.: Behavioral imaging and autism. IEEE Pervasive Comput. 13(2), 84-87, 4 (2014)\n\nCabibihan, J.J., Javed, H., Ang, M., Aljunied, S.M.: Why robots? a survey on the roles and benefits of social robots in the therapy of children with autism. Int. J. Soc. Robot. 5(4), 593-618 (2013)\n\nSartorato, F., Przybylowski, L., Sarko, D.K.: Improving therapeutic outcomes in autism spectrum disorders: enhancing social communication and sensory processing through the use of interactive robots. J. Psychiatr. Res. 90, 1-11 (2017)\n\nChong, E., Chanda, K., Ye, Z., Southerland, A., Ruiz, N., Jones, R.M., Rozga, A., Rehg, J.M.: Detecting gaze towards eyes in natural social interactions and its use in child assessment. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1(3), 43:143:20 (2017)\n\nNess, S.L., Manyakov, N.V., Bangerter, A., Lewin, D., Jagannatha, S., Boice, M., Skalkin, A., Dawson, G., Janvier, Y.M., Goodwin, M.S., Hendren, R., Leventhal, B., Shic, F., Cioccia, W., Gahan, P.: JAKE® Multimodal data capture system: Insights from an observational study of autism spectrum disorder. Frontiers in Neuroscience 11(SEP) (2017)\n\n\nContext: References cited in the article.", "metadata": { "doc_id": "1_Ramırez-Duque__32", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Rehg, J.M., Abowd, G.D., Rozga, A., Romero, M., Clements, M.A., Sclaroff, S., Essa, I., Ousley, O.Y., Li, Y., Kim, C., Rao, H., Kim, J.C., Lo Presti, L., Zhang, J., Lantsman, D., Bidwell, J., Ye, Z.: Decoding children's social behavior. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3414-3421 (2013)\n\nAdamo, F., Palestra, G., Crifaci, G., Pennisi, P., Pioggia, G., Ruta, L., Leo, M., Distante, C., Cazzato, D.: Non-intrusive and calibration free visual exploration analysis in children with autism spectrum disorder. In: Computational Vision and Medical Image Processing V - Proceedings of 5th Eccomas Thematic Conference on Computational Vision and Medical Image Processing, VipIMAGE 2015, pp. 201-208 (2016)\n\nMichaud, F., Salter, T., Duquette, A., Mercier, H., Lauria, M., Larouche, H., Larose, F.: Assistive technologies and Child-Robot interaction. 
American Association for Artificial Intelligence ii(3), 8-9 (2007)\n\nDuquette, A., Michaud, F., Mercier, H.: Exploring the use of a mobile robot as an imitation agent with children with lowfunctioning autism. Auton. Robot. 24(2), 147-157 (2008)\n\nSimut, R.E., Vanderfaelllie, J., Peca, A., Van de Perre, G., Bram, V.: Children with autism spectrum disorders make a fruit salad with probo, the social robot: an interaction study. J. Autism Dev. Disord. 46(1), 113-126 (2016)\n\nBekele, E., Lahiri, U., Swanson, A.R., Crittendon, J.A., Warren, Z.E., Nilanjan, S.: A step towards developing adaptive robotmediated intervention architecture (ARIA) for children with autism. IEEE Trans. Neural Syst. Rehabil. Eng. 21(2), 289-299 (2013)\n\nZheng, Z., Zhang, L., Bekele, E., Swanson, A., Crittendon, J.A., Warren, Z.E., Sarkar, N.: Impact of robot-mediated interaction system on joint attention skills for children with autism. In: IEEE International Conference on Rehabilitation Robotics (2013)\n\n\nContext: References to research on Child-Robot Interaction and assistive technologies for children with autism.", "metadata": { "doc_id": "1_Ramırez-Duque__33", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Anzalone, S.M., Tilmont, E., Boucenna, S., Xavier, J., Jouen, A.L., Bodeau, N., Maharatna, K., Chetouani, M., Cohen, D.: How children with autism spectrum disorder behave and explore the 4-dimensional (spatial $3 \\mathrm{D}+$ time) environment during a joint attention induction task with a robot. Res. Autism Spectr. Disord. 8(7), 814-826 (2014)\n\nChevalier, P., Martin, J.C., Isableu, B., Bazile, C., Iacob, D.O., Adriana, T.: Joint attention using human-robot interaction: impact of sensory preferences of children with autism. In: 25th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2016, pp. 849-854 (2016)\n\nLemaignan, S., Garcia, F., Jacq, A., Dillenbourg, P.: From realtime attention assessment to \"with-me-ness\" in human-robot interaction. In: ACM/IEEE International Conference on HumanRobot Interaction, 2016-April, pp. 157-164 (2016)\n\nDel Coco, M., Leo, M., Carcagni, P., Fama, F., Spadaro, L., Ruta, L., Pioggia, G., Distante, C.: Study of mechanisms of\n\n\nContext: References cited in the article.", "metadata": { "doc_id": "1_Ramırez-Duque__34", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: social interaction stimulation in autism spectrum disorder by assisted humanoid robot. IEEE Transactions on Cognitive and Developmental Systems 8920(c), 1-1 (2017) 38. Palestra, G., Varni, G., Chetouani, M., Esposito, F.: A multimodal and multilevel system for robotics treatment of autism in children. In: Proceedings of the International Workshop on Social Learning and Multimodal Interaction for Designing Artificial Agents - DAA '16, pp. 1-6. ACM Press, New York (2016) 39. Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E., Wheeler, R., Ng, A.: ROS : an open-source robot operating system. In: ICRA workshop on open source software, number 3.2, pp. 5 (2009) 40. Vandevelde, C., Saldien, J., Ciocci, C., Vanderborght, B.: The use of social robot ono in robot assisted therapy. In: International Conference on Social Robotics, Proceedings, m (2013) 41. Dautenhahn, K.: A paradigm shift in artificial intelligence: why social intelligence matters in the design and development of robots with human-like intelligence. 50 Years of Artificial Intelligence, pp. 288-302 (2007) 42. Ekman, P., Friesen, W.: Facial Action Coding System. 
Consulting Psychologists Press (1978) 43. King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755-1758 (2009) 44. Baltrušaitis, T., Robinson, P., Morency, L.-P.: OpenFace: an open source facial behavior analysis toolkit. IEEE Winter Conference on Applications of Computer Vision (2016) 45. King, D.E.: Max-Margin Object Detection. 1 (2015) 46. He, K., Zhang, X., Ren, S., Jian, S.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778. IEEE, 6 47. Baltrušaitis, T., Robinson, P., Morency, L.P.: Constrained local neural fields for robust facial landmark detection in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 354-361 (2013) 48. Cristinacce, D., Cootes, T.F.: Feature detection and tracking with constrained local models. In:\n\n\nContext: A list of references cited in a research paper on social interaction stimulation in autism spectrum disorder using assisted humanoid robots.", "metadata": { "doc_id": "1_Ramırez-Duque__35", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 354-361 (2013) 48. Cristinacce, D., Cootes, T.F.: Feature detection and tracking with constrained local models. In: Proceedings of the British Machine Vision Conference 2006, pp. 1-95 (2006) 49. Saragih, J.M., Lucey, S., Cohn, J.F.: Deformable model fitting by regularized landmark mean-shift. Int. J. Comput. Vis. 91(2), 200-215 (2011) 50. Baltrušaitis, T., Robinson, P., Morency, L.P.: 3D constrained local model for rigid and non-rigid facial tracking. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2610-2617 (2012) 51. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Neeraj, K.: Localizing parts of faces using a consensus of exemplars. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2930-2940 (2013) 52. Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive Facial Feature Localization, pp. 679-692. Springer, Berlin (2012) 53. Jorstad, A., Dementhon, D., Jeng Wang, I., Burlina, P.: Distributed consensus on camera pose. IEEE Trans. Image Process. 19(9), 2396-2407 (2010) 54. Ba, S.O., Odobez, J.-M.: Multi-Person visual focus of attention from head pose and meeting contextual cues. IEEE Trans. Pattern Anal. Mach. Intell. 33(August), 1-16 (2008) 55. Sheikhi, S., Jean-Marc, O.: Combining dynamic head posegaze mapping with the robot conversational state for attention recognition in human-robot interactions. Pattern Recogn. Lett. 66, 81-90 (2015)\n\n\nContext: A list of references cited in a research paper on computer vision and human-robot interaction.", "metadata": { "doc_id": "1_Ramırez-Duque__36", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.\n\nAndrés A. Ramírez-Duque received his bachelor's degree in Mechatronics Engineering from the Universidad Nacional de Colombia, Bogotá, Colombia, in 2009, and his Industrial Automation Master degree from the Universidad Nacional de Colombia, Bogotá, Colombia, in 2011. He is currently working toward a Ph.D. degree in the Assistive Technology Center, Federal University of Espírito Santo, Vitória, Brazil. He won a Google Latin America Research Award 2017. 
His current research interests include Child-Robot interaction, cloud parallel computing, high performance computing, smart environments and serious games applied to children with developmental impairments.\n\nAnselmo Frizera-Neto received his bachelor's degree in Electrical Engineering (2006) from the Federal University of Espírito Santo (UFES) in Brazil and his doctorate in Electronics (2010) at the University of Alcalá, Spain. From 2006 to 2010 he was a researcher of the Bioengineering Group of the Consejo Superior de Investigaciones Científicas (Spain) where he carried out research related to his doctoral thesis. He is currently a permanent professor and adjunct coordinator of the Graduate Program in Electrical Engineering at UFES. He has authored or co-authored more than 250 papers in scientific journals, books and conferences in the fields of electrical and biomedical engineering. He has conducted or co-directed master's and doctoral theses in research institutions from Brazil, Argentina, Italy and Portugal. His research is aimed at rehabilitation robotics, the development of advanced strategies of human-robot interaction and the conception of sensors and measurement technologies with applications in different fields of electrical and biomedical engineering. Along with Andrés Ramírez-Duque, he won a Google Latin America Research Award 2017.\n\n\nContext: Author biographies and affiliations.", "metadata": { "doc_id": "1_Ramırez-Duque__37", "source": "1_Ramırez-Duque_" } }, { "page_content": "Text: Teodiano Freire Bastos received his B.Sc. degree in Electrical Engineering from Universidade Federal do Espírito Santo (Vitória, Brazil) in 1987, his Specialist degree in Automation from Instituto de Automática Industrial (Madrid, Spain) in 1989, and his Ph.D. degree in Physical Science (Electricity and Electronics) from Universidad Complutense de Madrid (Spain) in 1994. He made two postdocs, one at the University of Alcalá (Spain, 2005) and another at RMIT University (Australia, 2012). He is currently a full professor at Universidade Federal do Espírito Santo (Vitória, Brazil), teaching and doing research at the Postgraduate Program of Electrical Engineering, Postgraduate Program of Biotechnology and RENORBIO Ph.D. Program. His current research interests are signal processing, rehabilitation robotics and assistive technology for people with disabilities.\n\n\nContext: Biographical information on the authors.", "metadata": { "doc_id": "1_Ramırez-Duque__38", "source": "1_Ramırez-Duque_" } },
{ "page_content": "Text: NIH Public Access Author Manuscript\n\nPublished in final edited form as: J Autism Dev Disord. 2014 October ; 44(10): 2413-2428. doi:10.1007/s10803-014-2047-4.\n\nVocal patterns in infants with Autism Spectrum Disorder: Canonical babbling status and vocalization frequency\n\nElena Patten, Ph.D. ${ }^{1}$, Katie Belardi, M.S. ${ }^{2}$, Grace T. Baranek, Ph.D. ${ }^{2}$, Linda R. Watson, Ed.D. ${ }^{2}$, Jeffrey D. Labban, Ph.D. ${ }^{1}$, and D. Kimbrough Oller, Ph.D. ${ }^{3}$ ${ }^{1}$ Univ. of North Carolina, Greensboro ${ }^{2}$ Univ. of North Carolina, Chapel Hill ${ }^{3}$ Univ. of Memphis, and Konrad Lorenz Institute for Evolution and Cognition Research, Klosterneuburg, Austria\n\nAbstract\n\nCanonical babbling is a critical milestone for speech development and is usually well in place by 10 months. The possibility that infants with ASD show late onset of canonical babbling has so far eluded evaluation. Rate of vocalization or \"volubility\" has also been suggested as possibly aberrant in infants with ASD. We conducted a retrospective video study examining vocalizations of 37 infants at $9-12$ and $15-18$ months. Twenty-three of the 37 infants were later diagnosed with ASD and indeed produced low rates of canonical babbling and low volubility by comparison with the 14 typically developing infants. The study thus supports suggestions that very early vocal patterns may prove to be a useful component of early screening and diagnosis of ASD.\n\nKeywords\n\ncanonical babbling; volubility; vocal patterns; early detection\n\nASD and early vocal development\n\n\nContext: A retrospective video study examining vocalizations of infants, some later diagnosed with Autism Spectrum Disorder (ASD), to assess canonical babbling and vocalization frequency (volubility).", "metadata": { "doc_id": "Patten_Audio_0", "source": "Patten_Audio" } }, { "page_content": "Text: Keywords\n\ncanonical babbling; volubility; vocal patterns; early detection\n\nASD and early vocal development\n\nEarly intervention is critical for positive outcomes for children with Autism Spectrum Disorder (ASD). Early identification of atypical behaviors that manifest during infancy could significantly impact age of diagnosis and subsequent initiation of intervention. Currently, the minimum age at which the majority of children with ASD can be reliably diagnosed with relative stability is two years (e.g., Chawarska et al., 2009; Lord, 1995), but according to recent data from the Centers for Disease Control, many children are not\n\n[^0] [^0]: Correspondence concerning this article should be addressed to: Elena Patten.
University of North Carolina at Greensboro, 300 Ferguson Building, P. O. Box 26170, Greensboro, NC 27402-6170. e_patten@uncg.edu. Elena Patten, UNC Greensboro, 300 Ferguson Building, Greensboro, NC 27412-6170 Katie Belardi, UNC Chapel Hill, Bondurant Hall, CB#7190, Chapel Hill, NC 27599-7190 Grace Baranek, UNC Chapel Hill, Bondurant Hall, CB #7122, Chapel Hill, NC 27599-7190 Linda Watson, UNC Chapel Hill, Bondurant Hall, CB#7190, Chapel Hill, NC 27599-7190 Jeffrey Labban, UNC Greensboro, 231 HHP Building, Greensboro, NC 27412 D. Kimbrough Oller, The University of Memphis, 807 Jefferson Avenue, Memphis, TN 38105\n\nElena Patten and Jeffrey Labban are at UNC Greensboro in North Carolina, USA; Katie Belardi, Grace Baranek and Linda Watson are at UNC Chapel Hill in North Carolina, USA; D. Kimbrough Oller is at the University of Memphis in Tennessee, USA.\n\n\nContext: A research paper investigating early vocal development in children with Autism Spectrum Disorder (ASD) and exploring potential markers for early detection.", "metadata": { "doc_id": "Patten_Audio_1", "source": "Patten_Audio" } }, { "page_content": "Text: diagnosed until preschool or kindergarten age (2012). Research targeting early detection has primarily focused on behaviors exhibited during toddlerhood (12-36 months) and preschool years (36-60 months) (e.g., Matson, Fodstad, \\& Dempsey, 2009; Volkmar \\& Chawarska, 2008) after diagnosis has been made. Use of retrospective video analyses and studies of infant siblings of children diagnosed with ASD has allowed examination of possible indicators of ASD in the first year of life (e.g., Baranek, 1999; Osterling, Dawson, \\& Munson, 2002; Sheinkopf, Iverson, Rinaldi, \\& Lester, 2012; Zwaigenbaum et al., 2005). Still, the most widely used autism screening tool for young children, the Modified Checklist for Autism Toddlers (MCHAT: Robins, Fein, Barton, \\& Green, 2001) is recommended for ages $16-30$ months.\n\nWe sought to identify potential communication markers of ASD that might be observed within the first year of life in a retrospective evaluation of data from infants recorded at home and later diagnosed with ASD. We focused on presumed precursors to language for two reasons: First, communication impairment is a core deficit in ASD, and second, evaluation of very early vocal behaviors in typically developing infants has already established markers that are critical to normal vocal communicative development. One robust pre-speech vocal milestone is the onset of canonical babbling. A canonical syllable (e.g., [ba]) is comprised of a consonant-like sound and a vowel-like sound, with a rapid transition between them (Oller, 1980, 2000). A second potentially important vocal measure that we considered is volubility, the rate of infant vocalization independent of vocal type (Nathani, Lynch, \\& Oller, 1999; Obenchain, Menn, \\& Yoshinaga-Itano, 1998).\n\nCanonical babbling as a key milestone\n\n\nContext: The authors are discussing early communication markers of Autism Spectrum Disorder (ASD), specifically focusing on the importance of canonical babbling and volubility as potential indicators observed in the first year of life.", "metadata": { "doc_id": "Patten_Audio_2", "source": "Patten_Audio" } }, { "page_content": "Text: Canonical babbling as a key milestone\n\nIn typical development, infants from birth produce vegetative vocalizations (e.g., coughs, burps, etc.) 
and cry, as well as vowel-like sounds that become more elaborate with time, incorporating supraglottal articulations until canonical syllables emerge, usually by early in the second half-year of life. Robust onset of canonical babbling has been well documented in typically developing infants by not later than 10 months (Koopmans-van Beinum \\& van der Stelt, 1986; Oller, 1980; Stark, 1980). The impression of robustness has been reinforced by the fact that no delay in onset of canonical babbling has been discerned in infants anticipated to be at-risk for communication deficits due to premature birth or low socioeconomic status (Eilers et al., 1993; Oller, Eilers, Basinger, Steffens, \\& Urbano, 1995). Even infants with Down syndrome usually show normal ages of onset, although a group level delay of a month or more is detectable (Lynch et al., 1995). Furthermore, infants tracheostomized at birth to provide an artificial airway that prevents or substantially inhibits vocalization for many months tend to produce age-appropriate canonical syllables within a short period after decannulation (Bleile, Stark, \\& McGowan, 1993; Locke \\& Pearson, 1990; Ross, 1983; Simon, Fowler, \\& Handler, 1983).\n\nOnly profound hearing impairment and Williams syndrome have been shown to produce consistent substantial delays in the onset of canonical babbling (Kent, Osberger, Netsell, \\& Hustedde, 1987; Koopmans-van Beinum, Clement, \\& van den Dikkenberg-Pot, 1998; Masataka, 2001; Oller \\& Eilers, 1988; Stoel-Gammon \\& Otomo, 1986). Further supporting the idea that restricted hearing prevents experiences critical to onset of canonical babbling,\n\nage of onset in severely or profoundly hearing impaired infants has been reported to be positively correlated with age of amplification (Eilers \\& Oller, 1994).\n\n\nContext: The document investigates early language development, specifically canonical babbling, in infants with and without Autism Spectrum Disorder (ASD), examining differences in vocalization patterns and potential correlations with environmental factors.", "metadata": { "doc_id": "Patten_Audio_3", "source": "Patten_Audio" } }, { "page_content": "Text: age of onset in severely or profoundly hearing impaired infants has been reported to be positively correlated with age of amplification (Eilers \\& Oller, 1994).\n\nIn infants without known disorders, onset of canonical babbling after ten months has been shown to be a significant predictor of language delay or other developmental disabilities (Oller, Eilers, Neal, \\& Schwartz, 1999; Stark, Ansel, \\& Bond, 1988; Stoel-Gammon, 1989). But late onset of canonical babbling is a rare occurrence in infants without easily diagnosed physical or mental limitations. The seeming resistance to derailment of this developmental milestone suggests that canonical babbling is of such importance in human development that it has been evolved to emerge within a relatively tightly constrained time period in spite of substantial variations in home environments and perinatal events. 
The importance of canonical babbling in predicting later language functioning is assumed to be due to the fact that words are overwhelmingly composed of canonical syllables, and thus lexical learning depends on control of canonical syllables.\n\n\nContext: The importance of canonical babbling as a predictor of later language functioning in typically developing infants.", "metadata": { "doc_id": "Patten_Audio_4", "source": "Patten_Audio" } }, { "page_content": "Text: To date, only two studies of which we are aware have targeted canonical babbling in ASD and neither specifically examined the onset of canonical babbling, But reasons for optimism that delays in onset of canonical babbling could constitute an early ASD marker can be found in research showing that various aspects of vocalization appear to be disrupted in young children with ASD (Paul, Augustyn, Klin, \\& Volkmar, 2005; Peppe, McCann, Gibbon, O’Hara, \\& Rutherford, 2007; Sheinkopf, Mundy, Oller, \\& Steffens, 2000; Warren, Gilkerson, Richards, \\& Oller, 2010; Wetherby et al., 2004). Research using automated analysis of all-day recordings based on the automated LENA (Language ENvironment Analysis) system of classification has shown clear indications that young children with ASD (16-48 months) display low rates of canonical syllable production compared with typically developing infants, even after matching of subgroups for expressive language (Oller et al., 2010). Even more to the point, one recent study has assessed the usage of canonical syllables (though not the onset of canonical babbling) in infants at high-risk for ASD because they were siblings of children with ASD; seven of 24 participants in the study received a provisional diagnosis of ASD at 24 months (Paul, Fuerst, Ramsay, Chawarska, \\& Klin, 2011). As a group, the at-risk infants (all 24) produced significantly lower mean canonical babbling ratios (canonical syllables divided by all \"speech-like\" vocalizations, i.e., those deemed \"transcribable\" by the researchers) compared to low-risk infants at nine-months of age, but there were no significant differences at 12 months. \"Non-speech\" vocalizations (those deemed \"not transcribable\" e.g., yells, squeals, growls) were not included in the evaluation of canonical babbling. Other vocal measures-especially number of consonantlike elements and number of speech-like and proportion of non-speech-like vocalizationsalso appeared to be potentially useful indicators of emergent ASD.\n\nVolubility in ASD\n\n\nContext: Discussion of vocalization differences in children with ASD, specifically focusing on canonical babbling and volubility.", "metadata": { "doc_id": "Patten_Audio_5", "source": "Patten_Audio" } }, { "page_content": "Text: Volubility in ASD\n\nVolubility, or rate of vocalization, measured in terms of frequency of syllable or utterance production, may be limited in ASD, a possibility that is supported by automated analysis of data showing low volubility in ASD from all-day recordings on children from 16 to 48 months of age based on the LENA system (Warren et al., 2010). Volubility in infants with severe or profound hearing loss and in infants with Down syndrome has not been found to\n\nbe depressed compared with typically developing infants; however, infants from lower socio-economic status (SES) have been shown consistently to produce fewer utterances per minute than their middle or high SES peers (Eilers et al., 1993; Oller et al., 1995). 
Research suggests that children in low SES experience less communication from caregivers (Hart \\& Risley, 1995; Snow, 1995). The lower volubility of these infants may be a product of decreased social-communication from adults, potentially resulting in lower levels of social motivation in the infants.\n\n\nContext: Discussion of vocalization patterns in autism spectrum disorder, specifically addressing volubility and its potential links to socioeconomic factors and caregiver communication.", "metadata": { "doc_id": "Patten_Audio_6", "source": "Patten_Audio" } }, { "page_content": "Text: Variability in moment-to-moment parental interactivity clearly does affect infant volubility by the middle of the first year of life, as indicated by research on parent-infant interaction in the \"still-face\" paradigm. The work suggests a strong tendency in the particular case of parent still-face for infants to increase vocalization rate. Specifically, volubility during a baseline period of one to three minutes of face-to-face vocal interaction is substantially lower than during a following still-face period of one to three minutes where the parent withholds any facial or vocal reaction while continuing to look directly at the infant. This pattern is seen in infants after 5 months, but not at 3 months, where volubility does not change at the shift from face-to-face interaction to still-face (Delgado, Messinger, \\& Yale, 2002; Goldstein, Schwade, \\& Bornstein, 2009; Yale, Messinger, Cobo-Lewis, Oller, \\& Eilers, 1999). The results from the still-face paradigm are interpreted to mean that infants seek to re-engage the withdrawn parent during the still-face period, having learned by the middle of the first year that their vocalizations can have impact (Tronick, 1982). This effect raises the question of whether infants with emergent ASD similarly increase their volubility to re-engage their caregivers after a period of withdrawn caregiver attention, or whether they decrease volubility, possibly due to diminished motivation to engage socially with others.\n\n\nContext: The study investigates vocal development in infants with and without ASD, examining babbling and volubility in relation to caregiver interaction patterns.", "metadata": { "doc_id": "Patten_Audio_7", "source": "Patten_Audio" } }, { "page_content": "Text: Frequency of vocalizations directed at others has been reported to be significantly lower in infants later diagnosed with ASD compared to typically developing infants at 12 months but not at 6 months (Ozonoff, Iosif, Baguio, Cook, Hill, et al., 2010). It is also notable that frequency of vocalization based on parent report is predictive of language abilities in toddlers with ASD (Weismer, Lord, \\& Esler, 2010). Paul et al. (2011) assessed frequency of vocalization in infants at high-risk and low-risk for developing ASD and found no difference between groups. However, the study did not actually test for volubility the way volubility is defined here and in much prior research. Frequency of vocalization was tallied in a special way in the Paul et al. study, by counting all speech-like (phonetically transcribable) and nonspeech-like (not phonetically transcribable) vocalizations that occurred within the first 50 speech-like vocalizations of each recorded sample. But not all participants produced 50 speech-like utterances, and in the ones who did, the length of recording required to reach the 50 speech-like utterance criterion was variable. 
Thus, rate of vocalizations per unit of time was not examined in this study; consequently, given the common usage of the term volubility, it is not possible to determine whether there was a difference in volubility between the groups. In addition, participants in this study were at high-risk for ASD—some were later diagnosed with ASD while some were not. This mixture may have attenuated group differences. It should also be noted that Weismer et al. included only child vocalizations directed at others while Paul et al. included vocalizations. Although ASD has roots in social impairments, vocalizations directed at others as well as independent vocal play might well be abnormal in ASD.\n\nA new study of early vocal development in ASD\n\n\nContext: A discussion of prior research on vocalization frequency in infants at risk for or diagnosed with ASD, highlighting limitations of existing studies.", "metadata": { "doc_id": "Patten_Audio_8", "source": "Patten_Audio" } }, { "page_content": "A new study of early vocal development in ASD\n\nOne reason the development of pre-speech vocal behaviors in ASD has not been well documented may be that ASD is not reliably diagnosed until long after canonical syllables are expected to emerge, thus making prospective analyses challenging. Retrospective interviews with parents whose children have been diagnosed with ASD regarding age at which canonical syllables emerged may be hindered by poor parent recall, given that parents are generally asked to remember the nature of child babbling that occurred one or more years prior to the time of the interview; also, parents' awareness of the diagnosis may bias their recall of the onset of canonical babbling. The effort by Paul et al. (2011) cited above represents a key advancement in methodology because they assessed infants known to be atrisk in a prospective fashion. Our approach seizes an additional opportunity afforded by the fortuitous existence of home video data from the first year of life that can be analyzed after diagnosis of ASD for comparison with similar video data from infants who did not receive the diagnosis.", "metadata": { "doc_id": "Patten_Audio_9", "source": "Patten_Audio" } }, { "page_content": "Text: As indicated in studies cited above, emergence of canonical syllables is a critical milestone in the development of spoken language, and delayed onset has been shown to be predictive of significant communication impairment. Canonical babbling and volubility have not been well characterized in infants with ASD. To arrive at a better understanding of these two variables as potential indicators of ASD risk in infants, we investigated vocalizations of infants later diagnosed with ASD and typically developing (TD) infants at two age ranges, $9-12$ months and $15-18$ months, using retrospective video analysis methods. Previous research has suggested that nearly all TD infants reach the canonical babbling stage by 9-12 months (Eilers \\& Oller, 1994), and on the assumption that a delay might be present in the children later diagnosed with ASD, we predicted such delay would be observed in this age range. 
We took the opportunity also to evaluate the available data at 15-18 months because any infant with a failure to show canonical babbling at that age would be greatly delayed in canonical babbling onset and would be considered at very high risk for a variety of disorders.\n\n\nContext: A study investigating vocalization patterns (canonical babbling and volubility) in infants later diagnosed with ASD compared to typically developing infants, using retrospective video analysis.", "metadata": { "doc_id": "Patten_Audio_10", "source": "Patten_Audio" } }, { "page_content": "Text: The coding scheme for this study is based on a widely applied method for laboratory-based evaluation of canonical babbling (Oller, 2000). In accord with this method, infants are assumed to be in the canonical stage if they show a canonical babbling ratio (canonical syllables divided by all syllables) of at least .15 , a value based on coding by trained listeners of a recording. A value of .15 or greater from such laboratory coding has been empirically determined in prior research as corresponding to parent judgments that infants are in the canonical stage (Lewedag, 1995). It has been reasoned that parent judgments constitute the most appropriate standard for establishing this criterion value (Oller, 2000). This reasoning is based on three points: 1) Parents respond to interview questions by providing very consistent and accurate information about canonical babbling in their infants (Papoušek, 1994; Oller, Eilers, \\& Basinger, 2001); 2) this parental capability is predictable, given that recognizing canonical babbling represents nothing more than being able to recognize syllables as being well-formed enough that they could form parts of words in real speech (and of course normal adults can easily recognize vocalizations of humans as speech or nonspeech); and 3) parents appear to intuitively understand that the onset of canonical babbling\n\n\nContext: Methods section, describing the coding scheme and criteria for canonical babbling.", "metadata": { "doc_id": "Patten_Audio_11", "source": "Patten_Audio" } }, { "page_content": "Text: is an emergent foundation for speech, as evidenced by the fact that they initiate intuitive lexical teaching as soon as they begin to recognize canonical babbling in their infants (Papoušek, 1994). Consistent parent recognition of the onset of canonical babbling runs in parallel with recognition of other developmental milestones (e.g., sitting unsupported, crawling, walking). In our study we could not use parents as informants about the age of onset of canonical babbling since that onset had occurred a very long time before our first contact with them. 
Consequently, the canonical babbling ratio, determined from recordings coded in our laboratory, provided the best available measure upon which to base inference about whether infants had reached the canonical stage.\n\nIn the present study the following hypotheses were tested:\n\nInfants later diagnosed with ASD will be less likely than TD infants to be in the canonical stage at each age ( $9-12$ and $15-18$ months), as determined by whether their canonical babbling ratios exceed the .15 criterion.\n\nInfants later diagnosed with ASD will demonstrate significantly lower canonical babbling ratios (independent of the canonical stage criterion) compared to TD infants.\n\nInfants later diagnosed with ASD will demonstrate significantly fewer total vocalizations (lower volubility) at both age ranges compared to TD infants.\n\nA combined analysis using both volubility and canonical babbling status will significantly predict group membership.\n\nMethod\n\nParticipants\n\n\nContext: Introduction to a study examining early vocal development in infants later diagnosed with autism spectrum disorder (ASD) compared to typically developing (TD) infants.", "metadata": { "doc_id": "Patten_Audio_12", "source": "Patten_Audio" } }, { "page_content": "Text: A combined analysis using both volubility and canonical babbling status will significantly predict group membership.\n\nMethod\n\nParticipants\n\nA total of 37 participants were included in the present study, 23 individuals later diagnosed with ASD and 14 individuals in the TD group (Table 1). There was one set of fraternal twins in the ASD group. Participants were drawn from a larger study conducted at the University of North Carolina-Chapel Hill based on availability of video recordings; participants must have had two five-minute edited video segments at 9-12 months and at least one edited video segment at $15-18$ months. As part of the larger study, participants were recruited from the Midwest and Southeast over a 15-year time period. Recruitment criteria included: (1) child age between two and seven years at the time of recruitment; (2) available home videotapes of the child between birth and two years of age that parents were willing to share; and (3) enough video footage for at least one 5-minute codable segment (see video editing section below) of the child at either 9-12 or 15-18 months of age.\n\nAll participants included in the ASD group received a clinical diagnosis of ASD from a licensed psychologist and/or physician at a point after the recordings were made. Thus, our design is a retrospective analysis similar to others that have used home movies of children later diagnosed with ASD (Baranek, 1999; Werner, Dawson, Osterling, \\& Dinno, 2000). A trained research staff member validated diagnoses for each participant using criteria from the Diagnostic and Statistical Manual IV (American Psychiatric Association, 2000) and from one or more ASD screening and diagnostic tools, including: the Childhood Autism\n\n\nContext: Results section, discussing statistical analyses predicting group membership (ASD vs. TD).", "metadata": { "doc_id": "Patten_Audio_13", "source": "Patten_Audio" } }, { "page_content": "Text: Rating Scale (CARS; Schopler, Reichler, \\& Renner, 1992), the Autism Diagnostic Observation Schedule (ADOS; Lord, et al., 1999), and/or the Autism Diagnostic InterviewRevised (ADI-R; Rutter, LeCouteur, \\& Lord, 2003). 
All participants had CARS scores and each participant in the ASD group had ADI/ADI-R scores and 13 of the 23 ASD participants had ADOS scores.\n\nTypically developing group membership was based in part on scores within normal limits (i.e., not more than one standard deviation below the mean) on the Mullen Scales of Early Learning (Mullen, 1995) and/or the Vineland Adaptive Behavior Scales (VABS; Sparrow, Balla, \\& Cicchetti, 1984). An additional exclusionary criterion for any participants in the TD group was any history of learning or developmental difficulties per parent report. Individuals with significant physical, visual or hearing impairments or known genetic conditions (e.g., Fragile X or Rett's Syndrome) associated with ASD were excluded. As indicated in Table 1, mean age (in months) was very similar across groups, gender was balanced, and the two groups were also similar with regard to SES based on maternal education. Our families were mostly middle SES with access to videotaping equipment.\n\nThe University of North Carolina-Chapel Hill Institutional Review Board approved the study, and all families signed informed consents. For more information regarding recruitment and inclusion criteria see Baranek (1999).\n\nVideo Editing Procedures\n\nFamilies provided home videos of their child from birth to two years as available. The videotapes included footage from a variety of contexts including family play situations, vacations, outings, special events, and familiar routines (e.g., mealtimes), with individual variation in situational content of each family's videotapes as would be expected in home videotapes. All videotapes were copied, transformed to digital formats, and originals were returned to participating families.\n\n\nContext: Participant recruitment and exclusion criteria.", "metadata": { "doc_id": "Patten_Audio_14", "source": "Patten_Audio" } }, { "page_content": "Text: Video editing guidelines first focused on the identification of video footage during which the child was consistently visible and for which the parents felt they could accurately identify the child's age. The two age ranges were originally selected for another study on early behavior in ASD (Baranek, 1999). At the same time, the two age ranges are well-suited to our current purposes. The 9-12 month age range is the earliest age range in which parents had sufficient videotape footage for it to be useful in our research and represents the time period when a number of communicative behaviors emerge. Further, this is a time frame during which the vast majority of TD children would be expected to already be in the canonical babbling stage. The $15-18$ month range provided follow-up on the same children with the expectation that monitored behaviors would be more consistent and would allow for confirmation or clarification of data from the earlier age. In TD children, canonical babbling is usually well consolidated by the 15-18 month age range (Vihman, 1996; Oller, 2000).\n\nIn editing tapes for the larger study, the aim was to compile two 5-minute video segments for each child in the 9-12 age range, and two 5 -minute segments in the $15-18$ month age range. On average, each 5 -minute segment consisted of 5 scenes. 
Research assistants who were blind to the research questions and not informed of the diagnostic status of the\n\n\nContext: Methodology: Video selection and editing procedures.", "metadata": { "doc_id": "Patten_Audio_15", "source": "Patten_Audio" } }, { "page_content": "Text: participants edited the videotapes and coded each scene for the following content variables: (a) number of people present; (b) amount of physical restriction on child's freedom to move, rated as low, medium, or high; (c) the amount of social intrusion another person was using to engage the child in interaction, rated as low, medium, or high; (d) and the types of events (e.g., meal time, bath time, active play, special events) (Baranek, 1999). The assistants were instructed to quasi-randomly select a cross-section of scenes from the available footage in the designated age ranges, purposely including scenes from each one-month age interval for which video footage was available within each age range, provided that the child was visible in each selected scene. All participants included in the current study had two 5-minute compilations (i.e., 10 minutes total) for the $9-12$ month age range, but at the $15-18$ month age range, there were three TD infants and one infant with ASD for whom only a single 5minute segment was assembled due to insufficient video footage. As a result, the mean duration of samples at the $15-18$ month age range was 9.5 minutes rather than 10 .\n\n\nContext: Methods section describing video analysis procedures.", "metadata": { "doc_id": "Patten_Audio_16", "source": "Patten_Audio" } }, { "page_content": "Text: Although vocalization from the infants was common in these scenes, the segments were not specifically selected to capture vocal behavior. Therefore, volubility estimated from the present study may be lower than in prior works where infants have been observed in settings designed to maximize vocal interaction. Similarly, the video segment selection procedure may yield differences in canonical babbling from prior studies. In most studies, 20-30 minutes of vocal interaction have been recorded, whereas here we had less than half that amount of data per sample. Our procedure can be predicted to produce greater variability in canonical babbling ratios than in studies with longer sampling periods (Molemans, 2011; Molemans et al., 2011). Additionally, the audio-video quality of these home movies was not as good as would be expected in laboratory studies, another factor that could reduce perceived canonical babbling and volubility.\n\n\nContext: Discussion of limitations in data collection methods and potential impact on findings.", "metadata": { "doc_id": "Patten_Audio_17", "source": "Patten_Audio" } }, { "page_content": "Text: To ensure that the contexts in which children were recorded were comparable, specific content parameters were identified and compared (Tables 2 and 3). No differences were found between the groups on any content parameter including: number of people present, level of physical restriction (i.e., amount of physical confinement such as a highchair versus free play; rated as low, medium or high), amount of social intrusion (rated as low, medium or high), and the total number of event types (e.g., meal time, active play). The number of times each event type (e.g., bath time, playtime) was represented in the ASD group versus the TD group for each age was compared using chi-square analyses. 
Results for the omnibus chi-square test failed to reach significance in the $9-12$ month age group ( $p>0.05$ ), but did reach significance in the $15-18$ months age group ( $p=0.046$ ). Typically developing children were more likely to be engaged in passive activities at the $15-18$ month age range ( $p=$ $0.046 ; \\mathrm{TD}=16.6 \\%, \\mathrm{ASD}=4.6 \\%$ ) according to follow-up analysis of the six event categories. See Tables 4 and 5 for the percentage in each category. For a comprehensive description of the coding procedures that yielded the data on situational context see Watson, Crais, Baranek, Dykstra, and Wilson (2012).\n\nCoding Procedure and Observer Agreement\n\nThe videotapes analyzed in this study were coded for infant production of all syllables in speech-like vocalizations by two certified speech-language pathologists who were not\n\ninformed of the diagnostic group of the infants. The intent was, of course, for the coders to be blind to diagnostic category, and with the exception of one infant to be discussed below, the coders reported they saw no reason to suspect any infant of having ASD.\n\n\nContext: Discussion of methodological controls for recording contexts and coding procedures.", "metadata": { "doc_id": "Patten_Audio_18", "source": "Patten_Audio" } }, { "page_content": "Text: We defined speech-like vocalizations (as in the primary literature on canonical babbling) to include both canonical and precanonical infant vocalizations (regardless of whether they would be deemed \"transcribable\"). Training of the two coders was provided by the last author, who originated the definition of \"canonical syllable\" used in this study, and who has conducted and collaborated on numerous studies on onset of canonical babbling, rate of canonical babbling, and volubility in infants (Cobo-Lewis, Oller, Lynch, \\& Levine, 1996; Lynch et al., 1995; Oller \\& Eilers, 1982, 1988; Oller, Eilers, \\& Basinger, 2001; Oller et al., 1995; Oller, Eilers, Neal, \\& Cobo-Lewis, 1998). The two observers were trained in identifying canonical syllables and in counting all syllables independent of their canonical status. The video samples used during training were separate (although drawn from similar materials based on the home recordings) and not included in the analyses for this investigation.\n\n\nContext: Describing the methodology for identifying and counting infant vocalizations in video recordings.", "metadata": { "doc_id": "Patten_Audio_19", "source": "Patten_Audio" } }, { "page_content": "Text: Syllables were defined as rhythmic units of speech-like vocalization, excluding raspberries, effort \"grunt\" sounds (i.e., a schwa-like sounds produced as an artifact of physical exertion), ingressive sounds, sneezes, hiccups, crying and laughing. Within an \"utterance\", which was defined as a vocal breath group (Lynch, Oller, Steffens, \\& Buder, 1995), it was possible to identify syllables as corresponding to sonority peaks (high points of pitch and/or amplitude) that are intuitively recognized by mature listeners. These rhythmic events occur in time frames typical of syllables in real speech (usually with durations of 200-400 ms). A canonical syllable is defined as including a vowel-like nucleus, at least one margin (or consonant-like sound) and a transition between margin and nucleus that is rapid and uninterrupted. In general, transitions that are too fast to be tracked auditorily (too fast to be heard \"as transitions\") are instead heard as gestalt syllables. 
Auditory tracking of these transitions focuses on formant (acoustic energy) transitions that can be measured on spectrograms as typically $<120 \\mathrm{~ms}$ (Oller, 2000). Formants are audible bands of energy corresponding to resonant frequencies of the vocal tract that change as the tract changes shape or size. Audible formant transitions occur, then, when the vocal tract moves during opening from a consonantal closure into a vowel or vice versa.\n\nExamples of canonical utterances (which must include at least one canonical syllable) are syllables that a listener might perceive as ba, taka, or gaga. Vocalizations produced while mouthing objects (e.g., toys or fingers) or eating were excluded from our analyses on the grounds that we could not be sure what role movement of the hands may have played in the apparent syllabification.\n\n\nContext: Defining terms and methodology for analyzing vocalizations in infant videos.", "metadata": { "doc_id": "Patten_Audio_20", "source": "Patten_Audio" } }, { "page_content": "Text: Videos were randomized and randomly distributed across the two coders with regard to diagnostic group. The 37 participants' videos were randomly split between the coders by participant and included both age ranges. The coders independently watched the videos, counting both syllables and canonical syllables in real time. This procedure is utilized regularly in the laboratories of the last author in accord with reasoning presented in recent\n\npapers, especially Ramsdell et al. (2012). This naturalistic listening approach mimics how a mother would hear her child, listening to each utterance only once.\n\nThe measure of canonical babbling ratio used here (number of canonical syllables divided by number of all syllables) is the measure utilized in the bulk of research on onset of canonical babbling to date. However, some studies have used a different ratio (number of canonical syllables divided by number of utterances). The former procedure is generally preferred nowadays because the resulting value can be interpreted as a proportion with values varying from 0 to 1 , whereas the latter procedure yields a ratio with no effective upper limit (Oller, 2000).\n\n\nContext: Methods section describing video coding procedures and justification for specific measurement choices.", "metadata": { "doc_id": "Patten_Audio_21", "source": "Patten_Audio" } }, { "page_content": "Text: In a coder agreement test, both observers independently coded twenty samples consisting of two five-minute segments of ten participants' video footage. A research assistant unaware of the study goals selected these test samples, and they represented both diagnostic groups and both ages. Reliability was gauged in accord with the degree to which coders agreed upon canonical syllables, total syllables, and whether the child was in the canonical babbling stage (i.e., had a canonical babbling ratio $>0.15$, the standard criterion). Inter-rater agreement ranged from good to excellent for canonical syllables (ICC $=.98, \\mathrm{CI} 95=.96-.99$ ) and for total syllables (ICC $=.87, \\mathrm{CI} 95=.61-.95$ ). Reliability for canonical babbling ratios was also good (ICC $=.89, \\mathrm{CI} 95=.69-.96$ ), with agreement on the canonical stage criterion at $95 \\%$ for the twenty samples. Additionally, the coders differed by an average of only $10 \\%$ of the total range of canonical babbling ratios obtained, and the correlation across the ratios for the twenty samples for the two coders was .89 . 
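Agreement figures of this kind (and the volubility agreement reported next) come from a simple segments-by-coders table. As a minimal sketch only, the code below computes a Pearson correlation and one common intraclass correlation form, ICC(2,1); the ratings are invented and the specific ICC model used in the study is not stated in the text, so both are assumptions for illustration.

```python
# Illustrative sketch (not the authors' code): agreement statistics for two coders
# rating the same recording segments, e.g. canonical babbling ratios per segment.
import numpy as np

# Hypothetical ratings: rows = segments, columns = the two coders.
ratings = np.array([
    [0.10, 0.12], [0.05, 0.04], [0.22, 0.20], [0.18, 0.21], [0.00, 0.02],
    [0.30, 0.28], [0.15, 0.13], [0.08, 0.09], [0.25, 0.27], [0.12, 0.10],
])
n, k = ratings.shape          # n targets (segments), k raters (coders)

# Pearson correlation between the two coders' ratings.
r = np.corrcoef(ratings[:, 0], ratings[:, 1])[0, 1]

# Two-way random-effects, single-measure ICC(2,1) (Shrout & Fleiss formulation).
grand = ratings.mean()
row_means = ratings.mean(axis=1)      # per-segment means
col_means = ratings.mean(axis=0)      # per-coder means
ss_rows = k * ((row_means - grand) ** 2).sum()
ss_cols = n * ((col_means - grand) ** 2).sum()
ss_total = ((ratings - grand) ** 2).sum()
ss_error = ss_total - ss_rows - ss_cols
msr = ss_rows / (n - 1)               # between-targets mean square
msc = ss_cols / (k - 1)               # between-raters mean square
mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

print(f"Pearson r = {r:.2f}, ICC(2,1) = {icc_2_1:.2f}")
```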
For volubility, the coders differed by an average of $13 \\%$ of the total range for volubility values, and the correlation across the twenty sample videos for the two coders was .91.\n\nResults\n\nAnalyses were performed to confirm that the groups were matched on demographic variables. These analyses did not reveal significant differences between groups on any variable (see Table 1).\n\n\nContext: Coder reliability for video coding.", "metadata": { "doc_id": "Patten_Audio_22", "source": "Patten_Audio" } }, { "page_content": "Text: Results\n\nAnalyses were performed to confirm that the groups were matched on demographic variables. These analyses did not reveal significant differences between groups on any variable (see Table 1).\n\nInitial descriptive statistics for within- and between-group variables revealed two outliers in the ASD group. Both cases produced very high canonical babbling ratios in the 9-12 month range (.93 and .64) relative to the mean for both groups (ASD $=.12$ for the 23 cases, TD $=.17$) (see Figure 1). Based on prior research, the canonical babbling ratios observed for these two ASD cases were substantially higher than would be expected in TD infants in the 9-12 month age range: infants grouped as having English or Spanish at home, as high or low SES, and as born at term or prematurely all showed mean canonical babbling ratios under .4 from 8 to 12 months of age (Oller, Eilers, Urbano, \\& Cobo-Lewis, 1997; Oller, Eilers, Steffens, Lynch, \\& Urbano, 1994). Analysis of z-scores revealed that infant 22 was 3.96 standard deviations above the mean for the present sample, and infant 23 was 2.73 standard deviations above the mean, further suggesting outlier status. On this basis we decided to eliminate these two cases in the primary analyses on canonical babbling; the remaining 35 cases (21 ASD, 14 TD) were analyzed to address our research questions regarding canonical babbling (see Figures 1 and 2 for canonical babbling ratios by participant at both ages, with the two outliers indicated). However, there were no significant outliers with regard to volubility, and thus we included data from all 37 cases for that analysis (see Figures 3 and 4 for syllable volubility by participant at both ages).\n\nHypothesis 1: Infants later diagnosed with ASD will be less likely than typically developing infants to be in the canonical stage at each age (9-12 and 15-18 months)\n\n\nContext: A description of the data analysis methods and findings related to canonical babbling and syllable volubility in infants with ASD versus typically developing infants.", "metadata": { "doc_id": "Patten_Audio_23", "source": "Patten_Audio" } }, { "page_content": "Text: Hypothesis 1: Infants later diagnosed with ASD will be less likely than typically developing infants to be in the canonical stage at each age (9-12 and 15-18 months)\n\nLog odds ratios (log OR) were calculated to compare the classifications of both ASD and typically developing children with regard to their canonical babbling. The criterion for canonical babbling stage was set at $15 \\%$ or greater canonical syllables compared to all syllables; this is a common criterion in studies of canonical babbling, and is based on data reviewed in Oller (2000).
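Before turning to the group comparisons, it may help to see the two summary measures in concrete terms. The sketch below uses made-up counts (not data from the study) to show how one coded compilation yields a canonical babbling ratio, a canonical-stage classification against the .15 criterion, and a volubility value in syllables per minute.

```python
# Minimal sketch with invented counts: one coded compilation for one infant.
canonical_syllables = 12      # syllables judged canonical (e.g., [ba], [taka])
total_syllables = 95          # all speech-like syllables, canonical or not
minutes_of_video = 10.0       # duration of the coded compilation

cb_ratio = canonical_syllables / total_syllables     # canonical babbling ratio, here ~0.13
in_canonical_stage = cb_ratio >= 0.15                # criterion of at least .15 (Oller, 2000)
volubility = total_syllables / minutes_of_video      # syllables per minute, here 9.5

print(f"CB ratio = {cb_ratio:.2f}, canonical stage = {in_canonical_stage}, "
      f"volubility = {volubility:.1f} syllables/min")
```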
TD infants were significantly more likely to have reached the canonical babbling stage based on the criterion than were infants later diagnosed with ASD at the $9-12$ month age range ($\mathrm{N}=35, \log \mathrm{OR}=2.84, \mathrm{CI}_{95}=1.02$ to $4.66, p=0.002$), and remained more likely at the $15-18$ month age range ($\mathrm{N}=35, \log \mathrm{OR}=1.78, \mathrm{CI}_{95}=-0.04$ to $3.61, p=0.054$). As an easily interpretable effect size measure, the simple odds ratios (as opposed to the log odds ratio, which is statistically preferable for significance testing with small N's) can be considered; the simple ORs indicated TD infants were 17 times more likely $(\mathrm{OR}=17.1)$ to be categorized as in the canonical stage than ASD infants at $9-12$ months and 6 times more likely $(\mathrm{OR}=5.96)$ at $15-18$ months.\n\nHypothesis 2: Infants later diagnosed with ASD will demonstrate significantly lower canonical babbling ratios (independent of the canonical stage criterion) compared to typically developing infants\n\n\nContext: Results of a study examining early language development in infants later diagnosed with ASD compared to typically developing infants.", "metadata": { "doc_id": "Patten_Audio_24", "source": "Patten_Audio" } }, { "page_content": "Text: Hypothesis 2: Infants later diagnosed with ASD will demonstrate significantly lower canonical babbling ratios (independent of the canonical stage criterion) compared to typically developing infants\n\nCanonical babbling ratios of infants later diagnosed with ASD and TD infants were contrasted using a Mixed ANOVA. The between-subjects variable was diagnostic category (ASD vs. TD) and the within-subjects variable was age range ($9-12$ months and $15-18$ months). The mean canonical babbling ratios at $9-12$ months were $.06$ ($\mathrm{SD}=.06$) for the 21 infants later diagnosed with ASD and $.17$ ($\mathrm{SD}=.13$) for the 14 TD infants; at 15-18 months the values were $.16$ ($\mathrm{SD}=.22$) and $.28$ ($\mathrm{SD}=.16$) respectively (Figure 5). Analyses revealed a significant main effect for diagnostic category $(F(1,1)=6.79, p=.01, \eta_{\mathrm{p}}^{2}=0.17)$, with infants later diagnosed with ASD producing significantly lower canonical babbling ratios, and a significant main effect for age $(F(1,1)=7.86, p<.01, \eta_{\mathrm{p}}^{2}=0.19)$, with higher canonical babbling ratios at the older age. The effect size between groups for 9-12 months was $\mathrm{d}=1.09$ (a large effect) and for 15-18 months was .62 (a moderate effect; Cohen, 1992). The age by diagnosis interaction was not significant $(p>0.66)$.\n\nHypothesis 3: Infants later diagnosed with ASD will demonstrate significantly fewer total vocalizations (lower volubility) at both age ranges compared to typically developing infants\n\nFor this analysis, all 37 infants were included because there were no significant outliers. Volubility of infants later diagnosed with ASD and TD infants was contrasted using a Mixed ANOVA. The between-subjects variable was diagnostic category (ASD vs. TD), and the within-subjects variable was age range ($9-12$ months and $15-18$ months).
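Stepping back to the Hypothesis 1 classification figures for a moment, the odds-ratio arithmetic can be reproduced from a 2 x 2 table of canonical-stage status by group. The cell counts in the sketch below are not reported directly in the text; they are assumptions chosen to be consistent with the reported N of 35, the 9-12 month odds ratio of 17.1, and its confidence interval, so this is illustrative rather than a reconstruction of the actual data.

```python
import math

# Assumed 2x2 table at 9-12 months (counts inferred for illustration only):
#                  in canonical stage   not in stage
td_in, td_out   = 9, 5
asd_in, asd_out = 2, 19

odds_ratio = (td_in / td_out) / (asd_in / asd_out)          # ~17.1
log_or = math.log(odds_ratio)                                # ~2.84
se = math.sqrt(1/td_in + 1/td_out + 1/asd_in + 1/asd_out)    # Woolf standard error
ci_low, ci_high = log_or - 1.96 * se, log_or + 1.96 * se     # ~1.02 to ~4.66 on the log scale

print(f"OR = {odds_ratio:.1f}, log OR = {log_or:.2f}, "
      f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```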
Infants later diagnosed with ASD produced a mean of $4.55$ ($\mathrm{SD}=.59$) syllables per minute while TD\n\n\nContext: Results of a study examining vocalization patterns in infants later diagnosed with ASD compared to typically developing infants.", "metadata": { "doc_id": "Patten_Audio_25", "source": "Patten_Audio" } }, { "page_content": "Text: infants produced a mean of $5.86$ ($\mathrm{SD}=.67$) syllables per minute at $9-12$ months. At 15-18 months, infants later diagnosed with ASD produced a mean of $3.24$ ($\mathrm{SD}=.49$) syllables per minute while TD infants produced a mean of $4.63$ ($\mathrm{SD}=.51$) syllables per minute (see Figure 6). Analyses revealed a significant main effect for diagnostic category $(F(1,1)=4.85, p=.034, \eta_{\mathrm{p}}^{2}=0.12)$, and for age $(F(1,1)=4.96, p=.032, \eta_{\mathrm{p}}^{2}=0.12)$. Thus, infants later diagnosed with ASD displayed significantly lower volubility than TD infants. The effect size for group at 9-12 months was $\mathrm{d}=2.07$ (large) and at 15-18 months was 2.77 (large).\n\nHypothesis 4: A combined analysis using both volubility and canonical babbling status will significantly predict group membership\n\nLogistic regression analysis was conducted to test whether canonical babbling status (whether each participant met the .15 criterion) and volubility at age ranges of 9-12 months and 15-18 months could reliably predict later diagnosis status (group membership). This test was conducted with all 37 cases included, partly in order to match the number of cases for the two predictor variables and partly because the goal of the analysis was to determine the potential practical utility of identification of these children without any information other than volubility and canonical babbling ratio. This test may thus be the one of primary clinical interest, since it evaluates the circumstance that screening implies, where there would be no basis for knowing whether an infant might be an outlier on any variable. Without this evaluation there would be no direct indication in our results of the degree of group discriminability.\n\n\nContext: Results of a study examining vocal development in infants later diagnosed with ASD compared to typically developing infants.", "metadata": { "doc_id": "Patten_Audio_26", "source": "Patten_Audio" } }, { "page_content": "Text: Statistical significance was reached in a test of the full model against a constant-only model, which indicated that, as a set, canonical babbling status and volubility reliably predicted later diagnosis $\left(\chi^{2}=9.82, p=0.044, df=4\right)$. A small-to-moderate relationship between prediction and grouping was observed (Nagelkerke's $R^{2}=0.317$), with an overall prediction success of $75 \%$ ($64 \%$ for TD and $82 \%$ for ASD). However, further examination of the predictors using the Wald criterion revealed that when all four predictor variables were included in the model, none significantly contributed to prediction of group membership at an individual level ($p>0.05$). The status of infants with regard to canonical babbling stage at the 9-12 months age range provided the largest observed predictive contribution, Wald $=3.06, p=0.08, \mathrm{EXP}(B)=0.198$.
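For readers who want to see the shape of the Hypothesis 4 analysis, the sketch below fits a four-predictor logistic regression of the same form on synthetic data. The predictor values, variable names, and the use of statsmodels are all assumptions for illustration; nothing here reproduces the study's actual model output.

```python
# Sketch of a Hypothesis 4-style analysis on synthetic data (illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
asd = np.array([1] * 23 + [0] * 14)                  # 1 = later ASD diagnosis, 0 = TD (n = 37)

# Invented predictors: canonical-stage status (0/1) and volubility (syllables/min)
# at 9-12 and 15-18 months, loosely shaped like the group means in the text.
cb_stage_9_12 = rng.binomial(1, np.where(asd == 1, 0.10, 0.65))
cb_stage_15_18 = rng.binomial(1, np.where(asd == 1, 0.40, 0.80))
volubility_9_12 = rng.normal(np.where(asd == 1, 4.55, 5.86), 1.0)
volubility_15_18 = rng.normal(np.where(asd == 1, 3.24, 4.63), 1.0)

X = sm.add_constant(np.column_stack(
    [cb_stage_9_12, cb_stage_15_18, volubility_9_12, volubility_15_18]))
model = sm.Logit(asd, X).fit(disp=0)

predicted = (model.predict(X) >= 0.5).astype(int)
accuracy = (predicted == asd).mean()
print(model.params)            # coefficients; np.exp() of these gives EXP(B)-style odds ratios
print(f"overall classification accuracy = {accuracy:.0%}")
```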
The contribution to group discriminability by volubility at $9-12$ and $15-18$ months age ranges approached nil, $E X P(B)=0.992$ and 0.985 respectively.\n\nExamination of the correlations among the predictor variables showed that all but volubility at 9-12 months were significantly correlated with all other predictors (Table 6), with volubility at 9-12 months significantly correlated with only canonical babbling at 9-12 months. This inter-relation among the predictor variables suggests that, to some degree, they account for some of the same variance in diagnosis. However, the observed $E X P(B)$ values (odds ratios of the outcomes given the value of an individual predictor) more strongly suggest that canonical babbling at 9-12 months accounted for the bulk of the variability in diagnosis.\n\nIt seems clear that significance of the individual predictors in the logistic regression may have been hampered by the high level of relation among them. Individually, the volubility\n\n\nContext: Results of a logistic regression analysis examining predictors of autism diagnosis based on babbling and volubility measures.", "metadata": { "doc_id": "Patten_Audio_27", "source": "Patten_Audio" } }, { "page_content": "Text: It seems clear that significance of the individual predictors in the logistic regression may have been hampered by the high level of relation among them. Individually, the volubility\n\nvariables did not appear to have much influence given small Betas and high $p$-values. When predictors were entered into the model in a hierarchical fashion, no matter how predictor entry was ordered ( $9-12$ month variables at step 1 and $15-18$ month variables at step 2 , or CB variables at step 1 and volubility at step 2 ), only the $9-12$ month CB variable was a significant independent predictor. $\\mathrm{R}^{2}$ changes and diagnostic ability for all of the regression and regression step iterations suggested little was added to the $\\mathrm{R}^{2}$ by adding variables in the second step (even with two more predictor variables, only $\\sim 2-3 \\%$ was added to $\\mathrm{R}^{2}$ ), nor did these additions substantially alter the ability of the model to predict later diagnosis. The most efficient model appeared to be a logistic regression with $9-12$ month CB as the only predictor.\n\nDiscussion\n\nThe importance of early intervention for children with ASD has resulted in attempts to quantify behaviors in infancy that may lead to early detection. Substantial effort has addressed gestural and social development and their potential roles in detection within the first year of life (e.g., Watson, Crais, Baranek, Dykstra, \\& Wilson, 2013). The present results offer parallel findings in the domain of vocal development by demonstrating significant group differences in canonical babbling status, canonical babbling ratio, and total syllables produced (volubility) during the first year of life.\n\n\nContext: Results of a study examining vocal development in infants with and without ASD, specifically addressing the findings of a logistic regression analysis.", "metadata": { "doc_id": "Patten_Audio_28", "source": "Patten_Audio" } }, { "page_content": "Text: In our study, infants later diagnosed with ASD were significantly less likely to be classified as being in the canonical babbling stage, and demonstrated significantly reduced canonical babbling ratios compared to TD peers. 
Although significant group differences were apparent in both age ranges ( $9-12$ and $15-18$ months), the effect sizes for canonical babbling were larger at 9-12 months. Paul et al. (2011) demonstrated similar results in infants at high-risk for developing ASD who produced significantly lower canonical babbling ratios compared to low risk infants at 9 months, though at 12 months the differences were not statistically significant. Combined with the finding from Oller et al. (2010) that children with ASD up to 48 months of age show low canonical syllable production, the data here suggest that low production of canonical syllables may be a helpful marker for ASD from infancy into early childhood.\n\n\nContext: Findings regarding canonical babbling in infants later diagnosed with ASD compared to typically developing peers.", "metadata": { "doc_id": "Patten_Audio_29", "source": "Patten_Audio" } }, { "page_content": "Text: Since canonical babbling is well established in the vast majority of TD infants by 10 months (Eilers and Oller, 1994), it might seem odd that several of the TD infants ( 5 at $9-12$ months and 3 at $15-18$ months) in the present study provided samples that did not meet the .15 canonical babbling ratio criterion for assignment to the canonical stage of vocal development. However, it is important to consider the fact that even infants who are clearly in the canonical stage based on parent report often fail to reach the criterion in a single laboratory sample of 20-30 minutes (Lewedag, 1995). In addition, unlike the samples in prior research on canonical babbling, the samples here were not designed to elicit vocalizations, and consequently they may have been less rich in quantity and variety of vocalization than the samples that were used to develop the criterion. Further, our samples at $9-12$ months were only 10 minutes in duration, and at $15-18$ months an average of slightly less than 10 minutes; it has been shown that variability in obtained canonical babbling ratios increases as the length of samples decreases (Molemans, 2011; Molemans, Van den Berg,\n\nVan Severen, \\& Gillis, 2011). Finally, our samples were based on home recordings with considerable noise and variable camera management that may have impeded our ability to recognize vocalizations in the samples. Consequently, we are not surprised that some of the TD infants failed to reach the criterion used to determine canonical status based on laboratory samples.\n\n\nContext: Discussion of the study's findings regarding canonical babbling ratios in typically developing infants, addressing why some did not meet the established criterion.", "metadata": { "doc_id": "Patten_Audio_30", "source": "Patten_Audio" } }, { "page_content": "Text: Given the strong links between the onset of canonical babbling and language development (Oller, Eilers, Urbano, \\& Cobo-Lewis, 1997; Stoel-Gammon, 1989), delayed onset of canonical babbling in infants with ASD may reflect latent communication impairment. It also may be that delayed canonical babbling directly contributes to communication symptoms in ASD. 
Canonical babbling requires motor ability as well as motivation to produce syllables, and practice in babbling may lay critical foundations for speech.\n\nProspective research on motor development in infants later diagnosed with ASD is sparse and often limited to high-risk groups, but available research does indicate that early motor impairment may be present (e.g., Matson, Mahan, Fodstad, Hess, \\& Neal, 2010; Manjiviona \\& Prior 1995; Page \\& Boucher, 1998; Teitelbaum, Teitelbaum, Nye, Fryman, \\& Maurer, 1998). Thus delayed canonical babbling may reflect an immature or disordered motor system with specific implications for speech.\n\n\nContext: Discussion of potential links between delayed canonical babbling and motor development in infants later diagnosed with ASD.", "metadata": { "doc_id": "Patten_Audio_31", "source": "Patten_Audio" } }, { "page_content": "Text: If language develops as a consequence of social reinforcement of speech-like sounds that eventually evolve into true words (consider behavioral models of language development as in Hulit \\& Howard, 2002; Goldstein, King, \\& West, 2003; Goldstein \\& Schwade, 2008; Goldstein \\& West, 1999), then social reinforcement may encourage the production of canonical babbling. Children with ASD may be less motivated by social reinforcement, yielding less frequent vocal exploration and production of canonical syllables than in TD infants. To add to the problem, a delay in canonical babbling may result in reduction in caregiver social-communication directed toward the infant. On average, by six to seven months and very rarely later than ten months, canonical babbling emerges in TD infants (Eilers \\& Oller, 1994). In response to recognition of canonical babbling, caregivers alter their communication pattern, sometimes attempting to direct the infant toward using canonical syllables meaningfully—for example, the parent who hears [baba] may reply, \"Yes, that's a bubble\" (Papoušek, 1994; Stoel-Gammon, 2011). Therefore, infants who are delayed in canonical babbling may also be delayed in their exposure to important linguistic input, and thus may be given less opportunity to learn words. A final point is that infants with ASD may simply have lower motivation to vocalize socially in the first place. This lower motivation could provide a further basis for slow vocabulary learning.\n\n\nContext: Discussion of potential reasons for differences in vocal development (canonical babbling) between children with Autism Spectrum Disorder (ASD) and typically developing (TD) infants.", "metadata": { "doc_id": "Patten_Audio_32", "source": "Patten_Audio" } }, { "page_content": "Text: Our results on volubility included two statistically reliable findings. First, children in both groups had lower volubility at the second age than at the first. We attribute no particular theoretical importance to this finding but we take note of the fact that the lower level of volubility at 15-18 months compared to 9-12 months did correspond to greater physical movement of the children at the older age. In both groups combined, level of physical restriction during the selected recording samples was significantly less at the older age ( $\\mathrm{p}<$. 001). 
As reported earlier, level of physical restriction was not significantly different between diagnostic groups.\n\nThe second volubility finding is that infants later diagnosed with ASD produced significantly fewer vocalizations deemed to be relevant for the emergence of speech (both canonical and non-canonical sounds) at both age ranges ( $9-12$ and 15-18 months) compared to TD peers. Other research has demonstrated that infants with ASD direct fewer vocalizations to others (Ozonoff et al., 2010); our study extends this finding to a more general measure of volubility in terms of total vocalizations (syllables) rather than only ones directed to others. Our finding is also congruent with results from automated analysis of allday recordings indicating low volubility in children with ASD at 16-48 months of age (Warren et al. 2010). The results may seem to run counter to Paul et al. (2011) whose sample of high-risk infants were reported to not produce significantly fewer vocalizations than low-risk infants. However, as described in the introduction above, the Paul et al. study did not report data in a way that can be directly compared with the volubility data reported here.\n\n\nContext: Discussion of volubility findings in a study comparing vocalization patterns of infants later diagnosed with ASD and typically developing peers.", "metadata": { "doc_id": "Patten_Audio_33", "source": "Patten_Audio" } }, { "page_content": "Text: Some disability groups (e.g., hearing impaired infants and infants with cleft palate) have been reported to exhibit volubility similar to that of TD infants (Clement, 2004; Chapman, Hardin-Jones, Schulte, \\& Halter, 2001; Van den Dikkenberg-Pot, Koopmans-van Beinum, \\& Clement, 1998; Nathani, Oller, \\& Neal, 2007; Davis, Morrison, von Hapsburg, \\& WarnerCzyz, 2005); however, infants from low SES households have been reported to have significantly decreased volubility in comparison to those from higher SES households (Oller, Eilers, Basinger, Steffens, \\& Urbano, 1995). Children from low SES backgrounds are often presumed to be at-risk for language deficits. Although it would be impossible to identify and quantify all of the mechanisms through which poverty may affect language development, research has demonstrated that the amount of communication caregivers direct toward their children is decreased in low SES situations (Hart \\& Risley, 1995). This impoverished linguistic environment may result in decreased dyadic social and communicative interactions and thus in a decrease in overall volubility of infants.\n\n\nContext: Factors influencing infant volubility, including disability, socioeconomic status, and caregiver communication.", "metadata": { "doc_id": "Patten_Audio_34", "source": "Patten_Audio" } }, { "page_content": "Text: It is important to note that the relatively well-matched SES between our two groups suggests that the differences in volubility were not attributable to differences in SES. In the case of low SES households, an impoverished linguistic environment due to lack of parent responsiveness might be expected to lead to decreased volubility of the infant and later language difficulty. For infants later diagnosed with ASD, reduced volubility may be affected by multiple factors, not related to inherent parental responsiveness, but related instead to the social impairments of ASD. 
One issue is that these children may experience less linguistic stimulation due to having disrupted sensory processing systems corresponding to sensory hyporesponsiveness; children with ASD are less likely to respond, or require substantially more stimulation to respond to environmental events (Baranek, 1999; Baranek et al., 2013); Miller, Reisman, McIntosh, \\& Simon, 2001; Rogers \\& Ozonoff, 2005). This characteristic of ASD is also reflected in the tendency for infants as young as eight months who will later be diagnosed with ASD to be less likely than TD infants to respond to their name being called (Werner, Dawson, Osterling, \\& Dinno, 2000). This lack of responsiveness may indicate that infants with ASD are less affected by vocal communication from caregivers than TD infants. If so, the lack of responsiveness may reflect an effectively impoverished linguistic environment because of attenuated reception of caregiver input by infants with ASD and subsequent communication impairments. Indeed,\n\nsensory hyporesponsiveness has been shown to be associated with poorer language functioning in children with ASD (Watson et al., 2011).\n\n\nContext: Discussion of potential reasons for reduced volubility in infants later diagnosed with ASD, specifically relating to sensory processing and responsiveness.", "metadata": { "doc_id": "Patten_Audio_35", "source": "Patten_Audio" } }, { "page_content": "Text: sensory hyporesponsiveness has been shown to be associated with poorer language functioning in children with ASD (Watson et al., 2011).\n\nAn additional way that the environment for children with ASD may be impoverished could involve a social feedback loop (Warlaumont et al., 2010) that is under investigation using automated analysis of vocalizations of parents and infants from all-day home recordings. Since infants with ASD produce fewer canonical syllables than TD infants, and since parents respond strongly with language stimulation to canonical syllables, an infant with ASD may actually hear less language from parents, because parents provide input that is tied to the infant's output. The infant's low volubility may then be aggravated by lower input levels resulting from the infant's own anomalous pattern of vocalization.\n\nFinally, the logistic regression analysis with four independent variables (age 1 and age 2 canonical babbling classification and age 1 and age 2 volubility) demonstrated that classification of diagnostic category (ASD vs. TD) could be predicted with $75 \\%$ accuracy, even when the two outliers were included. The model more accurately classified infants later diagnosed with ASD (Sensitivity $=82.6 \\%$ ) than TD infants (Specificity $=64.3 \\%$ ). The strongest predictor of group membership was canonical babbling classification at 9-12 months as it alone correctly classified $90 \\%$ of infants later diagnosed with ASD and $63 \\%$ of TD infants. Thus, in the search for markers of ASD risk in infancy, canonical babbling status at $9-12$ months appears to be the single best candidate among the variables considered in the current study. 
The utility of the measure as a group marker is age dependent, since a larger proportion of infants in the ASD group at 15-18 months had reached the canonical stage than at $9-12$ months.\n\n\nContext: Discussion of potential environmental factors impacting language development in children with ASD, specifically relating to parental vocalizations and infant output.", "metadata": { "doc_id": "Patten_Audio_36", "source": "Patten_Audio" } }, { "page_content": "Text: To help better understand the high canonical babbling ratios of the two outliers, the coders, both certified speech-language pathologists, viewed the videos from those infants again after their outlier status was identified. We speculated that the outlier status of these two infants may be related to the phenomenon of motor stereotypy that is common in ASD, that is, that the two infants were engaged, at least in the 9-12 month samples, in a motor stereotypy focused precisely on canonical babbling. In re-examining the videos of the two outliers, the coders looked for qualitative evidence that might speak to the credibility of this speculation. In the second viewing of the recordings, the coders noticed that the first outlier infant produced the majority of the canonical syllables during a single scene while walking outside. He repeatedly produced a [da] syllable during this brief episode, but did not direct his vocalizations to the caregiver. The sense that a prelinguistic vocal stereotypy may have been operating was enhanced by the fact that the same syllable was repeated throughout. The stereotypy of canonical babbling in this infant was reported by the coders as constituting the only evidence either had noticed as specifically suggesting the possibility of ASD while they were coding, and thus, this was the single case where the intended blinding of the coders to diagnostic group seems to have been foiled. The coders did not observe any other stereotypic behaviors vocal or otherwise in these samples. The second infant engaged in high canonical babble production while roughhousing with his father, but to our clinical eyes, that behavior did not seem particularly unusual. Further research on the possibility that babbling can be a focus of motor stereotypy in ASD seems in order. It may be worthy of\n\nnote that the two outliers' CARS scores ( 25 and 31 ) fell within the range of the scores for the ASD group (23-50).\n\n\nContext: Discussion of potential motor stereotypy influencing babbling patterns in autistic infants, specifically regarding outlier cases in the study.", "metadata": { "doc_id": "Patten_Audio_37", "source": "Patten_Audio" } }, { "page_content": "Text: note that the two outliers' CARS scores ( 25 and 31 ) fell within the range of the scores for the ASD group (23-50).\n\nIn addition to the findings suggesting possible clinically useful markers for ASD, the present results provide a new scientific view on the robustness of canonical babbling. There has been no prior empirical indication that canonical babbling onset is delayed in ASD, nor that volubility is low in infants later diagnosed with ASD. Our results thus suggest that the development of vocalization in infancy is affected by whatever the fundamental disorders of ASD may be. Assuming ASD to be a social disorder, it is not obvious that babbling would necessarily be disturbed in the disorder because the extent to which babbling is a social (as opposed to an endogenously generated) phenomenon is itself an empirical question. 
Our results can then be thought to provide a new empirical perspective on the possible social nature of babbling. The results also suggest that the vocal differentiation of the two groups is robust, given the relative clarity of the results indicating low canonical babbling and volubility in the infants in the ASD group, even though we had samples of low recording quality and very limited duration. The results seem especially significant in the context of a broad body of research cited above on the robustness of canonical babbling as a foundation for language and on the robust resistance of canonical babbling to delay as seen in prior studies cited in our paper-no delay has been found in cases of prematurity, low SES, or multilingual exposure.\n\nFuture Directions and Limitations\n\n\nContext: Discussion of results and limitations of the study.", "metadata": { "doc_id": "Patten_Audio_38", "source": "Patten_Audio" } }, { "page_content": "Text: Future Directions and Limitations\n\nThis study provides a proof of concept regarding the notion of atypical emergence of the canonical babbling stage in the developing infant who will later be diagnosed with ASD and the possibility that tracking canonical babbling in infancy may add to our repertoire of markers for ASD prior to one year of age. Future research to address some of the limitations of the current study and advance our understanding of the development of canonical babbling among infants with ASD is warranted by the findings of the current study. One limitation in the current study was the lack of a comparison group of infants with later diagnoses of non-ASD disabilities, which prevents us from definitively attributing the differences found in this study to ASD rather than general impairments in cognition or communication. Our working hypothesis to test in future studies will be that these differences in canonical babbling onset and in volubility are specific to ASD.\n\nAnother limitation was that our study used only short video segments from each time point, which surely impacted our ability to precisely assess important aspects of vocalization, because it has been shown that variability in obtained canonical babbling ratios increases as the length of samples decreases (Molemans, 2011; Molemans, Van den Berg, Van Severen, \\& Gillis, 2011). The low canonical babbling ratios obtained for a few of the TD infants presumably would not have occurred with larger sample sizes. In future studies we hope to obtain longer samples, and if possible to more precisely identify canonical babbling onset through longitudinal laboratory assessments paired with caregiver report of onset. But of course to make this possible, prospective studies may be necessary, with several years of follow-up, presumably taking advantage of the opportunity presented by sibling studies. Such studies would also afford the opportunity to obtain much better recordings than are\n\n\nContext: Discussion of study limitations and future research directions regarding canonical babbling and ASD diagnosis.", "metadata": { "doc_id": "Patten_Audio_39", "source": "Patten_Audio" } }, { "page_content": "Text: available in retrospective studies such as the present one. Indeed, sibling studies can now capitalize on all-day recording, yielding the opportunity to assess vocal development in ASD with much greater ecological validity and representativeness.\n\nOnset of canonical babbling usually occurs between 5 and 9 months in TD infants. 
It appears from the present data that onset may occur within a much wider range in ASD. Quantification of onset in ASD may yield prognostic value regarding core communication symptoms. For example, if canonical stage onset is delayed beyond a certain threshold, the infant may be at especially high-risk for remaining nonverbal. Discovery of such a delay could allow specific interventions to be tailored based on prognosis earlier in development.\n\nFuture research should also focus on caregivers and their roles in canonical stage development and its identification. Prior work suggests that with TD infants, parents are extremely accurate in their reports of the onset of canonical babbling (Oller et al., 2001). If caregivers of infants with ASD are similarly capable of identifying onset of canonical babbling, it may be possible to use canonical babbling onset as part of a parent-report screening tool for early identification. In addition, alterations in communication directed to infants by caregivers as canonical babbling emerges may help to elicit and maintain socialcommunicative interaction, and subsequently impact language development.\n\n\nContext: Discussion of future research directions regarding canonical babbling in ASD, including caregiver roles and potential screening tools.", "metadata": { "doc_id": "Patten_Audio_40", "source": "Patten_Audio" } }, { "page_content": "Text: Our findings on volubility represent another potential avenue for understanding early socialcommunication development processes in ASD. Perhaps the most intriguing aspect of this possibility is suggested by the proposal that there may be a feedback loop involving low canonical syllable production in ASD followed by low parental rate of vocalization to infants, aggravating the low volubility and low rate of canonical syllables in ASD (Warlaumont et al., 2010). We anticipate rapid growth of studies tracking this possibility, especially since there is a rapidly growing possibility of conducting some aspects of such analysis based on automated classification of vocalizations in all-day recordings as indicated by the growth of LENA system studies.\n\nClinical Implications\n\nOur findings suggest that canonical babbling should be considered an important milestone in infancy that may be delayed in infants who are later diagnosed with ASD. If infants demonstrate delays in canonical babbling, a developmental assessment that includes evaluation of early warning signs for ASD should be administered. Although volubility appears less promising as a marker for ASD, it may be useful in combination with other items in the context of early identification screening tools. For infants demonstrating either low canonical babbling ratios or low volubility, interventions to draw infants' attention to social-communicative stimuli in that context of dyadic interactions may help stimulate growth of vocal communication.\n\nAcknowledgments\n\nThis research was made possible through a grant from the National Institute for Child Health and Human Development (R01-HD42168) and a grant from Cure Autism Now Foundation (Sensory-Motor and SocialCommunicative Symptoms of Autism in Infancy). 
We thank the families whose participation made this study possible and the staff who collected and processed data for this project.\n\nReferences\n\n\nContext: Discussion of findings related to volubility, potential feedback loops in ASD, and clinical implications for early identification and intervention.", "metadata": { "doc_id": "Patten_Audio_41", "source": "Patten_Audio" } }, { "page_content": "Text: References\n\nAcevedo MC. The role of acculturation in explaining ethnic differences in the prenatal health-risk behaviors, mental health, and parenting beliefs of Mexican American and European American atrisk women. Child Abuse \\& Neglect. 2000; 24:111-127. [PubMed: 10660014] American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4. Washington, DC: Author; 2000. text rev Baranek GT. Autism during infancy: A retrospective video analysis of sensory-motor and social behaviors at 9-12 months of age. Journal of Autism and Developmental Disorders. 1999; 29:213224. [PubMed: 10425584]\n\n\nContext: A list of references cited within a research paper investigating early language development and behaviors in infants with and without autism spectrum disorder, using video analysis.", "metadata": { "doc_id": "Patten_Audio_42", "source": "Patten_Audio" } }, { "page_content": "Text: Baranek GT, David FJ, Poe MD, Stone WL, Watson LR. Sensory Experiences Questionnaire: Discriminating sensory features in young children with autism, developmental delays, and typical development. Journal of Child Psychology and Psychiatry. 2006; 47(6):591-601. [PubMed: 16712636] Baranek GT, Watson LR, Boyd BA, Poe MD, David FJ, McGuire L. Hyporesponsiveness to social and nonsocial sensory stimuli in children with autism, children with developmental delays, and typically developing children. Development and Psychopathology. 2013; 25(2013):307-320. [PubMed: 23627946] Bleile KM, Stark RE, McGowan JS. Speech development in a child after decannulation: Further evidence that babbling facilitates later speech development. Clinical Linguistics and Phonetics. 1993; 7:319-337. Center for Disease Control and Prevention. Prevalence of autism spectrum disorders -Autism and developmental disabilities monitoring network, 14 Sites, United States, 2008. Morbidity and Mortality Weekly Report Surveillance Summaries. 2012; 61:1-19. Retrieved from http:// www.cdc.gov/mmwr/preview/mmwrhtml/ss6103a1.htm. Clarke E, Reichard U, Zuberbühler K. The anti-predator behaviour of wild white-handed gibbons (Hylobates lar). Behavioral Ecology and Sociobiology. 2012; 66:85-96.10.1007/ s00265-011-1256-5 Chapman K, Hardin-Jones M, Schulte J, Halter K. Vocal development of 9 month-old babies with cleft palate. Journal of Speech, Language and Hearing Research. 2001; (44):1268-1283. Chawarska K, Klin A, Paul R, Macari Volkmar F. A prospective study of toddlers with ASD: Shortterm diagnostic and cognitive outcomes. Journal of Autism and Developmental Disorders. 2009; 50(10):1235-1245. Clement, CJ. PhD Dissertation. Netherlands Graduate School of Linguistics; Amsterdam: 2004. Development of vocalizations in deaf and normally hearing infants. Cobo-Lewis AB, Oller DK, Lynch MP, Levine SL. Relations of motor and vocal milestones in typically developing infants and infants with Down syndrome. American Journal on Mental Retardation. 
1996; 100:456-467.\n\n\nContext: A review of existing literature on autism, developmental delays, and typical development, including studies on sensory experiences, speech development, and vocalizations.", "metadata": { "doc_id": "Patten_Audio_43", "source": "Patten_Audio" } }, { "page_content": "Text: AB, Oller DK, Lynch MP, Levine SL. Relations of motor and vocal milestones in typically developing infants and infants with Down syndrome. American Journal on Mental Retardation. 1996; 100:456-467. [PubMed: 8852298] Cohen, J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale,, NJ: Erlbaum Associates; 1988.\n\n\nContext: References cited in the study.", "metadata": { "doc_id": "Patten_Audio_44", "source": "Patten_Audio" } }, { "page_content": "Text: Davis BL, Morrison HM, von Hapsburg D, Warner AD. Early vocal patterns in infants with varied hearing levels. Volta Review. 2005; 105(1):7-27. Delgado CEF, Messinger DS, Yale ME. Infant responses to direction of parental gaze: A comparison of two still-face conditions. Infant Behavior and Development. 2002; 25(3):311-318. Eilers RE, Oller DK, Levine S, Basinger D, Lynch MP, Urbano R. The role of prematurity and socioeconomic status in the onset of canonical babbling in infants. Infant Behavior and Development. 1993; 16:297-315. Eilers RE, Oller DK. Infant vocalizations and the early diagnosis of severe hearing impairment. The Journal of Pediatrics. 1994; 124(2):199-203. [PubMed: 8301422] Goldstein MH, Schwade JA, Bornstein MH. The value of vocalizing: Five-month-old infants associate their own noncry vocalizations with responses from adults. Child Development. 2009; 80:636644. [PubMed: 19489893]\n\nGoldstein MH, King AP, West MJ. Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences. 2003; 100(13):80308035.\n\n\nContext: A list of references cited in a study examining early vocal patterns in infants with and without autism spectrum disorder.", "metadata": { "doc_id": "Patten_Audio_45", "source": "Patten_Audio" } }, { "page_content": "Text: Goldstein MH, King AP, West MJ. Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences. 2003; 100(13):80308035.\n\nGoldstein MH, Schwade JA. Social feedback to infants' babbling facilitates rapid phonological learning. Psychological Science. 2008; 19:515-522. [PubMed: 18466414] Goldstein MH, West MJ. Consistent responses of human mothers to prelinguistic infants: The effect of prelinguistic repertoire size. Journal of Comparative Psychology. 1999; 113(1):52-58. [PubMed: 10098268] Hart, B.; Risley, TR. Meaningful differences in the everyday experience of young American children. Baltimore: Paul H. Brookes; 1995. Hulit, LM.; Howard, MR. Born to talk: An introduction to speech and language development. Boston: Allyn and Bacon; 2002. Kent R, Osberger MJ, Netsell R, Hustedde CG. Phonetic development identical twins differing in auditory function. Journal of Speech and Hearing Disorders. 1987; 52:64-75. [PubMed: 3807347] Koopmans-van Beinum, FJ.; Clement, CJ.; van den Dikkenberg-Pot, I. Influence of lack of auditory speech perception on sound productions of deaf infants. Berne, Switzerland: International Society for the Study of Behavioral Development; 1998. Koopmans-van Beinum, FJ.; van der Stelt, JM. Early stages in the development of speech movements. In: Lindblom, B.; Zetterstrom, R., editors. Precursors of early speech. 
New York: Stockton Press; 1986. p. 37-50.\n\n\nContext: A review of literature supporting the role of social interaction in early language development, including babbling.", "metadata": { "doc_id": "Patten_Audio_46", "source": "Patten_Audio" } }, { "page_content": "Text: Lewedag, VL. Doctoral Dissertation. University of Miami; Coral Gables, FL: 1995. Patterns of onset of canonical babbling among typically developing infants. Locke JL, Pearson D. Linguistic significance of babbling: Evidence from a tracheostomized infant. Journal of Child Language. 1990; 17:1-16. [PubMed: 2312634] Lord C. Follow-up of two-year-olds referred for possible autism. Journal of Child Psychology and Psychiatry, and Allied Disciplines. 1995; 36(8):1365-1382. Lord, C.; Rutter, M.; DiLavore, P.; Risi, S. Autism Diagnostic Observation Schedule (ADOS). Los Angeles, CA: Western Psychological Services; 1999. Lynch MP, Oller DK, Steffens ML, Levine SL, Basinger D, Umbel V. The onset of speech-like vocalizations in infants with Down syndrome. American Journal of Mental Retardation. 1995; 100(1):68-86. [PubMed: 7546639] Lynch MP, Oller DK, Steffens ML, Buder EH. Phrasing in prelinguistic vocalizations. Developmental Psychobiology. 1995; 28:3-23. [PubMed: 7895922] Manjiviona J, Prior M. Comparison of Asperger syndrome and high-functioning autistic children on a test of motor impairment. Journal of Autism and Developmental Disorders. 1995; 25:23-29. [PubMed: 7608032] Masataka N. Why early linguistic milestones are delayed in children with Williams syndrome: Late onset of hand banging as a possible rate-limiting constraint on the emergence of canonical babbling. Developmental Science. 2001; 4:158-164. Matson JL, Fodstad JC, Dempsey T. What symptoms predict the diagnosis of autism or PDD-NOS in infants and toddlers with developmental delays using the Baby and Infant Screen for Autism Traits. Developmental Neurorehabilitation. 2009; 12(6):381-388. [PubMed: 20205546] Matson JL, Mahan S, Hess JA, Fodstad JC, Neal D. Convergent validity of the Autism Spectrum Disorder-Diagnostic for Children (ASD-DC) and Childhood Autism Rating Scales (CARS). Research in Autism Spectrum Disorders. 2010; 4(4):633-638. Miller, LJ.; Reisman, JE.; McIntosh, DN.; Simon, J. An ecological model of sensory modulation: Performance in children\n\n\nContext: A literature review section of a dissertation on early language development in autistic and typically developing infants.", "metadata": { "doc_id": "Patten_Audio_47", "source": "Patten_Audio" } }, { "page_content": "Text: Rating Scales (CARS). Research in Autism Spectrum Disorders. 2010; 4(4):633-638. Miller, LJ.; Reisman, JE.; McIntosh, DN.; Simon, J. An ecological model of sensory modulation: Performance in children with fragile X syndrome, autistic disorder, attention-deficit/hyperactivity disorder, and sensory modulation dysfunction. In: Smith-Roley, S.; Blanche, EI.; Schaaf, RC., editors. Understanding the Nature of Sensory Integration with Diverse Populations. San Antonio, TX: Therapy Skill Builders; 2001. p. 57-88. Molemans, I. PhD. University of Antwerp; Antwerp, Belgium: 2011. 
Sounds like babbling: A Longitudinal investigation of aspects of the prelexical speech repertoire in young children acquiriing Dutch: Normally hearing children and hearing impaired children with a cochlear implant.\n\n\nContext: A review of related research on autism spectrum disorders, sensory modulation, and language acquisition.", "metadata": { "doc_id": "Patten_Audio_48", "source": "Patten_Audio" } }, { "page_content": "Text: Molemans I, Van den Berg R, Van Severen L, Gillis S. How to measure the onset of babbling reliably. Journal of Child Language. 2011; 39:1-30. [PubMed: 21418730] Mullen, EM. Mullen Scales of Early Learning: AGS Edition. Circle Pines, MN: American Guidance Service; 1995. Nathani S, Oller DK, Neal AR. On the robustness of vocal development: An examination of infants with moderate-to-severe hearing loss and additional risk factors. Journal of Speech, Language, and Hearing Research. 2007; 50(6):1425-1444. Obenchain P, Menn L, Yoshinaga-Itano C. Can speech development at 36 months in children with hearing loss be predicted from information available in the second year of life? Volta Review. 1998; 100:149-180. Oller, DK. The emergence of the sounds of speech in infancy. In: Yeni-Komshian, G.; Kavanagh, J.; Ferguson, C., editors. Child phonology, Vol 1: Production. New York: Academic Press; 1980. p. 93-112. Oller, DK. The Emergence of the Speech Capacity. Mahwah, NJ: Lawrence Erlbaum Associates; 2000.\n\n\nContext: A list of references cited in a study examining early language development in infants with and without autism spectrum disorder, using video analysis of babbling and activity types.", "metadata": { "doc_id": "Patten_Audio_49", "source": "Patten_Audio" } }, { "page_content": "Text: Oller DK, Eilers RE. Similarities of babbling in Spanish- and English-learning babies. Journal of Child Language. 1982; 9:565-578. [PubMed: 7174757] Oller DK, Eilers RE. The role of audition in infant babbling. Child Development. 1988; 59:441-449. [PubMed: 3359864] Oller DK, Eilers RE, Steffens ML, Lynch MP, Urbano R. Speech-like vocalizations in infancy: an evaluation of potential risk factors. Journal of Child Language. 1994; 21:33-58. [PubMed: 8006094] Oller DK, Eilers RE, Urbano R, Cobo-Lewis AB. Development of precursors to speech in infants exposed to two languages. Journal of Child Language. 1997; 27:407-425. [PubMed: 9308425] Oller DK, Eilers RE, Basinger D. Intuitive identification of infant vocal sounds by parents. Developmental Science. 2001; 4:49-60. Oller DK, Eilers RE, Basinger D, Steffens ML, Urbano R. Extreme poverty and the development of precursors to the speech capacity. First Lang. 1995; 15:167-188. Oller DK, Eilers RE, Neal AR, Cobo-Lewis AB. Late onset canonical babbling: a possible early marker of abnormal development. American Journal on Mental Retardation. 1998; 103:249-265. [PubMed: 9833656] Oller DK, Eilers RE, Neal AR, Schwartz HK. Precursors to speech in infancy: the prediction of speech and language disorders. Journal of Communication Disorders. 1999; 32:223-246. [PubMed: 10466095] Oller DK, Niyogi P, Gray S, Richards JA, Gilkerson J, Xu D, Warren SF. Automated Vocal Analysis of Naturalistic Recordings from Children with Autism, Language Delay and Typical Development. Proceedings of the National Academy of Sciences. 2010; 107:13354-13359. Osterling JA, Dawson G, Munson JA. Early recognition of 1-year-old infants with autism spectrum disorder versus mental retardation. Development and Psychopathology. 2002; 14(2):239-251. 
[PubMed: 12030690] Ozonoff S, Iosif A, Baguio F, Cook IC, Hill MM, Hutman T, Rogers SJ, Rozga A, Sangha S, Sigman M, Steinfeld MB, Young GS. A prospective study of the emergence of early behavioral signs of autism. Journal of the American Academy of Child \\&\n\n\nContext: A review of prior research on infant vocalization development and speech disorders.", "metadata": { "doc_id": "Patten_Audio_50", "source": "Patten_Audio" } }, { "page_content": "Text: MM, Hutman T, Rogers SJ, Rozga A, Sangha S, Sigman M, Steinfeld MB, Young GS. A prospective study of the emergence of early behavioral signs of autism. Journal of the American Academy of Child \\& Adolescent Psychiatry. 2010; 49(3):256266. [PubMed: 20410715]\n\n\nContext: A study examining the emergence of early behavioral signs of autism.", "metadata": { "doc_id": "Patten_Audio_51", "source": "Patten_Audio" } }, { "page_content": "Text: Page J, Boucher J. Motor impairments in children with autistic disorder. Child Language Teaching and Therapy. 1998; 14(3):233. Papoušek, M. Vom ersten Schrei zum ersten Wort: Anfänge der Sprachentwickelung in der vorsprachlichen Kommunikation. Bern: Verlag Hans Huber; 1994. Paul R, Augustyn A, Klin A, Volkmar FR. Perception and production of prosody by speakers with ASD spectrum disorders. Journal of Autsim and Developmental Disorders. 2005; 35:205-220. Paul R, Fuerst Y, Ramsay G, Chawarska K, Klin A. Out of the mouths of babes: Vocal production in infant siblings of children with ASD. Journal of Child Psychology and Psychiatry. 2011; 52(5): 588-598. [PubMed: 21039489]\n\nPeppe S, McCann J, Gibbon F, O’Hara A, Rutherford M. Receptive and expressive prosodic ability in children with high-functioning ASD. Journal of Speech, Language, and Hearing Research. 2007; 50:1015-1028. Ramsdell HL, Oller DK, Buder EH, Ethington CA, Chorna L. Identification of prelinguistic phonological categories. Journal of Speech Language and Hearing Research. 2012; 55:1626-1629. Robins DI, Fein D, Barton MI, Green JA. The modified checklist for autism in toddlers: An initial study investigating the early detection of autism and pervasive developmental disorders. Journal of Autism and Developmental Disorders. 2001; 31:131-144. [PubMed: 11450812] Rogers SJ, Ozonoff S. Annotation: What do we know about sensory dyfunction in autism? A critical review of the empirical evidence. Journal of Child Psychology and Psychiatry. 2005; 46(12):12551268. [PubMed: 16313426]\n\n\nContext: A review of literature on autism spectrum disorder, focusing on early language development, sensory function, and related research.", "metadata": { "doc_id": "Patten_Audio_52", "source": "Patten_Audio" } }, { "page_content": "Text: Ross GS. Language functioning and speech development of six children receiving tracheostomy in infancy. Journal of Communication Disorders. 1983; 15:95-111. [PubMed: 7096617] Rutter, M.; Le Couteur, A.; Lord, C. Autism Diagnostic Interview-Revised. Los Angeles, CA: Western Psychological Services; 2003. Schopler, E.; Reichler, RJ.; Rochen Renner, B. The Childhood Autism Rating Scale. Lost Angeles, CA: Western Psychological Services; 1992. Sheinkopf SJ, Mundy P, Oller DK, Steffens M. Vocal atypicalities of preverbal autistic children. Journal of Autism and Developmental Disorders. 2000; 30:345-353. [PubMed: 11039860] Simon BM, Fowler SM, Handler SD. Communication development in young children with long-term tracheostomies: Preliminary report. International Journal of Otorhinolaryngology. 1983; 6:37-50. Snow, CE. 
Issues in the study of input: fine-tuning universality, individual and devlopmental differences and necessary causes. In: MacWhinney, B.; Fletcher, P., editors. NETwerken: Bijdragen van het vijfde NET symposium: Antwerp Papers in Linguistics. Vol. 74. Antwerp: University of Antwerp; 1995. p. 5-17. Sparrow, SS.; Balla, DA.; Cicchetti, DV. Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service; 1984. Stark, RE. Stages of speech development in the first year of life. In: Yeni-Komshian, G.; Kavanagh, J.; Ferguson, C., editors. Child Phonology. Vol. 1. New York: Academic Press; 1980. p. 73-90. Stark, RE.; Ansel, BM.; Bond, J. Are prelinguistic abilities predictive of learning disability? A followup study. In: Masland, RL.; Masland, M., editors. Preschool Prevention of Reading Failure. Parkton, MD: York Press; 1988. Stoel-Gammon C. Prespeech and early speech development of two late talkers. First Language. 1989; 9:207-224. Stoel-Gammon C, Otomo K. Babbling development of hearing impaired and normally hearing subjects. Journal of Speech and Hearing Disorders. 1986; 51:33-41. [PubMed: 3945058] Stoel-Gammon C. Relationships between lexical and phonological development in young children.\n\n\nContext: A list of references cited in a study examining language development in autistic children.", "metadata": { "doc_id": "Patten_Audio_53", "source": "Patten_Audio" } }, { "page_content": "Text: and normally hearing subjects. Journal of Speech and Hearing Disorders. 1986; 51:33-41. [PubMed: 3945058] Stoel-Gammon C. Relationships between lexical and phonological development in young children. Journal of Child Language. 2011; 38(1):1-34. [PubMed: 20950495] Teitelbaum P, Teitelbaum O, Nye J, Fryman J, Maurer RG. Movement analysis in infancy may be useful for early diagnosis of autism. Proceedings of the National Academy of Sciences of the United States of America. 1998; 95(23):13982-13987. [PubMed: 9811912] Tronick, EZ. Social interchange in infancy. Baltimore: University Park Press; 1982. Van den Dikkenberg-Pot I, Koopmans-van Beinum F, Clement C. Influence of lack of auditory speech perception of sound productions of deaf infants. Proceedings of the Institute of Phonetic Sciences, University of Amsterdam. 1998; 22:47-60. Vihman, MM. Phonological Development: The Origins of Language in the Child. Cambridge, MA: Blackwell Publishers; 1996. Volkmar FR, Chawarska K. Autism in infants: An update. World Psychiatry: Official Journal of the World Psychiatric Association. 2008; 7(1):19-21. Watson LR, Patten E, Baranek GT, Boyd BA, Freuler A, Lorenzi J. Differential associations between sensory response patterns and language, social, and communication measures in children with autism or other developmental disabilities. Journal of Speech, Language, and Hearing Research. 2011; 54(6):1562-1576.\n\n\nContext: A research paper investigating early language development in autistic and typically developing children, using observational data and acoustic analysis of babbling and vocalizations.", "metadata": { "doc_id": "Patten_Audio_54", "source": "Patten_Audio" } }, { "page_content": "Text: Watson LR, Crais ER, Baranek GT, Dykstra JR, Wilson KP. Communicative Gesture Use in Infants with and without Autism: A Retrospective Home Video Study. American Journal of SpeechLanguage Pathology. 2013; 22:25-39. [PubMed: 22846878] Warlaumont, AS.; Oller, DK.; Dale, R.; Richards, JA.; Gilkerson, J.; Xu, D. Vocal Interaction Dynamics of Children With and Without Autism. 
Paper presented at the Proceedings of the 32nd Annual Conference of the Cognitive Science Society; Austin, TX. 2010. Warren SF, Gilkerson J, Richards JA, Oller DK. What Automated Vocal Analysis Reveals About the Language Learning Environment of Young Children with Autism. Journal of Autism and Developmental Disorders. 2010; 40:555-569. [PubMed: 19936907] Weismer SE, Lord C, Esler A. Early language patterns of toddlers on the autism spectrum compared to toddlers with developmental delay. Journal of Autism and Developmental Disorders. 2010; 40(10):1259-1273. [PubMed: 20195735] Werner E, Dawson G, Osterling J, Dinno N. Brief report: Recognition of autism spectrum disorder before one year of age: A retrospective study based on home videotapes. Journal of Autism and Developmental Disorders. 2000; 30:157-162. [PubMed: 10832780] Wetherby AM, Woods J, Allen L, Cleary J, Dickinson H, Lord C. Early indicators of ASD spectrum disorders in the second year of life. Journal of ASD and Developmental Disorders. 2004; 34:473493.\n\nYale ME, Messinger DS, Cobo-Lewis AB, Oller DK, Eilers RE. An event-based analysis of the coordination of early infant vocalizations and facial actions. Developmental Psychology. 1999; 35(2):505-513. [PubMed: 10082021] Zwaigenbaum L, Bryson S, Rogers T, Roberts W, Brian J, Szatmari P. Behavioral manifestations of autism in the first year of life. International Journal of Developmental Neuroscience. 2005; 23(2-3):143-152. [PubMed: 15749241]\n\nimg-0.jpeg\n\nFigure 1. Canonical babbling ratios by participant at $9-12$ months\n\nimg-1.jpeg\n\nFigure 2. Canonical babbling ratios by participant at $15-18$ months\n\nimg-2.jpeg\n\n\nContext: A research paper investigating communicative gesture use in infants with and without autism, utilizing retrospective home video analysis.", "metadata": { "doc_id": "Patten_Audio_55", "source": "Patten_Audio" } }, { "page_content": "Text: img-0.jpeg\n\nFigure 1. Canonical babbling ratios by participant at $9-12$ months\n\nimg-1.jpeg\n\nFigure 2. Canonical babbling ratios by participant at $15-18$ months\n\nimg-2.jpeg\n\nFigure 3. Syllable volubility by participant at $9-12$ months.\n\nimg-3.jpeg\n\nFigure 4. Syllable volubility by participant at $15-18$ months.\n\nimg-4.jpeg\n\nFigure 5. Canonical babbling ratios by age and diagnosis\n\nimg-5.jpeg\n\nFigure 6. 
Volubility by age and diagnosis

Table 1 Participant Demographics

| | ASD; n=23 | TD; n=14 |
| --- | --- | --- |
| Age at 9-12 months; mean (SD) | 10.89 (1.39) | 10.63 (.53) |
| Age at 15-18 months; mean (SD) | 16.33 (.83) | 16.28 (.70) |
| Sex | 19 males, 4 females | 11 males, 3 females |
| Race | 23 White, 1 Black | 13 White, 1 Asian |
| Maternal education ${ }^{1}$ | 5.48 | $5.8^{2}$ |
| Childhood Autism Rating Scale; mean (SD) | 34.17 (1.52) | $16.15(.39)^{3}$ |

${ }^{1}$ Maternal education: 1 = 6th grade or lower; 2 = 7th to 9th grade; 3 = partial high school; 4 = high school graduate/GED; 5 = associate of arts/associate of science or technical training or partial college training; 6 = bachelor of arts/science; 7 = master of arts/science or doctorate or other professional degree completed. ${ }^{2}$ Missing information for two participants. ${ }^{3}$ Missing information for four participants.

Table 2 Content Variables for Videos, 9-12 months

| | ASD; mean (SD) | TD; mean (SD) |
| --- | --- | --- |
| Number of people present | 3.22 (1.53) | 3.28 (1.24) |
| Amount of physical restriction | 1.58 (.35) | 1.51 (.32) |
| Amount of social intrusion | 2.02 (.38) | 2.04 (.32) |
| Total number of different event types | 5.32 (1.05) | 5.07 (1.02) |

${ }^{a}$ Rated by coders on a 1 to 3 scale

Table 3 Content Variables for Videos, 15-18 months

| | ASD; mean (SD) | TD; mean (SD) |
| --- | --- | --- |
| Number of people present | 2.84 (1.20) | 2.82 (1.24) |
| Amount of physical restriction | 1.37 (.29) | 1.28 (.33) |
| Amount of social intrusion ${ }^{a}$ | 2.06 (.40) | 1.95 (.34) |
| Total number of different event types | 5.34 (1.17) | 5.23 (1.11) |

${ }^{a}$ Rated by coders on a 1 to 3 scale

Table 4 Percentage of each activity type, 9-12 month videos\n\n\nContext: This chunk presents figures and tables detailing participant demographics, video content variables, and babbling/volubility data for children with and without Autism Spectrum Disorder (ASD).", "metadata": { "doc_id": "Patten_Audio_56", "source": "Patten_Audio" } }, { "page_content": "Text: ${ }^{a}$ Rated by coders on a 1 to 3 scale\n\nTable 4 Percentage of each activity type, 9-12 month videos

| | ASD; n=23 | TD; n=14 |
| --- | --- | --- |
| Mealtime | 10% | 11% |
| Active | 53.9% | 60.6% |
| Bathtime | 4.5% | 5.5% |
| Other | 2.5% | 4.1% |
| Special activity | 20.3% | 16.5% |
| Passive activity | 8.7% | 3.4% |

Table 5 Percentage of each activity type, 15-18 month videos

| | ASD; n=23 | TD; n=14 |
| --- | --- | --- |
| Mealtime | 7.5% | 2.6% |
| Active | 64% | 72.8% |
| Bathtime | 2.5% | 1.8% |
| Other | 8.3% | 11.4% |
| Special activity | 12.9% | 4.4% |
| Passive activity | 4.6% | 16.6% |

Table 6 Intercorrelations Between Canonical Babbling Ratios and Volubility

| | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| 1. Canonical Babbling 9-12 mos | - | .352* | .528** | .354* |
| 2. Volubility 9-12 mos | | - | 0.21 | 0.14 |
| 3. Canonical Babbling 15-18 mos | | | - | .510** |
| 4. Volubility 15-18 mos | | | | - |
* p < .05; ** p < .01\n\n\nContext: A research study examining early language development in children with autism spectrum disorder, using video analysis of home environments and measures of babbling and vocalization.", "metadata": { "doc_id": "Patten_Audio_57", "source": "Patten_Audio" } }, { "page_content": "Text: Research and development of autism diagnosis information system based on deep convolution neural network and facial expression data\n\nWang Zhao and Long Lu, School of Information Management, Wuhan University, Wuhan, China\n\nAbstract\n\nPurpose - Facial expression provides abundant information for social interaction, and the analysis and utilization of facial expression data are playing a huge driving role in all areas of society. Facial expression data can reflect people's mental state. In health care, the analysis and processing of facial expression data can promote the improvement of people's health. This paper introduces several important public facial expression databases and describes the process of facial expression recognition. The standard facial expression databases FER2013 and CK+ were used as the main training samples. At the same time, the facial expression image data of 16 Chinese children were collected as supplementary samples. With the help of the VGG19 and Resnet18 algorithm models of deep convolution neural network, this paper studies and develops an information system for the diagnosis of autism by facial expression data.\n\n\nContext: Introduction to the research and development of an autism diagnosis information system utilizing deep convolutional neural networks and facial expression data.", "metadata": { "doc_id": "zhao2020_0", "source": "zhao2020" } }, { "page_content": "Text: Design/methodology/approach - The facial expression data of the training samples are based on the standard expression databases FER2013 and CK+. FER2013 and CK+ are common facial expression data sets suitable for research on facial expression recognition. On the basis of the FER2013 and CK+ facial expression databases, this paper uses the machine learning model support vector machine (SVM) and the deep convolution neural network models CNN, VGG19 and Resnet18 to complete the facial expression recognition. Findings - In this study, ten normal children and ten autistic patients were recruited to test the accuracy of the information system and the diagnostic effect of autism. After testing, the accuracy rate of facial expression recognition is 81.4 percent. This information system can easily identify autistic children. The feasibility of recognizing autism through facial expression is verified. Research limitations/implications - The CK+ facial expression database contains some adult facial expression images. In order to improve the accuracy of facial expression recognition for children, more facial expression data of children will be collected as training samples. Therefore, the recognition rate of the information system will be further improved. Originality/value - This research uses facial expression data and the latest artificial intelligence technology, which is advanced in technology. The diagnostic accuracy of autism is higher than that of traditional systems, so this study is innovative. Research topics come from the actual needs of doctors, and the contents and methods of research have been discussed with doctors many times. 
The system can diagnose autism as early as possible, promote the early treatment and rehabilitation of patients, and then reduce the economic and mental burden of patients. Therefore, this information system has good social benefits and application value. Keywords Facial expression data, FER2013, CK+, Deep convolution neural network, VGG19, Resnet18, Autism, Diagnostic\n\n\nContext: Methods and results of an AI-based system for autism diagnosis using facial expression analysis.", "metadata": { "doc_id": "zhao2020_1", "source": "zhao2020" } }, { "page_content": "Text: Therefore, this information system has good social benefits and application value. Keywords Facial expression data, FER2013, CK+, Deep convolution neural network, VGG19, Resnet18, Autism, Diagnostic information system Paper type Research paper\n\n\nContext: Concluding remarks and keywords for a research paper on an autism diagnostic information system.", "metadata": { "doc_id": "zhao2020_2", "source": "zhao2020" } }, { "page_content": "Text: 1. Introduction\n\nFacial expression recognition is an important social cognitive skill. Emotions are expressed by facial expressions. Therefore, recognition and understanding of facial expressions is the\n\n[^0]\n\nimg-0.jpeg\n\nLibrary Hi Tech (c) Emerald Publishing Limited 0737-8831 DOI 10.1108/LHT-08-2019-0176\n\n[^0]: This research has been possible thanks to the support of projects: National Natural Science Foundation of China (No. 61772375) and Independent Research Project of School of Information Management Wuhan University (No: 413100032).\n\nbasis of communication and interpersonal relationships with others. Abnormal expression is a prominent manifestation of autism, and it is also one of the criteria for the diagnosis of autism. Doctors can diagnose autism by responding to abnormal facial expressions in children.\n\nAutism, also known as autism or autism disorders, is a representative disease of generalized developmental disorders. In recent years, the incidence of autism in children has become higher and higher, experiencing a transition from rare diseases to epidemics. At present, research on autism is still in its infancy at home and abroad, and research methods and tools are still developing.\n\nThe main symptoms of autism include impaired social and interpersonal communication, language retardation, repetitive behavior and sensory dysfunction. It is difficult for autistic patients to correctly recognize faces and explain facial emotions. They have different emotional expressions from ordinary people, and they cannot correctly perceive and understand some basic expressions such as anger (Yan, 2008).\n\n\nContext: The introduction to a research paper exploring facial expression recognition for autism diagnosis, outlining the importance of facial expressions in communication, the rising prevalence of autism, and key symptoms related to facial expression deficits.", "metadata": { "doc_id": "zhao2020_3", "source": "zhao2020" } }, { "page_content": "Text: At present, the diagnostic methods for autism spectrum disorders include: traditional standard DSM-IV-TR (Segal, 2010) and ICD-10 (Organization W H, 1992), various autism diagnostic assessment scales such as \"Childhood Autism Rating Scale (CARS)\", \"the autism child behavior scale (ABC)\" and autism behavior rating scale and questionnaire interviews (Wang and Lu, 2015). Most of these methods rely on doctors' direct observation of the patient's expression, speech and behavior based on their experience. 
Diagnostic results are easily disturbed by external factors such as hospital level, physician's subjective level, patient's education level, age and so on. There are relatively large subjective factors, resulting in a certain degree of missed diagnosis and misdiagnosis. It takes about $1-2 \\mathrm{~h}$ for each autistic patient to diagnose, so doctors have a lot of work to do. The best period of treatment for autistic patients is before the age of six. Early diagnosis is of great significance for the rehabilitation of autistic patients.\n\nThe purpose of our research and design is to train the model and make a facial expression recognition system based on the normal expression, so as to verify the abnormal expression. This system can test the facial expression of autistic children and judge the difference between autistic children and normal children.\n\n\nContext: Current autism diagnosis methods and the need for improved, objective approaches.", "metadata": { "doc_id": "zhao2020_4", "source": "zhao2020" } }, { "page_content": "Text: In this study, FER2013 and CK+ were used as the main facial expression training samples. At the same time, we collected the facial expression image data of 16 Chinese children as a supplementary sample of facial expression. With the help of VGG19 and Resnet18 algorithm models of deep convolution neural network, according to the hospital autism diagnosis scale and diagnosis process, this paper studies and designs an information system for the diagnosis of autism by facial expression data. After the actual test of recruiting testers, the recognition rate of the system is 81.4 percent. It can effectively distinguish whether the expression of children is normal or not. It provides a practical information system for the diagnosis of autism. This paper will continue to collect more children's facial expression data from different countries and regions as training samples to further improve the recognition rate of facial expressions.\n\nThe autism diagnosis information system designed in this study has the following important significance: (1) Autism can be diagnosed as early as possible by using this system. The best time to treat autism is before the age of six. The earlier the diagnosis of autism is made, the less the treatment cost and the higher the probability of recovery. Early diagnosis is of great value in alleviating the burden on families and society of autistic patients. The system can be published in the form of app or web pages and disseminated through the Internet. The system can be installed and used on different devices, such as computers, mobile phones, tablets, etc. It has good applicability. Through this\n\n\nContext: The study details the design and testing of an autism diagnosis information system utilizing facial expression data and deep learning algorithms.", "metadata": { "doc_id": "zhao2020_5", "source": "zhao2020" } }, { "page_content": "Text: system, autism can be diagnosed conveniently, and time can be saved for the early treatment of autism patients, especially those in underdeveloped areas. (2) It can make the diagnosis of autism more objective. The whole diagnosis process is completed by the system. Because artificial intelligence technology is used to recognize facial expressions without human intervention, the diagnosis results are objective and accurate. (3) Reduce the intensity of doctors' work. Before the system was used, it took an hour for doctors to diagnose an autistic patient. 
By using this system, doctors can save a lot of time and pay attention to the treatment of autism. (4) The facial expression database used in the training of this system contains different races in the world. Therefore, this system can not only diagnose children in different countries and regions but also diagnose suspected autism patients all over the world. (5) This research designs the system according to the actual business. The early design of the system adopts the suggestions of several doctors, so it is designed and manufactured according to the actual needs of doctors. Although there are some papers on autism diagnosis by facial expressions at home and abroad, there are still few autism diagnosis systems developed which can be used in practice. (6) This paper uses the latest in-depth learning technology to improve the accuracy of facial expression recognition. Previous traditional techniques and methods have low recognition rate of facial expressions. In recent years, with the development of artificial intelligence technology and the improvement of computing speed, the convolutional neural network has greatly improved the accuracy of facial expression recognition, which is the innovation of this research in technology.\n\n2. Facial expression database and its recognition technology\n\n2.1 Facial expression database\n\n\nContext: The advantages of the developed autism diagnosis system.", "metadata": { "doc_id": "zhao2020_6", "source": "zhao2020" } }, { "page_content": "Text: 2. Facial expression database and its recognition technology\n\n2.1 Facial expression database\n\nFacial expression is an important way for people to express their emotions. In the social process, facial expression is an important way to judge the attitude and inner feelings of the other party (Lanlan, 2018). Mehrabian (2008) found that in a conversation, the change of facial expression played the most important role. Of these, 55 percent are facial expressions, 38 percent are voice and only 7 percent are words (Mei and Hu, 2015). Compared with voice, expression can convey more abundant information. Recognition and understanding of facial expressions is very important for communicating with others (Shen et al., 2013). In 1972, Ekman demonstrated through empirical research that human beings have six basic facial expressions: happiness, sadness, anger, fear, disgust and surprise (Ekman, 1992). In subsequent studies, neutral expression has also been added to the basic expression, and it is generally believed that there are seven basic expressions in facial expression.\n\nWith the continuous development of computer software and hardware technology, people have a deeper understanding of facial expression recognition technology. In order to better study facial expression recognition technology, many international research institutions have established standard facial expression databases, the main facial expression databases are as follows: (1) JAFFE\n\nThe database stores facial expression data of Japanese women. It contains 213 facial images of ten Japanese women. There are seven types of facial expressions, namely neutral, happy,\n\nFigure 1. Facial expression recognition process sad, surprise, anger, disgust and fear. The resolution of each image is $256 \\times 256$ pixels. Everyone has seven kinds of pictures of facial expressions. 
(2) $\\mathrm{CK}+$\n\n\nContext: This chunk details existing facial expression databases used in research and technology development for facial expression recognition.", "metadata": { "doc_id": "zhao2020_7", "source": "zhao2020" } }, { "page_content": "Text: The expression database was collected under laboratory conditions. It includes African Americans, Asians and South Americans. The resolution of each image is $640 * 480$ pixels. It contains 593 expression sequences of 123 people, 69 percent of whom are female and 31 percent are male. Each sequence begins and ends with neutral expression, which includes the process from calm to strong expression. CK+ is a facial expression data set with many applications. The reliability of various facial expression evaluation experiments using this database is very high. It includes seven types of facial expressions: anger, contempt, disgust, fear, happy, sadness and surprise. (3) FER2013\n\nThere are 35,887 facial images in the library, and there are seven facial expression types: angry, disgust, fear, happy, sad, surprise and neutral. The resolution of each image is $48^{*} 48$ pixels. All the images are gray images. There are three sample sets: 28,709 images in the training set; 3,589 images in the validation set and 3,589 images in the test set. (4) MMI\n\nThe expression database can be divided into two parts: one is a dynamic data set composed of more than 2,900 video sequences. The other part is a static data set consisting of a large number of high resolution images. There are seven types of expression in the library. (5) AFEW\n\nAll the facial images in the database are edited from the movies and contain seven basic facial expressions. (6) SFEW\n\nThe expression library is a static frame image extracted from the AFEW data set, which contains seven basic expressions.\n\n2.2 Facial expression recognition process\n\n\nContext: Description of facial expression databases used in the study.", "metadata": { "doc_id": "zhao2020_8", "source": "zhao2020" } }, { "page_content": "Text: The expression library is a static frame image extracted from the AFEW data set, which contains seven basic expressions.\n\n2.2 Facial expression recognition process\n\nThe process of facial expression recognition includes two stages as shown in Figure 1: One is the training stage and the other is the recognition stage. The training and recognition stages can be divided into three parts: the pretreatment of facial expression images, the extraction of facial expression features and the classification of facial expressions. The training stage is to train the model in order to achieve the purpose that the model can be used. The recognition stage is to recognize and classify the expression of the test image (Du, 2018).\n\nThe two stages of expression recognition process include the following processes: First, face detection is carried out on the image in the expression database, including the location,\n\nalignment and clipping of the face area. This is the basis of the follow-up process. Only when the expression area is accurately obtained, the following series of work will be more accurate. After the face area is detected, the image needs to be preprocessed in order to eliminate the noise caused by the influence of acquisition equipment and environment and avoid the interference of feature extraction. Then it is the feature extraction step, which aims to extract the features that can represent the essence of expression from the preprocessed facial images. 
In this process, in order to avoid the high dimension of feature extraction and affect the efficiency of the algorithm, we need to reduce the dimension of extracted features in order to extract the most representative expression features. Finally, the extracted facial features are classified to determine which type of facial expression is.\n\n2.3 Facial expression recognition technology\n\n\nContext: Within the section detailing the facial expression recognition process and technology used for autism diagnosis.", "metadata": { "doc_id": "zhao2020_9", "source": "zhao2020" } }, { "page_content": "Text: 2.3 Facial expression recognition technology\n\nFacial expression recognition technology mainly includes traditional machine learning technology and deep learning technology. The two technologies have similarities and different characteristics. (1) Traditional machine learning technology\n\nFacial expression recognition algorithm based on traditional machine learning includes three steps: image preprocessing, facial expression feature extraction and feature classification.\n\nFirst, for the convenience of feature extraction, it is necessary to preprocess the image, which can effectively avoid the interference of various noises and leave the key information needed by the face. The pretreatment process includes image gray processing, face alignment, face size tailoring, data enhancement, brightness, pose normalization, etc. (Li and Deng, 2018).\n\nSecond, the traditional feature extraction methods include directional gradient histogram feature, Gabor filter feature, local directional pattern feature and enhanced local binary algorithm. Because these methods are artificial design, time-consuming and laborious, and have certain limitations and often have better effect in feature extraction in small sample image set, most of the current studies are based on deep learning feature extraction method.\n\nThere are many basic machine learning methods for expression classification, such as support vector machine (SVM), hidden Markov model (HMM) and k-nearest classification algorithm. (2) Deep learning technology\n\n\nContext: Following a discussion of the prevalence of autism and related challenges, this section details the facial expression recognition technology used for diagnosis.", "metadata": { "doc_id": "zhao2020_10", "source": "zhao2020" } }, { "page_content": "Text: Facial expression recognition algorithm based on deep learning also needs image preprocessing. The difference is that it often combines feature extraction and feature classification into an end-to-end model, which greatly simplifies the process of facial expression recognition. In addition to end-to-end learning, deep learning algorithm can be used to extract facial expression features, and then other independent classifiers can be used. For example, SVM or random forest algorithm is used to process the extracted features and classify them.\n\nIn this paper, we construct a facial expression recognition model based on deep learning technology, extract facial expression feature data of children and classify them into groups, so as to diagnose autism.\n\n2.4 Driving role of facial expression data\n\nResearch on facial expression recognition has been applied in a series of life scenarios. 
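Before turning to those application scenarios, the classical route described in Section 2.3 can be made concrete with a brief sketch. The Python snippet below pairs one hand-crafted feature (the histogram of oriented gradients) with dimensionality reduction and an SVM classifier on 48 x 48 grayscale face crops; the feature parameters, data layout, and label ordering are illustrative assumptions, not the pipeline actually used in this paper.

```python
# Minimal sketch of the classical pipeline from Section 2.3:
# hand-crafted features (HOG) -> dimensionality reduction (PCA) -> SVM classifier.
# Feature parameters and label ordering are illustrative assumptions only.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def hog_features(images_48x48):
    """Extract HOG descriptors from a list of 48 x 48 grayscale face crops."""
    return np.array([hog(img, orientations=8, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images_48x48])

def train_classical_classifier(train_images, train_labels):
    """train_images: (n, 48, 48) grayscale arrays; train_labels: ints indexing EXPRESSIONS."""
    clf = make_pipeline(StandardScaler(),
                        PCA(n_components=64),          # reduce feature dimensionality
                        SVC(kernel="rbf", C=10.0))     # SVM classifier
    clf.fit(hog_features(train_images), train_labels)
    return clf

def classify(clf, image_48x48):
    """Return the predicted expression name for a single face crop."""
    return EXPRESSIONS[int(clf.predict(hog_features([image_48x48]))[0])]
```

Any of the other classifiers mentioned above, such as a k-nearest neighbor model, could be dropped into the same pipeline in place of the SVM without changing the surrounding steps.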
In children's education, advanced human-computer interaction, medical diagnosis and other aspects have played an important role (Cai, 2018).\n\nIn distance education or classroom teaching, teachers can better improve students' learning quality by observing students' emotional changes in the classroom and adjusting\n\nFacial expressions for autism diagnosis\n\nteaching plans in time. Advanced human-computer interaction can make human-computer interaction more harmonious. For example, intelligent robots can automatically respond to the facial expressions of their interlocutors. In medical diagnosis, facial expressions also play an important role in the prevention and diagnosis of diseases. For example, this article is to diagnose autism by analyzing children's facial expressions.\n\n3. Autism and facial expression diagnosis\n\n3.1 Autism and its development\n\nAutism is a neurodevelopmental disorder, which is collectively referred to as autism spectrum disorder (Duan et al., 2015).\n\n\nContext: Within a section discussing the application of facial expression recognition technology, specifically detailing the use of deep learning algorithms for autism diagnosis and highlighting the broader role of facial expression data in various fields.", "metadata": { "doc_id": "zhao2020_11", "source": "zhao2020" } }, { "page_content": "Text: 3. Autism and facial expression diagnosis\n\n3.1 Autism and its development\n\nAutism is a neurodevelopmental disorder, which is collectively referred to as autism spectrum disorder (Duan et al., 2015).\n\nSince Kanner, an American child psychiatrist, first reported autism in 1943, the incidence of autism has risen rapidly worldwide. In the 1980s, about $3-5$ out of every 10,000 people suffered from the disease, while in 2000, 6.7 out of every 1,000 children suffered from the disease (Vismara and Rogers, 2008). According to the National Center for Health Statistics, the probability of autism among children aged 3-14 in the United States reached 2.76 percent in 2016 (Zablotsky et al., 2017).\n\nThere is no statistical survey on autistic children in China. However, according to the data of the report on the development of China's autism education and rehabilitation industry II, the number of people with autism in China is estimated to exceed 10 million, of which 2 million are autistic children. At the same time, it is growing at the rate of nearly 200,000 annually (Beijing Wucai Deer Autism Research Institute, 2017).\n\n\nContext: Following an introduction to facial expression diagnosis, this section provides background information on autism, including its prevalence and rising incidence globally and specifically in China.", "metadata": { "doc_id": "zhao2020_12", "source": "zhao2020" } }, { "page_content": "Text: Autism brings serious financial burden to both society and family. Families with autistic children, on the one hand, spend a lot of time caring for their children, while working hours are reduced so that work income is reduced. On the other hand, the cost of family rehabilitation treatment for autistic children is huge, which increases the family's financial burden (Wu and Chen, 2018). According to the survey on the occupational and economic burden of preschool autistic children's families, 33 percent of parents of autistic children reported that their caregiving problems seriously affected their careers, and their annual income was significantly lower than that of ordinary families, with an average loss of income of 30,957 yuan per year. 
Meanwhile, the average annual cost of autistic children's families for children's education and training is significantly higher than that of ordinary families (Yang and Wang, 2014). The society and the government also need to invest a lot of money in the rehabilitation education of autistic children. At the same time, autism also brings high subjective load and depression to the families of patients, which has a negative impact on their quality of life (Singh et al., 2017; Wang et al., 2018). It can be seen that the incidence of autism in children is relatively serious, and the harm to society and family is enormous.\n\n3.2 Diagnosis of autism through facial expressions\n\n\nContext: The financial and societal impact of autism.", "metadata": { "doc_id": "zhao2020_13", "source": "zhao2020" } }, { "page_content": "Text: 3.2 Diagnosis of autism through facial expressions\n\n3.2.1 Facial expression recognition disorder. Autistic children have facial expression recognition obstacles, which are mainly manifested in their inability to recognize facial expressions (Liu et al., 2015). It is easy to distinguish autistic children from normal children by observing their facial expressions. Therefore, we combine facial expression recognition technology to extract facial expression response feature vectors and use artificial intelligence technology to distinguish normal group and autistic group based on these facial features. 3.2.2 The principle of diagnosing autism through facial expressions. A large number of studies have pointed out that autistic patients have deficiencies in facial expression recognition and understanding. This is the core source of impaired social function in autistic patients (Yang et al., 2017). Autistic children are more difficult to identify other people's emotional behavior, and it is difficult to make appropriate judgment and response\n\n(Shen et al., 2013). Overseas research on facial expression recognition ability of autistic patients has been carried out not only in children but also in adults. Most studies believe that the ability of facial expression recognition of autistic patients is low. Baron-Cohen et al. (1997) used standard facial expression maps to study the recognition of different emotional types in autistic adults. It was found that autistic adults had better recognition of some basic facial expressions, such as happiness, but relatively complex facial expressions such as surprise recognition were difficult to recognize.\n\nAt present, the main diagnostic criteria of autism are: IDC-10, DSM-IV, the autism child behavior scale (ABC), the children autism rating scale (CARS) and the Clancy behavior scale (CABS) (Wang, 2007).\n\n\nContext: Methods for diagnosing autism, specifically utilizing facial expression recognition technology.", "metadata": { "doc_id": "zhao2020_14", "source": "zhao2020" } }, { "page_content": "Text: After consulting a large number of literatures and investigating the actual situation of the hospital, now the hospital mainly uses CABS (filled by parents), ABC (filled by parents) and CARS (filled by doctors) to diagnose autism. After a detailed review of the test items of the three scales, these scales all contain the test items to judge autism through children's facial expressions. There were 14 items in the CABS scale, of which the seventh item was inexplicable laughter and the tenth item was not looking at each other's face. Avoiding eye contact was related to expression. 
There were 57 items in the ABC scale, of which the seventh item was non-communicative smile, the seventeenth item did not respond to other people's facial expressions, and the twenty-fourth item was active avoidance of eye contact with others. Fifteen items of the CARS scale, the third of which is emotional response, pleasure and unhappiness and interest, are expressed by changes in facial expression and posture. These scales basically include the items of autism detection by children's facial expressions, which show that the diagnosis of autism can be more accurate by facial expressions. With the progress of artificial intelligence technology, facial expression recognition technology can objectively and effectively reflect the mental health of children and can be used in early diagnosis of autism (Yanbin et al., 2018).\n\n\nContext: Current autism diagnostic methods and the potential for AI-assisted diagnosis.", "metadata": { "doc_id": "zhao2020_15", "source": "zhao2020" } }, { "page_content": "Text: We also communicated with doctors of Hubei Maternal and Child Health Hospital, Wuhan Children's Hospital and Guangzhou Women and Children's Medical Center many times, and actually checked the process of using the above autism diagnostic scale to diagnose children. The doctor observes the tester's reaction to determine whether the tester is autistic after requesting the tester to make the corresponding expression. Doctors point out that facial expression is an important part of autism diagnosis. In terms of system design, they put forward requirements and suggestions for the process of diagnosing autism through facial expression.\n\n4. Research and development of autism diagnosis information system\n\n4.1 Facial expression database selection\n\nThe expression databases in this study mainly come from two public expression databases CK+ and FER2013. In addition, 16 Chinese children's expression data were collected as supplementary samples. The two public expression databases are standard and international and have been widely used, including facial expression data of adults and children. Each sample in the database contains seven expressions: angry, disgust, fear, happy, sad, surprise and neutral. Because children's facial expressions are different from adults, in order to improve the recognition rate of children's facial expressions, we collected facial expression data of 16 children aged 5 to 8 in China. Seven expressions were collected from each child. We combine Chinese children's facial expression data and public expression database as our system's facial expression database. 4.1.1 FER2013 facial expression database. The reason for choosing FER2013 expression database is that it has more samples and is more mature than other expression databases. It has advantages in model training. At the same time, it has been used in many studies (see Plate 1).\n\nFacial expressions for autism diagnosis\n\n\nContext: The research details the development of an autism diagnosis information system, specifically focusing on the selection and composition of the facial expression database used.", "metadata": { "doc_id": "zhao2020_16", "source": "zhao2020" } }, { "page_content": "Text: Facial expressions for autism diagnosis\n\n4.1.2 CK+ Facial expression database. CK+ facial expression database was selected because it was collected in the laboratory, so its accuracy is relatively high (Lucey et al., 2010) (see Plate 2). 4.1.3 Facial expression data of Chinese children. 
At present, the mature facial expression databases at home and abroad are mainly based on adult male or female facial expression images. Therefore, it is urgent to establish a facial expression database for children.\n\nFacial images of children are quite different from those of adults. Children have rounder faces, larger eyes and less prominent bones. Because of these differences, children's facial features are less obvious and more difficult to recognize than adults. Because of the particularity of children, it is very difficult to collect children's facial images. In order to improve the recognition rate of children's facial expressions, we cooperated with Amy Education School in Zhengzhou. Sixteen healthy children as volunteers were recruited to collect facial expression data. Each of them collected seven kinds of expressions, totaling 112 pictures. These children are between 5 and 8 years old, including 8 boys and 8 girls. The acquisition environment is quiet and there is no external interference. High-definition cameras are used to collect facial expression images, which are processed professionally. Before collecting facial expression data, parents have been informed of the purpose of collecting facial expression data. After questioning with parents, all the children who participated in the collection of facial expression data had no history of autism.\n\n\nContext: Methods section describing the datasets used for facial expression analysis in autism diagnosis.", "metadata": { "doc_id": "zhao2020_17", "source": "zhao2020" } }, { "page_content": "Text: We loaded the expression data into the training sample library. The purpose of collecting Chinese children's facial expression data is to increase the number of Chinese children's facial expression samples in training samples and improve the recognition rate of the system for children's facial expression. The collection process and the collected children's facial expression data are shown in Plate 3.\n\n4.2 Network topology\n\nAccording to the network environment and equipment of the information service platform, the network topology can be divided into four levels. The network topology diagram is shown in Figure 2.\n\nThe first layer is the application layer, which consists of users, computers and various smart devices. Smart devices include smart tablet computer, smartphones and other electronic devices. Users access and use the information service platform through computers and various smart devices.\n\nPlate 1. FER2013 facial expression database\n\nPlate 2. CK+ Facial expression database\n\nimg-1.jpeg\n\nAngry Disgust Fear Happy Neutral Sad Surprise\n\nimg-2.jpeg\n\nThe second layer is the communication layer, mainly based on the internet network environment, providing access channels for users and systems.\n\nThe third layer is the application server layer, which is composed of firewall and application server and has an ontology display system for autism. The application server manages various business functions, handles various business requests submitted by users and can access the database server for various data exchange.\n\nThe fourth layer is the database server layer, which stores all kinds of data and knowledge resources of the information service platform.\n\n4.3 System architecture\n\nThe smart diagnosis system of autism adopts client/server architecture. The client includes different versions of programs suitable for computers and smartphones. 
The system architecture diagram is shown in Figure 3.\n\n\nContext: Describing the data collection process and network topology of the autism diagnosis system.", "metadata": { "doc_id": "zhao2020_18", "source": "zhao2020" } }, { "page_content": "Text: The client includes three main modules: user interaction, image acquisition and face detection. The user interaction module is responsible for human-computer interaction. According to the requirements of the autism diagnostic scale, the user being diagnosed is prompted with pictures and voice guidance to make the appropriate expressions and give feedback. Through the camera, the image acquisition module dynamically captures facial expression images. At the appropriate time, the system collects facial expression images and transmits them to the face detection module. The face detection module recognizes valid face features, compresses the image and transfers it to the server through the internet or mobile internet.\n\nThe server includes six main modules: image processing, feature extraction, group classification, automatic diagnosis, model training and data management. The image processing module receives the expression image transmitted by the client, processes it and passes it to the other modules on the server side. The feature extraction module receives the facial expression images provided by the image processing module and extracts the facial expression features. The group classification module is responsible for group classification and assigns each expression image to the best-matching one of the seven kinds of expressions. The automatic diagnosis module gives the diagnosis of autism by comparing the facial expressions that the tester is required to imitate with the facial expressions that the tester actually makes. The model training module is the core module of the system and is responsible for recognizing and processing newly collected facial expression images. The data management module mainly manages facial expression data, including storing and reading the facial expression images transmitted by the client.\n\nFigure 2. Network topology diagram\n\nFigure 3. System architecture\n\n\nContext: System architecture and modules for autism diagnosis using facial expressions.", "metadata": { "doc_id": "zhao2020_19", "source": "zhao2020" } }, { "page_content": "Text: The system server stores facial expression feature files, which are produced by feature extraction over the facial expression database. The expression feature files use the HDF5 file format. The expression recognition system running on the server can read the expression feature files at any time. If new facial expression samples are collected, the model can be retrained and the facial expression feature files can be updated.\n\nThe client collects the tester's facial expression data with a high-definition camera and transmits the facial expression data to the server as a JSON file over the TCP communication protocol. The facial expression recognition system running on the server processes the collected facial expression data, feeds the recognition results back to the tester through the network and stores the recognition results and facial expression data in the server database. The facial data and the diagnostic system reside on a server, and the recognition results and facial data are stored in a SQL Server database. 
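As a concrete illustration of this client-to-server hand-off, the following minimal Python sketch sends one captured face image as JSON over a TCP socket and reads back the server's reply. The host, port and JSON field names (tester_id, image_b64 and the reply fields) are illustrative assumptions; the paper states only that expression data travel as JSON over TCP and that results come back over the network.

import base64
import json
import socket

SERVER_ADDR = ("diagnosis.example.org", 9000)  # placeholder host and port, not taken from the paper

def send_expression_image(tester_id, jpeg_path):
    """Send one captured face image to the recognition server and return its JSON reply."""
    with open(jpeg_path, "rb") as f:
        payload = {
            "tester_id": tester_id,  # hypothetical field names
            "image_b64": base64.b64encode(f.read()).decode("ascii"),
        }
    request = json.dumps(payload).encode("utf-8")
    with socket.create_connection(SERVER_ADDR, timeout=5) as sock:  # 5 s budget echoes the paper's limit
        sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        reply = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return json.loads(reply)  # e.g. {"expression": "happy", "probability": 0.93}

A matching server loop would read the JSON, run the recognition model, write a JSON reply and store the result in the database, as described above.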
The diagnostic system reads data from the database through structured query language (SQL). The response time of the whole database operation and communication process should not exceed 5 s.\n\n4.4 Deep learning models\n\n4.4.1 VGG19 model. Researchers from the University of Oxford and Google Brain jointly developed the convolutional neural network VGG. VGGNet comes in 11-, 13-, 16- and 19-layer variants [20]. VGGNet builds its 16-19-layer networks by repeatedly stacking small $3 \\times 3$ convolution kernels and $2 \\times 2$ max-pooling layers. VGGNet\n\n\nContext: System design and implementation details.", "metadata": { "doc_id": "zhao2020_20", "source": "zhao2020" } }, { "page_content": "Text: has strong scalability, and the error rate drops as the network is extended. When migrated to other image data, it shows good generalization ability and has a simple structure. 4.4.2 ResNet18 model. ResNet was proposed by Kaiming He and colleagues at Microsoft Research. Using ResNet units, they successfully trained neural networks with 152 layers. The ResNet structure accelerates the training of the neural network, and model accuracy is greatly improved. 4.4.3 Graphic of deep learning framework. The deep learning framework used in this paper for facial expression recognition is shown in Plate 4.\n\nThe whole process includes image input, image preprocessing, model building, model training, model testing and output of the expression recognition results. Two deep learning algorithms are used in this paper: VGG19 and ResNet18. ResNet18 alleviates the network performance degradation caused by the depth of VGG19. By training the two models and combining the two convolutional neural networks, the facial expression features of autistic children can be extracted accurately. 4.4.4 Image preprocessing. The purpose of image preprocessing is to achieve uniform normalization of the final input image. The process is shown in Figure 4.\n\n\nContext: The paper details the deep learning methods used for facial expression recognition in autism diagnosis, including model selection and image preprocessing.", "metadata": { "doc_id": "zhao2020_21", "source": "zhao2020" } }, { "page_content": "Text: Converting an image to grayscale reduces the computational cost of subsequent pixel-level processing while still reflecting the overall and local distribution characteristics of the image. Image transformations such as zooming, rotating, cropping and translating are then used for data augmentation, and the face is centered in the window. The contrast and brightness of the image can be improved by histogram equalization to reduce the influence of illumination on expression feature learning. To make the inputs uniform, all images are normalized to the same size. Finally, a mask is applied to remove non-face areas. 4.4.5 Model training. Before model training, we need to augment the image data. We choose stochastic gradient descent (SGD) as the optimization method. The batch size keeps the default of 128, and the learning rate is initially set to 0.01. In addition, the initialization of\n\nFigure 4. Image preprocessing\n\nPlate 5. 
Facial expression recognition results\n\n\nContext: Image preprocessing and model training for facial expression recognition in autism diagnosis.", "metadata": { "doc_id": "zhao2020_22", "source": "zhao2020" } }, { "page_content": "Text: Figure 4. Image preprocessing\n\nPlate 5. Facial expression recognition results\n\nnetwork parameters is also very important. We have adopted a random initialization method to train the two network algorithms. The core code of Python is as follows:

# Model training (torch, net, criterion, optimizer, opt, trainloader, use_cuda, utils and the
# learning-rate settings are assumed to be defined elsewhere in the training script)
def train(epoch):
    if epoch > learning_rate_decay_start and learning_rate_decay_start >= 0:
        frac = (epoch - learning_rate_decay_start) // learning_rate_decay_every
        decay_factor = learning_rate_decay_rate ** frac
        current_lr = opt.lr * decay_factor
        utils.set_lr(optimizer, current_lr)  # set the decayed rate
    else:
        current_lr = opt.lr
    correct = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        if use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()
        outputs = net(inputs)                  # forward pass (elided in the printed excerpt)
        loss = criterion(outputs, targets)
        loss.backward()
        utils.clip_gradient(optimizer, 0.1)
        optimizer.step()
        _, predicted = torch.max(outputs.data, 1)
        correct += predicted.eq(targets.data).cpu().sum()

4.4.6 Recognition results. Using the trained model, we tested pictures and videos of children's facial expressions and obtained the probability of each expression together with the model's final prediction. As shown in Plate 5, the histogram gives the probability of each type of facial expression, and the bar with the maximum probability is the final recognized expression. In testing, the recognition rate for children's facial expressions reaches 81.4 percent, which can effectively distinguish whether a child's facial expression is normal or not.\n\n5. System validation\n\n5.1 Testing environment\n\nIn this study, two kinds of mobile phones, a personal computer and a server were selected as the test environment. The hardware and software environments are shown in Table I.\n\n5.2 Diagnostic procedure and interface of diagnostic system\n\nThe diagnostic process is shown in Figure 5. First, the system randomly displays one of the seven kinds of facial expressions for the tester to imitate. The system will prompt the tester\n\n\nContext: Describing the facial expression recognition results and system validation process.", "metadata": { "doc_id": "zhao2020_23", "source": "zhao2020" } }, { "page_content": "Text: The diagnostic process is shown in Figure 5. First, the system randomly displays one of the seven kinds of facial expressions for the tester to imitate. The system will prompt the tester\n\nTesting equipment | Hardware environment | Software environment
OPPO R17 mobile phone | CPU: SDM670, RAM: 8GB | Android
iPhone 8 mobile phone | CPU: A11, RAM: 2GB | iOS
Personal computer | CPU: Intel i7, RAM: 16GB | Windows 10
Server | CPU: Intel W2133, RAM: 16GB | Windows Server 2019\n\nTable I. Testing environment\n\nFigure 5. Automatic diagnostic procedure\n\nPlate 6.\n\n\nContext: Describing the automated diagnostic procedure for autism using facial expressions.", "metadata": { "doc_id": "zhao2020_24", "source": "zhao2020" } }, { "page_content": "Text: Table I. Testing environment\n\nFigure 5. Automatic diagnostic procedure\n\nPlate 6.\n\n(a) The main interface of the autism smart diagnosis information system, including a system introduction, an introduction to knowledge about autism and other functions. (b) The facial expression that the system prompts the tester to imitate after starting the diagnostic process. (c) The expression analysis after diagnosis. 
(d) The result given by the system after three diagnoses. 5.2.1 Diagnostic procedure. The diagnostic process is shown in Figure 5. First, the system randomly displays one of the seven kinds of facial expressions for the tester to imitate. The system prompts the tester to imitate the facial expression with pictures and sounds. For example, the system displays a happy cartoon smiling face, plays happy children's songs and induces the child to make a happy expression. The system displays the same expression example three times and collects the tester's expression data at the same time. The system then compares the expression examples with the actually collected expression data and gives the diagnosis result. 5.2.2 Interface of diagnostic system. The system diagnostic interface is designed according to the diagnostic process (see Plate 6).\n\n5.3 System testing\n\n5.3.1 Test sample. We recruited ten normal children and ten autistic children and divided them into a normal children group and an autistic children group for comparative verification. The accuracy of the system was verified through actual testing of the autism diagnosis information system.\n\n\nContext: System testing and interface description for an autism diagnosis information system.", "metadata": { "doc_id": "zhao2020_25", "source": "zhao2020" } }, { "page_content": "Text: The normal group of children was provided by Amy Education School in Zhengzhou, which cooperated with us. Ten healthy children were recruited as volunteers for the normal group for testing. These children were between 5 and 8 years old, including 5 boys and 5 girls. Parents were informed of the purpose and content of the experiment beforehand. According to their parents, the participating children had no history of autism.\n\nThe autistic children were provided by Guangzhou Children's Care Center, which cooperated with us. Ten autistic children were recruited as volunteers for the autistic children group for testing. These children were aged between 3 and 6 years old, including 5 boys and 5 girls. Parents were informed of the purpose and content of the experiment beforehand. The selected children with autism had been diagnosed by a professional physician. 5.3.2 Test environment and process. All the tests were conducted in quiet classrooms, free of noise and other external interference. Using our autism diagnosis information system, each child was prompted by pictures and sounds to imitate the seven kinds of facial expressions and to make the corresponding facial response to the expression shown in each picture. The camera captures their facial expressions, which, after system analysis, are saved as pictures on the test system's computer (see Plate 7). 5.3.3 Test result. We used the system to test the normal group and the autistic group respectively. 
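As an illustration of the comparison step in 5.2.1, the sketch below scores one test session in Python: each of the seven expressions is prompted three times, the prompted and recognized labels are compared, and a per-child recognition rate is computed. The function names and the majority-vote rule for collapsing the three repetitions into a single yes/no are assumptions; the paper describes the comparison but not this exact aggregation.

from collections import defaultdict

EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def score_session(trials):
    """trials: (prompted, recognized) label pairs; each expression is prompted three times."""
    hits = defaultdict(int)
    for prompted, recognized in trials:
        hits[prompted] += int(prompted == recognized)
    # Count an expression as correctly imitated if most of its repetitions matched (assumed rule).
    per_expression = {e: hits[e] >= 2 for e in EXPRESSIONS}
    rate = sum(per_expression.values()) / len(EXPRESSIONS)
    return per_expression, rate

A rate computed this way that falls below 60 percent corresponds to the cutoff the paper later uses to flag a tendency toward autism.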
Finally, we compared the recognition rate of the two groups.\n\nFrom Table II, the average recognition rate of each expression is angry 80 percent, disgust 70 percent, fear 80 percent, happy 100 percent, sad 80 percent, surprise 70 percent and neutral 90 percent.\n\nTest child 1 had a recognition error only for the disgust expression, and all other expressions were recognized correctly, so the average recognition rate of the seven facial\n\n\nContext: Methodology: Participant recruitment and testing procedures for normal and autistic children.", "metadata": { "doc_id": "zhao2020_26", "source": "zhao2020" } }, { "page_content": "Text: Test child 1 had a recognition error only for the disgust expression, and all other expressions were recognized correctly, so the average recognition rate of the seven facial\n\nPlate 7. (a) Test environment for normal children group. (b) Test environment for autistic children group\n\nCorrect identification by test child (Y = yes, N = no):
Facial expression | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average recognition rate %
Angry | Y | Y | Y | N | Y | Y | Y | Y | Y | N | 80
Disgust | N | Y | Y | Y | N | Y | N | Y | Y | Y | 70
Fear | Y | Y | Y | N | Y | Y | Y | Y | N | Y | 80
Happy | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | 100
Sad | Y | Y | N | Y | Y | Y | Y | Y | Y | N | 80
Surprise | Y | N | Y | N | Y | Y | N | Y | Y | Y | 70
Neutral | Y | Y | Y | Y | Y | Y | Y | Y | Y | N | 90\n\nTable II. Test results in normal children group\n\nexpressions of test child 1 was 85.7 percent. According to this method, the average recognition rates of the seven expressions from test child 1 to test child 10 were 85.7 percent, 85.7 percent, 85.7 percent, 57.1 percent, 85.7 percent, 100 percent, 71.4 percent, 100 percent, 85.7 percent and 57.1 percent, respectively. The average recognition rate is 81.4 percent.\n\nJudging by the 60 percent threshold, two test children had facial expression recognition rates of 57.1 percent. This shows that in a real environment the system's algorithm is affected by the surroundings and lighting, so accuracy suffers to a certain extent. However, at 81.4 percent accuracy, the system can basically meet the preliminary diagnostic requirement of judging whether an expression is abnormal. In the future, more real samples will be added to further improve the accuracy of the system's algorithm.\n\n\nContext: Results of facial expression recognition testing on autistic children, including specific examples and overall accuracy assessment.", "metadata": { "doc_id": "zhao2020_27", "source": "zhao2020" } }, { "page_content": "Text: The experimental results show that the errors are concentrated on the disgust and surprise expressions. The main reasons are as follows: (1) disgust and surprise involve only minor local changes in the face, with no strongly distinguishing features; (2) some participants showed little change for these two expressions and, lacking the obvious features of the corresponding categories, their expressions approached neutral and were easy to confuse. From Table III, the average recognition rate of each expression is angry 50 percent, disgust 10 percent, fear 30 percent, happy 60 percent, sad 60 percent, surprise 20 percent and neutral 10 percent.\n\nThe average recognition rates of the seven expressions from test child 1 to test child 10 were 28.5 percent, 28.5 percent, 28.5 percent, 42.9 percent, 42.9 percent, 57.1 percent, 28.5 percent, 42.9 percent, 28.5 percent and 14.3 percent, and the average recognition rate is 34.3 percent.\n\nTable III. 
Test results in autistic children group\n\nCorrect identification by test child (Y = yes, N = no):
Facial expression | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average recognition rate %
Angry | Y | N | N | N | Y | Y | N | Y | Y | N | 50
Disgust | N | N | N | Y | N | N | N | N | N | N | 10
Fear | N | N | Y | N | Y | Y | N | N | N | N | 30
Happy | N | Y | Y | Y | N | Y | N | Y | N | Y | 60
Sad | N | Y | N | Y | Y | Y | N | Y | Y | N | 60
Surprise | Y | N | N | N | N | N | Y | N | N | N | 20
Neutral | N | N | N | N | N | N | Y | N | N | N | 10\n\nTable IV. Comparisons of two groups of children's facial expression recognition rate:
Facial expression | Normal children group | Autistic children group
Angry | 80% | 50%
Disgust | 70% | 10%
Fear | 80% | 30%
Happy | 100% | 60%
Sad | 80% | 60%
Surprise | 70% | 20%
Neutral | 90% | 10%
Average recognition rate | 81.4% | 34.3%\n\nThe experimental results show that, among the seven expressions, happiness and sadness were recognized at the highest rates, while the children tested had difficulty with subtler expressions such as neutral and disgust (see Table IV).\n\n\nContext: Results and comparison of facial expression recognition rates between autistic and normal children.", "metadata": { "doc_id": "zhao2020_28", "source": "zhao2020" } }, { "page_content": "Text: The experimental results showed that the recognition rate of facial expressions in autistic children was significantly lower than that in normal children. All the autistic children who participated in the test had a facial expression recognition rate of less than 60 percent. Therefore, if a tester's facial expression recognition rate in the system is below 60 percent, the tester is flagged as having a tendency toward autism; the lower the recognition rate, the stronger the indicated tendency.\n\n6. Conclusion\n\nIn the era of rapid development of information technology, the processing of large amounts of health data has brought new opportunities and challenges to medical research. The incidence of autism is increasing, which has attracted more and more attention from all parts of society. Using information technology, especially artificial intelligence, to build an autism diagnosis system has become an urgent need for doctors and patients. In this paper, an autism diagnosis system based on deep convolutional neural networks and expression data is constructed. Testing shows that it can meet the design requirements for autism diagnosis. The public can download and use the system through the network to diagnose autism conveniently. In addition, we will expand the functions of the system, add recognition of children's body movements and realize the diagnosis of autism from multiple perspectives.\n\nBecause the children whose facial expression data were collected and used are between 3 and 8 years old, the system can recognize children aged 3-6 years. The system therefore allows autism to be identified as early as possible. The earlier the diagnosis and treatment of autism, the better the rehabilitation effect. 
Therefore, it is of great significance for the treatment of autism.\n\n\nContext: Results of the autism diagnosis system's performance.", "metadata": { "doc_id": "zhao2020_29", "source": "zhao2020" } }, { "page_content": "Text: Because the system's training samples draw on international open facial expression databases, which contain facial expression data of children and adults from different countries and regions, the system can diagnose autism for children and adults in different countries and regions.\n\nOf course, the system also needs to be improved through practical use. We will next arrange for the system to be tested in a large number of cooperating hospitals, after which there are two main tasks. The first is to collect more facial expression data from normal and autistic children, improve the system's recognition of children's facial expressions and establish a dedicated database of children's facial expressions. The second is to improve the system's functionality so that, based on the facial expression diagnosis results of autistic children, a more detailed classification into severe, moderate and mild autism can be made to assist doctors in treatment.\n\nWe hope this study will be helpful for the diagnosis of autism in remote and underdeveloped areas, promoting the early diagnosis and treatment of autistic children and reducing the medical costs and burdens on autistic families and society. The study therefore has considerable social significance and application value.\n\nReferences\n\nBaron-Cohen, S., Wheelwright, S. and Jolliffe, T. (1997), \"Is there a \"language of the eyes\"? Evidence from normal adults, and adults with autism or Asperger syndrome\", Visual Cognition, Vol. 4 No. 3, pp. 311-331. Beijing Wucai Deer Autism Research Institute (2017), Report on the Development of Autism Education and Rehabilitation Industry in China 2, Huaxia Publishing House, Beijing.\n\n\nContext: System capabilities and future development plans for autism diagnosis using facial expressions.", "metadata": { "doc_id": "zhao2020_30", "source": "zhao2020" } }, { "page_content": "Text: Cai, Y. (2018), \"Facial tracking and facial expression recognition based on in-depth learning\", Southeast University. Du, J. (2018), \"Research on face expression recognition based on Kernel relief\", Zhengzhou University. Duan, Y., Wu, X. and Jinfeng (2015), \"Research progress on etiology and treatment of autism\", Chinese Science: Life Science, Vol. 9, pp. 820-844. Ekman, P. (1992), \"An argument for basic emotions\", Cognition and Emotion, Vol. 6 Nos 3-4, pp. 169-200. Lanlan (2018), \"Research on facial expression recognition method based on multi-feature fusion\", Jilin University. Li, S. and Deng, W. (2018), \"Deep facial expression recognition: a survey\", arXiv preprint arXiv:1804.08348.\n\n\nContext: References cited in the article on facial expression analysis for autism diagnosis.", "metadata": { "doc_id": "zhao2020_31", "source": "zhao2020" } }, { "page_content": "Text: Liu, Y., Huo, W. and Hu, X. (2015), \"Summary of research on facial expression recognition of autistic children\", Modern Special Education, Vol. 8, pp. 35-39. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J. and Ambadar, Z. (2010), \"The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression\", 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, IEEE. Mehrabian, A. 
(2008), \"Communication without words\", Communication Theory, Vol. 6, pp. 193-200. Mei, J. and Hu, B. (2015), \"Research and implementation of real-time face expression recognition method\", Information and Technology, Vol. 44 No. 4, pp. 145-148. Organization W H (1992), The ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines, World Health Organization, Geneva. Segal, D.L. (2010), \"Diagnostic and statistical manual of mental disorders (DSM-IV-TR)\", The Corsini Encyclopedia of Psychology, Vol. 1 No. 16, pp. 1-3. Shen, X., He, Z. and Ding, X. (2013), \"Computer facial expression recognition training to improve the facial expression recognition ability of autistic children\", Sci-tech Horizon, Vol. 25, pp. 12-13. Singh, P., Ghosh, S. and Nandi, S. (2017), \"Subjective burden and depression in mothers of children with autism spectrum disorder in India: moderating effect of social support\", Journal of Autism and Developmental Disorders, Vol. 47 No. 10, pp. 3097-3111. Vismara, L.A. and Rogers, S.J. (2008), \"The early start denver model\", Journal of Early Intervention, Vol. 31 No. 1, pp. 91-108. Wang, H. (2007), \"Psychological and behavioral characteristics, diagnosis and evaluation of autistic children\", Chinese Journal of Rehabilitation Medicine, Vol. 22 No. 9, pp. 853-856. Wang, G. and Lu, M. (2015), \"Research on educational games for children with autism spectrum disorders\", Modern Special Education, Vol. 14, pp. 38-40. Wang, Y., Xiao, L. and Chen, R. (2018), \"Social impairment of children with autism spectrum\n\n\nContext: References related to facial expressions, autism, and diagnostic tools.", "metadata": { "doc_id": "zhao2020_32", "source": "zhao2020" } }, { "page_content": "Text: games for children with autism spectrum disorders\", Modern Special Education, Vol. 14, pp. 38-40. Wang, Y., Xiao, L. and Chen, R. (2018), \"Social impairment of children with autism spectrum disorder affects parental quality of life in different ways\", Psychiatry Research, Vol. 2018 No. 266, pp. 168-174. Wu, X. and Chen, S. (2018), \"Research progress on quality of life and its influencing factors of primary caregivers for autistic children\", General Nursing, Vol. 16 No. 18, pp. 2206-2208. Yan, S. (2008), \"Experimental study on facial expression processing of autistic children\", East China Normal University. Yanbin, H., Fuxing, W., Heping, X., Jing, A., Yuxin, W. and Huashan, L. (2018), \"Facial processing characteristics of autism spectrum disorders: meta-analysis of eye movement research\", Progress in Psychological Science, Vol. 1, pp. 26-41. Yang, Y. and Wang, M. (2014), \"Employment and financial burdens of families with preschool-aged children with autism\", Chinese Journal of Clinical Psychology, Vol. 22 No. 2, pp. 295-297, 361.\n\n\nContext: Listing of references cited in the article.", "metadata": { "doc_id": "zhao2020_33", "source": "zhao2020" } }, { "page_content": "Text: Yang, J., Xing, H., Shao, Z. and Yuan, J. (2017), \"Facial expression sensitivity deficits in patients with autism spectrum disorder: impact of task nature and implications for intervention\", Chinese Science: Life Science, Vol. 47 No. 4, pp. 443-452. Zablotsky, B., Black, L.I. and Blumberg, S.J. (2017), \"Estimated prevalence of children with diagnosed developmental disabilities in the United States, 2014-2016\", NCHS Data Brief, Vol. 291, pp. 
1-8.\n\nCorresponding author\n\nWang Zhao can be contacted at: creativesoft@sohu.com\n\n\nContext: Concluding the list of references and providing contact information for the corresponding author.", "metadata": { "doc_id": "zhao2020_34", "source": "zhao2020" } }, { "page_content": "Text: J Med Internet Res. 2019 Apr; 21(4): e13822. PMCID: PMC6505375 Published online 2019 Apr 24. doi: 10.2196/13822 PMID: 31017583\n\nDetecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study\n\nMonitoring Editor: Gunther Eysenbach Reviewed by Eric Linstead and Sharief Teraman Qandeel Tariq, MA, ${ }^{81,2}$ Scott Lanyon Fleming, BS, ${ }^{82}$ Jessey Nicole Schwartz, BA, ${ }^{1,2}$ Kaillyn Dunlap, BSc, MRES, ${ }^{1,2}$ Conor Corbin, BS, ${ }^{2}$ Peter Washington, MS, ${ }^{2}$ Haik Kalantarian, PhD, ${ }^{1,2}$ Naila Z Khan, PhD, MBBS, FCPS, ${ }^{3}$ Gary L Darmstadt, PhD, ${ }^{1,4}$ and Dennis Paul Wall, PhD ${ }^{21,2}$ ${ }^{1}$ Division of Systems Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, United States ${ }^{2}$ Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States ${ }^{3}$ Dhaka Shishu Children's Hospital, Dhaka, Bangladesh, ${ }^{4}$ Division of Neonatal and Developmental Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, United States Dennis Paul Wall, Division of Systems Medicine, Department of Pediatrics, Stanford University, 1265 Welch Road, Palo Alto, CA, 94305, United States, Phone: 1 6173946031, Email: dowall@stanford.edu. ${ }^{2}$ Corresponding author. ${ }^{8}$ Contributed equally. Corresponding Author: Dennis Paul Wall dowall@stanford.edu Received 2019 Feb 28; Revisions requested 2019 Mar 20; Revised 2019 Apr 3; Accepted 2019 Apr 5. Copyright ©Qandeel Tariq, Scott Lanyon Fleming, Jessey Nicole Schwartz, Kaillyn Dunlap, Conor Corbin, Peter Washington, Haik Kalantarian, Naila Z Khan, Gary L Darmstadt, Dennis Paul Wall. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 24.04.2019.
In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's \"risk scores\" for autism. We achieved an accuracy of $92 \\%$ ( $95 \\%$ CI $88 \\%-97 \\%$ ) on US videos using a classifier built on five features.\n\nObjective\n\nUsing videos of Bangladeshi children collected from Dhaka Shishu Children's Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions.\n\nMethods\n\n\nContext: This chunk describes the licensing and copyright information for an open-access article published in the Journal of Medical Internet Research, following the abstract and background of a study on using machine learning to diagnose autism and other developmental delays.", "metadata": { "doc_id": "Tariq_2019_1", "source": "Tariq_2019" } }, { "page_content": "Text: Methods\n\nAlthough our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos ( $75 \\%$ accuracy, $95 \\%$ CI $71 \\%-78 \\%$ ), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: The first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise. We also determine Shapley values for the most important features in the classifier to understand how the classifiers' process aligns with clinical intuition.\n\nResults\n\nUsing these techniques, we achieved an accuracy (area under the curve [AUC]) of $76 \\%$ (SD 3\\%) and sensitivity of $76 \\%$ (SD 4\\%) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of $85 \\%$ (SD 5\\%) and sensitivity of $76 \\%$ (SD 6\\%) for identifying children with ASD from those predicted to have other developmental delays.\n\nConclusions\n\nThese results show promise for using a mobile video-based and machine learning-directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. 
Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value.\n\nKeywords: autism, autism spectrum disorder, machine learning, developmental delays, clinical resources, Bangladesh, Biomedical Data Science\n\nIntroduction\n\n\nContext: This section describes the methods and results of a study using machine learning to improve the accuracy of autism detection in Bangladeshi children using video analysis, and discusses the potential for remote diagnosis in resource-limited settings.", "metadata": { "doc_id": "Tariq_2019_2", "source": "Tariq_2019" } }, { "page_content": "Text: Keywords: autism, autism spectrum disorder, machine learning, developmental delays, clinical resources, Bangladesh, Biomedical Data Science\n\nIntroduction\n\nAutism spectrum disorder (ASD) is a heterogeneous developmental disorder that includes deficits in social communication, repetitive behaviors, and restrictive interests, all of which lead to significant social and occupational impairments throughout the lifespan. Autism is one of the fastest growing developmental disorders in the United States [1], affecting 1 in 59 children [2]. Although the global autism prevalence is largely unknown, the prevalence is estimated to be between $0.15 \\%$ and $0.8 \\%$ among children in developing countries such as Bangladesh, with a higher prevalence in urban centers (eg, 3\\% in Dhaka) [3]. These numbers only represent a fraction of the actual cases, as most cases in semiurban and rural areas go unnoticed due to a dearth of resources. The disparity between urban and rural prevalence may reflect poorly understood risk factors or clinical resources in high-income areas along with higher awareness among urban parents about developmental delays [4]. More accessible and wide-scale screening is needed to accurately estimate ASD prevalence in remote parts of Bangladesh and other countries.\n\n\nContext: A study investigating machine learning models for autism spectrum disorder (ASD) detection in Bangladesh, addressing the challenges of limited resources and prevalence estimation.", "metadata": { "doc_id": "Tariq_2019_3", "source": "Tariq_2019" } }, { "page_content": "Text: The current models for diagnosing autism in Bangladesh, as in the United States, are often administered by trained clinical professionals using standard assessments [5]. Empirically validated diagnostic tools like the Autism Diagnostic Observation Schedule (ADOS) [6] and Autism Diagnostic Interview (ADI-R) [7] are not always used in different countries, particularly in developing countries, as these tools are expensive, require trained clinicians to administer, and may be limited by available translations and cultural adaptations [4]. For countries with limited ASD resources like Bangladesh, obtaining a diagnosis, which is essential for receiving an intervention and improving outcomes, is difficult. There is a pressing need to further develop open-source tools that do not require extensive training and professional certification and have high cross-cultural validity for autism screening globally [4]. Previous work has shown the feasibility and efficacy of assessing developmental delay using rapid assessment tools delivered\n\nby professionals with limited clinical expertise in the home [5]. 
There is potential to extend the reach of assessment tools and decrease health care disparity, especially in developing and rural countries, by using machine learning and mobile technologies.\n\n\nContext: The challenges of autism diagnosis in Bangladesh due to limited resources and the need for accessible, culturally valid screening tools.", "metadata": { "doc_id": "Tariq_2019_4", "source": "Tariq_2019" } }, { "page_content": "Text: In our previous works, we have developed tools for rapid mobile detection of ASD in short home videos of US children by using supervised machine learning approaches to identify minimal sets of behaviors that align with clinical diagnoses of ASD [8-15]. Features extracted in our minimally viable classifiers are accurately labeled by nonexpert raters (ie, noncertified clinical practitioners) in a short period of time (eg, $<6$ minutes). These labeled features can then be fed into our machine learning classifiers to determine the child's autism risk. Tariq et al [14] used a dataset consisting of 162 videos ( 116 ASD, 46 neurotypical development [TD]) of US children to validate these classifiers. The top-performing classifier exhibited an accuracy of $92 \\%$ ( $95 \\%$ CI $88 \\%-97 \\%$ ).\n\nAdditionally, an independent validation set consisting of 66 videos ( 33 ASD, 33 TD) was labeled by a separate set of video raters in order to validate the results. The top-performing classifier maintained similar results, achieving an overall accuracy of $89 \\%$ ( $95 \\%$ CI $81 \\%-95 \\%$ ).\n\nThe current study aimed to show generalizability of video-based machine learning procedures for ASD detection that have established validity among US-based children [14] in Bangladesh. Specifically, our study aimed to determine the performance and accuracy of this same video machine learning procedures on videos of Bangladeshi children under the age of 4 years. This sample was drawn from a population diagnosed with ASD and another population with other speech and language conditions (SLCs), but not ASD. Additionally, we compared the features that are most important for accurate classification of children from Bangladesh and created several machine learning models that can be generalized to different cultures.\n\nMethods\n\nData Collection\n\n\nContext: The authors are describing prior work developing machine learning tools for ASD detection in US children and outlining the current study's goal to assess the generalizability of these tools to a Bangladeshi population.", "metadata": { "doc_id": "Tariq_2019_5", "source": "Tariq_2019" } }, { "page_content": "Text: Methods\n\nData Collection\n\nThe study received ethical clearance under Dr Naila Khan from the Bangladesh Institute of Child Health, Dhaka Shishu Children's Hospital (DSH) and the Stanford University Institutional Review Board. We aimed to recruit 150 children for this study: 50 with ASD, 50 with an SLC, and 50 with neurotypical development (TD). All participants were recruited after they provided consent (in Bengali language) for participation at the DSH, and their children were screened for the presence of ASD or SLC. Participants were enrolled if they were parents above 18 years of age, had a child between the ages of 18 months and 4 years, could attend an appointment at the DSH to complete the study procedures, and were willing to submit a brief video of their child to the study team. 
Enrolled families provided demographic information (see Table 1 in the Results section).\n\nBrief videos (2-5 minutes) were recorded during evaluation of the children who presented to the Child Development Center of the Bangladesh Institute of Child Health with neurodevelopmental concerns. We administered the Modified Checklist for Autism in Toddlers (Bangla version [16]) to all children to identify the presence of ASD, and all children underwent additional clinical evaluations by a developmental psychologist and a child health physician in order to diagnose ASD, SLC, or TD, as described previously [5]. We also administered the ADOS for 28 of the 50 children identified with ASD; the ADOS could not be completed in the remaining 22 children diagnosed with ASD because their families were unable to commit to the time required to complete the assessment, a common problem for families in low-resource areas [4].\n\n\nContext: Methods section describing data collection procedures for a study using videos to classify child development conditions in Bangladesh.", "metadata": { "doc_id": "Tariq_2019_6", "source": "Tariq_2019" } }, { "page_content": "Text: Acquired videos and supporting demographic measures were securely sent from DSH to Stanford University. Videos were assessed for quality by trained clinical researchers at Stanford University. Criteria included video, sound, and image quality in addition to video length and content (ie, ensuring that the video was long enough to answer necessary questions, that the child was present in the video, etc). Furthermore, videos were assessed to meet the following criteria: (1) they captured the child's face and hands, (2) they involved social interaction or attempts at social interaction, and (3) they involved an interaction between the child and a toy or object.\n\nNine non-Bengali speaking US-based raters with no clinical training used a secure, HIPAA (Health Insurance Portability and Accountability Act)-compliant online website to watch the videos and answer a set of 31 multiple-choice questions corresponding to the behavioral features of autism [14]. Each rater completed a 1-hour training session with a senior analyst before scoring the videos. Senior analysts conducted rater quality checks by comparing a subset of 10 video scores to \"gold standard\" scores. These \"gold standard\" scores were agreed upon by two clinical research coordinators who each had several years of experience with children with autism.\n\nSource Classifiers Trained on Clinical Data for Reduce-to-Practice Testing\n\n\nContext: Data acquisition and rater training procedures for assessing videos of children.", "metadata": { "doc_id": "Tariq_2019_7", "source": "Tariq_2019" } }, { "page_content": "Text: Source Classifiers Trained on Clinical Data for Reduce-to-Practice Testing\n\nWe assembled eight published machine learning classifiers to test their viability for use in the rapid mobile detection of autism through the use of short home videos of US children [14]. For all eight models, the source of training and validation data was item-level medical records of US children, which contained either the ADOS or ADI-R outcome data on all participants. The ADOS has several modules containing approximately 30 features that correspond to the developmental level of the individual under assessment. These features are assessed based on how a child interacts with a clinical practitioner administering the exam. 
The ADI-R is a parent-directed interview that includes $>90$ elements asked of the parent, with multiple choices for answers. Each model was trained on item-level outcomes from the administration of either the ADOS or ADI-R and optimized for accuracy, sparsity of features, and interpretability in previous publications [8-15]. All these classifiers have been validated with US home videos (total: $\\mathrm{n}=162$, ASD: $\\mathrm{n}=116$, non-ASD: $\\mathrm{n}=46$ ) [14]. The top three performing classifiers in this dataset were chosen for validation of the videos collected from DSH in Bangladesh to test the accuracies of these models across cultures.\n\nStacked Classifiers With Rater-Adaptive Weighting\n\n\nContext: The authors describe their methodology for testing machine learning classifiers trained on US clinical data for autism detection using videos of Bangladeshi children.", "metadata": { "doc_id": "Tariq_2019_8", "source": "Tariq_2019" } }, { "page_content": "Text: Stacked Classifiers With Rater-Adaptive Weighting\n\nIn an effort to improve the results on the Bangladeshi dataset after attempting to validate previously built classifiers on these data, we constructed new classifiers while controlling for potential noise resulting from inaccurate ratings and constructed separate layers for each step of the classification for a streamlined approach. Our dataset contained three classes (TD, ASD, and SLC) assigned by screening via clinical evaluation at the DSH [5]. By implementing a layered approach to classification, first distinguishing general developmental delays (including ASD and SLC) from TD and then distinguishing ASD from SLCs, we were able to broaden the detection capabilities to more generally classify the presence of other developmental delays in addition to ASD specifically.\n\nRater Weighting\n\nGiven the raters' lack of formal clinical training, we hypothesized that some raters might be more adept at identifying certain risk factors in some videos than others. Regardless of whether these interrater differences in identification accuracy for certain subsets of behaviors arise naturally or by chance, we hypothesized that this heterogeneous rater performance could be leveraged to yield increased model performance. For example, if one rater is especially capable of labeling a child's level of eye contact and another rater does a poor job of rating eye contact but excels at rating language ability, then a model trained on each individual rater's labels alone might perform poorly; however, an ensemble that considers the outputs of both raters' models could perform substantially better. Achieving this improved performance is the focus of our proposed novel rater-adaptive weighting scheme.\n\n\nContext: The authors describe their efforts to improve autism detection in a Bangladeshi dataset by developing layered classifiers and a novel rater-adaptive weighting scheme to account for varying rater accuracy.", "metadata": { "doc_id": "Tariq_2019_9", "source": "Tariq_2019" } }, { "page_content": "Text: For each of the three raters in the dataset, we trained a Random Forest classifier to predict a child's class label (TD, SLC, or ASD) based on the rater's annotations of that child's behavior in a given video. The Random Forest classifier adapts to each rater's expertise and labeling patterns; a basic analysis revealed that each rater had a different feature set that they rated well. 
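A minimal sketch of one per-rater model of the kind just described, assuming each rater's 31 multiple-choice answers have already been encoded as a numeric feature matrix; the split size, seeds and variable names are illustrative, and only scikit-learn calls the paper itself reports using appear here.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def fit_rater_model(X_rater, y, seed=0):
    """Train one Random Forest on a single rater's annotations (labels: TD, SLC or ASD)."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_rater, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    val_accuracy = accuracy_score(y_val, model.predict(X_val))
    return model, val_accuracy

# One model per rater, e.g.:
# rater_models = {r: fit_rater_model(X_by_rater[r], y) for r in ("rater_a", "rater_b", "rater_c")}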
In addition to (and, in part, because of) interrater differences in labeling ability, each rater's model had varying levels of accuracy. We wanted the ensemble to weight the predictions from the most accurate rater models more heavily. Therefore, we first trained each rater's model, calculated its accuracy relative to a majority-vote baseline, and then used that difference to up- or down-weight that rater's vote relative to the other raters' votes.\n\nSpecifically, we let $z_{j}$ represent the difference in accuracy of rater $j$ 's model relative to the majority vote baseline. Then, after calculating $z_{j}$ for each rater $j=1,2, \\ldots, K$, we pass these values into the softmax function to generate rater-specific weights: $w_{j}=\\exp \\left(z_{j}\\right) / \\sum_{k=1}^{K} \\exp \\left(z_{k}\\right)$.\n\nThis ensures that all the raters' weights collectively sum to 1, so that the ensemble prediction will be a linear combination of each rater's predictions. Using these weights, the final ensemble prediction for child $i$ is calculated by multiplying each rater model's predicted probability for the target class (eg, atypical development or ASD) by the corresponding rater-specific weight and adding the weighted raters' predicted probabilities together. More specific details can be found in Multimedia Appendix 1.\n\nStacking Classifiers to Distinguish Between Typical/Atypical Development and Autism Spectrum Disorder/Speech and Language Conditions\n\n\nContext: The authors describe their method for creating an ensemble model that combines predictions from three raters to improve diagnostic accuracy.", "metadata": { "doc_id": "Tariq_2019_10", "source": "Tariq_2019" } }, { "page_content": "Text: Stacking Classifiers to Distinguish Between Typical/Atypical Development and Autism Spectrum Disorder/Speech and Language Conditions\n\nIn order to reflect the differences in both the conceptualization and use cases of predicting (1) TD vs atypical development and (2) ASD vs other developmental delays, we decided to create a stacked approach to classification. In the first layer, we built classifiers to distinguish between TD and atypical development (ASD/other SLCs). The cases classified as atypical from the first layer were then used as input for the second layer to distinguish between ASD and other SLCs.\n\nWe wanted to optimize the model for sensitivity in the first layer to ensure no atypical case was misclassified. In the second layer, we wanted to optimize for both sensitivity and specificity, so that children with ASD would be effectively distinguished from children with other developmental delays. After training these classifiers for each rater, we tested them on the held-out test set and aggregated rater scores using the rater weights calculated in the previous step. 
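A compact sketch of the rater-adaptive weighting and the weighted ensemble vote described above, assuming each rater already has a fitted model; the array shapes and variable names are illustrative, and the softmax follows the formulation in the text.

import numpy as np

def rater_weights(model_accuracies, baseline_accuracy):
    """Softmax over each rater model's accuracy gain (z_j) relative to the majority-vote baseline."""
    z = np.asarray(model_accuracies, dtype=float) - baseline_accuracy
    e = np.exp(z - z.max())  # subtracting the max is for numerical stability; weights are unchanged
    return e / e.sum()       # weights sum to 1

def ensemble_probability(per_rater_probs, weights):
    """Weighted linear combination of each rater model's predicted probability for the target class."""
    return float(np.dot(weights, per_rater_probs))

# Example with three raters:
# w = rater_weights([0.71, 0.68, 0.74], baseline_accuracy=0.65)
# p_atypical = ensemble_probability([0.81, 0.55, 0.90], w)  # layer 1; layer 2 reuses the scheme for ASD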
For each of these layers, we used a three-fold cross-validation approach to select the training and test sets randomly in order to ensure that the accuracy reported is stable across different splits.\n\nFeature Importance\n\n\nContext: The authors describe their approach to classifying developmental conditions in children, detailing a stacked classification method to distinguish typical development from atypical conditions and then further differentiate autism spectrum disorder from other speech and language conditions.", "metadata": { "doc_id": "Tariq_2019_11", "source": "Tariq_2019" } }, { "page_content": "Text: Feature Importance\n\nTo determine the impact of each video's annotations on the classifier's predicted label for that video, we used a recently developed method for efficiently calculating approximate Shapley values [17]. Shapley values are traditionally used in coalitional game theory to determine how to optimally distribute gains earned from cooperative effort. The same idea can be extended to machine learning in order to rank features for nonlinear models such as Random Forests. In the machine learning adaptation of Shapley values, feature values \"cooperate\" to impact a machine learning model's output, which in this case is the predicted probability of a child's video being classified as TD, ASD, or SLC. For each video, Shapley values capture both the magnitude of importance for each feature value as well as the direction in which the feature value \"pushes\" the final predicted class probability. More precisely, if we let $\\Phi_{k}\\left(F_{j}^{*}, x^{(i)}\\right)$ be the impact (Shapley value) of the $k$ th feature for video $i$ with feature vector $x^{(i)}$ on the output of model $F_{j}^{*}$, then the Shapley value formulation guarantees that $F_{j}^{*}\\left(x^{(i)}\\right)=\\mathrm{E}\\left[F_{j}^{*}(x)\\right]+\\sum_{k} \\Phi_{k}\\left(F_{j}^{*}, x^{(i)}\\right)$.\n\n\nContext: The authors are describing their methodology for understanding feature importance in their machine learning model used to classify children's videos.", "metadata": { "doc_id": "Tariq_2019_12", "source": "Tariq_2019" } }, { "page_content": "Text: In other words, any video's final predicted class probability is the average predicted class probability of the dataset plus all the Shapley values associated with each element of that video's input vector. This property, called local accuracy, indicates that the feature importance can be easily measured and compared. Additionally, because each video, feature, and model triple is associated with a single scalar-valued feature importance, we can understand how each annotation for each child's video affected his/her predicted probability of TD/ASD/SLC at an individual level and estimate a feature's overall importance to the model by summing up the absolute values of that feature's Shapley values over all videos. The features with the highest sum of absolute Shapley values are considered the most important to the model. Finally, given the way in which we ensembled individual raters' models, we can extract Shapley values for the multirater ensemble by employing the same weights. 
Specifically, we can employ the following equation: $\\Phi_{k}^{\\mathrm{ens}}\\left(x^{(i)}\\right)=\\sum_{j=1}^{K} w_{j} \\Phi_{k}\\left(F_{j}^{*}, x^{(i)}\\right)$.\n\nTo test whether our classifier's decisions align with clinical intuition, we calculated Shapley values for the 159 videos for the second layer of the classifier when distinguishing ASD from non-ASD.\n\nComparing Bangladeshi and US Results\n\n\nContext: Explaining the model's decision-making process using Shapley values and comparing results between Bangladeshi and US datasets.", "metadata": { "doc_id": "Tariq_2019_13", "source": "Tariq_2019" } }, { "page_content": "Text: Comparing Bangladeshi and US Results\n\nIn order to determine the generalizability of one dataset's characteristics to the other, we trained logistic regression classifiers with elastic net regularization for the Bangladeshi data and US data to predict ASD from the non-autism class. We trained the model on the Bangladeshi data and tested the model on the US data and vice versa. For both classifiers, we randomly split the dataset into training and testing, reserving $20 \\%$ for the latter while using cross-validation on the training set to tune hyperparameters associated with elastic net regularization. Note that while traditional logistic regression seeks to find a set of model coefficients, $\\beta$, that minimizes the logarithmic loss (we will denote this loss as $L(\\beta)$, computed from the model's predictions when the model is parameterized by $\\beta$ ), logistic regression with elastic net regularization seeks to minimize the logarithmic loss plus a regularization term: $L(\\beta)+\\alpha\\left[\\rho \\sum_{j} \\beta_{j}^{2}+(1-\\rho) \\sum_{j}\\left|\\beta_{j}\\right|\\right]$.\n\n\nContext: Evaluating model generalizability between Bangladeshi and US datasets.", "metadata": { "doc_id": "Tariq_2019_14", "source": "Tariq_2019" } }, { "page_content": "Text: Here, the first sum corresponds to an L2-loss, the second sum corresponds to an L1-loss, $\\rho$ is a hyperparameter governing the balance between the two losses, and $\\alpha$ is the second hyperparameter determining the overall strength of regularization. Incorporating this regularization into the logistic regression loss yields several benefits, including more parsimonious and interpretable models and better predictive performance, especially when two or more of the predictor variables are correlated [18]. We used cross-validation for model hyperparameter tuning by performing a grid search with different values of $\\alpha$ (varying penalty weights) and $\\rho$ (the mixing parameter determining how much weight to apply to L1 versus L2 penalties) [19,20-21]. Based on the resulting area under the curve (AUC) and accuracy from each combination, we selected the top-performing pair of hyperparameters. Using this pair, we trained the model using logistic regression and balanced class weights to adjust weights that were inversely proportional to class frequencies in the input data, which helps account for class imbalance. After determining the top-ranked features based on the trained model and the resulting coefficients, we validated the model on the reserved test set. The behavioral features that were selected most often during the hyperparameter tuning phase across different cross-folds were compared between US and Bangladeshi models to determine which features have a greater significance and whether they align between the two models.\n\nSoftware\n\nAnalyses were performed in Python 3.6.7; we used pandas 0.23.4 to prepare the data for analysis [20]. The classification models described were trained and evaluated using the scikit-learn 0.20.0 package [21]. 
Hyperparameters for each rater model were tuned using the hyperopt 0.1.1 package [22]. Shapley value estimates were calculated using the shap 0.24.0 package [23]. Plots were generated using matplotlib 3.0.1 [24].\n\nResults\n\nData Collection\n\n\nContext: The authors describe their methods for training and validating logistic regression models to classify developmental delays in children, including hyperparameter tuning and feature selection.", "metadata": { "doc_id": "Tariq_2019_15", "source": "Tariq_2019" } }, { "page_content": "Text: Results\n\nData Collection\n\nWe collected 159 videos in total: 55 videos were of children with ASD, 50 were of children with SLC, and 54 were of children with TD. The parent-submitted home videos were an average of 3 minutes 11 seconds long (SD 1 minute 57 seconds). Of the 159 videos submitted, all were manually inspected and found to be of good, scorable quality in terms of length, resolution, and content. Demographic data were missing for 9 subjects, who were excluded from analysis; all other data were complete. Video rating staff were able to rate all videos. Table 1 outlines the diagnosis and demographic breakdown for 150 of the 159 videos included in the dataset.\n\nResults of Source Classifiers Trained on Clinical Data for Reduce-to-Practice Testing\n\nWe first sought to distinguish ASD from non-ASD cases. Our top-performing classifiers from our previous analysis of the videos from 162 US children [14] were validated on the Bangladeshi dataset. We tested across different train-test splits and achieved a maximum AUC of 0.75 (SD 0.06; Figure 1). In order to improve classifier performance, we next shifted to the development of stacked classifiers.\n\nResults From Stacked Classifiers With Rater-Adaptive Weightings\n\nSince we used a three-fold cross-validation approach, we trained and tested the models for each of the raters across three different splits. The training set consisted of 114 randomly selected videos, and the average demographic information for the three splits for the training set was as follows: average age, 2 years 7 months (SD 7 months); proportion of males, $64 \\%$; proportion of children with TD, $34 \\%$; proportion of children with SLC, $31 \\%$; and proportion of children with ASD, $35 \\%$. The demographic information for the test set for layer 1 (distinguishing TD from ASD/SLC) and layer 2 (distinguishing ASD from SLC) can be found in Table 2.\n\n\nContext: A study using machine learning to analyze home videos to identify autism spectrum disorder and speech/language conditions in Bangladeshi children, validating and adapting models initially trained on US data.", "metadata": { "doc_id": "Tariq_2019_16", "source": "Tariq_2019" } }, { "page_content": "Text: Layer 1 of the stacked classifier, which sought to distinguish children with TD from children with atypical development, achieved $76 \\%$ (SD 4\\%) sensitivity and $58 \\%$ (SD 3\\%) specificity, with an AUC of $76 \\%$ (SD 3\\%) and an accuracy of $70 \\%$ (SD 2\\%; Figure 2 A). For layer 2, which distinguished ASD from other SLCs, the classifier performed with $76 \\%$ (SD 6\\%) sensitivity and $77 \\%$ (SD 24\\%) specificity, with an AUC of $85 \\%$ (SD 5\\%) and accuracy of $76 \\%$ (SD 11\\%; Figure 2 B; Table 3).\n\nFeature Importance\n\nThe most important features in our rater-adaptive ensemble for predicting ASD, as measured by the Shapley value, align with clinical intuition. 
Figure 3 shows the distribution of Shapley values across all participants for two of the features that were among the most important (as measured by mean absolute Shapley value) to our ensemble model's predictions. For example, for the feature corresponding to the child's level of eye contact, the value \"rarely or never does this\" contributes strongly to a classification of ASD and \"exhibits clear, flexible gaze that is meshed with other communication\" contributes the most to a non-ASD classification. Another feature that aligns with clinical intuition measures the child's repetitive interests and stereotyped behaviors: the feature value \"behaviors observed the entire time\" contributes strongly to the positive class (ASD), whereas \"not observed\" contributes strongly to the negative class (non-ASD; Figure 3).\n\nComparison of Bangladeshi and US Results\n\nFor the classifier trained on the Bangladeshi data, the performance on the held-out test set ($20 \\%$ of Bangladeshi data) was $84.4 \\%$ and its performance when validated on US data was $72.5 \\%$ (Figure 4).\n\n\nContext: Performance evaluation of machine learning classifiers for autism spectrum disorder and speech/language condition identification in Bangladeshi children, comparing results with US data and analyzing feature importance.", "metadata": { "doc_id": "Tariq_2019_17", "source": "Tariq_2019" } }, { "page_content": "Text: We trained a similar classifier on our dataset of 162 US videos and validated it on the Bangladeshi data (Figure 5). The classifier performed with a $94.2 \\%$ accuracy when tested on the held-out test set from US videos. The classifier's accuracy dropped significantly when validated on the Bangladeshi data, reaching around $54 \\%$.\n\nWhile performing hyperparameter tuning on these classifiers, we conducted further analysis to determine which of the behavioral features were selected most often for each cross-fold of US videos and Bangladeshi videos in order to draw a comparison. It is apparent from Figures 6A and 6B that the features being selected are quite similar between the two datasets, with some minor differences. The features understands language, sensory seeking, calls attention to objects, and stereotyped interests and actions are highly ranked by models trained on either of the datasets. Responsiveness, developmental delay, social participation, and stereotyped speech are selected more often for US data and less so for Bangladeshi data. The opposite is true for eye contact.\n\nDiscussion\n\nPrincipal Results\n\nWe were able to demonstrate the potential to use video-based machine learning methods to detect developmental delay and autism in a collection of videos of Bangladeshi children at risk for autism. Despite language, cultural, and geographic barriers, this outcome shows promise for remote autism detection in a developing country.
More testing and refinement will be needed but, in general, the method has the potential to be made fully virtual and to run entirely on mobile devices, which would increase the capacity to detect developmental delays and to provide more immediate diagnostics to children in need of therapeutic interventions.\n\n\nContext: Evaluation of a machine learning classifier trained on US data and tested on Bangladeshi data, followed by feature selection analysis comparing the two datasets.", "metadata": { "doc_id": "Tariq_2019_18", "source": "Tariq_2019" } }, { "page_content": "Text: An important result of our work is that we were able to gather 159 videos from Bangladeshi parents collected via mobile phone through our collaboration with DSH. This suggests the feasibility of expanding this study to a larger sample size across Bangladesh and other low-resource settings and the ability to rely on the use of mobile phones in developing countries like Bangladesh, where $95 \\%$ of the population are mobile phone subscribers [25]. Additionally, we found that clinically untrained, US-based, non-Bengali speaking raters were able to score videos of Bangladeshi children with limited training, suggesting that speaking the native language may not be necessary for scoring videos. This finding also demonstrates the validity and potential of this mobile tool to be deployed across cultures and languages.\n\n\nContext: Discussion of the study's feasibility, cultural applicability, and potential for wider deployment using mobile technology in Bangladesh.", "metadata": { "doc_id": "Tariq_2019_19", "source": "Tariq_2019" } }, { "page_content": "Text: A useful and novel contribution of our work was our method for ensembling predictions from models trained on and adapted to each individual rater. This method demonstrates several advantageous properties. First, because each classification model was trained to map an individual rater's annotation patterns to a predicted class label, these rater-adaptive models can capitalize on features reflecting a rater's strengths while ignoring features on which the rater shows weaker performance. Furthermore, the fact that raters' models are trained independently from one another means that, in a distributed setting where there is a large corpus of videos such that each rater annotates only a small subset of them, our method can make predictions on each video by applying and ensembling the models from each rater without any need for additional imputation. By weighting each rater's model according to its accuracy on a rater-specific held-out validation set, the overall ensemble can lean more heavily on those raters whose models consistently demonstrate the best classification performance. Finally, because the final ensemble's prediction is a linear combination of all of the raters' models and we are able to calculate Shapley values for every feature in each of these models, it follows that we can use the same weights from the ensemble of rater-specific predictions to generate ensemble-level Shapley values as well.
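A minimal sketch of this rater-weighted ensembling follows. The names (rater_models, weights, x, rater_shap_values) are placeholders for the per-rater classifiers, their validation-accuracy-derived weights, one video's annotation features, and the per-rater Shapley attributions; this illustrates the idea rather than reproducing the authors' implementation.

```python
import numpy as np

def ensemble_predict_proba(rater_models, weights, x):
    """Weighted linear combination of each rater-specific model's predicted ASD probability."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the validation-accuracy-based weights
    probs = np.array([m.predict_proba(x.reshape(1, -1))[0, 1] for m in rater_models])
    return float(np.dot(w, probs))

def ensemble_shap_values(rater_shap_values, weights):
    """Because the ensemble is linear in the rater models, the ensemble-level Shapley
    values are the same weighted combination of the per-rater Shapley values."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * sv for wi, sv in zip(w, rater_shap_values))
```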
Thus, if a child's video is distributed to several different raters and those raters' annotations are fed into the ensemble model, one can interpret how each of the child's behavioral annotations contributed to both the final ensemble classification label and each rater's predicted label individually.\n\n\nContext: The authors describe a novel ensemble method leveraging individual rater-adaptive models to improve autism detection from video, detailing its advantages in accuracy, distributed learning, and interpretability through Shapley values.", "metadata": { "doc_id": "Tariq_2019_20", "source": "Tariq_2019" } }, { "page_content": "Text: We found that while models trained on videos of US children and models trained on Bangladeshi children both relied on many of the same clinically relevant features (eg, sensory seeking, stereotyped interests, and actions), some features were more prominent in one model compared to the other. For example, models trained on US data tended to rely more heavily on social participation and stereotyped speech, while models trained on Bangladeshi data relied more on eye contact. These patterns make sense, as raters could rely on a mutual understanding of the language (English) to evaluate behaviors like stereotyped speech and social interaction in US videos and may not have needed to rely as heavily on physical cues like eye contact, whereas when US raters viewed Bangladeshi videos, nonlanguage-based cues became more important. Even without the ability to confidently evaluate all aspects of the child's behavior, the rater ensemble demonstrated that the set of behavioral features needed to make an accurate diagnosis of developmental delays, including ASD, may be narrower than previously thought. Nevertheless, the difficulty in assessing certain sociolinguistic patterns in the cross-cultural context may have been the cause of comparatively lower performance in the Bangladeshi dataset. We hypothesize that, when trained on annotations provided by raters who share a common linguistic and sociocultural background with the Bangladeshi children, our ensemble's performance will improve and become comparable to the models trained and evaluated on the US dataset.\n\nLimitations\n\n\nContext: Research paper discussing machine learning models for diagnosing developmental delays, including autism, using video analysis across US and Bangladeshi populations.", "metadata": { "doc_id": "Tariq_2019_21", "source": "Tariq_2019" } }, { "page_content": "Text: Limitations\n\nAlthough accuracy achieved using our source classifiers originally trained on US datasets was lower when applied to Bangladeshi videos, it still indicated a signal in the Bangladeshi dataset. The relatively low accuracy is most likely a result of three factors. First, these original classifiers were trained on clinical scoresheets, not on features obtained from live video data. Second, these scoresheets were obtained from formal clinical assessments of US children, and therefore they do not capture a culturally diverse set of behavioral nuances. Third, these classifiers were trained to distinguish between typically developing children and children with autism. 
However, this dataset consists of delays other than autism (eg, SLCs), which may be why these classifiers were unable to classify these cases with higher accuracy.\n\n\nContext: Discussion of the study's findings and potential reasons for lower accuracy when applying US-trained classifiers to Bangladeshi data.", "metadata": { "doc_id": "Tariq_2019_22", "source": "Tariq_2019" } }, { "page_content": "Text: Although the potential uses for a method of crowdsourced annotation and classification of developmental disorders like the one we established in this work are myriad, we wish to highlight a few uses. First, in areas where resources are scarce, and with a disorder like ASD, where early intervention is the key to successful treatment, our framework could be essential in performing cost-effective and reliable triage. Parents could send short home videos of their children to the cloud, at which point the video would be routed to several raters who perform feature tagging of the child's behavior. Based on the raters' previous annotation patterns and their associated models, the child would receive a predicted risk probability of developmental delay or ASD and a clinical team nearby could then be alerted, as appropriate. Since 2008, Dr Khan and her team have assisted the government to establish multidisciplinary Child Development Centers in tertiary hospitals across Bangladesh [26]. Fifteen such Child Development Centers are currently operational, whose chief mandate is to diagnose and provide appropriate management for a range of neurodevelopmental disorders including autism. However, in a country with population of 160 million, of whom an estimated $45 \\%$ are within the pediatric age groups, access to reliable services can be limited. Formalization of the approaches documented here could enable broader reach and coverage through remote care while allowing resource-strapped clinical teams to deploy their efforts where they are needed the most.\n\n\nContext: Discussion of potential applications of the developed framework for identifying developmental disorders, specifically focusing on resource-limited settings like Bangladesh and complementing existing Child Development Centers.", "metadata": { "doc_id": "Tariq_2019_23", "source": "Tariq_2019" } }, { "page_content": "Text: An exciting second consequence of a deployment like this would be the steady development of a large corpus of annotated videos. No such dataset exists to date; however, the potential impact of such a dataset could be substantial. Modern algorithms from machine vision and speech recognition like convolutional and recurrent neural networks could use these annotations to learn features from the raw video and audio that are important for detecting developmental disorders, including ASD. Once trained, these models would dramatically accelerate the speed for detection of disorders and ability to accelerate the delivery of useful interventions.\n\nAnother important effect of such a pipeline would be that, with location-tagged videos, we could develop more accurate epidemiological statistics on the prevalence and onset of developmental disorders like ASD worldwide. Better information like this may increase awareness, positively impact policy change, and advance progress for addressing unmet needs of the children with developmental delays. 
This can have important applications in the developing world by helping countries identify the proportion of the population affected by such delays or impairments and therefore inform policy and gather actionable insights for health sector responses.\n\nAcknowledgments\n\nWe are grateful for the generous support and participation of all 159 families who provided video and other phenotypic records. This work was supported by the Bill and Melinda Gates Foundation. It was also supported, in part, by funds to DW from the NIH (1R01EB025025-01 and 1R21HD091500-01), The Hartwell Foundation, the Coulter Foundation, Lucile Packard Foundation, and program grants from Stanford's Human Centered Artificial Intelligence Program, Precision Health and Integrated Diagnostics Center (PHIND), Beckman Center, Bio-X Center, Predictives and Diagnostics Accelerator (SPADA) Spectrum, and Wu Tsai Neurosciences Institute Neuroscience: Translate Program. We also acknowledge the generous support from Peter Sullivan.\n\n\nContext: Discussion of potential future impacts and benefits of the research, including dataset creation and improved epidemiological statistics.", "metadata": { "doc_id": "Tariq_2019_24", "source": "Tariq_2019" } }, { "page_content": "Text: Abbreviations\n\nADI-R Autism Diagnostic Interview-Revised ADOS Autism Diagnostic Observation Schedule ASD autism spectrum disorder AUC area under the curve DSH Dhaka Shishu Children's Hospital HIPAA Health Insurance Portability and Accountability Act MCHAT Modified Checklist for Autism in Toddlers SLC speech and language condition TD neurotypical development\n\nMultimedia Appendix 1\n\nFormulae used in creating stacked rater-weighted classifiers.\n\nFootnotes\n\nContributed by Authors' Contributions: QT: data curation, formal analysis, investigation, methodology, software, validation, visualization, writing (original draft, review, and editing). SLF: formal analysis, investigation, methodology, software, validation, visualization, writing (review and editing). JNS: data curation, investigation, methodology, project administration, resources, and writing (review and editing). KD: data curation, investigation, methodology, project administration, resources, writing (review and editing). CC: formal analysis, investigation, methodology, software, writing (review and editing). PW: data curation, formal analysis, resources, writing (review and editing). HK: data curation, formal analysis, resources, writing (review and editing). NZK: conceptualization, funding acquisition, investigation, data curation, methodology, writing (review and editing). GLD: conceptualization, funding acquisition, data curation, writing (review and editing). DPW: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing (original draft, review, and editing)\n\nConflicts of Interest: DPW is the founder of Cognoa.com. This company is developing digital health solutions for pediatric care. All other authors declare no competing interests.\n\nReferences\n\n\nContext: Results and analysis of a study using machine learning to diagnose autism spectrum disorder and speech and language conditions in Bangladeshi children, comparing performance with US data and exploring feature importance.", "metadata": { "doc_id": "Tariq_2019_25", "source": "Tariq_2019" } }, { "page_content": "Text: Conflicts of Interest: DPW is the founder of Cognoa.com. 
This company is developing digital health solutions for pediatric care. All other authors declare no competing interests.\n\nReferences\n\nCromer J. U.S. Army. 2018. Apr 05, [2019-04-17]. Autism: fastest-growing developmental disability https://www.army.mil/article/203386/autism_fastest_growing_developmental_disability_webcite.\n\nBaio J, Wiggins J, Christensen DL, Maenner MJ, Daniels J, Warren Z, Kurzius-Spencer M, Zahorodny W, Robinson Rosenberg C, White T, Durkin MS, Imm P, Nikolaou L, Yeargin-Allsopp M, Lee LC, Harrington R, Lopez M, Fitzgerald RT, Hewitt A, Pettygrove S, Constantino JN, Vehorn A, Shenouda J, Hall-Lande J, Van Naarden Braun K, Dowling NF. Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years - Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014. MMWR Surveill Summ. 2018 Dec 27;67(6):1-23. doi: 10.15585/mmwr.ss6706a1. http://europepmc.org/abstract/MED/29701730. [PMCID: PMC5919599] [PubMed: 29701730] [CrossRef: $10.15585 / \\mathrm{mmwr} . \\mathrm{ss} 6706 \\mathrm{a} 1]$\n\nHossain M, Ahmed HU, Jalal Uddin MM, Chowdhury WA, Iqbal MS, Kabir RI, Chowdhury IA, Aftab A, Datta PG, Rabbani G, Hossain SW, Sarker M. Autism Spectrum disorders (ASD) in South Asia: a systematic review. BMC Psychiatry. 2017 Dec 01;17(1):281. doi: 10.1186/s12888-017-1440-x. https://bmcpsychiatry.biomedcentral.com/articles/10.1186/s12888-017-1440-x. [PMCID: PMC5563911] [PubMed: 28826398] [CrossRef: 10.1186/s12888-017-1440-x]\n\nDurkin M, Elsabbagh M, Barbaro J, Gladstone M, Happe F, Hoekstra RA, Lee LC, Rattazzi A, StapelWax J, Stone WL, Tager-Flusberg H, Thurm A, Tomlinson M, Shih A. Autism screening and diagnosis in low resource settings: Challenges and opportunities to enhance research and services worldwide. Autism Res. 2015 Oct;8(5):473-6. doi: 10.1002/aur.1575. http://europepmc.org/abstract/MED/26437907. [PMCID: PMC4901137] [PubMed: 26437907] [CrossRef: 10.1002/aur.1575]\n\n\nContext: Acknowledgements and references section of a research article on using machine learning to diagnose autism across different populations.", "metadata": { "doc_id": "Tariq_2019_26", "source": "Tariq_2019" } }, { "page_content": "Text: Khan N, Muslima H, Shilpi AB, Begum D, Parveen M, Akter N, Ferdous S, Nahar K, McConachie H, Darmstadt GL. Validation of rapid neurodevelopmental assessment for 2- to 5-year-old children in Bangladesh. Pediatrics. 2013 Feb;131(2):e486-94. doi: 10.1542/peds.2011-2421. [PubMed: 23359579] [CrossRef: 10.1542/peds.2011-2421]\n\nLord C, Rutter M, Goode S, Heemsbergen J, Jordan H, Mawhood L, Schopler E. Autism diagnostic observation schedule: a standardized observation of communicative and social behavior. J Autism Dev Disord. 1989 Jun;19(2):185-212. [PubMed: 2745388]\n\nLord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. $J$ Autism Dev Disord. 1994 Oct;24(5):659-85. [PubMed: 7814313]\n\nDuda M, Daniels J, Wall DP. Clinical Evaluation of a Novel and Mobile Autism Risk Assessment. J Autism Dev Disord. 2016 Dec;46(6):1953-1961. doi: 10.1007/s10803-016-2718-4. http://europepmc.org/abstract/MED/26873142. [PMCID: PMC4860199] [PubMed: 26873142] [CrossRef: 10.1007/s10803-016-2718-4]\n\nDuda M, Haber N, Daniels J, Wall DP. Crowdsourced validation of a machine-learning classification system for autism and ADHD. Transl Psychiatry. 2017 Dec 16;7(5):e1133. doi: 10.1038/tp.2017.86. 
http://europepmc.org/abstract/MED/28509905. [PMCID: PMC5534954] [PubMed: 28509905] [CrossRef: 10.1038/tp.2017.86]\n\nDuda M, Kosmicki JA, Wall DP. Testing the accuracy of an observation-based classifier for rapid detection of autism risk. Transl Psychiatry. 2014 Aug 12;4:e424. doi: 10.1038/tp.2014.65. http://europepmc.org/abstract/MED/25116834. [PMCID: PMC4150240] [PubMed: 25116834] [CrossRef: 10.1038/tp.2014.65]\n\nDuda M, Ma R, Haber N, Wall DP. Use of machine learning for behavioral distinction of autism and ADHD. Transl Psychiatry. 2016 Feb 09;6:e732. doi: 10.1038/tp.2015.221. http://europepmc.org/abstract/MED/26859815. [PMCID: PMC4872425] [PubMed: 26859815] [CrossRef: 10.1038/tp.2015.221]\n\n\nContext: A list of references cited in a study validating a machine learning approach for autism risk assessment across different datasets (Bangladeshi and US).", "metadata": { "doc_id": "Tariq_2019_27", "source": "Tariq_2019" } }, { "page_content": "Text: Kosmicki J, Sochat V, Duda M, Wall DP. Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning. Transl Psychiatry. 2015 Feb 24;5:e514. doi: 10.1038/tp.2015.7. http://europepmc.org/abstract/MED/25710120. [PMCID: PMC4445756] [PubMed: 25710120] [CrossRef: 10.1038/tp.2015.7]\n\nLevy S, Duda M, Haber N, Wall DP. Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism. Mol Autism. 2017;8:65. doi: 10.1186/s13229-017-0180-6. https://molecularautism.biomedcentral.com/articles/10.1186/s13229-017-0180-6. [PMCID: PMC5735531] [PubMed: 29270283] [CrossRef: 10.1186/s13229-017-0180-6]\n\nTariq Q, Daniels J, Schwartz JN, Washington P, Kalantarian H, Wall DP. Mobile detection of autism through machine learning on home video: A development and prospective validation study. PLoS Med. 2018 Dec;15(11):e1002705. doi: 10.1371/journal.pmed. 1002705. http://dx.plos.org/10.1371/journal.pmed. 1002705. [PMCID: PMC6258501] [PubMed: 30481180] [CrossRef: 10.1371/journal.pmed. 1002705]\n\nWall D, Kosmicki J, Deluca TF, Harstad E, Fusaro VA. Use of machine learning to shorten observation-based screening and diagnosis of autism. Transl Psychiatry. 2012 Apr 10;2:e100. doi: 10.1038/tp.2012.10. http://europepmc.org/abstract/MED/22832900. [PMCID: PMC3337074] [PubMed: 22832900] [CrossRef: 10.1038/tp.2012.10]\n\nRobins DL, Dumont-mathieu TM. Early Screening for Autism Spectrum Disorders. Journal of Developmental \\& Behavioral Pediatrics. 2006;27(Supplement 2):S111-S119. doi: 10.1097/00004703-200604002-00009. [PubMed: 16685177] [CrossRef: 10.1097/00004703-200604002-00009]\n\nLundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017:4565-4774.\n\nZou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2005;67(2):301-320.\n\n\nContext: A list of related publications cited in a study using machine learning to detect autism from home videos, focusing on feature selection and model interpretation.", "metadata": { "doc_id": "Tariq_2019_28", "source": "Tariq_2019" } }, { "page_content": "Text: Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2005;67(2):301-320.\n\nHastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data Mining, Inference, and Prediction. New York: Springer; 2001. pp. 
101-179.\n\nMcKinney W. Pandas: a foundational Python library for data analysis and statistics. Python for High Performance and Scientific Computing. 2011:14.\n\nPedregosa F. Journal of machine learning research. 2011. [2019-04-17]. Scikit-learn: Machine learning in Python https://dl.acm.org/citation.cfm?id=2078195\&preflayout=tabs.\n\nBergstra J, Yamins D, Cox DD. Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. Proceedings of the 12th Python in science conference; 2013; Austin, Texas. 2013.\n\nLundberg S, Erion GG, Lee SI. Cornell University - arXiv.org. 2018. [2019-04-17]. Consistent individualized feature attribution for tree ensembles https://arxiv.org/abs/1802.03888.\n\nHunter J. Matplotlib: A 2D Graphics Environment. Comput Sci Eng. 2007 May;9(3):90-95. doi: 10.1109/MCSE.2007.55. [CrossRef: 10.1109/MCSE.2007.55]\n\nBTRC 2018. [2019-04-17]. Mobile Phone Subscribers in Bangladesh December, 2018 http://www.btrc.gov.bd/content/mobile-phone-subscribers-bangladesh-december-2018.\n\nKhan N, Sultana R, Ahmed F, Shilpi AB, Sultana N, Darmstadt GL. Scaling up child development centres in Bangladesh. Child Care Health Dev. 2018 Dec;44(1):19-30. doi: 10.1111/cch.12530. [PubMed: 29235172] [CrossRef: 10.1111/cch.12530]\n\nFigures and Tables\n\nTable 1 Participant demographics collected from Dhaka Shishu Hospital, Bangladesh.\n\n\nContext: References cited in the study.", "metadata": { "doc_id": "Tariq_2019_29", "source": "Tariq_2019" } }, { "page_content": "Text: Table 1 Participant demographics collected from Dhaka Shishu Hospital, Bangladesh.\n\nDemographic Full cohort $(\mathrm{N}=150)$ $\mathrm{ASD}^{\mathrm{a}}$ cohort $(\mathrm{N}=2$ Age (years), mean (SD) $2.55(0.62)$ $2.51(0.70)$ Gender (male), n (\%) 90 (60) 36 (72) Preterm (ie, <37 weeks), n (\%) $11(0.7)$ $5(10)$ Family income in taka ${ }^{\mathrm{d}}$, n (\%) 1,000-10,000 $16(10.7)$ $0(0)$ $>10,000-30,000$ $33(22)$ $2(4)$ $>30,000$ $101(67.3)$ $48(96)$ Residence, n (\%) Urban 139 (92.7) 50 (100) Semiurban $8(5.3)$ $0(0)$ Rural $3(2)$ $0(0)$ Religion, n (\%) Muslim 141 (94) 44 (88) Hindu $6(4)$ $4(8)$ Christian $1(0.01)$ $1(2)$ Buddhist $2(0.01)$ $1(2)$ Stunted growth, n (\%) Missing stunting information $60(40)$ $4(8)$ No stunting $49(32.7)$ $30(60)$ Stunting $41(27.3)$ $16(32)$ Clinical evaluations, mean (SD) MCHAT $^{\mathrm{e}}$ total score $13.5(3.04)$ ADOS $^{\text {f,g }}$ score Social affect $\mathrm{N} / \mathrm{A}^{\mathrm{h}}$ $11.57(5.30)$ Restricted and repetitive behavior N/A $3.46(3.29)$ Composite N/A $5.14(2.08)$ SLC diagnosis Receptive language delay N/A N/A Expressive language delay N/A N/A Both receptive and expressive language delay N/A N/A\n\n${ }^{a}$ ASD: autism spectrum disorder. ${ }^{\mathrm{b}}$ TD: neurotypical development. ${ }^{\mathrm{c}}$ SLC: speech and language condition. ${ }^{\mathrm{d}} 1$ US $\$=84$ taka. ${ }^{\text {e }}$ MCHAT: Modified Checklist for Autism in Toddlers ${ }^{\mathrm{f}}$ ADOS: Autism Diagnostic Observation Schedule.\n\n${ }^{\mathrm{g}} \mathrm{ADOS}$ was only performed on a subset of 28 children with ASD.
${ }^{\mathrm{h}} \mathrm{N} / \mathrm{A}$ : not available.\n\nFigure 1\n\nResults from the top performing classifiers trained on US clinical score sheet data and tested on Bangladeshi data with an objective to distinguish between ASD and non-ASD. ROC: receiver operating characteristic; AUC: area under the curve; ASD: autism spectrum disorder.\n\nTable 2\n\n\nContext: Demographic characteristics of a cohort of children from Bangladesh used in a study evaluating machine learning models for autism detection.", "metadata": { "doc_id": "Tariq_2019_30", "source": "Tariq_2019" } }, { "page_content": "Text: Table 2\n\nAverage demographic information of the test set calculated by testing the model on 45 videos for both layers.\n\nLayer 1 test set (distinguishing $\mathrm{TD}^{\mathrm{a}}$ from $\mathrm{ASD}^{\mathrm{b}} / \mathrm{SLC}^{\mathrm{c}}$ ): age (years), average (SD): 2 years 7 months (5 months); proportion of males, mean \%: 62; proportion of TD children, mean \%: 33; proportion of children with ASD, mean \%: 33; proportion of children with SLC, mean \%: 33.\n\n${ }^{\mathrm{a}} \mathrm{TD}$ : neurotypical development. ${ }^{\mathrm{b}}$ ASD: autism spectrum disorder. ${ }^{\mathrm{c}}$ SLC: speech and language condition.\n\nFigure 2 (A) ROC curve for layer 1 (distinguishing between children with TD and children with ASD or SLC). (B) ROC curve for layer 2 (distinguishing between ASD and SLC). ASD: autism spectrum disorder; AUC: area under the curve; SLC: speech and language condition; TD: neurotypical development; ROC: receiver operating characteristic.\n\nTable 3\n\nResults from classifiers to distinguish among autism spectrum disorder, speech and language conditions, and neurotypical development. The results distinguish layer 1 (distinguishing neurotypical development from atypical conditions [autism spectrum disorder/speech and language conditions]) and layer 2 (distinguishing autism spectrum disorder from other delays [speech and language conditions]) from those classified as atypical in layer 1.\n\nLayer $1^{\text {a }}$: sensitivity 76\% (SD 4), specificity 58\% (SD 3), unweighted average recall 67\% (SD 1). Layer $2^{\text {b }}$: sensitivity 76\% (SD 6), specificity 77\% (SD 24), unweighted average recall 77\% (SD 9).\n\na Distinguishing neurotypical development from autism spectrum disorder/speech and language conditions. ${ }^{\mathrm{b}}$ Distinguishing autism spectrum disorder from other developmental delays (speech and language conditions).\n\nFigure 3\n\n\nContext: Results of machine learning models applied to video data of Bangladeshi children, showing demographic information of the test set, performance metrics for distinguishing between neurotypical and atypical development, and results for distinguishing autism from other delays.", "metadata": { "doc_id": "Tariq_2019_31", "source": "Tariq_2019" } }, { "page_content": "Text: Figure 3\n\nShapley value distributions for two of the most important features in the rater-adaptive ensemble model. These features measure the child's stereotyped behaviors/repetitive interests and eye contact. They demonstrate that clinical intuition and the inner workings of our classifier align closely. ASD: autism spectrum disorder.\n\nFigure 4\n\nLogistic regression (Elastic Net penalty) classifier, trained on Bangladeshi data and tested on US data as well as a held-out test set of the Bangladeshi data.
AUC: area under the curve.\n\nFigure 5\n\nLogistic regression (Elastic Net penalty) classifier, trained on US data and tested on Bangladeshi data as well as a held-out test set of the US data.\n\nFigure 6\n\nFeature selection analysis. Numbers within the cells indicate the frequency of selection. (A) Feature frequency comparison during cross-fold validation with alpha value 0.1 between Bangladeshi data and US data. (B) Feature frequency comparison during cross-fold validation with alpha value 0.01 between Bangladeshi data and US data.\n\n\nContext: This section presents results from machine learning models used to classify children's development (ASD, SLC, TD) based on video analysis, including Shapley value analysis of key features and performance comparisons across datasets.", "metadata": { "doc_id": "Tariq_2019_32", "source": "Tariq_2019" } }, { "page_content": "Text: Digital Behavioral Phenotyping Detects Atypical Pattern of Facial Expression in Toddlers with Autism\n\nKimberly L. H. Carpenter, Jordan Hashemi, Kathleen Campbell, Steven J. Lippmann, Jeffrey P. Baker, Helen L. Egger, Steven Espinosa, Saritha Vermeer, Guillermo Sapiro, and Geraldine Dawson\n\nAbstract\n\n\nContext: A review of research on facial expressions in autism, culminating in a study using digital behavioral phenotyping to detect atypical facial expression patterns in toddlers.", "metadata": { "doc_id": "carpenter2020 (1)_0", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Abstract\n\nCommonly used screening tools for autism spectrum disorder (ASD) generally rely on subjective caregiver questionnaires. While behavioral observation is more objective, it is also expensive, time-consuming, and requires significant expertise to perform. As such, there remains a critical need to develop feasible, scalable, and reliable tools that can characterize ASD risk behaviors. This study assessed the utility of a tablet-based behavioral assessment for eliciting and detecting one type of risk behavior, namely, patterns of facial expression, in 104 toddlers (ASD $N=22$) and evaluated whether such patterns differentiated toddlers with and without ASD. The assessment consisted of the child sitting on his/her caregiver's lap and watching brief movies shown on a smart tablet while the embedded camera recorded the child's facial expressions. Computer vision analysis (CVA) automatically detected and tracked facial landmarks, which were used to estimate head position and facial expressions (Positive, Neutral, All Other). Using CVA, specific points throughout the movies were identified that reliably differentiate between children with and without ASD based on their patterns of facial movement and expressions (area under the curves for individual movies ranging from 0.62 to 0.73). During these instances, children with ASD more frequently displayed Neutral expressions compared to children without ASD, who had more All Other expressions. The frequency of All Other expressions was driven by non-ASD children more often displaying raised eyebrows and an open mouth, characteristic of engagement/interest. Preliminary results suggest computational coding of facial movements and expressions via a tablet-based assessment can detect differences in affective expression, one of the early, core features of ASD. Autism Res 2020, 00: 1-12.
(C) 2020 International Society for Autism Research and Wiley Periodicals LLC\n\n\nContext: A study exploring the feasibility of using tablet-based computer vision analysis to detect ASD risk behaviors, specifically patterns of facial expression, in toddlers.", "metadata": { "doc_id": "carpenter2020 (1)_1", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Lay Summary: This study tested the use of a tablet in the behavioral assessment of young children with autism. Children watched a series of developmentally appropriate movies and their facial expressions were recorded using the camera embedded in the tablet. Results suggest that computational assessments of facial expressions may be useful in early detection of symptoms of autism.\n\nKeywords: autism; risk behaviors; facial expressions; computer vision; early detection\n\nIntroduction\n\nAutism spectrum disorder (ASD) can be reliably diagnosed as early as 24 months old and the risk signs can be detected as early as 6-12 months old [Dawson \\& Bernier, 2013; Luyster et al., 2009]. Despite this, the average age of diagnosis in the United States remains around 4 years of age [Christensen et al., 2016]. While there is mixed evidence for the stability of autism traits over early childhood [Bieleninik et al., 2017; Waizbard-Bartov et al., 2020], the delay in diagnosis can still impact timely intervention during a critical window of development. In response to this, in 2007 the American Academy of Pediatrics published guidelines supporting the need for all children to be screened for ASD between 18- and 24 -months of age as part of their well-child visits [Myers, Johnson, \\& Council on Children With Disabilities, 2007]. Current screening typically relies on caregiver report, such as the Modified Checklist of ASD in ToddlersRevised with Follow-up (M-CHAT-R/F) (Robins et al., 2014). Evidence suggests that a two-tiered screening approach, including direct observational assessment of the child, improves the positive predictive value of M-CHAT screening by $48 \\%$ [Khowaja, Robins, \\&\n\n\nContext: Early autism detection and intervention strategies.", "metadata": { "doc_id": "carpenter2020 (1)_2", "source": "carpenter2020 (1)" } }, { "page_content": "Text: [^0] [^0]: From the Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, North Carolina, USA (K.L.H.C., J.H., K.C., H.L.E., S.E., S.V., G.D.); Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA (J.H., S.E.); Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA (K.C.); Department of Population Health Sciences, Duke University School of Medicine, Durham, North Carolina, USA (S.J.L.); Department of Pediatrics, Duke University School of Medicine, Durham, North Carolina, USA (J.P.B.); NYU Langone Child Study Center, New York University, New York, New York, USA (H.L.E.); Departments of Biomedical Engineering Computer Science, and Mathematics, Duke University, Durham, North Carolina, USA (G.S.); Duke Institute for Brain Sciences, Duke University, Durham, North Carolina, USA (G.D.) Received April 7, 2020; accepted for publication August 24, 2020 Address for correspondence and reprints: Kimberly L. H. Carpenter, Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, 2608 Erwin Rd #300, Durham, NC 27705. 
E-mail: kimberly.carpenter@duke.edu Published online 00 Month 2020 in Wiley Online Library (wileyonlinelibrary.com) DOI: $10.1002 /$ aur. 2391 (c) 2020 International Society for Autism Research and Wiley Periodicals LLC\n\n\nContext: A research article detailing the affiliations of the authors and acknowledgements related to a study on autism research.", "metadata": { "doc_id": "carpenter2020 (1)_3", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Adamson, 2017] and may reduce ethnic/racial disparities in general screening [Guthrie et al., 2019]. Current tools for observational assessment of ASD signs in infants and toddlers, such as the Autism Observation Scale for Infants (AOSI) and Autism Diagnostic Observation Schedule (ADOS), take substantial time and training to administer, resulting in a shortage of qualified diagnosticians to perform these observational assessments. As such, there remains a critical need to develop feasible, scalable, and reliable tools that can characterize ASD risk behaviors and identify those children who are most in need of follow-up by an ASD specialist. In an effort to address this critical need, we have embarked on a program of research using computer vision analysis (CVA) to develop tools for digitally phenotyping early emerging risk behaviors for ASD [Dawson \\& Sapiro, 2019]. If successful, such digital screening tools have the opportunity to help existing practitioners reach more children and assist in triaging boundary cases for review by specialists.\n\n\nContext: The need for feasible, scalable, and reliable tools to identify ASD risk behaviors in infants and toddlers.", "metadata": { "doc_id": "carpenter2020 (1)_4", "source": "carpenter2020 (1)" } }, { "page_content": "Text: One of the early emerging signs of ASD is a tendency to more often display a neutral facial expression. This pattern is evident in the quality of facial expressions and in sharing emotional expressions with others [Adrien et al., 1993; Baranek, 1999; S. Clifford \\& Dissanayake, 2009; S. Clifford, Young, \\& Williamson, 2007; S. M. Clifford \\& Dissanayake, 2008; Maestro et al., 2002; Osterling, Dawson, \\& Munson, 2002; Werner, Dawson, Osterling, \\& Dinno, 2000]. A restricted range of emotional expression and its integration with eye gaze (e.g., during social referencing) have been found to differentiate children with ASD from typically developing children, as well as those who have other developmental delays, as early as 12 months of age [Adrien et al., 1991; S. Clifford et al., 2007; Filliter et al., 2015; Gangi, Ibanez, \\& Messinger, 2014; Nichols, Ibanez, Foss-Feig, \\& Stone, 2014]. While core features of ASD vary by age, cognitive ability, and language, one of the most stable symptoms from early childhood through adolescences is increased frequency of neutral expression [Bal, Kim, Fok, \\& Lord, 2019]. As such, differences in facial affect may show utility in assessing early risk for ASD.\n\nA recent meta-analysis of facial expression production in autism found that individuals with ASD display facial expressions less often than non-ASD participants and that, when they did display facial expressions, the expressions occurred for shorter durations and were of different quality than non-ASD individuals (Trevisan, Hoskyn, \\& Birmingham, 2018). 
Decreases in the frequency of both emotional facial expressions and the sharing of those expressions with others has been demonstrated across naturalistic interactions [Bieberich \\& Morgan, 2004; Czapinski \\& Bryson, 2003; Dawson, Hill, Spencer, Galpert, \\& Watson, 1990; Mcgee, Feldman, \\& Chernin, 1991; Snow, Hertzig, \\& Shapiro, 1987; Tantam,\n\n\nContext: Facial expressions as an early indicator of Autism Spectrum Disorder (ASD).", "metadata": { "doc_id": "carpenter2020 (1)_5", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Holmes, \\& Cordess, 1993], in lab-based assessments such as during the ADOS [Owada et al., 2018] or the AOSI [Filliter et al., 2015] and in response to emotion-eliciting videos [Trevisan, Bowering, \\& Birmingham, 2016]. Furthermore, higher frequency of neutral expressions correlates with social impairment in children with ASD [Owada et al., 2018] and differentiates them from children with other delays [Bieberich \\& Morgan, 2004; Yirmiya, Kasari, Sigman, \\& Mundy, 1989]. As such, frequency and duration of facial affect is a promising early risk marker for young children with autism.\n\n\nContext: Facial affect as an early risk marker for autism.", "metadata": { "doc_id": "carpenter2020 (1)_6", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Previous research on atypical facial expressions in children with ASD has relied on hand coding of facial expressions, which is time intensive and often requires significant training [Bieberich \\& Morgan, 2004; S. Clifford et al., 2007; Dawson et al., 1990; Gangi et al., 2014; Mcgee et al., 1991; Nichols et al., 2014; Snow et al., 1987]. This approach is not scalable for use in general ASD risk screening or as a behavioral biomarker or outcome assessment for use in large clinical trials. As such, the field has moved toward automating the coding of facial expressions. In one of the earliest studies of this approach, Guha and colleagues demonstrated that children with ASD have atypical facial expressions when mimicking others. However, their technology required the children to wear markers on their face for data capture [Guha, Yang, Grossman, \\& Narayanan, 2018; Guha et al., 2015], which is both invasive and not scalable. More recently, several groups have applied non-invasive CVA technology to measuring affect in older children and adults with ASD within the laboratory setting [Capriola-Hall et al., 2019; Owada et al., 2018; Samad et al., 2018]. This represents an important move toward scalability as CVA approaches do not rely on the presence of physical markers on the face to extract emotion information. Rather, CVA relies on videos of the individual in which features around specific regions on a face (e.g., the mouth and eyes) are extracted. Notably, these features mirror those used by the manually rated facial affect coding system (FACS) [Ekman, 1997]. Both our earlier work [Hashemi et al., 2018] and that of others [Capriola-Hall et al., 2019] have shown good concordance between human coding and CVA rating of facial emotions. 
Furthermore, previous research in adults has demonstrated that CVA can detect neutral facial expressions more reliably than human coders [Lewinski, 2015].\n\n\nContext: The challenges and advancements in automated facial expression analysis for autism spectrum disorder research, moving from manual coding to computer vision-based approaches.", "metadata": { "doc_id": "carpenter2020 (1)_7", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Building on previous work applying CVA in laboratory settings, we have developed a portable tablet-based technology that uses the embedded camera and automatic CVA to code ASD risk behaviors in $<10 \\mathrm{~min}$ across a range of non-laboratory settings (e.g., pediatric clinics, at home, etc.). We developed a series of movies designed to capture children's attention, elicit emotion in response to novel and interesting events, and assess the toddler's ability to sustain attention and share it with others. By embedding\n\nthese movies in a fully automated system on a costeffective tablet whereby the elicited behaviors, in this case the frequency of different patterns of facial affect, are automatically encoded with CVA, we aim to create a tool that is objective, efficient, and accessible. The current analysis focuses on preliminary results supporting the utility of this tablet-based assessment for the detection of facial movement and affect in young children and the use of facial affect to differentiate children with and without ASD. Though facial affect is the focus of the current analysis, the ultimate goal is to combine information across autism risk features collected through the current digital screening tool [e.g., delayed response to name as described in Campbell et al., 2019], to develop a risk score based on multiple behaviors [Dawson \\& Sapiro, 2019]. This information could then be combined with additional measures of risk to enhance screening for ASD.\n\nMethods\n\nParticipants\n\n\nContext: The authors describe the development of a portable tablet-based technology utilizing computer vision analysis (CVA) to assess autism spectrum disorder (ASD) risk behaviors in young children, focusing on preliminary results related to facial affect.", "metadata": { "doc_id": "carpenter2020 (1)_8", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Methods\n\nParticipants\n\nParticipants were 104 children 16-31 months of age (Table 1). Children were recruited at their pediatric primary care visit by a research assistant embedded within the clinic or via referral from their physician, as well as through community advertisement ( $N=4$ in the nonASD group and $N=15$ in the ASD group). For children recruited within the pediatric clinics, recruitment occurred at the 18 - or 24 -month well-child visit at the same time as they received standard screening for ASD with the M-CHAT-R/F. A total of $76 \\%$ of the participants recruited in the clinic by a research assistant chose to participate. Of the participants who chose not to participate, $11 \\%$ indicated that they were not interested in the study, whereas the remainder declined due to not having enough time, having another child to take care of, wanting to discuss with their partner, or their child was already too distressed after the physician visit. All children who enrolled in the study found the procedure engaging enough that they were able to provide adequate data for analysis. 
Because the administration is very brief and non-demanding, data loss was not a significant problem.\n\nExclusionary criteria included known vision or hearing deficits, lack of exposure to English at home, or insufficient English language skills for caregiver's informed consent. Twenty-two children were diagnosed with ASD. The non-ASD comparison group $(N=82)$ was comprised of 74 typically developing children and 8 children with a non-ASD delay, which was defined by a diagnosis of language delay or developmental delay of clinical significance sufficient to qualify for speech or developmental therapy as recorded in the electronic medical record. All caregivers/legal guardians gave written informed consent, and the study protocol was approved by the Duke University Health System IRB.\n\n\nContext: Study participant recruitment and characteristics.", "metadata": { "doc_id": "carpenter2020 (1)_9", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Children recruited from the pediatric primary care settings received screening with a digital version of the M-CHAT-R/F as part of a quality improvement study ongoing in the clinic [Campbell et al., 2017]. Participants recruited from the community received ASD screening with the digital M-CHAT-R/F prior to the tablet assessment. As part of their participation in the study, children who either failed the M-CHAT-R/F or for whom there was caregiver or physician concern about possible ASD underwent diagnostic testing using the ADOS-Toddler (ADOST) [Luyster et al., 2009] conducted by a licensed psychologist or research-reliable examiner supervised by a licensed\n\nTable 1. Sample Demographics\n\nTypically developing $(N=74 ; 71 \\%)$ Non-ASD delay $(N=8 ; 8 \\%)$ ASD $(N=22 ; 21 \\%)$ Age Months [mean (SD)] $21.7(3.8)$ $23.9(3.7)$ $26.2(4.1)$ Sex Female $31(42)$ $3(38)$ $5(23)$ Male $43(58)$ $5(62)$ $17(77)$ Ethnicity/race African American $10(14)$ $1(13)$ $3(14)$ Caucasian $46(62)$ $2(25)$ $10(45)$ Hispanic $1(1)$ $0(0)$ $1(4)$ Other/unknown $17(23)$ $5(62)$ $8(37)$ Insurance ${ }^{a}$ Medicald $11(15)$ $1(14)$ $6(67)$ Non-Medicaid $60(85)$ $6(86)$ $3(33)$ MCHAT result ${ }^{b}$ Positive $1(1)$ $0(0)$ $18(82)$ Negative $73(99)$ $8(100)$ $4(18)$\n\n${ }^{a}$ Insurance status was unknown for $17(16 \\%)$ of participants in this study. ${ }^{\\text {b }}$ Children for whom the MCHAT was negative but received and ASD diagnosis were referred for assessment due to concerns by either the parent or the child's physician.\n\npsychologist. The mean ADOS-T score was 18.00 ( $\\mathrm{SD}=4.67$ ). A subset of the ASD children $(N=13)$ also received the Mullen Scales of Early Learning (Mullen, 1995). The mean IQ based on the Early Learning Composite Score for this subgroup was 63.58 ( $\\mathrm{SD}=25.95$ ). None of the children in the non-ASD comparison group was administered the ADOS or Mullen.\n\n\nContext: Study participant recruitment, screening procedures, and diagnostic assessments.", "metadata": { "doc_id": "carpenter2020 (1)_10", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Children's demographic information was extracted from the child's medical record or self-reported by the caregiver at their study visit. Children in the ASD group were, on average, 4 months younger than the comparison group $(t[102]=4.64, P<0.0001)$. Furthermore, as would be expected, there was a higher proportion of males in the ASD group than in the comparison group, though the difference was not statistically significant ( $\\chi^{2}$ $[1,104]=2.60, P=0.11)$. 
There were no differences in the proportion of racial/ethnic minority children between the two groups $\\left(\\chi^{2}[1,104]=1.20, P=0.27\\right)$. When looking only at the children for which Medicaid status was known, there was no difference in the proportion of children on Medicaid in the ASD and the non-ASD group $\\left(\\chi^{2}[1,87]=1.82, P=0.18\\right)$.\n\nStimuli and Procedure\n\nA series of developmentally appropriate brief movies designed to elicit affect and engage the child's attention were shown on a tablet while the child sat on a caregiver's lap. The tablet was placed on a stand approximately 3 ft away from the child to prevent the child from touching the screen as depicted in previous publications [Campbell et al., 2019; Dawson et al., 2018; Hashemi et al., 2015; Hashemi et al., 2018]. Movies consisted of cascading bubbles ( $2 \\times 30 \\mathrm{sec}$ ), a mechanical bunny ( 66 sec ), animal puppets interacting with each other ( 68 sec ), and a split screen showing a woman singing nursery rhymes on one side and dynamic, noise-making toys on the other side ( 60 sec ; Fig. 1). These movies included stimuli that have been used in previous studies of ASD symptomatology [Murias et al., 2018], as well as developed specifically for the current tablet-based technology to elicit autism symptoms, based on Dawson et al. [2004], Jones, Dawson, Kelly, Estes, and Webb [2017], Jones et al. [2016], and Luyster et al. [2009]. At three points during the movies, the examiner located behind the child called the child's name.\n\n\nContext: Study participant demographics and stimulus/procedure details.", "metadata": { "doc_id": "carpenter2020 (1)_11", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Prior to the administration of the app, caregivers were clearly instructed not to direct their child's attention or in any way try to influence the child's behavior during the assessment. Furthermore, if the caregiver began to try to direct their child's attention, the examiner in the room immediately asked the caregiver to refrain from doing so. If the caregiver persisted, this was noted on our validity form and the administration would have been considered invalid. Researchers stopped the task for one comparison participant due to crying. Researchers restarted the task for three participants with ASD due to difficulty remaining in view of the tablet's camera for more than half of the first video stimulus. If other family members were present during the well-child visit, they were asked to stand behind the caregiver and child so as to not distract the child during the assessment. Additionally, for children assessed during a well-child visit, research assistants were instructed to collect data prior to any planned shots or blood draws.\n\nComputer Vision Analysis\n\nThe frontal camera in the tablet recorded video throughout the experiment at $1280 \\times 720$ resolution and 30 frames per second. The CVA algorithm [Hashemi et al., 2018] first automatically detected and tracked 49 facial landmarks on the child's face [De la Torre et al., 2015]. Head positions relative to the camera were estimated by computing the optimal rotation parameters between the detected landmarks and a 3D canonical face model [Fischler \\& Bolles, 1981]. A \"not visible\" tag was assigned to frames where the face was not detected or the face exhibited drastic yaw ( $>45^{\\circ}$ from center). 
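As a rough sketch of this per-frame visibility tagging (illustrative only: the landmark detection output and the yaw estimate are assumed to come from the CVA pipeline, and the function and variable names below are placeholders rather than the study's implementation):

```python
from typing import Optional, Sequence

YAW_LIMIT_DEG = 45.0  # frames with |yaw| beyond this are treated as "not visible"

def tag_frame(landmarks: Optional[Sequence[tuple]], yaw_deg: Optional[float]) -> str:
    """Tag a single video frame as 'visible' or 'not visible' from detector output.

    landmarks: the detected facial landmarks for the frame, or None if no face was found.
    yaw_deg:   estimated head yaw relative to the camera, in degrees.
    """
    if landmarks is None or yaw_deg is None:
        return "not visible"          # face not detected in this frame
    if abs(yaw_deg) > YAW_LIMIT_DEG:
        return "not visible"          # drastic yaw away from the tablet screen
    return "visible"

# Example: a detected face turned 50 degrees from center is excluded from expression coding
print(tag_frame([(0.0, 0.0)] * 49, 50.0))  # -> "not visible"
```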
We acknowledge that the current method used only indicates whether the child is oriented toward the stimulus and does not track eye movements. For each \"visible\" frame, the probability of expressing three standard categories of facial expressions, Positive, Neutral (i.e., no active facial\n\n\nContext: Data collection procedures and computer vision analysis methods used in an autism assessment study.", "metadata": { "doc_id": "carpenter2020 (1)_12", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Figure 1. Example of movie stimuli. Developmentally appropriate movies consisted of cascading bubbles, a mechanical bunny, animal puppets interacting with each other, and a split screen showing a woman singing nursery rhymes on one side and dynamic, noise-making toys on the other side.\n\n\nContext: Movie stimuli used in autism research.", "metadata": { "doc_id": "carpenter2020 (1)_13", "source": "carpenter2020 (1)" } }, { "page_content": "Text: action unit), or Other (all other expressions), was assigned [Hashemi et al., 2018]. The model for automatic facial expression is an extension of the pose-invariant and cross-modal dictionary learning approach originally described in Hashemi et al. [2015]. During training, the dictionary representation is set up to map facial information between 2D and 3D modalities and is then able to infer discriminative facial information for facial expression recognition even when only 2D facial images are available at deployment. For training, data from the Binghamton University 3D Facial Expression database [Yin, Wei, Sun, Wang, \\& Rosato, 2006] were used, along with synthesized face images with varying poses [see Hashemi et al., 2015 for synthesis details]. Extracted image features and distances between a subset of facial landmarks were used as facial features to learn the robust dictionary. Lastly, using the inferred discriminative 3D and frontal 2D facial features, a multiclass support vector machine [Chang \\& Lin, 2011] was trained to classify the different facial expressions. In recent years, there has been progress on automatic facial expression analysis of both children and toddlers [Dys \\& Malti, 2016; Gadea, Aliño, Espert, \\& Salvador, 2015; Haines et al., 2019; LoBue \\& Thrasher, 2014; Messinger, Mahoor, Chow, \\& Cohn, 2009]. In addition to this, we have previously validated our CVA algorithm against expert human rater coding of facial affect in a subsample of 99 video recordings across 33 participants (ASD $=15$, non-ASD $=18$). This represents $20 \\%$ of the non-ASD sample and a matched group from the ASD sample. The selection of participants for this previously published validation study was based on age distribution to ensure representation across the range of ages for both non-ASD and ASD groups.
This previous work showed strong concordance between CVA and human-rated coding of facial emotion in this data set, with high precision, recall, and $F1$ scores of 0.89, 0.90, and 0.89, respectively [Hashemi et\n\n\nContext: A description of a computational method (CVA) for analyzing facial expressions, including its technical details, training data, and previous validation against human raters.", "metadata": { "doc_id": "carpenter2020 (1)_14", "source": "carpenter2020 (1)" } }, { "page_content": "Text: work showed strong concordance between CVA and human-rated coding of facial emotion in this data set, with high precision, recall, and $F1$ scores of 0.89, 0.90, and 0.89, respectively [Hashemi et al., 2018].\n\n\nContext: Automated facial emotion coding using computer vision algorithms.", "metadata": { "doc_id": "carpenter2020 (1)_15", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Statistical Approach\n\n\nContext: Methods used to analyze facial expression data in autism research.", "metadata": { "doc_id": "carpenter2020 (1)_16", "source": "carpenter2020 (1)" } }, { "page_content": "Text: For each video frame, the CVA algorithm produces a probability value for each expression (Positive, Neutral, Other). We calculated the mean of the probability values for each of the three expression types within nonoverlapping 90-frame (3 sec) intervals, excluding frames when the face was not visible. A 3-sec window was selected as it provided us with a continuous distribution of the emotion probabilities, while still being within the 0.5-4-sec window of a macroexpression (Ekman, 2003). Additionally, for each of the name call events we removed the time window starting at the cue for the name call prompt through the point where $75 \\%$ of the audible name calls actually occurred, plus 150 frames (5 sec). This window was selected based on our previous study [Campbell et al., 2019] showing that orienting tended to occur within a few seconds after a name call. We calculated the proportion of frames the child was not attending to the movie stimulus, based on the \"visible\" and \"not visible\" tags described above, within each 90-frame interval, excluding name call periods. Thus, for each child, we generated four variables (mean probabilities of Positive, Neutral, or Other; and proportion of frames not attending) for each 90-frame interval within each of the five movies. To evaluate differences between ASD and non-ASD children at regular intervals throughout the assessment, we fit a series of bivariate logistic regressions to obtain the odds ratios for the associations between the mean expression probability or attending proportion during a given interval, parameterized as increments of 10 percentage points, and ASD diagnosis. We then fit a series of multivariable logistic regression models, separately for each movie and variable, which included parameters for each of the 3-sec intervals within the movie to predict ASD diagnosis. Given the large number of intervals relative to the small sample size, we used a Least Absolute Shrinkage and Selection Operator (LASSO) penalized regression approach\n\n\nContext: Analysis of facial expressions and attention during movie viewing in children with and without ASD, using computer vision algorithms and statistical modeling.", "metadata": { "doc_id": "carpenter2020 (1)_17", "source": "carpenter2020 (1)" } }, { "page_content": "Text: movie to predict ASD diagnosis. 
Given the large number of intervals relative to the small sample size, we used a Least Absolute Shrinkage and Selection Operator (LASSO) penalized regression approach [Tibshirani, 1996] to select a parsimonious set of parameters representing the intervals within each movie and expression type that were most predictive of ASD diagnosis. For each of the five movies, we then combined the LASSO-selected interval parameters into a full logistic model. When more than one expression parameter was selected for a given interval, we selected the one with the stronger odds ratio estimate. Analyses were conducted with and without age as a covariate. Since the small study size precluded having separate training and validation sets, we used leave-one-out cross-validation to assess model performance. Receiver-operating characteristics (ROC) curves were plotted and the $c$-statistic for the area under the ROC curve was calculated for each movie.\n\n\nContext: Methods for analyzing video data to predict Autism Spectrum Disorder (ASD) diagnosis using machine learning.", "metadata": { "doc_id": "carpenter2020 (1)_18", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Results\n\nFigure 2 depicts the odds ratio analysis using the \"Rhymes and Toys\" movie as one illustrative example. As shown by the variability in the odds ratio estimates, some parts of the movies elicited strongly differential responses in certain patterns of expression (blue window, Fig. 2), while in other sections, there were not substantial differences between the two groups (green window, Fig. 2). Overlaid on the plot are the odds ratios and confidence bands for the interval parameters selected by the expression-specific LASSO models. These selected parameters were then used in the movie-level logistic models for which we calculated classification metrics.\n\nimg-1.jpeg\n\nFigure 2. Time series of odds ratios for the association between the mean expression probability or proportion attending and ASD diagnosis. Using the \"Rhymes\" movie as one illustrative example, lines depict the odds of meeting criteria for ASD (OR > 1) or being in the non-ASD comparison group ( $O R<1$ ) for each of the outcomes of interest for each 3 sec time bin across the movie. Points with error bars are intervals that were selected by the LASSO regression models and included in the final logistic model. The blue window depicts a segment of the movie where there were differential emotional responses between the ASD and non-ASD children. The green window depicts a segment of the movie in which there was no difference in emotional responses between the groups.\n\nimg-2.jpeg\n\nFigure 3. Receiver-operating characteristics (ROC) curves. ROC curves were calculated for predictive ability of expression-specific LASSO selected interval parameters for facial expressions and attention to stimulus for each movie independently.\n\n\nContext: Analysis of facial expressions and attention in children with ASD using movie stimuli and machine learning techniques.", "metadata": { "doc_id": "carpenter2020 (1)_19", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Figure 3 compares the ROC curves for the five final movie-level logistic models after leave-one-out crossvalidation. ROC curves analyses were performed for each video individually. 
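A compact sketch of the modeling steps just described, with hypothetical child-by-interval data: an L1-penalized (LASSO-style) logistic regression selects interval predictors, and a refit model is evaluated with leave-one-out cross-validation and an ROC AUC. The regularization strength and feature matrix are placeholders, not the study's values.

```python
# Illustrative sketch only (random data; regularization strength and feature
# layout are placeholders): L1-penalized (LASSO-style) logistic regression to
# select interval-level predictors, then leave-one-out cross-validated
# probabilities and an ROC AUC, mirroring the analysis described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(104, 30))    # 104 children x 30 per-interval features
y = rng.integers(0, 2, size=104)  # 1 = ASD, 0 = non-ASD (hypothetical labels)

# Step 1: LASSO-style selection of informative intervals.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0] != 0)
if selected.size == 0:            # guard for the toy data
    selected = np.arange(X.shape[1])

# Step 2: refit on the selected intervals with leave-one-out cross-validation.
final = LogisticRegression(max_iter=1000)
proba = cross_val_predict(final, X[:, selected], y, cv=LeaveOneOut(),
                          method="predict_proba")[:, 1]
print("Cross-validated AUC:", roc_auc_score(y, proba))
```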
The model for the \"Rhymes\" movie yielded the strongest predictive ability, with an area under the curve (AUC) of 0.73 (95\% confidence interval [CI] 0.59-0.86), followed by the \"Puppets\" (AUC = 0.67; 95\% CI 0.53-0.80) and the \"Bunny\" (AUC = 0.66; 95\% CI 0.51-0.82) videos. Finally, the two \"Bubbles\" movies that bookend the stimulus set were the least predictive, with AUCs of 0.62 (95\% CI 0.49-0.74) and 0.64 (95\% CI 0.51-0.76), respectively. Because there was a significant difference in age between the ASD and non-ASD comparison groups, we ran a second set of ROC analyses in which age was included as a covariate, shown in Table 2. Results remained significant after including the age covariate.\n\n\nContext: Evaluation of predictive ability of logistic regression models using ROC curve analysis for video stimuli.", "metadata": { "doc_id": "carpenter2020 (1)_20", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Given the preponderance of the Other emotional category in our non-ASD comparison group, we explored: (a) what specific facial movements are driving this category of Other expressions and (b) how it differs from the Neutral expression category. We focused on analyzing movements of facial landmarks and head pose angles and how they differ between the facial expression categories. Our CVA algorithm aligns the facial landmarks of the child to a canonical face model through an affine transformation, which normalizes the landmark locations across all video frames to a common space. This normalization process is commonly used across CVA tasks related to facial analysis because it allows one to analyze and compare landmark locations across different frames or participants. With this alignment step, we were able to quantify the distances between the eye corners and the corners of the eyebrows, the vertical distance between the inner lip points, and the vertical distance between the outer lip points (Fig. 4, right). To interpret features that differentiated the Neutral from the Other facial expression category, we assessed differences between these facial landmark distances for a given child when they were predominantly expressing Neutral versus Other facial expressions. We also included yaw and pitch head pose angles since they may play a role in the alignment process.\n\nTable 2. Comparison of ASD Versus Non-ASD: Area Under the Curve (AUC) Analyses\n\nMovie: AUC without covariates / AUC with age in the model\nBubbles 1: 0.62 / 0.75\nBunny: 0.66 / 0.81\nPuppets: 0.67 / 0.78\nRhymes: 0.73 / 0.83\nBubbles 2: 0.64 / 0.79\n\n\nContext: Analysis of facial landmark distances and head pose angles to differentiate Neutral and \"Other\" facial expressions in children with and without ASD.", "metadata": { "doc_id": "carpenter2020 (1)_21", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Movie: AUC without covariates / AUC with age in the model\nBubbles 1: 0.62 / 0.75\nBunny: 0.66 / 0.81\nPuppets: 0.67 / 0.78\nRhymes: 0.73 / 0.83\nBubbles 2: 0.64 / 0.79\n\nWe focused on three stimuli in which participants exhibited high probabilities of Other expressions, namely, the first Bubbles, Puppets, and Rhymes and Toys videos. Out of the 104 participants, all exhibited frames where both Neutral and Other facial expressions were dominant (probability of expression over 60\%). 
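The alignment-and-distance step described above can be illustrated with a small sketch: a least-squares affine transform maps detected 2D landmarks onto an assumed canonical face, after which eyebrow-to-eye and lip distances are read off. The landmark count and indices here are hypothetical, not the CVA algorithm's actual landmark scheme.

```python
# Illustrative sketch only (simulated landmarks; the 49-point layout and
# indices are hypothetical, not the CVA algorithm's actual scheme): fit a
# least-squares affine transform onto a canonical face, then measure an
# eyebrow-to-eye-corner distance and the mouth opening on the aligned points.
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src (N, 2) onto dst (N, 2)."""
    A = np.hstack([src, np.ones((src.shape[0], 1))])   # (N, 3)
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)   # (3, 2)
    return params

def apply_affine(params, pts):
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ params

canonical = np.random.rand(49, 2) * 100                  # canonical face model
detected = canonical * 1.1 + 5 + np.random.randn(49, 2)  # simulated detection

aligned = apply_affine(fit_affine(detected, canonical), detected)

# Hypothetical landmark indices for one eyebrow/eye-corner pair and the lips.
brow_inner, eye_corner, lip_top, lip_bottom = 4, 10, 31, 34
brow_to_eye = np.linalg.norm(aligned[brow_inner] - aligned[eye_corner])
mouth_opening = abs(aligned[lip_top, 1] - aligned[lip_bottom, 1])
print(brow_to_eye, mouth_opening)
```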
A Wilcoxon signed-rank test, reported as (median difference, $P$ value), indicated that within individual participants for each diagnostic group, the median differences between the distances of Other versus Neutral facial expressions were significantly higher for inner right eyebrows (non-ASD: diff $=2.5, P<8.1 \mathrm{e}-10$; ASD: diff $=4.0, P<1.1 \mathrm{e}-4$), inner left eyebrows (non-ASD: diff $=2.4, P<1.5 \mathrm{e}-9$; ASD: diff $=3.6, P<1.4 \mathrm{e}-4$), outer right eyebrows (non-ASD: diff $=1.2, P<1.0 \mathrm{e}-5$; ASD: diff $=2.3, P<1.6 \mathrm{e}-4$), outer left eyebrows (non-ASD: diff $=0.9, P<3.4 \mathrm{e}-6$; ASD: diff $=1.4$, $P<1.4 \mathrm{e}-3$), and mouth height (non-ASD: diff $=1.5$, $P<1.0 \mathrm{e}-3$), as well as for pitch head pose angles (non-ASD: diff $=2.8, P<8.7 \mathrm{e}-10$; ASD: diff $=4.4, P<1.2 \mathrm{e}-3$); but not for eye heights, lip parting, or yaw head pose angles.\n\nDiscussion\n\n\nContext: Results of a study comparing diagnostic groups (ASD vs. non-ASD) using facial expression analysis of video stimuli, including AUC values and eyebrow movement differences.", "metadata": { "doc_id": "carpenter2020 (1)_22", "source": "carpenter2020 (1)" } }, { "page_content": "Text: The present study evaluated a tablet-administered application comprising carefully designed movies that elicited affective expressions, combined with CVA of the recorded behavioral responses, to identify patterns of facial movement and emotional expression that differentiate toddlers with ASD from those without ASD. We demonstrated that the movies elicited a range of affective facial expressions in both groups. Furthermore, using CVA we found that children with ASD were more likely to display a neutral expression than children without ASD when watching this series of videos, and that the patterns of facial expressions elicited during specific parts of the movies differed between the two groups. We believe this finding has strong face validity that rests on both research and clinical observations of a restricted range of facial expression in children with autism. Furthermore, this replicates a previous finding of our group reporting increased frequency of neutral expression in young children who screened positive on the M-CHAT [Egger et al., 2018]. Together, these preliminary results support the use of engaging brief movies shown on a cost-effective tablet, combined with automated CVA behavioral coding, as an objective and feasible tool for measuring an early emerging symptom of ASD, namely, increased frequency of neutral facial expressions. While the predictive power of emotional expression in some of the videos varied, all but one represented a medium effect, equivalent to Cohen's $d=0.5$ or greater [Rice \& Harris, 2005]. Overall, the best predictor from our battery of videos is the \"Rhymes\" video, which had an AUC with a large effect size (equivalent to $d>0.8$). While this may suggest that presenting the \"Rhymes\" video alone is sufficient for differentiating between the ASD and non-ASD groups, we caution readers against drawing this conclusion for two reasons: First, it is possible that, had we had a larger sample, the other videos would have had a larger effect. 
Second, we anticipate that there will be\n\n\nContext: A study evaluating a tablet application using computer vision analysis (CVA) to identify facial expression patterns differentiating toddlers with and without ASD, reporting findings on neutral expression frequency and predictive power of specific videos.", "metadata": { "doc_id": "carpenter2020 (1)_23", "source": "carpenter2020 (1)" } }, { "page_content": "Text: against drawing this conclusion for two reasons: First, it is possible that, had we had a larger sample, the other videos would have had a larger effect. Second, we anticipate that there will be variability in the ASD group with regard to which features a single child will express, and different videos may be better suited to eliciting different features in any given individual. As such, we believe that it is important to understand how each independent feature, in this case facial affect, performs across the different videos so that we can begin to build better predictive models from combinations of features [e.g., facial affect and postural sway as described in Dawson et al., 2018]. To further understand the difference in facial expression between the non-ASD group and our ASD sample, we explored the facial landmarks differentiating the Other facial expression category, which dominated the non-ASD control group, from the Neutral facial expression, which was more common in the ASD group. Through this analysis, we identified raised eyebrows and an open mouth as features that play a role in discriminating between the Other and Neutral categories. This facial pattern is consistent with an engaged/interested look displayed when a child is actively watching, as described in young children by Sullivan and Lewis [2003]. It is interesting to note that a raised pitch angle was also statistically significant. Since the median difference of this angle between the two facial expressions is small $\left(3.2^{\circ}\right)$, this may simply be a natural accompaniment of raising one's eyebrows. Our results need to be considered in light of several limitations. First, the CVA models of facial expressions used in the current study were trained on adult faces [Hashemi et al., 2018]. Despite this, our previous findings with young children demonstrate good concordance between human and CVA coding on the designation of facial expressions [Hashemi et al., 2018]. Furthermore, the Other facial expression category includes all nonpositive or\n\n\nContext: Analysis of facial expression differences between ASD and non-ASD groups using computer vision algorithms and facial landmark analysis.", "metadata": { "doc_id": "carpenter2020 (1)_24", "source": "carpenter2020 (1)" } }, { "page_content": "Text: good concordance between human and CVA coding on the designation of facial expressions [Hashemi et al., 2018]. Furthermore, the Other facial expression category includes all nonpositive or negative expressions. As such, even though we were able to determine that the predominant feature driving those expressions was the raised eyebrows, which is in line with our observations from watching the movies,\n\n\nContext: Automated analysis of facial expressions in autism research using computer vision.", "metadata": { "doc_id": "carpenter2020 (1)_25", "source": "carpenter2020 (1)" } }, { "page_content": "Text: img-3.jpeg\n\nFigure 4. Analysis of Other Facial Expression. 
The four panels on the left depict heat maps of aligned landmarks across ASD and non-ASD participants when they were exhibiting Neutral and Other facial expressions (the color bar indicates the proportion of frames where landmarks were displayed in a given image location). The single panel on the right is an example of the landmark distances explored. it is possible that there is a combination of facial expressions in the non-ASD group driving this designation. Future studies will need to train on the engaged/interested facial expression specifically and test the robustness of this finding. Additionally, though we have previously demonstrated good reliability between our CVA algorithms and human coding of emotions [Hashemi et al., 2015], future validation of our CVA analysis of emotional facial expressions in larger datasets is currently underway. Second, using the LASSO statistical approach means our model may not select all features that carry differentiating information. However, we selected this approach because it minimizes overfitting of the model. Third, our sample size was relatively small and we did not have separate training and testing samples. To account for this, we applied cross-validation to the ROC curves even though this decreased the performance metrics of the model. This suggests that our ROC results are potentially conservative. Finally, our comparison group contains both children with typical development and children with non-ASD developmental delays, a factor that can be viewed as both a weakness and a strength. Previous research has demonstrated that increased frequency of neutral expressions does differentiate children with ASD from those with other developmental delays [Bieberich \& Morgan, 2004; Yirmiya et al., 1989]. However, due to the small sample size of children with non-\n\n\nContext: Discussion of limitations and future directions for facial expression analysis in autism research.", "metadata": { "doc_id": "carpenter2020 (1)_26", "source": "carpenter2020 (1)" } }, { "page_content": "Text: ASD developmental delays, we were unable to directly test this in our data. Furthermore, because only a subset of the sample received an assessment of cognitive ability, it is possible that there were additional children in the non-ASD comparison group who also had a developmental delay that went undetected. Ongoing research in a prospective, longitudinal study with larger samples is underway to further parse the ability of our CVA tools to differentiate between children with ASD, children with a non-ASD developmental delay and/or attention deficit hyperactivity disorder, and typically developing children.\n\nWhile a difference in facial expression is one core feature of ASD, the heterogeneity of ASD means we do not expect all children with ASD to display this sign. As such, our next step is to combine the current results with other measures of autism risk assessed through the current digital screening tool, including response to name [Campbell et al., 2019], postural sway [Dawson et al., 2018], and differential vocalizations [Tenenbaum et al., 2020], among other features, to develop a risk score based on multiple behaviors [Dawson \& Sapiro, 2019]. Since no one child is expected to display every risk behavior, a goal is to determine thresholds based on the total number of behaviors, regardless of which combination of behaviors is present, to assess risk. 
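As a toy illustration of the threshold-counting idea sketched above, the snippet below tallies how many digitally measured behaviours exceed a cut-off for one child; the feature names and thresholds are invented for illustration and are not validated clinical values.

```python
# Toy illustration only: count how many digitally measured behaviours exceed
# a cut-off for one child. Feature names and thresholds below are invented
# for illustration and are not validated clinical values.
child = {
    "prop_neutral_expression": 0.72,  # from facial-expression analysis
    "orienting_delay_sec": 1.4,       # from response-to-name analysis
    "head_movement_rate": 0.9,        # from postural-sway analysis
}
thresholds = {
    "prop_neutral_expression": 0.65,
    "orienting_delay_sec": 1.0,
    "head_movement_rate": 0.8,
}
risk_count = sum(child[name] > cutoff for name, cutoff in thresholds.items())
print(f"{risk_count} of {len(thresholds)} risk behaviours above threshold")
```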
This is similar to what is done in commonly used screening and diagnostic tools, such as the M-CHAT [Robins et al., 2014], Autism Diagnostic\n\n\nContext: Discussion of limitations and future directions for a digital screening tool for autism spectrum disorder.", "metadata": { "doc_id": "carpenter2020 (1)_27", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Interview [Lord, Rutter, \\& Le Couteur, 1994], and ADOS [Gotham, Risi, Pickles, \\& Lord, 2007; Lord et al., 2000]. In summary, we evaluated an integrated, objective tool for the elicitation and measurement of facial movements and expressions in toddlers with and without ASD. The current study adds to a body of research supporting digital behavioral phenotyping as a viable method for assessing autism risk behaviors. Our goal is to further develop and validate this tool so that it can eventually be used within the context of current standard of care to enhance autism screening in pediatric populations.\n\nAcknowledgments\n\nFunding for this work was provided by NIH R01-MH 120093 (Sapiro, Dawson PIs), NIH RO1-MH121329 (Dawson, Sapiro PIs), NICHD P50HD093074 (Dawson, Kollins, PIs), Simons Foundation (Sapiro, Dawson, PIs), Duke Department of Psychiatry and Behavioral Sciences PRIDe award (Dawson, PI), Duke Education and Human Development Initiative, Duke-Coulter Translational Partnership Grant Program, National Science Foundation, a Stylli Translational Neuroscience Award, and the Department of Defense. Some of the stimuli used for the movies were created by Geraldine Dawson, Michael Murias, and Sara Webb at the University of Washington. This work would not have been possible without the help of Elizabeth Glenn, Elizabeth Adler, and Samuel Marsan. We also gratefully acknowledge the participation of the children and families in this study. Finally, we could not have completed this study without the assistance and collaboration of Duke pediatric primary care providers.\n\nConflict of interests\n\n\nContext: Concluding remarks and acknowledgements of a study developing a digital tool for assessing autism risk behaviors.", "metadata": { "doc_id": "carpenter2020 (1)_28", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Conflict of interests\n\nGuillermo Sapiro has received basic research gifts from Amazon, Google, Cisco, and Microsoft and is a consultant for Apple and Volvo. Geraldine Dawson is on the Scientific Advisory Boards of Janssen Research and Development, Akili, Inc., LabCorp, Inc., Tris Pharma, and Roche Pharmaceutical Company, a consultant for Apple, Inc, Gerson Lehrman Group, Guidepoint, Inc., Teva Pharmaceuticals, and Axial Ventures, has received grant funding from Janssen Research and Development, and is CEO of DASIO, LLC (with Guillermo Sapiro). Dawson receives royalties from Guilford Press, Springer, and Oxford University Press. Dawson, Sapiro, Carpenter, Hashemi, Campbell, Espinosa, Baker, and Egger helped develop aspects of the technology that is being used in the study. The technology has been licensed and Dawson, Sapiro, Carpenter, Hashemi, Espinosa, Baker, Egger, and Duke University have benefited financially.\n\nReferences\n\n\nContext: Disclosure of author conflicts of interest and acknowledgements of contributions and financial benefits related to the study's technology.", "metadata": { "doc_id": "carpenter2020 (1)_29", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Adrien, J. L., Faure, M., Perrot, A., Hameury, L., Garreau, B., Barthelemy, C., \\& Sauvage, D. (1991). 
Autism and family home movies: Preliminary findings. Journal of Autism and Developmental Disorders, 21(1), 43-49. Adrien, J. L., Lenoir, P., Martineau, J., Perrot, A., Hameury, L., Larmande, C., \\& Sauvage, D. (1993). Blind ratings of early symptoms of autism based upon family home movies. Journal of the American Academy of Child and Adolescent Psychiatry, 32(3), 617-626. https://doi.org/10.1097/00004583-199305000-00019 Bal, V. H., Kim, S. H., Fok, M., \\& Lord, C. (2019). Autism spectrum disorder symptoms from ages 2 to 19 years: Implications for diagnosing adolescents and young adults. Autism Research, 12(1), 89-99. https://doi.org/10.1002/aur. 2004 Baranek, G. T. (1999). Autism during infancy: a retrospective video analysis of sensory-motor and social behaviors at 9-12 months of age. Journal of Autism and Developmental Disorders, 29(3), 213-224. Bieberich, A. A., \\& Morgan, S. B. (2004). Self-regulation and affective expression during play in children with autism or Down syndrome: A short-term longitudinal study. Journal of Autism and Developmental Disorders, 34(4), 439-448. https://doi.org/10.1023/b/jadd.0000037420.16169.28 Bieleninik, Ł., Posserud, M.-B., Geretsegger, M., Thompson, G., Elefant, C., \\& Gold, C. (2017). Tracing the temporal stability of autism spectrum diagnosis and severity as measured by the Autism Diagnostic Observation Schedule: A systematic review and meta-analysis. PLoS One, 12(9), e0183160-e0183160. https://doi.org/10.1371/journal.pone. 0183160 Campbell, K., Carpenter, K. L., Hashemi, J., Espinosa, S., Marsan, S., Borg, J. S., ... Dawson, G. (2019). Computer vision analysis captures atypical attention in toddlers with autism. Autism, 23(3), 619-628. https://doi.org/10.1177/ 1362361318766247 Campbell, K., Carpenter, K. L. H., Espinosa, S., Hashemi, J., Qiu, Q., Tepper, M., ... Dawson, G. (2017). Use of a Digital Modified Checklist for Autism in Toddlers-Revised with follow-up to improve\n\n\nContext: Studies utilizing home videos to identify early signs of autism.", "metadata": { "doc_id": "carpenter2020 (1)_30", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Campbell, K., Carpenter, K. L. H., Espinosa, S., Hashemi, J., Qiu, Q., Tepper, M., ... Dawson, G. (2017). Use of a Digital Modified Checklist for Autism in Toddlers-Revised with follow-up to improve quality of screening for autism. The Journal of Pediatrics, 183, 133-139.e1. https://doi.org/10. 1016/j.jpeds.2017.01.021 Capriola-Hall, N. N., Wieckowski, A. T., Swain, D., Tech, V., Aly, S., Youssef, A., ... White, S. W. (2019). Group differences in facial emotion expression in autism: Evidence for the utility of machine classification. Behavior Therapy, 50(4), 828-838. https://doi.org/10.1016/j.beth.2018.12.004 Chang, C.-C., \\& Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 1-27. Christensen, D. L., Baio, J., Braun, K. V., Bilder, D., Charles, J., Constantino, J. N., ... Yeargin-AlIsopp, M. (2016). Prevalence and characteristics of autism spectrum disorder among children aged 8 years-Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012. MMWR Surveillance Summaries, 65(SS-3), 1-23. Clifford, S., \\& Dissanayake, C. (2009). Dyadic and triadic behaviours in infancy as precursors to later social responsiveness in young children with autistic disorder. 
Journal of Autism and\n\n\nContext: A review of research on identifying and assessing autism spectrum disorder.", "metadata": { "doc_id": "carpenter2020 (1)_31", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Developmental Disorders, 39(10), 1369-1380. https://doi. org/10.1007/s10803-009-0748-x Clifford, S., Young, R., \\& Williamson, P. (2007). Assessing the early characteristics of autistic disorder using video analysis. Journal of Autism and Developmental Disorders, 37(2), 301-313. https://doi.org/10.1007/s10803-006-0160-8 Clifford, S. M., \\& Dissanayake, C. (2008). The early development of joint attention in infants with autistic disorder using home video observations and parental interview. Journal of Autism and Developmental Disorders, 38(5), 791-805. https://doi. org/10.1007/s10803-007-0444-7 Czapinski, P., \\& Bryson, S. (2003). Reduced facial muscle movements in autism: Evidence for dysfunction in the neuromuscular pathway? Brain and Cognition, 51(2), 177-179. Dawson, G., \\& Bernier, R. (2013). A quarter century of progress on the early detection and treatment of autism spectrum disorder. Development and Psychopathology, 25(4 Pt 2), 1455-1472. https://doi.org/10.1017/S0954579413000710 Dawson, G., Campbell, K., Hashemi, J., Lippmann, S. J., Smith, V., Carpenter, K., ... Sapiro, G. (2018). Atypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder. Scientific Reports, 8 (1), 17008. https://doi.org/10.1038/s41598-018-35215-8\n\nDawson, G., Hill, D., Spencer, A., Galpert, L., \\& Watson, L. (1990). Affective exchanges between young autistic-children and their mothers. Journal of Abnormal Child Psychology, 18 (3), 335-345. https://doi.org/10.1007/Bf00916569\n\n\nContext: Studies examining facial expressions and behaviors in autism.", "metadata": { "doc_id": "carpenter2020 (1)_32", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Dawson, G., \\& Sapiro, G. (2019). Potential for digital behavioral measurement tools to transform the detection and diagnosis of autism spectrum disorder. JAMA Pediatrics, 173(4), 305-306. https://doi.org/10.1001/jamapediatrics.2018.5269 Dawson, G., Toth, K., Abbott, R., Osterling, J., Munson, J., Estes, A., \\& Liaw, J. (2004). Early social attention impairments in autism: social orienting, joint attention, and attention to distress. Developmental Psychology, 40(2), 271-283. https:// doi.org/10.1037/0012-1649.40.2.271 De la Torre, F., Chu, W. S., Xiong, X., Vicente, F., Ding, X., \\& Cohn, J. (2015). IntraFace. IEEE International Conference on Automatic Face Gesture Recognition Workshops. (pp. 1-8). Ljubljana, Slovenia: IEEE. https://doi.org/10.1109/fg.2015. 7163082 Dys, S. P., \\& Malti, T. (2016). It's a two-way street: Automatic and controlled processes in children's emotional responses to moral transgressions. Journal of Experimental Child Psychology, 152, 31-40. https://doi.org/10.1016/j.jecp.2016. 06.011\n\nEgger, H. L., Dawson, G., Hashemi, J., Carpenter, K. L. H., Espinosa, S., Campbell, K., ... Sapiro, G. (2018). Automatic emotion and attention analysis of young children at home: A ResearchKit autism feasibility study. NPJ Digital Medicine, 1 (1), 20. https://doi.org/10.1038/s41746-018-0024-6\n\n\nContext: Studies exploring digital tools and automated analysis for autism detection and diagnosis, alongside research on early social attention impairments.", "metadata": { "doc_id": "carpenter2020 (1)_33", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Ekman, P. 
(2003). Emotions revealed (2nd ed.). New York, NY: Times Books. Ekman, R. (1997). What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). New York, NY: Oxford University Press. Filliter, J. H., Longard, J., Lawrence, M. A., Zwaigenbaum, L., Brian, J., Garon, N., ... Bryson, S. E. (2015). Positive affect in infant siblings of children diagnosed with autism spectrum disorder. Journal of Abnormal Child Psychology, 43(3), 567-575. https://doi.org/10.1007/s10802-014-9921-6 Fischler, M. A., \\& Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the Association for Computing Machinery, 24(6), 381-395. https://doi.org/10.1145/358669.358692 Gadea, M., Aliño, M., Espert, R., \\& Salvador, A. (2015). Deceit and facial expression in children: The enabling role of the \"poker face\" child and the dependent personality of the detector. Frontiers in Psychology, 6, 1089-1089. https://doi. org/10.3389/fpsyg.2015.01089 Gangi, D. N., Ibanez, L. V., \\& Messinger, D. S. (2014). Joint attention initiation with and without positive affect: Risk group differences and associations with ASD symptoms. Journal of Autism and Developmental Disorders, 44(6), 1414-1424. https://doi.org/10.1007/s10803-013-2002-9 Gotham, K., Risi, S., Pickles, A., \\& Lord, C. (2007). The Autism Diagnostic Observation Schedule: Revised algorithms for improved diagnostic validity. Journal of Autism and Developmental Disorders, 37(4), 613-627. https://doi.org/10.1007/ s10803-006-0280-1 Guha, T., Yang, Z., Grossman, R. B., \\& Narayanan, S. S. (2018). A computational study of expressive facial dynamics in children with autism. IEEE Transactions on Affective Computing, 9(1), 14-20. https://doi.org/10.1109/taffc.2016. 2578316 Guha, T., Yang, Z., Ramakrishna, A., Grossman, R. B., Darren, H., Lee, S., \\& Narayanan, S. S. (2015). On quantifying facial expression-related atypicality of children with\n\n\nContext: Facial expression research and measurement tools used in autism studies.", "metadata": { "doc_id": "carpenter2020 (1)_34", "source": "carpenter2020 (1)" } }, { "page_content": "Text: 2578316 Guha, T., Yang, Z., Ramakrishna, A., Grossman, R. B., Darren, H., Lee, S., \\& Narayanan, S. S. (2015). On quantifying facial expression-related atypicality of children with autism spectrum disorder. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2015, 803-807. https://doi.org/10.1109/icassp.2015.7178080 Guthrie, W., Wallis, K., Bennett, A., Brooks, E., Dudley, J., Gerdes, M., ... Miller, J. S. (2019). Accuracy of autism screening in a large pediatric network. Pediatrics, 144(4), e20183963. https://doi.org/10.1542/peds.2018-3963 Haines, N., Bell, Z., Crowell, S., Hahn, H., Kamara, D., McDonough-Caplan, H., ... Beauchaine, T. P. (2019). Using automated computer vision and machine learning to code facial expressions of affect and arousal: Implications for emotion dysregulation research. Development and Psychopathology, 31(3), 871-886. https://doi.org/10.1017/S09545794 19000312\n\n\nContext: Studies utilizing computer vision and machine learning to analyze facial expressions in children with autism spectrum disorder.", "metadata": { "doc_id": "carpenter2020 (1)_35", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Hashemi, J., Campbell, K., Carpenter, K., Harris, A., Qiu, Q., Tepper, M., ... Calderbank, R. (2015). 
A scalable app for measuring autism risk behaviors in young children: A technical validity and feasibility study. Paper presented at the Proceedings of the 5th EAI International Conference on Wireless Mobile Communication and Healthcare, Dublin, Ireland. Hashemi, J., Dawson, G., Carpenter, K. L. H., Campbell, K., Qiu, Q., Espinosa, S., ... Sapiro, G. (2018). Computer vision analysis for quantification of autism risk behaviors. IEEE Transactions on Affective Computing, 1-1. https://doi.org/ 10.1109/taffc.2018.2868196\n\nJones, E. J., Dawson, G., Kelly, J., Estes, A., \\& Webb, S. J. (2017). Parent-delivered early intervention in infants at risk for ASD: Effects on electrophysiological and habituation measures of social attention. Autism Research, 10(5), 961-972.\n\n\nContext: Technological tools and computer vision for autism risk behavior assessment.", "metadata": { "doc_id": "carpenter2020 (1)_36", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Jones, E. J., Venema, K., Earl, R., Lowy, R., Barnes, K., Estes, A., ... Webb, S. (2016). Reduced engagement with social stimuli in 6-month-old infants with later autism spectrum disorder: A longitudinal prospective study of infants at high familial risk. Journal of Neurodevelopmental Disorders, 8(1), 7. Khowaja, M., Robins, D. L., \\& Adamson, L. B. (2017). Utilizing two-tiered screening for early detection of autism spectrum disorder. Autism, 22, 881-890. https://doi.org/10.1177/ 1362361317712649 Lewinski, P. (2015). Automated facial coding software outperforms people in recognizing neutral faces as neutral from standardized datasets. Frontiers in Psychology, 6, 1386. https://doi.org/10.3389/fpsyg.2015.01386 LoBue, V., \\& Thrasher, C. (2014). The Child Affective Facial Expression (CAFE) set: Validity and reliability from untrained adults. Frontiers in Psychology, 5, 1532. https://doi.org/10. 3389/fpsyg.2014.01532 Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Jr., Leventhal, B. L., DiLavore, P. C., ... Rutter, M. (2000). The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30 (3), 205-223.\n\n\nContext: Early detection and assessment tools for autism spectrum disorder.", "metadata": { "doc_id": "carpenter2020 (1)_37", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Lord, C., Rutter, M., \\& Le Couteur, A. (1994). Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 24(5), 659-685. Luyster, R., Gotham, K., Guthrie, W., Coffing, M., Petrak, R., Pierce, K., ... Lord, C. (2009). The Autism Diagnostic Observation Schedule-toddler module: A new module of a standardized diagnostic measure for autism spectrum disorders. Journal of Autism and Developmental Disorders, 39(9), 1305-1320. https://doi.org/10.1007/s10803-009-0746-z Maestro, S., Muratori, F., Cavallaro, M. C., Pei, F., Stern, D., Golse, B., \\& Palacio-Espasa, F. (2002). Attentional skills during the first 6 months of age in autism spectrum disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 41(10), 1239-1245. https://doi.org/10.1097/00004583-200210000-00014 Mcgee, G. G., Feldman, R. S., \\& Chernin, L. (1991). A comparison of emotional facial display by children with autism and typical preschoolers. Journal of Early Intervention, 15(3), 237-245. 
https://doi.org/10.1177/105381519101500303 Messinger, D. S., Mahoor, M. H., Chow, S. M., \\& Cohn, J. F. (2009). Automated measurement of facial expression in infant-mother interaction: A pilot study. Infancy, 14(3), 285-305. https://doi.org/10.1080/15250000902839963 Mullen, E. M. (1995). Mullen scales of early learning. Circle Pines, MN: American Guidance Service Inc. Murias, M., Major, S., Compton, S., Buttinger, J., Sun, J. M., Kurtzberg, J., \\& Dawson, G. (2018). Electrophysiological biomarkers predict clinical improvement in an open-label trial assessing efficacy of autologous umbilical cord blood for treatment of autism. Stem Cells Translational Medicine, 7(11), 783-791. https://doi.org/10.1002/sctm.18-0090 Myers, S. M., Johnson, C. P., \\& Council on Children With Disabilities. (2007). Management of children with autism spectrum disorders. Pediatrics, 120(5), 1162-1182. https://\n\n\nContext: Standardized diagnostic measures and early development in autism.", "metadata": { "doc_id": "carpenter2020 (1)_38", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Myers, S. M., Johnson, C. P., \\& Council on Children With Disabilities. (2007). Management of children with autism spectrum disorders. Pediatrics, 120(5), 1162-1182. https:// doi.org/10.1542/peds.2007-2362 Nichols, C. M., Ibanez, L. V., Foss-Feig, J. H., \\& Stone, W. L. (2014). Social smiling and its components in high-risk infant siblings without later ASD symptomatology. Journal of Autism and Developmental Disorders, 44(4), 894-902. https://doi.org/10.1007/s10803-013-1944-2 Osterling, J. A., Dawson, G., \\& Munson, J. A. (2002). Early recognition of 1-year-old infants with autism spectrum disorder versus mental retardation. Development and Psychopathology, 14(2), 239-251. Owada, K., Kojima, M., Yassin, W., Kuroda, M., Kawakubo, Y., Kuwabara, H., ... Yamasue, H. (2018). Computer-analyzed facial expression as a surrogate marker for autism spectrum social core symptoms. PLoS One, 13(1), e0190442. https:// doi.org/10.1371/journal.pone. 0190442 Rice, M. E., \\& Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area, Cohen's $d$, and $r$. Law and Human Behavior, 29(5), 615-620. https://doi.org/10.1007/ s10979-005-6832-7 Robins, D. L., Casagrande, K., Barton, M., Chen, C. M., DumontMathieu, T., \\& Fein, D. (2014). Validation of the modified checklist for autism in toddlers, revised with follow-up (M-CHAT-R/F). Pediatrics, 133(1), 37-45. https://doi.org/10. 1542/peds.2013-1813 Samad, M. D., Diawara, N., Bobzien, J. L., Harrington, J. W., Witherow, M. A., \\& Iftekharuddin, K. M. (2018). A feasibility study of autism behavioral markers in spontaneous facial, visual, and hand movement response data. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(2), 353-361. https://doi.org/10.1109/tnsre.2017. 2768482 Snow, M. E., Hertzig, M. E., \\& Shapiro, T. (1987). Expression of emotion in young autistic children. Journal of the American Academy of Child and Adolescent Psychiatry, 26 (6), 836-838. https://doi.org/10.1097/00004583-19872606000006 Sullivan, M. W., \\& Lewis, M. (2003). Emotional\n\n\nContext: A list of references related to autism spectrum disorder research.", "metadata": { "doc_id": "carpenter2020 (1)_39", "source": "carpenter2020 (1)" } }, { "page_content": "Text: autistic children. Journal of the American Academy of Child and Adolescent Psychiatry, 26 (6), 836-838. https://doi.org/10.1097/00004583-19872606000006 Sullivan, M. W., \\& Lewis, M. (2003). 
Emotional expressions of young infants and children-A practitioner's primer. Infants and Young Children, 16(2), 120-142. https://doi.org/10. 1097/00001163-200304000-00005 Tantam, D., Holmes, D., \\& Cordess, C. (1993). Nonverbal expression in autism of Asperger type. Journal of Autism and Developmental Disorders, 23(1), 111-133. Tenenbaum, E. J., Carpenter, K. L. H., Sabatos-DeVito, M., Hashemi, J., Vermeer, S., Sapiro, G., \\& Dawson, G. (2020). A six-minute measure of vocalizations in toddlers with autism spectrum disorder. Autism Research, 13(8), 1373-1382. https://doi.org/10.1002/aur. 2293 Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288. https://doi.org/10.1111/j. 2517-6161.1996.tb02080.x Trevisan, D. A., Bowering, M., \\& Birmingham, E. (2016). Alexithymia, but not autism spectrum disorder, may be related to the production of emotional facial expressions. Molecular Autism, 7, 46. https://doi.org/10.1186/s13229-016-0108-6\n\n\nContext: A list of references cited in a document about autism and facial expressions.", "metadata": { "doc_id": "carpenter2020 (1)_40", "source": "carpenter2020 (1)" } }, { "page_content": "Text: Trevisan, D. A., Hoskyn, M., \\& Birmingham, E. (2018). Facial expression production in autism: A meta-analysis. Autism Research, 11(12), 1586-1601. https://doi.org/10.1002/aur. 2037 Waizbard-Bartov, E., Ferrer, E., Young, G. S., Heath, B., Rogers, S., Wu Nordahl, C., ... Amaral, D. G. (2020). Trajectories of autism symptom severity change during early childhood. Journal of Autism and Developmental Disorders. https://doi. org/10.1007/s10803-020-04526-z Werner, E., Dawson, G., Osterling, J., \\& Dinno, N. (2000). Brief report: Recognition of autism spectrum disorder before one year of age: A retrospective study based on home videotapes.\n\nJournal of Autism and Developmental Disorders, 30(2), $157-162$. Yin, L., Wei, X., Sun, Y., Wang, J., \\& Rosato, M. J. (2006). A 3D facial expression database for facial behavior research. Paper presented at the 7th International Conference on automatic face and gesture recognition (FGR06), University of Southampton, Southampton, UK. Yirmiya, N., Kasari, C., Sigman, M., \\& Mundy, P. (1989). Facial expressions of affect in autistic, mentally retarded and normal children. 
Journal of Child Psychology and Psychiatry, 30(5), $725-735$.\n\n\nContext: Studies examining facial expressions and behavior in autism, including databases and retrospective analyses.", "metadata": { "doc_id": "carpenter2020 (1)_41", "source": "carpenter2020 (1)" } }, { "page_content": "Text: PLOS ONE\n\nRESEARCH ARTICLE\n\nEarly screening of autism spectrum disorder using cry features\n\nAida Khozaei ${ }^{1}$, Hadi Moradi ${ }^{1,2 *}$, Reshad Hosseini ${ }^{1}$, Hamidreza Pouretemad ${ }^{3}$, Bahareh Eskandari ${ }^{3}$\n\n1 School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, 2 Intelligent Systems Research Institute, SKKU, Suwon, South Korea, 3 Department of Psychology, Shahid Beheshti University, Tehran, Iran\n\nmoradih@ut.ac.ir\n\nAbstract\n\n\nContext: A research paper investigating the potential of using cry features for early screening of autism spectrum disorder.", "metadata": { "doc_id": "Asd_Cry_patterns_0", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: moradih@ut.ac.ir\n\nAbstract\n\nThe increase in the number of children with autism and the importance of early autism intervention has prompted researchers to perform automatic and early autism screening. Consequently, in the present paper, a cry-based screening approach for children with Autism Spectrum Disorder (ASD) is introduced which would provide both early and automatic screening. During the study, we realized that ASD specific features are not necessarily observable in all children with ASD and in all instances collected from each child. Therefore, we proposed a new classification approach to be able to determine such features and their corresponding instances. To test the proposed approach a set of data relating to children between 18 to 53 months which had been recorded using high-quality voice recording devices and typical smartphones at various locations such as homes and daycares was studied. Then, after preprocessing, the approach was used to train a classifier, using data for 10 boys with ASD and 10 Typically Developed (TD) boys. The trained classifier was tested on the data of 14 boys and 7 girls with ASD and 14 TD boys and 7 TD girls. The sensitivity, specificity, and precision of the proposed approach for boys were $85.71 \\%, 100 \\%$, and $92.85 \\%$, respectively. These measures were $71.42 \\%, 100 \\%$, and $85.71 \\%$ for girls, respectively. It was shown that the proposed approach outperforms the common classification methods. Furthermore, it demonstrated better results than the studies which used voice features for screening ASD. To pilot the practicality of the proposed approach for early autism screening, the trained classifier was tested on 57 participants between 10 to 18 months. These 57 participants consisted of 28 boys and 29 girls and the results were very encouraging for the use of the approach in early ASD screening.\n\nIntroduction\n\n\nContext: A study introducing and testing a cry-based approach for early and automatic Autism Spectrum Disorder (ASD) screening, including performance evaluation and a pilot study on infants.", "metadata": { "doc_id": "Asd_Cry_patterns_1", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Introduction\n\nChildren with Autism Spectrum Disorder (ASD) are defined by their abnormal or impaired development in social interaction and communication, as well as restricted and repetitive behaviors, interests, or activities [1]. 
The rapid growth of ASD in the past 20 years has inspired many research efforts toward the diagnosis and rehabilitation of ASD [2-5]. In the field of\n\n0622770.v1 Harvard Dataverse (Contains only a rar file of sounds): 10.7910/DVN/LSTBOW\n\nFunding: HM received a small fund for collecting data and for diagnosing the subjects. Grant number 123 Cognitive Sciences and Technology Council of Iran cogc.ir The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\n\n\nContext: A research paper investigating ASD diagnosis and rehabilitation, specifically mentioning the increasing prevalence of ASD and introducing the study's focus.", "metadata": { "doc_id": "Asd_Cry_patterns_2", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Competing interests: The authors have declared that no competing interests exist. diagnosis, there are several well-established manual methods to diagnose children over 18 months [6]. However, the practical average age of diagnosis is over 3 years due to the lack of knowledge about ASD and the lack of expertise for diagnosing autism [7, 8]. It is of the utmost importance to have early diagnosis/screening in order to provide early intervention which is more effective at the first few years of life than later on [7, 9-11]. It is shown that early intervention improves the developmental performance in children with ASD [12]. It has also been reported that early interventions would be cost saving for families and the treatment service systems [13, 14]. Consequently, there are two main questions: 1) can autism be screened earlier than 18 months to reduce the typical diagnosis or intervention age and 2 ) is it possible to employ intelligent methods for the screening of autism to eliminate the widespread need for experts? It should be mentioned that our goal was to answer these questions with respect to screening all children who may not have clear symptoms. The screened children should go through a diagnosis procedure to acquire confirmation and/or be cautiously worked with.\n\n\nContext: The importance of early autism diagnosis and the need for improved screening methods to address delays in diagnosis and reliance on expert assessments.", "metadata": { "doc_id": "Asd_Cry_patterns_3", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Fortunately, there are studies in the literature showing that the age of diagnosis can be lower than 18 months. For example, Thabtah and Peebles [15] reviewed several questionnairebased approaches that may be able to screen ASD above 6 months of age. However, those approaches, like Autism Diagnostic Interview-Revised (ADI-R) [16] and Autism Diagnostic Observation Schedule (ADOS) [17] which have been clinically proven to be effective and adequate, are time-consuming instruments [15] and need trained practitioners to use them. To reduce the dependency on the human expertise needed in using such questionnaires [8], several studies proposed machine learning methods to classify children with ASD [18, 19] using questionnaires. Their goal was to automate the process and/or find an optimum subset of questions or features. For instance, Abbas et al. [20] proposed a multi-modular assessment system combined of three modules, a parent questionnaire, a clinician questionnaire, and a video assessment module. 
Although the authors used machine learning to automate and improve classification process, the need for human involvement still exists in order to answer questions or assess videos.\n\n\nContext: The challenges of early autism spectrum disorder (ASD) diagnosis and the limitations of existing assessment tools.", "metadata": { "doc_id": "Asd_Cry_patterns_4", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: On the other hand, Emerson et. al showed that fMRI [21] can be used to predict the diagnosis of autism at the age of 2 in high-risk 6-month-old infants. Denisova and Zhao [22] used movement data from rs-fMRI from 1-2 month-old infants to predict future atypical developmental trajectories as biological features. Furthermore, Bosl, Tager-Flusberg, and Nelson [23] suggested that useful biomarkers can be extracted from EEG signals for early detection of autism. Blood-based markers [24, 25] and prenatal immune markers [26] were also proposed to diagnose ASD that can be used right after birth. Although these approaches suggest new directions towards early ASD diagnosis/screening, they are costly, require expertness and dedicated equipment, which would limit their usage. Furthermore, these methods are still in the early stages of research and require further approval. Finally, approaches which involve methods such as fMRI or EEG, are difficult to use on children, especially on children with autism who may have trouble following instructions appropriately [27], have atypical behaviors [28], or have excessive head movements [29, 30].\n\nThere are studies that used vocalization-based analysis to screen children with autism. For instance, Brisson et al. [31] showed differences in voice features between children with ASD and Typically Developing (TD) children. Several studies, like [32], used speech-related features for the screening of children older than 2. To reach the goal of early ASD screening, vocalizations of infants under 2 years of age have been investigated [33-35]. Santos et al. [33] used vocalizations, such as babbling, to screen ASD children at the age of 18 months. They collected data from 23 and 20 ASD and TD children, respectively. They reported high accuracy of around $97 \\%$ which can be due to the fact that they used k-fold cross-validation without considering subject-wise hold out in order to have unseen subjects in the test fold [36]. Oller et al.\n\n\nContext: Early detection methods for Autism Spectrum Disorder (ASD), contrasting advanced techniques like fMRI and EEG with vocalization-based analysis.", "metadata": { "doc_id": "Asd_Cry_patterns_5", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: [34] proposed another vocalization-based classification method in which they included age and excluded crying. They applied the method on 106 TD children and 77 children with ASD between 16 to 48 months and reached $86 \\%$ accuracy. Pokorny et al. [35] extracted eGeMAPS parameter set [37], which includes 88 acoustic parameters, in 10 month old children. This set consists of statistics calculated for 25 frequency-related, energy-related, and spectral low-level descriptors. They reached a $75 \\%$ accuracy on a population of 10 TD children and 10 children with ASD.\n\nEsposito, Hiroi, and Scattoni [38] showed that crying is a promising biomarker for the screening of ASD children. Sheinkopf et al. [39] and Orlandi et al. [40] have shown that there are differences in the cry of children with ASD compared to TD children. 
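The subject-wise hold-out concern raised above (test folds should contain only unseen children, otherwise instance-level k-fold cross-validation inflates accuracy) can be illustrated with scikit-learn's GroupKFold; the features, labels, and group sizes below are hypothetical.

```python
# Illustrative sketch only (random features, labels, and group sizes):
# subject-wise cross-validation with GroupKFold keeps every cry instance from
# a given child in a single fold, so each test fold contains only unseen
# children, in contrast to naive instance-level k-fold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(2)
n_instances = 359                        # cry instances (hypothetical features)
X = rng.normal(size=(n_instances, 40))   # e.g., acoustic features per instance
child_id = rng.integers(0, 40, size=n_instances)  # child each instance came from
y = child_id % 2                         # hypothetical per-child labels (ASD vs. TD)

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         groups=child_id, cv=GroupKFold(n_splits=5))
print("Subject-wise CV accuracy:", scores.mean())
```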
To the best of our knowledge, our own group's preliminary study [41] was the only research that has used cry sounds for the screening of children with ASD. We used a dataset of 5 children with ASD and 4 TD children older than two years. The accuracy of the proposed method is $96.17 \\%$ using kfold cross validation without considering subject-wise hold out, which is a shortcoming of this study. In other words, it has been overfitted to the available data and may fail to correctly classify new samples. So, a thorough examination using an unseen test set on cry features is necessary to evaluate the results. It should be noted that the data from our previous study [41] could not be used in the study presented in this paper due to the differences in data collection procedures.\n\n\nContext: A review of existing research on using cry sounds for autism screening and diagnosis.", "metadata": { "doc_id": "Asd_Cry_patterns_6", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: In all the above studies, it was assumed that the specific sound features, distinguishing children with ASD from TD children, are common among all the ASD cases. However, this may not be the case for all the features. For instance, tiptoe walking, which is one of the repetitive behaviors of children with ASD, appears in approximately $25 \\%$ of these children [42]. Consequently, in the current study, we propose a new cry-based approach for screening children with ASD. Our screening approach makes use of the assumption that all discriminative characteristics of autism may not appear in all ASD children. This assumption is in contrast with the assumption put forward in the ordinary instance-based machine learning methods, which assumes that all instances of a class include all discriminative features needed for classification. In our proposed method, at first, discriminable instances of cries, which exist in subsets of children with ASD, are found. Then it uses these instances to select features to distinguish between these ASD instances from TD instances. It should be mentioned that the final selected features, in this study, are common among our set of children with ASD between 18 to 53 months of age. These selected features support the experiential knowledge of our experts stating that the variations in the cries of children with ASD are more than TD children. This approach is different from the other approaches that either used a dataset of children with a specific age [33, 35] or used age information for classification [34]. The proposed approach has been implemented and tested on 62 participants. The results show the effectiveness of the approach with respect to accuracy, sensitivity, and specificity.\n\nMethod\n\nSince this study was performed on human subjects, first, it was approved by the ethics committee at Shahid Beheshti University of Medical Sciences and Health Services. All the parents of the participants were informed about the study and signed an agreement form before being included in the study.\n\n\nContext: A study proposing a new cry-based approach for screening children with Autism Spectrum Disorder (ASD), contrasting it with existing methods and detailing its implementation and testing.", "metadata": { "doc_id": "Asd_Cry_patterns_7", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Participants\n\nThere were 62 participants aged between 18 and 53 months, who were divided into two groups, i.e. 31 ASD and 31 TD with 24 boys and 7 girls in each group. 
Since we expected to\n\nhave different vocalization characteristics for boys and girls, the training set was assembled of only boys, including 10 TD , and 10 ASD. In other words, we wanted to eliminate the gender effects on the feature extraction and model training. Unfortunately, due to the lower number of girls with ASD in the real world, not enough data for girls with ASD could be collected. Nonetheless, the model was also tested on the girls to see how it would generalize even on them.\n\n\nContext: Study methodology and participant demographics.", "metadata": { "doc_id": "Asd_Cry_patterns_8", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: The inclusion criteria of the ASD participants were: a) being very recently diagnosed with ASD based on DSM-5 with no other neurodevelopmental, mental, and intellectual disorder, b) having no other known medical or genetic conditions, or environmental factors, and c) not having received any treatment or medication, or having received treatment in less than a month. There were only two girls who did not fall into these criteria since they had been diagnosed more than a year before. The participants' average language development at the time of participation, which was assessed based on [43-46], was equal to children between 6 to 12 months old. The autism diagnosis procedure started with the Gilliam Autism Rating Scale-Second Edition (GARS-2) questionnaire [47] which was answered by the parents. Then the parents were interviewed, based on DSM-5, while the participants were evaluated and observed by two child clinical psychologists with Ph.D. degrees. In addition, the diagnosis of ASD was separately confirmed by at least one child psychiatrist in a different setting. It should be noted that ADOS, which is a very common diagnostic tool is not administered widely in Iran since there is no official translation of ADOS in Farsi. TD children were selected from those in an age range similar to the ASD participants from volunteer families from their homes and health centers. They had no evidence or official diagnosis of any neurological or psychological disorder at the time of recording their voices. The children with ASD were older than 20 months with the mean, standard deviation, and range of $35.6,8.8$, and 33 months respectively. The TD children were younger than 51 months with the mean, standard deviation, and range of about $30.8,10.3$, and 33 months respectively. It should be mentioned that the diagnosis of the children under 3 years was mainly based on experts' evaluation, not the GARS score. Furthermore, all TD participants under 3 years of age had a follow up study when they passed the age of 3 , to make sure the\n\n\nContext: Methods section describing participant selection and diagnostic procedures for an autism study.", "metadata": { "doc_id": "Asd_Cry_patterns_9", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: under 3 years was mainly based on experts' evaluation, not the GARS score. Furthermore, all TD participants under 3 years of age had a follow up study when they passed the age of 3 , to make sure the initial TD assignment was correct or still valid. 
To do so, we used a set of expert-selected questions based on [48] to assess them through interviews with parents.\n\n\nContext: Methodology: Participant selection and assessment of autism spectrum disorder.", "metadata": { "doc_id": "Asd_Cry_patterns_10", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Tables 1 and 2 show the details of the participants on the training and test sets, respectively. In each table, the number of voice instances from each participant and the total duration of all its instances in seconds are shown in columns 3 and 4 , respectively. The recording device category, i.e. a high-quality recorder (HQR) and typical cell phones (CP), is given in the device category column. The next two columns include GARS-2 scores and the language developmental milestone of the participants with ASD at the time of the recording. In six cases, there were no GARS score available at the time of study, demonstrated by ND (No Data). The column labeled as 'Place' shows the location of the recording which can be in homes (H), autism centers (C1, C2, and C3), and health centers (C4, C5, and C6). There was a total number of 359 samples for all children. $53.44 \\%$ of the samples were from ASD participants and $46.56 \\%$ were from TD participants.\n\nTwo groups of 10 TD and 10 ASD children were selected for training the classifiers such that two groups were as balanced as possible with respect to age and the recording device. Thus, each child in the TD group had a corresponding child in the ASD group around the same age. As a result of this data balancing, we obtained training participants with an age between 20 and 51 months. The mean ages in the training set were 32.7 and 35.2 months for ASD and TD participants, respectively. The standard deviations are 9 and 9.9 months with the range of 25 and 30 months for ASD and TD participants, respectively.\n\nTable 1. The training set data of participants.\n\n\nContext: A study analyzing voice samples from autistic and typically developing children to train classifiers.", "metadata": { "doc_id": "Asd_Cry_patterns_11", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Table 1. The training set data of participants.\n\nID Age (month) # of instances Total duration(sec) Device GARS score Language milestone (month) Place Reason for crying ASD1 20 9 7.8 CP 104 $0-6$ C1 ASD2 24 3 1.5 HQR 83 $0-6$ C2 ASD3 26 5 2.1 HQR 120 $0-6$ C1 ASD4 28 13 9.1 HQR 121 $0-6$ C2 ASD5 29 14 26 HQR 89 6-12 C2 ASD6 31 4 2.4 HQR 87 $0-6$ C2 ASD7 36 11 11 HQR 87 6-12 C2 ASD8 43 2 0.7 CP ND ND C2 ASD9 45 3 2.6 CP 72 6-12 C2 ASD10 45 4 3.4 CP ND ND H TD1 21 11 14 HQR NA NA H TD2 24 12 12 HQR NA NA C4 TD3 26 2 2.3 HQR NA NA C5 TD4 28 6 13 CP NA NA C5 TD5 36 3 2.6 CP NA NA H TD6 38 3 1.5 HQR NA NA C6 TD7 41 3 2.4 HQR NA NA H TD8 43 3 2.2 CP NA NA H TD9 44 2 1.2 CP NA NA H TD10 51 2 1.7 CP NA NA H\n\nhttps://doi.org/10.1371/journal.pone.0241690.t001\n\nAlthough this approach was trained and tested on children older than 18 months, we tested the proposed approach on 57 participants between 10 to 18 months to investigate how it works on children under 18 months. These 57 participants consisted of 28 boys and 29 girls with the mean age of 15.2 for both and standard deviations of 2.8 and 2.9 respectively. All these participants were evaluated at a later date at the age of 3 or older, by the same follow-up procedure, using our expertselected questionnaire. At the time of initial voice collection, 55 of these participants had no evident or diagnosed disorder. 
Two of them were referred to our experts due to the positive results of screening using our method. The diagnoses or concerns regarding these two participants, as well as the participants who showed any abnormality in their developmental milestones during the follow-up procedure, are summarized in Table 3. The summary of disorders given in the last column of Table 3 is based on the parental interviews and our experts' evaluation. Unfortunately, Child5, Child6, and Child7's parents did not cooperate in obtaining expert evaluation.\n\n\nContext: A description of the training dataset used for a voice analysis approach, including participant demographics and characteristics.", "metadata": { "doc_id": "Asd_Cry_patterns_12", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Data collection and preprocessing\n\nAs mentioned earlier, the data were recorded using high-quality devices and typical smartphones. The high-quality devices were a UX560 Sony voice recorder and a Sony UX512F voice recorder. For typical smartphones, a voice-recording and archiving application was developed and used on various types of smartphones. All voices, whether through the application or the high-quality recorders, were recorded in wav format, 16-bit, at a sampling rate of 44.1 kHz. Various devices were used to avoid biasing the approach toward a specific device. Similarly, the recording location was not restricted to a single place, so that the results would be applicable across settings.\n\nThe parents and trained voice collectors were asked to record the voices in a quiet environment. Furthermore, they were asked to keep the recorders or smartphones about 25 cm from the participants' mouths. Despite these two recommendations, there were recorded\n\nTable 2.
The test set information.\n\n\nContext: Data collection methods and audio recording parameters.", "metadata": { "doc_id": "Asd_Cry_patterns_13", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: ID Age (month) # of instances Total duration(S) Device GARS score Language milestone (months) Place Reason for crying ASD11 28 12 7.2 HQR 102 $0-6$ C2 Unwilling/ Uncomfortable ASD12 30 18 17.1 HQR ND ND C3 Separation from mother ASD13 30 3 2.9 CP ND ND H Unwilling/Sleepy ASD14 31 5 2.3 HQR 73 $0-6$ C2/H Separation from mother/Hungriness ASD15 33 3 2.5 HQR 91 $0-6$ C2 Unwilling ASD16 33 2 2.5 HQR 104 $0-6$ C1 Annoyed/Uncomfortable ASD17 34 1 0.6 HQR 91 $0-6$ C2 Unwilling/Complaining ASD18 35 2 1.7 HQR 81 ND C1 Annoyed/Uncomfortable ASD19 37 1 0.6 HQR 94 $12-18$ C2 Unwilling/Complaining ASD20 40 19 14 HQR 91 $0-6$ C1 Annoyed ASD21 45 1 0.3 HQR 81 $6-12$ C2 Unwilling/Complaining ASD22 48 2 1.6 HQR 100 $6-12$ C2 Annoyed/Complaining ASD23 52 6 3.1 HQR 113 $12-18$ C2 Unwilling/Complaining ASD24 53 7 5.2 HQR 78 $6-12$ C1 Annoyed/Uncomfortable ASD25 25 12 14 HQR 85 $0-6$ C2 Unwilling/Complaining ASD26 26 5 2 CP 102 $0-6$ C1 Scared ASD27 31 3 1.7 HQR 94 $0-6$ C2 Unwilling/Complaining ASD28 32 2 1.3 HQR 100 $0-6$ C2 Unwilling/Complaining ASD29 41 8 3 HQR 102 $0-6$ C2 Unwilling/Complaining ASD30 45 2 1.2 CP ND ND H Thirsty ASD31 49 7 12 CP ND ND H Unwilling/Complaining TD11 18 4 2 HQR NA NA C4 Scared TD12 18 7 5.1 HQR NA NA C4 Scared/Unwilling TD13 19 7 4.2 HQR NA NA C5 Unwilling TD14 20 9 8 HQR NA NA C5 Unwilling/Complaining TD15 21 4 1.2 HQR NA NA H Complaining TD16 24 3 2.7 HQR NA NA C5 Scared /Unwilling TD17 24 2 1.5 HQR NA NA C5 Scared/Unwilling TD18 24 6 5.1 HQR NA NA C4 Unwilling/Complaining TD19 24 4 2.4 HQR NA NA C5 Unwilling/Complaining TD20 24 5 4.2 HQR NA NA C5 Unwilling/Complaining TD21 29 11 10 HQR NA NA H Unwilling/Complaining TD22 30 4 2 HQR NA NA C5 Scared/Unwilling TD23 30 4 2 CP NA NA H Unwilling TD24 43 12 11 HQR NA NA H Complaining TD25 24 5 6 HQR NA NA C4 Unwilling/Complaining TD26 25 2 4.4 HQR NA NA C5 Scared TD27 29 5 5 HQR NA NA C5 Scared TD28 33 2 2.1 CP NA NA H Complaining TD29 45 16 11 HQR NA NA H Unwilling/Complaining TD30 50 6 7 HQR NA NA H Complaining TD31 51 2 0.7 CP NA NA H Unwilling\n\n\nContext: A dataset of infant cry characteristics and associated demographic information.", "metadata": { "doc_id": "Asd_Cry_patterns_14", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: https://doi.org/10.1371/journal.pone.0241690.t002 voices where the recommendations were not followed and did not have the required quality. Consequently, those recordings were eliminated from the study. Also, all the cry sounds which were due to pain, had been removed from the study since they were similar between the TD and ASD groups.\n\nTable 3. The participants with an abnormality in the follow-up.\n\nID Gender Age (in months) Disorder at recording time at following-up time Child1 M 11 11 Developmental delay ${ }^{a}$, signs of genetic diseases Child2 M 17 17 UNDD $^{\\text {b }}$ Child3 M 12 40 ASD $^{\\text {b }}$ Child4 M 12 36 Sensory processing disorder ${ }^{\\text {c }}$, several ADHD symptoms ${ }^{\\text {b }}$ Child5 M 18 40 Language delay Child6 M 15 46 Developmental delay symptoms Child7 M 12 43 Developmental delay symptoms\n\nUNDD, Unspecified Neurodevelopmental Disorder. ${ }^{a}$ Clinical observation by our expert based on [48]. ${ }^{\\mathrm{b}}$ Clinical observation by our expert based on [1] ${ }^{\\text {c }}$ Clinical observation by our expert based on [49]. 
https://doi.org/10.1371/journal.pone.0241690.t003\n\n\nContext: A study analyzing cry sounds of children with and without autism spectrum disorder, detailing participant characteristics and follow-up observations.", "metadata": { "doc_id": "Asd_Cry_patterns_15", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: After data collection, there was a preprocessing phase in which only pure crying parts of the recordings, with no other types of vocalization, were selected. To explain more, the parts of cry sounds which were accompanied by screaming, saying words/other vocalizations, or that occurred with closed/non-empty mouth were eliminated. All segmentations and eliminations were done manually using Sound Forge Pro 11.0. From the selected cries, the beginning and the end, which contained voice rises and fades, were removed in order to just keep the steady parts of the cries; this prevents having too much variation in the voice which can lead to unsuitable statistics. Also, the uvular/guttural parts of the cries were removed. The reason for this was that we believe these parts distort the feature values of the steady parts of a voice. Each remaining continuous segment of the cries was considered and used as a sample (instance) in this study. Finally, since the basic voice features were extracted from 20 milliseconds frames [50], to generate statistical features of the basic features, the minimum length of the cry segments were set to 15 frames, i.e. 300 milliseconds. Thus, any cry samples below 300 milliseconds were eliminated from the study. In this study, the final prepared samples were between 320 milliseconds to 3 seconds.\n\nFeature extraction\n\n\nContext: Data preprocessing and preparation for feature extraction from infant cries.", "metadata": { "doc_id": "Asd_Cry_patterns_16", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Feature extraction\n\nPrevious studies working on voice features for discriminating ASD children used different sets of features. These methods share several common features like F0, i.e. the fundamental frequency of a voice, and Mel-Frequency Cepstral Coefficients (MFCC), i.e. coefficients which represent the short-term power spectrum of a sound [51]. F0 has been one of the most common features used [31, 32, 39]. However, since age is an important factor affecting F0 [52], this feature is useful when participants have a similar age. On the other hand, MFCC coefficients and several related statistical values have been reported to be useful features in several studies [35, 41, 53]. Considering the useful features reported in previous studies and the specifications of the current study, several features were selected to be used in this work that are explained in the following.\n\nIn this study, each instance was divided into 20 milliseconds frames, to extract basic voice features. We used several features proposed by Motlagh, Moradi, and Pouretemad [41] and by Belalcázar-Bolaños et al. [54]. The features used by Motlagh, Moradi, and Pouretemad [41] include certain statistics like mean and covariance of the frame-wise basic features, such as MFCC coefficients, over a voice segment. They also used the mean and variance of frame-wise temporal derivative $[55,56]$ of the basic features. The frame-wise temporal derivative means\n\nthe difference between two consecutive frames, which in a sense is the rate of change of a feature value in one frame step. 
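These per-segment statistics are straightforward to reproduce. Below is a minimal sketch, assuming librosa for MFCC extraction and non-overlapping 20 ms frames (the paper does not name its audio toolkit or state the frame overlap); the file path and the number of coefficients are illustrative.

```python
import numpy as np
import librosa  # assumed audio toolkit; the paper does not name the library it used


def framewise_statistics(wav_path, sr=44100, frame_ms=20, n_mfcc=24):
    """Per-segment statistics of frame-wise MFCCs, including the Variance of
    the Frame-wise Temporal Derivative (VFTD) of each coefficient."""
    y, sr = librosa.load(wav_path, sr=sr)
    frame_len = int(sr * frame_ms / 1000)  # 20 ms -> 882 samples at 44.1 kHz

    # Segments shorter than 15 frames (300 ms) were excluded in the study.
    if len(y) < 15 * frame_len:
        return None

    # One MFCC vector per 20 ms frame (hop = frame length, i.e. no overlap assumed).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_len, hop_length=frame_len)

    # Frame-wise temporal derivative: the difference between consecutive frames.
    delta = np.diff(mfcc, axis=1)

    return {
        "mean": mfcc.mean(axis=1),   # mean of each coefficient over the segment
        "var": mfcc.var(axis=1),     # variance of each coefficient
        # VFTD is zero for a coefficient track that is constant or changes
        # linearly over time, and grows as the track fluctuates unpredictably.
        "vftd": delta.var(axis=1),
    }
```

The same pattern would apply to the other basic features (e.g. SONE or spectral flatness): compute them frame by frame, then summarize each coefficient track with its mean, variance, and VFTD.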
We modified the spectral flatness features by including the range of $125-250 \\mathrm{~Hz}$ beside the $250-500 \\mathrm{~Hz}$ range. This range was added to cover a wider frequency range than the normal children frequency range, which showed to be necessary in the process of feature extraction and selection. Each range is divided into 4 octaves and the spectral flatness is computed for those octaves.\n\n\nContext: A research paper describing a study using voice analysis to identify autism spectrum disorder (ASD), detailing the methods for extracting acoustic features from voice recordings.", "metadata": { "doc_id": "Asd_Cry_patterns_17", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: We removed all uninformative and noisy features of the set which are explained in the following. The mean of frame-wise temporal derivative of the basic features is removed because it is not a meaningful feature and is equal to taking the difference between the value of the last and the first frames. There are means of the features related to the energy, such as the audio power, total loudness, SONE, and the first coefficient of MFCC, that were removed to make the classifier independent of the loudness/power in children's voices. Zero crossing rate (ZCR) was omitted too, due to its dependency on the noise in the environment.\n\nThe second set of features used in this study was from Belalcázar-Bolaños et al. [54] because it has phonation features, like jitter and shimmer. Jitter and shimmer, which have been reported to be discriminative for ASD, are linked to perceptions of breathiness, hoarseness, and roughness [57]. Other features used from Belalcázar-Bolaños et al. [54] include glottal features related to vocal quality and the closing velocity of the vocal folds [33]. The mean of logarithmic energy feature was omitted for the same reason as other energy-related features. A summary of the features, added to or removed from the sets by [41] and [54], is presented in Table 4.\n\nThe proposed subset instance classifier\n\n\nContext: Feature selection and refinement process for autism detection using vocal analysis.", "metadata": { "doc_id": "Asd_Cry_patterns_18", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: The proposed subset instance classifier\n\nTo explain the proposed classifier, it was assumed that there is a target group of participants that we want to distinguish from the rest of the participants, called the rest. Furthermore, each participant in the target group may have several instances that may be used to distinguish the target group from the rest. Fig 1A shows a situation in which all instances of all participants of the target group are differentiable using common classifiers that we call Whole Set Instance (WSI) classifiers. In this figure, the circles represent our target group and the triangles represent the rest. The color coding is used to differentiate between the instances of each participant in each group. In contrast to the situation in Fig 1A, in Fig 1B the target group cannot easily be distinguished from the rest. In such a situation, there are instances of two participants in the target group, i.e. the red and brown circles that are not easily separable from the instances in the rest (Case 1). Furthermore, there is a participant with no instances, i.e. the orange circles, easily\n\nTable 4. 
The features and statistics which were added or removed to the two feature sets.\n\nFeature removing/adding Reason Second set logarithmic energy Mean statistic is removed Classification dependency on loudness/power of cries First set Audio power Total loudness SONE First MFCC coefficient ZCR The basic feature is removed The feature's dependency on environmental noise All basic features applicable mean of frame-wise temporal derivative of the basic features is removed No meaning for the feature MFCC Coefficients of 14-24 are added Having higher-order coefficients for vocal cords information as well as vocal tract Spectral flatness A range of $125-250 \\mathrm{~Hz}$ is added Covering the low-frequency range of human voice\n\nhttps://doi.org/10.1371/journal.pone.0241690.t004\n\n\nContext: A description of a proposed subset instance classifier and a table detailing feature modifications for improved classification of cries.", "metadata": { "doc_id": "Asd_Cry_patterns_19", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: https://doi.org/10.1371/journal.pone.0241690.t004\n\nseparable from the rest (Case 2). An example of Case 1 is tiptoe walking in children with ASD, which is common in about $25 \\%$ of these children [42] who do it most of the time. An example of Case 2 is children with ASD who do not tiptoe walk. In other words, there are children with ASD who cannot be distinguished from TD children using the tiptoe walking behavior factor.\n\nApplying any WSI classifier may fail for the data type shown in Fig 1B. Consequently, we proposed SubSet Instance (SSI) classifier that first finds differentiable instances and then trains a classifier on these instances. As an example, the proposed SSI classifier first tries to find the circles on the left of the line in Fig 1B, using a clustering method. Then, it uses these circles, as exclusive instances having a specific feature common in a subset of the target group, to train a classifier separating a subset of the target group.\n\nThe steps of common WSI classifiers are shown in Fig 2A. The steps of our proposed SSI classifier are shown in Fig 2B. In the SSI classification approach, after the feature extraction and clustering steps, for each cluster, a classifier is trained to separate its exclusive instances from the instances of the rest of the participants. In the testing phase, any participant with only one instance classified in the target group (positive instance), is classified as a target group's participant. The pseudo-code for the proposed approach is given in Algorithms 1 and 2.\n\nAlgorithm 1. Training SSI classifiers\n\n\nContext: A proposed SubSet Instance (SSI) classifier is introduced to address challenges in identifying individuals with Autism Spectrum Disorder (ASD) based on potentially non-differentiating features, illustrated with the example of toe-walking.", "metadata": { "doc_id": "Asd_Cry_patterns_20", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Algorithm 1. 
Training SSI classifiers\n\nT: set of all target group instances R: set of all the rest instances F: set of all classifiers \\rho: threshold for the number of samples in a cluster s: the number of minimum samples needed in a cluster to be able to train a classifier for it C n: number of clusters F}=\\emptyset 1: While }\\exists j|Cj|>\\rho\\mathrm{; while there is a cluster bigger than a threshold or n = 1 2: n=n+1; increase the number of clusters 3: Cluster the T + R into n clusters Cj, j = 1,\\ldots,n 4: EC = {C C T); the set of clusters of only exclusive instances, i.e. exclusive clusters\n\nimg-0.jpeg\n\nFig 1. Two different hypothetical types of two-dimensional data of the target group and the rest. The instances shown by the warm-colored circles and the cool-colored triangles are for the target group and the rest, respectively. All instances belonging to a participant have the same color. In (a), all the target group participants' instances are distinguishable using a classifier. In (b), only some instances of the target group participants are separable from the other instances by a classifier.\n\nimg-1.jpeg\n\nFig 2. An overall view of WSI and SSI methods. (a) In WSI method, after feature extraction, a classifier is trained on all instances and majority pooling (MP) is usually used in the testing phase. In this study Best-chance threshold Pooling (BP), which is a threshold-based pooling with the threshold giving the best accuracy on the test set, is also used to give the best chance to WSI classifier. (b) In the proposed SSI classifier, after feature extraction, clustering is applied to find and select exclusive instances containing instances of the target group participants only. Then classifiers are trained using exclusive instances, and a participant is classified in the target group in the testing phase if any classifier detects a positive instance for it. https://doi.org/10.1371/journal.pone.0241690.g002\n\n\nContext: A paper describing a novel method (SSI) for classifying individuals within a target group, contrasting it with a traditional method (WSI) and illustrating the approach with figures and an algorithm.", "metadata": { "doc_id": "Asd_Cry_patterns_21", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: 5: If EC \\(\\neq \\emptyset\\); check if there is any exclusive cluster 6: For all \\(C_{j}\\) in EC with \\((C j) \\geq s\\) 7: Train a classifier using positive labels \\(c \\in C_{j}\\) and negative labels \\(r \\in R\\) 8: Add the classifier to \\(F\\) 9: \\(\\quad T=T-\\sum_{c_{j} \\in R C} C_{j} ;\\) remove the instances of the exclusive clusters from target group instances 10: \\(\\quad n=1\\); set 1 to re-start clustering in two groups on the remain- ing instances\n\nAlgorithm 2. Testing SSI classifiers\n\nF: set of trained classifiers A: set of subject instances 1: For all instances a of \\(A\\) 2: \\(\\quad P=\\{a \\in A \\mid \\exists f\\), classifies a as positive instance\\} 3: If \\(P \\neq \\emptyset\\) 4: The participant is from the target group 5: Else 6: The participant is from the rest\n\nIn the proposed training algorithm of the SSI approach, the goal is to find clusters containing the ASD instances only. Then a classifier is trained using the instances of these clusters and\n\nadded to a list of all trained classifiers (lines 7 and 8 of Algorithm 1). As shown in the loop of the algorithm, starting at line 1 , the data is clustered starting with two clusters. 
Then the number of clusters is increased until a cluster, containing only the target group instances, emerges. The exclusive instances in such a cluster are removed from the set of all target group's instances, and the loop is restarted. Before restarting the loop, if the number of instances in this cluster is more than a threshold, a new classifier using these instances is trained and this classifier is added to the set of all trained classifiers. The loop stops when the number of samples in each cluster is less than a threshold.\n\n\nContext: A description of an algorithm for identifying and classifying individuals with Autism Spectrum Disorder (ASD) using machine learning techniques.", "metadata": { "doc_id": "Asd_Cry_patterns_22", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: For testing the participants, using the trained classifiers, all the instances of each participant are classified one by one using all the trained classifiers (line 2 of Algorithm 2). A subject would be classified in the target group if at least one of its instances is classified in the target group at least by one of the classifiers (lines 3 and 4 of Algorithm 2). Otherwise, if there is no instance classified among the target group, the participant is classified as the rest (lines 5 and 6).\n\nDetails of the implementations\n\nThe classifiers were implemented in Python using scikit-learn library. WSI classifiers. We have tested several common WSI classifiers, but we report only the result of SVM with RBF kernel and with no feature selection, which gives the best average accuracy. It should be noted that several feature selection approaches, like L1-SVM and backward elimination, were tested but they only reduced the accuracy. We used group 5-fold crossvalidation for tuning hyper-parameters. Group K-fold means that all instances of each participant are placed in only one of the folds. This prevents having the same participant's instances in the train and validation folds simultaneously. In each fold, there were two ASD and two TD participants. It should be mentioned that before applying the algorithms, we balanced the number of instances of the two groups using upsampling.\n\nTwo approaches were exploited to combine the decisions on different samples of a participant in the WSI approach. The first approach was majority pooling which classifies a participant as ASD, if the number of instances classified as ASD are more than 50 percent of all instances. The second approach was threshold-based pooling which is similar to the first approach except that a threshold other than 50 is used.\n\n\nContext: Evaluation of a machine learning approach for autism detection using vocal samples.", "metadata": { "doc_id": "Asd_Cry_patterns_23", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: SSI classifiers. Before applying the algorithm, we balanced the number of instances of the two groups by upsampling. The threshold for the minimum number of samples, needed in a cluster, to be able to train a classifier is set to 10 . It should be mentioned that agglomerative clustering and decision tree are the methods used for clustering and classification parts of Algorithm 1, respectively.\n\nTraining the SSI classifiers. After running Algorithm 1 on our data, two exclusive clusters with enough instances, i.e. at least 10 instances in our study, were found. Then two classifiers were trained corresponding to each cluster. One of these exclusive clusters had 11 instances from 4 ASD participants (Table 1). 
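Under the stated choices (agglomerative clustering, decision-tree classifiers, and a minimum exclusive-cluster size of 10), the training and testing loops of Algorithms 1 and 2 might be sketched as follows; the function names are ours, the upsampling step is omitted, and the cluster search is simplified.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier


def train_ssi(X_target, X_rest, min_size=10):
    """Sketch of Algorithm 1: find clusters containing target-group (ASD)
    instances only, and train one classifier per exclusive cluster."""
    classifiers = []
    target = np.asarray(X_target, dtype=float)
    rest = np.asarray(X_rest, dtype=float)

    while len(target) >= min_size:
        X = np.vstack([target, rest])
        is_target = np.arange(len(X)) < len(target)
        found = False

        # Increase the number of clusters until an exclusive cluster emerges.
        for n_clusters in range(2, len(X)):
            labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)
            for c in np.unique(labels):
                members = labels == c
                if members.sum() >= min_size and np.all(is_target[members]):
                    mask = members[: len(target)]  # exclusive instances, all from the target group
                    # Train a classifier: exclusive instances vs. all "rest" instances.
                    # In the study a single-feature threshold sufficed for each cluster.
                    clf = DecisionTreeClassifier()
                    clf.fit(np.vstack([target[mask], rest]),
                            np.r_[np.ones(mask.sum()), np.zeros(len(rest))])
                    classifiers.append(clf)
                    # Remove the exclusive instances and restart the search.
                    target = target[~mask]
                    found = True
                    break
            if found:
                break
        if not found:
            break
    return classifiers


def classify_subject(classifiers, X_subject):
    """Sketch of Algorithm 2: a participant belongs to the target group if any
    classifier marks any of their cry instances as positive."""
    return any(clf.predict(X_subject).any() for clf in classifiers)
```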
These 11 instances consisted of 6 out of 9 instances of ASD1, 2 out of 4 instances of ASD10, 1 out of 2 instances of ASD8, and 2 out of 4 instances of ASD6. As explained in the algorithm, for each cluster, a decision tree classifier was trained using the ASD instances in the cluster versus all TD instances. Interestingly, only one feature was enough to discriminate instances in the cluster from all TD instances. Among those features that can discriminate the cluster's instances, we selected the Variance of Framewise Temporal Derivative (VFTD) of the 7th MFCC coefficient as the feature which can discriminate more ASD participants from the set of all participants with a simple threshold. The classifier obtained by setting a threshold based on this feature was the first classifier. This feature supports our expert's report regarding the higher variations in the cry sounds of ASD\n\nchildren than TD children. From 10 ASD children, 8 of them can be discriminated using this feature. For each participant, the number of instances found by this classifier is shown in the $2^{\\text {nd }}$ column of Table 5.\n\n\nContext: The document details an algorithm and its application to discriminate between Autism Spectrum Disorder (ASD) and Typically Developing (TD) children based on cry sound analysis, specifically using Mel-Frequency Cepstral Coefficients (MFCCs).", "metadata": { "doc_id": "Asd_Cry_patterns_24", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: After excluding the ASD samples from the first classifier, the second classifier was trained based on the second exclusive cluster. This cluster included all instances of participant ASD4. The only feature used for classifying this cluster was VFTD of the $6^{\\text {th }}$ SONE coefficient. SONE is a unit of loudness which is a subjective perception of sound pressure [58]. Having higher VFTD of the $6^{\\text {th }}$ SONE coefficient confirms the experiential knowledge of our experts mentioned before. Among all the ASD participants, eight had instances with VFTD of the $6^{\\text {th }}$ SONE higher than a threshold (Shown in the 3rd column of Table 5). The results of classification based on these two features are depicted in Fig 3. As mentioned in the proposed method section, the participants with at least one instance classified into this cluster would be considered as a participant with ASD.\n\nResults\n\nIn this part, the performance of our proposed SSI classifier against a common WSI classifier is evaluated on our test set of ASD and TD participants. Each participant has multiple instances which are cleaned using the criteria explained in the data collection and preprocessing section. The participants who had at least one accepted instance were used in the training and testing phases, which are shown in Tables 1 and 2.\n\nThe output of the SSI approach was two classifiers, each of them works by setting a threshold based on a feature. The number of instances of ASD participants in the training set, correctly detected by the first and the second classifiers, are shown in the second and third columns of Table 5, respectively. 
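Since each resulting classifier reduced to a cut-off on a single feature (the VFTD of the 7th MFCC coefficient for the first cluster, and of the 6th SONE coefficient for the second), the learned rule can be expressed as a one-split decision stump. The sketch below shows how such a threshold could be obtained and applied at the participant level, assuming, as Fig 3 indicates, that higher values point to ASD; the function names are ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def single_feature_threshold(values_asd, values_td):
    """Fit a one-split decision stump on a single feature (e.g. the VFTD of the
    7th MFCC coefficient) and return the learned cut-off value."""
    X = np.r_[values_asd, values_td].reshape(-1, 1)
    y = np.r_[np.ones(len(values_asd)), np.zeros(len(values_td))]
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
    return stump.tree_.threshold[0]  # split point at the root node


def flag_participant(vftd_mfcc7, vftd_sone6, t1, t2):
    """A participant is flagged as ASD if any cry instance exceeds either
    threshold on the two selected features."""
    return bool(np.any(np.asarray(vftd_mfcc7) > t1) or
                np.any(np.asarray(vftd_sone6) > t2))
```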
On the other hand, the best-resulting classifier for the WSI approach was Radial Basis Function-Support Vector Machine (RBF-SVM) [59].\n\n\nContext: Evaluation of a novel Sound Spectral Index (SSI) classifier for Autism Spectrum Disorder (ASD) detection, comparing its performance against a common Wavelet Spectral Index (WSI) classifier using acoustic features.", "metadata": { "doc_id": "Asd_Cry_patterns_25", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: The classification results on the test set for different classifiers are shown in Table 6. The portion of each participant's instances, correctly classified by each classifier, is written as a percentage under the name of the classifier. The decision made by the WSI and SSI classifiers for each participant is shown by ASD or TD. To classify each subject using the WSI classifier, the Majority Pooling (MP) and the Best-chance threshold Pooling (BP) approaches were used. BP is a threshold-based pooling with the threshold giving the best accuracy on the test set for male participants. For the boys, MP has specificity, sensitivity, and precision equal to $100 \\%, 35.71 \\%$, and $67.85 \\%$, respectively. On the other hand, BP leads to specificity, sensitivity, and precision equal to $85.71 \\%, 71.42 \\%$, and $78.57 \\%$, respectively. The threshold\n\nTable 5. The number of instances of each participant in the training set that are classified as ASD using each trained SSI classifier.\n\nID First SSI classifier Second SSI classifier ASD1 8 3 ASD2 1 2 ASD3 3 1 ASD4 10 9 ASD5 0 0 ASD6 1 3 ASD7 1 0 ASD8 1 2 ASD9 0 1 ASD10 2 4\n\nhttps://doi.org/10.1371/journal.pone.0241690.t005\n\nimg-2.jpeg\n\n\nContext: Classification results of autism spectrum disorder detection using different classifiers and pooling approaches.", "metadata": { "doc_id": "Asd_Cry_patterns_26", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: https://doi.org/10.1371/journal.pone.0241690.t005\n\nimg-2.jpeg\n\nFig 3. Two classifiers trained on the two exclusive clusters found during the SSI classifier training phase. (a) The Variance of Frame-wise Temporal Derivative (VFTD) of the 7th MFCC coefficient separates 27 instances of 8 ASD subjects from all TD instances of the training set. (b) VFTD of the $6^{\\text {th }}$ SONE coefficient separates 17 instances of 7 ASD participants from all TD instances of the training set. https://doi.org/10.1371/journal.pone.0241690.g003 for BP was set to $20 \\%$ that means if $20 \\%$ of instances of a participant were classified as ASD instance, the participant was classified as having ASD. The results of the percentage of instances correctly classified by the two classifiers in the SSI approach are shown as $\\mathrm{C}{1}$ (the first SSI classifier) and $\\mathrm{C}{2}$ (the second SSI classifier) in Table 6. The aggregated result of the decisions by $\\mathrm{C}{1}$ and $\\mathrm{C}{2}$ makes the final decision of the SSI classifier which is shown in the decision column, under the SSI classification section. The achieved specificity, sensitivity, and precision using the proposed method for the boys are $100 \\%, 85.71 \\%$, and $92.85 \\%$, respectively.\n\nTo further show the applicability of the proposed approach to girls, we applied the boys' trained classifiers on the test set of the girls. The results are shown in the last row of Table 6 which show that the MP approach has specificity, sensitivity, and precision equal to $100 \\%$, $71.42 \\%$, and $85.71 \\%$, respectively. 
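For reference, the WSI baseline described above (an RBF-kernel SVM tuned with group 5-fold cross-validation, with participant-level decisions made by majority or threshold pooling of instance predictions) might be sketched as follows; the parameter grid is illustrative and the instance upsampling is omitted.

```python
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.svm import SVC


def train_wsi(X, y, groups):
    """RBF-SVM baseline tuned with group 5-fold cross-validation, so that all
    instances of one participant land in the same fold (no subject leakage)."""
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},  # illustrative grid
        cv=GroupKFold(n_splits=5),
    )
    grid.fit(X, y, groups=groups)
    return grid.best_estimator_


def pooled_decision(clf, X_subject, threshold=0.5):
    """Participant-level decision by pooling instance predictions: a 0.5
    threshold is majority pooling (MP); the best-chance pooling (BP) reported
    for the boys used a 20% threshold instead."""
    frac_positive = clf.predict(X_subject).mean()
    return frac_positive > threshold
```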
Furthermore, the BP approach gives specificity, sensitivity, and precision all equal to $85.71 \\%$, respectively. The results of the proposed SSI classifier is $100 \\%$ specificity, $71.42 \\%$ sensitivity, and $85.71 \\%$ precision.\n\n\nContext: Evaluation of a proposed method for autism spectrum disorder (ASD) classification using Mel-Frequency Cepstral Coefficients (MFCCs) and Sonic features, demonstrating performance metrics for both boys and girls.", "metadata": { "doc_id": "Asd_Cry_patterns_27", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: A two-dimensional scatter plot of the two features, used in $\\mathrm{C}{1}$ and $\\mathrm{C}{2}$ classifiers, are shown in Fig 4. As can be seen in this figure, the instances of a participant with ASD are scattered in the area containing instances of both TD and ASD participants. Nevertheless, there are instances for this participant uniquely distinguishable using the selected two features.\n\nWe compared the results of our proposed method with that of the only method available in the literature which was trained using only cry features [41] based on our data. The results (Table 7) show the superiority of our method, compared to the previously proposed method.\n\nInvestigating the trained classifier on participants under 18 months\n\nThe SSI classifier which was trained using the training set in Table 1 was also tested on the data of children younger than 18 months. From 57 participants under 18 months, two boys (Child1 and Child2 in Table 3) were classified as ASD by the mentioned trained classifier. These participants were referred to our experts for diagnosis. These two were suspected of having neurodevelopmental problems. All other boys were classified as TD. However, among them, Child3 was diagnosed with ASD at the age of 2. Also, Child4 showed\n\nTable 6. The results of classifiers on the instances of each participant in the test set.\n\n\nContext: Evaluation of a classifier trained on cry features to detect Autism Spectrum Disorder (ASD) in infants, including comparison to existing methods and testing on a subset of participants under 18 months.", "metadata": { "doc_id": "Asd_Cry_patterns_28", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Table 6. The results of classifiers on the instances of each participant in the test set.\n\nTD children Children with ASD ID Portion of instances classified as TD as a percentage and the decision ID Portion of instances classified as ASD as a percentage and the decision WSI classification SSI classification WSI classification SSI classification SVM Dec. $\\mathrm{C}_{1}$ $\\mathrm{C}_{2}$ Dec. SVM Dec. $\\mathrm{C}_{1}$ $\\mathrm{C}_{2}$ $\\frac{\\pi}{\\sqrt{2}}$ TD11 100 TD TD 100 100 TD ASD11 50 ASD ASD 17 50 ASD TD12 100 TD TD 100 100 TD ASD12 33 TD ASD 11 28 ASD TD13 100 TD TD 100 100 TD ASD13 33 TD ASD 33 0 ASD TD14 100 TD TD 100 100 TD ASD14 20 TD ASD 20 20 ASD TD15 100 TD TD 100 100 TD ASD15 0 TD TD 0 40 ASD TD16 100 TD TD 100 100 TD ASD16 50 ASD ASD 100 0 ASD TD17 100 TD TD 100 100 TD ASD17 0 TD TD 0 100 ASD TD18 83 TD TD 100 100 TD ASD18 50 ASD ASD 50 50 ASD TD19 100 TD TD 100 100 TD ASD19 0 TD TD 0 0 TD TD20 80 TD ASD 100 100 TD ASD20 42 TD ASD 42 16 ASD TD21 100 TD TD 100 100 TD ASD21 100 ASD ASD 0 0 TD TD22 100 TD TD 100 100 TD ASD22 0 TD TD 0 50 ASD TD23 75 TD ASD 100 100 TD ASD23 33 TD ASD 33 17 ASD TD24 92 TD TD 100 100 TD ASD24 86 ASD ASD 86 86 ASD Acc. 
\\% 100 85.71 100 35.71 71.42 85.71 $\\frac{\\pi}{\\sqrt{2}}$ TD25 100 TD TD 100 100 TD ASD25 42 TD ASD 17 0 ASD TD26 100 TD TD 100 100 TD ASD26 60 ASD ASD 60 20 ASD TD27 100 TD TD 100 100 TD ASD27 50 ASD ASD 0 0 TD TD28 100 TD TD 100 100 TD ASD28 100 ASD ASD 0 50 ASD TD29 100 TD TD 100 100 TD ASD29 62 ASD ASD 50 50 ASD TD30 67 TD ASD 100 100 TD ASD30 100 ASD ASD 50 50 ASD TD31 100 TD TD 100 100 TD ASD31 0 TD TD 0 0 TD Acc. \\% 100 85.71 100 71.42 85.71 71.42\n\n\nContext: Results of classification experiments distinguishing between typically developing children and children with Autism Spectrum Disorder based on voice samples.", "metadata": { "doc_id": "Asd_Cry_patterns_29", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Each classifier result on a participant's instances is reported as a percentage. Dec., Decision; MP, Majority Pooling; BC, Best-chance threshold Pooling; C1, Classifier1; C2, Classifier2; Acc., Accuracy. https://doi.org/10.1371/journal.pone.0241690.t006 symptoms of having ADHD and sensory processing disorder at the age of 3 . Three other children had symptoms which suggested that they are not TD children. Two of the girls who were 18 months old were classified as ASD, using the trained classifier. The other girls were classified as TD. The results of testing the trained SSI classifier on this data set are summarized in Table 8.\n\nThe original and cleaned voices and their extracted features (the data set) in this research and the implementation codes of the proposed method are deposited in the following repositories:\n\nCodeOcean 10.24433/CO.0622770.v1\n\nHarvard Dataverse (Contains only a rar file of sounds): 10.7910/DVN/LSTBQW\n\nimg-3.jpeg\n\nFig 4. Instances of several ASD and TD participants scattered in the space of two features given by the proposed SSI method. The instances of a chosen ASD participant are illustrated in green to show that a participant may have instances in the area common with TD instances besides those two areas separated by the selected thresholds as ASD. The mentioned ASD participant (with green instances) is tagged as ASD, due to having at least one instance with the greater value than at least one of the thresholds on the two features. https://doi.org/10.1371/journal.pone.0241690.g004\n\nDiscussion and conclusion\n\n\nContext: Results of a study using a trained classifier to identify Autism Spectrum Disorder (ASD) from voice data, including a table summarizing classifier performance and a visualization of data points.", "metadata": { "doc_id": "Asd_Cry_patterns_30", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Discussion and conclusion\n\nIn this paper, we presented a novel cry-based screening method to distinguish between children with autism and typically developing children. In the proposed method, groups of children with autism who have specific features in their cry sounds can be determined. This method is based on a new classification approach called SubSet Instance (SSI) classifier. An appealing property of the proposed SSI classifier, in the case of voice-based autism screening, is its high specificity such that a normal child can be detected with no error. We applied the proposed method on a group of participants consisting of 24 boys with ASD between 20 and\n\nTable 7. 
Comparison of the results on the test set using the two methods; SSI approach and a baseline approach.\n\nSensitivity Specificity Precision $\\begin{aligned} & \\text { S } \\ & \\text { S } \\end{aligned}$ SSI $85.71 \\%$ $100 \\%$ $92.85 \\%$ Baseline $50.58 \\%$ $81 \\%$ $65 \\%$ SSI $71.42 \\%$ $100 \\%$ $85.71 \\%$ Baseline $21 \\%$ $86.48 \\%$ $53 \\%$\n\nhttps://doi.org/10.1371/journal.pone.0241690.t007\n\nTable 8. Classification of the participants under 18 months using our trained SSI classifier.\n\nBoys Girls ASD TD Others $^{\\mathrm{a}}$ ASD TD Others $^{\\mathrm{a}}$ Classified as ASD 0 0 2 0 1 0 Classified as TD 1 22 4 0 27 0\n\n\nContext: Results and discussion of a novel cry-based autism screening method using a SubSet Instance (SSI) classifier.", "metadata": { "doc_id": "Asd_Cry_patterns_31", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Boys Girls ASD TD Others $^{\\mathrm{a}}$ ASD TD Others $^{\\mathrm{a}}$ Classified as ASD 0 0 2 0 1 0 Classified as TD 1 22 4 0 27 0\n\n${ }^{a}$ Other developmental or mental disorders https://doi.org/10.1371/journal.pone.0241690.t008 53 months of age and 24 TD boys between 18 and 51 months of age. The two features, found in this study, were used to train a classifier on 10 boys with ASD and 10 TD boys. Then, the classifier was used to distinguish 14 boys with ASD from 14 TD boys, reaching $92.8 \\%$ accuracy. Due to the fact that girls are less likely to have autism and consequently, it is harder to collect enough data from girls than boys, the number of girls with ASD was not sufficient to train a separate classifier for this gender. It should be noted that we tested the trained system on 7 girls with ASD and 7 TD girls. It was seen that the trained classifier can screen girls with 7\\% lower accuracy than boys of the test set. In other words, it seems that gender differences should be considered in the training of the system. In testing the data from participants under 18 months, one TD girl was classified as ASD which was not the case for any TD children of the male counterparts. This result also confirms the aforementioned point about the gender effect. However, in future work, we would try to collect more data on girls to be able to train a system to accurately screen girls. Furthermore, we would also try to train a single classifier for boys and girls to determine whether it can be used for both of them.\n\n\nContext: A study evaluating a classifier trained to distinguish between boys with Autism Spectrum Disorder (ASD) and typically developing (TD) boys, and its performance when applied to girls.", "metadata": { "doc_id": "Asd_Cry_patterns_32", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: It should be mentioned that our training and test data were completely separate, to make the trained model more general. The features found in this study are applicable in the age range of our participants from 18 to 53 months. This is in contrast to other approaches that either used a dataset of children with a specific age [33,35] or used age information for classification [34]. Due to the age invariant features found in this study, it can be claimed that there are markers in the voices of children with ASD that are sustained at least in a range of ages.\n\nThe two discriminative features, found in this study, were a coefficient of MFCC and a SONE coefficient. MFCC and SONE are related to the power spectrum of a speech signal. SONE measures loudness in specific Bark bands [56]. 
On the other hand, MFCC, which is the inverse DFT of log-spectrum in the Mel scale, is related to the timbre of the voice [60]. Therefore, MFCC and SONE can be interpreted to be related to the timbre and loudness of a tone. Furthermore, based on the feedback from our experts, there is unpredictability in the crying sound of children with autism which is not the case for TD children. Consequently, we used the variance of temporal difference as a feature suitable for screening children with autism. This is due to the fact that if a signal is constant or changes linearly over time, the variance of temporal difference is zero. Therefore, the variance of temporal difference can be seen as the amount of ambiguity or unpredictability of a sound. On the other hand, the heightened variability in the two features, found in this study, for children with ASD is significant due to the reports from other studies [22,61] which shows increased biological signals variability in children with ASD and infants at high risk for autism in comparison with TD children. These features are statistical features of the cry instances that hold constant, at least, across an age range studied in this research.\n\n\nContext: A study investigating acoustic features of infant cries to identify markers for Autism Spectrum Disorder (ASD).", "metadata": { "doc_id": "Asd_Cry_patterns_33", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: To the best of our knowledge, [34] and [35] were the only studies on screening children with autism using voice features on children younger than 2 years of age. Our proposed method has higher precision than these two, i.e. $6 \\%$ more than [34] and $17 \\%$ more than [35], using only cry features. The use of cry features as suitable biomarkers for autism screening matches the claims in [38].\n\nIn the present study only children with ASD and TD children were tested. Other developmental disorders or health issues were not tested to see how children with such disorders would be classified using the proposed method which can decrease the specificity of $100 \\%$. However, this approach is proposed to be used as a screening tool and the final diagnosis should be done under experts' supervision. So, this approach can be applied as a general screener of autism spectrum disorder.\n\n\nContext: A study comparing a new autism screening method using cry features to existing methods, discussing its precision and potential as a general screener.", "metadata": { "doc_id": "Asd_Cry_patterns_34", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: The trained classifier was also tested on 57 participants between 10 to 18 months of age. The classifier screened two boys from the rest, i.e. Child1 and Child2 (Table 3). Child1 showed evidences of genetic disease and was diagnosed with developmental delay and Child2 received UNDD classification by our experts. This suggests that a) the system can be used for children under 2 years of age, and b) it may be able to distinguish other neurodevelopmental disorders. On the other hand, there were 5 boys, i.e. Child3 to Child7 (Table 3), who had no evidence of mental or developmental disorders at the time of their recording. At the same time, our approach did not distinguish them as children with ASD either. However, when they were older than 3 years, they showed symptoms of neurodevelopmental disorders. Out of these children, we could manage to collect new recordings from Child3 and Child4 that were classified as children with ASD using our approach. 
Unfortunately, Child5, Child6, and Child7 did not cooperate and could not be evaluated by an expert to validate the results of our expert-selected questionnaire. Furthermore, their parents declined to send us their children's recent cry sounds.\n\n\nContext: Evaluation of a cry-based classifier's performance on a cohort of young children, including those with and without neurodevelopmental disorders, and follow-up assessments of children initially classified as typically developing.", "metadata": { "doc_id": "Asd_Cry_patterns_35", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: The result of studying these 57 children under the age of 18 months may suggest that: a) there could be symptoms in the crying sounds of children with neurodevelopmental disorders under 18 months (Child1 and Child2), and b) the approach may fail to screen a participant with a neurodevelopmental disorder under the age of 18 months because: 1) the participant was among those children with neurodevelopmental disorders who do not have our proposed specific features in their crying sounds, 2) the participant's recorded cry samples did not include our specific features, and/or 3) the neurodevelopmental disorder and its features had not yet developed in the child at the time of the initial recording. The reason Child3 and Child4 were not classified as children with ASD under the age of 18 could be b.2 or b.3. Further investigation is needed to determine which of these explanations applies.\n\nWe believe that this approach can be used to perform early autism screening under 18 months of age. Thus, in future work, we need to collect and test the approach on more data from children under 18 months to validate these results with greater confidence.\n\nWe also need to evaluate the proposed approach and the extracted features on other neurodevelopmental disorders, such as ADHD, to assess its ability to distinguish children with these disorders from TD children.\n\nFurthermore, without comparing the cry sounds of children with ASD to those of children who do not have ASD but have another disorder, we cannot know whether these findings are specific to autism or reflect atypical brain development more generally. Thus, we should collect cry sounds of children with other neurodevelopmental disorders and compare them with those of children with ASD to see whether these features can separate the two groups.\n\n\nContext: Discussion of limitations and future research directions for an autism screening approach based on infant cry analysis.", "metadata": { "doc_id": "Asd_Cry_patterns_36", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: It has been demonstrated that crying consists of intricate motor activities [62]. On the other hand, it has been shown that children with ASD have problems in the motor domain and in coordinating their motor capabilities with other modalities [63]. Consequently, it is possible that the extracted features in the crying sounds of children with ASD stem from this deficit in the motor domain, which requires further investigation.\n\nFinally, automating the preprocessing step is a technical issue that should be addressed if the cry-based screening is to be fully automated.
This is important since\n\nsuch a screening system can be deployed in systems such as Amazon Alexa [64] to automatically screen problematic cry sounds.\n\nAcknowledgments\n\nWe would like to thank the Center for Treatment of Autism Disorder (CTAD) and its members for supporting this study. We would also like to thank all the families who helped this research by taking the time to collect the cry sounds of their children. The authors would also like to express their gratitude to Prof. H. Sameti from Sharif University of Technology for his valuable and constructive feedbacks on the data collection and voice processing.\n\nAuthor Contributions\n\nConceptualization: Aida Khozaei, Hadi Moradi, Reshad Hosseini, Hamidreza Pouretemad, Bahareh Eskandari.\n\nData curation: Aida Khozaei. Formal analysis: Aida Khozaei. Funding acquisition: Hadi Moradi. Investigation: Hadi Moradi. Methodology: Aida Khozaei. Project administration: Hadi Moradi. Software: Aida Khozaei. Supervision: Hadi Moradi. Validation: Aida Khozaei. Visualization: Aida Khozaei. Writing - original draft: Aida Khozaei, Hadi Moradi, Reshad Hosseini. Writing - review \\& editing: Aida Khozaei, Hadi Moradi, Reshad Hosseini, Hamidreza Pouretemad, Bahareh Eskandari.\n\nReferences\n\nAmerican Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub; 2013.\n\n\nContext: The chunk discusses potential motor-related origins of differences in crying sounds of children with ASD and suggests potential for automated screening using voice assistants, concluding with acknowledgments and author contributions, followed by a comprehensive reference list.", "metadata": { "doc_id": "Asd_Cry_patterns_37", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: References\n\nAmerican Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub; 2013.\n\nChen JL, Sung C, Pi S. Vocational rehabilitation service patterns and outcomes for individuals with autism of different ages. J Autism Dev Disord. 2015; 45(9):3015-29. https://doi.org/10.1007/s10803-015-2465-y PMID: 25982310\n\nFakhoury M. Autistic spectrum disorders: A review of clinical features, theories and diagnosis. Int J Dev Neurosci. 2015; 43:70-7. https://doi.org/10.1016/j.ijdevneu.2015.04.003 PMID: 25862937\n\nConstantino JN, Charman T. Diagnosis of autism spectrum disorder: reconciling the syndrome, its diverse origins, and variation in expression. Lancet Neurol. 2016; 15(3):279-91. https://doi.org/10. 1016/S1474-4422(15)00151-9 PMID: 26497771\n\nCalderoni S, Billeci L, Narzisi A, Brambilla P, Retico A, Muratori F. Rehabilitative interventions and brain plasticity in autism spectrum disorders: focus on MRI-based studies. Front Neurosci. 2016; 10:139. https://doi.org/10.3389/fnins.2016.00139 PMID: 27065795\n\nBrentani H, Paula CSd, Bordini D, Rolim D, Sato F, Portolese J, et al. Autism spectrum disorders: an overview on diagnosis and treatment. Braz J Psychiatry. 2013; 35:S62-S72. https://doi.org/10.1590/1516-4446-2013-S104 PMID: 24142129\n\nMandell DS, Novak MM, Zubritsky CD. Factors associated with age of diagnosis among children with autism spectrum disorders. Pediatrics. 2005; 116(6):1480-6. https://doi.org/10.1542/peds.2005-0185 PMID: 16322174\n\nThabtah F, Peebles D. A new machine learning model based on induction of rules for autism detection. Health Inform J. 2019. https://doi.org/10.1177/1460458218824711 PMID: 30693818\n\nVolkmar F, Cook EH, Pomeroy J, Realmuto G, Tanguay P. 
Practice parameters for the assessment and treatment of children, adolescents, and adults with autism and other pervasive developmental disorders. J Am Acad Child Adolesc Psychiatry. 1999; 38(12, Supplement):32S-54S. https://doi.org/10. 1016/S0890-8567(99)80003-3\n\n\nContext: A list of references cited throughout a document concerning autism spectrum disorders.", "metadata": { "doc_id": "Asd_Cry_patterns_38", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Campbell M, Schopler E, Cueva JE, Hallin A. Treatment of autistic disorder. J Am Acad Child Adolesc Psychiatry. 1996; 35(2):134-43. https://doi.org/10.1097/00004583-199602000-00005 PMID: 8720622\n\nZachor DA, Itzchak EB. Treatment approach, autism severity and intervention outcomes in young children. Res Autism Spectr Disord. 2010; 4(3):425-32.\n\nBoyd BA, Hume K, McBee MT, Alessandri M, Gutierrez A, Johnson L, et al. Comparative Efficacy of LEAP, TEACCH and Non-Model-Specific Special Education Programs for Preschoolers with Autism Spectrum Disorders. J Autism Dev Disord. 2014; 44(2):366-80. https://doi.org/10.1007/s10803-013-1877-9 PMID: 23812661\n\nJacobson JW, Mulick JA, Green G. Cost-benefit estimates for early intensive behavioral intervention for young children with autism-general model and single state case. Behav Interv. 1998; 13(4):201-26.\n\nJacobson JW, Mulick JA. System and Cost Research Issues in Treatments for People with Autistic Disorders. J Autism Dev Disord. 2000; 30(6):585-93. https://doi.org/10.1023/a:1005691411255 PMID: 11261469\n\nThabtah F, Peebles D. Early Autism Screening: A Comprehensive Review. Int J Environ Res Public Health. 2019; 16(18):3502. https://doi.org/10.3390/ijerph16183502 PMID: 31546906\n\nRutter M, Le Couteur A, Lord C. ADI-R: Autism Diagnostic Interview-Revised. Los Angeles, CA: Western Psychological Services; 2003.\n\nLord C., Risi S., Lambrecht L., Cook E. H. Jr., Leventhal B. L., Lavore Di, et al. The autism diagnostic observation schedule-generic: a standard measure of social and communication deficits associated with the spectrum of autism. J Autism Dev Disord. 2000; 30(3), 205-223. PMID: 11055457\n\nLevy S, Duda M, Haber N, Wall DP. Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism. Mol Autism. 2017; 8(1):65. https://doi.org/10.1186/ s13229-017-0180-6 PMID: 29270283\n\n\nContext: Treatment and diagnostic approaches for autism.", "metadata": { "doc_id": "Asd_Cry_patterns_39", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Küpper C, Stroth S, Wolff N, Hauck F, Kliewer N, Schad-Hansjosten T, et al. Identifying predictive features of autism spectrum disorders in a clinical sample of adolescents and adults using machine learning. Sci Rep. 2020; 10(1):4805. https://doi.org/10.1038/s41598-020-61607-w PMID: 32188882\n\nAbbas H, Garberson F, Liu-Mayo S, Glover E, Wall DP. Multi-modular AI Approach to Streamline Autism Diagnosis in Young Children. Scientific Reports. 2020; 10(1):5014. https://doi.org/10.1038/ s41598-020-61213-w PMID: 32193406\n\nEmerson RW, Adams C, Nishino T, Hazlett HC, Wolff JJ, Zwaigenbaum L, et al. Functional neuroimaging of high-risk 6-month-old infants predicts a diagnosis of autism at 24 months of age. Sci Transl Med. 2017; 9(393):eaag2882. https://doi.org/10.1126/scitranslmed.aag2882 PMID: 28592562\n\nDenisova K, Zhao G. Inflexible neurobiological signatures precede atypical development in infants at high risk for autism. Sci Rep. 2017; 7(1):11285. 
https://doi.org/10.1038/s41598-017-09028-0 PMID: 28900155\n\nBosl WJ, Tager-Flusberg H, Nelson CA. EEG analytics for early detection of autism spectrum disorder: a data-driven approach. Sci Rep. 2018; 8(1):6828. https://doi.org/10.1038/s41598-018-24318-x PMID: 29717196\n\nMomeni N, Bergquist J, Brudin L, Behnia F, Sivberg B, Joghataei M, et al. A novel blood-based biomarker for detection of autism spectrum disorders. Transl Psychiatry. 2012; 2(3):e91. https://doi.org/10. 1038/tp.2012.19 PMID: 22832856\n\nGlatt SJ, Tsuang MT, Winn M, Chandler SD, Collins M, Lopez L, et al. Blood-based gene expression signatures of infants and toddlers with autism. J Am Acad Child Adolesc Psychiatry. 2012; 51(9):934944.e2. https://doi.org/10.1016/j.jaac.2012.07.007 PMID: 22917206\n\nCroen LA, Braunschweig D, Haapanen L, Yoshida CK, Fireman B, Grether JK, et al. Maternal mid-pregnancy autoantibodies to fetal brain protein: the early markers for autism study. Biol Psychiatry. 2008; 64 (7):583-8. https://doi.org/10.1016/j.biopsych.2008.05.006 PMID: 18571628\n\n\nContext: Recent research utilizing machine learning and various biomarkers for autism detection.", "metadata": { "doc_id": "Asd_Cry_patterns_40", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Greene DJ, Black KJ, Schlaggar BL. Considerations for MRI study design and implementation in pediatric and clinical populations. Dev Cogn Neurosci. 2016; 18:101-112. https://doi.org/10.1016/j.dcn.2015. 12.005 PMID: 26754461\n\nWebb SJ, Bernier R, Henderson HA, Johnson MH, Jones EJ, Lerner MD, et al. Guidelines and best practices for electrophysiological data collection, analysis and reporting in autism. J Autism Dev Disord. 2015; 45(2):425-243. https://doi.org/10.1007/s10803-013-1916-6 PMID: 23975145\n\nEngelhardt LE, Roe MA, Juranek J, DeMaster D, Harden KP, Tucker-Drob EM, et al. Children's head motion during fMRI tasks is heritable and stable over time. Dev Cogn Neurosci. 2017; 25:58-68. https:// doi.org/10.1016/j.dcn.2017.01.011 PMID: 28223034\n\nDenisova K. Age attenuates noise and increases symmetry of head movements during sleep restingstate fMRI in healthy neonates, infants, and toddlers. Infant Behav Dev. 2019; 57:101317. https://doi. org/10.1016/j.infbeh.2019.03.008 PMID: 31102945\n\nBrisson J, Martel K, Serres J, Sirois S, Adrien JL. Acoustic analysis of oral productions of infants later diagnosed with autism and their mother. Infant Ment Health J. 2014; 35(3):285-95. https://doi.org/10. 1002/imhj. 21442 PMID: 25798482\n\nNakai Y, Takiguchi T, Matsui G, Yamaoka N, Takada S. Detecting abnormal word utterances in children with autism spectrum disorders: machine-learning-based voice analysis versus speech therapists. Percept Mot Skills. 2017; 124(5):961-73. https://doi.org/10.1177/0031512517716855 PMID: 28649923\n\nSantos JF, Brosh N, Falk TH, Zwaigenbaum L, Bryson SE, Roberts W, et al. Very early detection of autism spectrum disorders based on acoustic analysis of pre-verbal vocalizations of 18-month old toddlers. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing; 2013; Vancouver, BC, Canada: IEEE. 2013. Doi: 10.1109/ICASSP. 
2013.6639134\n\n\nContext: Methods and considerations for neuroimaging studies, particularly MRI and fMRI, in pediatric and clinical populations, alongside related research on head motion and vocalization analysis in autism spectrum disorder.", "metadata": { "doc_id": "Asd_Cry_patterns_41", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Oller D, Niyogi P, Gray S, Richards J, Gilkerson J, Xu D, et al. Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. Proc Natl Acad Sci. 2010; 107(30):13354-9. https://doi.org/10.1073/pnas.1003882107 PMID: 20643944\n\nPokorny FB, Schuller BW, Marschik PB, Brueckner R, Nyström P, Cummins N, et al. Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. In: Proceedings of the INTERSPEECH 2017; 2017; Stockholm, Sweden: ISCA. 2017. Doi: 10.21437/ Interspeech.2017-1007\n\nLittle MA, Varoquaux G, Saeb S, Lonini L, Jayaraman A, Mohr DC, et al. Using and understanding cross-validation strategies. Perspectives on Saeb et al. Gigascience. 2017; 6(5):1-6. https://doi.org/10. 1093/gigascience/gix020 PMID: 28327989\n\nEyben F, Scherer KR, Schuller BW, Sundberg J, André E, Busso C, et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans Affect Comput. 2015; 7(2):190-202.\n\nEsposito G, Hiroi N, Scattoni ML. Cry, Baby, Cry: Expression of Distress As a Biomarker and Modulator in Autism Spectrum Disorder. Int J Neuropsychopharmacol. 2017; 20(6):498-503. https://doi.org/10. 1093/ijnp/pyx014 PMID: 28204487\n\nSheinkopf SJ, Iverson JM, Rinaldi ML, Lester BM. Atypical Cry Acoustics in 6-Month-Old Infants at Risk for Autism Spectrum Disorder. Autism Res. 2012; 5(5):331-9. https://doi.org/10.1002/aur. 1244 PMID: 22890558\n\nOrlandi S, Manfredi C, Bocchi L, Scattoni ML, editors. Automatic newborn cry analysis: a non-invasive tool to help autism early diagnosis. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2012; San Diego, CA, USA: IEEE. 2012. Doi: 10.1109/ EMBC.2012.6346583\n\nMotlagh SHRE, Moradi H. Pouretemad H, editors. Using general sound descriptors for early autism detection: 2013 9th Asian Control Conference (ASCC) Control; 2013; Istanbul, Turkey: IEEE. 2013. Doi: 10.1109/ASCC.2013.6606386\n\n\nContext: Research on automated vocal analysis and cry analysis for early autism detection.", "metadata": { "doc_id": "Asd_Cry_patterns_42", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Barrow WJ, Jaworski M, Accardo PJ. Persistent toe walking in autism. J Child Neurol. 2011; 26(5):61921. https://doi.org/10.1177/0883073810385344 PMID: 21285033\n\nPaul R, Norbury CF. Language disorders from infancy through adolescence: Elsevier; 2012.\n\nJalilevand N, Ebrahimipour M. Pronoun acquisition in Farsi-speaking children from 12 to 36 months. J Child Lang Acquis Dev. 2013; 1(1):1-9.\n\nGoldstein S, Ozonoff S. Assessment of autism spectrum disorder: Guilford Publications; 2018.\n\nLund NJ, Duchan JF. Assessing children's language in naturalistic contexts: Prentice Hall; 1993.\n\nGilliam JE. Gilliam autism rating scale: GARS 2: Pro-ed; 2006.\n\nBerk L. Development through the lifespan: Pearson Education India; 2010.\n\nThree ZT. Diagnostic classification of mental health and developmental disorders of infancy and early childhood: Revised edition (DC: 0-3R). 
Washington, DC: Zero To Three Press; 2005.\n\nPaliwal KK, Lyons JG, Wójcicki KK, editors. Preference for 20-40 ms window duration in speech analysis. In: Proceedings of the 4th International Conference on Signal Processing and Communication Systems; 2010: Gold Coast, QLD, Australia: IEEE. Doi: 10.1109/ICSPCS.2010.5709770\n\nMolau S, Pitz M, Schluter R, Ney H. Computing Mel-frequency cepstral coefficients on the power spectrum. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing Proceedings (Cat No01CH37221); 2001; Salt Lake City, UT, USA: IEEE. Doi: 10.1109/ICASSP. 2001. 940770\n\nEsposito G, Venuti P. Developmental changes in the fundamental frequency (f0) of infants' cries: a study of children with Autism Spectrum Disorder. Early Child Dev Care. 2010; 180(8):1093-102.\n\nMarchi E, Schuller B, Baron-Cohen S, Golan O, Bölte S, Arora P, et al. Typicality and emotion in the voice of children with autism spectrum condition: Evidence across three languages. In: Proceedings of the INTERSPEECH 2015; 2015; Dresden, Germany: ISCA. 2015. p. 115-119. Available from: https:// www.isca-speech.org.\n\n\nContext: A bibliography of sources related to autism spectrum disorder, child development, language acquisition, and signal processing techniques for analyzing speech.", "metadata": { "doc_id": "Asd_Cry_patterns_43", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Belalcázar-Bolaños E.A., Orozco-Arroyave J.R., Vargas-Bonilla J.F., Haderlein T., Nöth E. Glottal Flow Patterns Analyses for Parkinson's Disease Detection: Acoustic and Nonlinear Approaches. In: Sojka P., Horák A., Kopeček I., Pala K., editors. Text, Speech, and Dialogue: Proceedings of the 19th International Conference on Text, Speech, and Dialogue; 2016 Sep 12-16; Brno, Czech Republic. Cham: Springer; 2016. Doi: 10.1007/978-3-319-45510-5_46\n\nRabiner LR, Schafer RW. Introduction to digital speech processing. Found and trends in signal process. 2007; 1(1):1-194.\n\nPeeters G. A large set of audio features for sound description (similarity and classification) in the CUIDADO project. CUIDADO IST Proj Rep. 2004; 54(0):1-25.\n\nBone D, Lee C-C, Black MP, Williams ME, Lee S, Levitt P, et al. The psychologist as an interlocutor in autism spectrum diorder assessment: Insights from a study of spontaneous prosody. J Speech Lang Hear Res. 2014; 57(4):1162-77. https://doi.org/10.1044/2014_JSLHR-S-13-0062 PMID: 24686340\n\nHänsler E, Schmidt G. Speech and audio processing in adverse environments: Springer Science \\& Business Media; 2008. https://doi.org/10.1055/s-2008-1065331 PMID: 18473287\n\nTheodoridis S, Koutroumbas K. Pattern recognition: Elsevier; 2003.\n\nLi TL, Chan AB. Genre classification and the invariance of MFCC features to key and tempo. In: Lee KT., Tsai WH., Liao HY.M., Chen T., Hsieh JW., Tseng CC., editors. Advances in Multimedia Modeling: Proceedings of the 17th International MultiMedia Modeling Conference; 2011 Jan 5-7; Taipei, Taiwan. Berlin, Heidelberg: Springer; 2011. Doi: 10.1007/978-3-642-17832-0_30\n\nTakahashi T, Yoshimura Y, Hiraishi H, Hasegawa C, Munesue T, Higashida H, et al. Enhanced brain signal variability in children with autism spectrum disorder during early childhood. Hum Brain Mapp. 2016; 37(3):1038-50. https://doi.org/10.1002/hbm.23089 PMID: 26859309\n\nLester BM, Boukydis CZ. 
Infant crying: Theoretical and research perspectives: Springer; 1985.\n\n\nContext: A bibliography of sources related to speech processing, autism, and related fields.", "metadata": { "doc_id": "Asd_Cry_patterns_44", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: Lester BM, Boukydis CZ. Infant crying: Theoretical and research perspectives: Springer; 1985.\n\nMacDonald M, Lord C, Ulrich D. The relationship of motor skills and adaptive behavior skills in young children with autism spectrum disorders. Res Autism Spectr Disord. 2013; 7(11):1383-90. https://doi. org/10.1016/j.rasd.2013.07.020 PMID: 25774214\n\nHoy MB. Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants. Med Ref Serv Q. 2018; 37 (1):81-8. https://doi.org/10.1080/02763869.2018.1404391 PMID: 29327988\n\n\nContext: References related to infant crying, autism spectrum disorders, and voice assistants.", "metadata": { "doc_id": "Asd_Cry_patterns_45", "source": "Asd_Cry_patterns" } }, { "page_content": "Text: SCIENTIFIC REPORTS\n\nnatureresearch\n\nOPEN\n\nMulti-modular AI Approach to Streamline Autism Diagnosis in Young Children\n\nHalim Abbas ${ }^{1}$, Ford Garberson ${ }^{1}$, Stuart Liu-Mayo ${ }^{1}$, Eric Glover ${ }^{1 *}$ \\& Dennis P. Wall ${ }^{2}$\n\nAbstract\n\nAutism has become a pressing healthcare challenge. The instruments used to aid diagnosis are time and labor expensive and require trained clinicians to administer, leading to long wait times for at-risk children. We present a multi-modular, machine learning-based assessment of autism comprising three complementary modules for a unified outcome of diagnostic-grade reliability: A 4-minute, parentreport questionnaire delivered via a mobile app, a list of key behaviors identified from 2-minute, semistructured home videos of children, and a 2-minute questionnaire presented to the clinician at the time of clinical assessment. We demonstrate the assessment reliability in a blinded, multi-site clinical study on children 18-72 months of age ( $\\mathrm{n}=375$ ) in the United States. It outperforms baseline screeners administered to children by 0.35 ( $90 \\%$ CI: 0.26 to 0.43 ) in AUC and 0.69 ( $90 \\%$ CI: 0.58 to 0.81 ) in specificity when operating at $90 \\%$ sensitivity. Compared to the baseline screeners evaluated on children less than 48 months of age, our assessment outperforms the most accurate by 0.18 ( $90 \\%$ CI: 0.08 to 0.29 at $90 \\%$ ) in AUC and 0.30 ( $90 \\%$ CI: 0.11 to 0.50 ) in specificity when operating at $90 \\%$ sensitivity.\n\n\nContext: A machine learning-based assessment of autism for young children, demonstrating improved diagnostic reliability compared to baseline screening tools.", "metadata": { "doc_id": "Abbas_2020_0", "source": "Abbas_2020" } }, { "page_content": "Text: Idiopathic forms of Autism Spectrum Disorder (ASD) have no known biological cause and may correspond to multiple conditions with similar symptoms. The incidence of ASD has increased in recent years, and it impacts 1 in 59 children according to the latest studies ${ }^{1}$. ASD is diagnosed from clinical observations according to standard criteria ${ }^{2}$ relating to the child's social and behavioral symptoms. Autism is said to be on a spectrum due to the varied severities of symptoms, ranging from relatively mild social impairment to debilitating intellectual disabilities, inabilities to change routines and severe sensory reactions ${ }^{2}$. 
Approximately $25-50 \\%^{3}$ of autistic children are non-verbal and have severe symptoms.\n\nNotably, diagnosis within the first few years of life dramatically improves the outlook of children with autism, as it allows for treatment during a key window of developmental plasticity ${ }^{6,5}$. Unfortunately, the latest studies show that although $85 \\%$ of parents of children with autism reported developmental concerns about their children by 36 months of age, the median age of diagnosis in the United States is 52 months ${ }^{1}$. The complexity of the diagnostic procedures and the shortage of trained specialists can result in children with ASD not getting a diagnosis early enough to receive behavioral therapies during the time when they are most effective.\n\n\nContext: The challenges and impact of Autism Spectrum Disorder (ASD) diagnosis, including delayed diagnosis and its consequences.", "metadata": { "doc_id": "Abbas_2020_1", "source": "Abbas_2020" } }, { "page_content": "Text: Diagnosing autism in the United States generally takes two steps: developmental screening followed by comprehensive diagnostic evaluation if screened positive ${ }^{6}$. Screening instruments typically use questionnaires that are answered by a parent, teacher or clinician ${ }^{7,8}$. They are generally easy and inexpensive to administer and can be useful to flag some at-risk children, however, they are not always accurate enough to help inform a diagnosis ${ }^{9}$. Standard autism screeners can also have a high false positive rate, leading to unnecessary referrals and healthcare costs ${ }^{10}$. Comprehensive diagnostic evaluation instruments, on the other hand, are more accurate but require long and expensive interactions with highly trained clinicians ${ }^{11,12}$.\n\nIn this paper, we present improvements to two previously published ${ }^{13}$ automated autism assessment modules underlying the Cognoa ${ }^{14}$ software. The first module is based on a brief questionnaire about the child presented directly to parents without supervision. The second module is based on lightly trained analysts evaluating short videos of children within their natural environment that are captured by parents using a mobile device. We also present a new, third module that is intended to be completed in a primary care setting such as a pediatrician's office during a clinic visit. The third module is based upon a questionnaire that is answered by a clinician after examining the child and talking to the parent. We demonstrate that these three modules are as fast and easy to\n\n[^0] [^0]: ${ }^{1}$ Cognoa Inc., Palo Alto, CA, USA. ${ }^{2}$ Departments of Pediatrics, Biomedical Data Science and Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA. *email: eri_g@ericglover.com\n\nadminister as most of the typical screening instruments, yet their combined assessment accuracy is shown in this work to be significantly higher, such that they may be used to aid in diagnosis of autism.\n\n\nContext: The authors describe their work on automated autism assessment modules and how they improve upon traditional screening and diagnostic methods.", "metadata": { "doc_id": "Abbas_2020_2", "source": "Abbas_2020" } }, { "page_content": "Text: We present our approach to selecting maximally predictive features for each of the modules. 
Both the parent and the clinician questionnaire modules key on behavioral patterns similar to those probed by a standard autism diagnostic instrument, the Autism Diagnostic Interview - Revised (ADI-R)^{11}. ADI-R is administered by a trained clinician, and typically gives consistent results across examiners. But its 93 point questionnaire often spanning 2.5 hours of the interviewer and parent's time makes it largely impractical for the primary care setting^{15}. The video assessment module keys on behavioral patterns similar to those probed in another diagnostic instrument, the Autism Diagnostic Observation Schedule (ADOS)^{12}. ADOS is a multi-modular diagnostic instrument, with different modules for subjects at different levels of cognitive development. It is widely considered a gold standard and is one of the most common behavioral instruments used to aid in the diagnosis of autism^{16}. It consists of an interactive and structured examination of the child by trained clinicians in a tightly controlled setting.\n\n\nContext: The authors describe their feature selection approach for modules using parent/clinician questionnaires and video assessments, referencing established diagnostic instruments (ADI-R and ADOS) and their limitations.", "metadata": { "doc_id": "Abbas_2020_3", "source": "Abbas_2020" } }, { "page_content": "Text: For validation, the three modules are applied to assess children in a clinical study using the Cognoa^{14} software. To-date, Cognoa has been used by over 300,000 parents in the US and internationally. The majority of Cognoa users are parents of young children between 18 and 48 months. The clinical study underlying the validation results discussed in the results section consists of a total of 375 at-risk children who had undergone full clinical examination and received a clinical diagnosis at a center specialized in neurodevelopmental disorders^{17}. The outputs of the assessment modules are compared to those of three screening instruments. The Modified Checklist for Autism in Toddlers, Revised (M-CHAT-R)^{7} is a parent-completed questionnaire for autism that is intended to be administered during developmental screenings for children between the ages of 16 and 30 months and is commonly used as an autism screening instrument. The Social Responsiveness Scale - Second Edition (SRS) is another standard ASD screener that is based upon a questionnaire filled out by an examiner^{18--20}. The SRS has a preschool form intended for children of ages 30 months to 54 months, and a school age form intended for children of ages 48 months through 18 years of age. We use SRS “total score” scale as a baseline autism assessment. The Child Behavior Checklist (CBCL)^{9} is a parent-completed questionnaire that provides risk assessments in many categories. We use the “Autism Spectrum Problems” scale of CBCL for comparison. In all cases, the answers to the questions comprising the screeners are coded, then the codes are summed and the sum compared against a threshold to determine whether the child is at risk.\n\nMethods\n\nWe base our approach on de-identified historical patient records. 
We collect medical instrument score sheet data pertaining to children tested for suspicion of autism, and process those into training sets for the predictive models underlying each of our three autism assessment modules.\n\n\nContext: Validation of the Cognoa software using a clinical study and comparison to established screening instruments.", "metadata": { "doc_id": "Abbas_2020_4", "source": "Abbas_2020" } }, { "page_content": "Text: Since we apply said predictive models in a significantly different setting than the clinics where the corresponding training data were generated, we expect a consequential performance degradation resulting in unacceptable diagnostic accuracy if conventional machine learning methods are used^{13}. To counteract that effect, we apply custom machine learning techniques as detailed in this section, building upon previous experimental work^{15}. The new techniques discussed below are empirical post-hoc feature selection, training data noise injection, and an overfitting-resilient probabilistic combination of module outcomes.\n\nData. Training data were compiled from multiple repositories of de-identified ADOS and ADI-R score sheets of children between 18 and 84 months of age including Boston Autism Consortium, Autism Genetic Resource Exchange, Autism Treatment Network, Simons Simplex Collection, and Vanderbilt Medical Center. To counteract class imbalance, the sample set negative class was supplemented with 59 low risk children random-sampled from Cognoa's user-base, and ADI-R was administered on those additional controls.\n\n\nContext: The authors describe methods to improve the accuracy of their machine learning models for autism diagnosis, addressing performance degradation due to differences between training and application settings.", "metadata": { "doc_id": "Abbas_2020_5", "source": "Abbas_2020" } }, { "page_content": "Text: The diagnostic accuracy of our modules was measured using data from a multi-site blinded clinical validation study (reviewed and approved by Western IRB project number 2202803)^{17}. The study was performed in 2016 and 2017 at three tertiary care centers in the United States. Informed consent was obtained from guardians of each child, and all relevant regulations and guidelines were followed. Children enrolled in the study were 18 to 72 months of age, of English-speaking households, and were all referred through the typical referral process for suspicion of autism. Every child was measured using autism assessment instruments (such as ADOS, M-CHAT-R, and/or CBCL) as appropriate for his or her age. Diagnosis was ultimately ascertained by a licensed health care provider. Prior to the clinical assessment, parents used the Cognoa mobile app to complete the parent questionnaire and video assessment modules, and starting in 2017, a clinician also completed the Cognoa clinician questionnaire. The clinicians were blinded to the results of the assessment rendered by Cognoa. More details on the steps of the clinical study are shown in Fig. 1.\n\nThe enrollment process in 2016 yielded 162 validation samples, which were used to validate the parent questionnaire and video modules. This same clinical enrollment cohort was used as validation dataset in our previous publication on the subject^{15}. 
Given the learnings from this dataset, and prior to the extension of the study in 2017, several improvements were made to the algorithms including tuning of model thresholds, training combination modules, and performing feature selection for the clinician module which was newly introduced in 2017. The enrollment process in 2017 yielded 213 additional validation participants, bringing the total N to 375 samples over the course of the two years.\n\n\nContext: Clinical validation study design and participant details.", "metadata": { "doc_id": "Abbas_2020_6", "source": "Abbas_2020" } }, { "page_content": "Text: The sample breakdown by cohort, age group, and diagnosis for all data used for training and validation is shown in Table 1. In both the training and the validation datasets, the “Not autism” class label is composed mostly of children who are diagnosed with an alternate developmental delay (e.g., ADHD or speech and language disorder). Since these conditions share many symptoms with autism, this is a particularly challenging sample for classification. Only seven of the children in the validation sample are neurotypical, suggesting that this sample will be harder to perform correct classifications on than in the general population.\n\nimg-0.jpeg\n\nFigure 1. Detailed steps performed during the clinical study described in this document.\n\n| Age (years) | Condition | Parent/Clinician module training (N) | Video module training (N) | Clinical validation 2016 (N) | Clinical validation 2017 (N) |\n| --- | --- | --- | --- | --- | --- |\n| $<4$ | Autism | 414 | 1445 | 75 | 91 |\n| $<4$ | Not autism | 207 | 539 | 20 | 30 |\n| $\geq 4$ | Autism | 1885 | 1865 | 46 | 60 |\n| $\geq 4$ | Not autism | 180 | 410 | 21 | 32 |\n\nTable 1. Dataset breakdown by age group and condition for each of the sources of training data and for the clinical validation sample. Machine learning model training was stratified by age group. Clinical validation 2016 and 2017 samples are used together to evaluate performance of the parent and video modules in this paper, while the clinician module was only available for the clinical 2017 dataset.\n\nAlgorithm methodology. In this section we explain important aspects of our machine learning methodology that are common to the classifiers underlying each of our three assessment modules.\n\n\nContext: Dataset composition and machine learning methodology", "metadata": { "doc_id": "Abbas_2020_7", "source": "Abbas_2020" } }, { "page_content": "Text: Algorithm methodology. In this section we explain important aspects of our machine learning methodology that are common to the classifiers underlying each of our three assessment modules.\n\nTraining procedure. Classifier training, feature selection, and optimization were done separately for children under four years of age and four years of age and over. The parent questionnaire and clinician questionnaire classifiers make predictions based off of answers to questions that probe similar concepts to the ADI-R questionnaire. They were trained using the answers to questions from historical item-level ADI-R score sheets with labels corresponding to established clinical diagnoses. The video module makes predictions based off of answers to questions that probe similar concepts to the ADOS instrument, as recorded by video analysts. It was trained using ADOS instrument score sheets and diagnostic labels. Progressive sampling was used to verify sufficient training volume as detailed in the supplementary materials.
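A minimal sketch of the training-set assembly that the training procedure above implies: item-level instrument answers become feature columns, the established clinical diagnosis becomes the label, and children are siloed by age group before any model is fit. The column names and records below are hypothetical, not the actual score-sheet schema.

```python
import pandas as pd

# Hypothetical item-level score-sheet records: one row per child, coded answers to
# instrument items plus the established clinical diagnosis used as the label.
records = pd.DataFrame([
    {"child_id": "a1", "age_months": 30, "item_01": 2, "item_02": 1, "diagnosis": "ASD"},
    {"child_id": "a2", "age_months": 55, "item_01": 0, "item_02": 0, "diagnosis": "non-ASD"},
    {"child_id": "a3", "age_months": 41, "item_01": 1, "item_02": 2, "diagnosis": "ASD"},
    {"child_id": "a4", "age_months": 62, "item_01": 2, "item_02": 2, "diagnosis": "ASD"},
])

def make_silo(df, younger):
    """Return (features, labels) for one age silo: under 4 years (48 months) or 4 and over."""
    silo = df[df["age_months"] < 48] if younger else df[df["age_months"] >= 48]
    X = silo.filter(like="item_")                   # item-level answer codes as features
    y = (silo["diagnosis"] == "ASD").astype(int)    # binary label from the clinical diagnosis
    return X, y

X_young, y_young = make_silo(records, younger=True)
X_old, y_old = make_silo(records, younger=False)
print(len(X_young), len(X_old))  # each silo is modeled separately
```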
Gradient boosted decision trees were used for all three modules as they consistently performed better than other options that were considered such as neural networks, support vector machines, and logistic regression. For all models, hyper-parameters were tuned with a bootstrapped grid search. In all cases, true class labels (ASD or non-ASD) were used to stratify the folds, and (age, label) pairs were used to weight-balance the samples. More details can be found in the supplementary materials.\n\n\nContext: Within the \"Algorithm methodology\" section of a paper detailing a machine learning approach to autism assessment.", "metadata": { "doc_id": "Abbas_2020_8", "source": "Abbas_2020" } }, { "page_content": "Text: In all cases, the machine learning models were trained using historical patient records that correspond to controlled clinical examinations, but focused on application in non-clinical settings aimed for brevity, ease-of-use, and/or unsupervised parent usage at home. These differences introduce biases which can be significant enough to ruin the performance of an algorithm if not properly addressed, and which cannot be probed by cross validation. See the supplementary material for further details. New strategies to address these biases are discussed below that result in big improvements in accuracy compared to previous work ${ }^{15}$.\n\nimg-1.jpeg\n\nFigure 2. An illustration of the methodology for training diagnostic assessment algorithms capable of outputting one of three possible outcomes: “positive”, “negative”, or “inconclusive”. The first binary classifier is only used to assist in training and never at runtime. It is trained to predict binary “autism” vs “not autism”, and these labels are then compared with the true ASD results to label which samples are incorrectly classified. The samples with their “correct” and “incorrect” labels are used to train the classifiers at runtime. A “indeterminate” classifier is trained to predict which samples will have their ASD diagnosis misclassified, which serves as a filter to identify “inconclusive” cases at runtime, while only the predicted “correct” samples are used to train the final binary ASD diagnosis classifier.\n\n\nContext: Machine learning model training and bias mitigation strategies for autism diagnosis.", "metadata": { "doc_id": "Abbas_2020_9", "source": "Abbas_2020" } }, { "page_content": "Text: Inconclusive outcomes. Each of the three modules predicts one of three assessment outcomes: Positive, negative, and inconclusive. As outlined in Fig. 2, support for inconclusive determination is incorporated using a process that involves three separate machine learning training runs. The first model is trained to make predictions that are used to label the samples in the training data that are the most likely to be misclassified. A second model is then trained using these labels to predict the likelihood of any new samples being misclassified. Finally, only samples that are likely to be classified correctly are used to train a final, binary autism classifier. Only the latter two models are used at prediction time: the one to identify and filter out samples that should be labeled “inconclusive”, and the other to make a binary prediction of whether the child is autistic in those which are likely to be correctly labeled. More details about how these models are trained are available in the supplementary material.\n\nParental module. Initial feature selection. 
The parental questionnaire probes for a minimal set of highly relevant child behavioral patterns that are maximally predictive of autism in combination. Care is taken to phrase the questions and answers such that the most reliable signal can be input from everyday parents undertaking the questionnaire via mobile app without clinical assistance.\n\nTo that effect, a custom feature selection method is devised involving robust bootstrap-driven backwards subtraction, the details of which are discussed in a previous publication^{13}. Out of an initial set of 93 questions under consideration, this produces an optimal set of 17 novel questions for children less than four years old, and 21 questions for children four and older.\n\n\nContext: Within a section detailing the methodology of a machine learning model for autism detection, specifically describing the \"Inconclusive outcomes\" and \"Parental module\" components.", "metadata": { "doc_id": "Abbas_2020_10", "source": "Abbas_2020" } }, { "page_content": "Text: Empirical post-hoc feature selection refinement. Following the conclusion of the 2016 clinical validation study enrollment, we studied differences in the distribution of answers to each question between the training data and the validation data that was collected in the 2016 clinical study. While some questions had quite good agreement, others show a strong bias towards higher (or lower) severity answer choices in the clinical data than in the training data. Questions for which the mean absolute severity difference was statistically greater than three standard errors (averaged over the autism and the non-autism samples) are rejected. This requirement results in the exclusion of 4 out of the 17 questions in the younger cohort, and 8 out of 21 questions in the older cohort, and the models are re-trained (with new hyper-parameter tuning) on the reduced feature set. This further refinement of the selected features minimizes the significant biases due to differences between the training and application environment. See the supplementary material for more details on these differences.\n\nThis feature refinement leads to a larger boost in performance compared with^{15} than any other improvement. The size of the performance improvement is validated on the held-out sample of children collected during 2017, where the new models show a statistically equivalent increase in performance compared with the 2016 sample.\n\nVideo module. The video assessment module consists of a parent upload of 2 or 3 mobile videos, each 1 to 2 minutes in length, of the child during play or meal time at home. The underlying algorithm produces autism assessments based upon the responses of at least three minimally-trained analysts who watch the videos and then respond to a behavioral questionnaire.\n\n\nContext: Following a description of the video assessment module, the authors detail a post-hoc feature selection refinement process to minimize biases between training and validation data.", "metadata": { "doc_id": "Abbas_2020_11", "source": "Abbas_2020" } }, { "page_content": "Text: The data available for training the video module's classification model are taken from ADOS sessions administered by clinicians in standardized clinical settings. Gradient boosted decision trees are trained keying off of the features identified in the analysis of ADOS records. The questionnaires that the video analysts answer are then created to probe for similar behavioral features as those observed in the training data. 
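Returning to the empirical post-hoc refinement described above for the parent questionnaire, this is a minimal sketch of one way to read the rejection rule: compare mean answer severity between the training data and the clinical sample, average the discrepancy over the autism and non-autism groups, and reject the question when it exceeds three standard errors. The severity codes and the exact form of the standard error are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def reject_question(train_by_class, clinic_by_class, n_se=3.0):
    """One reading of the rejection rule: the mean absolute severity difference between
    training and clinical answers, averaged over the autism and non-autism groups,
    must not exceed `n_se` standard errors (also averaged over the two groups)."""
    diffs, ses = [], []
    for group in ("autism", "non_autism"):
        train = np.asarray(train_by_class[group], dtype=float)
        clinic = np.asarray(clinic_by_class[group], dtype=float)
        diffs.append(abs(train.mean() - clinic.mean()))
        # standard error of the difference between the two group means
        ses.append(np.sqrt(train.var(ddof=1) / len(train) + clinic.var(ddof=1) / len(clinic)))
    return np.mean(diffs) > n_se * np.mean(ses)

# Hypothetical severity codes (0-3) for a single question.
train_answers = {"autism": [2, 2, 3, 1, 2, 3], "non_autism": [0, 1, 0, 1, 0, 0]}
clinic_answers = {"autism": [3, 3, 3, 2, 3], "non_autism": [1, 2, 1, 2, 1]}
print(reject_question(train_answers, clinic_answers))
```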
A challenge of this methodology is that the module must make predictions in the face of missing features that are not observable in the short videos uploaded by the parents. The video analysts are allowed to skip any questions if not answerable based on\n\nthe posted videos. On average, analysts skipped questions 15% of the time, with big variations among particular questions. This effect, combined with the large discrepancy in the observation time window from the original clinical examination to the brief home-video version, would result in a big assessment-accuracy degradation unless steps are taken to correct for the bias and variance .\n\n\nContext: Machine learning model development for autism detection using video analysis.", "metadata": { "doc_id": "Abbas_2020_12", "source": "Abbas_2020" } }, { "page_content": "Text: We tackle this problem by introducing bias and variance to the training data in a manner designed to make it statistically similar to the video analyst answers on which the assessments will be run. The data from the 2016 clinical study^{13} is used to develop this methodology, and the performance of the algorithm on the data from the children enrolled in 2017 is used to validate the generalizability of the improvements. Most children who participated in the clinical study are also administered a full ADOS, which provided paired ADOS and video data that we use to determine what noise patterns to add. Using these paired data, we construct a probability map for each question-response set describing the ways video analysts are likely to respond for a given “true” ADOS response. We then use the mapping as a stochastic transform to build a new training data set that can be thought of as the results of a hypothetical experiment in which the technicians watch parent-supplied video of the children in the training data and respond accordingly.\n\nThe addition of simulated “setting noise” to the classifier training data leads to a larger boost in performance compared with^{15} than any other improvement^{13}. Additionally, the optimal parameters for the resulting decision tree models favor larger tree depth. This is as expected, since the new models are expected to make determinations as to which features are reliable when present, as well as which features to fall back on when the best features are missing.\n\n\nContext: The authors describe a method to improve the performance of their autism detection algorithm by introducing statistical noise to the training data to mimic the variability in human video analyst responses.", "metadata": { "doc_id": "Abbas_2020_13", "source": "Abbas_2020" } }, { "page_content": "Text: Clinician module. We introduce a module to screen for autism using questionnaire responses from a clinician. A pediatrician might answer these questions during a regular checkup. The questions for the clinician were selected in a similar manner as used for the Parental Module (see the supplementary material for details). Responses from both the parent and the clinician are used in a machine learning module in the same manner as described for the parental questionnaire above. Some key behaviors are probed via questions directed at both the parent and the clinician, but the clinician questions are more nuanced and allow for more subtle answer choices. In cases where the parent and the clinician give contradictory answers to the same question, the clinician's answer overrides that of the parent. 
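Looking back at the video module's "setting noise" procedure described above, here is a minimal sketch of the idea: use paired ADOS and video-analyst answers to estimate, per question, how analysts tend to respond (including skipping) given the true ADOS answer, then push the clean training answers through that probability map. The data structures, names, and example values are hypothetical.

```python
import random
from collections import defaultdict

SKIP = None  # analysts may skip questions that are not observable in the short videos

def fit_response_map(paired):
    """paired: (question_id, ados_answer, analyst_answer) triples from children who received
    both a full ADOS and a video assessment. Returns, for each (question, ados_answer),
    the empirical distribution over analyst answers, including skips."""
    counts = defaultdict(lambda: defaultdict(int))
    for question, ados, analyst in paired:
        counts[(question, ados)][analyst] += 1
    return {key: [(ans, n / sum(dist.values())) for ans, n in dist.items()]
            for key, dist in counts.items()}

def inject_setting_noise(question, ados_answer, response_map, rng=random):
    """Stochastically transform a clean ADOS answer into a plausible analyst answer."""
    dist = response_map.get((question, ados_answer))
    if not dist:
        return ados_answer  # no paired data for this cell: leave the answer unchanged
    answers, weights = zip(*dist)
    return rng.choices(answers, weights=weights, k=1)[0]

# Hypothetical paired observations for one question, then five noisy resamples.
paired = [("q1", 2, 2), ("q1", 2, 1), ("q1", 2, SKIP), ("q1", 0, 0), ("q1", 0, 0)]
response_map = fit_response_map(paired)
print([inject_setting_noise("q1", 2, response_map) for _ in range(5)])
```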
The clinician module was introduced to the clinical validation study beginning in 2017. Its results are therefore based on a smaller sample size than those of the other modules.\n\n\nContext: Within the description of the system's modules for autism screening.", "metadata": { "doc_id": "Abbas_2020_14", "source": "Abbas_2020" } }, { "page_content": "Text: Feature selection. In order to create a brief clinician questionnaire appropriate for the primary care setting, multiple lists of candidate questions are each compiled and ordered using different strategies. The lists are then intersected and prioritized, then the top features in the intersection set are shortlisted. The first list of candidate questions is prepared by considering those questions from the original medical instruments that had been excluded from the parental questionnaire because they were deemed too difficult for a parent to answer reliably. This list is ranked by feature importance values as measured and ranked by a dedicated offline machine learning training and cross validation run in the same manner as performed for initial parental module feature selection. The second list is prepared from the parental questionnaire questions by simulating the effects of parents over or underestimating answer severities on children with machine learning responses near a decision threshold. Children in the training data for whom the model response was between [0 and 0.1] above the ASD-vs-non ASD decision threshold had their question severities dropped one at a time by one severity value, while children who were between [0 and 0.1] below the decision threshold had their question severities raised by one severity category. The questions in this list are then ranked based upon the average size of the resulting shift in model responses. The procedure is repeated for children in the training data between [0.1 and 0.3] above or below the decision threshold. In each case the top 7 questions are selected (with significant overlap). This results to a total of 9 candidate questions for young children and 10 for older children. The third list is prepared by consulting domain experts for an assessment of the likelihood of each candidate question to benefit from a clinician's input as a complement to the parent's input. This method is conducted separately for each of the two age-silo groups, and results in an overall\n\n\nContext: The authors describe their feature selection process for creating a brief clinician questionnaire to complement parental input in autism diagnosis.", "metadata": { "doc_id": "Abbas_2020_15", "source": "Abbas_2020" } }, { "page_content": "Text: each candidate question to benefit from a clinician's input as a complement to the parent's input. This method is conducted separately for each of the two age-silo groups, and results in an overall clinician questionnaire of 13 questions for children 18 through 47 month old, and 15 questions for children 4 to six years old.\n\n\nContext: Development of a clinician questionnaire for autism screening.", "metadata": { "doc_id": "Abbas_2020_16", "source": "Abbas_2020" } }, { "page_content": "Text: Module combination. Due to limitations on available training data, it is not possible to train a single combined model that uses the input features from each of the parental, video, and clinician modules. 
Instead, responses from the modules are each considered to be a probability and combined mathematically^{21} using the equation:\n\n$$ r_{\mathrm{comb}}=\left(I^{\mathrm{T}} \Sigma^{-1} R\right)\left(I^{\mathrm{T}} \Sigma^{-1} I\right)^{-1} \quad (1) $$\n\nwhere $r_{\mathrm{comb}}$ is the result of the combination, $I$ is a vector of 1s, $R$ is a vector of responses for each module to be combined, and $\Sigma$ is the covariance matrix of the response residuals compared to the true diagnosis. The “training” of the combination module consists of calculating the values of $\Sigma$ to use in this equation, which is done using the responses of each module on data from the clinical study. For each child, the $\Sigma$ values in the $r_{\mathrm{comb}}$ equation were calculated with that child excluded. This process is similar to leave-one-out cross validation, and ensures that the results reported for our combination procedure do not suffer from overfitting.\n\nSince Eq. (1) produces only a single model response, the determination of “inconclusive” outcomes proceeds in a different manner than for the individual assessment modules. Both a lower and an upper threshold are applied on the combined response. Children with a response less than both thresholds are considered to be non-ASD, children with a response in between the two thresholds are considered to be inconclusive, and children with a response greater than both thresholds are considered to have ASD. As in the single model cases, the two thresholds can be tuned independently to optimize the sensitivity, specificity, and model coverage.\n\nimg-2.jpeg\n\n\nContext: The authors describe their approach to combining the outputs of separate modules (parental, video, clinician) to produce a final ASD diagnosis, detailing the mathematical equation used and how it's trained.", "metadata": { "doc_id": "Abbas_2020_17", "source": "Abbas_2020" } }, { "page_content": "Text: img-2.jpeg\n\nFigure 3. ROC curves on the clinical sample for the parent, video, and clinician modules, separately and in combination. Inconclusive determination is allowed for up to $30 \%$ of the cases. The established screening tools M-CHAT-R, SRS-2 and CBCL are compared as baselines. The ROC curve for the M-CHAT-R baseline instrument only includes children under four years of age since M-CHAT-R is not applicable for older children.\n\nimg-3.jpeg\n\nFigure 4. ROC curves on kids $<4$ years of age in the clinical sample for the parent, video, and clinician modules, separately and in combination. Inconclusive determination is allowed for up to $30 \%$ of the cases. The established screening tools M-CHAT-R, SRS-2 and CBCL are compared as baselines.\n\nResults\n\nEach of the individual Cognoa assessment modules, their combinations, as well as 3 baselines based on commonly-used autism screening instruments (CBCL, M-CHAT-R, and SRS) are evaluated on the data collected during a blinded clinical study. When the inconclusive determination feature is turned off and all samples are required to be assessed conclusively, the Cognoa assessment modules achieve ROC AUC up to 0.83 and sensitivity and specificity up to $80 \%$ and $75 \%$ respectively. Turning on the inconclusive determination feature with an allowance of up to $30 \%$ inconclusive outcomes results in an accuracy improvement over the conclusive samples, with AUC up to 0.92 with sensitivity and specificity up to $90 \%$ and $83 \%$ respectively.
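A minimal NumPy sketch of the combination rule in Eq. (1) and the dual-threshold decision described above; the module responses, covariance matrix, and thresholds are illustrative values, not the ones fitted in the study.

```python
import numpy as np

def combine_modules(responses, sigma):
    """Eq. (1): r_comb = (1^T Sigma^-1 R) / (1^T Sigma^-1 1), i.e. a covariance-weighted
    average of the module responses, with Sigma the covariance of the modules' residuals
    against the true diagnosis."""
    ones = np.ones(len(responses))
    sigma_inv = np.linalg.inv(sigma)
    return float(ones @ sigma_inv @ responses) / float(ones @ sigma_inv @ ones)

def decide(r_comb, lower, upper):
    """Dual-threshold rule: below both thresholds -> non-ASD, between them -> inconclusive,
    above both -> ASD."""
    if r_comb < lower:
        return "non-ASD"
    if r_comb > upper:
        return "ASD"
    return "inconclusive"

# Illustrative responses from the parent, video, and clinician modules and an
# illustrative residual covariance matrix (the study estimates Sigma leave-one-out).
R = np.array([0.72, 0.55, 0.81])
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.15, 0.03],
                  [0.01, 0.03, 0.08]])
print(decide(combine_modules(R, Sigma), lower=0.35, upper=0.65))
```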
This performance is shown to be a statistically significant improvement over each of the baselines used for comparison.\n\n\nContext: Results of a clinical study evaluating the performance of Cognoa assessment modules and comparing them to established autism screening tools.", "metadata": { "doc_id": "Abbas_2020_18", "source": "Abbas_2020" } }, { "page_content": "Text: ROC curves in Fig. 3 show how the parent module performs individually, as well as in combination with the video and clinician modules at a $30 \%$ inconclusive rate allowance. Figure 4 shows a similar comparison with all the variants consistently restricted to children under four years of age. ROC curves corresponding to the assessment modules with the inconclusive allowance turned off can be found in the supplementary material.\n\nStatistical model performance comparisons between assessment modules and baselines are shown in Table 2. For each comparison, the subset of children for whom both screeners were administered is selected ( $n$ in the table), and 10,000 bootstrapping experiments are run where $n$ children are selected with replacement. The average and $[5 \%, 95 \%]$ confidence interval improvements in AUC and the specificity between the screeners are evaluated across all bootstrapping experiments. In the case of specificity, the calculation of the improvement is performed using thresholds that are set to achieve $90 \%$ sensitivity.\n\nTable 2 shows that the Cognoa modules achieve an improvement of at least 0.26 in AUC and at least 0.52 in specificity compared with the CBCL and SRS-2 screeners at the $95 \%$ confidence level. Because the M-CHAT-R screener is only evaluated on younger children, the statistical uncertainty in that comparison is larger; however, it also shows an improvement of at least 0.08 in AUC and 0.11 in specificity at the $95 \%$ confidence level. In these comparisons we allow the Cognoa assessment modules to decide to hold aside up to $30 \%$ of the hardest cases as inconclusive. The same comparisons when we force the classification on all of the hardest cases can be found in Table 3 of the supplementary material.\n\n\nContext: Results of statistical model performance comparisons and ROC curve analysis.", "metadata": { "doc_id": "Abbas_2020_19", "source": "Abbas_2020" } }, { "page_content": "Text: Table 2. Statistical tests of performance improvements between models in this paper and standard baseline screening models. $\Delta \mathrm{AUC}$ tells us the increase in AUC found in the screeners of this paper across bootstrapping experiments. $\Delta$ Specificity tells us the increase in the specificity in the bootstrapping experiments at a threshold designed to achieve $90 \%$ sensitivity. Each $\Delta$ calculation shows the average value of the improvement along with the $[0.05,0.95]$ confidence interval.\n\nTime to completion comparisons. A random sample of 529 Cognoa users was used to measure the time to completion of each of the Cognoa autism assessment modules. The median time to completion of the parent module was just under 4 minutes. The median time to completion of the clinician module was 1.2 minutes. The median time for a video analyst to score a video was 20 minutes. More details can be found in the supplementary material. The results indicate that the parent and clinician modules can be completed in as little time as most established autism screeners and in some cases much faster, while achieving significantly higher accuracy.
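A minimal sketch of the bootstrap comparison behind Table 2, assuming scikit-learn is available: resample the paired children with replacement, recompute the AUC difference between two screeners on each resample, and report the mean with the [5%, 95%] interval; specificity is read off at the threshold whose sensitivity first reaches 90%. The data below are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def specificity_at_sensitivity(y, scores, target_sens=0.90):
    """Specificity at the first ROC threshold whose sensitivity reaches the target."""
    fpr, tpr, _ = roc_curve(y, scores)
    return 1.0 - fpr[np.argmax(tpr >= target_sens)]

def bootstrap_auc_delta(y, scores_a, scores_b, n_boot=10_000, seed=0):
    """Mean and [5%, 95%] interval of the AUC improvement of screener A over screener B,
    from resampling the paired children with replacement."""
    rng = np.random.default_rng(seed)
    deltas = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y[idx])) < 2:   # an AUC needs both classes in the resample
            continue
        deltas.append(roc_auc_score(y[idx], scores_a[idx]) - roc_auc_score(y[idx], scores_b[idx]))
    deltas = np.asarray(deltas)
    return deltas.mean(), np.percentile(deltas, [5, 95])

# Synthetic example where screener A separates the two classes better than screener B.
rng = np.random.default_rng(1)
y = np.array([0] * 60 + [1] * 60)
scores_b = np.clip(0.2 * y + rng.normal(0.4, 0.25, 120), 0, 1)
scores_a = np.clip(0.4 * y + rng.normal(0.3, 0.20, 120), 0, 1)
print("AUC delta (mean, [5%, 95%]):", bootstrap_auc_delta(y, scores_a, scores_b, n_boot=2000))
print("Specificity gain at 90% sensitivity:",
      specificity_at_sensitivity(y, scores_a) - specificity_at_sensitivity(y, scores_b))
```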
The time required for a video analyst to score a video is more lengthy, however, the turnaround time is faster than for an ADOS administration ${ }^{12}$ and can be performed by minimally trained analysts as opposed to certified clinical practitioners.\n\nDiscussion\n\n\nContext: Results of statistical performance improvements and time to completion comparisons of the Cognoa autism assessment modules.", "metadata": { "doc_id": "Abbas_2020_20", "source": "Abbas_2020" } }, { "page_content": "Text: Discussion\n\nWe presented a multi-modular assessment consisting of three machine learning modules for the identification of autism via mobile App as well as an evaluation of their performance and time-to-completion in a blinded clinical study. The assessment modules outperform conventional autism screeners, as shown in Table 2 and Fig. 3. The accuracy of the combined assessment is similar to that of gold-standard instruments such as ADOS and ADI-R ${ }^{22}$, without requiring hours of time from certified clinical practitioners. This suggests the potential for the Cognoa assessment to be useful as an autism diagnostic. The high performance of these modules benefits from the use of the techniques described in this paper to identify and set aside up to $30 \\%$ of the most challenging samples as inconclusive. The supplementary material of this paper shows that we outperform conventional autism screeners without this technique as well.\n\n\nContext: Results and Discussion of a novel multi-modular machine learning assessment for autism identification.", "metadata": { "doc_id": "Abbas_2020_21", "source": "Abbas_2020" } }, { "page_content": "Text: Important open questions remain. First, in all cases in this paper, the assessment modules were validated on children who had been preselected as having high risk of autism. Children that are pre-selected in this way tend to have autism-like characteristics regardless of their true diagnosis, increasing the challenge of distinguishing true ASD cases. These modules are expected to perform better on a general population sample of children. Further work is needed to verify this hypothesis by conducting clinical studies on children from the general population. Second, the clinician module newly presented in this work appears promising, but so far it has only been applied in a secondary-care setting. Further testing in primary care clinics is needed to validate accuracy in that setting. In addition, two wider avenues of exploration are interesting as further steps. First, while these assessment modules have been shown to be effective at identifying the presence or absence of autism, our goal is to extend them to identify the severity of the condition (if present) as well. Second, the techniques presented in this paper could potentially be used to build algorithms for other child behavioral conditions than autism, as well as behavioral conditions affecting adults and seniors.\n\nReceived: 12 March 2019; Accepted: 20 February 2020; Published online: 19 March 2020\n\nReferences\n\nBaiso, J. et al. Prevalence of autism spectrum disorder among children aged 8 years- autism and developmental disabilities monitoring network, 11 sites, united states, 2014. MMWR Surveill Samm 67 (No. S5-6), 1-23, https://doi.org/10.15585/mmwr. ss6706a1 (2018).\n\nAssociation., A. P. \\& Association., A. P. Diagnostic and statistical manual of mental disorders : DSM-5 (American Psychiatric Association Arlington, VA, 2013), 5th ed. edn.\n\nPatten, E., Ausderau, K. K., Watson, L. R. 
\\& Baranek, G. T. Sensory response patterns in nonverbal children with aad. Autism Res. Treat 2013, https://doi.org/10.1155/2013/436286 (2013).\n\n\nContext: Concluding remarks and future directions of a research paper on machine learning modules for autism assessment.", "metadata": { "doc_id": "Abbas_2020_22", "source": "Abbas_2020" } }, { "page_content": "Text: Patten, E., Ausderau, K. K., Watson, L. R. \\& Baranek, G. T. Sensory response patterns in nonverbal children with aad. Autism Res. Treat 2013, https://doi.org/10.1155/2013/436286 (2013).\n\nDawson, G. \\& Bernier, R. A quarter century of progress on the early detection and treatment of autism spectrum disorder. Dev. Psychopathol. 25, 1455-1472, https://doi.org/10.1017/S0954579413000710 (2013).\n\nDawson, G. et al. Randomized, controlled trial of an intervention for toddlers with autism: The early start denver model. Pediatrics 125, e17-e23 https://doi.org/10.1542/peds.2009-0958, http://pediatrics.aappublications.org/content/125/1/e17.full.pdf (2010).\n\nGordon-Lipkin, E., Foster, J. \\& Peacock, G. Whittling down the wait time exploring models to minimize the delay from initial concern to diagnosis and treatment of autism spectrum disorder. Pediatr. clinics North Am.63, 851-859 https://doi.org/10.1016/j. pcl.2016.06.007 (2016). Exported from https://app.dimensions.ai on 2018/10/19.\n\nBernier, R., Mao, A. \\& Yen, J. Diagnosing autism spectrum disorders in primary care. Practitioner 255(1745), 27-30 (2011).\n\nAchenbach, T. \\& Rescorla, L. Manual for the ASEBA preschool forms \\& profiles. Univ. Vermont, Res. Cent. for Child. Youth \\& Fam. (2000).\n\nNorris, M. \\& Lecavalier, L. Screening accuracy of level 2 autism spectrum disorder rating scales: A review of selected instruments. Autism, 14, 263-284, https://doi.org/10.1177/1362361309348071 (2010). PMID: 20591956,\n\nCharman, T. \\& Gotham, K. Measurement issues: Screening and diagnostic instruments for autism spectrum disorders - lessons from research and practise. Child Adolesc. Mental Heal 18, 52-63, https://doi.org/10.1111/j.1475-3588.2012.00664.x.\n\nLord, C., Rutter, M. \\& Le Couteur, A. Autism diagnostic interview-revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J. Autism Dev. Disord. 24, 659-685 (1994).\n\n\nContext: A list of references cited in the article, focusing on diagnostic tools and research related to autism spectrum disorder.", "metadata": { "doc_id": "Abbas_2020_23", "source": "Abbas_2020" } }, { "page_content": "Text: Lord, C. et al. Autism diagnostic observation schedule: a standardized observation of communicative and social behavior. J Autism Dev Disord 19, 185-212 (1989).\n\nAbbas, H., Garberson, F., Glover, E. \\& Wall, D. P. Machine learning approach for early detection of autism by combining questionnaire and home video screening. J. Am. Med. Informatics Assoc. ocyt039, 10.1093_jamia_ocyt039/1/ocyt039 (2018).\n\nCognoa, Inc. 2390 El Camino Real St 220, Palo Alto, CA 94306 https://www.cognoa.com/.\n\nWall, D. P., Dally, R. L., Luyster, R., Jung, J.-Y. \\& DeLuca, T. F. Use of artificial intelligence to shorten the behavioral diagnosis of autism. PLoS One, https://doi.org/10.1371/journal.pone.0043855. (2012).\n\nLord, C. et al. A multisite study of the clinical diagnosis of different autism spectrum disorders. Arch. Gen. Psychiatry 69, 306-313 (2012).\n\nKanne, S., Arnstein Carpenter, L. \\& Warren, Z. 
Screening in toddlers and preschoolers at risk for autism spectrum disorder: Evaluating a novel mobile-health screening tool. Autism Res. (2017).\n\nMoody, E. J. et al. Screening for autism with the srs and scq: Variations across demographic, developmental and behavioral factors in preschool children. J. Autism Dev. Disord. 47, 3550-3561 (2017).\n\nAldridge, F. J., Gibbs, V. M., Schmidhofer, K. \\& Williams, M. Investigating the Clinical Usefulness of the Social Responsiveness Scale (SRS) in a Tertiary Level, Autism Spectrum Disorder Specific Assessment Clinic. Journal of Autism and Developmental Disorders 42(8), 294-300 (2012).\n\nSchanding, G. T., Nowell, K. P. \\& Goin-Kochel, R. P. Utility of the Social Communication Questionnaire-Current and Social Responsiveness Scale as Teacher-Report Screening Tools for Autism Spectrum Disorders. Journal of Autism and Developmental Disorders 42(8), 1705-1716 (2012).\n\nJacobs, R. A. Methods for combining experts' probability assessments. Neural Computation 7, 867-888, https://doi.org/10.1162/ neco.1995.7.5.867 (1995).\n\n\nContext: References related to autism diagnostic tools and machine learning approaches.", "metadata": { "doc_id": "Abbas_2020_24", "source": "Abbas_2020" } }, { "page_content": "Text: Jacobs, R. A. Methods for combining experts' probability assessments. Neural Computation 7, 867-888, https://doi.org/10.1162/ neco.1995.7.5.867 (1995).\n\nFalkmer, T., Anderson, K., Falkmer, M. \\& Horlin, C. Diagnostic procedures in autism spectrum disorders: a systematic literature review. Eur. Child \\& Adolesc. Psychiatry 22, 329-340, https://doi.org/10.1007/s00787-013-0375-0 (2013).\n\nAuthor contributions\n\nHalim Abbas, Ford Garberson, and Stuart Liu-Mayo were each involved in all aspects of model development, optimization, training, and validation of the models for each of the modules, as well as the writing of this paper. Eric Glover and Dennis Wall provided advice on the development of the algorithms. Halim Abbas, Ford Garberson, Stuart Liu-Mayo and Dennis Wall co-wrote the manuscript.\n\nCompeting interests\n\nAll authors are affiliated with Cognoa Inc. in an employment and/or advisory capacity.\n\nAdditional information\n\nSupplementary information is available for this paper at https://doi.org/10.1038/s41598-020-61213-w. Correspondence and requests for materials should be addressed to E.G. Reprints and permissions information is available at www.nature.com/reprints. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.\n\n\nContext: This section details the literature review, author contributions, and competing interests related to a study utilizing machine learning for autism diagnosis.", "metadata": { "doc_id": "Abbas_2020_25", "source": "Abbas_2020" } }, { "page_content": "Text: Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. (c) The Author(s) 2020\n\n\nContext: Concluding section of a scientific article detailing licensing and copyright information.", "metadata": { "doc_id": "Abbas_2020_26", "source": "Abbas_2020" } }, { "page_content": "Text: img-0.jpeg\n\nJMIR Mhealth Uhealth, 2015 Apr-Jun; 3(2): e68. PMCID: PMC4526946 Published online 2015 Jun 17. doi: 10.2196/mhealth.4393: 10.2196/mhealth. 4393 PMID: 26085230\n\nA Novel System for Supporting Autism Diagnosis Using Home Videos: Iterative Development and Evaluation of System Design\n\nMonitoring Editor: Gunther Eysenbach Reviewed by Anthony Ellertson and Jasjit Suri Nazneen Nazneen, PhD, ${ }^{S 1}$ Agata Rozga, PhD, ${ }^{1}$ Christopher J Smith, PhD, ${ }^{2}$ Ron Oberleitner, MBA, ${ }^{3}$ Gregory D Abowd, PhD, ${ }^{1}$ and Rosa I Arriaga, PhD ${ }^{1}$ ${ }^{1}$ Georgia Institute of Technology, Atlanta, GA, United States ${ }^{2}$ Southwest Autism Research \\& Resource Center, Phoenix, AZ, United States ${ }^{3}$ Behavior Imaging Solutions, Boise, ID, United States Nazneen Nazneen, Georgia Institute of Technology, Technology Square Research Building 85 Fifth Street NW, Atlanta, GA, 30332, United States, Phone: 1 4049033916, Fax: 1 (404)894 7452, Email: nazneen@gatech.edu. ${ }^{S}$ Corresponding author. Corresponding Author: Nazneen Nazneen nazneen@gatech.edu Received 2015 Mar 4; Revisions requested 2015 Apr 26; Accepted 2015 Apr 26. Copyright ©Nazneen Nazneen, Agata Rozga, Christopher J. Smith, Ron Oberleitner, Gregory D Abowd, Rosa I Arriaga. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 17.06.2015.\n\nThis is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.\n\nAbstract\n\nBackground\n\n\nContext: This is the introduction to an article describing a novel system for supporting autism diagnosis using home videos, published in JMIR mHealth and uHealth.", "metadata": { "doc_id": "15_Nazneen_0", "source": "15_Nazneen" } }, { "page_content": "Text: Abstract\n\nBackground\n\nObserving behavior in the natural environment is valuable to obtain an accurate and comprehensive assessment of a child's behavior, but in practice it is limited to in-clinic observation. Research shows significant time lag between when parents first become concerned and when the child is finally diagnosed with autism. This lag can delay early interventions that have been shown to improve developmental outcomes.\n\nObjective\n\nTo develop and evaluate the design of an asynchronous system that allows parents to easily collect clinically valid in-home videos of their child's behavior and supports diagnosticians in completing diagnostic assessment of autism.\n\nMethods\n\nFirst, interviews were conducted with 11 clinicians and 6 families to solicit feedback from stakeholders about the system concept. 
Next, the system was iteratively designed, informed by experiences of families using it in a controlled home-like experimental setting and a participatory design process involving domain\n\nexperts. Finally, in-field evaluation of the system design was conducted with 5 families of children ( 4 with previous autism diagnosis and 1 child typically developing) and 3 diagnosticians. For each family, 2 diagnosticians, blind to the child's previous diagnostic status, independently completed an autism diagnosis via our system. We compared the outcome of the assessment between the 2 diagnosticians, and between each diagnostician and the child's previous diagnostic status.\n\nResults\n\n\nContext: This abstract describes the development and evaluation of a system (NODA) designed to facilitate remote autism diagnosis through parent-collected videos.", "metadata": { "doc_id": "15_Nazneen_1", "source": "15_Nazneen" } }, { "page_content": "Text: Results\n\nThe system that resulted through the iterative design process includes (1) NODA smartCapture, a mobile phone-based application for parents to record prescribed video evidence at home; and (2) NODA Connect, a Web portal for diagnosticians to direct in-home video collection, access developmental history, and conduct an assessment by linking evidence of behaviors tagged in the videos to the Diagnostic and Statistical Manual of Mental Disorders criteria. Applying clinical judgment, the diagnostician concludes a diagnostic outcome. During field evaluation, without prior training, parents easily (average rating of 4 on a 5 -point scale) used the system to record video evidence. Across all in-home video evidence recorded during field evaluation, $96 \\%(26 / 27)$ were judged as clinically useful, for performing an autism diagnosis. For 4 children ( 3 with autism and 1 typically developing), both diagnosticians independently arrived at the correct diagnostic status (autism versus typical). Overall, in $91 \\%$ of assessments (10/11) via NODA Connect, diagnosticians confidently (average rating 4.5 on a 5-point scale) concluded a diagnostic outcome that matched with the child's previous diagnostic status.\n\nConclusions\n\nThe in-field evaluation demonstrated that the system's design enabled parents to easily record clinically valid evidence of their child's behavior, and diagnosticians to complete a diagnostic assessment. These results shed light on the potential for appropriately designed telehealth technology to support clinical assessments using in-home video captured by families. This assessment model can be readily generalized to other conditions where direct observation of behavior plays a central role in the assessment process.\n\nKeywords: asynchronous telemedicine system, in-home behavior recording, naturalistic observation diagnostic assessment, NODA Connect, NODA smartCapture, remote autism diagnosis\n\nIntroduction\n\nBackground and Motivation\n\n\nContext: The authors describe the design and field evaluation of a telehealth system (NODA) for remote autism diagnosis, detailing its components and demonstrating its effectiveness in facilitating parent-recorded video evidence and clinician assessments.", "metadata": { "doc_id": "15_Nazneen_2", "source": "15_Nazneen" } }, { "page_content": "Text: Introduction\n\nBackground and Motivation\n\nAccording to the Centers for Disease Control and Prevention, the prevalence of autism in the United States has been increasing dramatically, from 1 in 150 to 1 in 68 children between 2000 and 2010 [1,2]. 
Over the same period, the median age of diagnosis remained relatively stable, around 53 months [1,2].\n\nOne key challenge with respect to diagnosing autism is the significant time lag (20-60 months) between the age at which parents first become concerned about their child's development and the age at which the child receives a diagnosis from a qualified professional [3-5]. Moreover, many ethnic minorities, lowincome families, and rural communities lack access to health care professionals with autism-specific expertise, resulting in delays in diagnosis [6-9]. Even in urban communities where services are more widely available, timely access to diagnostic services is often hampered by long waiting lists. Delays in diagnosis can lead to delays in early intervention services that have been shown to improve future learning capabilities and developmental outcomes [10-13].\n\nAnother key challenge with respect to diagnosing autism is that although clinical professionals acknowledge that observing behavior in the natural environment (eg, the home) is preferred for a comprehensive assessment, in practice behavioral observations are limited to a single in-clinic observation $[10,11,14]$. There are various barriers to more widespread use of home-based observation [14-18]. It is time consuming and resource intensive for clinicians to travel to each family's home to conduct an observation, and impractical to do so for remotely located families. Even when home visits are feasible, the presence of an unfamiliar observer may cause children to alter their behavior due to their awareness of being observed. Such reactivity poses a threat to the validity of any data that are collected. In addition, child behaviors of interest may not occur during the short span of a clinician's home visit.\n\n\nContext: This section introduces the problem of delayed autism diagnosis, challenges in access to specialists, and limitations of traditional assessment methods.", "metadata": { "doc_id": "15_Nazneen_3", "source": "15_Nazneen" } }, { "page_content": "Text: In this paper, we consider the opportunity of designing a telehealth solution to support remote diagnosis of autism through parent-recorded behavioral evidence of child suspected to have autism in the home. Telehealth technology can connect families with clinicians and accelerate the diagnostic process. Indeed, such technologies have recently been investigated as a means of supporting the delivery of treatment for individuals with autism spectrum disorder, including remote coaching of parent-implemented early intervention programs [19-21], behavioral assessments [22,23], and professional development [24]. Few attempts have been made at exploring the potential for such technologies to support diagnostic assessments $[25,26]$.\n\nMost current telehealth technologies support a real-time interaction between a remotely located clinician and a caregiver or patient. By contrast, \"store-and-forward\" telehealth systems support video recordings of live events, which are subsequently shared with a clinical expert for review and assessment. The latter asynchronous approach, which we have adopted, offers several key advantages particularly relevant to the case of remote diagnosis of autism. It enables families to record videos in their home, in the course of their day-to-day activities, which ensures the capture of natural expressions of child behavior that are widely acknowledged as crucial to making an accurate and comprehensive diagnostic assessment [10,11,14]. 
Moreover, because home recordings can be carried out over the course of several days, they may mitigate some of the consequences of a single clinic-based or live telehealth assessment. These include the child's reactivity, child's current mood or level of fatigue, or the likelihood that low-frequency behaviors may not be observed. From a practical standpoint, it minimizes the need to coordinate schedules with a clinician, and reduces the need for remotely located families to travel long distances to a clinic.\n\nResearch Questions and Contribution\n\n\nContext: The authors are introducing a telehealth solution for remote autism diagnosis and explaining the rationale for using a \"store-and-forward\" approach using parent-recorded videos.", "metadata": { "doc_id": "15_Nazneen_4", "source": "15_Nazneen" } }, { "page_content": "Text: Research Questions and Contribution\n\nOur research addresses two key challenges in designing a system for remote autism diagnosis using home videos. Perhaps the most important challenge is how to enable parents to easily record clinically relevant video evidence of their child's behavior. Diagnostic assessments are typically designed to enable the diagnostician to observe the child under more or less structured periods and different situations for a rich sampling of the child's behavior. Parents can record and share their concerns about their child, but may not know the specific types of situations and behaviors the diagnostician needs to observe. Thus, the first research challenge is how the parent-recorded video can be turned into meaningful clinical evidence through the use of the right kind of technology and the design of the right user experience with that technology. This paper summarizes our work on identifying and evaluating specific design features that contribute to ease of use of the in-home recording system and clinical validity of parent-recorded video evidence. The second challenge involves supporting diagnosticians in completing the remote autism diagnosis. This involves enabling diagnosticians to review the videos in a systematic and structured way so that they can map the situations and behaviors that they observe in the video to the Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic criteria [27,28]. In other words, the parentrecorded video of child behavior must produce observations that become evidence to support clinical judgment. This paper summaries our work on identification and evaluation of specific design features that support diagnosticians in completing a remote diagnostic assessment.\n\n\nContext: Following a literature review and introduction of the research, this section outlines the two primary research questions guiding the development of a remote autism diagnosis system using home videos.", "metadata": { "doc_id": "15_Nazneen_5", "source": "15_Nazneen" } }, { "page_content": "Text: The system that resulted from this work includes two components: NODA smartCapture and NODA Connect. NODA smartCapture is a mobile phone-based application that enables parents to easily record clinically relevant prescribed video evidence of their child's behavior. It supports recording and uploading 4 , up to 10 -minute long naturalistic observation diagnostic assessment (NODA) scenarios, that were chosen based on pilot research on video-based diagnosis of autism [29]. 
These scenarios include (1) the child playing alone, (2) the child playing with a sibling or peer, (3) a family mealtime, and (4) any behavior that is of concern to the parent. The first 3 scenarios provide opportunities for typical social communication and play-based behaviors, whereas the last scenario allows parents to share evidence of a behavior that is of particular concern to them. The NODA Connect is a Web portal for diagnosticians to direct in-home video collection, access the child's developmental history, and conduct a remote diagnostic assessment by linking evidence of behaviors tagged in the videos to DSM criteria. Relying on clinical judgment, the diagnostician renders an opinion about the child's diagnostic outcome.\n\n\nContext: The authors describe the two components of their system, NODA smartCapture and NODA Connect, designed to facilitate remote autism diagnosis through parent-recorded videos and clinician assessment.", "metadata": { "doc_id": "15_Nazneen_6", "source": "15_Nazneen" } }, { "page_content": "Text: The rest of this paper is structured as follows. We first describe interviews with relevant stakeholders about our system concept, that is, \"remote assessment based on in-home video evidence.\" Next, we outline the iterative design of NODA smartCapture and NODA Connect. We then describe an in-field evaluation, in which 5 families used NODA smartCapture in their homes to collect prescribed behavioral evidence and 2 diagnosticians reviewed videos from each family and independently performed an autism diagnosis using NODA Connect. We report results on ease of use of NODA smartCapture, clinical validity of recorded evidence, and NODA Connect's support for diagnosticians in completion of a remote diagnostic assessment. We conclude with a discussion on the utility and limitations of the system, potential design enhancements, our vision for adoption of the system within current autism diagnostic practices, and how such a prescription, collection, and assessment model can be generalized to other clinical applications.\n\nMethods\n\nOverview\n\nFirst, interviews were conducted with parents of children with autism and clinicians to seek input from these key stakeholders about the overall concept of the system. Next, NODA smartCapture was iteratively designed, informed by experiences of families using it in a controlled experimental home-like setting. The NODA Connect was designed using a participatory design process involving a collaborating diagnostician with 20 years of experience in autism diagnosis and a domain expert in autism. Finally, an in-field evaluation was conducted with families and diagnosticians.\n\nStage 1: Insight from Stakeholders\n\n\nContext: Following an introduction to the paper, this section outlines the methods used to develop and evaluate the NODA system.", "metadata": { "doc_id": "15_Nazneen_7", "source": "15_Nazneen" } }, { "page_content": "Text: Stage 1: Insight from Stakeholders\n\nWe conducted a series of structured one-on-one interviews, each lasting 2 hours, with parents $(\\mathrm{n}=7$ ) of children with autism and clinicians $(\\mathrm{n}=11)$ who work with this population. These interviews allowed us to gather input from these key stakeholders about the overall system concept of \"remote assessment based on in-home video evidence,\" and about its feasibility and potential utility. During the interview, a mock-up design based on previous studies $[29,30]$ was presented to elicit feedback from stakeholders. 
The mock-up design included a mobile phone-based recording application to record the NODA scenarios and a Webbased assessment portal to review the videos and tag behaviors for assessment.\n\nStage 2: Iterative Development\n\nsmartCapture Iterative development The NODA smartCapture application resulting from stage 1 was iteratively improved, informed by the experiences of families using it in a home-like experimental setup. Our goal was to identify specific design features that would enable parents to easily record clinically valid video evidence by analyzing the usage pattern of the recording application in a controlled setting. Families $(\\mathrm{n}=8)$ and their child with autism as well as any siblings $(\\mathrm{n}=18)$ visited the Georgia Tech Aware Home [31] for 2 hours, 1 family at a time. The Aware Home has the look of a single-family home (fully equipped kitchen; living room, bedrooms, and bathroom; furniture; TV, etc) except there are a number of cameras installed throughout, enabling both recording and viewing a live feed of interactions happening in different parts of the home (Figure 1).\n\n\nContext: This section describes the methods used to develop the NODA system, including stakeholder interviews and iterative development of the smartCapture application within the Georgia Tech Aware Home.", "metadata": { "doc_id": "15_Nazneen_8", "source": "15_Nazneen" } }, { "page_content": "Text: Parents were asked to use the NODA smartCapture to record the 4 NODA scenarios, for up to 10 minutes each. Before the study, the Aware Home was set up with toys and items for the snack (to simulate the family mealtime). Ceiling-mounted cameras allowed us to observe the family from another room live as they were using NODA smartCapture, and to record the whole session for subsequent review.\n\nAfter the video recording, each parent completed an interview and was asked to rate the ease of use of the system on a scale ranging from 1 (not easy to use) to 5 (very easy to use). In addition to this feedback from parents and the video evidence recorded by them using NODA smartCapture, video recordings from the fixed ceiling cameras were reviewed to gain further insight into how parents used the system.\n\nThe collaborating diagnostician was asked to rate each parent-collected video, for its clinical validity, on a scale of $0-2$ and give a justification for the assigned rating. A rating of \" 0 \" means that the video is not clinically valid for conducting assessment whereas a rating of \" 2 \" indicates that the video is clinically\n\nvalid. A rating of \" 1 \" indicates that the video is clinically valid but an additional video might be required to fully assess the associated scenario. By analyzing the collaborating diagnostician's reasons for the assigned rating, we identified specific issues that lowered the clinical value of a video evidence.\n\nAfter the first 4 of the total 8 families completed their participation, the design of NODA smartCapture was revised based on initial findings about ease of use and issues lowering clinical utility. The revised system was tested with the remaining 4 families. 
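To make the 0-2 clinical-validity rubric concrete, the following is a minimal Python sketch with hypothetical names (the study did not publish an implementation) of how per-video ratings and the share of usable videos could be recorded:

```python
from dataclasses import dataclass
from enum import IntEnum


class ClinicalValidity(IntEnum):
    # 0-2 rubric described above; names are illustrative labels, not the study's own
    NOT_VALID = 0          # not clinically valid for conducting assessment
    VALID_NEEDS_MORE = 1   # valid, but an additional video may be required
    VALID = 2              # clinically valid


@dataclass
class VideoRating:
    family_id: str
    scenario: str          # e.g., "family mealtime"
    rating: ClinicalValidity
    justification: str     # diagnostician's stated reason for the rating


def share_clinically_useful(ratings):
    """Fraction of videos rated 1 or 2, i.e., usable for assessment."""
    if not ratings:
        return 0.0
    usable = [r for r in ratings if r.rating >= ClinicalValidity.VALID_NEEDS_MORE]
    return len(usable) / len(ratings)
```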
Once all families completed participation, NODA smartCapture was subsequently improved based on findings from the experience of the last 4 families.\n\n\nContext: Methods section describing the data collection process using NODA smartCapture in a study involving families and clinicians assessing autism.", "metadata": { "doc_id": "15_Nazneen_9", "source": "15_Nazneen" } }, { "page_content": "Text: NODA Connect Iterative Development The NODA Connect Web portal was designed, through an iterative design process, for diagnosticians to direct the in-home video-collection process and conduct remote diagnostic assessment. Our goal was to identify specific features that would support diagnosticians in completing the diagnostic assessment for autism based on parent-collected videos, developmental history information, and their clinical judgment. The initial design of NODA Connect was informed by previous pilot research [29] and feedback from stakeholders solicited during the Stage 1 interviews about the system concept. However, major design contributions came from a participatory design process involving a collaborating diagnostician who had 20 years of experience in conducting autism diagnosis and a domain expert in autism. Participatory design is a common method in the technology design community whereby the designer works closely with the target user to collaboratively iterate on the design of a technology [32]. Before the in-field evaluation in the final stage of the research, the design of the NODA Connect platform was further improved based on findings from a pilot assessment conducted via NODA Connect by the collaborating diagnostician.\n\nStage 3: In-Field Evaluation\n\n\nContext: This section describes the iterative development of the NODA Connect web portal and its evaluation in a real-world setting to support remote autism diagnosis.", "metadata": { "doc_id": "15_Nazneen_10", "source": "15_Nazneen" } }, { "page_content": "Text: Stage 3: In-Field Evaluation\n\nThe iterative design process described in Stage 2 resulted in a final design of the remote autism diagnostic assessment system that was then evaluated in the field. During this evaluation, the parents used NODA smartCapture in their homes to record behavior evidence and the diagnosticians used NODA Connect to review and tag the videos, and to complete a diagnostic assessment. We recruited 4 families with at least 1 child with a previously confirmed diagnosis on the autism spectrum and 1 family with a typically developing child. Children were between 2 and 6 years of age (average 4 years). Parents were not given any prior training on NODA smartCapture. They were hand-delivered a kit that included NODA smartCapture preinstalled on an iPod touch and a tripod for mounting the iPod. During an in-home deployment that lasted an average of 2 weeks, each family was asked to complete a brief child developmental history online and use the NODA smartCapture application to record and upload the 4 NODA scenarios (each up to 10 minutes long). 
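As an illustration of what a prescribed NODA scenario might look like as a data structure, the following is a minimal Python sketch with hypothetical field names (the actual NODA data model is not described at this level of detail); the staging and social-press fields anticipate the prescription instructions discussed under Clinical Utility of Video Evidence below:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordingPrescription:
    """One prescribed NODA scenario pushed to the parent-facing capture app."""
    scenario: str                                             # e.g., "family mealtime"
    max_minutes: int = 10                                     # recordings auto-stop at this length
    staging: List[str] = field(default_factory=list)          # camera and setup directions
    social_presses: List[str] = field(default_factory=list)   # actions the parent is asked to try


NODA_SCENARIOS = [
    RecordingPrescription("child playing alone"),
    RecordingPrescription("child playing with a sibling or peer"),
    RecordingPrescription(
        "family mealtime",
        staging=["set up and mount the camera before starting",
                 "keep the child's face and the table in view"],
        social_presses=["call the child's name once during the meal"],
    ),
    RecordingPrescription("behavior of concern to the parent"),
]
```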
The collaborating diagnostician remotely guided the in-home evidence-collection process by reviewing the videos as they were uploaded and sending alerts to the family as needed to request that they rerecord a particular scenario.\n\n\nContext: Following iterative design, the system was evaluated in families' homes to assess usability and effectiveness of remote data collection.", "metadata": { "doc_id": "15_Nazneen_11", "source": "15_Nazneen" } }, { "page_content": "Text: We recruited 3 diagnosticians experienced in autism diagnosis and unfamiliar with our system to complete the independent diagnostic assessments via NODA Connect. Each family's videos were reviewed by at least 2 diagnosticians, who were blind to the diagnostic status of the child. After completing each diagnostic assessment via NODA Connect, the diagnosticians concluded whether the child had autism or was typically developing. They were prompted to assign confidence ratings to the diagnostic outcome: "How confident are you that the child has autism?" and "How confident are you that the child is typically developing?" on a scale from 1 (not confident) to 5 (extremely confident). Including both of these ratings allowed diagnosticians to indicate diagnostic uncertainty in cases where they were confident that the child does not have autism but also did not think the child was typically developing. Other than the child's age, no other information was disclosed to the diagnosticians about the child's developmental history until they completed the diagnostic assessment via NODA Connect and reached a decision about the child's diagnostic outcome. At the end of this process, a follow-up interview was conducted for the diagnosticians to reflect on their experience of remote diagnostic assessment. The child's previous diagnosis and developmental history were revealed during the interview.\n\nData analysis of the in-field evaluation of NODA smartCapture consisted of assessing its ease of use based on parent ratings, the quality of the videos recorded by parents, and the system log about parents' reliance on the help menu and navigation patterns through NODA smartCapture. The collaborating diagnostician rated the clinical utility of parent-recorded videos using the same scale as described earlier for Stage 2. Data\n\n\nContext: Evaluation of a remote autism diagnostic system (NODA) and its components.", "metadata": { "doc_id": "15_Nazneen_12", "source": "15_Nazneen" } }, { "page_content": "Text: analysis of the in-field evaluation of NODA Connect consisted of analyzing how diagnosticians completed the diagnostic assessment by tagging videos and completing the DSM checklist through NODA Connect. To conduct this analysis, screen-capture videos recorded while diagnosticians were conducting assessments through NODA Connect were examined. In addition, for each child, we compared the outcome of the assessment between the 2 diagnosticians, and between each diagnostician and the child's previous diagnostic status.\n\nResults\n\nEase of Use of NODA smartCapture\n\nBased on experiments in the controlled home-like setting in Stage 2, three main features were added to NODA smartCapture to facilitate ease of use. First, icons on the home screen clearly depict each of the 4 NODA scenarios parents are being asked to record (Figure 2). Second, clear and redundant cues for the recording status were added so that a parent would know whether a video is being recorded or not, and how many minutes of recording have elapsed (Figure 2). 
Third, in addition to the "Stop Recording" button, an autostop feature that automatically stops the recording after 10 minutes was included to enable one-click recording. Once video recording is completed, parents can upload it directly or save it on the device to review the video first before uploading it.\n\n\nContext: Evaluation of a telehealth system (NODA Connect) for autism assessment, focusing on diagnostician usability and assessment completion.", "metadata": { "doc_id": "15_Nazneen_13", "source": "15_Nazneen" } }, { "page_content": "Text: These features were implemented based on results from the first 4 participating families of the controlled experiment. Before these features were implemented in NODA smartCapture, the first 4 participating parents gave an average ease-of-use rating of 3 on a 5-point scale, and the number (4 recordings, 1 per scenario) and length of recorded videos (maximum 10 minutes) were not consistent with the instructions that were given. Once these features were implemented, the next set of 4 parents gave an average ease-of-use rating of 4, which was also maintained in the field evaluation. The second set of 4 parents who participated in the controlled experiment and the 5 parents who participated in the in-field evaluation all collected the right number of videos of appropriate length according to the given instructions. In addition, the log analysis confirmed that parents were able to use NODA smartCapture easily during the in-field evaluation. Parents did not rely much on the help menu even without any prior training for using NODA smartCapture. Log analysis showed that only 2 families of the 5 accessed the help menu, 1 and 2 times, respectively. In addition, on average $73 \%(22 / 30)$ of the time, parents took the shortest path from selecting a recording scenario to starting a recording. Because factors other than the complexity of NODA smartCapture could lead a parent to stop a recording without completing it, the workflow analysis focused only on the path from scenario selection to starting a recording.\n\nClinical Utility of Video Evidence\n\nBased on data analysis from the controlled experiment in a home-like setting in Stage 2, we identified two key features that increase the clinical utility of the recorded videos. These include an embedded prescription feature and a notification feature.\n\n\nContext: Evaluation of NODA smartCapture usability and clinical utility based on controlled experiments with families.", "metadata": { "doc_id": "15_Nazneen_14", "source": "15_Nazneen" } }, { "page_content": "Text: While rating the clinical utility of videos recorded by parents in the controlled experiment, the collaborating diagnostician identified two sets of issues that negatively influenced utility. The first set related to the setup of the recordings. The most common set-up-related issue was incorrect field of view. For example, parents often captured videos where the face of the child with autism, the relevant toys, or the person that the child was interacting with were not clearly visible on camera, either because of the way the camera was set up or mounted, or because it was too zoomed in or zoomed out. In some cases, parents followed the child around while holding the camera, which was both distracting to the child and prevented the parent from actively playing with the child during the recording. 
Other times, parents would not set up the camera in advance of recording and would start recording while they are still setting up the camera on the mounting device. The second set of issues related to the frequency and quality of interaction between the child and the parent. Some parents interacted with the child excessively, preventing the clinician from observing what the child does naturally when left alone. Other times there was insufficient interaction between the parent and the child, and the diagnostician wished to observe how the child might react to the\n\nparent's attempts to interact with him/her or whether the child would direct his/her attention to something. Thus, both excessive and insufficient interaction between the parent and the child can make it difficult for a diagnostician to reliably assess the child's level of functioning.\n\n\nContext: Evaluation of parent-recorded videos for autism assessment revealed issues related to recording setup and parent-child interaction quality.", "metadata": { "doc_id": "15_Nazneen_15", "source": "15_Nazneen" } }, { "page_content": "Text: In response to these two sets of issues, and in consultation with the collaborating diagnostician, we embedded explicit instructions within the NODA smartCapture interface. This clinical prescription ( Figure 2) included specific instructions for the parent about how to set up and frame each recording (staging), and how to interact with the child during the recording (social presses). These instructions were intended to maximize the likelihood that the parent records the right kind of video evidence of their child's behavior from the diagnostician's perspective. For each of the 4 recording scenarios, we established a set of directions to improve the staging of the recording and a set of social presses that the parent was asked to present to the child. Staging instructions covered the set up of the camera and the environment, such as (1) making sure the child's face and relevant objects and social partners are in the field of view of the camera; (2) suggestions for appropriate play items, such as toys and books; and (3) across all scenarios, parents were asked to set up the camera ahead of time, and to use a mounting device (tripod). Instructions for social presses included specific actions the parent needed to take during the recording, such as calling the child's name, pointing to an object to see whether the child will look toward it. These actions represented the types of social presses that a diagnostician might use while assessing the child in person.\n\n\nContext: The authors describe how they addressed challenges in collecting useful video data for autism assessments by incorporating explicit instructions within a parent-facing application called NODA smartCapture.", "metadata": { "doc_id": "15_Nazneen_16", "source": "15_Nazneen" } }, { "page_content": "Text: In addition to the explicit directions embedded within the NODA smartCapture interface, we realized (through our discussions with the diagnosticians) that diagnosticians may want to guide parents during the in-home recording process. For example, the diagnostician may wish to ask the parent to rerecord a scenario because the lighting conditions were poor, or because they want the parent to try a social press again. Therefore, a notification system was included in the system (added before the in-field evaluation) whereby the diagnosticians could send notifications to NODA smartCapture from the NODA Connect Web portal. 
This feature was not intended to support real-time messaging; rather, it was intended to enable the diagnosticians to review the videos uploaded by a parent for appropriateness in advance of the video being used for the diagnostic assessment, and ask for additional recording as needed.\n\nAfter incorporating these prescription and notification features, the clinical validity ratings of the parentcollected videos increased from $81 \\%(13 / 16)$ in the experimental controlled setting to $96 \\%(26 / 27)$ in the in-field evaluation. In total, 10 notifications were sent to families during the field study. Six of these notifications were about instructions for parents to include particular social presses and 4 messages were about confirming the status of recording. Utilization of the notification system reflects its usefulness. Furthermore, the participating diagnosticians in the field evaluation were also asked to rate the usefulness of the videos after they completed remote diagnostic assessment on a 5-point scale ( 1 indicates \"not useful\" to 5 indicates \"very useful\"). Diagnosticians' rating and qualitative feedback during the follow-up interview confirmed that the videos collected by parents during the in-field evaluation were clinically useful (average rating of 4) for conducting remote diagnostic assessment.\n\nCompletion of Diagnostic Assessment via NODA Connect\n\n\nContext: The authors describe the addition of prescription and notification features to their NODA system to improve the clinical validity of parent-collected videos for autism diagnostic assessment.", "metadata": { "doc_id": "15_Nazneen_17", "source": "15_Nazneen" } }, { "page_content": "Text: Completion of Diagnostic Assessment via NODA Connect\n\nThe iterative design process of NODA Connect in Stage 2 helped finalize features that support diagnosticians in completing a diagnostic assessment based on the videos recorded by parents. These include the following: (1) a set of predefined tags that the diagnostician can use to flag specific child behaviors in the videos; (2) an integrated DSM checklist where each tag assigned by a clinician is mapped to the relevant DSM subcriterion; and (3) access to the child's developmental history entered by the parent into the system.\n\nOnce diagnosticians receive appropriate video recordings, they can review them and begin tagging them with behaviors relevant to diagnosing autism (Figure 3). The NODA Connect has a built-in set of tags representing specific behavioral markers such as \"no eye contact\" or \"repetitive play,\" which were created based on the diagnostic criteria for autism within the DSM. The list of tags was compiled by the collaborating diagnostician and the autism domain expert, and vetted through conversations with several other clinical experts. In total, there were 66 tags included in NODA Connect. These tags included both atypical $(\\mathrm{n}=57)$ behavior tags (representing atypical development) and typical $(\\mathrm{n}=9)$ behavior tags\n\n(representing typical development). The goal of the tagging step is to have the diagnostician watch the videos for any evidence of atypical or typical behavior, and flag moments in time when that behavior occurs, without yet considering specific DSM criteria.\n\nOnce all the videos for a child are viewed and tagged, the diagnostician can review the DSM diagnostic checklist (Figure 4). 
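The tag-to-checklist mapping just described can be sketched as follows; this is a minimal Python illustration with hypothetical tag names and subcriterion labels, not the portal's actual schema:

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative mapping from behavior tags to DSM-IV subcriterion labels; the real
# portal used 66 vetted tags, and these entries are placeholders, not the published list.
TAG_TO_DSM = {
    "no eye contact": "A1a",
    "does not respond to name": "A1b",
    "repetitive play": "A3b",
    "points to share interest": None,  # typical-behavior tag: recorded, but not evidence of a criterion
}


@dataclass
class TaggedMoment:
    video_id: str
    timestamp_s: float
    tag: str


def build_checklist(moments):
    """Group tagged video snippets under the DSM subcriterion each tag maps to."""
    checklist = defaultdict(list)
    for m in moments:
        criterion = TAG_TO_DSM.get(m.tag)
        if criterion is not None:
            checklist[criterion].append(m)
    return checklist
```

As in NODA Connect, such grouping only organizes the evidence; the Yes/No judgment for each criterion, and the overall diagnostic determination, remain with the diagnostician.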
At the time this research study was conducted, the DSM-IV was still in use, and thus, formed the basis of the diagnostic checklist in the NODA Connect assessment portal. Subsequent to the release of the DSM-V, tags and the diagnostic checklist were updated to reflect the new framework.\n\n\nContext: Following a description of the iterative design process of NODA Connect, this section details how the system supports diagnosticians in completing a diagnostic assessment based on parent-recorded videos.", "metadata": { "doc_id": "15_Nazneen_18", "source": "15_Nazneen" } }, { "page_content": "Text: The DSM checklist contains categories of symptoms, with specific subcriteria for each category. The DSM-IV included the following diagnostic categories: (1) qualitative impairments in social interaction; (2) qualitative impairments in communication; and (3) restricted, repetitive and stereotyped patterns of behavior, interests, and activities. Each of these 3 categories included 4 subcriteria. Within NODA Connect, each tag inserted in the videos by the diagnostician during the video review step is automatically mapped to the relevant subcriterion, and shows up as a video snippet (Figure 4).\n\nWithin the DSM checklist, the diagnosticians can review the tags and then check a Yes/No box to indicate whether, based on the tags and their clinical judgment, the child meets that specific criterion. Once the entire DSM checklist is filled, the diagnostician makes a determination about the child's diagnosis of autism based on the DSM criteria, developmental history, and clinical judgment. Note that within NODA Connect the diagnosticians have access to the child's developmental history that parents fill during the inhome video-collection phase, although during our in-field evaluation we restricted access to this checklist so that diagnosticians would remain blind to the child's true diagnostic status.\n\nResults from the in-field evaluation confirmed that NODA Connect features support diagnosticians in completing a diagnostic assessment. Overall, in $91 \\%$ of assessments (10/11) via NODA Connect, diagnosticians reached a decision about diagnostic outcome that matched with the child's previous diagnostic status.\n\n\nContext: The document describes NODA, a system for autism diagnosis using in-home video recordings and a web-based assessment portal (NODA Connect).", "metadata": { "doc_id": "15_Nazneen_19", "source": "15_Nazneen" } }, { "page_content": "Text: Analysis of the NODA Connect usage pattern during the in-field evaluation showed that there was not much variability in the time taken to complete tagging and filling the DSM checklist by different diagnosticians. Across all assessments, the total time taken to complete tagging all the videos of a child and filling out the DSM checklist was on average 62 minutes (SD 14.8 minutes). Diagnosticians spent the least amount of time (average 37 minutes) completing the assessment for the child who was typically developing. After removing the two assessments for this child from the analysis, the average time spent tagging videos and completing the DSM checklist was even more consistent (average 68 minutes, SD 8.3 minutes).\n\nComparative Results of Diagnostic Outcomes\n\nThe main focus of the work presented here was to iteratively develop and evaluate the design of the two systems, NODA smartCapture and NODA Connect. 
In addition, given that during the in-field evaluation the diagnosticians assigned a diagnosis to the child upon completing the assessment, we were able to compare the diagnosis conducted through NODA Connect with the child's previous diagnosis as indicated in the child's medical record. For 4 of the 5 children ( 3 children with a previous diagnosis of autism and 1 typically developing child), both the remote diagnosticians independently arrived at the same diagnostic decision, and in agreement with the child's actual diagnostic status. For the fifth child (with a previous autism diagnosis), one diagnostician matched the diagnosis in the child's record but the other did not, although the latter indicated with high confidence that the child was not typically developing. A third diagnostician independently reviewed this case via NODA Connect and also confirmed the diagnosis in the child's medical record. Overall, in $91 \\%$ of assessments (10/11) via NODA Connect, diagnosticians reached a decision about diagnostic outcome that matched with the child's previous diagnostic status.\n\nDiscussion\n\nPrincipal Findings\n\n\nContext: Results of a field evaluation of the NODA system for autism diagnosis.", "metadata": { "doc_id": "15_Nazneen_20", "source": "15_Nazneen" } }, { "page_content": "Text: Discussion\n\nPrincipal Findings\n\nThe iterative design approach undertaken in this study enabled us to identify specific features of a store-and-forward telehealth platform that supports remote diagnosis of autism using videos recorded by families in their homes. The results of the in-field evaluation of NODA smartCapture and NODA Connect demonstrated that our system design allowed parents to easily capture clinically useful evidence of child behavior, and diagnosticians to complete a diagnostic assessment of autism with high confidence. See Multimedia Appendix 1 for the most recent version of NODA Capture and Connect resulted from this work.\n\nThis section discusses the perspectives of the various stakeholders on the perceived utility and limitations of the system, potential design enhancements, our vision for the large-scale adoption of the system within current autism diagnostic practices, and how our prescription, collection, and assessment model can be generalized to other clinical assessment applications.\n\nPerceived Utility and Limitations\n\n\nContext: Concluding the study, discussing findings and future directions.", "metadata": { "doc_id": "15_Nazneen_21", "source": "15_Nazneen" } }, { "page_content": "Text: Perceived Utility and Limitations\n\nDuring initial stakeholder interviews in Stage 1, parents and clinicians considered the concept of video collection and sharing of in-home behavioral evidence potentially valuable for a variety of reasons. They reported that this approach can allow clinicians to observe otherwise inaccessible behaviors (eg, lessfrequent behaviors, behavior triggers at home) in their natural context, and to view family-child interactions. Moreover, it can efficiently connect parents and clinicians for timely assessment of the child as, unlike current practice, clinicians can have immediate access to the behavior evidence. However, during the same interviews, parents and clinicians also highlighted several potential barriers to the adoption of an in-home video-recording system. The most commonly mentioned concerns were system complexity, privacy concerns, and child's reactivity. 
Parents suggested that having explicit data capture and sharing policies, and control over data collection and sharing would alleviate privacy concerns. They also indicated that they would be willing to sacrifice some privacy concerns to get help with a more timely diagnosis for their child. Parents and clinicians also highlighted that the recording device may cause the child to react differently than he or she would otherwise. However, clinicians reported that for them, the child's reactivity to being recorded would not necessarily invalidate the clinical utility of the video evidence, as such reactivity happens during clinic-based observations as well. Parents and clinicians appreciated that the recording application could be installed on mobile phones and tablets because these are everyday objects that children are used to seeing and the reactivity effects would thus likely be minimal.\n\n\nContext: Results of a study evaluating a novel assessment tool (NODA smartCapture) and its perceived utility and limitations by parents and clinicians.", "metadata": { "doc_id": "15_Nazneen_22", "source": "15_Nazneen" } }, { "page_content": "Text: During the in-field evaluation, diagnosticians appreciated that the system helped them conduct an autism diagnosis based on naturalistic behavioral evidence. They also highlighted that, unlike direct observation, video observation would allow them to go back in time to review and verify certain observations, if required. Among the 3 participating diagnosticians in the in-field evaluation, 2 had no previous experience with video observation. These 2 reported that before the study they were reluctant and skeptical about the value of in-home video recording for diagnostic assessment. The third diagnostician had previously participated in other research efforts that involve video observation for assessment and interventions of children with autism and had previously found these methods valuable. However, all the diagnosticians, irrespective of their initial biases, reported that using the remote diagnosis system left them feeling it was extremely valuable and effective for remote autism diagnosis. However, the diagnosticians also identified potential situations when in-home behavior evidence along with a brief developmental history may not be sufficient to complete a diagnostic assessment of autism. These situations included (1) when the child is too young ( $<2$ years old); (2) when the child has very subtle characteristics of autism; and (3) when the child's level of functioning is very limited. According to the diagnosticians, in all these cases it may be difficult to make a judgment about the child's overall development level, which is required for comparison with the child's social profile. In such cases, supplementary evidence in addition to video evidence would be required, which, depending on the situation, could be a parent report, a standard developmental assessment, or even direct observation of the child.\n\nTechnology Enhancements\n\n\nContext: Results of an in-field evaluation of the NODA system and feedback from diagnosticians.", "metadata": { "doc_id": "15_Nazneen_23", "source": "15_Nazneen" } }, { "page_content": "Text: Technology Enhancements\n\nAdvanced technology features can be incorporated into the existing NODA smartCapture and NODA Connect for built-in intelligence. 
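As one way to picture such built-in intelligence, the following minimal Python sketch, under purely hypothetical assumptions about what the capture app could measure on-device, flags a recording for rerecording before a diagnostician reviews it; the staging factors it screens for are elaborated in the next paragraph:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaptureMetrics:
    """Per-recording measurements a capture app could plausibly compute on the device."""
    mean_luminance: float         # 0-255; rough proxy for lighting conditions
    speech_snr_db: float          # rough proxy for audio quality
    face_visible_fraction: float  # share of frames in which the child's face is detected
    duration_s: float


def staging_problems(m: CaptureMetrics) -> List[str]:
    """Return parent-readable reasons to rerecord; thresholds are illustrative only."""
    problems = []
    if m.mean_luminance < 40:
        problems.append("The room looks too dark; add light and rerecord.")
    if m.speech_snr_db < 5:
        problems.append("Speech is hard to hear; move the camera closer or reduce background noise.")
    if m.face_visible_fraction < 0.3:
        problems.append("The child's face is rarely in view; adjust the camera angle.")
    if m.duration_s < 60:
        problems.append("The recording is very short; try to capture a few minutes.")
    return problems
```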
For example, there are a number of factors associated with staging (lighting conditions, audio quality, field of view, whether the child's face is in view, etc) that a recording system could automatically detect during the recording and alert parents to rerecord without the need for the diagnostician to review the videos first. In addition, results from the in-field evaluation indicated that on average tagging videos took $84.6 \\%$ ( $52.5 / 62$ minutes) of the total time spent on completing 1 diagnostic assessment via NODA Connect. The amount of time spent on tagging could be significantly reduced if the Web-based assessment system were to include an automated tagging process. For example, certain detectable behaviors such as response to name call, a smile, giving or taking an object, eye contact, stereotypical behaviors are reasonable candidates to be automatically detected within collected video evidence, given the recent advances in automated video analysis [33-35]. Because these recognition techniques would not be perfect, the system could suggest potential tags in the video timeline and allow the diagnosticians to confirm or reject them. As another example, the assessment system could learn the diagnostician's tag assignment behaviors and highlight the most frequently assigned tags so the diagnostician could quickly locate them.\n\nDiagnostic Workflow and Field Adoption\n\n\nContext: Following a description of the NODA system, this section discusses potential technological enhancements and their impact on the diagnostic workflow and adoption of the system.", "metadata": { "doc_id": "15_Nazneen_24", "source": "15_Nazneen" } }, { "page_content": "Text: Diagnostic Workflow and Field Adoption\n\nAn open question for future research is to explore a feasible workflow for wide-scale adoption of our remote diagnostic system. One workflow that we envision involves a referral mechanism for remote diagnostic assessment like any other laboratory tests. Pediatricians are often the first medical professionals to identify children as potentially showing early signs of autism, and are responsible for referring families to a specialist for further assessment. In the proposed workflow, the pediatrician can refer the family for a remote diagnostic assessment. Upon connecting with the remote assessment service, the parents download NODA smartCapture directly to their mobile phone. A diagnostician at an affiliated diagnostic center can then guide the in-home evidence-collection procedure and complete the diagnostic assessment through NODA Connect. Finally, an electronic diagnostic report summarizing diagnostician's video observation, DSM checklist, and diagnostic outcome can be shared with the pediatrician, who then shares it with parents.\n\nOverall, this workflow has two potential benefits. First, it engages pediatricians, which is beneficial because research suggests that pediatrician involvement in the referral and diagnostic process can result in more timely diagnosis [23,36-39]. A pediatrician sees children at regular intervals during the early years of development and is in the best position to note early warning signs and take appropriate timely action. Second, this workflow model may allow autism diagnostic centers to serve more families by remotely assessing children for the purposes of triage. 
Children whose diagnostic outcome is not clear through this remote procedure can be seen in person for a more comprehensive diagnostic assessment.\n\nGeneralizability of Our Approach\n\n\nContext: Discussion of the study's findings, focusing on potential workflows for adoption and the generalizability of the approach.", "metadata": { "doc_id": "15_Nazneen_25", "source": "15_Nazneen" } }, { "page_content": "Text: Generalizability of Our Approach\n\nOur remote diagnostic assessment system is based on a prescription, behavior specimen collection, and assessment model. Analogous to traditional medical specimen collection and assessment process, this model involves (1) a clinician's prescription for behavior specimens (in the form of short videos) to be collected; (2) in-home collection of behavior specimens by parents; and (3) the assessment of behavior specimens by a remotely located clinician.\n\nThis is a generic model that is transferable, beyond remote autism diagnosis, to other clinical situations where an analysis of behavior by a professional is key to the clinical assessment. Any condition or situation in which observation of behavior in the natural environment is of value, and for which those behaviors can be specified to ensure relevant examples are recorded, is a candidate use case. During stakeholder interviews, the participating clinicians suggested a number of potential use cases where this model could be applicable and valuable. One such use case is to sort and prioritize families on waiting lists for clinical services to expedite the intake process. Sorting and prioritizing the waiting list is crucial, because timely access to diagnostic and intervention services is often hampered by long waiting lists at centers and clinics. In addition, a system based on such a model may be valuable for providing treatment\n\nand follow-up services to remotely located patients who do not have easy access to the clinic. Another use case is parent training, such as those involving clinicians training parents to implement an intervention at home.\n\n\nContext: Discussing potential applications and broader utility of the remote diagnostic assessment system beyond autism diagnosis.", "metadata": { "doc_id": "15_Nazneen_26", "source": "15_Nazneen" } }, { "page_content": "Text: Although the prescription, collection, and assessment model along with its high-level design features (embedded prescription, guided capture through notification feature, tagging, video observation, assessment based on mapped tags) are both generic, it must be customized within the context of its end-use-case scenario. For instance, the embedded prescribed instructions in the recording application can be contextualized through a new prescription writing pad feature within the Web-based assessment portal. One example of successful transfer and customization of our approach is the use case of medication management. In-home behavior specimens captured and shared through the mobile phone-based system allow physicians to monitor medication side effects and note any improvements in symptoms between office visits using the Web-based assessment portal. 
In a preliminary evaluation, physicians highlighted that this medication administration system assisted them in monitoring patients with autism spectrum disorder more comprehensively and accurately than using subjective reports provided by caregivers during office visits [40].\n\nConclusions\n\nThe in-field evaluation demonstrated that the system's design enabled parents to easily record clinically valid evidence of their child's behavior, and diagnosticians to complete a diagnostic assessment for autism. These results shed light on the potential for appropriately designed telehealth technology to support clinical assessments using in-home video captured by families. This assessment model can be readily generalized to other conditions where direct observation of behavior plays a central role in the assessment process.\n\n\nContext: The chunk describes the adaptability of the assessment model and provides an example of its successful customization for medication management in autism, concluding with the potential for broader application in telehealth assessments.", "metadata": { "doc_id": "15_Nazneen_27", "source": "15_Nazneen" } }, { "page_content": "Text: The results of this paper are not a final statement on the clinical validity of diagnostic outcome; rather, this paper reports on the design of the remote autism diagnosis system that resulted from an iterative design process and has shown a promising conclusion from an evaluation in the field. The next step is to validate the diagnostic outcome through a clinical trial in which a large sample of children would be assessed via both remote autism diagnosis system and standard in-person diagnostic assessments for comparison.\n\nAcknowledgments\n\nWe would like to thank all participants for their time and valuable feedback. All phases of this study were supported by a National Institutes of Health grant (Grant No 9 R44 MH099035-04).\n\nAbbreviations\n\nDSM Diagnostic and Statistical Manual of Mental Disorders NODA naturalistic observation diagnostic assessment\n\nMultimedia Appendix 1\n\nThe most recent version of NODA smartCapture and Connect.\n\nFootnotes\n\nConflicts of Interest: RO and GA have conflicts of interests. Mr RO is the CEO of Behavior Imaging Solutions, the company that will commercialize NODA as part of the NIMH SBIR grant. Dr GA was a coadvisor for NN during her graduate school period, which presents a conflict of interest that is registered with and managed by Georgia Institute of Technology. The remaining authors have no conflicts of interest to disclose.\n\nReferences\n\nAutism and Developmental Disabilities Monitoring Network Surveillance Year 2002 Principal Investigators. Centers for Disease Control and Prevention Prevalence of autism spectrum disorders: Autism and developmental disabilities monitoring network, 14 sites, United States, 2002. MMWR Surveill Summ. 2007 Feb 9;56(1):12-28. http://www.cdc.gov/mmwr/preview/mmwrhtml/ss5601a2.htm. [PubMed: 17287715]\n\n\nContext: Concluding remarks and acknowledgements of a research paper describing the design and initial evaluation of a remote autism diagnosis system (NODA).", "metadata": { "doc_id": "15_Nazneen_28", "source": "15_Nazneen" } }, { "page_content": "Text: Developmental Disabilities Monitoring Network Surveillance Year 2010 Principal Investigators. Centers for Disease Control and Prevention (CDC) Prevalence of autism spectrum disorder among children aged 8 years - autism and developmental disabilities monitoring network, 11 sites, United States, 2010. 
MMWR Surveill Summ. 2014 Mar 28;63(2):1-21. http://www.cdc.gov/mmwr/preview/mmwrhtml/ss6302a1.htm. [PubMed: 24670961]\n\nHowlin P, Asgharian A. The diagnosis of autism and Asperger syndrome: Findings from a survey of 770 families. Dev Med Child Neurol. 1999 Dec;41(12):834-839. [PubMed: 10619282]\n\nSivberg B. Parents' detection of early signs in their children having an autistic spectrum disorder. $J$ Pediatr Nurs. 2003 Dec;18(6):433-439. [PubMed: 15058541]\n\nWiggins LD, Baio J, Rice C. Examination of the time between first evaluation and first autism spectrum diagnosis in a population-based sample. J Dev Behav Pediatr. 2006 Apr;27(2 Suppl):79-87. [PubMed: 16685189]\n\nThomas KC, Ellis AR, McLaurin C, Daniels J, Morrissey JP. Access to care for autism-related services. J Autism Dev Disord. 2007 Nov;37(10):1902-1912. doi: 10.1007/s10803-006-0323-7. [PubMed: 17372817] [CrossRef: 10.1007/s10803-006-0323-7]\n\nMoldin SO, Rubenstein JL. Understanding Autism: From Basic Neuroscience to Treatment. Boca Raton, FL: CRC Press; 2006. Apr 25,\n\nShattuck PT, Durkin M, Maenner M, Newschaffer C, Mandell DS, Wiggins L, Lee L, Rice C, Giarelli E, Kirby R, Baio J, Pinto-Martin J, Cuniff C. Timing of identification among children with an autism spectrum disorder: Findings from a population-based surveillance study. J Am Acad Child Adolesc Psychiatry. 2009 May;48(5):474-483. doi: 10.1097/CHI.0b013e31819b3848. http://europepmc.org/abstract/MED/19318992. [PMCID: PMC3188985] [PubMed: 19318992] [CrossRef: 10.1097/CHI.0b013e31819b3848]\n\n\nContext: Studies examining the time and access to autism diagnosis and care.", "metadata": { "doc_id": "15_Nazneen_29", "source": "15_Nazneen" } }, { "page_content": "Text: Mandell DS, Novak MM, Zubritsky CD. Factors associated with age of diagnosis among children with autism spectrum disorders. Pediatrics. 2005 Dec;116(6):1480-1486. doi: 10.1542/peds.2005-0185. http://europepmc.org/abstract/MED/16322174. [PMCID: PMC2861294] [PubMed: 16322174] [CrossRef: 10.1542/peds.2005-0185]\n\nMatson JL. Clinical Assessment and Intervention for Autism Spectrum Disorders. Amsterdam, The Netherlands: Elsevier/Academic Press; 2008. Mar,\n\nSiegel B. Getting the Best for Your Child with Autism: An Expert's Guide to Treatment. New York: Guilford Press; 2008. Jan 1,\n\nMcEachin JJ, Smith T, Lovaas OI. Long-term outcome for children with autism who received early intensive behavioral treatment. Am J Ment Retard. 1993 Jan;97(4):359-372. [PubMed: 8427693]\n\nRattazzi A. The importance of early detection and early intervention for children with autism spectrum conditions. Vertex. 2014;25(116):290-294. [PubMed: 25546644]\n\nFord T. In: Diagnostic and Behavioural Assessment in Children and Adolescents: A Clinical Guide. McLeod BD, Jensen-Doss A, Ollendick TH, editors. New York: Guilford Press; 2014. Apr 4, p. 159.\n\nYoder P, Symons F. Observational Measurement of Behavior. New York: Springer Publishing Company; 2010. Feb 16,\n\nSuen HK, Ary D. Analyzing Quantitative Behavioral Observation Data. Hillsdale, NJ: Lawrence Erlbaum; 1989. Jan 1,\n\nFrick PJ, Barry CT, Kamphaus RW. Clinical Assessment of Child and Adolescent Personality and Behavior. New York: Springer; 2009. Dec 22,\n\nKazdin AE. Artifact, bias, and complexity of assessment: The ABCs of reliability. J Appl Behav Anal. 1977;10(1):141-150. http://europepmc.org/abstract/MED/16795543. [PMCID: PMC1311161] [PubMed: 16795543]\n\nWainer AL, Ingersoll BR. Increasing access to an ASD imitation intervention via a telehealth parent training program. 
J Autism Dev Disord. 2014 Jul 18; doi: 10.1007/s10803-014-2186-7. Epub ahead of print. [PubMed: 25035089] [CrossRef: 10.1007/s10803-014-2186-7]\n\n\nContext: References related to autism spectrum disorder diagnosis, assessment, intervention, and telehealth.", "metadata": { "doc_id": "15_Nazneen_30", "source": "15_Nazneen" } }, { "page_content": "Text: Vismara LA, Young GS, Rogers SJ. Telehealth for expanding the reach of early autism training to parents. Autism Res Treat. 2012;2012:121878. doi: 10.1155/2012/121878. http://dx.doi.org/10.1155/2012/121878. [PMCID: PMC3512210] [PubMed: 23227334] [CrossRef: $10.1155 / 2012 / 121878]$\n\nBaharav E, Reiser C. Using telepractice in parent training in early autism. Telemed J E Health. 2010;16(6):727-731. doi: 10.1089/tmj.2010.0029. [PubMed: 20583950] [CrossRef: 10.1089/tmj.2010.0029]\n\nWacker DP, Lee JF, Padilla Dalmau YC, Kopelman TG, Lindgren SD, Kuhle J, Pelzel KE, Dyson S, Schieltz KM, Waldron DB. Conducting functional communication training via telehealth to reduce the problem behavior of young children with autism. J Dev Phys Disabil. 2013 Feb 1;25(1):35-48. doi: 10.1007/s10882-012-9314-0. http://europepmc.org/abstract/MED/23543855. [PMCID: PMC3608527] [PubMed: 23543855] [CrossRef: 10.1007/s10882-012-9314-0]\n\nGoldstein F, Myers K. Telemental health: A new collaboration for pediatricians and child psychiatrists. Pediatr Ann. 2014 Feb;43(2):79-84. doi: 10.3928/00904481-20140127-12. [PubMed: 24512157] [CrossRef: 10.3928/00904481-20140127-12]\n\nVismara LA, Young GS, Stahmer AC, Griffith EM, Rogers SJ. Dissemination of evidence-based practice: Can we train therapists from a distance? J Autism Dev Disord. 2009 Dec;39(12):1636-1651. doi: 10.1007/s10803-009-0796-2. http://europepmc.org/abstract/MED/19582564. [PMCID: PMC2777219] [PubMed: 19582564] [CrossRef: 10.1007/s10803-009-0796-2]\n\nParmanto B, Pulantara IW, Schutte JL, Saptono A, McCue MP. An integrated telehealth system for remote administration of an adult autism assessment. Telemed J E Health. 2013 Feb;19(2):88-94. doi: 10.1089/tmj.2012.0104. [PubMed: 23230821] [CrossRef: 10.1089/tmj.2012.0104]\n\n\nContext: A review of telehealth interventions and systems supporting autism assessment and parent training.", "metadata": { "doc_id": "15_Nazneen_31", "source": "15_Nazneen" } }, { "page_content": "Text: Terry M. Telemedicine and autism: Researchers and clinicians are just starting to consider telemedicine applications for the diagnosis and treatment of autism. Telemed J E Health. 2009 Jun;15(5):416-419. doi: 10.1089/tmj.2009.9965. [PubMed: 19548820] [CrossRef: 10.1089/tmj.2009.9965]\n\nAmerican Psychiatric Association . Diagnostic and Statistical Manual of Mental Disorders (DSM-IVTR), 4th Edition, Text Revision. Washington, DC: American Psychiatric Association; 2000. Jun,\n\nAmerican Psychiatric Association . Diagnostic and Statistical Manual of Mental Disorders, 5th Edition: DSM-5. Washington, DC: American Psychiatric Association; 2013. May,\n\nSmith CJ, Oberleitner R, Treulich K, McIntosh R, Melmed R. International Meeting for Autism Research (IMFAR) 2009. May 8, [2015-06-12]. webcite Using a behavioral imaging platform to develop a naturalistic observational diagnostic assessment for autism https://imfar.confex.com/imfar/2009/webprogram/Paper4954.html.\n\nNazneen N, Rozga A, Romero M, Findley AJ, Call NA, Abowd GD, Arriaga RI. Supporting parents for in-home capture of problem behaviors of children with developmental disabilities. Pers Ubiquit Comput. 2011 May 1;16(2):193-207. 
doi: 10.1007/s00779-011-0385-1. [CrossRef: 10.1007/s00779-011-0385-1]\n\nAware Home Research Initiative (AHRI) Aware Home Research Initiative. [2015-03-03]. webcite http://www.awarehome.gatech.edu/\n\nSears A, Jacko JA. The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, NJ: Lawrence Erlbaum Associates; 2003.\n\nBidwell J, Rozga A, Essa I, Abowd GD. Measuring child visual attention using markerless head tracking from color and depth sensing cameras. Proceedings of the 16th ACM International Conference on Multimodal Interaction, ICMI; 2014; Istanbul, Turkey. 2014. Nov 12, pp. 447-454. [CrossRef: 10.1145/2663204.2663235]\n\n\nContext: A section discussing the increasing consideration of telemedicine applications for autism diagnosis and treatment, alongside related research and resources.", "metadata": { "doc_id": "15_Nazneen_32", "source": "15_Nazneen" } }, { "page_content": "Text: Rehg J, Abowd GD, Rozga A, Romero M, Clements M, Sclaroff S, Essa I, Ousley O, Li Y, Kim C, Rao H, Kim JC, Presti L, Zhang J, Lantsman D, Bidwell J, Ye Z. Decoding children's social behavior. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2013; Portland, OR. New York: IEEE; 2013. Jun 23, pp. 3414-3421. [CrossRef: 10.1109/CVPR.2013.438]\n\nRao H, Kim JC, Rozga A, Clements MC. Detection of laughter in children's speech using spectral and prosodic acoustic features. Proceedings of the 14th Annual Conference of the International Speech Communication Association, Interspeech; 2013; Lyon, France. 2013. Aug 23, pp. 1399-1403.\n\nCommittee on Children With Disabilities American Academy of Pediatrics: The pediatrician's role in the diagnosis and management of autistic spectrum disorder in children. Pediatrics. 2001 May;107(5):1221-1226. [PubMed: 11331713]\n\nZuckerman KE, Mattox K, Donelan K, Batbayar O, Baghaee A, Bethell C. Pediatrician identification of Latino children at risk for autism spectrum disorder. Pediatrics. 2013;132(3):445-453. doi: 10.1542/peds.2013-0383. http://pediatrics.aappublications.org/cgi/pmidlookup? view=long\\&pmid=23958770. [PMCID: PMC3876760] [PubMed: 23958770] [CrossRef: 10.1542/peds.2013-0383]\n\nSwanson AR, Warren ZE, Stone WL, Vehorn AC, Dohrmann E, Humberd Q. The diagnosis of autism in community pediatric settings: Does advanced training facilitate practice change? Autism. 2013;18(5):555-561. doi: 10.1177/1362361313481507. [PubMed: 23847130] [CrossRef: 10.1177/1362361313481507]\n\nHart-Hester S, Noble SL. Recognition and treatment of autism: The role of the family physician. $J$ Miss State Med Assoc. 1999 Nov;40(11):377-383. [PubMed: 10568088]\n\nReischl U, Oberleitner R. Telehealth technology enabling medication management of children with autism. In: Duffy V, editor. Advances in Human Factors and Ergonomics in Healthcare. Boca Raton, FL: CRC Press; 2012.\n\nFigures and Tables\n\nFigure 1\n\nimg-1.jpeg\n\n\nContext: References and related work on autism diagnosis and management.", "metadata": { "doc_id": "15_Nazneen_33", "source": "15_Nazneen" } }, { "page_content": "Text: Figures and Tables\n\nFigure 1\n\nimg-1.jpeg\n\nAware Home setup for families to experience NODA smartCapture.\n\nFigure 2\n\nimg-2.jpeg\n\nNODA smartCapture. (1) Home screen showing 4 NODA scenarios, as well as status of ones recorded. (2) Each scenario has recording instructions for parents as prescription. Pressing \"Ready\" proceeds to recording interface. 
(3) Recording mode with clear time-elapsed status and a green boundary to reinforce recording mode.\n\nFigure 3\n\nimg-3.jpeg\n\nNODA Connect: Web-based assessment portal video tagging.\n\nFigure 4\n\nimg-4.jpeg\n\nOpen in a separate window NODA Connect: Web-based assessment portal showing the DSM checklist screen.\n\nArticles from JMIR mHealth and uHealth are provided here courtesy of JMIR Publications Inc.\n\n\nContext: This section describes the NODA smartCapture and NODA Connect system, including screenshots of the user interface and functionality, and acknowledges the source of the articles.", "metadata": { "doc_id": "15_Nazneen_34", "source": "15_Nazneen" } }, { "page_content": "Text: RESEARCH ARTICLE\n\nMobile detection of autism through machine learning on home video: A development and prospective validation study\n\nQandeel Tariq ${ }^{\\circledR}$, Jena Daniels ${ }^{\\circledR 1,2}$, Jessey Nicole Schwartz ${ }^{\\circledR}{ }^{1,2}$, Peter Washington ${ }^{\\circledR}{ }^{1,2}$, Haik Kalantarian ${ }^{1,2}$, Dennis Paul Wall ${ }^{\\circledR}{ }^{1,2}$ * 1 Department of Pediatrics, Division of Systems Medicine, Stanford University, California, United States of America, 2 Department of Biomedical Data Science, Stanford University, California, United States of America\n\ndpwall@stanford.edu\n\nAbstract\n\nCitation: Tariq Q, Daniels J, Schwartz JN, Washington P, Kalantarian H, Wall DP (2018) Mobile detection of autism through machine learning on home video: A development and prospective validation study. PLoS Med 15(11): e1002705. https://doi.org/10.1371/journal. pmed. 1002705\n\nAcademic Editor: Suchi Saria, Johns Hopkins University, UNITED STATES\n\nReceived: June 8, 2018 Accepted: October 25, 2018 Published: November 27, 2018 Copyright: © 2018 Tariq et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.\n\nData Availability Statement: The de-identified data have been made available at the following github repository and include the primary dataset and the validation dataset: https://github.com/qandeelt/video_phenotyping_autism_plos/tree/master/ datasets. The code has been made available at the following github repository and instructions on how to run each classifier have been provided: https://github.com/qandeelt/video_phenotyping_ autism_plos\n\nBackground\n\n\nContext: This section introduces a study developing and validating a mobile application using machine learning to detect autism through analysis of home videos.", "metadata": { "doc_id": "Tariq2018_0", "source": "Tariq2018" } }, { "page_content": "Text: Background\n\nThe standard approaches to diagnosing autism spectrum disorder (ASD) evaluate between 20 and 100 behaviors and take several hours to complete. This has in part contributed to long wait times for a diagnosis and subsequent delays in access to therapy. We hypothesize that the use of machine learning analysis on home video can speed the diagnosis without compromising accuracy. We have analyzed item-level records from 2 standard diagnostic instruments to construct machine learning classifiers optimized for sparsity, interpretability, and accuracy. 
In the present study, we prospectively test whether the features from these optimized models can be extracted by blinded nonexpert raters from 3-minute home videos of children with and without ASD to arrive at a rapid and accurate machine learning autism classification.\n\nMethods and findings\n\n\nContext: This section introduces a study investigating the potential of using machine learning and home videos to accelerate autism diagnosis by identifying key behaviors through non-expert raters.", "metadata": { "doc_id": "Tariq2018_1", "source": "Tariq2018" } }, { "page_content": "Text: Methods and findings\n\nWe created a mobile web portal for video raters to assess 30 behavioral features (e.g., eye contact, social smile) that are used by 8 independent machine learning models for identifying ASD, each with $>94 \\%$ accuracy in cross-validation testing and subsequent independent validation from previous work. We then collected 116 short home videos of children with autism (mean age $=4$ years 10 months, $\\mathrm{SD}=2$ years 3 months) and 46 videos of typically developing children (mean age $=2$ years 11 months, $\\mathrm{SD}=1$ year 2 months). Three raters blind to the diagnosis independently measured each of the 30 features from the 8 models, with a median time to completion of 4 minutes. Although several models (consisting of alternating decision trees, support vector machine [SVM], logistic regression (LR), radial kernel, and linear SVM) performed well, a sparse 5 -feature LR classifier (LR5) yielded the highest accuracy (area under the curve [AUC]: $92 \\%$ [95\\% CI 88\\%-97\\%]) across all ages tested. We used a prospectively collected independent validation set of 66 videos ( 33 ASD and 33 nonASD) and 3 independent rater measurements to validate the outcome, achieving lower but comparable accuracy (AUC: 89\\% [95\\% CI 81\\%-95\\%]). Finally, we applied LR to the 162-\n\nFunding: The work was supported in part by funds to DPW from NIH (1R01EB025025-01 \\& 1R21HD091500-01), The Hartwell Foundation, Bill and Melinda Gates Foundation, Coulter Foundation, Lucile Packard Foundation, and program grants from Stanford University's Human Centered Artificial Intelligence Program, Precision Health and Integrated Diagnostics Center (PHIND), Beckman Center, Bio-X Center, Predictives and Diagnostics Accelerator, and the Child Health Research Institute. We also received philanthropic support from Bobby Dekesyer and Peter Sullivan. No funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.\n\n\nContext: A study evaluating a mobile web portal for video raters to assess behavioral features for identifying autism spectrum disorder (ASD) using machine learning models.", "metadata": { "doc_id": "Tariq2018_2", "source": "Tariq2018" } }, { "page_content": "Text: Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: DPW is the scientific founder of Cognoa, a company focused on digital pediatric healthcare; the approach and findings presented in this paper are independent from/not related to Cognoa. 
All other authors have declared no competing interests exist.\n\nAbbreviations: ADI-R, Autism Diagnostic Interview-Revised; ADOS, Autism Diagnostic Observation Schedule; ADTree, alternating decision tree; ADTree7, 7-feature alternating decision tree; ADTree8, 8-feature alternating decision tree; ASD, autism spectrum disorder; AUC, area under the curve; AUC-ROC, area under the receiver operating characteristic curve; BID, Balanced Independent Dataset; IRA, interrater agreement; LR, logistic regression; LR10, 10-feature logistic regression classifier; LR5, 5-feature logistic regression classifier; LR9, 9-feature logistic regression classifier; LR-EN-VF, logistic regression with an elastic net penalty; ROC, receiver operating characteristic; SOC, standard of care; SVM, support vector machine; SVM10, 10-feature support vector machine; SVM12, 12-feature support vector machine; SVM5, 5-feature support vector machine; UAR, unweighted average recall. video-feature matrix to construct an 8-feature model, which achieved 0.93 AUC ( $95 \\% \\mathrm{CI}$ $0.90-0.97$ ) on the held-out test set and 0.86 on the validation set of 66 videos. Validation on children with an existing diagnosis limited the ability to generalize the performance to undiagnosed populations.\n\nConclusions\n\nThese results support the hypothesis that feature tagging of home videos for machine learning classification of autism can yield accurate outcomes in short time frames, using mobile devices. Further work will be needed to confirm that this approach can accelerate autism diagnosis at scale.\n\nAuthor summary\n\nWhy was this study done?\n\n\nContext: Concluding remarks on the study's findings and future directions, alongside author disclosures and abbreviations used throughout the paper.", "metadata": { "doc_id": "Tariq2018_3", "source": "Tariq2018" } }, { "page_content": "Text: Author summary\n\nWhy was this study done?\n\nAutism has risen in incidence by approximately $700 \\%$ since 1996 and now impacts at least 1 in 59 children in the United States.\n\nThe current standard for diagnosis requires a direct clinician-to-child observation and takes hours to administer.\n\nThe sharp rise in incidence of autism, coupled with the un-scalable nature of the standard of care (SOC), has created strain on the healthcare system, and the average age of diagnosis remains around 4.5 years, 2 years past the time when it could be reliably diagnosed.\n\nMobile measures that scale could help to alleviate this strain on the healthcare system, reduce waiting times for access to therapy and treatment, and reach underserved populations.\n\nWhat did the researchers do and find?\n\nWe applied 8 machine learning models to 162 two-minute home videos of children with and without autism diagnosis to test the ability to reliably detect autism on mobile platforms.\n\nThree nonexpert raters measured 30 behavioral features needed for machine learning classification by the 8 models in approximately 4 minutes.\n\nLeveraging video ratings, a machine learning model with only 5 features achieved $86 \\%$ unweighted average recall (UAR) on 162 videos and $U A R=80 \\%$ on a different and independently evaluated set of 66 videos, with $U A R=83 \\%$ on children at or under 4.\n\nThe above machine learning process of rendering a mobile video diagnosis quickly created a novel collection of labeled video features and a new video feature-based model with $>90 \\%$ accuracy.\n\nWhat do these findings mean?\n\nShort home videos can provide sufficient information to run 
machine learning classifiers trained to detect children with autism from those with either typical or atypical development. Features needed by machine learning models designed to detect autism can be identified and measured in home videos on mobile devices by nonexperts in timeframes close to the total video length and under 6 minutes.\n\n\nContext: This is the author summary of a study investigating the use of machine learning and home videos to detect autism in children using mobile platforms.", "metadata": { "doc_id": "Tariq2018_4", "source": "Tariq2018" } }, { "page_content": "Text: The machine learning models provide a quantitative indication of autism risk that provides more granularity than a binary outcome to flag inconclusive cases, potentially adding value for use in clinical settings, e.g., for triage.\n\nThe process of mobile video analysis for autism detection generates a growing matrix of video features that can be used to construct new machine learning models that may have higher accuracy for autism detection in home video.\n\nClinical prospective testing in general pediatric settings on populations not yet diagnosed will be needed. However, these results support the possibility that mobile video analysis with machine learning may enable rapid autism detection outside of clinics to reduce waiting periods for access to care and reach underserved populations in regions with limited healthcare infrastructure.\n\nIntroduction\n\nNeuropsychiatric disorders are the single greatest cause of disability due to noncommunicable disease worldwide, accounting for $14 \\%$ of the global burden of disease [1]. A significant contributor to this metric is autism spectrum disorder (ASD, or autism), which has risen in incidence by approximately $700 \\%$ since 1996 [2,3] and now impacts 1 in 59 children in the United States [4,5]. ASD is arguably one of the largest pediatric health challenges, as supporting an individual with the condition costs up to $\\$ 2.4$ million during his/her lifespan in the US [6] and over $\\$ 5$ billion annually in US healthcare costs [6].\n\n\nContext: The authors discuss the potential of machine learning models for autism detection, particularly through mobile video analysis, and highlight the need for clinical testing and the potential to improve access to care.", "metadata": { "doc_id": "Tariq2018_5", "source": "Tariq2018" } }, { "page_content": "Text: Like most mental health conditions, autism has a complex array of symptoms [7] that are diagnosed through behavioral exams. The standard of care (SOC) for an autism diagnosis uses behavioral instruments such as the Autism Diagnostic Observation Schedule (ADOS) [8] and the Autism Diagnostic Interview-Revised (ADI-R) [9]. These standard exams are similar to others in developmental pediatrics [10] in that they require a direct clinician-to-child observation and take hours to administer [11-14]. The sharp rise in incidence of autism, coupled with the unscalable nature of the SOC, has created strain on the healthcare system. Wait times for a diagnostic evaluation can reach or exceed 12 months in the US [15], and the average age of diagnosis in the US remains near 5 years of age [2,13], with underserved populations' average age at ASD diagnosis as high as 8 years [16-18]. The high variability in availability of diagnostic and therapeutic services is common to most psychiatry and mental health conditions across the US, with severe shortages of mental health services in $77 \\%$ of US counties [19]. 
Behavioral interventions for ASD are most impactful when administered by or before 5 years of age [12,20-23]; however, the diagnostic bottleneck that families face severely limits the impact of therapeutic interventions. Scalable measures are necessary to alleviate these bottlenecks, reduce waiting times for access to therapy, and reach underserved populations in need.\n\n\nContext: The challenges of diagnosing autism, including lengthy wait times and limited access to care, necessitate scalable solutions.", "metadata": { "doc_id": "Tariq2018_6", "source": "Tariq2018" } }, { "page_content": "Text: As a step toward enabling fast and accurate access to care for ASD, we have used supervised machine learning approaches to identify minimal sets of behaviors that align with clinical diagnoses of ASD [24-30]. We assembled and analyzed item-level outcomes from the administration of the ADOS and ADI-R to train and test the accuracy of a range of classifiers. For the ADOS, we focused our analysis on ordinal outcome data from modules 1, 2, and 3, which assess children with limited or no vocabulary, with phrased speech, and with fluent speech, respectively. Each of the 3 ADOS modules uses approximately 10 activities for a clinical observation of the child at risk and 28-30 additional behavioral measurements used to score the child following the observation. Our machine learning analyses focused on archived records of the categorical and ordinal data generated from the scoring component of these ADOS examinations. Similarly, the ADI-R involves 93 multiple-choice questions asked by a clinician of the child's primary care provider during an in-clinic interview; as with the ADOS, we focused our classification task on the ordinal outcome data that resulted from the test's administration.\n\n\nContext: The authors describe their use of machine learning to identify key behaviors associated with autism spectrum disorder (ASD) using data from standardized diagnostic tools.", "metadata": { "doc_id": "Tariq2018_7", "source": "Tariq2018" } }, { "page_content": "Text: These preliminary studies focused on building models optimized for accuracy, sparsity, and interpretability that differentiate autism from non-autism while managing class imbalance. We chose models with small numbers of features, with performance at or no more than 1 standard error away from best test performance, and with interpretable outcomes-for example, scores generated by a boosted decision tree or logistic regression (LR) approach. In all, these studies have used score data from 11,298 individuals with autism (mixed with low-, medium-, and high-severity autism) and 1,356 controls (including some children for whom autism may have been suspected but was ruled out) and have identified the following 8 classifiers: a 7 -feature alternating decision tree (ADTree7) [29], an 8 -feature alternating decision tree (ADTree8) [30], a 12 -feature support vector machine (SVM12) [26], a 9 -feature LR classifier (LR9) [26], a 5 -feature support vector machine (SVM5) [27], a 5 -feature LR classifier (LR5) [27], a 10 -feature LR classifier (LR10) [27], and a 10 -feature support vector machine (SVM10) [27].\n\n\nContext: Machine learning approaches to autism classification have yielded several promising classifiers.", "metadata": { "doc_id": "Tariq2018_8", "source": "Tariq2018" } }, { "page_content": "Text: Two of these 8 classifiers have been independently tested in 4 separate analyses. 
In a prospective head-to-head comparison between the clinical outcome and ADTree7 (measured prior to the clinical evaluation and official diagnosis) on 222 children ( $N_{\\mathrm{ASD}}=69 ; N_{\\text {controls }}=$ 153; median age $=5.8$ years), the performance, measured as the unweighted average recall (UAR [31]; the mean of the sensitivity and specificity), was $84.8 \\%$ [24]. Separately, Bone and colleagues [32] tested the ADTree7 on a \"Balanced Independent Dataset\" (BID) consisting of ADI-R outcome data from 680 participants ( 462 ASD, mean age $=9.2$ years, $\\mathrm{SD}=3.1$ years) and 218 non-ASD (mean age $=9.4$ years, $\\mathrm{SD}=2.9$ years) and found the performance to be similarly high at $80 \\%$. Duda and colleagues [25] tested the ADTree8 with 2,333 individuals with autism (mean age $=5.8$ years) and 283 \"non-autism\" control individuals (mean age $=6.4$ years) and found the performance to be $90.2 \\%$. Bone and colleagues [32] also tested this ADTree8 model in 1,033 participants from the BID— 858 autism (mean age $=5.2$ years, $\\mathrm{SD}=3.6$ years), 73 autism spectrum (mean age $=3.9$ years, $\\mathrm{SD}=2.4$ years), and 102 non-spectrum (mean age $=3.4$ years, $\\mathrm{SD}=2.0$ years) —and found the performance to be slightly higher at $94 \\%$. These independent validation studies report classifier performance in the range of the published test accuracy and lend additional support to the hypothesis that models using minimal feature sets are reliable and accurate for autism detection.\n\n\nContext: Independent validation studies of specific machine learning models (ADTree7 and ADTree8) for autism detection.", "metadata": { "doc_id": "Tariq2018_9", "source": "Tariq2018" } }, { "page_content": "Text: Others have run similar training and testing experiments to identify top-ranked features from standard instrument data, including Bone [33] and Bussu [34]. These approaches have arrived at similar conclusions, namely that machine learning is an effective way to build objective, quantitative models with few features to distinguish mild-, medium-, and high-severity autism from children outside of the autism spectrum, including those with other developmental disorders. However, the translation of such models into clinical practice requires additional steps that have not yet been adequately addressed. Although some of our earlier work has\n\nshown that untrained video annotators can measure autism behaviors on home videos with high interrater reliability and accuracy [35], the question of what steps must be taken to move from minimal behavioral models into clinical practice remains.\n\n\nContext: Machine learning approaches for autism detection using behavioral models and clinical translation challenges.", "metadata": { "doc_id": "Tariq2018_10", "source": "Tariq2018" } }, { "page_content": "Text: The present study builds on this prior work to address this question and the hypothesis that features represented in our minimal viable classifiers can be labeled quickly, accurately, and reliably from short home videos by video raters with no official training in autism diagnosis or child development. We deployed crowdsourcing and real-time video analysis for feature labeling to run and evaluate the accuracy of the 8 machine learning models trained to detect autism in 2 independent home video repositories. 
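The UAR used throughout these validation studies is simply the mean of sensitivity and specificity. A minimal sketch of the computation (the labels below are illustrative placeholders, not data from any of the studies cited above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def unweighted_average_recall(y_true, y_pred):
    # UAR = mean of sensitivity (recall on the ASD class, coded 1)
    # and specificity (recall on the non-ASD class, coded 0).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])  # hypothetical example labels
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 1])
print(unweighted_average_recall(y_true, y_pred))  # 0.733...
```

For binary labels this is the same quantity that scikit-learn exposes as balanced_accuracy_score.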
This procedure enabled us to test the ability to reduce to practice the process of rapid mobile video analysis as a viable method for identifying autism symptoms and screening. In addition, as the mobile feature tagging of videos automatically generates a rich feature matrix, it presents the opportunity to train a new artificial intelligence model that has potentially higher generalizability to the task of automatic detection of autism in short video clips. We test this related hypothesis by constructing a novel video feature classifier and comparing its results to alternative models in a held-out subset of the original video feature matrix and in an independent external validation set. The results from this work support the hypothesis that autism detection can be done from mobile devices outside of clinical settings with high efficiency and accuracy.\n\nMethods\n\nSource classifiers for reduce-to-practice testing\n\n\nContext: The study investigates the feasibility of rapid autism detection using mobile devices and crowd-sourced video analysis, building on previous machine learning research in the field.", "metadata": { "doc_id": "Tariq2018_11", "source": "Tariq2018" } }, { "page_content": "Text: Methods\n\nSource classifiers for reduce-to-practice testing\n\nWe assembled 8 published machine learning classifiers to test viability for use in the rapid mobile detection of autism in short home videos. For all of the 8 models, the source of training and validation data was medical records generated through the administration of one of two gold-standard instruments in the diagnosis of autism, the ADOS or the ADI-R. The ADOS has several modules containing approximately 30 features that correspond to developmental level of the individual under assessment. Module 1 is used on individuals with limited or no vocabulary. Module 2 is used on individuals who use phrase speech but who are not fluent. Module 3 is used on individuals who are fluent speakers. The ADI-R is a parent-directed interview that includes $>90$ elements each asked of the parent, with multiple choices for answers. Each model was trained on item-level outcomes from the administration of either the ADOS and ADI-R and optimized for accuracy, sparsity of features, and interpretability.\n\nFor the purpose of brevity without omission of detail, we opted to create an abbreviation for each model using a basic naming convention. This abbreviation took the form of \"model_type\"-\"number of features.\" For example, we used ADTree8 to refer to the use of an alternating decision tree (ADTree) with 8 features developed from medical data from the administration of the diagnostic instrument ADOS Module 1, and LR5 to refer to the LR with 5 behavioral features developed from analysis of ADOS Module 2 medical record data, and so on.\n\n\nContext: This section describes the machine learning classifiers used in a study evaluating their potential for rapid autism detection using short home videos.", "metadata": { "doc_id": "Tariq2018_12", "source": "Tariq2018" } }, { "page_content": "Text: ADTree7. We (Wall and colleagues [29]) applied machine learning to electronic medical record data recorded through the administration of the ADI-R in the diagnostic assessment of children at risk for autism. We used an $80 \\%: 20 \\%$ training and testing split and performed 10 -fold cross-validation for a sample of 891 children with autism and 75 non-autism control participants with an ADTree model containing 7 features. 
The ADTree uses boosting to manage class imbalance [36,37]. We also performed up-sampling through 1,000 bootstrap permutations to manage class imbalance. The model was validated in a clinical trial on 222 participants [24] and in a BID consisting of 680 individuals ( 462 with autism) [32]. The lowest sensitivity and specificity exhibited were 89.9 and 79.7 , respectively (UAR $=84.8 \\%$ ).\n\nADTree8. We [30] used a dataset of score sheets from ADOS Module 2 for 612 children with ASD and 15 non-autism control participants with a $90 \\%: 10 \\%$ training and testing split and 10 -fold cross-validation to train and test an ADTree model with 8 of the 29 Module 2 features. The ADTree uses boosting and has inherent robustness to class imbalance [36,37]. We also performed up-sampling through 1,000 bootstrap permutations to test the sensitivity of model performance to class imbalance. This 8 -feature ADTree model was independently tested on 446 individuals with autism by Wall and colleagues [30], on 2,333 individuals with autism and 238 without autism by Duda and colleagues [25], and on 1,033 individuals ( 858 autism, 73 autism spectrum, 102 non-spectrum) by Bone and colleagues [32]. The lowest sensitivity and specificity reported were $97.1 \\%$ and $83.3 \\%$, respectively (UAR $=90.2 \\%$ ).\n\n\nContext: Applications of ADTree models for autism diagnosis using various datasets and validation studies.", "metadata": { "doc_id": "Tariq2018_13", "source": "Tariq2018" } }, { "page_content": "Text: LR9. We [26] performed training with ADOS Module 2 records on 362 individuals with autism and 282 individuals without autism with backward feature selection and iterative removal of the single lowest-ranked feature across 10 folds each with a $90 \\%: 10 \\%$ class split. Classes were weighted inversely proportional to class size to manage imbalance. The model with the highest sensitivity and specificity and lowest number of features, LR with L1 regularization and 9 features, was selected for testing. We tested the model on independent data from 1,089 individuals with autism and 66 individuals with no autism diagnosis. The lowest sensitivity and specificity identified were $98.8 \\%$ and $89.4 \\%$, respectively (UAR $=94.1 \\%$ ).\n\nSVM12. We [26] used score sheets from ADOS Module 3 generated by the evaluation of 510 children with ASD and 93 non-ASD control participants. These data were split into a $90 \\%$ training and $10 \\%$ testing set. Training and parameter tuning were performed with stepwise backward feature selection and iterative removal of the single lowest-ranked feature across 10 folds. Classes were weighted inversely proportional to class size to manage imbalance. Several models were fit to each of the feature cross-validation folds. The model with the highest sensitivity and specificity and lowest number of features, a Support Vector Machine (SVM) with a radial basis function, was then applied to the test set to measure generalization error. We tested the model on 1,924 individuals with autism and 214 individuals who did not qualify for an autism diagnosis. The lowest sensitivity and specificity identified on the test set were $97.7 \\%$ and $97.2 \\%$, respectively (UAR $=97.5 \\%$ ).\n\n\nContext: Machine learning models for autism detection using ADOS assessment data.", "metadata": { "doc_id": "Tariq2018_14", "source": "Tariq2018" } }, { "page_content": "Text: LR5 and SVM5. 
In this experiment, we [27] used medical records generated through the administration of ADOS Module 2 for 1,319 children with autism and 70 non-autism control participants. The dataset was split $80 \\%: 20 \\%$ into train and test sets, with the same proportion for participants with and without ASD in each set. Class imbalance was managed by setting class weights inversely proportional to the class sizes. A 10 -fold cross-validation was used to select features, and a separate 10 -fold cross-validation was run for hyperparameter tuning prior to testing the performance. An SVM and an LR model with L1 regularization showed the highest test performance with 5 features. The lowest sensitivity and specificity exhibited on the test set for SVM5 were $98 \\%$ and $58 \\%$, respectively, (UAR $=78 \\%$ ) and $93 \\%$ and $67 \\%$, respectively, (UAR $=80 \\%$ ) for LR5.\n\nLR10 and SVM10. In this experiment, we [27] used medical records generated through the administration of ADOS Module 3 for 2,870 children with autism and 273 non-autism control participants. The dataset was split $80 \\%: 20 \\%$ into train and test sets, with the same proportion for participants with and without ASD in each set. Class imbalance was managed by setting class weights inversely proportional to the class sizes. A 10 -fold cross-validation was used to select features, and a separate 10 -fold cross validation was run for hyperparameter tuning prior to testing the performance. An SVM and an LR model with L1 regularization showed the highest test performance with 10 features. The lowest sensitivity and specificity exhibited on the independent test set for SVM10 were $95 \\%$ and $87 \\%$, respectively, (UAR $=91 \\%$ ) and $90 \\%$ and $89 \\%$, respectively, (UAR $=89.5 \\%$ ) for LR10.\n\nimg-0.jpeg\n\n\nContext: Machine learning models (SVM and LR with L1 regularization) were tested on datasets from ADOS modules to predict autism, with varying numbers of features and performance metrics reported.", "metadata": { "doc_id": "Tariq2018_15", "source": "Tariq2018" } }, { "page_content": "Text: img-0.jpeg\n\nFig 1. Feature-to-classifier mapping. Video analysts scored each video with 30 features. This matrix shows which feature corresponds to which classifier. Darker colored features indicate higher overlap, and lighter colors indicate lower overlap across the models. The features are rank ordered according to their frequency of use across the 8 classifiers. Further details about the classifiers are provided in Table 1. The bottom 7 features were not part of the machine learning process but were chosen because of their potential relationship with the autism phenotype and for use in further evaluation of the models' feature sets when constructing a video feature-specific classifier. ADTree7, 7-feature alternating decision tree; ADTree8, 8 -feature alternating decision tree; LR5, 5 -feature logistic regression classifier; LR10, 10-feature logistic regression classifier; SVM5, 5 -feature support vector machine; SVM10, 10-feature support vector machine; SVM12, 12-feature support vector machine. https://doi.org/10.1371/journal.pmed.1002705.g001\n\nAccounting for overlap in the features selected, these 8 models measure 23 unique features in total. The test accuracy for each model was $>90 \\%$. All models contain approximately $90 \\%$ fewer questions than the ADI-R and $70 \\%-84 \\%$ fewer questions than the total features measured within the ADOS. 
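To make the sparse, class-weighted model selection described for LR5/SVM5 and LR10/SVM10 concrete, the sketch below fits an L1-penalized logistic regression with balanced class weights and a 10-fold cross-validated search over the regularization strength. The feature matrix, class sizes, and parameter grid are placeholder assumptions, not the study's medical-record data or exact settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
# Placeholder item-level scores (rows = children, columns = ADOS-style items scored 0-3).
X = rng.integers(0, 4, size=(1389, 28)).astype(float)
y = np.concatenate([np.ones(1319, dtype=int), np.zeros(70, dtype=int)])  # imbalanced classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The L1 penalty drives most coefficients to zero (yielding a small feature set);
# balanced class weights offset the ASD / non-ASD imbalance.
search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear", class_weight="balanced"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="balanced_accuracy")
search.fit(X_train, y_train)

selected = np.flatnonzero(search.best_estimator_.coef_[0])  # indices of retained features
print(len(selected), search.score(X_test, y_test))
```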
An additional 7 features were chosen for their potential diagnostic value and scored by video raters to assess their suitability for scoring home videos, creating a total of 30 features for the mobile video rating process described below (Fig 1).\n\nRecruitment and video collection\n\n\nContext: A study using machine learning to analyze YouTube videos for autism detection, detailing feature selection and classifier mapping.", "metadata": { "doc_id": "Tariq2018_16", "source": "Tariq2018" } }, { "page_content": "Text: Recruitment and video collection\n\nUnder an approved Stanford University IRB protocol, we developed a mobile portal to facilitate the collection of videos of children with ASD, from which participants electronically consented to participate and upload their videos. Participants were recruited via crowdsourcing methods [38-41] targeted at social media platforms and listservs for families of children with autism. Interested participants were directed to a secure and encrypted video portal website to consent to participate. We required participants to be at least 18 years of age and the primary care provider(s) for a child with autism between the ages of 12 months and 17 years. Participants provided videos either through direct upload to the portal or via reference to a video already uploaded to YouTube together with age, diagnosis, and other salient characteristics. We considered videos eligible if they (1) were between 1 and 5 minutes in length, (2) showed the face and hands of the child, (3) showed clear opportunities for or direct social engagement, and (4) involved opportunities for the use of an object such as a utensil, crayon, or toy.\n\n\nContext: Methods section describing participant recruitment and video data collection for an autism detection study.", "metadata": { "doc_id": "Tariq2018_17", "source": "Tariq2018" } }, { "page_content": "Text: We relied on self-reported information provided by the parents concerning the child's official diagnosis of autism or non-autism, the age of the child when the video was submitted, and additional demographic information for videos that were submitted directly to the web portal. For videos that were provided via YouTube URLs, we used YouTube metatags to confirm the age and diagnosis of the child in the video. If a video did not include a metatag for the age of the child in the video, the age was assigned following full agreement among the estimates made by 3 clinical practitioners in pediatrics. To evaluate the accuracy of the parents' selfreport and to safeguard against reporting biases, we commissioned a practicing pediatric specialist certified to administer the ADOS to review a random selection of 20 videos. We also commissioned a developmental pediatrician to review a nonoverlapping random selection of 10 additional videos. These clinical experts classified each video as \"ASD\" or \"non-ASD.\"\n\nFeature tagging of videos to run machine learning models\n\nWe employed a total of 9 video raters who were either students (high school, undergraduate, or graduate-level) or working professionals. None had training or certification for detection or diagnosis of autism. 
All were given instructions on how to tag the 30 questions and were asked to score 10 example videos before performing independent feature tagging of new videos.\n\n\nContext: Methods for data collection and feature tagging for machine learning models used to detect autism in YouTube videos.", "metadata": { "doc_id": "Tariq2018_18", "source": "Tariq2018" } }, { "page_content": "Text: Table 1. Eight machine learning classifiers used for video analysis and autism detection. The models were constructed from an analysis of archived medical records from the use of standard instruments, including the ADOS and the ADI-R. All 8 models identified a small, stable subset of features in cross-validation experiments. The total numbers of affected and unaffected control participants for training and testing are provided together with measures of accuracy on the test set. Four models were tested on independent datasets and are listed under a separate \"Test\" category. The remaining 4, indicated with \"Train/test,\" used the given dataset with an 80%:20% train:test split to calculate test accuracy on the 20% held-out test set. The naming convention of the classifiers is \"model type\"-\"number of features\".\n\n\nContext: A summary of machine learning classifiers used for autism detection through video analysis, including their performance metrics and feature selection.", "metadata": { "doc_id": "Tariq2018_19", "source": "Tariq2018" } }, { "page_content": "Text:

| Classifier | Medical record source | # features | N_ASD | N_non-ASD | Mean age (SD) | % Male (N) | Test sensitivity | Test specificity | Test accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ADTree8 [30] | ADOS Module 1 | 8 | Train: 612; Test [30]: 446; Test [25]: 2,333; Test [32]: 931 | Train: 15; Test [30]: 0; Test [25]: 238; Test [32]: 102 | 6.16 (4.16) | 76.8% (N = 2,009) | 100% | 100% | 100% |
| ADTree7 [29] | ADI-R | 7 | Train: 891; Test [24]: 222; Test [32]: 462 | Train: 75; Test [24]: 0; Test [32]: 218 | 8.5 (3.3) | 65% (N = 628) | 100% | 1.13% | 99.9% |
| SVM with L1 norm (SVM5) [27] | ADOS Module 2 | 5 | Train/test: 1,319 | Train/test: 70 | 6.92 (2.83) | 80% (N = 1,101) | 98% | 58% | 98% |
| LR with L2 norm (LR5) [27] | ADOS Module 2 | 5 | Train/test: 1,319 | Train/test: 70 | 6.92 (2.83) | 80% (N = 1,101) | 93% | 67% | 95% |
| LR with L1 norm (LR9) [26] | ADOS Module 2 | 9 | Train: 362; Test: 1,089 | Train: 282; Test: 66 | 11.75 (10) | 76.4% (N = 1,375) | 98.81% | 89.39% | 98.27% |
| Radial kernel SVM (SVM12) [26] | ADOS Module 3 | 12 | Train: 510; Test: 1,924 | Train: 93; Test: 214 | 16.25 (11.58) | 76.4% (N = 2,094) | 97.71% | 97.2% | 97.66% |
| Linear SVM (SVM10) [27] | ADOS Module 3 | 10 | Train/test: 2,870 | Train/test: 273 | 9.08 (3.08) | 81% (N = 2,557) | 95% | 87% | 97% |
| LR (LR10) [27] | ADOS Module 3 | 10 | Train/test: 2,870 | Train/test: 273 | 9.08 (3.08) | 81% (N = 2,557) | 90% | 89% | 94% |

\n\nContext: A table summarizing the performance of various machine learning classifiers for autism spectrum disorder (ASD)
detection using different medical record features.", "metadata": { "doc_id": "Tariq2018_20", "source": "Tariq2018" } }, { "page_content": "Text: Abbreviations: ADI-R, Autism Diagnostic Interview-Revised; ADOS, Autism Diagnostic Observation Schedule; ADTree7, 7-feature alternating decision tree; ADTree8, 8 -feature alternating decision tree; LR, logistic regression; LR5, 5-feature LR classifier; LR10, 10-feature LR classifier; SVM, support vector machine; SVM5, 5-feature SVM; SVM10, 10-feature SVM; SVM12, 12-feature SVM.\n\nAfter training, we provided the raters with unique usernames and passwords to access the secure online portal to watch videos and answer 30 questions for each video needed by the feature vectors to run the 8 machine learning classifiers (Table 1). Features were presented to the video raters as multiple-choice questions written at an approximately seventh-grade reading level. The raters, who remained blind to diagnosis throughout the study, were tasked to choose one of the tags for each feature that best described the child's behavior in the video. Each response to a feature was then mapped to a score between 0 and 3, with higher scores indicating more severe autism features in the measured behavior, or 8 to indicate that the feature could not be scored. The behavioral features and the overlap across the models are provided in Fig 1.\n\n\nContext: Machine learning classifiers were trained on behavioral features extracted from YouTube videos, and this section describes the setup for data collection and feature scoring by raters.", "metadata": { "doc_id": "Tariq2018_21", "source": "Tariq2018" } }, { "page_content": "Text: To test the viability of feature tagging videos for rapid machine learning detection and diagnosis of autism, we empirically identified a minimum number of video raters needed to score parent-provided home videos. We selected a random subset of videos from the full set of videos collected through our crowdsourced portal and ran the ADTree8 [30] model on feature vectors tagged by all 9 raters. We chose to run only ADTree8 for efficiency reasons and because this model has been previously validated in 2 independent studies [25,32]. We used a sample-with-replacement permutation procedure to measure accuracy as a function of majority rater agreement with the true diagnostic classification. We incrementally increased the number of video raters per trial by 1 rater, starting with 1 and ending with 9 , drawing with replacement 1,000 times per trial. When considering only 2 raters, we required perfect class agreement between the raters. With an odd number of raters, we required a strict majority consensus. When an even number of raters disagreed on classification, we used an independent and randomly chosen rater's score to break the tie.\n\nAfter determining the minimally viable number of video raters, we used that minimum to generate the full set of 30 -feature vectors on all videos. Seven of the models were written in Python 3 using the package scikit-learn, and one was written in R. We ran these 8 models on our feature matrices after feature tagging on videos. We measured the model accuracy through comparison of the raters' majority classification result with the true diagnosis. We evaluated model performance further by age categories: $\\leq 2$ years, $>2$ to $\\leq 4$ years, $>4$ years to $\\leq 6$ years, and $>6$ years. 
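The rater-count analysis just described (bootstrap draws of raters, a strict majority vote over their ADTree8 outputs, and a randomly chosen additional rater to break even splits) can be sketched as below. The ratings matrix is simulated rather than taken from the study, and the simple tie-break also stands in for the stricter two-rater agreement rule:

```python
import numpy as np

rng = np.random.default_rng(0)
n_videos, n_raters, n_draws = 50, 9, 1000

# Simulated per-rater ADTree8 classifications (1 = ASD, 0 = non-ASD) and true labels.
y_true = np.repeat([1, 0], n_videos // 2)
p_positive = np.where(y_true[:, None] == 1, 0.9, 0.2)      # hypothetical rater behavior
ratings = (rng.random((n_videos, n_raters)) < p_positive).astype(int)

def majority_accuracy(k):
    """Mean accuracy over bootstrap draws of k raters, with random tie-breaking."""
    accs = []
    for _ in range(n_draws):
        picked = rng.choice(n_raters, size=k, replace=True)
        votes = ratings[:, picked].sum(axis=1)
        decision = np.where(votes * 2 > k, 1, np.where(votes * 2 < k, 0, -1))
        ties = decision == -1
        if ties.any():  # even split: defer to one independently chosen rater per video
            breaker = rng.integers(n_raters, size=ties.sum())
            decision[ties] = ratings[np.flatnonzero(ties), breaker]
        accs.append((decision == y_true).mean())
    return float(np.mean(accs))

for k in range(1, n_raters + 1):
    print(k, round(majority_accuracy(k), 3))
```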
For each category, we calculated accuracy, sensitivity, and specificity.\n\n\nContext: Experimental validation of a machine learning approach to autism detection using crowd-sourced videos, including determining the minimum number of raters needed and evaluating model performance across different age groups.", "metadata": { "doc_id": "Tariq2018_22", "source": "Tariq2018" } }, { "page_content": "Text: We collected timed data from each rater for each video, which began when a video rater pressed \"play\" on the video and concluded when a video rater finished scoring by clicking \"submit\" on the video portal. We used these time stamps to calculate the time spent annotating each video. We approximated the time taken to answer the questions by excluding the length of the video from the total time spent to score a video.\n\nBuilding a video feature classifier\n\nThe process of video feature tagging provides an opportunity to generate a crowdsourced collection of independent feature measurements that are specific to the video of the child as well as independent rater impressions of that child's behaviors. This in turn has the ability to generate a valuable feature matrix to develop models that include video-specific features rather than features identified through analysis on archived data generated through administration of the SOC (as is the case for all classifiers contained in Table 1). To this end, and following the completion of the annotation on all videos by the minimum number of raters, we performed machine learning on our video feature set. We used LR with an elastic net penalty [42] (LR-EN-VF) to predict the autism class from the non-autism class. We randomly split the dataset into training and testing, reserving $20 \\%$ for the latter while using cross-validation on the training set to tune for hyperparameters. We used cross-validation for model hyperparameter\n\n\nContext: The authors describe their machine learning approach to predict autism class from video features crowdsourced through annotation of YouTube videos.", "metadata": { "doc_id": "Tariq2018_23", "source": "Tariq2018" } }, { "page_content": "Text: tuning by performing a grid search with different values of alpha (varying penalty weights) and L1 ratio (the mixing parameter determining how much weight to apply to L1 versus L2 penalties). Based on the resulting area under the curve (AUC) and accuracy from each combination, we selected the top-performing pair of hyperparameters. Using this pair, we trained the model using LR and balanced class weights to adjust weights inversely proportional to class frequencies in the input data. After determining the top-ranked features based on the trained model and the resulting coefficients, we validated the model on the reserved test set.\n\nIndependent test set for validation of video phenotyping processes\n\nWe used our video portal and crowdsourcing approaches to generate an independent collection of videos for evaluation and feature tagging by 3 different raters than those used in the primary analysis. These raters had similar characteristics to the original group (age, education, no clinical certifications in developmental pediatrics) and were trained for video tagging through the same procedures.\n\nEthics statement\n\nThis study was conducted under approval by Stanford University's IRB under protocol IRB31099. 
Informed and written consent was obtained from all study participants who submitted videos to the study.\n\nResults\n\nAll classifiers used for testing the time and accuracy of mobile video rating had accuracies above 90% (Table 1). The union of features across these 8 classifiers (Table 1) was 23 (Fig 1). These features plus an additional 7 chosen for clinical validity testing were loaded into a mobile video rating portal to enable remote feature tagging by nonclinical video raters.\n\nWe collected a total of 193 videos (Table 2) with average video length of 2 minutes 13 seconds (SD = 1 minute 40 seconds). Of the 119 ASD videos, 72 were direct submissions made by\n\n\nContext: Mobile video rating portal development and validation using crowd-sourced data.", "metadata": { "doc_id": "Tariq2018_24", "source": "Tariq2018" } }, { "page_content": "Text: We collected a total of 193 videos (Table 2) with average video length of 2 minutes 13 seconds (SD = 1 minute 40 seconds). Of the 119 ASD videos, 72 were direct submissions made by\n\nTable 2. Demographic information on children in the collected home videos. We collected N = 193 (119 ASD, 74 non-ASD) home videos for analysis. We excluded 31 videos because of inadequate labeling or video quality. We used a randomly chosen 25 autism and 25 non-autism videos to empirically define an optimal number of raters. Video feature tagging for machine learning was then done on 162 home videos.

| Videos | N | N_ASD | N_non-ASD | Mean age (SD) | ≤2 years | >2 to ≤4 years | >4 to ≤6 years | >6 years | Percent male, ASD | Percent male, non-ASD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 193 | 119 | 74 | 4 years 4 months (2 years 1 month) | 24.4% (n = 47) | 33.7% (n = 65) | 25.9% (n = 50) | 15.5% (n = 31) | 39.38% (n = 76) | 23.32% (n = 45) |
| Excluded | 31 | 3 | 28 | 3 years 8 months (1 year 11 months) | 32.3% (n = 10) | 29.0% (n = 9) | 25.8% (n = 8) | 9.6% (n = 4) | 3.23% (n = 1) | 58.06% (n = 18) |
| Total videos used for analysis of all 8 classifiers | 162 | 116 | 46 | 4 years 4 months (2 years 2 months) | 22.8% (n = 37) | 34.5% (n = 56) | 25.9% (n = 42) | 16.7% (n = 27) | 67.2% (n = 78) | 56.5% (n = 26) |
| Subset of videos used to find minimally viable number of raters | 50 | 25 | 25 | 4 years 6 months (2 years 4 months) | 28% (n = 14) | 34% (n = 17) | 18% (n = 9) | 20% (n = 10) | 48% (n = 12) | 44% (n = 11) |

Abbreviation: ASD, autism spectrum disorder.\n\n\nContext: A study analyzing home videos to develop machine learning classifiers for autism spectrum disorder.", "metadata": { "doc_id": "Tariq2018_25", "source": "Tariq2018" } }, { "page_content": "Text: Abbreviation: ASD, autism spectrum disorder.\n\nthe primary caregiver of the child, and 47 were links to an existing video on YouTube. Of the 74 non-ASD videos, 46 were links to existing YouTube videos, and 28 were direct submissions from the primary caregiver.
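As a brief aside before the exclusions reported below, the video-feature classifier (LR-EN-VF) described under Methods, a logistic regression with an elastic net penalty tuned over the penalty weight (alpha) and L1 ratio with balanced class weights, could be sketched as follows. The 30-column feature matrix, labels, and grids are placeholders, not the study's rating data:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Placeholder video-feature matrix: 30 rated features scored 0-3, or 8 = not scorable.
X = rng.choice([0, 1, 2, 3, 8], size=(162, 30), p=[0.3, 0.25, 0.2, 0.15, 0.1]).astype(float)
y = rng.integers(0, 2, size=162)  # hypothetical ASD (1) / non-ASD (0) labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# SGDClassifier with log loss is one way to fit an elastic-net logistic regression;
# alpha scales the penalty and l1_ratio mixes the L1 and L2 terms.
search = GridSearchCV(
    SGDClassifier(loss="log_loss", penalty="elasticnet", class_weight="balanced",
                  max_iter=5000, random_state=0),
    param_grid={"alpha": [1e-4, 1e-3, 1e-2], "l1_ratio": [0.2, 0.5, 0.8]},
    scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

top8 = np.argsort(-np.abs(search.best_estimator_.coef_[0]))[:8]  # e.g., an 8-feature subset
print(search.best_params_, top8, search.score(X_test, y_test))
```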
We excluded 31 videos because of insufficient evidence for the diagnosis (n = 25) or inadequate video quality (n = 6), leaving 162 videos (116 with ASD and 46 non-ASD), which were loaded into our mobile video rating portal for the primary analysis. To validate self-reporting of the presence or absence of an ASD diagnosis, 2 clinical staff trained and certified in autism diagnosis evaluated a random selection of 30 videos (15 with ASD and 15 non-ASD) from the 162 videos. Their classifications had perfect correspondence with the diagnoses provided through self-report by the primary caregiver.\n\nWe randomly selected 50 videos (25 ASD and 25 non-ASD) from the total 162 collected videos and had all 9 raters feature tag them in an effort to evaluate the potential for an optimal number of raters, with optimal being defined through a balance of scalability and information content. The average video length of this random subset was 1 minute 54 seconds (SD = 46 seconds) for the ASD class and 2 minutes 36 seconds (SD = 1 minute 15 seconds) for the non-ASD class. We then ran the ADTree8 (Table 1) model on the feature vectors generated by the 9 raters. We found that the difference in accuracy was not statistically significant between 3 raters (the minimum number to have a majority consensus on the classification with no ties) and 9\n\nimg-1.jpeg\n\n\nContext: Methods section describing data collection and validation for an autism detection study using YouTube videos.", "metadata": { "doc_id": "Tariq2018_26", "source": "Tariq2018" } }, { "page_content": "Text: img-1.jpeg\n\nFig 2. Accuracy across different permutations of 9 raters for 50 videos. We performed the analysis to determine the optimal number (the minimum number to reach a consensus on classification) of video raters needed to maintain accuracy without loss of power. Nine raters analyzed and generated feature tags for a subset of n = 50 videos (n = 25 ASD, n = 25 non-ASD) on which we ran the ADTree8 classifier (Table 1). The increase in accuracy conferred by the use of 3 versus 9 raters was not significant. We therefore set the optimal rater number to 3 for subsequent analyses. ADTree8, 8-feature alternating decision tree; ASD, autism spectrum disorder.\n\nraters (Fig 2). We therefore elected to use a random selection of 3 raters from the 9 to feature tag all 162 crowdsourced home videos.\n\nModel performance\n\nThree raters performed video screening and feature tagging to generate vectors for each of the 8 machine learning models for comparative evaluation of performance (Fig 3). All classifiers had sensitivity >94.5%. However, only 3 of the 8 models exhibited specificity above 50%. The top-performing classifier was LR5, which showed an accuracy of 88.9%, sensitivity of 94.5%, and specificity of 77.4%. The next-best-performing models were SVM5 with 85.4% accuracy (54.9% specificity) and LR10 with 84.8% accuracy (51% specificity).\n\n\nContext: Machine learning model performance evaluation using crowd-sourced video ratings.", "metadata": { "doc_id": "Tariq2018_27", "source": "Tariq2018" } }, { "page_content": "Text: LR5 exhibited high accuracy on all age ranges with the exception of children over 6 years old (although note that we had limited examples of the non-ASD class [n = 1] in this range). This model performed best on children between the ages of 4 and 6 years, with sensitivity and specificity both above 90% (Fig 4, Table 3).
SVM5 and LR10 showed an increase in performance on children ages $2-4$ years, both with $100 \\%$ sensitivity and the former with $66.7 \\%$ and the latter with $58.8 \\%$ specificity. The 3 raters agreed unanimously on 116 out of 162 videos ( $72 \\%$ ) when using the top-performing classifier, LR5. The interrater agreement (IRA) for this model was above $75 \\%$ in all age ranges with the exception of the youngest age group of children, those\n\nimg-2.jpeg\n\nFig 3. Overall procedure for rapid and mobile classification of ASD versus non-ASD and performance of models from Table 1. Participants were recruited to participate via crowdsourcing methods and provided video by direct upload or via a preexisting YouTube link. The minimum for majority rules of 3 video raters tagged all features, generating feature vectors to run each of the 8 classifiers automatically. The sensitivity and specificity based on majority outcome generated by the 3 raters on 162 ( 119 with autism) videos are provided. Highlighted in yellow is the best performing model, LR5. ADTree7, 7-feature alternating decision tree; ADTree8, 8-feature alternating decision tree; ASD, autism spectrum disorder; LR5, 5-feature logistic regression classifier; LR9, 9-feature logistic regression classifier; LR10, 10-feature logistic regression classifier; SVM5, 5-feature support vector machine; SVM10, 10-feature support vector machine; SVM12, 12-feature support vector machine.\n\nimg-3.jpeg\n\n\nContext: Performance evaluation of machine learning models for autism spectrum disorder (ASD) classification using crowd-sourced video data.", "metadata": { "doc_id": "Tariq2018_28", "source": "Tariq2018" } }, { "page_content": "Text: img-3.jpeg\n\nFig 4. Performance for LR5 by age. LR5 exhibited the highest classifier performance ( $89 \\%$ accuracy) out of the 8 classifiers tested (Table 1). This model performed best on children between the ages of 2 and 6 years. (A) shows the performance of LR5 across 4 age ranges, and (B) provides the ROC curve for LR5's performance for children ages 2 to 6 years. Table 3 provides additional details, including the number of affected and unaffected control participants within each age range. AUC, area under the curve; LR5, 5 -feature logistic regression classifier; ROC, receiver operating characteristic. https://doi.org/10.1371/journal.pmed.1002705.g004 under 2 years, for which there was a greater frequency of disagreement. The numbers of nonASD representatives were small for the older age ranges evaluated (Table 3).\n\nThe median time for the 3 raters to watch and score a video was 4 minutes (Table 4). Excluding the time spent watching the video, raters required a median of 2 minutes 16 seconds to tag all 30 features in the analyst portal. 
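Each of those ratings feeds the majority-rules step summarized in Fig 3: the classifier is applied to each rater's feature vector, and the video's final label is the majority outcome. A minimal illustrative sketch follows; it is not the authors' code, and `model` stands in for any fitted classifier with a scikit-learn-style `predict` method.

```python
from collections import Counter

def majority_label(labels):
    """Majority vote over per-rater predictions ('ASD' / 'non-ASD').
    With 3 raters and a binary outcome there can be no tie."""
    return Counter(labels).most_common(1)[0][0]

def classify_video(rater_feature_vectors, model):
    """Apply the trained classifier to each rater's feature vector for one
    video and return the majority-rules outcome."""
    per_rater_predictions = model.predict(rater_feature_vectors)
    return majority_label(list(per_rater_predictions))
```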
We found a significant difference $(p=0.0009)$ between the average time spent to score the videos of children with ASD and the average time spent to score the non-ASD videos ( 6 minutes 36 seconds compared with 5 minutes 8 seconds).\n\nIndependent validation\n\n\nContext: Results of a study using machine learning to detect autism from YouTube videos, specifically focusing on the performance of a 5-feature logistic regression classifier (LR5) and validation data.", "metadata": { "doc_id": "Tariq2018_29", "source": "Tariq2018" } }, { "page_content": "Text: Independent validation\n\nTo validate the feasibility and accuracy of rapid feature tagging and machine learning on short home videos, we launched a second effort for crowdsourcing videos of children with and without autism to generate an independent replication dataset. We collected 66 videos, 33 of children with autism and 33 non-ASD. This set of videos was comparable to the initial set of 162 videos in terms of gender, age, and video length. The average age for children with ASD was 4 years 5 months ( $\\mathrm{SD}=1$ year 9 months), and the average age for non-ASD children was 3 years 11 months ( $\\mathrm{SD}=1$ year 7 months). Forty-two percent $(n=14)$ of the children with ASD were male and $45 \\%(n=15)$ of the non-ASD children were male. The average video length was 3 minutes 24 seconds, with an SD of 45 seconds. For this independent replication, we used 3 different raters, each with no official training or experience with developmental pediatrics. The raters required a median time of 6 minutes 48 seconds for complete feature tagging. LR5 again yielded the highest accuracy, with a sensitivity of $87.8 \\%$ and a specificity of $72.7 \\%$. A total of 13 of the 66 videos were misclassified, with 4 false negatives.\n\nTable 3. Model performance by age. This table details the accuracy, sensitivity, specificity, precision, and recall for 8 classifiers (Table 1) and for 4 age ranges found in evaluation of 162 home videos with an average length of 2 minutes. We also provide the IRA, which indicates the frequency with which the model results from all 3 raters' feature tags agreed on class. The top-performing classifier was LR5, which yielded an accuracy of $88.9 \\%$, sensitivity of $94.5 \\%$, and specificity of $77.4 \\%$. Other notable classifiers were SVM5 and LR10, which yielded $85.4 \\%$ and $84.8 \\%$ accuracy, respectively. 
These 3 best-performing classifiers showed improved classification power within certain age ranges.\n\n\nContext: Validation of machine learning models using crowdsourced videos of children with and without autism.", "metadata": { "doc_id": "Tariq2018_30", "source": "Tariq2018" } }, { "page_content": "Text: Table 3 (all values in %). Model performance by age group, part 1 of 2.\n\nOverall (116 ASD, 46 non-ASD, 64.4% male)\nStatistic | ADTree8 | ADTree7 | SVM5 | LR5 | LR9 | SVM12 | SVM10 | LR10\nSensitivity | 100 | 94.5 | 100 | 94.5 | 100 | 100 | 100 | 100\nSpecificity | 22.4 | 37.3 | 54.9 | 77.4 | 31.4 | 0 | 17.6 | 51.0\nAccuracy | 76.1 | 76.3 | 85.4 | 88.9 | 78.4 | 71.6 | 73.9 | 84.8\nIRA | 76.1 | 67.5 | 68.4 | 71.7 | 75.9 | 71.6 | 79.5 | 70.9\nPrecision | 74.3 | 76.3 | 82.3 | 89.7 | 76.0 | 71.6 | 72.4 | 82.0\nUAR | 61.2 | 65.9 | 77.5 | 86.0 | 65.7 | 50 | 58.8 | 75.5\n\n≤2 years (17 ASD, 20 non-ASD, 56.8% male)\nStatistic | ADTree8 | ADTree7 | SVM5 | LR5 | LR9 | SVM12 | SVM10 | LR10\nSensitivity | 100 | 100 | 100 | 93.3 | 100 | 100 | 100 | 100\nSpecificity | 14.2 | 18.2 | 38.1 | 77.3 | 22.7 | 0 | 14.3 | 47.6\nAccuracy | 50 | 51.4 | 62.9 | 83.8 | 54.1 | 40.5 | 50.0 | 96.2\nIRA | 53 | 48.6 | 45.7 | 51.4 | 48.6 | 100 | 66.7 | 100\nPrecision | 45.5 | 45.5 | 51.9 | 73.7 | 46.9 | 45.9 | 45.4 | 57.7\nUAR | 57.1 | 59.1 | 69.1 | 85.3 | 61.4 | 50.0 | 57.2 | 73.8\n\n>2 years and ≤4 years (39 ASD, 17 non-ASD, 66.1% male)\nStatistic | ADTree8 | ADTree7 | SVM5 | LR5 | LR9 | SVM12 | SVM10 | LR10\nSensitivity | 100 | 97.3 | 100 | 91.8 | 100.0 | 100 | 100 | 100\nSpecificity | 23.6 | 50 | 66.7 | 73.7 | 38.9 | 0 | 22.2 | 58.8\nAccuracy | 76.4 | 50 | 88.9 | 85.7 | 80.4 | 66.7 | 74.5 | 86.8\nIRA | 74.5 | 81.8 | 63.0 | 75.0 | 76.8 | 100 | 85.4 | 77.4\nPrecision | 74.5 | 80.0 | 85.7 | 87.2 | 77.6 | 69.6 | 72.5 | 83.7\nUAR | 61.8 | 73.7 | 83.4 | 82.8 | 69.5 | 50.0 | 61.1 | 79.4\n\n\nContext: A table comparing the performance of various machine learning models (sensitivity, specificity, accuracy, IRA, precision, UAR) for autism spectrum disorder (ASD) detection across different age groups.", "metadata": { "doc_id": "Tariq2018_31", "source": "Tariq2018" } }, { "page_content": "Text: Table 3 (all values in %). Model performance by age group, part 2 of 2.\n\n>4 years and ≤6 years (34 ASD, 8 non-ASD, 61.9% male)\nStatistic | ADTree8 | ADTree7 | SVM5 | LR5 | LR9 | SVM12 | SVM10 | LR10\nSensitivity | 100 | 96.8 | 100 | 96.9 | 100.0 | 100 | 100 | 100\nSpecificity | 40.0 | 60 | 72.7 | 90.9 | 40.0 | 0 | 18.2 | 50.0\nAccuracy | 85.4 | 87.8 | 92.9 | 95.3 | 85.7 | 74.4 | 79.1 | 88.1\nIRA | 85.4 | 78.0 | 78.6 | 79.1 | 85.7 | 93.0 | 76.7 | 71.4\nPrecision | 83.8 | 88.2 | 91.2 | 96.9 | 84.2 | 80.9 | 78.0 | 86.5\nUAR | 70.0 | 78.4 | 86.4 | 93.9 | 70.0 | 50.0 | 59.1 | 75.0\n\n>6 years (26 ASD, 1 non-ASD, 74.1% male)\nStatistic | ADTree8 | ADTree7 | SVM5 | LR5 | LR9 | SVM12 | SVM10 | LR10\nSensitivity | 100 | 84.6 | 100 | 96.2 | 100 | 100 | 100 | 100\nSpecificity | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0\nAccuracy | 96.2 | 81.5 | 96.2 | 92.6 | 96.2 | 96.2 | 96.2 | 96.3\nIRA | 96.2 | 70.4 | 96.2 | 81.5 | 96.2 | 100 | 100 | 85.2\nPrecision | 96.3 | 95.7 | 96.3 | 96.2 | 96.3 | 96.3 | 96.3 | 96.3\nUAR | 50.0 | 42.3 | 50.0 | 48.1 | 50.0 | 50.0 | 50.0 | 50.0\n\n\nContext: Performance metrics (sensitivity, specificity, accuracy, etc.) for autism detection models across different age groups and datasets.", "metadata": { "doc_id": "Tariq2018_32", "source": "Tariq2018" } }, { "page_content": "Text: Abbreviations: ADTree7, 7-feature alternating decision tree; ADTree8, 8-feature alternating decision tree; ASD, autism spectrum disorder; IRA, interrater agreement; LR5, 5-feature logistic regression classifier; LR9, 9-feature logistic regression classifier; LR10, 10-feature logistic regression classifier; SVM5, 5-feature support vector machine; SVM10, 10-feature support vector machine; SVM12, 12-feature support vector machine; UAR, unweighted average recall. https://doi.org/10.1371/journal.pmed.1002705.t003\n\nGiven the higher average time for video evaluation, we hypothesized that the videos contained challenging displays of autism symptoms. Therefore, we examined the probabilities generated by the LR5 model for the 13 misclassified videos. Two of the 4 false negatives and 4 of the 9 false positives had borderline probability scores between 0.4 and 0.6. We elected to define a probability threshold between 0.4 and 0.6 to flag videos as inconclusive cases. Twenty-six of the 66 videos fell within this inconclusive group when applying this threshold. When we excluded these 26 from our accuracy analysis, the sensitivity and specificity increased to $91.3 \\%$ and $88.2 \\%$, respectively.\n\nTable 4. Time required for mobile tagging of video features needed to run the machine learning models.
We highlight the average length of videos (all participants, only participants with ASD, and only participants without ASD) as well as the average time required to watch and score the videos and the average time required from start to end of the scoring component alone.\n\n\nContext: Analysis of machine learning model performance and video evaluation times in autism detection.", "metadata": { "doc_id": "Tariq2018_33", "source": "Tariq2018" } }, { "page_content": "Text: Group | Statistic | Total time required for review and feature tagging | Total time required for feature tagging alone | Video length\nOverall | Mean (SD) | 6 minutes 9 seconds (5 minutes 28 seconds) | 3 minutes 36 seconds (5 minutes 52 seconds) | 2 minutes 13 seconds (1 minute 40 seconds)\nOverall | Median | 4 minutes 0 seconds | 2 minutes 16 seconds | 1 minute 45 seconds\nOverall | Range | 1 minute 0 seconds to 37 minutes 0 seconds | 0 minutes 50 seconds to 35 minutes 42 seconds | 0 minutes 25 seconds to 8 minutes 6 seconds\nASD only | Mean (SD) | 6 minutes 36 seconds (5 minutes 54 seconds) | 4 minutes 22 seconds (6 minutes 20 seconds) | 2 minutes 4 seconds (1 minute 40 seconds)\nASD only | Median | 5 minutes 0 seconds | 2 minutes 40 seconds | 1 minute 30 seconds\nASD only | Range | 1 minute 0 seconds to 37 minutes 0 seconds | 0 minutes 50 seconds to 35 minutes 42 seconds | 0 minutes 25 seconds to 8 minutes 6 seconds\nNon-ASD only | Mean (SD) | 5 minutes 8 seconds (4 minutes 8 seconds) | 2 minutes 18 seconds (4 minutes 22 seconds) | 2 minutes 38 seconds (1 minute 34 seconds)\nNon-ASD only | Median | 4 minutes 0 seconds | 1 minute 21 seconds | 2 minutes 11 seconds\nNon-ASD only | Range | 1 minute 0 seconds to 30 minutes 0 seconds | 0 minutes 50 seconds to 25 minutes 42 seconds | 0 minutes 36 seconds to 6 minutes 42 seconds\n\nAbbreviation: ASD, autism spectrum disorder. https://doi.org/10.1371/journal.pmed.1002705.t004\n\nTraining a video feature-specific classifier\n\n\nContext: Results of a crowdsourcing study analyzing time spent reviewing and tagging features in videos of children, broken down by ASD vs. non-ASD groups.", "metadata": { "doc_id": "Tariq2018_34", "source": "Tariq2018" } }, { "page_content": "Text: Abbreviation: ASD, autism spectrum disorder. https://doi.org/10.1371/journal.pmed.1002705.t004\n\nTraining a video feature-specific classifier\n\nTo build a video feature-specific classifier, we trained an LR-EN-VF model on 528 (3 raters $\\times 176$ videos) novel measures of the 30 video features used to distinguish the autism class from the neurotypical cohort. Out of these 176 videos ($\\mathrm{ASD}=121$, non-ASD $=58$), 162 (ASD $=116$, non-ASD $=46$) were from the analysis set, and 14 videos ($\\mathrm{ASD}=5$, non-ASD $=12$) were from the set of 66 validation videos. Model hyperparameters (alpha and L1 ratio) identified through 10-fold cross-validation were 0.01 and 0.6, respectively. We used a high L1 ratio to enforce sparsity and to decrease model complexity and the number of features. We had similar proportions (0.60) for non-ASD and ASD measures in the training set and held-out test set, which allowed us to create a model that generalizes well without a significant change in sensitivity or specificity on novel data. The model had an area under the receiver operating characteristic curve (AUC-ROC) of $93.3 \\%$ and accuracy of $87.7 \\%$ on the held-out test set. A comparison of LR-EN-VF with LR L2 penalty (no feature reduction) revealed similar results (AUC-ROC: $93.8 \\%$, test accuracy: $90.7 \\%$) (Fig 5).
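As an illustration of this kind of model (not the authors' released code), the sketch below fits an elastic-net-penalized logistic regression on rater-generated feature vectors and selects alpha and the L1 ratio by 10-fold cross-validation, mirroring the hyperparameters reported above (alpha = 0.01, L1 ratio = 0.6). `X` and `y` are random placeholders, and the scikit-learn loss name may be "log" on older library versions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data: 3 raters x 176 videos = 528 rows of 30 tagged video features.
rng = np.random.default_rng(0)
X = rng.random((528, 30))
y = rng.integers(0, 2, 528)          # 1 = ASD, 0 = non-ASD (dummy labels)

base = SGDClassifier(loss="log_loss", penalty="elasticnet", max_iter=5000)
search = GridSearchCV(
    base,
    param_grid={"alpha": [0.001, 0.01, 0.1], "l1_ratio": [0.2, 0.4, 0.6, 0.8]},
    cv=10,
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_)

# With an elastic-net penalty, features whose coefficients shrink to zero are
# effectively dropped; the survivors form the reduced feature set.
retained = np.flatnonzero(search.best_estimator_.coef_)
print("retained feature indices:", retained)
```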
The top-8 features selected by the model consisted of the following, in order of highest to lowest rank: speech patterns, communicative engagement, understands language, emotion expression, sensory seeking, responsive social smile, stereotyped speech. One of these 8 features-sensory seeking-was not part of the full sets of items on the standard instrument data used in the development and testing of the 8 models depicted in Table 1. We then validated this classifier on the remaining 52 videos ( $\\mathrm{ASD}=28$, non-ASD $=21$ ) from the validation set, and the results showed an accuracy of $75.5 \\%$ and an AUC-ROC of $86.0 \\%$.\n\nDiscussion\n\n\nContext: The chunk describes the training and validation of a machine learning model (LR-EN-VF) using video features to classify individuals with autism spectrum disorder.", "metadata": { "doc_id": "Tariq2018_35", "source": "Tariq2018" } }, { "page_content": "Text: Discussion\n\nPrevious work [26-29] has shown that machine learning models built on records from standard autism diagnoses can achieve high classification accuracy with a small number of features. Although promising in terms of their minimal feature requirements and ability to generate an accurate risk score, their potential for improving autism diagnosis in practice has\n\nimg-4.jpeg\n\nFig 5. ROC curve for LR-EN-VF showing performance on test data along with an ROC for L2 loss with no feature reduction. The former chose 8 out of 30 video features. AUC, area under the curve; LR-EN-VF, logistic regression with an elastic net penalty; ROC, receiver operating characteristic. https://doi.org/10.1371/journal.pmed.1002705.g005 remained an open question. The present study tested the ability to reduce these models to the practice of home video evaluation by nonexperts using mobile platforms (e.g., tablets, smartphones). Independent tagging of 30 features by 3 raters blind to diagnosis enabled majority rules machine learning classification of 162 two-minute (average) home videos in a median of 4 minutes at $90 \\%$ AUC on children ages 20 months to 6 years. This performance was maintained at $89 \\%$ AUC ( $95 \\%$ CI $81 \\%-95 \\%$ ) in a prospectively collected and independent external set of 66 videos each with 3 independent rater measurement vectors. Taking advantage of the probability scores generated by the best-performing model (L1-regularized LR model with 5 features) to flag low-confidence cases, we were able to achieve a $91 \\%$ AUC, suggesting that the approach could benefit from the use of the scores on a more quantitative scale rather than just as a binary classification outcome.\n\n\nContext: Evaluation of a machine learning model for autism detection using home videos rated by non-experts.", "metadata": { "doc_id": "Tariq2018_36", "source": "Tariq2018" } }, { "page_content": "Text: By using a mobile format that can be accessed online, we showed that it is possible to get multiple independent feature vectors for classification. This has the potential to elevate confidence in classification outcome at the time of diagnosis (i.e., when 3 or more agree on class) while fostering the growth of a novel matrix of features from short home videos. In the second part of our study, we tested the ability for this video feature matrix to enable development of a new model that can generalize to the task of video-based classification of autism. We found that an 8 -feature LR model could achieve an AUC of 0.93 on the held-out subset and 0.86 on the prospective independent validation set. 
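The low-confidence flagging referred to above (probability scores between 0.4 and 0.6 treated as inconclusive) can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and the probabilities shown are made up.

```python
import numpy as np

def conclusive_mask(asd_probs, low=0.4, high=0.6):
    """True where the predicted ASD probability is outside the inconclusive band."""
    asd_probs = np.asarray(asd_probs)
    return (asd_probs < low) | (asd_probs > high)

# asd_probs = model.predict_proba(X_validation)[:, 1]   # in a real pipeline
asd_probs = np.array([0.95, 0.55, 0.10, 0.47, 0.81, 0.30])   # made-up values
y_pred = (asd_probs > 0.5).astype(int)

mask = conclusive_mask(asd_probs)
print(f"{(~mask).sum()} of {len(asd_probs)} videos flagged as inconclusive")
# Sensitivity and specificity would then be recomputed on y_pred[mask] only.
```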
One of the features used by this model, sensory seeking, was not used by the instruments on which the original models were trained, suggesting the possibility that alternative features may provide added power for video classification.\n\nThese results support the hypothesis that the detection of autism can be done effectively at scale through mobile video analysis and machine learning classification to produce a quantified indicator of autism risk quickly. Such a process could streamline autism diagnosis to enable earlier detection and earlier access to therapy that has the highest impact during earlier windows of social development. Further, this approach could help to reduce the geographic and financial burdens associated with access to diagnostic resources and provide more equal\n\nopportunity to underserved populations, including those in developing countries. Further testing and refinement should be conducted to identify the most viable method(s) of crowdsourcing video acquisition and feature tagging. In addition, prospective trials in undiagnosed and in larger, more-balanced cohorts including examples of children with non-autism developmental delays will be needed to better understand the approach's potential for use in autism diagnosis.\n\nSupporting information\n\n\nContext: Mobile video analysis and machine learning classification for autism detection.", "metadata": { "doc_id": "Tariq2018_37", "source": "Tariq2018" } }, { "page_content": "Text: Supporting information\n\nS1 Table. Results of 8 classifiers on independent validation set. LR10, LR5, and ADTree7 are the top-3 best-performing classifiers on the validation set, which falls in line with the results observed on the test dataset of 162 videos used earlier. LR5 still performs with the highest specificity out of the 8 models. ADTree7, 7-feature alternating decision tree; LR5, 5-feature logistic regression classifier; LR10, 10-feature logistic regression classifier. (DOCX) S1 Text. Instructions for video raters. (DOCX) S1 Checklist. The tripod checklist. (DOCX)\n\nAcknowledgments\n\nWe would like to thank Kaitlyn Dunlap, the participating families, and each of our video raters for their important contributions to this study.\n\nAuthor Contributions\n\nConceptualization: Dennis Paul Wall. Data curation: Qandeel Tariq, Jena Daniels, Jessey Nicole Schwartz, Peter Washington, Haik Kalantarian, Dennis Paul Wall.\n\nFormal analysis: Qandeel Tariq, Peter Washington, Haik Kalantarian, Dennis Paul Wall. Funding acquisition: Dennis Paul Wall. Investigation: Qandeel Tariq, Jena Daniels, Jessey Nicole Schwartz, Dennis Paul Wall. Methodology: Qandeel Tariq, Jena Daniels, Dennis Paul Wall. Project administration: Jena Daniels, Jessey Nicole Schwartz, Dennis Paul Wall. Resources: Jena Daniels, Jessey Nicole Schwartz, Peter Washington, Haik Kalantarian, Dennis Paul Wall.\n\nSoftware: Qandeel Tariq, Dennis Paul Wall. Supervision: Dennis Paul Wall. Validation: Qandeel Tariq, Dennis Paul Wall. Visualization: Qandeel Tariq, Jessey Nicole Schwartz, Dennis Paul Wall. Writing - original draft: Qandeel Tariq, Jessey Nicole Schwartz, Dennis Paul Wall. Writing - review \\& editing: Qandeel Tariq, Jena Daniels, Jessey Nicole Schwartz, Peter Washington, Haik Kalantarian, Dennis Paul Wall.\n\nReferences\n\nPrince M, Patel V, Saxena S, Maj M, Maselko J, Phillips MR, et al. Global mental health 1 - No health without mental health. Lancet. 2007; 370(9590):859-77. 
https://doi.org/10.1016/S0140-6736(07) 61238-0 PMID: 17804063\n\n\nContext: Concluding sections of a research paper, including supplementary materials, acknowledgments, author contributions, and references.", "metadata": { "doc_id": "Tariq2018_38", "source": "Tariq2018" } }, { "page_content": "Text: Baio J, Wiggins L, Christensen DL, Maenner MJ, Daniels J, Warren Z, et al. Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years-Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014. MMWR Surveillance Summaries. 2018; 67(6):1. https://doi.org/10. 15585/mmwr.ss6706a1 PMID: 29701730. PMCID: PMC5919599.\n\nHertz-Picciotto I, Delwiche L. The Rise in Autism and the Role of Age at Diagnosis. Epidemiology. 2009; 20(1):84-90. https://doi.org/10.1097/EDE.0b013e3181902d15 PMID: 19234401. PMCID: PMC4113600.\n\nChristensen DL, Baio J, Van Naarden Braun K, Bilder D, Charles J, Constantino JN, et al. Prevalence and Characteristics of Autism Spectrum Disorder Among Children Aged 8 Years-Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012. MMWR Surveill Summ. 2016; 65 (3):1-23. https://doi.org/10.15585/mmwr.ss6503a1 PMID: 27031587.\n\nChristensen DL, Bilder DA, Zahorodny W, Pettygrove S, Durkin MS, Fitzgerald RT, et al. Prevalence and characteristics of autism spectrum disorder among 4-year-old children in the autism and developmental disabilities monitoring network. Journal of Developmental \\& Behavioral Pediatrics. 2016; 37 (1):1-8. https://doi.org/10.1097/DBP.0000000000000235 PMID: 26651088.\n\nBuescher AV, Cidav Z, Knapp M, Mandell DS. Costs of autism spectrum disorders in the United Kingdom and the United States. JAMA Pediatr. 2014; 168(8):721-8. https://doi.org/10.1001/jamapediatrics. 2014.210 PMID: 24911948.\n\nMcPartland JC, Reichow B, Volkmar FR. Sensitivity and specificity of proposed DSM-5 diagnostic criteria for autism spectrum disorder. J Am Acad Child Adolesc Psychiatry. 2012; 51(4):368-83. https://doi. org/10.1016/j.jaac.2012.01.007 PMID: 22449643. PMCID: PMC3424065.\n\nLord C, Rutter M, Goode S, Heemsbergen J, Jordan H, Mawhood L, et al. Austism diagnostic observation schedule: A standardized observation of communicative and social behavior. Journal of autism and developmental disorders. 1989; 19(2):185-212. PMID: 2745388.\n\n\nContext: Prevalence and diagnostic studies of Autism Spectrum Disorder (ASD).", "metadata": { "doc_id": "Tariq2018_39", "source": "Tariq2018" } }, { "page_content": "Text: Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of autism and developmental disorders. 1994; 24(5):659-85. PMID: 7814313.\n\nAssociation AP. Diagnostic and statistical manual of mental disorders (DSM-5®). Arlington, VA: American Psychiatric Pub; 2013.\n\nBernier R, Mao A, Yen J. Psychopathology, families, and culture: autism. Child Adolesc Psychiatr Clin N Am. 2010; 19(4):855-67. https://doi.org/10.1016/j.chc.2010.07.005 PMID: 21056350.\n\nDawson G. Early behavioral intervention, brain plasticity, and the prevention of autism spectrum disorder. Dev Psychopathol. 2008; 20(3):775-803. https://doi.org/10.1017/S0954579408000370 PMID: 18606031.\n\nMazurek MO, Handen BL, Wodka EL, Nowinski L, Butter E, Engelhardt CR. Age at first autism spectrum disorder diagnosis: the role of birth cohort, demographic factors, and clinical features. J Dev Behav Pediatr. 2014; 35(9):561-9. 
https://doi.org/10.1097/DBP.000000000000097 PMID: 25211371.\n\nWiggins LD, Baio J, Rice C. Examination of the time between first evaluation and first autism spectrum diagnosis in a population-based sample. Journal of Developmental and Behavioral Pediatrics. 2006; 27 (2):S79-S87. PMID: 16685189.\n\nGordon-Lipkin E, Foster J, Peacock G. Whittling Down the Wait Time: Exploring Models to Minimize the Delay from Initial Concern to Diagnosis and Treatment of Autism Spectrum Disorder. Pediatr Clin North Am. 2016; 63(5):851-9. https://doi.org/10.1016/j.pcl.2016.06.007 PMID: 27565363. PMCID:PMC5583718.\n\nHowlin P, Moore A. Diagnosis in autism: A survey of over 1200 patients in the UK. autism. 1997; 1 (2): $135-62$.\n\nKogan MD, Strickland BB, Blumberg SJ, Singh GK, Perrin JM, van Dyck PC. A National Profile of the Health Care Experiences and Family Impact of Autism Spectrum Disorder Among Children in the United States, 2005-2006. Pediatrics. 2008; 122(6):E1149-E58. https://doi.org/10.1542/peds.2008-1057 PMID: 19047216.\n\n\nContext: Diagnostic criteria, prevalence, and delays in autism spectrum disorder diagnosis.", "metadata": { "doc_id": "Tariq2018_40", "source": "Tariq2018" } }, { "page_content": "Text: Siklos S, Kerns KA. Assessing the diagnostic experiences of a small sample of parents of children with autism spectrum disorders. Res Dev Disabil. 2007; 28(1):9-22. https://doi.org/10.1016/j.ridd.2005.09. 003 PMID: 16442261.\n\nThomas KC, Ellis AR, Konrad TR, Holzer CE, Morrissey JP. County-level estimates of mental health professional shortage in the United States. Psychiatr Serv. 2009; 60(10):1323-8. https://doi.org/10. 1176/ps.2009.60.10.1323 PMID: 19797371.\n\nDawson G, Jones EJH, Merkle K, Venema K, Lowy R, Faja S, et al. Early Behavioral Intervention Is Associated With Normalized Brain Activity in Young Children With Autism. Journal of the American Academy of Child and Adolescent Psychiatry. 2012; 51(11):1150-9. https://doi.org/10.1016/j.jaac. 2012.08.018 PMID: 23101741. PMCID: PMC3607427.\n\nDawson G, Rogers S, Munson J, Smith M, Winter J, Greenson J, et al. Randomized, controlled trial of an intervention for toddlers with autism: the Early Start Denver Model. Pediatrics. 2010; 125(1):e17-23. https://doi.org/10.1542/peds.2009-0958 PMID: 19948568. PMCID: PMC4951085.\n\nLanda RJ. Efficacy of early interventions for infants and young children with, and at risk for, autism spectrum disorders. International Review of Psychiatry. 2018; 30(1):25-39. https://doi.org/10.1080/ 09540261.2018.1432574 PMID: 29537331. PMCID: PMC6034700.\n\nPhillips DA, Shonkoff JP. From neurons to neighborhoods: The science of early childhood development. Washington, D.C.: National Academies Press; 2000. https://doi.org/10.17226/9824 PMID: 25077268.\n\nDuda M, Daniels J, Wall DP. Clinical Evaluation of a Novel and Mobile Autism Risk Assessment. J Autism Dev Disord. 2016; 46(6):1953-61. https://doi.org/10.1007/s10803-016-2718-4 PMID: 26873142. PMCID: PMC4860199.\n\nDuda M, Kosmicki JA, Wall DP. Testing the accuracy of an observation-based classifier for rapid detection of autism risk. Transl Psychiatry. 2014; 4(8):e424. https://doi.org/10.1038/tp.2014.65 PMID: 25116834.\n\n\nContext: Studies on autism diagnosis, intervention, and related challenges.", "metadata": { "doc_id": "Tariq2018_41", "source": "Tariq2018" } }, { "page_content": "Text: Kosmicki JA, Sochat V, Duda M, Wall DP. Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning. Translational Psychiatry. 
2015; 5(2):e514. https:// doi.org/10.1038/tp.2015.7 PMID: 25710120. PMCID: PMC4445756.\n\nLevy S, Duda M, Haber N, Wall DP. Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism. Mol Autism. 2017; 8(1):65. https://doi.org/10.1186/ s13229-017-0180-6 PMID: 29270283. PMCID: PMC5735531.\n\nWall DP, Kosmicki J, DeLuca TF, Harstad E, Fusaro VA. Use of machine learning to shorten observa-tion-based screening and diagnosis of autism. Translational Psychiatry. 2012; 2(4):e100. https://doi. org/10.1038/tp.2012.10 PMID: 22832900. PMCID: PMC3337074.\n\nWall DP, Daily R, Luyster R, Jung JY, Deluca TF. Use of artificial intelligence to shorten the behavioral diagnosis of autism. PLoS One. 2012; 7(8):e43855. https://doi.org/10.1371/journal.pone.0043855 PMID: 22952789.\n\nWall DP, Kosmíscikí J, Deluca TF, Harstad L, Fusaro VA. Use of machine learning to shorten observa-tion-based screening and diagnosis of autism. Translational Psychiatry. 2012; 2(e100). https://doi.org/ 10.1038/tp.2012.10 PMID: 22832900. PMCID: PMC3337074.\n\nSchuller B, Vlasenko B, Eyben F, Wollmer M, Stuhlsatz A, Wendemuth A, et al. Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies. leee Transactions on Affective Computing. 2010; 1 (2):119-31. https://doi.org/10.1109/T-Affc. 2010.8\n\nBone D, Goodwin MS, Black MP, Lee CC, Audhkhasi K, Narayanan S. Applying machine learning to facilitate autism diagnostics: pitfalls and promises. J Autism Dev Disord. 2015; 45(5):1121-36. https:// doi.org/10.1007/s10803-014-2268-6 PMID: 25294649. PMCID: PMC4390409.\n\n\nContext: Studies utilizing machine learning to improve autism diagnosis and screening.", "metadata": { "doc_id": "Tariq2018_42", "source": "Tariq2018" } }, { "page_content": "Text: Bone D, Bishop SL, Black MP, Goodwin MS, Lord C, Narayanan SS. Use of machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and multi-instrument fusion. Journal of Child Psychology and Psychiatry. 2016; 57(8):927-37. https://doi.org/10.1111/jcpp. 12559 PMID: 27090613. PMCID: PMC4958551.\n\nBussu G, Jones EJH, Charman T, Johnson MH, Bultelaar JK, Team B. Prediction of Autism at 3 Years from Behavioural and Developmental Measures in High-Risk Infants: A Longitudinal Cross-Domain Classifier Analysis. Journal of Autism and Developmental Disorders. 2018; 48(7):2418-33. https://doi. org/10.1007/s10803-018-3509-s PMID: 29453709. PMCID: PMC5996007.\n\nFusaro VA, Daniels J, Duda M, DeLuca TF, D'Angelo O, Tamburello J, et al. The Potential of Accelerating Early Detection of Autism through Content Analysis of YouTube Videos. Plos One. 2014; 9(4): e93533. https://doi.org/10.1371/journal.pone.0093533 PMID: 24740236. PMCID: PMC3989176.\n\nFreund Y, Schapire RE, editors. Experiments with a new boosting algorithm. Icml; 1996 July 3, 1996; Bari, Italy. San Francisco, CA, USA: Morgan Kaufman Publishers Inc.; 1996.\n\nFreund Y, Mason L, editors. The alternating decision tree learning algorithm. icml; 1999 June 27, 1999; Bled, Slovenia. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.\n\nBehrend TS, Sharek DJ, Meade AW, Wiebe EN. The viability of crowdsourcing for survey research. Behav Res Methods. 2011; 43(3):800-13. https://doi.org/10.3758/s13428-011-0081-0 PMID: 21437749.\n\nDavid MM, Babineau BA, Wall DP. Can we accelerate autism discoveries through crowdsourcing? Research in Autism Spectrum Disorders. 2016; 32:80-3.\n\nOgunseye S, Parsons J, editors. What Makes a Good Crowd? 
Rethinking the Relationship between Recruitment Strategies and Data Quality in Crowdsourcing. Proceedings of the 16th AIS SIGSAND Symposium; 2017 May 19-20, 2017; Cincinnati, OH.\n\n\nContext: A review of machine learning applications in autism research, including studies utilizing behavioral data, YouTube content, and crowdsourcing techniques.", "metadata": { "doc_id": "Tariq2018_43", "source": "Tariq2018" } }, { "page_content": "Text: Swan M. Crowdsourced health research studies: an important emerging complement to clinical trials in the public health research ecosystem. J Med Internet Res. 2012; 14(2):e46. https://doi.org/10.2196/ jmir.1988 PMID: 22397809. PMCID: PMC3376509.\n\nZou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2005; 67(2):301-20. https://doi.org/10.1111/j.1467-9868. 2005.00503.x\n\n\nContext: Methods used to improve machine learning models for autism detection.", "metadata": { "doc_id": "Tariq2018_44", "source": "Tariq2018" } }, { "page_content": "Text: Deep-Learning-Based Detection of Infants with Autism Spectrum Disorder Using Auto-Encoder Feature Representation\n\nJung Hyuk Lee ${ }^{1}$, Geon Woo Lee ${ }^{1}$, Guiyoung Bong ${ }^{2}$, Hee Jeong Yoo ${ }^{2,3}$ and Hong Kook Kim ${ }^{1, * }$ 1 School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea; ljh0412@gist.ac.kr (J.H.L.); geonwoo0801@gist.ac.kr (G.W.L.) 2 Department of Psychiatry, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do 13620, Korea; 20409@snubh.org (G.B.); hjyoo@snu.ac.kr (H.J.Y.) 3 Department of Psychiatry, College of Medicine, Seoul National University, Seoul 03980, Korea Correspondence: hongkook@gist.ac.kr\n\nReceived: 29 October 2020; Accepted: 24 November 2020; Published: 26 November 2020 check for updates\n\nAbstract\n\n\nContext: A research article exploring the use of deep learning and auto-encoders to detect autism spectrum disorder in infants based on vocal features.", "metadata": { "doc_id": "LEE_0", "source": "LEE" } }, { "page_content": "Text: Received: 29 October 2020; Accepted: 24 November 2020; Published: 26 November 2020 check for updates\n\nAbstract\n\nAutism spectrum disorder (ASD) is a developmental disorder with a life-span disability. While diagnostic instruments have been developed and qualified based on the accuracy of the discrimination of children with ASD from typical development (TD) children, the stability of such procedures can be disrupted by limitations pertaining to time expenses and the subjectivity of clinicians. Consequently, automated diagnostic methods have been developed for acquiring objective measures of autism, and in various fields of research, vocal characteristics have not only been reported as distinctive characteristics by clinicians, but have also shown promising performance in several studies utilizing deep learning models based on the automated discrimination of children with ASD from children with TD. However, difficulties still exist in terms of the characteristics of the data, the complexity of the analysis, and the lack of arranged data caused by the low accessibility for diagnosis and the need to secure anonymity. 
In order to address these issues, we introduce a pre-trained feature extraction auto-encoder model and a joint optimization scheme, which can achieve robustness for widely distributed and unrefined data using a deep-learning-based method for the detection of autism that utilizes various models. By adopting this auto-encoder-based feature extraction and joint optimization in the extended version of the Geneva minimalistic acoustic parameter set (eGeMAPS) speech feature data set, we acquire improved performance in the detection of ASD in infants compared to the raw data set.\n\nKeywords: auto-encoder; bidirectional long short-term memory (BLSTM); joint optimization; acoustic feature extraction; autism spectrum disorder\n\n1. Introduction\n\n\nContext: This chunk presents a study utilizing deep learning, specifically an auto-encoder model, to improve the detection of autism spectrum disorder (ASD) in infants using acoustic speech features.", "metadata": { "doc_id": "LEE_1", "source": "LEE" } }, { "page_content": "Text: Keywords: auto-encoder; bidirectional long short-term memory (BLSTM); joint optimization; acoustic feature extraction; autism spectrum disorder\n\n1. Introduction\n\nAutism spectrum disorder (ASD) is a developmental disorder with a high probability of causing difficulties in social interactions with other people [1]. According to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), ASD involves several characteristics such as being confined to specific interests or behaviors, delayed linguistic development, and poor functionality in terms of communicating or functioning in social situations [2]. As there is wide variation in terms of the types and severities of ASD based on its characteristics, the disorder is referred to as a spectrum [1]. Not only does ASD have the characteristics of a developmental disorder with a life-span disability, but its prevalence is also increasing-from 1 in 150 children in 2000 to 1 in 54 children in 2016 [3]. As diverse evidence has been obtained from previous research showing that the chance of improvement in the\n\nsocial abilities of people with ASD increases when an earlier clinical intervention is performed [4], the early detection of ASD characteristics has become a key point of current ASD research.\n\nVarious instruments for discriminating ASD have been developed, and the commonly accepted gold standard schemes are behavioral assessments, which are time-consuming procedures and require multidisciplinary teams (MDTs). However, most behavioral assessments suffer in terms of the stability of their ASD diagnosis as a result of the issues of accessibility or subjectivity and interpretive bias between professions [5]. Therefore, several attempts to develop objective and precise diagnostic methods have been made in multiple fields, such as genetic determination [6], principle analysis of brain images [7], and physiological approaches [8].\n\n\nContext: This chunk introduces the research focus on early autism spectrum disorder (ASD) detection using objective methods, following a discussion of the disorder's characteristics, prevalence, and limitations of current behavioral assessment approaches.", "metadata": { "doc_id": "LEE_2", "source": "LEE" } }, { "page_content": "Text: One prominent area of behavioral observations is that of infants' vocal characteristics. 
Children with ASD are known to have abnormalities in their prosody resulting from deficits in their ability to recognize the inherent mental conditions of others [9], and their atypical vocalizations are known to be monotonous or exaggerated, which can be revealed using various acoustic characteristics, followed by engineering approaches for the discrimination of ASD or typical development (TD) in children based on the vocal and acoustic features. For example, in [10], the researchers estimated deficits in the vocalization of children with ASD at an average age of 18 months, such as \"flat\" intonation, atypical pitch, or control of volume based on the variability of pitch and the long-term average spectrum (LTAS) using fast Fourier transform, where significant differences were observed in the spectral components at low-band frequencies, as well as spectral peaks and larger pitch ranges and standard deviations. The development of linguistic abilities is also considered to be a distinguishable feature of delayed development in children with ASD. Earlier vocal patterns at age 6-18 months were proven to be differentiable in a study [11] that aimed to confirm the hypothetical vocal patterns and social quality of vocal behavior in order to differentiate between ASD and TD cohorts in groups of children aged 0-6, 6-12, and 12-18 months in terms of categorized speech patterns consisting of vocalization, long reduplicated babbling, two-syllable babbling, and first words. Evidence of abnormalities in children with ASD were shown, in these cases, as a significant decrease in vocalization and first word rate, while the difference in babbling ability between children with ASD and TD was negligible.\n\n\nContext: Automatic detection of Autism Spectrum Disorder (ASD) using vocal and acoustic features.", "metadata": { "doc_id": "LEE_3", "source": "LEE" } }, { "page_content": "Text: Given the development and improvement of machine learning algorithms, as the achievement in the performance of state-of-the-art classification and discrimination tasks [12], recent attempts to develop automated classification methods based on machine learning techniques have been based on the distinctiveness of vocal characteristics, and have been shown to be promising alternatives to the conventional methods in many publications [13]. For examples of machine learning classification, the researchers of [14] employed various acoustic-prosodic features, including fundamental frequency, formant frequencies, harmonics, and root mean square signal energy. In their research, support vector machines (SVMs) and probabilistic neural networks (PNNs) were adopted as classifiers, which showed effectual accuracy in discriminating children with ASD from children with TD. 
Meanwhile, the authors of [15] employed more recent deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) with spectral features from short-time Fourier transform (STFT) and constant Q transform (CQT), to classify children diagnosed using the autism diagnostic observation schedule (ADOS), also showing promising results in multiple outcomes from SVMs, RNNs, and a combination of CNN and RNN classifiers.\n\n\nContext: Machine learning techniques are being applied to analyze vocal characteristics for autism classification, building on advancements in machine learning algorithms and showing promise as alternatives to traditional methods.", "metadata": { "doc_id": "LEE_4", "source": "LEE" } }, { "page_content": "Text: A generalized acoustic feature set, an extended version of the Geneva minimalistic acoustic parameter set (eGeMAPS) [16], and the bidirectional long short-term memory (BLSTM) model were adopted to differentiate between children with ASD and children with TD in [17], showing that $75 \\%$ of the subjects' utterances were correctly classified with the simple application of a deep learning model and feature sets. While the quality of previous research based on various acoustic features has proven the effectiveness of acoustic features and classification algorithms for the detection of abnormalities in children's voices in ASD group compared to those of TD group, the complexity and relationship being inherent between the features will remain uncertain until a large amount of data can be accumulated. Furthermore, a limitation still remains in terms of the problems regarding data collection, since there are\n\ndifficulties pertaining to the need to secure the anonymity of infant subjects, as well as the unintended ignorance of parents at earlier stages of their infant's development. The data of infants are, accordingly, dispersed by gender, age, and number of vocalizations, or consist of comparably small volumes of audio engineering data in general. These problems were typically overlooked by previous research with controlled and small amounts of data.\n\n\nContext: Review of previous research on using acoustic features and deep learning models to detect autism spectrum disorder (ASD) in children, highlighting limitations in data collection and the need for larger datasets.", "metadata": { "doc_id": "LEE_5", "source": "LEE" } }, { "page_content": "Text: In order to provide suggestions for a method to overcome the abovementioned restrictions, we focus on examining the feasibility of neural networks as a feature extractor, employing an auto-encoder (AE), which can modify acoustic features into lowered and separable feature dimensions [18]. We construct a simple six-layered stacked AE that contains an input layer, three fully connected (FC) layers, an output layer, and one auxiliary output layer, which has categorical targets for ASD and TD for the optimization of the latent feature space of the AE. We train the AE and deep learning models and compare the results for each model based on SVMs and vanilla BLSTM, while adopting the same model parameters from the method suggested in [17].\n\nThe remainder of this paper is organized as follows. Section 2 describes the specifications of the participants' data, data processing, feature extraction, statistical analysis, and experimental setup. Section 3 presents the performance evaluations for each algorithm of the SVMs and vanilla BLSTM. Lastly, Section 4 concludes the paper.\n\n2. 
Proposed Method\n\n2.1. Data Collection and Acoustic Feature Extraction\n\n\nContext: The authors propose using neural networks, specifically auto-encoders, to overcome limitations in feature extraction for autism detection and detail their proposed method and paper organization.", "metadata": { "doc_id": "LEE_6", "source": "LEE" } }, { "page_content": "Text: 2. Proposed Method\n\n2.1. Data Collection and Acoustic Feature Extraction\n\nThis study was based on the audio data from video recordings of ASD diagnoses, which were collected from 2016 to 2018 at Seoul National University Bundang Hospital (SNUBH). We received approval from the Institutional Review Board (IRB) at SNUBH to use fully anonymized data for retrospective analysis (IRB no: B-1909/567-110) from existing research (IRB no: B-1607/353-005). We collected the audio data of 39 infants who were assessed using seven instruments: (1) the ADOS, second edition (ADOS-2); (2) the autism diagnostic interview, revised (ADI-R); (3) the behavior development screening for toddlers interview (BeDevel-I); (4) the behavior development screening for toddlers play (BeDevel-P); (5) the Korean version of the childhood autism rating scale (K-CARS), refined from CARS-2; (6) the social communication questionnaire (SCQ); and (7) the social responsiveness scale (SRS) [19-22]. The final diagnosis was the best clinical estimate diagnosis according to the DSM-5 ASD criteria, made by a licensed child psychiatrist using all of the available participant information. The participants' ages ranged between 6 and 24 months, with an average age of 15.92 months and a standard deviation (SD) of 3.17 months; age here means the age at which each infant first visited the hospital for the initial diagnostic examination. Four males and six females were diagnosed with ASD, with an average age of 19.20 months (SD 2.52). The remaining participants were TD children (19 males and 10 females), with an average age of 14.72 months (SD 2.45). Table 1 displays the collected data distribution, while Table 2 shows detailed information on the collected data for each infant.\n\nTable 1. Distribution of age and gender (male/female).\n\n\nContext: Methods section describing data collection and acoustic feature extraction for a study on ASD diagnosis using audio data.", "metadata": { "doc_id": "LEE_7", "source": "LEE" } }, { "page_content": "Text: Table 1. Distribution of age and gender (male/female).\n\nAges (months) | Subjects diagnosed as ASD | Subjects diagnosed as TD | All infant subjects\n6-12 months | 0 | 5 M / 1 F | 5 M / 1 F\n12-18 months | 1 M / 3 F | 14 M / 9 F | 15 M / 12 F\n18-24 months | 3 M / 3 F | 0 | 3 M / 3 F\nAge (average ± SD) | 19.20 ± 2.52 | 14.72 ± 2.45 | 15.92 ± 3.17\n\nTable 2. Detailed information on the age, gender, and initial and definite diagnosis dates of each infant in Table 1.\n\n\nContext: Describing the demographics and diagnostic timelines of infants participating in a study using machine learning to detect autism spectrum disorder.", "metadata": { "doc_id": "LEE_8", "source": "LEE" } }, { "page_content": "Text: Table 2.
Detailed information on the age, gender, and initial and definite diagnosis dates of each infant in Table 1.\n\nInfant ID Age (Months) on Initial Diagnosis Date Gender Initial Diagnosis Date (Year/Month/Day) Definite Final Diagnosis Date (Year/Month/Day) ASD/TD 1 18 Male 2018/07/28 2018/08/28 TD 2 18 Male 2017/07/27 2017/08/27 TD 3 10 Male 2018/08/10 2018/09/10 TD 4 13 Male 2017/06/10 2017/07/10 TD 5 22 Female 2018/01/31 2018/02/28 ASD 6 16 Male 2018/03/17 2018/04/17 TD 7 17 Female 2018/06/30 2018/07/30 TD 8 14 Female 2018/01/06 2018/02/06 TD 9 18 Male 2018/07/17 2018/08/17 TD 10 14 Male 2017/11/04 2017/12/04 TD 11 17 Female 2017/06/29 2017/07/29 ASD 12 12 Female 2018/01/20 2018/02/20 TD 13 9 Male 2017/02/18 2017/03/18 TD 14 18 Female 2017/03/04 2017/04/04 ASD 15 18 Male 2018/05/19 2018/06/19 TD 16 24 Female 2018/08/08 2018/09/08 ASD 17 19 Male 2018/02/24 2018/03/24 ASD 18 19 Male 2017/04/18 2017/05/18 ASD 19 18 Female 2017/03/04 2017/04/04 TD 20 12 Male 2016/12/31 2017/01/31 TD 21 16 Female 2018/03/16 2018/04/16 TD 22 20 Male 2017/10/14 2017/11/14 ASD 23 15 Male 2018/05/09 2018/06/09 ASD 24 17 Female 2017/02/04 2017/03/04 TD 25 16 Male 2018/03/17 2018/04/17 TD 26 12 Male 2018/03/29 2018/04/29 TD 27 17 Female 2017/01/25 2017/02/25 TD 28 17 Male 2018/02/08 2018/03/08 ASD 29 14 Male 2018/01/13 2018/02/13 TD 30 16 Male 2016/11/30 2016/12/30 TD 31 12 Male 2017/03/22 2017/04/22 TD 32 15 Male 2017/03/11 2017/04/11 TD 33 16 Male 2017/12/05 2018/01/05 TD 34 13 Female 2017/12/13 2018/01/13 TD 35 15 Female 2017/03/25 2018/04/25 TD 36 13 Male 2018/08/25 2018/09/25 TD 37 21 Male 2017/06/24 2017/07/24 ASD 38 14 Male 2017/02/22 2017/03/22 TD 39 14 Male 2018/01/27 2018/02/27 TD\n\n\nContext: Detailed demographic and diagnostic information for study participants.", "metadata": { "doc_id": "LEE_9", "source": "LEE" } }, { "page_content": "Text: As each infant's audio data were recorded during the clinical procedure to elicit behaviors from infants, with the attendance of one doctor or clinician and one or both parents with the child in the clinical area, the audio components consisted of various speeches from the child, the clinician, and the parent(s), as well as noises from toys or dragging chairs. Note here that the recordings were done in one of two typical clinical rooms in SNUBH, where the room dimensions were $365 \\mathrm{~cm} \\times 400 \\mathrm{~cm} \\times 270 \\mathrm{~cm}$ and $350 \\mathrm{~cm} \\times 350 \\mathrm{~cm} \\times 270 \\mathrm{~cm}$, and the hospital noise level was around 40 dB . In order to analyze the vocal characteristics of the infants, each audio clip was processed and split into audio segments containing the infant's voice, not disturbed by music or clattering noises from toys or overlapped by the voices of the clinician or parent(s). Each segment was classified into one of five categories, labeled from 0 to 4 , for measuring the data distribution. 
Each label was intended to capture differentiable characteristics of the children's linguistic development: (1) 0 for one syllable, which is a short,\n\n\nContext: Data collection and preprocessing of infant audio recordings for autism detection.", "metadata": { "doc_id": "LEE_10", "source": "LEE" } }, { "page_content": "Text: momentary single vocalization such as \"ah\" or \"ba\"; (2) 1 for two syllables, commonly denoted as canonical babbling, i.e., a reduplication of two clear, identical or variant syllables such as \"baba\" or \"baga\"; (3) 2 for babbling not containing syllables; (4) 3 for a first word, such as \"mother\" or \"father\"; and (5) 4 for atypical voice, including screaming or crying. The distribution of each type of vocalization in seconds is shown in Table 3. The amount of vocalization per category is given in seconds, with each category's ratio of the group total in parentheses, to allow comparison between the ASD and TD groups. While the data were unbalanced and very small, the distribution of ASD and TD vocalizations shows the same tendency as reported in [10]: the ASD group showed a markedly lower ratio of first words and an increased ratio of atypical vocalizations, revealing developmental delay in linguistic ability.\n\nTable 3. Amount (ratio) of each type of vocalization in seconds.\n\nVocal label | ASD | TD\n0 | 80.134 (0.104) | 267.897 (0.250)\n1 | 314.405 (0.409) | 443.498 (0.414)\n2 | 33.241 (0.043) | 34.766 (0.032)\n3 | 8.311 (0.011) | 57.286 (0.054)\n4 | 333.400 (0.433) | 266.794 (0.249)\nTotal | 769.491 | 1070.241\n\n\nContext: Analysis of vocalization patterns in children with autism spectrum disorder (ASD) compared to typically developing (TD) children.", "metadata": { "doc_id": "LEE_11", "source": "LEE" } }, { "page_content": "Text: For acquiring qualified and effective feature sets for the vocal data, eGeMAPS was employed for voice feature extraction. GeMAPS is a popular feature set that provides minimalistic speech features generally used for automatic voice analysis, rather than a large brute-force parameter set. Its extended version, eGeMAPS, contains 88 acoustic features, all of which were used in this experiment. Each recording, stored as a 48 kHz stereo file, was down-sampled and down-mixed into a 16 kHz mono audio file, taking into consideration its usability and resolution in mel-frequency cepstral coefficients (MFCCs). To extract the speech features for ASD classification, each infant's utterances were segmented into 25 ms frames with a 10 ms overlap between frames. Then, the 88 eGeMAPS features were extracted for each frame with the open-source Speech and Music Interpretation by Large-space Extraction (OpenSMILE) toolkit [23], and these features were normalized by mean and standard deviation. The normalization statistics were computed on the training data set and then held fixed. The features were grouped into sets of five consecutive frames to reflect the time-dependent characteristics of the speech data.\n\n2.2. Pre-Trained AE for Acoustic Features\n\nTo further process and refine the acoustic data, a feature-extracting AE was introduced. An AE is a hierarchical structure that is trained as a regression model to reproduce its input parameters. The AE takes inputs and converts them into latent representations, and then reconstructs the input parameters from the latent values [24].
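Before the formal definition below, here is a minimal PyTorch sketch of a feature-extracting AE of this kind, with an auxiliary ASD/TD output attached to the latent layer so that reconstruction and classification can be optimized jointly (the joint-optimization idea introduced earlier). The layer widths and depth are illustrative assumptions, not the authors' exact six-layer architecture.

```python
import torch
import torch.nn as nn

class FeatureAE(nn.Module):
    """Stacked auto-encoder over 88-dim eGeMAPS frames with an auxiliary
    ASD/TD classification head on the latent representation (illustrative)."""
    def __init__(self, in_dim=88, hidden=64, latent=32, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim),
        )
        self.aux_head = nn.Linear(latent, n_classes)   # auxiliary output layer

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.aux_head(z), z

model = FeatureAE()
recon_loss, cls_loss = nn.MSELoss(), nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(16, 88)               # a batch of normalized eGeMAPS frames
labels = torch.randint(0, 2, (16,))   # 0 = TD, 1 = ASD (dummy labels)
x_hat, logits, z = model(x)
loss = recon_loss(x_hat, x) + cls_loss(logits, labels)   # joint optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()
```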
If we consider an input of the AE, $\\boldsymbol{x} \\in R^{d}$, then the latent representation $\\boldsymbol{z} \\in R^{d^{\\prime}}$ and the reconstruction of the input $\\boldsymbol{y} \\in R^{d}$ are obtained by applying a nonlinear activation function $f$ to weighted sums of $\\boldsymbol{x}$ and $\\boldsymbol{z}$, respectively, using a weighting matrix $\\boldsymbol{W} \\in R^{d \\times d^{\\prime}}$ and a bias vector $\\boldsymbol{b} \\in R^{d^{\\prime}}$, as follows:\n\n\nContext: This section details the methodology for extracting and processing acoustic features from infant vocalizations to classify Autism Spectrum Disorder (ASD), specifically outlining the use of eGeMAPS and a pre-trained autoencoder.", "metadata": { "doc_id": "LEE_12", "source": "LEE" } }, { "page_content": "Text: $$ \\boldsymbol{z}=f\\left(\\boldsymbol{W}^{T} \\boldsymbol{x}+\\boldsymbol{b}\\right), \\qquad \\boldsymbol{y}=f\\left(\\boldsymbol{W} \\boldsymbol{z}+\\boldsymbol{b}^{\\prime}\\right) $$\n\nwhere $T$ denotes the matrix transpose and $\\boldsymbol{b}^{\\prime} \\in R^{d}$ is the decoder bias vector. When the latent dimension $d^{\\prime}