class: center, middle, inverse, title-slide .title[ # Visualization and Modeling of Multivariate Data in Environmental Applications ] .subtitle[ ## PhD Final Examination ] .author[ ### Alison Kleffner ] .date[ ### Department of Statistics, University of Nebraska - Lincoln ] --- class:primary <style> /* colors: #EEB422, #8B0000, #191970, #00a8cc */ /* define the new color palette here! */ a, a > code { color: #8B0000; text-decoration: none; } .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #8B0000; color: #8B0000; height: 2px; } .remark-slide-content { background-color: #FFFFFF; border-top: 80px solid #8B0000; font-size: 20px; font-weight: 300; line-height: 1.5; <!-- padding: 1em 2em 1em 2em --> background-image: url(css/UNL.svg); background-position: 2% 98%; background-size: 10%; border-bottom: 0; } .inverse { background-color: #8B0000; <!-- border-top: 20px solid #696969; --> <!-- background-image: none; --> <!-- background-position: 50% 75%; --> <!-- background-size: 150px; --> } .remark-slide-content > h1 { font-family: 'Roboto'; font-weight: 300; font-size: 45px; margin-top: -95px; margin-left: -00px; color: #FFFFFF; } .title-slide { background-color: #FFFFFF; <!-- border-left: 80px solid #8B0000; --> background-image: url(css/UNL.svg); background-position: 98% 98%; <!-- background-attachment: fixed, fixed; --> background-size: 20%; border-bottom: 0; border: 10px solid #8B0000; <!-- background: transparent; --> } .title-slide > h1 { color: #111111; font-size: 32px; text-shadow: none; font-weight: 500; text-align: left; margin-left: 15px; padding-top: 80px; } .title-slide > h2 { margin-top: -25px; padding-bottom: -20px; color: #111111; text-shadow: none; font-weight: 100; font-size: 28px; text-align: left; margin-left: 15px; } .title-slide > h3 { color: #111111; text-shadow: none; font-weight: 100; font-size: 28px; text-align: left; margin-left: 15px; margin-bottom: -20px; } body { font-family: 'Roboto'; 
font-weight: 300; } .remark-slide-number { font-size: 13pt; font-family: 'Roboto'; color: #272822; opacity: 1; } .inverse .remark-slide-number { font-size: 13pt; font-family: 'Roboto'; color: #FAFAFA; opacity: 1; } .title-slide-custom .remark-slide-number { display: none; } .title-slide-custom h3::after, .mline h1::after { content: ''; display: block; border: none; background-color: #8B0000; color: #8B0000; height: 2px; } .title-slide-custom { background-color: #FFFFFF; <!-- border-left: 80px solid #8B0000; --> background-image: url(css/UNL.svg); background-position: 98% 98%; <!-- background-attachment: fixed, fixed; --> background-size: 20%; border-bottom: 0; border: 10px solid #8B0000; <!-- background: transparent; --> } .title-slide-custom > h1 { color: #111111; font-size: 40px; text-shadow: none; font-weight: 500; text-align: left; margin-left: 15px; padding-top: 80px; padding-bottom: 10px; } .title-slide-custom > h2 { margin-top: -25px; padding-bottom: 30px; color: #111111; text-shadow: none; font-weight: 100; font-size: 32px; text-align: left; margin-left: 15px; } .title-slide-custom > h3 { margin-top: -25px; padding-bottom: -25px; color: #111111; text-shadow: none; font-weight: 100; font-size: 32px; text-align: left; margin-left: 15px; } .title-slide-custom > h4 { color: #111111; text-shadow: none; font-weight: 100; font-size: 28px; text-align: left; margin-left: 15px; margin-bottom: -30px; padding-bottom: -25px; } .title-slide-custom > h5 { color: #111111; text-shadow: none; font-weight: 100; font-size: 24px; text-align: left; margin-left: 15px; margin-bottom: -40px; } <!-- img { --> <!-- max-width: 50%; --> <!-- } --> </style> # Outline
Overall Motivation
Redesigning Yield Map Plots for Comprehension and Usability
Visual Diagnostics for Trajectory Data
Spatio-Temporal Model for Arctic Sea Ice
Overall Conclusion
References --- class:primary # Overall Motivation The rapid development of technology, like global positioning systems (GPS) and geographic information systems (GIS), has led to a dramatic increase in the amount of spatial and spatio-temporal data collected (Ansari, Ahmad, Khan, Bhushan, and others, 2020) + This growth has necessitated the development of new techniques to work with this data (Yuan, Sun, Zhao, Li, and Wang, 2017) + We focus on environmental applications with multivariate spatial data and trajectory data. ??? The development of technology like global positioning systems and geographic information systems has dramatically increased the amount of spatial and spatio-temporal data that can be easily collected. Hence, new techniques have been needed to work with this data that accurately account for relationships over space and through time. Further, the new techniques need to be able to work with large amounts of data. This data can be broken into different sub-categories, so for this work we focus on multivariate spatial data and trajectories used in different applications related to the environment. --- class:inverse <br> <br> <br> <br> <br> <br> <br> <br> .center[ # Multivariate Spatial Data: Redesigning Yield Map Plots for Comprehension and Usability ] --- class:primary # Spatial Data Spatial data relates to a geographical area or location. + Often, more than one attribute is measured at a location. **Focus:** However, due to the variables occupying the same space, multivariate spatial data is complex + Difficulties in visualization due to issues like clutter (He, Tao, Wang, and Lin, 2019) **Example:** Visualization of crop input application versus crop yield ??? This first project focuses on spatial data, specifically multivariate spatial data. Spatial data relates to a geographical area or location. Tobler's law of geography says that all spatial data is related, but near objects are more related than distant objects. 
Often more than one attribute is measured at a location, like rainfall and temperature across the state of Nebraska. Not only can accurately accounting for the relationship through space be difficult, but the variables in a multivariate spatial data set occupying the same space add to the complexity. This makes visualization especially difficult due to issues like clutter. In this project I explore spatial visualizations for deriving a relationship between two variables, specifically crop input application and crop yield. --- class:primary # Background + With a projected increase in future crop demand, researchers are conducting studies on crop input application to increase yield, focusing on sustainability (Tilman, Balzer, Hill, and Befort, 2011) + Example: Nitrogen Fertilizer - Nitrogen is an essential component of food production as it allows plants to photosynthesize efficiently (Maheswari, Murthy, and Shanker, 2017) - Nearly half of the nitrogen fertilizer supplied to the field is not used by the crops (Billen, Garnier, and Lassaletta, 2013) - This excess nitrogen can be harmful **Data Intensive Farm Management (DIFM)**: Conduct On-Farm Precision Experiments to find the economically optimal application rate to increase profit while reducing environmental impacts. .center[ <!-- Trigger the Modal --> <img id='imglogo' src='images/logo.png' alt='Data Intensive Farm Management (DIFM) Project' width='30%'> <!-- The Modal --> <div id='modallogo' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodallogo'> <!-- Modal Caption (Image Text) --> <div id='captionlogo' class='modal-caption'></div> </div> ] ??? I'm going to begin by motivating why we want to understand this relationship. There is a projected increase in future crop demand, so studies are being conducted on crop input application to increase yield, with a focus on sustainable practices. 
As an example, a common crop input is nitrogen fertilizer, as nitrogen is an essential component of food production because it allows plants to photosynthesize efficiently. However, nearly half of the fertilizer supplied to the field is not used by the crops. This excess nitrogen can be harmful; for example, it may get washed by rain into nearby streams, polluting drinking water. So a better understanding of this relationship is needed to avoid using more crop input than necessary. --- class:primary # Trial Design and Data Collection .pull-left[ **Step 1**: Design experiment using site-specific inputs [Trial Design Tool](http://trialdesign.difm-cig.org/) .center[ <!-- Trigger the Modal --> <img id='imgTrial_Design' src='images/Trial_Design.png' alt='Example of A Trial Design' width='75%'> <!-- The Modal --> <div id='modalTrial_Design' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalTrial_Design'> <!-- Modal Caption (Image Text) --> <div id='captionTrial_Design' class='modal-caption'></div> </div> ] **Output**: Shape files that can be put into a farmer's tractor and allow them to carry out the experiments ].pull-right[ **Step 2**: Conduct experiments and collect data .center[ <!-- Trigger the Modal --> <img id='imgdata_collection' src='images/data_collection.png' alt='How Data is collected' width='45%'> <!-- The Modal --> <div id='modaldata_collection' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaldata_collection'> <!-- Modal Caption (Image Text) --> <div id='captiondata_collection' class='modal-caption'></div> </div> ] **Collected Data**: - As-applied - Yield - Location of measurements ] ??? Next, I am going to talk about the data we used and how it was collected. So, the first step was to develop infrastructure that allows for the development of experiments using site-specific inputs. 
After the user is satisfied with the trial design, they can download shape files that can be put into a farmer's tractor and allow them to carry out the experiments. The data we used in our plots includes the original trial design. Additionally, the as-applied data was collected, which is the amount of crop input that was actually applied to the field. The clear circles on this image are locations where the tractor applied the crop input. So we are able to obtain the actual amount applied and the location of the measurement. Finally, after harvest we are able to obtain the yield measurements and their locations (yellow dots). --- class:primary # Explain the Results **Step 3**: Explain the optimal management decisions **How**: Machine learning models to learn how crop yield responds to different input application rates, field characteristics, and weather - Build trust in models **One way to do this**: Visually explore the relationship between input application and yield - Show the spatial correlations between the application/treatment and yield in a way that is understandable to farmers and consultants. - Develop perceptually optimal plots that communicate this relationship. **Next**: Sub-optimal design choices in current plots ??? The last step in this process is to analyze the data and explain the optimal management decisions. Eventually we want to develop a user interface that is designed around explaining the machine learning output to non-experts to help inform decisions. We want to do this to build farmers' trust in the models. Additionally, as a whole this will help us learn how crop yield responds to different input application rates, field characteristics, and weather, with the goal of helping to increase profits. One way to help build trust in model output is to allow the farmers to visually explore the relationship between crop input application and yield. There is a great benefit to plotting data, so we wanted to explore this relationship visually. 
Hence, we want to show the spatial correlations between the treatment application and yield in a way that is understandable to farmers and consultants. Additionally, we need to develop these plots so they are perceptually optimal. --- class:primary # Layout: Superimposed Graphs **Benefits**: Easier to compare as users can use perception rather than memory - Useful when spatial location is a key component of the comparison (Wang, Haleem, Shi, Wu, Zhao, Fu, and Qu, 2018) **Drawback**: Clutter .pull-left[ .center[ <!-- Trigger the Modal --> <img id='imgold_map' src='images/old_map.png' alt='Example 1' width='70%'> <!-- The Modal --> <div id='modalold_map' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalold_map'> <!-- Modal Caption (Image Text) --> <div id='captionold_map' class='modal-caption'></div> </div> ] ].pull-right[ + Multiple dots on top of one another - Obscures true number of dots, harder to find patterns - Visual cues, like color, become partially obstructed, reducing search efficiency (Bravo and Farid, 2004a; Bravo and Farid, 2004b) - Overburdens human perception, causing errors in performing tasks (Huang, Eades, and Hong, 2009) ] ??? So in more detail, the superimposed graphs violate a general principle, which is to show the data clearly. As we can see in this plot, many of the yield measurements lie on top of one another, which obscures the true number of points and makes it more difficult to find a pattern. Obscuring the visual cue of color reduces search efficiency. Further, the clutter can overburden the human perceptual system, causing errors in deriving a relationship between the variables. 
--- class:primary # Layout: Juxtaposed Graphs **Benefits**: Fewer issues with visual clutter and easier to create (Gleicher, Albers, Walker, Jusufi, Hansen, and Roberts, 2011) **Drawback**: Most of the comparative burden placed on users' memory + A mental image is relied on for comparison, as the user moves their eyes between images (shifting focus). - The plot contents may not be accurately formed in working memory, leading to potential errors when deriving patterns (Vanderplas, Cook, and Hofmann, 2020; LYi, Jo, and Seo, 2021) + Lack of visual cues for locations .center[ <!-- Trigger the Modal --> <img id='imgjuxtaposedex' src='images/juxtaposed-ex.jpeg' alt='Juxtaposed Example' width='35%'> <!-- The Modal --> <div id='modaljuxtaposedex' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaljuxtaposedex'> <!-- Modal Caption (Image Text) --> <div id='captionjuxtaposedex' class='modal-caption'></div> </div> ] ??? With juxtaposed graphs, once again, most of the comparative burden is placed on the user's memory. A mental image is relied upon as the user shifts their focus between the plots. So if the plot contents are not accurately formed in working memory, the user may incorrectly derive patterns. Also, in spatial visualizations, the lack of visual cues adds to this complexity. 
--- class:primary # Color Schemes .pull-left[ + Red-green color blindness - Experienced by approximately 8% of men and 0.5% of women of Northern European ancestry (Wong, 2011) - Difficult to discriminate between these colors (Wong, 2011) + Same color scheme for multiple variables - May cause confusion + Rainbow Color Scheme - No inherent ordering of magnitude (Light and Bartlein, 2004) - Extremes are visually close (Silva, Sousa Santos, and Madeira, 2011) ].pull-right[ .center[ <!-- Trigger the Modal --> <img id='imgtrevisan' src='images/trevisan.png' alt='Trevisan et al (2021)' width='45%'> <!-- The Modal --> <div id='modaltrevisan' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaltrevisan'> <!-- Modal Caption (Image Text) --> <div id='captiontrevisan' class='modal-caption'></div> </div> <!-- Trigger the Modal --> <img id='imgmaxwell' src='images/maxwell.png' alt='Maxwell et al (2018)' width='45%'> <!-- The Modal --> <div id='modalmaxwell' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalmaxwell'> <!-- Modal Caption (Image Text) --> <div id='captionmaxwell' class='modal-caption'></div> </div> ] ] **Next:** Redesign Process ??? Red-green color blindness is experienced by approximately 8% of men and 0.5% of women of Northern European ancestry. Another common issue among these graphs is choosing a good color palette. First, red-green color blindness makes it difficult to discriminate between these colors, so we should not use them in conjunction, which many of the plots did. Next, the stop-light color scheme should be avoided because, first, yellow can have a highlighting effect. Also, a univariate scale would be more appropriate since we are only working with the magnitude of our variables. Third, using the same color scheme for multiple variables should be avoided as this may cause confusion. 
Finally, rainbow color schemes should not be used as they have no inherent ordering of magnitude and, as we saw previously, the extremes of red and violet are visually close. --- class:primary # Redesign: Color Blending .center[ <!-- Trigger the Modal --> <img id='imgAttempt2new' src='images/Attempt2-new.png' alt='Second Iteration' width='40%'> <!-- The Modal --> <div id='modalAttempt2new' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalAttempt2new'> <!-- Modal Caption (Image Text) --> <div id='captionAttempt2new' class='modal-caption'></div> </div> ] **Focus**: Superimpose the treatment and yield plots, while reducing the clutter + Yield Polygons + Transparency to show both at same time + Change Colors ??? I chose to use superimposed plots as spatial location is a key component of our comparison, and we are already asking our user to do a complicated task. So the focus turns to reducing the visual clutter. My first iteration used the yield points that were transformed into polygons using the distance between points, swath width, and heading (green space around plot). These are the same polygons used in the juxtaposed DIFM plots, so now we have non-overlapping polygons of the yield. In the literature, a common suggestion for these plots is to use transparency to try to blend the colors. So I introduced transparency to the trial design layer to help blend the colors. I started by using color schemes similar to the original DIFM plots to help obtain buy-in (first plot). However, some users may be red-green color blind. A suggestion by Wong (2011) is to change green to blue, so that is what was done here. So at least one of the colors is the default. However, in my personal opinion, the blending needs more work and deriving relationships is difficult. 
--- class:primary # Redesign: Bivariate Color Plot **Alternative to color blending** .center[ .pull-left[ <!-- Trigger the Modal --> <img id='imgcolormap4' src='images/color-map4.png' alt='Bivariate Color Map' width='100%'> <!-- The Modal --> <div id='modalcolormap4' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalcolormap4'> <!-- Modal Caption (Image Text) --> <div id='captioncolormap4' class='modal-caption'></div> </div> ]].pull-right[ + **Benefit**: Relationship between the variables is most important (Elmer, 2013) + Recommendation: 3x3 scale (Leonowicz, 2003) - Quantiles (Biesecker, Zahnd, Brandt, Adams, and J.M, 2020) + **Focus**: Diagonal - Diagonal: grayscale color scheme. - Upper left and lower right: complementary color scheme (Strode, Morgan, Thornton, Mesev, Rau, Shortes, and Johnson, 2020) + **Drawback**: Lose more detailed information ] ??? A suggested alternative to color blending is a bivariate color plot. A benefit of this plot is that it is helpful when showing the relationship between variables is most important. Leonowicz (2003) suggests a 3x3 scale, where the data was split using quantiles as suggested by Biesecker et al. (2020), as they found quantiles lead to more perceptual visual groupings. In this plot, the focus is on the diagonal, so a grayscale color scheme was used there, and the upper left and lower right corners have complementary color schemes. Viewers typically focus on the diagonal and two corners, so we are still in the 5-7 category range even though there are technically 9 categories. However, drawbacks include losing detailed information since a large range of numbers is only split into three categories. --- class:primary # Redesign: Correlation **Directly encode correlations between As Applied treatments and Yield** .pull-left[ **Benefit**: Direct statement of correlation while maintaining some spatial orientation - Explicit Encoding Layout (Gleicher, Albers, Walker, et al., 2011) + Maintain some spatial information. 
+ Correlations may be impacted by field location **Drawback**: Complicated to connect the displayed relationship back to the data (Gleicher, Albers, Walker, et al., 2011) ].pull-right[ <!-- Trigger the Modal --> <img id='imgcorrplot' src='images/corr-plot.png' alt='Correlation'> <!-- The Modal --> <div id='modalcorrplot' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalcorrplot'> <!-- Modal Caption (Image Text) --> <div id='captioncorrplot' class='modal-caption'></div> </div> ] ??? Finally, we thought it would be important to directly state the relationship between the variables to the user, so they don't have to derive it themselves. Another comparative layout, explicit encoding, directly gives the relationship to the users. So we calculated the correlation within each trial plot between the as-applied data and yield. The trial plot is used to help maintain some spatial orientation. I chose a blue-white-red color scheme due to the association of blue with coolness/negative and red with warmth/positive (like a temperature map). This is a diverging color scale, with white in the middle as the neutral value, so a correlation of zero is shown in white. We can see that, for the most part, as the application amount increases, so does the yield. There are some areas where the inverse is true (more investigation, maybe lower, dirt differs, etc). Maintaining some spatial information is important as correlation is impacted by field location. However, a major drawback is that it doesn't show the data, so we know what the correlation is, but not the values used to find it. 
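The per-trial-plot correlation described in these notes can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the Pearson correlation is the statistic used (the slides only say "correlation"), and the record layout `(plot_id, as_applied, yield_value)` and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # one variable is constant, so the correlation is undefined
    return sxy / sqrt(sxx * syy)


def correlation_by_plot(records):
    """Group (plot_id, as_applied, yield_value) records and correlate within each plot.

    The record layout is an assumption made for this sketch.
    """
    groups = defaultdict(lambda: ([], []))
    for plot_id, applied, yield_value in records:
        groups[plot_id][0].append(applied)
        groups[plot_id][1].append(yield_value)
    return {plot_id: pearson(xs, ys) for plot_id, (xs, ys) in groups.items()}


# Made-up values for two hypothetical trial plots
records = [
    ("A", 100, 50), ("A", 120, 55), ("A", 140, 60),
    ("B", 100, 62), ("B", 120, 60), ("B", 140, 58),
]
correlations = correlation_by_plot(records)
```

Each value in `correlations` would then be mapped onto the diverging blue-white-red scale. A plot whose as-applied rate never varies returns `None` here and would need its own "no data" color.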
--- class:primary # Redesign: Correlation with Scatterplot **Add some context back** .center[ <!-- Trigger the Modal --> <img id='imgcorrwithscat2' src='images/corr-with-scat2.png' alt='Hybrid Layout' width='40%'> <!-- Trigger the Modal --> <img id='imgcorrwithscat3' src='images/corr-with-scat3.png' alt='Hybrid Layout with Hover' width='40%'> <!-- The Modal --> <div id='modalcorrwithscat2' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalcorrwithscat2'> <!-- Modal Caption (Image Text) --> <div id='captioncorrwithscat2' class='modal-caption'></div> </div> <!-- The Modal --> <div id='modalcorrwithscat3' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalcorrwithscat3'> <!-- Modal Caption (Image Text) --> <div id='captioncorrwithscat3' class='modal-caption'></div> </div> ] + A standard practice to overcome the weakness of decontextualization is utilizing a hybrid comparative layout (LYi, Jo, and Seo, 2021) - Scatterplot juxtaposed with the correlation plot + Interactivity to connect the plots - Hovering over a trial plot highlights the corresponding points in the scatterplot used in the correlation calculation. [Link](https://alisonkleffner.github.io/yield-map-redesign/interactive-example.html) ??? A standard practice to overcome the decontextualization of the explicit encoding layout is using a hybrid layout. Here we juxtaposed a scatterplot of the data used to calculate the correlation with the correlation plot. Interactivity connects the two plots, where hovering over a trial plot in the correlation map highlights the corresponding points in the scatterplot. --- class:primary # Conclusion/Future Work **Next Step**: Obtain Feedback from those using the plots (farmers, crop consultants) - Eventually do some testing between the layouts to see which layouts farmers read more accurately. 
**Eventually**: Develop an R Shiny app to explain machine learning output to non-experts - Build trust in the model predictions without requiring farmers to learn the details of statistical modeling. - Will utilize these plots, among others ??? The next step is to obtain feedback from those who would actually be using the plots and to make any edits. Eventually I would like to do some testing between the color blending, color map, and juxtaposed correlation and scatterplot to see which are read more accurately, as I saw some mixed reviews between color blending and the color map. Eventually the plots will be utilized in an R Shiny app that explains machine learning output to non-experts to help build trust in the results. --- class:inverse <br> <br> <br> <br> <br> <br> <br> <br> .center[ # Trajectories ] --- class:primary # What are Trajectories? Trajectories are considered the most complex form of data based on points, but are becoming more easily available (Kisilevich, Mansmann, Nanni, and Rinzivillo, 2009; Rinzivillo, Pedreschi, Nanni, and Giannotti, 2008) + Complexities: - Difficult to visualize effectively - Diverse set of properties - Different lengths .center[ <!-- Trigger the Modal --> <img id='imgtrajectory' src='images/trajectory.png' alt='Example of a Trajectory from Ansari et al. (2020)' width='50%'> <!-- The Modal --> <div id='modaltrajectory' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaltrajectory'> <!-- Modal Caption (Image Text) --> <div id='captiontrajectory' class='modal-caption'></div> </div> ] ??? Now we are going to shift gears and focus on a type of spatio-temporal data, trajectories. Trajectories are complex to work with due to multiple factors, for example, having a diverse set of properties (like speed) and different lengths. Additionally, due to having dimensions in space and time, they are difficult to visualize effectively. 
However, trajectories are becoming more easily available due to tracking devices, so better ways to work with them are necessary. --- class:inverse <br> <br> <br> <br> <br> <br> <br> <br> .center[ # Trajectory Visual Diagnostics ] --- class:primary # Background **Interest**: Discover patterns within the trajectories to help understand their behavior (Andrienko, Andrienko, and Wrobel, 2007) **A Method**: Visualization during Exploratory Data Analysis (EDA): + Trajectories are difficult to visualize (messy) + Provide insight into the underlying dynamics driving movement. **Extract Features**: Since trajectories are complex, we can extract features from the raw data (Harold, Lorenzoni, Shipley, and Coventry, 2016) + Can use visualization to motivate feature creation + Want to create features that provide a quantitative summary of the movement seen in plots **Case Study**: Arctic Sea Ice trajectories ??? A common interest with trajectories is to discover patterns within their movements to help understand the trajectories' behavior. One method to help discover these patterns is to visualize the trajectories during exploratory data analysis. These plots tend to be messy due to the complexity of trajectories, but they can still provide some insight into the underlying dynamics driving movement. Since trajectories are complex, extracting features from them may make them easier to work with in different methods. The features should summarize a trajectory's movement. Visualization can be used to motivate the creation of features. For example, Wu et al (2022) developed a method called TPoSTE which created features based on events to separate a boat's trajectory into periods of fishing or sailing. For our process, we focus on a case study involving arctic sea ice trajectories. 
--- class:primary # Data .center[ <!-- Trigger the Modal --> <img id='imgrgps_grid' src='images/rgps_grid.jpg' alt='Example of initial grid used to track movement (Peterson & Sulsky, 2011)' width='25%'> <!-- The Modal --> <div id='modalrgps_grid' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalrgps_grid'> <!-- Modal Caption (Image Text) --> <div id='captionrgps_grid' class='modal-caption'></div> </div> ] + Arctic Sea Ice is tracked by NASA's RADARSAT Geophysical Processor System (RGPS), which uses synthetic aperture radar (SAR) images + Each grid cell vertex is assigned an identifier `\((j=1,\dots,n)\)` which is used for tracking + Set of all trajectories: .center[ `\(\mathcal{G} = \left\{g_1, \dots, g_n\right\}\)` `\(\\\)` where `\(g_{j} = \left\{s_{jt} : t \in \mathcal{T}_j\right\}\)`, `\(\mathcal{T}_j \subset \left\{1, \dots, T\right\}\)` a collection of time points where `\(g_j\)` is observed `\(\\\)` and `\({s_{jt}}\)` = `\((x_{jt}, y_{jt})\)` ] + For our study region, `\(n\)` = 8811, and `\(T\)` = 22 ??? The sea ice trajectories were tracked by NASA's RADARSAT Geophysical Processor System (RGPS), which uses sequential synthetic aperture radar images to track the trajectory of a point on an ice sheet. On the first day of the study period, a grid is put on the image, where each grid cell vertex is assigned an identifier (j) that is tracked over the study period using feature-based and area-based tracking. At the end of the study period we have a data set of n trajectories, where each trajectory is a collection of spatial locations at different times. Because this data is collected by satellite, not all the trajectories are observed on the same day, so we have a collection of possible times. We focused on the Beaufort region, so we have a total of 8811 trajectories on 22 possible days. 
--- class:primary # Gestalt Principles of Visual Perception **Identifying Patterns**: Can use the gestalt principles of visual perception to process large amounts of data efficiently. + Explains how humans naturally perceive objects and organize them in groups .pull-left[ Principle of Similarity: group items that look similar. + Trajectories with similar shapes and orientations (Chalbi, Ritchie, Park, Choi, Roussel, Elmqvist, and Chevalier, 2020) .center[ <!-- Trigger the Modal --> <img id='imglawsim' src='images/law-sim.png' alt='Example of gestalt law of similarity' width='40%'> <!-- The Modal --> <div id='modallawsim' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodallawsim'> <!-- Modal Caption (Image Text) --> <div id='captionlawsim' class='modal-caption'></div> </div> ] ].pull-right[ Principle of Common Fate: group objects that share a dynamic behavior + Affected by the same underlying processes (Chalbi, Ritchie, Park, et al., 2020; Alais David & Lee, 1998) .center[ <!-- Trigger the Modal --> <img id='imgcommonfate' src='images/common-fate.png' alt='Example of gestalt law of common fate' width='60%'> <!-- The Modal --> <div id='modalcommonfate' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalcommonfate'> <!-- Modal Caption (Image Text) --> <div id='captioncommonfate' class='modal-caption'></div> </div> ]] ??? Before we look at visualizations of the trajectories used during EDA, I first wanted to talk about an important tool to help extract patterns. The gestalt principles of visual perception help users process large amounts of data efficiently by explaining how humans naturally perceive objects and organize them into groups. There are multiple principles, but we are going to focus on two here. The principle of similarity says that people tend to group items that look similar. So we organize trajectories with similar shapes and orientations into groups. 
Second is the principle of common fate, used with animated graphics. This principle says that people group objects that share a dynamic behavior, like a flock of birds, as this means those objects are potentially affected by the same underlying processes. --- class:primary # Static Trajectory Plot + Line segments, with the direction of each trajectory denoted by an arrow at the end (Andrienko, Andrienko, and Gatalsky, 2000) - Shows the displacement and direction over time. + This plot violates several guidelines for effective visualization, making it unsuitable for presentation. - However, the principle of similarity helps a viewer easily group trajectories that look to move with a similar form in the same direction over time .center[ <!-- Trigger the Modal --> <img id='imgtraj_plot' src='images/traj_plot.png' alt='Plot of id trajectories to show movement and direction of movement' width='55%'> <!-- The Modal --> <div id='modaltraj_plot' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaltraj_plot'> <!-- Modal Caption (Image Text) --> <div id='captiontraj_plot' class='modal-caption'></div> </div> ] ??? First, we created a static plot of all of our trajectories, which connected each observed location of a trajectory with a line segment, and an arrow was added to the end to show the ending direction of movement. So this plot shows the displacement and direction of each trajectory over time. This plot is messy and violates several principles of effective graphics. For example, it's cluttered (trajectories on top of each other), and the color has no meaning (it was just used to help visually differentiate the trajectories). The coloring is potentially problematic due to the principle of similarity: a user may group all similarly colored trajectories and try to derive a relationship, which would be inaccurate. 
However, the principle of similarity is also helpful, as we can see groups of trajectories that appear to move with a similar form in the same direction. These groups tend to occur in contiguous patches and stick out even though the plot is kind of a mess. So we can make an assumption that the underlying process causing the sea ice to move changes based on the location. --- class:primary # Animated Trajectory Plot [Link](https://alisonkleffner.github.io/yield-map-redesign/traj.html) + Shows the incremental progress of each trajectory over time - Plot the new location at each time step and connect the new observation with the previous through a line segment. + New information: - See a trajectory speeding up or slowing down through the length of the added line segments. - Associate a movement with a particular day + Using gestalt principle of common fate - Trajectories moving with a similar velocity in contiguous patches. ??? Another drawback of the static trajectory plot is the inability to associate different movements with a specific time. So we don't learn things related to specific times, just the total time. So we can create an animation of our movement. We chose to use animation as Griffin et al. (2006) found that animated plots allow users to identify moving clusters more easily than multiple juxtaposed static plots. In our animated plot, we showed the incremental progress of each trajectory on each day by adding the movement for a day, represented by a line segment, to the previous day's movement. Now we can see the trajectory speeding up and slowing down based on the length of added line segments. Further, we can associate a movement with a particular day, like what day a trajectory changes direction. Here we can use the principle of common fate to group trajectories moving with a similar velocity (direction/speed), which once again seems to occur in contiguous patches. 
--- class:primary # Deriving Numerical Features: Bounding Box + We create a bounding box around each trajectory to represent its movement + Bounding Box Features: - Length traveled in x/y between the minimum and maximum location .center[ ( `\(x_{max} - x_{min}\)` and `\(y_{max} - y_{min}\)`)] - Length traveled in x/y between latest and earliest observation .center[ ( `\(x_{2} - x_{1}\)` and `\(y_{2} - y_{1}\)`)] - Angle of movement (direction) .center[ .pull-left[ <!-- Trigger the Modal --> <img id='imgbb_1' src='images/bb_1.png' alt='Points used to Develop Bounding Box 1' width='75%'> <!-- The Modal --> <div id='modalbb_1' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalbb_1'> <!-- Modal Caption (Image Text) --> <div id='captionbb_1' class='modal-caption'></div> </div> ].pull-right[ <!-- Trigger the Modal --> <img id='imgbb_2' src='images/bb_2.png' alt='Points used to Develop Bounding Box 2' width='55%'> <!-- The Modal --> <div id='modalbb_2' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalbb_2'> <!-- Modal Caption (Image Text) --> <div id='captionbb_2' class='modal-caption'></div> </div> ] ] ??? After visually exploring the data during exploratory data analysis, we wanted to derive numerical features based on the visualizations. This was done by creating essentially a bounding box around each trajectory, which represents its movement over time. We can then calculate different features from this bounding box. First, we can find the distance between the maximum and minimum coordinates (total displacement). Second, this value may not always represent the first and last day of the time frame, so we also found the difference between the latest and earliest observation (displacement in time). Finally, using the displacement in time, we found the angle that the trajectory moved over the time frame. 
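The bounding box features above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the analysis: the function name `bounding_box_features`, the dictionary keys, and the use of `atan2` (radians, measured from the positive x-axis) for the angle are all hypothetical choices, and the input is assumed to be one trajectory's coordinates in time order.

```python
import math


def bounding_box_features(xs, ys):
    """Summarize one trajectory (time-ordered coordinates) by bounding-box features."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) observations")

    # Extent of the box: maximum minus minimum coordinate.
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)

    # Displacement in time: latest minus earliest observation.
    dx = xs[-1] - xs[0]
    dy = ys[-1] - ys[0]

    # Direction of the displacement in time.
    # Assumed convention: atan2, in radians, from the positive x-axis.
    angle = math.atan2(dy, dx)

    return {"x_range": x_range, "y_range": y_range, "dx": dx, "dy": dy, "angle": angle}


# Toy trajectory with three observed locations (made-up numbers).
features = bounding_box_features([0.0, 2.0, 1.0], [0.0, 1.0, 3.0])
```

In this toy trajectory the extent in x (2.0) differs from the displacement in time in x (1.0), which is the situation the notes describe: the minimum and maximum coordinates need not occur on the first and last day.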
--- class:primary # Deriving Numerical Features: Wiggle .center[ <!-- Trigger the Modal --> <img id='imgwiggle' src='images/wiggle.png' alt='Wiggle Calculation' width='50%'> <!-- The Modal --> <div id='modalwiggle' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalwiggle'> <!-- Modal Caption (Image Text) --> <div id='captionwiggle' class='modal-caption'></div> </div> ] + To determine the amount of “wiggle”, we estimated the total length of the trajectory (arc length). - Find the distance between each set of two connected points (yellow lines). - Add all the calculated distances to estimate the total length of the trajectory. - Trajectories with a higher total length are "wigglier" ??? Next, besides the bounding box, we wanted to derive a numerical assessment of "wiggle", as this is something that can be seen but may be hard to quantify. How I began to think through this was by imagining we pulled a trajectory so that it was straight. Trajectories with more wiggle would be longer than those with less wiggle. Since each trajectory consists of observed points connected by line segments, I found the length of each line segment and added them all together (estimate of arc length). So the trajectories with a higher total length are wigglier. --- class:primary # Feature Selection + Not all characteristics of a trajectory are simultaneously relevant (Rinzivillo, Pedreschi, Nanni, et al., 2008) + Clustering algorithm to assign trajectories to groups of similar movements + No label information to help evaluate feature importance, so we used visualization to make judgements (Li, Cheng, Wang, Morstatter, Trevino, Tang, and Liu, 2017). - Redundant: adding it to the clustering algorithm, while holding the other variables constant, does not change the assigned clusters. 
- Relevance: help with cluster continuity .center[ <!-- Trigger the Modal --> <img id='imgfeaturecomparison4' src='images/feature-comparison4.png' alt='Subsets of Clusters to Determine Relevant Features' width='60%'> <!-- The Modal --> <div id='modalfeaturecomparison4' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalfeaturecomparison4'> <!-- Modal Caption (Image Text) --> <div id='captionfeaturecomparison4' class='modal-caption'></div> </div> ] ??? Finally, not all characteristics of a trajectory are simultaneously relevant when analyzing trajectories, so we wanted to employ feature selection to assess relevancy. Since we could see the movement occurring in contiguous patches, we applied a clustering algorithm to find these groups of similar movements. Since we have no label information (don't know true trajectories), we used visualization to make judgements. So we assigned each cluster a color, and output a single point of each trajectory with the assigned color onto a map. We classified a feature as redundant if adding it to the clustering does not change the assigned clusters. Further, we wanted our features to help create contiguous clusters. We did not test all possible subsets of features, just some I thought were relevant. (Explain process). Plot 4 is the features selected. --- class:primary # Conclusion/Future Work + We can use messy plots to motivate the creation of numerical features to summarize trajectory movements. + Relevant features can also be determined using visualization and make the trajectories easier to work with in future analyses. + In the future, we plan to extend our bounding box features to three dimensions through a case study of seabird flight trajectories. ??? In conclusion, we can use messy plots of trajectories to help motivate the creation of features that summarize a trajectory's movements. 
We can also assess relevancy of the created features through visualization by visualizing the clustering of groups of similar movements. These features can then be used in future methods. In the future, we plan on extending our bounding box features into a three-dimensional bounding box through a case study of the trajectories of seabird flights. --- class:inverse <br> <br> <br> <br> <br> <br> <br> <br> .center[ # A Spatio-Temporal Model for Arctic Sea Ice ] --- class:primary # Importance of Studying Sea Ice + Sea ice insulates the warm ocean from the colder atmosphere (Peterson and Sulsky, 2011) + Cracks, or leads, may form in the ice pack due to dynamic processes - Allows for heat from the ocean to be transferred to the atmosphere (Schreyer, Sulsky, Munday, Coon, and Kwok, 2006). - Accounts for half of the heat flux between the ocean and atmosphere (Badgley, 1961) + The state of sea ice and understanding the dynamics driving changes provides information for weather prediction, climate and ocean models (Reiser, Willmes, and Heinemann, 2020) **Goals**: - Develop a lead detection method by clustering similar trajectories - Develop a model reconstructing the underlying process for interpolation .center[ <!-- Trigger the Modal --> <img id='imgIceChunk' src='images/Ice Chunk.png' alt=' Arctic Sea Ice with Crack' width='30%'> <!-- The Modal --> <div id='modalIceChunk' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalIceChunk'> <!-- Modal Caption (Image Text) --> <div id='captionIceChunk' class='modal-caption'></div> </div> ] ??? Sea ice acts as an insulator between the warm ocean and the colder atmosphere. Cracks, or leads, form in the ice due to dynamic processes, like wind and ocean currents. When these leads form, heat from the ocean is transferred into the atmosphere, warming it (so potentially having an impact on climate change). 
Leads account for half of the heat flux between the ocean and the atmosphere even though they only occupy a small percentage of sea ice. The state of the sea ice, which includes characteristics about the lead like width, and understanding the dynamics driving movements provides valuable information for weather prediction, climate models, and ocean models. So locating these leads is important. --- class:primary # Review of Methods .center[ <!-- Trigger the Modal --> <img id='imgplot4' src='images/plot4.png' alt='Example Lead Detection Results' width='20%'> <!-- The Modal --> <div id='modalplot4' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalplot4'> <!-- Modal Caption (Image Text) --> <div id='captionplot4' class='modal-caption'></div> </div> ] **Lead Detection**: Use bounding box features in K-Means clustering + Boundaries between clusters are the estimated locations of leads + Found through simulations and the sea ice trajectories that our method provides a reasonable estimation of lead locations **Finding Neighbors**: Use information gained from clusters to identify spatio-temporal neighbors + Would expect a missing point at that time to move similarly to known points in same cluster + Cluster trajectories by week to find neighbors - Intersection of one week's clusters with the week before and week after would create groups - The members of an intersection are spatio-temporal neighbors as they are in a similar geographic region over time. ??? For the sake of time, I'm just going to review the pieces that have not changed since my comprehensive exam. So using the features from the static visualization and feature selection in the previous section, we clustered the trajectories using k-means clustering. A drawback is that for k-means, the number of clusters must be specified prior to clustering, which we determined using the silhouette statistic. 
The trajectories are assigned a color based on the cluster assignment, and a point location was plotted on a map, showing groups of contiguous clusters (like feature selection slide). We hypothesized that the boundary between clusters would be the estimated location of a lead, and we found through applying our process to simulations and the sea ice trajectories that our method provides a reasonable estimation of lead locations. Next, to develop our model to reconstruct the underlying process, we used the information gained from clusters to identify spatio-temporal neighbors, as we would expect a missing point to move similarly to known points in the same cluster. To find the spatio-temporal neighbors, we cluster the trajectories by week (sub-trajectories). We used by-week clusters as a week was the smallest interval over which we could detect movement and also see some continuity between weeks. After clustering, we find the spatial intersection between the desired week's clusters (week wanting to interpolate) with the week before and the week after. The groups created by these spatial intersections are considered to be spatio-temporal neighbors as they are in a similar geographic region over time. --- class:primary #INLA We developed a model to reconstruct the underlying process (assumed Gaussian), where we simultaneously obtain the movement in `\(x\)` (called `\(u\)`) and the movement in `\(y\)` (called `\(v\)`) at time `\(t\)`. 
+ The movements are added to the previous location to estimate the missing location `$$(\hat{u}_{t-1}, \hat{v}_{t-1}) + (x_{t-1}, y_{t-1}) = (\hat{x}_t, \hat{y}_t)$$` We elected to use the Integrated Nested Laplace Approximation (INLA) approach + Computational benefits over other methods, as it focuses on models that can be expressed as latent Gaussian Markov Random Fields (GMRF) + Easily accounts for the spatio-temporal structure of the data during the inferential process (Krainski, Gomez-Rubio, Bakka, Lenzi, Castro-Camilo, Simpson, Lindgren, and Rue, 2019) ??? We wanted to develop a model that reconstructs the assumed Gaussian underlying process, where we can jointly obtain the movement in x (called u) and the movement in y (called v) at time t. We can then add these movements to the previous location to estimate the missing location (explain the equation). We elected to use the Integrated Nested Laplace Approximation, or INLA, to create our models due to its computational benefits over other methods, since it focuses on models that can be expressed as latent Gaussian Markov random fields (see where later). Additionally, it provides flexibility to easily account for the spatio-temporal structure of the data during the inferential process. --- class:primary #INLA: Continuously Indexed GF Let's first begin by introducing a univariate process: `$$\left\{H(s,t), (s,t) \in \mathcal{D} \subset \mathcal{R}^3\right\}$$` where there are `\(N\)` spatial locations at `\(t\)` time points. The process is assumed Gaussian, where the model can be rewritten as `$$H_{{i,t}} \sim N(\eta_{i,t}, \sigma^2_e)$$` with `\(\sigma^2_e\)` representing the nugget effect and the linear predictor is defined as `$$\eta_{i,t} = \alpha + w_{i,t}$$` with `\(\alpha\)` denoting the intercept and the realization of the latent ST Gaussian Field (GF) is represented by `\(w \sim GF(0,\Sigma)\)`. The covariance function `\((\Sigma)\)` is the separable Matern ST covariance function (Blangiardo, Cameletti, Baio, and Rue, 2013). 
??? To begin setting up our model, we first introduce a univariate process H, where we have N spatial locations at t time points. We assume the process is Gaussian so we can rewrite the model as `\(H_{{i,t}} \sim N(\eta_{i,t}, \sigma^2_e)\)`, where the sigma squared e represents the nugget effect (or measurement error) and our linear predictor eta is expressed as alpha, the intercept, plus the realization of the latent spatio-temporal Gaussian field, w. The realization has a separable Matern spatio-temporal covariance function. --- class:primary #INLA: SPDE When working with point data, assuming a continuously indexed GF for `\(w_{i,t}\)` is not a computationally efficient approach. Instead, we used stochastic partial differential equations (SPDE), which represent a GF with Matern Covariance Function through a discretely indexed process called a GMRF (Krainski, Gomez-Rubio, Bakka, et al., 2019) .pull-left[ - Discrete Index is created with a constrained Delaunay triangulation .center[ <!-- Trigger the Modal --> <img id='imgmeshex' src='images/mesh-ex.png' alt='Example of the triangulation of a field used in our simulation study' width='90%'> <!-- The Modal --> <div id='modalmeshex' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalmeshex'> <!-- Modal Caption (Image Text) --> <div id='captionmeshex' class='modal-caption'></div> </div> ]].pull-right[ The linear predictor can then be rewritten as `$$\eta_{i,t} = \alpha + \sum^{G}_{g=1}\tilde{A}_{itg}\tilde{w}_g$$` where `\(\tilde{A}_{itg}\)` are the entries of a sparse projection matrix mapping the GMRF, `\(\tilde{w}\)`, from `\(N\)` locations to the `\(G\)` nodes in the triangulation ] ??? When working with point data like our trajectories, assuming a continuously indexed Gaussian field for w is not a computationally efficient approach. So instead we used stochastic partial differential equations. 
These represent a Gaussian field with a Matern covariance function through a discretely indexed process called a Gaussian Markov random field. To create the discrete index, we use a constrained Delaunay triangulation. This triangulation prevents highly obtuse triangles, and we create triangles outside the boundary to prevent the variance being significantly higher at the boundary points (boundary effect). In this figure, an example of simulated data is given. On the right hand side is the data, with a blue line around the points to show the boundary. The left hand side shows the triangulation of this space, with the same blue line, where we see more triangles on the inside than outside. The density of the triangles also affects computation time. Since we are using the stochastic partial differential equations, the linear predictor is rewritten, where we still have our intercept, but the A tilde is a sparse projection matrix that maps the GMRF, w, from the N locations in the data set to the G nodes in the triangulation. --- class:primary #Final Model: Bivariate Process We can represent `\(u\)` and `\(v\)` by the joint process `$$\left\{B(s,t), (s,t) \in \mathcal{D} \subset \mathcal{R}^3\right\}$$` The process is assumed Gaussian, where the model can be rewritten as `$$B_{{i,t}} \sim BiN(\eta_{i,t}, \sigma^2_e)$$` The linear predictor is written as `$$\eta_{i,t} = \alpha_u + \alpha_{v} + z_{u}(s,t) + z_{v}(s,t) + z_{uv}(s,t)$$` where `\(\alpha_u\)` and `\(\alpha_{v}\)` are the intercepts for each response and the `\(z\)` functions represent the SPDE model for the `\(u\)` and `\(v\)` spatio-temporal effects, along with their interaction. --- class:primary #Using Model We developed a model within each intersection to account for the non-stationary aspect of our data using only the data from t − 1, t, and t + 1. Create a spatial grid encompassing the sea ice. 
+ Use the centroid of each grid cell to obtain estimated values for the underlying process + Use last known location to determine which grid cell to use as an estimate ??? So for our final model to jointly measure u and v of the underlying process, we write the linear predictor as `\(\eta_{i,t} = \alpha_u + \alpha_{v} + z_{u}(s,t) + z_{v}(s,t) + z_{uv}(s,t)\)`, where we have our intercepts for each variable and the z functions represent the SPDE model for u and v spatio-temporal effects, along with their interaction. We developed a model within each intersection to account for the non-stationary aspect of our data, using only data at time t, t-1, t+1. We only used three days, as the model is already computationally expensive and we don't expect days further out to have much of an impact. Once we develop the model, we created a spatial grid that encompassed the sea ice to obtain values to use in our model (centroid of the grid cell). The underlying process should be smooth within an intersection, so the centroid should be close enough that it has a similar value of the underlying process. Once we create the grid for initial location estimates of the missing data, we used the developed bivariate model to find the predicted locations using the posterior mean of the linear predictor. The estimates are added to the previous day's known location to obtain the estimate of the missing location. --- class:primary #Overview of Results from Simulation Study Compared four different methods: + Linear Interpolation + Joint Nonstationary Spatio-Temporal Model + Joint Nonstationary Spatial Model with time fixed effect + Joint Stationary Spatio-Temporal Model Results: + Joint Stationary Spatio-Temporal Model generally performs the worst + The spatial model with fixed time (for `\(y\)`) and the spatio-temporal model (for `\(x\)`) perform better than linear interpolation for less smooth underlying processes ??? 
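For reference, the linear-interpolation baseline and the root mean square error used to score each method can be sketched in Python as follows. This is a minimal sketch, not the code used in the simulation study; the function names and the toy numbers are hypothetical.

```python
import math


def linear_interpolate(p0, p1, t0, t1, t):
    """Estimate the (x, y) location at time t on the straight line between
    the observed location p0 at time t0 and the observed location p1 at time t1."""
    if not t0 < t1:
        raise ValueError("t0 must be earlier than t1")
    w = (t - t0) / (t1 - t0)  # fraction of the way from t0 to t1
    return (p0[0] + w * (p1[0] - p0[0]), p0[1] + w * (p1[1] - p0[1]))


def rmse(estimates, truths):
    """Root mean square error between estimated and true values of one coordinate."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - a) ** 2 for e, a in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: a location is missing on day 1 and observed on days 0 and 2.
est = linear_interpolate((0.0, 0.0), (4.0, 2.0), 0, 2, 1)
x_error = rmse([est[0]], [2.5])  # compare the x estimate to a made-up true value
```

Because the estimate always lies on the segment between two observed locations, this baseline needs an observation on both sides of the gap and cannot follow curved movement between them.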
To test our methods, we conducted a simulation study, but for time's sake I'm just going to show the interesting results. First we compared four different methods for finding missing points of the trajectories. First, we used linear interpolation, which is the simplest method to interpolate trajectory data. It estimates a missing point on a straight-line path between two observed locations. Second is our described model. Third, we used the same process, but the model is a spatial model with a fixed time effect, due to only having three days in the model. Finally, we created a stationary spatio-temporal model to check if our non-stationarity approach is valid, so our INLA with SPDE model was created using the whole data set, not in each intersection. Performance was assessed using the root mean square error. The stationary model generally performs the worst, validating our non-stationarity modeling approach. The spatial model with fixed time (for `\(y\)`) and the spatio-temporal model (for `\(x\)`) perform better than linear interpolation for less smooth underlying processes. I found it interesting that our models tended to perform worse for y than x, meaning they don't perform as well for the second variable. --- class:primary #Sea Ice Trajectory Interpolation + Due to observing most of the ice sheet every three days, our response in the model was the movement over three days, meaning `$$(\hat{u}_{t-3}, \hat{v}_{t-3}) + (x_{t-3}, y_{t-3}) = (\hat{x}_t, \hat{y}_t)$$` + Of the observed data in an intersection with a known location after three days, we randomly removed 10% + We fit the spatial model with a time fixed effect - Only using three days to develop our model, so there is probably not a significant amount of temporal dependence. - Performs better for `\(y\)`, the already worse performing coordinate in the simulation study ??? Now I'm going to walk through some of the results for our model using the sea ice trajectories. 
For testing, we created a model for each intersection and time combination. So if we had p intersections, which vary by week, and t time points, then t x p models are developed. Due to observing most of the ice sheet every three days, our response in the model was the movement over three days. Of the observed data in an intersection with a known location after three days, we randomly removed 10%. Then using the observed data, we fit the spatial model with a time fixed effect as it is computationally more efficient than the spatio-temporal model. Also, since we are only using three days to develop our model, there is probably not a significant amount of temporal dependence, as shown in the simulation study, where the spatial model sometimes performs better than the spatio-temporal model (for y). This model was used to estimate the underlying process that drives movement over three days. --- class:primary #Example Results - Week 1 <table style="width:80%;"> <caption>RMSE for Interpolation Methods by cluster for Week 1</caption> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Joint Spatial</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Linear</div></th> </tr> <tr> <th style="text-align:center;"> Cluster </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 4.410 </td> <td style="text-align:center;"> 15.50 </td> <td style="text-align:center;"> 1.330 </td> <td style="text-align:center;"> 21.800 </td> 
</tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.443 </td> <td style="text-align:center;"> 0.87 </td> <td style="text-align:center;"> 0.303 </td> <td style="text-align:center;"> 0.527 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 1.120 </td> <td style="text-align:center;"> 1.48 </td> <td style="text-align:center;"> 1.770 </td> <td style="text-align:center;"> 4.740 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 1.140 </td> <td style="text-align:center;"> 1.98 </td> <td style="text-align:center;"> 2.920 </td> <td style="text-align:center;"> 3.420 </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 0.549 </td> <td style="text-align:center;"> 1.98 </td> <td style="text-align:center;"> 0.969 </td> <td style="text-align:center;"> 5.507 </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 1.320 </td> <td style="text-align:center;"> 2.97 </td> <td style="text-align:center;"> 2.330 </td> <td style="text-align:center;"> 7.450 </td> </tr> </tbody> </table> ??? For time, I am just showing the RMSE results for Week 1. Once again, we are comparing it with linear interpolation. Overall our model seems to be performing better than linear interpolation. However, since each cluster is made of different movements, we also look at the RMSEs for each cluster. For x, in week 1, our model performs better than linear interpolation for clusters 3 through 6. 
The results for the y coordinate are similar, except in week 1, our model also performs better in cluster 1. --- class:primary #Visual of Clusters to Describe Results + Black Polygon (Cluster 4): Data is More Spread out and not linear + Red Polygon (Cluster 2): Data does not move much .center[ <!-- Trigger the Modal --> <img id='imginsettrajplot' src='images/inset-traj-plot.png' alt='Overall trajectory plot with colored polygons to denote the location of the broken out clusters' width='80%'> <!-- The Modal --> <div id='modalinsettrajplot' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalinsettrajplot'> <!-- Modal Caption (Image Text) --> <div id='captioninsettrajplot' class='modal-caption'></div> </div> ] ??? This plot is just to obtain a visual idea of what kinds of movement lead to a better performance of our model. So as an example, the black polygon represents cluster 4, where our model performs best. Here the data is more spread out and not linear, so we would not expect linear interpolation to work well here. Second, the red polygons refer to cluster 2. Here the data does not move much over the week, which allows linear interpolation to have an easier time interpolating between two close points. So our model shows promise for curved data that is not highly sampled. --- class:primary #Coverage A benefit of using a model-based approach is that we can determine the uncertainty of our estimate. 
- Standard deviation of estimate computed using the posterior marginals, which are then used to create an interval of our estimates - The intervals can then be used to find the proportion of intervals containing the true amount of movement during testing of our models <br> <table style='width:75%;'> <caption><b>Coverage</b></caption> <thead> <tr> <th style="text-align:center;"> Week </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.389 </td> <td style="text-align:center;"> 0.370 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.451 </td> <td style="text-align:center;"> 0.476 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0.443 </td> <td style="text-align:center;"> 0.373 </td> </tr> </tbody> </table> ??? A benefit of using a model-based approach is that we can determine the uncertainty of our estimate. Using the posterior marginals, we can find the standard deviation of the estimate, which can then be used to create an interval of our estimates. The intervals can then be used to find the proportion of intervals containing the true amount of movement during testing of our models, otherwise known as coverage. Our coverage is pretty low, which is some cause for concern. So there may be better priors to use in our INLA model, or a better way to create the triangulation, etc. --- class:primary # Discussion of Model **Advantages**: + Takes into account the nonstationarity of the data + Showed some improvement, in terms of RMSE, over linear interpolation for curved data that is not highly sampled + Able to calculate uncertainty **Areas for Improvement**: + Computational efficiency + Coverage shows that there is room for improvement with the prior specification ??? 
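The coverage calculation on this slide can be sketched in Python as below. This is a rough illustration only: the function name `coverage` is hypothetical, and the symmetric interval of the form mean ± z × sd with z = 1.96 is an assumption, since the interval level is not stated in the talk.

```python
def coverage(means, sds, truths, z=1.96):
    """Proportion of intervals [mean - z*sd, mean + z*sd] that contain the true value.

    means, sds: posterior mean and standard deviation of each estimate.
    truths:     the held-out true amount of movement for each estimate.
    z:          interval half-width in standard deviations (assumed, not from the talk).
    """
    if not (len(means) == len(sds) == len(truths)) or not truths:
        raise ValueError("need three non-empty sequences of equal length")
    hits = 0
    for m, s, y in zip(means, sds, truths):
        lower, upper = m - z * s, m + z * s
        if lower <= y <= upper:
            hits += 1
    return hits / len(truths)


# Toy example with four held-out points (made-up numbers).
cov = coverage([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], [1.2, 3.5, 3.0, 2.0])
```

Under that assumption, a well-calibrated model would give a coverage near the nominal level of the interval, which is why values around 0.4 suggest the intervals are too narrow.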
Our model is beneficial as it takes into account the non-stationarity of the data, which is an important component of this specific underlying process. It also showed some improvement, in terms of RMSE, over linear interpolation for curved data that is not highly sampled. Also, it is able to estimate days on the edges, which linear interpolation is not able to do since it requires two observed locations to estimate in between. Finally, we can calculate an uncertainty estimate of our predictions. Areas of improvement for our approach include computational efficiency. Because we develop t x p models (p being the number of intersections), it takes a long time to run (~20-45 minutes). Next, coverage showed there is room for improvement with the prior specification. We currently assume the same prior for each model (more efficient), but this is probably not accurate. --- class:primary #Future Modeling Work + Methods to improve computational efficiency should be considered (parallel computing). + Currently, our interpolation method is a two-step process: find clusters and create model in each cluster. - Eventually, we would like to turn this into a one-step process, potentially using Voronoi tessellations and a piecewise Gaussian Process. - Combining everything into one step may also increase computational efficiency. ??? In the future, for lead detection, I would like to use a better method for determining the number of clusters. For example, Ossama et al. (2011) has a process for determining the number of unique directions, which may be a better estimate of `\(k\)`. Second, methods to improve computational efficiency should be considered, like maybe parallel computing. Finally, probably the most important is that currently, our interpolation method is a two-step process: find clusters and create model in each cluster. Eventually, we would like to turn this into a one-step process, potentially using Voronoi tessellations and a piecewise Gaussian Process. 
Combining everything into one step may also increase computational efficiency. --- # Overall Conclusion The amount of trajectory and multivariate spatial data collected has increased due to technological advances, where each data type has characteristics that make it complicated to work with. This research presents work with each data type using environmental applications to address some of these complications. + Importance of design choices when developing plots that occupy the same spatial domain + Messy visualizations can be used for feature extraction + Developing a non-stationary spatio-temporal model to reconstruct a process causing items to move --- class:primary # References <font size="2"> <p><cite><a id='bib-alais-gestalt-1998'></a><a href="#cite-alais-gestalt-1998">Alais David & Lee, S.</a> (1998). “Visual features that vary together over time group together over space”. In: <em>Nature Neuroscience</em> 1, pp. 160–164. DOI: <a href="https://doi.org/10.1038/414">10.1038/414</a>.</cite></p> <p><cite><a id='bib-andrienko_visual_2007'></a><a href="#cite-andrienko_visual_2007">Andrienko, G., N. Andrienko, and S. Wrobel</a> (2007). “Visual Analytics Tools for Analysis of Movement Data”. In: <em>SIGKDD Explor. Newsl.</em> 9.2, p. 38–46. ISSN: 1931-0145. DOI: <a href="https://doi.org/10.1145/1345448.1345455">10.1145/1345448.1345455</a>.</cite></p> <p><cite><a id='bib-andrienko_supporting_2000'></a><a href="#cite-andrienko_supporting_2000">Andrienko, N., G. Andrienko, and P. Gatalsky</a> (2000). “Supporting Visual Exploration of Object Movement”. In: <em>Proceedings of the Working Conference on Advanced Visual Interfaces</em>. AVI '00. Palermo, Italy: Association for Computing Machinery, p. 217–220. ISBN: 1581132522. DOI: <a href="https://doi.org/10.1145/345513.345319">10.1145/345513.345319</a>.</cite></p> <p><cite><a id='bib-ansari_spatiotemporal_2020'></a><a href="#cite-ansari_spatiotemporal_2020">Ansari, M. Y., A. Ahmad, S. S. Khan, et al.</a> (2020). 
“Spatiotemporal clustering: a review”. In: <em>Artificial Intelligence Review</em> 53.4, pp. 2381–2423. DOI: <a href="https://doi.org/10.1007/s10462-019-09736-1">10.1007/s10462-019-09736-1</a>.</cite></p> <p><cite><a id='bib-badgley_1961'></a><a href="#cite-badgley_1961">Badgley, F.</a> (1961). “Heat balance at the surface of the arctic ocean”. In: <em>Proc 29th Annual Western 528 Snow Conference Spokane</em>. , pp. 101–104.</cite></p> <p><cite><a id='bib-biesecker-2020'></a><a href="#cite-biesecker-2020">Biesecker, C., W. Zahnd, H. M. Brandt, et al.</a> (2020). “A Bivariate Mapping Tutorial for Cancer Control Resource Allocation Decisions and Interventions”. In: <em>Preventing chronic disease</em> 17 (E01). DOI: <a href="https://doi.org/10.5888/pcd17.190254">10.5888/pcd17.190254</a>.</cite></p> <p><cite><a id='bib-billen_nitrogen_2013'></a><a href="#cite-billen_nitrogen_2013">Billen, G., J. Garnier, and L. Lassaletta</a> (2013). “The nitrogen cascade from agricultural soils to the sea: modelling nitrogen transfers at regional watershed and global scales”. In: <em>Philosophical Transactions of the Royal Society B: Biological Sciences</em> 368.1621, p. 20130123. DOI: <a href="https://doi.org/10.1098/rstb.2013.0123">10.1098/rstb.2013.0123</a>.</cite></p> <p><cite><a id='bib-BLANGIARDO201333'></a><a href="#cite-BLANGIARDO201333">Blangiardo, M., M. Cameletti, G. Baio, et al.</a> (2013). “Spatial and spatio-temporal models with R-INLA”. In: <em>Spatial and Spatio-temporal Epidemiology</em> 4, pp. 33–49. DOI: <a href="https://doi.org/10.1016/j.sste.2012.12.001">10.1016/j.sste.2012.12.001</a>.</cite></p> <p><cite><a id='bib-BRAVO2004b'></a><a href="#cite-BRAVO2004b">Bravo, M. and H. Farid</a> (2004a). “Recognizing and segmenting objects in clutter”. In: <em>Vision Research</em> 44.4, pp. 385–396. 
DOI: <a href="https://doi.org/10.1016/j.visres.2003.09.031">10.1016/j.visres.2003.09.031</a>.</cite></p> <p><cite><a id='bib-BRAVO2004a'></a><a href="#cite-BRAVO2004a">—</a> (2004b). “Search for a category target in clutter”. In: <em>Perception</em> 33.5, pp. 643–652. DOI: <a href="https://doi.org/10.1068/p5244">10.1068/p5244</a>.</cite></p> <p><cite><a id='bib-brewer_2002'></a><a href="#cite-brewer_2002">Brewer, C. A. and L. Pickle</a> (2002). “Evaluation of Methods for Classifying Epidemiological Data on Choropleth Maps in Series”. In: <em>Annals of the Association of American Geographers</em> 92.4, pp. 662-681.</cite></p> </font> --- class:primary # References <font size="2"> <p><cite><a id='bib-chalbi_gestalt'></a><a href="#cite-chalbi_gestalt">Chalbi, A., J. Ritchie, D. Park, et al.</a> (2020). “Common Fate for Animated Transitions in Visualization”. In: <em>IEEE Transactions on Visualization and Computer Graphics</em> 26.1, pp. 386–396. DOI: <a href="https://doi.org/10.1109/TVCG.2019.2934288">10.1109/TVCG.2019.2934288</a>.</cite></p> <p><cite><a id='bib-leaflet'></a><a href="#cite-leaflet">Cheng, J., B. Karambelkar, and Y. Xie</a> (2022). <em>leaflet: Create Interactive Web Maps with the JavaScript 'Leaflet' Library</em>. R package version 2.1.1. URL: <a href="https://CRAN.R-project.org/package=leaflet">https://CRAN.R-project.org/package=leaflet</a>.</cite></p> <p><cite><a id='bib-cleveland_graphical_1984'></a><a href="#cite-cleveland_graphical_1984">Cleveland, W. S. and R. McGill</a> (1984). “Graphical perception: Theory, experimentation, and application to the development of graphical methods”. In: <em>Journal of the American statistical association</em> 79.387, pp. 531–554. DOI: <a href="https://doi.org/10.2307/2288400">10.2307/2288400</a>.</cite></p> <p><cite><a id='bib-gleicher2011'></a><a href="#cite-gleicher2011">Gleicher, M., D. Albers, R. Walker, et al.</a> (2011). “Visual Comparison for Information Visualization”. 
In: <em>Information Visualization</em> 10.4, pp. 289–309. DOI: <a href="https://doi.org/10.1177/1473871611416549">10.1177/1473871611416549</a>.</cite></p> <p><cite><a id='bib-gordon_2015'></a><a href="#cite-gordon_2015">Gordon, I. and S. Finch</a> (2015). “Statistician Heal Thyself: Have We Lost the Plot?” In: <em>Journal of Computational and Graphical Statistics</em> 24.4, pp. 1210–1229. DOI: <a href="https://doi.org/10.1080/10618600.2014.989324">10.1080/10618600.2014.989324</a>.</cite></p> <p><cite>Griffin, A. L., A. M. MacEachren, F. Hardisty, et al. (2006). “A Comparison of Animated Maps with Static Small-Multiple Maps for Visually Identifying Space-Time Clusters”. In: <em>Annals of the Association of American Geographers</em> 96.4, pp. 740–753. DOI: <a href="https://doi.org/10.1111/j.1467-8306.2006.00514.x">10.1111/j.1467-8306.2006.00514.x</a>.</cite></p> <p><cite><a id='bib-climate-viz'></a><a href="#cite-climate-viz">Harold, J., I. Lorenzoni, T. Shipley, et al.</a> (2016). “Cognitive and Psychological Science Insights to Improve Climate Change Data Visualization”. In: <em>Nature Climate Change</em> 6, pp. 1080–1089. DOI: <a href="https://doi.org/10.1038/NCLIMATE3162">10.1038/NCLIMATE3162</a>.</cite></p> <p><cite><a id='bib-he-mult-2019'></a><a href="#cite-he-mult-2019">He, X., Y. Tao, Q. Wang, et al.</a> (2019). “Multivariate spatial data visualization: a survey”. In: <em>Journal of Visualization</em> 22 (5), pp. 897–912. DOI: <a href="https://doi.org/10.1007/s12650-019-00584-3">10.1007/s12650-019-00584-3</a>.</cite></p> <p><cite><a id='bib-huang2009'></a><a href="#cite-huang2009">Huang, W., P. Eades, and S. Hong</a> (2009). “Measuring Effectiveness of Graph Visualizations: A Cognitive Load Perspective”. In: <em>Information Visualization</em> 8.3, pp. 139–152. 
DOI: <a href="https://doi.org/10.1057/ivs.2009.10">10.1057/ivs.2009.10</a>.</cite></p> </font> --- class:primary # References <font size="2"> <p><cite><a id='bib-key_detectability_1993'></a><a href="#cite-key_detectability_1993">Key, J., R. Stone, J. Maslanik, et al.</a> (1993). “The detectability of sea-ice leads in satellite data as a function of atmospheric conditions and measurement scale”. In: <em>Annals of Glaciology</em> 17, pp. 227–232. DOI: <a href="https://doi.org/10.3189/S026030550001288X">10.3189/S026030550001288X</a>.</cite></p> <p><cite><a id='bib-kisilevich_spatio-temporal_nodate'></a><a href="#cite-kisilevich_spatio-temporal_nodate">Kisilevich, S., F. Mansmann, M. Nanni, et al.</a> (2009). “Spatio-temporal clustering”. In: <em>Data Mining and Knowledge Discovery Handbook</em>. Ed. by O. Maimon and L. Rokach. Boston, MA: Springer, pp. 855–874. DOI: <a href="https://doi.org/10.1007/978-0-387-09823-4_44">10.1007/978-0-387-09823-4_44</a>.</cite></p> <p><cite><a id='bib-klippel-tobler-2011'></a><a href="#cite-klippel-tobler-2011">Klippel, A., F. Hardisty, and R. Li</a> (2011). “Interpreting Spatial Patterns: An Inquiry Into Formal and Cognitive Aspects of Tobler's First Law of Geography”. In: <em>Annals of the Association of American Geographers</em> 101.5, pp. 1011–1031. DOI: <a href="https://doi.org/10.2307/27980249">10.2307/27980249</a>.</cite></p> <p><cite><a id='bib-kodinariya_2013'></a><a href="#cite-kodinariya_2013">Kodinariya, T. and P. Makwana</a> (2013). “Review on Determining of Cluster in K-means Clustering”. In: <em>International Journal of Advance Research in Computer Science and Management Studies</em> 1, pp. 90–95.</cite></p> <p><cite><a id='bib-spde-book'></a><a href="#cite-spde-book">Krainski, E. T., V. Gomez-Rubio, H. Bakka, et al.</a> (2019). <em>Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA</em>. Baltimore, Maryland: Chapman & Hall/CRC Press. 
DOI: <a href="https://doi.org/10.1201/9780429031892">10.1201/9780429031892</a>.</cite></p> <p><cite><a id='bib-Leonowicz2003RESEARCHOT'></a><a href="#cite-Leonowicz2003RESEARCHOT">Leonowicz, A.</a> (2003). “RESEARCH ON TWO-VARIABLE CHOROPLETH MAPS AS A METHOD FOR PORTRAYING GEOGRAPHICAL RELATIONSHIPS”. In: <em>Proceedings of the 21st International Cartographic Conference</em>. The International Cartographic Association.</cite></p> <p><cite><a id='bib-li-features'></a><a href="#cite-li-features">Li, J., K. Cheng, S. Wang, et al.</a> (2017). “Feature Selection: A Data Perspective”. In: <em>ACM Comput. Surv.</em> 50.6. DOI: <a href="https://doi.org/10.1145/3136625">10.1145/3136625</a>.</cite></p> <p><cite><a id='bib-light-rainbow-2004'></a><a href="#cite-light-rainbow-2004">Light, A. and P. Bartlein</a> (2004). “The End of the Rainbow? Color Schemes for Improved Data Graphics”. In: <em>Eos, Transactions American Geophysical Union</em> 85.40, pp. 385–391. DOI: <a href="https://doi.org/10.1029/2004EO400002">10.1029/2004EO400002</a>.</cite></p> <p><cite><a id='bib-lyi2021'></a><a href="#cite-lyi2021">LYi, S., J. Jo, and J. Seo</a> (2021). “Comparative Layouts Revisited: Design Space, Guidelines, and Future Directions”. In: <em>IEEE Transactions on Visualization and Computer Graphics</em> 27.2, pp. 1525–1535. DOI: <a href="https://doi.org/10.1109/TVCG.2020.3030419">10.1109/TVCG.2020.3030419</a>.</cite></p> <p><cite><a id='bib-macdonald_1999'></a><a href="#cite-macdonald_1999">Macdonald, L.</a> (1999). “Using color effectively in computer graphics”. In: <em>Computer Graphics and Applications</em> 19, pp. 20 - 35. DOI: <a href="https://doi.org/10.1109/38.773961">10.1109/38.773961</a>.</cite></p> </font> --- class:primary # References <font size="2"> <p><cite><a id='bib-MAHESWARI2017175'></a><a href="#cite-MAHESWARI2017175">Maheswari, M., A. Murthy, and A. Shanker</a> (2017). “12 - Nitrogen Nutrition in Crops and Its Importance in Crop Quality”. 
In: <em>The Indian Nitrogen Assessment</em>. Ed. by Y. P. Abrol, T. K. Adhya, V. P. Aneja, N. Raghuram, H. Pathak, U. Kulshrestha, C. Sharma and B. Singh. Elsevier, pp. 175–186. DOI: <a href="https://doi.org/10.1016/B978-0-12-811836-8.00012-4">10.1016/B978-0-12-811836-8.00012-4</a>.</cite></p> <p><cite><a id='bib-Miller1956TheMN'></a><a href="#cite-Miller1956TheMN">Miller, G. A.</a> (1956). “The magical number seven plus or minus two: some limits on our capacity for processing information”. In: <em>Psychological review</em> 63 (2), pp. 81–97. DOI: <a href="https://doi.org/10.1037/h0043158">10.1037/h0043158</a>.</cite></p> <p><cite><a id='bib-ossama_extended_2011'></a><a href="#cite-ossama_extended_2011">Ossama, O., H. M. Mokhtar, and M. E. El-Sharkawi</a> (2011). “An extended k-means technique for clustering moving objects”. In: <em>Egyptian Informatics Journal</em> 12.1, pp. 45-51. ISSN: 1110-8665. DOI: <a href="https://doi.org/10.1016/j.eij.2011.02.007">10.1016/j.eij.2011.02.007</a>.</cite></p> <p><cite><a id='bib-peterson_evaluating_2011'></a><a href="#cite-peterson_evaluating_2011">Peterson, K. and D. Sulsky</a> (2011). “Evaluating Sea Ice Deformation in the Beaufort Sea Using a Kinematic Crack Algorithm with RGPS Data”. In: <em>Remote Sensing of the Changing Oceans</em>. Berlin, Heidelberg: Springer. ISBN: 978-3-642-16541-2.</cite></p> <p><cite><a id='bib-reiser_new_2020'></a><a href="#cite-reiser_new_2020">Reiser, F., S. Willmes, and G. Heinemann</a> (2020). “A New Algorithm for Daily Sea Ice Lead Identification in the Arctic and Antarctic Winter from Thermal-Infrared Satellite Imagery”. In: <em>Remote Sensing</em> 12.12, p. 1957. DOI: <a href="https://doi.org/10.3390/rs12121957">10.3390/rs12121957</a>.</cite></p> <p><cite><a id='bib-rinzivillo_visuallydriven_2008'></a><a href="#cite-rinzivillo_visuallydriven_2008">Rinzivillo, S., D. Pedreschi, M. Nanni, et al.</a> (2008). “Visually–driven analysis of movement data by progressive clustering”. 
In: <em>Information Visualization</em> 7.3, pp. 225–239. DOI: <a href="https://doi.org/10.1057/palgrave.ivs.9500183">10.1057/palgrave.ivs.9500183</a>.</cite></p> <p><cite><a id='bib-schreyer_elastic_2006'></a><a href="#cite-schreyer_elastic_2006">Schreyer, H. L., D. L. Sulsky, L. B. Munday, et al.</a> (2006). “Elastic-decohesive constitutive model for sea ice”. In: <em>Journal of Geophysical Research: Oceans</em> 111.C11. DOI: <a href="https://doi.org/10.1029/2005JC003334">10.1029/2005JC003334</a>.</cite></p> <p><cite><a id='bib-SILVA2011320'></a><a href="#cite-SILVA2011320">Silva, S., B. Sousa Santos, and J. Madeira</a> (2011). “Using color in visualization: A survey”. In: <em>Computers & Graphics</em> 35.2, pp. 320–333. DOI: <a href="https://doi.org/10.1016/j.cag.2010.11.015">10.1016/j.cag.2010.11.015</a>.</cite></p> <p><cite><a id='bib-strode-2020'></a><a href="#cite-strode-2020">Strode, G., J. Morgan, B. Thornton, et al.</a> (2020). “Operationalizing Trumbo’s Principles of Bivariate Choropleth Map Design”. In: <em>Cartographic Perspectives</em> 94, pp. 5–24. DOI: <a href="https://doi.org/10.14714/CP94.1538">10.14714/CP94.1538</a>.</cite></p> </font> --- class:primary # References <font size="2"> <p><cite><a id='bib-tilman_sustainalbe_2011'></a><a href="#cite-tilman_sustainalbe_2011">Tilman, D., C. Balzer, J. Hill, et al.</a> (2011). “Global food demand and the sustainable intensification of agriculture”. In: <em>Proceedings of the National Academy of Sciences</em> 108.50, pp. 20260–20264. DOI: <a href="https://doi.org/10.1073/pnas.1116437108">10.1073/pnas.1116437108</a>.</cite></p> <p><cite>Trevisan, R., D. Bullock, and N. Martin (2021). “Spatial Variability of Crop Responses to Agronomic Inputs in On-Farm Precision Experimentation”. In: <em>Precision Agriculture</em> 22.2, pp. 342–363. 
DOI: <a href="https://doi.org/10.1007/s11119-020-09720-8">10.1007/s11119-020-09720-8</a>.</cite></p> <p><cite><a id='bib-vanderplas2020'></a><a href="#cite-vanderplas2020">Vanderplas, S., D. Cook, and H. Hofmann</a> (2020). “Testing Statistical Charts: What Makes a Good Graph?” In: <em>Annual Review of Statistics and Its Application</em> 7.1, pp. 61–88. DOI: <a href="https://doi.org/10.1146/annurev-statistics-031219-041252">10.1146/annurev-statistics-031219-041252</a>.</cite></p> <p><cite><a id='bib-wagemans-2012-a'></a><a href="#cite-wagemans-2012-a">Wagemans, J., J. H. Elder, M. Kubovy, et al.</a> (2012). “A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization”. In: <em>Psychological Bulletin</em> 138.6, pp. 1172–1217. DOI: <a href="https://doi.org/10.1037/a0029333">10.1037/a0029333</a>.</cite></p> <p><cite><a id='bib-wagemans-2012-b'></a><a href="#cite-wagemans-2012-b">Wagemans, J., J. Feldman, S. Gepshtein, et al.</a> (2012). “A century of Gestalt psychology in visual perception: II. Conceptual and theoretical foundations.” In: <em>Psychological bulletin</em> 138.6, pp. 1218-52. DOI: <a href="https://doi.org/10.1037/a0029334">10.1037/a0029334</a>.</cite></p> <p><cite><a id='bib-wang_comp18'></a><a href="#cite-wang_comp18">Wang, Y., H. Haleem, C. Shi, et al.</a> (2018). “Towards Easy Comparison of Local Businesses Using Online Reviews”. In: <em>Computer Graphics Forum</em> 37.3, pp. 63–74. DOI: <a href="https://doi.org/10.1111/cgf.13401">10.1111/cgf.13401</a>.</cite></p> <p><cite><a id='bib-wong2011color'></a><a href="#cite-wong2011color">Wong, B.</a> (2011). “Color blindness”. In: <em>nature methods</em> 8.6, p. 441. DOI: <a href="https://doi.org/10.1038/nmeth.1618">10.1038/nmeth.1618</a>.</cite></p> <p><cite><a id='bib-wu-fish-2022'></a><a href="#cite-wu-fish-2022">Wu, S., E. Zimányi, M. Sakr, et al.</a> (2022). “Semantic Segmentation of AIS Trajectories for Detecting Complete Fishing Activities”. 
In: <em>2022 23rd IEEE International Conference on Mobile Data Management (MDM)</em>. IEEE, pp. 419–424. DOI: <a href="https://doi.org/10.1109/MDM55031.2022.00092">10.1109/MDM55031.2022.00092</a>.</cite></p> <p><cite><a id='bib-yuan_review_2017'></a><a href="#cite-yuan_review_2017">Yuan, G., P. Sun, J. Zhao, et al.</a> (2017). “A Review of Moving Object Trajectory Clustering Algorithms”. In: <em>Artificial Intelligence Review</em> 47.1, pp. 123–144. DOI: <a href="https://doi.org/10.1007/s10462-016-9477-7">10.1007/s10462-016-9477-7</a>.</cite></p> </font> --- class:primary #Acknowledgements **Advisers** + Dr. Yawen Guan + Dr. Susan VanderPlas **Committee Members** + Dr. Erin Blankenship + Dr. Taro Mieno **Other** + Family + Graduate school friends and peers --- class:inverse <br> <br> <br> .center[ # Questions? <br> <br> ] --- class:primary # Data Difficulty .center[ <!-- Trigger the Modal --> <img id='imgtrajexample' src='images/traj-example.png' alt='Example of a single trajectory in our data set' width='90%'> <!-- The Modal --> <div id='modaltrajexample' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaltrajexample'> <!-- Modal Caption (Image Text) --> <div id='captiontrajexample' class='modal-caption'></div> </div> ] ??? This data set in particular can be complicated to work with. To show this, we look at one trajectory. First, we have missing data with no pattern in the missingness, due to data collection by satellite, so gaps between observations vary between 1 and 3 days. Second, at day 5 we have two observations, so not only are we missing data, but sometimes we also have multiple observations on the same day, which can be difficult to account for. The plot to the left shows the path of the trajectory, where our observed locations are connected with line segments. This shows us a depiction of what we see in the data set. 
It shows how visualizing the trajectories helps the user understand the movements better, so it's important to visualize the trajectories, even though they may be messy. --- class:primary # Other Lead Detection Methods .pull-left[ **Thermal** + Surface temperature differs between a lead and the surrounding sea ice. + Use thermal channels of the Advanced Very High Resolution Radiometer (AVHRR) (Key, Stone, Maslanik, and Ellefsen, 1993) - Heavily dependent on clear skies and has issues with thin ice + Methods have been proposed to reduce the impact of clouds .center[ <!-- Trigger the Modal --> <img id='imgthermal_example' src='images/thermal_example.png' alt=' Output from a Thermal Algorithm (Rohrs et al, 2012)' width='40%'> <!-- The Modal --> <div id='modalthermal_example' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalthermal_example'> <!-- Modal Caption (Image Text) --> <div id='captionthermal_example' class='modal-caption'></div> </div> ] ].pull-right[ **Deformation** + Cell deformation is determined by point motion (Peterson and Sulsky, 2011) - The determinant of the deformation gradient measures accumulated area changes - Can find the size and orientation + Drawbacks - Need complete set of space-time observations to calculate deformation - Underestimation of error in deformation product .center[ <!-- Trigger the Modal --> <img id='imgkinematic_crack_algorithm' src='images/kinematic_crack_algorithm.png' alt='Example of detected leads using a kinematic crack algorithm which uses the determinant of the deformation gradient to detect leads (Peterson & Sulsky, 2011)' width='30%'> <!-- The Modal --> <div id='modalkinematic_crack_algorithm' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalkinematic_crack_algorithm'> <!-- Modal Caption (Image Text) --> <div id='captionkinematic_crack_algorithm' class='modal-caption'></div> </div> ] ] --- class:primary #Missing Data + In general, data 
collection methods may fail, leaving positions in a trajectory unknown, or we may want to overcome sampling sparseness + In our case, missing data is due to the path of the satellite used to collect the data. .center[ <!-- Trigger the Modal --> <img id='imgdata_example' src='images/data_example.jpeg' alt='Missing Data within the Ice Sheet' width='50%'> <!-- The Modal --> <div id='modaldata_example' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodaldata_example'> <!-- Modal Caption (Image Text) --> <div id='captiondata_example' class='modal-caption'></div> </div> ] ??? When collecting data by satellite, the satellite's path doesn't cover the whole ice sheet every day, leaving positions in a trajectory unknown. Missingness tends to occur in spatial chunks, which is one reason why many simple interpolation methods are not useful. For example, this figure shows the location of each trajectory on the first day of the data set. The uncolored boxes represent missing locations, which we can see occur in a large spatial chunk. --- class:primary #Gaussian Process (GP) **For Spatial Data** `\(\left\{X(s): s \in D \subset R^2\right\}\)` is a Gaussian Process if all its finite-dimensional distributions are Gaussian .center[ i.e. 
`\(X(s) \sim GP(0,c(.|.))\)` ] Meaning, for `\(\left\{s_1,...,s_n\right\}\)`, `\(x\)` = `\((x_1,...,x_n)^T \sim MVN(0, \sigma^2_c\Sigma_{\theta_{i,q}})\)` Can define `\(\Sigma_{\theta_{i,q}}\)` as the Matern Covariance Function (Guinness and Katzfuss, 2021) `$$\Sigma_{\theta_{i,q}} = \frac{1}{\Gamma(\lambda)2^{\lambda-1}}(\kappa||s_{i}-s_{q}||)^{\lambda} K_{\lambda}(\kappa||s_{i}-s_{q}||)$$` `\(\\\)` where `\(\lambda\)` is the smoothness parameter, `\(\kappa\)` the scaling parameter, and `\(K_{\lambda}()\)` the modified Bessel function of the second kind Joint density of the observations can be written as a product of conditional densities (Guinness, 2018) `$$f(x_1,...,x_n) = f(x_1)\prod^n_{i=2} f(x_{i}|x_{1},...,x_{i-1})$$` This can be a computationally complex process due to the inversion of `\(\Sigma_{\theta}\)` --- class:primary #Gaussian Process (GP) **Extension to ST Data** Now, the covariance function is a Matern space-time covariance, which is a separable covariance function (Wikle et al., 2019) .center[ `\(\Sigma = \sigma^2_{c}C_{1}(\Delta_{iq})C_{2}(\Lambda_{tp})\)` `\(\\\)` where `\(C_{1}(\Delta_{iq}) = \frac{1}{\Gamma(\lambda)2^{\lambda-1}}(\kappa||s_{i}-s_{q}||)^{\lambda} K_{\lambda}(\kappa||s_{i}-s_{q}||)\)` and `\(C_{2}(\Lambda_{tp})\)` is an autoregressive order 1 (AR(1)) model with a temporal lag of `\(||t-p||\)`. Here, `\(\sigma^2_{c}\)` is the variance component, `\(\lambda\)` is the smoothness parameter, `\(\kappa\)` the scaling parameter related to the range, and `\(K_{\lambda}()\)` the modified Bessel function of the second kind. ] Separability can speed up computations when there is no missing data, but our data have missing observations. --- class:primary #Simulation Study + Create Underlying Process Grid - Simulating the movement of the ocean that causes observations to move - First generate a fine grid (40x40) - To simulate a covariance matrix, the initial grid over seven days was used in the Matern Space-Time covariance Function. 
- This data is then used to create the covariance matrix, `\(C_{d,c}(\theta)\)` - Covariance parameters and the defined mean trend ( `\(\mu_{d,c}\)` ) are different for each cluster `\(\\\)` .center[ `\(H_{c}(s,t) \sim GP(\mu_{c}, C_{c}(\theta))\)` `\(\\\)` where `\(H_{c}(s,t)\)` is the joint displacement at location `\(s\)` & time `\(t\)` `\(\\\)` for cluster `\(c\)` ] **Note**: Only 2 clusters for simplicity --- class:primary #Simulation Study + Create Observed Grid - Movement of an observed point is determined by the value of the nearest point of the underlying process for that day ( `\(r\)` ) - Obtained a week's worth of simulated data .center[ `\((x_{t,j}, y_{t,j}) = (H_{t-1,c,r}, H_{t-1,c,r}) + (x_{t-1,j}, y_{t-1,j})\)` `\(\\\)` where `\(t=1,...,7\)` (time), `\(j = 1,...,121\)` (id), `\(\\\)` `\(H\)` is the underlying process value at `\(t-1\)` for cluster ( `\(c\)` ) and grid id ( `\(r\)` ) ] .center[ <!-- Trigger the Modal --> <img id='imggridcombo' src='images/grid-combo.png' alt='Underlying and Observed Grid Plotted Together' width='40%'> <!-- The Modal --> <div id='modalgridcombo' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalgridcombo'> <!-- Modal Caption (Image Text) --> <div id='captiongridcombo' class='modal-caption'></div> </div> ] --- class:primary #Simulated Data Created 3 different scenarios, each with different parameter values. 
.center[ <!-- Trigger the Modal --> <img id='imgsimtraj' src='images/sim-traj.png' alt='Simulated Trajectories for Each Simulation' width='90%'> <!-- The Modal --> <div id='modalsimtraj' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalsimtraj'> <!-- Modal Caption (Image Text) --> <div id='captionsimtraj' class='modal-caption'></div> </div> ] --- class:primary #Simulated Clustering Results + Results are shown at two different time points - On initial grid ( `\(t=0\)` ) - On last day of the week ( `\(t=7\)` ) .center[ <!-- Trigger the Modal --> <img id='imgsimclustresults' src='images/sim-clust-results.png' alt='Clusters for each Simulation at t=0 and t=7' width='80%'> <!-- The Modal --> <div id='modalsimclustresults' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalsimclustresults'> <!-- Modal Caption (Image Text) --> <div id='captionsimclustresults' class='modal-caption'></div> </div> ] --- class:primary #Simulated Interpolation Results + Simulated and clustered another week of data. + 10% of the data for the first week are randomly assigned to be missing. 
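As a reminder of the metric reported in the table, and assuming the usual definition of RMSE (the symbols `\(m\)`, `\(x_k\)`, and `\(\hat{x}_k\)` are notation introduced here for illustration, not taken from the model slides), the error for the `\(x\)` coordinate over the `\(m\)` held-out locations would be

`$$\text{RMSE}_x = \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(\hat{x}_k - x_k\right)^2}$$`

with the `\(y\)` coordinate handled the same way.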
<br> <table style="width:80%;"> <caption>RMSE for Interpolation Methods</caption> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Linear</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Spatial Model</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">ST Model</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Stationary Model</div></th> </tr> <tr> <th style="text-align:center;"> Simulation </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.160 </td> <td style="text-align:center;"> 0.263 </td> <td style="text-align:center;"> 0.192 </td> <td style="text-align:center;"> 0.307 </td> <td style="text-align:center;"> 0.167 </td> <td style="text-align:center;"> 0.418 </td> <td style="text-align:center;"> 0.197 </td> <td style="text-align:center;"> 0.599 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.173 </td> <td style="text-align:center;"> 0.281 </td> <td style="text-align:center;"> 0.172 </td> <td 
style="text-align:center;"> 0.229 </td> <td style="text-align:center;"> 0.172 </td> <td style="text-align:center;"> 0.480 </td> <td style="text-align:center;"> 0.201 </td> <td style="text-align:center;"> 0.670 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0.375 </td> <td style="text-align:center;"> 0.558 </td> <td style="text-align:center;"> 0.399 </td> <td style="text-align:center;"> 0.598 </td> <td style="text-align:center;"> 0.303 </td> <td style="text-align:center;"> 0.659 </td> <td style="text-align:center;"> 0.402 </td> <td style="text-align:center;"> 1.050 </td> </tr> </tbody> </table> --- class:primary #Simulated Interpolation Results <table> <caption>RMSE for Interpolation Methods by cluster</caption> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Linear</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Spatial Model</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">ST Model</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Stationary</div></th> </tr> <tr> <th style="text-align:center;"> Simulation </th> <th style="text-align:center;"> Cluster </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th 
style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.154 </td> <td style="text-align:center;"> 0.252 </td> <td style="text-align:center;"> 0.159 </td> <td style="text-align:center;"> 0.305 </td> <td style="text-align:center;"> 0.077 </td> <td style="text-align:center;"> 0.272 </td> <td style="text-align:center;"> 0.182 </td> <td style="text-align:center;"> 0.667 </td> </tr> <tr> <td style="text-align:center;"> </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.163 </td> <td style="text-align:center;"> 0.269 </td> <td style="text-align:center;"> 0.208 </td> <td style="text-align:center;"> 0.308 </td> <td style="text-align:center;"> 0.201 </td> <td style="text-align:center;"> 0.481 </td> <td style="text-align:center;"> 0.206 </td> <td style="text-align:center;"> 0.555 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.148 </td> <td style="text-align:center;"> 0.256 </td> <td style="text-align:center;"> 0.137 </td> <td style="text-align:center;"> 0.195 </td> <td style="text-align:center;"> 0.119 </td> <td style="text-align:center;"> 0.381 </td> <td style="text-align:center;"> 0.164 </td> <td style="text-align:center;"> 0.774 </td> </tr> <tr> <td style="text-align:center;"> </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.185 </td> <td style="text-align:center;"> 0.293 </td> <td style="text-align:center;"> 0.186 </td> <td style="text-align:center;"> 0.244 </td> <td style="text-align:center;"> 0.192 </td> <td style="text-align:center;"> 0.522 </td> <td style="text-align:center;"> 0.219 </td> <td style="text-align:center;"> 0.600 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 1 </td> <td 
style="text-align:center;"> 0.408 </td> <td style="text-align:center;"> 0.448 </td> <td style="text-align:center;"> 0.315 </td> <td style="text-align:center;"> 0.737 </td> <td style="text-align:center;"> 0.193 </td> <td style="text-align:center;"> 0.785 </td> <td style="text-align:center;"> 0.330 </td> <td style="text-align:center;"> 1.290 </td> </tr> <tr> <td style="text-align:center;"> </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.304 </td> <td style="text-align:center;"> 0.721 </td> <td style="text-align:center;"> 0.443 </td> <td style="text-align:center;"> 0.493 </td> <td style="text-align:center;"> 0.354 </td> <td style="text-align:center;"> 0.569 </td> <td style="text-align:center;"> 0.439 </td> <td style="text-align:center;"> 0.872 </td> </tr> </tbody> </table> --- class:primary #Simulated Interpolation Results + A benefit of using a model-based approach is that we are able to determine the uncertainty of the estimate. <table style="width:80%;"> <caption>Coverage</caption> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Spatial Model</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">ST Model</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Stationary</div></th> </tr> <tr> <th style="text-align:center;"> Simulation </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th style="text-align:center;"> Y </th> <th style="text-align:center;"> X </th> <th 
style="text-align:center;"> Y </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.610 </td> <td style="text-align:center;"> 0.597 </td> <td style="text-align:center;"> 0.716 </td> <td style="text-align:center;"> 0.313 </td> <td style="text-align:center;"> 0.615 </td> <td style="text-align:center;"> 0.205 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.829 </td> <td style="text-align:center;"> 0.658 </td> <td style="text-align:center;"> 0.829 </td> <td style="text-align:center;"> 0.357 </td> <td style="text-align:center;"> 0.769 </td> <td style="text-align:center;"> 0.320 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0.579 </td> <td style="text-align:center;"> 0.697 </td> <td style="text-align:center;"> 0.817 </td> <td style="text-align:center;"> 0.507 </td> <td style="text-align:center;"> 0.654 </td> <td style="text-align:center;"> 0.256 </td> </tr> </tbody> </table> --- class:primary #Finding Spatio-Temporal Neighbors .center[ <!-- Trigger the Modal --> <img id='imgintexamp' src='images/int-examp.png' alt='Comparison of Our Results to a Kinematic Crack Algorithm' width='80%'> <!-- The Modal --> <div id='modalintexamp' class='modal'> <!-- Modal Content (The Image) --> <img class='modal-content' id='imgmodalintexamp'> <!-- Modal Caption (Image Text) --> <div id='captionintexamp' class='modal-caption'></div> </div> ] --- class:primary #Clustering Using All Data - Comparison + Can compare our results with deformation data found using a kinematic crack algorithm calculated using the RGPS data (Peterson and Sulsky, 2011) - Note that this image does not represent the true ice cracks, just the cracks determined by this method. 
.center[
<!-- Trigger the Modal -->
<img id='imgallweekscomp' src='images/all-weeks-comp.png' alt='Comparison of Our Results to a Kinematic Crack Algorithm' width='80%'>
<!-- The Modal -->
<div id='modalallweekscomp' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalallweekscomp'>
<!-- Modal Caption (Image Text) -->
<div id='captionallweekscomp' class='modal-caption'></div>
</div>
]
---
class:primary
#Ice Data Results: Clustering by Week
.center[
<!-- Trigger the Modal -->
<img id='imgclust_by_week' src='images/clust_by_week.png' alt='Results of Clustering by Week' width='90%'>
<!-- The Modal -->
<div id='modalclust_by_week' class='modal'>
<!-- Modal Content (The Image) -->
<img class='modal-content' id='imgmodalclust_by_week'>
<!-- Modal Caption (Image Text) -->
<div id='captionclust_by_week' class='modal-caption'></div>
</div>
]
---
class:primary
# Existing Trajectory Clustering
+ Combining geographic location with time introduces new challenges in clustering, since a cluster is now determined by both spatial and temporal similarity

`\(\\\)`
`\(\\\)`

+ Similarity Measures
    - One component of a clustering algorithm is deciding how to measure similarity
    - Similarity measures have been developed for spatio-temporal data, but many rely on having trajectories of the same length or are sensitive to noise
+ Density-Based Clustering
    - Objects that are densely packed in a region should be grouped together in a cluster
    - Can form clusters of any shape, and the number of clusters does not need to be pre-defined
    - However, since our data is based on a grid, point density will be consistent across the domain
+ Model-Based Clustering
    - Assume a model for each cluster; cluster membership is determined by finding the data that best fit each model
    - It can be difficult to find an appropriate model to assume

(Info from Ansari, Ahmad, Khan, et al., 2020)

---
class:primary
# Existing Spatio-Temporal Interpolation
+ 
Trajectory Interpolation
    - Linear
    - Kinematic
    - Curved
    - Constrained Random Walk
+ Dynamic Spatio-Temporal Interpolation
    - Reconstruct spatio-temporal dynamics during interpolation
    - Most methods assume stationarity, which is often inappropriate
    - Example: Optimal Interpolation models the covariance function of the spatio-temporal field dynamics. The interpolated field is then a linear combination of the observations, under the assumption of stationarity (Fablet, Huynh Viet, and Lguensat, 2017; Ouala, Fablet, Herzet, Chapron, Pascual, Collard, and Gaultier, 2018)
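
The Optimal Interpolation idea above can be illustrated with a minimal one-dimensional sketch. It is not the implementation used in the cited papers: the Gaussian covariance, the function names (`gaussian_cov`, `optimal_interpolation`), and all parameter values are assumptions chosen for illustration. The covariance depends only on the separation between locations, which is exactly the stationarity assumption.

```python
import numpy as np

def gaussian_cov(a, b, variance=1.0, length_scale=1.0):
    """Assumed stationary covariance: depends only on the separation |a - b|."""
    d = np.abs(a[:, None] - b[None, :])
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def optimal_interpolation(obs_loc, obs_val, new_loc, mean=0.0,
                          variance=1.0, length_scale=1.0, noise_var=1e-6):
    """Interpolate at new_loc as a linear combination of the observations."""
    c_oo = gaussian_cov(obs_loc, obs_loc, variance, length_scale)
    c_no = gaussian_cov(new_loc, obs_loc, variance, length_scale)
    # Weights: gain = C_no (C_oo + R)^{-1}, with R = noise_var * I.
    a = c_oo + noise_var * np.eye(len(obs_loc))
    gain = np.linalg.solve(a, c_no.T).T
    estimate = mean + gain @ (obs_val - mean)
    # Variance of the interpolation error at each new location.
    post_var = variance - np.sum(gain * c_no, axis=1)
    return estimate, post_var

# Hypothetical observations; the first new location coincides with one of them.
obs_loc = np.array([0.0, 1.0, 2.0, 4.0])
obs_val = np.array([0.1, 0.8, 0.9, -0.5])
new_loc = np.array([1.0, 3.0])
estimate, post_var = optimal_interpolation(obs_loc, obs_val, new_loc)
```

At a location that was observed, the estimate essentially reproduces the observation and the error variance is close to the assumed noise level. Away from the data, the error variance grows. Because the same covariance is used everywhere, the weights cannot adapt to regions where the dynamics change.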