Hey all, I have been stuck for a while trying to figure out how to prep data correctly to run a robust regression model in R (RStudio, if that matters), though Matlab and Spyder are at my disposal as well. Behavioral performance data was collected automatically from the simulation system used for the experiment: a UAV flight simulator that records scan polygon coordinates, which are later parsed by scan angle, zoom level, and polygon size (since polygons can be larger than would be reasonable for identifying ground targets effectively). These polygons are further processed into summaries of each subject's performance on every area-scanning and target-identification task: % scanned, % not-scanned, and % over-scan.
Since these are overall performance calculations, most of the data is currently split between a pre-processed raw data table and a table of summarized performance stats. The current arrangement is:
[ID] [Task #] [Zoom level] [Area of Scan Polygon] [Target in FOV (binary)] [Lighting Condition (scale)], with ~137k rows
[ID] [Task #] [Time] [Duration] [Scanned] [Not-scanned] [Over-scan], with 150 rows
We plan to fit a non-linear regression model (due to the relatively bimodal nature of the polygon area data) to see whether lighting condition, polygon size, and task (#) predict the scan performance metrics. A few questions I have been stuck on:
- Is a robust regression appropriate for use in this situation?
- Must I account for the distribution (and bimodal nature) of the polygon area data when choosing a regression model (e.g., beta, gamma, etc.)?
- How might I reshape or merge the data tables, given that there are both processed raw data and summary scores, so that I can run the regression from a single table? Or is it possible to keep the tables separate and run a regression with variables from both (e.g., data=list(table1,table2))?
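For context on the last question, here is the kind of aggregate-then-merge step I have in mind, sketched in Python/pandas since Spyder is available (the same idea applies with dplyr in R). All column names and values below are hypothetical stand-ins for the two tables described above, not my actual data:

```python
import pandas as pd

# Hypothetical stand-in for the ~137k-row pre-processed raw table.
raw = pd.DataFrame({
    "id":       [1, 1, 2, 2],
    "task":     [1, 1, 1, 1],
    "zoom":     [2, 3, 2, 2],
    "area":     [10.5, 42.0, 8.1, 55.3],
    "lighting": [3, 3, 1, 1],
})

# Hypothetical stand-in for the 150-row summary table.
summary = pd.DataFrame({
    "id":          [1, 2],
    "task":        [1, 1],
    "scanned":     [0.72, 0.55],
    "not_scanned": [0.20, 0.35],
    "over_scan":   [0.08, 0.10],
})

# Collapse the raw table to one row per (id, task), e.g. mean polygon
# area and the lighting level (assumed constant within a task here).
per_task = (raw.groupby(["id", "task"], as_index=False)
               .agg(mean_area=("area", "mean"),
                    lighting=("lighting", "first")))

# Left-join onto the summary table so each performance score row
# gets its matching predictors in a single frame for regression.
model_df = summary.merge(per_task, on=["id", "task"], how="left")
print(model_df)
```

The resulting single table could then be passed to a model-fitting call as one `data` argument; is this aggregation step the right way to reconcile the two levels of granularity, or am I losing information the regression should use?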