Am I using the right test (Mann–Whitney U test)?


I did a survey to find out whether people can distinguish human-drawn images from computer-generated images (30 images in each category). I showed every participant one image at a time, and the participant rated the statement “I think this image is computer-generated” on a 5-point Likert scale.

My theory/thesis: computer-generated images and human-drawn images cannot be distinguished.

To test this, I think I should calculate the correct identifications (the human-drawn image was considered human-drawn / the computer-generated image was considered computer-generated) per image. To do this, I encoded the 5-point Likert scale as numerical values: strongly disagree is 0, disagree is 1, neutral is 2, agree is 3, and strongly agree is 4. To calculate the correct identifications, I can compute per participant the difference between the expected value (4 for computer-generated images, 0 for human-drawn images) and the observed rating, and average these differences per image.
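To make the scoring concrete, here is a minimal sketch of that computation. The data layout (a participants × images matrix of 0–4 ratings) and the simulated responses are assumptions for illustration, not your actual data:

```python
import numpy as np

# Hypothetical data: responses[p, i] is participant p's rating (0-4,
# strongly disagree=0 ... strongly agree=4) of the statement
# "I think this image is computer-generated" for image i.
rng = np.random.default_rng(0)
cg_responses = rng.integers(0, 5, size=(5, 30))     # 30 computer-generated images
human_responses = rng.integers(0, 5, size=(5, 30))  # 30 human-drawn images

# Per participant, distance from the expected answer
# (4 for CG images, 0 for human-drawn images),
# then averaged over participants -> one score per image.
cg_diff = np.abs(4 - cg_responses).mean(axis=0)       # shape (30,)
human_diff = np.abs(0 - human_responses).mean(axis=0)  # shape (30,)
```

Note that with this encoding, a *smaller* averaged difference means the image was identified *more* correctly.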

Then I have 30 (averaged) values per category (human-drawn images and computer-generated images) measuring how correctly each image was identified. Now I would like to apply the Mann–Whitney U test to check whether there are more correct guesses for the computer-generated images than for the human-drawn images (e.g. because the computer-generated images are too simple). This would indicate that participants were able to distinguish the two types of images.
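The one-sided test you describe could be sketched like this with SciPy's `mannwhitneyu` (the per-image scores here are simulated placeholders; with the difference-from-expected encoding, "more correct" means *smaller* values, hence `alternative="less"`):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-image scores: average distance from the expected
# answer, lower = more correct identifications. 30 images per category.
rng = np.random.default_rng(1)
cg_diff = rng.uniform(0, 4, size=30)     # computer-generated images
human_diff = rng.uniform(0, 4, size=30)  # human-drawn images

# One-sided Mann-Whitney U: were CG images identified more correctly
# (i.e. do they tend to have smaller distances) than human-drawn images?
stat, p = mannwhitneyu(cg_diff, human_diff, alternative="less")
print(f"U = {stat}, p = {p:.4f}")
```

If you instead score correctness so that higher means more correct (e.g. `4 - distance`), the alternative flips to `"greater"`.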

Am I doing this right or am I missing something?

Sorry for the wall of text 🙂

/e: Maybe to explain why I think this might be wrong: even with just one participant I could apply this test and get significant results. This bothers me, since I would expect the number of participants to influence the significance, not the number of images.

