Should You Trust the Ratings of MyAnimeList?

The rating system on MyAnimeList is biased towards mediocre anime. For those who don’t know, MyAnimeList is a database-type website that holds information on, as far as I can tell, every anime that’s aired in the past decade or so. MyAnimeList, or MAL for short, also contains millions of user-submitted reviews, both as a numerical score on a scale of 1 to 10 and in written form. MAL then ranks shows based on a formula derived from the user review data for each show. The system works well and, for what it’s worth, the site functions as it should. The best shows are ranked highly and the worst shows are ranked poorly. However, a bit of discretion is needed when looking at the rankings of mediocre anime, and that’s because of something I’m going to coin the two-dimensional bias, or 2DB for short. 2DB is a massive issue that affects the majority of anime, specifically mediocre and phenomenal anime. Now, in an attempt to make this more digestible, I’m going to break it down into a few parts.

1. The Data behind 2DB

I’m probably a slightly more eccentric anime viewer, but I’m not into the whole anime lifestyle thing. I specifically don’t like the idea of anime outside of the internet, so I’m drawn to the internet to get my anime fix, and one way I accomplish that is by simply browsing MyAnimeList looking for the most ridiculous or deranged anime reviews and summaries. In my constant quest for the abnormal, I started noticing that the vast majority of these MAL scores were somewhere in the 6 to 8 range on a scale of 1 to 10, which seemed a bit strange considering that I had actually watched a sizable portion of these shows and a lot of them were just okay in my personal “opinion”. Now, that’s a word we really have to be careful about, as it can be used as a catch-all rebuttal to this issue, so I’m going to address it right now. The following data is not my “opinion”. It is data on gathered “opinions”, and this means that it is not subjective. However, it is biased, and that’s my point.

The MAL rating data is biased. After first noticing that most scores sit in the 6 to 8 range, all I had to do was a quick Google search for MAL ranking distributions, and I came across a website called Anime.plus. Anime.plus is typically used for analyzing an individual MyAnimeList profile. However, hidden down at the bottom is a global statistics tab, and that’s where 2DB starts to take shape. Out of around 13,628 active users and over 14,000 anime entries gathered, the mean rating for anime was 6.95. That is almost two whole points higher than what the mean should be: 5, or an “average” anime, as MAL suggests. And the mean for manga was even higher at 7.07.

Now, this data doesn’t fully confirm my theory of 2DB’s existence, but it does strongly suggest that the theory is plausible. In the interest of staying as fair as I can, I’ll address a possible concern with the data: why does the mean have to be 5? Well, that’s a good question, and it all has to do with our perception of data sets. Specifically, what we believe constitutes the middle of a range. When we look at a range of numbers, it’s pretty easy to define what the middle is, and once we identify that midpoint we have our benchmark. With our benchmark, we can then make judgments on things that are placed inside the range, and the way it typically works is that everything above that middle benchmark is considered good and everything below it is considered bad. The issue is when that benchmark is no longer the middle of the range but people don’t recognize that it’s changed. This is what makes the rating system on MAL detrimental. Because of this, mediocre anime seems to be much better than it actually is, and that hurts actual criticism of anime.
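To make the benchmark idea concrete, here is a minimal Python sketch. It only uses two figures from this post, the assumed midpoint of 5 and the observed mean of 6.95. The function name `gap_from_benchmark` and the example score of 7.5 are my own illustration, not anything MAL or Anime.plus provides.

```python
# Figures taken from the post: the assumed midpoint and the observed global mean.
ASSUMED_MIDPOINT = 5.0
OBSERVED_MEAN = 6.95


def gap_from_benchmark(score: float, benchmark: float) -> float:
    """Return how far a score sits above (+) or below (-) a benchmark."""
    return round(score - benchmark, 2)


# Hypothetical example score, chosen only for illustration.
example_score = 7.5

gap_vs_midpoint = gap_from_benchmark(example_score, ASSUMED_MIDPOINT)
gap_vs_mean = gap_from_benchmark(example_score, OBSERVED_MEAN)

print(f"vs. assumed midpoint of 5: {gap_vs_midpoint:+.2f}")
print(f"vs. observed mean of 6.95: {gap_vs_mean:+.2f}")
```

Read against the midpoint, a 7.5 looks like a show that is two and a half points better than average. Read against the mean people actually rate around, the same score is barely above the pack.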

2. The slice-of-life sample

Now that we’ve seen the data behind 2DB, let’s take a closer look at its influence on a small group of slice-of-life shows. The shows I’m going to be talking about are Ichigo Mashimaro, Mitsudomoe, Mitsuboshi Colors, and Yuru Yuri.

I’ve chosen these shows to focus on because they’re all relatively the same: a group of cute girls doing cute things while living their lives, and their ratings are all within 0.23 of each other at the time of recording. All four of these shows are cliché everyday comedies that serve no real purpose besides an occasional smile and a little bit of entertainment. Their art styles are nothing special, the voice acting is fine, the music is good, and the episodic progression is average. These shows are average overall, and that’s the peculiar bit. Why are these shows rated so highly if they’re just average? Simple: as we saw previously, the mean rating is actually about 7. With nothing inherently wrong with any of these shows, a rating slightly above that average, the lowest being 0.5 above the mean, looks perfectly acceptable. However, these four shows are still rated over 7.5, and to be honest they don’t deserve it. What I’m saying is that 2DB is causing a discrepancy between the numerical rating of a show and what its rating should be based on the standard average of 5. This is a large problem because it lessens our trust in the rating system.

3. Connections

So, now that we know what 2DB is, a rating bias that affects the community’s perception of anime and manga, we can look outside of anime, where a similar kind of bias occurs. An article on FiveThirtyEight, written by Walt Hickey, uncovers the lopsided rating curves of the movie review site Fandango. The article is very well written and helped me connect a lot of bits and pieces in order to establish the 2DB theory, so give it a read if you get the chance. To briefly summarize, the writer finds that Fandango reviews are unevenly distributed towards the top end, similar to MAL’s rating system. 2DB is causing the same kind of issue on MAL, with inflated review numbers that influence how the affected shows are perceived.

As shown previously, the average anime is now considered very good, and very good anime is crammed into the small section between 8 and 10. Again, similar to the Fandango rating system, MAL is essentially functioning on a scale of 7 to 10 for almost every anime.

4. Biased Beginnings

The anime crowd is a very peculiar set of people, very invested in and very adamant about their medium of choice. I believe that this 2DB problem started with the community’s view of anime as a whole. For some, anime is held on a pedestal above all other forms of entertainment. It’s seen as the end-all-be-all method of storytelling by an extremely dedicated fan base. Thanks to the invention of the Internet, this fan base can spread its opinions at a rapid pace. My theory is that when people first started reviewing anime on sites like MAL, they rated it against Western media as a whole. What I mean is that reviewers didn’t see anime as just another form of content to be rated against other anime; they saw it as this holy grail of media, thus inflating the review numbers and setting off the initial two-dimensional bias.

People tended to rate anime more highly because it’s Japanese, or because it’s JDM, or whatever it is. For some reason we let this community be controlled and altered by a small group of absolutely batshit crazy fans who hype up anime for the sole reason that it’s anime.

5. Fixing the problem

There are two ways to solve this 2DB issue. The first is for MAL to recalibrate its ranking data so that the mean falls at average again. The second is that, as a community, we acknowledge that this problem is here and make decisions based on the new average of 6.95. The former is the more labor-intensive, as I’m sure that restructuring a massive amount of data is time consuming and difficult, which is why I personally lean towards the solution of acknowledgment.
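As a rough illustration of what the first option could look like, here is a small Python sketch that shifts every score down by the gap between the observed mean (6.95) and the intended average (5), then clamps the result to the 1 to 10 scale. This is purely my own assumption about one possible recalibration, not MAL’s actual formula, and the function name `recalibrate` and the sample scores are hypothetical.

```python
OBSERVED_MEAN = 6.95   # global mean quoted in this post
TARGET_MEAN = 5.0      # the "average" benchmark this post argues for
SCALE_MIN, SCALE_MAX = 1.0, 10.0


def recalibrate(score: float) -> float:
    """Shift a score so the observed mean lands on the target mean.

    This is an assumed, simplistic method: a constant shift followed by
    clamping to the 1-10 scale. Clamping means very low scores bunch up
    at 1, so the shifted mean is only approximately the target.
    """
    shifted = score - (OBSERVED_MEAN - TARGET_MEAN)
    clamped = max(SCALE_MIN, min(SCALE_MAX, shifted))
    return round(clamped, 2)


# Hypothetical scores, chosen only to show the effect of the shift.
for original in (9.0, 7.5, 6.95, 2.0):
    print(f"{original:.2f} -> {recalibrate(original):.2f}")
```

A show sitting exactly at the current mean drops to 5, and a 7.5 becomes a much more modest 5.55. The clamping at the bottom of the scale is one reason a real recalibration would be harder than this sketch makes it look.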

In order for 2DB to be fixed, we have to agree upon this new average, and thanks to the data we have, that’s pretty easy given that the mean score is now 6.95. But it does add complications to figuring out what the future rating of a show should be. I also understand that just by forming this theory and writing this post I will probably influence a few people, myself included, into changing how we numerically rate the shows we watch. That could make it difficult to establish a benchmark average, as some will continue to rate shows as they have been and others will switch to a less biased method of rating anime against other anime. The problem with 2DB is that people perceive anime as better than it really is, and that inflation of review scores essentially reseats every anime on the rating scale, as the mean has shifted from the standard 5 to the escalated 6.95.

Conclusion

So, this is where I’m stuck, though. I can’t think of a feasible method of solving this problem besides the two stated previously. A part of me thinks that people will never stop manipulating their ratings of anime in an attempt to outperform Western media and somehow gain validation for anime. But another part of me doesn’t want to give in on this problem, as it needs to be addressed because it could lead to the manipulation of rating systems in the future, like in Fandango’s case. The problem of 2DB is still unchanged, and it is a serious hurdle for the anime community to get over. It’s a major problem of trust in the community-oriented website that we’ve built for years. We can’t keep inflating our reviews in order to pad the anime ego we have. We can’t keep inflating our reviews in order to say that anime is inherently better than Western media. Most shows that come out aren’t that good; I mean, they’re just okay at best. In order for this user rating system, whose data has been gathered over the years, to be trusted, we need it to be as accurate and fair as possible. But in the current state of these MAL ratings, we perceive shows as being significantly better than they actually are.
