the Data

The Data

Here are the datasets (available below and on Kaggle, from my League of Legends Short Story Google Sheet):

lol_ss_main_dataset
lol_ss_dataset_writers
lol_champions_by_region
Lol_champions_sstory_spread

lol_ss_main_dataset.csv
File Size:	38 kb
File Type:	csv

Download File

lol_ss_dataset_writers.csv
File Size:	39 kb
File Type:	csv

Download File

lol_champions_by_region.csv
File Size:	24 kb
File Type:	csv

Download File

lol_champions_sstory_spread.csv
File Size:	13 kb
File Type:	csv

Download File

Here’s what the lol_ss_main_dataset looks like-

Here's a primer on the data fields for the available datasets:

"lol_ss_main_dataset” -
title: these are the titles of the short stories.
region: The geographical/political regions that are spread across the landmasses of the planet. If the region is “Runeterra” then there was no specific (or identifiable) setting for the short story. There are also several Alternative Universe (AU) settings for some of the stories and take place outside of normal lore.
writer: the author of the story
word_count: how many words are in the story
min_to_read: how long (based on a read speed of 250 words-per-minute) it would take to read the story
pov: the point of view of the story (1st, 2nd, 3rd, or other narrative forms)
tense: the tense structure of the story (present, past, a mix of the two, or n/a based off the pov)
publish_date: the date the story was published (on the website or the publish date of the book.)
day_of_week: the day of the week the story was published.
month: the month the story was published.
day: the day of the month the story was published.
year: the year the story was published.
story_link: the link to the story.

I pulled some of the data from here <https://universe.leagueoflegends.com/en_US/explore/short-stories/a-z/> to get the list of stories, as well as the starring Champion and date of publication.
I filled in any missing data from here <https://leagueoflegends.fandom.com/wiki/Category:Short_Story>

“lol_ss_dataset_writers” - the fields are the same as the “lol_ss_main_dataset”

Since there were two stories with multiple authors, I listed those stories multiple times to accommodate for their inclusion as well as the (best as I could manage) word count inclusion for the writers. Because of that, the number of titles and regions attributed to them are slightly off. In my analysis I have accounted for that, but if you only used the “lol_ss_main_dataset”, you might notice a slight disparity in numbers.

“lol_ champions_by_region” -
champion: the playable characters in the League of Legends game and also the focal point for lore development. They are the characters on Runeterra that have the greatest ability to affect change and push the story forward.
region:The geographical/political regions to which each Champion is aligned/most strongly connected to. If the region is “Runeterra” then the champion has no connection to a specific country or region.
release_date: the date the champion was released into gameplay.
class: The martial role the champion fills in gameplay/on the battlefield.
Sub-class: a more focused category within class that further specifies the way the champion can manage attacks, healing, and other actions in game play.
Class-2nd: some champions can also fill the role of an additional class.
Sub-class-2nd: some champions can also fill the role of an additional sub-class.
champion_with_title   title_only:
Link_to_champion: the link to the Universe champion page.

This data set lets you see which Champions are matched to which regions as well as their classes and subclasses. I do not plan to do any extensive analysis of how those affect game play, but am including them because seeing the classes helps give an understanding to the martial focus of the character. While lore has mostly been crafted to meet Champion ability and play style, some have been reworked, and I believe how they would go into battle and what role they serve has some bearing on personality. I pulled some of the data from here <https://leagueoflegends.fandom.com/wiki/List_of_champions> when I was wanting to get a simple list of champions. Since the classes came with it, I kept it in the dataset.
I got the region list for champions here <https://universe.leagueoflegends.com/en_US/champions/> and sorted by Region.

“Lol_champions_sstory_spread” -
title: these are the titles of the short stories.
Champ_star_a: a champion that has an active role in the story (though not necessarily from their point of view.)
Champ_star_b: an additional champion that also has an active role in the story The same goes for c-j below.
Champ_star_c → champ_star_j
champ_ment_a: a champion that is either directly mentioned or in some way referenced in the story.
Champ_ment_b: another champion that is either directly mentioned or in some way referenced in the story. The same goes for c-i below.
Champ_ment_c → champ_ment_i
- This dataset lets you see which stories are about or include which champions.

Source of the data:

The primary source for this information is Riot Games - League of Legends, Universe website - https://universe.leagueoflegends.com/en_US/
Supplemental information was also pulled from Fandom's League of Legends Wiki - https://leagueoflegends.fandom.com/wiki/League_of_Legends_Wiki
Also included, the short stories from the book - League of Legends: Realms of Runeterra (Official Companion) Lawrence, Ariel; Head of Narrative, Riot Games; Voracious, New York, 2019.

Collection Methods:

The main fields (title, region, writer, date_published, and link) were pulled straight from lists on the universe webpage and put into a Google Sheet (with split-text-to-columns helping clean it up.)
I looked at the JSON (index.json) for each short story's webpage to get the word count and minutes-to-read as well as confirm writer and publish date. (Some stories remained without a publish date.)
To help fill in the blanks I also cross referenced the short stories to the Fandom Wiki for League of Legends (mostly for the writers/authors and region.)
For the pov (1st, 2nd, 3rd, mix,...) and tense (simplified to past/present/mix/n/a) I determined these by looking at and evaluating the stories myself - and so recognize these have the greatest chance for error. (If you have an alternate suggestion, please let me know!)
I have limited this dataset to the short stories Riot has published on their Universe website and the ones in their League of Legends: Realms of Runeterra (Official Companion) book. To get word counts from the book, I scanned them with the CamScanner android app using their to-text feature. I then put that into a Google Doc, cleaned the errors in text translation and pulled up the word count for the doc. (These have not and will not be shared, as they are from published material. I only took these steps to get the word count.)

Processing and Cleaning

As with any data pulled from the internet, I had a lot of sorting, splitting-text-to-columns and formatting to even it all out. Trimming white space and making sure all of the names were spelled exactly the same (and when you have apostrophes like in Bel’Veth, Kai’Sa, or Vel’Koz depending on the program, not all “ ‘ “ count as exact matches.) I had to seek out more data from the Fandom Wiki site as not all of the writers are listed with their short stories on the official Riot Universe site. I had to do the same for the region associated with the stories. Since the Fandom Wiki is a secondary source, I did check it against what the primary site did list and found it to be a good match. For the datasets I am making available for download I made sure the field names were consistent and friendly for importing to RStudio or other IDEs, Tableau, and Kaggle.

The Data

​Source of the data:

​Collection Methods:

Processing and Cleaning

Source of the data:

Collection Methods: