
INTRO
The macroeconomic environment of the game industry is complex, with many genres, platforms, monetization models, investment strategies, and cultural contexts at play.
And it has boom and bust cycles.
I’ve recently theorized that overproduction of studios, and hence games, could be a driver of that boom and bust cycle. The idea seems logical, but anecdotes and personal experience are not data, and a simple operational model is not proof.
Given that I’d like to verify or falsify my theory, and that STEAM® has a bunch of publicly available APIs, I thought I’d have a look.
So, I went digging. An archaeological journey, if you will, down through 20+ years of STEAM® data. Note that this article doesn’t provide verification or falsification of the theory. Rather, it shares preliminary statistics, highlights, and observations from the investigative journey so far. The train has not reached the destination.
APPLICATION TYPES
Applications are assigned a type. There are 11 unique application types at the time of this writing that can be classed into three groups.
game, dlc, demo, music, mod — These application types are still being created as of June of 2024.
movie, episode, video, advertising, series — These application types were created in the past but were discontinued at least one and as many as 12 years ago.
hardware — There are only a handful of these (OK, if you have 7 fingers) and they are all Valve products.
Another grouping is applications for which there are no data supplied by the API. These I’ve labeled as either [invalid] (API returned “success”:false) or [unknown] (calls to the API that returned 200 OK but with an empty content response).
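To make the two labels concrete, here is a minimal sketch of how a single response could be sorted into a type, [invalid], or [unknown]. It is not the code used for this article. The function name is hypothetical, and the response shape (an object keyed by the application ID, holding "success" and a "data" object with a "type" field) is an assumption.

```python
import json


def classify_response(appid, body):
    """Return the application type, '[invalid]', or '[unknown]'.

    `body` is the raw text of a 200 OK reply. Assumed shape (not confirmed
    by the article):
        {"<appid>": {"success": true, "data": {"type": "game", ...}}}
    """
    if not body or not body.strip():
        return "[unknown]"  # 200 OK but empty content
    payload = json.loads(body)
    entry = payload.get(str(appid)) if isinstance(payload, dict) else None
    if not isinstance(entry, dict):
        return "[unknown]"  # nothing usable for this ID
    if not entry.get("success"):
        return "[invalid]"  # API answered "success": false
    data = entry.get("data")
    if not isinstance(data, dict):
        return "[unknown]"
    return data.get("type", "[unknown]")
```

Treating a missing or malformed entry as [unknown] rather than raising is a design choice for a long-running crawl; a stricter version could log those cases for a retry.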
Here is a graphic representation of all 200K+ applications by type, as of 06/23/2024.
The table below represents those numbers as percentages of all known application IDs.
Type Percent
game 55.593
dlc 22.748
[invalid] 8.299
demo 7.561
music 3.285
episode 1.118
movie 0.850
video 0.343
advertising 0.105
mod 0.051
series 0.041
hardware 0.003
[unknown] 0.002
APPLICATION RELEASE DATES
The store data includes a release date in a variety of formats that were cleaned and standardized.
Some applications supply no usable release date: the field is entirely absent, is an empty string, or holds an unusable string such as ‘default’.
There are applications with release date values from before September 2003, which is when STEAM® started carrying non-Valve titles. I presume some publishers/teams who first launched games without STEAM® (e.g., before the service existed) used that original release date.
There are applications with release dates far into the future. Decades, and even thousands of years. ;)
Some applications have release dates marked by quarter. I have interpreted these to land on the last day of that quarter.
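The cleaning rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's actual pipeline: the function name is hypothetical, and the day-level formats and the "Q3 2024" quarter spelling are illustrative guesses at what the data contains. The one rule taken directly from the text is that a quarter maps to the last day of that quarter.

```python
import calendar
import re
from datetime import date, datetime

# Illustrative formats only (assumption); the real data has more variants.
_DAY_FORMATS = ("%b %d, %Y", "%d %b, %Y", "%B %d, %Y", "%Y-%m-%d")
# Assumed quarter spelling, e.g. "Q3 2024".
_QUARTER = re.compile(r"^Q([1-4])\s+(\d{4})$", re.IGNORECASE)


def parse_release_date(raw):
    """Return a date, or None when the value is missing or unusable."""
    text = (raw or "").strip()
    if not text:
        return None  # absent or empty string
    match = _QUARTER.match(text)
    if match:
        quarter, year = int(match.group(1)), int(match.group(2))
        month = quarter * 3  # last month of the quarter
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day)
    for fmt in _DAY_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # unusable strings such as 'default'
```

Returning None for anything unparseable keeps the bad values visible as nulls downstream instead of silently guessing a date.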
APPLICATIONS WITH NO RELEASE DATE
There were over 20,000 applications without a release date that were either invalid or unknown. The vast majority, nearly 18,000, were of the type game, and most of those are marked as ‘coming soon’. Only 88 games had neither a release date nor a ‘coming soon’ marker.
DISCONTINUED APPLICATION TYPES
These application types occurred over various ranges of time as indicated in this graph but are no longer being published. TBH, I hadn’t noticed that STEAM® tried competing with streaming services by offering movies and TV series for a while. You can find a blurb about that here.
ACTIVE APPLICATION TYPES
The four most numerous application types with publication dates between 2003 and May 31st of 2024 are game, dlc, demo, and music. There are still mods being published, but the rate of those is very low so I put it on a distinct graph. Also, because zero doesn’t reasonably map onto a logarithmic axis. 😜
The data in the following graphs only extends through May 31st, 2024. The apparent “dip” of volume in 2024 is, at a minimum, caused by an incomplete year of data.
More than 6,000 games are marked with a release date in 2024 after May 31st.
Here is the promised mod application data. Not many games are doing mods, relatively speaking. User generated content games are popular (Roblox, Minecraft, Fortnite), but they don’t tend to use STEAM® to distribute their game frameworks, editors, or content.
The vast majority of mods are one-offs. Out of the 92 mods released on STEAM® between 2003 and May 31st of 2024, only a handful of publishers or developers have created more than one. Some developers are also listed as the publisher and contribute to counts in both columns in the table below.
The most active mod publisher is CMI at 5 mods, followed by Tripwire Interactive at 4 mods. The most active developer is Relic Entertainment at 4 mods.
There are only 13 unique entities in the union of developer and publisher names associated with more than one mod. Those 13 entities account for 25 of the 92 available mods.
Total mods Publishers Developers
5 1 0
4 1 1
3 2 3
2 5 5
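For readers who want to reproduce this kind of tally, here is a minimal sketch. The record layout (dicts with 'publishers' and 'developers' lists) and the function name are hypothetical, and "repeat entity" is taken to mean a name that appears on more than one mod in either role, which is one possible reading of the union described above.

```python
from collections import Counter


def multi_mod_entities(mods):
    """Count mods per publisher and per developer, and find repeat entities.

    `mods` is an iterable of dicts with hypothetical keys 'publishers' and
    'developers', each holding a list of name strings.
    """
    publisher_counts = Counter()
    developer_counts = Counter()
    for mod in mods:
        # set() so a name listed twice on one mod is only counted once
        publisher_counts.update(set(mod.get("publishers") or []))
        developer_counts.update(set(mod.get("developers") or []))
    repeat_publishers = {n for n, c in publisher_counts.items() if c > 1}
    repeat_developers = {n for n, c in developer_counts.items() if c > 1}
    # Union: an entity that is both developer and publisher appears once.
    return publisher_counts, developer_counts, repeat_publishers | repeat_developers
```

Because the two counters are kept separate, an entity that self-publishes contributes to both columns, which matches the note above about developers who are also listed as the publisher.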
OUTRO
I hope you find these tidbits as interesting as I do. Summarizing the journey thus far:
STEAM store JSON data is structured with medium depth.
STEAM store data is not super clean, probably because standard engineering philosophy is to be conservative in what you generate and relaxed in what you accept. Things rendered in a web page can have various forms.
There are four primary application types being published on STEAM: games, dlc, demos, and music. Mods are also still published but at a very slow rate.
As of June of 2024 the entirety of the store data for games (excluding all the other application types) can be stored in a 530 MB Parquet file, including all the text descriptions.
This article lays the groundwork for further investigation, specifically, looking at the rate of game publishing on STEAM in more detail.
For now, there is still more digging to do. I plan to share further discoveries in a subsequent article that will focus on game publication rates because more games, DLCs, and demos are released on STEAM® every year.
One last thing: the game count numbers published by SteamDB and SteamSpy do not agree. But that is for next time. 😜
METHODOLOGY
After looking around some, I built web API automation using Python and various database technologies to download, clean, and store publicly provided data from Valve’s STEAM® APIs. My time was spent on encountering and cleaning data format inconsistencies, figuring out how to get a clean schema for parquet files, and also, well, waiting on 200K+ downloads (one per unique STEAM® application ID). Downloads are rate limited, so, yeah.
The rest was applying various queries to the data, exploring, hypothesizing, learning the syntax for querying structured parquet files, and creating graphs.
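The download step is the slow part, so here is a minimal sketch of a rate-limited loop over application IDs. It is an illustration, not the automation built for this article: `fetch` is a caller-supplied, hypothetical function that performs the actual HTTP request, and the 1.5 second default spacing is a guess rather than a documented limit.

```python
import time


def download_all(app_ids, fetch, min_interval=1.5,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch(app_id) for each ID, at least min_interval seconds apart.

    `fetch` is a caller-supplied function (hypothetical here) that performs
    the request and returns the response body. The default interval is an
    illustrative guess, not a documented rate limit.
    """
    results = {}
    last_call = None
    for app_id in app_ids:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # pause only for the time still owed
        last_call = clock()
        results[app_id] = fetch(app_id)
    return results
```

Passing `sleep` and `clock` as parameters is only there so the pacing can be tested without waiting; at one request every 1.5 seconds, 200K+ IDs would take on the order of days, which is consistent with the waiting described above.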
MISCELLANEA
During the course of this work I found some documentation and examples to follow, and also deduced things from the data and behaviors of the API. Here’s a list of some of those things.
All applications on STEAM® are assigned a unique application ID. In the data this is usually labeled appid, or steam_appid.
A public API exists that provides that list of IDs together with Unicode name strings of the application. The list is probably not a complete historical record, as entries on the list do get removed.
Another public API exists to download STEAM® store data using that unique application identifier. The data is in JSON format and has a moderately deep nested structure, e.g., lists of objects.
I could not find a published schema definition for the JSON format so I constructed one by surveying all the files. This led to discovering anomalies, like empty strings or empty arrays as indicators of ‘null’, or data not available. Also, release date formats are somewhat…diversified.
On 06/23/2024 there were 204,038 unique IDs in my accumulated database. I do not have a historical record of IDs removed before my first call to the API from several weeks before that date.
New application IDs were added to the list at a rate of about 160 per day during June 2024.
Unique identifiers were removed from the list at a rate of about 20 per day during June 2024.
Some application store data is not available, in that requests for the data return a { “success” : false } result. Internet commentary implies that these objects don’t have store data but may be associated with a different STEAM® application ID. For example, the “ultimate edition” of the game.
It is also possible that no data is available because the proposed application was discontinued/deleted from the service. See above.
A handful of applications returned a 200 OK response with an empty document. These do appear in the store, so that anomalous behavior is likely to be a backend server issue.
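The schema survey mentioned in the list above can be sketched like this: treat empty strings and empty arrays as null, then walk every document and record which value types show up at each key path. The function names and the dotted-path notation are my own illustration, not the code used for this article.

```python
from collections import defaultdict


def normalize_nulls(value):
    """Map empty strings and empty arrays to None, recursing into containers."""
    if value == "" or value == []:
        return None
    if isinstance(value, dict):
        return {k: normalize_nulls(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_nulls(v) for v in value]
    return value


def survey_schema(documents):
    """Collect the set of Python type names seen at each dotted key path."""
    seen = defaultdict(set)

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + "." + key if path else key)
        elif isinstance(node, list):
            for child in node:
                walk(child, path + "[]")  # "[]" marks a list element
        else:
            seen[path].add(type(node).__name__)

    for doc in documents:
        walk(normalize_nulls(doc), "")
    return dict(seen)
```

Any path whose set contains more than one type name, or contains NoneType, is a candidate anomaly to look at before settling on a Parquet schema.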