For my playlist controller/generator application, I wanted a way to produce a playlist length that reflects about one day’s worth of songs. I figured that if I knew how long every track in my library was, I could get a good estimate of how many tracks I could potentially play in a single day. From that, I can figure out how many tracks to load into the playlist. So, first I need the length, in seconds (including fractions of a second), of each and every track in the library.
I have my library partitioned into categories, with each track in exactly one category. I did not want to use the genre for this, as the genre might not reflect my own personal preference for categorizing my music. The categories are listed in my settings file in order of my preference, as a two-dimensional list in Python. A category’s position in this list determines its “weight”, which effectively determines how much more or less I’d like to hear songs in that category.
I divide up the tracks by category and take the geometric mean of the track lengths in each category. For those who are unfamiliar with it, the geometric mean is another kind of average. For the more common arithmetic mean, you sum the values, then divide the sum by the number of values. For the geometric mean, you multiply all of the values together, then take the Nth root of the product, where N is the number of values. So, if you have three values, 3, 2, and 4, the arithmetic mean is (3 + 2 + 4) / 3 = 3. The geometric mean is the cube root of (3 * 2 * 4), or the cube root of 24, which is 2.88… For positive values, the geometric mean is never higher than the arithmetic mean, which in this scenario is like estimating what might be possible if a lot of shorter tracks were played within a single day.
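To make the two averages concrete, here is a minimal Python sketch of the definitions above, run on the same three values. The function names are my own for illustration and are not taken from the application.

```python
import math

def arithmetic_mean(values):
    # Sum the values, then divide by how many there are.
    return sum(values) / len(values)

def geometric_mean_naive(values):
    # Multiply all the values, then take the Nth root of the product.
    return math.prod(values) ** (1 / len(values))

lengths = [3, 2, 4]
print(arithmetic_mean(lengths))       # 3.0
print(geometric_mean_naive(lengths))  # roughly 2.88
```

This direct version is fine for a handful of small numbers, but it multiplies every value together before taking the root.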
Unfortunately, computing the geometric mean directly produces enormous intermediate numbers because of all the multiplication, and the product can overflow the range of a floating-point number. There is a trick to avoid this: use the arithmetic mean and the harmonic mean. The harmonic mean is the number of values divided by the sum of their reciprocals (in the above example, 3 / (1/3 + 1/2 + 1/4) = 2.77…). Then you take the geometric mean of the arithmetic and harmonic means (the square root of 3 * 2.77… = 2.88…). This is exact for two values and only an approximation for more, but in this example it agrees with the true geometric mean to two decimal places.
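Here is a rough sketch of that trick in Python, again with illustrative function names of my own. For comparison I have also included a log-based version, which is a different technique from the one described above: it averages the logarithms and exponentiates the result, which gives the exact geometric mean without ever forming the huge product.

```python
import math

def harmonic_mean(values):
    # Number of values divided by the sum of their reciprocals.
    return len(values) / sum(1 / v for v in values)

def geometric_mean_approx(values):
    # Geometric mean of the arithmetic and harmonic means.
    # Exact for two values, an approximation for more than two.
    am = sum(values) / len(values)
    hm = harmonic_mean(values)
    return math.sqrt(am * hm)

def geometric_mean_log(values):
    # Alternative (not the trick described above): average the logs,
    # then exponentiate. Requires every value to be positive.
    return math.exp(sum(math.log(v) for v in values) / len(values))

lengths = [3, 2, 4]
print(harmonic_mean(lengths))          # roughly 2.77
print(geometric_mean_approx(lengths))  # roughly 2.88
print(geometric_mean_log(lengths))     # roughly 2.88

# A thousand 200-second tracks: the raw product overflows to infinity,
# while both workarounds stay in a sensible range.
many = [200.0] * 1000
print(math.prod(many))
print(geometric_mean_approx(many))
print(geometric_mean_log(many))
```

Python 3.8 and later also ship `statistics.geometric_mean` and `statistics.harmonic_mean`, which may be worth using instead of hand-rolled versions.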
Because I process favorites and non-favorites differently, I compute this geometric mean twice per category: once over all tracks in the category, and once over just the favorites. I then multiply each result by the corresponding weight from my settings file (favorite or normal) and sum the two to get the adjusted geometric mean for the whole category.
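As a sketch of how that per-category step might look, here is one possible reading in Python. The weight values, the `(length, is_favorite)` tuple layout, and the fallback for a category with no favorites are all assumptions made for illustration; the real values and data structures live in my settings file and application.

```python
import math

# Hypothetical weights, assumed here to sum to 1.
FAVORITE_WEIGHT = 0.6
NORMAL_WEIGHT = 0.4

def geometric_mean(values):
    # Log-based geometric mean; any overflow-safe version would do here.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def adjusted_category_mean(tracks,
                           favorite_weight=FAVORITE_WEIGHT,
                           normal_weight=NORMAL_WEIGHT):
    """tracks: list of (length_in_seconds, is_favorite) tuples for one category."""
    all_lengths = [length for length, _ in tracks]
    favorite_lengths = [length for length, is_fav in tracks if is_fav]

    mean_all = geometric_mean(all_lengths)
    # Assumption: if the category has no favorites, reuse the overall mean.
    mean_favorites = geometric_mean(favorite_lengths) if favorite_lengths else mean_all

    # Weight each result, then sum the two.
    return normal_weight * mean_all + favorite_weight * mean_favorites

category = [(100.0, True), (400.0, False)]
print(adjusted_category_mean(category))  # roughly 140
```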
I then take a weighted average of the adjusted geometric means across categories, using the weights I’ve calculated earlier. For a weighted average, you multiply each category’s geometric mean by that category’s weight, sum those products, and divide by the sum of the weights. The result is a good estimate of the average track length, in seconds. I divide the number of seconds in a day (86,400) by this average length to get the number of tracks per day, which is the basis for the number of tracks I load into the playlist. The upshot is that tracks affect this calculation to different degrees depending on their category and whether they are favorites. A favorite in my favorite category has a bigger impact than a non-favorite in my least favorite category. Shorter tracks have a bigger impact on this number than longer tracks.
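The final step can be sketched like this. The category means and weights in the example are made-up numbers, and rounding down to a whole number of tracks is my assumption rather than something the application necessarily does.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def weighted_average(means, weights):
    # Multiply each mean by its weight, sum the products,
    # then divide by the sum of the weights.
    return sum(m * w for m, w in zip(means, weights)) / sum(weights)

def tracks_per_day(means, weights):
    # Assumption: round down to a whole number of tracks.
    return int(SECONDS_PER_DAY // weighted_average(means, weights))

# Made-up example: two categories with adjusted means of 200 s and 300 s,
# where the first category carries three times the weight of the second.
category_means = [200.0, 300.0]
category_weights = [3, 1]
print(weighted_average(category_means, category_weights))  # 225.0
print(tracks_per_day(category_means, category_weights))    # 384
```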