Thank you so much for your response. I've got a few questions about your explanation.
"Every oscillator can spend this energy 'budget' in a mode that has a particular frequency or wavelength, and there are no excluded values" - given that on average they all have energy kT, i.e. all oscillators have the same energy, what causes them to spend their "budget" on different frequencies?
"Energy density tends to go to infinity at short wavelengths because there is an infinite number of frequencies at short wavelengths" - I'm confused by this part. Can't we say the same for any point on the graph, e.g. there's an infinite number of frequencies for long wavelengths?
		
		
	 
Good questions - this shows you are really engaging.
(i) Something I didn't mention before: it is extremely unlikely that all the oscillators will have exactly the average energy kT. Entropy makes sure that this doesn't happen. Instead there is a spread of energies, described by a distribution function. This distribution function was worked out by our old friend Ludwig Boltzmann, way back in 1868. It gives the oscillators a "natural" spread of energies around the average.
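If it helps to see that spread in numbers, here is a minimal sketch. It assumes a classical oscillator whose energy follows the Boltzmann form p(E) ~ exp(-E/kT), which has an average of exactly kT. The function name and the choice of 300 K are mine, purely for illustration.

```python
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def sample_oscillator_energies(temperature_k, n_oscillators, seed=0):
    """Draw oscillator energies from p(E) ~ exp(-E / kT) (assumed model)."""
    rng = random.Random(seed)
    kT = K_B * temperature_k
    # expovariate takes the rate (1 / mean), so the mean energy is kT.
    return [rng.expovariate(1.0 / kT) for _ in range(n_oscillators)]


kT = K_B * 300.0
energies = sample_oscillator_energies(temperature_k=300.0, n_oscillators=100_000)

mean_over_kT = sum(energies) / len(energies) / kT
fraction_below_kT = sum(e < kT for e in energies) / len(energies)
max_over_kT = max(energies) / kT

print(f"mean energy / kT    = {mean_over_kT:.3f}")
print(f"fraction below kT   = {fraction_below_kT:.3f}")
print(f"largest energy / kT = {max_over_kT:.1f}")
```

The average comes out close to kT, but most individual oscillators sit below it and a few sit many times above it. So "the average is kT" does not mean "every oscillator has kT".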
(ii) No, there is not an infinite number of frequencies for long wavelengths. I will borrow the model used by Lord Rayleigh. Consider a box or cavity at thermal equilibrium that confines the radiant energy, with the waves forming standing waves as they bounce backwards and forwards between the ends of the box. The longest standing wave that will fit between the ends of the box has half a wavelength equal to the length of the box (λ/2 = L). This fixes a lower limit to the frequency of the radiation. The next standing wave fits one full wavelength into the box (λ = L), the next fits 3/2 λ, and so on. However, there is no lower limit to the wavelength of the standing waves that can fit between the ends of the box: you can keep increasing the frequency towards infinity and the waves can still form standing waves. So there is a lower limit to the allowed frequency but no upper limit.
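The standing-wave counting can be sketched in a few lines. This is a simplified one-dimensional version (Rayleigh's cavity is three-dimensional), assuming a box of length L where mode n fits n half-wavelengths, so λ = 2L/n and f = nc/2L. The function names and the 1 m box are my own choices for illustration.

```python
C = 299_792_458.0  # speed of light in m/s


def standing_wave_modes(box_length_m, n_modes):
    """Return (n, wavelength, frequency) for the first n_modes standing waves."""
    modes = []
    for n in range(1, n_modes + 1):
        wavelength = 2.0 * box_length_m / n  # n half-wavelengths fit in the box
        frequency = C / wavelength
        modes.append((n, wavelength, frequency))
    return modes


def count_modes_below(box_length_m, f_max_hz):
    """Count the 1D modes with frequency <= f_max_hz, i.e. n <= 2 L f_max / c."""
    return int(2.0 * box_length_m * f_max_hz / C)


for n, wavelength, frequency in standing_wave_modes(1.0, 3):
    print(f"n={n}: wavelength = {wavelength:.3f} m, frequency = {frequency:.3e} Hz")

for f_max in (1e8, 1e9, 1e12):
    print(f"modes below {f_max:.0e} Hz: {count_modes_below(1.0, f_max)}")
```

Below the fundamental frequency the count is zero, so the long-wavelength end holds only a finite handful of modes. The count keeps growing as you raise the frequency limit, and nothing ever stops it.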
As for the shape of the intensity distribution curve: if you start by plotting intensity versus frequency and then transform the horizontal axis with the reciprocal function (λ = c/f) to turn it into intensity versus wavelength, the horizontal axis is distorted. The transformation stretches out the intensity over long wavelengths (low frequencies) and squeezes it at short wavelengths (high frequencies), so you get this concentration of the intensity towards short wavelengths. Now, we know that in the real world this doesn't happen, so there must be another mechanism that cuts off the intensity sharply when hf > kT, and this was Max Planck's hypothesis. He then had to work out empirically what the constant "h" needed to be to fit the data, and he came up with h ≈ 6.6 x 10^-34 joule-seconds.
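To see the cut-off at work, here is a rough numerical comparison of the classical (Rayleigh-Jeans) energy density per unit frequency, 8π f² kT / c³, with Planck's form, (8π h f³ / c³) / (exp(hf/kT) - 1). These are the standard textbook expressions rather than anything derived above, and the function names and the 5000 K temperature are just my assumptions for the example.

```python
import math

H = 6.62607015e-34   # Planck constant in J s
K_B = 1.380649e-23   # Boltzmann constant in J/K
C = 299_792_458.0    # speed of light in m/s


def rayleigh_jeans_density(frequency_hz, temperature_k):
    """Classical energy density per unit frequency: grows as f^2 without limit."""
    return 8.0 * math.pi * frequency_hz**2 * K_B * temperature_k / C**3


def planck_density(frequency_hz, temperature_k):
    """Planck energy density per unit frequency: suppressed once hf exceeds kT."""
    x = H * frequency_hz / (K_B * temperature_k)  # the ratio hf / kT
    # expm1(x) = exp(x) - 1, accurate for small x; it overflows for x above ~709.
    return 8.0 * math.pi * H * frequency_hz**3 / C**3 / math.expm1(x)


def planck_to_classical_ratio(frequency_hz, temperature_k):
    return planck_density(frequency_hz, temperature_k) / rayleigh_jeans_density(
        frequency_hz, temperature_k
    )


T = 5000.0
for f in (1e12, 1e14, 1e15, 3e15):
    x = H * f / (K_B * T)
    ratio = planck_to_classical_ratio(f, T)
    print(f"f = {f:.0e} Hz, hf/kT = {x:.3g}, Planck / classical = {ratio:.3g}")
```

At low frequencies (hf much smaller than kT) the two agree almost exactly. Once hf climbs past kT the Planck value drops away exponentially, which is the sharp cut-off that removes the runaway at short wavelengths.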