The heaviest work is not on the programming side. OpenAL (I believe that's what is used) is already initialised in the client, and sounds are generated on the fly (for example when you cast a spell). Therefore, all that is needed is, somewhere in the movement process, a test for a given material flag (at about the same time the collision test is made to see whether you're walking on something or not). According to that flag and the character's speed, you generate a sound (designed to stay synchronised with the steps). Even better if the sound also takes into account what kind of shoes you are wearing, if any.
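To make the idea concrete, here is a rough sketch of that test in Python. It is not the client's actual code: `Material`, `FootstepTimer`, the stride length and the sound names are all made up for illustration, and the real client would hand the chosen name to its own sound layer.

```python
from enum import Enum
from typing import Optional


class Material(Enum):
    """Hypothetical material flags stored on the map."""
    NONE = 0   # not standing on anything (falling, swimming, ...)
    STONE = 1
    WOOD = 2
    GRASS = 3


def footstep_interval(speed: float, stride: float = 0.8) -> Optional[float]:
    """Seconds between two footsteps, or None when the character stands still."""
    if speed <= 0:
        return None
    return stride / speed


def footstep_sound(material: Material, footwear: Optional[str] = None) -> Optional[str]:
    """Pick a sound name from the ground material and the footwear, if any."""
    if material is Material.NONE:
        return None
    shoe = footwear or "barefoot"
    return f"step_{material.name.lower()}_{shoe}"


class FootstepTimer:
    """Called once per movement update, next to the collision test."""

    def __init__(self, stride: float = 0.8) -> None:
        self.stride = stride
        self.elapsed = 0.0

    def update(self, dt: float, speed: float, material: Material,
               footwear: Optional[str] = None) -> Optional[str]:
        """Return the sound to play on this update, or None."""
        interval = footstep_interval(speed, self.stride)
        if interval is None or material is Material.NONE:
            self.elapsed = 0.0  # restart the rhythm when stopping or leaving the ground
            return None
        self.elapsed += dt
        if self.elapsed >= interval:
            self.elapsed -= interval
            return footstep_sound(material, footwear)
        return None
```

Tying the interval to the speed is what keeps the sound in step with the character: the faster you move, the shorter the gap between two footsteps.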
You do have to check whether the client generates the sound and then sends the info to the server, or whether the server makes the choice and then tells the client to play the sound. In the latter case, you may not be able to achieve anything from the client side alone. To find out which classes and methods are used, I would inspect the code that deals with spellcasting; your sound code would be pretty similar.
The problem is that, to be effective, the entire world has to be map-flagged with a sound property. That's quite a lot of work for the setting team.
Well, in the meantime, you can always prepare your client for generic sounds, generated on any floor; but you'll end up hearing only your own footsteps if the server is unaware of them.
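A minimal sketch of that stopgap, assuming the map flags end up in some position-to-material lookup (the names `GENERIC_STEP`, `tile_flags` and `sound_table` are hypothetical): any spot that has not been flagged yet simply falls back to the generic sound.

```python
from typing import Dict, Tuple

GENERIC_STEP = "step_generic"  # hypothetical name of the one-size-fits-all sound


def resolve_step_sound(tile_flags: Dict[Tuple[int, int], str],
                       position: Tuple[int, int],
                       sound_table: Dict[str, str]) -> str:
    """Return the sound for the tile at `position`, or the generic one if unflagged."""
    material = tile_flags.get(position)  # None while the map is not flagged there
    return sound_table.get(material, GENERIC_STEP)
```

With an empty `tile_flags` every step uses the generic sound, and flagged areas pick up their own sounds as the setting team adds them, without touching the client again.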