It has the potential to be a tremendous improvement or a major detraction, depending on how it's used, obviously. For one, adding a voice, with accent and emotional tones, adds a much stronger dimension to conversation than plain text. Although reading text is by no means boring or unattractive, I would say that after a fair share of reading one notices the many spelling errors and grammar mistakes that really distract from the meaning. Without taking a survey, my general impression is that the vast majority of people are horrible spellers and/or terrible typists. The effort is there for most, while for some it's just lazy typing. Here's a good intention gone awry: "How fart tho um' lady?" The meaning can be recognized, but I really do think about flatulence for a few seconds and subsequently become distracted.
However, I think it has to be mixed in with text to some degree, because the actions of your characters aren't best described in first person. I'd much prefer reading text that says, "Shalmanaser laughs as he continues kill stealing" as opposed to (in voice): "Muahaha, I'm kill stealing." The latter is more direct but far more blunt and less eloquent.
Of course, when too many people are in a conversation, it's terribly difficult to get a word in edgewise without people interrupting each other. Hence I think it can make for good roleplay with few people involved, or with several as long as there's no dominant personality. Delays and net traffic could also take away from the experience. I think we've all tried Paltalk, Skype, or Yahoo voice chat enough to know the limits of such programs in use and functionality.
It's worth trying, and if used in the right setting it could make for some great storytelling. If integrated somehow, so that one can just hear conversations nearby, attenuated by distance, walls, etc., it could provide an atmosphere unlike any other game. But that's another challenge in itself, and I'm guessing third-party software will be the solution for a long time to come unless some type of development is in the works. I'm sure it's been thought of and shot down... don't know the programming requirements for such a function. But imagine that when you see a large group of people gathered around, instead of just hearing silence, you hear their voices as you run past. It adds to realism, methinks.
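Just to illustrate the idea (a toy sketch, not how any actual game or voice engine does it): volume could fall off with distance and get cut further for each wall in the way. The function name, the 30-unit hearing range, and the 0.5 wall factor are all made up for the example.

```python
import math

def voice_gain(listener, speaker, max_range=30.0, walls_between=0, wall_factor=0.5):
    """Return a volume multiplier between 0 and 1 for one speaker.

    All parameters are hypothetical: positions are (x, y) tuples in game
    units, max_range is how far a voice carries, and each wall between
    the two characters multiplies the volume by wall_factor.
    """
    distance = math.dist(listener, speaker)
    if distance >= max_range:
        return 0.0  # too far away to hear at all
    gain = 1.0 - distance / max_range  # simple linear falloff
    return gain * (wall_factor ** walls_between)
```

So someone standing right next to you comes through at full volume, someone halfway to the edge of hearing range at half volume, and a wall in between halves that again. A real implementation would presumably need a lot more than this (mixing many voices, network delay, and so on).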
I'm sure there are more disadvantages, but it's late and I need to go to bed. /me clicks post.