Geoni:
If your avatar stands right up against a wall, all it can see is part of that one wall, which in the "ideal" case is a textured flat surface described by only 4 corner points (vertices). If the 3D engine is smart enough, it can work out that it only needs to send the attributes of this one surface to the graphics card to get it rendered: the 4 coordinates, the texture, and the lights relevant to its appearance. This calculation is based on the camera's "frustum", the "pyramid slice" bounded by a minimum and a maximum distance that contains everything the camera can see (so anything behind the camera or outside the viewing angle is not visible anyway).
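To make the frustum idea a bit more concrete, here is a minimal sketch of such a test in Python. It is not how CrystalSpace3D does it; the class name `Frustum`, its fields, and the convention that the camera sits at the origin looking along +z are all my own assumptions for illustration. Each object is approximated by a bounding sphere, and the test is conservative: it may keep a sphere that is just outside a corner of the frustum, but it never throws away something visible.

```python
import math
from dataclasses import dataclass


@dataclass
class Frustum:
    """Hypothetical symmetric view frustum in camera space.

    Assumption: the camera is at the origin and looks along +z.
    """
    fov_y_deg: float   # vertical viewing angle in degrees
    aspect: float      # width / height of the view
    near: float        # minimum visible distance
    far: float         # maximum visible distance

    def contains_sphere(self, center, radius):
        """Return False only if the sphere is certainly outside the frustum."""
        x, y, z = center

        # Near and far planes: reject anything behind the camera or too far away.
        if z + radius < self.near or z - radius > self.far:
            return False

        half_v = math.radians(self.fov_y_deg) / 2.0
        half_h = math.atan(math.tan(half_v) * self.aspect)

        # Side planes pass through the camera position. For a half angle h,
        # the inward normals are (-cos h, sin h) and (cos h, sin h) in the
        # (coord, z) plane, so the signed distance is a simple dot product.
        for coord, half in ((x, half_h), (y, half_v)):
            c, s = math.cos(half), math.sin(half)
            if -coord * c + z * s < -radius:   # outside right / top plane
                return False
            if coord * c + z * s < -radius:    # outside left / bottom plane
                return False
        return True


frustum = Frustum(fov_y_deg=90.0, aspect=1.0, near=0.1, far=100.0)
in_front = frustum.contains_sphere((0.0, 0.0, 10.0), 1.0)    # straight ahead
behind = frustum.contains_sphere((0.0, 0.0, -10.0), 1.0)     # behind the camera
off_side = frustum.contains_sphere((50.0, 0.0, 10.0), 1.0)   # far outside the viewing angle
```

Only the objects for which `contains_sphere` returns True would need to be sent on to the graphics card.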
If your avatar is in a narrow alley, then in addition to everything outside the frustum, much of the rest of the level is not visible either, because the closely packed houses block the view at a short distance. Now what is faster: sending nearly the whole city to the graphics card and relying on its built-in features to sort out what ends up in front (e.g. the Z-buffer depth test), or sorting all objects by location and distance, calculating their visibility in software, and sending only the objects that are at least possibly visible? Which takes more time, the sorting and calculation on the CPU, or the bus bandwidth to the graphics card? Hard to answer, because it depends on the CPU and GPU model. Some 3D engines can "partition" the level automatically, some need a bit of support from the level designer. I am not sure what CrystalSpace3D is able to do here, and what the PlaneShift developers would need to enable and use to profit from it...
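As a rough illustration of what "partitioning" can mean, here is a sketch of a precomputed "potentially visible set" (PVS) lookup in Python. Everything in it is made up for the example: the sector names, the object names, and the tables `PVS` and `OBJECTS_IN_SECTOR` are hypothetical, and I am not claiming this is what CrystalSpace3D or PlaneShift actually does. The idea is only that the level is split into sectors, and for each sector someone (the engine or the level designer) has stored in advance which other sectors can possibly be seen from it.

```python
# Hypothetical partition of a city level into sectors.
# For each sector, the set of sectors that might be visible from inside it.
PVS = {
    "alley": {"alley", "plaza"},
    "plaza": {"alley", "plaza", "market", "temple_front"},
    "market": {"market", "plaza"},
    "temple_front": {"temple_front", "plaza"},
}

# Hypothetical objects placed in each sector.
OBJECTS_IN_SECTOR = {
    "alley": ["house_a", "house_b", "barrel"],
    "plaza": ["fountain", "statue"],
    "market": ["stall_1", "stall_2", "stall_3"],
    "temple_front": ["temple", "stairs"],
}


def candidate_objects(camera_sector):
    """Collect the objects that are at least possibly visible from a sector.

    Objects in sectors outside the PVS are skipped without any
    per-object visibility calculation.
    """
    candidates = []
    for sector in sorted(PVS[camera_sector]):
        candidates.extend(OBJECTS_IN_SECTOR.get(sector, []))
    return candidates


from_alley = candidate_objects("alley")   # alley and plaza objects only
from_plaza = candidate_objects("plaza")   # objects from all four sectors
```

From the alley only 5 of the 10 objects are candidates, while from the plaza all 10 are. The lookup itself is cheap on the CPU; the cost moves to building the table beforehand, which is where the support from the level designer can come in.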
If your avatar is now looking across the plaza, its camera can see most of the houses in Hydlaa at once. There is not much nearby that could hide them. Even though they are smaller and in the background, they still have to be resolved in the right depth order (by sorting or by the depth buffer) so that distant objects do not flicker or show through. The more objects you can see simultaneously, the slower the rendering gets.
And when the whole scene is rendered in several passes per frame, to overlay lighting and surface effects, this difference multiplies.
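A back-of-the-envelope sketch of that multiplication in Python. The object counts and the number of passes are invented numbers, not measurements from PlaneShift, and the function `draw_submissions` is a hypothetical helper that simply assumes every visible object is submitted once per pass.

```python
def draw_submissions(visible_objects, passes):
    """Rough count of object submissions per frame.

    Assumption: each visible object is sent once in every rendering pass.
    """
    return visible_objects * passes


# Invented example numbers: few objects visible in the alley, many on the plaza.
alley_single = draw_submissions(20, 1)    # 20
plaza_single = draw_submissions(400, 1)   # 400
alley_multi = draw_submissions(20, 3)     # 60
plaza_multi = draw_submissions(400, 3)    # 1200

gap_single = plaza_single - alley_single  # 380
gap_multi = plaza_multi - alley_multi     # 1140
```

With three passes the gap between the alley and the plaza grows from 380 to 1140 submissions per frame, so any unnecessary extra pass hurts the most exactly where many objects are visible.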
I have a feeling that city levels are rendered more times per frame than really necessary, but I cannot prove it. It is just a guess, mostly based on the shadow map changes that still flicker (as if there were still competing shadow maps).