When optimizing OSG applications, most of the task is really about
optimizing the scene graph to address bottlenecks. You've taken the
first step by looking at the high level cull, draw dispatch (the OSG's
draw traversal dispatching data into the OpenGL FIFO) and draw GPU
stats, and listing the various scene graph stats that cause the
performance issues.
The next step is working out how to address the performance issues.
Start by looking at which element of the frame takes longest, as this
is likely the most fruitful place to achieve gains. In your case I
expect the high cull, draw dispatch and draw GPU times are all linked
to how fine grained the scene graph is - you have a large number of
scene graph level objects relative to the number of vertices and
primitives being dispatched.
The most effective way to improve performance will be to merge
osg::Geometry and with it simplify the scene graph structure above it.
You can only merge osg::Geometry that share the same state, so this may
require some adjustment to how you manage the state of objects.
Moving state changes from osg::StateSet/osg::Material into the
osg::Geometry as per vertex data/per PrimitiveSet data may be the
solution, and shaders might be needed to help with this.
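To make the idea concrete, here is a minimal sketch of what such a merge does at the data level, written in plain C++ without any OSG types. The names Mesh, MergedMesh and mergeMeshes are hypothetical and only for illustration. It assumes each small object has a single colour that would otherwise sit in a material, and turns that colour into per vertex data so all objects can share one vertex array and one index list:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one small object: positions, a single colour for
// the whole object (what would otherwise live in a material), and triangle
// indices that are local to this object.
struct Mesh {
    std::vector<std::array<float, 3>> vertices;
    std::array<float, 4> color;                 // RGBA
    std::vector<std::uint32_t> indices;         // 3 per triangle
};

// Hypothetical merged result: one vertex array, one per vertex colour array
// and one index list covering every input object.
struct MergedMesh {
    std::vector<std::array<float, 3>> vertices;
    std::vector<std::array<float, 4>> colors;   // one entry per vertex
    std::vector<std::uint32_t> indices;         // 3 per triangle
};

MergedMesh mergeMeshes(const std::vector<Mesh>& meshes) {
    MergedMesh out;
    for (const Mesh& m : meshes) {
        // Indices of this object must be shifted by the number of vertices
        // already present in the merged array.
        const std::uint32_t base =
            static_cast<std::uint32_t>(out.vertices.size());

        out.vertices.insert(out.vertices.end(),
                            m.vertices.begin(), m.vertices.end());

        // The per object colour becomes per vertex data, so no state change
        // is needed between objects any more.
        out.colors.insert(out.colors.end(), m.vertices.size(), m.color);

        for (std::uint32_t i : m.indices) {
            out.indices.push_back(base + i);
        }
    }
    return out;
}
```

With real data the merged arrays would then go into a single geometry with per vertex colour binding. Transparency still needs thought, since blended objects merged into one draw call can no longer be depth sorted individually.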
For picking, if you merge a large number of osg::Geometry then CPU
based intersection testing may end up being slower. To resolve this
you can build an osg::KdTree for the osg::Geometry and then the
IntersectionVisitor will be able to use the KdTree to speed up
intersection testing.
If you want to pick patches then you may want to keep each patch in
its own osg::PrimitiveSet rather than merging these. However,
merging osg::PrimitiveSet is something that will help performance,
just like merging osg::Geometry will. For modern graphics hardware
it's typically best to use a single osg::DrawElementsUShort/UInt
with GL_TRIANGLES mode for all the triangles in an osg::Geometry
rather than using triangle strips. So if you want to pick and
edit patches that you once managed at the osg::Geometry level, then
you could merge all the triangles in each original osg::Geometry into a
single osg::DrawElementsUShort/UInt and then merge the osg::Geometry
data whilst not further merging any of the primitive sets.
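As a rough illustration of the bookkeeping involved, the sketch below uses plain C++ rather than OSG types. It converts quads and polygons into a single GL_TRIANGLES style index list and keeps a small side table so a picked triangle can be mapped back to the patch it came from. Note this side table is a variation on keeping one primitive set per patch, not the same thing. PatchedTriangles, addPatch and patchOfTriangle are hypothetical names, the polygons are assumed to be convex, and the picking code is assumed to be able to report which triangle was hit:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one triangle index list for a whole geometry plus,
// for each patch, the number of the first triangle it contributed.
struct PatchedTriangles {
    std::vector<std::uint32_t> indices;      // 3 per triangle, GL_TRIANGLES order
    std::vector<std::size_t> firstTriangle;  // firstTriangle[p] = first triangle of patch p
};

// Appends one polygon, given as vertex indices in winding order, as a
// triangle fan around its first vertex. Assumes the polygon is convex;
// concave polygons would need a proper tessellator instead.
void addPatch(PatchedTriangles& out, const std::vector<std::uint32_t>& polygon) {
    out.firstTriangle.push_back(out.indices.size() / 3);
    for (std::size_t i = 1; i + 1 < polygon.size(); ++i) {
        out.indices.push_back(polygon[0]);
        out.indices.push_back(polygon[i]);
        out.indices.push_back(polygon[i + 1]);
    }
}

// Maps a picked triangle number back to the patch that produced it.
// Precondition: at least one patch was added and triangle is in range.
std::size_t patchOfTriangle(const PatchedTriangles& t, std::size_t triangle) {
    // Find the first patch that starts after this triangle, then step back one.
    auto it = std::upper_bound(t.firstTriangle.begin(),
                               t.firstTriangle.end(), triangle);
    return static_cast<std::size_t>(it - t.firstTriangle.begin()) - 1;
}
```

The per patch property data can then be stored in an array indexed by the patch number, so merging does not have to break picking.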
The osgUtil::Optimizer class has a collection of visitors that can help
with merging state and geometries, and the osgUtil::MeshOptimizers can
also help with generating efficient meshes.
On 2 May 2018 at 01:31, Oran Wallace <> wrote:
Been using OSG for a while and have learned a lot and enjoyed it. I
currently have an application which uses OSG and Qt for displaying a highly
detailed model. A database is loaded which may cause colors in the model to
change or additional geometries to be generated.
My OSG viewport is a class which subclasses from QWidget and
osgViewer::CompositeViewer and is embedded into a QMainWindow. This works
fine so I've stuck with it. The application can perform most CAD-like
operations on the model with OSG.
I've finally encountered a model which brings my application to around 1-2
fps while interacting with it. I know there are various techniques used to
help with performance but also understand the approach depends on the
situation. I'm currently considering a major rewrite.
1. 540k vertices
2. 81000 drawables
3. 77000 sorted drawables
4. 81000 fast drawables
5. 81000 primitive sets
6. 100000 triangles
7. 37000 quad
8. 181000 polygon ( probably hurting a lot)
9. 26000 unique state sets
10. 73000 instance state sets
11. 28000 groups
Cull: 113.50, Draw: 390, GPU: 355 (never seen values this high on the
Additionally the graph doesn't seem to be rendering correctly; the largest
"bar" is Cull and the last ~10% is the Draw, which doesn't seem to match the
values. (Probably because a single frame takes so long?)
1. I am using an orthographic projection as it is a "CAD" style application.
2. Currently all objects are within Geode->Geometry nodes, each with
their own vertex, color, and normal arrays.
3. The main "mode" of the application is a "ghosted" mode which is
applied to the walls of the model. Each wall has an osg::Material set to it
and blending turned on.
4. All objects are pickable and hold a variety of property data
(extracted from the models file or the database). I have implemented
"picking" using the PickHandler::pick example and code.
5. A user often views the whole model from afar, interacting with other
Qt widgets and watching how the scene changes. Occasionally they will zoom in
on a section but are never "inside" the model. I can clearly see when view
frustum culling is working when zooming in.
6. My graph is VERY flat. Nearly all objects are attached to the root
node (objects like cubes, cylinders, and meshes). When the rooms form a
closed space I create a group node whose children are the walls. (I often
rearranged it but never saw any performance changes from this)
7. Anything else you need to know...
Fairly certain my main problem is just the sheer number of draw calls that
occur when my models get a decent number (20,000+) of objects. It seems a
remedy for this is to combine drawables and share one vertex array
(correct?). Would this totally break the picking code? This application is
designed to run on computers with "business" level GPUs (for example a
GeForce GT 640) but sadly I have to use the nouveau drivers.
Sorry for being so long winded but I wanted to clearly lay down the situation
for anyone that feels like helping. Also thanks a lot to anyone willing to
help me out!