OpenSceneGraph Forum
Official forum which mirrors the existing OSG mailing lists. Messages posted here are forwarded to the mailing list and vice versa.
 

Experiences with osgCompute


 
Bart (Newbie)
Joined: 13 Nov 2009, Posts: 9
Posted: Wed Sep 14, 2011 11:44 am    Post subject: Experiences with osgCompute

After wrapping up a project in which I used osgCompute together with OpenSceneGraph, I have some mixed feelings about this library. I will write them down below in the hope that they can be of help to the developers and to anybody else who may have bumped into the same issues. Some of them can perhaps (or probably) be fixed in an elegant manner; if so, please tell me how. Although I know how to work with OSG and CUDA, I certainly do not consider myself an expert on either.

1) osgCompute does not seem to support CUDA constant memory. I have to fall back to a plain CUDA call: cudaMemcpyToSymbol().
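For reference, this is roughly what the workaround looks like on the CUDA side (a minimal sketch; the symbol name testConst is just an example, and newer CUDA releases expect the symbol itself rather than its string name):

```cuda
#include <cuda_runtime.h>

// Constant memory has to be declared at file scope in the .cu file.
__constant__ float testConst;

// Upload a host value into constant memory before launching kernels.
extern "C" void uploadTestConst( float value )
{
    cudaMemcpyToSymbol( testConst, &value, sizeof(float) );
}
```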

2) About the init function:
2.1) It was not clear to me that I can do stuff with osgCompute only after the init() function has been called. I wanted to fill CUDA memory objects with data before the init function was called (see 2.2 for the reason why). I was unaware that this was not possible and it took me a while to figure this out.

2.2) The lazy allocation with the init function, which is called before the launch function, is not desirable, especially not in real-time applications. Often you want to set up as much as possible at the start of your application and not during run-time (from a real-time performance perspective). I would prefer that the user have the responsibility of initializing his data (e.g. in the constructor), rather than being forced to declare an init function.

2.3) The init function causes me to declare a lot of variables globally. Normally the constructor receives all the necessary data, and I use this data for initialization purposes only once, so there is no need to store it globally. But now, if I want to fill CUDA memory with data or copy a constant to CUDA memory, I have to do this in the init function and not in the constructor. Consequently I store all this data globally, so the init function can access it.

2.4) What does the init function do in the Resource class (and subsequently in subclasses Module and Memory)? Documentation is not really concrete on this:
"Init will check for internal parameters applied to resources and will create device dependent objects"

2.5) I tried to get rid of this init function somehow and use constructors again. But then I had to remove the META_Object macro, and that gave me more errors than I would like to see, and I did not want to spend more time on this.

3) About the map function:
3.1) When and what is synchronized between device and host? It's described rather vaguely. And are we talking about a complete memory copy or a simple pointer-copy between OpenGL and CUDA context?

3.2) In the osgEndianness demo we allocate a Memory object (corresponding to a cudaMalloc) and there is still a map() function called. As far as I know, mappings are only performed on GL buffer objects, to map them from the OpenGL context to the CUDA context. Secondly, I think in (one of) the other examples you call map() on a Memory object each frame. Why? It makes sense to do this if we are dealing with a GL buffer object, since GL manages the buffer object and may choose to move the buffer in memory somehow. But a map() on a Memory object seems strange, since it is not a GL buffer object. It is even stranger to call map() each frame, since cudaMalloc gives us a device pointer that remains valid.

3.3) Why is unmap not called after map in the provided examples? It is probably called somewhere behind the scenes, but this is not clear to the user (nor is it clear when it is called).

4) Certain names are not very insightful, for example, the class 'Module'. The name 'Module' is a very general term and does not provide the user much information about this class. At first I thought the Computation class was the class that contained a CUDA program (would make sense I think: a class Computation that contains a program that performs GPU computations), but now I understand that a Module contains a CUDA program and Computation contains Modules.

5) There is a setImage function in osgCuda::Memory (why not in osgCompute::Memory?) and I really have no idea what the purpose of this function is. In osgCompute 0.5 it was called setArray() and was used to set the data of a Memory object (a Buffer object in osgCompute 0.5), and then the name setArray actually made sense. The setArray function could be passed an OSG Array (FloatArray, Vec2Array, etc.) and I found it ideal for setting the data of a Memory object. But now I have to copy the data into the object myself (memcpy), or I have to pass setImage an Image object with the data. Especially the latter method is cumbersome, because you have to set up an Image object with all the correct arguments, and filling it with Vec2, Vec3, or Vec4 data is not intuitive. Lastly, in all your examples you use the setImage function not to set an actual image, but to fill the Memory with ordinary floats (so I am completely baffled why it was changed from setArray to setImage).

6) Low-level implementation knowledge is asked from the user. If the acceptResource function receives an osgCuda::Geometry and you want to extract the normals from it, then you have to know some low-level offset that the user should not have to worry about.

7) Extremely difficult to separate visualization and simulation (i.e. CUDA computations). I blame this on CUDA and not OsgCompute, since CUDA only provides a way to map GL-buffer objects to CUDA context and not the other way around. Therefore GL-buffer objects (visualization) appear in my simulation code.

8) Compiling osgCompute failed with OSG 3.0.1. Do this to fix it:
Add #include <cstring> to Warp.cpp (otherwise memcpy() is not found).
Change #include <osg/gl> to #include <osg/GL>.

9) The calling of launch every frame is in my opinion not useful (it should at least be configurable). My project consisted of a simulation and a visualization, where the simulation consisted of computations on a CUDA device. I wanted the user to be able to interact with my application, so I need a high frame rate. Therefore, I decided not to update the simulation each frame, but with a certain time step (e.g. 0.1 seconds). It was not easy to get this done, because the launch method is called each frame, so I had to use additional constructs to prevent updating each frame (which is cumbersome if the CUDA computation is far away from the object that is controlling the whole update process of the application). Moreover, I later wanted to switch between my CPU implementation and my GPU implementation, but since the launch function is automatically called each frame, I again had to take measures myself to prevent the GPU from performing CUDA computations when it shouldn't.

In general, I found the library somewhat over-engineered, with some pitfalls that I was unaware of. At certain points it is also difficult to follow the principles behind it. It would already help if there were more examples that each demonstrate a single feature, just like OpenSceneGraph's tutorials and examples. A tutorial explaining the principles would also help, instead of source code with comments. I think this library has a lot of potential to become a great addition to OSG, but at this point I am still questioning whether I should use it for my next project or simply use plain CUDA and add a bit of low-level OpenGL to OSG.

If you want to check the project I made with osgCompute, including source code, you can download it at http://home.kpn.nl/riji99ou/lennardJones.zip (it is a 2D particle simulation).

This post has become long enough now :)

Best regards,
Bart
Mick (User)
Joined: 11 Mar 2009, Posts: 31
Posted: Wed Sep 21, 2011 10:56 am

Dear Bart,

Thanks a lot for your very detailed annotations. Many of your comments concern the combination of CUDA and OSG, which is sometimes not an easy task since interoperability with GL is still tricky. We will try to answer your questions later today.

Best regards,
Mick

_________________
SVT Group
Mick (User)
Joined: 11 Mar 2009, Posts: 31
Posted: Wed Sep 21, 2011 5:32 pm

Hi Bart!

We will try to give you some answers to your questions. Please let us know if you need more details in some areas.

Actually, a lot of the design decisions are based on detailed internal knowledge of OSG and GL.


Bart wrote:

1) osgCompute does not seem to support CUDA constant memory. I have to fall back to a plain CUDA call: cudaMemcpyToSymbol().


Right - constant memory is not supported as a buffer object. Thus, you need to transfer data to constant memory by using cudaMemcpyToSymbol(). However, it would be very easy to integrate such behavior into a CUDA memory object by defining an additional alloc-hint.


Quote:

2) About the init function:
2.1) It was not clear to me that I can do stuff with osgCompute only after the init() function has been called. I wanted to fill CUDA memory objects with data before the init function was called (see 2.2 for the reason why). I was unaware that this was not possible and it took me a while to figure this out.


What exactly do you mean by "init" function? Each memory object has an init function which checks the object's parameters for consistency, and it is called implicitly when the memory object is used! There is no need to call init manually (like _tmpBuffer->init()); in the examples we do it mostly just for clarification. But be careful: texture and geometry objects should be initialized manually first, e.g. when you rely on their dimensions.

Quote:

2.2) The lazy allocation with the init function, which is called before the launch function, is not desirable, especially not in real-time applications. Often you want to set up as much as possible at the start of your application and not during run-time (from a real-time performance perspective). I would prefer that the user have the responsibility of initializing his data (e.g. in the constructor), rather than being forced to declare an init function.


The lazy allocation is implemented exactly the way OSG does allocation. If you want to prevent such a lazy allocation (e.g. for performance reasons), you can simply map the buffer to the device and initialize its memory right after you created the buffer, e.g.:
Code:

cudaMemset( curBuffer->map(osgCompute::MAP_DEVICE_TARGET ), 0x0, curBuffer->getByteSize() );


Quote:

2.3) The init function causes me to declare a lot of variables globally. Normally the constructor receives all the necessary data, and I use this data for initialization purposes only once, so there is no need to store it globally. But now, if I want to fill CUDA memory with data or copy a constant to CUDA memory, I have to do this in the init function and not in the constructor. Consequently I store all this data globally, so the init function can access it.


Actually, it is not quite clear to me what kind of init and which constructor you mean. Please give us a short example (unfortunately I have not looked into your source code yet).


Quote:

2.4) What does the init function do in the Resource class (and subsequently in subclasses Module and Memory)? Documentation is not really concrete on this:
"Init will check for internal parameters applied to resources and will create device dependent objects"


Please see the answer to 2.1. But you are right - we should go into more detail in the documentation.

Quote:

2.5) I tried to get rid of this init function somehow and use constructors again. But then I had to remove the META_Object macro, and that gave me more errors than I would like to see, and I did not want to spend more time on this.


Please see my answer to 2.3. The init function really seems to bother you :) If you clarify what you want to do, maybe we could help with some example code on how to deal with the initialization of memory objects.


Quote:

3) About the map function:
3.1) When and what is synchronized between device and host? It's described rather vaguely. And are we talking about a complete memory copy or a simple pointer-copy between OpenGL and CUDA context?


"Device" means GPU memory, "host" means CPU memory. Thus, synchronization needs to copy memory between device and host. Please note that synchronization takes place automatically when using "map" (see 3.2).


Quote:

3.2) In the osgEndianness demo we allocate a Memory object (corresponding to a cudaMalloc) and there is still a map() function called. As far as I know, mappings are only performed on GL buffer objects, to map them from the OpenGL context to the CUDA context. Secondly, I think in (one of) the other examples you call map() on a Memory object each frame. Why? It makes sense to do this if we are dealing with a GL buffer object, since GL manages the buffer object and may choose to move the buffer in memory somehow. But a map() on a Memory object seems strange, since it is not a GL buffer object. It is even stranger to call map() each frame, since cudaMalloc gives us a device pointer that remains valid.


"Mapping" means that different memory locations (e.g. CPU / GPU) are synchronized. "map" gives you a pointer to the actual memory. When calling "map" on a memory object, the map function checks whether some memory needs to be synchronized or allocated, or whether just the specific memory pointer needs to be returned. Thus, don't worry about performance issues when calling "map" on a memory object. If you save the memory pointer to a member variable, there is the danger that some other algorithm (for example a GL render task or CPU work) updates its specific memory and the memory object loses synchronization!
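To make the intended usage pattern concrete, here is a small sketch (MyModule, _buffer, and myKernel are placeholder names; the map/unmap calls are the ones discussed above):

```cpp
// Sketch: request the device pointer freshly inside launch() every frame
// instead of caching it; map() only synchronizes/allocates when necessary.
void MyModule::launch()
{
    void* devPtr = _buffer->map( osgCompute::MAP_DEVICE_TARGET );  // member memory object
    myKernel( devPtr, _buffer->getByteSize() );                    // placeholder kernel call
    _buffer->unmap();                                              // release the mapping
}
```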

Quote:

3.3) Why is unmap not called after map in the provided examples? It is probably called somewhere behind the scenes, but this is not clear to the user (nor is it clear when it is called).


"Unmap" is handled implicitly (and automatically) by texture and geometry objects since GL-buffers need to be "unmapped" from the CUDA-context before rendering.


Quote:

4) Certain names are not very insightful, for example, the class 'Module'. The name 'Module' is a very general term and does not provide the user much information about this class. At first I thought the Computation class was the class that contained a CUDA program (would make sense I think: a class Computation that contains a program that performs GPU computations), but now I understand that a Module contains a CUDA program and Computation contains Modules.


This is right: a "computation" contains several "modules", and a "module" implements several CUDA algorithms. It would add overhead to assign each single CUDA kernel to a separate computation node; that is why modules were introduced.
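As a small sketch of that hierarchy (the module class names here are hypothetical):

```cpp
// A Computation node groups several Modules in the scene graph;
// each Module wraps one or more related CUDA kernels.
osg::ref_ptr<osgCuda::Computation> computation = new osgCuda::Computation;
computation->addModule( *new MySimulationModule );  // hypothetical Module subclass
computation->addModule( *new MyAnalysisModule );    // hypothetical Module subclass
computation->addChild( geode );                     // subgraph the modules operate on
```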

Quote:

5) There is a setImage function in osgCuda::Memory (why not in osgCompute::Memory?) and I really have no idea what the purpose of this function is. In osgCompute 0.5 it was called setArray() and was used to set the data of a Memory object (a Buffer object in osgCompute 0.5), and then the name setArray actually made sense. The setArray function could be passed an OSG Array (FloatArray, Vec2Array, etc.) and I found it ideal for setting the data of a Memory object. But now I have to copy the data into the object myself (memcpy), or I have to pass setImage an Image object with the data. Especially the latter method is cumbersome, because you have to set up an Image object with all the correct arguments, and filling it with Vec2, Vec3, or Vec4 data is not intuitive. Lastly, in all your examples you use the setImage function not to set an actual image, but to fill the Memory with ordinary floats (so I am completely baffled why it was changed from setArray to setImage).


osg::Images are much more general and available via plugins, and osg::Arrays do not make sense for texture objects.

Quote:

6) Low-level implementation knowledge is asked from the user. If the acceptResource function receives an osgCuda::Geometry and you want to extract the normals from it, then you have to know some low-level offset that the user should not have to worry about.


We derived osgCuda::Geometry exactly from osg::Geometry. So you just need to handle osgCuda::Geometry the same way you handle osg::Geometry.

Quote:

7) Extremely difficult to separate visualization and simulation (i.e. CUDA computations). I blame this on CUDA and not OsgCompute, since CUDA only provides a way to map GL-buffer objects to CUDA context and not the other way around. Therefore GL-buffer objects (visualization) appear in my simulation code.


Just use "osg::ref_ptr<osgCompute::Memory> myMemory" as a member in your class. Using this more abstract type, you do not mix simulation code too much with visualization code:
Code:

osgCuda::Geometry* myGeometry;
osg::ref_ptr<osgCompute::Memory> myMemory = myGeometry->getMemory();


Quote:

8) Compiling osgCompute failed with OSG 3.0.1. Do this to fix it:
Add #include <cstring> to Warp.cpp (otherwise memcpy() is not found).
Change #include <osg/gl> to #include <osg/GL>.


Our most recent test was with OSG 3.1.0 (developer version); it runs with no problems.

Quote:

9) The calling of launch every frame is in my opinion not useful (it should at least be configurable). My project consisted of a simulation and a visualization, where the simulation consisted of computations on a CUDA device. I wanted the user to be able to interact with my application, so I need a high frame rate. Therefore, I decided not to update the simulation each frame, but with a certain time step (e.g. 0.1 seconds). It was not easy to get this done, because the launch method is called each frame, so I had to use additional constructs to prevent updating each frame (which is cumbersome if the CUDA computation is far away from the object that is controlling the whole update process of the application). Moreover, I later wanted to switch between my CPU implementation and my GPU implementation, but since the launch function is automatically called each frame, I again had to take measures myself to prevent the GPU from performing CUDA computations when it shouldn't.


There are several ways to control the workflow. Firstly, launch callbacks:
Code:

osgCompute::LaunchCallback* launchCallback = new MyLaunchCallback;
mainComputation->setLaunchCallback( launchCallback );

...
...

void MyLaunchCallback::operator()( osgCompute::Computation& computation )
{
   // ... define the order of your launches
   myModule2->launch();
   if (ok)   // 'ok' stands for any application-specific condition
      myModule1->launch();
}


And: you can simply disable modules if you don't need them: myModule->disable();

Quote:

At certain points it is also difficult to follow the principles behind it. It would already help if there were more examples that each demonstrate a single feature, just like OpenSceneGraph's tutorials and examples. A tutorial explaining the principles would also help, instead of source code with comments. I think this library has a lot of potential to become a great addition to OSG, but at this point I am still questioning whether I should use it for my next project or simply use plain CUDA and add a bit of low-level OpenGL to OSG.


The combination of GL with CUDA, and the integration of all this into scene graphs, probably can't be done in a very easy, straightforward way, so it may look a bit sophisticated. But osgCompute is designed in a general manner, so that it may fit a lot of application areas.

Example code is always of great help for users. Unfortunately, the time for documentation and examples is always short. But you are welcome to provide some examples, of course.


Thanks a lot for your interest in using osgCompute.
If you have more detailed questions feel free to ask.

Best regards,
Mick

_________________
SVT Group
Bart (Newbie)
Joined: 13 Nov 2009, Posts: 9
Posted: Fri Sep 23, 2011 12:00 pm

Hi Mick,

Thank you for your answers, which partly clarified things. I put together a short example program as you requested. The program displays a triangle (osgCuda::Geometry) whose positions the GPU increments with a positive value, so it moves to the top right. Secondly, there is a dummy array (osgCuda::Memory); the GPU increments its elements with a certain value, and the result is printed to standard output every frame. The code is listed below; you can also download a zip at http://home.kpn.nl/riji99ou/testOsgCompute.zip

Code:
#include <osgViewer/Viewer>
#include <osgGA/TrackballManipulator>
#include <osg/Geometry>
#include <osg/Geode>
#include <osgCuda/Computation>
#include <osgCuda/Geometry>
#include <osgCompute/Module>
#include "TestModule.h"


using namespace osg;
using namespace std;


//main
int main() {
    //Setup viewer
    osgViewer::Viewer viewer;
    viewer.setCameraManipulator(new osgGA::TrackballManipulator());
    viewer.getCameraManipulator()->setHomePosition(Vec3(0.5f, 0.5f, 3.0f),   //Eye position
                                                   Vec3(0.5f, 0.5f, 0.0f),   //Coordinate looking at
                                                   Vec3(0.0f, 1.0f, 0.0f));  //Positive y is up-vector
    viewer.getCamera()->setClearColor(Vec4f(0.0f, 0.0f, 1.0f, 1.0f));  //Blue background
    viewer.home();
    viewer.setThreadingModel(osgViewer::Viewer::SingleThreaded);  //osgCompute does not support multi-threading
    viewer.setUpViewInWindow(200, 200, 800, 600, 0);

    //Setup triangle
    ref_ptr<Geometry> const geoHost = new Geometry();
    geoHost->setDataVariance(Object::DYNAMIC);  //We update the vertices
    geoHost->setUseVertexBufferObjects(true);
    //Setup vertex positions of triangle
    int numVertices = 3;
    ref_ptr<Vec2Array> vertices = new Vec2Array();
    vertices->reserve(numVertices);
    vertices->push_back(Vec2f(0.0f, 0.0f));
    vertices->push_back(Vec2f(1.0f, 0.0f));
    vertices->push_back(Vec2f(1.0f, 1.0f));
    geoHost->setVertexArray(vertices);
    vertices->getVertexBufferObject()->setUsage(GL_STREAM_DRAW);
    //Setup elements of triangle
    ref_ptr<DrawElementsUInt> elements = new DrawElementsUInt(PrimitiveSet::TRIANGLES);
    elements->reserve(3);
    elements->push_back(0);
    elements->push_back(1);
    elements->push_back(2);
    geoHost->addPrimitiveSet(elements);
    elements->getElementBufferObject()->setUsage(GL_STATIC_DRAW);

    //Dummy Vec3 array (just for demonstrating purposes)
    ref_ptr<Vec3Array> dummy = new Vec3Array();
    int numDummies = 10;
    dummy->reserve(numDummies);
    for (int i = 0; i < numDummies; ++i)
        dummy->push_back(Vec3f(8.0f, 9.0f, 10.0f));

    //Place geometry in osgCuda::Geometry
    osgCuda::Geometry* geoDev = new osgCuda::Geometry;
    geoDev->setVertexArray(geoHost->getVertexArray());
    geoDev->addPrimitiveSet(geoHost->getPrimitiveSet(0));
    geoDev->setStateSet(geoHost->getOrCreateStateSet());
    geoDev->addIdentifier("triangle");
    //Setup test module
    TestModule* tm = new TestModule(numVertices, dummy);
    //Setup computation
    osgCuda::Computation* computation = new osgCuda::Computation();
    computation->addModule(*tm);
    computation->addResource(*geoDev->getMemory());

    //Setup scene graph
    ref_ptr<Group> const root = new Group();
    ref_ptr<Geode> const geode = new Geode();
    geode->addDrawable(geoDev);
    computation->addChild(geode);
    root->addChild(computation);

    //Set scene data
    viewer.setSceneData(root);

    //Draw loop
    while (!viewer.done()) {
        viewer.frame();
    }

    return 0;
}


Code:
#ifndef TEST_MODULE_H
#define TEST_MODULE_H


#include <osg/Array>
#include <osgCompute/Module>
#include <osgCuda/Memory>
#include <osgCompute/Resource>


class TestModule : public osgCompute::Module {
    private:
        //Vertex data of triangle
        int numVertices;
        osg::ref_ptr<osg::Vec2Array> hostPositions;
        osg::ref_ptr<osgCompute::Memory> devPositions;

        //Data of dummy
        int numDummies;
        osg::ref_ptr<osg::Vec3Array> hostDummy;
        osg::ref_ptr<osgCompute::Memory> devDummy;


    public:
        //Constructors
        TestModule() : osgCompute::Module() {}
        TestModule(int numVertices, osg::ref_ptr<osg::Vec3Array> dummy);

        //OSG macros
        META_Object( , TestModule)

        //Called once by osgCompute and should be used to create and initialize all resources
        bool init();

        //Used for getting a resource that is located somewhere else in the scene graph
        void acceptResource(osgCompute::Resource& resource);

        //Called each frame by osgCompute and should contain calls to CUDA kernel(s)
        void launch();


    protected:
        //Destructor
        ~TestModule() { osgCompute::Module::clear(); }


    private:
        //Copy
        TestModule(const TestModule&, const osg::CopyOp& ) {}

        //Assignment
        inline TestModule &operator=(const TestModule &) { return *this; }
};

#endif


Code:
#include <iostream>
#include <cstring>
#include <cuda_runtime.h>
#include "TestModule.h"


using namespace osg;


extern "C" void incrementPositions(float2* positions, int numVertices);
extern "C" void incrementDummy(float3* dummy, int numDummies);


TestModule::TestModule(int numVertices, osg::ref_ptr<osg::Vec3Array> dummy) : osgCompute::Module() {
    this->numVertices = numVertices;

    //Store dummy, because we again have to use it in init-function (not practical...)
    hostDummy = dummy;
    numDummies = hostDummy->getNumElements();

    //Set up (non-GL) memory of CUDA device
    devDummy = new osgCuda::Memory;
    devDummy->setElementSize(3 * sizeof(float));
    devDummy->setDimension(0, numDummies);

    //Fill (non-GL) memory of CUDA device and map it to the CUDA device (CAUSES THE SYSTEM TO FREEZE)
//  memcpy(devDummy->map(osgCompute::MAP_HOST_TARGET), hostDummy->getDataPointer(), devDummy->getByteSize());
//  devDummy->unmap();
//  devDummy->map(osgCompute::MAP_DEVICE);
//  devDummy->unmap();  //Necessary?

    //Set CUDA constant (CAUSES THE APPLICATION TO CRASH)
//  float hostTestConst = 0.0005f;
//  cudaMemcpyToSymbol("testConst", &hostTestConst, sizeof(float));
}


//init
bool TestModule::init() {
    //Fill (non-GL) memory of CUDA device
    memcpy(devDummy->map(osgCompute::MAP_HOST_TARGET), hostDummy->getDataPointer(), devDummy->getByteSize());
    devDummy->unmap();
    //devDummy->init();  //Necessary?

    //Set CUDA constant
    float hostTestConst = 0.0005f;
    cudaMemcpyToSymbol("testConst", &hostTestConst, sizeof(float));

    return osgCompute::Module::init();
}


//acceptResource
void TestModule::acceptResource(osgCompute::Resource& resource) {
    std::cout << "Resource accepted\n";
    if (resource.isIdentifiedBy("triangle"))
        devPositions = dynamic_cast<osgCompute::Memory*>(&resource);
}


//launch
void TestModule::launch() {
    //Increment values of dummy array
    incrementDummy((float3*)devDummy->map(osgCompute::MAP_DEVICE), numDummies);
    //Print result
    Vec3f* ptrDummy = (Vec3f*)devDummy->map(osgCompute::MAP_HOST_SOURCE);  //Map to host for reading
    for (int i = 0; i < numDummies; ++i)
        std::cout << ptrDummy[i][0] << "," << ptrDummy[i][1] << "," << ptrDummy[i][2] << "\n";
    std::cout << "\n";
    devDummy->unmap();  //Necessary?

    //Increment positions of triangle
    incrementPositions((float2*)devPositions->map(), numVertices);
    devPositions->unmap();  //Necessary?
}


Quote:
There is no need to call init manually (like _tmpBuffer->init())

Then why make init a public function? But the init-function of Memory is not what really bothers me.

The thing that bothers me most is the class you have to write when you inherit from the osgCompute::Module class (TestModule in the example program). First of all, all your examples use this META_Object macro in the class that inherits from Module, so I went along with it at first. This macro, however, forces me to declare a constructor with no parameters. Removing the macro gave me other errors, which I did not want to solve at that time either, so I complied with the parameterless constructor. Perhaps this macro has some great use that I am unaware of (probably it has).

Secondly, inheriting from Module requires me to implement an init and a launch function. Only in this init function am I allowed to do CUDA work, and no sooner. I am allowed to set up an osgCuda::Memory object (devDummy in the example) in the constructor, but if I map it to the device there, so devDummy is already allocated on the CUDA device at start-up, then the system freezes and displays the following error several times (on a Linux system):
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
Another example is setting constant memory in the constructor. If I do this, the application crashes with the error:
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
[Computation::accept(GLObjectsVisitor)]: No valid Computation Device found.
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
[osgCuda::GeometryMemory::alloc()]: unable to register buffer object (cudaGraphicsGLRegisterBuffer).unspecified driver error.
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
osgCuda::Computation::setupDevice(): cannot share device with OpenGL.setting the device when a process is active is not allowed
[osgCuda::Memory::sync()] "": cudaMemcpy() to host failed. unspecified launch failure.
Segmentation fault

If I move the CUDA work to the init function, everything is fine again. But this is not what I want: I want the constructor to set up and initialize everything, without an additional init function. Now I also have to store the constructor's arguments, so I can use them again in the init function.

Quote:
"Mapping" means that different memory locations (e.g. CPU / GPU) are synchronized. "map" gives you a pointer to the actual memory. When calling "map" on a memory object, the map function checks whether some memory needs to be synchronized or allocated, or whether just the specific memory pointer needs to be returned. Thus, don't worry about performance issues when calling "map" on a memory object. If you save the memory pointer to a member variable, there is the danger that some other algorithm (for example a GL render task or CPU work) updates its specific memory and the memory object loses synchronization!

As far as I understand it, mapping the memory of an osgCuda::Geometry into the CUDA context is a simple pointer handover from the GL context to the CUDA context. If I map from GPU to CPU or vice versa, does osgCompute do a complete copy, or does it provide a pointer and then transfer each data element over the PCI Express bus when requested? It would be nice if this could be included in the documentation.

Quote:
This is right: a "computation" contains several "modules". A "module" implements several CUDA algorithms. It would be some overhead to assign only each single CUDA kernel to a separate computation-node. That's why the modules are introduced.

My problem is not with the design here; it's the name 'Module' that does not convey concrete information about what the class is for. The term 'Module' is so general that it can represent a lot of things, so it is not clear to the user what this class does unless he investigates it in detail.

Quote:
osg::Images are much more general and available via plugins and osg::Arrays do not make sense for texture objects.

Then why use memcpy() in all your examples to fill a Memory object? Normally your data is present in some Vec2Array, Vec3Array, etc., and it would be nice if you could simply pass such an array to a function of the Memory object, which would then fill the Memory with the array's data instead of requiring explicit memcpy's. Besides, the name 'setImage' is again misleading, because it is also used for non-image data. In the osgEndianess example you use setImage to fill a Memory object (not with image data).

Quote:
We derived osgCuda::Geometry exactly from osg::Geometry. So you just need to handle osgCuda::Geometry the same way you handle osg::Geometry.

But if I want to extract the normals from some 2D osgCuda::Geometry that I receive as a resource, how can I do that in a clean manner? As far as I know, you have to use some offset:
Code:
void TestModule::acceptResource(osgCompute::Resource& resource) {
    if (resource.isIdentifiedBy("triangle")) {
        osgCompute::Memory* mem = dynamic_cast<osgCompute::Memory*>(&resource);
        // Guess: do the normals start right after the Vec2f positions?
        devNormals = static_cast<char*>(mem->map()) + sizeof(Vec2f) * numVertices;
    }
}

But what if I want to extract the texture coordinates instead: what should the offset then be? Or any other geometry attribute? This requires low-level knowledge of how the memory is allocated, which the (general) user does not have.

Quote:
And: you can simply disable modules if you don't need them: myModule->disable();

Will this give the control to the user? Or does it also have other consequences?
Back to top
View user's profile Send private message
jens.svt
User


Joined: 16 Mar 2009
Posts: 30

PostPosted: Wed Oct 05, 2011 1:40 pm    Post subject:
Reply with quote

Hi Bart,

sorry again that it took so long to write an answer. Thanks a lot for your support
in improving osgCompute.

Quote:

The thing that bothers me most is the class you have to make when you inherit from
the osgCompute::Module class (TestModule in the example program). First of all, all
your examples use this META_Object macro in the class that inherits from Module, so I
went along at first. This macro, however, forces me to declare a constructor with no
parameters. Removing the macro gave me other errors, which, at that time, I did not want
to solve also, so I complied with the no-parameters constructor.
Perhaps this macro has some great use that I am unaware of (probably it has).
...
Secondly, inheriting from Module requires me to use an init- and launch-function.
In this init-function I seem to be allowed to do CUDA stuff and no sooner.
...
If I move the CUDA stuff to the init-function, everything is fine again. But this
is not what I want, I want the constructor for setting up and initializing everything
and not use an additional init-function. Now I also have to store the constructor's
arguments, so I can use them again in the init-function.


As we use the runtime version of CUDA, it is absolutely necessary to include a class which
encapsulates CUDA programs. However, if you just want to add a simple CUDA operation, use an
osgCompute::LaunchCallback with your kernels, similar to the callbacks used for OpenGL calls
within OSG. We tried many times to allow a developer to use CUDA stuff everywhere in a module.
The point is that OSG initializes an OpenGL context during the first traversal of the scene
graph (see the osgViewer class); however, the constructor of a module is called beforehand.
One solution would be to write a new viewer class which handles the OpenGL contexts differently
(we thought about it, but we currently have no time). If you have another solution for the
problem, that would be great. Another option is to call osgViewer::Viewer::realize() and then
osgCompute::GLMemory::bindToContext( viewerContext ) with the new context of the root camera.
After this you can call your CUDA code everywhere. Please see the following source files:
http://www.cg.informatik.uni-siegen.de/data/Downloads/svt/osgCUDAEverywhere.zip
Here we have included these calls at the start of Main.cpp. Just use CMake to build the
application, or copy the files back into your project.
The macro META_Object will not be necessary in the next version of osgCompute. However, in
the current version it defines the library name and class name of an osg::Object, as all
osg::Objects have to define them. We will also change the init() and launch() functions to
non-abstract functions in the next version of osgCompute.

Quote:

As far as I understand is a map from memory of an osgCuda::Geometry to CUDA context a
simple pointer copy of GL-context to CUDA context. If I map from GPU to a CPU or vice
versa, would osgCompute do a complete copy or provide a pointer and then each time
transfer a data element over the PCI-express bus when requested? It would be nice if
this somehow could be included in the documentation.


A real copy operation is only applied if you map something with osgCompute::MAP_XXX_TARGET.
With the TARGET specifier you tell osgCompute that the memory will be changed. For all other
mappings no copy operation is applied afterwards, as no data has changed. We will extend the
documentation, thank you very much. For us it is clear now, but only because we use it all
the time.

Quote:

My problem is not with the design here, it's name 'Module' that does not convey concrete information
what the class is about. The term 'Module' is a very general term that can represent a lot
of things, so it's not clear to the user what this class is about, unless he investigates
the class in detail.


Would it be helpful to rename it to osgCompute::Program? Please make a suggestion.

Quote:

Then why use memcpy() in all your examples to fill a Memory object? Normally, your data
is present in some Vec2Array, Vec3Array, etc. And it would be nice if you could simply
pass this array to a function of a Memory object, which would then fill the Memory with
the data of the array, instead of doing memcpy's. Besides the term 'setImage' is again
very misleading, because it is also used for non-image things. In the osgEndianess
example you use setImage to fill a Memory object (not with image data).


We implemented such functionality in an earlier version of osgCompute. We moved towards
osg::Image objects, as these objects are better serializable and one can provide plugins to
load or store them. The osgEndianess example was designed to show the flexibility of doing
things in osgCompute. I will change it, as it seems to be confusing instead.

Quote:

But if I want to extract the normals from some 2D osgCuda::Geometry that I receive as
resource, how can I do that in a clean manner? As far as I know you should use some
offset: But what if I want to extract the texture coordinates, what should the offset
then be? Or any other geometry attribute? This requires low-level knowledge of how memory
is allocated, which the (general) user does not know.


I think that is an application-specific concern, and one should not try to encapsulate it
in a general memory-handling system. The user always has to define how his memory is organized.
When you write an algorithm, you have to be sure that subsequent operations can work with the
resulting data (e.g. your result is stored as float4). If you already know that it is a
GLBufferObject, then it is clear that your Module receives the required information from the
accompanying osg::Geometry object. Please tell me if I have misunderstood your question.

Quote:

'...you can simply disable modules if you don't need them: myModule->disable();'
Will this give the control to the user? Or does it also have other consequences?


I do not understand what you mean here. Disabling a module means that you turn off its
functionality. If you want control over the sequence in which modules are called, you
should use an osgCompute::LaunchCallback.

Best regards,
Jens
Bart
Newbie


Joined: 13 Nov 2009
Posts: 9

PostPosted: Sat Oct 08, 2011 1:14 pm    Post subject:
Reply with quote

Thanks for all your answers.

Quote:
As we use the runtime version of CUDA it is absolutely necessary to include a class which
encapsulate CUDA programs. However, if you just want to add a simple CUDA operation use a
osgCompute::LaunchCallback with your kernels, similar to
for OpenGL calls within OSG. We tried a lot of times to allow a developer to use CUDA
stuff everywhere in a module. The point is that OSG initializes an OpenGL-Context during
the first traversal of the scene graph (See osgViewer class). However, the constructor
of a module is called beforehand. One solution would be to write a new viewer class which
handles the OpenGL-contexts differently (We thought about it but we currently have no time).
If you have another solution for the problem that would be great. Another thing is that you
can call osgViewer::Viewer::realize() and osgCompute::GLMemory::bindToContext( viewerContext ).
with that new context of the root camera. After this you can call your CUDA code everywhere.
Please see the following source files:
http://www.cg.informatik.uni-siegen.de/data/Downloads/svt/osgCUDAEverywhere.zip

I see the difficulty now. Although it is not the cleanest solution, I can live with the 'four-lines solution' you provided in your code. Perhaps it might be an idea to add a static function like setupOsgCompute(Viewer &v) or initOsgCompute(Viewer &v), which executes the 'four-lines solution' on the viewer it receives by reference. This way you hide the low-level details from the user. My suggestion might be a problem for people who want to use multiple GL contexts (I personally have never used multiple GL contexts). You could also rename the function to setupSingleContextOsgCompute(Viewer &v) (a bit verbose, but clear).

Editing the Viewer class would indeed be the best solution. Perhaps some cooperation between the maintainer of the Viewer class and you guys might solve this issue?

As for the LaunchCallback solution you suggest: I never thought it could be used like that. When I read the documentation of this class, I thought its purpose was to change the launch order of the modules. Since I only use one module, I did not look into that class any further. Well, it seems it can be used for other things as well. Thanks for the suggestion.


Quote:
Would it be helpful to rename it to osgCompute::Program? Please make a suggestion.

The documentation says the following about Module and Computation:
"A module is the base class to implement application specific parallel algorithms working on resources. Modules implement a strategy design pattern in connection with a computation node. Think of a module as a separated algorithm executed once in each frame like osg::Program objects are executed during rendering. However, modules are much more flexible as execution is handed over to the module."
"A computation is a container where you can add your osgCompute::Module objects, just like osg::Program is a container for osg::Shader objects"
The actual calls to CUDA kernels reside in Module, and it is the place where we allocate and set CUDA memory, so I would call it Computation. Program is also okay, because this class sets up the CUDA memory and launches the CUDA computation/program.
The Computation class is currently a container class for Modules. If you want to keep your analogy with shader programs (where a program contains vertex, geometry, or fragment shaders), you could call this class Program; in that case a Program is a container for Computations. However, if you prefer to rename Module to Program, then you could rename Computation to ProgramContainer (or ProgramCollection). Then the user instantly sees that those two classes belong together and how they are connected to each other.


Quote:
I think that is an application specific concern and one should not try to encapsulate it
in a general memory handling system. The user always has to define how its memory is organized.
When you write an algorithm you have to be shure that subsequent operations can work with the
resulting data (e.g. your results is stored as float4). If you already know that it is an
GLBufferObject than it is clear that your Module receives the required informations from the
accompanying osg::Geometry object. Please tell me if i understood your question wrong.

I think you misunderstood me. The user does not know how an osgCuda::Geometry object organizes its memory. For example, I construct a TestModule tm and an osgCuda::Geometry geo, and set geo's 3D vertices, 3D normals, and 2D texture coordinates. I attach the identifier "thisIsAGeo" to geo and then call tm->addModule(geo->getMemory()). In the TestModule class there is an acceptResource function, which is called with geo as argument. Now I want to extract the vertices, normals, and texture coordinates from this geo object. How do I do this? The following code demonstrates what I mean:
Code:
void TestModule::acceptResource(osgCompute::Resource& resource) {
    if (resource.isIdentifiedBy("thisIsAGeo")) {
      osgCompute::Memory* mem = dynamic_cast<osgCompute::Memory*>(&resource);
      char* base = static_cast<char*>(mem->map()); // pointer to the geometry's memory
      // Is it first positions, then normals, then texCoords?
      positions = base;
      normals   = base + numVertices * sizeof(Vec3f);
      texCoords = base + 2 * numVertices * sizeof(Vec3f);
      // Or is it first positions, then texCoords, then normals?
      positions = base;
      texCoords = base + numVertices * sizeof(Vec3f);
      normals   = base + numVertices * (sizeof(Vec3f) + sizeof(Vec2f));
      // Or is it first normals, then positions, then texCoords?
      normals   = base;
      positions = base + numVertices * sizeof(Vec3f);
      texCoords = base + 2 * numVertices * sizeof(Vec3f);
      // Or is it...?
      // I don't know, because I don't know how osgCuda::Geometry organizes its memory.
   }
}



Quote:
I do not understand what you mean here. To disable a module means that you can turn off the functionality of a module

Will this only stop the module from calling launch each frame (i.e. disable the callback), so that if I call launch manually whenever I want, it still executes and hence my CUDA stuff? Or will the module stop working completely: ignore launch hints, no longer accept resources, remove resources, get resources, etc.?


Based on your statements about the new version of osgCompute, I am curious about it and looking forward to it. Is there a (rough) release date for this new version?
jens.svt
User


Joined: 16 Mar 2009
Posts: 30

PostPosted: Mon Oct 10, 2011 9:33 am    Post subject:
Reply with quote

Hi Bart,

We renamed the modules and computation nodes, and we also got rid of the META_Object macros for user-defined computations.
Check out the current SVN version to see it done.
Check out the example as well:
http://www.cg.informatik.uni-siegen.de/data/Downloads/svt/osgCUDAEverywhere.zip

However, we did not include geometry offsets. We are currently
discussing this topic internally.

Best regards,
Jens
Bart
Newbie


Joined: 13 Nov 2009
Posts: 9

PostPosted: Fri Oct 14, 2011 12:13 pm    Post subject:
Reply with quote

Hi Jens,

I have finally come around to checking your new version, and I like it. In particular, the class that inherits from the new Computation class is a lot cleaner now.

As for the geometry-offset problem: it would be nice if you could let us know how you tackled it.

One last thing: you release single-threaded versions, and for the multi-threaded version we should contact you guys. What is the particular reason behind this?

Best regards,
Bart
Mick
User


Joined: 11 Mar 2009
Posts: 31

PostPosted: Fri Oct 14, 2011 12:56 pm    Post subject:
Reply with quote

Hi Bart!

Bart wrote:

One last thing: you release single-threaded versions and for the multi-threaded version we should contact you guys. What is the particular reason behind this?



Yes, we fully supported OSG's multi-threading options in an earlier version. Unfortunately, it became more and more difficult for us to keep new functions in osgCompute compatible with the multi-threading of the OSG viewer. Thus, for now we stay with the single-threaded version.

Best regards,
Mick

_________________
SVT Group
Bart
Newbie


Joined: 13 Nov 2009
Posts: 9

PostPosted: Fri Oct 21, 2011 9:10 am    Post subject:
Reply with quote

After reading that osgCompute is now single-threaded, I first took this for granted. Still, I am now questioning how bad it is that the viewer executes in a single thread. Does anybody know the impact on performance? Is it significant? And is the size of the impact only proportional to the scene graph attached to the viewer?
jens.svt
User


Joined: 16 Mar 2009
Posts: 30

PostPosted: Fri Oct 28, 2011 12:35 pm    Post subject:
Reply with quote

Hi Bart,

We did not run any explicit tests of the performance of the viewer
itself. With our applications, the single-threaded version has always been
faster. However, we did try to use multiple graphics windows (contexts),
which turned out to be not so easy in combination with CUDA.

Best regards,
Jens
sholmes
User


Joined: 05 Sep 2011
Posts: 62

PostPosted: Mon Apr 07, 2014 8:04 pm    Post subject:
Reply with quote

Should this (osgCUDAEverywhere) build with the version of osgCompute that is in SVN?

CD

jens.svt wrote:
Hi Bart,

We renamed modules and computation nodes. We also get rid of the META_object macros for user defined computations.
Check out the current SVN version and see it done.
Check out the example as well:
http://www.cg.informatik.uni-siegen.de/data/Downloads/svt/osgCUDAEverywhere.zip

However, we did not include geometry offsets. We currently
have a discussion about this topic internaly.

Best regards,
Jens