This repository was archived by the owner on Mar 20, 2023. It is now read-only.
Description [copy/paste of internal issue created by @pramodk]
In the GPU implementation of the synapse model, we need to set up Random123 streams on the GPU. This is currently done in bbcore_read, once for every synapse instance:
static void bbcore_read(double* x, int* d, int* xx, int* offset, _threadargsproto_) {
    assert(!_p_rng);
    uint32_t* di = ((uint32_t*)d) + *offset;
    if (di[0] != 0 || di[1] != 0) {
        nrnran123_State** pv = (nrnran123_State**)(&_p_rng);
        *pv = nrnran123_newstream(di[0], di[1]);
    }
    *offset += 2;
}
For each stream we launch a kernel that does serial work (a single thread in a single block) just to initialize that one stream:
/* nrn123 streams are created from cpu launcher routine */
nrnran123_State* nrnran123_newstream(uint32_t id1, uint32_t id2) {
    nrnran123_State* s;
    cudaMalloc((void**)&s, sizeof(nrnran123_State));
    /* zero the device allocation itself: pass s, not the address of the host pointer */
    cudaMemset(s, 0, sizeof(nrnran123_State));
    nrnran123_setup_cuda_newstream<<<1, 1>>>(s, id1, id2);
    cudaDeviceSynchronize();
    return s;
}
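This is terribly slow and inefficient! Every synapse instance pays for its own cudaMalloc, a <<<1, 1>>> kernel launch, and a cudaDeviceSynchronize. A sample profile of a small 5 msec simulation accompanied the original issue (the image is not reproduced here).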
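One possible direction, sketched here under assumptions and not taken from the CoreNEURON sources, is to collect the (id1, id2) pairs on the host while the synapses are read and then initialize all streams with a single allocation, a single kernel launch, and a single synchronization. In the sketch below, `DemoState`, `init_streams`, and `new_streams_batched` are hypothetical names; `DemoState` only stands in for `nrnran123_State`, and the real per-stream setup would replace the body of the kernel.

```cuda
#include <cstddef>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for nrnran123_State (assumption: the real struct differs).
struct DemoState {
    uint32_t id1;
    uint32_t id2;
    uint32_t which;
};

// One thread initializes one stream, so n streams need only one launch.
__global__ void init_streams(DemoState* states, const uint32_t* ids, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        states[i].id1 = ids[2 * i];
        states[i].id2 = ids[2 * i + 1];
        states[i].which = 0;
    }
}

// ids holds (id1, id2) pairs gathered on the host, two entries per stream.
// Returns a device array with one state per pair, or nullptr on failure.
DemoState* new_streams_batched(const std::vector<uint32_t>& ids) {
    const size_t n = ids.size() / 2;
    if (n == 0) return nullptr;

    DemoState* d_states = nullptr;
    uint32_t* d_ids = nullptr;
    if (cudaMalloc((void**)&d_states, n * sizeof(DemoState)) != cudaSuccess) {
        return nullptr;
    }
    if (cudaMalloc((void**)&d_ids, 2 * n * sizeof(uint32_t)) != cudaSuccess) {
        cudaFree(d_states);
        return nullptr;
    }
    cudaMemcpy(d_ids, ids.data(), 2 * n * sizeof(uint32_t), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = static_cast<int>((n + threads - 1) / threads);
    init_streams<<<blocks, threads>>>(d_states, d_ids, n);

    // A single synchronization covers the whole batch.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_ids);
    if (err != cudaSuccess) {
        cudaFree(d_states);
        return nullptr;
    }
    return d_states;
}
```

With this layout, the i-th synapse instance would point its `_p_rng` at `&d_states[i]` instead of at a separately allocated state. That step assumes the instances can be revisited after all of them have been read, which the issue does not confirm.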