JOIN
Get Time
forums   
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (newest first)  | Threaded  | Tree
Just process wavefront by wavefront? | Reply
Wavefront processing looks interesting. The funny thing about algorithms is they always look easy when they're explained well, otherwise they're really hard...

Anyway, I couldnt help thinking, is there any reason why we cant just divide each wavefront by the number of threads, and process wavefront by wavefront?

That way, there is little or no idle time issue (just rounding issues when you divide the length of the wavefront by the number of threads), and the communication is limited to the triggering of the next wavefront.

As long as each thread uses the same algorithm for working out which bit of the current wavefront it is responsible for, just with a different thread number (1,2,3...), then communications is very small?

So, if we have three threads, 1,2,3. First wave would be:


1


next would be:


2
1


next would be:


3
2
1


Then:


1
3
2
1


... and so on.

Obviously for the first few wave fronts, there's a little idle time, because the wavefront is small compared to the number of threads, but those will pass fairly quickly, and after that it will be quite evenly spread out?
Re: Just process wavefront by wavefront? (response to post by hughperkins) | Reply
What you have described is the 1x1 block pattern that I noted in the article and provides very little idle time as you have noted.

There are two downsides to it however:
1) Load balancing issues could arise - take the example algorithm whose cell value depends on the cells before it. In your example, take the wave to the bottom right 3 cells - you'll see thread 3 (or whichever thread happens to process the bottom right cell) really get's hit hard because it does the most and the second most expensive processing while thread 2 only does the second most expensive and thread 1 doesn't do either. And in fact, if you back it up a few more waves - you'll see that thread 3 really get's alot more of the load than the other two.

2) Communications could be an issue with this scheme because each cell (which is processed by a single thread) will always [with the exception of the first and last few waves) need to communicate with 2 other threads. This can cause issues if the cell calculations have little CPU (like in the Lev. algorithm) because the cells can be calculated much faster than the data can flow between threads (ie they synchronization overhead). Choosing larger block sizes would allow you to reduce this at the cost of more idle time because a cell may not need to worry about the communication overhead.
RSS