

1) Can somebody explain how come in update we can simply substitute the +from read() with a  ? The way we define responsability and the rightmost nonzero bit stuff, it is not obvious to me at all why is this reversible?
2) What does "scaling the whole" tree mean? Do we want to scale each frequency (f[i]), or each tree entry (t[i]), and what's the relation between these 2 cases? In void scale(int c){ for (int i = 1 ; i <= MaxVal ; i++) tree[i] = tree[i] / c; } I see division by c, was this the idea? Because if it is scaling, I would imagine we want to multiply... 

1) In update, you are updating responsible nodes, but in read, you are reading sum of the whole subtree. If idx is a10...0 (0 n times), then you will read all nodes with indexes a0b, where b is of length n. I think you should take a look at tree and try to figure out how does it work.
2) those two cases are the same. Let us take frequency F. It is some of entries (doesn't important which) and F can be written as F = e1 + e2 + ... + ek. If you scale F by s, then you will have F / s = (e1 + ... + ek) / s = e1 / s + ... + ek / s, what is the same as you scaled each entry.
If you don't like tree[i] / c, then replace "/ c" by " * (1 / c)" and you will multiply bu its inverse :) 

Yes, but each frequency participates in multiple tree[] entries, therefore, by dividing each tree entry by c, some individual frequency might end up divided more than once, whereas each tree entry gets divided once. So how are they equivalent? 

Also, can you please show me read() and readSingle() for 2 dimensions at least? 

<cite>Yes, but each frequency participates in multiple tree[] entries, therefore, by dividing each tree entry by c, some individual frequency might end up divided more than once, whereas each tree entry gets divided once. So how are they equivalent?</cite>
But individual frequency is sum of entries, so when you divide all entries by s it is same as you divide frequency by s. Maybe I don't understand your question, so please, can you give example where it "doesn't work"?
2D (it is almost same as update, changes are identical as in 1D case)
int read(int x , int y){
int y1, ret = 0;
while (x){
y1 = y;
while (y1){
ret += tree[x][y1];
y1 = (y1 & y1);
}
x = (x & x);
}
return ret;
}
I can write you readSingle, but I will try to explain you the difference, so you can try it by yourself. As you probably understood from the article, (x, y) means the field is responsible for some interval of xes (which is calculated in the same way as in 1D) and each of those is responsible for same interval of yes. So, as you just "copied" the same code twice to solve update and read (or called procedure with different parameters), try to done it in this case too. 

Yes, I treied with examples, and you are right, dividing each seperate frequency, and each tree entry by the same number has the same effect. Just for the record, I naively thought that as you divide each tree entry by c, each frequency participating in it gets dividied by c, therefore individual frequencies get divided many times. Thanks a lot I guess one final issue I have is related to reading individual frequencies : I wrote out the stuff with z on paper, did the math, and saw it works, but what would be a more natural way to represent it in my head? How should I view it in order to make it more natural? Maybe you can direct me to some illustrations of trees where I can see it more clearly? 

Well, at image 1.7 is shown (circled) what should be subtracted. Actually, it is kind of optimization. You can always read single frequency by read(x, y)  read(x, y  1)  read(x  1, y) + read(x  1, y  1)
But as explained for 1D, where you can do read(x)  read(x  1), it is easier to find place where their calculation starts to be the same  read(x), in some way, follows a tree and sum frequencies in the same way read(x  1) is doing that. The main idea of optimization is to throw away part of the paths where they are same. The same stands for nD (for any n).
EDIT: typos :) 

I see. So if we proceed beyond z, we will end up adding, and then subtracting the same stuff from ret, thus doing unnecessay work, thus we first add, and then subtract the parts of tree[x] and tree[x1] that they don't share. 

How would you read the total inside a rectangle with corners (x1, y1) and (x2, y2) ? I guess the common part can be defined as x = x1&x2 and y = y1&y2, but what to do with them? 

Well, you can do read(x2, y2)  read(x2, y1  1)  read(x1  1, y2) + read(x1, y2) (principle of inclusion and exclusion). 
