
Yes, a (labeled) tree on n vertices can be represented with O(n log n) bits, and this is optimal by Cayley's formula: http://en.wikipedia.org/wiki/Cayley%27s_formula So you're right in pointing out you need more than n bits overall. (But notice that any bound that is polynomial in n log n, or in n^10 for that matter, is also polynomial in n).
No, it is not necessary (nor does it make much sense) in order to prove that a certain problem is NPcomplete to show that there is no smaller representation. Input representation is in fact part of the problem definition. A problem can become significantly easier or harder if instances are encoded in a different way. For a trivial example, notice that if your instances to SAT are formulas padded with 2^n garbage bits, then you obviously can solve this "variant" of SAT in polynomial time.
The reason why this is often glossed over is that usually all "reasonable" representations are polynomially related in size (and moreover you can turn one into another in polynomial time), and it doesn't matter which one you use as long as you don't try to "cheat" (e.g. by including in the encoding data that is hard to compute from the original representation, or making it significantly longer).
As for good books, the best one on complexity is in my opinion Arora and Barak's book, you can find a draft online at http://www.cs.princeton.edu/theory/complexity/ I'm not sure if it is the most suitable for beginners (it covers a huge amount of material), but you may be interested in having a look at the first few chapters. 