Get Time
Search | Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (newest first)  | Threaded  | Tree
Previous Thread  |  Next Thread
Vector Resize Hack | Reply

In topcoder competitions all the data is passed in standard containers. Usually it is OK but in high-performance marathons where processing speed is the main goal copying a lot of data is too slow. It also makes SSE cumbersome to use on input/output.


Since the problem appears only in high-performance marathons, only C++ is considered here (other languages have little chance in such marathons anyway). The competitors have to apply very dirty hacks to std::vectors to cope with the problems described above.

The example below shows a hack to attach vector to custom memory buffer (thanks to iquadrat). It is very unpractical, but it works well with GCC and MSVC (release, no IDE).

template<class T> void ResizeVectorHack(std::vector<T> &v, int size) {
  T* data = (T*)_mm_malloc(size * sizeof(T), 16);  //allocate aligned buffer
  T** ptr = (T**)&v;
  v.resize(1);                                     //leads to memory leak, but who cares...
  while (*ptr != &v[0]) ptr++;                     //MSVC adds some trash to std::vector
  *(ptr++) = data;
  *(ptr++) = data + size;
  *(ptr++) = data + size;
std::vector<float> CompetitorsMainMethod(const std::vector<float> &input) {
  std::vector<float> res;
  ResizeVectorHack(res, input.size());
  memcpy(&res[0], &input[0], res.size() * sizeof(res[0]));
  return res;


Here CompetitorsMainMethod is the method which is called by the testing system. The only thing it does is duplicates the input and returns it. Note that ResizeVectorHack calls _mm_malloc only for SSE alignment. If you don't use SSE you can replace it with ordinary malloc call.

Let's look into input/output in more detail:
1. The input is declared as const reference. You can change the type of any input parameter to const reference freely although there is no place in rules that admits it. Since only reference is passed, no copy occurs here.
2. The vector buffer is allocated via low-level C routines. malloc performs no initialization - no overhead here. The three necessary pointers to the data buffers are written to the vector internal members.
3. The output vector is returned by value. Does it mean that it is copied on return? Most certainly not because of the return value optimization. The C++ standard allows the compiler to ignore the copy + destructor calls of the function return value. There are several requirements to do it: the returned object must be created on the stack of function considered and be returned in all the possible execution pathes. In this particular case profiling on GCC and MSVC shows that no copy and destruction occurs.

Also note that it is useless to declare the return value as a const reference: the copy on return will still happen in this case.
Some time ago the static initialization hack was working in marathons. The static initialization of global members was not accounted in the work time measurement. Unfortunately this bug in the tester was fixed and the hack is no longer useful.
Re: Vector Resize Hack (response to post by syg96) | Reply
If you are using gcc locally, here is this code. The memory alloc could be changed from using the default reserve to making aligned allocations, if desired.
template<typename T>
class vector_uninit : public vector<T>
    T* set_size(typename vector<T>::size_type new_size)
        this->_M_impl._M_finish = this->_M_impl._M_start + new_size;
        return this->_M_impl._M_start;

Note that with this method, it is necessary before returning the vector_uninit to swap it into a normal vector. Otherwise it will be copied.
vector_uninit<int> v;
// ...
vector<int> retval;
return retval;
Re: Vector Resize Hack (response to post by syg96) | Reply
I want to note here something that is not directly related to std::vector but more to memory allocation in general. malloc will allocate memory to use. This memory needs to come from somewhere. This can be A)from the heap of the process or B) a call to mmap (I am assuming that a environment with glibc is used). Which of the two depends on the size and the mallopt settings.

In case A) there are two additional possibilities. Either an unused block is found in the heap and reused or the heap segment must be enlarged. Now, in the case of the enlargment, and also whenever mmap is used, there will be new pages associated to the process. These pages are initially unmapped in the virtual memory table. Upon first access, the OS will associate a physical page with the virtual page and zero initialize the page. (The zero initialization is to prevent one process getting used memory of another process and thereby possibly confidential information.)

This means although you might avoid the zero-initialisation by the stl using the resize hack, you still might have the cost of zero-initializing memory by the OS. Often, this cost cannot be avoided. But when many vectors are allocated and deallocated you can make sure that the process memory will be reused by following tuning of the memory allocator:
mallopt(M_TRIM_THRESHOLD, -1)
mallopt(M_MMAP_MAX, 0)

The first call makes sure that the processes heap segement never gets smaller, the second call disables mmap. This makes sure that physical memory once allocated to the process stays so.