Hope this clarifies your doubt.]]>

I've been studying this topic lately, and I came out with a question which answer I have not been able to figure out.

First of all, this question is actually about matching in general graphs, not bipartite graphs. It turns out that the algorithm for solving this problem is very similar to "Kuhn's algorithm" but when it finds an odd-length cycle it shrinks it into a single node and continues the search. The algorithm is called "Edmonds' Blossom algorithm".

However, it is well known that a given matching is the maximum one if and only if no more augmenting paths can be found on the graph. So, my question is the following one:

Kuhn's algorithm simply finds an augmenting path in the graph for including new matched edges. Therefore in principle it should also work for general graphs: if no more augmenting paths are found, then necessarily we have a maximum matching!. So I implemented the algorithm adding some extra information in the DFS for avoiding cycles and tested it for bipartite graphs and worked perfectly, however when it comes to general graphs it fails... of course the reason is because there are augmenting paths the algorithm cannot find (apparently because of odd-length cycles, a.k.a. blossoms), but why?

All of this converges to a simple question: why in Edmonds' algorithm is it necessary to shrink blossoms? I've read many papers where it is explained how to shrink blossoms and why an augmenting path in the reduced graph is still valid in the expanded one, but they never explain why this is actually necessary!

Thanks for your help.]]>

Solving this problem using maximum flow algorithm is obvious. We will create two new vertices:

All used edges between left and right parts are maximum matching. It is obvious why it works. It is easier to understand, especially if you understand and know how to code max flow algorithm.

#include<vector> #include<utility>usingnamespacestd;classMaxFlowImplementation {public: vector<vector<int> > g; vector<bool> used;intn, k;boolfind_path(inta,intb) {if(a == b)returntrue;if(used[a])returnfalse; used[a] =true;for(inti = 0; i < n+k+2; i++)if(g[a][i] > 0 && find_path(i, b)) { g[a][i]--; g[i][a]++;returntrue; }returnfalse; } vector<pair<int, int> > find_max_matching(vector<vector<int> > &v,int_n,int_k) { //v[i] is a list of adjacent vertices to vertex i, where i is from left part and v[i][j] is from right part n = _n; //n is number of vertices in left part of graph k = _k; //k is number of vertices in right part of graph g = vector<vector<int> >(n+k+2, vector<int>(n+k+2)); //g[i][j] = 1 if there is edge between vertex i from left part // and vertex j from right partfor(inti = 0; i < v.size(); i++)for(intj = 0; j < v[i].size(); j++) g[i][v[i][j]] = 1;intS = n+k;intT = n+k+1;for(inti = 0; i < n; i++) g[S][i] = 1;for(inti = 0; i < k; i++) g[n+i][T] = 1; vector<vector<int> > _g(g); used = vector<bool> (n+k+2,false);while(find_path(S, T)) fill(used.begin(), used.end(),false); vector<pair<int, int> > res;for(inti = 0; i < n; i++)for(intj = n; j < n+k; j++)if(g[i][j] < _g[i][j]) res.push_back(make_pair(i, j));returnres; } };

Solving maximum bipartite problem can be useful to solve problems using Hungarian algorithm, minimum vertex cover, etc. To tell the truth, while max - flow implementation is easier to understand and implement, Kuhn algorithm is more effective, especially improved implementation.

Good example of using this algorithm in solving Topcoder Problem is PythTriplets Problem. We have numbers. Some pairs of these numbers are good. For example, we have {a, b, c, d}. Let good pairs be: (a,b) (b,c) (c,d). We need to select maximum amount of pairs so that no number to be in different pairs. Here they are (a,b) and (c,d).

Of course first you need to do is build a graph. Each vertex is a number. After creating edges between vertices (create edge between

First no edge can be between even numbers because

Second no edge can be between odd numbers because

= 4*x*x + 4*x + 1 + 4*y*y + 4*y + 1 =

= 4*x*x + 4*x + 4*y*y + 4*y + 2 =

= 2*(2*x*x + 2*x + 2*y*y + 2*y + 1) =

= 2 * z;

and we can't divide

We will go through vertices and call

After that we just find maximum matching and return it's size:

classPythTriplets {public:intgcd(inta,intb) {returnb?gcd(b, a%b):a; }booledge(inta,intb)//is there an edge between a and b? {longlong_a = a, _b = b, n = _a*_a + _b*_b;longlongsqr = (longlong)(sqrt((double)(n))+1e-9);if(sqr*sqr != n)returnfalse;returngcd(a, b) == 1; }intfindMax(vector <string> stick) { string s;for(inti = 0; i < stick.size(); i++) s += stick[i]; vector<int> v; istringstream is(s);inta;while(is >> a) v.push_back(a); vector<int> nums;//even numbers and then odd numbers(first n numbers are even, the rest are odd)for(inti = 0; i < v.size(); i++)if(v[i]%2 == 0) nums.push_back(v[i]);intn = nums.size(), k = v.size() - n;for(inti = 0; i < v.size(); i++)if(v[i]%2 == 1) nums.push_back(v[i]); vector<vector<int> > g(nums.size());for(inti = 0; i < v.size(); i++)for(intj = i+1; j < v.size(); j++)if(edge(nums[i], nums[j])) g[i].push_back(j); //if you use Kuhn algorithm KuhnImplementation obj; // or if you use max flow algorithm //MaxFlowImplementation objreturnobj.find_max_matching(g, n, k).size(); } };

Another good example of using this algorithm is problem RookAttack, here we have chessboard with n rows and k columns and we have to place maximum number of rooks there so that no of them will attack each other. But we also have some broken cells, where we can't place rooks.

So we create a bipartite graph, where rows are vertices from left part of graph and columns are vertices from right part of graph and we create an edge between row and column if this cell is not broken. After that we just find maximum bipartite matching and return its size. It works because if we use cell

]]>classRookAttack {public:inthowMany(intn,intk, vector <string> cuts) { vector<vector<int> > g(n); vector<vector<bool> > _g(n, vector<bool> (k,true));for(inti = 0; i < cuts.size(); i++) { istringstream is(cuts[i]);inta, b;charc;while(is >> a >> b) { _g[a][b] =false; is >> c; } }for(inti = 0; i < n; i++)for(intj = 0; j < k; j++)if(_g[i][j]) g[i].push_back(j+n); //if you use Kuhn algorithm KuhnImplementation obj; // or if you use max flow algorithm //MaxFlowImplementation objreturnobj.find_max_matching(g, n, k).size(); } };

In this article we shall speak about Solving Maximum Bipartite Matching Problem. There is a bipartite graph containing

Chain with length

Matching is maximal if and only if there are no increasing chains with respect to it.

So, let’s notice that if we find an increasing chain, we can increase our matching by one. We will go along this increasing chain and mark edges which are not marked and unmark edges that were marked (first edge is not marked by definition – we will mark it, second is marked, so we will unmark it and so on). It is obvious that by doing this operation we will increase our matching by one, because length of increasing chain is always odd and because in this chain we had [k/2]+1 unmarked edges and [k/2] marked edges and after rematching we have [k/2] unmarked edges and [k/2]+1 marked edges in the chain.

So the main idea of the algorithm is to search increasing chains while we can and increase matching.

Well, new problem is how to find increasing chains. Kuhn’s algorithm is based on

Complexity of searching an increasing chain is

Let’s see how we will search increasing chains: we shall use

boolkuhn(vertex v) {if(used[v])returnfalse; used[v] =true;for(vertex q adjacent to v) {if((q has no pair)orkuhn(pairs[q])) { pairs[q] = v;returntrue; } } } find_max_matching {for(vertex v = {1,..,n}) { used = {0}; kuhn(v); } }

#include<vector> #include<utility>usingnamespacestd;classKuhnImplementation {public:intn, k; vector < vector<int> > g; vector<int> pairs; vector<bool> used;boolkuhn (intv) {if(used[v])returnfalse; used[v] =true;for(inti = 0; i < g[v].size(); ++i) {intto = g[v][i]-n;if(pairs[to] == -1 || kuhn (pairs[to])) { pairs [to] = v;returntrue; } }returnfalse; } vector<pair<int, int> > find_max_matching(vector<vector<int> > &_g,int_n,int_k) { g = _g; //g[i] is a list of adjacent vertices to vertex i, where i is from left patr and g[i][j] is from right part n = _n; //n is number of vertices in left part of graph k = _k; //k is number of vertices in right part of graph pairs = vector<int> (k, -1); //pairs[i] is a neighbor of vertex i from right part, on marked edge used = vector<bool> (n,false);for(inti = 0; i < n; ++i) { fill(used.begin(), used.end(),false); kuhn (i); } vector<pair<int, int> > res;for(inti = 0; i < k; i++)if(pairs[i] != -1) res.push_back(make_pair(pairs[i], i+n));returnres; } };

Let’s modify algorithm in next way. Do not clear used marks after you find one path. Instead of it run a series of DFS-es over all vertices in a single phase. One such phase takes strictly O(V+E) time (graph full traversal) and can find more than one increasing path at once. Moreover, the first phase will behave precisely as greedy algorithm (which is also improvement). After running each phase you should clear used and run the next phase. Terminate when no path is found during one phase.

#include<vector> #include<utility>]]>usingnamespacestd;classKuhnImplementation {public:intn, k; vector < vector<int> > g; vector<int> pairs_of_right, pairs_of_left; vector<bool> used;boolkuhn (intv) {if(used[v])returnfalse; used[v] =true;for(inti = 0; i < g[v].size(); ++i) {intto = g[v][i]-n;if(pairs_of_right[to] == -1 || kuhn (pairs_of_right[to])) { pairs_of_right[to] = v; pairs_of_left[v] = to;returntrue; } }returnfalse; } vector<pair<int, int> > find_max_matching(vector<vector<int> > &_g,int_n,int_k) { g = _g; //g[i] is a list of adjacent vertices to vertex i, where i is from left patr and g[i][j] is from right part n = _n; //n is number of vertices in left part of graph k = _k; //k is number of vertices in right part of graph pairs_of_right = vector<int> (k, -1); pairs_of_left = vector<int> (n, -1); //pairs_of_right[i] is a neighbor of vertex i from right part, on marked edge //pairs_of_left[i] is a neighbor of vertex i from left part, on marked edge used = vector<bool> (n,false);boolpath_found;do{ fill(used.begin(), used.end(),false); path_found =false; //remember to start only from free vertices which are not visited yetfor(inti = 0; i < n; ++i)if(pairs_of_left[i] < 0 && !used[i]) path_found |= kuhn (i); }while(path_found); vector<pair<int, int> > res;for(inti = 0; i < k; i++)if(pairs_of_right[i] != -1) res.push_back(make_pair(pairs_of_right[i], i+n));returnres; } };

Do not clear used marks after you find one path. Instead of it run a series of DFS-es over all vertices in a single phase. One such phase takes strictly O(V+E) time (graph full traversal) and can find more than one increasing path at once. Moreover, the first phase will behave precisely as greedy algorithm, so it supersedes you improvement. After running each phase you should clear used marks and run the next phase. Terminate when no path is found during one phase.

To apply this improvement to your simple implementation do the following:

1. Store pairs for vertices of left part too.

Add declaration: vector<int> pair2;

Add initialization: pair2 = vector<int> (n, -1);

Do not forget to set it in DFS:

pairs [to] = v;

pair2 [v] = to; //added line

return true;

2. Replace your DFS(kuhn) calls code in find_max_matching with this:

intphases = 0;boolhaschanged;do{ fill(used.begin(), used.end(),false); haschanged =false; //remember to start only from free vertices which are not visited yetfor(inti = 0; i < n; ++i)if(pair2[i] < 0 && !used[i]) haschanged |= kuhn (i); phases++; }while(haschanged);

The full working time is O(P(N + M)) where P is number of phases. It is clear that it won't exceed min(n, k)<=V.

I've just run your solution with this modification on PythTriplets problem with assert(phases <= 10). It is OK. 10 is much less than 100=)

This acceleration is also applicable to maxflow problem with unit capacities. Run DFS series in a single phase without clearing used in between. And do not use "used" mark for source/sink.]]>

About PythTriplets Problem. A lot of people solved it without knowing that the graph is bipartite (me included). The algorithm may be easily modified to work for arbitrary graph heuristically. This heuristics is often accepted unless you meet serious problemsetter=) For example, in problem Work scheduling DFS heuristics was accepted until Burunduk1 added some tests. Now it seems to be unsolvable in this incorrect way.]]>