



NAME Solving Maximum Bipartite Matching Problem
PROBLEM In this article we shall speak about Solving Maximum Bipartite Matching Problem. There is a bipartite graph containing N vertices (n vertices in left part and k = Nn vertices in right part of graph) and M edges. We are to find maximum bipartite matching, i.e. mark maximum number of edges, so that no one of them have adjacent vertices with each other. This problem can be easily solved in two ways.
SOLUTION First way: Kuhn’s algorithm.
Chain with length k is a simple path (i.e. it contains no repeated vertices or edges) that has k edges in the bipartite graph. Alternating chain (in a bipartite graph, with respect to some matching) is a chain in which the edges alternately belong / not belong to the matching. The increasing chain (in a bipartite graph, with respect to a matching) is an alternating chain, whose initial and final vertices (as well as edges) do not belong to matching.
Berja’s theorem: Matching is maximal if and only if there are no increasing chains with respect to it.
So, let’s notice that if we find an increasing chain, we can increase our matching by one. We will go along this increasing chain and mark edges which are not marked and unmark edges that were marked (first edge is not marked by definition – we will mark it, second is marked, so we will unmark it and so on). It is obvious that by doing this operation we will increase our matching by one, because length of increasing chain is always odd and because in this chain we had [k/2]+1 unmarked edges and [k/2] marked edges and after rematching we have [k/2] unmarked edges and [k/2]+1 marked edges in the chain.
So the main idea of the algorithm is to search increasing chains while we can and increase matching.
Well, new problem is how to find increasing chains. Kuhn’s algorithm is based on dfs (depth first search) or bfs (breadth first search) algorithm.
Complexity of searching an increasing chain is O(N+M) and maximum number of them is N/2, overall asymptotic is O(N*N + N*M) = O(NM)
Let’s see how we will search increasing chains: we shall use dfs. We will call dfs only from vertices of graph’s left part. From left part it goes to right only using not marked edges, and from right to left only using marked edges. In our implementation dfs will be called only from left graph’s vertices and will return bool – true if it found chain and false if not. From current vertex algorithm will go through all adjacent not marked edges to vertex from right part called TO. If TO has no adjacent marked edges, dfs will return true, because it is last vertex of increasing chain. If it has, we will call dfs from TO’s neighbor on marked edge and return true if it returned true.
Pseudocode:
bool kuhn(vertex v)
{
if (used[v]) return false;
used[v] = true;
for(vertex q adjacent to v)
{
if((q has no pair) or kuhn(pairs[q]))
{
pairs[q] = v;
return true;
}
}
}
find_max_matching
{
for(vertex v = {1,..,n})
{
used = {0};
kuhn(v);
}
}
Implementation (C++):
#include<vector>
#include<utility>
using namespace std;
class KuhnImplementation
{
public:
int n, k;
vector < vector<int> > g;
vector<int> pairs;
vector<bool> used;
bool kuhn (int v)
{
if (used[v]) return false;
used[v] = true;
for (int i = 0; i < g[v].size(); ++i)
{
int to = g[v][i]n;
if (pairs[to] == 1  kuhn (pairs[to]))
{
pairs [to] = v;
return true;
}
}
return false;
}
vector<pair<int, int> > find_max_matching(vector<vector<int> > &_g, int _n, int _k)
{
g = _g;
//g[i] is a list of adjacent vertices to vertex i, where i is from left patr and g[i][j] is from right part
n = _n;
//n is number of vertices in left part of graph
k = _k;
//k is number of vertices in right part of graph
pairs = vector<int> (k, 1);
//pairs[i] is a neighbor of vertex i from right part, on marked edge
used = vector<bool> (n, false);
for (int i = 0; i < n; ++i)
{
fill(used.begin(), used.end(), false);
kuhn (i);
}
vector<pair<int, int> > res;
for(int i = 0; i < k; i++)
if(pairs[i] != 1)
res.push_back(make_pair(pairs[i], i+n));
return res;
}
};
Improved implementation:
Let’s modify algorithm in next way. Do not clear used marks after you find one path. Instead of it run a series of DFSes over all vertices in a single phase. One such phase takes strictly O(V+E) time (graph full traversal) and can find more than one increasing path at once. Moreover, the first phase will behave precisely as greedy algorithm (which is also improvement). After running each phase you should clear used and run the next phase. Terminate when no path is found during one phase.
#include<vector>
#include<utility>
using namespace std;
class KuhnImplementation
{
public:
int n, k;
vector < vector<int> > g;
vector<int> pairs_of_right, pairs_of_left;
vector<bool> used;
bool kuhn (int v)
{
if (used[v]) return false;
used[v] = true;
for (int i = 0; i < g[v].size(); ++i)
{
int to = g[v][i]n;
if (pairs_of_right[to] == 1  kuhn (pairs_of_right[to]))
{
pairs_of_right[to] = v;
pairs_of_left[v] = to;
return true;
}
}
return false;
}
vector<pair<int, int> > find_max_matching(vector<vector<int> > &_g, int _n, int _k)
{
g = _g;
//g[i] is a list of adjacent vertices to vertex i, where i is from left patr and g[i][j] is from right part
n = _n;
//n is number of vertices in left part of graph
k = _k;
//k is number of vertices in right part of graph
pairs_of_right = vector<int> (k, 1);
pairs_of_left = vector<int> (n, 1);
//pairs_of_right[i] is a neighbor of vertex i from right part, on marked edge
//pairs_of_left[i] is a neighbor of vertex i from left part, on marked edge
used = vector<bool> (n, false);
bool path_found;
do {
fill(used.begin(), used.end(), false);
path_found = false;
//remember to start only from free vertices which are not visited yet
for (int i = 0; i < n; ++i)
if (pairs_of_left[i] < 0 && !used[i])
path_found = kuhn (i);
} while (path_found);
vector<pair<int, int> > res;
for(int i = 0; i < k; i++)
if(pairs_of_right[i] != 1)
res.push_back(make_pair(pairs_of_right[i], i+n));
return res;
}
};


Second way: Maximum flow algorithm.
Solving this problem using maximum flow algorithm is obvious. We will create two new vertices: S and T. We will create edges from S to all vertices of left part of graph and we will create edges from all vertices of right part of graph to T. All edges capacity will be equal to 1. After that we need to find maximum flow from S to T. All used edges between left and right parts are maximum matching. It is obvious why it works. It is easier to understand, especially if you understand and know how to code max flow algorithm.
Implementation:
#include<vector>
#include<utility>
using namespace std;
class MaxFlowImplementation
{
public:
vector<vector<int> > g;
vector<bool> used;
int n, k;
bool find_path(int a, int b)
{
if(a == b) return true;
if(used[a]) return false;
used[a] = true;
for(int i = 0; i < n+k+2; i++)
if(g[a][i] > 0 && find_path(i, b))
{
g[a][i];
g[i][a]++;
return true;
}
return false;
}
vector<pair<int, int> > find_max_matching(vector<vector<int> > &v, int _n, int _k)
{
//v[i] is a list of adjacent vertices to vertex i, where i is from left part and v[i][j] is from right part
n = _n;
//n is number of vertices in left part of graph
k = _k;
//k is number of vertices in right part of graph
g = vector<vector<int> >(n+k+2, vector<int>(n+k+2));
//g[i][j] = 1 if there is edge between vertex i from left part
// and vertex j from right part
for(int i = 0; i < v.size(); i++)
for(int j = 0; j < v[i].size(); j++)
g[i][v[i][j]] = 1;
int S = n+k;
int T = n+k+1;
for(int i = 0; i < n; i++)
g[S][i] = 1;
for(int i = 0; i < k; i++)
g[n+i][T] = 1;
vector<vector<int> > _g(g);
used = vector<bool> (n+k+2, false);
while(find_path(S, T))
fill(used.begin(), used.end(), false);
vector<pair<int, int> > res;
for(int i = 0; i < n; i++)
for(int j = n; j < n+k; j++)
if(g[i][j] < _g[i][j])
res.push_back(make_pair(i, j));
return res;
}
};
DISCUSSION Solving maximum bipartite problem can be useful to solve problems using Hungarian algorithm, minimum vertex cover, etc. To tell the truth, while max  flow implementation is easier to understand and implement, Kuhn algorithm is more effective, especially improved implementation.
Good example of using this algorithm in solving Topcoder Problem is PythTriplets Problem. We have numbers. Some pairs of these numbers are good. For example, we have {a, b, c, d}. Let good pairs be: (a,b) (b,c) (c,d). We need to select maximum amount of pairs so that no number to be in different pairs. Here they are (a,b) and (c,d).
Of course first you need to do is build a graph. Each vertex is a number. After creating edges between vertices (create edge between a and b if gcd(a, b) = 1 and a^{2} + b^{2} = (some integer)^{2}). The trick is that it is a bipartite graph. How to build a bipartite graph? There are two ways here.
First way is to notice that if we place all even numbers to left part and all odd numbers to right part then we will get a bipartite graph. How to prove it? First no edge can be between even numbers because gcd(2*x, 2*y) >= 2. Second no edge can be between odd numbers because (2*x+1)^2 + (2*y+1)^2 = = 4*x*x + 4*x + 1 + 4*y*y + 4*y + 1 = = 4*x*x + 4*x + 4*y*y + 4*y + 2 = = 2*(2*x*x + 2*x + 2*y*y + 2*y + 1) = = 2 * z; where z is odd number and we can't divide ((2*x+1)^2 + (2*y+1)^2) twice, so there is no integer square root of ((2*x+1)^2 + (2*y+1)^2).
Second way is to build a bipartite graph using algorithm. We will go through vertices and call dfs() from unused vertices placing them to left part, its neighbors to right part and so on.
After that we just find maximum matching and return it's size:
class PythTriplets
{
public:
int gcd(int a, int b)
{
return b?gcd(b, a%b):a;
}
bool edge(int a, int b)//is there an edge between a and b?
{
long long _a = a, _b = b, n = _a*_a + _b*_b;
long long sqr = (long long)(sqrt((double)(n))+1e9);
if(sqr*sqr != n) return false;
return gcd(a, b) == 1;
}
int findMax(vector <string> stick)
{
string s;
for(int i = 0; i < stick.size(); i++)
s += stick[i];
vector<int> v;
istringstream is(s);
int a;
while(is >> a)
v.push_back(a);
vector<int> nums;//even numbers and then odd numbers(first n numbers are even, the rest are odd)
for(int i = 0; i < v.size(); i++)
if(v[i]%2 == 0)
nums.push_back(v[i]);
int n = nums.size(), k = v.size()  n;
for(int i = 0; i < v.size(); i++)
if(v[i]%2 == 1)
nums.push_back(v[i]);
vector<vector<int> > g(nums.size());
for(int i = 0; i < v.size(); i++)
for(int j = i+1; j < v.size(); j++)
if(edge(nums[i], nums[j]))
g[i].push_back(j);
//if you use Kuhn algorithm
KuhnImplementation obj;
// or if you use max flow algorithm
//MaxFlowImplementation obj
return obj.find_max_matching(g, n, k).size();
}
};
Another good example of using this algorithm is problem RookAttack, here we have chessboard with n rows and k columns and we have to place maximum number of rooks there so that no of them will attack each other. But we also have some broken cells, where we can't place rooks. So we create a bipartite graph, where rows are vertices from left part of graph and columns are vertices from right part of graph and we create an edge between row and column if this cell is not broken. After that we just find maximum bipartite matching and return its size. It works because if we use cell board[i][j] then we can't use cells on the same row or column and the same is right for edge in Maximum Bipartite Matching Problem: if we use some edge ab, we can't use edges which has any vertices of a or b.
Implementation:
class RookAttack
{
public:
int howMany(int n, int k, vector <string> cuts)
{
vector<vector<int> > g(n);
vector<vector<bool> > _g(n, vector<bool> (k, true));
for(int i = 0; i < cuts.size(); i++)
{
istringstream is(cuts[i]);
int a, b;
char c;
while(is >> a >> b)
{
_g[a][b] = false;
is >> c;
}
}
for(int i = 0; i < n; i++)
for(int j = 0; j < k; j++)
if(_g[i][j])
g[i].push_back(j+n);
//if you use Kuhn algorithm
KuhnImplementation obj;
// or if you use max flow algorithm
//MaxFlowImplementation obj
return obj.find_max_matching(g, n, k).size();
}
};


What's up? Why no admin comments my article? Do I have any problems with it? Maybe I can fix it. 

I can't really give a thoughtful comment on a nontrivial recipe like this so fast, especially since I have other recipes to comment as well (not to mention that there is another recipe on the same topic). Just have patience, please. 

Maybe should I rewrite it to make it more сlearly  understanding? 

The pesudocode contains a small error: it must be kuhn(pairs[q]) instead of just kuhn(q).
About PythTriplets Problem. A lot of people solved it without knowing that the graph is bipartite (me included). The algorithm may be easily modified to work for arbitrary graph heuristically. This heuristics is often accepted unless you meet serious problemsetter=) For example, in problem Work scheduling DFS heuristics was accepted until Burunduk1 added some tests. Now it seems to be unsolvable in this incorrect way. 

I believe that "improved version" is not a good idea of acceleration. There is an easy approach which works better than simply precalculating greedy matching.
Do not clear used marks after you find one path. Instead of it run a series of DFSes over all vertices in a single phase. One such phase takes strictly O(V+E) time (graph full traversal) and can find more than one increasing path at once. Moreover, the first phase will behave precisely as greedy algorithm, so it supersedes you improvement. After running each phase you should clear used marks and run the next phase. Terminate when no path is found during one phase.
To apply this improvement to your simple implementation do the following: 1. Store pairs for vertices of left part too. Add declaration: vector<int> pair2; Add initialization: pair2 = vector<int> (n, 1); Do not forget to set it in DFS: pairs [to] = v; pair2 [v] = to; //added line return true; 2. Replace your DFS(kuhn) calls code in find_max_matching with this:
int phases = 0;
bool haschanged;
do {
fill(used.begin(), used.end(), false);
haschanged = false;
//remember to start only from free vertices which are not visited yet
for (int i = 0; i < n; ++i)
if (pair2[i] < 0 && !used[i])
haschanged = kuhn (i);
phases++;
} while (haschanged);
The full working time is O(P(N + M)) where P is number of phases. It is clear that it won't exceed min(n, k)<=V. I've just run your solution with this modification on PythTriplets problem with assert(phases <= 10). It is OK. 10 is much less than 100=)
This acceleration is also applicable to maxflow problem with unit capacities. Run DFS series in a single phase without clearing used in between. And do not use "used" mark for source/sink. 

I believe that you are right. I will add your algorithm in a couple of days. Thank you, it is always a big pleasure to get advice from neartarget :) 

you know, I have just tested my simple implementation, improved implementation and your and maxflow implementation. Max  flow implementation was least effective. My improved implementation was much faster(79 times) than simple. Your was faster than my improved 23 times. 

Hello all,
I've been studying this topic lately, and I came out with a question which answer I have not been able to figure out.
First of all, this question is actually about matching in general graphs, not bipartite graphs. It turns out that the algorithm for solving this problem is very similar to "Kuhn's algorithm" but when it finds an oddlength cycle it shrinks it into a single node and continues the search. The algorithm is called "Edmonds' Blossom algorithm".
However, it is well known that a given matching is the maximum one if and only if no more augmenting paths can be found on the graph. So, my question is the following one:
Kuhn's algorithm simply finds an augmenting path in the graph for including new matched edges. Therefore in principle it should also work for general graphs: if no more augmenting paths are found, then necessarily we have a maximum matching!. So I implemented the algorithm adding some extra information in the DFS for avoiding cycles and tested it for bipartite graphs and worked perfectly, however when it comes to general graphs it fails... of course the reason is because there are augmenting paths the algorithm cannot find (apparently because of oddlength cycles, a.k.a. blossoms), but why?
All of this converges to a simple question: why in Edmonds' algorithm is it necessary to shrink blossoms? I've read many papers where it is explained how to shrink blossoms and why an augmenting path in the reduced graph is still valid in the expanded one, but they never explain why this is actually necessary!
Thanks for your help. 

To find alternating paths we always follow the matched edge, when the node in consideration is ODD. Now imagine there is an odd cycle which has the node u (marked ODD). If there is an edge going out directly from u to a freenode v, then v will never be matched as we will always encounter this odd cycle. But if we reduce the cycle (contract the blossom) into a pseudonode we can then find an alternating path from the root node (another free node) to v using the pseudonode.
Hope this clarifies your doubt. 

"Berja's theorem" should be replaced by "Berge's lemma". Correcting this typo will greatly help interested readers to research the topic. Thanks a lot for a great article! 
