Register Now
Member Count: 626,478 - April 18, 2014  [Get Time]
Login
forums   
Round Tables
News Discussions
Algorithm Matches
Marathon Matches
NASA Tournament Lab
Software Forums
TopCoder Cookbook
High School Matches
Sponsor Discussions
Search Watch Thread  |  My Post History  |  My Watches  |  User Settings
View: Flat (newest first)  | Threaded  | Tree
Previous Thread  |  Next Thread
Forums TopCoder Cookbook Algorithm Competitions - New Recipes Solving Maximum Bipartite Matching Problem
Solving Maximum Bipartite Matching Problem | Reply
NAME
Solving Maximum Bipartite Matching Problem

PROBLEM
In this article we shall speak about Solving Maximum Bipartite Matching Problem. There is a bipartite graph containing N vertices (n vertices in left part and k = N-n vertices in right part of graph) and M edges. We are to find maximum bipartite matching, i.e. mark maximum number of edges, so that no one of them have adjacent vertices with each other. This problem can be easily solved in two ways.

SOLUTION
First way: Kuhn’s algorithm.

Chain with length k is a simple path (i.e. it contains no repeated vertices or edges) that has k edges in the bipartite graph. Alternating chain (in a bipartite graph, with respect to some matching) is a chain in which the edges alternately belong / not belong to the matching. The increasing chain (in a bipartite graph, with respect to a matching) is an alternating chain, whose initial and final vertices (as well as edges) do not belong to matching.

Berja’s theorem:
Matching is maximal if and only if there are no increasing chains with respect to it.


So, let’s notice that if we find an increasing chain, we can increase our matching by one. We will go along this increasing chain and mark edges which are not marked and unmark edges that were marked (first edge is not marked by definition – we will mark it, second is marked, so we will unmark it and so on). It is obvious that by doing this operation we will increase our matching by one, because length of increasing chain is always odd and because in this chain we had [k/2]+1 unmarked edges and [k/2] marked edges and after rematching we have [k/2] unmarked edges and [k/2]+1 marked edges in the chain.

So the main idea of the algorithm is to search increasing chains while we can and increase matching.

Well, new problem is how to find increasing chains. Kuhn’s algorithm is based on dfs (depth first search) or bfs (breadth first search) algorithm.

Complexity of searching an increasing chain is O(N+M) and maximum number of them is N/2, overall asymptotic is O(N*N + N*M) = O(NM)

Let’s see how we will search increasing chains: we shall use dfs. We will call dfs only from vertices of graph’s left part. From left part it goes to right only using not marked edges, and from right to left only using marked edges. In our implementation dfs will be called only from left graph’s vertices and will return bool – true if it found chain and false if not. From current vertex algorithm will go through all adjacent not marked edges to vertex from right part called TO. If TO has no adjacent marked edges, dfs will return true, because it is last vertex of increasing chain. If it has, we will call dfs from TO’s neighbor on marked edge and return true if it returned true.

Pseudocode:
bool kuhn(vertex v) 
{
        if (used[v]) return false;
        used[v] = true;
 
        for(vertex q adjacent to v) 
        {
                if((q has no pair) or kuhn(pairs[q])) 
                {
                        pairs[q] = v;
                        return true;
                }   
        }
}
find_max_matching 
{
        for(vertex v = {1,..,n}) 
        {
                used = {0};
                kuhn(v);
        }
}

Implementation (C++):

#include<vector>
#include<utility>
using namespace std;
 
class KuhnImplementation
{
public:
	int n, k;
	vector < vector<int> > g;
	vector<int> pairs;
	vector<bool> used;
 
 
	bool kuhn (int v) 
        {
		if (used[v])  return false;
		used[v] = true;
		for (int i = 0; i < g[v].size(); ++i) 
                {
			int to = g[v][i]-n;
			if (pairs[to] == -1 || kuhn (pairs[to])) 
                        {
				pairs [to] = v;
				return true;
			}
		}
		return false;
	}
 
	vector<pair<int, int> > find_max_matching(vector<vector<int> > &_g, int _n, int _k) 
        {
		g = _g;
		//g[i] is a list of adjacent vertices to vertex i, where i is from left patr and g[i][j] is from right part
		n = _n;
		//n is number of vertices in left part of graph
		k = _k;
		//k is number of vertices in right part of graph
 
		pairs = vector<int> (k, -1);
		//pairs[i] is a neighbor of vertex i from right part, on marked edge
		used = vector<bool> (n, false);
		for (int i = 0; i < n; ++i) 
                {
			fill(used.begin(), used.end(), false);
			kuhn (i);
		}
	
		vector<pair<int, int> > res;
		for(int i = 0; i < k; i++)
			if(pairs[i] != -1)
				res.push_back(make_pair(pairs[i], i+n));
 
		return res;
 
	}
};


Improved implementation:

Let’s modify algorithm in next way. Do not clear used marks after you find one path. Instead of it run a series of DFS-es over all vertices in a single phase. One such phase takes strictly O(V+E) time (graph full traversal) and can find more than one increasing path at once. Moreover, the first phase will behave precisely as greedy algorithm (which is also improvement). After running each phase you should clear used and run the next phase. Terminate when no path is found during one phase.
#include<vector>
#include<utility>
using namespace std;
 
class KuhnImplementation
{
public:
	int n, k;
	vector < vector<int> > g;
	vector<int> pairs_of_right, pairs_of_left;
	vector<bool> used;
 
 
	bool kuhn (int v) 
	{
		if (used[v])  return false;
		used[v] = true;
		for (int i = 0; i < g[v].size(); ++i) 
		{
			int to = g[v][i]-n;
			if (pairs_of_right[to] == -1 || kuhn (pairs_of_right[to])) 
			{
				pairs_of_right[to] = v;
				pairs_of_left[v] = to;
				return true;
			}
		}
		return false;
	}
 
	vector<pair<int, int> > find_max_matching(vector<vector<int> > &_g, int _n, int _k) 
	{
		g = _g;
		//g[i] is a list of adjacent vertices to vertex i, where i is from left patr and g[i][j] is from right part
		n = _n;
		//n is number of vertices in left part of graph
		k = _k;
		//k is number of vertices in right part of graph
 
		pairs_of_right = vector<int> (k, -1);
		pairs_of_left = vector<int> (n, -1);
		//pairs_of_right[i] is a neighbor of vertex i from right part, on marked edge
		//pairs_of_left[i]  is a neighbor of vertex i from left part, on marked edge
		used = vector<bool> (n, false);
 
 
		bool path_found;
		do {
			fill(used.begin(), used.end(), false);
			path_found = false;
			//remember to start only from free vertices which are not visited yet
			for (int i = 0; i < n; ++i)
				if (pairs_of_left[i] < 0 && !used[i])
					path_found |= kuhn (i);
		} while (path_found);
 
 
		vector<pair<int, int> > res;
		for(int i = 0; i < k; i++)
			if(pairs_of_right[i] != -1)
				res.push_back(make_pair(pairs_of_right[i], i+n));
 
		return res;
	}
};
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
Second way: Maximum flow algorithm.

Solving this problem using maximum flow algorithm is obvious. We will create two new vertices: S and T. We will create edges from S to all vertices of left part of graph and we will create edges from all vertices of right part of graph to T. All edges capacity will be equal to 1. After that we need to find maximum flow from S to T.
All used edges between left and right parts are maximum matching. It is obvious why it works. It is easier to understand, especially if you understand and know how to code max flow algorithm.

Implementation:

#include<vector>
#include<utility>
using namespace std;
 
class MaxFlowImplementation
{
public:
	vector<vector<int> > g;
	vector<bool> used;
	int n, k;
 
	bool find_path(int a, int b) 
        {
		if(a == b) return true;
		if(used[a]) return false;
		used[a] = true;
		for(int i = 0; i < n+k+2; i++) 
			if(g[a][i] > 0 && find_path(i, b))
                        {
				g[a][i]--;
				g[i][a]++;
				return true;
			}
		return false;
	}
 
	vector<pair<int, int> > find_max_matching(vector<vector<int> > &v, int _n, int _k)
        {
		//v[i] is a list of adjacent vertices to vertex i, where i is from left part and v[i][j] is from right part
		n = _n;
		//n is number of vertices in left part of graph
		k = _k;
		//k is number of vertices in right part of graph
 
		g = vector<vector<int> >(n+k+2, vector<int>(n+k+2));
		//g[i][j] = 1 if there is edge between vertex i from left part 
		//            			    and vertex j from right part
		for(int i = 0; i < v.size(); i++)
                    for(int j = 0; j < v[i].size(); j++)
                        g[i][v[i][j]] = 1;
		int S = n+k;
		int T = n+k+1; 
		for(int i = 0; i < n; i++)
			g[S][i] = 1;
		for(int i = 0; i < k; i++)
			g[n+i][T] = 1;
 
		vector<vector<int> > _g(g);
 
		used = vector<bool> (n+k+2, false);
		while(find_path(S, T))
			fill(used.begin(), used.end(), false);
 
		vector<pair<int, int> > res;
		for(int i = 0; i < n; i++)
			for(int j = n; j < n+k; j++)
				if(g[i][j] < _g[i][j])
					res.push_back(make_pair(i, j));
 
		return res;
	}
};


DISCUSSION
Solving maximum bipartite problem can be useful to solve problems using Hungarian algorithm, minimum vertex cover, etc. To tell the truth, while max - flow implementation is easier to understand and implement, Kuhn algorithm is more effective, especially improved implementation.


Good example of using this algorithm in solving Topcoder Problem is PythTriplets Problem. We have numbers. Some pairs of these numbers are good. For example, we have {a, b, c, d}. Let good pairs be: (a,b) (b,c) (c,d). We need to select maximum amount of pairs so that no number to be in different pairs. Here they are (a,b) and (c,d).

Of course first you need to do is build a graph. Each vertex is a number. After creating edges between vertices (create edge between a and b if gcd(a, b) = 1 and a2 + b2 = (some integer)2). The trick is that it is a bipartite graph. How to build a bipartite graph? There are two ways here.

First way is to notice that if we place all even numbers to left part and all odd numbers to right part then we will get a bipartite graph. How to prove it?
First no edge can be between even numbers because
gcd(2*x, 2*y) >= 2.
Second no edge can be between odd numbers because
(2*x+1)^2 + (2*y+1)^2 =
= 4*x*x + 4*x + 1 + 4*y*y + 4*y + 1 =
= 4*x*x + 4*x + 4*y*y + 4*y + 2 =
= 2*(2*x*x + 2*x + 2*y*y + 2*y + 1) =
= 2 * z;
where z is odd number
and we can't divide ((2*x+1)^2 + (2*y+1)^2) twice, so there is no integer square root of ((2*x+1)^2 + (2*y+1)^2).

Second way is to build a bipartite graph using algorithm.
We will go through vertices and call dfs() from unused vertices placing them to left part, its neighbors to right part and so on.

After that we just find maximum matching and return it's size:

class PythTriplets 
{
public:
        int gcd(int a, int b)
        {
                return b?gcd(b, a%b):a;
        }
 
        bool edge(int a, int b)//is there an edge between a and b?
        {
		long long _a = a, _b = b, n = _a*_a + _b*_b;
		long long sqr = (long long)(sqrt((double)(n))+1e-9);
		if(sqr*sqr != n) return false;
		return gcd(a, b) == 1;
        }
 
        int findMax(vector <string> stick) 
        {
		string s;
		for(int i = 0; i < stick.size(); i++)
			s += stick[i];
 
		vector<int> v;
		istringstream is(s);
		int a;
		while(is >> a)
			v.push_back(a);
 
		vector<int> nums;//even numbers and then odd numbers(first n numbers are even, the rest are odd)
		for(int i = 0; i < v.size(); i++)
			if(v[i]%2 == 0)
				nums.push_back(v[i]);
		int n = nums.size(), k = v.size() - n;
		for(int i = 0; i < v.size(); i++)
			if(v[i]%2 == 1)
				nums.push_back(v[i]);
 
		
		vector<vector<int> > g(nums.size());
		for(int i = 0; i < v.size(); i++)
			for(int j = i+1; j < v.size(); j++)
				if(edge(nums[i], nums[j]))
					g[i].push_back(j);
 
		//if you use Kuhn algorithm
		KuhnImplementation obj;
                // or if you use max flow algorithm
                //MaxFlowImplementation obj
		return obj.find_max_matching(g, n, k).size();
        }
};




Another good example of using this algorithm is problem RookAttack, here we have chessboard with n rows and k columns and we have to place maximum number of rooks there so that no of them will attack each other. But we also have some broken cells, where we can't place rooks.
So we create a bipartite graph, where rows are vertices from left part of graph and columns are vertices from right part of graph and we create an edge between row and column if this cell is not broken. After that we just find maximum bipartite matching and return its size. It works because if we use cell board[i][j] then we can't use cells on the same row or column and the same is right for edge in Maximum Bipartite Matching Problem: if we use some edge a-b, we can't use edges which has any vertices of a or b.

Implementation:

class RookAttack 
{
public:
        int howMany(int n, int k, vector <string> cuts) 
        {
		vector<vector<int> > g(n);
		vector<vector<bool> > _g(n, vector<bool> (k, true));
		
		for(int i = 0; i < cuts.size(); i++)
		{
			istringstream is(cuts[i]);
			int a, b;
			char c;
			while(is >> a >> b)
			{
				_g[a][b] = false;
				is >> c;
			}
		}
 
		for(int i = 0; i < n; i++)
			for(int j = 0; j < k; j++)
				if(_g[i][j])
					g[i].push_back(j+n);
 
		//if you use Kuhn algorithm
		KuhnImplementation obj;
                // or if you use max flow algorithm
                //MaxFlowImplementation obj
		return obj.find_max_matching(g, n, k).size();
        }
};
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
What's up? Why no admin comments my article? Do I have any problems with it? Maybe I can fix it.
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
I can't really give a thoughtful comment on a non-trivial recipe like this so fast, especially since I have other recipes to comment as well (not to mention that there is another recipe on the same topic). Just have patience, please.
Re: Solving Maximum Bipartite Matching Problem (response to post by Nickolas) | Reply
Maybe should I rewrite it to make it more сlearly - understanding?
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
added one more example
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
The pesudocode contains a small error: it must be kuhn(pairs[q]) instead of just kuhn(q).

About PythTriplets Problem. A lot of people solved it without knowing that the graph is bipartite (me included). The algorithm may be easily modified to work for arbitrary graph heuristically. This heuristics is often accepted unless you meet serious problemsetter=) For example, in problem Work scheduling DFS heuristics was accepted until Burunduk1 added some tests. Now it seems to be unsolvable in this incorrect way.
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
I believe that "improved version" is not a good idea of acceleration. There is an easy approach which works better than simply precalculating greedy matching.

Do not clear used marks after you find one path. Instead of it run a series of DFS-es over all vertices in a single phase. One such phase takes strictly O(V+E) time (graph full traversal) and can find more than one increasing path at once. Moreover, the first phase will behave precisely as greedy algorithm, so it supersedes you improvement. After running each phase you should clear used marks and run the next phase. Terminate when no path is found during one phase.

To apply this improvement to your simple implementation do the following:
1. Store pairs for vertices of left part too.
Add declaration: vector<int> pair2;
Add initialization: pair2 = vector<int> (n, -1);
Do not forget to set it in DFS:
pairs [to] = v;
pair2 [v] = to; //added line
return true;
2. Replace your DFS(kuhn) calls code in find_max_matching with this:
    int phases = 0;
		
    bool haschanged;
    do {
      fill(used.begin(), used.end(), false);
      haschanged = false;
      //remember to start only from free vertices which are not visited yet
      for (int i = 0; i < n; ++i)
        if (pair2[i] < 0 && !used[i])
          haschanged |= kuhn (i);
      phases++;
    } while (haschanged);


The full working time is O(P(N + M)) where P is number of phases. It is clear that it won't exceed min(n, k)<=V.
I've just run your solution with this modification on PythTriplets problem with assert(phases <= 10). It is OK. 10 is much less than 100=)

This acceleration is also applicable to maxflow problem with unit capacities. Run DFS series in a single phase without clearing used in between. And do not use "used" mark for source/sink.
Re: Solving Maximum Bipartite Matching Problem (response to post by syg96) | Reply
I believe that you are right. I will add your algorithm in a couple of days. Thank you, it is always a big pleasure to get advice from near-target :)
Re: Solving Maximum Bipartite Matching Problem (response to post by syg96) | Reply
you know, I have just tested my simple implementation, improved implementation and your and max-flow implementation. Max - flow implementation was least effective. My improved implementation was much faster(7-9 times) than simple. Your was faster than my improved 2-3 times.
Re: Solving Maximum Bipartite Matching Problem (response to post by ibra) | Reply
Hello all,

I've been studying this topic lately, and I came out with a question which answer I have not been able to figure out.

First of all, this question is actually about matching in general graphs, not bipartite graphs. It turns out that the algorithm for solving this problem is very similar to "Kuhn's algorithm" but when it finds an odd-length cycle it shrinks it into a single node and continues the search. The algorithm is called "Edmonds' Blossom algorithm".

However, it is well known that a given matching is the maximum one if and only if no more augmenting paths can be found on the graph. So, my question is the following one:

Kuhn's algorithm simply finds an augmenting path in the graph for including new matched edges. Therefore in principle it should also work for general graphs: if no more augmenting paths are found, then necessarily we have a maximum matching!. So I implemented the algorithm adding some extra information in the DFS for avoiding cycles and tested it for bipartite graphs and worked perfectly, however when it comes to general graphs it fails... of course the reason is because there are augmenting paths the algorithm cannot find (apparently because of odd-length cycles, a.k.a. blossoms), but why?

All of this converges to a simple question: why in Edmonds' algorithm is it necessary to shrink blossoms? I've read many papers where it is explained how to shrink blossoms and why an augmenting path in the reduced graph is still valid in the expanded one, but they never explain why this is actually necessary!

Thanks for your help.
Forums TopCoder Cookbook Algorithm Competitions - New Recipes Solving Maximum Bipartite Matching Problem
Previous Thread  |  Next Thread
RSS