Muntasir’s Blog

Random Thoughts

Google Code Jam 2010 Qualification Round

Posted by muntasirkhan on May 12, 2010

The Google Code Jam Qualification Round took place over the last weekend. This is the third time they have organized
this contest themselves – earlier iterations were hosted by TopCoder. This is my fourth GCJ, including my first one in 2006
that was hosted by TopCoder where I did not even get past the qual round. In the past two years I got eliminated in Round-2.
It is very likely that the same thing will happen again this year.

I could not wake up early enough to start at the beginning of the round, and was mostly away from the computer
most of the day. But this was a 24 hours round, and the problems were quite easy, so I managed to solve all
three in the end.

There is an official contest analysis already posted. I am posting my own
analysis here to explain my own train of thought and to basically act as sort of post-contest
review to organize my ideas.

The first problem asked for figuring out whether the lightbulb at the end of a series of ‘snappers’
would be on after K slaps.

Initially all Snappers are off, so we can see them as a list of 0’s:
[0,0,0,0,...,0,0]

At the first snap, only the leftmost one is turned on (all other have no power).
[0,0,0,0,...,0,1]

At the next snap, the second one is turned on while the first one is turned off.
[0,0,0,0,...,1,0]

At the third snap, the first one is again turned on, all other remain as they were.
[0,0,0,0,...,1,1]

We can see that for the n-th snap, the list looks exactly like the binary representation of n.

0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111

Ok, so we got a potential solution. We need to prove it before we can really be sure.

We can see that the binary representation is valid for 0 claps.

Say we have n snaps, and the binary representation is valid for that. Now to convert the list for n
to that for n+1 claps, we have to flip every snapper that has power. We start at the leftmost bit and flip it,
because it always has power. If the left most bit was a zero we stop there. If the left left most bit was a 1
(before flipping), we also flip the next bit – because it had power. We repeat this for every bit until we have
flipped a bit that was originally a 0.

Notice that this is exactly the same process that is used for binary addition by one:

1110100011111
+           1
--------------
1110100100000

Thus if for any n the binary representation gives us the state of the snappers, the binary representation of
n+1 will give us the state of the snappers after n+1 snaps. Since 0 gives a valid binary state representation, so will
1, and thus also 2,which also makes it valid for 3 and so on.

So the binary representation of K gives us the the state of the snappers after K snaps. To see if the bulb has power
after K snaps, we just need to check whether the lower N bits of K are all set (i.e. all N snappers are on).

You can do this any way you want. During the contest, what I did was to simply count the on bits in K%(2^).
For the bulb to be on all N bits must be set.

Another way is to add 1 to K and then check if K%(2^N) is 0. This works because, if all the lower N
bits of K are 1, adding 1 to it will make them all 0’s.

My contest code was in Basic for the small (meant to run in Spaghetti – my own interpreter) and Python for the large.
But I’m posting code in JavaScript(Rhino) which I wrote after the contest here.

#! /usr/bin/rhino

// import some useful Java packages
importPackage(java.io);
importPackage(java.lang);

// Use Java's Reader classes and System.in for input
var reader=BufferedReader(InputStreamReader(System['in']));

// T=number of cases
var T=parseInt(reader.readLine());

for (cas=1;cas<=T;cas++){
    //Each case consists of 2 integers: N and K
    var data=reader.readLine().split("\\s+");
    var N=parseInt(data[0]), K=parseInt(data[1]);

    // (K+1)%(1< k%(1< all ones => all on snappers
    if ((K+1)&(1<<N))
        print("Case #"+cas+": OFF");
    else
        print("Case #"+cas+": ON");
}

(More to come shortly)

Posted in Programming | Leave a Comment »

Hello world!

Posted by muntasirkhan on January 25, 2009

Moving to WordPress from my old account at LiveJournal. I’ll keep my advogato diary for short stuff, and use this one mainly for longer “rants”.

Posted in Uncategorized | Leave a Comment »

Rolling Hashes almost made my day

Posted by muntasirkhan on January 22, 2009

Today, rolling hashes almost made my day. Almost, but not quite. Ended up having a pretty rotten day after all. In today’s TopCoder SRM, the easy (250 point) problem involved checking for a checking strings that end up looking the same when cyclically rotated by a number of places
(see the problem statement here).

Suppose we have a string T=”abcd”. A rotation by i places of T is defined as a string T(i) such that T[j]=T(i)[i+j]. In simple terms, to rotate a string T by 1 place, we take off the first character of T and append it to its end to get T(1). To rotate by x places, we simply do this x times. It is clear that there cannot be more than length(T) distinct rotations of T.

Example:
T=”abcd”

T(0)=”abcd” (no rotations)
T(1)=”bcda”
T(2)=”cdab”
T(3)=”dabc”

The problem actually asks a lot more, the TC site hasn’t been updated yet but if you have a TC account you can check it out in the arena. The tricky part of the problem was, given a string T, count how many T(i) there are where T=T(i). That is, how many rotations of T yield T as the result.

Example:
T=”abab”
T(0)=”abab” (match, 0 counts as a valid rotation here)
T(1)=”baba”
T(2)=”abab” (match)
T(3)=”baba”

So, for this string we would get 2 rotations that are matches.

Now, the simple way of counting involves trying out all rotations of a string and counting the ones that match. That would mean, in a string of length n, trying n different rotations (O(n)), checking to see if the rotated string matches (also O(n)), and the actual work of rotating includes chopping off the first char (again another O(n) ). That results in a total complexity of O(n^3). If you are clever, you can even avoid actually rotating strings by chopping up the first character and appending thus getting you a runtime of O(n^2). One way can be by cleverly manipulating the string indices as we check, thus not having to keep two strings at all. An easier way of going about this is to append T with itself (“abab” -> “abababab”) and then searching for the occurrences of T in it. I will not go into why it works, one can easily convince oneself that it works by trying it out a few examples. I actually ended up using a bit of both.

At this point I coded and tested my solution. TLE on the last case. The strings were short enough, only about a 100 char max, so O(n^2) is quite doable. Turns out, the other part of the problem involved trying out different permutaions that added another m! (not: m not n, another variable from the input) to the overall complexity, thus increasing the runtime by 8! at worst. It was obvious the permutations would have to be tried out, there was no other way. The checking function had to be made faster. A few moments later I realized merely a linear time string search was required. Checking a for T in T+T was just a string search after all.

I am familiar with (i.e. have coded before) two different algorithms for string search, KMP and Rabin-Karp. My memory of KMP was rather rusty, having last coded it more than a year ago. Rabin-Karp, however, I remember quite well. I even set a problem at an online contest at UVa for which my solution used an idea derived from Rabin-Karp. Moreover, the idea behind it is much simpler, although I had reservations about my ability to make a bug free implementation in time.

The main idea behind it is the concept of a hash function. Simply put, you give an object (in this case, a string) as input and the function returns a number. The same Object always returns the same number. But different objects might return the same number, but in a well designed hash function that should be very rare (such an event is called a collision). For a string, we can for our purposes define a hash function as

hash(x)=sum(value(x[i])*a^i)%M + C over all i in [0,length(x)]

The values of a and M define how the hash function behaves, and are randomly chosen large integers for best results in this case.
Here, value(c) can be anything that makes a one-to-one relation between chars and integers (e.g. a=1, b=2…, ascii values etc.)
(Note: computing a hash is therefore O(length(x)) )

Now, if we have two strings x and y,

x=y => hash(x)=hash(y)

But note,

hash(x)=hash(y) =/=> x=y

However, with a well designed randomized hash function, we can say that,

probability(hash(x)=hash(y) => x=y) ~ 1.0

i.e. if their hashes match, it is very probable that the strings also match.

Thus by comparing two numbers (a constant time operation) we can test if two potentially large strings are the same or not. But, we have noticed that computing the hash of a string itself is linear. Thus checking each n length substring of T+T would involved taking O(n) hashes each taking O(n) time to compute, bringing us back to our old complexity of O(n^2).

But notice that in making these O(n) substrings of T, substring from [i,i+n) shares a lot of chars and relative positional characteristics with substring [i+1, i+1+n). Actually, the second substring can be produced from the first by simply discarding its first char and adding the char following the substring (i.e. at i+n) in the string. Consequently, we can compute the hash value for the second by using the value of the first and adjusting it by taking away the value contributed to it by the i-th char and adding the value contributed by the (i+n)th character. In our example hash function it would look like,

hash( x[i+1..n+1) ) = ( hash[ x[i..n) ] – value(x[i])*a^(n-1) + value( x[i+n] ) ) % M

Thus we can compute the next hash value in O(1) time. This gives our solution an overall complexity of O(n), since we generate all O(n) hashes in O(1) after the first one.

In Rabin Karp, if the hash value matches, we check the strings for a match using the the usual O(n) method, thus elimination false positives due to collisions. This is ok if we only need to check whether a string occurs in another or not. But in this problem we also need to count all occurrences, so using that will end up being O(n^2) if all substrings match. So we simply choose a very good randomized hash function and depend on the fact that collisions are so rare that the chances of one happening during system tests is practically zero.

I coded this right away and, after the traditional bug hunt all my code demand, was happy to see my solution solve all sample cases well in time. Confident of my solution, I moved on to the next problem , quite pleased with myself and thinking of a large rating increase.

But my solution failed the system tests. I looked at my code again, but could not find any mistakes. After a few more minutes I gave up. Later on, I tried my solution in the practice rooms, and it turned out it fails even some of the example cases in the system tests. That’s really odd since I tested those cases myself before submitting and it passed all of them. Running systems tests again, I saw my solution now fails on several other cases, but passes some of the old ones. The exact same code giving different results on the same input. My immediate suspicion was that the hash function was getting way too many collisions. I changed my code to use long longs instead of ints and ensured that the value of M would never be too small. I was dumb enough to simply code that M would be in the range [3,2^31), so small values of M would still make my solution fail. I submitted again, and it passes all tests. Code for my practice room submission can be found here.

Perhaps writing this will ensure I remember this lesson. On the bright side, I got a small rating increase despite scoring a duck.

Posted in Programming, SPOJ | Leave a Comment »

Looking for Primes

Posted by muntasirkhan on January 15, 2009

A post in the TopCoder forum led me to this problem. The crux of the problem is to check whether a number is prime, the number being in the range [3,2^32]. It was obvious the naive prime check would time out. At first I thought simply generating all primes under sqrt(2^32) using sieve and using those to test for primality would be enough. This method usually works for problems with numbers in this range. Code (in Python) for both methods follows.

"Naive" Algorithm

from math import *
def is_prime(n):
limit=int(sqrt(n))
if n==2:
 # 2 is the only even prime
 return True
 elif (n&1)==0:
 # n is not 2 and it is even, so it can't be a prime
 return False
 else:
 for i in xrange(3,limit,2): # check only odd divisors
 if n%i==0:
 # n is divisible i which is neither n nor 1
 # it can't be prime
 return False

  return True

Faster Method using Sieve of Eratosthenes

def generate_primes(max_value):
 #run sieve
prime=[True]*max_value # assume every number is prime
prime[0]=False # 0 and 1 are not primes
prime[1]=False
limit=int(sqrt(max_value)) # we only need to look unitl sqrt(n)

 #all even numbers over 2 are non-prime
 for i in xrange(4,max_value,2);
prime[i]=False

 prime_list=[2] # start off list with one prime - 2 
 for(i in xrange(3,limit,2)): # check all odd numbers
 if prime[i]:
prime_list.append(i) # add i to our list
 # cross out all multiples of i
 for j in xrange(i*i,max_value)
prime[j]=False

 # add all primes greater than sqrt to our list
 if limit&1==0:
limit+=1
prime_list.extend([i for i in xrange(limit,max_value,2) if prime[i]])

 return prime_list

def is_prime(prime_list, n):
limit=int(sqrt(n))
 # check all primes below sqrt(n)
 for p in prime_list[:n]:
 if (n%p)==0:
 return False
 #no prime divides it, so it must be a prime
 return True

The simpler method checks all odd numbers in [3,floor(sqrt(n))] for divisibility, while the faster method only checks the primes in the same range. This results in a significant speedup, and is usually fast enough for 32 bit primes in most problems.

It turns out that there are in excess of 6000 primes less than sqrt(2^32), and the maximum distance between a number and it nearest prime in [3,2^32] can be larger than 200. So we can have 200 prime checks, each having to use the %-operator 6000 times giving us a runtime in the order of 10^5 for each case. Notice that there are 10^5 cases, thus our program will run for (~10^5*10^5=)~10^10 iterations. In other words, in excess of 20 minutes on the SPOJ servers while the time limit is 5 seconds.

So the usual prime checks are not fast enough. Time to bring in the heavy artillery. The only practical fast methods of checking big primes use some form of probabilistic check. That means that there is a non-zero probability that your answer is wrong. But usually, if you know what you are doing, that chance is very very low. Since Java already has a ready-made probablyPrime method in its BigInteger class, I thought it would be a good idea to just use that than code something from scratch. I guess the problemsetters had thought of that too, and my simple Java solution timed out. The huge input/output involved in 10^5 cases was probably the dominant factor, but I didn’t think it would be worth it to try the weird optimized Java I/O code. There was nothing more to do than to code a probabilistic prime tester myself.

Probabilistic prime testers, despite their grandiose sounding names, are actually quite simple. The Miller-Rabin test is generally thought to be faster in practice. It uses that the fact (Fermat’s Little Theorem) that

for all primes p and all bases a
a^(p-1)=1 (mod p)

So, if a number is prime a^(p-1)%p will always be 1. If a number n gives a^(n-1)%n!=1, it most certainly isn’t prime. Thus simply checking a large enough number of random values of a gives us an answer which has a high probability of correctness. After all, if n us composite and we check enough values of a, sooner or later we will get a result that is not 1.

Unfortunately, there is a catch. There are certain rare composite numbers that pass this test for all bases. To guard against that, we use another property of primes.

For any prime p, if
x^2=1 (mod p)
x has to be either 1 or -1 (mod p)
i.e. There are no non-trivial roots of unity (mod p) if p is a prime

Thus, if n passes Fermat’s test for a reasonable number of bases, and there is no x other than 1 and p-1 for which (x^2)%p=1, n is probably prime. With a large enough sampling for the values of the base, this compound test gives an answer that is correct with high probability.

Computing the value of a^(p-1)%p is easy using exponentiation by squaring, effectively taking log(p) time and doing it while keeping all intermediate values less than p. Care still has to be taken to avoid overflows while squaring. Checking for non-trivial roots is slightly trickier. Actually, the fact that we are using exponentiation by squaring comes to our rescue. As we exponentiate, whenever we get an even exponent, we simply square the result of exponent/2. Now, suppose the result of a^(exponent/2)%p is x. We use that to compute a^(exponent)%p=square(a^(exponent/2)%m)=x*x. At this stage, if x^2%p is 1, x is a root of unity (mod p). So, x has to be either 1 or p-1. If x is anything else, we can be sure p is not a prime. Thus simply putting this check inside our modular exponentiation code completes or test. Although I used C for this particular problem for speed, I’m pasting Python Code for it. I don’t like posting code for SPOJ problems and all the other code in this post is in Python anyway.


Miller-Rabin Primality Test

from random import *

def prime(n,k): # check if n is prime
limit=k # k samples
q=n-1
 if n==2: return True
 elif ((n&1)==0): return False
 elif (n<2): return False

  for i in xrange(limit):
a=randrange(1,n)
m=q;y=1
z=a
 #modular exponentiation
 while m>0:
 while (m&1)==0:
 #even exponent
 # a^m%n=square(a^(m/2)%n)
x=z;z=(z*z)
 if (z>=n):
z%=n
 #check for roots of unity
 if (z==1 and x!=1 and x!=q):
 return False
m>>=1
 #odd exponent
 # a^m%n=(a*a^(m1))%n
m-=1;y=(y*z)
 if y>=n:
y%=n

  if y!=1:
 # certainly not prime
 return False
 #survived k iterations, n is probably prime
 return True

Usually around 20 iterations (i.e. the value of k) is enough. At SPOJ I experimented , successfully, with much lower values. The probability of an incorrect answer is 4^(-k).

(moving this post from LiveJournal messed up the indentaion in the code, which I am too lazy to fix)

Further Reading:
Fermat’s Little Theorem (Wikipedia)
Primality Test (Wikipedia)
Prime Numbers, Factorizations and Euler Function (Topcoder)
Primality Testing : Non-deterministic Algorithms (TopCoder)

Posted in Uncategorized | Leave a Comment »

Bangladesh going to ACM ICPC World Finals

Posted by muntasirkhan on January 2, 2009

Finally, something worth posting about. Just heard a few days ago that three teams from Bangladesh are going to the ACM ICPC World Finals, the most prestigious programming competition for university students.

Every year only the best programming teams from universities around the world qualify to participate in this event. So it is considered a great honour to qualify to compete there. Usually only one Bangladeshi team (from BUET) gets to go to the world final. Last year, another team from East West University qualified. This year there are three teams – BUET, DU and NSU.

The BUET team qualified after getting a wild card on becoming the runners up (Fudan Univ. were the champions) at our own Dhaka site hosted by NSU. The DU came 3rd (university-wise) and went on to become 2nd at the Kanpur site and getting the wild card there. The NSU team, consisting of my former teammates Samee Zahur and Md. Mustafijur Rahman as well as Ameer Hamza, could not perform to their fullest potential in the Dhaka Regional coming 4th university-wise and 6th overall, but went on to being 5th at Kuala Lumpur and got a wild card.

It might seem a little strange that being 4th in Dhaka did not get them a wild card while being 5th in Kuala Lumpur did. Actually, the number of wild cards varies from site to site and is determined by the number and quality of teams and the quality of the contest (problemset as well as other factors). The actual calculated point values are published, but to be honest the whole thing is quite beyond me so I will not try to explain it. The Kuala Lumpur site had lots of really strong teams like Shanghai Jiaotong Univ. (former World Champs) and NTU. However, the NSU team managed to hold their own there, solving 6 problems out of 10 while the champions solved 7. And I’m sure the fact that NSU is the host of this year’s Dhaka Regional, and has been hosting the regionals for many years, helped to ease the Asia Contests Director’s decision in its favour.

The fact that there are three teams from Bangladesh is a great honour, considering most countries are lucky to get even one team. India, with a much greater population and more developed CS/Math education and lots of very good coders got two teams. Only countries as strong as Russia and China ever get more than three teams, as far as I know.

But I must confess that I am happier for my university than my country. I was happy when two teams qualified last year. But I couldn’t help feeling frustrated that we missed out. Our team got 4th position, just behind the EWU team who beat us in penalty points and went to the world finals. And the year before, the main NSU team missed out after getting 3rd place. Furthermore, both Mustafij and Samee will be leaving NSU soon, this was their last Regional, and there are no viable replacements lined up so far. Those two are among the brightest people I know, and it will not be easy to put together a comparably strong team at NSU. So, this was pretty much  going to be NSU’s last chance for a long time.

The news was very welcome indeed. Though I doubt a lot of people at NSU know of this. We always thought NSU deserved getting a team to the World Finals for quite a few years. And having three teams going from Bangladesh makes it even better. Right now I am looking forward to the Finals, hoping they will put up a good fight.

PS: An unofficial list of all teams going can be found at Dr. Hwang’s blog
PPS: I dwelt more on the NSU team because of my own involvement. Both the other teams are very strong. The DU team managed a second place at Kanpur despite the most apallingly unfair conditions. The BUET team also has three very good coders. Coders from both teams, as well as Samee and Mustafij from NSU, competed in the Google Code Jam Regional Onsites last year.
PPPS: An unofficial version of this year’s ranklist at the ACM Dhaka Regional can be found here.

Posted in Uncategorized | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.