Computer puzzle on job interviews

Posted by admin on January 31, 2006

Yesterday, I went to a job interview. I was asked some difficult
problems and ways to solve them. One of them was: We have a list of a
million phone numbers on the standard input and we have a reduced memory
pc which we want to use it to sort them and use to check later if any
number given is in that list or not.

The solution offered was to use an array of bits and use the number as
an index, so if b[1234 ] == 0 would mean that the phone number 1234
wasn’t on the input list and 1 would mean that we had it.

I have been thinking about the solution they provided me.
I’ve been thinking for a while and would
like to extend the rationale behind.
Ok, so, Let’s imagine we have 1 million phone numbers. If we choose to
use the bit array approach, that means that, given the English phone
number system, we have numbers like these:

01273 121212

That is, eleven numbers. I guess, the initial 0 is always present, so we
can omit it right now. So that leaves us 10 numbers to deal with.
Supposing that there are no numbers starting with a 0 (after the initial
0), that leaves us with the possibility of having number in the range:

1000000000 to
9999999999 (=10 000 000 000, 10 thousand million numbers )

So, there can be a total of 10,000,000,000 - 1,000,000,000 numbers, which is
then, 9,000,000,000.

If you want an array of bits to use for storing if certain number is on
the list or not, that means that you need an array of 9,000,000,000
elements, or, 9,000,000,000 bits. That is 1,125,000,000 bytes, or, 1,072Mb
or a little bit more than 1 Gb, right?

Ok, that’s a lot of memory. And, the thing is that you would need to
have all that array even if it was the best case (that in which all the
numbers would be the same, one single number repeated one million
time). So the memory consumption would be fixed.

On the other hand, if we used a n-ary tree to hold all the information
as something like this:

   0  0  0
   1  1  1
   2  2  2
   3  3  3
   4 /4--4
   5/ 5  5
   6  6  6
   7  7  7
   8  8  8
   9  9  9

would mean that the number 544 (in a 3-number scheme) would exist.

For example, if we used a C struct like this one:

typedef struct Node Node;
struct Node {
        Node *array[10]; /*  */
	};

In which each position of the array of pointers would represent a number
itself and a pointer to the next one.

    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |    ------    \    |null|
    +----+    +----+\   +----+
    |null|    |null| \  |null|
    +----+    +----+  \ +----+
    |null|    |null|   \ XXX |
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |    \    |null|    |null|
    +----+\   +----+    +----+
           \
            \   +----+    +----+
             \  |null|    |null|
              \ +----+    +----+
               \ null|    |null|
                +----+    +----+
                |null|   / XXX |
                +----+  / +----+
                |null| /  |null|
                +----+/   +----+
                |    /    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |    \    |null|
                +----+\   +----+
                       \
                        \
                         \  +----+
                          \ |null|
                           \+----+
                             XXX |
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+

So that would give us numbers 446 and 912 and 992 in a 3 digit scheme.
We would set up all the pointers to null. If any pointer at any position
is different of null, that means that is a valid number and it points to
the following number (the next array of pointers), so we use the pointer
to both show the number and point to the next structure. The only
exception is for the last node (last digit), that we could make it point
to a invalid address (represented here as XXX).

So, I don’t know if I make myself clear enough, but using this method,
the thing is that in the worst case, we would have 1,000,000
ways to go from the root to the end of each branch, meaning that we
would need 1,000,000 * 10 (the number of digits each phone number has)
nodes to hold all that information. That is 10,000,000 nodes. In this
structure, thats about 40 bytes, which gives us the amount of
400,000,000 bytes, that’s is : 381 Mb approximately, which is 1/3 of the
other approach. But, that’s the worst case scenario. In the average case
scenario, many numbers would be repeated and the total size would be
reduced, actually. And on the best case scenario (one number repeated
one million times) we would only need 10 nodes.

On doing the search, using this n-ary tree, it is O(10) at most if we
need to check the whole number, but as soon we found a null on any
digit’s position, we can stop there, so it would be < O(10) which is a
good number anyway.

the thing is, Am I right, or not?

Ok, so, Let’s imagine we have 1 million phone numbers. If we choose to
use the bit array approach, that means that, given the English phone
number system, we have numbers like these:

01273 121212

That is, eleven numbers. I guess, the initial 0 is always present, so we
can omit it right now. So that leaves us 10 numbers to deal with.
Supposing that there are no numbers starting with a 0 (after the initial
0), that leaves us with the possibility of having number in the range:

1000000000 to
9999999999 (=10 000 000 000, 10 thousand million numbers )

So, there can be a total of 10,000,000,000 – 1,000,000,000 numbers, which is
then, 9,000,000,000.

If you want an array of bits to use for storing if certain number is on
the list or not, that means that you need an array of 9,000,000,000
elements, or, 9,000,000,000 bits. That is 1,125,000,000 bytes, or, 1,072Mb
or a little bit more than 1 Gb, right?

Ok, that’s a lot of memory. And, the thing is that you would need to
have all that array even if it was the best case (that in which all the
numbers would be the same, one single number repeated one million
time). So the memory consumption would be fixed.

On the other hand, if we used a n-ary tree to hold all the information
as something like this:

   0  0  0
   1  1  1
   2  2  2
   3  3  3
   4 /4--4
   5/ 5  5
   6  6  6
   7  7  7
   8  8  8
   9  9  9

would mean that the number 544 (in a 3-number scheme) would exist.

For example, if we used a C struct like this one:

typedef struct Node Node;
struct Node {
        Node *array[10]; /*  */
    };

In which each position of the array of pointers would represent a number
itself and a pointer to the next one.

    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |    ------    \    |null|
    +----+    +----+\   +----+
    |null|    |null| \  |null|
    +----+    +----+  \ +----+
    |null|    |null|   \ XXX |
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |null|    |null|    |null|
    +----+    +----+    +----+
    |    \    |null|    |null|
    +----+\   +----+    +----+
           \
            \   +----+    +----+
             \  |null|    |null|
              \ +----+    +----+
               \ null|    |null|
                +----+    +----+
                |null|   / XXX |
                +----+  / +----+
                |null| /  |null|
                +----+/   +----+
                |    /    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |null|    |null|
                +----+    +----+
                |    \    |null|
                +----+\   +----+
                       \
                        \
                         \  +----+
                          \ |null|
                           \+----+
                             XXX |
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+
                            |null|
                            +----+

So that would give us numbers 446 and 912 and 992 in a 3 digit scheme.
We would set up all the pointers to null. If any pointer at any position
is different of null, that means that is a valid number and it points to
the following number (the next array of pointers), so we use the pointer
to both show the number and point to the next structure. The only
exception is for the last node (last digit), that we could make it point
to a invalid address (represented here as XXX).

So, I don’t know if I make myself clear enough, but using this method,
the thing is that in the worst case, we would have 1,000,000
ways to go from the root to the end of each branch, meaning that we
would need 1,000,000 * 10 (the number of digits each phone number has)
nodes to hold all that information. That is 10,000,000 nodes. In this
structure, thats about 40 bytes, which gives us the amount of
400,000,000 bytes, that’s is : 381 Mb approximately, which is 1/3 of the
other approach. But, that’s the worst case scenario. In the average case
scenario, many numbers would be repeated and the total size would be
reduced, actually. And on the best case scenario (one number repeated
one million times) we would only need 10 nodes.

On doing the search, using this n-ary tree, it is O(10) at most if we
need to check the whole number, but as soon we found a null on any
digit’s position, we can stop there, so it would be < O(10) which is a
good number anyway.

the thing is, Am I right, or not?

Get protected

Posted by admin on January 31, 2006


The other day I came to this nice ad. On how to get protection ;)
Quite funny, actually

Seen here. They even point you to another protection ad. Some people might consider this to be hard spam.

Don’t you think?

About FON

Posted by admin on January 29, 2006

As part of my studies in my e-business
course at the University of Sussex, I have been asked to write a report of a company that is somehow involved with e-business (e-commerce, use of technology to become more profitable or efficient, etc.).

So, I have decided that I am going to do some research on FON, a company created by Martin Varsavsky who is famous for founding Jazztel and Ya.com, one of the biggest phone companies and ISP in Spain.

FON idea is to profit from creating a network of wireless users around the world, called the p2p of wifi.

My task is, first of all, writing a background briefing report about the company, and afterwards an investment recomendation based on my research and study of the subject.

I chose this company because I have a friend who is working there and because I think it is interesting to see if FON’s business model is profitable or not. By the way, I have no access to internal sources as Teo (hi Teo!!) is quite secretive about it (I think because of NDAs).

Anyway, I’ll post more about it in the near future, I guess. Stay tuned.

On Distributed Computing using RMI

Posted by admin on January 28, 2006

Ok, so I have this assignment in one of my courses at the University of Sussex, Distributed Systems. The thing is that we need to implement a distributed event notification system, in which clients subscribe themselves to the server to which some event generators sends their events.

All of this, using RMI (Remote Method Invocation) in Java.

The assignment is an introduction to distributed objects in java by implementing this little system. In it, each Event Generator, sends events to the server, made of an identification string and a message (in fact, it could be any kind of object).

On the other hand, the server get subscriptions of clients. These send the server a regular expression and a timeout, each of which would be used to choose among the differents events received and to set a maximun time in which the subscription is valid. When doing so, the server sets up a callback function to the clients so, when a matching event is received, the server has a way to send the clients the object (event). I think it can be better understood by looking at the graph above.

So far, it’s been quite interesting to learn this kind of distributed framework, although java is not my kind of language. I want to do some more researh on how to do this kind of things with other languages like ruby o r python. Seems to me interesnting.

Here is a little bit of the code used, just for curiosity:

public static void main(String[] args) {
	if (System.getSecurityManager() == null) {
	System.setSecurityManager(new RMISecurityManager());
	}
	try {
		EventNotificationServer server = new EventNotificationServerImpl();
		String hostName = InetAddress.getLocalHost().getHostName();
		Naming.rebind("//" + hostName + "/"
				+ EventNotificationServer.rmiName, server);
		System.out.println("EventNotificationServerImpl bound to "
				+ hostName);
 
	} catch (Exception e) {
		System.err.println("EventNotificationServerImpl exception: "
				+ e.getMessage());
		e.printStackTrace();
	}
 
}

The assignment is an introduction to distributed objects in java by implementing this little system. In it, each Event Generator, sends events to the server, made of an identification string and a message (in fact, it could be any kind of object).

On the other hand, the server get subscriptions of clients. These send the server a regular expression and a timeout, each of which would be used to choose among the differents events received and to set a maximun time in which the subscription is valid. When doing so, the server sets up a callback function to the clients so, when a matching event is received, the server has a way to send the clients the object (event). I think it can be better understood by looking at the graph above.

So far, it’s been quite interesting to learn this kind of distributed framework, although java is not my kind of language. I want to do some more researh on how to do this kind of things with other languages like ruby o r python. Seems to me interesnting.

Here is a little bit of the code used, just for curiosity:

public static void main(String[] args) {
    if (System.getSecurityManager() == null) {
    System.setSecurityManager(new RMISecurityManager());
    }
    try {
        EventNotificationServer server = new EventNotificationServerImpl();
        String hostName = InetAddress.getLocalHost().getHostName();
        Naming.rebind(”//” + hostName + “/”
                + EventNotificationServer.rmiName, server);
        System.out.println(”EventNotificationServerImpl bound to ”
                + hostName);

    } catch (Exception e) {
        System.err.println(”EventNotificationServerImpl exception: ”
                + e.getMessage());
        e.printStackTrace();
    }

}

On XML and Java

Posted by admin on January 11, 2006

As part of an
assignment
for a course at Sussex University I have to create a Servlet to run a small application, one to deal with a wedding list. It involves quite a few technologies, including XML, javascript, CSS and JAVA.So far I am not specially happy with the course as I haven’t enjoyed it as much as I thought initially.

I think it is because there are so many technologies considered in such a short period of time, but I think that Java itself made me feel very uncomfortable from the very beginning (As an example, I found the Java DOM API to be quite awkward to use). Anyway.

But I must admit that I have learned some topics I was not proficient in, such as javascript, css and XML.

Probably I would feel much better doing these things in a scripting language such as python or ruby, or even perl…

So far I am not specially happy with the course as I haven’t enjoyed it as much as I thought initially.

Back in Brighton

Posted by admin on January 09, 2006

After a long day flying and taking trains, I finally arrived to Brighton yesterday. Right now I am settling down and realising I am here again.

Updating blog

Posted by admin on January 05, 2006

It was high time to update this blog. Not only this blog was requiring new entries, but it also needed a software update.So, I’ve just updated typo to version 2.6.0. I only need to import the old entries into the database to get it running, so it is in a TODO state.

So, I’ve just updated typo to version 2.6.0. I only need to import the old entries into the database to get it running, so it is in a TODO state.