### Introduce difficulty levels / efficiency points


Some of the problems are quite easy to brute-force on a modern PC, even with a fairly inefficient algorithm or implementation. However, most of these algorithms won't scale well (while being sufficient for the given test dataset). It would be interesting to get additional, larger datasets for the same problem with the same time limit, so that an efficient solution is actually required. It is probably better to keep this score separate (or use it for badges only) so as not to penalize people with really ancient hardware, who will need an efficient algorithm even to pass the base case.
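As a concrete illustration of the scaling problem (a hypothetical example, not tied to any particular Rosalind problem), here are two k-mer counting routines that return identical answers but scale very differently: the naive one rescans the sequence once per distinct k-mer (quadratic), while the second makes a single pass (linear). Both would pass a 10 kbp test; only the second has a chance on 1000 kbp.

```python
from collections import defaultdict

def kmer_counts_naive(seq, k):
    # Quadratic: rescans the whole sequence once per distinct k-mer.
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {km: sum(seq[i:i + k] == km for i in range(len(seq) - k + 1))
            for km in kmers}

def kmer_counts_fast(seq, k):
    # Linear: one pass over the sequence, tallying each window as it is seen.
    counts = defaultdict(int)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return dict(counts)

seq = "ATGCGATGACCTGACTGA"
assert kmer_counts_naive(seq, 3) == kmer_counts_fast(seq, 3)
```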

For instance, to complete a problem you solve the 10 kbp dataset. To earn efficiency points you also solve the 1000 kbp dataset; these points could figure into badge award calculations, or even a separate ranking. This can be expanded into levels: solve the 10 kbp dataset to pass with 0 efficiency points, 100 kbp to pass with 1 efficiency point, 1000 kbp to pass with 2 efficiency points, etc. The sizes will need to be tuned so that the larger datasets are solvable only through algorithmic or implementation improvements, not simply because someone has more powerful hardware (i.e. the datasets will need to be substantially bigger). People with access to clusters or other number-crunching hardware will have an unfair advantage, but that has always been the case.


from Victor
·
4 years & 341 days ago
·
0 followers
·
1 answer & 2 comments

Good idea, thank you, Victor!

from Rosalind
·
4 years & 337 days ago


You raise a very good point. There is another small issue with the current problem statements (at least the problems I have solved so far): very conservative, explicit upper limits to the size of the problems.

The issue is that, on a modern computer, using any high-level language, it is very easy to implement a solution that handles "arbitrary"-size input (and by arbitrary I mean "whatever fits in main memory"). Explicit size limits in the problem description almost invite solutions that consider neither A) memory and time efficiency nor B) the possibility of buffer overflows with bigger input.

I know that this is not a correctness issue, but in practice (and with real-world problems) not taking care of these problems will come to bite you in the a** eventually.
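As a sketch of what handling "arbitrary"-size input can look like in practice (a minimal Python example; the FASTA-style input format and the GC-content task are just assumptions for illustration), streaming the file line by line keeps memory usage independent of sequence length:

```python
def gc_content_streaming(path):
    # Process the file line by line, so memory stays O(1) in sequence length
    # and there is no fixed-size buffer to overflow.
    gc = total = 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):  # skip FASTA header lines
                continue
            s = line.strip().upper()
            gc += s.count("G") + s.count("C")
            total += len(s)
    return 100.0 * gc / total if total else 0.0
```

The same idea applies in any high-level language: read incrementally instead of slurping the whole input, and the explicit size limit in the problem statement becomes irrelevant.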



I like the size-limit specifications because they fit my experience (i.e. with real-world problems) better, where time spent writing code matters: it's not time-efficient to over-optimise a solution when I'm frequently writing code for things that haven't been done before. For example, there's no need to spend extra time modifying code to handle INDELs in 3 Gbp sequences if you're only working with 16 kbp mitochondrial sequences. I've noticed that some of the earlier problems in Rosalind start off with small inputs, while the later problems have larger inputs, which are easier for me to handle once I'm in the swing of that particular problem category.
