Code Golf

Ben Everard | November 3, 2014 | Story, Uncategorized | 6 Comments

In issue 7, we challenged readers to write the shortest Python code to convert numbers into Roman Numerals. By shortest, we mean the fewest number of characters. Rather unfortunately, the example code we gave in the competition rules contained an error so it produced incorrect Roman numerals, as such we’re awarding prizes for the shortest code that produces the same (incorrect) numerals as our code, and for the shortest code that produces correct Roman numerals. Let’s go with the shortest code that produces correct Roman numerals first:


o="MDCLXVI"
y="00"+input()
n=int(y)//1000*"M"
for c in[2,4,6]:
a=o[c]
n+=int(y[c//2-4])*a
for(g,h,i)in((9,1,2),(5,0,1),(4,1,1)):n=n.replace(a*g,a*h+o[c-i])
print(n)

For those of you not familiar with the ‘//’ operator in Python, it’s the same as / (i.e. divide by), but it throws away any remainders. So, 1/4 = 0 and 5/4 =1. This program makes use of this in a few ways. First, in the line:


n=int(y)//1000*"M"

Beyond 1000, all thousands are replaces with M (nb. this does depend on the exact system of Roman numerals used). This line just calculates the number of thousands, and puts an M for every thousand into the variable n (which is used to build up the final Roman numeral). Things then get a little interesting.

The first for loop goes through the numbers 2, 4 and 6. These are the indexes of the characters C, X and I in the string o. These particular characters are the only Roman Numerals that can repeat (other than M, but this is already dealt with). The number of each of these is simply the number of the decimal at that point, so the number 432 should have 4 C’s, 3 X’s and 2 I’s. We can get the number from the position in the input because it’s still stored in a string. That is, the number of I’s is the rightmost digit in the string, X’s is the second most digit, etc. Python also allows us to count from the back of a string using negative indexing, so (where y is the users input) y[-1] is the ones, y[-2] is the 10s and y-3 is the 100s. This is all done with the line:


n+=int(y[c//2-4])*a

However, this isn’t quite correct Roman numerals because there are some slightly different ones, like 5 (V) and 4 (IV). The second for loop looks for these mistakes and corrects them:


for(g,h,i)in((9,1,2),(5,0,1),(4,1,1)):n=n.replace(a*g,a*h+o[c-i])

Loosely speaking, this loops says, if you find g occurances of a particular character in the string, replace it with h occurrences of g followed by the character at the i offset on the string of characters in the variable o.

There are three different versions g, h and i that this loops through. Lets take a look at the first one as an example, where g is 9, h is 1 and i is 2. Let’s say, the number being converted to roman numerals is 9. In this case, there will be nine I’s at this point. The replace function will switch this string of nine Is to h (that is, 1) I and the character at o[c-i]. c will be 6 and i is 2, and the character at o[4] is X. Therefore, IIIIIIIII is changed to IX. The result of this is the correct Roman numerals being output.

Congratulations to Steve Engledow for this winning entry.

Another interesting solution to the problem that we thing also deserves a mention is as follows:


def r(v):
  for l in(16003,14402,8004,1602,1441,806,645,161,144,88,71,16):
    n,s=l>>4,l&15
    if v>=n:return"IXCMDXLIV"[s:s+2-n%9%4]+r(v-n)
    return''
print r(int(raw_input()))

Simple isn’t it!

This works by encoding the special Roman numerals (1 = I, 4 = IV, 5=V, 9=IX, 10 = X, 40 = XL … 900 = XM, 1000=M). We’re calling them special, because these are the building blocks that are used to create all other Roman numerals.

The numbers in the four loop (16003, 14402 …) each contain two parts that are split by the line:

n,s=l>>4,l&15

Both >> and & work on the binary representations of the numbers (base 2), not the denary (base 10). l>>4 returns the number shifted 4 bits to the right, so 16 (1000 in binary) becomes 1 (1 in binary) and 71 (1000111 in binary) becomes 4 (100 in binary). Any bits that ‘fall off’ the right hand side are lost. There are no remainders in bit shifting.

& is a bitwise and function. That means it takes two binary numbers and applies and & to each bit in turn, so 2 (10 in binary) & 3(11 in binary) is 2. The right bit of the 2 (0) is anded with the right bit of the 3 (1) — this is 0, so the right bit of the answer is 0. Then the left bit of the 2 (1) is anded with the left bit of the three(1) — this is 1, so the left bit of the answer is 1.

The bitwise and function can be used to select which bits of a binary number you’re interested in, since anything anded with 0 is 0, and anything anded with 1 is itself. In this case, 15 (1111) is used as a mask to discard everything but the rightmost four bits. These rightmost four bits are the four discarded by the right shift operator. The code:

n,s=l>>4,l&15

is used to split the number stored in l up. n is the number of the ‘special’ Roman numeral, and s is the index of the character(s) of this numeral in the string in the next line.

if v>=n:return"IXCMDXLIV"[s:s+2-n%9%4]+r(v-n)

For this, though, just an index isn’t sufficient because some of the special Roman numerals are single characters, while others are double characters. Fortunately, there’s a pattern to which ones have how many characters. 4, 9, 40, 90, 400 and 900 have two while 1, 5, 10, 50, 500 and 1000 have 1. You can use a spot of modulo arithmetic to work out how many characters are in a given special Roman numeral. Modulo arithmetic is where you divide a number and take the remainder. So, for example, 10%3 = 1 and 8%4 = 0. In the case of special Roman numerals, 2-n%9%4 will be 1 when there’s 1 character and 2 when there’s 2.

The rest of the code just controls looping through the special Roman numerals and through the entered number to build up the final answer out of these special Roman numeral building blocks.

Congratulations to James Dalby for this entry.

Another approach entirely

As you may have guessed, success in this challenge is really down to how you store the data for the special Roman numerals as this can take up quite a few characters. The shortest solution to the challenge that used our broken Roman numerals found an interesting solution to this:


n=int(input())
while n:
  for i in __file__[:-3].split():
    s,v=i.split('_');t=n-int(v)
    if t>=0:print s,;n=t

This appears quite strange until you realise that the file the code is in is called M_1000 CM_900 D_500 CD_400 C_100 XC_90 L_50 XL_40 X_10 IX_9 V_5 IV_4 I_1.py. Using the filename to store the data is an ingenious solution to the problem. Unfortunately, this entry does fall foul of the rule which stated that the spacing between the Roman numerals must be consistent. It puts a space between some and no spaces between others. However, we were sufficiently impressed with it that we’ve decided include it as a winner anyway.

Congratulations to Antony Semonella for this entry

The shortest entry that did have consistent white space was a little more orthodox:


a='M','C M','D','C D','C','X C','L','X L','X','I X','V','I V','I';b=100,90,50,40,10,9,5,4,1,.9,.5,.4,.1;n=input()
while n:
  for i in range(13):
    d=n-b[i]*10
    if d>=0:print a[i],;n=d

As you can see, this divides the values of the special Roman numerals by 10 to help save space.

Congratulations to Mathew Pottage for this entry.

A Massive thanks to everyone who entered. If you think you’ve got a shorter solution, and want to let us know, leave it in the comments below (although there won’t be any more prizes).

About The Author

Ben Everard

Ben Everard is the co-author of Learning Python with the Raspberry Pi and hacks hardware projects held together with a big dollop of Linux and Free Software glue. He's @ben_everard on Twitter.

6 Comments

Alex November 6, 2014 Reply

May I suggest changing the css for your code example boxes to use a fixed width font? (And maybe remove the “”? If there’s a lot of code examples, I feel like it wastes a lot of space…) Also, as I saw someelse mention in the GCHQ comments, this light grey on white isn’t great for typing comments.

Apart from my gripes about presentation, thanks for posting these results! I hope you guys keep doing more competitions like this.

Alex November 6, 2014 Reply

That’s supposed to be < / > is the double quotes.

Alex November 6, 2014 Reply

blah.. sorry for so many comments not related to the content, but now that I take a second look at it, the “< / >” isn’t really wasting space, yet there’s still something about I don’t like.

Anyhow, I didn’t know about the // operator in Python, so I did learn something valuable here! Thanks.
Kevin December 6, 2015 Reply

I’m going through old issues and had fun with this challenge. I think I’ve got a shorter correct solution than those offered (and in one line…):

print ”.join(map(lambda d,a,b,c:{1:a,2:a*2,3:a*3,4:a+b,5:b,6:b+a,7:b+2*a,8:b+3*a,9:a+c,0:”}[int(d)],list(raw_input().zfill(4)),’MCXI’,’*DLV’,’*MCX’))

Kevin December 6, 2015 Reply

Or a little shorter using a list instead of a dictionary:

print ”.join(map(lambda d,a,b,c:[”,a,a*2,a*3,a+b,b,b+a,b+2*a,b+3*a,a+c][int(d)],list(raw_input().zfill(4)),’MCXI’,’*DLV’,’*MCX’))

Kevin December 6, 2015 Reply

And (final post?), the list conversion isn’t necessary since strings are sequences:

print ”.join(map(lambda d,a,b,c:[”,a,a*2,a*3,a+b,b,b+a,b+2*a,b+3*a,a+c][int(d)],raw_input().zfill(4),’MCXI’,’*DLV’,’*MCX’))

Characters: 126

Another approach entirely

Related Posts

About The Author

Ben Everard

Add a Comment