fogus: Devolving Sigils

Devolving Sigils

Feb 26, 2009

Sigils are little symbols attached to a variable name that provide some information regarding its type or scope, or simply mark it as different from non-variables. Programmers' opinions of variable sigils leave very little middle ground; you either love them or you hate them. The quintessential language containing sigils is probably Perl, followed by BASIC, and more recently Ruby. I pick these three because they use sigils for different purposes:

BASIC ¹ sigils denote datatypes
- foo$ denotes a variable holding a string
- foo% denotes an integer
Perl sigils denote a datatype category ²
- $foo denotes a scalar type
- @foo denotes an array
- %foo denotes a hash
- &foo denotes a subroutine
Ruby sigils denote a variable’s scope ³
- $foo denotes a global variable
- @foo denotes an instance variable
- @@foo denotes a class variable

I personally like sigils — very much so. However, I tend to prefer the types of sigils used by Ruby rather than the finer-grained meaning attached to Perl and BASIC sigils (which is also the reason that I dislike Hungarian notation). I like being able to read my source and, at a glance, soak in the maximum amount of information. Sigils, when used sparingly, can provide a tremendous service. However, there is a fine line between sigils providing useful information and those akin to line-noise. My head tends to swim when looking at some Perl code due to the presence of sigils, but maybe that’s just me (probably not). Therefore, when I set out to design my own experimental language, effective sigils were high on my wish list.

First Cut

Since my sandbox programming language Ix is based on the CLIPS source base, I wholly adopted the CLIPS convention. That is, CLIPS denotes variables by prepending ? or $? to a symbol. By convention, the former was meant to denote a scalar while the latter was meant for multifield ⁴ variables, but they could both be used interchangeably ⁵. Therefore, a simple reduce function initially looked like this:

fn( reduce
  [?f $?lst]
  if(empty?($?lst)
    call(?f)
  else
    call(?f first($?lst)
         reduce(?f rest($?lst)))))

reduce(_(1 2 3 4 5) +)
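
For readers who do not know Ix, the same right fold can be sketched in Python. This is only an approximation under stated assumptions: `add` is a hypothetical stand-in for Ix's `+`, assumed here to return 0 when called with no arguments, and the function is passed first, as in the `[?f $?lst]` parameter list.

```python
def add(*args):
    """Hypothetical stand-in for Ix's +; assumed to return 0 with no arguments."""
    return sum(args)


def reduce(f, lst):
    """Right fold mirroring the shape of the Ix definition above."""
    if not lst:
        # Empty list: call f with no arguments to obtain the base value.
        return f()
    # Otherwise combine the first element with the reduction of the rest.
    return f(lst[0], reduce(f, lst[1:]))


print(reduce(add, [1, 2, 3, 4, 5]))  # 1 + (2 + (3 + (4 + (5 + 0)))) = 15
```

Note that the empty case relies on the function being callable with no arguments, just as `call(?f)` does in the Ix version.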

Not too bad, but the sigils clutter up what is effectively a succinct function. As an added disadvantage, I decided a long time ago that predicate functions in Ix should have a question mark at the end of them; therefore, in this small function the question mark has two different meanings depending on the context. But even still, I stuck with this syntax for months.

Second Cut

After writing a pile of code in the first version of Ix, I decided to add some syntactic sugar for the (call) function (see its usage above). As a result, the code above became:

fn( reduce
  [?f $?lst]
  if(empty?($?lst)
    ?f()
  else
    ?f(first($?lst)
       reduce(?f rest($?lst)))))

reduce(_(1 2 3 4 5) +)

This looked a little better than the original, but there were a couple of issues that stuck with me:

1. The $? sigil was still too noisy
2. The ?f() form is hideous, and $?f() even more so
3. The issue of differing meanings for ? still remained

Third Cut

I initially decided to live with issues #2 and #3 and instead remove the $? form altogether.

fn( reduce
  [?f ?lst]
  if(empty?(?lst)
    ?f()
  else
    ?f(first(?lst)
       reduce(?f rest(?lst)))))

reduce(_(1 2 3 4 5) +)

Better? It took me a while to learn to hate this new syntax, but eventually I did. While reducing the $? noise, it introduced a whole new problem. That is, when calling predicate functions, the pattern ?(? tended to cause mass confusion (at least for me). My mind would often fill in the second question mark even in its absence, thus turning something like symbol?(x) into symbol?(?x). Why is this a problem instead of an outright syntax error? The answer is that symbols in Ix are defined as any sequence of characters not starting with a number, and not containing a small set of delimiters. ⁶ Therefore, in the first call x is a symbol and thus the call to symbol?() always evaluates to true. It took me only a few frustrating debugging sessions to see the error of my ways.
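
The pitfall can be sketched in Python as a toy token classifier. This is not the Ix reader; the delimiter set below is an assumption (the post does not list it), and `classify` is a hypothetical helper. The point is only that a forgotten `?` yields a perfectly valid symbol rather than an error.

```python
# Assumed delimiter set; the actual Ix delimiters are not given in the post.
DELIMITERS = set(" \t\n()[]")


def is_symbol(token):
    """Symbol: non-empty, does not start with a digit, contains no delimiter."""
    return (
        bool(token)
        and not token[0].isdigit()
        and not any(ch in DELIMITERS for ch in token)
    )


def is_variable(token):
    """Third-cut variable: a name prefixed with '?'."""
    return token.startswith("?") and len(token) > 1


def classify(token):
    # Variables are checked first, since '?x' would otherwise pass as a symbol.
    if is_variable(token):
        return "variable"
    if is_symbol(token):
        return "symbol"
    return "other"


print(classify("?x"))  # variable: what was meant
print(classify("x"))   # symbol: what was typed, silently accepted
```

Because `x` on its own classifies as a symbol, a call like symbol?(x) has nothing to reject and simply answers true.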

Today

After much despairing over the seeming disparity between wishing to keep sigils and requiring the presence of symbols as defined above, I hit on a very nice compromise. That is, who’s to say that a sigil must be a non-alphabetical character (or sequence thereof)?

fn( reduce
  [F Lst]
  if(empty?(Lst)
    F()
  else
    F(first(Lst)
      reduce(F rest(Lst)))))

reduce(_(1 2 3 4 5) +)

So what happened? Simple. Variables are now identified as starting with a capital letter. Assuredly, this is nothing new in the history of programming language design, but it did solve nicely the issues above:

- Less visible noise
- Variables and symbols are clearly delineated
- F() looks much nicer
- There is now only one meaning of ?
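
As a rough illustration of the capital-letter rule, here is a minimal Python sketch. The `classify` helper is hypothetical and deliberately simplified: it ignores numbers and delimiters and looks only at the first character.

```python
def is_variable(token):
    """Current rule: a token is a variable if it starts with a capital letter."""
    return bool(token) and token[0].isupper()


def classify(token):
    """Simplified: anything that is not a variable is treated as a symbol."""
    return "variable" if is_variable(token) else "symbol"


# Tokens taken from the reduce example above.
for token in ["F", "Lst", "reduce", "empty?", "+"]:
    print(token, "->", classify(token))
```

With this rule the trailing ? in empty? never collides with a variable marker, since the two are distinguished by case alone.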

Of course, the symbol?(x) issue has now evolved into a symbol?(X) issue, but I found that occurrences of this mistake disappeared once the confusing ?(? pattern did. I think I’ve hit on the right formula for sigils in Ix. That is, I’ve reduced the granularity of their meaning to be agnostic of type and scope, while at the same time clearly separating symbols from variables.

Sigils are nice, as long as they are not abused.

-m

¹ Not all dialects use sigils in the same way.
² Perl 6 introduces something called a twigil, that is a secondary sigil denoting scope.
³ Ruby symbols are prefixed by the : character, but that is the syntax for a symbol literal (thanks Ola).
⁴ A multifield is essentially a list restricted to depth 1.
⁵ That is, until the introduction of sequence expansion in CLIPS v6.
⁶ Example of valid symbols: foo, b12, +, and jfkdashnsadjhio1231231123123@#@#!$#%#!#@@$#__+++_____+++++

9 Comments

Ola Bini

Nice breakup.

A small point on note #3. The : for a symbol is actually just the literal syntax, just as double quotes are literal syntax for a string. The symbol itself doesn’t have a : in it.

Feb 26th, 2009
fogus

Thanks for the correction. The note has been modified. -m

Feb 26th, 2009
Matt S Trout
Actually, perl sigils don’t denote variable type – they denote conjugation – $ is ‘the’, @ is ‘these’, % is ‘map of’ or so – variable type is denoted via [] or {}. You can see this with:
```
  my $foo = 'foo';
  my @foo = ('zero', 'one', 'two');
  my $second_foo = $foo[1];
  my @first_and_third_foos = @foo[0,2];
  my %foo = (key1 => 'value1', key2 => 'value2', key3 => 'value3');
  my $key2_foo = $foo{key2};
  my ($key1_foo, $key3_foo) = @foo{'key1','key3'};
```
so looking at the sigil when skimming perl code tells you what you’re going to -get- rather than what you’re operating on, pretty much.

This is, admittedly, really confusing until you get used to it, but once you -are- used to it it can be an extremely useful tool for absorbing information while skimming code.

You’re still perfectly entitled to hate it, of course, but it’s an interesting concept and I figure you might prefer to hate what’s -actually- going on rather than what you thought was going on :)
Feb 26th, 2009
Matt S Trout

… apparently I lose at formatting code for this thing. Any chance you could persuade my previous comment to work properly before you approve it? :( (sorry)

Feb 26th, 2009
fogus

@Matt

Thanks for the post, honestly I have never seen Perl sigils explained in that way, but my experience with it has been brief. While I can see your point, I am not sure that I can see the distinction between my current understanding (datatype category) and your approach. In fact, when reading Perl code I was essentially doing what you describe. Don’t get me wrong, I respect Perl and admire the people I’ve worked with who can speak it fluently — but my mind has trouble digesting it in the same way that it can with Clojure, Python, or even Scala. -m

Feb 26th, 2009
SJS

The problem of using alphabetical characters as sigils introduces exactly the same problem as Hungarian Notation — it confuses the reader, who has no doubt spent years building up equivalence between upper case and lower case letters (especially if they spend any time on IRC).

Personally, I find Perl much easier to read than Python, Clojure, or Scala (and CLISP, ML, OCaML, too), and it’s not because I use Perl every day (I don’t). That hasn’t kept the local Python evangelists from calling me a liar and a degenerate, alas.

Feb 26th, 2009
fogus
@SJS

It’s strange that someone would call you a degenerate because you happen to find Perl more readable than Python. Topics like sigils tend to evoke very subjective responses. To me, the capital letter stands out very clearly from the lower case letter, even after 10 years of IRC usage. ;)

Hungarian notation has a few issues as far as I can see:
1. Any change in a variable’s type, intention, and/or scope always requires a set of code changes.
2. In the case of type, it further increases the annotation requirements of already annotation-heavy languages (i.e. C, C++, Java)
3. Many current IDEs make it irrelevant
4. It is very heavy compared to the benefit gained by using it.
5. It’s bolted onto the host language and therefore not even supported intrinsically.
But, Charles Simonyi is a million times smarter than me, so maybe I’m wrong, but Hungarian Notation has a twinge of OCD to it.

-m
Feb 27th, 2009
a

Not all dialogues use sigils in the same way. ↩

I think you mean “dialects”?

Feb 28th, 2009
Joey

“But, Charles Simonyi is a million times smarter than me, so maybe I’m wrong,”

There is no one so smart that they can’t be wrong.

Dec 29th, 2009

Send More Paramedics