Wednesday, November 3, 2010

Research Log for 2010/11/03

Here's what I got done this evening:

Next TODOs:
  • Learn how to use git better so I'm not so "special" with it.
  • Finish implementation of time-series simulation using cross-correlation coefficient (Galati et al. 1995 paper)

Frustration and more frustration

It's taking a great deal of personal restraint to keep from setting my laptop on fire at the moment.

After many months of *ahem* time off from my Ph.D. work, I've been getting after it over the last month. For the past couple of weeks, I've been trying to figure out why some data I generated for my prospectus looked very odd. Frustrated, I did the runs again tonight and everything turned out fine and looked as expected, which was itself unexpected. Flabbergasted at the wasted time, I decided to poke around and see what could possibly have changed to fix it; I don't like miracles when it comes to software. Here's the relevant SVN log (a log of changes, for the unfamiliar):


------------------------------------------------------------------------
r306 | rmay | 2010-03-21 13:23:45 -0500 (Sun, 21 Mar 2010) | 1 line


Fix missing factor when calculating unattenuated power.
------------------------------------------------------------------------
r307 | rmay | 2010-03-21 14:55:43 -0500 (Sun, 21 Mar 2010) | 1 line


Fix jarbled output fields due to bad ordering of dimensions. (Due to me ignoring what was done in arps_reformat.)  Also work around numpy bug when trying to save attributes in pupyere with a single character.
------------------------------------------------------------------------
r308 | rmay | 2010-03-22 13:29:58 -0500 (Mon, 22 Mar 2010) | 1 line


Change units back now that numpy has been fixed.
------------------------------------------------------------------------
r309 | rmay | 2010-03-26 16:44:45 -0500 (Fri, 26 Mar 2010) | 1 line


Fix bug generation of 2-moment interpolation coordinates. Also improve interpolation by using points logarithmically (instead of linearly) distributed for number concentration.  Also fix problem where we divide fall speed by 0, results in bad values for velocity (and phase, breaking time series generation).
------------------------------------------------------------------------
r310 | rmay | 2010-03-26 16:45:28 -0500 (Fri, 26 Mar 2010) | 1 line


Make commas_reformat copy out the model's reflectivity by default.  It's a useful diagnostic.

r309 is the change that I'm pretty sure fixed the bug. Now, since I'm such a proponent of reproducible research and good scientific software practices, I went to the effort of storing the version of the code that generated each data file inside the file itself. Here's what I see when I look at one of those files:

:VersionNumber = "0.8.dev306" ;

That 306 is the SVN revision number. Bonus points if you spot what's wrong there. If not: it means I did all my nice data runs for my General Exam without actually using the most recent (and "correct") version of the code, since the fix only landed in r309. Oops.
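The check itself is mechanical enough to script. Below is a minimal Python sketch, assuming version strings of the form `0.8.devNNN` where `NNN` is the SVN revision (as in the attribute above). The helper names `svn_revision` and `has_fix` are hypothetical, not part of any library, and the simple `>=` comparison assumes all the work happens on a single trunk, where revision numbers only go up.

```python
import re


def svn_revision(version: str) -> int:
    """Extract the SVN revision from a version string like '0.8.dev306'.

    Assumes (hypothetically) that the revision is the number after '.dev'.
    """
    match = re.search(r"\.dev(\d+)$", version)
    if match is None:
        raise ValueError(f"no dev revision in {version!r}")
    return int(match.group(1))


def has_fix(version: str, fix_revision: int) -> bool:
    """True if the code that wrote the file already included the fix commit."""
    return svn_revision(version) >= fix_revision


# The data files were written by r306, but the bug fix landed in r309.
print(has_fix("0.8.dev306", 309))  # False
```

Run over every file in a data directory, a check like this would flag exactly which runs need to be redone.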

While I'm pretty torqued off at myself for wasting quite a bit of time chasing a ghost (and for somehow not installing my newest code before running), not to mention using bad data files for some recent plots, hopefully this provides a good use case and motivation for some good scientific software practices. Namely, using version control and putting version information in my data files at least allowed me to diagnose what went wrong. Without this information, I would have simply had to chalk up the fact that my code now produces the right answer to "magic"... or the code gnomes. If you're not already using version control and putting sufficient information into your data files, you need to start right now.

No, I'm serious. Do. It. Now. I'd recommend either:
This event reminds me that the notes I take on my research aren't worth anything. If I'd had notes, maybe I would have read them and seen something about an important bug fix I made 7 months ago. To rectify this, I'm going to start using this blog to log my research and take notes. If nothing else, it should be entertaining to go back and read when I'm done, to laugh at all the times things blew up. Or cry.

Sunday, August 23, 2009

Why linux rocks

I recognize that Linux has its flaws compared with the other operating systems, but I think I found something that is easier to do on a default Linux install. I was flying home and had a copy of a Science 2.0 panel video on my laptop. (Good video; you can see it here: http://www.vimeo.com/6077540) What matters is how I ended up 'watching' it. Since it was just a panel discussion, the video itself was rather pointless, and I would have had my whole laptop running, draining its battery, just so I could listen. Thanks to ffmpeg, which gets installed as a dependency of media programs, I was able to do this:

ffmpeg -i open-science.flv -f mp3 -ab 131072 -vn open-science.mp3
and extract the audio from the video. From my previous experience on Windows, doing this would have been either impossible or taken hours of hunting for the right utility. It just struck me how simple it was to already have the right tool installed by default on Linux. As a result, I was able to take a video and listen to it on my mp3 player. That's just cool.
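To do the same thing for a whole folder of videos, a small Python wrapper could build that identical ffmpeg command for each file. This is a sketch only: the function names are hypothetical, and it assumes ffmpeg (built with an MP3 encoder) is on the PATH. The flags are exactly the ones from the command above.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ffmpeg_audio_cmd(src: Path, bitrate: int = 131072) -> list[str]:
    """Build the audio-extraction command above for one video file."""
    dst = src.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", str(src),       # input video
        "-f", "mp3",          # output format
        "-ab", str(bitrate),  # audio bitrate in bits/s (131072 = 128 * 1024)
        "-vn",                # drop the video stream, keep only audio
        str(dst),
    ]


def extract_all(folder: Path) -> None:
    """Run the command for every .flv file in a folder, stopping on errors."""
    for src in sorted(folder.glob("*.flv")):
        subprocess.run(ffmpeg_audio_cmd(src), check=True)
```

Building the argument list separately from running it makes it easy to print or check the command before committing to a batch of conversions.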

Wednesday, September 24, 2008

What kind of computing does science really need?

Here's a talk by Greg Wilson of the University of Toronto that pretty well captures what I think about how computing in the sciences really needs to be done, rather than focusing on ever-bigger supercomputers.

Other options (Low Res or podcast) available here.

Tuesday, July 15, 2008

Organic Code Development

I always find it amusing how organically some of my best coding happens. That is to say, there's usually no true overarching design, just an end goal. Take, for instance, my latest creation. I've been working on adding support for wind barbs to the Python plotting library matplotlib. I initially tried to design it well: think about where the natural breaks in the code would be and then just build those smaller pieces. This worked for some of the simple things, but with the complexities of figuring out transformations and coordinate systems, working top down got to be a real bear. Perhaps things like stub functions and other mock objects might have helped here, but that's beyond my software development skill set at the moment.


So out went that approach, and I started with the simple goal of getting a single barb to appear on the plot, in the most direct and manual way possible. Once that was achieved, I added a couple more to the test and made sure that they scaled correctly and were positioned properly. From there, I did some refactoring: things were broken into more logical groups, and manually hard-coded pieces were replaced with more flexible (and, arguably, simpler) code. Here's the (more or less) final result.



I now have 320 lines of commented and documented code ("borrowing" a lot of the documentation and input processing from a similar matplotlib function, quiver). I'm just amused that this kind of procedure always seems to work better for me. I was stuck for days trying to get this working, trying to figure out where to begin. It was so much easier to start with a simple, small goal and build on it. Then again, this might work better for me because I'm a scientist who codes rather than a software engineer/developer. It could also be because every time I do something like this, I'm learning a new API or even an entirely new library. Maybe if I finally developed deep expertise in something, I might be able to think at a slightly higher level... nah. :)
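The "get one barb right first" step mostly comes down to a bit of arithmetic: deciding how many flags, full barbs, and half barbs a given speed needs. Here is a minimal sketch of that piece, assuming the common convention of flag = 50, full barb = 10, half barb = 5 (in knots), with speeds rounded to the nearest 5. The function name is hypothetical and this is not the matplotlib code itself; this sketch rounds exact halfway values up, and other implementations may break ties differently.

```python
def barb_elements(speed: float, half: int = 5, full: int = 10, flag: int = 50):
    """Split a wind speed into (number of flags, number of full barbs, half barb?)."""
    if speed < 0:
        raise ValueError("speed must be non-negative")
    # Round to the nearest half-barb increment (halfway values round up here).
    rounded = int(speed / half + 0.5) * half
    flags, rest = divmod(rounded, flag)
    fulls, rest = divmod(rest, full)
    # After rounding, whatever is left is either 0 or exactly one half barb.
    return flags, fulls, rest >= half


print(barb_elements(65))  # (1, 1, True): one flag, one full barb, one half barb
```

Once this piece is correct for one barb, the remaining work (as described above) is mostly placing and scaling the drawn shapes.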


I think I've really only ever succeeded in designing and then implementing one project, my radar emulator. While it turned out well, it's not a design I'm in love with anymore (though I'm stuck with it at this point). Then again, I started it in 2003, when I was far less experienced. Another project I've been kicking around in my head forever, a simple data visualization program, is persistently stuck in the design phase. Granted, its predecessor grew organically and became unwieldy, but that was C++, where refactoring is a nightmare. But it did have the benefit of actually existing in code and being (somewhat) useful, which is more than I can say for an idea floating around in my head.

Anyhow, less rambling, more code! On to Skew-T's!

Links for the day (a.k.a. other people's stuff)

A couple of other people's posts that resonate with me:
http://www.johndcook.com/blog/2008/07/15/getting-to-the-bottom-of-things/

I think this one hits the nail on the head about multitasking and information overload. It also explains why the days when I manage to ignore Google Reader and Thunderbird for extended periods are my most productive, even though the total time I spend on either isn't any less. I would probably see big productivity gains if I limited how often I check them. I've noticed this summer just how hard it is to get back into a groove on something once you've fallen out of it, whether that means stepping away from intensive coding for 5 minutes or taking a multi-day (week?) break from a topic. Our brains are like computers in a way; there's only so much you can keep readily accessible.

http://bitecode.co.uk/2008/07/my-views-on-python/

There's not much more I can add here; Python just rocks. :) I can't believe that once upon a time I did analysis and even generated images using only C (and maybe *shudder* Excel).

Tuesday, July 1, 2008

Programmatically saving Matlab figures

This problem has been bugging me for a long time, and I've finally nailed it. Say you're doing a whole bunch of Matlab processing and, instead of saving each figure by hand using the GUI (which gets tedious after more than a few figures), you'd like your code to handle these steps. (Using Python and matplotlib, of course, this is simple.) Matlab does have the saveas function for this, but it has two drawbacks for me:

  1. When I save figures, I usually maximize them so that the spacing looks better and I get higher-resolution figures.
  2. Regardless of the maximization, saveas resizes everything and you (usually) get a bad mix of spacing and font sizes. Many of my figures end up with labels that look like crap, with tick labels overlapping each other and other issues like this. At least, that's my experience using Matlab on Linux. At any rate, saveas does not normally give me figures I'd want to hand in with a homework assignment, let alone use in a presentation or a publication.

I have found a couple of nuggets that let me programmatically create graphics just like the ones I see on screen. The first nugget, which solves the second problem (resizing), is as follows:

set(fig, 'PaperPositionMode', 'auto')

where fig is your figure handle. You can also just pass gcf. This wonderful little command, according to the Matlab documentation (where this was buried): "...ensures that the printed version is the same size as the onscreen version. With PaperPositionMode set to auto MATLAB does not resize the figure to fit the current value of the PaperPosition." Fantastic.

The second nugget is a way to set the figure size to fill the screen (I found this one on the MathWorks File Exchange through a Google search):

set(fig,'units','normalized','outerposition',[0 0 1 1]);

You may or may not want to save and restore the original value of units around this call. At any rate, with these two commands, I can now create figures with my code that look how I want them without requiring any intervention on my part using the GUI. That's wonderful for making my workflow better.
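For comparison, here is roughly what the same hands-off workflow can look like in matplotlib. This is a sketch under my own assumptions: each figure's size is fixed in inches when it is created, so the saved file comes out at exactly figsize × dpi pixels with nothing resized behind your back. The function names and the 10 × 7.5 inch size are illustrative choices, not anything from the Matlab commands above.

```python
from pathlib import Path

from matplotlib.figure import Figure


def make_figure(title: str) -> Figure:
    """Create a figure whose size (in inches) is fixed up front."""
    fig = Figure(figsize=(10, 7.5))
    ax = fig.add_subplot(111)
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title(title)
    return fig


def save_figures(figs: dict, outdir: Path, dpi: int = 100) -> list:
    """Save each figure as <name>.png; output is figsize * dpi pixels."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figs.items():
        path = outdir / f"{name}.png"
        fig.savefig(path, dpi=dpi)
        paths.append(path)
    return paths
```

Using `Figure` directly (rather than pyplot) also means no windows pop up, which suits batch runs that generate many figures.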