Data Driving

Monday, February 20, 2017

Hats off to Hopper!

As I learn more about the history of computing, I'm realizing that many, and maybe most, of the pivotal contributions have come from women.

For example, Ada Lovelace is credited with writing the first algorithm intended for a machine during her work on Babbage's Analytical Engine.

Today, I learned a little more about Grace Hopper. (The image is from Wikipedia.) I now realize that, as well as helping to develop COBOL (one of the first languages I remember hearing about), Rear Admiral Hopper wrote the first compiler.

I consider myself to be a semi-competent and fairly experienced scientific programmer, and I have a hand-waving knowledge of many aspects of 'computer-stuff'. However, I have not the faintest inkling of how I would go about making a compiler.

Hats off to Hopper!

Thursday, February 9, 2017

MATLAB, St. Andrews, and the evolution of culture

Here are two facts:

I grew up near St Andrews in Scotland
I use MATLAB a lot

Any blog that combines these points must be interesting, so I was excited to learn that biologists at the University of St. Andrews are running a MATLAB programming contest in order to measure how a culture evolves.

Saturday, January 28, 2017

Hidden Figures - a Hollywood film where numerical methods come to the rescue

I've just finished watching "Hidden Figures", a current Hollywood film that focuses on Katherine Johnson, Dorothy Vaughan, and Mary Jackson, and their important roles in the early stages of NASA's attempts at manned spaceflight.

SPOILER ALERT BELOW

Hidden Figures has been nominated for 3 Oscars (including Best Picture) and I enjoyed it a lot. It also includes many powerful messages, not least the important role that African-American women played in 1960s US science.

Our community needs to do more to increase recognition of non-white non-male scientists and mathematicians. A single film can't make up for all of our prior mistakes but, as my PhD advisor used to say, "at least the progress vector is now pointing in the right direction".

Hidden Figures also includes some gems relating to scientific computing. There can't be many films where the trumpets sound and Euler's method rides over the hill (albeit in small steps) to save the day. I wasn't quite sure but I thought one of the screenshots in that portion of the film was of relevant pages from Numerical Recipes.
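For anyone who hasn't met Euler's method, here is a minimal Python sketch of the idea. It is not the calculation from the film. The test problem (dy/dt = -y with y(0) = 1) and the function names are my own choices for illustration. The method marches forward in small steps, using the current slope to estimate the next value, and the answer gets closer to the exact solution as the steps shrink.

```python
import math


def euler(f, t0, y0, t_end, n_steps):
    """Approximate y(t_end) for dy/dt = f(t, y), y(t0) = y0, using forward Euler."""
    h = (t_end - t0) / n_steps  # step size
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)  # follow the current slope for one small step
        t += h
    return y


def decay(t, y):
    """Illustrative test problem: dy/dt = -y, with exact solution y = exp(-t)."""
    return -y


exact = math.exp(-1.0)
for n in (10, 100, 1000):
    approx = euler(decay, 0.0, 1.0, 1.0, n)
    print(f"{n:5d} steps: y(1) ~ {approx:.6f}, error = {abs(approx - exact):.6f}")
```

Each ten-fold increase in the number of steps cuts the error by roughly a factor of ten, which is the first-order behavior expected from Euler's method.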

I also enjoyed watching Dorothy Vaughan beat the IBM programmers at their own game after she taught herself FORTRAN from a library book. It must have been quite a ride.

All in all, a must-see for anyone who likes the history of scientific computing.

It's also a very good film. The audience clapped at the end at our showing, and judging from the pre-film conversations I overheard, I doubt if many were regular readers of this type of blog :-)

Tuesday, January 10, 2017

Souza's law and the difficulty of providing quality technical support

I spent a year of my postdoc writing SLControl, a fairly sophisticated piece of software for acquiring and analyzing data to do with muscle mechanics. It implemented real-time control loops with 100 µs latencies and allowed us to perform completely new types of experiments.

Writing SLControl was a good investment of time; I, or somebody in my lab, has used it almost every day since 2001. However, providing high quality support has been a challenge. The software does complex things and it's normally connected to even more complicated experimental apparatus. Trouble-shooting requires a lot of experience.

Another problem, I discovered, is that people sometimes want to use SLControl for things that it wasn't originally intended for. I learned this early in the development process when somebody called me up to ask how to fit a 3-parameter exponential to their experimental data. Their measurements had nothing to do with muscle mechanics (SLControl's niche) - the data was just some list of numbers from a fairly random experiment. Maybe it was to do with flow in a river or something. The person requesting help had googled "3 parameter exponential", found a match on the SLControl website, and decided to call me for help.

That, in a roundabout way, gets us to Souza's law. Every time you include a text field on a website, somebody will eventually use it to ask for tech support :-)

Read more on the brilliant MathWorks blog.

http://blogs.mathworks.com/community/2017/01/10/tags-tech-support-and-souzas-law/

Friday, October 14, 2016

How to read things without opening them

Brent Seales is a professor of computer science at the University of Kentucky. He drives several very cool research programs in areas ranging from robotic surgery to advanced image processing.

Recently, the Economist (one of my favorite weekly reads) featured some of his work. Here's a link.

How to read an old scroll without opening it

Tuesday, October 11, 2016

iCite and W2P ratios - new ways of quantifying scientific productivity

What's the best way of ranking scientists based on their productivity?

There are lots of options including:

number of publications
number of citations
impact factor of the journals the publications are in

All have strengths and weaknesses.

Recently, leaders at my academic institution have started to talk publicly about h-indices. An author's h-index is the largest number (x) such that he/she has x publications that have each been cited at least x times. That's an interesting idea because it rewards impact; people who publish lots of papers that nobody cites have lower h-indices than people who publish a few manuscripts that are very influential. However, the h-index has the drawback that it grows with time (because people publish more papers and they have more time to be cited). This means that it favors seasoned scientists who have been productive for a long time. It's not a great way of identifying a rising star.
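Here's a minimal Python sketch of the h-index calculation described above. The function name and the citation counts are made up for illustration; they are not real data for anyone.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts (invented for this example).
many_rarely_cited = [1, 1, 0, 2, 1, 0, 1, 1, 0, 1]
few_influential = [50, 40, 30]

print(h_index(many_rarely_cited))
print(h_index(few_influential))
```

With these invented numbers, the author with ten rarely cited papers ends up with a lower h-index than the author with three well-cited ones.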

I don't think that there will ever be a single perfect metric that scientists can use to quantify productivity but I was excited to see that NIH is supporting the Relative Citation Ratio with their new iCite tool.

The Relative Citation Ratio (RCR) is an article level metric that quantifies scientific influence. To quote a help box from iCite, "It is calculated as the cites/year of each paper, normalized to the citations per year received by NIH-funded papers in the same field."

The Weighted RCR is the sum of the individual RCRs for a group of papers.

This creates an interesting opportunity. If you calculate the ratio of the Weighted RCR to Total Publications (I'll call it W2P) you get a single value that defines the influence of a collection of papers.

If your W2P is greater than 1, your papers are, on average, more influential than those of your NIH-funded colleagues. If it's less than 1, your papers are being cited less often than average.

W2P doesn't scale with the number of papers you publish so it shouldn't depend on the length of your career.
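As a rough sketch, the arithmetic looks like this in Python. The function names and the RCR values are invented for illustration, and this is not how iCite itself is implemented. Note that W2P, as defined above, is simply the mean RCR of the collection.

```python
def weighted_rcr(rcrs):
    """Weighted RCR: the sum of the individual RCRs for a group of papers."""
    return sum(rcrs)


def w2p(rcrs):
    """Ratio of Weighted RCR to total publications (the mean RCR)."""
    if not rcrs:
        raise ValueError("need at least one paper")
    return weighted_rcr(rcrs) / len(rcrs)


# Hypothetical RCR values for four papers (invented for this example).
rcrs = [0.5, 1.0, 1.5, 3.0]

print(weighted_rcr(rcrs))
print(w2p(rcrs))

# Publishing twice as many papers with the same mix of RCRs doubles the
# Weighted RCR but leaves W2P unchanged.
print(weighted_rcr(rcrs * 2))
print(w2p(rcrs * 2))
```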

Only time will tell how scientists use these new metrics but I am going to make a bold (and probably rash) prediction. Since NIH is supporting RCRs, I think that they will take over from h-indices as the most commonly used measure of productivity in US biomedical science.

For the record, here are my current stats:

46 publications
h-index = 22
Weighted RCR = 59.98
W2P = 1.34

and for the truly nerdy:

Erdős number = 8

Monday, September 5, 2016

The Pentium Bug

I remember the Pentium Bug in 1994 (I had just started my PhD) but didn't know enough about scientific computing to understand the significance.

Here's a fascinating blog post from Cleve Moler describing what happened.