What would a cat picture be without text?

If you are sitting in front of a computer reading some text, you are enjoying both the art of typography and the mathematics of curves. It is a surprisingly complex computation to take the letters of a font and place them on a screen. Donald Knuth, one of the most notable computer science professors at Stanford, made his name in part by creating a system (TeX) that does just that for books. Charles Simonyi made a notable fortune by building one of the first editors that showed on screen exactly what would be printed as you typed (work that led to Microsoft Word, the cornerstone of Microsoft Office). John Warnock, the founder of Adobe, started out by laying down the basic mathematical foundations for describing letters in a “device independent” manner, one that would work on a screen or on paper. But credit needs to start with Pierre Bézier, who developed the curves that bear his name to design car bodies at Renault.
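To get a feel for the math involved, here is a minimal sketch in Python of how a single cubic Bézier curve, the basic building block of most font outlines, is evaluated. The control points here are made up purely for illustration.

```python
# A minimal sketch: evaluating one cubic Bezier curve, the kind of curve
# that font outlines are built from. The control points are made up.
def cubic_bezier(p0, p1, p2, p3, t):
    """Return the (x, y) point on the curve at parameter t, 0 <= t <= 1."""
    def blend(a, b, c, d):
        return ((1 - t) ** 3 * a + 3 * (1 - t) ** 2 * t * b
                + 3 * (1 - t) * t ** 2 * c + t ** 3 * d)
    return blend(p0[0], p1[0], p2[0], p3[0]), blend(p0[1], p1[1], p2[1], p3[1])

# A renderer does something like this (far more cleverly) for every curved
# segment of every letter it puts on the screen.
outline = [cubic_bezier((0, 0), (0, 100), (100, 100), (100, 0), i / 10)
           for i in range(11)]
print(outline)
```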

So, when you see that bold caption on a cat picture, give a thought to the complex bit of math that makes it possible.


Sidebar: Programming as competition.

I was recently embroiled in a Slashdot discussion over the results of the 2013 ACM International Collegiate Programming Contest. No US university placed in the top ten, and a lot of notable schools were missing. I made a quick comment about how that could actually be a good thing, since it may show a shift toward the skills that are really useful in programming as a career.

Yeah, that went over well. It wasn’t as if my comment was well thought out, or even well written; it was a fairly random thought. It did generate some discussion, along with the usual misunderstandings (and a few flames) that are part and parcel of internet discussion. And, as these things go, it could have been worse. Of course, my original post led to some standard responses. Oh, it’s a case of sour grapes. Well, if it is, sour grapes are pretty good; my career is fine. Better than some, worse than others. But all in all, quite good.

But these contests are breeding grounds for ego-driven programming at its worst. Team members came out of the woodwork, talking about how awesomely they are doing at Google (or wherever). Google used to hire a lot from this pool. It turns out that contest performance does little to predict whether somebody will be a good performer on the job (link), and Google has changed its entire hiring process. Microsoft is making similar changes (but not nearly fast enough).

If you looked at these problems, you’d be scared right off. These are the kind of word problems people dread. Oh, and they are the kind of problems that are made up and rarely occur in the real world. Solving them does require skill; it’s just a skill that doesn’t always translate to success in the real world. Or in academics, for that matter. Mathematics departments have de-emphasized problem-solving contests because they hurt diversity and discouraged people who were actually more likely to succeed at the highest levels of academic work from pursuing degrees.

There’s no room for people who want to turn others away from a discipline, excluding them in order to feel better about their own skills. The fact is that most of those who advocate strongly for these kinds of contests are threatened by the notion that programming is a career accessible to many people when taught correctly. It is much easier to dismiss others as not being “real programmers”.


Fourier Transforms FTW

Joseph Fourier was a fairly prolific mathematician and scientist. He gets credit for figuring out that our atmosphere somehow helps keep in heat (the greenhouse effect). But, for us, it’s the gem below that we are interested in.

Fourier Transform, standard form.
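One standard way to write it takes a function f of time (or space) and re-expresses it as a function of frequency:

$$\hat{f}(\xi) \;=\; \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx$$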

This is the Fourier Transform. To be fair, that GIF of a dancing dog uses a different technique, one that doesn’t depend on Fourier’s discovery. But that JPEG you took of your cat does. And that MPEG of the cute baby video. Oh, and the MP3 or AAC files of your favorite songs as well.

If we used a naïve way of storing picture information, we would likely store 24 bits of information for each dot in a picture (or, if we are really fancy, 32 bits). For good quality pictures, that adds up. Quickly. To be fair, this naïve approach is still used all the time. So-called RAW images from a good camera do essentially that (with a few of the same tricks GIF uses to shave the size down). Professional photographers use gigabytes of storage for photo shoots. Making movies digitally? Terabytes a day. Thankfully, flash is cheap, and hard drives are too, by movie standards. But moving that much stuff around is still a problem. If every lolcat took up megabytes of space, the internet would be crushed under the weight of the laughs. Video? Forget it. What is needed is a trick to reduce the size of the image or images.
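To see how quickly it adds up, here is a quick back-of-the-envelope sketch; the 12-megapixel and 1080p figures are just example numbers, not any particular camera.

```python
# Rough, uncompressed sizes; the pixel counts here are example figures only.
bytes_per_pixel = 3                       # 24 bits of color per dot
photo = 12_000_000 * bytes_per_pixel      # a 12-megapixel photo
frame = 1920 * 1080 * bytes_per_pixel     # one frame of 1080p video
print(f"photo: {photo / 1_000_000:.0f} MB")              # ~36 MB
print(f"video: {frame * 30 / 1_000_000:.0f} MB/second")  # ~187 MB at 30 fps
```

Multiply that by a library of photos, or by the length of a feature film, and the need for that trick becomes obvious.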

And Fourier transforms hold the key. Basically, Fourier figured out that you can take (almost) any function and represent it as an infinite series of sine waves. This helped with some interesting math problems, but it turned out to be super useful in the age of electronics, since creating and adding up sine waves is something circuits are really good at. Any radio transmission is essentially just sine waves in the air. So we can thank Fourier for WiFi (and a groundbreaking actress as well, of course; more later).
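One common way to write the series, for a signal that repeats with period T, looks like this; each term is one sine or cosine wave, and the coefficients a_n and b_n say how strongly it contributes:

$$f(x) \;=\; \frac{a_0}{2} \;+\; \sum_{n=1}^{\infty}\left(a_n \cos\frac{2\pi n x}{T} \;+\; b_n \sin\frac{2\pi n x}{T}\right)$$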

The trick is that at a certain point, adding more sine waves doesn’t make a noticeable difference to the signal. So you keep the parameters of the waves that matter, and store those instead. Sum them back up and you get a close approximation of the original signal.
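Here is a toy sketch of that idea in Python, using NumPy’s FFT on a made-up one-dimensional signal. Real image and audio codecs use related transforms and a great deal more cleverness; this just shows the keep-the-strong-waves trick.

```python
import numpy as np

# A made-up signal: a mix of a slow and a fast sine wave.
t = np.linspace(0, 1, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

# Express the signal as a sum of waves...
coeffs = np.fft.rfft(signal)

# ...keep only the 10 strongest waves and throw the rest away...
weakest = np.argsort(np.abs(coeffs))[:-10]
coeffs[weakest] = 0

# ...then sum the kept waves back up: a close approximation of the original.
approx = np.fft.irfft(coeffs, n=len(signal))
print("worst error:", np.max(np.abs(signal - approx)))
```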

As it turns out, you can treat a picture as if it were a mathematical function, apply these techniques, and bam: pictures are a tenth the size they would otherwise be. If you can tolerate some loss of quality, you can get them to a twentieth of the size. So pictures can now be measured in kilobytes.

You can do the same with music and movies. Not everything is amenable to a Fourier transform, but stunningly, a mathematical discovery some two centuries old drives the web of pictures, movies and music. And lest you despair at the triviality of it all, the same technique is at the heart of MRI and CAT scans.


Turing’s Legacy: Cat Pictures?

Okay, we started out in the 1940s, with a set of mathematical concepts that led to the foundations of computer science. We moved from the transistor and the bit up to computers with devices. We noted how bits can represent information (of any sort), how CPUs are built to calculate and operate on those bits, how a CPU uses things like memory to load and store those bits, and how devices bring in new bits and present the results of calculations to users via a display, a printout, or an upload to another computer.

Basically, the modern computer and the internet are just ways of shoving bits around. Lots of them. And as Turing and others discovered, once you have bits and machines that calculate with them, you can manipulate information itself.

Thanks to the internet, I was exposed to this quote (paraphrased below):

What would somebody from the 80s be truly surprised by?

That I have a device smaller than a Walkman that allows me to access the largest library of information in the world. And I use it mainly to look at pictures of cats.

One could look at this and despair. But, thankfully, the modern computing revolution drives a lot of good things, and not just lolcats. And it is in the nature of modern computing that information is just information. For a computer, the bits used to research disease are no different from the bits that make up a cat picture.

It is we humans who determine the value of the bits we see. And here, those sneaky mathematicians managed to make an impact without even knowing it. Next time, let’s see how a French mathematician born in 1768 laid the foundation for the modern lolcat. Oh, and we will tip our hat to another mathematician further down the way.


Ins and Outs: The IO Subsystem

What people think of as a computer is really just the border of a complex machine. The keyboard, mouse, display, printers, network, and so on sit at the edges of the machine. They are the gateways for getting and displaying information.

To handle the amazing number of devices that can be attached to a computer, computers come with an input/output (IO) subsystem. If you’ve ever looked at a computer specification, you’ll see things like PCI-Express, SATA, USB, FireWire, Thunderbolt and more. This is just a dizzying array of standardized ways of getting bits from devices into the computer’s memory hierarchy.

And these standard means of moving bits are complex. The USB (Universal Serial Bus) standard is thousands of pages long, covering everything from the physical layout of the connector (which you never seem to get right on the first try) to how USB devices send and receive information (the protocol).

Each protocol has its own advantages and disadvantages, and newer, better ones come out all the time. But in the end, it’s all about moving bits from one place to another. Move bits to the network device, and you’ve uploaded a video to YouTube. Send bits to your printer, and that photo you liked is printed.

And the range of devices attached to computers is staggering. The engine of your car. The power grid. Modern manufacturing robots. The cash registers at your store, the card reader you pay with, the scanners used to track inventory. The list goes on and on.

Every time you press a key on the keyboard, a complex set of interactions occurs just to get that character into that web page or Word document.

In a real sense, there is no one computer. Any useful computer is a collection of parts, each with its own computing power. An affordable inkjet printer has, and uses, more computing power than a full-blown computer of twenty years ago.

The complexity of computing is made affordable by scale. The cost of developing devices is large, and it is only by selling devices in huge volumes that the economics of computing make sense. Your laptop or desktop at home is the product of billions of dollars of yearly research and development. The modern internet, likewise, runs on truly staggering infrastructure, with its own billion-dollar price tags.

If you’ve ever plugged in a device only to be greeted with a “Device Driver not Found” message, it may help to think about the sheer number of devices that exist. Modern operating systems ship with thousands of special programs (drivers) whose only job is to manage the communication between the computer and a device. It’s a complex part of any operating system.

It is hard to express to a layperson just how complex the whole system is. To get an idea, let’s look at it through the lens of a common internet trope: the lolcat.


Memory and the Memory Management Unit

A critical part of every computer is memory. If you go to buy a PC, you will often see how much main memory (4 to 8 gigabytes is the norm) and hard drive space the machine has. Main memory is implemented using Dynamic Random Access Memory (DRAM), a set of chips on a small board that plugs into the computer. A hard drive is a larger device that connects over a standard interface.

These components form the bottom of what is called the memory hierarchy of a machine. In general, this hierarchy runs from fastest access (CPU registers) to slowest (the hard drive). But as the memory gets slower, it also gets bigger. A single register holds only a handful of bytes (a few dozen at most). Level 1 caches hold about 32 kilobytes of information. Level 2 caches are bigger, holding two to eight megabytes. Main memory runs from four to sixteen gigabytes on most machines. And hard drives store one to three terabytes.

But, at each step, speed of access goes down. Accessing information on a hard drive is millions of times slower than hitting the L1 cache. A hard drive is tens of thousands of times slower than main memory, and main memory is roughly a hundred times slower than the fastest caches.
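Rough orders of magnitude make the gap concrete. The figures below are ballpark numbers for illustration, not measurements from any specific machine.

```python
# Ballpark access times in nanoseconds; rough orders of magnitude only.
latency_ns = {
    "CPU register": 0.5,
    "L1 cache": 1,
    "L2 cache": 5,
    "main memory": 100,
    "hard drive": 10_000_000,   # ~10 milliseconds
}
for level, ns in latency_ns.items():
    print(f"{level:12s} ~{ns / latency_ns['L1 cache']:>12,.1f}x an L1 access")
```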

One of the main functions of any operating system is to hide this latency (the time it takes to access information) as best it can. But if you’ve ever booted up your machine and started to work, only to hear the hard drive spin while you wait, you have felt first-hand how long loading information from a hard drive can take.

At the heart of managing memory is the Memory Management Unit (MMU). This part of every CPU is responsible for moving information between the CPU and main memory. Before information can be used, it must be available in main memory. Getting information off a hard drive involves a different part of the system (the IO subsystem; more on this later).

The tricky part about the MMU is that it gives every program that runs on a computer the illusion of having all the available memory for its own use. In fact, it provides the illusion of more memory than the machine might have. To understand this, we have to understand the notion of bit size in a CPU architecture.

You may have heard about a CPU being 64 bits. Most desktop and laptop processors are 64-bit now, and many smartphone processors are as well. The bit size has a big impact on CPU design, but what matters for us is its impact on what is termed virtual memory.

Virtual memory is the illusion that the MMU provides to a program. For most programs, the operating system provides the illusion of two (sometimes three) gigabytes of main memory to use, because most programs still target the older 32-bit model for compatibility. A 64-bit program has the illusion of terabytes of main memory, far more than any computer can actually provide. The amount of memory a program thinks it has is called its virtual address space.
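The arithmetic behind those limits: a 32-bit address can name at most

$$2^{32}\ \text{bytes} = 4\ \text{GiB},$$

of which the operating system reserves a chunk for itself, leaving a program two or three gigabytes. A 64-bit address can in principle name

$$2^{64}\ \text{bytes} \approx 16\ \text{EiB (billions of gigabytes)},$$

which is why operating systems hand a 64-bit program only a slice of that enormous space.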

Basically, all memory is addressable. An address is merely a set of bits (32 or 64 of them, depending on the program) that points to where a piece of information lives. The MMU takes these virtual addresses and maps them to physical memory. As needed, the MMU, in cooperation with the operating system, moves information back and forth between main memory and the hard drive; this operation is called paging. The hardware also moves information between main memory and the CPU caches; this is called caching.
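A minimal sketch of what that mapping looks like, assuming 4 KB pages (a common page size) and a made-up, toy page table; real MMUs do this in hardware with multi-level tables.

```python
# Toy model of MMU address translation; the 4 KB page size and the page
# table contents are assumptions for illustration, not a real machine's layout.
PAGE_SIZE = 4096

# Maps virtual page number -> physical page number (made-up values).
page_table = {0: 7, 1: 3, 2: 12}

def translate(virtual_address):
    virtual_page = virtual_address // PAGE_SIZE   # which page the address is in
    offset = virtual_address % PAGE_SIZE          # position within that page
    if virtual_page not in page_table:
        # Page not in main memory: the OS must page it in from the hard drive.
        raise RuntimeError("page fault")
    return page_table[virtual_page] * PAGE_SIZE + offset

print(hex(translate(0x1234)))   # virtual page 1, offset 0x234 -> 0x3234
```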

When a program is run, millions of bytes of information move back and forth between the CPU and memory as billions of instructions execute. The instructions and data for every running program need to be loaded into memory. A special program, called the operating system, is responsible for managing and running other programs and their memory (along with many other tasks). Computers have special firmware (the BIOS) that finds and loads the operating system from the hard drive.

The operating system uses a complex set of rules to decide what to page into and out of memory. When these rules don’t match how the computer is actually being used, the computer can slow down considerably. When a computer stops responding, it is often because it is performing a large number of paging operations; in fact, this is often just called paging, as in “My desktop is running really slow, it keeps paging all the time”.


Time Flies…

It’s easy to start a blog, not so easy to keep it up. Life changes, and circumstances get away from you, and before you know it, six months fly by, then eight. Now, it’s almost a year.

But, what is worth starting is worth restarting. So, we will see if this sticks for now.
