Sunday, October 16, 2011

Occupy .* is a meta-movement. The meta-revolution will be tweeted.

So, I haven't been really connected to the news for the past several weeks. I've been really engrossed in my own work, and quite honestly, trying to think of ways to make the next dollar. Since we're in full startup mode, when we hit the road we take terrible flights (5 hour delay on my flight home last night) and we stay with friends. My mom always told me that "company, like fish, stinks after three days" - sadly I haven't heeded my mom's advice; I've been staying for 4 and 5 days at a time. In addition to putting up with our excessively long visit, one of our friends is also a shrewd political/cultural observer. He's just the sort of guy who can explain a phenomenon like Occupy Wall Street. He summed it up thusly: the economic disparity in this country means OWS will just get bigger.

Staying in someone's home is a precious experience. You see them in the morning before they are glossed for the day. You eat breakfast and dinner together. And if you break the "company is like fish" rule, you get to see a few of the subplots in their lives develop. Thanks again to my friends and patient hosts if you are reading, I really do appreciate the chance to step into the middle of your homes and your lives for a few days. OWS was a minor subplot for my friend/host who put us up in Ann Arbor. He's a true Michigander, so explaining economic implosion and populist movements comes as naturally to him as grilling bratwurst. I asked him what OWS was demanding, and he said that OWS wasn't demanding anything. He hinted that demanding nothing was kind of the point. He got me thinking about OWS, and then I started reading a bit more. Here's what I think so far:

  1. The Michigander was right. The underlying economic disparity is real, and can only be perceived by "the 99%" as unjust. Here is the article that I think is the best synopsis of the facts, selectively biased to represent OWS.
  2. Occupy Wall Street is a meta-movement. They aren't out to make demands, win concessions, or change you. They are out to change what we perceive as just and unjust. Up until now, based on the way wealth has been concentrated, we have all tacitly agreed that wealth concentration is just fine. OWS is slapping us in the face. That's it. I think the real impact will be the way other groups, organizations, and movements internalize the message. 

Thursday, September 15, 2011

Friday, August 26, 2011

private alpha update

Due to a bigger than expected pool, I need to split the alpha invites into two waves. So, some of you have been invited this week (welcome!), and some of you will be getting your invite next week (hang tight!).

Friday, August 19, 2011

Private Alpha Launch

My new thing is ready for private alpha - let me know if you want to be invited!

Thursday, August 18, 2011

browser innards

I'm hoping to find a nice long stretch to read this article on browser architecture. Looks as fascinating as it is useful.
Thanks to Rich for the pointer.

Tuesday, August 16, 2011

a big deal: stanford ai course

The NYTimes covered the viral spread of the stanford ai course, and I am over the moon about it. In addition to thinking Sebastian Thrun is one of the coolest people of all time (self-driving cars would easily be the biggest quality of life improvement possible for first world countries), I think the idea of free instruction is the biggest opportunity for human advancement available today. I'm not using hyperbole when I say that the stanford ai course is a watershed comparable to the Gutenberg bible. The ai course is the proverbial butterfly's wing.

For the last several centuries, the cost to access and distribute information has fallen due to major disruptions: printing presses, cheaper physical delivery, the internet.

But the ai course isn't about content distribution or access. Stanford is offering instruction, based on the gold standard of peer comparison and competition, for free. As of this writing, they have 74,000 registered students. That instant community will collaborate to choose which questions to pose to the professors, and all students will be ranked against one another. It is an unprecedented increase in the number of people trained in a specific field, and if you believe in humanity, you have to be excited.

I remember learning about the diffusion of solids in a solvent. There are two kinds of dynamics that drive the absorption of a solid into a fluid: chemical and thermodynamic. In the chemically driven process, molecules of the solid have a tendency to separate into the fluid. The lower the saturation of the solid in the fluid, the more likely a molecule is to drift off without being replaced by another molecule bumping into the solid. What's fascinating to me is that this effect is entirely local - if the saturation is higher around the solid mass, molecules are more likely to be replaced and to net zero change. If you had to rely on the chemical process alone to dissolve sugar in your coffee, it would take days to sweeten your favorite beverage.

The thermodynamic effect is convection. Differences in temperature drive fluids to circulate, because the changes in temperature cause changes in density, which in turn cause cooler fluids to fall through warmer fluids. The result is pretty dramatic mixing. The convection driven mixing of your coffee guarantees that the local saturation is always pretty low around your sugar cubes. Fresh, unsweetened, coffee is always swirling by your cubical simple carbohydrate. The mixing drives the time to dissolve down to a manageable minute or so, which is good if you want hot coffee.

Up until now, the world has relied on the diffusion of information in almost all fields. Advanced topics like ai need to be explored, then understood, then standardized, and then taught before they can be truly common human knowledge. The stanford ai course is information convection. The incredibly broad distribution guarantees that many people who have never been exposed to AI will be taught. By teaching, rather than passively informing, the ai course could enable those new students to teach others. It is hard to imagine a more effective means of advancing ai.

It is exactly this information convection that I want to harness in my next venture - I want to give away tools, processes, and instruction to anyone interested in my kind of problem. I want to teach people how to explore a specific field, and I want them to apply their findings directly and immediately. I hope it has a fraction of the impact that the ai course will have.

Thursday, August 11, 2011

Pleased as Punch - Streaming CSV

I have to admit, there's a little extra spring in my step this morning, and it isn't just because it finally stopped raining (Boston's been wet for a week and a half) - I just published my first gem: csv_streamer.

I'm working on a project that analyzes a very large data set - historical trading data for US markets. The analysis spits out a similarly large amount of data, and I let users download the final analysis as a csv file (but not the original trading data mr. exchange operator).

I have been using the very nifty csv_builder project, which provides template support for csv. To generate a csv file, you just pop arrays onto a csv object provided to your template, like this:

csv << ["this","is","an","example","line"]

Like all templates in Rails, you get a nice clean separation between your view and your controller.

The problem is, web servers limit the resources available to a single request. You can't spend 30 minutes and gobs of RAM generating a csv file, and then send it back to the browser in one shot. Most production web servers will time out the request after 30 or 60 seconds. The implementation of the timeout varies, but increasingly web servers base the timeout on the time between data writes, rather than the close of the stream. In other words, you have 30 seconds to send your first byte, and then 30 seconds to make each additional write. The technique of starting the stream quickly, and then dribbling out data over longer periods is called streaming. But Rails 3.0 doesn't have native support for streaming templates.

Rails 3.1 includes template streaming as a marquee feature, but there the feature is more generalized. They want to facilitate parallel requests from the browser (most browsers can handle 4 per domain), so that pages render more quickly. Rails 3.1 will help with the standard templates, but for an extension like csv_builder, the template handler itself needs to be modified.

Luckily, Rails 3.0 does have support for streaming data. The key is to set your controller's response_body to an object that implements "each", as described in this stackoverflow discussion, and in numerous screencasts and howtos.
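To make the "object that implements each" idea concrete, here is a minimal sketch in plain Ruby. The class name CsvLineStream and the sample data are made up for illustration, and this is not code from csv_streamer or csv_builder. It only shows the contract: the body yields one small chunk at a time instead of building one giant string.

```ruby
require "csv"

# Hypothetical streaming body. Anything that responds to #each can serve
# as a response body; each yielded string is one chunk of the response.
class CsvLineStream
  def initialize(header, rows)
    @header = header
    @rows = rows # any Enumerable, ideally lazy (for example a database cursor)
  end

  # Yield the header first, then one CSV-formatted line per row.
  def each
    yield CSV.generate_line(@header)
    @rows.each { |row| yield CSV.generate_line(row) }
  end
end

# In a Rails 3.0 controller action the object would be assigned like this
# (assumption based on the approach described above, not run here):
#   self.response_body = CsvLineStream.new(header, rows)

stream = CsvLineStream.new(%w[symbol price], [["ABC", 1.5], ["XYZ", 2]])
stream.each { |line| print line }
```

Because rows are only formatted when the server asks for the next chunk, memory use stays flat no matter how long the file gets.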

From what I found googling my face off for three days, most people who need to stream data just do so directly from their controller methods. It works, if you aren't a wonky architect with a penchant for strict separation between your views and controllers who also invested a lot of effort in creating csv_builder templates. So, I wanted csv_builder's templates, but I also wanted streaming support. Ergo, csv_streamer!

csv_streamer is a fork of csv_builder (hopefully my pull request will be accepted and csv_builder will just have streaming). The project was pretty fun, because it involved reading a lot of code and learning all about streaming, ruby blocks/procs and yield. As it turns out, csv is just ideal for streaming, because files are generated a line at a time. My implementation takes advantage, and streams each line as it is generated for maximum smoothness (I meant that in terms of data chunks being small and frequent, but it could be taken brogrammatically). The problem of streaming html is more complicated, because of the dependencies between document parts. In csv, the header is the only dependency, and it is always served first, so streaming (and stream oriented processing on the client) is simple.
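For the curious, here is a rough sketch of how blocks and yield let a template keep its `csv << [...]` syntax while every line goes out immediately. StreamingCsv and TemplateStream are hypothetical names invented for this example; the real csv_streamer template handler is more involved, so treat this as an illustration of the idea only.

```ruby
require "csv"

# Hypothetical stand-in for the csv object a template receives. Instead of
# buffering rows, << hands each formatted line straight to the writer block.
class StreamingCsv
  def initialize(&writer)
    @writer = writer
  end

  def <<(row)
    @writer.call(CSV.generate_line(row))
    self # allow chaining: csv << row1 << row2
  end
end

# Hypothetical response body: the "template" is a block that receives the
# csv object, and #each runs it with the server's chunk writer plugged in.
class TemplateStream
  def initialize(&template)
    @template = template
  end

  def each(&chunk_writer)
    @template.call(StreamingCsv.new(&chunk_writer))
  end
end

body = TemplateStream.new do |csv|
  csv << ["this", "is", "an", "example", "line"]
  csv << [1, 2, 3, 4, 5]
end

body.each { |chunk| print chunk }
```

The template code never knows it is being streamed, which is the whole point of keeping the view and the controller separate.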

Another interesting aspect of streaming is the dependency on your Rack server. Even if you code a streaming response in your controller or template handler, it will only stream to your client browser if the underlying web server supports streaming. Rails uses Rack, which allows you to swap out your web server quite easily. The default in development mode is the very antiquated WEBrick, which, among other deficiencies, does not support streaming. Both mongrel and the absolutely hilariously named Unicorn do support streaming. I was able to find more examples of configuring Unicorn - github uses it, for example. Initially, I went with unicorn for development. I use Heroku for production, and it turns out the default configuration does not provide streaming. Luckily, Heroku cedar allows you to use Unicorn, and there is a fantastic howto from Michael van Rooijen. In addition to streaming, you can pack multiple unicorn processes onto a single Heroku dyno, to optimize your utilization. Michael's post provides some nice benchmarking and analysis to find the optimal number of dynos.

If you need streaming csv support in your Rails app, add csv_streamer to your gemfile and have at it. You can get all the details from the readme on github.

To get you started even quicker, I created a test application to deploy onto heroku and verify everything worked as expected. Again, there's more detail in the readme on github.

I am, however, still stuck on automated testing. csv_builder uses rspec, and while I can invoke the streaming code in the template handler, the implementation of TestResponse doesn't have a timeout and it buffers all writes until the stream is closed. So, it is a good test for functionality - I can prove the data streamed is correct. However, I'd love to have two tests - one that requests very large data in a non-streaming way and verifies that a timeout exception is raised. A second test would stream the same template, and verify success. Any hints are very welcome - I posted this quandary to stackoverflow as well.

I'll let you know if I figure out a test, in the meantime: Happy Streaming!

Thursday, August 04, 2011

designing with rails routes

I have been starting any new work in Rails by creating or modifying the models in the application, and then adding tests to the models. Up until today, I had found the transition from coding models to coding controllers to be very confusing. Models and models' tests seem very intuitive to me, but I was having trouble putting my finger on what made controllers so tough. I think I finally learned the missing piece - routes!

One of the cooler elements of Rails is the way it inherently supports ReST. The routes.rb file's "resources" keyword allows you to quickly express the structure of your ReSTful API. The buckblog has a nice brief on one of the more useful bits: nested resources, and links to the more canonical tutorials from the Rails community. Here are the most useful things I've learned:

  1. When you are ready to modify your controllers, especially if you want to add a new resource or change relationships between them, start with the routes.rb file, and think of 'rake routes' like a compiler. Make changes and 'rake routes' to make sure they reflect your intent before you start writing tests or views.
  2. Rails does quite a lot to automate and abstract you from the details of routing. You do need to understand a few key concepts though:
    • A controller has many actions. 
    • Routes and Actions are one to one (but...)
    • URLs and Routes are not one to one. ReST uses the http verb to distinguish between reading and deleting at the same URL. In other words, a Route is just a URL plus a verb. 
    • Resources are a collection of routes, usually pointing to one controller. Resources provide nice ReSTful semantics, making the intent of your routing more clear in routes.rb.
  3. Nesting resources is a strong spice, best used sparingly. In my mind, nesting is ideal for certain actions for a 1:N / one-many relationship. I found it very important to understand that you can pick and choose routes you want to nest. A single resource can have both nested and unnested routes. Imagine you have the classic "post has many comments" relationship.
    • I'd suggest using nested resources routes for 
      • :index = since you will almost always want to filter comments down to comments on a single post, build it into your route
      • :create = you will always need to specify a post for your comment, so build it into the route
    • But I'd avoid nesting the resource routes for
      • :destroy, because you'll be most commonly deleting a single comment
      • :show, because :index is for listing, :show is for a single comment. Why require the caller to specify both the post id and the comment id?
    • I think :new is very debatable, and ultimately depends on the structure of your pages. To set the action of the form properly, you need to have the parent's ID. If you are rendering the form in a context that has the parent set already, you may not need to pass it along to the controller via the request. But if you find yourself putting the parent's ID into a parameter, you should nest the :new action.
    • Rails creates convenience functions that will generate the URL path for a particular route. There are decent conventions for the naming, but to be honest, I've found the patterns difficult to remember or apply by hand. Mostly, I have been running into trouble when dealing with controllers/models that have multi-word names, but also with the "natural" way Rails deals with singular/plural names. Now I don't even try to remember, because you can always get the function name from the leftmost column of rake routes.
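To illustrate the "a Route is just a URL plus a verb" point and the nested/unnested split suggested above, here is a small framework-free sketch. The ROUTES table and the recognize helper are hypothetical and written only for this example; Rails' real router is far more capable. The comment at the top shows roughly what the matching routes.rb might look like, under the assumption of a standard post/comment setup.

```ruby
# Roughly equivalent routes.rb (assumed, not run here):
#   resources :posts do
#     resources :comments, :only => [:index, :create]
#   end
#   resources :comments, :only => [:show, :destroy]

# Hypothetical route table: each route is an HTTP verb plus a URL pattern,
# and each route maps to exactly one controller action.
ROUTES = {
  ["GET",    "/posts/:post_id/comments"] => "comments#index",   # nested
  ["POST",   "/posts/:post_id/comments"] => "comments#create",  # nested
  ["GET",    "/comments/:id"]            => "comments#show",    # unnested
  ["DELETE", "/comments/:id"]            => "comments#destroy"  # unnested
}.freeze

# Turn a pattern like "/comments/:id" into an anchored regular expression
# where every :name segment matches one path segment.
def pattern_to_regexp(pattern)
  Regexp.new("\\A" + pattern.gsub(/:\w+/, "[^/]+") + "\\z")
end

# Return the "controller#action" for a verb and path, or nil if none match.
def recognize(verb, path)
  ROUTES.each do |(route_verb, pattern), action|
    return action if route_verb == verb && path =~ pattern_to_regexp(pattern)
  end
  nil
end

puts recognize("GET", "/comments/7")     # same URL as the next line...
puts recognize("DELETE", "/comments/7")  # ...but a different verb and action
puts recognize("GET", "/posts/3/comments")
```

Reading a table like this is essentially what 'rake routes' gives you, which is why checking it first catches routing mistakes before any tests or views exist.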

      Friday, July 29, 2011

      System Administration Nirvana

      I don't consider myself a true sysadmin, but inevitably you need to dabble a bit as an admin if you want to build anything fun. Part of my current project is a custom python daemon. My web application posts jobs to a database (if you will allow me to call mongo a db...), which the python daemon monitors. When a job is posted to the db, the daemon picks it up and does the processing.

      Of course, I have somewhere between a few and several bugs in my monitoring process. So, from time to time, I need to restart the process. I just invited my first alpha user to test out the site, so while my audience is minuscule, I'm still very worried about the site being dysfunctional.

      Enter RightScale. Today, in about 4 hours, I learned all about monitoring at RightScale (they use collectd) and I enabled it for my job monitoring servers. It was easy to add a plugin to monitor my custom application -- I just configured the standard processes plugin to track my daemon. Immediately, I was able to see count, cpu usage, mem usage, and disk io for my process. Very useful. I added an escalation to email me when the process crashed*. That was neat... but then I had this vision of myself fishing with my son, getting an urgent email, making him quit fishing early (tears), and then speeding home, all just to type "kill -9 ". So, I made a custom alert escalation on RightScale to restart the daemon if it crashes. Pretty simple, but something that would have taken days in the past. I would have spent a week just comparing all the options for monitoring systems, and figuring out how to install one on all my servers.

      Another nifty trick - when I invited my testers to the site, I wanted to have separate staging and production environments. So, I clicked the "clone" button, and presto, my whole environment was replicated. heroku_san made it even easier for the web application.

      Anyway, wish me luck as the first user tries out my new project!

      * Yeah, some of my bugs are still crashing bugs. Sorry Joel Spolsky, I don't have a QA team for this either. I do have 200+ unit tests though!

      Friday, July 15, 2011

      Dunkin' Donuts and Kettleers

      The Dunkin' Donuts in our little town is always full of interesting people. Munchkins and mild coffee attract fans from across all social strata. Everyone from LandRover driving I-Bankers to LandRover driving local cops to fishermen to pasty software engineers turns up for the same fix. This year, the guy manning the counter is unusually chatty and well-spoken. He's downright loquacious and eloquent. This morning I asked him if he was going to get out to enjoy the weather (top ten day today: warm, breezy, and no humidity). "No, but I go to a baseball game almost every night, so I can't really complain. I write the game summaries for the Cotuit Kettleers and whatever I feel like writing on their blog." His blog is titled "Living the Dream" and he told me that the pay is so paltry he has to work a full week at Dunkies in order to be able to stay capeside for the summer. Despite staying inside on days like today, he seems pretty thrilled. I'm not sure if he did it intentionally, but he couldn't have chosen a better side job. He will likely meet half the county of Barnstable at that Quags.* For a guy looking to make it in media (dream job is writing for ESPN), meeting people and making an impression is a big deal. Writing every single day for the summer doesn't hurt either.

      *Quagonut == Donut in family slang. Hence Dunkin' Donuts == Dunkin' Quagonuts, or Quags for short. Like Rock 'n Roll, Quagonut might once have been quite vulgar, but now it is as pure as a powdered jelly donut.

      Thursday, July 14, 2011

      pymongo and index creation

      For my current project, I'm working with a pretty big document db - 100M documents so far, and I am considering scaling it to 25 billion over the next few weeks.

      The indexes on the document collections take several minutes to create or ensure. By default, mongo locks the collection while the index is "ensured". For smaller collections, it is no big deal. But for a rebuild that takes up to an hour, the blocking operation is a big problem. Turns out mongodb supports background rebuilds, but it wasn't clear that the PyMongo driver exposed it. PyMongo's documentation on collection.ensure_index didn't mention backgrounding. However, the driver has a passthrough for keyword args, so based on mongodb's documentation for backgrounding, I just tried it. Seems to work!

      Centerville Library

      I discovered yet another reason to love the public libraries out there. We're down capeside, so I had been working from the house this morning. That kind of blew up my whole transition from play to work, since my office is one corner of the kids' play area in the basement. Most of this week I was coding with one hand and wrestling one of the boys with the other.

      Today I drove over to the Centerville Public Library. Quaint little place, and just jammed with people pursuing all kinds of interests. I see a guy writing for a magazine, a bunch of kids doing puzzles (sunny here, but the wind's blowing too hard to do anything even near the water), and there are four guys playing chess. Play isn't the right word - they are doing battle. A grown man, maybe in his 40s just resigned a game and stormed outside to cool off - he was too upset by his play to continue. He was midway through a trouncing, having his hat handed to him by a gentleman twice his age. Nothing like a beating from an octogenarian to make you angry. 

      If I get my next milestone coded up soon, I'm going to ask for a game...

      Wednesday, July 13, 2011

      Sabbatical == Coding 10hrs/day == pure joy

      After a nine year sprint, I decided to take a restful break from Tamale. What's more relaxing than building a new piece of software? I'm a few weeks away from a preview release, but I'm having a dandy time building. I had forgotten the sheer pleasure of focusing on one idea for hours on end. The daily routine is pretty terrific - wake up with the kids, a family breakfast, and then I head to the public library. In addition to the free wifi, and the surprisingly robust internet speeds, the library is the perfect level of quiet but not boring. I can fit in plenty of quality people watching (mostly moms corralling kids, but there are a lot of retirees too) while my unit tests run. In the past, I relied on Starbucks for working on these pet projects. But I have to suggest using your public library. Not only is it free, but I don't overdose on coffee and baked goods (though the library does have a Keurig for $1/cup). Best of all, I can walk to the library from my house. I think this is key for productivity - leaving the kids and homelife at home for a few hours lets me really immerse myself in my project. Of course, I do walk home for lunch and sneak in some playtime with the kids, but again, the walk back and forth is a great way to mentally transition from play to work and back again. Likewise, walking home makes me leave all the coding problems at the library, so I can really enjoy my time at home.

      If you are thinking of a web application, before you get started, here is one page you have to read. My friend and Tamale co-founder Nader started Kapost a year and a half ago or thereabouts. He took the time to carefully document the best technologies he found during the setup of his new product development team. For my current project, I found his notes invaluable. My personal favorites are:

      • Heroku. After a decade of wrestling with deployment, I bow in awe of Heroku's git push deployment. Rails ain't too shabby either, but I think Heroku is a bigger deal than Rails itself.
      • MongoHQ. Despite the ribbing I'll receive for saying this, I'm a fan of mongodb, which is webscale. MongoHQ  does a great job with provisioning new instances on AWS.
      • AWS and RightScale. I've done hobby work on the google app engine, but this is the first time I've used amazon. In addition to the infrastructure I'm using that runs on top of AWS, (web application via heroku, and the mongodb instances via mongohq), I needed to run some custom python applications as well as a third-party windows application (I know, I know). RightScale makes the most sense if you want to run infrastructure on multiple clouds (e.g. Rackspace and AWS), which I am not doing. But, the organization of all your scripts and templates in RightScale is so nice I'm using it for a pure AWS deployment. My favorite concept from RightScale is the "roll forward" approach to deployment and the inherent versioning of all your system config and data. I also love Elastic Block Storage. It is amazingly useful to be able to move a hard drive between machines, or use it to hold state as you update a machine config.  

      I actually wrote a slim version of my new application on google app engine. I was pretty happy with the experience, except that I wanted to be able to access the data store from a web application environment and a standalone application environment. GAE was excellent for the web app, putting aside whatever you think of django vs rails of course, but it wasn't possible to share the data storage with non-GAE applications. My biggest hesitation was learning Ruby and Rails. I was excited to be coding, and I feel pretty comfortable in python. However, I didn't know the first thing about Ruby or the Rails framework. Cursory searches left me thinking that I was missing out on something, but that it would be difficult to catch up with the RoR community. In my experience, it is much more effective to keep current on a technology from its inception, than it is to ramp up and stay current with a more mature ecosystem. Rails is roaring along, growing in many directions at once, so it is a bit of a bear to learn on your own. Luckily, I found a tremendous book - Ruby On Rails Tutorial. One thing about this book set it apart from every other technology tutorial I've read - the author urges you to read all the chapters, in order, and to really code as you read. I found it was the perfect way to catch up on RoR - the book explains how the major parts of the community work, what the current best practices are (e.g. he explains how to use github), and how to do more research. Best of all, I was able to use the tutorial as a baseline for my own application. By the final chapters of the book, he has you working out many to many and self-referential data models. Rather than implement the twitter clone that is the basis of the tutorial, I just mapped the ideas to my own problem. For seasoned developers, I think this is the ideal approach. You have example code and explanation of the concepts in the tutorial, and you can be productive and minimize the tedium by translating the examples to your own domain.

      Friday, March 18, 2011

      McDonald's - food ATM

      I went to McDonald's tonight. Besides regretting my decision to consume 10x my average sodium intake, I have been thinking about how automated it is. Since I paid by credit card, the only work the cashier did was say hello, type the number 8 into the checkout terminal (I had a #8 meal), and pass me the receipt and a paper cup.
      Impressive, but I think the transaction was as unhealthy for the cashier as it was for me. It can't be good for your mental state to be a cashier who doesn't count change or even swipe a credit card.
      Maybe if there were just food ATMs instead, McD could eliminate cashier positions and then pay the people making the food a living wage? I guess the company would just pocket the savings.
      It may seem harsh to talk about firing all these cashiers, but I'm not sure how the cashier's responsibilities can ever justify more than minimum wage. I don't know if this is valid economic reasoning, but it seems like having half as many employees earning twice as much as they do now, covering twice the responsibilities would be good.
      The workers would have more interesting work and make a better living. The country would have fewer minimum wage jobs (assuming all McD employees are min wage), but many more living wage jobs. McD would have happier employees, which almost certainly means happier customers. In a way, Starbucks proves this with their employment model. Starbucks pays a living wage, provides benefits, and in my experience has better service than any fast food chain.

      Monday, March 14, 2011

      sponsoring higher education

      There are often headlines lamenting the US university system's inability to produce students in the right areas. Will we have enough software engineers? Enough nanotechnologists? Chemists?

      At the same time, I don't think most entering college freshmen have enough exposure to any profession to make an informed choice about their area of study. Most people I know switched majors at least once. The only profession that universities can competently explain and expose for students is academia. I know the common wisdom is that college is where you should explore and figure out what you want to do, but I think that is just terrible advice.

      Maybe that made sense in the 60s, when a degree was relatively cheap, a bachelor's virtually guaranteed employment, and everyone was into "figuring themselves out". Welcome to the new age, where not only does a degree not guarantee work, it costs an absolute fortune. I just don't think it makes sense to invest so much money in a college degree without knowing, at least roughly, what you want from your education. I may sound old when I say this, but students need to take responsibility for maximizing what they do with their four years of undergraduate study.

      The problem, in my mind, is the lack of exposure. There are so many professions, but graduating high school students are likely only exposed to a small sample based around their parents and pop culture (aside: The Social Network could do more to create software engineers than all public and private programs combined. Software engineering looked cool, even sexy, in that movie). But, the sooner students can find the field that inspires their passion, the sooner they can start on a path to expertise. I'm not arguing against liberal arts; I think companies like Apple are the culmination of our investment in liberal arts education. I am saying you will have a better liberal arts education if you are aiming at a particular field. So, how can we create a system that exposes students to interesting professions, and helps target university graduates for the most demanded degrees?

      Corporate Sponsorship.

      Companies should hire high school students into internships designed to train them in the fundamentals of a field. Then, if students choose a relevant degree program at an accredited university, the company should pick up their tuition, books, and living expenses. In exchange, students should intern each summer, maintain a minimum GPA, and commit to full time employment for four years following graduation. Smart companies investing in the long run would jump at the chance to add some predictability to their hiring, especially for cyclically demanded professions like software engineers. Smart students can go to college for free, and know they are directing their talents wisely.

      Tuesday, January 18, 2011

      Evergreen to Relocate Factory to China

      NYT Economix Blog reports that Evergreen, a Massachusetts solar panel startup, is moving 800 jobs to China to contain costs. This is an example of exactly the effect I predicted on this blog - Green economy dollars spurring Chinese rather than American job growth. We need to invest in the means of production - a second industrial revolution - not just in the products of high growth industry. If we subsidize innovation in industrial production that can dramatically reduce the labor needed for manufacturing, we will actually create more American jobs.

      Thursday, January 13, 2011

      Just trying some new software...

      ...for editing posts from the iPad. Hoping it is easy to work with content from flipboard...