COVID-19 Dashboards
GitHub staff machine learning engineer Hamel Husain has built a free, open source project called the COVID-19 Dashboards, which shares daily updated graphs about coronavirus cases globally. The site has already received 300,000+ views, with visitors including data scientists, doctors, and epidemiologists, and has over 23 contributors worldwide. Husain worked with the non-profit organisation fast.ai to build this project.
Data in a pandemic
The spread of a pandemic brings with it great uncertainty, and when things are unclear, data matters more than ever. Large, complex systems like healthcare or government rely on accurate information to make informed decisions, and policy makers rely on predictive trends and charts to guide them. However, not everyone who analyses data communicates their findings to the world - and Hamel saw the opportunity to help people globally combine their brain power on an open-source framework by creating the COVID-19 Dashboards.
How it started
Hamel is based in Oregon and works remotely for GitHub, and he put the site live in just two days. He started by hosting just three or four graphs that he’d seen posted on Twitter, putting them on the site and then asking the authors what they thought. It started to gain momentum. The CEO of GitHub has tweeted about it, Business Insider has written an article about it, and Hamel has had messages about the dashboards from people all over the world, including doctors and epidemiologists. The dashboards update live with data from Johns Hopkins University.
Why dashboards?
Hamel Husain previously worked at Airbnb, and currently works at GitHub: a company that brings together the world’s largest community of developers to discover, share, and build better software. Through years of open-source collaboration, he saw an opportunity to help the response to the pandemic. “I think the Trump administration,” Hamel says, “has couched this all as the trolley problem: you either hit one pedestrian or the other one. And that’s not really the right way to think about it, economy vs. lives. I think it’s a fallacy of judgement. If a lot of people die that’s going to hurt the economy too. I don’t think it’s a choice that we need to make, and I’m really saddened that we’ve taken that approach. When I look at the numbers I see a calamity coming, a historic number of deaths, and it’s hard to forget that”.
He’s not claiming to be an expert on what’s happening: “I don’t know that much about it, even though the side project deals with it; I’m not an epidemiologist, I don’t practice medicine so in that sense I don’t really have useful opinions about it.” He is, at heart, a data scientist - and one of the problems that data scientists have is being able to communicate information freely and openly with low friction. “You’ll do an analysis of some kind, maybe write some code, you write a report, have some visualisations - and then you want to share it with the world. Really, in science the most important thing is the ability to share information, so what we’re trying to do is make that easier”.
What’s particularly special about the COVID19 Dashboards is that unlike normal articles or graphs, you can actually see where the data is coming from. You might be asking ‘how do you calculate that rate? How do you even show that? What tools do you use?’ and with Hamel’s COVID19 Dashboards you can actually answer those questions. “It’s a useful resource for people at a meta level to go find information, it’s very accessible”, he says.
Making data science globally accessible
For data scientists, one way to share cutting edge information is to submit it to a journal, but this is a slow, and very academic, process. Another way is to write a blog. Most people will use something like Medium to write a blog, but if you’re a data scientist it’s a little harder: you have to copy and paste your code, format it, and add charts and graphs, so there’s a big barrier. That is also a barrier to collaboration: “if someone reads your article and wants to read the code, you have to link somewhere else, so there’s low transparency around how you built a dashboard or came to a certain conclusion,” says Hamel.
The COVID-19 Dashboards run on what was built during another of Hamel’s side projects: fastpages. Fastpages came out of working with a non-profit called fast.ai, run by Jeremy Howard, ex-president and Chief Scientist at Kaggle. Fast.ai was set up to make deep learning more accessible. “They are a wonderful project and community, focused on education of machine learning and data science, and I participate in that community”, Hamel says passionately, “it’s actually a tragedy. In the data science community we have some wonderful tools that we use mainly internally.” For example, with Jupyter, you can create notebooks: you write some code, the code will generate a graph - and then you can share the whole thing with another data scientist who can see both your analysis and code, all in one place. That’s what Hamel wants to turn into something collaborative. “Using GitHub”, he explains, “you can share your Jupyter notebook into a folder and it becomes a blogpost automatically. It’s hosted by GitHub for free.”
He had just finished building fastpages when the world changed: “this virus came up. And I saw a lot of people sharing notebooks everywhere on the internet and I thought ‘maybe they should use fastpages for this. Why not?’ Fastpages is really exciting, a lot of people are using it, and this is a situation where people really want to share information. I thought let’s create a site to see what it’s like, and maybe it’s a good use case for fastpages. I had no idea it would be as popular as it was.”
Collaboration at the heart of the COVID19 Response
A lot of the dashboards have been edited and corrected: some people are making these charts, others are reviewing charts, others are fixing charts, and others are organising the community. “It allows things to be crowdsourced in a really robust way. We’ve listed all the contributors too. There’s some folks at Johns Hopkins - I don’t know them - and we’re using their open source data too”.
Side projects and main work
Hamel seems to expertly mix his side projects with his main work, and GitHub plays a central role in fastpages and the COVID-19 Dashboards. “A lot of my side projects I can blend with my work,” he explains, “it took a long time to get in this situation but as it happens there’s a strong intersection between what I work on and what I like to do as a side project.” With the COVID-19 Dashboards, GitHub serendipitously benefits too: more people are using it, and it showcases a new way of using GitHub, with new features.
“It’s hard to do this,” admits Hamel, “but if you have a side project, you want to figure out how to make it your job as fast as possible. You have to be opportunistic and be creative about what that means. There’s been no boundary between side projects and my work for like two years. It was a sudden shift. Something turned on inside my head.”
Hamel started to contribute to lots of open source data, and saw an opportunity to share it with the outside world. “I was like, OK I’ll build an NLP model”, he says, “that summarises the issues that people open - and I can share that with the world: the data, the code, everything. That was a shift for me. Sharing information. You do a project? You write a blog post, and you share it. For a long time I didn’t think that was useful, but I started doing that, and when I learnt something new I’d write a blogpost about it, and I learnt the unreasonable effectiveness of doing that. That gave me a lot of excitement, and it was a positive feedback loop. Oh you can build things, and share it, and many people will find it useful? It’s incredibly satisfying. So I thought more about building things and sharing them.”
How to progress and finish a side project
In terms of structuring a project, Hamel works backwards: “I try to imagine the blog post or video or talk that I want to give about this, and how exciting this would be if it existed. And then I work backwards and I build it. I work my way towards the blog post as fast as possible. And everything else gets worked out in the middle. I need to get personally excited. I don’t care if anyone else gets excited, I think ‘this is amazing’, and if I personally believe this is cool I just start building it. When it’s close to being done I might ask somebody [about it].”
He’s not this confident all the time. “When I first think of the idea, I think to myself: how am I going to do that? Is it even possible? And then halfway through I think ‘maybe I should just give up? Am I really going to follow through on this? It will be really embarrassing to myself if I give up on it…’ I’ve maybe had ten or fifteen different side projects, and I think doing it in spite of that [doubt] is really confidence boosting. Consistently following through gives you a lot of confidence. You can look back at that and say there’s been 3 or 4 of these situations before, I can figure this out.”
“I’ve got colleagues that talk a lot about side projects. I have a lot of friends who will talk about ideas and they’ll debate the ideas for hours, but no one will do anything, and it’s a waste of time for everyone involved. 99% of people who are interested in side projects want to talk about them but not do them. It’s hard to get over the barrier of just doing it. Once you’ve crossed that barrier from talking to doing, you don’t want to talk that much, you only want to talk to people who are also set on doing.”
The COVID-19 Dashboards currently have 23 contributors listed, and the data is updated twice a day, with support from GitHub. The phrase Hamel seems to use most when talking about side projects is “Why not?”. He said it three times in this interview. For those thinking of creating their own side project, or of building something to help those fighting COVID-19, ‘Why not?’ might be a good place to start.
Check out the COVID-19 Dashboards here, check out fastpages here, or the fast.ai community here.