Wednesday, 14 October 2015

Voice of Developer (VoD): Open Source, Not That Scary

www.toptal.com
BY ANNA CHIARA BELLINI - DIRECTOR OF ENGINEERING @ TOPTAL
As a developer, it’s exciting and challenging to stay up to speed with the latest trends in technology. Every day, new languages, frameworks and devices capture our attention and spur conversations in meetups, forums and chats. However, our developer community is made of people, not tools, and it’s fascinating to explore its sociopolitical aspects (for lack of a better word; “social” tends to be associated with social networks, these days).
At Toptal, we recently had some interesting conversations about how much women contribute to open source and what may be preventing them from contributing more, so we investigated the matter. Having been part of that conversation with Breanden Beneschott and Bozhidar Batsov, I got to wondering: Bozhidar is one of the top open source contributors on GitHub. Where am I? If you check my public GitHub account as of today, it is mainly small test projects that I used in class for my students. They are half-baked, and definitely not representative of my skills or expertise. (You will have to take my word on this.) If someone were to consider hiring me based on what they can find in that account, I guess I would have a hard time making a living. Still, I have been a professional developer for more than 20 years, and in my everyday job I’ve been using more open source software than I care to remember. Over time, I have hacked the Linux kernel to bend it to some specific need, tweaked every router and NAS that I bought, and patiently waited for months on the Raspberry Pi waiting list to get my hands on one and set up my home-made domotics just as I like it. Still, none of these tweaks and tests ever made it to my GitHub to become open source. Also, aside from fixing a bug in one of the first versions of Tomcat, I never contributed to an open source project. Curious, isn’t it?
You could think it’s just lack of time or interest, but I know it is not. As for my personal projects, I may have thought nobody could really be interested in what I have done, but mostly it is just that the very idea of publishing my work there, for everyone to see and for posterity, scares me a lot. And while you can always tear down a personal project from GitHub, the day you try and contribute to a widely available open source project, there is no turning back. What if my code is not good enough? What if I didn’t understand the problem correctly? What if my pull request gets rejected? What if people troll me?
Is open source that scary?
A quick round of calls with fellow developer friends, mostly female, soon convinced me that I am not the only one with this problem. But for an engineer there are no problems, only solutions, right?
This is an important problem to solve, because contributing to an open source project can make a dramatic difference:
  • For your career: Many clients will look at your social everything before deciding to hire you; your GitHub account and your LinkedIn résumé are top of the list, together with your Facebook and Twitter profiles. You should use them wisely.
  • For your technical skills: Examining a codebase written by other developers, and often very good ones, teaches you a lot. Extracting the meaning from a badly written codebase will challenge and teach you just as much.
  • For your soft skills: Open source software is a collaborative process, and almost all interesting projects out there are built by teams. Learning to work with other developers through the tools that everybody uses, to blend in with the team, to communicate efficiently, is what will make you a great developer, not just a skilled one.
  • For the community: Every single bit you contribute to an open source project counts. The more you contribute the better, but even fixing one small typo in a translation will make the final product better.
  • For your network: You can send hundreds of résumés to companies, but nothing works like having colleagues with personal connections. Getting actively involved in an open source project will ensure you meet people and gain their respect, and your reputation will grow, which is invaluable for any professional.
This is my little personal journey in fighting this fear. Publishing this article is part of the journey itself. I am writing it in the hope that anyone who is blocked on writing a blog post, or is afraid of making even a small contribution, will see that in the end, it wasn’t so scary. It is also meant to help anyone who would love to contribute to open source but doesn’t really know where to begin, so I will start with the basics.

What is open source software and where do I find it?

Open source software, or OSS for short, is any software released with its source code under a license that allows you to modify and redistribute it. It can be delivered anywhere: on a website, through a mailing list or with an owl. The most common scenario, and the one that we are interested in, is when the codebase is maintained on a collaborative repository. Here, we are focusing on GitHub, but there are other options, such as SourceForge and Bitbucket. GitHub is very friendly, has a huge user base, and can be used for any kind of code and with any development environment. Importantly, it is also widely used for projects that are not open source. Chances are that your next client project will be hosted there, so knowing how to work with it is a useful skill in itself.

What if I don’t know how to code?

If you’re reading this, it’s likely that you want to learn how to code. You can find amazing courses on several free and paid websites. You should choose one language to learn; if you don’t have a preference, go with JavaScript. Your web browser already gives you everything you need to start, and JavaScript is one of the most widely used, and marketable, skills. My personal favourite is Python, which is used both in web development and in scientific applications. I also have a personal favourite beginner course, “Intro to computer science” on Udacity. I like it because it is a hands-on course, where you work on a project as you learn. You can also find several other courses on Coursera, Khan Academy and PluralSight.

What if I don’t know Git?

As mentioned before, knowing Git is important, so take a Git class. Do it even if you’ve been working with Git for a while; you don’t know how much you don’t know about Git until you really study it. Do it if you cannot confidently explain what the rebase command does. Do it even if a wrong rebase doesn’t scare you. I took the full Git path on Code School, but again, you can explore other sites for more options.

How do I choose a project on GitHub?

You probably use some OSS in your everyday development. Choosing a familiar framework is a good starting point; you already know its features and how the framework works. When you dive into the source code, you will learn more, and you will understand its logic even more clearly. If there is a technology or tool that you particularly like, look for projects that mention it, or for the tool’s project itself. As a last resort, you can check the projects on GitHub Showcases and start by choosing a category that interests you.
For example, a quick search for “Raspberry” in GitHub’s search shows more than 17 thousand repositories. It is easy to get lost, so look for a project with a good community and good issue tracking. When choosing a project, check the number of:
  • Contributors: Target anything above ten contributors. This should ensure that the project has enough interest and is not simply a small team effort. If you are new to OSS, or not too skilled, limit your search to projects with at most fifty contributors; larger communities imply larger codebases and more complicated projects.
  • Commits: Go for projects that have at least a thousand commits, and where the most recent activity is no more than a week old. A project that has been inactive for a month or more is old and stale in OSS terms, and you probably won’t get any fast responses. Daily activity is the sign of a healthy project.
  • Issues: Issues are open problems, bugs that have been reported or requested features to implement. They will give you a starting point and are a good metric of the interest in the project.
Also, find out what the project’s major language is; you can see the language statistics in the top bar of the main project page. Take some time to read the tone of the discussion and see how friendly and polite the comments are. Some projects are infamous for their aggressive communities, so they may not be the right starting point.
I chose ScyllaDB, a columnar data storage project, since I have a fascination with anything data-related where performance matters. I’ve never worked with it, but I expect to be able to dive into its codebase. It might be simpler to work with tools that I know, but I’m taking this as a challenge and an occasion to learn something new. For the rest, it fits the bill perfectly: it has 18 contributors, 6.5k commits (the most recent was 23 hours ago at the time of writing) and 178 open issues, and it appears active.
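The rules of thumb above can be turned into a quick checklist. The sketch below is only an illustration: the function name is hypothetical, the thresholds are the ones suggested in this article, and you have to read the three numbers off the project page yourself.

```shell
#!/bin/sh
# Hypothetical helper, not part of any tool. Arguments:
#   $1 = number of contributors
#   $2 = number of commits
#   $3 = days since the most recent commit
is_good_first_project() {
    contributors=$1
    commits=$2
    days_since_last_commit=$3

    # "Anything above ten contributors", but at most fifty for newcomers.
    [ "$contributors" -gt 10 ] || { echo "too few contributors"; return 1; }
    [ "$contributors" -le 50 ] || { echo "community may be too large for a first project"; return 1; }
    # At least a thousand commits, with activity in the last week.
    [ "$commits" -ge 1000 ] || { echo "too few commits"; return 1; }
    [ "$days_since_last_commit" -le 7 ] || { echo "project looks stale"; return 1; }

    echo "looks like a good candidate"
}

# The ScyllaDB figures quoted above: 18 contributors, about 6,500 commits,
# most recent commit less than a day old.
is_good_first_project 18 6500 0
```

The checks are deliberately strict and ordered, so the first failed criterion is the one reported. Treat the numbers as guidelines rather than hard limits.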

What do I do now?

First, clone the repository and install the software on your machine to get an idea of its moving parts. Then, start reading through the issues. Once you feel ready, pick an issue, see if you can reproduce it on your machine, and then start analyzing what makes the software misbehave.
Another approach would be to find something that you can improve, or modify, yourself. Maybe you noticed a typo, or a misaligned font, for instance. I chose to fix a small bug, specifically a wrong variable name used in a script’s documentation.
It seems tiny, but wrong documentation is much worse than no documentation. Users will install ScyllaDB and follow the installation steps, relying blindly on what is written in that script, and they will end up in heaps of frustration. This was perfect for my abilities, and fixing it will require me to follow the whole process and get a bit familiar with the codebase. Bugfixing is boring, but it’s a great way to find your way into a project.

Creating a fork

This may be trivial, but at the moment, for the ScyllaDB project, I am Ms. Nobody; it would be risky to allow me to make changes to their code without supervision. What I need to do is create a “fork” in my own GitHub account. Here is my ScyllaDB fork. It is my own playground where I have access to all the code, and I can modify the files as I wish. If I wanted to create my own version of ScyllaDB and tweak it to do something completely different from its original purpose, I could do so here. Creating a fork is simple; go to the project’s main page and click the “fork” button. Not scary at all.

Time to fix the bug

Now, it is time to test the code on your computer and make the necessary modifications. First of all, make sure you have installed a Git client on your machine. Then, add your SSH public key to GitHub, and make sure it is loaded by your ssh-agent. Getting the code locally is simple; just use the git clone command with the URL of your fork, instead of the main repository:
git clone git@github.com:acbellini/scylla.git
By now, you should have tested the project on the main branch, so you are going to build your code locally and test it the same way. Keep in mind you will have to fork any other GitHub projects on which your project relies, as references are relative. In my case, I had to fork seastar, scylla-ami and scylla-swagger-ui.
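A common follow-up to cloning a fork, not covered in the steps above, is to register the main repository as a second remote so you can fetch its newer commits later. The sketch below runs entirely on your machine: two local bare repositories stand in for the main project and for the fork on GitHub. All paths are hypothetical, and the remote name `upstream` is a convention rather than a requirement.

```shell
#!/bin/sh
set -e
# Local stand-ins for GitHub: "upstream.git" plays the main repository and
# "fork.git" plays the copy created by the "fork" button.
sandbox=$(mktemp -d)
cd "$sandbox"

# Seed the stand-in main repository with a single (empty) commit.
git init -q seed
git -C seed -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed upstream.git

# "Fork" it, then clone the fork to get a working copy.
git clone -q --bare upstream.git fork.git
git clone -q fork.git work

# Register the main repository as a second remote and fetch from it.
git -C work remote add upstream "$sandbox/upstream.git"
git -C work fetch -q upstream

# List both remotes with their URLs: "origin" is the fork, "upstream" the main repo.
git -C work remote -v
```

With a real fork, the `upstream` URL would be the main project's clone URL instead of a local path.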
The bug I need to fix is relatively simple; the documentation in conf/scylla.yaml mentions three configurable directories: One for data files, one for commit logs and one, apparently unused, for caches, all of them defaulting to some subdirectory of $CASSANDRA_HOME:
Diving into the open source code
Diving into the code shows that the defaults are different and that, as mentioned in issue #372, which I started from, $CASSANDRA_HOME should not be used. I validate my hypothesis by testing the code with a couple of different settings, and by removing the setting from the config file and checking which directories are used. Once I am confident that everything is correct, I can add, commit and push the modified file:
git add conf/scylla.yaml
git commit -m 'Correct default directories values in conf/scylla.yaml #372'
git push
Note that I introduced the issue number preceded by a hash in the commit message. This will tell GitHub to automatically link my code to the issue itself.
Another important thing to note is that, when I surveyed the code, I realized that the third directory, the one for caches, is actually not used. It’s tempting to go a step further and remove the setting itself, or add a comment saying that it is not used, but that would be outside the scope of issue #372, and it would be wrong to commit anything that is not strictly related to this issue. You must keep your changes focused and limited to the task at hand.
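Both points, the issue reference and the narrow scope, can be double-checked before pushing. The sketch below uses a throwaway repository with a placeholder file (not the real scylla.yaml) and reads the issue number and the list of touched files back from the latest commit. The variable names are illustrative.

```shell
#!/bin/sh
set -e
# Throwaway repository; the file content is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir conf
echo "# placeholder for the corrected documentation" > conf/scylla.yaml

git add conf/scylla.yaml
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m 'Correct default directories values in conf/scylla.yaml #372'

# Extract every "#<number>" reference from the latest commit subject.
issue_refs=$(git log -1 --format=%s | grep -oE '#[0-9]+')

# List the files touched by the latest commit (--root also covers a first commit).
changed_files=$(git diff-tree --root --no-commit-id --name-only -r HEAD)

echo "Issues referenced: $issue_refs"
echo "Files changed: $changed_files"
```

If the second line lists files that have nothing to do with the issue, that is a hint the commit has grown beyond its scope.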
At this point the code is fixed and is on GitHub, in my personal fork. This is where the scary part comes in: asking the ScyllaDB people to accept my code. This is called a pull request.

The final step: the pull request

I like to create pull requests directly from the web interface on GitHub. I find it more intuitive and error-proof than trying to do it from the command line. All I have to do to create my pull request is click the small green button next to my branch name:
Creating the pull request on GitHub
Note the comment automatically computed by GitHub: my branch now has one new commit, but since I created my fork there have been 14 more commits in the main repository. So I will click the green icon on the left.
Comparing changes before creating the pull request
Luckily, my single commit doesn’t conflict with the 14 others, so GitHub informs me I am good to go. I don’t need to add any other comment or message. The commit message, while being very short, says it all: What my code change does and what it is related to. As I click the last button to confirm my request, I wonder what it was that I found so scary just a few days ago. There is no monster roaring at me right now, and the flames of hell don’t seem to be burning. Honestly, it was not scary at all. In the unlikely case I got it wrong, my fix will not be accepted and that will be it.
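The comparison GitHub computes here (one new commit on my side, 14 newer commits in the main repository) can also be reproduced with plain Git. The sketch below is a local simulation with empty commits and hypothetical paths. For simplicity, `origin` plays the main repository here, whereas with a real fork it would usually be a second remote.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Small helper: create an empty commit in repository $1 with message $2.
commit() {
    git -C "$1" -c user.name="Demo" -c user.email="demo@example.com" \
        commit -q --allow-empty -m "$2"
}

git init -q main-repo
commit main-repo "Initial commit"
git clone -q main-repo work

# One commit on my side...
commit work "Correct default directories values in conf/scylla.yaml #372"

# ...and 14 new commits in the main repository since the clone.
i=1
while [ "$i" -le 14 ]; do
    commit main-repo "Upstream change $i"
    i=$((i + 1))
done

git -C work fetch -q origin

# For "A...B", --left-right --count prints "<only in A> <only in B>",
# that is, how far HEAD is behind and ahead of origin's branch.
branch=$(git -C work symbolic-ref --short HEAD)
counts=$(git -C work rev-list --left-right --count "origin/$branch...HEAD")
behind=${counts%%[!0-9]*}
ahead=${counts##*[!0-9]}
echo "behind: $behind, ahead: $ahead"
```

Being behind is not a problem in itself. What matters is whether the newer commits touch the same lines as yours, which is what GitHub checks before telling you the branches can be merged.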
If you now check the issue details, you can see that GitHub automatically added a note that there is a pull request referencing this issue. This is the magic of that #372 in the commit message. It helps other people avoid wasting time on something that has already been fixed.
Open source is not that scary at all

Final notes

Now I am waiting for my pull request to be accepted; I will receive a notification when that happens. Keep in mind this can take a few days, even weeks: someone has to review my code, test that it works as described and fixes the problem, and, ultimately, make sure it doesn’t adversely affect the rest of the code (read: create new bugs). All this takes someone’s time, so be patient. In the end, when my pull request is accepted, ScyllaDB will have one more contributor and one less issue, and I will have my first OSS contribution. Now, it is time for you to try it, too. After all, it is not scary at all.
