http://www.toptal.com/
Women are
underrepresented in tech. This realization is nothing new. Just look at numbers
released by Facebook, Google, Intel, Slack, and many, many
more. But the numbers might be even worse than these reports imply.
At a recent
tech event, I overheard a side conversation about the lack of gender diversity
in tech. The small group was discussing the fact that even though women make up
about 30% of the workforce in tech, higher level engineering teams rarely have
more than a few women.
One of the
participants in this conversation commented that this was because male
developers are just generally more talented than female developers. No one in
the group objected.
Hmm…
From personal
experience at Toptal and my university experience in engineering at Princeton,
which was nearly 50/50 male vs. female, I know this is false. I’ve worked with
a number of incredible, profoundly smart female engineers in all kinds of
roles. Yet the numbers don’t seem to match my own experience, especially when
you start looking at more senior engineering roles.
And
addressing this disparity is important. It’s not just diversity for the sake of
diversity. If men and women are equally intelligent, then, statistically speaking,
we would expect about five of the smartest ten people in the world to be male and
five to be female. Thus, if your team is anything less than an equal balance of
men and women, then your team is probably not the best it can be.
In a perfect
system, diversity is a probabilistic result. But these aren’t the results we’re
seeing.
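To make "probabilistic result" concrete, here is a minimal sketch of how unlikely a heavily skewed team would be under the idealized assumption above. It assumes each team member is drawn independently from a pool that is half women; the function name `prob_at_most` and the team size of 10 are illustrative choices, not figures from my data.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Probability of at most k women on a team of n, assuming each member
    is drawn independently from a pool in which a share p are women."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Chance that a 10-person team drawn from a 50/50 pool has 0 or 1 women.
    print(f"{prob_at_most(1, 10):.4f}")  # 0.0107
```

Under those assumptions, a ten-person team with one woman or none would turn up only about 1% of the time by chance.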
After overhearing this conversation, I wanted to look at the numbers
to better understand whether, and where, software team building was going
wrong. I searched Google for trends in the gender breakdown across skill levels
in software engineering, but I wasn’t able to find much, so I decided to look
at the publicly available data on GitHub. I scraped 5,000 profiles to get
names, number of followers, number of contributions, and number of
repositories. I then used the open source package genderize.io to infer the gender of each profile from its name.
There were so
few women in this first batch that I had to add more data to make even simple
graphs meaningful, so I scraped 15,000 more.
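The classification step can be sketched roughly as follows. This is not the exact script used for the analysis: it assumes each genderize.io lookup yields a record with `gender` and `probability` fields, and the 0.9 cutoff for a "confident" prediction, the helper names, and the sample records are all illustrative assumptions.

```python
from typing import Optional

# Assumed cutoff for a "confident" prediction; the article does not state one.
MIN_PROBABILITY = 0.9


def first_name(profile_name: str) -> str:
    """Take the first whitespace-separated token of a profile name, lowercased."""
    parts = profile_name.strip().split()
    return parts[0].lower() if parts else ""


def confident_gender(response: dict,
                     min_probability: float = MIN_PROBABILITY) -> Optional[str]:
    """Return 'male' or 'female' if the prediction clears the cutoff, else None."""
    gender = response.get("gender")
    probability = response.get("probability") or 0.0
    if gender in ("male", "female") and probability >= min_probability:
        return gender
    return None


if __name__ == "__main__":
    # Hypothetical lookup results, shaped like the assumed response record.
    samples = [
        {"name": "anna", "gender": "female", "probability": 0.98},
        {"name": "sam", "gender": "male", "probability": 0.62},
        {"name": "xq", "gender": None, "probability": 0.0},
    ]
    for record in samples:
        print(record["name"], confident_gender(record))
```

Profiles whose names fall below the cutoff, or return no prediction at all, are left unclassified rather than guessed.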
Here’s what I found:
Is Open Source Open To Women?
Open Source Is Dominated by Men
Even before
getting into any further analysis, it was obvious that the percentage of women
was extremely low. Of the 20,000 profiles, genderize.io was able to confidently
determine the gender of 15,374. Of those, just 6.0% (926) were women. The
disparity gets more severe once you start taking a look at user activity.
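For anyone who wants to check the arithmetic, the shares follow directly from the counts above. The variable names are my own; the counts are the ones reported in this article.

```python
scraped = 20_000      # profiles scraped in total
classified = 15_374   # profiles with a confidently determined gender
women = 926           # classified profiles identified as female

share_women = women / classified
share_unclassified = (scraped - classified) / scraped

print(f"Women among classified profiles: {share_women:.1%}")   # 6.0%
print(f"Profiles left unclassified: {share_unclassified:.1%}")  # 23.1%
```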
Let’s take 10
contributions as the cutoff for the difference between a user who has just
created a profile and maybe experimented a bit and one who has at least delved
into an open source project or started their own. The result: 5.4% women.
Just 5.4% of GitHub users with over 10 contributions from our random
sample are female.
In fact, if
we divide users into buckets according to their number of contributions (with a
minimum of 1,000 users in each bucket), the percentage of female users tends to
decrease as contributions go up.
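One way to build such buckets is sketched below. It is a simplified reconstruction, not the exact procedure behind the graphs: it assumes each profile is a dictionary with `contributions` and `gender` keys (hypothetical field names), sorts by contributions, and closes a bucket as soon as it holds the minimum number of users.

```python
from typing import Dict, List


def bucket_by_contributions(profiles: List[Dict],
                            min_size: int = 1000) -> List[List[Dict]]:
    """Sort profiles by contribution count and cut them into consecutive
    buckets holding at least min_size users each."""
    ordered = sorted(profiles, key=lambda p: p["contributions"])
    buckets: List[List[Dict]] = []
    current: List[Dict] = []
    for profile in ordered:
        current.append(profile)
        if len(current) >= min_size:
            buckets.append(current)
            current = []
    if current:
        # Fold a too-small remainder into the last bucket so the minimum holds.
        if buckets:
            buckets[-1].extend(current)
        else:
            buckets.append(current)
    return buckets


def female_share(bucket: List[Dict]) -> float:
    """Fraction of profiles in a bucket classified as female."""
    women = sum(1 for p in bucket if p["gender"] == "female")
    return women / len(bucket)


if __name__ == "__main__":
    # Tiny made-up sample: 10 profiles, the two least active are women.
    sample = [{"contributions": c, "gender": "female" if c < 2 else "male"}
              for c in range(10)]
    for bucket in bucket_by_contributions(sample, min_size=4):
        low = bucket[0]["contributions"]
        high = bucket[-1]["contributions"]
        print(f"{low}-{high} contributions: {female_share(bucket):.0%} female")
```

Note that this simple cut can place users with the same contribution count in neighboring buckets; a stricter version would only close a bucket at a change in contribution count.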
Not only are
there far fewer females on GitHub than tech industry gender diversity numbers
might suggest, but it looks like the percentage of females decreases as user
activity increases.
I kept
digging, looking at gender across number of followers and number of
repositories, and observed the same trend. This was especially clear when
looking at the number of repositories:
Again, we see
that the percentage of females decreases as we move to buckets with more
repositories.
So what’s
going on here? Is GitHub activity a reasonable indicator of programming
expertise in the first place? (I think it is.) Are talented female engineers
less likely to actively contribute to open source than their male counterparts?
Are these results another indicator of the tech industry’s entry/retention
problems when it comes to female engineers?
Why Are the Numbers in the Open Source Community So Low?
Numbers for
women in the tech industry are already pretty bleak, but they’re even worse in
open source projects.
A lot of
previous research has focused on the reasons why women are not willing to
embark on STEM-related subjects and careers. Some studies point to a general lack of
interest in STEM subjects. Others suggest that women decide against pursuing STEM
careers after being stereotyped by family and teachers. Still others cite a
lack of role models or a combination of multiple causes.
According to a study on gender in Stack Overflow, “The issue
of gender and STEM-related subjects has been studied for several years, and
mostly from the point of view of ‘why’ women do not engage with scientific
studies or careers. Lesser attention has so far been given to quantify the
phenomenon and representation of women in online communities (as
technology-‘users’), what are their levels of participation, and whether
differences can be detected at the gender level. Only anecdotal evidence has
been gathered on how specific communities actively discourage women from
participating.”
But when we spend so much time focusing on why there are fewer women
pursuing STEM-related subjects, we lose focus on another important disparity:
if 28% of CS master’s degrees go to women, why are the numbers in the open source community so much lower?
There are a
few possibilities to consider when thinking about an answer to this question:
1. Maybe there isn’t a strong correlation between programming talent and GitHub activity.
In the tech
industry, many developers go to GitHub early in their careers as it’s a
prerequisite to be taken seriously. However, it seems that fewer aspiring
female developers view open source this way. Is it possible that this data is
all coincidental and does not mean much in relation to the number of talented
female software engineers in the tech industry?
I discussed the question with two engineers at Toptal, Anna-Chiara
Bellini and Bozhidar Batsov. Anna-Chiara has over 20 years of software engineering experience across a variety of
academic and business settings, and Bozhidar is number 98 on the list of most active GitHub contributors in the world.
Both agreed that while being active on GitHub is typically a good
indicator of engineering expertise, the reverse isn’t true, mentioning that
they know plenty of great engineers who aren’t involved in open source at all.
The tech industry agrees too, with many companies assessing GitHub profiles
during hiring processes (although this practice seems to be quite biased, which isn’t
really a surprise given the results of my study).
GitHub activity is generally a good indicator of engineering expertise,
but the reverse isn’t true… Plenty of great engineers aren’t on GitHub.
Bozhidar
suggested that open source contributors are often more likely to be the type of
people who push for big internal changes in a company setting. Anna-Chiara
commented that it takes a great deal of confidence to contribute to open
source, something that she thought may be more difficult for female developers
to overcome, given the tech industry’s poor history with welcoming women.
There are
certainly several biases that could potentially be at play with this GitHub
data (including the fact that almost 25% of the names couldn’t be classified as
male/female with confidence).
However,
Bozhidar, Anna-Chiara, and I agreed that GitHub activity level is generally a
good indicator of programming expertise. Yet this data suggests a trend of
talented female programmers choosing to discontinue (or never start) their open
source pursuits in favor of other options.
2. Numbers cited in tech company reports include non-tech roles.
Many
companies in the tech industry cite that they employ between 25 and 30 percent
women. This number, however, can be misleading. Most of these larger numbers -
yes, they are the larger ones - include both technical and non-technical roles.
As you begin
to examine the percentage of female employees in technical roles, the numbers
drop even lower.
At Facebook,
32 percent of employees are female, but only 16 percent of technical roles
belong to women. At Google, there’s a similar drop, from 30 percent female
employees in the company as a whole to 18 percent in technical roles. Slack
drops from 39 percent female overall to 18 percent in engineering roles. Of the
companies I’ve examined, Intel has the smallest drop, going from 24.1 percent
female overall to 19.4 percent in technical roles.
So even
though many companies boast a percentage of female employees that is about a
quarter or even a third of the company, the number of women in technical roles
is actually much lower. It seems that claims of 15 to 20 percent would be more
accurate.
But that
still leaves a huge disparity between the percentage of women involved in
technical or engineering roles at tech companies and the percentage of women
who contribute to open source projects on GitHub.
3. Female programmers are leaving the tech industry.
If activity on GitHub correlates with seniority and expertise, then the
extremely low number of active female contributors (low even compared to female
contributors overall) could be explained by the alarmingly high departure rate of female engineers from the tech industry.
Among women who join the tech industry, 56 percent
leave by mid-career, which is double the attrition rate for men.
If the tech industry can’t retain as many women past
their mid-career mark, then it’s likely that they won’t be contributing to many
open source projects either.
But this line
of reasoning also raises the question: Is the correlation between seniority and
contribution actually true? Many frequent OSS contributors are relatively new
programmers who are trying to establish a name for themselves - so where are
the women from that group?
4. GitHub can be an unwelcoming community for female programmers.
Commenting on an article about
women in tech, one female developer says, “In regards to the open source
projects - I’ve been thinking about this recently. I actually haven’t committed
to any and it definitely puts a kink in my career… I feel like it’s a circle I
can’t get into. But mostly I fear the excessive spotlight of being a sole
female programmer on a publicly available project. In light of how women are treated
on the internet, this fear does not seem unreasonable.”
Anna-Chiara
believes this kind of apprehension is a common theme amongst female engineers,
especially when it comes to OSS. When I asked her if she thought women were
less likely to contribute to open source projects, she responded, without
hesitation, yes.
Anna-Chiara
also brought up the possibility that female GitHub users might try to adopt a
gender-neutral or male name to ensure they would be taken seriously (remember
that genderize.io was not able to confidently determine the gender of about a
quarter of the profiles scraped).
That does not mean, however, that female contributors are not out there.
Bozhidar brings up Exercism.io, a popular project started by Katrina Owen that has several female
contributors. He also mentions Bodil Stokke, a female developer from
Norway with an extensive history of popular open source
contributions.
Anna-Chiara
also suggests that if a project had women among the top contributors or
leaders, female developers might be more likely to contribute to it.
Unfortunately, compared to the number of male-dominated projects out there,
female-led OSS projects are hard to find.
But the issue
is larger than just OSS. “If I think of the women I know in development, it’s
nowhere close to the 20% that you hear about at these big companies. I don’t
think it’s even anywhere close to 10%,” Anna-Chiara tells me. “The result of
this analysis of GitHub doesn’t surprise me.”
5. Implicit biases that shape the tech industry might be trickling into GitHub.
Eric Ries points out problems of implicit biases in the tech industry.
Even if individual people within systems are not biased, it is still extremely
easy for those systems to become biased. People also have unconscious biases,
which complicates the issue even further.
In his
article, Eric uses the example of orchestras, which were primarily all-male
until the 1970s. People believed that male performers had a greater aptitude
for music than female performers. However, once orchestras started separating
musicians from judges with a physical screen during auditions, the numbers
shifted significantly, and people began to accept that men and women played
equally well on average.
If similar
biases come into play with hiring systems in the tech industry, it could help
explain the smaller percentage of female software engineers that I discussed
earlier. And if fewer female software engineers are being hired, those effects
could trickle into open source communities like GitHub. If someone is rejected
for full-time programming roles, they might come to believe that they are not
as talented, and would therefore be less likely to have the confidence to
contribute to open source projects.