Success at Apache: Security in Practice
by Jarek Potiuk
This post is about the Apache Software Foundation's security process and the security mindset of an Apache project's PMC, put to the best use in practice. From it you can learn why the security practices we apply in our projects are important, how they work when the PMCs apply them correctly and with the right security-driven mindset, and how important it is for users of Apache Software Foundation projects to keep their software updated - including the latest security fixes.
The idea for this article was triggered by a recent blog post by the security researcher Ian Caroll, who earned USD 13,000 in bug bounties by simply following up on the results of the Apache security process applied by the Apache Airflow PMC. This saved quite a few businesses a lot of trouble, but it was only possible due to the foundations laid down by the ASF and the PMC of the project.
Here is what Ian Caroll has to say about it: “This issue was a great example of how ASF's transparent way of fixing and disclosing vulnerabilities worked to protect users of their software, and gave many organizations a wake-up call on ensuring they upgrade and protect their open-source software.”
Apache Airflow is one of the most commonly used orchestration tools in the industry today, and due to its nature it is an important attack vector: if you run it internally in your company, it is likely to interact with pretty much all your systems, and if an attacker manages to break in through Airflow, the breach might cascade into every system Airflow connects to. Therefore the Apache Airflow PMC takes security very seriously. So seriously that we have a whole discussion panel about Apache Airflow Security at the Airflow Summit that is coming soon - July 8-16th.
This post's main point is to show how important it is to follow security best practices throughout the whole software lifecycle, and how important it is to think about security at every step of building and releasing the software (and beyond).
Let's start from the very beginning: making sure the code development process is secure. Like most ASF projects, the Apache Airflow project is developed on GitHub, and together with a growing number of projects we use GitHub Actions to run continuous integration. GitHub publishes a number of best practices and security hardening practices that you should follow when you run your CI with GitHub Actions, and we follow them rigorously, including monitoring the "Security blog of GitHub" and following its advisories.
And we have not stopped there. We actively think about and discuss potential security threats and the ways in which, for example, supply-chain attacks could be performed on our project. We share our findings on the discussion mailing lists of the ASF and propose recommendations so that all ASF projects can make use of the best practices. One of the results is documenting the practices and sharing them at builds@apache.org. We also raised a few security issues with GitHub, and as a result (at least that’s the feedback we got from GitHub) they implemented some improvements that we apply in practice. A recent example is a change implemented by GitHub to allow control of the permissions of the GitHub Token used during the CI build, which resulted in this PR. A few months ago we raised the concern that having the blanket "write" permission is quite dangerous. GitHub responded and implemented the change, which allowed us to limit the scope of the tokens used for our builds and increase protection against a wide range of attacks - with supply-chain attacks recently being the most prominent ones, leading to ransomware threats and millions of dollars paid to hackers.
This is where the security mindset of the Apache Airflow PMC starts, and it lays the foundation for the next steps, in which the Apache Software Foundation takes a crucial role: releasing the software and monitoring for security vulnerabilities. The ASF has a well-established process for disclosing and following up on security vulnerabilities in ASF projects. It is straightforward and simple to follow for everyone involved: starting with the security researchers who raise the issues, going through the volunteer (!) security team of the ASF, which (according to the upcoming annual report) had to handle 387 reports of possible vulnerabilities across 95 of the top-level ASF projects, leading to 155 CVEs (Common Vulnerabilities and Exposures) assigned, and ending with the PMC that has to fix the issues and follow up with reporting. Heck, the ASF even introduced an internal portal to report and keep track of all the CVEs, as well as publishing a yearly security summary report and video.
This process is very clear about responsible disclosure and publishing of vulnerabilities, and about how security researchers, the ASF security team and the PMC can collaborate when a security issue is discovered. Quite a recent experience there was discovering and announcing CVE-2021-29621: User enumeration in database authentication in Flask-AppBuilder. This issue was reported to the ASF - following the process - by Dolev Farhi. He responsibly disclosed it together with a reproducible proof-of-concept scenario that allowed us to quickly verify that the issue existed and (more importantly) to verify that it was gone once we had fixed it.
At the end of the process this is the message we got from Dolev: "Truly enjoyed working with you. Thanks so much for your help in bringing this to closure and making Airflow what it is."
The CVE was an interesting one because it was not an issue with the Airflow code, but was introduced by a dependency of Airflow - Flask-AppBuilder. Fortunately the process is built in such a way that we can involve and collaborate with other projects in solving it, and we got excellent support from Daniel Gaspar. We tried and tested the fix locally and provided it to Daniel, which let him quickly implement it and release a new version of Flask-AppBuilder with the fix. This was also important for the Apache Superset project (Daniel is a PMC member there as well), which also uses Flask-AppBuilder and suffered from the same vulnerability. This shows that security is a distributed issue, how important cooperation is, and how much a good security process should embrace it. I truly enjoyed the cooperation with Daniel and Dolev as we helped to test the release candidate of Flask-AppBuilder. Later on, when the CVE was published, we announced it following the regular announcement process.
Here is what Daniel has to say about it: "A great example of multiple open source projects working together, elevating each other to higher quality. The whole is greater than the sum of the parts. Got a clear report with a proposed fix, reproducible steps all backed by the ASF security process, it was a breeze to fix and release."
This leads to the most important point. There is only so much we can do when it comes to developing and releasing our software. After that, it’s up to our users to upgrade to the latest versions. If they don’t, they remain vulnerable. This was the actual reason for the blog post I mentioned initially - despite our announcing CVE-2020-17526 and releasing a fixed version a long time ago, many of our users did not follow the announcements and did not upgrade to the latest version of Airflow. I must stress the importance of this step - as long as our users do not upgrade to fixed versions, there is not much we can do to help them. It's all in our users' hands! This time it ended with just USD 13,000 paid to Ian in the form of bounties, because Ian is a responsible security researcher (a so-called "white hat"). But imagine some bad actors doing the same thing Ian did.
Of course we understand that it might sometimes be difficult to migrate to newer versions of software, but here we also have another solution that we applied last year, one that might seem surprising at first but makes perfect sense when you look at the consequences: consistent versioning and predictable release support. When we announced Airflow 2.0 last year, we introduced a small but important change - full support for Semantic Versioning, which we have followed rigorously since. We also published a predictable version lifecycle. Why is this important? Because users can be pretty sure that they can safely upgrade to a new “patchlevel” version of Airflow when it gets released, without even thinking about potential migration problems. And when we release a "feature" (minor) version of Airflow, we promise it is backwards-compatible, so even if the migration process might take a bit longer, users can apply it without worrying about spending a lot of time migrating their DAGs (DAGs are the users' workflow definitions; some of our users have many thousands of them, as their entire data processing is orchestrated by Airflow).
We also publish (and will continue to publish) the support schedule for our major releases, so that users can be prepared and plan the migration to new major releases in advance. As with all software, we will sometimes implement backwards-incompatible changes that cause our users to spend more time on migrations. The old releases will stop receiving security fixes at some date, and the best you can do as a user is to migrate to a supported version before that date!
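To make the Semantic Versioning promise concrete, here is a minimal sketch of how a user might classify an upgrade before planning it. The function names and the simple MAJOR.MINOR.PATCH parsing are assumptions made for illustration (they are not part of Airflow), and the 2.x version numbers in the usage note are only examples.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (hypothetical helper).

    Pre-release and build suffixes are deliberately not handled here.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def classify_upgrade(current: str, target: str) -> str:
    """Say what kind of upgrade going from `current` to `target` would be."""
    cur = parse_version(current)
    new = parse_version(target)
    if new <= cur:
        return "not an upgrade"
    if new[0] != cur[0]:
        # Major release: backwards-incompatible changes are allowed.
        return "major"
    if new[1] != cur[1]:
        # Minor ("feature") release: promised to be backwards-compatible.
        return "minor"
    # Patchlevel release: fixes only, expected to be safe to apply.
    return "patch"
```

Under these assumptions, `classify_upgrade("2.0.1", "2.0.2")` is a "patch" upgrade that can be applied right away, while `classify_upgrade("1.10.14", "2.0.0")` is a "major" one that needs a planned migration.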
Which leads to the last and most important point in this article. If you are a diligent reader and look at the announcement I mentioned above for CVE-2021-29621, you will see that the fix was released only for the Airflow 2 series. Why? Because Airflow 1.10 just reached its end-of-life on June 17th 2021. When we released Airflow 2, half a year ago, we agreed in the community that we would support Airflow 1.10 with critical/security fixes for only 6 months. And we did - for example, CVE-2020-17526 was addressed in Airflow 1.10.14.
But this time is over now. This is the first security vulnerability that we addressed only for Airflow 2. If you are still using Airflow 1.10, you are on your own now. You are no longer protected by the security process of the ASF, the security team of the ASF and the Airflow PMC. What’s more, security researchers, even if they find an issue, might not be eager to responsibly disclose it, knowing that it will not be fixed anyway. When you read about the next ransomware attack and the millions of dollars paid, think about whether you would like your company to face this kind of dilemma one day. Even if it costs time and money to keep your software updated, preventing this kind of problem is far cheaper than dealing with the consequences of such an attack.
Upgrade NOW to the latest release of Airflow 2, and keep upgrading with future releases!
Be sure to join us at Airflow Summit online, 8-16 July: https://airflowsummit.org/ - registration is free and open to all.
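The end-of-life rule can be turned into a simple check that an operations team might run against its own deployments. This is only a sketch: the June 17th 2021 date for the 1.10 series comes from this post, while the table layout, the function names, and the choice to treat series without an entry as still supported are assumptions made for illustration.

```python
from datetime import date

# End-of-life date for the 1.10 series as stated in this post. The table
# itself is a hypothetical illustration, not an official data source.
END_OF_LIFE = {"1.10": date(2021, 6, 17)}


def series_of(version: str) -> str:
    """Return the MAJOR.MINOR series of a version string, e.g. '1.10'."""
    major, minor, *_rest = version.split(".")
    return f"{major}.{minor}"


def receives_security_fixes(version: str, today: date) -> bool:
    """Check whether `version` is still inside its support window.

    Assumption: a series with no entry in END_OF_LIFE is treated as
    still supported.
    """
    end_of_life = END_OF_LIFE.get(series_of(version))
    return end_of_life is None or today < end_of_life
```

With this table, `receives_security_fixes("1.10.14", date(2021, 7, 1))` is `False`: from the end-of-life date onward, a 1.10 deployment no longer gets security fixes.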
# # #
Jarek Potiuk started to work on the Apache Airflow project in September 2018. He became an Apache Airflow committer in April 2019 and a member of the Apache Airflow Project Management Committee (PMC) in October 2019. He was elected an ASF Member in April 2021. He is an Apache project mentor in Outreachy and Google Summer of Code and was a mentor in Google Season of Docs. Jarek is an independent Open Source Contributor and Advisor, and is always keen on making it easier for people with different backgrounds to join OSS projects.
= = =
"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache