Inside Infra: Chris Lambertus – Part I

Part I of the final interview in the "Inside Infra" series with members of the ASF Infrastructure team features Chris Lambertus, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.

"...The thing that we're fighting against is the safety and longevity of the old technology. For quite some time, our primary concern was that the hardware that was running this, which was 15 years old, was going to fail."

What's your name and how is it pronounced?

My name is Chris Lambertus (“Kris Lamb bert uhss”): it's pronounced exactly how it's spelled.

When and how did you get involved with the ASF?

I've been aware of the ASF probably at least since the inception of the ASF. I've been working in IT for quite a long while, and I've been very familiar with the ASF projects, because I use them daily in my career. I didn't actually get involved with the ASF until a buddy of mine who was working on the CloudStack project mentioned to me that (ASF VP Infrastructure) David Nalley was looking for somebody to do some contract Infra work. Long story short, I talked to David and I tossed in an application. I was eventually hired as a part-time contractor.

...So CloudStack, we're talking about 2012, when CloudStack first came into the Apache Incubator, or was it after that?

I've been aware of the ASF probably since HTTPd, since the original Web server came out. I joined the team late 2014.

Explain your role within the Infra team —how did you get here? Were they looking for someone who specializes in something particular?

My understanding was David was really looking for somebody that had a background in production systems engineering and had been doing it for a long time in a production environment. That's something that I had been doing since 1992: I've been essentially a professional production systems administrator. I knew that skill set was definitely in line with what David was looking for. I think I brought that to the table pretty well. That's basically what I've been doing ever since as a contractor.

What are you responsible for specifically?

That's a complex question, because the ASF Infra sysadmins are essentially responsible for everything. All of us are all responsible for all of the things. We do tend to specialize a little bit. My current project is probably reengineering the mail system: it's the largest one I'm working on right now. I do tend to focus a lot of my efforts on backups. Beyond that, I do a lot of JIRA and Confluence work with Gavin (ASF Infra team member Gavin McDonald). But Puppet, configuration management, again, all these things are things that all the Infra guys support.

In past interviews everyone has basically said, “we do everything”. How does it work? There's no hierarchy. Everyone does everything. Do queries come in and everyone jumps on them? Do you have a round-robin way of getting stuff done? How do you manage with so much going on with Infra? How do you cope with that?

“Cope with it” is an accurate term. Each of us has... I don't really want to call it a specialty, but definitely a focus. If some question comes in about the new mail routing or something that I've been specifically working on, that would go into my bucket as a priority. Certain people have history with certain types of projects. Gavin (Infra team member Gavin McDonald), for example, has been heavily involved with the Continuous Integration infrastructure for many, many, many, many years. So, he tends to be the font of knowledge for all things CI-related.

We tend to break things up that way. Some of the team members definitely have skillsets above and beyond general system administration work. Humbedooh (Infra team member Daniel Gruno’s username) is a very skilled programmer. He then ends up owning a lot of the software that Infra has developed and he has developed. So, questions regarding that, and specialty configurations related to software that he's written tend to go into his bucket. Because of the nature of the team, and because of the nature of the time zones that we're all in, the responsibility of dealing with issues falls on whoever is on call, first of all, and then whoever is awake and available, to handle any situation that comes up regardless of who "owns" the technology.

Describe a typical workday for you.

Apache work for me is basically: I wake up in the morning, hopefully not at 3:00 in the morning, but get out of bed and plop down in front of the computer. Essentially, my lifestyle is I've always been a computer guy. I've always been really focused on computer system administration, not only as my work but also as a hobby. So, I spend the vast majority of my day behind the computer, whether I'm working on Apache stuff or working on other projects, things like that. That'll go on until 11:00 at night. So, my "workday" is essentially me living my life and doing tasks as they arrive and doing projects as necessary and getting things done that need to get done.

...All the things.

Regardless of the time of day, yeah.

How do you keep your workload organized? Folks have all sorts of different systems. Are you an Evernote type of person, or do you keep your own journal? Do you have a certain system to help manage your workload?

Jira is the primary basis for managing my workload with the ASF. We've done a lot of work in terms of building technologies around Jira. Our service level agreement reporting tools I find extremely useful for seeing what's in the queue, what needs to be done, what hasn't been touched in a while, things like that. That really drives a lot of my day-to-day efforts in terms of replying to tickets and servicing customers.

In addition to that, I also use Jira to track my projects. So, if I have a project going on, that's usually a Jira ticket. And then I can go back and refer to those and see where things are, what needs to be done. I've never been a big one for lists of notes. I do have notes that I keep, but by and large, the things that are on the top of my stack remain on the top of my brain at the same time. So, I don't feel like I forget a lot of things, but I don't take a lot of notes, which is what it is.
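As a rough illustration of the "what hasn't been touched in a while" view described above, here is a minimal Python sketch. It is not the Infra team's reporting tool: the `Ticket` class, the ticket keys, the dates, and the seven-day threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    key: str                  # hypothetical ticket id
    last_updated: datetime    # last time anyone touched the ticket
    resolved: bool = False


def stale_tickets(tickets, now, max_idle=timedelta(days=7)):
    """Return open tickets untouched for longer than max_idle, oldest first."""
    idle = [t for t in tickets
            if not t.resolved and now - t.last_updated > max_idle]
    return sorted(idle, key=lambda t: t.last_updated)


# Made-up queue, purely for illustration.
now = datetime(2021, 1, 15)
queue = [
    Ticket("INFRA-1", datetime(2021, 1, 14)),
    Ticket("INFRA-2", datetime(2020, 12, 1)),
    Ticket("INFRA-3", datetime(2021, 1, 2)),
    Ticket("INFRA-4", datetime(2020, 11, 20), resolved=True),
]

for ticket in stale_tickets(queue, now):
    print(ticket.key, (now - ticket.last_updated).days, "days idle")
```

Sorting oldest first puts the most neglected ticket at the top of the list, which is the property a queue view like this is presumably after.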

…And then there's people who have everything except the monitors covered in Post-its.

I don't do a Post-it thing, but I have little text files everywhere with notes and things in them.

So, you all have day-to-day tasks that you manage, as well as things that require your immediate attention, as well as long term projects. In my earlier interviews with other Infra team members, everyone's been saying that I have to talk to you, because you're handling “The Email Project”. For those who aren't aware, standard operating procedure at the ASF is “if it didn't happen on-list, it didn't happen”. So, you have, if I'm understanding this correctly, 21 years’ worth of email archives that you're working on. What's going on with this project? What are you handling? Why is it so important?

Well, as you know, email is the lifeblood of the Foundation. Everything that happens here happens on a list. Because of that, the Foundation has amassed a very large quantity of email archives. Those archives are fundamental to the provenance of the Foundation. So, maintaining those and keeping those safe and available is really a top goal of the Infra team.

The mail project, such as it is, is essentially to upgrade and migrate our existing legacy email system to a modern, more supported system. The current email system as it stands was engineered by folks, volunteers, some staffers, I would guess, over 10 years ago, maybe 15 years ago, running on FreeBSD, which we don't really use too much anymore. Actually, we don't really use it at all. They used technologies that were interesting at the time, but are perhaps not so well supported today. So, a lot of it is modernization.

A lot of it is taking a lot of that old tribal knowledge that really doesn't exist anymore and bringing it into the modern era, documenting all the weird little settings that we have and all the edge cases that we manage in email, management of the list systems, mailing lists and their configuration, and making sure that gets upgraded, migrated, modernized. Doing that all in such a way that we don't a) lose anything, or b) suffer any downtime. So, it's a large project. That's really what I've been working on probably for the better part of the last two years, bringing that up to the present era.

You’re like the Titan Atlas: carrying the heavens on your shoulders. That's a massive, massive undertaking. Is there like a deadline for this —where's the end for this project? Is it never ending?

I feel more like Sisyphus than Atlas, but the deadline is as soon as possible. The thing that we're fighting against is the safety and longevity of the old technology. For quite some time, our primary concern was that the hardware that was running the old email system, which was 15 years old, was going to fail. In fact, it did. But fortunately, I basically copied the whole thing off to a separate colocation facility. So, we had an archive of it when it went down, and I was able to bring it all back up.

So, that wasn't a problem. I mean, it was a problem, but it wasn't a disaster as it could have been. So, the deadline is as soon as possible. But in reality, it's going to work until it stops working. I'm not sure how to better state that, because the technology is so old and we really need to get off of it and onto new technology. But there's no hard and fast timeline. Nobody's really cattle prodding me to get it done, but it's the absolute top priority that I have.

...That was actually my follow-up question. Is the “as soon as possible” official, or is this something you're setting for yourself because you just want to get it done?

Oh, that's definitely an official timeline. Yeah.

...I remember our first email servers were a machine under Brian Behlendorf’s desk at the Wired offices. So, we've come a long way since then.

We have, yes.

...You're handling this behemoth. Are you also dealing with the day-to-day putting out the fires, as well as everybody else?

Absolutely, yes.

The volume and scale of this project seems so huge. Again, the word 'cope' keeps coming to mind, because knowing what I know —and I don't even know— it's just scratching the tip of the iceberg: it seems astronomical in terms of scale and scope. Are you building everything from scratch for this project? Are you using any kind of commercial packages? This is a huge overhaul. Tell us more about it.

Multitasking has been in my blood for my entire life. I don't typically have a problem of splitting my time and my attention and my energies between multiple projects. You are absolutely right: this is a titanic project. It's one of the reasons why it's taken so long. Like I said, we've been working on this for several years at this point. The reason it's taken so long is twofold: One is I can't spend 100% of my attention on it or else I would go absolutely crazy. So I partition that. I partition my mind and my time, if you will. Just a little bit of time here working on this, working on this particular aspect of it, then I'll go work on some tickets. So, I'll go work on something else. If I was only working on the mail, then other things wouldn't get done, right?

I have to partition it that way. I think the main way I've tackled this type of project... Again, my experience in system administration going back so far, I've worked on a lot of very large scale projects. So, this is in the middle in terms of the scale. But the biggest thing is to break it down into multiple components as small a component as you really can. The first thing to do is to analyze the existing system. "What is it? How is it running? How is it tied together? How are these things all related? Where are the pieces? Where are the tendrils? How far do they go?" “Write that down.”

I started developing documentation that explained a lot of stuff. There was some documentation that existed. I take that and I carry it forward then into the new system. Okay, "what things do I want to keep? What things do I HAVE to keep? What things are legacy? What things don't we use anymore?" That process of discovery, of understanding how it was built, why it was built and what we're still using, and what we don't need to use anymore, is probably the vast majority of the work--just to understand it. Once that's done, we say, "Can we use the old technology, or do we need to use a different technology?"

In the case of the Foundation, we're extremely tied to the way that ezmlm, our mailing list system, works. ezmlm is extremely tied to Qmail. So, converting those into other tools, basically, I'll say, it's too complicated. With the amount of data that we have and the amount of dependence that we have on those configurations, migrating it to a different system would be incredibly difficult. So, what we've done is use the modern versions of (and updates for) these pieces of software, ezmlm and Qmail.

What we've done is I've taken those packages and I built them for modern operating systems. I've patched them with current technology, TLS and various modern email stuff, and put that into configuration management and built a system that deploys all those packages in a reproducible fashion. So, at any time, I can just turn on a new machine. I could type in, "This machine is the new mail router," and run Puppet, our configuration management software, on it. It'll deploy all that software automatically.

That's probably the second part of this huge phase of developing this. The phase that we're in right now is testing it to make sure that it works the same way as the old one works. Once that's verified, then we can actually look at migrating the old data onto the new system and deploying it into production. I think that answered your question.
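The "declare that this machine is the mail router, then run configuration management" idea from this answer can be sketched in a few lines of Python. This is only a toy model of declarative, repeatable deployment, not Puppet and not the ASF's actual manifests: the `ROLES` table, the role names, and the `plan` and `apply` helpers are hypothetical.

```python
# Hypothetical role catalogue: role name -> packages that role needs.
ROLES = {
    "mail_router": ["qmail", "ezmlm"],
    "web": ["httpd"],
}


def plan(role, installed):
    """Return the packages still missing for `role`, in catalogue order."""
    return [pkg for pkg in ROLES[role] if pkg not in installed]


def apply(role, installed):
    """Bring `installed` (a set) up to the role's declared state."""
    missing = plan(role, installed)
    installed.update(missing)
    return missing


machine = set()                          # a freshly provisioned host
first = apply("mail_router", machine)    # installs everything the role needs
second = apply("mail_router", machine)   # nothing left to do

print(first)
print(second)
```

The second run finds nothing to change, which is the point of the approach: the role describes the end state, so the same declaration can be applied to any new machine, or re-applied to an existing one, with the same result.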

I think so, but it made me think of another question: How did this wind up being "your" project? Was this assigned to you? Did you jump on it going, "Yeah, I'm taking it"? How did you wind up with this?

That is a very good question. I don't really know. I think probably just because I had been working with... Back in 2015, maybe, we were actually having this exact same discussion: "what do we need to do to migrate this ezmlm, all these mail archives, all this stuff to a new modern system?"

One of the things that we looked at was, "Can we transfer this? Can we translate this to something like Mailman or some newer type of mailing list management system?" We looked at a couple of options. The biggest problem we had was that the archivers were terrible. So, Humbedooh basically ended up writing this thing that became Pony Mail as the answer to that system. Ultimately, that turned out to be a great effort. I think it's going to take us a long way. But in the end, I was the one to continue to work on the email system. For whatever reason, I guess it just became my thing. Maybe because I was the only one willing to do it. I don't know.

...Is the legacy system going to be powered by Apache Pony Mail (incubating) at some point, or is it already in the process?

So yeah, lists.apache.org is our primary advertised archive system. That is what we're telling people to use. In terms of what happens to the old system, that remains a little bit under discussion. I don't know the ultimate disposition of that, but the current plan is lists.apache.org will be the primary access to the mail archives.

I noticed that Pony Mail goes back quite a bit, but it didn't originally go back as far as it does now in terms of the archives. I’m curious to see if everything eventually is going to be migrated to it.

Yes, yes, we actually have a plan to load the previous archives in there. We loaded a subset when we first started it up. I believe they go back to 2012 right now. So yes, we do have a plan to load the previous archives.

Great. I understand some Apache projects and their communities are always asking for new services. How does Infra decide which products you support? Who gets assigned to take the lead on introducing new services or new products? I understand that you develop your own custom solutions as well. How do these get divvied up? Is everything in queue? How does it get done?

When you're talking about a project requesting a service, I think the first thing we look at is, "Is this service extremely specific to this one project, or is it something that has broad appeal to the Foundation?" If it has broad appeal to the Foundation, we've got multiple requests for it, it's a service that we feel we can provide, given the amount of time that we have available, then it's something that we would consider doing.

Obviously, there's a lot of other thought that goes into that in terms of what it is, what it does, what it needs to do, who needs access to it, that we have to evaluate. But generally, if it has broad appeal to the Foundation, it would be something we would look into. If it doesn't, if it's something that's very specific to a certain project, what we typically recommend is that a project request their own VM. They can run the service themselves. That's typically how we’ve approached that in the past.

Has the team been in a situation where you're like, "Hey, this is a really cool thing, let's bring it in," and then throw it on projects or see if anybody wants to do it? Does the converse happen also, where you guys have insight as to something that's hot and new and you think that would be a great fit for Infra, but you have to find a "problem" to connect it to; or is that not something that you deal with? Is your work all reactive, or do you ever come into a situation where you say, "Long-term planning: we want to introduce something brand new"?

I think probably up until maybe five years ago, the work was almost entirely reactive. But the team and the processes that David (Nalley) has now put together have really pushed us more in a direction of future planning, of taking the time and taking the mindset of, "What can we do long term to better support projects?" I think selfserve.apache.org is a great example of that. That's something that grew out of a small subset of tools. We got very positive feedback from Committers and Projects about using selfserve.apache.org.

That tool has grown extensively since it was developed. I think one of the best things that we provided recently is the .asf.yaml system, which allows projects to essentially set up 90% of their project metadata in GitHub. It lets them set labels. It lets them set notifications. It lets them set all kinds of things, all self-service. So, it's taken a huge load off of Infra in terms of responding to tickets, and also put a lot of that control in the hands of the projects. That's been incredibly well received. It's definitely, I think, one of the best things that we've done for projects in a while. I think it's a fantastic tool.
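To make the self-service idea concrete, here is a small Python sketch of how settings declared by a project could be applied by tooling instead of by a ticket. It is an assumption-laden illustration, not the .asf.yaml implementation or its schema: the section names, the handler functions, the `repo` dictionary, and the example address are all invented for the sketch.

```python
# Hypothetical handlers: section name -> function that applies that section.
def set_labels(repo, labels):
    repo["labels"] = sorted(set(labels))      # de-duplicate and order


def set_notifications(repo, targets):
    repo["notifications"] = dict(targets)


HANDLERS = {"labels": set_labels, "notifications": set_notifications}


def apply_settings(repo, settings):
    """Apply recognised sections; return the names of rejected sections."""
    rejected = []
    for section, value in settings.items():
        handler = HANDLERS.get(section)
        if handler is None:
            rejected.append(section)          # not self-service in this sketch
        else:
            handler(repo, value)
    return rejected


# Settings as a project might declare them in a file in its own repository.
settings = {
    "labels": ["mail", "archive", "mail"],
    "notifications": {"commits": "commits@example.invalid"},
    "firewall": {"open": 22},
}

repo = {}
rejected = apply_settings(repo, settings)
print(repo)
print(rejected)
```

Only sections with a registered handler are applied, so projects get control over a defined set of settings while anything outside that set still has to go through the Infra team.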

That's great. Now, it's also a new way of institutionalizing, so to speak, "scratch your own itch", but in a way that's a common deployment. You can do your own thing, but there's a common mechanism or method of doing it, because before (it was like the Wild West, back in the '90s) everyone's just doing their own thing. It didn't really matter, but it wouldn't scale properly: you guys can’t really support them because everyone's doing something and it was a one-off. It's interesting to see that selfserve.apache.org has standardized or unified that process.

Yeah, and one of the things that I really like about it too is because we have so many different projects --they're so varied-- the people that work on them are so varied in their skill sets and their desires and their interest level and their skill level and all this. What we want to be able to do is empower projects to use the tooling and take advantage of the skill sets that they have available. So, we don't want to arbitrarily enforce, "Oh, you must use this particular technology," but we also don't want random technologies to proliferate, like you alluded to, the Wild West. So, it's a very refined balance between, "How do you allow projects to do their own thing in a way that's scalable and supportable?" That's a complex task. It's difficult to manage. I think Self-Serve (selfserve.apache.org) goes a long way to support that.

Speaking of Self-Serve and other solutions the team is providing, the strategic process of figuring out where to go —direction— I know you have David (Nalley), I know you have Greg (ASF Infrastructure Administrator Greg Stein). Does the entire team participate in this? How does this work: is it top-down, or is it bottom-up? Are you guys saying, "Hey, there's a new thing that we should do"? I presume you don't have an annual strategy, but rather an ongoing rolling process; how do strategic decisions get made?

It's a collaborative effort, for sure. I think we do have an annual —when we get together at ApacheCon, we do tend to have a lot of discussions about strategy and about future direction. That's one of the things that we try to do as Infra with our team meetups, and with ApacheCon as well, to get together in person in a room and talk about where we're going to go, what we want to do. I say the process is collaborative, because sometimes it comes from the direction of Greg, or the Board, or David, or whoever. Sometimes it comes from a staffer saying, "Hey, it'd be cool if we could do this."

Sometimes it comes from Projects, or Committers, and they say, "Hey, can we go in this direction? I think it would be useful for X reasons." It just depends. By and large, the decisions for a future strategy are brought up by whoever thinks of it and are discussed within the team at a peer-to-peer level, right? We have very few situations where Greg or David or somebody will come down and say, "Thou shalt do it this way." Yeah, very uncommon to have that happen. It's a very collaborative environment, which I appreciate and works well for me.

So, in light of the pandemic, you guys didn't have your face-to-face. Did you do a virtual annual meeting? Or did it just not happen?

Well, we have a weekly team meeting. Yeah, we didn't do any virtual thing beyond that.

With so many projects at the ASF now, with 350 projects and initiatives and growing and so few of you in Infra, you must be constantly learning new things. How do you keep abreast of what's new? How do you close your skills gap? How do you stay ahead of everything?

I follow a few mailing lists, discussion boards, Reddit, and other similar sources. I typically learn new things when I need to implement new technology to solve a problem. "How do we provide 'X'?" I’ll go research it and learn that way. I also find out about new things from my hobby projects or other work.

...It's not like "I want to take Blah University to become certified in X" or anything like that, right? I mean, you’d do that from your own interest, but it's not something that's required of the job unless it comes up, right?

No, that's never been required of the job. Personally, I'm very much a self-directed learner. If I'm interested in something, I will absolutely seek out the resources to do so. I will say that there's not a lot of time for that stuff, at least not for me. I got a lot going on, right? So, having the time to sit down and take a class or go through that process, I find very difficult. I don't really learn that way very well either. So, class-based learning has never been for me.

...Not linear. Yeah.

Yeah. So, typically, if I want to learn something new —I've been trying to learn Python, because it's definitely a gap for me— I find it incredibly difficult, because it's very hard for me to sit down and watch a video on programming, right? I got to have a reason. I got to have a thing to do. I need to have a project that requires it. And then I go and I figure it out.

...Got it. So, it's purpose-driven education. You need an end result.

Yeah, exactly. That's how I've always operated.

[END OF PART I]