2015 was a rough year. Why it was so rough deserves its own post, but, for now, I was extremely happy to have eleven days of “vacation” on the calendar (despite not being completely disconnected from work things on all of those days).

The majority of the time was spent in Pennsylvania at my parents’ house. I drove up for the normal Christmas festivities and planned to stay until my sister’s 30th birthday on New Year’s Day. After high school, I really didn’t keep in touch with more than a few people from the area (and this number is shrinking all the time). As such, I saw a big chunk of time in which I knew I wouldn’t have anything to do. To me, more than a few days of “nothing” is more stressful than it is relaxing, so I made a plan to try to catch up on lots of open source work that I had been ignoring.

I often don’t talk about the open source work I do (aside from a tweet), but since I was devoting a big chunk of time (about 3 days) to it, I thought it would be good to actually talk about what I worked on and to keep myself on track while I work.

Disclaimer: I separate my “dayjob” work from my “open source” work. While I primarily write for open source projects at my dayjob, not all of my open source contributions are directly relevant to my current tasks. As such, I don’t view any of this as “work”. This is a hobby and something I enjoy. I call it “work”, but I don’t think of it like my paying “work”.

Apache Accumulo

Apache Accumulo is (probably?) the open source project I’ve been contributing to longest. As such, it’s very easy for me to fly by, picking up lots of fixes very quickly.

Documentation updates

One of the best things you can do for an open source project is to make sure that commonly asked questions are written down in the official documentation. Accumulo has a user manual which is the official reference material for the project. A user had recently asked on one of the project’s mailing lists how to run multiple TabletServers (Accumulo’s per-node process) on a single host. I closed ACCUMULO-4072 after writing a new section in the user manual which covers the considerations in running multiple TabletServers on one host, the configuration changes required, and the steps to start and stop the other processes.

Release preparation

Another important thing for an open source software project to focus on is making new releases. Accumulo currently has 3 release lines that we port changes to: 1.6, 1.7 and 1.8. 1.6 and 1.7 are our maintenance release lines, while 1.8 is new development work. We use JIRA to manage our changes, but it still takes some effort to decide what needs to be fixed before a release is made. I went through the open issues for the next releases on 1.6 and 1.7 (1.6.5 and 1.7.1, respectively) and triaged which issues actually should be completed and which issues should be pushed to the next release.

Low-hanging fixes

As I mentioned earlier, it’s very easy for me to come along and pick up lots of little fixes to Accumulo in one swoop. From my issue triaging, I found 8 issues that I’d easily be able to knock out.

  • ACCUMULO-4081 A simple bugfix backport for a concurrency performance issue.
  • ACCUMULO-4082 A simple bugfix backport for another concurrency performance issue.
  • ACCUMULO-3254 Javadoc improvements to Accumulo table properties.
  • ACCUMULO-4036 Removed verbose/unnecessary logging.
  • ACCUMULO-4064 Include version information on startup.
  • ACCUMULO-4094 Documentation on error handling in the Accumulo BatchWriter.
  • ACCUMULO-3274 Avoid some excessive toString()’ing.
  • ACCUMULO-4056 Update a dependency to avoid shipping a vulnerability.

Apache Yetus

Apache Yetus, in their own words, is a collection of libraries and tools that enable contribution and release processes for software projects. In other words, they make it super easy to automate the testing that runs over contributions from new developers to a project.

In YETUS-263, I contributed a patch which includes a personality for Apache Accumulo. A Yetus personality defines the tasks that should be run over some set of changes. This lets us define things like:

  • Code style verification
  • The automated tests to run
  • Other static analysis tools (e.g. findbugs)

Hopefully, this personality will help Accumulo get to the point where we can easily wire up automated contribution testing, which should lessen the amount of effort the developers need to exert to apply user contributions.

Web hosting

I own the (virtual) machine which this blog runs on. I usually enjoy running my own server, as it lets me explore and learn a bunch of new things along the way. Of course, this often results in me finding things broken for months on end.

Jenkins init.d script

One of the big reasons I run my own machine is that I can set it up to do automated builds for open source projects I regularly contribute to. This helps give back to the community in some cases (more tests being run more frequently) and can also offload my own necessary tests from my work machine.

I find that my Jenkins instance likes to die for some reason every now and again. Sadly, the stdout and stderr for the Java process weren’t being redirected to files, which means that I lost the reason why Jenkins crashed. Redirecting this output should help me tweak the process in the future to prevent it from dying again.
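To make the idea concrete, here is a minimal sketch of launching a process with its stdout and stderr appended to log files. It is written in Python rather than as the init.d shell script itself, and the function name, the log file names and the example log directory are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def launch_with_logs(command, log_dir):
    """Start `command` with stdout and stderr appended to files in `log_dir`.

    The file names "service.out" and "service.err" are hypothetical.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Append mode, so output from before a crash survives the next restart.
    with open(log_dir / "service.out", "ab") as out, \
            open(log_dir / "service.err", "ab") as err:
        # The child process gets its own copies of these file descriptors,
        # so closing ours when the `with` block exits is safe.
        return subprocess.Popen(command, stdout=out, stderr=err)


# Hypothetical usage (the log path is an assumption):
# launch_with_logs(["java", "-jar", "jenkins.war"], "/var/log/jenkins")
```

With something like this in place, whatever the Java process prints just before dying ends up in a file instead of vanishing with the terminal.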

Monit updates

In looking at Jenkins being dead, I also had to wonder why my Monit instance didn’t alert me or restart it on its own. Somehow, the alert was turned off. So, that was an easy fix through Monit’s HTTP interface.

Apache Slider

Apache Slider is a YARN application designed to make running applications on YARN a bit easier. Slider has the notion of “app-packages” which define how some other application should be run on YARN, the configuration properties exposed, and lots of other features. Some app-packages already provided include Apache Accumulo, Apache HBase and Apache Storm.

Apache Tomcat App-Package

In March 2015, I started work on an app-package for Apache Tomcat. Tomcat lends itself well to Slider’s model because HTTP applications tend to follow the REST model in which instances of Tomcat can be dynamically managed instead of statically managed. By December 2015, I needed to get this code finished and committed.

One big feature I needed to add was the ability for users to specify WAR file(s) when creating a Slider application using the Tomcat app-package. This lets Slider ship a single Tomcat app-package, and users can define what web applications to run in their Slider app via configuration only. Thankfully, having HDFS as a storage mechanism and YARN’s resource localization support made this extremely simple. Users add a new configuration property which specifies an HDFS URI to a file, Slider tells YARN to localize that HDFS file to local disk when it creates the YARN container, and then Slider makes sure the WAR file is included in the local Tomcat installation.
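The last step of that flow can be sketched roughly as follows. This is not the code from the app-package: the property name, the function names and the assumption that YARN localizes the file into the container directory under its original file name are all made up for illustration.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Hypothetical property name; the real app-package may use a different key.
WAR_PROPERTY = "example.war.uri"


def war_name_from_uri(uri):
    """Return the WAR file name from an hdfs:// URI, rejecting anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError("expected an hdfs:// URI, got: " + uri)
    name = PurePosixPath(parsed.path).name
    if not name.endswith(".war"):
        raise ValueError("expected a .war file, got: " + uri)
    return name


def install_localized_war(config, container_dir, tomcat_home):
    """Copy the already-localized WAR into Tomcat's webapps directory.

    Assumes YARN has placed the file in `container_dir` under the same
    file name it had in HDFS.
    """
    name = war_name_from_uri(config[WAR_PROPERTY])
    localized = Path(container_dir) / name
    webapps = Path(tomcat_home) / "webapps"
    webapps.mkdir(parents=True, exist_ok=True)
    target = webapps / name
    shutil.copy2(localized, target)
    return target
```

The point of the design is that nothing about the web application is baked into the app-package: the only user-supplied input is one URI in the configuration.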

Along the way, I also found a few other issues with Slider that bothered me, mostly related to the Slider web-UI.

  • SLIDER-1040 General HTML/CSS formatting issues
  • SLIDER-1041 Application exports aren’t included in export list endpoint

I’m really excited that I finally was able to push this feature in, as it was something I did solely on my own time (even though my initial Slider interactions were dayjob related). Tomcat on YARN was something that was commonly asked for by users, but we only ever had a “sorry, we’re working on it” answer to give. But now that’s all changed!

One missing piece in dynamically deploying web applications on YARN is how clients find and use these HTTP servers. While this is something we could solve at the Slider “level”, it makes more sense to work on this at the YARN “level” instead. It’s a bit of a copout, but it did let me commit the code I have today while (hopefully) contributing later as the necessary improvements are made in YARN. Look for this new feature in Apache Slider 0.91!

After it’s all done

I’m really glad I forced myself to write this all down. It’s too easy to trivialize the work you do on your own time. This creates a cycle of negativity for me where I feel bad that I don’t work on my side projects, but I get discouraged from working on them because I don’t feel like I am making “enough” progress. In the end, this was helpful for me and that’s great. I hope others can also benefit from this list, serving as some motivation to write some more code too.

Final Winter Break 2015 Tally:

  • 3 different open source projects.
  • Contributions spanned bug fixes, new features and documentation.
  • Work included “sysadmin” and development work.