In the wake of the Give Local America Day calamity, which saw the entire program grind to a halt at mid-day and left thousands of donors and nonprofits stranded, there is no shortage of commentary. Details surrounding why the web-based payment platform collapsed are scant, and Kimbia, the online fundraising and crowdfunding platform at the center of the storm, continues to attract a considerable amount of ire.
As a technology provider that specializes in performing arts organizations, I regard a worst-case scenario like this as a constant concern. Many of the same considerations surrounding the Give Local America campaign are a routine part of my client work; as such, I wanted to add a bit of insider perspective to the chorus of non-technical discussion surrounding this incident.
Although Kimbia has yet to release a detailed post-mortem, they have provided enough information to get a good idea of what happened. Keep in mind, the views presented here are not in any way definitive, but they should help provide a better understanding of how these issues are connected.
Per Kimbia’s press release, here are the three major points from their technical overview of the service shutdown:
- Kimbia: Removed the affected hardware from service.
- In English: One or more components in the data center housing the machines responsible for handling traffic failed. It is unknown whether this was simply bad timing or the result of something triggered by a software-based action.
- Kimbia: Reduced leaderboard functionality and focused solely on the ability to serve donation forms.
- In English: The leaderboard was likely placing a larger-than-anticipated strain on server resources, which could have contributed to the hardware failure. In turn, disabling that functionality would reduce server load and free up those resources for critical payment processing (which is what they mean by serving up donation forms). Why the leaderboard caused so much strain is, as of yet, unknown.
- Kimbia: Implemented measures to reduce other potential risks.
- In English: In all likelihood, they threw a combination of software- and hardware-related solutions at the problem. Likely suspects include disabling functions that contributed to overall server load but weren’t necessary for payment processing, and moving the platform to servers capable of handling the increased load.
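Degrading non-essential features while protecting the critical payment path, as described above, is commonly implemented with feature flags tied to a load threshold. Here is a minimal sketch of that pattern in Python; every name in it (the flag store, the threshold, the page sections) is hypothetical and not drawn from Kimbia's actual platform:

```python
# Sketch: shedding non-critical features under load. All names are
# hypothetical; this illustrates the pattern, not Kimbia's implementation.

FEATURE_FLAGS = {
    "leaderboard": True,    # expensive aggregate queries, nice-to-have
    "donation_form": True,  # critical path: must stay up no matter what
}

def shed_noncritical_features(cpu_load: float, threshold: float = 0.85) -> None:
    """Disable expensive, non-essential features when server load spikes."""
    if cpu_load >= threshold:
        FEATURE_FLAGS["leaderboard"] = False  # stop serving the leaderboard

def render_page(cpu_load: float) -> list:
    """Return the page sections to serve, given current server load."""
    shed_noncritical_features(cpu_load)
    sections = ["donation_form"]  # always serve the donation form
    if FEATURE_FLAGS["leaderboard"]:
        sections.append("leaderboard")
    return sections
```

Under normal load both sections render; once load crosses the threshold, only the donation form does, which is effectively what Kimbia describes doing by hand.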
I am particularly anxious to see whether Kimbia will release any technical details surrounding the issues, especially those related to server load and Apache requests. Moreover, it will be fascinating to learn whether any new site functionality introduced for the 2016 campaign, tied to donation processing but not included in typical load testing, may have produced higher-than-anticipated server loads.
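The kind of load testing mentioned above boils down to firing many concurrent requests and watching tail latency. Here is a minimal, self-contained sketch using only the Python standard library; `handle_donation` is a hypothetical stand-in for a real endpoint (a real test would hit the live donation form with a tool built for the job):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_donation(_: int) -> float:
    """Hypothetical stand-in for a donation-form request; returns latency."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulate ~10 ms of server-side processing
    return time.perf_counter() - start

def load_test(concurrency: int, total_requests: int) -> dict:
    """Fire `total_requests` simulated requests across `concurrency` workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(handle_donation, range(total_requests)))
    return {
        "requests": len(latencies),
        "p95_seconds": latencies[int(0.95 * len(latencies)) - 1],
    }

stats = load_test(concurrency=20, total_requests=100)
```

The point of tracking the 95th percentile rather than the average is that giving-day spikes are exactly the moments when the slowest requests pile up and starve everything else.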
It Isn’t Likely A Problem With Scaling, Capacity, Or Nonprofits Not Being Tall Enough To Ride The Giving Day Coaster
The part you need to wrap your head around is that even though capacity (i.e., web traffic) likely triggered the failure, it wasn’t the cause.
As such, much of the online discussion I’ve come across on this topic, including a thoughtful post on 5/8/16 from Beth Kanter, tends to head down rabbit holes that may inadvertently lead away from cause-centric solutions.
Simply put, the problems experienced during the May 5th episode were probably easier to avoid than not. My hunch is that the trouble stems not from a lack of necessary hardware or inadequate load testing, but from inefficient platform design.
And when compared to something like the ongoing discussions about e-commerce scaling, this is a much simpler nut to crack.
Time, and Kimbia’s transparency, will tell.
But until we cross that threshold, the notion that a large, nationwide giving event is by its very nature prohibitive or out of reach for the nonprofit sector is ultimately self-defeating. Likewise, searching for alternatives to a shortcoming with this many unknowns may be therapeutic, but it could inadvertently do more harm than good if taken too far.
How Did You Handle The Shutdown?
Many groups, including most of my clients participating in the event, did a good job of leveraging social media and direct email marketing to redirect donors to their existing donation platforms. And yes, I will be the first to admit feeling a twinge of anxiety-driven tightness in my chest, but thankfully, everything operated as expected.
Did your organization participate in Give Local America Day? If so, what did you do as a result of the shutdown (other than grow a few extra gray hairs)?
What do you think a cause-centric solution might be?
Hi Beth, that’s where I’m in the “work in progress” camp. Assuming Kimbia releases enough technical details about the shutdown, those solutions should be fairly straightforward.
My go-to hypothesis at this point still focuses on the “inefficient platform design” direction (over-designed/under-engineered). If that ends up being the case, the good news is it removes the hardware capacity variables related to scaling up for spike-level traffic from the list of concerns. Consequently, the focus would shift toward a tighter development process geared toward keeping the processes that contribute to aggregated server load under control.