By Christopher Cicchitelli

As a fervent Second Amendment supporter and the founder of a technology company, I’ve followed the ATF eForms story with great personal interest. It showed great promise at first, cutting wait times down to under 60 days…and then it imploded. Exactly what happened is a question we’ve all asked, and until reading the ATF’s announcement yesterday I didn’t really have a good idea. However, in that announcement I believe the ATF tipped us off to the key that unravels the mystery . . .

First, some full disclosure: prior to starting my company, CastleOS, I worked as a contractor to a government R&D lab where, among other responsibilities, I designed integrated systems and managed servers for multiple civilian and military agencies.

Now for some key facts we know: despite the ATF’s initial claims otherwise, the eForms system was unusable even when batch processing was not taking place. Even if we presume Silencer Shop initiated batch processes outside the 4-5 AM window they claim, I find it hard to believe they did it all day, every day.

In addition, in today’s ATF announcement, they claimed there are “memory allocation errors” within the system. In order to keep it usable for all, it needs to be rebooted multiple times a day, taking up to an hour for each reboot. That’s key, and I’ll come back to it later.

In addition to these facts, there are several questions we are all asking. How was batch processing ongoing if the system wasn’t built for it? Why does the system need to be rebooted every few hours (this is 2014 after all!)? Whose fault is it really?

After doing some research, I think I have narrowed down the realm of possibilities for the answers to those questions, so here goes. One of the first things I noticed when visiting the eForms website (well, after I got the website to load – it took a few minutes) was the technology it’s built upon: JavaServer Pages. To say JSP isn’t exactly the most popular platform these days would be an understatement. While still around for legacy reasons, you don’t often see it used for new projects without a specific reason – a reason I don’t see the ATF having. From my own experience as a contractor, I wonder if this was a matter of the contracting company having available labor in this specialty, and thus pushing the platform on the ATF. If so, the ATF contracting officer would certainly deserve some blame for this as well.

The other immediate observation was how slowly the website loaded. It repeatedly took about five minutes to reach the login screen at around 7 PM EST. That points to a faulty system architecture. A proper design would separate all tiers, including at the server level, into three categories: user interface, data broker, and data warehouse.

There’s no doubt that a heavy amount of data processing is going on in this system, and it’s slowing the whole thing down – so there isn’t a proper separation of tiers in this design. The fact that the data warehousing can slow down the most basic parts of the user interface – the website itself and the login page – is a major failure. So much so that, for the millions of dollars spent on this website, I’d argue the government would have recourse to recoup (at least some of) the money it laid out.
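To make the point concrete, here is a minimal, hypothetical sketch of what that separation looks like in code (none of these names come from the actual eForms system; it’s an illustration only): the web/UI tier depends only on a thin broker interface, and only the broker ever touches the warehouse, so a slow warehouse can’t drag the login page down with it.

    // Hypothetical sketch of tier separation – illustration only, not eForms code.
    public class TierSketch {

        // Data-broker tier: the thin, fast contract the UI depends on.
        interface FormBroker {
            String submit(String formXml);   // hands the work off quickly, returns a receipt id
            String status(String receiptId); // cheap lookup for the UI to poll
        }

        // Data-warehouse tier: where the slow, heavy processing lives.
        interface Warehouse {
            void store(String receiptId, String formXml);
        }

        // UI tier: it never sees the Warehouse type at all. A slow warehouse can
        // make status() sluggish, but it can't take the login page down with it.
        static String handleSubmitRequest(FormBroker broker, String formXml) {
            return broker.submit(formXml);
        }
    }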

Next is the question of how batch processing was happening if the system wasn’t designed for it. The ATF has continually sounded the horn about batch processing, but the truth is it’s misdirection. Whether the eForms system has a batch option or whether Silencer Shop is using a custom method — possibly something as simple as a macro — to automate the process is 100% irrelevant. The reason is that, for all intents and purposes, a batch process just simulates the effect of multiple stores logging in at the same time. What we are seeing is a system that goes down under very little load, possibly just a few dozen simultaneous users, or one user uploading a few dozen forms a night, and I think the reason lies in the ATF’s “memory allocation error”.

Traditionally, a memory allocation error is when data is corrupted or misplaced in actual memory. I don’t think the ATF has an actual error with the server memory itself – using JavaServer Pages should prevent that – but rather that the phrase is PR spin for a more complicated resource allocation error.

What I think is happening is as forms are entered into the system, they enter a bureaucratically-inflated workflow. In that workflow, the system is probably generating loads of actual documents in addition to lots of database entries and so forth. (After all, do you really expect the ATF not to keep a backup paper trail? Pfff.) In other words, each application isn’t as simple as the typical web form we are used to using, and it does require some real horsepower to process it from start to finish.

Now normally that’s not an issue: each server has its maximum number of users, and you add servers as needed. The government even has a gov-only cloud it can use to deploy servers on demand – so keeping up with demand shouldn’t require more than a few minutes to boot up a new instance. However, with the eForms system the opposite seems to be true, and they can never bring enough servers online to keep up with demand (they tried at the beginning). I believe the reason is that load isn’t the issue, batch or otherwise, but a fundamental flaw in the design of the system: applications are gobbling up resources as they are processed, and then, when complete, not releasing those resources.

The proof of that is the reboots every few hours to clear the “memory allocation errors” – if the code isn’t releasing the resource, you need to reboot. (Why it takes an hour isn’t clear – but that may have more to do with their attempts to jury-rig fixes into the system, and/or having to bring systems online in a certain order.)
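To make that failure mode concrete, here is a purely hypothetical sketch of the kind of leak that would produce exactly these symptoms (fine right after a reboot, dead a few hours later). This is my guess at the shape of the bug, not the actual eForms code: completed applications get parked in an application-scoped collection and nothing ever evicts them.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch only – not the actual eForms implementation.
    public class ApplicationProcessor {

        // Application-scoped cache that only ever grows.
        private static final Map<String, byte[]> inFlight = new ConcurrentHashMap<>();

        public void process(String applicationId, byte[] generatedDocuments) {
            // Heavy work: render documents, write database rows, and so on.
            inFlight.put(applicationId, generatedDocuments);

            // ... workflow runs to completion ...

            // BUG: nothing ever calls inFlight.remove(applicationId), so every
            // "completed" application keeps its documents pinned in memory.
            // The only thing that clears it is restarting the server.
        }
    }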

I also believe that in the beginning, before too many people were using the eForms system, this same flaw was present, but was being covered up by nightly reboots. Once the load increased, the problem grew exponentially, and now the system can only run for roughly 4 hours until all the servers run out of resources.

The saddest thing about this for me is not that the system failed – I’ve seen that far too many times before in government. It’s that so many millions were spent to bring a system online with such a fundamental flaw. I’d give the Obamacare website team a break long before I’d cut the ATF any slack for this fiasco.


55 COMMENTS

  1. Harry Reid claims that the Koch brothers are to blame, but Joe Biden says it was global warming that effed-up the BATFE’s computers. Personally, I think it’s just government incompetence at work. But I could be wrong.

    • The government is notoriously bad at technology. They pick a vendor and cross their fingers most of the time. And they overpay like crazy (even more than usual) for outdated technology.

  2. So then, what the ATF needs to do is to load their software onto the NSA servers. That would be good for all of us…..

    • If they did that, all you would have to do is call your lgs and tell them you were interested in a silencer. The stamp would arrive the next day and the $200 would already be debited from your account.

    • The incompetence of the system is the shield and the fog used by the few in real power to reach their actual ends.

  3. There are many gun owners who are also software developers — we should create a system and donate it to the ATF. It would be to our benefit as well as theirs.

    • Why would we want to help an unconstitutional agency perform unconstitutional acts on patriots?

      What they need to do is stop the uploading forms nonsense, and allow people to simply fill out the forms online. They could streamline it if they really wanted to, and if they hired a half-way competent contractor to do the work, it wouldn’t cost that much either. Three experienced .Net coders could whip a site like that out in a month or so.

      If the site is so busy that they can’t process stuff, they could buffer data and have a background task update the database.
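      Language aside, the buffering idea might look roughly like this (a rough, hypothetical sketch – all the names are made up): the web request just drops the form on a queue and returns, and a single background worker drains the queue into the database at whatever pace the database can handle.

      import java.util.concurrent.BlockingQueue;
      import java.util.concurrent.LinkedBlockingQueue;

      // Hypothetical sketch of the buffer-and-background-task idea.
      public class FormBuffer {
          private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

          // Called from the web tier: cheap, never blocks on the database.
          public void accept(String formPayload) {
              pending.add(formPayload);
          }

          // One background thread drains the buffer into the database.
          public void startWorker() {
              Thread worker = new Thread(() -> {
                  while (true) {
                      try {
                          String form = pending.take(); // waits for the next form
                          writeToDatabase(form);        // the slow part
                      } catch (InterruptedException e) {
                          Thread.currentThread().interrupt();
                          return;
                      }
                  }
              });
              worker.setDaemon(true);
              worker.start();
          }

          private void writeToDatabase(String form) {
              // placeholder for the actual persistence call
          }
      }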

    • Honestly, there are some weird laws about that which prohibit free work. I’m not a lawyer, so check out the CFR or talk to one. It gets complex.

      • This is correct, the government cannot accept free work. But that doesn’t stop one from charging a fair price and skipping the 1000% markup usually attached to these kinds of projects 🙂

        • Gotta pay for that cube and health care premium. And the blended rates need to be reasonable market rates, too. You can’t undercut competition too obviously otherwise you run into gifting, bribery, or other issues. Part of the scoring is based upon cost.

  4. I’m still trying to wrap my head around why the NFA has not been modernized to just use the NICS system and charge the tax at the point of sale… You’d think with the demand for money that our government has, the millions that would flow in from the extra tax revenue would be a motivator.

    • I’m still trying to wrap my head around the NFA in the first place.

      So we start with Prohibition, which produces a new class of criminal: the bootlegger. This class of criminal likes to use SBRs, SBSs, and full-auto weapons. Iconic weapons such as Thompsons, sawed-off shotguns, and the BAR come to mind.

      I still have no idea why suppressors, which are safety equipment, are in the purview of the NFA.

      Another problem is that bootleggers routinely soup up their cars to outrun the police.

      So the police can’t stop them from producing alcohol OR from transporting it, but they sure can crack down on the guns they like to use!

      The NFA is a government solution to a problem that was created by the government and has yet to be repealed. I say we work on that.

      • Technically the NFA was passed after Prohibition ended. Prohibition ended with the passage of the 21st Amendment on Dec. 5, 1933. The NFA was passed on June 26, 1934. A cursory search on Google doesn’t give me a date on when it was introduced in Congress. However, it was a direct result of Prohibition-era gangsters. I find it ironic that the government passes a law (amendment) that ends up creating a large black market of people that are violent, then repeals the law (amendment) and passes a different law under the pretense of combating gangsters that they themselves created in the first place.

        In my personal opinion, SBRs, SBSs, and suppressors should absolutely be removed from the NFA. I am conflicted about MG’s. I can at least see an argument for keeping them in that category (although the ’86 law should definitely go away). On one hand, there’s personal freedoms and the fact that the government is incapable of enforcing its laws very effectively. Plus, there’s the fact that you’re effectively banning an item rather than outlawing an action. On the other hand, MG’s are the only category under the NFA (with the possible exception of AOW, which is too vague for me to really comment on) that one could argue makes a firearm more dangerous. If the argument in congress actually came up, I would undoubtedly fall on the side of personal liberty (if nothing else, because the ATF is woefully incompetent), but I can at least see the opposing argument having some level of merit, even if I disagree with it.

        End rant.

        • You quote the EXCUSE used for the NFA. The REASON was that here in the middle of the Great Depression, the repeal of Prohibition was going to turn a whole slew of Federal agents out of work, so an excuse had to be found to keep them employed.

          And I agree, we should remember to always, always mention that we want it frickin’ REPEALED!

      • Don’t forget, Dillinger, Machine Gun Kelly, Clyde Barrow, Capone’s bunch etc. got their stuff the old fashioned way: they stole them from armories and police.

  5. I have worked with all sorts of government accounting software. None of it works correctly either. When ObamaCare came out, I predicted it would be screwed up. Contractors come in and make all kinds of promises. The truth is that the government is way too big to collectively put into one system. It is the most mismanaged business in the world.

    • Whoa, you predicted that the Obamacare website would be screwed up? Check out Nostradamus over here!

      Seriously, though, I have yet to see *any* government IT system that works. It’s always outdated (if not straight-up obsolete) technology (i.e. deploying a new JSP application in 2013) slapped together by incompetent contracting firms (if they were any good at their jobs, they wouldn’t be bottom-feeding off government contracts, they’d be making plenty of money in the private market) to meet a specification that is, impossibly, both too specific and too vague at the same time. And it always costs 3-10 times the quoted price. That part is understandable, since the six guys who are still programming in COBOL and Ada don’t come cheap.

  6. I agree, messing up memory allocation/deallocation in Java is about impossible (yay garbage collection!). So your guess is: create form -> create a bunch of temporary documents on the server drive -> never delete the temp documents -> overflow /tmp or /var (or heck, just plain old /) -> reboot fixes everything?

    So what is your opinion on PHP? (It fails too quietly for my liking.)

    • That just about sums it up, and it’s probably not just forms and documents, but even temporary memory objects. I’ve seen entire projects launch without any testing at all; it wouldn’t surprise me if that’s what happened here. (A quick sketch of the temp-document failure mode is at the end of this comment.)

      I’m not a fan of PHP for corporate projects (with some exceptions). I’m admittedly partial to it, but I believe corporate IT systems should be written in .NET, and if they’re not, there should be a damn good reason why.

      Also, the federal government is technically a “Microsoft shop”, and has many volume licenses. That said, we worked with all sorts of tech at the lab, but thankfully, never JSP.
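      For what it’s worth, the temp-document version of that failure is about as simple as it gets. A purely hypothetical sketch (not actual eForms code): every submission writes scratch files, nobody ever deletes them, and the disk fills until a reboot-and-cleanup cycle “fixes” it.

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;

      // Hypothetical sketch of a temp-file leak – illustration only.
      public class FormRenderer {
          public Path renderToTempFile(byte[] generatedPdf) throws IOException {
              Path scratch = Files.createTempFile("eforms-", ".pdf");
              Files.write(scratch, generatedPdf);
              // BUG: the file is handed off and never deleted. Without a
              // Files.delete(scratch) in a finally block (or an equivalent cleanup
              // job), /tmp or /var just keeps growing until the box is rebooted
              // and cleaned up by hand.
              return scratch;
          }
      }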

      • Hmm, never used .NET – I am a bit wary of anything strictly Microsoft. (Of course, my only experience in the area was a PHP site for a summer internship. Gah, PHP – I hate dynamic typing and non-explicit variable declaration.) Then again, I dislike web programming in general…..

      • We are encouraged to use open source technologies to avoid vendor lock-in. It’s all Java all the time over here. If you know what you’re doing, you can do some pretty awesome things with it.

        In my experience, there are few technologies that are outright unsuitable for most applications. JSPs by their nature are not a ridiculous choice in a well designed system. There are easier technologies (and harder ones!) but few are outright broken for this kind of application. It’s not like they’re doing anything particularly difficult. Which is more the reason to wonder why they’re doing such a poor job of it.

    • No it’s not. Tax the GC too much and your process spends more time reclaiming than doing useful work. Same thing for not paying attention to growth curves (looping). I’ve seen such horrible things in .NET and Java that it should violate international law, be considered crimes against humanity, and an affront to God. Some developers will have a lot to explain at the pearly gates.

    • I love PHP. Standard middleware application. I am not sure what the tuning possibilities are for it, but I’d think the limiting factor on speed would be the disks and the database retrieve before a small interpreter would be slowing you down.

      OTOH, middleware is a whole thing unto itself, and when you get into large, high-transaction web/database applications a robust, enterprise-level middleware server might be more appropriate.

      • Correct, PHP would just be the presentation layer. It shouldn’t be used beyond that. In fact, a PHP website shouldn’t be talking to a database directly at all, but rather a web service or similar acting as a data broker.

      • Interesting thought about PHP not interacting with the DB directly – might have to think about that. As for the annoyances I have: not strict enough types/declarations, and form processing (yes, let’s have 50,000 variables in $_POST and try to avoid name collisions). Of course, if I used the object-oriented parts rather than ignoring them, I probably would have done better.

      • Ugh. PHP. The language that rivals JS for the poorest language design ever.

        I mean, “07”==7, but “08”!=8 – really?

        • PHP tries to implicitly convert strings on both sides to numbers if possible when doing the comparison – if that succeeds, they are compared as numbers, otherwise as strings. The rules for conversion are the same as the syntax for number literals in the language, which inherits them from C – so a literal beginning with 0 is interpreted as octal. Now 07 is a valid octal number, and so “07” will be successfully converted to a number and then compared as such, yielding true. On the other hand, 08 is not a valid octal number, and so 8 is converted to string “8”, which is then converted against “08” using string comparison, yielding false.

        • They all have their little gotchas, don’t they? 😀

          Most of my web applications are pretty simple so my needs are simple, too.

          Web stuff is just a hobby so I try to use the path of least resistance there.

          My main activities are enterprise DBA/engineering and SAN admin. Operations.

  7. It appears to be built using the exact same platform as the online payroll app for Paychex. The odd look & feel, the user-unfriendly “steps” to progress through, and the general overall process are identical. I think it is an off-the-shelf platform of some type. myapps.paychex.com does actually work reliably though.

    • The similar look and feel is common to Java platform apps. That said, the government does purchase and modify off the shelf systems. But for the millions spent on this, it sounds custom to me.

  8. Betcha 10:1 it’s GC. Easiest to overlook, hard hitting, and the simplest issue. Sad part is perfmon or similar can ID it in a heartbeat. Folks skip the simple stuff and best practices nowadays. Too much reliance on tools doing the thinking.

    Secondarily, there probably are some growth issues, too. Overall the algorithm probably is O(n^y^z) LOL
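    Assuming the backend really is a JVM (it’s JSP, so probably), one crude way to confirm it – purely a sketch, names made up – is to log heap usage on a timer and watch whether it ever comes back down between collections. A healthy app saw-tooths; a leaking one climbs toward the max and stays there.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Hypothetical heap-watcher sketch using the standard JMX memory bean.
    public class HeapWatcher {
        public static void start() {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
                long usedMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
                long maxMb  = memory.getHeapMemoryUsage().getMax() / (1024 * 1024);
                System.out.println("heap used: " + usedMb + " MB of " + maxMb + " MB");
            }, 0, 10, TimeUnit.SECONDS);
        }
    }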

    • Or O(J!B!T!)
      As for the GC, is it that bad? I mean, I am just a student and think that GC is somewhat inefficient, but it shouldn’t be that bad? Honestly, I prefer C’s elegance to Java (but I know the Java libs better, and wish C had namespaces)

      • Garbage Collection (GC) should not take the majority of process time in planned systems. Simple errors can cause a significant increase in memory consumption that creates memory pressure. Consider declaring memory and newing an object in a loop vs declaring memory and then newing to the declared memory.

        for (int i = 0; i < j; i++)
        {
            foo o = new foo();
            // ... do stuff ...
        }

        vs

        foo o;

        for (int i = 0; i < j; i++)
        {
            o = new foo();
            // ... do stuff ...
        }

        (yea, I know modern compilers and optimizers should handle cases like this, but it's a contrived example)

        The first example will allocate memory on an iteration basis while the second will use the same memory space (contrived example, I know, but it's illustrative). Now, consider each example in loops within loops within loops….. memory gets consumed pretty quickly and often. When thresholds are hit, the GC will start trying to figure out what is out of scope, rings get filled by promotion, and GC spends a ton of time doing its job.

        • Doesn’t that still create a new Foo() each time? o is just a pointer to the heap where each Foo is stored….
          public class ttag
          {
              public static void main(String[] args)
              {
                  Object o;
                  for (int i = 0; i < 10; ++i)
                  {
                      o = new Object();
                      System.out.println(o);
                  }
              }
          }

          java.lang.Object@41d5550d
          java.lang.Object@1cc2ea3f
          java.lang.Object@40a0dcd9
          java.lang.Object@1034bb5
          java.lang.Object@7f5f5897
          java.lang.Object@4cb162d5
          java.lang.Object@11cfb549
          java.lang.Object@5b86d4c1
          java.lang.Object@70f9f9d8
          java.lang.Object@2b820dda

          I presume the hashcode of a vanilla object is related to address.

        • In that contrived example, the declaration occurs in the same statement as the new, so more memory (sizeof foo) is consumed rather than using the same memory location. So, if the loop were 100 in the contrived example, we’re talking the difference between 100 * sizeof(foo) vs sizeof(foo) being set 100 times.

          Another example is

          " Hello world ".LTrim().RTrim().Replace("w", "W")

          Which has two intermediate strings vs

          " Hello world ".Trim().Replace("w", "W")

          Which would have one intermediary string.

        • I think that new memory is allocated in both cases….

          Oh and IIRC strings in Java are immutable, so String.trim creates a new string, as does String.replace…… so that’s 3 vs 2. Now, what I used to do was
          String s = "";
          for (...)
          {
              s += expression;
          }
          oops, new String per iteration
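          The usual fix for that one, for what it’s worth, is to build into a single mutable buffer and only make one String at the end. A quick sketch (joinAll and parts are just stand-ins):

          static String joinAll(String[] parts) {
              StringBuilder sb = new StringBuilder();
              for (String part : parts) {
                  sb.append(part);   // no new String per iteration
              }
              return sb.toString();  // one String built at the end
          }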

        • Those are new instances; it’s the reference counting that is impacted and sent through GC. Compiler optimization tries to take care of things, but creating and dangling references will accumulate and force garbage collection. We’re talking rates of creation vs deallocation.

        • Gene, you’re plainly wrong here. “Foo o” in Java is roughly equivalent to “Foo* o” in C++ – it’s just a pointer on the stack, nothing gets allocated there in either language.

          The only thing that does the heap allocation in your example is the “new”, and that one is inside the loop either way, so it’ll allocate the same exact amount of instances.

          Furthermore, stack allocation is done entirely on the entrance to the function, and consists of shifting the stack pointer by the predefined (computed at compile time) amount – it doesn’t matter if your “Foo o” is inside the loop or not, the compiler and the JIT know perfectly well that there’s only one “o” in play at a time, so there will only be one slot on the stack for it (or, more likely, it’ll be enregistered).

          Also, there’s no reference counting in Java. It uses a mark-and-sweep tracing garbage collector.

        • Based upon my understanding of Java, int19h is right (well, he said basically what I said, but still)

      • If you look at amortized time over the runtime of the process, then a typical Java app will actually spend less time allocating/deallocating things than a typical C/C++ app.

        A heap allocation in native code is actually fairly expensive, because of the internal heap structures it has to maintain, which it needs because it cannot randomly reorder blocks (since that’d invalidate the pointers). Deallocation is not all that cheap, either.

        In contrast, in a VM with a GC, allocations are lightning-fast because they usually just increment the single “end of current gen0 heap” pointer. Deallocations are mostly fast on a per-object basis because a gen0 heap can be cleared very easily, being a simple structure (if all new objects are at the end, you can again just decrement a single pointer – this is a typical scenario when a long-running loop allocates a bunch of temp stuff). This is further simplified by the fact that with a VM that knows where every reference is, GC is free to move and reorder blocks in memory – so it can move the long-lived objects away, and keep the short-lived heap nicely defragmented so that it can be dealloc’d in large blocks.

        The price you pay for this is responsiveness. While amortized time is less, it’s batched up in bursts of activity that cause a perceptible slowdown, as thousands of objects are dealloc’d at once. This works great on servers, reasonably well on desktop UI, and meh on mobile devices w/touch where responsiveness is very important for smooth UI. VM designers have been trying to fight this with true background-threaded lock-free GCs, but lock-free can only go so far.

        • For a third solution, I like what Mozilla’s Rust is doing memory-wise…

          As for this
          “If you look at amortized time over the runtime of the process, than a typical Java app will actually spend less time allocating/deallocating things than a typical C/C++ app”

          I might have to try to find a cite for that, not that I think it is impossible… it’s just hard to believe a tracing GC can be faster than malloc/free. I suppose heap fragmentation could be painful, though

        • Think about what malloc has to do to give you an allocated memory block: it has to find a free space on the heap for that block. Because the heap is fragmented, and because advanced allocators usually use different pools for different-sized objects to reduce that fragmentation, it’s not just a simple “grab N bytes at the end here”; for more convoluted cases, it may even need to do a linear search. Plus there’s the cost of updating the data structure that backs the heap, such as e.g. a tree.

          If you think about it on a higher abstraction level, it also sort of makes sense. With a GC, you basically trade memory use (because of non-freed objects hanging around till next GC) for speed. The more memory a GC can waste, the easier it is for it to allocate super-cheap, and the faster it works. So a GC tuned to perform better on average than manual memory management will use several times more memory on average.

          This has some links to the papers with numbers (but note that this is for GCs circa 2005; we have better stuff these days):
          http://stackoverflow.com/questions/755878/any-hard-data-on-gc-vs-explicit-memory-management-performance

        • Awesome, will read.
          As for malloc/free, I would imagine:
          keep a heap/priority queue of unallocated regions by size
          keep a hash table of allocated pointers

          malloc() -> take the largest region; if it is big enough for the data, store it there, add the pointer to the hash along with the memory page etc., and take the unused segment and add it back to the heap O(log n)
          biggest entry too small? Call the OS and get a new page of memory

          free() -> hash the passed pointer, find the neighboring blocks O(1); if they are not in use, remove the neighboring blocks from the heap, splice them together, and reheap O(log n)

          GC? Well, yeah, you could defrag the heap, but you can do the same manually provided the pointers used aren’t the true pointers but point to pointers – so basically a second virtual memory system (which I imagine GC does)

        • GC+VM can move the blocks around without an extra pointer indirection, because it can just walk the object graph and update the moved pointers (since it knows where all objects and their fields are).

          As for your malloc algorithm, it’s a bit too simplistic because it will often inadvertently use large blocks for small allocations, and subsequently a follow-up large allocation that would fit will now have to use a new page. Hence why they usually maintain several different heaps internally, one for “big” objects, one for smaller stuff etc. If you can move objects around, you can also organize them by lifetime (gen0, gen1 etc), which is also helpful since gen0 is likely to be allocated and deallocated in bulk, so you can try to batch that.

          The main perf gain from GC is actually from batching. Basically, your typical C code, a function mallocs on entry, uses that buffer to do whatever it does (say, concat some strings), frees on exit. Every malloc/free pair does all the lookups and heap maintenance separately, even if you call it many thousand times in a loop. OTOH, with a GC, the function allocates (which just advances the “heap top” pointer), and doesn’t free. Then at the end of the loop, the entire allocated memory is freed in one single swipe, with an extra overhead of walking the object graph to determine that this is indeed what should be freed.

    • I don’t think garbage collection is an issue. All GC does is automatically dispose of memory objects you’re no longer using. In an unmanaged language like C or C++, you need to explicitly declare an object into memory, use it, and when complete, dispose of it.

      Garbage collection just allows you to skip the last step, declare a variable, use it, and when done with it, leave it for the garbageman to clean up. Makes coding both easier/quicker and more reliable, at the cost of performance (the performance hit is more to do with interpreting unmanaged code than the actual GC itself).

      In this case, the issue is resources are being declared and then kept in use, so the GC doesn’t even get the chance to clear them out…

  9. With the US government’s fascination with old platforms and systems, I was placing bets on VAX/VMS or Windows NT 3.51.
