Your average piece of software can include hundreds or even thousands of files. I mean, think of all the graphics files, templates, and fonts that are installed onto your PC when you install something as mundane as a word processing application.


And when you think about how many different software apps are out there on the market today, or even simply all the ones that are no longer with us, trying to keep a record of all of them would be a near-impossible task.

But computer scientists at the National Institute of Standards and Technology (NIST) are doing just that. For more than 15 years they have been working on the impossible task of maintaining an up-to-date and accurate archive of the world’s software. It is a task that they will never complete, because the amount of software in circulation keeps growing. But they have succeeded in creating the largest publicly known collection of its kind in the world.

It is called the National Software Reference Library (NSRL), and the collection is about to get a whole lot larger. On December 15, 2016, the NSRL expanded to include its first batch of 23,000 mobile apps for Android and iOS, while another 200,000 will be added over the course of 2017. The NSRL so far contains some 50 million software programs, ranging from word processing applications to old Atari games and potential malware.

But the NSRL is more than just a database for posterity. It’s also a critical tool used in law enforcement and national security investigations. Every file in the NSRL is run through a computational procedure that generates a unique digital fingerprint for that file, expressed as a string of 40 letters and numbers. NIST publishes those fingerprints in a Reference Data Set (RDS) that is updated quarterly and freely available to the public.
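
To make the fingerprinting step concrete, here is a minimal sketch in Python. The article does not name the algorithm; a 40-character string of letters and numbers is consistent with a SHA-1 digest written in hexadecimal, so SHA-1 is assumed here. The function name `file_fingerprint` and the chunk size are illustrative choices, not part of any NIST tool.

```python
import hashlib


def file_fingerprint(path, chunk_size=65536):
    """Return a file's SHA-1 digest as 40 hexadecimal characters.

    Assumption: SHA-1 is used because its hex digest is 40 characters long,
    which matches the fingerprint length described in the article.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Upper-case is a presentation choice only; the value is the same.
    return digest.hexdigest().upper()
```

Two files with identical contents produce the same fingerprint regardless of their names or locations, which is what makes a published list of fingerprints useful for recognizing known software.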

“Our goal is to help investigators, so we prioritize the software they are most likely to encounter in the field,” said Doug White, the NIST computer scientist heading up NSRL. “We also focus on what we consider dual-use software – things that can be used for good or bad, including keystroke loggers and network scanners.”

Forensics investigators don’t just use the database to filter out irrelevant files either. White went on to say that the FBI recently enlisted the help of NIST to assist its investigation into the disappearance of Malaysia Airlines flight MH370 to figure out which flight path it may have taken. The FBI wanted a hash of files associated with every flight simulator program it had. White helped by adding more than 120,000 flight map-related files, and making them available to the FBI.
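
The filtering use mentioned above can be sketched roughly as follows: hash every file in a folder and set aside the ones whose fingerprints already appear in a list of known software. This is only an illustration. SHA-1 is an assumption based on the 40-character fingerprint length, and `known_hashes` is a hypothetical stand-in for fingerprints loaded from a reference list; the layout of the actual RDS files is not shown here.

```python
import hashlib
from pathlib import Path


def sha1_of(path):
    """Return the upper-case SHA-1 hex digest of a file (assumed algorithm)."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest().upper()


def unknown_files(root, known_hashes):
    """Return the files under root whose fingerprint is not in known_hashes.

    known_hashes is a hypothetical set of upper-case hex strings standing in
    for a published reference list of known software fingerprints.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and sha1_of(p) not in known_hashes
    )
```

Whatever remains after this pass is the material an investigator would still need to look at by hand.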

While much of the software in the NSRL is donated by the companies that make it, another segment of the database is composed of free software, in many cases the free trial versions of programs that companies often distribute. After that, White decides which titles to purchase with limited funds.

