IMLS’ Count of Museums in the US May Be Exaggerated

[Image: IMLS Museum Distribution by Type 2014q3]

Looking for museums in your county or state? Want to know how you compare to other museums across the nation? You’ll find them in the Museum Universe Data File recently produced by the Institute of Museum and Library Services (IMLS). It’s a free database of museums in the US that includes names, addresses, contact information, total revenue and expenses, and GIS data.

IMLS held a webinar today to explain the datafile and answer questions. They constructed the list from several sources, including the Internal Revenue Service and the Foundation Center, and then reconciled the records to remove duplicates. Data will be updated every six months based on continuing research and community feedback. IMLS will use the datafile to conduct sample surveys for future research projects (on topics such as collections care) that will inform its programs and be shared with the field and Congress. However, you are encouraged to use it as well.

The big news is that of the 35,144 total museums in the US, most are related to history (here’s AASLH’s perspective). Historical societies, historic sites, and historic preservation organizations comprise the lion’s share at 16,880 (48%), and when combined with the 2,636 (7.5%) history museums, they make up more than half. Before you get too excited, several participants noted that the count may be exaggerated because some entries don’t seem to be museums at all, such as associations, foundations, friends’ groups, and businesses. Indeed, in my region the file includes organizations I wouldn’t consider museums, such as the Athletic Hall of Fame at Einstein High School (a series of plaques on a wall) and the St. Petersburg Brodsky Museum Foundation (the museum is actually in Russia, not the US). IMLS recognizes that there may be errors, but describes the file as a process and a “stake in the ground” about the vitality of the cultural sector. In some cases, it’ll take more research to make distinctions. For example, some friends’ groups operate museums, while others are merely fundraising arms.
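As a quick check of the arithmetic, the short Python snippet below recomputes each share from the counts quoted above. The counts come from the post; the constant and function names are only illustrative.

```python
# Counts quoted for the Museum Universe Data File.
TOTAL_MUSEUMS = 35_144
HISTORICAL_SOCIETIES_ETC = 16_880  # historical societies, historic sites, preservation orgs
HISTORY_MUSEUMS = 2_636


def share(count: int, total: int = TOTAL_MUSEUMS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


history_related = HISTORICAL_SOCIETIES_ETC + HISTORY_MUSEUMS
print(share(HISTORICAL_SOCIETIES_ETC))          # 48.0
print(share(HISTORY_MUSEUMS))                   # 7.5
print(history_related, share(history_related))  # 19516 55.5
```

Together the two history-related groups account for roughly 55.5% of the file, which is where “more than half” comes from.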

Data files in CSV and XLS formats, with documentation, are available online. Patrick Murray-John at Hacking the Humanities has already experimented with the data to create an interactive map using Omeka and has made the data available on GitHub. IMLS is considering an API and other tools, so if you’re interested, see their developer page. IMLS is also developing a searchable interactive map showing museum locations color-coded by discipline, which they hope to release in the coming months. It could be overlaid with other data, such as flood maps, to determine priorities in the event of natural disasters.
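If you download the CSV, a few lines of Python are enough to start answering the “how do we compare” question for your state. This is a minimal sketch: the column names (state, discipline) and the discipline codes in the sample rows are placeholders, not the file’s actual headers or codes, so check the IMLS documentation before pointing it at the real file.

```python
import csv
import io
from collections import Counter


def count_by_discipline(csv_text: str, state: str,
                        state_col: str = "state",
                        discipline_col: str = "discipline") -> Counter:
    """Count museums per discipline code in one state.

    The default column names are placeholders (assumption); look up the
    real headers in the IMLS data documentation.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row[discipline_col]
        for row in reader
        if row[state_col].strip().upper() == state.strip().upper()
    )


# Made-up rows standing in for the real file (hypothetical names and codes).
sample = """name,state,discipline
Example History Museum,MD,HST
Example Historical Society,MD,HSC
Example Art Museum,VA,ART
Example Historic House,MD,HSC
"""

print(count_by_discipline(sample, "md"))
```

For the real download, read the file with `open(path, newline="")` and pass its contents in; comparing per-state counts against the national totals is then a matter of running the same tally without the state filter.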

As you use the datafile, please be aware that it may contain errors, so check it out and send corrections and comments to

14 thoughts on “IMLS’ Count of Museums in the US May Be Exaggerated”

    1. Max van Balgooy Post author

      Hmm. I wonder if it was overlooked because it’s not an independent entity? I can’t recall for sure, but is it owned and operated by the university? If so, that might mean that other college and university museums may have been inadvertently omitted.


  1. thehistorylist

    Thanks, Max, for posting this update. What the IMLS is doing is close to being useful. They definitely need an API, but the most glaring shortcoming is data cleansing and then keeping the dataset up to date going forward. Turn this into a platform and let the community handle it. That’s not (historically) the library science way of doing things, but with 35,000+ entities, it’s the only practical approach. In fact, they (or Patrick or others with some CS skills and tools) should look at venue harmonization with the Foursquare database (


  2. thehistorylist

    An update: After reading through some of the information on methodology, it looks like the IMLS may have done what I had suggested in my comment, but in a far more robust way. Their documentation mentions using Factual, and if they used Factual’s places “cross walk,” they would have had the benefit of the Foursquare data as well as many other sources. At the same time, looking more closely at the data and methodology, you see something like a magnet school in California that has a zoo program; with “Zoo” in its name, that organization gets pulled into the dataset. That again underscores the value of making this a platform that would allow someone to immediately downvote or flag an entry, just as the first person in this comment thread could have gone in and added his institution.


    1. Max van Balgooy Post author

      Thanks, Lee. With your background in IT, you understand the developer side of things much better than I do. But I do concur with your suggestion that providing an easy way for people to flag or question entries will improve the database quickly. IMLS did mention they would like to provide real-time updates to the datafile, but they don’t have the resources to update it more than semi-annually.


  3. John Dichtl

    Fascinating issue and discussion, Max. Thanks for the illumination. By the way, of the 9 “museums” within 5 miles of my zip code (using Patrick Murray-John’s interactive map), I’d say only 3, maybe 4, are really museums.
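A “within 5 miles of my zip code” query like the one John ran comes down to a great-circle distance filter over each record’s coordinates. The Python sketch below uses the haversine formula. It assumes you already have latitude and longitude for each record and for your ZIP code’s centroid; the record fields (name, lat, lon) and the sample points are hypothetical, and this is not necessarily how Patrick Murray-John’s map computes distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(records, center, radius_miles=5.0):
    """Names of records whose (lat, lon) fall within radius_miles of center."""
    lat0, lon0 = center
    return [r["name"] for r in records
            if haversine_miles(lat0, lon0, r["lat"], r["lon"]) <= radius_miles]


# Hypothetical records and ZIP centroid, for illustration only.
records = [
    {"name": "Museum A", "lat": 39.00, "lon": -77.00},
    {"name": "Museum B", "lat": 39.05, "lon": -77.00},  # about 3.5 miles north
    {"name": "Museum C", "lat": 39.10, "lon": -77.00},  # about 6.9 miles north
]
print(within_radius(records, center=(39.00, -77.00)))  # ['Museum A', 'Museum B']
```

The distance filter only tells you what is nearby; deciding which of those entries are “really museums” still takes a human reading of each record.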


  4. Bob Beatty

    Here’s my take… I’m not sure that having organizations listed as “museums” on this list is a bad thing, whether or not they are *true* museums by our definition. Sure, they could be called “Cultural Organizations” or “Institutions that Support Cultural Organizations” (like the examples cited in the post and above), but then we’re parsing definitions and risk confusing stakeholders–in IMLS’s case, Congress.

    IMHO, it’s much easier to work with a definition we all understand (museums) than to try to slice the pie into so many small pieces that it’s uber-difficult to see the forest for the trees.

    Sure, some aren’t true museums by some people’s definitions, but then some true museums, such as Chucalissa, aren’t even in the dataset.

    My guess is when all’s scrubbed out, we’ll be somewhere near the 35,000 number–and if we add city/county historical societies with no collections, I bet the number is still low.

    Either way, I see nothing wrong with what we have now. It certainly gives us a MUCH better baseline from which to operate than we did before. And certainly enhances the argument about the importance of our discipline due to the sheer volume of history museums/entities out there.

    Just my .02. But I know in our field we often spend a tremendous amount of time parsing definitions, words, you name it. And I think we need to be careful not to look this gift horse in the mouth.


  5. Carlos Manjarrez

    I have enjoyed the discussion and would like to add a few comments of my own. I think it is very important for people to see the MUDF file we have produced at IMLS as the start of a process. I have been collecting social and economic data for 20+ years, and it is my opinion that the days of static population estimates are coming to a close. People like the idea of finite estimates. It is what they are used to. But we are in a new age of data gathering, where information flows like a river… and every day that river gets stronger. The radical change in the availability of information shifts many social science assumptions and demands that we retool many of our methodologies.

    IMLS approached the museum data collection with this in mind. Given the dynamism and organizational variability in the museum sector, we knew it would be folly to draw data from one or two sources. So instead, we pulled in data from a wide variety of sources, public and private. This includes administrative data from IRS and IMLS, data from over 1,000 private foundations, and data from one of the most comprehensive data aggregators in operation. Next, we developed a set of standardized data processing procedures to normalize the information we collected, not unlike the process Census uses for the County Business Patterns data collection. And I can tell you that this is no small task. We went through more than 80,000 records to standardize fields, de-duplicate institutions, and impute missing data, again using conservative and replicable procedures (which are detailed in the data documentation).
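The actual standardization and de-duplication procedures are detailed in the IMLS data documentation. The Python sketch below is not those procedures; it is a deliberately simple illustration of the general idea, assuming records are matched on a normalized name plus 5-digit ZIP code and that each merged record remembers every source that reported it. The `Record` class, its fields, and the sample rows are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    zip_code: str
    source: str  # hypothetical source label, e.g. "IRS", "IMLS", "Factual"


def normalize_name(name: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace, drop a leading 'the'."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return re.sub(r"^the ", "", cleaned)


def dedupe(records):
    """Merge records that share a normalized name and 5-digit ZIP code.

    Keeps the first record seen for each key and collects every source
    that reported it.
    """
    merged = {}
    for rec in records:
        key = (normalize_name(rec.name), rec.zip_code[:5])
        if key in merged:
            merged[key]["sources"].add(rec.source)
        else:
            merged[key] = {"record": rec, "sources": {rec.source}}
    return list(merged.values())


rows = [
    Record("The Example Museum", "20910", "IRS"),
    Record("EXAMPLE MUSEUM.", "20910-1234", "Factual"),
    Record("Another Museum", "20910", "IMLS"),
]
for group in dedupe(rows):
    print(normalize_name(group["record"].name), sorted(group["sources"]))
# example museum ['Factual', 'IRS']
# another museum ['IMLS']
```

An exact key like this is conservative: it will miss variants such as “St.” versus “Saint”, which is one reason real-world matching usually adds fuzzy comparison and manual review.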

    The entire time we were focused on the future of this file. As someone commented above, we did use Factual as a data provider, and we are in negotiations with them to release the Factual ID in the next iteration of the Museum Universe Data File. We also intend to add social media identifiers to the records so that researchers can build in even more cross walks. At IMLS we would love to have a real-time data verification tool. But this type of data collection can be very, very difficult to get approved and built within the Federal context given existing regulations.

    Once we created a file using the procedures outlined above we had a choice to make. We could have held onto the file we had and employed a more detailed verification process that would have cost hundreds of thousands of dollars (and would likely be out of date by the time it was finished) or we could release the file we had processed into the public domain.
    We obviously chose the latter, and we did so at some risk. We knew that some people would disagree with individual records in the field. We also know that some “Friends of” museum types actually operate museums while others operate as fundraising entities and nothing more. But there was no way to know that a priori. The procedures we used were limited to the data we had. If we did not have data in the file that could be used to verify or impute information, then we did not go further.

    As for museum type, people should know that this too is something that must evolve over time. The first step will be to develop clear, objective definitions of museums by type that are based on observable phenomena AND can be operationalized in a consistent and reliable manner. When those exist they can be applied consistently for all records in the MUDF and we will not need to default to the definitions used by the data providers.

    As we have said repeatedly, we will continue to refine our data standardization procedures and will enhance the file over time. We welcome your comments and your input. If you have questions or comments about the file please do not hesitate to contact us.


  6. thehistorylist

    Carlos, thanks for speaking up. I was hoping that someone involved with the project would participate in the excellent discussion that Max started.

    I agree with you completely on how the world has changed. Data is no longer static and the earlier concept of files has been replaced by real-time data in the cloud accessed via APIs.

    The task the IMLS undertook recently is a huge one, as you note. We also have better tools at our disposal than when the count was first done.

    Today there’s something of a build versus buy decision.

    It appears that Factual does what we need. They draw on many more datasets, and because those sources are so widely used, and in turn make it easy for individuals and institutions to correct and update the data, Factual’s data is likely to remain up to date.

    This isn’t surprising since they’ve raised $27 million, assembled a large team of computer scientists and data scientists, and they’ve been working on this core problem for years.

    (To be clear, I’m not associated with Factual in any way.)

    The shortcoming of both their system and the just-released IMLS data is sorting by institution type. Here, moving away from the idea of distinct categories toward something like tags would help address the issue we’ve all identified. A simple example from my local institution: a historical society housed in a historic house that includes a small museum and library.
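To make the tagging idea concrete, here is a minimal Python sketch in which each institution carries a set of tags instead of a single category, using the historical society example above. The `Institution` class, the tag strings, and the `with_all_tags` helper are hypothetical; this is an illustration, not a description of either IMLS’s or Factual’s actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Institution:
    name: str
    tags: set[str] = field(default_factory=set)  # free-form descriptive tags


def with_all_tags(institutions, *wanted):
    """Return institutions that carry every requested tag."""
    required = set(wanted)
    return [inst for inst in institutions if required <= inst.tags]


directory = [
    Institution("Example Town Historical Society",
                {"historical society", "historic house", "museum", "library"}),
    Institution("Example Art Museum", {"art museum", "museum"}),
]

# One record can answer several kinds of search instead of being forced into one bucket.
print([i.name for i in with_all_tags(directory, "museum", "library")])
print([i.name for i in with_all_tags(directory, "museum")])
```

The trade-off, raised later in the thread, is that free-form tags are easy to add but hard to turn into consistent categories for statistical work.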

    This is a problem they face and their site suggests that they’re working on this, too.

    Has the IMLS approached Factual about how they could provide the data as a service, making it available to all via an API? This might involve negotiating a way for the data to be tagged in a more fine-grained and relevant (to us) way than they might otherwise do. This would also enable the IMLS to provide (via the API or through a Factual-powered portal) an interface to search and sort, and lift the impossible burden of meeting the very high expectations people have for this type of data today.


  7. Carlos Manjarrez

    We have not approached Factual to set up a direct relationship because it would run counter to Federal regulations that preclude government from privileging a specific vendor. So, what we have done is make the data that we have available to all researchers and all commercial and non-profit entities alike – in a single release file. I feel pretty confident that the MUDF file is the closest approximation to the museum universe, broadly defined. It will undergo changes…. as all data do. But it is the only list I know of that has attempted to normalize the many different data sources that are out there.

    As for tagging, I agree that a singular categorical description can hardly do justice to the variability of museum types and services. As I’ve stated above, the museum types in the data file are derived from the categories in our source data. As you know, Factual does not have detailed museum types. So we had to make decision rules. In the instances when a record was found in two sources and only one museum type was available (e.g. IRS and Factual), we used the museum type from the single source. When there was agreement between two sources there was no issue and we assigned the same type. In the event of museum type disagreement we used the type of either IMLS (first) or IRS (second). We chose not to include multiple museum types from each source if there was disagreement just as we chose not to include multiple addresses per museum if there was data disagreement. We thought it would be confusing to data users and add little value overall.
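The museum-type decision rules described above amount to a small resolution function. The Python sketch below encodes them as stated: use the only type reported, keep a type the sources agree on, and on disagreement prefer IMLS, then IRS. Source names and type codes are placeholders, and the final fallback (returning `None` when neither IMLS nor IRS is among the disagreeing sources) is an assumption, since the stated rules don’t cover that case.

```python
from typing import Optional

# Precedence used only when sources disagree, per the stated rules: IMLS, then IRS.
PRECEDENCE = ("IMLS", "IRS")


def resolve_type(types_by_source: dict) -> Optional[str]:
    """Choose one museum type from a {source: type-or-None} mapping."""
    reported = {src: t for src, t in types_by_source.items() if t}
    distinct = set(reported.values())
    if not distinct:
        return None              # no source supplied a type
    if len(distinct) == 1:
        return distinct.pop()    # only one source had a type, or all sources agree
    for src in PRECEDENCE:       # disagreement: fall back on IMLS first, then IRS
        if src in reported:
            return reported[src]
    return None                  # assumption: case not covered by the stated rules


print(resolve_type({"IRS": "HST", "Factual": None}))   # HST
print(resolve_type({"IMLS": "HSC", "IRS": "HST"}))     # HSC
```

Because the function returns exactly one type, it reproduces the single-category limitation discussed here; supporting secondary types would mean returning a collection instead.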

    There is nothing technically stopping IMLS from adding secondary (or tertiary) types. What is needed, however, is a set of clear, objective definitions that can be operationalized into a standardized instrument. We need that before we can assign more detailed types. There are many hard questions implied. What objective criteria should we use to distinguish between a historic site and a history museum? How would that be operationalized into a few questions? How should we categorize a children’s art museum? As an art museum, as a children’s museum, as both, or as an entirely new category? These are things that require much, much more discussion in the field. I don’t think there are many in the museum sector who would be comfortable with IMLS assigning categories by fiat. As such, we have defaulted to the data that is available. But we look forward to working with people in the field to determine more precise measures moving forward.


    1. thehistorylist

      Carlos, thanks for the additional information.

      Regarding . . .

      – Contracting: I was thinking of the model where the agency contracts for the service the way they contract for or acquire other products and services, through an open process that allows anyone to respond. The data would remain free to anyone who wants it. (The National Archives, for one, has done some projects like this.)

      – Publishing the data: Any plans for an API?

      – Updating: Any plans for a better system than sending things in via e-mail?

      – Tagging: Great point, and it’s unlikely that we’ll ever end up with perfect definitions, especially if coming up with definitions that everyone can agree on is the required first step. Public tagging is one simple alternative. I realize that this runs counter to the traditional library science way of doing things, but it would be a realistic approach that would be easy to implement once a platform and publishing front end are in place. Of course, the good news is that this has all become pretty quick and simple with the tools and hosted services available today.

      Good luck with the project. Interested to see what folks do with the data set now and in the future.


  8. Carlos Manjarrez

    I am not sure I understand the contracting question but perhaps best left for another forum.

    RE: API
    We hope to have APIs in place for ALL IMLS data by the end of this fiscal year.

    RE: Better systems for gathering data
    Collecting data from people using standardized instruments requires detailed OMB review under the Paperwork Reduction Act. We are doing what we can within resource constraints and Federal regulations.

    RE: Tagging
    This is a fine way to collect data but it does avoid the problem of having to operationalize categories for future statistical collections.

    Thanks again for your interest.


    1. thehistorylist

      Carlos, please tell us more about “future statistical collections,” since those are the ones that we can influence. Your reference to “operationaliz[ing] categories” may help those of us who don’t work for the IMLS understand more about what the agency is trying to do with this effort. Using as an example my community’s all-volunteer historical society on whose board I serve: It’s our town’s historical society and is housed in a historic building. It contains a small museum and library. Is it important to the IMLS’s mission, budget, or operations that we fit uniquely into one category? If so, please explain.


      1. Carlos Manjarrez

        The need to operationalize terms is not specific to IMLS or its mission. It is a matter of sound social science practice. Lack of specificity in survey items greatly reduces the validity and reliability of the data being collected. My office manages the agency’s Federal statistical program, and we are bound by all of the same requirements as the other programs and agencies in the Federal Statistical System.

        I should also note that the decisions IMLS makes in its statistical program should not be confused with the programmatic or funding decisions IMLS makes in its library or museum grant programs. The research office is an entirely separate program that is statutorily defined in Section 9108 of the IMLS authorizing legislation.

        Forgive me for monopolizing this forum. This will be my last post on the subject.

