For years the STM publishing industry has focused on developing XML-based workflows and technologies that leverage XML for the express purpose of enriching content and making it more discoverable. Millions of dollars have been invested to ensure that content is prepared, distributed, and stored as XML to drive multi-channel publishing models both now and in the future.
Yet I’m often reminded that while XML has been foremost in publishers’ and suppliers’ minds when preparing content, it’s the PDF that is foremost in the minds of users. At a meeting last year with a major science publishing customer, we were discussing a new workflow when the topic of HTML presentation and online PDFs came up. Their VP of Web Development and Operations put it in perspective by saying, “While the online HTML may be the journal of record, in the eyes of users the PDF is the file of record.” This was confirmed again during our full-text mobile app development, when PDF download was cited as the most important feature, after the HTML presentation, for making the app useful to readers.
At this year’s AAP/PSP 2012 conference, I sat in on a session entitled “The Game-Changers? Four Organizations that Could Revolutionize Scholarly Publishing.” It was described in the program as, “Across Scientific, Technical, Medical and Scholarly publishing, new players are opening up untapped content distribution channels, innovating around established business models, and enhancing the end-user experience – and none of them would characterize themselves as STMS publishers. This fast-paced, interactive session introduces the philosophies, approaches, products and services of four organizations changing the industry landscape.”
From the description, would you have expected that three of the four companies presenting have a business model based almost completely on PDFs? Yet DeepDyve (http://www.deepdyve.com/), Mendeley (http://www.mendeley.com/), and PubGet (http://pubget.com/) all base their business models on delivering, storing, sharing, or finding PDFs of STM articles. Only Temis (http://www.temis.com/), with its semantic enrichment solutions, departed from the PDF-based business model.
PDFs have been around since Adobe created them in 1993, and the world of communication has never been the same. As cross-platform electronic versions of the printed page, PDFs have dominated as the communication vehicle of choice for every part of the publishing process, from peer review to author proofs to final delivery online. Yet as a publishing services provider, most of our focus in the production workflow is on the underlying XML content. Are we putting too much emphasis on the XML content when readers have shown time and time again that they want to read the PDFs? Ask just about any STM publisher about their online journal analytics and I’m sure you’ll hear that the average online session length is less than 2 minutes and that the full-text PDF is downloaded far more often than the full-text HTML is accessed.
However, in order for users to be able to read PDFs they first have to find them. And while in many cases that starts with a Google or Google Scholar search, the results of those searches would be dramatically different if the full-text HTML didn’t exist. So while PDFs are a great format for consuming full-text content, whether you print it out or read it on your iPad, if the user can’t find the article then they can’t consume the information.
What’s the future of scholarly publishing? How long will PDFs be around? Are users really concerned that PDFs don’t always contain all of the article information, such as supplemental material, multimedia, and reference links? Are products like Utopia Documents (http://getutopia.com/documents/) what readers really want? Add a comment below or send me an email at firstname.lastname@example.org to let me know your thoughts.