Retrievium: A Chemical Information Storage and Retrieval Platform

Case Number: 15-108


  • Allows users to store, index and share their digital assets in chemistry
  • Supports advanced analytics and search functionality, useful for rational compound design, lead identification/optimization, development of structure-activity relationships, etc.
  • Can be installed privately and either linked to the central (open) repository or used to manage and search private or other publicly available data


Large-scale datasets are widely regarded as one of the most valuable scientific research currencies of the 21st century, and the creative stewardship of research data is a growing priority. Additionally, most of the world's major technological and societal challenges are being addressed through research in chemistry, including climate change, energy, security, food supply, and health. Retrievium is a comprehensive, open, and interactive data management platform for the chemical sciences with high interoperability between existing open repositories. It also has capabilities for advanced analytics including unstructured pattern recognition and intelligent, meaning-based search and retrieval.


The Retrievium Database Platform provides a more effective method of storing, indexing, sharing, and describing chemical structures and chemical information digitally. It also allows users to interact with large databases across disparate repositories and to distill new knowledge via intelligent, meaning-based search and retrieval. The application of machine learning technology, as a value-added component of the platform, facilitates unstructured pattern recognition within the vast chemical data landscape, which promises to yield new industrial catalysts, advanced materials, medicines, and transformative computational simulation techniques, among many other applications.


  • There are no completely analogous databases or platforms available
  • The Retrievium database contains highly accurate 3D structures of compounds determined by quantum chemical calculations and provides a source of novel compounds and descriptors
  • Retrievium uses a unique method for describing chemical structures, which allows for more accurate searches, quantification of the similarity between two different molecules, and the identification of similar structures that are not found by other methods
  • Retrievium uses the Chemical Markup Language (CML), an open and well-documented standard for describing chemical information, as its data model. Together with the Application Programming Interface (API), this allows other software or web services to be built that query the database programmatically.
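
Because CML is an XML dialect, a CML record can be read with any standard XML parser. The Python sketch below illustrates that point only: it does not call the Retrievium API, which is not described here. The sample water molecule, the function names `parse_cml_molecule` and `element_counts`, and the choice of attributes to extract are illustrative assumptions; the element and attribute names (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`, `elementType`, `x3`/`y3`/`z3`, `atomRefs2`) follow the public CML schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the public CML schema.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical sample record (approximate water geometry, in angstroms).
WATER_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""


def parse_cml_molecule(cml_text):
    """Return (atoms, bonds) extracted from a CML molecule document."""
    root = ET.fromstring(cml_text)
    atoms = []
    for atom in root.findall(".//cml:atom", CML_NS):
        atoms.append({
            "id": atom.get("id"),
            "element": atom.get("elementType"),
            # 3D Cartesian coordinates stored as x3/y3/z3 attributes.
            "xyz": tuple(float(atom.get(k)) for k in ("x3", "y3", "z3")),
        })
    # Each bond references the ids of its two atoms, separated by whitespace.
    bonds = [
        tuple(bond.get("atomRefs2").split())
        for bond in root.findall(".//cml:bond", CML_NS)
    ]
    return atoms, bonds


def element_counts(atoms):
    """Count atoms per element, e.g. to derive a molecular formula."""
    return Counter(a["element"] for a in atoms)


if __name__ == "__main__":
    atoms, bonds = parse_cml_molecule(WATER_CML)
    print(element_counts(atoms))
    print(bonds)
```

A client built on the platform's API would presumably receive records of this kind and could process them in the same way; the details of that API are outside the scope of this summary.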

Stage of Development: Initial model developed; an updated version is in progress

Opportunity: Licensing

