Melting Points

Researchers: Jean-Claude Bradley, Andrew SID Lang, Antony J Williams, and Evan M Curtin
License: CC0
Sets and models are numbered to be consistent with the labels in the ONSChallenge Open Notebook

Original Datasets (Raw Data)

ONSMP000 (xlsx) - 15591 full raw entries from Alfa Aesar, containing duplicates and non-numerical values (about).
ONSMP003 (zip - xls) - 4450 measurements from a paper by M Karthikeyan (2005). Includes SMILES and many descriptors.
ONSMP005 (xls) - 277 measurements from a paper by CAS Bergstrom (2003). Drug molecules separated as training and validation sheets. SMILES provided.
ONSMP014 (xls) - 1070 melting points (not simple numeric format) from DrugBank - includes SMILES, InChI, LogP and aqueous solubility when available.
ONSMP018 (csv) - 1631 curated melting points with CSIDs, originally compiled by HM Bell, Dept of Chemistry, Virginia Tech, Blacksburg, VA 24061. Released as Open Data.
ONSMP019 (csv) - 3217 full raw entries from the Oxford MSDS sheets containing ranges, salts, metals, etc.
ONSMP021 (xlsx) - 287 melting points taken from a paper by LD Hughes (2008).
ONSMP023 (xlsx) - 11,645 compounds with raw melting points from the April 2011 PHYSPROP database of 43,544 compounds, melting points released as Open Data.
ONSMP026 (csv) - 3757 melting points (original file - not simple numeric format) extracted from scripts crystal data files by Will Griffiths and released as Open Data.
ONSMP035 (xlsx) - 1334 melting points with names, CSIDs, and SMILES collected by students as part of the Cheminfo Validation Project (2014-10-01) (datasheet of all physical properties collected).
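
Several of the raw sets above are described as containing ranges or values that are "not simple numeric format". The following is a minimal Python sketch of how such entries might be normalized before curation. The function name parse_melting_point, the accepted string formats, and the decision to reject reversed ranges are all assumptions made for illustration; they are not taken from the ONSChallenge notebook, and the real files may use other conventions.

```python
import re
from typing import Optional, Tuple

# Matches a signed decimal number such as "-12.5" or "130".
_NUMBER = r"[-+]?\d+(?:\.\d+)?"

# A single value or a "low-high" / "low to high" range, with an optional unit.
# These formats are assumed for illustration only.
_MP_PATTERN = re.compile(
    rf"^\s*({_NUMBER})(?:\s*(?:-|to)\s*({_NUMBER}))?\s*(?:°?\s*C)?\s*$",
    re.IGNORECASE,
)


def parse_melting_point(raw: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) in deg C, or None if the entry is not plainly numeric."""
    match = _MP_PATTERN.match(raw)
    if match is None:
        # Entries such as "dec." or ">300" are left for manual review.
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    if high < low:
        # A reversed range is treated as suspect rather than silently swapped.
        return None
    return (low, high)
```

A single value is returned as a zero-width range, so downstream code can treat every parsed entry uniformly; anything returned as None would need to be inspected by hand.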

Curated Datasets
Historically, melting point datasets have been of low quality, by which we mean that they contain both incorrect values and values associated with incorrect structures. Our ultimate goal was to create melting point models for organic compounds, but the low quality of the available data made an already difficult-to-model physical property even harder to model. Over the course of several years we have curated the melting point data into two Open Data (CC0) datasets that can be used for modeling purposes:

Bradley Melting Point Dataset (xlsx) - Dataset of Open Melting Points. 28,645 measurements including those found to be incorrect (marked as 'do not use'). csid corresponds to Chemspider ID. (2014) doi: 10.6084/m9.figshare.1031637
Bradley Double Plus Good Melting Point Dataset - Dataset of 2,706 highly curated melting points (compounds with more than one non-identical measurement from different sources; no inorganics, salts, or compounds that possess chiral centers or cis/trans isomerism). (2018) doi: 10.6084/m9.figshare.5756523
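
As a rough illustration of the "more than one non-identical measurement from different sources" criterion, the sketch below groups measurements by ChemSpider ID and keeps compounds whose values come from at least two sources and differ by between 0.1 and 5 °C. The Measurement record, its field names, and the reading of the 0.1-5 °C range as the spread between a compound's lowest and highest values are assumptions made for this example; they are not the column names or the exact procedure used for the published datasets. The structural filters (inorganics, salts, chiral centers, cis/trans isomerism) are not covered here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Measurement(NamedTuple):
    """One melting point record (hypothetical field names)."""
    csid: int         # ChemSpider ID
    mp_c: float       # melting point in deg C
    source: str       # label of the originating dataset
    do_not_use: bool  # True if the value was flagged as incorrect


def double_validated(
    measurements: Iterable[Measurement],
    min_range: float = 0.1,
    max_range: float = 5.0,
) -> Dict[int, List[float]]:
    """Map csid -> sorted values for compounds passing the assumed criterion."""
    by_compound = defaultdict(list)
    for m in measurements:
        if not m.do_not_use:  # drop values marked as incorrect
            by_compound[m.csid].append(m)

    selected = {}
    for csid, group in by_compound.items():
        sources = {m.source for m in group}
        values = sorted(m.mp_c for m in group)
        spread = values[-1] - values[0]
        # Require independent sources and non-identical but close values.
        if len(sources) > 1 and min_range <= spread <= max_range:
            selected[csid] = values
    return selected
```

Identical values are excluded by the lower bound on the spread, which is one way to guard against the same measurement being copied between sources.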

See our presentation below for more details about our curation process:

Print Format
Melting point data book (pdf) - Book of highly curated, double+ validated melting points (ONSMP029 - 2,706 unique compounds, double+ validated (range: 0.1-5 °C), after removing compounds that had at least one chiral center, possessed cis/trans isomerism, or were inorganic or a salt). Also available from Nature Precedings doi: 10.1038/npre.2011.6229.1

Melting Point Models

MeltingPointModel013 - More modeling of melting points using Open Babel
MeltingPointModel012 - Links to PMML Model on ONSChallenge Website
MeltingPointModel011 - Modeling melting points using Open Babel
MeltingPointModel010 - Using QSARDB to create a melting point model web service