AS003

Solvent Coefficients For Alternative 'Safe' Solvents
All content, models and data are released as CC0 - the default license for all our ONS work. This page is a duplicate (backup) of the original ONSChallenge page AbrahamSolventModel003.
**Researchers:** Jean-Claude Bradley and Andrew SID Lang

Objective
To investigate the predicted solvation properties of solvents deemed safe by the EPA. These solvents are compared with solvents that have known Abraham solvent coefficients, with two aims: to identify safer replacements for existing solvents, and to find potential new safe solvents that lie in a part of the chemical space not currently occupied by solvents with known Abraham coefficients.

Procedure
The Abraham general solvation model uses the linear free energy relationship (LFER)

log P = c + e E + s S + a A + b B + v V

where c, e, s, a, b, v are the solvent coefficients and E, S, A, B, V are the solute descriptors (see this brief discussion of the model). The Abraham coefficients are found via linear regression from measured data. The standard procedure is to let the c-coefficient (the intercept) float in the regression. It has been suggested that c should not be negative[1]. We suggest that little predictive ability is lost if c is simply required to be zero, which also makes solvents easier to compare with one another. To compare current solvents with each other, and potential new solvents with current ones, we therefore recalculated the coefficients of the known solvents (e_0, s_0, a_0, b_0, v_0) with c set to zero. This was done by calculating the log P values in 90 solvents for 2144 compounds with known Abraham descriptors from our Open Abraham Descriptors Database and then re-running the linear regression in R. The following code, with its results, is typical:
code
setwd(".../MakingCZero")
mydata = read.csv(file="makingczeroreadyforR.csv",header=TRUE,row.names="csid")
# the "0 +" term fixes the intercept c at zero
fit <- lm(isopropyl.myristate ~ 0 + E + S + A + B + V,data=mydata)
# summary of fit
summary(fit)

[output]
Call:
lm(formula = isopropyl.myristate ~ 0 + E + S + A + B + V, data = mydata)

Residuals:
     Min       1Q   Median       3Q      Max
-0.55191 -0.25598 -0.13732  0.00069  1.78549

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
E  0.977259   0.011781   82.95   <2e-16 ***
S -1.294959   0.014814  -87.41   <2e-16 ***
A -1.870114   0.020493  -91.26   <2e-16 ***
B -4.017729   0.015120 -265.73   <2e-16 ***
V  3.939081   0.007844  502.19   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2503 on 2139 degrees of freedom
Multiple R-squared: 0.9958,    Adjusted R-squared:  0.9958
F-statistic: 1.009e+05 on 5 and 2139 DF, p-value: < 2.2e-16
[output]
code

The following table lists the original solvent coefficients together with the c = 0 adjusted coefficients.
 * c || e || s || a || b || v || solvent || e_0 || s_0 || a_0 || b_0 || v_0 ||
 * 0.17 || 0.4 || -1.01 || 0.06 || -3.96 || 4.04 || 1-butanol || 0.387596 || -0.97209 || 0.108258 || -3.97885 || 4.12895 ||
 * 0.22 || 0.27 || -0.57 || -2.92 || -4.88 || 4.46 || 1-chlorobutane || 0.254833 || -0.516892 || -2.847674 || -4.910816 || 4.57048 ||
 * -0.06 || 0.62 || -1.32 || 0.03 || -4.15 || 4.28 || 1-decanol || 0.6203042 || -1.3327395 || 0.0090799 || -4.1464701 || 4.2497972 ||
 * 0.04 || 0.4 || -1.06 || 0 || -4.34 || 4.32 || 1-heptanol || 0.3948325 || -1.0545913 || 0.0143256 || -4.3468223 || 4.3352649 ||
 * 0.12 || 0.71 || -1.62 || -3.18 || -4.8 || 4.32 || 1-hexadecene || 0.696611 || -1.588924 || -3.143734 || -4.810655 || 4.38202 ||
 * 0.12 || 0.49 || -1.16 || 0.05 || -3.98 || 4.13 || 1-hexanol || 0.482763 || -1.137261 || 0.090893 || -3.993082 || 4.190662 ||
 * -0.03 || 0.49 || -1.04 || -0.02 || -4.24 || 4.22 || 1-octanol || 0.4909845 || -1.0517092 || -0.0336817 || -4.2309735 || 4.2009611 ||
 * 0.15 || 0.54 || -1.23 || 0.14 || -3.86 || 4.08 || 1-pentanol || 0.52385 || -1.193867 || 0.188379 || -3.88273 || 4.154244 ||
 * 0.14 || 0.41 || -1.03 || 0.25 || -3.77 || 3.99 || 1-propanol || 0.393466 || -0.996062 || 0.291203 || -3.784515 || 4.05757 ||
 * 0.18 || 0.29 || -0.13 || -2.8 || -4.29 || 4.18 || 1,2-dichloroethane || 0.278894 || -0.090721 || -2.742532 || -4.313978 || 4.274203 ||
 * 0.12 || 0.35 || -0.03 || -0.58 || -4.81 || 4.11 || 1,4-dioxane || 0.33675 || -0.003994 || -0.542209 || -4.825876 || 4.173499 ||
 * 0.1 || 0.62 || -1.8 || -3.07 || -4.29 || 4.52 || 1,9-decadiene || 0.606347 || -1.771424 || -3.037034 || -4.304097 || 4.571908 ||
 * 0.13 || 0.25 || -0.98 || 0.16 || -3.88 || 4.11 || 2-butanol || 0.242337 || -0.945988 || 0.198721 || -3.897972 || 4.179391 ||
 * 0.19 || 0.35 || -1.13 || 0.02 || -3.57 || 3.97 || 2-methyl-1-propanol || 0.338508 || -1.082867 || 0.07594 || -3.591543 || 4.064762 ||
 * 0.21 || 0.17 || -0.95 || 0.33 || -4.09 || 4.11 || 2-methyl-2-propanol || 0.153709 || -0.897144 || 0.397899 || -4.111567 || 4.217508 ||
 * 0.12 || 0.46 || -1.33 || 0.21 || -3.75 || 4.2 || 2-pentanol || 0.445389 || -1.303994 || 0.243289 || -3.759434 || 4.260301 ||
 * 0.1 || 0.34 || -1.05 || 0.41 || -3.83 || 4.03 || 2-propanol || 0.334933 || -1.025916 || 0.437909 || -3.839249 || 4.084005 ||
 * 0.32 || 0.51 || -1.69 || -3.69 || -4.81 || 4.4 || 2,2,4-trimethylpentane || 0.485433 || -1.610035 || -3.586041 || -4.850718 || 4.563507 ||
 * 0.07 || 0.36 || -1.27 || 0.09 || -3.77 || 4.4 || 3-methyl-1-butanol || 0.35352 || -1.255543 || 0.11342 || -3.779382 || 4.43679 ||
 * 0.31 || 0.31 || -0.12 || -0.61 || -4.75 || 3.94 || acetone || 0.286706 || -0.04747 || -0.508846 || -4.792269 || 4.102844 ||
 * 0.41 || 0.08 || 0.33 || -1.57 || -4.39 || 3.36 || acetonitrile || 0.044129 || 0.423135 || -1.4362 || -4.443325 || 3.576285 ||
 * 0.14 || 0.46 || -0.59 || -3.01 || -4.63 || 4.49 || benzene || 0.452175 || -0.554143 || -2.963555 || -4.643338 || 4.564318 ||
 * 0.1 || 0.29 || 0.06 || -1.61 || -4.56 || 4.03 || benzonitrile || 0.277045 || 0.081936 || -1.574291 || -4.574904 || 4.07839 ||
 * -0.02 || 0.44 || -0.42 || -3.17 || -4.56 || 4.45 || bromobenzene || 0.4369346 || -0.4279276 || -3.1781219 || -4.5563083 || 4.4367652 ||
 * 0.25 || 0.26 || -0.08 || -0.77 || -4.86 || 4.15 || butanone || 0.236179 || -0.022077 || -0.68909 || -4.886268 || 4.274532 ||
 * 0.25 || 0.36 || -0.5 || -0.87 || -4.97 || 4.28 || butyl acetate || 0.336024 || -0.442788 || -0.788152 || -5.004535 || 4.408761 ||
 * 0.05 || 0.69 || -0.94 || -3.6 || -5.82 || 4.92 || carbon disulfide || 0.6819348 || -0.9318396 || -3.5870443 || -5.8248542 || 4.9458403 ||
 * 0.2 || 0.52 || -1.16 || -3.56 || -4.59 || 4.62 || carbon tetrachloride || 0.506806 || -1.112282 || -3.496545 || -4.619008 || 4.720644 ||
 * 0.07 || 0.38 || -0.52 || -3.18 || -4.7 || 4.61 || chlorobenzene || 0.3752336 || -0.5056045 || -3.1613969 || -4.7083071 || 4.6478423 ||
 * 0.19 || 0.11 || -0.4 || -3.11 || -3.51 || 4.4 || chloroform || 0.089413 || -0.357874 || -3.051291 || -3.537934 || 4.493193 ||
 * 0.16 || 0.78 || -1.68 || -3.74 || -4.93 || 4.58 || cyclohexane || 0.770769 || -1.640414 || -3.689346 || -4.948869 || 4.659072 ||
 * 0.04 || 0.23 || 0.06 || -0.98 || -4.84 || 4.32 || cyclohexanone || 0.2216766 || 0.0668337 || -0.962833 || -4.8469389 || 4.3348761 ||
 * 0.19 || 0.72 || -1.74 || -3.45 || -4.97 || 4.48 || decane || 0.706653 || -1.697274 || -3.389851 || -4.993254 || 4.571974 ||
 * 0.18 || 0.39 || -0.99 || -1.41 || -5.36 || 4.52 || dibutyl ether || 0.37973 || -0.94367 || -1.357836 || -5.379393 || 4.614845 ||
 * 0.33 || 0.3 || -0.44 || 0.36 || -4.9 || 3.95 || dibutylformamide || 0.275223 || -0.3577 || 0.462459 || -4.944045 || 4.122737 ||
 * 0.32 || 0.1 || -0.19 || -3.06 || -4.09 || 4.32 || dichloromethane || 0.076325 || -0.111839 || -2.957248 || -4.130085 || 4.487963 ||
 * 0.35 || 0.36 || -0.82 || -0.59 || -4.96 || 4.35 || diethyl ether || 0.329941 || -0.737491 || -0.477868 || -5.000297 || 4.530001 ||
 * 0.21 || 0.03 || 0.09 || 1.34 || -5.08 || 4.09 || diethylacetamide || 0.0167 || 0.139253 || 1.409338 || -5.111165 || 4.197615 ||
 * -0.27 || 0.08 || 0.21 || 0.92 || -5 || 4.56 || dimethylacetamide || 0.104844 || 0.145499 || 0.831639 || -4.970213 || 4.418588 ||
 * -0.31 || -0.06 || 0.34 || 0.36 || -4.87 || 4.49 || DMF || -0.034208 || 0.27126 || 0.264139 || -4.827615 || 4.330082 ||
 * -0.19 || 0.33 || 0.79 || 1.26 || -4.54 || 3.36 || DMSO || 0.341749 || 0.745718 || 1.200253 || -4.516908 || 3.262081 ||
 * 0.11 || 0.67 || -1.64 || -3.55 || -5.01 || 4.46 || dodecane || 0.658583 || -1.616828 || -3.508611 || -5.020509 || 4.517868 ||
 * 0.22 || 0.47 || -1.04 || 0.33 || -3.6 || 3.86 || ethanol || 0.453079 || -0.983328 || 0.396198 || -3.623493 || 3.971272 ||
 * -0.17 || -0.02 || 0 || 0.07 || -0.37 || 0.45 || ethanol/water(10:90)vol || -0.009316 || -0.04156 || 0.010626 || -0.350331 || 0.365271 ||
 * -0.25 || 0.04 || -0.04 || 0.1 || -0.83 || 0.92 || ethanol/water(20:80)vol || 0.062777 || -0.098993 || 0.01722 || -0.800521 || 0.786631 ||
 * -0.27 || 0.11 || -0.1 || 0.13 || -1.32 || 1.41 || ethanol/water(30:70)vol || 0.127804 || -0.160996 || 0.049426 || -1.282852 || 1.276293 ||
 * -0.22 || 0.13 || -0.16 || 0.17 || -1.81 || 1.92 || ethanol/water(40:60)vol || 0.148332 || -0.210817 || 0.102648 || -1.781936 || 1.804907 ||
 * -0.14 || 0.12 || -0.25 || 0.25 || -2.28 || 2.42 || ethanol/water(50:50)vol || 0.134901 || -0.285203 || 0.207128 || -2.257463 || 2.342294 ||
 * -0.04 || 0.14 || -0.34 || 0.29 || -2.68 || 2.81 || ethanol/water(60:40)vol || 0.1406465 || -0.3442878 || 0.281256 || -2.6697239 || 2.7916452 ||
 * 0.06 || 0.09 || -0.37 || 0.31 || -2.94 || 3.1 || ethanol/water(70:30)vol || 0.0794107 || -0.3529953 || 0.331463 || -2.9438845 || 3.1344568 ||
 * 0.17 || 0.18 || -0.47 || 0.26 || -3.21 || 3.32 || ethanol/water(80:20)vol || 0.161026 || -0.424495 || 0.314211 || -3.233463 || 3.411426 ||
 * 0.24 || 0.21 || -0.58 || 0.26 || -3.45 || 3.55 || ethanol/water(90:10)vol || 0.193477 || -0.51767 || 0.338521 || -3.480739 || 3.669878 ||
 * 0.33 || 0.37 || -0.45 || -0.7 || -4.9 || 4.15 || ethyl acetate || 0.342809 || -0.369036 || -0.596948 || -4.94523 || 4.318697 ||
 * 0.09 || 0.47 || -0.72 || -3 || -4.84 || 4.51 || ethylbenzene || 0.459437 || -0.701228 || -2.970828 || -4.855741 || 4.56218 ||
 * -0.27 || 0.58 || -0.51 || 0.72 || -2.62 || 2.73 || ethylene glycol || 0.599449 || -0.574819 || 0.631321 || -2.585314 || 2.5908 ||
 * 0.14 || 0.15 || -0.37 || -3.03 || -4.6 || 4.54 || fluorobenzene || 0.140337 || -0.340978 || -2.985464 || -4.618238 || 4.611483 ||
 * -0.17 || 0.07 || 0.31 || 0.59 || -3.15 || 2.43 || formamide || 0.083307 || 0.267828 || 0.536608 || -3.131516 || 2.344771 ||
 * 0.3 || 0.64 || -1.76 || -3.57 || -4.95 || 4.49 || heptane || 0.61919 || -1.685129 || -3.477189 || -4.983132 || 4.640776 ||
 * 0.09 || 0.67 || -1.62 || -3.59 || -4.87 || 4.43 || hexadecane || 0.659893 || -1.59632 || -3.559573 || -4.880281 || 4.47815 ||
 * 0.33 || 0.56 || -1.71 || -3.58 || -4.94 || 4.46 || hexane || 0.53342 || -1.631725 || -3.473425 || -4.980698 || 4.634317 ||
 * -0.19 || 0.3 || -0.31 || -3.21 || -4.65 || 4.59 || iodobenzene || 0.312539 || -0.352762 || -3.271785 || -4.629052 || 4.489752 ||
 * -0.61 || 0.93 || -1.15 || -1.68 || -4.09 || 4.25 || isopropyl myristate || 0.977259 || -1.294959 || -1.870114 || -4.017729 || 3.939081 ||
 * 0.12 || 0.38 || -0.6 || -2.98 || -4.96 || 4.54 || m-xylene || 0.366587 || -0.574078 || -2.941283 || -4.976929 || 4.598299 ||
 * 0.28 || 0.33 || -0.71 || 0.24 || -3.32 || 3.55 || methanol || 0.311909 || -0.649107 || 0.329542 || -3.354582 || 3.690751 ||
 * 0.35 || 0.22 || -0.15 || -1.04 || -4.53 || 3.97 || methyl acetate || 0.194997 || -0.067588 || -0.923983 || -4.571216 || 4.15239 ||
 * 0.34 || 0.31 || -0.82 || -0.62 || -5.1 || 4.43 || methyl tert-butyl ether || 0.279699 || -0.737134 || -0.510026 || -5.139775 || 4.600429 ||
 * 0.25 || 0.78 || -1.98 || -3.52 || -4.29 || 4.53 || methylcyclohexane || 0.762327 || -1.924196 || -3.439318 || -4.323834 || 4.654703 ||
 * 0.28 || 0.13 || -0.44 || 1.18 || -4.73 || 3.86 || N-ethylacetamide || 0.105071 || -0.374993 || 1.269385 || -4.764184 || 4.002187 ||
 * 0.22 || 0.03 || -0.17 || 0.94 || -4.59 || 3.73 || N-ethylformamide || 0.016449 || -0.114321 || 1.004651 || -4.616979 || 3.84333 ||
 * -0.03 || 0.7 || -0.06 || 0.01 || -4.09 || 3.41 || N-formylmorpholine || 0.6981457 || -0.0694897 || 0.0048883 || -4.0885654 || 3.38906 ||
 * 0.06 || 0.33 || 0.26 || 1.56 || -5.04 || 3.98 || N-methyl-2-piperidone || 0.3271873 || 0.2705115 || 1.5746338 || -5.0436057 || 4.0124292 ||
 * 0.09 || 0.21 || -0.17 || 1.31 || -4.59 || 3.83 || N-methylacetamide || 0.19721 || -0.150831 || 1.334533 || -4.600626 || 3.879615 ||
 * 0.11 || 0.41 || -0.29 || 0.54 || -4.09 || 3.47 || N-methylformamide || 0.397604 || -0.260136 || 0.578616 || -4.099689 || 3.529845 ||
 * 0.15 || 0.53 || 0.23 || 0.84 || -4.79 || 3.67 || N-methylpyrrolidinone || 0.519565 || 0.259902 || 0.887089 || -4.813222 || 3.749914 ||
 * -0.2 || 0.54 || 0.04 || -2.33 || -4.61 || 4.31 || nitrobenzene || 0.551741 || -0.003723 || -2.388352 || -4.584066 || 4.213974 ||
 * 0.02 || -0.09 || 0.79 || -1.46 || -4.36 || 3.46 || nitromethane || -0.0933342 || 0.7985957 || -1.4544755 || -4.3676129 || 3.4722537 ||
 * 0.24 || 0.62 || -1.71 || -3.53 || -4.92 || 4.48 || nonane || 0.599859 || -1.65665 || -3.456521 || -4.951366 || 4.605711 ||
 * 0.08 || 0.52 || -0.81 || -2.88 || -4.82 || 4.56 || o-xylene || 0.511059 || -0.793315 || -2.857401 || -4.831364 || 4.601817 ||
 * -0.1 || 0.15 || -0.84 || -0.44 || -4.04 || 4.13 || octadecanol || 0.155261 || -0.863525 || -0.466854 || -4.028093 || 4.075935 ||
 * 0.23 || 0.74 || -1.84 || -3.59 || -4.91 || 4.5 || octane || 0.719433 || -1.785636 || -3.512058 || -4.936095 || 4.620999 ||
 * 0.17 || 0.48 || -0.81 || -2.94 || -4.87 || 4.53 || p-xylene || 0.463092 || -0.772761 || -2.885801 || -4.895116 || 4.617725 ||
 * 0.57 || 0.72 || -1.03 || -1.3 || -4.51 || 3.45 || peanut oil || 0.66965 || -0.89221 || -1.12075 || -4.58151 || 3.74435 ||
 * 0.37 || 0.39 || -1.57 || -3.54 || -5.22 || 4.51 || pentane || 0.35651 || -1.481294 || -3.418818 || -5.261024 || 4.703599 ||
 * 0 || 0.17 || 0.5 || -1.28 || -4.41 || 3.42 || propylene carbonate || 0.1672359 || 0.505135 || -1.2809844 || -4.4080414 || 3.4234811 ||
 * 0 || 0.15 || 0.6 || -0.38 || -4.54 || 3.29 || sulfolane || 0.1468503 || 0.6009136 || -0.3799049 || -4.541574 || 3.2903215 ||
 * 0.22 || 0.36 || -0.38 || -0.24 || -4.93 || 4.45 || THF || 0.345051 || -0.331628 || -0.167145 || -4.96046 || 4.564853 ||
 * 0.13 || 0.43 || -0.64 || -3 || -4.75 || 4.52 || toluene || 0.420597 || -0.614527 || -2.961869 || -4.763681 || 4.588524 ||
 * 0.33 || 0.57 || -0.84 || -1.07 || -4.33 || 3.92 || tributyl phosphate || 0.543888 || -0.760593 || -0.965937 || -4.373768 || 4.087161 ||
 * 0.4 || -0.09 || -0.59 || -1.28 || -1.27 || 3.09 || trifluoroethanol || -0.125647 || -0.50143 || -1.155862 || -1.322677 || 3.290636 ||
 * 0.06 || 0.6 || -1.66 || -3.42 || -5.12 || 4.62 || undecane || 0.5979334 || -1.6471654 || -3.4017847 || -5.1276719 || 4.6493317 ||

Not surprisingly, the largest changes in coefficient values occur for solvents with c-values furthest from zero. What is a little intriguing is that all the coefficients move consistently the same way: solvents with negative c-values all saw an increase in e and b (and a decrease in s, a, and v) on recalculation, whereas solvents with positive c-values all saw an increase in s, a, and v (and a decrease in e and b). Multiplying the average absolute deviation of a coefficient by the average value of the corresponding descriptor gives a measure of how much that coefficient changed. The adjusted coefficients changed (as measured by e.g. AAE(v_0) * Mean(V)) in the order v (0.124), s (0.043), e (0.013), b (0.011), a (0.010).

Using the adjusted coefficients above, new random forest (RF) models were created using R (v3.0.0) and Rajarshi Guha's CDK Descriptor Calculator (v1.3.9). First we used R to perform feature selection:
code
library(caret) #for feature selection
setwd(".../MakingCZero")
mydata = read.csv(file="CDKReady4RFeatureSelectection.csv",header=TRUE,row.names="Title")
ncol(mydata)

[output] [1] 207 [output]

nzv <- nearZeroVar(mydata) # remove zeros and other small variance columns
mydata <- mydata[, -nzv]
ncol(mydata)

[output] [1] 111 [output]

cor.mat = cor(mydata)
# find correlation r > 0.90
highCorr <- findCorrelation(cor.mat, cutoff = .90, verbose = TRUE)
# remove the highly correlated columns
mydata <- mydata[, -highCorr]
ncol(mydata)

[output] [1] 68 [output]

write.csv(mydata, file = "CDKFeatureSelected.csv")
code
Then the models themselves were created using code like the following:
code
library("randomForest") #for modeling
setwd(".../MakingCZero")
mydata = read.csv(file="CDKReady4Ra.csv",header=TRUE,row.names="Title")

mydata.rf <- randomForest(a_0 ~ ., data = mydata,importance = TRUE)
print(mydata.rf)
[output]
Call:
 randomForest(formula = a_0 ~ ., data = mydata, importance = TRUE)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 22

          Mean of squared residuals: 0.2272567
                    % Var explained: 91.89
[output]
# get variable importance plot
varImpPlot(mydata.rf,main="Random Forest Variable Importance")
# save the model
saveRDS(mydata.rf, file = "arfmodel")

# predict using the random forest model
test.predict <- predict(mydata.rf,mydata)
# write the predictions to the working directory
write.csv(test.predict, file = "RFTestPredicta.csv")
code
The models were used to predict the coefficients of the training set to check whether any of the solvents were outliers, which could indicate that certain solvent coefficients are in need of updating. The solvents with the largest errors were (the first five being especially suspect): trifluoroethanol, carbon disulfide, formamide, isopropyl myristate, ethylene glycol, DMF, octadecanol, DMSO, chloroform, nitromethane, carbon tetrachloride, N-formylmorpholine, methylcyclohexane, sulfolane, N-methylacetamide.
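The outlier check described above (predict the training set, then look for the largest errors) can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the solvent labels `solvent_01` to `solvent_30` are hypothetical, and `lm()` stands in for the random forest so that the sketch runs in base R. None of the numbers below come from the study.

```r
# Synthetic stand-in data: 30 hypothetical "solvents" with one descriptor x
# and one coefficient y. These values are NOT from the study.
set.seed(42)
n <- 30
toy <- data.frame(
  x = rnorm(n),
  row.names = sprintf("solvent_%02d", seq_len(n))
)
toy$y <- 2 * toy$x + rnorm(n, sd = 0.1)
toy["solvent_05", "y"] <- toy["solvent_05", "y"] + 3  # plant one outlier

# Stand-in model (assumption): lm() keeps the sketch dependency-free.
# The page itself fits randomForest(); predict() is called the same way.
fit <- lm(y ~ x, data = toy)
pred <- predict(fit, newdata = toy)

# Rank the training-set solvents by absolute prediction error.
abs_err <- abs(toy$y - pred)
names(abs_err) <- rownames(toy)
ranked <- sort(abs_err, decreasing = TRUE)
print(head(ranked, 5))
```

The planted outlier should appear at the top of `ranked`. With a fitted `randomForest` object, calling `predict()` without `newdata` returns the out-of-bag predictions, which may give a less optimistic view of training-set errors than predicting on the training data itself.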

EPA Solvents
SMILES for the potential new safe solvents were extracted from ChemSpider using their CAS numbers and names. Solvents that already have measured coefficients plus Fatty acids (C16-18 and C18-unsatd., methyl esters), (Glycerides, mixed decanoyl and octanoyl), (Soybean oil, methyl esters), (Tripropylene glycol n-butyl ether), (White mineral oil, petroleum), (Fatty acids, C12-18, methyl esters), and Polypropylene glycol.

CDK descriptors were then calculated, which in turn allowed us to predict the solvent coefficients:
 * List Call || CAS || List Name || solvent SMILES || solvent CSID || e_0p || s_0p || a_0p || b_0p || v_0p ||
 * Green [Circle] || 107-41-5 || 2-Methyl-2,4-pentanediol || C(C(C)(C)O)C(C)O || 13884973 || 0.286 || -0.584 || -0.082 || -3.820 || 3.973 ||
 * Green [Circle] || 107-88-0 || 1,3-Butanediol || C(C(C)O)CO || 13837670 || 0.414 || -0.656 || 0.180 || -3.759 || 3.927 ||
 * Green [Circle] || 57-55-6 || 1,2-Propanediol || C(C(C)O)O || 13835224 || 0.386 || -0.452 || 0.253 || -3.450 || 3.590 ||
 * Green [Circle] || 107-98-2 || 1-Methoxy-2-propanol || C(C(C)O)OC || 7612 || 0.314 || -0.645 || 0.034 || -3.545 || 3.935 ||
 * Green [Circle] || 56-81-5 || Glycerol || C(C(CO)O)O || 733 || 0.405 || -0.433 || 0.069 || -3.422 || 3.476 ||
 * Green [Circle] || 110-98-5 || 1,1'-Dimethyldiethylene glycol || C(C(O)C)OCC(O)C || 7796 || 0.356 || -0.356 || -0.217 || -3.615 || 3.848 ||
 * Green [Circle] || 25265-71-8 || Dipropylene glycol || C(C(O)OC(CC)O)C || 30467 || 0.368 || -0.375 || -0.356 || -4.018 || 3.983 ||
 * Green [Circle] || 88917-22-0 || Propanol 1 (or 2)-2-methoxymethyl ethoxy, acetate || C(C(OC(=O)C)OCCCOC)C || 2299699 || 0.358 || -0.226 || -0.853 || -4.453 || 4.016 ||
 * Green [Circle] || 56539-66-3 || 3-Methyl-3-methoxybutanol || C(C(OC)(C)C)CO || 55953 || 0.311 || -0.636 || -0.195 || -3.886 || 4.001 ||
 * Green [Circle] || 106-65-0 || Dimethyl succinate || C(C(OC)=O)CC(OC)=O || 13848341 || 0.342 || 0.052 || -0.778 || -4.438 || 3.832 ||
 * Green [Circle] || 627-93-0 || Dimethyl adipate || C(C(OC)=O)CCCC(OC)=O || 11824 || 0.347 || -0.153 || -0.938 || -4.456 || 3.881 ||
 * Green [Circle] || 20324-32-7 || 1-(2-Methoxy-1-methylethoxy)-2-propanol || C(C(OCC(C)O)C)OC || 23782 || 0.332 || -0.540 || -0.259 || -3.537 || 3.838 ||
 * Green [Circle] || 108-65-6 || Propylene glycol methyl ether acetate || C(C)(=O)OC(COC)C || 7658 || 0.273 || -0.034 || -0.702 || -4.508 || 4.005 ||
 * Green [Circle] || 5131-66-8 || Propylene glycol n-butyl ether || C(CCC)OCC(C)O || 19942 || 0.424 || -0.629 || -0.297 || -3.632 || 4.062 ||
 * Green [Circle] || 1119-40-0 || Dimethyl glutarate || C(CCCC(=O)OC)(=O)OC || 13605 || 0.342 || -0.099 || -0.855 || -4.422 || 3.849 ||
 * Green [Circle] || 504-63-2 || 1,3-Propanediol || C(CO)CO || 13839553 || 0.434 || -0.626 || 0.231 || -3.732 || 3.601 ||
 * Green [Circle] || 34590-94-8 || Dipropylene glycol methyl ether || C(OC(CO)C)C(OC)C || 23783 || 0.308 || -0.337 || -0.252 || -3.650 || 3.926 ||
 * Green [Circle] || 1569-01-3 || 1-Propoxy-2-propanol || C(OCCC)C(C)O || 14551 || 0.417 || -0.580 || -0.238 || -3.607 || 3.988 ||
 * Green [Circle] || 14035-94-0 || Pentanedioic acid, 2-methyl-, 1,5-dimethyl ester || CC(C(=O)OC)CCC(=O)OC || 105525 || 0.344 || -0.144 || -0.783 || -4.365 || 3.879 ||
 * Green [Circle] || 108-32-7 || Propylene carbonate || CC1COC(O1)=O || 7636 || 0.218 || 0.300 || -1.024 || -4.347 || 3.581 ||
 * Half Green [Circle] || 4437-85-8 || 1,3-Dioxolan-2-one, 4-ethyl- || C(C1OC(=O)OC1)C || 96547 || 0.263 || 0.085 || -0.850 || -4.477 || 3.864 ||
 * Half Green [Circle] || 931-40-8 || 4-Hydroxymethyl-1,3-dioxolan-2-one || C(C1OC(=O)OC1)O || 88417 || 0.282 || 0.082 || -0.587 || -3.530 || 3.529 ||
 * Half Green [Circle] || 97-64-3 || Ethyl lactate || C(OC(C(C)O)=O)C || 13837423 || 0.241 || -0.067 || -0.402 || -3.764 || 3.962 ||
 * Yellow [Triangle] || 5989-27-5 || D-limonene || [C@H]1(C(=C)C)CC=C(CC1)C || 20939 || 0.558 || -1.297 || -3.188 || -4.832 || 4.527 ||
 * Yellow [Triangle] || 29911-28-2 || Dipropylene glycol monobutyl ether || C(C(OCC(C)O)C)OCCCC || 23142 || 0.464 || -0.715 || -0.589 || -3.693 || 3.983 ||
 * Yellow [Triangle] || 112-34-5 || Diethylene glycol mono-N-butyl ether || C(C)CCOCCOCCO || 13839549 || 0.461 || -0.549 || -0.362 || -3.569 || 3.813 ||
 * Yellow [Triangle] || 112-53-8 || 1-Dodecanol || C(CCCCCCCCCCC)O || 7901 || 0.517 || -1.170 || -0.181 || -4.132 || 4.148 ||
 * Yellow [Triangle] || 25498-49-1 || Propanol, [2-(2-methoxymethylethoxy)methylethoxy]- || COCCCOCCCOC(CC)O || 30564 || 0.426 || -0.508 || -0.459 || -3.829 || 3.894 ||

By calculating the distance to each solvent with known coefficients, sqrt(sum(((measured - predicted)/measuredSD)^2)), we identified possible solvent replacements that are predicted to have similar solvation properties:
 * Current Solvent || Possible Alternate Solvent || Distance ||
 * 1-octanol || 1-dodecanol || 0.295 ||
 * ethanol || 1,3-butanediol || 0.576 ||
 * 1-propanol || 1,3-butanediol || 0.585 ||
 * acetone || propylene glycol methyl ether acetate || 0.499 ||
 * methyl acetate || propylene glycol methyl ether acetate || 0.502 ||
 * benzonitrile || propylene glycol methyl ether acetate || 0.576 ||
 * 1,4-dioxane || propylene glycol methyl ether acetate || 0.677 ||
 * methanol || 1,2-propanediol || 0.517 ||
 * methanol || 1-(2-methoxy-1-methylethoxy)-2-propanol || 0.574 ||
 * methanol || 1-methoxy-2-propanol || 0.617 ||
 * methanol || glycerol || 0.722 ||
 * 2,2,4-trimethylpentane || D-limonene || 0.619 ||
 * hexane || D-limonene || 0.621 ||

Principal component analysis, in R, was used to help visualize where both current and potential new solvents lie in the chemical space.
code
setwd(".../MakingCZero")
mydata = read.csv(file="EPA-PCA4R.csv",header=TRUE,row.names="Title")
pc1 <- prcomp(mydata, scale. = TRUE)
x <- pc1$x
summary(pc1)

[output]
Importance of components:
                          PC1    PC2    PC3     PC4    PC5
Standard deviation     1.7181 1.0367 0.7622 0.56519 0.2702
Proportion of Variance 0.5904 0.2150 0.1162 0.06389 0.0146
Cumulative Proportion  0.5904 0.8053 0.9215 0.98540 1.0000
[output]
code
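The distance used above to match current solvents with candidates can be sketched as a standardized Euclidean distance over the five coefficients. The rows below are copied from the two coefficient tables on this page, but only four known solvents are included, and `measuredSD` is assumed here to be the per-coefficient standard deviation across the known solvents. The distances this sketch produces will therefore not reproduce the values in the table, which presumably used all 90 solvents. The names `known`, `cand`, `sds`, and `dist_to_known` are hypothetical.

```r
cols <- c("e", "s", "a", "b", "v")

# Adjusted (c = 0) coefficients for a few known solvents, from the table above.
known <- rbind(
  "1-octanol" = c(0.4909845, -1.0517092, -0.0336817, -4.2309735, 4.2009611),
  "methanol"  = c(0.311909, -0.649107, 0.329542, -3.354582, 3.690751),
  "hexane"    = c(0.53342, -1.631725, -3.473425, -4.980698, 4.634317),
  "acetone"   = c(0.286706, -0.04747, -0.508846, -4.792269, 4.102844)
)
colnames(known) <- cols

# Predicted coefficients for a few candidate solvents, from the table above.
cand <- rbind(
  "1-dodecanol"     = c(0.517, -1.170, -0.181, -4.132, 4.148),
  "D-limonene"      = c(0.558, -1.297, -3.188, -4.832, 4.527),
  "1,2-propanediol" = c(0.386, -0.452, 0.253, -3.450, 3.590)
)
colnames(cand) <- cols

# Assumption: measuredSD is the column-wise SD of the known coefficients.
sds <- apply(known, 2, sd)

# Distance from one candidate (a named numeric vector) to every known solvent.
dist_to_known <- function(pred_row) {
  apply(known, 1, function(m) sqrt(sum(((m - pred_row) / sds)^2)))
}

# Rows: candidates; columns: known solvents.
d <- t(apply(cand, 1, dist_to_known))
nearest <- colnames(d)[apply(d, 1, which.min)]
print(round(d, 3))
print(data.frame(candidate = rownames(d), nearest_known = nearest))
```

Scaling each coefficient by its spread keeps a coefficient with a wide range (such as a) from dominating the distance simply because of its units.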

Results
Solvents recommended to be updated with high priority: trifluoroethanol, carbon disulfide, formamide, isopropyl myristate, ethylene glycol.

Possible alternative solvents
 * Current Solvent || Possible Alternate Solvent || Distance ||
 * 1-octanol || 1-dodecanol || 0.295 ||
 * ethanol || 1,3-butanediol || 0.576 ||
 * 1-propanol || 1,3-butanediol || 0.585 ||
 * acetone || propylene glycol methyl ether acetate || 0.499 ||
 * methyl acetate || propylene glycol methyl ether acetate || 0.502 ||
 * benzonitrile || propylene glycol methyl ether acetate || 0.576 ||
 * 1,4-dioxane || propylene glycol methyl ether acetate || 0.677 ||
 * methanol || 1,2-propanediol || 0.517 ||
 * methanol || 1-(2-methoxy-1-methylethoxy)-2-propanol || 0.574 ||
 * methanol || 1-methoxy-2-propanol || 0.617 ||
 * methanol || glycerol || 0.722 ||
 * 2,2,4-trimethylpentane || D-limonene || 0.619 ||
 * hexane || D-limonene || 0.621 ||

Possible new safe solvents in a new part of the chemical space: 4-Hydroxymethyl-1,3-dioxolan-2-one and Ethyl lactate.
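One way to make "a new part of the chemical space" more concrete is to measure how far a candidate sits from its nearest known solvent in principal component space. The sketch below is an assumption about how this could be done, not the procedure used on this page: it fits the PCA on a handful of known solvents only and projects the candidates afterwards, whereas the file EPA-PCA4R.csv is not shown here and may have been organized differently. Coefficient values are copied from the tables on this page; the names `known`, `cand`, and `nearest_gap` are hypothetical. With so few known solvents the numbers are illustrative only.

```r
cols <- c("e", "s", "a", "b", "v")

# Adjusted (c = 0) coefficients for six known solvents, from the table above.
known <- rbind(
  "methanol" = c(0.311909, -0.649107, 0.329542, -3.354582, 3.690751),
  "ethanol"  = c(0.453079, -0.983328, 0.396198, -3.623493, 3.971272),
  "acetone"  = c(0.286706, -0.04747, -0.508846, -4.792269, 4.102844),
  "hexane"   = c(0.53342, -1.631725, -3.473425, -4.980698, 4.634317),
  "toluene"  = c(0.420597, -0.614527, -2.961869, -4.763681, 4.588524),
  "DMSO"     = c(0.341749, 0.745718, 1.200253, -4.516908, 3.262081)
)
colnames(known) <- cols

# Predicted coefficients for the two candidates named in the Results.
cand <- rbind(
  "ethyl lactate" = c(0.241, -0.067, -0.402, -3.764, 3.962),
  "4-Hydroxymethyl-1,3-dioxolan-2-one" = c(0.282, 0.082, -0.587, -3.530, 3.529)
)
colnames(cand) <- cols

# Fit the PCA on the known solvents, then project the candidates onto it.
pc <- prcomp(known, scale. = TRUE)
known_scores <- pc$x[, 1:2]
cand_scores <- predict(pc, newdata = cand)[, 1:2, drop = FALSE]

# Distance from each candidate to its closest known solvent in the PC1-PC2 plane.
nearest_gap <- apply(cand_scores, 1, function(p) {
  min(sqrt(rowSums(sweep(known_scores, 2, p)^2)))
})
print(round(nearest_gap, 3))
```

A larger gap suggests that a candidate lies further from any solvent with known coefficients, at least in the two leading components.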