AS005

Determining the optimal ethanol/water ratio to maximize solubility in ethanol/water cosolvent systems

 * Researchers: Andrew SID Lang, Enrique F Valderrama Araya, and Bryant G Wilson**

** Introduction **
Predicting the solubility of a compound in the ethanol/water cosolvent system is important for certain organic syntheses and associated recrystallization procedures as well as drug delivery techniques for drugs that have poor aqueous solubilities. Here, building on the work by Abraham and Acree [1], we present models that can be used to find the optimal ethanol/water volume fractions to optimize solubilities.

Abraham solvent coefficients, polynomial fit
Using the data from[ref] for the standard Abraham solvent coefficients [ref2], different polynomial orders where calculated using the statistical software R[ref?] and using Maxima-CAS[ref]. Were observed for the best fit all agree reasonably well to the same polynomial order poly(3), with the exception for the solvent coefficient, "e", which a polynomial of order 5 was need to fit it properly. The statistics variation on the reference data[ref2] in some cases R ^2 is only of 0.75, which could justify the oscillation of the data and the poor fit for poly(3) to the solvent for "e". However more interesting is to be able to fit with a polynomial of 3rd order all the solvent coefficients equations, and as well what are the implications of this. We need to investigate the physics which make it more agree with poly(3) or even poly(5), but not for even order of polynomial fits. The coefficients for the fit for each Abraham solvent coefficient are:

math \begin{align*} c(x) &= c_3x^3 + c_2x^2+c_1x+c_0 \\ &=-2.9544x^3+5.4541x^2-2.2659x\\ &R_c^2=0.998\\ e(x) &=e_3x^3 + e_2x^2+e_1x+e_0 \\ &=1.6224x^3-1.955x^2+0.75363x\\ &R_e^2=0.943\\ s(x) &=s_3x^3 + s_2x^2+s_1x+s_0 \\ &=-1.55x^3+1.216x^2-0.61351x\\ &R_s^2=0.981\\ a(x) &=a_3x^3 + a_2x^2+a_1x+a_0 \\ &=-0.16603x^3-0.081092x^2+0.54252x\\ &R_a^2=0.987\\ b(x) &=b_3x^3 + b_2x^2+b_1x+b_0 \\ &=2.6951x^3-2.2816x^2-3.9776x\\ &R_b^2=0.99975\\ v(x) &=v_3x^3 + v_2x^2+v_1x+v_0 \\ &=-1.6412x^3+0.64977x^2+4.7751x\\ &R_v^2=0.99953\\ \end{align*} math


 * [[image:c3_R011515.png width="300" height="246"]] || [[image:e3_R011515.png width="300" height="246"]] || [[image:s3_R011515.png width="300" height="246"]] ||
 * [[image:a3_R011515.png width="300" height="246"]] || [[image:b3_R011515.png width="300" height="246"]] || [[image:v3_R011515.png width="300" height="246"]] ||

The polynomial fit of order 5
present better fit, but we need to add 2 more parameters per curve, a total of 12 new parameters when the solubility is calculated. The plot of the fit by polynomial of order 5 can be seen it below.

math \begin{align*} c(x)&=-3.2372x^5+7.2247x^4-8.3099x^3+6.9211x^2-2.3763x+9.1259\times 10^{-4}\\ R^2_c&= 0.999647791\\ e(x)&=-3.125x^5+13.657x^4-16.528x^3+7.1208x^2-0.65726x-0.0033129\\ R^2_e&= 0.979587075\\ s(x)&=-16.346x^5+32.864x^4-21.548x^3+4.4432x^2-0.44657x+0.0042168\\ R^2_s&= 0.997913048\\ a(x)&=13.333x^5-31.072x^4+24.431x^3-7.603x^2+1.2394x-8.4615\times 10^{-4}\\ R^2_a&= 0.994812227\\ b(x)&=2.0032x^5-10.16x^4+17.111x^3-9.8035x^2-2.7494x-0.0040892\\ R^2_b&= 0.999831822\\ v(x)&=15.16x^5-30.302x^4+16.517x^3-2.0617x^2+4.5447x+0.0017867\\ R^2_v&= 0.99996795\\ \end{align*} math
 * [[image:c5_R011515flo.png width="300" height="246"]] || [[image:e5_R011515flo.png width="300" height="246"]] || [[image:s5_R011515flo.png width="300" height="246"]] ||
 * [[image:a5_R011515flo.png width="300" height="246"]] || [[image:b5_R011515flo.png width="300" height="246"]] || [[image:v5_R011515flo.png width="300" height="246"]] ||

and the coefficients forcing to pass by zero are: math \begin{align*} c(x)&=-3.1921x^5+7.0956x^4-8.1733x^3+6.8559x^2-2.3628x\\ R^2_c&= 0.999658870812513\\ e(x)&=-3.2886x^5+14.126x^4-17.024x^3+7.3576x^2-0.70635x\\ R^2_e&= 0.990292688177851\\ s(x)&=-16.138x^5+32.268x^4-20.917x^3+4.1418x^2-0.38408x\\ R^2_s&= 0.998972730323543\\ a(x)&=13.292x^5-30.953x^4+24.305x^3-7.5426x^2+1.2268x\\ R^2_a&= 0.998852551326878\\ b(x)&=1.8013x^5-9.5814x^4+16.499x^3-9.5113x^2-2.81x\\ R^2_b&= 0.999956141048395\\ v(x)&=15.248x^5-30.554x^4+16.784x^3-2.1894x^2+4.5712x\\ R^2_v&= 0.999991881854275\\ \end{align*} math

The polynomial of order 4 with the coefficient of order zero floating
math \\ \text{Coefficients of poly(4) was also calculated to obtain the solubility parameters, first without forcing the coefficient of order zero to be zero:} math


 * [[image:c4_R011515flo.png width="300" height="246"]] || [[image:e4_R011515flo.png width="300" height="246"]] || [[image:s4_R011515flo.png width="300" height="246"]] ||
 * [[image:a4_R011515flo.png width="300" height="246"]] || [[image:b4_R011515flo.png width="300" height="246"]] || [[image:v4_R011515flo.png width="300" height="246"]] ||

math \begin{align*} c(x)&=-0.8683x^4-1.242x^3+4.4123x^2-2.0718x-0.002972\\ R^2_c&= 0.998929123\\ e(x)&=5.845x^4-9.7051x^3+4.6989x^2-0.3633x-0.0070629\\ R^2_e&= 0.978199158\\ s(x)&=-8.0012x^4+14.141x^3-8.2251x^2+1.0911x-0.015399\\ R^2_s&= 0.990957548\\ a(x)&=2.2611x^4-4.6799x^3+2.7303x^2-0.014856x+0.015154\\ R^2_a&= 0.958191172\\ b(x)&=-5.1515x^4+12.737x^3-8.251x^2-2.9378x-0.0016853\\ R^2_b&= 0.99982576\\ v(x)&=7.5991x^4-16.583x^3+9.6875x^2+3.1186x+0.019979\\ R^2_v&= 0.99982558\\ \end{align*} math

The polynomial of order 4 with the coefficient of order zero force to be zero
math \text{Coefficents of poly(4) was also calculated to obtain the solubility parameters, forcing the coefficient of order zero to be zero:} math


 * [[image:c4_R011515FZ.png width="300" height="246"]] || [[image:e4_R011515FZ.png width="300" height="246"]] || [[image:s4_R011515FZ.png width="300" height="246"]] ||
 * [[image:a4_R011515FZ.png width="300" height="246"]] || [[image:b4_R011515FZ.png width="300" height="246"]] || [[image:v4_R011515FZ.png width="300" height="246"]] ||

math \begin{align*} c(x)&=-0.81158x^4-1.3744x^3+4.5172x^2-2.1039x\\ R^2_c&= 0.998944196\\ e(x)&=5.9798x^4-10.02x^3+4.9483x^2-0.43968x\\ R^2_e&= 0.989518172\\ s(x)&=-7.7073x^4+13.455x^3-7.6815x^2+0.92454x\\ R^2_s&= 0.995457566\\ a(x)&=1.9719x^4-4.0051x^3+2.1953x^2+0.14902x\\ R^2_a&= 0.990306339\\ b(x)&=-5.1194x^4+12.662x^3-8.1915x^2-2.956x\\ R^2_b&= 0.999954751\\ v(x)&=7.2178x^4-15.693x^3+8.9822x^2+3.3347x\\ R^2_v&= 0.999901916\\ \end{align*} math

Comparison for different fits of the coefficients forcing to pass by zero

 * [[image:Polfit_ces.png width="400" height="627"]] || [[image:Polfit_abv.png width="400" height="627"]] ||

Interactive section:
{ c(x)=0.004 -2.276x+5.457x^2-2.955x^3 [|click here to plot and play with c(x)] e(x)=-0.047 +1.073x-2.536x^2+1.935x^3 s(x)=0.042 -0.905x+1.757x^2-1.851x^3 a(x)=0.001 +0.558x-0.147x^2-0.113x^3 b(x)=0.037 -4.244x-1.791x^2+2.428x^3 v(x)=-0.037 +5.039x+0.135x^2-1.346x^3 }

**Rcode:**
This code will generate the coefficients of fits on a csv file, which will be use later on opensource software [|maxima] code setwd("C:/R") mydata = read.csv(file="R011515.csv",head=TRUE) fitc <- lm(c ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fite <- lm(e ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fits <- lm(s ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fita <- lm(a ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fitb <- lm(b ~ 0+ poly(X, 3, raw=TRUE),data=mydata) fitv <- lm(v ~ 0+ poly(X, 3, raw=TRUE),data=mydata) summary(fitc) str(summary(fitc)) write.table(fitc[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitc)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitc)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 * 1) write.table(summary(fitc)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

write.table(fite[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fite)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fite)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 * 1) write.table(summary(fite)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

write.table(fits[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fits)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fits)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 * 1) write.table(summary(fits)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

write.table(fita[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fita)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fita)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 * 1) write.table(summary(fita)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

write.table(fitb[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitb)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitb)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 * 1) write.table(summary(fitb)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

write.table(fitv[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitv)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") write.table(summary(fitv)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double") code
 * 1) write.table(summary(fitv)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

**Maxima code:**
Need it files: and the coefficients calculated by R :

code /* [wxMaxima batch file version 1] [ DO NOT EDIT BY HAND! ]*/ /* [ Created with wxMaxima version 13.04.2 ] */

/* [wxMaxima: input  start ] */ /* coefficients file name */ cfn: R011515; /*polynomial order */ PO: 3;

/* 2 column csv with first the 4 coefficients then R^2 and R^2adj for each */ m: read_matrix("C:/R/fit_poly3_011515_.csv", 'csv); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ for i:0 thru PO step 1 do c[i]:m[i+1,2]$ for i:0 thru PO step 1 do ldisplay(c[i]); for i:0 thru PO step 1 do e[i]:m[i+1+(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(e[i]); for i:0 thru PO step 1 do s[i]:m[i+1+2*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(s[i]); for i:0 thru PO step 1 do a[i]:m[i+1+3*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(a[i]); for i:0 thru PO step 1 do b[i]:m[i+1+4*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(b[i]); for i:0 thru PO step 1 do v[i]:m[i+1+5*(PO+3),2]$ for i:0 thru PO step 1 do ldisplay(v[i]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ c(x):=sum (c[i]*x^i, i, 0, PO); e(x):=sum (e[i]*x^i, i, 0, PO); s(x):=sum (s[i]*x^i, i, 0, PO); a(x):=sum (a[i]*x^i, i, 0, PO); b(x):=sum (b[i]*x^i, i, 0, PO); v(x):=sum (v[i]*x^i, i, 0, PO); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ mraw: read_matrix(concat("C:/R/",string(cfn),".csv"), 'csv); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ xc: [ [mraw[2,1],mraw[2,2]], [mraw[3,1],mraw[3,2]], [mraw[4,1],mraw[4,2]], [mraw[5,1],mraw[5,2]], [mraw[6,1],mraw[6,2]], [mraw[7,1],mraw[7,2]], [mraw[8,1],mraw[8,2]], [mraw[9,1],mraw[9,2]], [mraw[10,1],mraw[10,2]], [mraw[11,1],mraw[11,2]], [mraw[12,1],mraw[12,2]] ]; /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ plot2d(discrete, xc], c(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5, [legend, "c(x): exp. data", "c(x): poly(3) fit"], /*[label,["R^2=0.998038422",0.66,-0.2] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/c",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ xe: [ [mraw[2,1],mraw[2,3]], [mraw[3,1],mraw[3,3]], [mraw[4,1],mraw[4,3]], [mraw[5,1],mraw[5,3]], [mraw[6,1],mraw[6,3]], [mraw[7,1],mraw[7,3]], [mraw[8,1],mraw[8,3]], [mraw[9,1],mraw[9,3]], [mraw[10,1],mraw[10,3]], [mraw[11,1],mraw[11,3]], [mraw[12,1],mraw[12,3]] ]; /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ plot2d(discrete, xe], e(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5, [legend, "e(x): exp. data", "e(x): poly(3) fit"], /*[label,["R^2=0.944630971",0.66,0.4] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/e",string(PO),"_",string(cfn),".png")]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ xs: [ [mraw[2,1],mraw[2,4]], [mraw[3,1],mraw[3,4]], [mraw[4,1],mraw[4,4]], [mraw[5,1],mraw[5,4]], [mraw[6,1],mraw[6,4]], [mraw[7,1],mraw[7,4]], [mraw[8,1],mraw[8,4]], [mraw[9,1],mraw[9,4]], [mraw[10,1],mraw[10,4]], [mraw[11,1],mraw[11,4]], [mraw[12,1],mraw[12,4]] ]; /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ plot2d(discrete, xs], s(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5, [legend, "s(x): exp. data", "s(x): poly(3) fit"], /*[label,["R^2=0.981372471",0.66,-1] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file,concat("C:/R/s",string(PO),"_",string(cfn),".png")]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ xa: [ [mraw[2,1],mraw[2,5]], [mraw[3,1],mraw[3,5]], [mraw[4,1],mraw[4,5]], [mraw[5,1],mraw[5,5]], [mraw[6,1],mraw[6,5]], [mraw[7,1],mraw[7,5]], [mraw[8,1],mraw[8,5]], [mraw[9,1],mraw[9,5]], [mraw[10,1],mraw[10,5]], [mraw[11,1],mraw[11,5]], [mraw[12,1],mraw[12,5]] ]; /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ plot2d(discrete, xa], a(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5, [legend, "a(x): exp. data", "a(x): poly(3) fit"], /*[label,["R^2=0.986234862",0.66,0.05] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/a",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ xb: [ [mraw[2,1],mraw[2,6]], [mraw[3,1],mraw[3,6]], [mraw[4,1],mraw[4,6]], [mraw[5,1],mraw[5,6]], [mraw[6,1],mraw[6,6]], [mraw[7,1],mraw[7,6]], [mraw[8,1],mraw[8,6]], [mraw[9,1],mraw[9,6]], [mraw[10,1],mraw[10,6]], [mraw[11,1],mraw[11,6]], [mraw[12,1],mraw[12,6]] ]; /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ plot2d(discrete, xb], b(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5, [legend, "b(x): exp. data", "b(x): poly(3) fit"], /*[label,["R^2=0.999726899",0.66,-0.7] ],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/b",string(PO),"_",string(cfn),".png")]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ xv: [ [mraw[2,1],mraw[2,7]], [mraw[3,1],mraw[3,7]], [mraw[4,1],mraw[4,7]], [mraw[5,1],mraw[5,7]], [mraw[6,1],mraw[6,7]], [mraw[7,1],mraw[7,7]], [mraw[8,1],mraw[8,7]], [mraw[9,1],mraw[9,7]], [mraw[10,1],mraw[10,7]], [mraw[11,1],mraw[11,7]], [mraw[12,1],mraw[12,7]] ]; /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ plot2d(discrete, xv], v(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5, [legend, "v(x): exp. data", "v(x): poly(3) fit "], /*[label,["R^2=0.999529421",0.66,0.5]],*/ [xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"], [gnuplot_term, png],[gnuplot_out_file, concat("C:/R/v",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]); /* [wxMaxima: input  end   ] */

/* [wxMaxima: input  start ] */ c(x); e(x); s(x); a(x); b(x); v(x); /* [wxMaxima: input  end   ] */

/* Maxima can't load/batch files which end with a comment! */ "Created with wxMaxima"$

code

Results

 * Maximum Solubility**

Using the equation log_10 P = c+e**E**+s**S**+a**A**+b**B**+v**V**, the maximum solubility of compounds with known Abraham descriptors in any water-ethanol system can be predicted. The following code uses Python 2.7.6 to calculate maximum and minimum solubilities for a list of compounds;it also produces plots for each compound. The program uses sympy to find where the derivative of //equation// equals 0 and then substitutes those values into the original equation to find the max/min. It requires non-standard libraries including //joblib//, //numpy//, //sympy//, and //matplotlib//. The ``//x**2``//is required for// sympy// to properly parse exponents. The program uses the file with ";" as delimiters. [[file:[[file:newfile3.ods|Compounds]] is an updated version of ONSAD001 which includes the max/min solubilities for each compound using the 3rd order polynomial; it does not include duplicates or any compounds with unknown Abraham descriptors.

The following calculates the max/min of the compounds in ONSAD001 using 3rd, 4th, or 5th order polyfit. It also creates a graph of each compound and it's solubility. At the end it will create a historgrams showing the frequency of the maximums and then the minimums.

code format="python" from __future__ import print_function import joblib from joblib import Parallel, delayed import matplotlib.pyplot as plt import matplotlib.mlab as mlab import numpy as np from sympy import * import os from math import exp

polyfits=['5'] #for now it can only run one item at a time (e.g. '3'). If I can figure out why the program freezes my computer when there is more than one item, then it could run multiple polyfits at in sequence. for polyfit in polyfits: err=False Hash={} location={} #This is to change all the output files for appropriate fit. if polyfit=='3': cx='(-2.954445751*x**3+5.454132888*x**2-2.265884303*x)' ex='(1.622408565*x**3-1.954950801*x**2+0.753630794*x)' sx='(-1.550023723*x**3+1.216033377*x**2-0.613513212*x)' ax='(-0.166034064*x**3-0.081092064*x**2+0.542524049*x)' bx='(2.695058138*x**3-2.281640778*x**2-3.977643487*x)' vx='(-1.641156287*x**3+0.649768963*x**2+4.775062366*x)' print('doing 3') if polyfit=='4': cx='(-0.81158*x**4-1.3744*x**3+4.5172*x**2-2.1039*x)' ex='(5.9798*x**4-10.02*x**3+4.9483*x**2-0.43968*x)' sx='(-7.7073*x**4+13.455*x**3-7.6815*x**2+0.92454*x)' ax='(1.9719*x**4-4.0051*x**3+2.1953*x**2+0.14902*x)' bx='(-5.1194*x**4+12.662*x**3-8.1915*x**2-2.956*x)' vx='(7.2178*x**4-15.693*x**3+8.9822*x**2+3.3347*x)' print('doing 4') if polyfit=='5': cx='(-3.1921*x**5+7.0956*x**4-8.1733*x**3+6.8559*x**2-2.3628*x)' ex='(-3.2886*x**5+14.126*x**4-17.024*x**3+7.3576*x**2-0.70635*x)' sx='(-16.138*x**5+32.268*x**4-20.917*x**3+4.1418*x**2-0.38408*x)' ax='(13.292*x**5-30.953*x**4+24.305*x**3-7.5426*x**2+1.2268*x)' bx='(1.8013*x**5-9.5814*x**4+16.499*x**3-9.5113*x**2-2.81*x)' vx='(15.248*x**5-30.554*x**4+16.784*x**3-2.1894*x**2+4.5712*x)' print('doing 5') def replace(text,Echo,Sierra,Austin,Bravo,Victor): Replaces a Hash of characters dic={'cx':cx,'ex':ex,'sx':sx,'ax':ax,'bx':bx,'vx':vx,'Echo':Echo,'Sierra':Sierra,'Bravo':Bravo,'Austin':Austin,'Victor':Victor} for i, j in dic.iteritems: text = text.replace(i, j)           return text with open('./ONSAD001.csv', 'r') as f:       newfile=open('./newfile'+polyfit,'w') for descriptors in f:           descriptors=descriptors.strip('\n').strip('\'') descriptors=descriptors.split(';') if descriptors[0]=='key': print('True') print(descriptors[0]+';'+descriptors[1]+';'+descriptors[2]+';'+descriptors[3]+';'+descriptors[4]+';'+descriptors[5]+';'+descriptors[6]+                         ';'+descriptors[7]+';'+descriptors[8]+';'+descriptors[9]+';'+descriptors[10]+';'+descriptors[11]+';'+descriptors[12]+                          ';'+descriptors[13]+';'+descriptors[14]+';'+' Max Solubilty [0,1]; Min Solubility[0,1]; root[0]; root[1]; root[2]; root[3]',file=newfile) else: if descriptors[7]=='-123' or descriptors[8]=='-123' or descriptors[9]=='-123' or descriptors[10]=='-123' or descriptors[11]=='-123': pass else: if descriptors[4] in Hash: #descriptors[4] is the CSID value=Hash.get(descriptors[4]) #get the date from the CSID if value[14]<descriptors[14]: #descriptors[14] is the current compound. If the old date is less then just skipp it                               Hash.update({descriptors[4]:descriptors}) else: pass else: Hash.update({descriptors[4]:descriptors}) newfile.close newfile=open('./newfile'+polyfit,'a')

def run(csid): filterroots=[] OoR=[] newfile=open('./newfile'+polyfit,'a') predictions=[] values={} #print(descriptors) line=Hash.get(csid) #print(csid) #print(line) compound=line[1] Echo=line[7] #9-13 Sierra=line[8] Austin=line[9] Bravo=line[10] Victor=line[11] equation='cx+ex*Echo+sx*Sierra+ax*Austin+bx*Bravo+vx*Victor' equation=replace(equation,Echo,Sierra,Austin,Bravo,Victor) x=Symbol('x') Alpha=0 Omega=1 y=sympify(equation) y=simplify(y) #print(equation) #print(y) yprime=y.diff(x) #print(yprime) roots=solve(yprime,x) OoR.append(roots[0]) OoR.append(roots[1]) for item in roots: guess=complex(item).real if guess <= 1 and guess >=0: filterroots.append(complex(item).real) try: values.update({filterroots[0]:y.subs(x,filterroots[0])}) values.update({filterroots[1]:y.subs(x,filterroots[1])}) values.update({filterroots[2]:y.subs(x,filterroots[2])}) values.update({filterroots[3]:y.subs(x,filterroots[3])}) except: pass values.update({Alpha:y.subs(x,Alpha)}) values.update({Omega:y.subs(x,Omega)}) maximum=max(values.items, key=lambda x: x[1])[0] minimum=min(values.items, key=lambda x: x[1])[0] print(line[0],';',line[1],';',line[2],';',line[3],';',line[4],';',line[5],';',line[6],';',line[7],';',line[8],';',line[9],';',line[10],';',line[11],';',line[12],             ';',line[13],';',line[14],';',maximum,';',minimum,';',OoR[0],';',OoR[1],';','4',';','5', file=newfile) newfile.close

out='./Plots/compounds'+polyfit+'/'+str(csid)+'.png' for item in np.arange(0,1,.01): predictions.append(y.subs(x,item)) xp=np.linspace(0,1,100) _=plt.plot(xp,predictions) plt.ylim(float(min(predictions)*1.1),float(max(predictions)*1.1)) plt.ylabel('Log P') plt.xlabel('Vol') plt.title(compound) plt.gca.tick_params(axis='x', pad=15) try: plt.savefig(out) except OSError ,e: os.mkdirs('./Plots') plt.savfig(out) print('created Plots dir') plt.close try: parallel=Parallel(n_jobs=-1,verbose=1000)(delayed(run)(csid) for csid in Hash) except: pass do=['MAX','MIN'] # this will run for 'MAX' and then 'MIN'. for item in do: numbers=[] with open('./newfile'+polyfit,'r') as f:               print(item) try: for line in f:                           line=line.split(';') try: if item =='MAX': float(line[15])<2 numbers.append(float(line[15].strip('\n'))) if item =='MIN': float(line[16])<2 numbers.append(float(line[16].strip('\n'))) except: pass except: pass out='./'+item+'histogram'+polyfit+'.png' plt.xlabel('Vol') print('passed') plt.hist(numbers,bins=25) plt.gca.tick_params(axis='x', pad=15) plt.savefig(out) plt.show

code

The maximum solubilities were calculated for Hydrocotisone (0.761910624963585), Benzamide (0.805242993382671), Valdecoxib (0.906636072114208), Antipyrine (0.655966697948519), Aminopyrine (0.847692071360370) using the cubic polynomials. A complete list of the compounds can be found in the previously mentioned Compounds file.




 * [[image:LogSvaldecoxib.png width="300" height="246"]] || [[image:LogSHydrocortison.png width="300" height="246"]] || [[image:LogSbenzamide.png width="300" height="246"]] ||
 * [[image:LogSantipyrine.png width="300" height="246"]] || [[image:LogSbetulin.png width="300" height="246"]] || [[image:LogSriboflavin.png width="300" height="246"]] ||

Riboflavin is interesting because its maximum occurs at .2.

2212 compounds were used in making the following histogram. The histogram is not very surprising, but it quickly shows that most of the compounds have maximum solubilities near 1. There were approximately 150 compounds that have maximum solubilities of 0. The histogram on the left includes all compounds in 100 bins, and the histogram on the right only includes compounds in 25 bins.




 * 4th Order Polynomial Fit**

The 5th order polynomials fit produces similar results, but the histogram creating by rerunning the histogram is somewhat interesting because the slight curve in the 3rd order is gone and more compounds have maximums near 1. The histogram can be reproduced by running the parallelized.py first, and then histogram.py.
 * 5th Order Polynomial Fit**

The following links contain all the plots of the 3rd, 4th, and 5th order polynomials. On a technical note, they are .tar.gz; so they are compressed and archived. Some older versions of 7-Zip or WinRAR require that you extract them twice - first to the .tar file and then to the unarchived folder.







Differences between poly(3), poly(4), and poly(5) fits for the floating order zero coefficient and when we force the order zero coefficient to be zero.
The following plots shows the histograms of where in the volume ratio scale we have maximum solubility for ~ 2000 compounds. This plots was obtained by the python code compiled on a windows machine using [|Spyder] for interface. First row is for floating, second row is for force zero. The columns increase with the polynomial order.


 * [[image:MAXhistogram3_bin25_FLO.png width="300" height="246"]] || [[image:MAXhistogram4_bin25_FLO.png width="300" height="246"]] || [[image:MAXhistogram5_bin25_FLO.png width="300" height="246"]] ||
 * [[image:MAXhistogram3_bin25_FZ.png width="300" height="246"]] || [[image:MAXhistogram4_bin25_FZ.png width="300" height="246"]] || [[image:MAXhistogram5_bin25_FZ.png width="300" height="246"]] ||

Histograms of maximums and minimums for drug solubility versus polynomial fit

 * [[image:Histo_poly_min.png width="400" height="647"]] || [[image:Histo_poly_max.png width="400" height="647"]] ||

Comparison of Drug solubility maximums position versus polynomial fit

 * [[image:LogSantipyrine.png width="400"]] || [[image:LogSbenzamide.png width="400"]] ||

Can be observed on the following graph, the difference on prediction for maximum solubility as different polynomial order of approximation of the experimental data was used on the model. This difference on prediction can be as high of 15% for caffeine and less than 5% for 1,4 Dioxane, Rifapentine, Benzamide and Adenin. Negligible difference for Ammonia, Betulin, Hydrocortisone and Cimetidine.



The Jouyban-Acree model can be used to approximate the solubility of a compound in the water-ethanol system at various temperatures: math \begin{align*} log S_{M,T} &= (1-x) log S_{W,T} + x log S_{E,T} + \frac{x(1-x)}{T} (558.45 + 358.6 E + 22.01 S - 352.97 A + 130.48 B - 297.1 V)\\ &+ \frac{x(1-x)(1-2x)}{T} (45.67 - 165.77 E - 321.55 S + 479.48 A - 409.51 B + 827.63 V)\\ &+ \frac{x(1-x)(1-2x)^2}{T} (-493.81 - 341.32 E + 866.22 S - 36.17 A + 173.41 B - 555.48 V)\\ \end{align*} math where x is the volume fraction of ethanol and T is the temperature in degrees Kelvin. Using: math \begin{align*} T&=298.15 K\\ log P_M &= log S_M - log S_W\\ log P_E &= log S_E - log S_W\\ \end{align*} math
 * Comparison with the Jouyban-Acree Model **

We obtain... math \begin{align*} log P_M &= x log P_E + x(1-x) (1.873 + 1.203 E + 0.074 S - 1.184 A + 0.438 B - 0.996 V)\\ &+ x(1-x)(1-2x) (0.153 - 0.556 E - 1.078 S + 1.608 A - 1.374 B + 2.776 V)\\ &+ x(1-x)(1-2x)^2 (-1.656 - 1.145 E + 2.905 S - 0.121 A + 0.582 B - 1.863 V)\\ \end{align*} math

Expanding in terms of x gives: math \begin{align*} c(x)&=6.625 x^4 -12.944 x^3+5.949x^2+0.592x\\ e(x)&=4.579x^4-10.270x^3+6.189x^2-0.027x\\ s(x)&=-11.621x^4+21.086x^3-11.365x^2+0.866x\\ a(x)&=0.485x^4+2.246x^3-3.034x^2+0.629x\\ b(x)&=-2.326x^4+1.906x^3+0.775x^2-3.950x\\ v(x)&=7.452x^4-9.353x^3+1.984x^2+3.773x\\ \end{align*} math

Comparison with our model, where in this case we have let c= 0: math \begin{align*} c(x)&=-0.812x^4-1.374x^3+4.517x^2-2.104x\\ e(x)&=5.980x^4-10.020x^3+4.948x^2-0.440x\\ s(x)&=-7.707x^4+13.455x^3-7.682x^2+0.925x\\ a(x)&=1.972x^4-4.005x^3+2.195x^2+0.149x\\ b(x)&=-5.119x^4+12.662x^3-8.192x^2-2.956x\\ v(x)&=7.218x^4-15.693x^3+8.982x^2+3.335x\\ \end{align*} math

We see that the Jouyban-Acree model restricted to T=298.15 K ....


 * [[image:cx_compp_.png width="300" height="246"]] || [[image:ex_compp.png width="300" height="246"]] || [[image:sx_compp.png width="300" height="246"]] ||
 * [[image:ax_compp.png width="300" height="246"]] || [[image:bx_compp.png width="300" height="246"]] || [[image:vx_compp.png width="300" height="246"]] ||

The coefficients are function of the temperature, but they are not very sensitive, assuming the coefficients for logP_E stay the same as we increase the temperature, which I doubt that is the case. :P


 * T=198 K || T=298 K || T=398 K ||
 * [[image:c_Jouyban-vs-DATA_R031015_198.png width="300" height="246"]] || [[image:c_Jouyban-vs-DATA_R031015_298.png width="300" height="246"]] || [[image:c_Jouyban-vs-DATA_R031015_398.png width="300" height="246"]] ||
 * [[image:e_Jouyban-vs-DATA_R031015_198.png width="300" height="246"]] || [[image:e_Jouyban-vs-DATA_R031015_298.png width="300" height="246"]] || [[image:e_Jouyban-vs-DATA_R031015_398.png width="300" height="246"]] ||
 * [[image:s_Jouyban-vs-DATA_R031015_198.png width="300" height="246"]] || [[image:s_Jouyban-vs-DATA_R031015_298.png width="300" height="246"]] || [[image:s_Jouyban-vs-DATA_R031015_398.png width="300" height="246"]] ||
 * [[image:a_Jouyban-vs-DATA_R031015_198.png width="300" height="246"]] || [[image:a_Jouyban-vs-DATA_R031015_298.png width="300" height="246"]] || [[image:a_Jouyban-vs-DATA_R031015_398.png width="300" height="246"]] ||
 * [[image:b_Jouyban-vs-DATA_R031015_198.png width="300" height="246"]] || [[image:b_Jouyban-vs-DATA_R031015_298.png width="300" height="246"]] || [[image:b_Jouyban-vs-DATA_R031015_398.png width="300" height="246"]] ||
 * [[image:v_Jouyban-vs-DATA_R031015_198.png width="300" height="246"]] || [[image:v_Jouyban-vs-DATA_R031015_298.png width="300" height="246"]] || [[image:v_Jouyban-vs-DATA_R031015_398.png width="300" height="246"]] ||

**References**
1. Abraham MH and Acree Jr. WE: Partition coefficients and solubilities of compounds in the water-ethanol system. J Solution Chem. doi: 10.1007/s10953-011-9719-x