Determining the optimal ethanol/water ratio to maximize solubility in ethanol/water cosolvent systems

Researchers: Andrew SID Lang, Enrique F Valderrama Araya, and Bryant G Wilson

Introduction

Predicting the solubility of a compound in the ethanol/water cosolvent system is important for certain organic syntheses and associated recrystallization procedures as well as drug delivery techniques for drugs that have poor aqueous solubilities. Here, building on the work by Abraham and Acree [1], we present models that can be used to find the optimal ethanol/water volume fractions to optimize solubilities.

Method

Abraham solvent coefficients, polynomial fit

Using the data from[ref] for the standard Abraham solvent coefficients [ref2], different polynomial orders where calculated using the statistical software R[ref?] and using Maxima-CAS[ref]. Were observed for the best fit all agree reasonably well to the same polynomial order poly(3), with the exception for the solvent coefficient, "e", which a polynomial of order 5 was need to fit it properly. The statistics variation on the reference data[ref2] in some cases R^2 is only of 0.75, which could justify the oscillation of the data and the poor fit for poly(3) to the solvent for "e". However more interesting is to be able to fit with a polynomial of 3rd order all the solvent coefficients equations, and as well what are the implications of this. We need to investigate the physics which make it more agree with poly(3) or even poly(5), but not for even order of polynomial fits.
The coefficients for the fit for each Abraham solvent coefficient are:



c3_R011515.png
e3_R011515.png
s3_R011515.png
a3_R011515.png
b3_R011515.png
v3_R011515.png

The polynomial fit of order 5

present better fit, but we need to add 2 more parameters per curve, a total of 12 new parameters when the solubility is calculated. The plot of the fit by polynomial of order 5 can be seen it below.

c5_R011515flo.png
e5_R011515flo.png
s5_R011515flo.png
a5_R011515flo.png
b5_R011515flo.png
v5_R011515flo.png


and the coefficients forcing to pass by zero are:


The polynomial of order 4 with the coefficient of order zero floating




c4_R011515flo.png
e4_R011515flo.png
s4_R011515flo.png
a4_R011515flo.png
b4_R011515flo.png
v4_R011515flo.png



The polynomial of order 4 with the coefficient of order zero force to be zero





c4_R011515FZ.png
e4_R011515FZ.png
s4_R011515FZ.png
a4_R011515FZ.png
b4_R011515FZ.png
v4_R011515FZ.png



Comparison for different fits of the coefficients forcing to pass by zero


Polfit_ces.png
Polfit_abv.png

Interactive section:

{
c(x)=0.004 -2.276x+5.457x^2-2.955x^3 click here to plot and play with c(x)
e(x)=-0.047 +1.073x-2.536x^2+1.935x^3
s(x)=0.042 -0.905x+1.757x^2-1.851x^3
a(x)=0.001 +0.558x-0.147x^2-0.113x^3
b(x)=0.037 -4.244x-1.791x^2+2.428x^3
v(x)=-0.037 +5.039x+0.135x^2-1.346x^3
}

Rcode:

This code will generate the coefficients of fits on a csv file R-fits , which will be use later on opensource software maxima
setwd("C:/R")
mydata = read.csv(file="R011515.csv",head=TRUE)
fitc <- lm(c ~ 0+ poly(X, 3, raw=TRUE),data=mydata)
fite <- lm(e ~ 0+ poly(X, 3, raw=TRUE),data=mydata)
fits <- lm(s ~ 0+ poly(X, 3, raw=TRUE),data=mydata)
fita <- lm(a ~ 0+ poly(X, 3, raw=TRUE),data=mydata)
fitb <- lm(b ~ 0+ poly(X, 3, raw=TRUE),data=mydata)
fitv <- lm(v ~ 0+ poly(X, 3, raw=TRUE),data=mydata)
summary(fitc)
str(summary(fitc))
# write.table(summary(fitc)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(fitc[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fitc)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fitc)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 
# write.table(summary(fite)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(fite[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fite)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fite)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 
# write.table(summary(fits)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(fits[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fits)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fits)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 
# write.table(summary(fita)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(fita[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fita)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fita)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 
# write.table(summary(fitb)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(fitb[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fitb)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fitb)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
 
# write.table(summary(fitv)$call, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(fitv[1], file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fitv)$r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")
write.table(summary(fitv)$adj.r.squared, file = "fit_poly3_011515.csv", sep = ",", append = T, col.names = T,qmethod = "double")

Maxima code:

Need it files: Rawdata and the coefficients calculated by R :coefficients of fits

/* [wxMaxima batch file version 1] [ DO NOT EDIT BY HAND! ]*/
/* [ Created with wxMaxima version 13.04.2 ] */
 
/* [wxMaxima: input   start ] */
/* coefficients file name */
cfn: R011515;
/*polynomial order */
PO: 3;
 
/* 2 column csv with first the 4 coefficients then R^2 and R^2adj for each */
m: read_matrix("C:/R/fit_poly3_011515_.csv", 'csv);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
for i:0 thru PO step 1 do c[i]:m[i+1,2]$
for i:0 thru PO step 1 do ldisplay(c[i]);
for i:0 thru PO step 1 do e[i]:m[i+1+(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(e[i]);
for i:0 thru PO step 1 do s[i]:m[i+1+2*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(s[i]);
for i:0 thru PO step 1 do a[i]:m[i+1+3*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(a[i]);
for i:0 thru PO step 1 do b[i]:m[i+1+4*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(b[i]);
for i:0 thru PO step 1 do v[i]:m[i+1+5*(PO+3),2]$
for i:0 thru PO step 1 do ldisplay(v[i]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
c(x):=sum (c[i]*x^i, i, 0, PO);
e(x):=sum (e[i]*x^i, i, 0, PO);
s(x):=sum (s[i]*x^i, i, 0, PO);
a(x):=sum (a[i]*x^i, i, 0, PO);
b(x):=sum (b[i]*x^i, i, 0, PO);
v(x):=sum (v[i]*x^i, i, 0, PO);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
mraw: read_matrix(concat("C:/R/",string(cfn),".csv"), 'csv);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
xc: [
[mraw[2,1],mraw[2,2]],
[mraw[3,1],mraw[3,2]],
[mraw[4,1],mraw[4,2]],
[mraw[5,1],mraw[5,2]],
[mraw[6,1],mraw[6,2]],
[mraw[7,1],mraw[7,2]],
[mraw[8,1],mraw[8,2]],
[mraw[9,1],mraw[9,2]],
[mraw[10,1],mraw[10,2]],
[mraw[11,1],mraw[11,2]],
[mraw[12,1],mraw[12,2]]
];
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
plot2d([[discrete, xc], c(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "c(x): exp. data", "c(x): poly(3) fit"],
/*[label,["R^2=0.998038422",0.66,-0.2] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/c",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
xe: [
[mraw[2,1],mraw[2,3]],
[mraw[3,1],mraw[3,3]],
[mraw[4,1],mraw[4,3]],
[mraw[5,1],mraw[5,3]],
[mraw[6,1],mraw[6,3]],
[mraw[7,1],mraw[7,3]],
[mraw[8,1],mraw[8,3]],
[mraw[9,1],mraw[9,3]],
[mraw[10,1],mraw[10,3]],
[mraw[11,1],mraw[11,3]],
[mraw[12,1],mraw[12,3]]
];
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
plot2d([[discrete, xe], e(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "e(x): exp. data", "e(x): poly(3) fit"],
/*[label,["R^2=0.944630971",0.66,0.4] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/e",string(PO),"_",string(cfn),".png")]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
xs: [
[mraw[2,1],mraw[2,4]],
[mraw[3,1],mraw[3,4]],
[mraw[4,1],mraw[4,4]],
[mraw[5,1],mraw[5,4]],
[mraw[6,1],mraw[6,4]],
[mraw[7,1],mraw[7,4]],
[mraw[8,1],mraw[8,4]],
[mraw[9,1],mraw[9,4]],
[mraw[10,1],mraw[10,4]],
[mraw[11,1],mraw[11,4]],
[mraw[12,1],mraw[12,4]]
];
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
plot2d([[discrete, xs], s(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "s(x): exp. data", "s(x): poly(3) fit"],
/*[label,["R^2=0.981372471",0.66,-1] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file,concat("C:/R/s",string(PO),"_",string(cfn),".png")]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
xa: [
[mraw[2,1],mraw[2,5]],
[mraw[3,1],mraw[3,5]],
[mraw[4,1],mraw[4,5]],
[mraw[5,1],mraw[5,5]],
[mraw[6,1],mraw[6,5]],
[mraw[7,1],mraw[7,5]],
[mraw[8,1],mraw[8,5]],
[mraw[9,1],mraw[9,5]],
[mraw[10,1],mraw[10,5]],
[mraw[11,1],mraw[11,5]],
[mraw[12,1],mraw[12,5]]
];
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
plot2d([[discrete, xa], a(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "a(x): exp. data", "a(x): poly(3) fit"],
/*[label,["R^2=0.986234862",0.66,0.05] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/a",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
xb: [
[mraw[2,1],mraw[2,6]],
[mraw[3,1],mraw[3,6]],
[mraw[4,1],mraw[4,6]],
[mraw[5,1],mraw[5,6]],
[mraw[6,1],mraw[6,6]],
[mraw[7,1],mraw[7,6]],
[mraw[8,1],mraw[8,6]],
[mraw[9,1],mraw[9,6]],
[mraw[10,1],mraw[10,6]],
[mraw[11,1],mraw[11,6]],
[mraw[12,1],mraw[12,6]]
];
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
plot2d([[discrete, xb], b(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "b(x): exp. data", "b(x): poly(3) fit"],
/*[label,["R^2=0.999726899",0.66,-0.7] ],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/b",string(PO),"_",string(cfn),".png")]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
xv: [
[mraw[2,1],mraw[2,7]],
[mraw[3,1],mraw[3,7]],
[mraw[4,1],mraw[4,7]],
[mraw[5,1],mraw[5,7]],
[mraw[6,1],mraw[6,7]],
[mraw[7,1],mraw[7,7]],
[mraw[8,1],mraw[8,7]],
[mraw[9,1],mraw[9,7]],
[mraw[10,1],mraw[10,7]],
[mraw[11,1],mraw[11,7]],
[mraw[12,1],mraw[12,7]]
];
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
plot2d([[discrete, xv], v(x)], [x,0,1],[style,[points,4,2,1], [lines,2,5]],
[legend, "v(x): exp. data", "v(x): poly(3) fit "],
/*[label,["R^2=0.999529421",0.66,0.5]],*/
[xlabel, "Vol fraction, Ethanol-Water mixtures"],[ylabel, "arbitrary units"],
[gnuplot_term, png],[gnuplot_out_file, concat("C:/R/v",string(PO),"_",string(cfn),".png")],[gnuplot_preamble,"set key bottom"]);
/* [wxMaxima: input   end   ] */
 
/* [wxMaxima: input   start ] */
c(x);
e(x);
s(x);
a(x);
b(x);
v(x);
/* [wxMaxima: input   end   ] */
 
/* Maxima can't load/batch files which end with a comment! */
"Created with wxMaxima"$
 
 

Results

Maximum Solubility

Using the equation log_10 P = c+eE+sS+aA+bB+vV, the maximum solubility of compounds with known Abraham descriptors in any water-ethanol system can be predicted. The following code uses Python 2.7.6 to calculate maximum and minimum solubilities for a list of compounds;it also produces plots for each compound. The program uses sympy to find where the derivative of equation equals 0 and then substitutes those values into the original equation to find the max/min. It requires non-standard libraries including joblib, numpy, sympy, and matplotlib. The //x**2is required for sympy// to properly parse exponents. The program uses the descriptors file with ";" as delimiters. [[file:Compounds is an updated version of ONSAD001 which includes the max/min solubilities for each compound using the 3rd order polynomial; it does not include duplicates or any compounds with unknown Abraham descriptors.

The following calculates the max/min of the compounds in ONSAD001 using 3rd, 4th, or 5th order polyfit. It also creates a graph of each compound and it's solubility. At the end it will create a historgrams showing the frequency of the maximums and then the minimums.

from __future__ import print_function
import joblib
from joblib import Parallel, delayed
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np
from sympy import *
import os
from math import exp
 
polyfits=['5'] #for now it can only run one item at a time (e.g. '3'). If I can figure out why the program freezes my computer when there is more than one item, then it could run multiple polyfits at in sequence.
for polyfit in polyfits:
    err=False
    Hash={}
    location={}
    #This is to change all the output files for appropriate fit.
    if polyfit=='3':
        cx='(-2.954445751*x**3+5.454132888*x**2-2.265884303*x)'
        ex='(1.622408565*x**3-1.954950801*x**2+0.753630794*x)'
        sx='(-1.550023723*x**3+1.216033377*x**2-0.613513212*x)'
        ax='(-0.166034064*x**3-0.081092064*x**2+0.542524049*x)'
        bx='(2.695058138*x**3-2.281640778*x**2-3.977643487*x)'
        vx='(-1.641156287*x**3+0.649768963*x**2+4.775062366*x)'
        print('doing 3')
    if polyfit=='4':
        cx='(-0.81158*x**4-1.3744*x**3+4.5172*x**2-2.1039*x)'
        ex='(5.9798*x**4-10.02*x**3+4.9483*x**2-0.43968*x)'
        sx='(-7.7073*x**4+13.455*x**3-7.6815*x**2+0.92454*x)'
        ax='(1.9719*x**4-4.0051*x**3+2.1953*x**2+0.14902*x)'
        bx='(-5.1194*x**4+12.662*x**3-8.1915*x**2-2.956*x)'
        vx='(7.2178*x**4-15.693*x**3+8.9822*x**2+3.3347*x)'
        print('doing 4')
    if polyfit=='5':
        cx='(-3.1921*x**5+7.0956*x**4-8.1733*x**3+6.8559*x**2-2.3628*x)'
        ex='(-3.2886*x**5+14.126*x**4-17.024*x**3+7.3576*x**2-0.70635*x)'
        sx='(-16.138*x**5+32.268*x**4-20.917*x**3+4.1418*x**2-0.38408*x)'
        ax='(13.292*x**5-30.953*x**4+24.305*x**3-7.5426*x**2+1.2268*x)'
        bx='(1.8013*x**5-9.5814*x**4+16.499*x**3-9.5113*x**2-2.81*x)'
        vx='(15.248*x**5-30.554*x**4+16.784*x**3-2.1894*x**2+4.5712*x)'
        print('doing 5')
    def replace(text,Echo,Sierra,Austin,Bravo,Victor):
            '''Replaces a Hash of characters'''
            dic={'cx':cx,'ex':ex,'sx':sx,'ax':ax,'bx':bx,'vx':vx,'Echo':Echo,'Sierra':Sierra,'Bravo':Bravo,'Austin':Austin,'Victor':Victor}
            for i, j in dic.iteritems():
                    text = text.replace(i, j)
            return text
    with open('./ONSAD001.csv', 'r') as f:
        newfile=open('./newfile'+polyfit,'w')
        for descriptors in f:
            descriptors=descriptors.strip('\n').strip('\'')
            descriptors=descriptors.split(';')
            if descriptors[0]=='key':
                    print('True')
                    print(descriptors[0]+';'+descriptors[1]+';'+descriptors[2]+';'+descriptors[3]+';'+descriptors[4]+';'+descriptors[5]+';'+descriptors[6]+
                          ';'+descriptors[7]+';'+descriptors[8]+';'+descriptors[9]+';'+descriptors[10]+';'+descriptors[11]+';'+descriptors[12]+
                          ';'+descriptors[13]+';'+descriptors[14]+';'+' Max Solubilty [0,1]; Min Solubility[0,1]; root[0]; root[1]; root[2]; root[3]',file=newfile)
            else:
                    if descriptors[7]=='-123' or descriptors[8]=='-123' or descriptors[9]=='-123' or descriptors[10]=='-123' or descriptors[11]=='-123':
                        pass
                    else:
                        if descriptors[4] in Hash: #descriptors[4] is the CSID
                            value=Hash.get(descriptors[4]) #get the date from the CSID
                            if value[14]<descriptors[14]: #descriptors[14] is the current compound. If the old date is less then just skipp it
                                Hash.update({descriptors[4]:descriptors})
                            else:
                                pass
                        else:
                            Hash.update({descriptors[4]:descriptors})
    newfile.close()
    newfile=open('./newfile'+polyfit,'a')
 
    def run(csid):
        filterroots=[]
        OoR=[]
        newfile=open('./newfile'+polyfit,'a')
        predictions=[]
        values={}
        #print(descriptors)
        line=Hash.get(csid)
        #print(csid)
        #print(line)
        compound=line[1]
        Echo=line[7]  #9-13
        Sierra=line[8]
        Austin=line[9]
        Bravo=line[10]
        Victor=line[11]
        equation='cx+ex*Echo+sx*Sierra+ax*Austin+bx*Bravo+vx*Victor'
        equation=replace(equation,Echo,Sierra,Austin,Bravo,Victor)
        x=Symbol('x')
        Alpha=0
        Omega=1
        y=sympify(equation)
        y=simplify(y)
        #print(equation)
        #print(y)
        yprime=y.diff(x)
        #print(yprime)
        roots=solve(yprime,x)
        OoR.append(roots[0])
        OoR.append(roots[1])
        for item in roots:
            guess=complex(item).real
            if guess <= 1 and guess >=0:
                filterroots.append(complex(item).real)
        try:
            values.update({filterroots[0]:y.subs(x,filterroots[0])})
            values.update({filterroots[1]:y.subs(x,filterroots[1])})
            values.update({filterroots[2]:y.subs(x,filterroots[2])})
            values.update({filterroots[3]:y.subs(x,filterroots[3])})
        except:
            pass
        values.update({Alpha:y.subs(x,Alpha)})
        values.update({Omega:y.subs(x,Omega)})
        maximum=max(values.items(), key=lambda x: x[1])[0]
        minimum=min(values.items(), key=lambda x: x[1])[0]
        print(line[0],';',line[1],';',line[2],';',line[3],';',line[4],';',line[5],';',line[6],';',line[7],';',line[8],';',line[9],';',line[10],';',line[11],';',line[12],
              ';',line[13],';',line[14],';',maximum,';',minimum,';',OoR[0],';',OoR[1],';','4',';','5', file=newfile)
        newfile.close()
 
        out='./Plots/compounds'+polyfit+'/'+str(csid)+'.png'
        for item in np.arange(0,1,.01):
            predictions.append(y.subs(x,item))
        xp=np.linspace(0,1,100)
        _=plt.plot(xp,predictions)
        plt.ylim(float(min(predictions)*1.1),float(max(predictions)*1.1))
        plt.ylabel('Log P')
        plt.xlabel('Vol')
        plt.title(compound)
        plt.gca().tick_params(axis='x', pad=15)
        try:
            plt.savefig(out)
        except OSError ,e:
            os.mkdirs('./Plots')
            plt.savfig(out)
            print('created Plots dir')
        plt.close()
    try:
        parallel=Parallel(n_jobs=-1,verbose=1000)(delayed(run)(csid) for csid in Hash)
    except:
        pass
    do=['MAX','MIN'] # this will run for 'MAX' and then 'MIN'.
    for item in do:
            numbers=[]
            with open('./newfile'+polyfit,'r') as f:
                print(item)
                try:
                        for line in f:
                            line=line.split(';')
                            try:
                                if item =='MAX':
                                    float(line[15])<2
                                    numbers.append(float(line[15].strip('\n')))
                                if item =='MIN':
                                    float(line[16])<2
                                    numbers.append(float(line[16].strip('\n')))
                            except:
                                pass
                except:
                    pass
            out='./'+item+'histogram'+polyfit+'.png'
            plt.xlabel('Vol')
            print('passed')
            plt.hist(numbers,bins=25)
            plt.gca().tick_params(axis='x', pad=15)
            plt.savefig(out)
            plt.show()
 

The maximum solubilities were calculated for Hydrocotisone (0.761910624963585), Benzamide (0.805242993382671), Valdecoxib (0.906636072114208), Antipyrine (0.655966697948519), Aminopyrine (0.847692071360370) using the cubic polynomials. A complete list of the compounds can be found in the previously mentioned Compounds file.
Aminopyrine.pngHydrocotisone.pngBenzamide.png

Valdecoxib.pngAminopyrine.png

LogSvaldecoxib.png
LogSHydrocortison.png
LogSbenzamide.png
LogSantipyrine.png
LogSbetulin.png
LogSriboflavin.png


Riboflavin is interesting because its maximum occurs at .2.
1043.png

2212 compounds were used in making the following histogram. The histogram is not very surprising, but it quickly shows that most of the compounds have maximum solubilities near 1. There were approximately 150 compounds that have maximum solubilities of 0. The histogram on the left includes all compounds in 100 bins, and the histogram on the right only includes compounds in 25 bins.

histogram3.png
histogram3less1.png


4th Order Polynomial Fit



5th Order Polynomial Fit
The 5th order polynomials fit produces similar results, but the histogram creating by rerunning the histogram is somewhat interesting because the slight curve in the 3rd order is gone and more compounds have maximums near 1 . The histogram can be reproduced by running the parallelized.py first, and then histogram.py.
MAXhistogram5.png

The following links contain all the plots of the 3rd, 4th, and 5th order polynomials. On a technical note, they are .tar.gz; so they are compressed and archived. Some older versions of 7-Zip or WinRAR require that you extract them twice - first to the .tar file and then to the unarchived folder.








Differences between poly(3), poly(4), and poly(5) fits for the floating order zero coefficient and when we force the order zero coefficient to be zero.


The following plots shows the histograms of where in the volume ratio scale we have maximum solubility for ~ 2000 compounds. This plots was obtained by the python code compiled on a windows machine using Spyder for interface. First row is for floating, second row is for force zero. The columns increase with the polynomial order.

MAXhistogram3_bin25_FLO.png
MAXhistogram4_bin25_FLO.png
MAXhistogram5_bin25_FLO.png
MAXhistogram3_bin25_FZ.png
MAXhistogram4_bin25_FZ.png
MAXhistogram5_bin25_FZ.png

Histograms of maximums and minimums for drug solubility versus polynomial fit


Histo_poly_min.png
Histo_poly_max.png

Comparison of Drug solubility maximums position versus polynomial fit


LogSantipyrine.png
LogSbenzamide.png

Can be observed on the following graph, the difference on prediction for maximum solubility as different polynomial order of approximation of the experimental data was used on the model. This difference on prediction can be as high of 15% for caffeine and less than 5% for 1,4 Dioxane, Rifapentine, Benzamide and Adenin. Negligible difference for Ammonia, Betulin, Hydrocortisone and Cimetidine.

Poly_Prediction_Interesting_Compounds.png



Comparison with the Jouyban-Acree Model
The Jouyban-Acree model can be used to approximate the solubility of a compound in the water-ethanol system at various temperatures:

where x is the volume fraction of ethanol and T is the temperature in degrees Kelvin. Using:


We obtain...


Expanding in terms of x gives:


Comparison with our model, where in this case we have let c= 0:


We see that the Jouyban-Acree model restricted to T=298.15 K ....

cx_compp_.png
ex_compp.png
sx_compp.png
ax_compp.png
bx_compp.png
vx_compp.png

The coefficients are function of the temperature, but they are not very sensitive, assuming the coefficients for logP_E stay the same as we increase the temperature, which I doubt that is the case. :P

T=198 K
T=298 K
T=398 K
c_Jouyban-vs-DATA_R031015_198.png
c_Jouyban-vs-DATA_R031015_298.png
c_Jouyban-vs-DATA_R031015_398.png
e_Jouyban-vs-DATA_R031015_198.png
e_Jouyban-vs-DATA_R031015_298.png
e_Jouyban-vs-DATA_R031015_398.png
s_Jouyban-vs-DATA_R031015_198.png
s_Jouyban-vs-DATA_R031015_298.png
s_Jouyban-vs-DATA_R031015_398.png
a_Jouyban-vs-DATA_R031015_198.png
a_Jouyban-vs-DATA_R031015_298.png
a_Jouyban-vs-DATA_R031015_398.png
b_Jouyban-vs-DATA_R031015_198.png
b_Jouyban-vs-DATA_R031015_298.png
b_Jouyban-vs-DATA_R031015_398.png
v_Jouyban-vs-DATA_R031015_198.png
v_Jouyban-vs-DATA_R031015_298.png
v_Jouyban-vs-DATA_R031015_398.png

References

1. Abraham MH and Acree Jr. WE: Partition coefficients and solubilities of compounds in the water-ethanol system. J Solution Chem. doi: 10.1007/s10953-011-9719-x