EU-kalat: Difference between revisions

Latest revision as of 09:04, 17 March 2021

[show] This page is a study. The page identifier is Op_en3104
Moderator:Arja (see all)

Upload data {{#opasnet_base_link:Op_en3104}}

EU-kalat is a study, where concentrations of PCDD/Fs, PCBs, PBDEs and heavy metals have been measured from fish

Question

The scope of EU-kalat study was to measure concentrations of persistent organic pollutants (POPs) including dioxin (PCDD/F), PCB and BDE in fish from Baltic sea and Finnish inland lakes and rivers. ^[1] ^[2] ^[3].

Answer

The original sample results can be acquired from Opasnet base. The study showed that levels of PCDD/Fs and PCBs depends especially on the fish species. Highest levels were on salmon and large sized herring. Levels of PCDD/Fs exceeded maximum level of 4 pg TEQ/g fw multiple times. Levels of PCDD/Fs were correlated positively with age of the fish.

Mean congener concentrations as WHO2005-TEQ in Baltic herring can be printed out with this link or by running the codel below.

+ Show code - Hide code

# This code is Op_en on page [[EU-kalat]].

library(OpasnetUtils)
library(ggplot2)

eu <- Ovariable("eu", ddata = "Op_en3104", subset = "POPs")
eu <- EvalOutput(eu)
#levels(eu$POP)
eu <- eu[eu$Fish_species == "Baltic herring" , ]

tef <- Ovariable("tef", ddata = "Op_en4017", subset = "TEF values")
tef <- EvalOutput(tef)
colnames(eu@output)[colnames(eu@output) == "POP"] <- "Congener"
levels(eu$Congener) <- gsub("HCDD", "HxCDD", levels(eu$Congener))
levels(eu$Congener) <- gsub("HCDF", "HxCDF", levels(eu$Congener))
levels(eu$Congener) <- gsub("CoPCB", "PCB", levels(eu$Congener))
#levels(eu$Congener)

euteq <- eu * tef
#head(euteq@output)
temp <- aggregate(euteq@output["Result"], by = c(euteq@output["Congener"], euteq@output["Catch_location"]), FUN = mean)

temp2 <- temp
temp2$Result <- temp2$Result / sum(temp2$Result)

cat("Congener TEQ profile of Baltic herring.\n")
oprint(temp2)
 
ggplot(temp, aes(x = Congener, y = Result, fill = Congener)) + geom_bar(stat = "identity") + facet_wrap(~ Catch_location) + 
labs(x = "Congener", y = "TEQ concentration (pg/g fat)") + theme_gray(base_size = 18) + coord_flip()

temp2 <- temp[temp$Congener %in% levels(temp$Congener)[1:17],]

ggplot(temp2, aes(x = Congener, y = Result, fill = Congener)) + geom_bar(stat = "identity") + facet_wrap(~ Catch_location) + 
  labs(
    title="Dioxin concentrations in Baltic herring",
    x = "Congener",
    y = "TEQ concentration (pg/g fat)"
  ) + theme_gray(base_size = 18) + coord_flip()

Rationale

Data

Data was collected between 2009-2010. The study contains years, tissue type, fish species, and fat content for each concentration measurement. Number of observations is 285.

There is a new study EU-kalat 3, which will produce results in 2016.

Calculations

Preprocess

Preprocess model 22.2.2017 [4]
- Objects used in Benefit-risk assessment of Baltic herring and salmon intake
Model run 25.1.2017 [5]
Model run 22.5.2017 with new ovariables euRaw, euAll, euMain, and euRatio [6]
Model run 23.5.2017 with adjusted ovariables euRaw, eu, euRatio [7]
Model run 11.10.2017: Small herring and Large herring added as new species [8]
Model rerun 15.11.2017 because the previous stored run was lost in update [9]
Model run 21.3.2018: Small and large herring replaced by actual fish length [10]
Model run 26.3.2018 eu2 moved here [11]

See an updated version of preprocess code for eu on Health effects of Baltic herring and salmon: a benefit-risk assessment#Code for estimating TEQ from chinese PCB7

+ Show code - Hide code

# This is code Op_en3104/preprocess on page [[EU-kalat]]
library(OpasnetUtils)
library(ggplot2)
library(reshape2)

# Divide herring into small and large (as new "species")
euRaw <- Ovariable("euRaw", ddata = "Op_en3104", subset = "POPs")
#euRaw <- EvalOutput(euRaw)

#temp <- euRaw[euRaw$Fish_species == "Baltic herring" , ]
#temp$Fish_species <- as.factor(
#  ifelse(
#    as.numeric(as.character(temp$Length_mean_mm)) >= 170,
#    "Large herring",
#    "Small herring"
#  )
#)
#euRaw <- combine(euRaw, temp, name = "euRaw")

# levels(TEF$Group)
#[1] "Chlorinated dibenzo-p-dioxins" "Chlorinated dibenzofurans"    
#[3] "Mono-ortho–substituted PCBs"   "Non-ortho–substituted PCBs"   

indexguide <- Ovariable(data=data.frame( # This is needed to make these indices marginals in eu (which is in long format).
  Length = 1,
  Year = 1,
  Weight = 1,
  Large = 1,
  Result = 1
))

eu <- Ovariable(
  "eu",
  dependencies = data.frame(
    Name=c("euRaw", "TEF", "indexguide"),
    Ident=c(NA,"Op_en4017/initiate", NA)
  ),
  formula = function(...) {
    euRaw$Length<-as.numeric(as.character(euRaw$Length_mean_mm))
    euRaw$Year <- as.numeric(substr(euRaw$Catch_date, nchar(as.character(euRaw$Catch_date))-3,100))
    euRaw@marginal[colnames(euRaw@output) %in% c("Length","Year")] <- TRUE
#    euRaw$Weight<-as.numeric(as.character(euRaw$Weight_mean_g))
#    euRaw$Large <- euRaw$Length >= 170
    
    eu <- euRaw[,c(1:4, 10, 20, 21, 18)] # See below + Length, Year, Result
    colnames(eu@output)[1:5] <- c("THLcode", "Matrix", "Compound", "Fish", "N")
    
    temp <- oapply(eu * TEF, cols = "Compound", FUN = "sum")
    colnames(temp@output)[colnames(temp@output)=="Group"] <- "Compound"
    eu <- combine(eu, temp)
    
    eu$Compound <- factor( # Compound levels are ordered based on the data table on [[TEF]]
      eu$Compound, 
      levels = unique(c(levels(TEF$Compound), levels(eu$Compound)))
    )
    eu$Compound <- eu$Compound[,drop=TRUE]
    
    return(eu)
  }
)

eu2 <- Ovariable(
  "eu2",
  dependencies=data.frame(Name="eu", Ident = NA),
  formula = function(...) {

replaces <- list(
  c("Chlorinated dibenzo-p-dioxins", "PCDDF"),
  c("Chlorinated dibenzofurans", "PCDDF"),
  c("Mono-ortho-substituted PCBs", "PCB"),
  c("Non-ortho-substituted PCBs", "PCB")
)
eu2 <- eu
for(i in 1:length(replaces)) {
  levels(eu2$Compound)[levels(eu2$Compound)==replaces[[i]][1]] <- replaces[[i]][2]
}

eu2@marginal[colnames(eu2@output) %in% c("Length","Year")] <- TRUE # Indexguide should take care of this but it doesn't!
eu2 <- unkeep(eu2, prevresults = TRUE, sources = TRUE)
eu2 <- oapply(eu2, cols = "TEFversion", FUN = "sum") # Sums up dioxin+furan and non+monoortho. This goes wrong if > 1 TEFversion.
return(eu2)
}
)

euRatio <- Ovariable(
  "euRatio",
  dependencies = data.frame(Name=c("eu")),
  formula = function(...) {
    euRatio <- eu[
      eu$Compound == "2378TCDD" & eu$Matrix == "Muscle" & result(eu) != 0 , ] # Zeros cannot be used in ratio estimates
    euRatio$Compound <- NULL
    euRatio <- log10(eu / euRatio)@output
    euRatio <- euRatio[!euRatio$Compound %in% c("2378TCDD", "2378-TCDD", "TCDD") , ]
    return(euRatio)
  }
)

# Analysis: a few rows disappear here, as shown by numbers per fish species. Why?
# aggregate(eut@output, by = eut@output["Fish"], FUN = length)[[2]]
# [1] 1181  141  131   55  158   51   96   30   36   40  102   49   61  180 eu
# [1] 1173  141  131   55  158   51   96   27   36   40  102   49   53  180 eu/eut
# There are samples where TCDD concenctration is zero.
#eu@data[eu@data$POP == "2378TCDD" & eu@data$euResult == 0 , c(1,4)][[1]]
#eu@data[eu@data[[1]] %in% c("09K0585", "09K0698", "09K0740", "09K0748") , c(1,3,4,18)]
# Some rows disappear because there is no TCDD measurement to compare with:
#8 Baltic herrings, 8 sprats, 3 rainbow trouts.
# Conclusion: this is ok. Total 2292 rows.

################## Data for the main congeners and species only

#> unique(eu$Congener)
#[1] 2378TCDD     12378PeCDD   123478HCDD   123678HCDD   123789HCDD   1234678HpCDD
#[7] OCDD         2378TCDF     12378PeCDF   23478PeCDF   123478HCDF   123678HCDF  
#[13] 123789HCDF   234678HCDF   1234678HpCDF 1234789HpCDF OCDF         CoPCB77 ...     

# Remove the four PCDDFs with too little data (>70% BDL) and all non-PCDDF
# aggregate(eu@data$euResult, by = eu@data["POP"], FUN = function(x) mean(x == 0))

#[1] Baltic herring Sprat          Salmon         Sea trout      Vendace       
#[6] Roach          Perch          Pike           Pike-perch     Burbot        
#[11] Whitefish      Flounder       Bream          River lamprey  Cod           
#[16] Trout          Rainbow trout  Arctic char   Small herring   Large herring

indices <- list(
  Compound.TEQ2 = c("PCDDF", "PCB"),
  Compound.PCDDF14 = as.character(unique(euRaw$POP)[c(1:12, 14, 15)]), # 7 OCDD should be removed
  Fish.Fish16 = as.character(unique(euRaw$Fish_species)[c(1:4, 6:14, 17, 19, 20)])
)

#> indices
#$Compound.TEQ2
#[1] "PCDDF" "PCB"  
#
#$Compound.PCDDF14
# [1] "2378TCDD"     "12378PeCDD"   "123478HCDD"   "123678HCDD"   "123789HCDD"   "1234678HpCDD"
# [7] "OCDD"         "2378TCDF"     "12378PeCDF"   "23478PeCDF"   "123478HCDF"   "123678HCDF"  
#[13] "234678HCDF"   "1234678HpCDF"
#
#$Fish.Fish16
# [1] "Baltic herring" "Sprat"          "Salmon"         "Sea trout"      "Roach"         
# [6] "Perch"          "Pike"           "Pike-perch"     "Burbot"         "Whitefish"     
#[11] "Flounder"       "Bream"          "River lamprey"  "Rainbow trout"  "Small herring" 
#[16] "Large herring" 

#ggplot(euRaw@output, aes(x=Length, y=Weight))+geom_point()+
#  facet_wrap(~Fish_species, scales="free")

#oprint(summary(
#  euRaw[euRaw$Fish_species %in% c("Baltic herring","Salmon") & euRaw$POP == "2378TCDD" , ],
#  marginals=c("Catch_location","Large","Fish_species"),
#  fun = "length"
#))

objects.store(euRaw, eu, eu2, euRatio, indices, indexguide)
cat("Ovariables euRaw, eu, eu2, euRatio, and indexguide and list indices stored.\n")

Bayes model for dioxin concentrations

Model run 28.2.2017 [12]
Model run 28.2.2017 with corrected survey model [13]
Model run 28.2.2017 with Mu estimates [14]
Model run 1.3.2017 [15]
Model run 23.4.2017 [16] produces list conc.param and ovariable concentration
Model run 24.4.2017 [17]
Model run 19.5.2017 without ovariable concentration [18] ⇤--#: . The model does not mix well, so the results should not be used for final results. --Jouni (talk) 19:37, 19 May 2017 (UTC) (type: truth; paradigms: science: attack)

----#: . Maybe we should just estimate TEQs until the problem is fixed. --Jouni (talk) 19:37, 19 May 2017 (UTC) (type: truth; paradigms: science: comment)

Model run 22.5.2017 with TEQdx and TEQpcb as the only Compounds [19]
Model run 23.5.2017 debugged [20] [21] [22]
Model run 24.5.2017 TEQdx, TECpcb -> PCDDF, PCB [23]
Model run 11.10.2017 with small and large herring [24] (removed in update)
Model run 12.3.2018: bugs fixed with data used in Bayes. In addition, redundant fish species removed and Omega assumed to be the same for herring and salmon. [25]
Model run 22.3.2018 [26] Model does not mix well. Thinning gives little help?
Model run 25.3.2018 with conc.param as ovariable [27]

+ Show code - Hide code

# This is code Op_en3104/bayes on page [[EU-kalat]]

library(OpasnetUtils)
library(reshape2)
library(rjags) # JAGS
library(ggplot2)
library(MASS) # mvrnorm
library(car) # scatterplotMatrix

objects.latest("Op_en3104", code_name = "preprocess") # [[EU-kalat]] eu, eu2, euRatio, indices

conl <- indices$Compound.TEQ2
# fisl <- indices$Fish.Fish16
# fisl <- c("Baltic herring","Large herring","Salmon","Small herring")
fisl <- c("Baltic herring","Salmon")
C <- length(conl)
Fi <- length(fisl)
N <- 1000
conl
fisl

eu2 <- EvalOutput(eu2)

# Hierarchical Bayes model.

# PCDD/F concentrations in fish.
# It uses the TEQ sum of PCDD/F (PCDDF) as the total concentration
# of dioxin and PCB respectively for PCB in fish.
# PCDDF depends on size of fish, fish species, catchment time, and catchment area,
# but we omit catchment area. In addition, we assume that size of fish has
# zero effect for other fish than Baltic herring.
# Catchment year affects all species similarly. 

eu2 <- eu2[eu2$Compound %in% conl & eu2$Fish %in% fisl & eu2$Matrix == "Muscle" , ]
eu3 <- reshape( 
  eu2@output, 
  v.names = "euResult", 
  idvar = c("THLcode", "Fish"),
  timevar = "Compound", 
  drop = c("Matrix"), 
  direction = "wide"
)

oprint(head(eu3))

#> colnames(eu3)
#[1] "THLcode"        "Fish"           "N"              "Length"         "Year"          
#[6] "euResult.PCDDF" "euResult.PCB"  

conc <- data.matrix(eu3[6:ncol(eu3)])

if(FALSE){
  # Find the level of quantification for dinterval function
  LOQ <- unlist(lapply(eu3[6:ncol(eu3)], FUN = function(x) min(x[x!=0]))) 
  # With TEQ, there are no zeros. So this is useful only if there are congener-specific results.
  names(LOQ) <- conl
  
  
  conc <- sapply(
    1:length(LOQ),
    FUN = function(x) ifelse(conc[,x]==0, 0.5*LOQ[x], conc[,x])
  )
}
# This version of the model looks only at Baltic herring and salmon.
# It assumes that all fish groups have the same Omega but mu varies.

mod <- textConnection(
  "
  model{
  for(i in 1:S) { # S = fish sample
  #        below.LOQ[i,j] ~ dinterval(-conc[i,j], -LOQ[j])
  conc[i,1:C] ~ dmnorm(muind[i,], Omega[fis[i],,])
  muind[i,1:C] <- mu[fis[i],1:C] + lenp[fis[i]]*length[i] + timep*year[i]
  }
  
  # Priors for parameters
  # Time trend. Assumed a known constant because at the moment there is little time variation in data.
  # https://www.evira.fi/elintarvikkeet/ajankohtaista/2018/itameren-silakoissa-yha-vahemman-ymparistomyrkkyja---paastojen-rajoitukset-vaikuttavat/
  # PCDDF/PCB-concentations 2001: 9 pg/g fw, 2016: 3.5 pg/g fw. (3.5/9)^(1/15)-1=-0.06102282
  timep ~ dnorm(-0.0610, 10000) 
  lenp[1] ~ dnorm(0.01,0.01) # length parameter for herring
  lenp[2] ~ dnorm(0,10000) # length parameter for salmon: assumed zero
  
  for(i in 1:Fi) { # Fi = fish species
  Omega[i,1:C,1:C] ~ dwish(Omega0[1:C,1:C],S)
  pred[i,1:C] ~ dmnorm(mu[i,1:C]+lenp[i]*lenpred+timep*timepred, Omega[i,,]) # Model prediction.
  for(j in 1:C) { 
  mu[i,j] ~ dnorm(0, 0.0001) # mu1[j], tau1[j]) # Congener-specific mean for fishes
  }
  }
  }
  ")

jags <- jags.model(
  mod,
  data = list(
    S = nrow(eu3),
    C = C,
    Fi = Fi,
    conc = log(conc),
    length = eu3$Length-170, # Subtract average herring size
    year = eu3$Year-2009, # Substract baseline year
    fis = match(eu3$Fish, fisl),
    lenpred = 233-170,
    timepred = 2009-2009,
    Omega0 = diag(C)/100000
  ),
  n.chains = 4,
  n.adapt = 100
)

update(jags, 1000)

samps.j <- jags.samples(
  jags, 
  c(
    'mu', # mean by fish and compound
    'Omega', # precision matrix by compound
    'lenp',# parameters for length
    'timep', # parameter for Year
    'pred' # predicted concentration for year 2009 and length 17 cm
  ), 
  #  thin=1000,
  N
)
dimnames(samps.j$Omega) <- list(Fish = fisl, Compound = conl, Compound2 = conl, Iter=1:N, Chain=1:4)
dimnames(samps.j$mu) <- list(Fish = fisl, Compound = conl, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$lenp) <- list(Fish = fisl, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$pred) <- list(Fish = fisl, Compound = conl, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$timep) <- list(Dummy = "time", Iter = 1:N, Chain = 1:4)

##### conc.param contains expected values of the distribution parameters from the model

conc.param <- Ovariable(
  "conc.param",
  dependencies = data.frame(Name = "samps.j", Ident=NA),
  formula = function(...) {
    conc.param <- list(
      Omega = apply(samps.j$Omega, MARGIN = 1:3, FUN = mean),
      lenp = cbind(
        mean = apply(samps.j$lenp, MARGIN = 1, FUN = mean),
        sd = apply(samps.j$lenp, MARGIN = 1, FUN = sd)
      ),
      mu = apply(samps.j$mu, MARGIN = 1:2, FUN = mean),
      timep = cbind(
        mean = apply(samps.j$timep, MARGIN = 1, FUN = mean),
        sd = apply(samps.j$timep, MARGIN = 1, FUN = sd)
      )
    )
    names(dimnames(conc.param$lenp)) <- c("Fish","Metaparam")
    names(dimnames(conc.param$timep)) <- c("Dummy","Metaparam")
    conc.param <- melt(conc.param)
    colnames(conc.param)[colnames(conc.param)=="value"] <- "Result"
    colnames(conc.param)[colnames(conc.param)=="L1"] <- "Parameter"
    conc.param$Dummy <- NULL
    conc.param$Metaparam <- ifelse(is.na(conc.param$Metaparam), conc.param$Parameter, as.character(conc.param$Metaparam))
    return(Ovariable(output=conc.param, marginal=colnames(conc.param)!="Result"))
  }
)

objects.store(conc.param, samps.j)
cat("Lists conc.params and samps.j stored.\n")

######################3

cat("Descriptive statistics:\n")

# Leave only the main fish species and congeners and remove others

#oprint(summary(
#  eu2[eu2$Compound %in% indices$Compound.PCDDF14 & eu$Fish %in% fisl , ],
#  marginals = c("Fish", "Compound"), # Matrix is always 'Muscle'
#  function_names = c("mean", "sd")
#))

ggplot(eu2@output, aes(x = euResult, colour=Compound))+stat_ecdf()+
  facet_wrap( ~ Fish)+scale_x_log10()

ggplot(eu2@output, 
       aes(x = Length, y = euResult, colour=Compound))+geom_point()+
  facet_grid(Compound ~ Fish, scale="free_x")+scale_y_log10()

scatterplotMatrix(t(exp(samps.j$pred[1,,,1])), main = "Predictions for all compounds for Baltic herring")
scatterplotMatrix(t(exp(samps.j$pred[,1,,1])), main = "Predictions for all fish species for PCDDF")
scatterplotMatrix(t(samps.j$Omega[,1,1,,1]))
#scatterplotMatrix(t(cbind(samps.j$Omega[1,1,1,,1],samps.j$mu[1,1,,1])))

plot(coda.samples(jags, 'Omega', N))
plot(coda.samples(jags, 'mu', N))
plot(coda.samples(jags, 'lenp', N))
plot(coda.samples(jags, 'timep', N))
plot(coda.samples(jags, 'pred', N))

Initiate conc_pcddf for PFAS disease burden study

Initiate euw data.frame

This code is similar to preprocess but is better and includes PFAS concentrations from op_fi:PFAS-yhdisteiden tautitaakka. It produces data.frame euw that is the EU-kalat + PFAS data in wide format and, for PFAS but not EU-kalat, a sampled value for measurements below the level of quantification.

+ Show code - Hide code

# This is code Op_en3104/preprocess2 on page [[EU-kalat]]
library(OpasnetUtils)
library(ggplot2)
library(reshape2)

openv.setN(1)
opts = options(stringsAsFactors = FALSE)
euRaw <- Ovariable("euRaw", ddata = "Op_en3104", subset = "POPs") # [[EU-kalat]]

eu <- Ovariable(
  "eu",
  dependencies = data.frame(
    Name=c("euRaw", "TEF"),
    Ident=c(NA,"Op_en4017/initiate")
  ),
  formula = function(...) {
    out <- euRaw
    out$Length<-as.numeric(as.character(out$Length_mean_mm))
    out$Year <- as.numeric(substr(out$Catch_date, nchar(as.character(out$Catch_date))-3,100))
    out$Weight<-as.numeric(as.character(out$Weight_mean_g))

    out <- out[,c(1:6, 8: 10, 14:17, 19:22, 18)] # See below 
    
    #[1] "ﮮTHL_code"             "Matrix"                "POP"                   "Fish_species"         
    #[5] "Catch_site"            "Catch_location"        "Catch_season"          "Catch_square"         
    #[9] "N_individuals"         "Sex"                   "Age"                   "Fat_percentage"       
    #[13] "Dry_matter_percentage" "euRawSource"           "Length"                "Year"                 
    #[17] "Weight"                "euRawResult"          
    
    colnames(out@output)[1:13] <- c("THLcode", "Matrix", "Compound", "Fish", "Site", "Location", "Season",
                                   "Square","N","Sex","Age","Fat","Dry_matter")
    out@marginal <- colnames(out)!="euRawResult"
    
    tmp <- oapply(out * TEF, cols = "Compound", FUN = "sum")
    colnames(tmp@output)[colnames(tmp@output)=="Group"] <- "Compound"
    # levels(tmp$Compound)
    # [1] "Chlorinated dibenzo-p-dioxins" "Chlorinated dibenzofurans"     "Mono-ortho-substituted PCBs"  
    # [4] "Non-ortho-substituted PCBs"   
    levels(tmp$Compound) <- c("PCDD","PCDF","moPCB","noPCB")
    
    out <- OpasnetUtils::combine(out, tmp)
    
    out$Compound <- factor( # Compound levels are ordered based on the data table on [[TEF]]
      out$Compound, 
      levels = unique(c(levels(TEF$Compound), unique(out$Compound)))
    )
    out$Compound <- out$Compound[,drop=TRUE]
    
    return(out)
  }
)

eu <- EvalOutput(eu)

euw <- reshape( 
  eu@output, 
  v.names = "euResult", 
  idvar = c("THLcode", "Matrix", "Fish"), # , "Site","Location","Season","Square","N","Sex","Age","Fat", "Dry_matter","Length","Year","Weight"
  timevar = "Compound", 
  drop = c("euRawSource","TEFversion","TEFrawSource","TEFSource","Source","euSource"), 
  direction = "wide"
)
colnames(euw) <- gsub("euResult\\.","",colnames(euw))
euw$PCDDF <- euw$PCDD + euw$PCDF
euw$PCB <- euw$noPCB + euw$moPCB
euw$TEQ <- euw$PCDDF + euw$PCB
euw$PFOA <- euw$PFOA / 1000 # pg/g --> ng/g
euw$PFOS <- euw$PFOS / 1000 # pg/g --> ng/g
euw$PFAS <- euw$PFOA + euw$PFOS

#################### PFAS measurements from Porvoo

conc_pfas_raw <- EvalOutput(Ovariable(
  "conc_pfas_raw",
  data=opbase.data("Op_fi5932", subset="PFAS concentrations"), # [[PFAS-yhdisteiden tautitaakka]]
  unit="ng/g f.w.")
)@output

conc_pfas_raw <- reshape(conc_pfas_raw,
                         v.names="conc_pfas_rawResult",
                         timevar="Compound",
                         idvar=c("Obs","Fish"),
                         drop="conc_pfas_rawSource",
                         direction="wide")

colnames(conc_pfas_raw) <- gsub("conc_pfas_rawResult\\.","",colnames(conc_pfas_raw))
conc_pfas_raw <- within(conc_pfas_raw, PFAS <- PFOS + PFHxS + PFOA + PFNA)
conc_pfas_raw$Obs <- NULL

euw <- orbind(euw, conc_pfas_raw)

objects.store(euw)
cat("Data.frame euw stored.\n")

Initiate conc_param using Bayesian approach

Bayesian approach for PCDDF, PCB, OT, PFAS.

Model run 2021-03-08 [28]
Model run 2021-03-08 [29] with the fish needed in PFAS assessment
Model run 2021-03-12 [30] using euw
Model run 2021-03-13 [31] with location parameter for PFAS
Model run 2021-03-17 [32] location parameter not plotted because problems with older R version in Opasnet.

+ Show code - Hide code

# This is code Op_en3104/pollutant_bayes on page [[EU-kalat]]
# The code is also available at https://github.com/jtuomist/pfas/blob/main/conc_pcddf_preprocess.R

library(OpasnetUtils)
library(reshape2)
library(rjags) # JAGS
library(ggplot2)
library(MASS) # mvrnorm
library(car) # scatterplotMatrix

#' Find the level of quantification for dinterval function
#' @param df data.frame
#' @return data.matrix
add_loq <- function(df) { # This should reflect the fraction of observations below LOQ.
  LOQ <- unlist(lapply(df, FUN = function(x) min(x[x!=0], na.rm=TRUE))) 
  out <- sapply(
    1:length(LOQ),
    FUN = function(x) ifelse(df[,x]==0, 0.5*LOQ[x], df[,x])
  )
  out <- data.matrix(out)
  return(out)
}

#size <- Ovariable("size", ddata="Op_en7748", subset="Size distribution of fish species")
#time <- Ovariable("time", data = data.frame(Result=2015))

objects.latest("Op_en3104", code_name = "preprocess2") # [[EU-kalat]] euw

# Hierarchical Bayes model.

# PCDD/F concentrations in fish.
# It uses the TEQ sum of PCDD/F (PCDDF) as the total concentration
# of dioxin and PCB respectively for PCB in fish.
# PCDDF depends on size of fish, fish species, catchment time, and catchment area,
# but we omit catchment area. In addition, we assume that size of fish has
# zero effect for other fish than Baltic herring.
# Catchment year affects all species similarly. 

eu3 <- euw[!colnames(euw) %in% c("MPhT","DOT","BDE138")] # No values > 0

eu3 <- eu3[eu3$Matrix == "Muscle" , ]
eu3$Locat <- ifelse(eu3$Location=="Porvoo",2,
                       ifelse(eu3$Location=="Helsinki, Vanhankaupunginlahti Bay",3,1))
locl <- c("Finland","Porvoo","Helsinki")

#conl_nd <- c("PFAS","PFOA","PFOS","DBT","MBT","TBT","DPhT","TPhT")
conl_nd <- c("PFAS","PFOS") # TBT would drop Porvoo measurements
fisl <- fisl_nd <- c("Baltic herring","Bream","Flounder","Perch","Roach","Salmon","Whitefish")

eu4 <- eu3[rowSums(is.na(eu3[conl_nd]))<length(conl_nd) & eu3$Fish %in% fisl_nd ,
           c(1:5,match(c("Locat",conl_nd),colnames(eu3)))]

conc_nd <- add_loq(eu4[conl_nd])

conl <- c("TEQ","PCDDF","PCB") # setdiff(colnames(eu3)[-(1:5)], conl_nd)
eu3 <- eu3[!is.na(eu3$PCDDF) & eu3$Fish %in% fisl , c(1:5, match(conl,colnames(eu3)))]

oprint(head(eu3))
oprint(head(eu4))

C <- length(conl)
Fi <- length(fisl)
N <- 200
thin <- 100
conl
fisl
conl_nd
fisl_nd

eu3 <- eu3[rowSums(is.na(eu3))==0,]
conc <- add_loq(eu3[conl]) # Remove rows with missing data.

# The model assumes that all fish groups have the same Omega but mu varies.

mod <- textConnection(
  "
  model{
    for(i in 1:S) { # S = fish sample
      #        below.LOQ[i,j] ~ dinterval(-conc[i,j], -LOQ[j])
      conc[i,1:C] ~ dmnorm(muind[i,], Omega[fis[i],,])
      muind[i,1:C] <- mu[fis[i],1:C] #+ lenp[fis[i]]*length[i] + timep*year[i]
    }
    for(i in 1:S_nd) {
      for(j in 1:C_nd) {
        conc_nd[i,j] ~ dnorm(muind_nd[i,j], tau_nd[j])
        muind_nd[i,j] <- mu_nd[fis_nd[i],j] + mulocat[locat[i]] #+ lenp[fis[i]]*length[i] + timep*year[i]
      }
    }
    
    # Priors for parameters
    # Time trend. Assumed a known constant because at the moment there is little time variation in data.
    # https://www.evira.fi/elintarvikkeet/ajankohtaista/2018/itameren-silakoissa-yha-vahemman-ymparistomyrkkyja---paastojen-rajoitukset-vaikuttavat/
    # PCDDF/PCB-concentations 2001: 9 pg/g fw, 2016: 3.5 pg/g fw. (3.5/9)^(1/15)-1=-0.06102282
  #  timep ~ dnorm(-0.0610, 10000) 
  #  lenp[1] ~ dnorm(0.01,0.01) # length parameter for herring
  #  lenp[2] ~ dnorm(0,10000) # length parameter for salmon: assumed zero
    
    for(i in 1:Fi) { # Fi = fish species
      Omega[i,1:C,1:C] ~ dwish(Omega0[1:C,1:C],S)
      pred[i,1:C] ~ dmnorm(mu[i,1:C], Omega[i,,]) #+lenp[i]*lenpred+timep*timepred, Omega[i,,]) # Model prediction.
      for(j in 1:C) { 
        mu[i,j] ~ dnorm(0, 0.0001) # mu1[j], tau1[j]) # Congener-specific mean for fishes
      }
    }
    # Non-dioxins
    mulocat[1] <- 0
    mulocat[2] ~ dnorm(0,0.001)
    mulocat[3] ~ dnorm(0,0.001)
    for(j in 1:C_nd) {
      tau_nd[j] ~ dgamma(0.001,0.001)
      for(i in 1:Fi_nd) { # Fi = fish species
        pred_nd[i,j] ~ dnorm(mu[i,j], tau_nd[j])
        mu_nd[i,j] ~ dnorm(0, 0.0001)
      }
    }
  }
")

jags <- jags.model(
  mod,
  data = list(
    S = nrow(conc),
    S_nd = nrow(conc_nd),
    C = C,
    C_nd = ncol(conc_nd),
    Fi = Fi,
    Fi_nd = length(fisl_nd),
    conc = log(conc),
    conc_nd = log(conc_nd),
    locat = eu4$Locat,
    #    length = eu3$Length-170, # Subtract average herring size
    #    year = eu3$Year-2009, # Substract baseline year
    fis = match(eu3$Fish, fisl),
    fis_nd = match(eu4$Fish, fisl_nd),
    #    lenpred = 233-170,
    #    timepred = 2009-2009,
    Omega0 = diag(C)/100000
  ),
  n.chains = 4,
  n.adapt = 200
)

update(jags, 1000)

samps.j <- jags.samples(
  jags, 
  c(
    'mu', # mean by fish and compound
    'Omega', # precision matrix by compound
    #    'lenp',# parameters for length
    #    'timep', # parameter for Year
    'pred', # predicted concentration for year 2009 and length 17 cm
    'pred_nd',
    'mu_nd',
    'tau_nd',
    'mulocat'
  ), 
  thin=thin,
  N*thin
)
dimnames(samps.j$Omega) <- list(Fish = fisl, Compound = conl, Compound2 = conl, Iter=1:N, Chain=1:4)
dimnames(samps.j$mu) <- list(Fish = fisl, Compound = conl, Iter = 1:N, Chain = 1:4)
#dimnames(samps.j$lenp) <- list(Fish = fisl, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$pred) <- list(Fish = fisl, Compound = conl, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$pred_nd) <- list(Fish = fisl_nd, Compound = conl_nd, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$mu_nd) <- list(Fish = fisl_nd, Compound = conl_nd, Iter = 1:N, Chain = 1:4)
dimnames(samps.j$tau_nd) <- list(Compound = conl_nd, Iter = 1:N, Chain = 1:4)
#dimnames(samps.j$timep) <- list(Dummy = "time", Iter = 1:N, Chain = 1:4)
dimnames(samps.j$mulocat) <- list(Area = locl, Iter = 1:N, Chain = 1:4)

##### conc_param contains expected values of the distribution parameters from the model

conc_param <- list(
  Omega = apply(samps.j$Omega, MARGIN = 1:3, FUN = mean),
  #      lenp = cbind(
  #        mean = apply(samps.j$lenp, MARGIN = 1, FUN = mean),
  #        sd = apply(samps.j$lenp, MARGIN = 1, FUN = sd)
  #      ),
  mu = apply(samps.j$mu, MARGIN = 1:2, FUN = mean),
  #      timep = cbind(
  #        mean = apply(samps.j$timep, MARGIN = 1, FUN = mean),
  #        sd = apply(samps.j$timep, MARGIN = 1, FUN = sd)
  #      )
  mu_nd =  apply(samps.j$mu_nd, MARGIN = 1:2, FUN = mean),
  tau_nd =  apply(samps.j$tau_nd, MARGIN = 1, FUN = mean),
  mulocat = apply(samps.j$mulocat, MARGIN = 1, FUN = mean)
)
#    names(dimnames(conc_param$lenp)) <- c("Fish","Metaparam")
#    names(dimnames(conc_param$timep)) <- c("Dummy","Metaparam")

conc_param <- melt(conc_param)
colnames(conc_param)[colnames(conc_param)=="value"] <- "Result"
colnames(conc_param)[colnames(conc_param)=="L1"] <- "Parameter"
conc_param$Compound[conc_param$Parameter =="tau_nd"] <- conl_nd # drops out because one-dimensional
conc_param$Area[conc_param$Parameter =="mulocat"] <- locl # drops out because one-dimensional
conc_param <- fillna(conc_param,c("Fish","Area"))
for(i in 1:ncol(conc_param)) {
  if("factor" %in% class(conc_param[[i]])) conc_param[[i]] <- as.character(conc_param[[i]])
}
conc_param <- Ovariable("conc_param",data=conc_param)

objects.store(conc_param)
cat("Ovariable conc_param stored.\n")

######################3

cat("Descriptive statistics:\n")

# Leave only the main fish species and congeners and remove others

#oprint(summary(
#  eu2[eu2$Compound %in% indices$Compound.PCDDF14 & eu$Fish %in% fisl , ],
#  marginals = c("Fish", "Compound"), # Matrix is always 'Muscle'
#  function_names = c("mean", "sd")
#))

#tmp <- euw[euw$Compound %in% c("PCDDF","PCB","BDE153","PBB153","PFOA","PFOS","DBT","MBT","TBT"),]
#ggplot(tmp, aes(x = eu2Result, colour=Fish))+stat_ecdf()+
#  facet_wrap( ~ Compound, scales="free_x")+scale_x_log10()

dimnames(samps.j$mulocat)

scatterplotMatrix(t(exp(samps.j$pred[2,,,1])), main = paste("Predictions for several compounds for",
                                                            names(samps.j$pred[,1,1,1])[2]))
scatterplotMatrix(t(exp(samps.j$pred[,1,,1])), main = paste("Predictions for all fish species for",
                                                            names(samps.j$pred[1,,1,1])[1]))
scatterplotMatrix(t(samps.j$Omega[2,,1,,1]), main = "Omega for several compounds in Baltic herring")

scatterplotMatrix(t((samps.j$pred_nd[1,,,1])), main = paste("Predictions for several compounds for",
                                                            names(samps.j$pred_nd[,1,1,1])[1]))

#scatterplotMatrix(t((samps.j$mulocat[,,1])), main = paste("Predictions for location average difference",
#                                                            names(samps.j$pred_nd[,1,1,1])[1]))

#plot(coda.samples(jags, 'Omega', N))
plot(coda.samples(jags, 'mu', N*thin, thin))
#plot(coda.samples(jags, 'lenp', N))
#plot(coda.samples(jags, 'timep', N))
plot(coda.samples(jags, 'pred', N*thin, thin))
plot(coda.samples(jags, 'mu_nd', N*thin, thin))
plot(coda.samples(jags, 'mulocat', N*thin, thin))
tst <- (coda.samples(jags, 'pred', N))

Initiate conc_poll

+ Show code - Hide code

#This is code Op_en3104/conc_poll on page [[EU-kalat]]

library(OpasnetUtils)

conc_poll <- Ovariable(
  "conc_poll",
  dependencies = data.frame(
    Name=c("conc_param"), #,"lengt","time"),
    Ident=c("Op_en3104/pollutant_bayes")#,NA,NA)
  ),
  formula=function(...) {
    require(MASS)
    tmp1 <- conc_param + Ovariable(data=data.frame(Result="0-1")) # Ensures Iter #  lengt + time + 
    tmp2 <- unique(tmp1@output[setdiff(
      colnames(tmp1@output)[tmp1@marginal],
      c("Compound","Compound2","Metaparam","Parameter")
    )])
    tmp2$Row <- 1:nrow(tmp2)
    tmp3 <- merge(tmp2,tmp1@output)
    out <- data.frame()
    for(i in 1:nrow(tmp2)) {
      
      ############## PCDDF (with multivariate mvnorm)
      
      tmp <- tmp3[tmp3$Row == i , ]
      Omega <- solve(tapply(
        tmp$conc_paramResult[tmp$Parameter=="Omega"],
        tmp[tmp$Parameter=="Omega", c("Compound","Compound2")],
        sum # Equal to identity because only 1 row per cell.
      )) # Precision matrix
      con <- names(Omega[,1])
      
      mu <- tmp$conc_paramResult[tmp$Parameter=="mu"][match(con,tmp$Compound[tmp$Parameter=="mu"])] # + # baseline
#        rnorm(1,
#              tmp$conc_paramResult[tmp$Parameter=="lenp" & tmp$Metaparam=="mean"][1],
#              tmp$conc_paramResult[tmp$Parameter=="lenp" & tmp$Metaparam=="sd"][1]
#        ) * (tmp$lengtResult[1]-170) + # lengt
#        rnorm(1,
#              tmp$conc_paramResult[tmp$Parameter=="timep" & tmp$Metaparam=="mean"][1],
#              tmp$conc_paramResult[tmp$Parameter=="timep" & tmp$Metaparam=="sd"][1]
#        )* (tmp$timeResult[1]-2009) # time
      
      rnd <- exp(mvrnorm(1, mu, Omega))
      out <- rbind(out, merge(tmp2[i,], data.frame(Compound=con,Result=rnd)))
      
      #################### PFAS etc (with univariate norm)
      con <- tmp$Compound[tmp$Parameter=="mu_nd"]
      mu <- tmp$conc_paramResult[tmp$Parameter=="mu_nd"]
      tau <- tmp$conc_paramResult[tmp$Parameter=="tau_nd"][match(con,tmp$Compound[tmp$Parameter=="tau_nd"])]
      mulocat <- tmp$conc_paramResult[tmp$Parameter=="mulocat"]
      for(j in 1:length(con)) {
        rnd <- exp(rnorm(1 , mu[j] + mulocat , tau[j]))
        out <- rbind(out,
                     data.frame(tmp2[i,],Compound = con[j],Result = rnd)
                     )
      }
    }
    out$Row <- NULL
#    temp <- aggregate(
#      out["Result"],
#      by=out[setdiff(colnames(out), c("Result","Compound"))],
#      FUN=sum
#    )
#    temp$Compound <- "TEQ"
    out <- Ovariable(
      output = out, # rbind(out, temp),
      marginal = colnames(out) != "Result"
    )
    return(out)
  }
)

objects.store(conc_poll)
cat("Ovariable conc_poll stored.\n")

Initiate conc_pcddf for Goherr

Model run 19.5.2017 [33]
Model run 23.5.2017 with bugs fixed [34]
Model run 12.10.2017: TEQ calculation added [35]
Model rerun 15.11.2017 because the previous stored run was lost in update [36]
12.3.2018 adjusted to match the same Omega for all fish species [37]
26.3.2018 includes length and time as parameters, lengt ovariable initiated here [38]

+ Show code - Hide code

# This is code Op_en3104/initiate on page [[EU-kalat]]

library(OpasnetUtils)

lengt <- Ovariable(
  "lengt",
  dependencies=data.frame(Name="size",Ident=NA),
  formula = function(...) {
    size$Lower <- as.numeric(as.character(size$Lower))
    rep <- unique(size@output[
      colnames(size@output)!="Lower" & size@marginal
      ])
    out <- data.frame()
    name <- paste(size@name, "Result", sep="")
    for(j in 1:nrow(rep)) {
      siz <- numeric()
      tmp <- merge(rep[j,,drop=FALSE], size@output)
      tmp <- tmp[order(tmp$Lower),]
      num <- tmp[[name]]/sum(tmp[[name]])
      for(i in 1:(nrow(tmp)-1)) { # Pick at random from each bin
        siz <- c(siz,runif(
          openv$N,
          tmp$Lower[i],
          tmp$Lower[i+1]
        )[1:ceiling(num[i]*openv$N)])
      }
      out <- rbind(out, cbind(
        rep[j,,drop=FALSE],
        Iter=1:openv$N,
        Result=sample(siz, openv$N) # Take fixed amount and shuffle
      ))
    }
    return(Ovariable(output=out, marginal=!grepl("Result", colnames(out))))
  }
)

conc_pcddf <- Ovariable(
  "conc_pcddf",
  dependencies = data.frame(
    Name=c("conc.param","lengt","time"),
    Ident=c("Op_en3104/bayes",NA,NA)
  ),
  formula=function(...) {
    require(MASS)
    tmp1 <- lengt + time + conc.param + Ovariable(data=data.frame(Result="0-1")) # Ensures Iter
    tmp2 <- unique(tmp1@output[setdiff(
      colnames(tmp1@output)[tmp1@marginal],
      c("Compound","Compound2","Metaparam","Parameter")
    )])
    tmp2$Row <- 1:nrow(tmp2)
    tmp3 <- merge(tmp2,tmp1@output)
    out <- data.frame()
    con <- levels(tmp1$Compound)
    for(i in 1:nrow(tmp2)) {
      tmp <- tmp3[tmp3$Row == i , ]
      mu <- tmp$conc.paramResult[tmp$Parameter=="mu"][match(con,tmp$Compound[tmp$Parameter=="mu"])] + # baseline
        rnorm(1,
          tmp$conc.paramResult[tmp$Parameter=="lenp" & tmp$Metaparam=="mean"][1],
          tmp$conc.paramResult[tmp$Parameter=="lenp" & tmp$Metaparam=="sd"][1]
        )* (tmp$lengtResult[1]-170) + # lengt
        rnorm(1,
          tmp$conc.paramResult[tmp$Parameter=="timep" & tmp$Metaparam=="mean"][1],
          tmp$conc.paramResult[tmp$Parameter=="timep" & tmp$Metaparam=="sd"][1]
        )* (tmp$timeResult[1]-2009) # time
    
      Omega <- solve(tapply( # Is it sure that PCDDF and PCB are not mixed to wrong order?
        tmp$conc.paramResult[tmp$Parameter=="Omega"],
        tmp[tmp$Parameter=="Omega", c("Compound","Compound2")],
        sum # Equal to identity because only 1 row per cell.
      )) # Precision matrix
    
      rnd <- exp(mvrnorm(1, mu, Omega))
      out <- rbind(out, merge(tmp2[i,], data.frame(Compound=names(rnd),Result=rnd)))
    }
    out$Row <- NULL
    temp <- aggregate(
      out["Result"],
      by=out[setdiff(colnames(out), c("Result","Compound"))],
      FUN=sum
    )
    temp$Compound <- "TEQ"
    out <- Ovariable(
      output = rbind(out, temp),
      marginal = colnames(out) != "Result"
    )
    return(out)
  }
)

objects.store(conc_pcddf, lengt)
cat("Ovariables conc_pcddf, lengt stored.\n")

⇤--#: . These codes should be coherent with POPs in Baltic herring. --Jouni (talk) 12:14, 7 June 2017 (UTC) (type: truth; paradigms: science: attack)

References

↑ A. Hallikainen, H. Kiviranta, P. Isosaari, T. Vartiainen, R. Parmanne, P.J. Vuorinen: Kotimaisen järvi- ja merikalan dioksiinien, furaanien, dioksiinien kaltaisten PCB-yhdisteiden ja polybromattujen difenyylieettereiden pitoisuudet. Elintarvikeviraston julkaisuja 1/2004. [1]
↑ E-R.Venäläinen, A. Hallikainen, R. Parmanne, P.J. Vuorinen: Kotimaisen järvi- ja merikalan raskasmetallipitoisuudet. Elintarvikeviraston julkaisuja 3/2004. [2]
↑ Anja Hallikainen, Riikka Airaksinen, Panu Rantakokko, Jani Koponen, Jaakko Mannio, Pekka J. Vuorinen, Timo Jääskeläinen, Hannu Kiviranta. Itämeren kalan ja muun kotimaisen kalan ympäristömyrkyt: PCDD/F-, PCB-, PBDE-, PFC- ja OT-yhdisteet. Eviran tutkimuksia 2/2011. ISSN 1797-2981 ISBN 978-952-225-083-4 [3]

[eukalatpop-1] A. Hallikainen, H. Kiviranta, P. Isosaari, T. Vartiainen, R. Parmanne, P.J. Vuorinen: Kotimaisen järvi- ja merikalan dioksiinien, furaanien, dioksiinien kaltaisten PCB-yhdisteiden ja polybromattujen difenyylieettereiden pitoisuudet. Elintarvikeviraston julkaisuja 1/2004. [1]

[eukalatmetallit-2] E-R.Venäläinen, A. Hallikainen, R. Parmanne, P.J. Vuorinen: Kotimaisen järvi- ja merikalan raskasmetallipitoisuudet. Elintarvikeviraston julkaisuja 3/2004. [2]

[eukalat2-3] Anja Hallikainen, Riikka Airaksinen, Panu Rantakokko, Jani Koponen, Jaakko Mannio, Pekka J. Vuorinen, Timo Jääskeläinen, Hannu Kiviranta. Itämeren kalan ja muun kotimaisen kalan ympäristömyrkyt: PCDD/F-, PCB-, PBDE-, PFC- ja OT-yhdisteet. Eviran tutkimuksia 2/2011. ISSN 1797-2981 ISBN 978-952-225-083-4 [3]

[1]

[2]

[3]

EU-kalat: Difference between revisions

Latest revision as of 09:04, 17 March 2021

Contents

Question

Answer

Rationale

Data

Calculations

Preprocess

Bayes model for dioxin concentrations

Initiate conc_pcddf for PFAS disease burden study

Initiate euw data.frame

Initiate conc_param using Bayesian approach

Initiate conc_poll

Initiate conc_pcddf for Goherr

See also

References

Navigation menu