Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data

Joel E Saylor, K. E. Sundell, G. R. Sharman

Research output: Contribution to journalArticle

2 Citations (Scopus)

Abstract

This paper explores an inverse approach to the problem of characterizing sediment sources’ (“source” samples) age distributions based on samples from a particular depocenter (“sink” samples) using non-negative matrix factorization (NMF). It also outlines a method to determine the optimal number of sources to factorize from a set of sink samples (i.e., the optimum factorization rank). We demonstrate the power of this method by generating sink samples as random mixtures of known sources, factorizing them, and recovering the number of known sources, their age distributions, and the weighting functions used to generate the sink samples. Sensitivity testing indicates that similarity between factorized and known sources is positively correlated to 1) the number of sink samples, 2) the dissimilarity among sink samples, and 3) sink sample size. Specifically, the algorithm yields consistent, close similarity between factorized and known sources when the number of sink samples is more than ∼3 times the number of source samples, sink data sets are internally dissimilar (cross-correlation coefficient range >0.3, Kuiper V value range >0.35), and sink samples are well-characterized (>150–225 data points). However, similarity between known and factorized sources can be maintained while decreasing some of these variables if other variables are increased. Factorization of three empirical detrital zircon U–Pb data sets from the Book Cliffs, the Grand Canyon, and the Gulf of Mexico yields plausible source age distributions and weights. Factorization of the Book Cliffs data set yields five sources very similar to those recently independently proposed as the primary sources for Book Cliffs strata; confirming the utility of the NMF approach. The Grand Canyon data set exemplifies two general considerations when applying the NMF algorithm. First, although the NMF algorithm is able to identify source age distribution, additional geological details are required to discriminate between primary or recycled sources. Second, the NMF algorithm will identify the most basic elements of the mixed sink samples and so may subdivide sources that are themselves heterogeneous mixtures of more basic elements into those basic elements. Finally, application to a large Gulf of Mexico data set highlights the increased contribution from Appalachian sources during Cretaceous and Holocene time, potentially attributable to drainage reorganization. Although the algorithm reproduces known sources and yields reasonable sources for empirical data sets, inversions are inherently non-unique. Consequently, the results of NMF and their interpretations should be evaluated in light of independent geological evidence. The NMF algorithm is provided both as MATLAB code and a stand-alone graphical user interface for Windows and macOS (.exe and.app) along with all data sets discussed in this contribution.

Original languageEnglish (US)
Pages (from-to)46-58
Number of pages13
JournalEarth and Planetary Science Letters
Volume512
DOIs
StatePublished - Apr 15 2019
Externally publishedYes

Fingerprint

Factorization
factorization
Sediments
sediments
sinks
matrix
age structure
matrices
sediment
cliff
canyon
cliffs
Grand Canyon (AZ)
depocenter
Gulf of Mexico
zircon
Holocene
Graphical user interfaces
drainage
Application programs

Keywords

  • Book Cliffs
  • Grand Canyon
  • Gulf of Mexico
  • low rank approximation
  • mixture modeling
  • U–Pb

ASJC Scopus subject areas

  • Geophysics
  • Geochemistry and Petrology
  • Earth and Planetary Sciences (miscellaneous)
  • Space and Planetary Science

Cite this

Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data. / Saylor, Joel E; Sundell, K. E.; Sharman, G. R.

In: Earth and Planetary Science Letters, Vol. 512, 15.04.2019, p. 46-58.

Research output: Contribution to journalArticle

@article{596646bd5b1b43948872016f9a37890d,
title = "Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data",
abstract = "This paper explores an inverse approach to the problem of characterizing sediment sources’ (“source” samples) age distributions based on samples from a particular depocenter (“sink” samples) using non-negative matrix factorization (NMF). It also outlines a method to determine the optimal number of sources to factorize from a set of sink samples (i.e., the optimum factorization rank). We demonstrate the power of this method by generating sink samples as random mixtures of known sources, factorizing them, and recovering the number of known sources, their age distributions, and the weighting functions used to generate the sink samples. Sensitivity testing indicates that similarity between factorized and known sources is positively correlated to 1) the number of sink samples, 2) the dissimilarity among sink samples, and 3) sink sample size. Specifically, the algorithm yields consistent, close similarity between factorized and known sources when the number of sink samples is more than ∼3 times the number of source samples, sink data sets are internally dissimilar (cross-correlation coefficient range >0.3, Kuiper V value range >0.35), and sink samples are well-characterized (>150–225 data points). However, similarity between known and factorized sources can be maintained while decreasing some of these variables if other variables are increased. Factorization of three empirical detrital zircon U–Pb data sets from the Book Cliffs, the Grand Canyon, and the Gulf of Mexico yields plausible source age distributions and weights. Factorization of the Book Cliffs data set yields five sources very similar to those recently independently proposed as the primary sources for Book Cliffs strata; confirming the utility of the NMF approach. The Grand Canyon data set exemplifies two general considerations when applying the NMF algorithm. First, although the NMF algorithm is able to identify source age distribution, additional geological details are required to discriminate between primary or recycled sources. Second, the NMF algorithm will identify the most basic elements of the mixed sink samples and so may subdivide sources that are themselves heterogeneous mixtures of more basic elements into those basic elements. Finally, application to a large Gulf of Mexico data set highlights the increased contribution from Appalachian sources during Cretaceous and Holocene time, potentially attributable to drainage reorganization. Although the algorithm reproduces known sources and yields reasonable sources for empirical data sets, inversions are inherently non-unique. Consequently, the results of NMF and their interpretations should be evaluated in light of independent geological evidence. The NMF algorithm is provided both as MATLAB code and a stand-alone graphical user interface for Windows and macOS (.exe and.app) along with all data sets discussed in this contribution.",
keywords = "Book Cliffs, Grand Canyon, Gulf of Mexico, low rank approximation, mixture modeling, U–Pb",
author = "Saylor, {Joel E} and Sundell, {K. E.} and Sharman, {G. R.}",
year = "2019",
month = "4",
day = "15",
doi = "10.1016/j.epsl.2019.01.044",
language = "English (US)",
volume = "512",
pages = "46--58",
journal = "Earth and Planetary Sciences Letters",
issn = "0012-821X",
publisher = "Elsevier",

}

TY - JOUR

T1 - Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data

AU - Saylor, Joel E

AU - Sundell, K. E.

AU - Sharman, G. R.

PY - 2019/4/15

Y1 - 2019/4/15

N2 - This paper explores an inverse approach to the problem of characterizing sediment sources’ (“source” samples) age distributions based on samples from a particular depocenter (“sink” samples) using non-negative matrix factorization (NMF). It also outlines a method to determine the optimal number of sources to factorize from a set of sink samples (i.e., the optimum factorization rank). We demonstrate the power of this method by generating sink samples as random mixtures of known sources, factorizing them, and recovering the number of known sources, their age distributions, and the weighting functions used to generate the sink samples. Sensitivity testing indicates that similarity between factorized and known sources is positively correlated to 1) the number of sink samples, 2) the dissimilarity among sink samples, and 3) sink sample size. Specifically, the algorithm yields consistent, close similarity between factorized and known sources when the number of sink samples is more than ∼3 times the number of source samples, sink data sets are internally dissimilar (cross-correlation coefficient range >0.3, Kuiper V value range >0.35), and sink samples are well-characterized (>150–225 data points). However, similarity between known and factorized sources can be maintained while decreasing some of these variables if other variables are increased. Factorization of three empirical detrital zircon U–Pb data sets from the Book Cliffs, the Grand Canyon, and the Gulf of Mexico yields plausible source age distributions and weights. Factorization of the Book Cliffs data set yields five sources very similar to those recently independently proposed as the primary sources for Book Cliffs strata; confirming the utility of the NMF approach. The Grand Canyon data set exemplifies two general considerations when applying the NMF algorithm. First, although the NMF algorithm is able to identify source age distribution, additional geological details are required to discriminate between primary or recycled sources. Second, the NMF algorithm will identify the most basic elements of the mixed sink samples and so may subdivide sources that are themselves heterogeneous mixtures of more basic elements into those basic elements. Finally, application to a large Gulf of Mexico data set highlights the increased contribution from Appalachian sources during Cretaceous and Holocene time, potentially attributable to drainage reorganization. Although the algorithm reproduces known sources and yields reasonable sources for empirical data sets, inversions are inherently non-unique. Consequently, the results of NMF and their interpretations should be evaluated in light of independent geological evidence. The NMF algorithm is provided both as MATLAB code and a stand-alone graphical user interface for Windows and macOS (.exe and.app) along with all data sets discussed in this contribution.

AB - This paper explores an inverse approach to the problem of characterizing sediment sources’ (“source” samples) age distributions based on samples from a particular depocenter (“sink” samples) using non-negative matrix factorization (NMF). It also outlines a method to determine the optimal number of sources to factorize from a set of sink samples (i.e., the optimum factorization rank). We demonstrate the power of this method by generating sink samples as random mixtures of known sources, factorizing them, and recovering the number of known sources, their age distributions, and the weighting functions used to generate the sink samples. Sensitivity testing indicates that similarity between factorized and known sources is positively correlated to 1) the number of sink samples, 2) the dissimilarity among sink samples, and 3) sink sample size. Specifically, the algorithm yields consistent, close similarity between factorized and known sources when the number of sink samples is more than ∼3 times the number of source samples, sink data sets are internally dissimilar (cross-correlation coefficient range >0.3, Kuiper V value range >0.35), and sink samples are well-characterized (>150–225 data points). However, similarity between known and factorized sources can be maintained while decreasing some of these variables if other variables are increased. Factorization of three empirical detrital zircon U–Pb data sets from the Book Cliffs, the Grand Canyon, and the Gulf of Mexico yields plausible source age distributions and weights. Factorization of the Book Cliffs data set yields five sources very similar to those recently independently proposed as the primary sources for Book Cliffs strata; confirming the utility of the NMF approach. The Grand Canyon data set exemplifies two general considerations when applying the NMF algorithm. First, although the NMF algorithm is able to identify source age distribution, additional geological details are required to discriminate between primary or recycled sources. Second, the NMF algorithm will identify the most basic elements of the mixed sink samples and so may subdivide sources that are themselves heterogeneous mixtures of more basic elements into those basic elements. Finally, application to a large Gulf of Mexico data set highlights the increased contribution from Appalachian sources during Cretaceous and Holocene time, potentially attributable to drainage reorganization. Although the algorithm reproduces known sources and yields reasonable sources for empirical data sets, inversions are inherently non-unique. Consequently, the results of NMF and their interpretations should be evaluated in light of independent geological evidence. The NMF algorithm is provided both as MATLAB code and a stand-alone graphical user interface for Windows and macOS (.exe and.app) along with all data sets discussed in this contribution.

KW - Book Cliffs

KW - Grand Canyon

KW - Gulf of Mexico

KW - low rank approximation

KW - mixture modeling

KW - U–Pb

UR - http://www.scopus.com/inward/record.url?scp=85061405653&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061405653&partnerID=8YFLogxK

U2 - 10.1016/j.epsl.2019.01.044

DO - 10.1016/j.epsl.2019.01.044

M3 - Article

VL - 512

SP - 46

EP - 58

JO - Earth and Planetary Sciences Letters

JF - Earth and Planetary Sciences Letters

SN - 0012-821X

ER -