National Science Foundation Award #0604801 |
Cluster-Based Bootstrapping in Multiple Hypotheses Testing |
| Investigator(s): |
Jeffrey Hart (PI)
|
| Sponsor: |
Texas A&M Research Foundation, TX 77843, 979-845-3806
|
| Start Date/Expiration Date |
2006-06-01 to 2009-05-31 (amended 2006-04-28) |
| Awarded Amount to Date: |
$97,000 |
| Abstract: Cluster-Based Bootstrapping in Multiple Hypotheses Testing
The subject of this investigation is nonparametric methodology designed to provide sound inferences in large-scale multiple testing problems. In particular, situations are considered in which one observes a large number of small data sets and wishes to test as many hypotheses as there are data sets. A primary concern is ensuring validity of tests when the data generating mechanism is largely unknown. A fundamental model considered is one wherein the distribution of data within small data sets is the same, up to location and scale, for all data sets. This distribution is assumed to determine the sampling distributions of all test statistics, and hence, given an estimate of the within-data-sets distribution, the bootstrap can be used to estimate the requisite sampling distributions. Cluster analysis is investigated as a means of nonparametrically estimating the common within-data-sets distribution and also the joint distribution of location and scale parameters across data sets. Asymptotic properties of such estimates are investigated. These asymptotics allow the number of small data sets to tend to infinity, but bound the sizes of individual data sets. Extensions to models where the distributions across data sets differ with respect to more than just location and scale are also considered. A key idea in these extensions is defining and consistently estimating one or a small number of reference distributions that define critical values for all test statistics. This allows one to use existing technology to control the false discovery rate even though all test statistics have different sampling distributions, none of which can be estimated consistently.
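The location-scale idea described above can be illustrated with a minimal Python sketch. It is not the investigator's method: pooled standardized residuals are used here as a crude stand-in for the cluster-based estimate of the common within-data-sets distribution, and the Benjamini-Hochberg step-up rule is assumed as the "existing technology" for false discovery rate control. The function names `bootstrap_null_t`, `bootstrap_pvalues`, and `benjamini_hochberg` are hypothetical, and each hypothesis tested is that the location of a data set equals zero.

```python
import numpy as np

def bootstrap_null_t(data, n_boot=2000, seed=0):
    """Bootstrap the null distribution of the one-sample t statistic.

    data: (m, n) array holding m small data sets of size n each.
    Assumption: pooled standardized residuals approximate the common
    within-data-set distribution (a simplification, not the
    cluster-based estimator described in the abstract).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    mean = data.mean(axis=1, keepdims=True)
    sd = data.std(axis=1, ddof=1, keepdims=True)
    pooled = ((data - mean) / sd).ravel()  # centered, scaled residuals
    samples = rng.choice(pooled, size=(n_boot, n), replace=True)
    # The t statistic is scale invariant, so one shared null
    # distribution serves every data set under the location-scale model.
    return np.sqrt(n) * samples.mean(axis=1) / samples.std(axis=1, ddof=1)

def bootstrap_pvalues(data, t_null):
    """Two-sided p-values for H0: location = 0, one per data set."""
    n = data.shape[1]
    t_obs = np.sqrt(n) * data.mean(axis=1) / data.std(axis=1, ddof=1)
    exceed = (np.abs(t_null)[None, :] >= np.abs(t_obs)[:, None]).sum(axis=1)
    return (exceed + 1) / (t_null.size + 1)

def benjamini_hochberg(p, q=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject
```

A typical call would chain the three functions: `t_null = bootstrap_null_t(data)`, then `p = bootstrap_pvalues(data, t_null)`, then `benjamini_hochberg(p, q=0.05)`. Because each data set is small, standardized residuals only roughly follow the underlying error distribution, which is one reason a more careful estimator of the common distribution is of interest.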
Important areas of application for the research funded by this grant are microarray analysis and proteomics, both of which provide enormous insight into the study of genetics. The methods investigated have the potential of improving methods of analyzing microarray and proteomics data. Genetics has had and will continue to have a tremendous impact on society, particularly in the area of medicine. Therefore, any method that improves upon existing technology for analyzing genetics data has the potential of enhancing the general quality of life. |
|
| NSF Org: |
DMS - Division of Mathematical Sciences |
| Award Number: |
0604801 |
| Award Instrument: |
Standard Grant |
| Program Manager: |
Grace L. Yang
DMS Division of Mathematical Sciences
MPS Directorate for Mathematical & Physical Sciences
|
| NSF Program(s): |
STATISTICS |
| Field Application(s): |
Other Applications NEC |
| Program Reference Code(s): |
OTHER RESEARCH OR EDUCATION, OTHR |
| Program Element Code(s): |
1269 |