Journal of Econometrics (2012)

A.M. Robert Taylor, Stephen J. Leybourne, David I. Harvey

This code implements the union of rejections decision rule of Harvey, Leybourne and Taylor. See the accompanying PDF document for details.

Econometrica (2006)

Halbert White, Raffaella Giacomini

This code implements a general framework for out-of-sample predictive ability testing and forecast selection when the model can be misspecified. It can be applied to different types of forecasts from both nested and non-nested models, using different estimation techniques, for a general loss function (chosen by the user). It accommodates both conditional and unconditional evaluation objectives. The null is H0: E( Loss(model A) - Loss(model B) ) = 0. The sign of the test statistic indicates which forecast performs better: a positive test statistic indicates that the model A forecast produces a larger average loss than the model B forecast (model B outperforms model ...

Statistical Analysis and Data Mining

Jacopo Soriano, Timothy Au, David Banks

Computational advertising uses information on web-browsing activity and additional covariates to select advertisements for display to the user. The statistical challenge is to develop methodology that matches ads to users who are likely to purchase the advertised product. These methods not only involve text mining, but also may draw upon additional modeling related to both the user and the advertisement. This paper reviews various aspects of text mining, including n-grams, topic modeling, and text networks, and discusses different strategies in the context of specific business models.

arXiv

Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi

These files contain the data used for constructing figure 2 in "The ADS in the Information Age - Impact on Discovery" (http://adsabs.harvard.edu/abs/2012opsa.book..253H, arXiv:1106.5644). This figure shows the fraction of ADS world usage (for specific regions) as a function of GDP per capita (all values normalized by their 1997 value). For specifics: see paper. The data have been uploaded as a JPG figure, plain text file and a TAR archive with files used by the graphing program DataGraph.

Bulletin of the Ecological Society of America

Deana D. Pennington, William K. Michener

The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source, scientific workflow application, Kepler. Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines. Kepler can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging "R" scripts with compiled "C" code, or facilitating remote, distributed execution of models. Using Kepler's graphical user interface, users simply select ...

Rice University Press

Good, Irving John, David Banks, and Eric P. Smith

This book collects the "Comments, Conjectures, & Conclusions" that I. J. Good wrote for the *Journal of Statistical Computation and Simulation*. These notes are a whimsical mixture of mathematics, statistics, social commentary and intelligent speculation. In the canon of notes by eminent mathematicians, they seem most comparable to *Littlewood's Miscellany*. We recommend that readers dip into them at leisure, rather than proceeding linearly or quickly. One wants to appreciate both their quirky donnish style and the buoyantly creative intellect that created them.

Stanford University (2013)

David Donoho, Matan Gavish; Coders: Matan Gavish, David Donoho

Coefficient determining the optimal location of the hard threshold for matrix denoising by Singular Values Hard Thresholding, when the noise level is known or unknown. See D. L. Donoho and M. Gavish, "The Optimal Hard Threshold for Singular Values is 4/sqrt(3)", http://arxiv.org/abs/1305.5870

IN: beta: aspect ratio m/n of the matrix to be denoised, 0<beta<=1 (beta may be a vector); sigma_known: 1 if the noise level is known, 0 if unknown.

OUT: coef: optimal location of the hard threshold, up to the median data singular value (sigma unknown) or up to sigma*sqrt(n) (sigma known); a vector of the same dimension as beta, where coef(i) is the coefficient corresponding ...
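As a rough illustration of what such a coefficient looks like, the sketch below evaluates the closed-form expression for the known-noise case and the cubic approximation for the unknown-noise case, both as reported in the cited Donoho and Gavish paper. It is written in Python rather than the original language of the compendium, the function names are hypothetical, and it handles a scalar beta only; it is not the authors' code.

```python
import math


def hard_threshold_coef_known_sigma(beta: float) -> float:
    """Coefficient lambda(beta); the threshold is lambda(beta) * sqrt(n) * sigma.

    Hypothetical helper name. Formula as stated in the cited paper.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    w = 8.0 * beta / (beta + 1.0 + math.sqrt(beta * beta + 14.0 * beta + 1.0))
    return math.sqrt(2.0 * (beta + 1.0) + w)


def hard_threshold_coef_unknown_sigma_approx(beta: float) -> float:
    """Cubic approximation of omega(beta), which multiplies the median
    data singular value. This is an approximation, not the exact
    coefficient (the exact value needs the Marchenko-Pastur median).
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must satisfy 0 < beta <= 1")
    return 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43


# For a square matrix (beta = 1) the known-sigma coefficient equals 4/sqrt(3).
print(hard_threshold_coef_known_sigma(1.0))
```

For beta = 1 the first function returns roughly 2.3094, which matches the 4/sqrt(3) in the paper's title.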

Ecological Modelling

Clément Calenge

The practical analysis of space use and habitat selection by animals is often a problem due to the lack of well-designed programs. I present here the “adehabitat” package for the R software, which offers basic GIS (Geographic Information System) functions, methods to analyze radio-tracking data and habitat selection by wildlife, and interfaces with other R packages. These tools can be downloaded freely on the internet. Because the functions of this package can be combined with other functions of R, “adehabitat” provides a powerful environment for the analysis of space and habitat use.

Stanford University (2013)

David Donoho, Andrea Montanari, Matan Gavish

The program provided calculates the asymptotic minimax MSE, and the asymptotic minimax tuning threshold, of matrix denoising by Singular Value Thresholding:

lim_{N->\infty} inf_lambda sup_{rank(X)<= M*rho} MSE ||Xhat_lambda - X||^2_F /MN ...

Here:

(*) Xhat_lambda is the Singular Value Thresholding denoiser (applying soft thresholding with threshold lambda to each singular value of the data)
(*) rho is the asymptotic rank fraction
(*) M/N -> beta (the asymptotic aspect ratio)
(*) X is an M-by-N matrix, M<=N
(*) ||.||_F denotes the Frobenius matrix norm (its square is the sum of squares of the matrix entries)
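The soft thresholding step named above can be sketched as follows. This is a minimal illustration in Python, assuming the singular values of the data matrix have already been computed; the function name is hypothetical, and the SVD and the reconstruction of the denoised matrix from the shrunk values are omitted.

```python
def soft_threshold(singular_values, lam):
    """Shrink each singular value toward zero by lam, clipping at zero."""
    if lam < 0:
        raise ValueError("threshold lam must be non-negative")
    return [max(s - lam, 0.0) for s in singular_values]


# Toy singular values (made up for illustration).
shrunk = soft_threshold([5.0, 2.5, 0.8], lam=1.0)
print(shrunk)  # [4.0, 1.5, 0.0]
```

Values below the threshold are set to zero, so the denoised matrix has rank at most the number of singular values exceeding lambda.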

Analytical & Bioanalytical Techniques

Mahendra Kumar Trivedi

Bile salt (BS) and proteose peptone (PP) are important biomacromolecules produced inside the human body. The objective of this study was to investigate the influence of biofield treatment on the physicochemical properties of BS and PP. The study was performed in two groups (control and treated). The control group remained untreated, and biofield treatment was given to the treated group. The control and treated BS and PP samples were characterized by particle size analyzer (PSA), Brunauer-Emmett-Teller (BET) analysis, differential scanning calorimetry (DSC), X-ray diffraction (XRD), and thermogravimetric analysis (TGA). PSA results showed increase in particle size (d50 and d99) of ...

F1000Research

Liliana Florea, Li Song, Steven L Salzberg

Software and associated data from the paper, including: ASprofile software (ASprofile.tar.gz); Exon skipping events with annotations (BodyMap.exon_skipping.tbl); Concatenated GTF transcript file for the 16 tissues (BodyMap.ens61.0.0.gtf.gz); (All) Events extracted from pairwise transcript comparisons (BodyMap.ens61.0.0.as.gz); Expression (FPKM) values for exon skipping events in the 16 tissues (BodyMap.fpkms.tar.gz); and Supporting summary data (Excel) for the alternative splicing analyses (BodyMap.xls).

PLoS ONE

Victoria Stodden, Peixuan Guo, Zhaokun Ma, Dmitri Zaykin

Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying journal data sharing policies by year (for both 2011 and 2012) for a referent set of 170 journals. We make a further contribution by evaluating code sharing policies, supplemental materials policies, and open access status for these 170 journals for each of 2011 and 2012. We build a predictive model of open data and code policy adoption as a function of impact factor and publisher and find higher impact journals more likely ...

Nature Biotechnology

Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, Lior Pachter

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation [1, 2, 3]. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, ...

F1000Research

Wang Liang, Zhao Kai Yong

This paper presents a novel method to predict the functions of amino acid sequences, based on statistical machine translation programs. To build the translation model, we use the “parallel corpus” concept. For instance, an English sentence “I love apples” and its corresponding French sentence “j’adore les pommes” are examples of a parallel corpus. Here we regard an amino acid sequence like “MTMDKSELVQKA” as a sentence in one language, and treat its functional description, such as “0005737 0006605 0019904” (Gene Ontology terms), as a sentence in another language. We select amino acid sequences and their corresponding functional descriptions in Gene Ontology terms to build the ...
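To make the parallel corpus idea concrete, the sketch below pairs the example sequence from the abstract with its Gene Ontology terms as a source and target "sentence". The abstract does not say how sequences are split into words, so tokenizing into single residues is purely an assumption here, and the function name is hypothetical.

```python
def to_parallel_pair(sequence, go_terms):
    """Return (source sentence, target sentence) for one training example.

    Assumption: each amino acid residue is one source-language token.
    """
    source = " ".join(sequence)   # e.g. "M T M ..."
    target = " ".join(go_terms)   # Gene Ontology term IDs as target tokens
    return source, target


pair = to_parallel_pair("MTMDKSELVQKA", ["0005737", "0006605", "0019904"])
print(pair)
# ('M T M D K S E L V Q K A', '0005737 0006605 0019904')
```

A list of such pairs, written to aligned source and target files, is the usual input format for statistical machine translation toolkits.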

Journal of Statistical Software

Maria Karlsson and Anita Lindmark

Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package ...

Journal of Econometrics (2011)

Andrew J. Patton

This code compares volatility forecasts from two models (A and B). It produces the t-statistics from Diebold–Mariano–West tests of equal predictive accuracy. The null is expressed as follows: H0: Loss(model A) - Loss(model B) = 0. The sign of each t-statistic indicates which forecast performs better for the corresponding loss function: a positive t-statistic indicates that the model A forecast produces a larger average loss than the model B forecast, while a negative sign indicates the opposite. The statistics are displayed for various values of the scale parameter b of the loss function (equation 24, page 252), chosen by the user. The cases ...
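The sign convention described above can be illustrated with a simplified t-statistic on the loss differential. This Python sketch is not the compendium's code: the function name and the toy loss values are hypothetical, and it uses the plain sample variance, whereas a full Diebold–Mariano–West test would typically use an autocorrelation-robust (HAC) variance estimate.

```python
import math


def dm_t_statistic(loss_a, loss_b):
    """t-statistic for H0: mean(loss_a - loss_b) = 0.

    Simplification: plain sample variance, no HAC correction.
    """
    if len(loss_a) != len(loss_b) or len(loss_a) < 2:
        raise ValueError("need two loss series of equal length >= 2")
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    return mean_d / math.sqrt(var_d / n)


# Toy per-period losses (made up for illustration).
losses_a = [1.2, 0.9, 1.5, 1.1]
losses_b = [1.0, 0.8, 1.1, 1.0]
t_stat = dm_t_statistic(losses_a, losses_b)
# Positive: model A has the larger average loss, so B performs better.
print("positive" if t_stat > 0 else "non-positive")
```

Swapping the two arguments flips the sign of the statistic, which is the behaviour the description relies on.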

American Journal of Political Science

James Honaker, Gary King

Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in these fields have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation. First, we ...

PLoS ONE

Marcel A. L. M. van Assen, Robbie C. M. van Aert, Michèle B. Nuijten, Jelte M. Wicherts, K. Brad Wray

De Winter and Happee [1] examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that “selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective” (p.4).

Using their scenario with a small to medium population effect ...

IEEE Transactions on Information Theory (2006)

Michael Elad

This code graphically illustrates the performance of various denoising algorithms, including IRLS (Iteratively Reweighted Least Squares) and three versions of shrinkage, namely simple, parallel and sequential. Their performance at the first and last iteration is gauged. For this, Kmax ortho-matrices of a given size (n) are randomly generated to construct the dictionary. The number of non-zeros, the strength of the noise and the number of iterations must also be specified. As random dictionaries are used, the figures may deviate slightly from the paper's figures. The computational time also varies and can easily exceed 10 minutes. For more information, ...

Bulletin of the Ecological Society of America

Yongtao Guan, Stephen M. Krone

WinSSS is a Windows-based program for simulating stochastic spatial models that are individual based, have discrete spatial structure, and run in continuous time. These are commonly referred to as Interacting Particle Systems (IPS) and asynchronously-updated Probabilistic Cellular Automata (PCA). Currently, WinSSS includes the following models:

1. Cyclic Resource-Species Model
2. Epidemic Model
3. Basic Contact Process
4. Multi-type Contact Process
5. General Rock-Scissor-Paper Model
6. Voter Model (Linear and Threshold)
7. Greenberg-Hastings Model
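As an illustration of the kind of model listed above, the following Python sketch simulates a basic contact process in continuous time. It is not WinSSS code: the one-dimensional ring lattice, the parameter names, and the event-by-event (Gillespie-style) update are all assumptions made for this example.

```python
import random


def simulate_contact_process(n_sites=50, birth_rate=2.0, t_max=5.0, seed=0):
    """Basic contact process on a ring of n_sites sites (assumed geometry).

    Each occupied site dies at rate 1 and, at rate birth_rate, tries to
    occupy a uniformly chosen nearest neighbour. Returns the final state.
    """
    rng = random.Random(seed)
    occupied = [False] * n_sites
    occupied[n_sites // 2] = True          # start from a single occupied site
    t = 0.0
    while True:
        active = [i for i, occ in enumerate(occupied) if occ]
        if not active:                      # extinction: nothing can happen
            break
        total_rate = len(active) * (1.0 + birth_rate)
        t += rng.expovariate(total_rate)    # waiting time to the next event
        if t > t_max:
            break
        site = rng.choice(active)
        if rng.random() < 1.0 / (1.0 + birth_rate):
            occupied[site] = False          # death event
        else:
            neighbour = (site + rng.choice((-1, 1))) % n_sites
            occupied[neighbour] = True      # birth (no effect if occupied)
    return occupied


final_state = simulate_contact_process()
print(sum(final_state), "occupied sites at the end of the run")
```

Because sites update one at a time at random exponential times, this is an asynchronous update scheme, in contrast to a synchronous cellular automaton where all sites change together.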

Future Generation Computer Systems

Ewa Deelman, Dennis Gannon, Matthew Shields, Ian Taylor

Scientific workflow systems have become a necessary tool for many applications, enabling the composition and execution of complex analysis on distributed resources. Today there are many workflow systems, often with overlapping functionality. A key issue for potential users of workflow systems is the need to be able to compare the capabilities of the various available tools. There can be confusion about system functionality and the tools are often selected without a proper functional analysis. In this paper we extract a taxonomy of features from the way scientists make use of existing workflow systems and we illustrate this feature ...

2004 IEEE International Conference on Robotics and Automation (IEEE Cat. No.04CH37508)

J. Lofberg

This paper introduces the MATLAB toolbox YALMIP and describes how it can be used to model and solve optimization problems typically occurring in systems and control theory. YALMIP is a free MATLAB toolbox, developed initially to model SDPs and solve them by interfacing external solvers. The toolbox makes development of optimization problems in general, and control oriented SDP problems in particular, extremely simple. In fact, learning 3 YALMIP commands is enough for most users to model and solve the optimization problems.
