Date: 4-5pm EST on May 14th, 2021
Zoom link: https://psu.zoom.us/j/96312135796
Speaker: Hyebin Song (The Pennsylvania State University)
Title: Statistical inference for high-dimensional and large-scale data with noisy labels
Abstract: In many classification applications, we are presented with data whose labels are partially observed or contaminated. One example of such an application is the analysis of datasets from deep mutational scanning (DMS) experiments in proteomics, which typically do not contain non-functional sequences. In many of these settings, the problem of interest is high-dimensional, with the number of features substantially larger than the sample size. Moreover, depending on the experimental protocol, the rate of contamination is often unavailable, which further complicates the downstream analysis. In this talk, I will present both parametric and semi-parametric approaches for analyzing noisy, high-dimensional binary data. I will first demonstrate that when the rate of contamination is available, the noisy label model belongs to the class of generalized linear models with a non-canonical link, and optimal inference is possible despite the non-convex objective. I will then present a new semi-parametric approach based on hard-thresholding for the analysis of high-dimensional noisy-label data when prior knowledge of the contamination rate is unavailable. Finally, I will present an application of our methodology to inferring sequence-function relationships and designing highly stabilized enzymes based on large-scale DMS data.
Date: 4-5pm EST on April 16th, 2021 (Fri)
Zoom link: https://psu.zoom.us/j/96312135796
Speaker: Kiseop Lee (Purdue University)
Title: Data Science and Modern Financial Markets
Abstract: Since the celebrated Black-Scholes model emerged in 1973, quantitative approaches in finance have been state-of-the-art tools in financial markets. Recently, the development of high-frequency markets based on automated and algorithmic trading, together with the boom of data science in other areas, has led the financial industry to adapt to this new trend. In this talk, we briefly discuss the history of quantitative finance, the machine learning tools currently used in financial markets, and their challenges and promises.
It is disheartening to see the rise of xenophobia and racial intolerance in America in recent years. The difficulties from the pandemic and, more notably, the misinformation and false narratives spread about the COVID-19 virus certainly played a role in exacerbating anti-Asian sentiment and violence in the United States. However, we recognize that racism has always been deeply rooted within US society. The recent tragedy in Atlanta finally brought it to the attention of mainstream media; it is a tragedy and a disgrace to see such hatred on display. Our heartfelt condolences go out to the victims' families, relatives, and friends as they grieve the loss of their loved ones.
The Korean International Statistical Society (KISS) unequivocally condemns all violence and discrimination perpetrated against the Asian American and Pacific Islander (AAPI) communities. We remain committed to supporting the AAPI communities, and to working with other professional and scientific societies to cultivate a diverse, inclusive, equitable, and productive environment. We will continue to support our KISS members, our AAPI colleagues, and AAPI students to meet the challenges facing the AAPI communities and our nation.
MoonJung Cho, KISS President
Jae-Kwang Kim, KISS President-Elect
Don Jang, KISS Past President
On behalf of KISS Officers
Our sister societies have also made statements on anti-Asian racism, which can be accessed from the following link:
The first KISS webinar was held with great success. Thank you very much again for your participation! The seminar was recorded, and the video is now publicly available at the following link:
Speaker: Jae-Kwang Kim (Iowa State University)
Title: Statistical Inference after Kernel Ridge Regression Imputation under Item Nonresponse
Abstract: Kernel Ridge Regression (KRR) is a modern regression technique based on the theory of Reproducing Kernel Hilbert Spaces. We use KRR to develop imputation for handling item nonresponse. While KRR is potentially promising for imputation, its statistical properties have not been fully investigated in the literature. We first establish the root-n consistency of the KRR imputation estimator and show that it is optimal in the sense that it achieves the lower bound of the semiparametric asymptotic variance. A consistent variance estimator is also proposed through a novel application of the KRR estimator of the density ratio function. Results from a limited simulation study are also presented to confirm our theory.