Proceedings of the SIS2009 Intermediate Conference

  

STATISTICAL METHODS FOR THE ANALYSIS OF LARGE DATA-SETS
Pescara, 23-25 September 2009

ISBN 978-88-6129-425-7
Publisher: CLEUP – Padova

 

Plenary session

Jerome H. Friedman
Fast Sparse Regression and Classification

Marco Riani
Problems and Challenges in the Analysis of Complex Data: Static and Dynamic Approaches

  • Statistical analysis of high-dimensional gene expression time series

Rainer Opgen‐Rhein
Efficient estimation of correlations and dependencies in high‐dimensional gene expression data

Claudia Angelini, Daniela De Canditiis, Marianna Pensky
Bayesian models for time-course microarray analysis: From genes’ detection to clustering

Maurice Berk, Giovanni Montana
Detecting differentially expressed genes in longitudinal microarray experiments

  • Methodological developments and data availability for integration of micro/macro and social network dimensions of sociodemographic behaviour

Giuseppe Micheli
Gestalt switches in the idea of context: A macro dimension of the world for every theory of action

Daniele Vignoli, Anna Matysiak
“Reconciling” micro and macro in family demography research: An update

Andrej Kveder
Generations and Gender Programme ‐ Micro‐macro data source on generational and gender ties

  • Bayesian networks and their application

Robert G. Cowell
Auto generation of large Bayesian networks for problems in forensic genetics

Paola Vicard
Applications of Bayesian networks in official statistics

Fabio Corradi
Some issues on the identification of relatives of individuals included in a database of DNA profiles

  • Financial modelling with large datasets

Wolfgang Härdle, Nikolaus Hautsch, Andrija Mihoci
Modelling and forecasting liquidity supply using semiparametric factor dynamics

Philip Saks, Dietmar Maringer
Evolutionary money management

Manfred Gilli, Enrico Schumann
Large‐scale portfolio optimisation with heuristics

  • Dealing with uncertainty in biomedical datasets

Jonathan M. Garibaldi
Fuzzy techniques for modelling uncertainty in medical data and knowledge

Federico Maria Stefanini
Prior beliefs about the structure of a probabilistic network

S. Cabras, N. Pirastu, L. Casula, M.E. Castellanos, L. Persico, A. Sassu, G. Biino, S.R. Del Giacco, M. Pirastu
An application of random forest and Hungarian method to genome-wide association of asthma in a genetic isolate of Ogliastra

  • Model‐based clustering of functional data

Abel Rodriguez
Nested functional clustering using basis representations

Simone Vantini
Joint clustering and alignment of functional data: An application to vascular geometries

Elvira Romano
Spatio‐Functional data analysis: Clustering methods for models discovering

  • Statistical methods for the analysis of social networks

Anuska Ferligoj
Clustering attribute and/or relational data

Giuseppe Giordano, Maria Prosperina Vitale
On the use of multidimensional data analysis techniques for social networks

Susanna Zaccarin
The analysis of collaboration networks: Methodological and applied issues

  • Geometry and statistics

John T. Kent
Investigating patterns in shape analysis

Charles C. Taylor
Directional data on the torus, with applications to protein structure

Luigi Ippoliti, Pasquale Valentini, Antonio Gattone
Statistical analysis of facial expressions

  • Large datasets in finance: High frequency data and volatility estimation

Rocco Mosconi
Stock prices and traded quantities: Evidence from ultra high frequency data

Eduardo Rossi, Paolo Santucci de Magistris
Long memory and tail dependence in trading volume and volatility

Fulvio Corsi, Davide Pirino, Roberto Renò
Threshold bipower variation and the impact of jumps on volatility forecasting

  • Methods and models for the collection and analysis of World Wide Web data

Maurizio Naldi
Statistical modelling of WWW traffic

Bruno Scarpa
Mining massive data sets from Web

Silvia Biffignandi, Jelke Bethlehem
Web surveys: Methodological problems and research perspectives

  • The treatment of large administrative data sets in the ESS

Martin Luppes, Fabienne Fortanier
Research on globalisation statistics Netherlands

Anne Berthomieu
Managing of huge administrative data: The example of EU external trade statistics

  • Space-time modeling for large datasets and distributed computing

Viktor P. Zastavnyi, Emilio Porcu
Covariance functions for massive space-time datasets

Michela Cameletti
Distributed computing for spatio‐temporal hierarchical models

Orietta Nicolis, Doug Nychka
Reduced rank covariances for the analysis of environmental data

  • Statistical analysis of international trade data and their use in policies

Domenico Perrotta, Francesca Torti
Dynamic visualization of outliers and mixtures in trade data through robust methods

Michael J. Ferrantino, Xuepeng Liu, Zhi Wang
Avoidance behaviors of exporters and importers: Evidence from the U.S.‐China trade data discrepancy

Vytis Kopustinskas, Spyros Arsenis
Risk analysis approaches to rank outliers in trade data

  • Large datasets and statistical selection problem

Alain Dessertaine
Data‐Stream and multi‐level statistical data recovery strategy: An application to data‐stream from electrical communicating meters

Christian Derquenne
Research of optimal hierarchy of ordinal explanatory variables to explain an ordinal response variable

Jean‐Michel Poggi, Robin Genuer, Christine Tuleau
Variable selection using random forests

  • Large data structures from high-throughput genomic experiments

Fabio Macciardi
Analytical methods to identify genes for complex traits in Genome-Wide Association Studies (GWAS)

R. Tagliaferri, I. Bifulco, F. Napolitano, G. Raiconi
Interactive analysis of genomic data: The meta clustering and consensus approach

Alberto Roverato, Robert Castelo
An application of the non‐rejection rate in meta‐analysis

  • Probabilistic and component-based approaches to structural equation modeling for the analysis of causality networks and multi‐block data structures

Arthur Tenenhaus
Kernel PLS path modeling

Giorgio Russolillo, Laura Trinchera, Vincenzo Esposito Vinzi
A non-linear regularized component-based approach to structural equation modeling

Concetto Elvio Bonafede
A Bayesian network for integration of risk assessment

  • Statistics in the omics era

Joaquim Pinto da Costa, Hugo Alonso, Luis Roque
A weighted principal component analysis and its application to microarray data

Lisete Sousa
Proteomics: Predicting protein structure

Giovani Silva, Manuela Oliveira, José Borges
Modelling and analysis of forest fire data in Portugal

  • Issues in large scale social surveys

Vijay Verma
Issues in design and implementation of large‐scale repeated social surveys

Dalit Contini
International surveys on students’ competences and the evaluation of educational systems

  • Analysis of functional data

Manuel Febrero‐Bande, Pedro Galeano, Wenceslao González‐Manteiga
Principal components selection for estimating the functional linear model with scalar response

Pedro Delicado
Dimensionality reduction when data are density functions

Amparo Baíllo, Juan Cuesta-Albertos, Antonio Cuevas
Some issues on supervised classification for Gaussian processes

  • Spatial econometrics: Methods and applications

Peter M. Robinson
Nonparametric regression with spatial data

Marco Bee, Giuseppe Espa
Estimating auto‐models with missing data

  • Visualization of large datasets in business and finance

Alfonso Iodice D’Enza, Michael J. Greenacre
Multiple correspondence analysis for quantification and visualization of huge categorical data

Domenico Vistocco, Claudio Conversano
Visualizing and clustering financial portfolios using internal compositions

Adalbert Wilhelm
Visual exploration of association rules

  • Statistical models for financial risk governance

Marcus Spies
Towards modelling operational risk in service networks

Paola Cerchiello, Concetto Elvio Bonafede
Semantic-based DCM models for text classification

Silvia Figini, Juan Tomas Sajago
Longitudinal models for market reputation and risk

  • Visualizing large datasets

Antony Unwin
Largely about largeness: Graphics of large datasets

Alexander Gribov
Visualization of microarray data

  • Complexity, information and uncertainty in the analysis of economic phenomena

Massimo Egidi
“Imprinting” and the dynamic of re‐categorizing in problem solving

N.T. Longford, M.G. Pittau
Multiple imputation tailored for a specific database

Luciano Pietronero
Self‐organization and finite size effects in agent models for financial markets

  • Quality of measurement means quality of data: The case of environmental assessment

Maurizio Caselli, Eleonora Andriani
Multivariate receptors model for environmental source apportionment

Maurizio Caselli, Livia Trizio, Gianluigi De Gennaro
A simple feed-forward neural network for PM10 forecasting: Comparison with a radial basis function network

Luigi Campanella, Marcelo Enrique Conti
Principal component analysis applied to biomonitoring studies: A case study

Luigi Campanella
The quality of the measurements and the quality of life

  • Contributed papers

Rosa Arboretti, Stefano Bonnini, Livio Corain, Francesca Solmi
A comparative study on multiple comparison procedures

Rosa Arboretti, Dario Basso, Federico Campigotto, Luigi Salmaso
Permutation tests for survival data analysis

Maria Felice Arezzo, Giorgio Alleva
Estimation of probability of undeclared employment in Italian building industry

Michael Ashbrook
Mirror outlier detection: An overview

Luigi Augugliaro, Angelo M. Mineo, Giuseppe Cammarata, Alessandra Santoro
Genetic network construction in CML gene expression profile data analysis

Filippo Belloc, Antonello Maruotti, Lea Petrella
University student performance analysis with non‐ignorable drop‐out

Laura Bocci, Isabella Mingo
Clustering large data sets: An applied comparative study

Andrea Bonanomi, Silvia Angela Osmetti
The Rasch model for victimization analysis: A proposal of an insecurity perception index

Riccardo Bramante, Luigi Santamaria
Capturing liquidity premia in high frequency financial time series

Giorgio Calzolari, Antonino Di Pino
Individual wage and reservation wage: Efficient estimation of a simultaneous equations model with endogenous limited dependent variables

Elisabetta Carfagna, Patrizia Tassinari, Maroussa Zagoraiou, Stefano Benni, Daniele Torreggiani
Efficient statistical sample designs in a GIS for monitoring the landscape changes

Nicoletta Cibella, Tiziana Tuoto
Statistical perspective on blocking methods when linking large data sets

Fabrizio Cipollini, Camilla Ferretti, Piero Ganugi
Mover-stayer model in a small industrial area: A first application

Alessandra Coli, Francesca Tartamella
Integrating household income microdata in the estimate of the Italian GDP

Cinzia Conti, Domenico Gabrielli, Antonella Guarneri, Enrico Tucci
Measuring foreigners’ migration flows: Traditional methods and social network analysis

Livio Corain, Fortunato Pesarin, Luigi Salmaso
Finite-sample consistency for non-associative statistics

Gianluca Cubadda, Barbara Guardabascio
On the use of partial least squares regression for forecasting large sets of cointegrated time series

Marusca De Castris, Guido Pellegrini
Evaluation of net spatial effects of public subsidies

Fabiola Del Greco M.
Applications of large deviations to hidden Markov chain estimation

Giovanni De Luca, Giorgia Rivieccio
Multivariate tail dependence coefficients

Claudia De Vitiis, Paolo Righi
Generalized estimation procedure for non-planned domains based on a modified GREG estimator

Giancarlo Diana, Pier Francesco Perri
Using auxiliary information for missing data

Loredana Di Consiglio, Marco Fortini
Capture recapture approach to correct for under‐coverage in the register supported 2011 Italian population census

Marco Di Marzio, Agnese Panzera, Charles C. Taylor
A note on density estimation for toroidal data

Antonino Di Pino, Patrizia Pulejo
Estimation of the university education effect on Italian graduates’ labour income

Valeria Edefonti, Giovanni Parmigiani
Combinatorial mixtures of multiparameter distributions: An application to microarray data

Marco Enea
Fitting linear models and generalized linear models with large data sets in R

Roberto Fontana, Fabio Rapallo, Maria Piera Rogantin
Sudoku grids. Designs and contingency tables

Roberto Fontana
Regional information systems for tourism data: The experience of Piedmont

Marco Fortini, Adriano Pareto
A mixture model for the identification of fraudulent interviewers in the Italian multipurpose surveys

Sara Giavante, Edmondo Di Giuseppe, Stanislao Esposito
Flat steps models for the analysis of temperature and precipitation Italian time series from 1961 to 2007

Francesco Giordano, Maria Lucia Parrella, Marialuisa Restaino
Detecting cycles in complex time series databases

Roberto Gismondi, Anna Rita Giorgi, Tiziana Pichiorri
The Hulliger’s criterion for managing outliers: New proposals and application to retail trade turnover

Paolo Giudici, Emanuela Raffinetti
Multivariate rank-based concordance indexes

Caterina Giusti, Stefano Marchetti, Monica Pratesi
Estimation of poverty indicators at the small area level in Italy

Daniele Imparato, Maria Governa, Mauro Gasparini
Groundwater resource monitoring via a transfer function approach

Letizia La Tona, Romana Gargano, Luca S. Scarlata
Nitrogen dioxide and carbon monoxide estimation in the metropolitan area of Catania

Paolo Mariani, Mauro Mussini, Biancamaria Zavanella
Tax systems simulation using integrated administrative data

Marina Marino, Francesco Palumbo, Cristina Tortora
Clustering in feature space for interesting pattern identification of categorical data

Esterina Masiello, Anne‐Laure Fougeres, Philippe Soulier
Estimation of a discrete spectral measure and actuarial applications

Simona Mastroluca
Short and long form: Planning the questionnaires in the new census strategy

Daria Mendola, Raffaele Scuderi
Harmonic growth: A new approach to the measurement of countries’ development

Massimo Mucciardi, Pietro Bertuccelli
Cluster and outlier analysis with GIS: An exploratory study on childbearing in Italy

Pierpaolo Napolitano
Marche Region Census Section Classification in Residential Typologies

Eugenia Nissi, Anna Lina Sarra, Sergio Palermi
Application of GWR for assessing spatial non‐stationarity of residential radon concentration: Evidence from Abruzzo data

Rossella Onorati, Paul D. Sampson, Peter Guttorp
Dimensionality reduction for large spatio‐temporal datasets based on SVD

Donald L. Pardew
Extreme skewness in service time distributions: Its threat to evaluations of material equivalence of service system performance using large sample control of type II error probabilities

Giuliana Passamani
Time series convergence within I(2) models: The case of daily long-term bond yields in the euro area

Claudio Quintano, Rosalia Castellano, Gennaro Punzo
Generational determinants on the employment choice in Italy

Claudio Quintano, Rosalia Castellano, Sergio Longobardi
The literacy divide: Territorial differences in the Italian education system

Agnese Rapposelli
Efficiency measurement using data envelopment analysis combined with principal component analysis

Pierre Ribereau, Armelle Guillou, Philippe Naveau
Generalized probability weighted moments in extreme value theory

Francesca Rinesi, Sabrina Prati, Claudia Iaccarino
Deterministic record linkage: A case study by using demographic data

Antonio Angelo Romano, Giuseppe Scandurra
Price asymmetries in the Italian retail and wholesale gasoline markets

Luca Salvati, Stefano Tersigni, Simona Ramberti
Climate monitoring in Italy: Design and analysis of ISTAT hydro‐meteorological database

Daria Scacciatelli
Ensemble support vector regression: A new non‐parametric approach for multiple imputation

D. Soria, F. Ambrogi, P. Boracchi, J.M. Garibaldi, E. Biganzoli
Application of affinity propagation on a large breast cancer data set

Domenico Summo, Tommaso Pepe
Value creation and business performance: An empirical analysis in the manufacturing sector

Agostino Tarsitano
Classification of multiple short time series

Nikos Tzavidis, Nicola Salvati
An M-quantile random effects model for hierarchical and repeated measures data

Domenico Vitale, Massimo Bilancia
Unit roots, Granger causality and global warming: Bridging the gap between econometric theory and climatology

Agata Zirilli, Angela Alibrandi, Maria Teresa Naso Onofrio
A non-parametric approach for comparison between two hepatocellular carcinoma markers: Alpha-fetoprotein (AFP) and insulin-like growth factor II (IGF-II)

Francesco Zirilli
The analysis of electric power price data using maximum likelihood and a multiscale stochastic volatility model

Daniela Zugna, Bianca De Stavola, Ronald Geskus, Kholoud Porter
Complexity of large observational studies: Missing data, competing risks, associati