E. Littmann, H. Ritter
A new incremental cascade network architecture has been presented in earlier work. This paper discusses the properties of such cascade networks and investigates their generalization abilities under the particular constraint of small data sets. The evaluation is done for cascade networks consisting of local linear maps, using the Mackey-Glass time series prediction task as a benchmark. Our results indicate that, to bring the potential of large networks to bear on the problem of extracting information from small data sets without running the risk of overfitting, deeply cascaded network architectures are more favorable than shallow, broad architectures that contain the same number of nodes.
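For readers unfamiliar with the benchmark, the following is a minimal sketch of how a Mackey-Glass series and a small set of prediction pairs might be generated. It integrates the delay differential equation dx/dt = a x(t - tau) / (1 + x(t - tau)^n) - b x(t) with a simple Euler scheme. The parameter values (a = 0.2, b = 0.1, n = 10, tau = 17), the step size, the initial history, the input lags, and the prediction horizon are commonly used choices assumed here for illustration; they are not taken from this paper, and the function names are hypothetical.

```python
from collections import deque


def mackey_glass(n_samples, tau=17.0, a=0.2, b=0.1, n=10,
                 dt=0.1, x0=1.2, sample_every=10):
    """Euler integration of the Mackey-Glass delay equation.

    All default parameter values are assumptions (commonly used settings),
    not values taken from the paper.
    """
    delay_steps = int(round(tau / dt))
    # Buffer holding x over the last tau time units; the initial history
    # is assumed constant and equal to x0.
    history = deque([x0] * delay_steps, maxlen=delay_steps)
    x = x0
    samples = []
    step = 0
    while len(samples) < n_samples:
        if step % sample_every == 0:
            samples.append(x)          # one sample per unit of time
        x_tau = history[0]             # oldest entry is x(t - tau)
        x_next = x + dt * (a * x_tau / (1.0 + x_tau ** n) - b * x)
        history.append(x)              # store x(t); the oldest entry drops out
        x = x_next
        step += 1
    return samples


def make_pairs(series, lags=(0, 6, 12, 18), horizon=6):
    """Build (input, target) pairs: past values at the given lags -> x(t + horizon).

    The lags and horizon are illustrative assumptions.
    """
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series) - horizon):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t + horizon])
    return inputs, targets


# A deliberately small data set, in the spirit of the small-sample setting.
series = mackey_glass(200)
inputs, targets = make_pairs(series)
```

Keeping `n_samples` small mimics the small-data regime discussed above; a predictor of any architecture could then be trained on `inputs` and `targets` and evaluated on a longer, independently generated stretch of the series.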