Fully convolutional networks for structural health monitoring through multivariate time series classification

We propose a novel approach to Structural Health Monitoring (SHM), aiming at the automatic identification of damage-sensitive features from data acquired through pervasive sensor systems. Damage detection and localization are formulated as classification problems, and tackled through Fully Convolutional Networks (FCNs). A supervised training of the proposed network architecture is performed on data extracted from numerical simulations of a physics-based model (playing the role of digital twin of the structure to be monitored) accounting for different damage scenarios. By relying on this simplified model of the structure, several load conditions are considered during the training phase of the FCN, whose architecture has been designed to deal with time series of different length. The training of the neural network is done before the monitoring system starts operating, thus enabling a real-time damage classification. The performance of the proposed strategy is assessed on a numerical benchmark case consisting of an eight-story shear building subjected to two load types, one of which models random vibrations due to low-energy seismicity. Measurement noise has been added to the responses of the structure to mimic the outputs of a real monitoring system. Extremely good classification capacities are shown: among the nine possible alternatives (represented by the healthy state and by a damage at any floor), damage is correctly classified in up to 95% of cases, thus showing the strong potential of the proposed approach in view of the application to real-life cases.


Introduction
Collapses of civil infrastructures strike public opinion more and more often. They are generally due to either structural deterioration or modified working conditions with respect to the design ones. The main challenge of structural health monitoring (SHM) is to increase the safety level of ageing structures by detecting, locating and quantifying the presence and the development of damage, possibly in real time [1]. However, visual inspections, whose frequency is usually determined by the importance and the age of the structure, are still the workhorse in this field, even if they are rarely able to provide a quantitative estimate of structural damage. Therefore, it is evident why recent advances in sensing technologies and signal processing, coupled with the increased availability of computing power, are creating huge expectations for the development of robust and continuous SHM systems [2].
SHM applications are often treated as classification problems [3] aiming (i) to distinguish the damage state of a structure from the undamaged state, starting from a set of available recordings of a monitoring sensor system, and (ii) to locate and quantify the current damage. In this framework, we have adopted the so-called simulation-based classification (SBC) approach [4], and we have exploited deep learning (DL) techniques for the sake of automatic classification. In our procedure, data are displacement and/or acceleration recordings of the structural response, and the classification task consists of recognizing which structural state, among a discrete set, could have most probably produced them. These structural states, characterized by the presence of damage in different positions and of different magnitudes, suitably represent different damage scenarios.
To highlight the distinctive components of the SBC approach, we recall the general paradigm for a SHM system, according to [3]. A SHM system consists of four sequential procedures: (i) operational evaluation, (ii) data acquisition, (iii) features extraction and (iv) statistical inference. Operational evaluation defines what the object of the monitoring is and what the most probable damage scenarios are; data acquisition deals instead with the implementation of the sensing system; features extraction specifies how to exploit the acquired signals to derive features, that is, a reduced representation of the initial data, yet containing all their relevant information (for the case at hand, the onset and propagation of damage in the structure); statistical inference finally sets the criteria under which the classification task is performed.
Focusing on stages (ii) and (iii), the vibration-based approach is nowadays the most common procedure in civil SHM. Its popularity is mainly due to the effective idea that the ongoing damage alters the structure vibration response [5] and, consequently, the associated modal information. By looking at the displacement and/or the acceleration time recordings acquired at a certain set of points of a building, the vibration-based approach enables the analysis of both global and local structural behaviors. The technology required to build this type of sensor system is mature and can be exploited on a massive scale [6]. In most cases, features extraction relies on determining the system eigenfrequencies and the modal shapes. On the other hand, it might be necessary to employ more involved outcomes to distinguish between the effect of modified loading conditions and the true effect of damage [7], for instance by constructing parametric time series models [8]. By employing DL, we aim at dealing with these aspects automatically.
Two competing approaches are employed in literature to deal with stage (iv), the a) model-based and the b) data-based approach, both introducing a sort of offline-online decomposition. By this expression, we mean the possibility to split the procedure into two phases: first, the offline phase is performed before the structure starts operating; then, the online phase is carried out during its normal operations.
The model-based approach builds a physics-based model, initially calibrated to simulate the structural response. The model is updated whenever new observations become available and, accordingly, damage is detected and located. Data assimilation techniques such as Kalman filters have been employed to efficiently deal with model updating [9].
Model-based approaches are typically ill-conditioned, and many uncertainties related to the proper tuning of model parameters may prevent a correct damage estimation.
Hence, data-based approaches are becoming more and more popular; they exploit a collection of structural responses and either assess any deviation between real and simulated data, or assign to the measured data the relevant class label. The dataset construction can be done either experimentally [10] or numerically; however, the latter option is usually preferred, due to the frequent difficulties in properly reproducing the effects of damage in real-scale civil structures. To reduce the computational burden associated with the dataset construction, simplified models (e.g. mass-spring models for the dynamics of tall and slender buildings), still able to catch the correct structural response, are preferred to more expensive high-fidelity simulations involving, e.g., the discretization of both structural and non-structural elements. By adopting the SBC method, we rely on a data-driven approach based on synthetic experiments.
Once a dataset of possible damage scenarios has been constructed, machine learning (ML) proved to be suitable to perform the classification task [6]. The training of the ML classifier could be:
• supervised, when a label corresponding to one of the possible outputs of the classification task is associated to each structural response;
• unsupervised [11], when no labelling is available;
• semi-supervised [12], when the training data only refer to a reference condition.
In the SBC framework, a semi-supervised approach was recently explored, e.g., in [13], leading to great computational savings and robust results when treating the anomaly detection task. In spite of their good performances, standard ML techniques based, e.g., on statistical distributions of the damage classes (as in the so-called decision boundary methods), as well as kernel-based methods (e.g. support vector machines), still rely on heavy data preprocessing, required to compute problem-specific sets of engineered features [14]. These features can be statistics of the signal, modal properties of the structure, or even more involved measures exploiting different types of signal transformation (e.g. Power Spectral Density and autocorrelation functions, to mention a few) [6]. Some relevant drawbacks arise, since:
• pre-computed engineered features are not well suited for non-standard problems, for which setting damage classification criteria can be anything but trivial;
• there is no way to assess the optimality of the employed features;
• a computationally expensive pre-processing of a huge amount of data is usually required.
For these reasons, we rely on deep learning techniques, which allow both data dimensionality reduction and hierarchical pattern recognition at the same time [15,16]. DL techniques allow us to:
• deal with non-standard problems, especially when different information sources have to be managed (as long as they are in the form of time series);
• detect a set of features, optimized with respect to the classification task, through the training of an artificial neural network.
Despite these advantages, the use of DL for the sake of SHM has been quite limited so far [17,18]. We have therefore decided to employ Fully Convolutional Networks (FCNs) [19], a particular Neural Network (NN) architecture, to deal with the Multivariate Time Series (MTS) produced by monitoring sensor systems. To face different information sources, we have applied separate convolutional branches and, at a second stage, performed the data fusion of the extracted information.

SHM methodology
We introduce in this section a detailed explanation of the proposed strategy to deal with the SHM problem exploiting a SBC approach. We provide a simplified physics-based model of the structure employing $M$ degrees of freedom (dofs), assuming to record time-dependent signals through a monitoring system employing $N_0 \le M$ sensors. Our aim is first to train, and then to use, two classifiers $G_d$ and $G_l$ for the sake of damage detection and localization, respectively, where
$$G_d : \mathbb{R}^{N_0 \times L_0} \to \{0, 1\}, \qquad G_l : \mathbb{R}^{N_0 \times L_0} \to \{0, 1, \ldots, G\}.$$
In the former case, labels 0 and 1 denote absence or presence of damage, respectively; in the latter, $G > 1$ is a priori fixed and denotes the range of possible damage locations; also in this case, the undamaged state is denoted by 0. We have decided to include the undamaged state among the possible outputs of $G_l$ not just to confirm the outcome of $G_d$, but also to observe which damage scenarios, identified by their locations, are more often misclassified as the undamaged state. The training of $G_d$ and $G_l$ is performed using the two datasets $D^d_{train}$ and $D^l_{train}$, respectively. Each of these two datasets (for simplicity we only consider the formation of $D^l_{train}$, the process being substantially equivalent for $D^d_{train}$) collects $V_{train}$ structural responses, under prescribed damage scenarios and loading conditions. We denote by $U_i \in \mathbb{R}^{N_0 \times L_0}$, $i = 1, \ldots, V_{train}$, a collection of $N_0$ sensor recordings of displacement and/or acceleration time series of length $L_0$, such that
$$U_i = [u_1(d_i, l_i) \,|\, \ldots \,|\, u_{N_0}(d_i, l_i)]^T, \tag{1}$$
where the time series $u_n(d_i, l_i) \in \mathbb{R}^{L_0}$ recorded by the $n$-th sensor depends on the damage scenario $d_i$ and the loading condition $l_i$, and can be seen as the sampling of a time-dependent signal $u_n(t; d_i, l_i)$. We assume to deal with recordings acquired at a set of $L_0$ time instants uniformly distributed over the time interval of interest $I$.
The damage scenario $d_i : P_d \to \mathbb{R}^M$ is prescribed at each structural element and depends on a set of parameters $\eta_d \in P_d \subset \mathbb{R}^D$; the loading condition $l_i : I \times P_l \to \mathbb{R}^M$, defined over the time interval $I$, is prescribed at each element, too, and depends on a set of parameters $\eta_l \in P_l \subset \mathbb{R}^L$. Here, we denote by $P_d$ and $P_l$ two sets of parameters, yielding the two sets $C_d$ and $C_l$ of admissible damage and loading scenarios, respectively, obtained when sampling $\eta_d \in P_d$ and $\eta_l \in P_l$. During the training procedure, the performances of $G_d$ and $G_l$ are tracked by looking at their classification capabilities on two datasets $D^d_{val}$ and $D^l_{val}$, each one collecting $V_{val}$ structural responses $U_i$ (defined as in Eq. (1)), $i = 1, \ldots, V_{val}$.
According to the SBC approach, the datasets $D^d_{train}$, $D^l_{train}$, $D^d_{val}$ and $D^l_{val}$ are constructed by exploiting a simplified physics-based model of the structure. For any damage scenario $d \in C_d$ and loading condition $l \in C_l$ received as inputs, this numerical model, playing the role of digital twin of the structure to be monitored, returns a recorded displacement and/or acceleration time series $r_n(d, l)$. Since the latter are deterministic, to make our data more conformal to real measurements $u_n(d, l)$, we assume that each $r_n(d, l)$ is affected by an additive measurement noise $\epsilon_n \sim N(0, \Sigma)$, so that
$$u_n(d, l) = r_n(d, l) + \epsilon_n. \tag{2}$$
Here we consider each $\epsilon_n$ normally distributed, with zero mean and covariance matrix $\Sigma \in \mathbb{R}^{N_0 \times N_0}$, as related to a real monitoring system [20]. Regarding the auto-correlation of the records ($j = 1, \ldots, L_0$) of each sensor ($n = 1, \ldots, N_0$) in time, we assume them to be independent and identically distributed. The background model providing $r_n(d, l)$ is here thought of as being already tuned to accurately match the structural response in the undamaged case. Moving away from the baseline due to damage inception, with the adopted supervised strategy we therefore assume the possible damage scenarios to belong to a limited set, and for each of them relevant numerical analyses are exploited to mimic the real structural response, as affected by all the possible uncertainty sources. It should also be added that Eq. (2) accounts for the noise in the structural response induced by sensor measurements only. Since damage is a smeared measure of different phenomena occurring at the local scale (including or accompanied by, e.g., cracking and plasticity), it stands as a variable giving a measure of the unresolved dofs in a Mori-Zwanzig formalism, see [21].
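The noise corruption of the simulated recordings can be illustrated with a minimal sketch, assuming NumPy. The sinusoidal stand-in for the model output, the diagonal covariance matrix and the value of the standard deviation are hypothetical choices made only for illustration; they are not taken from the benchmark.

```python
import numpy as np

def corrupt_with_noise(r, cov, rng):
    """Add zero-mean Gaussian noise, independent in time, to noise-free recordings.

    r   : array of shape (N0, L0), one row per sensor
    cov : array of shape (N0, N0), noise covariance across sensors
    """
    n_sensors, n_steps = r.shape
    # One N0-dimensional noise vector per time instant, i.i.d. across instants
    noise = rng.multivariate_normal(np.zeros(n_sensors), cov, size=n_steps).T
    return r + noise

rng = np.random.default_rng(0)
N0, L0 = 16, 667
t = np.linspace(0.0, 10.0, L0)
# Hypothetical noise-free responses standing in for the physics-based model output
r = np.sin(2.0 * np.pi * 1.2 * t)[None, :] * np.linspace(0.1, 1.0, N0)[:, None]
sigma = 0.05                          # assumed noise standard deviation
cov = sigma ** 2 * np.eye(N0)         # assumed: uncorrelated sensors, equal variance
u = corrupt_with_noise(r, cov, rng)
```

A non-diagonal `cov` would model noise correlated across sensors, while the independence in time follows from drawing a new noise vector at every instant.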
In a state-space formulation like the one adopted for Kalman filtering [22], a further source of noise can be added through the state or model error, which accounts for the uncertainties linked to the unresolved dynamics of the system. An issue may thus arise in discerning the two noise sources linked to the model inaccuracy on one side, and to the sensor output and operational conditions on the other side. This discussion is beyond the scope of this work, and interested readers may find relevant information in, e.g., [23,24].
The classifiers $G_d$ and $G_l$ are based on a fully convolutional neural network architecture (that will be detailed in the following section). The training of the network is supervised, and performed by feeding the FCN with multivariate time series $\{F_0^n\}_{n=1}^{N_0}$ and associated labels (0 or 1 for $G_d$, $g \in \{0, 1, \ldots, G\}$ for $G_l$). In this respect, hereon each multivariate time series $\{F_0^n\}_{n=1}^{N_0}$ is referred to as an instance. In general, $\{F_0^n\}_{n=1}^{N_0} = U_i$; however, a single instance might be made up of $W$ multivariate time series $U_i^w$, $w = 1, 2, \ldots, W$, of different lengths $L_0^w$, to deal with the case of sensors recording time series of different length. Each component $F_0^n = u_n$ plays the role of input channel for the NN. The testing of the NN is done on instances $\{F_*^n\}_{n=1}^{N_0} = U_i^*$, obtained through the numerical model as structural response to loading conditions $l_i^* \in C_l$, $i = 1, \ldots, V_{test}$, unseen (that is, associated with values of $\eta_l$ from $P_l$ not sampled) when building the datasets $D^d_{train}$, $D^l_{train}$, $D^d_{val}$ and $D^l_{val}$. All these instances are collected into two datasets $D^d_{test}$ and $D^l_{test}$. The testing is done by verifying the correct identification of the class ($\{0, 1\}$ for $G_d$, $\{0, 1, \ldots, G\}$ for $G_l$) associated with the simulated signals. In concrete terms, a probability is estimated for each possible class, thus yielding the confidence level that the given class is assigned to the data, and the class with highest confidence is compared with the one associated to the simulated signal. No k-fold cross validation is used.
Once tested, $G_d$ and $G_l$ can make a prediction as soon as a new signal $\{F_*^n\}_{n=1}^{N_0} = U^*$ is experimentally acquired from the real sensor network used to monitor the structure.
Let us now recap the procedure steps exploiting the schematic representation reported in Fig. 1. For the sake of convenience, we can split our procedure into:
• an offline phase, where, as first step, the loading conditions $C_l$ (OFF-1#1) and the most probable damage scenarios $C_d$ are evaluated (OFF-1#2). Accordingly, a sensor network with $N_0$ sensors is designed (OFF-2). The datasets $D^d_{train}$, $D^l_{train}$, $D^d_{val}$, $D^l_{val}$, $D^d_{test}$ and $D^l_{test}$ are then constructed (OFF-3) by exploiting the physics-based digital twin of the structure. The classifiers $G_d$ (OFF-4#1) and $G_l$ (OFF-4#2) are therefore trained by using $D^d_{train}$ and $D^l_{train}$, and performing the validation using $D^d_{val}$ and $D^l_{val}$. Finally, the classification capacity of $G_d$ and $G_l$ is assessed by using numerically simulated signals $\{F_*^n\}_{n=1}^{N_0} = U^*$ belonging to $D^d_{test}$ and $D^l_{test}$, respectively (OFF-5#1 and OFF-5#2);
• an online phase, in which for any new signal $\{F_*^n\}_{n=1}^{N_0} = U^*$ acquired by the real monitoring system and provided to the classifiers (ON-1), damage detection (ON-2) is performed through $G_d$, and damage localization is performed through $G_l$ (ON-3).
In lack of recordings coming from a real monitoring system, and having assumed the experimental signals $U^*$ equal to the noise-corrupted output of the numerical model, steps OFF-5#1 and OFF-5#2 of the offline phase indeed coincide with steps ON-2 and ON-3 of the online procedure. We highlight that only those damage scenarios $d \in C_d$ that have been numerically simulated in the offline phase can be classified during the online phase. Moreover, damage is considered temporarily frozen within a fixed observation interval, which allows the structure to be treated as linear [2]. To model the effect of damage, we consider the stiffness degradation of each structural member; this assumption is acceptable if the rate of the evolving damage is sufficiently small with respect to the observation interval [25].
Fig. 2: $N_0$ represents the number of input channels and $N$ represents the adopted number of filters. For the sake of clarity, the dimensionality of the building blocks has been enhanced: a three-dimensional parallelepiped is used to depict the two-dimensional output of each convolutional layer; a two-dimensional rectangle is used to depict the one-dimensional output of the global pooling layer and of the softmax layer.
It is not possible to identify from the beginning the most suitable number of instances $V_{train}$ to be used to train the network. The easiest procedure (even if time-consuming) would be to assess the performances of $G_d$ and $G_l$ for different sizes $V_{train}$, aiming at finding a trade-off between the computational burden required to construct the dataset and train the NN, and the classification capabilities. Beyond a certain critical size, massive dataset enlargements might lead to small improvements in the NN performance, as shown in our numerical results.
Finally, concerning the setting of the loading conditions $C_l$, in this work we have (i) identified a set of possible loading scenarios that can significantly affect the response of the structure; (ii) subdivided this set into a certain number of subsets, representative of different possible dynamic effects of the applied load; (iii) sampled each subset almost the same number of times.

Neural network architecture
We now describe the FCN architecture employed for the sake of classification. As discussed in the previous section, $\{F_0^n\}_{n=1}^{N_0}$ are the inputs adopted during the training phase (for which we know the associated instance label), while $\{F_*^n\}_{n=1}^{N_0}$ are the inputs that we require the FCN to classify.
We have adopted a FCN stacking three convolutional layers $\mathcal{L}_i$, $i \in \{1, 2, 3\}$, with different filter sizes $h_i$, followed by a global pooling layer and a softmax classifier (the choice of the NN hyperparameters will be discussed in the following). Each convolutional layer $\mathcal{L}_i$ has been used together with a Batch-Normalization (BN) layer $\mathcal{B}_i$ and a Rectified Linear Unit (ReLU) activation layer $\mathcal{R}_i$ [14,19], see Fig. 2. When the input signals are made up of $W$ multivariate time series with different lengths, for each one we first adopt the described convolutional architecture separately and then, through a concatenation layer, we perform data fusion on the extracted features. Classification is finally pursued through a softmax layer. The corresponding NN architecture is sketched in Fig. 3 in the case of time series with two different lengths $L_0^1$ and $L_0^2$, but can be easily generalised. TensorFlow [26] has been used for the sake of NN construction.
Fig. 3: $N_0^1$ and $N_0^2$ represent the number of input channels (possibly different) of the two NN branches; $N^1$ and $N^2$ represent the number of filters adopted. For the sake of clarity, the dimensionality of the building blocks has been enhanced: a three-dimensional parallelepiped is used to depict the two-dimensional output of each convolutional layer; a two-dimensional rectangle is used to depict the one-dimensional output of the global pooling layer and of the softmax layer.
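The two-branch layout described above can be sketched as a forward pass in plain NumPy (used here instead of TensorFlow to keep the example self-contained). The numbers of filters, the random untrained weights and the per-instance normalization standing in for Batch-Normalization are assumptions made only for illustration.

```python
import numpy as np

def conv1d_valid(x, w):
    """x: (C_in, L), w: (C_out, C_in, h) -> (C_out, L - h + 1), no zero-padding."""
    windows = np.lib.stride_tricks.sliding_window_view(x, w.shape[-1], axis=1)
    # windows has shape (C_in, L - h + 1, h)
    return np.einsum("clh,och->ol", windows, w)

def conv_block(x, w, eps=1e-5):
    """Convolution, normalization of each feature map, then ReLU."""
    z = conv1d_valid(x, w)
    # Stand-in for Batch-Normalization: statistics are taken over time for a
    # single instance, whereas training would use mini-batch statistics.
    z = (z - z.mean(axis=1, keepdims=True)) / np.sqrt(z.var(axis=1, keepdims=True) + eps)
    return np.maximum(z, 0.0)

def branch(x, weights):
    for w in weights:
        x = conv_block(x, w)
    return x.mean(axis=1)              # global average pooling: one value per filter

def init_branch(rng, c_in, n_filters=(8, 16, 8), kernels=(8, 5, 3)):
    """Random filters; the numbers of filters are hypothetical."""
    weights = []
    for n, h in zip(n_filters, kernels):
        weights.append(rng.normal(scale=0.1, size=(n, c_in, h)))
        c_in = n
    return weights

rng = np.random.default_rng(0)
x_branch_1 = rng.normal(size=(8, 667))     # first multivariate time series
x_branch_2 = rng.normal(size=(8, 667))     # second multivariate time series
w1, w2 = init_branch(rng, 8), init_branch(rng, 8)

# Concatenation layer: data fusion of the features extracted by the two branches
features = np.concatenate([branch(x_branch_1, w1), branch(x_branch_2, w2)])
theta = rng.normal(scale=0.1, size=(9, features.size))   # 9 classes, untrained
scores = theta @ features
probs = np.exp(scores - scores.max())
probs /= probs.sum()
```

Because the global pooling removes the time axis before the concatenation, the two branches could receive series of different lengths without any change to the code.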

Use of convolutional layers
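Let us now show how convolutional layers can be adopted to extract features from multivariate time series. The inputs $\{F_0^n\}_{n=1}^{N_0}$ are provided to the first convolutional layer $\mathcal{L}_1$. The outputs of $\mathcal{L}_1$, $\{F_1^n\}_{n=1}^{N_1}$, still shaped as time series (of length $L_1$), do not represent displacements and/or accelerations any more. Indeed, they are features extracted from the input channels $\{F_0^n\}_{n=1}^{N_0}$. The following layers operate in the same manner: the outputs $\{F_i^n\}_{n=1}^{N_i}$ of the $i$-th convolutional layer $\mathcal{L}_i$ are obtained through the subdivision of the inputs $\{F_{i-1}^n\}_{n=1}^{N_{i-1}}$ into data sequences, whose length $h_i$ determines the receptive field of $\mathcal{L}_i$, and the multiplication of each data sequence by a set of weights $w^{(i,m)}$ called filter, where the output $F_i^m$ of each filter is called feature map. One-dimensional (1D) receptive fields must be used in time series analysis, each channel being one-dimensional. In Fig. 4 the fundamental architecture of $\mathcal{L}_i$ is depicted, linking the inputs $\{F_{i-1}^n\}_{n=1}^{N_{i-1}}$ to the outputs $\{F_i^m\}_{m=1}^{N_i}$ according to
$$F_i^m(p) = \sum_{n=1}^{N_{i-1}} \sum_{q=1}^{h_i} w^{(i,m)}_{n,q} \, F_{i-1}^n(p + q - 1), \qquad p = 1, \ldots, L_i, \tag{3}$$
where $w^{(i,m)}_{n,q}$ is the $q$-th connection weight of the $m$-th filter applied to the input channel $F_{i-1}^n$, and $F_{i-1}^n(p)$ denotes the $p$-th entry of $F_{i-1}^n$.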
Since stacking several convolutional layers provides nonlinear transformations of $\{F_0^n\}_{n=1}^{N_0}$, their overall effect is to make the classes to be recognised linearly separable [27]. In this way, a linear classifier is suitable to carry out the final task. Every nonlinear transformation can be interpreted, as discussed, as an automatic extraction of features.
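As an illustration of how a convolutional layer turns input channels into feature maps, the following sketch slides a receptive field of length h along the time axis and multiplies each data sequence by the filter weights. It assumes NumPy, no zero-padding and no bias term; the array sizes and the number of filters are hypothetical.

```python
import numpy as np

def conv_layer(F_prev, w):
    """Valid 1D convolution of a multichannel time series.

    F_prev : (N_prev, L_prev) input channels
    w      : (N_out, N_prev, h) filters, one set of weights per input channel
    returns: (N_out, L_prev - h + 1) feature maps
    """
    n_out, n_prev, h = w.shape
    L_out = F_prev.shape[1] - h + 1
    F = np.zeros((n_out, L_out))
    for m in range(n_out):              # one feature map per filter
        for p in range(L_out):          # slide the receptive field along time
            F[m, p] = np.sum(w[m] * F_prev[:, p:p + h])
    return F

rng = np.random.default_rng(1)
F0 = rng.normal(size=(16, 667))         # 16 input channels of length 667
w1 = rng.normal(size=(4, 16, 8))        # 4 filters (hypothetical) of length 8
F1 = conv_layer(F0, w1)
```

The explicit loops are kept for readability; a vectorised or library implementation would be preferred for actual training.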

Batch Normalization, ReLU activation, global pooling and softmax classifier
The Batch Normalization (BN) layer $\mathcal{B}_i$ is introduced after each convolutional layer $\mathcal{L}_i$ to address the issue related to the vanishing/exploding gradients possibly experienced during the training of deep architectures [28]. It relies on normalization and zero-centering of the outputs $\{F_i^n\}_{n=1}^{N_i}$ of each layer $\mathcal{L}_i$. We express the output of $\mathcal{B}_i$ as $\{F_{B_i}^n\}_{n=1}^{N_i}$. For the same reason, the ReLU activation function is preferred to saturating ones [29]. The output of the ReLU layer $\mathcal{R}_i$ is $\{F_{R_i}^n\}_{n=1}^{N_i}$, where
$$F_{R_i}^n = \max(0, F_{B_i}^n)$$
is evaluated entry by entry. In the adopted FCN architecture, the features to be used in the classification task are $\{F_{R_3}^n\}_{n=1}^{N_3}$. The final number of features equals the number $N_3$ of filters of the last convolutional layer. By next applying a global average pooling [30], the extracted features $\{F_{R_3}^n\}_{n=1}^{N_3}$ are condensed into a single channel $b \in \mathbb{R}^{N_3}$, whose $n$-th entry is the time average of $F_{R_3}^n$.
The softmax activation layer finally performs the classification task. First, the channel $b$ is mapped onto the target classes, by computing a score $s_g$ for each class $g$,
$$s_g(b) = \theta_g^T b, \tag{4}$$
where the vector $\theta_g \in \mathbb{R}^{N_3}$ collects the weights related to the $g$-th class. The softmax function is then used to estimate the probability $p_g \in [0, 1]$ that the input channels belong to the $g$-th class, according to:
$$p_g = \frac{\exp(s_g(b))}{\sum_{j} \exp(s_j(b))}, \tag{5}$$
where the sum runs over all the classes. The input channels $\{F_0^n\}_{n=1}^{N_0}$ are then assigned to the class with associated label $g$ featuring the highest estimated probability $p_g$, which then represents the estimated confidence level that class $g$ is assigned to the data.
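The chain of operations applied after the last convolutional layer can be sketched as follows, assuming NumPy. The batch normalization shown uses the statistics of the current mini-batch with a scale `gamma` and a shift `beta` (at prediction time, running averages would typically be used instead); all sizes and the random untrained weights are hypothetical.

```python
import numpy as np

def batch_norm(F, gamma, beta, eps=1e-5):
    """Normalize each channel of F (batch, channels, length) over batch and time."""
    mean = F.mean(axis=(0, 2), keepdims=True)
    var = F.var(axis=(0, 2), keepdims=True)
    F_hat = (F - mean) / np.sqrt(var + eps)
    return gamma[None, :, None] * F_hat + beta[None, :, None]

def relu(F):
    return np.maximum(F, 0.0)

def global_average_pooling(F):
    """Average each feature map over time: (batch, channels, length) -> (batch, channels)."""
    return F.mean(axis=2)

def softmax(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)   # avoids overflow
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch, n_filters, length, n_classes = 4, 6, 654, 9         # hypothetical sizes
F3 = rng.normal(loc=2.0, scale=3.0, size=(batch, n_filters, length))
F_B3 = batch_norm(F3, gamma=np.ones(n_filters), beta=np.zeros(n_filters))
F_R3 = relu(F_B3)
b = global_average_pooling(F_R3)                            # one value per filter
theta = rng.normal(size=(n_classes, n_filters))             # untrained class weights
p = softmax(b @ theta.T)                                    # class probabilities
predicted = p.argmax(axis=1)                                # most probable class
```

Each row of `p` sums to one, so the largest entry can be read as the confidence attached to the predicted class.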

Neural Network training
The NN training consists of tuning the weights $w^{(i,m)}$ and $\theta_g$, respectively appearing in Eqs. (3) and (4), by minimizing a loss function depending on the data. In this respect, the Adam optimization method [31], a widespread stochastic gradient-based optimization method, has been used. For classification purposes, the most commonly adopted loss function is the cross entropy, defined for the classifier $G_d$ as:
$$J_d(Y, p) = -\frac{1}{V_{train}} \sum_{i=1}^{V_{train}} \sum_{g} y_i^g \log p_i^g, \tag{6}$$
where:
• $g$ is the label of the instance provided to the NN during the training;
• $y_i^g \in \{0, 1\}$ is the confidence that the $i$-th instance should be labelled as the $g$-th class, with $y_i^g = 1$ if for the $i$-th instance the $g$-th class is the target class, and $y_i^g = 0$ otherwise;
• $Y \in \{0, 1\}^{V_{train}}$ collects all the $y_i^g$ confidence values;
• $p \in \mathbb{R}^G$ collects the estimated probabilities $p_g$, see Eq. (5), with $p_i^g$ denoting the value of $p_g$ estimated for the $i$-th instance.
The loss function $J_l(Y, p)$ for the classifier $G_l$ is defined analogously.
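A minimal sketch of the cross entropy for a handful of instances is given below, assuming NumPy, one-hot target confidences and an average over the instances (the averaging and the small constant `eps` guarding the logarithm are assumptions of this example). The numerical values are made up for illustration.

```python
import numpy as np

def cross_entropy(Y, P, eps=1e-12):
    """Y: (V, n_classes) one-hot targets; P: (V, n_classes) estimated probabilities."""
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))

# Three instances of a two-class (undamaged / damaged) problem
Y = np.array([[1, 0], [0, 1], [0, 1]])
P = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
loss = cross_entropy(Y, P)
```

Only the probability assigned to the target class of each instance enters the sum, so the loss decreases as those probabilities approach one.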
Regarding the employed datasets:
• $D^d_{train}$ is used to train the NN by back-propagating the classification error;
• $D^d_{val}$ is used to possibly interrupt the training in case of overfitting, but not to modify the NN weights;
• $D^d_{test}$ is used to verify the prediction capabilities of the NN, after the training phase has been performed.
The same splitting applies to the data used for training $G_l$. In order to assess the offline phase of the proposed procedure, we have tested $G_d$ and $G_l$ on their respective test sets $D^d_{test}$ and $D^l_{test}$ (steps OFF-5#1 and OFF-5#2 of Fig. 1). The number of times $D^d_{train}$ and $D^l_{train}$ are evaluated during the training of $G_d$ and $G_l$ corresponds to the number of epochs: in this work, we have bounded the maximum number of epochs to 1500. We have also provided the possibility of an early stop of the training when, after having performed at least 750 epochs, the validation loss has not decreased three times in a row.
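The stopping rule can be sketched in plain Python as follows. The example assumes that "not decreased three times in a row" means that each of the last three epochs failed to improve on the epoch preceding it; the function and argument names are hypothetical.

```python
def should_stop(val_losses, min_epochs=750, patience=3, max_epochs=1500):
    """Decide whether to stop, given the validation loss after each completed epoch."""
    epoch = len(val_losses)
    if epoch >= max_epochs:
        return True                      # maximum number of epochs reached
    if epoch < min_epochs or epoch <= patience:
        return False                     # too early for an early stop
    recent = val_losses[-(patience + 1):]
    # Stop if none of the last `patience` epochs improved on its predecessor
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))

improving = [1.0 / (k + 1) for k in range(800)]
stalled = [1.0 / (k + 1) for k in range(797)] + [0.5, 0.5, 0.6]
print(should_stop(improving), should_stop(stalled))
```

Comparing against the best loss seen so far, rather than against the previous epoch, would be an equally plausible reading and only requires changing the last line of the function.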
To control the training process, a learning rate $\xi$ is usually introduced to scale the correction of the NN weights provided by back-propagating the classification error. In our case, the learning rate has been forced to decrease linearly with the number of epochs, moving from $10^{-3}$ at the beginning of the training to $10^{-4}$ at its end [32]. After having performed at least 750 epochs, an additional factor $\zeta = 1/\sqrt[3]{2}$ is used to scale down the learning rate if the loss function $J(Y, p)$ is not reduced within the successive 100 epochs, as suggested in [32]. Random subsamples (also called minibatches) of the data points belonging to the training set are employed for the sake of gradient evaluation when running the Adam optimization method [27,31].
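The learning rate schedule can be sketched in plain Python as below. The example replays a given loss history rather than training a network, and it assumes that the 100-epoch stagnation window starts being counted once 750 epochs have been completed; function names are hypothetical.

```python
ZETA = 2.0 ** (-1.0 / 3.0)               # additional factor 1 / cube root of 2

def base_learning_rate(epoch, max_epochs=1500, lr_start=1e-3, lr_end=1e-4):
    """Linear decay from lr_start (first epoch) to lr_end (last epoch)."""
    frac = epoch / (max_epochs - 1)
    return lr_start + frac * (lr_end - lr_start)

def learning_rates(losses, min_epochs=750, window=100):
    """Learning rate used at each epoch, given the loss recorded at each epoch."""
    factor, best, stalled = 1.0, float("inf"), 0
    rates = []
    for epoch, loss in enumerate(losses):
        rates.append(factor * base_learning_rate(epoch))
        improved = loss < best
        best = min(best, loss)
        if epoch < min_epochs:
            continue                     # no extra scaling in the first epochs
        stalled = 0 if improved else stalled + 1
        if stalled >= window:            # loss not reduced for `window` epochs
            factor *= ZETA
            stalled = 0
    return rates

rates = learning_rates([1.0] * 1500)     # worst case: the loss never improves
```

Three consecutive applications of the factor halve the learning rate, since the cube of `ZETA` equals 0.5.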

Hyperparameters setting
The setting of the NN hyperparameters, namely the dimensions of the kernels $h_i$ and the number of feature maps $N_i$, is done according to [14,32]. In this work, we choose $h_1 = 8$, $h_2 = 5$, $h_3 = 3$ as kernel dimensions for the three convolutional layers. Since no zero-padding has been employed, the length of the time series is progressively reduced passing through the convolutional layer $\mathcal{L}_i$, from $L_{i-1}$ to $L_i = L_{i-1} - h_i + 1$. Accordingly, considering the parameters and the length of the time series used in this work, the dimension reduction related to a single convolutional layer is on the order of 1%. We have verified that the classification accuracy is barely affected by this reduction and, more generally, by the use of zero-padding. It is possible to further improve the NN performances by operating a (necessarily problem-dependent) finer tuning of the NN hyperparameters, but only at the cost of a time-consuming repeated evaluation of the NN outcomes.
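The progressive reduction of the time series length can be checked with a few lines of plain Python, using the kernel dimensions given above and the series length of 667 samples adopted in the benchmark; the function name is hypothetical.

```python
def output_lengths(L0, kernel_sizes):
    """Length of the time series after each convolutional layer without zero-padding."""
    lengths = [L0]
    for h in kernel_sizes:
        lengths.append(lengths[-1] - h + 1)
    return lengths

lengths = output_lengths(667, [8, 5, 3])
relative_reduction = [(a - b) / a for a, b in zip(lengths, lengths[1:])]
print(lengths)   # [667, 660, 656, 654]
```

The largest relative reduction is the one of the first layer, 7 samples out of 667, which is about 1%.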
The number of filters $N_i$ to be adopted depends on the complexity of the classification task: the more complex the classification, the higher the number of filters needed. However, increasing the number of filters beyond a certain threshold, which depends on the problem complexity and the task to be performed, has no effect on the prediction capabilities of the NN; indeed, the risk would be to increase computational costs, and to overfit

Dataset construction
The proposed methodology is now assessed through the numerical benchmark shown in Fig. 5, and originally proposed in [33]. No real experimental measurements have been used in the analysis; measurement noise has instead been introduced by corrupting the monitored structural response with uncorrelated random signals featuring different Signal to Noise Ratio (SNR) levels, to also assess the effect of sensor accuracy on the capability of the proposed approach. Further details are provided below.
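How a prescribed SNR level translates into a noise amplitude can be sketched as follows, assuming NumPy. The example assumes that the SNR is expressed in decibels as the ratio between signal power and noise power; the clean signal and the 20 dB level are hypothetical.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Corrupt a signal with white Gaussian noise scaled to a target SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(scale=np.sqrt(noise_power), size=signal.shape)

rng = np.random.default_rng(0)
t = np.arange(667) / 66.7
clean = np.sin(2.0 * np.pi * 1.2 * t)              # hypothetical noise-free response
noisy = add_noise_at_snr(clean, snr_db=20.0, rng=rng)
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

The measured value differs slightly from the target because the noise power is estimated from a finite number of samples.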
The considered structure is an idealised eight-story shear building model, featuring a constant floor mass of $m = 625$ t and a constant inter-story stiffness of $k_{sh} = 10^6$ kN/m. The proposed SHM strategy has been designed to handle signals related to different types of damage-sensitive structural responses characterized by different magnitude and sampling rate. Hence, in the following both the horizontal and the vertical motions of each story are allowed for and recorded. The longitudinal stiffness of the columns has been set to $k_{ax} = 10^8$ kN/m, and a slenderness (given by the ratio between their length and thickness) of 10 has been assumed for the same columns. The numerical model employs $M = 16$ dofs (8 in the $x$ direction and 8 in the $z$ direction), and $N_0 = 16$ virtual sensors are used to measure the noise-free displacements $r_n$ (collecting both the horizontal displacements $r_n$, $n = 1, \ldots, 8$, and the vertical displacements $r_n$, $n = 9, \ldots, 16$) at all the story levels. Although a non-classical damping was originally proposed in [33], the relevant effect on system identification or model update has been shown to be marginal if the structure is continuously excited during the monitoring stage, see e.g. [2,34]. Therefore, in this feasibility study no damping has been taken into account. The dofs are numbered from 1 for the ground floor up to 8 for the eighth floor in both directions.
Due to the building geometry, eight different damage scenarios $d(1), \ldots, d(8)$ can be considered, each one characterized by a reduction of 25% of one inter-story stiffness only, that is, in scenario $d(g)$ the stiffness of the $g$-th inter-story is reduced to 75% of its undamaged value, all the other inter-story stiffnesses being unchanged. The label $g$ is used to denote each damage scenario, ranging from 1 for the first floor up to 8 for the eighth floor; by convention, $d(0)$ refers to the undamaged case.
Before assessing the classification capability of the NN, a parametric analysis has been carried out to check the sensitivity to damage of the vibration frequencies. Table 1 collects the results regarding the horizontal motion; for the analysed system, the axial frequencies can be obtained by scaling the reported frequencies by a factor of 10. Any considered damage state reduces all the frequencies, though the variation is rather limited even with a stiffness reduction of 25%, see Table 1. Performing damage localization just by exploiting these data can be largely ineffective, since some trends in the table, such as the monotonic dependence of the frequencies of a vibration mode on the damaged inter-story, can hardly be recognized.
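The parametric analysis on the vibration frequencies can be reproduced in spirit with the following sketch, assuming NumPy and an undamped lumped-mass model with the floor mass and inter-story stiffness given above, converted to kg and N/m. The function name and arguments are hypothetical, and the example is not meant to replace the values of Table 1.

```python
import numpy as np

def shear_frequencies(n_stories=8, m=6.25e5, k=1.0e9, damaged_story=0, reduction=0.25):
    """Natural frequencies (Hz) of a shear building with lumped floor masses.

    m in kg (625 t), k in N/m (10^6 kN/m); damaged_story = 0 means undamaged.
    """
    k_story = np.full(n_stories, k)
    if damaged_story > 0:
        k_story[damaged_story - 1] *= 1.0 - reduction
    K = np.zeros((n_stories, n_stories))
    for j in range(n_stories):
        K[j, j] += k_story[j]                     # inter-story spring below the floor
        if j + 1 < n_stories:
            K[j, j] += k_story[j + 1]             # inter-story spring above the floor
            K[j, j + 1] = K[j + 1, j] = -k_story[j + 1]
    # Equal masses: the eigenvalues of K / m are the squared circular frequencies
    omega2 = np.linalg.eigvalsh(K / m)
    return np.sqrt(omega2) / (2.0 * np.pi)

f_undamaged = shear_frequencies()
f_damaged = {g: shear_frequencies(damaged_story=g) for g in range(1, 9)}
f_axial = shear_frequencies(k=1.0e11)             # axial stiffness of 10^8 kN/m
```

Since the axial stiffness is 100 times the shear one, `f_axial` equals `f_undamaged` multiplied by 10, in agreement with the scaling factor mentioned above.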
As proposed in [2,25], the shape of the vibration modes-in particular that of the fundamental one in the case of a building featuring constant mass and stiffness at each story as for the case at hand-should be taken into account in the analysis, in order to localise and quantify damage. As previously remarked, employing FCN allows us not only to analyse separately each recorded signal, but also to exploit their interplay. Moreover, even if the sensitivity to damage of displacements in horizontal and vertical directions is the same, their joint use enabled by the FCN can lead to an improvement of the NN performances.
Since the vibration frequencies of the structure span different ranges under horizontal and vertical excitation, the axial response turns out to be richer in high-frequency vibrations. To correctly record the signals, the sampling rates have been set to 66.7 Hz for the horizontal vibrations and to 667 Hz for the vertical vibrations. Even for the highest vibration frequencies, the output signals are assumed not to be distorted by the accelerometers: the transfer function of the sensor has to be very close to 1 for frequencies up to the mentioned values, so that the amplitude of the sensor output closely matches the real structural response to be locally measured. In specific applications where the structural vibration frequencies or the sampling rates get too close to the internal resonance frequency of the sensor, different, ad-hoc designed devices have to be selected. In the analysis, each instance is made up of two multivariate time series, one for each excitation type, referring to different time intervals: $I = [0, 10]$ s for the shear case and $I = [0, 1]$ s for the axial case. Accordingly, the time series lengths are $L_0^1 = 667$ and $L_0^2 = 667$ for the displacements in the $x$ and $z$ directions, respectively. This benchmark has been exploited to test the FCN architecture with either one or two convolutional branches (see Figs. 2, 3). Indeed, what we are going to assess is the ability of the NN to fuse the extracted information through the concatenation layer, rather than its capacity to deal with time series of different lengths.
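The series lengths and the adequacy of the two sampling rates can be checked with a few lines of arithmetic. The sketch below is only illustrative: it uses the closed-form frequencies of a uniform, undamped, fixed-base mass-spring chain (an assumption about the model, consistent with the constant mass and stiffness stated earlier), and the helper name `highest_frequency` is hypothetical.

```python
import math

N_STORIES = 8
OMEGA_0 = math.sqrt(1e9 / 625e3)  # sqrt(k_sh / m) in rad/s, SI units


def highest_frequency(scale=1.0):
    """Highest natural frequency (Hz) of the uniform fixed-base chain."""
    j = N_STORIES
    angle = (2 * j - 1) * math.pi / (2 * (2 * N_STORIES + 1))
    return scale * 2.0 * OMEGA_0 * math.sin(angle) / (2.0 * math.pi)


channels = {
    "shear (x)": {"rate": 66.7, "duration": 10.0, "f_top": highest_frequency()},
    # Axial frequencies are 10 times the shear ones (k_ax = 100 k_sh).
    "axial (z)": {"rate": 667.0, "duration": 1.0, "f_top": highest_frequency(10.0)},
}
for name, c in channels.items():
    c["length"] = round(c["rate"] * c["duration"])  # number of samples
    c["nyquist"] = c["rate"] / 2.0                  # highest resolvable frequency
    print(name, c["length"], round(c["f_top"], 2), c["nyquist"])
```

Both channels yield 667 samples, and in both cases the highest structural frequency stays below the Nyquist frequency of the corresponding sampling rate.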

Table 1 Shear vibration frequencies of the considered eight-story building, for the undamaged case (0) and under different damage scenarios, each one featuring a reduction by 25% of the stiffness at the inter-story corresponding to the scenario label
Two load types have been considered: first, we have excited the structure with lateral and vertical loads applied at each story and characterised by narrow frequency ranges, randomly sampled from an interval including, but not limited to, the structural frequencies; then we have applied, once again at each story, a white noise, assessing both the case in which all the shear frequencies have been excited, and the one in which just some of them have been covered by the noise frequency spectrum. With these two load types, we have been able to assess the NN performances in two different cases:
• case 1 (sinusoidal load case), in which the applied load is characterized by only a few (a priori, random) frequencies;
• case 2 (white noise load case), in which the applied load is characterized by a higher number of (a priori, random) frequencies, lying in a given range.
This latter case corresponds to the one of random vibrations, for instance due to low-energy seismicity of natural or anthropic (urban) source [35], and is frequently adopted in the literature, see e.g. [36]; the characteristic frequency range of seismic vibrations is site-dependent, being determined by the geographical and geological properties of the site. For example, in deep soft basins, the seismic vibrations are richer in low-frequency components than the ones in rock sites. For this reason, without any site characterization, it makes sense to assume more than a single frequency spectrum for the random vibrations.

Case 1 (sinusoidal load case)
In this first analysis, two different load combinations in the horizontal ($x$) and vertical ($z$) directions have been considered, to affect both the shear and axial vibration modes of the building. For each direction, the loads applied to the stories of the structure are given by the sum of two sinusoidal functions, whose amplitudes and time variations have been randomly generated. This expression for the load has been adopted to keep its description simple and, in comparison with the case of a single sinusoidal component, to increase the set of frequencies that excite the structure. In the applied load $l = [l^{sh}, l^{ax}]$, $\gamma^{sh} \in \mathbb{R}$ and $\gamma^{ax} \in \mathbb{R}$ are random scaling factors, while $f^{sh}, f^{ax} > 0$ set the frequencies of the sinusoidal components (see Table 2 for the adopted random generation rules).
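A load of the kind described above (a randomly scaled sum of two sinusoids at each story) could be generated as in the following sketch. The sampling rules of Table 2 are not reproduced here: the frequency interval `f_range`, the Gaussian scaling factor and the linear floor-dependent factor `gamma_dof` are all hypothetical choices made only for illustration, as is the function name `sinusoidal_load`.

```python
import numpy as np


def sinusoidal_load(n_floors=8, duration=10.0, rate=66.7,
                    f_range=(0.5, 15.0), seed=0):
    """Return (t, load), with load of shape (n_samples, n_floors)."""
    rng = np.random.default_rng(seed)
    n_samples = round(duration * rate)
    t = np.arange(n_samples) / rate
    # Two random frequencies (hypothetical uniform sampling rule).
    f1, f2 = rng.uniform(*f_range, size=2)
    # Random global scaling factor (hypothetical Gaussian rule).
    gamma = rng.normal()
    # Hypothetical floor-dependent factor growing linearly with height.
    gamma_dof = np.arange(1, n_floors + 1) / n_floors
    wave = np.sin(2.0 * np.pi * f1 * t) + np.sin(2.0 * np.pi * f2 * t)
    return t, gamma * np.outer(wave, gamma_dof)


t, load = sinusoidal_load(seed=1)
print(load.shape)
```

Each column of `load` is the time history applied to one floor; with the assumed linear factor, the top floor receives eight times the load of the first one.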
The two sets of values adopted for the generation rule of $f^{sh}$ and $f^{ax}$ are chosen on the basis of the structural frequencies that could be excited in the horizontal and vertical directions. At the same time, thanks to the adopted sampling rule, $f^{sh}$ and $f^{ax}$ may exceed these frequency ranges, producing instances in which the shear frequencies and/or the axial frequencies of the structure are not excited. Regarding the generation rule of the scaling parameter $\gamma^{sh}$, its dependency on the dofs of the structure through the factor $\gamma_{dof}$ has been introduced in Table 2 in order to mimic the load distribution usually considered in a preliminary design process, when the shear behaviour of a regular building is evaluated. Keeping in mind that our principal interest here is to assess the prediction capacities of the NN architecture, this choice has enabled us to obtain displacement time series similar to the ones expected during the monitoring of the structure, although adopting a very simple generation rule for the applied lateral loads. Some examples of the time evolutions of the generated loads applied to the first floor of the structure (hence of $l^{sh}_1$ and $l^{ax}_1$) are shown in Fig. 6.

Through Eq. (2), we have added a measurement noise to mimic the output of a real monitoring system. For the sake of simplicity, the covariance matrix $\Sigma \in \mathbb{R}^{16 \times 16}$ of such noise has been assumed to be diagonal, i.e. $\Sigma = \sigma^2 I$, where $\sigma^2$ is the variance of the measurement error in the horizontal and vertical directions for each floor, and $I \in \mathbb{R}^{16 \times 16}$ is the identity matrix.
Two sources of randomness have been assumed for the noise, due to environmental effects and to the transmission of the electrical signal. Their effects are superimposed in the covariance matrix, with diagonal entries respectively amounting to $\sigma^2_{env}$ and $\sigma^2_{el}$. The environmental noise has been assumed to induce vibrations of the same amplitude and/or to affect the converted electrical signals in the same way, independently of the building floor. Given that horizontal motions at the top of a building are in general greater than the displacements at the lower levels, this assumption leads small-amplitude signals to be more affected, in relative terms, by the environmental noise. This is reasonable if we assume that the localised disturbances arising from the surrounding environment have the same magnitude independently of the building level.
Regarding the electrical disturbance, the same noise level has been assumed in both the $x$ and $z$ directions, in spite of the usually different technical specifications of sensors measuring displacements with different magnitudes. This means that the electrical disturbances have the same effect, in statistical terms, on the measurement outcomes in the horizontal direction $u^{sh}_i$ and in the vertical direction $u^{ax}_i$. Figures 7 and 8 respectively show examples of time evolutions of horizontal and vertical displacements, to highlight the effects of the above assumptions on the structural signals. These displacement components always refer to the undamaged case, and to the load conditions specified in the captions. In agreement with the remarks above, the displacements of the 8th story are less affected by noise than the ones of the 1st story.

Due to the random generation of the applied load, different structural frequencies are excited in each simulation. To provide different scenarios also in terms of sensor accuracy (see also [37]), two SNR levels of 15 dB and 10 dB have been adopted. The SNR is a summary indicator, referring to the overall level of noise corruption for the displacements in one direction. Still referring to Figs. 7 and 8, differences in terms of corruption levels between the two sensor accuracy scenarios can be appreciated.
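The noise model just described can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the SNR is taken as the usual power ratio in decibels computed over all channels of one direction, the environmental and electrical contributions are lumped into a single variance $\sigma^2$, and the clean displacements are a toy signal whose amplitude grows with the floor index. The function name `add_measurement_noise` is hypothetical.

```python
import numpy as np


def add_measurement_noise(signals, snr_db, rng):
    """Add i.i.d. Gaussian noise with one variance shared by all channels.

    signals: array of shape (n_samples, n_channels) for one direction.
    Returns the noisy signals and the noise variance sigma^2.
    """
    signal_power = np.mean(signals ** 2)  # overall power in this direction
    sigma2 = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=signals.shape)
    return signals + noise, sigma2


rng = np.random.default_rng(0)
t = np.arange(667) / 66.7
# Toy clean displacements: same waveform, amplitude growing with the floor.
clean = np.sin(2.0 * np.pi * 1.17 * t)[:, None] * np.arange(1, 9) * 1e-3
noisy, sigma2 = add_measurement_noise(clean, 15.0, rng)

measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
per_floor = 10.0 * np.log10(np.mean(clean ** 2, axis=0) / sigma2)
print(round(measured, 1), np.round(per_floor, 1))
```

Because the variance is the same at every floor, the per-floor SNR increases with the signal amplitude, which mirrors the observation that the 8th story is less affected by noise than the 1st one.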
To build the dataset required for the NN training, the procedure described so far has been adopted for all the damage scenarios. Figures 9 and 10 respectively show the effects

Case 2 (white noise load case)
In the second load case we have accounted for random vibrations caused, e.g., by low-energy seismicity [36]. The applied loads $l = [l^{sh}_i, l^{ax}_i]$, with $i = 1, \ldots, 8$, at each floor and at each time instant are obtained by first sampling values from a normal distribution $\mathcal{N}(0, 10^4)$ and then low-pass filtering them with a "roll-off" set between the frequencies $f_{min}$ and $f_{max}$. Two different scenarios have been considered for the frequency range of the applied excitations: $f_{min} = 15$ Hz and $f_{max} = 17$ Hz; $f_{min} = 5$ Hz and $f_{max} = 7$ Hz. In the first case all the shear modes and the first axial mode have been excited; in the second case, just the first three shear modes and no axial frequencies have been excited, see Table 1. Figures 11 and 12 respectively provide an overview of the simulated forces for the two cases.
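The generation of the filtered white-noise loads can be sketched as below. Several details are assumptions made for illustration only: the second parameter of $\mathcal{N}(0, 10^4)$ is read as a variance (standard deviation 100), the filter is applied in the frequency domain, and the roll-off is taken to be linear between $f_{min}$ and $f_{max}$. The function name `filtered_white_noise` is hypothetical.

```python
import numpy as np


def filtered_white_noise(n_samples, n_floors, rate, f_min, f_max, rng, std=100.0):
    """Gaussian white noise, low-pass filtered with a linear roll-off.

    The gain is 1 below f_min, decreases linearly to 0 at f_max and is 0
    above f_max. Returns an array of shape (n_samples, n_floors).
    """
    raw = rng.normal(0.0, std, size=(n_samples, n_floors))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / rate)
    gain = np.clip((f_max - freqs) / (f_max - f_min), 0.0, 1.0)
    spectrum = np.fft.rfft(raw, axis=0) * gain[:, None]
    return np.fft.irfft(spectrum, n=n_samples, axis=0)


rng = np.random.default_rng(0)
rate, n_samples = 66.7, 667
load_wide = filtered_white_noise(n_samples, 8, rate, 15.0, 17.0, rng)
load_narrow = filtered_white_noise(n_samples, 8, rate, 5.0, 7.0, rng)
print(load_wide.shape, load_wide.std() > load_narrow.std())
```

The narrower pass band removes more of the spectrum, so the second load has a smaller variance and, as noted above, reaches fewer structural modes.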

Dataset composition and NN training
We now detail the construction of the employed datasets and the NN training phase. Each of the two classifiers has been trained on a different dataset, made of instances generated by evaluating the physics-based model for different loading and damage conditions. Each instance is made up of $N_0 = 16$ time series of recorded displacements (in the two directions, for each of the 8 floors) of length $L_0 = 667$. Due to the assumed shear-type behavior of the building, all the points belonging to each rigid floor share the same accelerations and displacements; hence, there is no need for specific strategies to optimally place the sensors in the network, which could instead be of interest in the case of very localized damage events breaking the validity of the rigid floor assumption.
Two global datasets $D^d$ and $D^l$, made of $V = 4608$ instances each, have been generated, and then split into a training, a validation and a test set. For the splitting of the dataset $D^d$ into training $D^d_{train}$, validation $D^d_{val}$ and test $D^d_{test}$ sets, no specific rules are available, and only some heuristics can be used (see, e.g., [27]). We have thus employed 75% of $V$ to train and validate the NN ($V_{train}$ and $V_{val}$), and the remaining 25% ($V_{test}$) to test it. Within the first subset, 75% of the instances have been in turn allocated for training, and the remaining 25% for validation. The final dataset subdivision then reads: $V_{train} = 56.25\%\,V$, $V_{val} = 18.75\%\,V$, and $V_{test} = 25\%\,V$. The splitting of $D^l$ has been done identically. The large number of instances employed for validation and test has allowed us to perform a robust assessment of the NN generalization capabilities. This has been done without limiting the information content that can be employed for the NN training; in fact, the dataset dimensions can be arbitrarily enlarged, if necessary, through a synthetic generation of new instances, still keeping the same proportions.
During the training, an equal number of instances $V^{g}_{train} = V_{train}/G$ related to each damage scenario $g = 0, \ldots, 8$ (the undamaged case has been considered, too, in addition to the $G = 8$ possible cases of damage) has been provided to the NN, to avoid the construction of a biased dataset $D^d_{train}$; the same has been done for $D^l_{train}$. In this way, we prevent the NN from favouring the class labels that have been presented more frequently in the training stage.
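The proportions and the class balancing described above can be made concrete with a short sketch. It assumes 9 classes with 512 instances each (the localization setting with $V = 4608$), and it additionally assumes that the validation and test sets are balanced in the same way as the training set, which is not stated explicitly. The function name `stratified_split` is hypothetical.

```python
import random


def stratified_split(labels, train_frac=0.5625, val_frac=0.1875, seed=0):
    """Split instance indices so that every class keeps the same proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, g in enumerate(labels):
        by_class.setdefault(g, []).append(idx)
    split = {"train": [], "val": [], "test": []}
    for g in sorted(by_class):
        idx = by_class[g]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        n_val = round(val_frac * len(idx))
        split["train"] += idx[:n_train]
        split["val"] += idx[n_train:n_train + n_val]
        split["test"] += idx[n_train + n_val:]  # remaining 25%
    return split


labels = [g for g in range(9) for _ in range(512)]  # g = 0 is the undamaged case
split = stratified_split(labels)
print({name: len(idx) for name, idx in split.items()})
# {'train': 2592, 'val': 864, 'test': 1152}
```

With 512 instances per class, each class contributes 288 training, 96 validation and 128 test instances, so no class label is over-represented in any subset.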
There are no specific rules to set $V^{g}_{train}$ (and, therefore, the overall number $V_g = V/G$ of simulated cases for each damage scenario) a priori. Only a few theoretical studies have provided recommendations for specific cases, see, e.g., [38]; however, they are not applicable to FCNs. In general, the problem complexity and the employed NN architecture must be taken into account on a case-by-case basis. For this reason, we have evaluated the accuracies $A_d$ and $A_l$ of the classifiers $G_d$ and $G_l$ on the validation sets $D^d_{val}$ and $D^l_{val}$, and the training time, at varying $V^{g}_{train}$. We have then chosen the best dataset size according to a tradeoff between the two aforementioned indicators, keeping in mind that the time required to generate a dataset and the time required to train the NN both scale linearly with $V^{g}_{train}$. The accuracy of the classifier $G_d$ is defined as the ratio $A_d = V^{\star}_{val}/V_{val}$, where $V^{\star}_{val}$ is the number of instances of $D^d_{val}$ that are correctly classified by $G_d$; the accuracy $A_l$ of the classifier $G_l$ is defined in a similar way.
Let us now see how we have determined the overall dataset size $V$ by applying the heuristic approach previously discussed. In Fig. 13, the accuracy $A_l$ at varying values of $V_g$ is reported for the damage localization task in case 1. By increasing $V_g$ from 256 to 384, $A_l$ is highly affected, while a further increase yields a smaller gain in accuracy. The non-monotonic variation of $A_l$ with respect to $V_g$ is due to the randomness of the procedure, and in particular to the initialization of the weights of the convolutional filters. For the above reasons, we have adopted $V_g = 512$ during the training phase.

Fig. 13 Damage localization, case 1. Dependence on $V_g$ of the accuracy $A_l$ of the classifier $G_l$
For the damage detection task in case 1, a total number of $V = 9216$ instances has been generated. Half of the instances refer to the undamaged conditions, half to damaged conditions. Each damage scenario is equally represented ($V_g = 512$ instances each). Regarding instead the damage localization task, $V = 4608$ and $V_g = 512$ (including the undamaged case $g = 0$) have been adopted.
Still adopting the discussed heuristic criterion for the determination of the overall dataset dimension, $V = 4096$ has been used for the damage detection task when the white noise load case is treated. Once again, half of the instances refer to the undamaged conditions, half to damaged conditions. Each damage scenario is equally represented ($V_g = 128$ instances each). Regarding the damage localization task, $V = 4608$ and $V_g = 128$ (including the undamaged case $g = 0$) have been adopted.

Classification outcomes
We now report the numerical results obtained for the two load cases, and for the two required tasks of damage detection and damage localization. The obtained classification outcomes are affected by the NN architecture, either with one or two convolutional branches, depending on whether the horizontal and vertical sensing are both considered or not. In particular, when treating the damage localization task in presence of the white noise load condition, we will also try to assess the impact of each input channel F n 0 on the overall NN accuracy.
Useful indications about the quality of the training can be derived from the behavior of the loss functions $J_d(Y, p)$ and $J_l(Y, p)$ (see Eq. (6)) of $G_d$ and $G_l$, and of the accuracies $A_d$ and $A_l$ on the training and validation sets ($D^d_{train}$ and $D^d_{val}$ for $G_d$; $D^l_{train}$ and $D^l_{val}$ for $G_l$) as a function of the number of iterations. The latter depends on both the number of epochs and the minibatch size chosen for the training.

Table 3 Damage detection, case 1

To evaluate the NN performances, the adopted indices are still $A_d$ and $A_l$, yet evaluated on $D^d_{test}$ and $D^l_{test}$. These indices are always compared against the ones produced by a random guess, equal to 0.5 for $G_d$, and to $1/9 \approx 0.111$ for $G_l$. For the damage localization case, the misclassification is measured by a confusion matrix in which the rows correspond to the target classes and the columns to the NN predictions.
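The evaluation indices introduced above can be written down in a few lines. The sketch below uses the stated convention (rows are target classes, columns are predictions) and the definition of accuracy as the fraction of correctly classified instances; the labels in the usage example are made up for illustration, and the function names are hypothetical.

```python
def confusion_matrix(targets, predictions, n_classes):
    """Rows correspond to target classes, columns to predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for target, predicted in zip(targets, predictions):
        matrix[target][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of correctly classified instances (trace over total count)."""
    correct = sum(matrix[g][g] for g in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def random_guess_accuracy(n_classes):
    """Expected accuracy of a uniform random guess on balanced classes."""
    return 1.0 / n_classes


# Made-up labels for a toy three-class problem.
targets = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(targets, predictions, 3)
print(cm, accuracy(cm))
print(random_guess_accuracy(2), round(random_guess_accuracy(9), 3))
```

The off-diagonal entries of a row show which classes a given damage scenario is confused with, which is the information exploited when discussing the localization errors.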

Damage detection and localization in case 1 (sinusoidal load case)
In Table 3 the accuracies $A_d$ of $G_d$ on $D^d_{test}$ for the two considered noise levels (SNR = 15 dB and SNR = 10 dB) are reported. NN architectures with both one and two convolutional branches have been tested.
The classifier $G_d$ reaches $A_d = 0.879$ for SNR = 15 dB and $A_d = 0.775$ for SNR = 10 dB. These outcomes, obtained on high-noise datasets, show the potential of the proposed approach in view of real engineering applications. Indeed, noise is a principal concern especially when pervasive and low-cost microelectromechanical systems (MEMS) sensor networks are employed [39], so that the possibility to handle it through FCNs may enhance the application of MEMS networks. Moreover, thanks to our procedure, we have been able to avoid the data pre-processing required by any ML approach based on problem-specific features.

Figure 14 reports the evolution of the training and validation loss for the datasets with SNR = 15 dB and SNR = 10 dB. The iteration number accounts for the number of times the NN weights are modified during the training process. The depicted training and validation loss functions refer to the case in which a two-branch convolutional architecture has been employed to detect damage. The several spikes observed both in the loss and accuracy graphs are due to the stochastic nature of the training algorithm. During the early stages of the training, the NN displays the most significant gains in terms of classification accuracy, while further increasing the number of iterations only yields a limited effect on the generalization capabilities of the NN. Due to the lack of improvements, the early-stopping criterion has finally terminated the training.
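An early-stopping rule of the kind mentioned above can be sketched as follows. The specific criterion adopted for the training is not detailed here, so this is a generic patience-based rule: the `patience` and `min_delta` parameters, the class name and the loss history in the usage example are all hypothetical.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` checks."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.wait = 0
        else:
            self.wait += 1        # no improvement at this check
        return self.wait >= self.patience


def stopping_iteration(val_losses, patience):
    """Index of the check at which training would stop, or None."""
    stopper = EarlyStopping(patience=patience)
    for i, loss in enumerate(val_losses):
        if stopper.should_stop(loss):
            return i
    return None


history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]  # made-up validation losses
print(stopping_iteration(history, patience=3))  # 5
```

In this toy history the best loss (0.7) is reached at the third check; the three following checks bring no improvement, so training stops at index 5.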
Moving to the damage localization task, Table 4 collects the outcomes of $G_l$ on $D^l_{test}$ obtained for the two different noise levels. The results show that the NN performance benefits from the employment of a two-branch architecture: $A_l$ increases, compared to the best outcome of the single-branch architecture, from 0.769 to 0.812 for the SNR = 15 dB case, and from 0.654 to 0.707 for the SNR = 10 dB case. This means that the NN has succeeded in performing a data fusion of the extracted information for the sake of classification.
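The data fusion performed by a two-branch architecture can be illustrated with a forward pass written in plain NumPy. This is a schematic sketch and not the network used in this work: the filter counts, kernel sizes, random weights, the absence of normalization layers and the use of global average pooling before concatenation are assumptions made for illustration, and all function names are hypothetical.

```python
import numpy as np


def conv1d_relu(x, w, b):
    """'Valid' 1D convolution along time followed by ReLU.

    x: (length, c_in); w: (kernel, c_in, c_out); b: (c_out,).
    """
    kernel = w.shape[0]
    # windows has shape (length - kernel + 1, c_in, kernel).
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel, axis=0)
    return np.maximum(np.einsum("tck,kco->to", windows, w) + b, 0.0)


def init_branch(rng, c_in, filters=(16, 32, 16), kernels=(8, 5, 3)):
    """Random weights for three stacked convolutional layers (hypothetical sizes)."""
    params = []
    for c_out, k in zip(filters, kernels):
        params.append((rng.normal(0.0, 0.1, size=(k, c_in, c_out)), np.zeros(c_out)))
        c_in = c_out
    return params


def branch(x, params):
    """Convolutional branch ending with global average pooling over time."""
    for w, b in params:
        x = conv1d_relu(x, w, b)
    return x.mean(axis=0)  # fixed-size feature vector, whatever the length


def two_branch_fcn(x_sh, x_ax, p_sh, p_ax, w_out, b_out):
    """Concatenate the features of the two branches and apply a softmax layer."""
    features = np.concatenate([branch(x_sh, p_sh), branch(x_ax, p_ax)])
    logits = features @ w_out + b_out
    e = np.exp(logits - logits.max())
    return e / e.sum()


rng = np.random.default_rng(0)
p_sh, p_ax = init_branch(rng, 8), init_branch(rng, 8)
w_out, b_out = rng.normal(0.0, 0.1, size=(32, 9)), np.zeros(9)
x_sh = rng.normal(size=(667, 8))  # horizontal displacements, 8 floors
x_ax = rng.normal(size=(400, 8))  # vertical displacements, different length
probs = two_branch_fcn(x_sh, x_ax, p_sh, p_ax, w_out, b_out)
print(probs.shape, round(float(probs.sum()), 6))
```

Since each branch is reduced to a fixed-size vector before the concatenation, the two inputs may have different lengths and sampling rates, and the final layer sees the features of both directions at once.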
The values of $A_d$ and $A_l$ are quite close, despite the greater complexity of the damage localization problem; this might be due to the intrinsic capability of the FCN to detect correlations between different sensor recordings, allowing us to perform a correct damage localization. Figure 15 reports the evolution of the training and validation loss functions on $D^l_{train}$ and $D^l_{val}$ for the datasets with SNR = 15 dB and SNR = 10 dB, in the case where a two-branch convolutional architecture has been employed. Compared with Fig. 14, a smaller difference between training and validation, in terms of loss and accuracy, can be observed. This is due to the greater complexity of the damage localization task, which requires the computational resources of the NN to be exploited entirely. Indeed, the same number of filters $N_1$, $N_2$ and $N_3$ has been used for both the [...] Table 3, would not be affected by reducing the number of filters. This conclusion can be reached by looking at Fig. 14 and observing that, during the last stages of the training, $A_d$ on $D^d_{train}$ is always greater than the one obtained on $D^d_{val}$.

Table 6 Damage localization, case 2. Accuracy $A_l$ of the classifier $G_l$ evaluated on $D^l_{test}$

In Fig. 16, the confusion matrices related to the two datasets (SNR = 15 dB and SNR = 10 dB) are reported. Most of the errors concern the classification of the damage scenarios in which the inter-story stiffness of the highest floors has been reduced, as shown by the entries of the 7th and 8th rows and columns of the matrices. This outcome is not surprising if we consider that these damage scenarios only induce small variations in the shear vibration frequencies. Moreover, by looking at Figs. 9 and 10, we can remark that the time evolution of the structural motions under these damage scenarios cannot be easily distinguished from the undamaged case.

Damage detection and localization in case 2
We now consider the outcomes of the trained classifiers in the case where a random disturbance is applied to the structural system. Regarding the damage detection task, with this type of excitation the NN is able to distinguish between undamaged and damaged instances almost perfectly (see Table 5). Indeed, $A_d = 0.999$ and $A_d = 0.998$ have been reached by the two-branch convolutional architecture when $f_{min} = 15$ Hz and $f_{max} = 17$ Hz, or $f_{min} = 5$ Hz and $f_{max} = 7$ Hz, have been selected as frequency ranges for the applied lateral and vertical forces. We next consider the NN outcomes for the damage localization task. With this type of excitation, the NN is able to accomplish an extremely accurate classification of the damage scenarios, reaching $A_l = 0.986$ and $A_l = 0.993$ when $f_{min} = 15$ Hz and $f_{max} = 17$ Hz or $f_{min} = 5$ Hz and $f_{max} = 7$ Hz have been used, respectively. In the former case, the best classification performance has been obtained by the two-branch convolutional architecture, as shown in Table 6. In the latter case, the NN employing as input $F^* = \{u^{sh}_i\}_{i=1}^{8}$ provides the best classification result. The better performance of the NN employing $F^* = \{u^{sh}_i\}_{i=1}^{8}$ rather than $F^* = \{u^{ax}_i\}_{i=1}^{8}$ is likely due to the fact that in this latter case no axial frequencies have been excited by the applied load, as remarked in the section Case 2 (white noise load case). However, this fact also shows that the data fusion operated by the two-branch convolutional architecture has been only partially able to select the most important information required for the damage localization task. Nevertheless, very good results have been reached by also employing $F^* = \{u^{ax}_i\}_{i=1}^{8}$ (see Table 6).

Accuracy $A_l$ of the classifier $G_l$ evaluated on $D^l_{test}$. Different numbers $N_0$ of input channels $F^*$, related to $u^{sh}_i$, are employed. Here, $f_{min} = 5$ Hz and $f_{max} = 7$ Hz
We highlight the effect of each incoming signal on the classification outcomes (see Table 7), since the accuracy $A_l$ on $D^l_{test}$ changes for different numbers of input signals $N_0$. The results refer to the case in which only some of the displacements $u^{sh}_i$, $i = 1, \ldots, 8$, have been considered, with $f_{min} = 5$ Hz and $f_{max} = 7$ Hz. The corresponding confusion matrices are sketched in Fig. 17, showing that the classification error related to a damage scenario $g$ is reduced when the corresponding $u^{sh}_g$, that is, the signal acquired on the floor whose inter-story stiffness has been reduced, is used as input for the NN.

Conclusions
In this paper, we have investigated a new strategy for real-time structural health monitoring, treating damage detection and localization as classification tasks [3], and framing the proposed procedure in the family of SBC approaches [4]. We have proposed to employ fully convolutional networks to analyse time series coming from a set of sensors. Fully convolutional network architectures differing in the number of convolutional branches have been exploited to deal with datasets including time signals of different length and sampling rate. Convolutional layers have been shown to enable the automatic extraction of features to be used for the classification task at hand. The neural network architecture has been trained in a supervised manner on data generated through the numerical solution of a physics-based model of the monitored structure under different damage scenarios.
In the considered numerical benchmarks, we have obtained extremely good performances concerning both damage detection and damage localization, even in the presence of noise, when the applied loads can be characterized either (i) in terms of a few (a priori, random) frequencies, or (ii) by a higher number of frequencies, within a given range. Especially in the second case, the outcomes of the NN classifier have shown the potential of the proposed procedure in view of the application to real-life cases.
In future works, we aim to employ the proposed architecture to deal with data coming from real monitoring systems, tackling the main limitation of the proposed procedure, which concerns the mimicking of the real structural response. This is a well-known problem in the machine learning community [40]. By coupling branches of recurrent layers to the proposed convolutional ones, we expect to further increase the NN performance. As further steps, we will try to exploit model order reduction techniques for the dataset construction, to extend the proposed methodology to more complex structural configurations and damage scenarios, and to design the set of sensors according to a Bayesian optimization technique [20,41,42].