A methodology to assess and improve the physics consistency of an artificial neural network regression model for engineering applications

In recent times, artificial neural networks (ANNs) have become the popular choice of model for researchers while performing regression analysis between inputs and output. However; in scientific and engineering applications, developed ANN regression model is often found to be inconsistent with the physical laws. This is due to the fact that ANNs are purely based on data and do not have any understanding of underlying physical laws. Alternate ANN frameworks like PGNN (Physics guided neural network) has been proposed in literature which incorporate physics loss function in the overall loss function to partially alleviate this issue. However, these frameworks don’t evaluate the physics consistency of relationship between inputs and output mapped by the ANN model which is the source of all physics inconsistencies. Hence, the present paper presents a methodology to assess and improve the physics consistency of the input output relationship mapped by the ANN regression model. The developed methodology can therefore be used to develop physics consistent ANN regression model. The heart of the methodology is an inferencing algorithm which interprets the input output relationship mapped by the ANN regression model. The inferencing algorithm is based on Taylor series and decomposes the ANN regression model into several region-wise polynomial models. Moreover, the inferencing algorithm can also find regions of singular zones in the ANN model predictions. The region-wise polynomial from inferencing algorithm can be used to assess the physics consistency of the ANN model. In the region of physics inconsistency, additional data points can be added and the ANN model can be re-trained. In cases, where the addition of data points is not possible, a physics based loss function can be used. The developed methodology is illustrated using several datasets. The developed methodology will help engineers and researchers built physics consistent ANN regression models.


Introduction
Artificial neural networks (ANNs) have been increasingly used in various fields such as engineering, geology, medicine, business, and experimental physics. Especially in science and engineering application, ANN regression models have been increasingly used to model complex relationship between various physics quantities. One of most common usage of ANN regression models in engineering applications are as surrogate models of physical processes. ANN surrogate models are generally developed either using experimental data or high fidelity computation-aided engineering (CAE) model data. Few of the applications of ANN regression models in engineering and science applications are detailed here. Wu et al. [1] developed an ANN surrogate model of engine air flow rate as a function of engine speed, manifold absolute pressure, intake and exhaust camshaft phasing and this model helped in accelerating the engine design process. Engine torque was maximized at elected engine speeds at a wide throttle opening by using ANN surrogate model of volumetric efficiency [2]. Meyer and Greff [3] established through investigations that ANNs have capability to replace conventional look-up tables of engine electronic control unit (ECU). Wendeker and Czarnigowski [4] proposed the use of an adaptive control system and a trained ANN surrogate model to minimize the error in the estimation of the required air-fuel ratio by an engine for stationary and dynamic operating conditions. An On-Board Diagnostic fault detection system using an ANN was developed by Grimaldi and Mariani [5]. Nicodemus and Sudipto [6] proposed a novel architecture of ANN regression model with end to end neuron connection to predict the finite element analysis (FEA) stress of an automobile connecting rod. An ANN regression model to predict to wear and tear of aircraft parts was developed by Paul et al. [7]. Recently, ANN regression models have also been used in string theory formulation [8,9]. In all these applications [1][2][3][4][5][6][7][8][9], ANN regression models were used to model physical quantities which are continuous and differentiable and hence these studies used either tangent hyberbolic (tanh) or sigmoid activation function which results in smooth and differentiable ANN output function.
Furthermore, in these applications, it is essential that developed ANN model does not violate any known physics laws. However, since ANNs are unaware of the real-world physics which are being modelled and their predictions are only based on the available data it is shown in literature that ANNs can sometimes learn spurious relationships due to scarcity of data and noise. Liano [10] was among the first researchers to highlight that ANN are susceptible to learning from noise in data. Furthermore, Karpatne et al. [11] has shown that ANNs will learn relationships which are not consistent with physical laws but look deceptively good on both training and test when size of data is small.
To overcome these limitations, alternate frameworks of ANNs have been proposed in literature. Karpatne et al. [11] proposed a novel framework for combining scientific knowledge of physics-based models with ANNs and coined the term physics guided neural networks (PGNN) for the framework.. Two steps are generally involved in the framework of PGNN i.e. (a) constructing hybrid-physics-data models where output of the physical based model is also taken as an input (b) using physics-based loss functions which penalizes the ANN if physics laws are not maintained. Jia [12] proposed a similar physics guided NN framework for time series data called physics guided recurrent neural networks (PGRNN) for transient lake temperature modelling. Zhang et al. [13] presented the physics-guided convolutional neural network (PhyCNN) for data-driven seismic response modelling. Wang et al. [14] proposed a PGNN methodology for modelling machine tool wear. Another class of neural network framework which are based on solving the physical laws rather than learning physics from training data are called physics informed neural networks (PINN). PINNs using the universal function approximator property of ANNs solve complex partial differential equation (PDE) systems [15,16] without any data. The performance of PINNs is comparable to many established PDE solvers. However, PINN face similar disadvantages as physicals based models and require many calibrated parameters to capture the complex physics.
The PGNN frameworks are based on penalty loss for consistency of physical parameters outside the input variables and are not concerned about output relationship with input. However, the source for all physics inconsistencies is the improper relationship mapped by ANN between input variables and output variables. The complete physical relationship between inputs and output is generally unknown; however, partial knowledge from available literature and expert opinion is available for most of these relationships. For example, the physics relationship between air temperature and lake temperature in lake temperature modelling [11] is provided by Yu et al. [17]. But since ANNs map highly non-linear relationships in multidimensional input space it is very difficult to infer the mapped relationship of ANN between input and output variables and to assess whether the known partial knowledge is consistent with the developed ANN model or not.
There has been significant research and many developed methods to infer input output relationship of ANN classification model especially for computer vision problem whereas methods to infer input output relationship of ANN regression model are very scarce. Several authors proposed gradient based backpropgation algorithms [18][19][20][21] to infer feature importance of ANN classification model. It should be noted that the above algorithms [18][19][20][21] only work for ReLU activation function. However, most of ANN regression models [1][2][3][4][5][6][7][8][9] in engineering applications use tanh or sigmoid function hence these algorithm [18][19][20][21] cannot be used. Another method to visualize the pixel wise contribution to a class prediction is layer-wise relevance propagation (LRP) [22]. Sundarajan et al. [23] proposed a attribution method based on integrated gradients. Recently, Chattopadhyay et al. [24] considered ANN architecture as a structural causal model, and they presented a methodology to compute the causal effect of each feature on the output. These algorithms [22][23][24] provide a numerical value for contribution for each input feature for a particular output so these algorithms cannot be used to assess physics consistency of input output relation of ANN regression model. Another class of algorithm which interpret the input output relationship of ANN models are the methods based on perturbation of a particular output instance and building a local model around that instance. LIME (Local Interpretable Model-Agnostic Explanations) algorithm [25] is most notable among these algorithms. LIME algorithm is based on sampling the input space in neighbourhood of output point and building local interpretable model by minimizing weighted mean square loss function with weightage function based on distance metric. LIME is model agnostic i.e. no restricted on model being interpreted which could random forest, ANN, SVM (support vector machine) or any other machine learning(ML) model. Also LIME can provide explanations based on variables which need not be the input variables to ML model. LIME works best for classification where only relative importance of inputs are required for an output prediction. Whereas for ANN regression models the model characteristics is required to be done in certain region and LIME suffers from disadvantages of non-unique local model and local model dependence on parameters of weightage function.
The main aim of the work is to present a methodology to develop an ANN regression model which is consistent with the available physics knowledge. To do this an algorithm is required which would interpret the input output relationship mapped by the ANN regression model. Furthermore, ANN maps vastly different input and output relationships in different input regions. Hence, in the present work the input output relationship of ANN regression model is inferred by using a multiple region wise polynomial models for ANN regression model. One possible method for developing these polynomials is by fitting the polynomial function over the ANN function values; however, there would be error between ANN and polynomial function and polynomial function may not capture the relationship mapped by the ANN model. Therefore, Taylor series [26] will be the basis for the developed the region wise polynomial function. Taylor series since it is based on differentials of the function they can capture the primary relationship even with error between Taylor polynomial and ANN function. Taylor series share one key property with ANNs that is, both are universal approximators. In fact, both Taylor series and ANN have been used to same application of solving complex partial differential equations [15,16,[27][28][29]. A key differentiator between Taylor series and ANN is that ANNs are flexible to approximate any function over any scale whereas Taylor series can approximate any function only inside the radius of convergence. A polynomial similar to Taylor polynomial can be generated using LIME algorithm but a key difference is that LIME builds the polynomial on top of ANN model by sampling input space in the vicinity of point of interest i.e. local model error behaviour would depend on model building process which include selection of sampling points and weightage function parameters. Whereas the Taylor polynomial in the current methodology doesn't another local model but characterizes the ANN model based on its local differentials at point of interest. Hence the local behaviour error profile will depend on characteristics of ANN model.

Methodology for developing a physics consistent ANN regression model
The traditional process of developing an ANN surrogate model in engineering applications involves the following steps as shown in Fig. 1. First the data points are sampled within the input space using either a latin hypercube or full factorial or other sampling techniques. Then the output data is generated using test procedure or CAE methodology for the selected sampled input points. The ANN regression model is fitted over the generated data. The predicted ANN output is correlated with generated data and regions in input space where accuracy is less, additional data are generated in that region. The process is iterated till the acceptable performance of model is reached. In this process, the accuracy of the predicted output is only considered.
In the proposed methodology, the ANN regression model will be optimized for prediction accuracy, physics consistency as well as consistency of singular zones in the input space as shown in Fig. 2. Mathematically, singular zones are defined as points where the function is not well-behaved. Physically, singularities are points in the input space where the output physical quantity behaves erratically and rapidly change in output is observed due to a very small change in one or more input variables. One of the well known physicals examples of singularity is resonance in vibrations. Generally, designers of engineering systems try to avoid running the physical system near singularity or design robust control system near singularity; hence identification of singularity region in the input space is of paramount importance to the user. In next section, brief details regarding properties of Taylor polynomial of ANN function will be explained. Then the ANN inferencing algorithm is explained in two sub-sections. Finally applications of developed methodology on datasets will be presented.

Properties of Taylor polynomial of ANN function
Given a training dataset with inputs x = {x 1 , x 2 , x 3, . . . . . . . . . . . . x k } and target output t( x) , let the ANN regression output function be NN( x ) and error function between ANN prediction and actual output be δ( x) i.e.
The properties of the output function of ANN (NN( x )) depends on the activation function used in the ANN. Using a finite higher order differentiable activation functions like tanh or sigmoid yields an ANN output function which is continuous and infinitely differentiable. Whereas using activation function like ReLU yield an ANN function which has zero higher order differentials. Since Taylor polynomial is based on higher order differentials, the present inferencing algorithm can be only used with activation functions like tanh and sigmoid. This is not a major hindrance since ReLU activation function is not used for ANN regression models in science and engineering applications. The output function predicted by ANN (NN( x )) using a finite higher order differentiable activation function can be approximated using Taylor polynomial of degree p at point x o and is given as The Taylor polynomial is only accurate if the computation point ( x ) lies within the radius of convergence of development point ( x o ). The various differential of NN( x ) with respect to inputs x can be computed by using Tensorflow gradients function [15] which is an automatic differentiator. One thing to note here is that Taylor approximation is for the ANN function which includes the error function ( δ x ) along with output function. It is well known fact that higher ANN function differentials don't correlate well with target function differentials even when using differentials in training (Sobolov training [30]). However, as explained above the ANN differentials are being used to characterize the ANN function and not target function and hence higher differentials trust worthiness is not an issue in the study. Several instances of use of higher order ANN differentials for reconstruction of ANN function has been shown throughout the manuscript even for datasets with noise. In subsequent sections properties of ANN differentials and Taylor polynomial are illustrated using examples. For the sake of simplicity examples used below are functions of single input but the same principles can be extended to multi inputs functions.
There are many aspects of the ANN model which should be consistent with the physics laws; however, the current paper only deals with physics consistency in relationship between input and output variables mapped by ANN regression model. Generally, in engineering applications the ANN regression model are used to compute the output variable variation in an interested region of input sub-space for decision making process. Hence the relationship between input and output variable are required for certain region for ANN regression model instead of a single point as in case ANN classification model. Therefore, the methodology to develop Taylor polynomial approximation for certain region of input sub-space is outlined in next sections. Each coefficient (sign and value) of Taylor polynomial explains relationship between input and output variable. These relationships can be checked for physics consistency based on available literature or expert opinion. With help of couple of examples the relationship explained by each Taylor coefficient will be illustrated here. Let t is target variable be function of two variables {x 1 , x 2 , } i.e. t = NN ( x 1 , x 2 ) and let the p th degree Taylor polynomial C 00 + C 10 x 1 + C 01 x 2 + C 20 x 2 1 + C 11 x 1 x 2 + C 02 x 2 2 . . . . . . . . . . . . . . . . + C 0p x p 2 which approximates target function in certain region of input space. Positive sign for Taylor coefficient C 10 indicates target variable t increases with increase in x 1 and decreases with decrease in value of x 1 whereas negative sign indicates inversely proportional relationship between t and x 1 in the approximated region. The coefficient C 11 explains the interaction effect of x 1 and x 2 on t i.e. if coefficient is positive then the target value increases with increase in x 1 and x 2 but simultaneous decrease in x 1 and x 2 will also yield increase in value of t. C 20 shows the effect absolute value of x 1 on t. The values of coefficient demonstrate the relative weightage of these different effects. The explanations can be checked with known knowledge of system being modelled. For example, the ANN surrogate model of Ref. [1], models engine air flow rate and one of its input is engine speed. It is a well known fact that the engine air flow rate increases with engine speed irrespective of other input parameters values and the consistency of this physics relationship can be checked by the Taylor polynomial. An more detailed illustration of this physics inferencing is presented in Sect. 5.3.

Higher order differentials of ANNs
In order to understand the behaviour of higher order differentials of ANN function, let's take a simple ANN with one input and one hidden layer of " l" neurons and one output. Let's weights and biases for input layer to hidden layer be given as w ih1 , w ih2 , . . . . . . . . . w ihl and b ih1 , b ih2 , . . . . . . . . . b ihl and hidden layer to output layer be given as w ho1 , w ho2 , . . . . . . . . . w hol and b ho1 , b ho2 , . . . . . . . . . b hol . Let the hidden layer have an activation function f 1 (.) and output layer is a linear layer. The output of each of l neurons is given as The output function NN x can be given as It can be seen from Eq. (2) that for infinitely differentiable activation functions ( f 1 (.)) like tanh or sigmoid function the ANN function (NN( x )) will be infinitely differentiable and the function higher differentials will never vanish to zero. This is true even when data is generated from simple functions like x 2 and x 5 whose higher differentials vanish to zero. For activation function like ReLU whose higher order differentials from second differential is zero, the ANN function will have higher order differentials as zeroes. The same conclusions can be extended to ANN with multiple hidden units and in practice it holds true.
To visualize the behaviour of higher order differentials of ANN function, two artificial datasets were used to train ANN with ReLU and tanh activation function. 100 uniform data points for x were taken from interval [− 4, 4] and target output y is generated using two analytical functions Sin(x) and x 2 . The ANN used here has 3 hidden layer with each layer having 15 neurons. The datasets can be modelled with lesser hidden layers and lesser number of neurons but these values were chosen so the observed behaviour would be valid for most of practically ANNs used for regression. For all datasets in the manuscript, the inputs and output values are normalized to have mean value of zero and variance value of one before feeding to the ANN model. Total data was split into 70-15-15 training, validation and test data sets and MSE along with regularization loss was taken as loss function and ANN was trained using gradient decent. It is well know that the first order differential is discontinuous and its higher order differentials are zero for output function of ANN while using ReLU activation function. This has been confirmed by the results from example ANN model using ReLU activation function but has not been shown in manuscript for sake of brevity. Hence, Taylor polynomial cannot be used to approximate ANN function when using ReLU activation function. Figure 3 shows the results from the ANN function while using tanh activation function which includes error function and some higher derivatives. It can be observed from Fig. 3 that the derivates from ANN function are continuous when using tanh activation and hence the ANN function can be approximated using Taylor polynomial.

Degree of Taylor polynomial
A zero degree Taylor polynomial is the function value at the development point itself and has zero radius of convergence. Increasing the degree of Taylor polynomial increases the radius of convergence at the development point. Radius of convergence is the distance from development point beyond which the Taylor series doesn't converge to the function value. However, computing radius of convergence is computationally expensive and hence a simple method would be used to compute the acceptable region of approximation by selecting a threshold error and comparing it with error between Taylor polynomial and ANN function. If the error is less than the threshold error than Taylor polynomial can be used to approximate in that region.
To understand the effect of degree of Taylor polynomial on approximation capabilities of Taylor polynomial, an artificial data set similar to previous studies [31] is generated with 200 uniform points in the range of x of [− 7.5,7.5] and target output y is generated using Sin(x) x . Same 3 layers ANN architecture with tanh activation function is used. The error between Taylor polynomial and ANN function for different degrees of Taylor polynomial along with error function is plotted in Fig. 4. It can be seen from Fig. 4 that error decreases as degree Taylor polynomial increases. The acceptable region of approximation of particular degree of Taylor polynomial can be accessed by taking a suitable threshold error such as 0.02 or 0.05.
For a multi-variable input ANN function, the acceptable region of approximation may be different in different input variable directions. Also it is very difficult if not impossible to visualize the error function or ANN function as function of input variables for a multi-variable.

Singularities in ANN functions
Taylor series cannot approximate function accurately in the singularity region. Radius of convergence of Taylor polynomial is severely reduced if development point is near the singularity. To show the Taylor polynomial behaviour near singularity, an artificial data with singularity is generated with 100 uniform points in the range of x of [− 4, 4] and target output y is generated using . Same 3 layers ANN architecture with tanh activation function is used. The generated dataset and error function are shown in Fig. 5. It can be seen from Fig. 5a that the singularity zone arises around x =1 and also it difficult to predict the singularity just from observing error function plot in Fig. 5b.
Taylor polynomial of 7th degree is used to approximate the ANN function with 3 different development points ( x d ). The three development points are chosen such that first point is far away from singularly, second point is near to the singularity and third point is at the singularity. The error between ANN function and Taylor polynomial for  (2022) 9:11 these 3 development points is shown in Fig. 6. For the development point which is far from singularity, the acceptable region of approximation is more in the direction opposite to singularity as compared to the direction towards the singularity. Moreover, it can be observed the acceptable region of approximation significantly reduces as distance between development points and singularity reduces. Taking the development point at singularity the acceptable region of approximation is very small and the error increases very sharply even at a very small distance from development point. Hence, using the Taylor polynomial the singularities in ANN function can be identified.

ANN regression model inferencing algorithm
In this section, the ANN inferencing algorithm will be presented. The inference algorithm will decompose ANN function into several region-wise Taylor. The algorithm is be divided into two major sections namely (a) computation of acceptable region of approximation of a Taylor polynomial for a given development point, and (b) strategy for selection of development points to envelope the input sub-space.

Computation of acceptable region of approximation for a ANN Taylor polynomial with a given development point
The computation of acceptable region of approximation is significantly complicated for multi variable function as compared to a single variable function. For even a single variable function as seen from Fig. 6 the acceptable region of approximation is different in different directions i.e. x d + x and x d − x . Given a multi-variable ANN functions To illustrate the behaviour of error function between ANN function and Taylor polynomial around the development point for a multi-variable function a two variable data set is generated similar to Ref. [32]. A uniform grid of 400 points are generated within the range of x and y taken as [-2, 2] and output target points z is generated using the function x * exp(−(x 2 + y 2 )).Same ANN architecture with tanh activation function is used as before. The output function and error function for two variable dataset is shown in Fig. 7.
Let the threshold error for computing acceptable region of approximation be 0.005. In this study, absolute error is considered as threshold error, but the algorithm is also applicable when relative error or percentage error is considered as threshold error. The Taylor polynomial is computed at the input data points and error between ANN function and Taylor polynomial as a function of distance from development point is shown in Fig. 8a. It can be seen as the distance from development increases the error also increases. Furthermore, for the same distance from development point different values of errors are observed based on the direction of the evaluation point from development point. In Fig. 8b, the data points are shown for which the error is within the threshold error. An important thing to observe from Fig. 8b is that the

Algorithm:
Step 0: Check all inputs for consistency Step 1: Repeat step 2-3 for i=1 to k (number of input variables) Step 2: Compute the error of the Taylor polynomial in dimension around the development point i.e. compute error or Taylor polynomial at point's { 1 , 2 , 3, … , + Δ … … … } (Δ should take positive and negative values) Step 3: Compute maximum Δ within which the error is less than threshold error in both positive and negative directions of Δ and select minimum of those two values and it is the maximum or limiting traversal length in each input i direction ( ) ]. If Taylor polynomial error for the grid is within threshold error then stop the algorithm and this region is acceptable region of approximation else go to next step. For simplicity let us represent the traversal length in each direction as and since this can be considered as zero iteration the limiting lengths can be denoted as 0 = .
Step 5: Generate two acceptable and not-acceptable length arrays and which represents the traversal lengths in each variable direction within threshold error and greater than threshold error. Initialize as zeroes and .
Step 5: Repeat step 6 to 8 (iteration j) until convergence criterion is met Step 6: Generate a grid of data points using ANN function for region 1  The convergence criteria can be a limit on maximum iterations or a minimum limit on distance between xc ai and xnc ai . If the limit on maximum iterations has reached and still algorithm couldn't find any non-zero region within the threshold error or detected region size is too small based on criteria then the development point can be considered as a singularity. The algorithm will be illustrated on two variable dataset at development point (0, 0). The algorithm is illustrated on the two variable data. Initially, 5th degree Taylor polynomial is computed in x and y direction by keeping other coordinate(s) constant at the development point. The error limits in x and y directions i.e. xlim 1 and xlim 2 are estimated as 0.375 and 0.8 for threshold error of 0.05. The error between ANN function and Taylor polynomial as a function of distance from development point in x and y direction is shown in Fig. 9a, b. A uniform grid of 100 points is The length of traversal in positive and negative direction is kept same in the present algorithm to simplify the strategy of selection of development points which is presented in next section. Furthermore, the splitting of limiting lengths by two is chosen to get convergence quicker with iterations and to avoid stalling of algorithm in singularity zones.
The algorithm is also applied to 5 variable dataset used in literature [33].Generally, most of the ANN regression models in science and engineering applications have input variables less than 10 and in most cases there are less than 5. The input range of each of the 5 variables is taken as [0, 1] and relationship between target output and the inputs is given by t = 10sin(π x 1 x 2 ) + 20(x 3 − 0.5) 2 + 10x 4 + 5x 5 .1200 point are sampled from input space using latin hypercube sampling and output values for these inputs points. The same 3 layer architecture with tanh function is used to train the data. The proposed algorithm will be used to find the region of acceptable approximation for 3 rd

Strategy for selection of development points
This section details a strategy for selection of development points to approximate a given input space with region wise Taylor polynomials. The algorithm presented here is geared towards minimizing the total number of iterations and error between ANN function and Taylor polynomial and is not concerned with the number of regional Taylor polynomials required to approximate the input space. The inputs to algorithm are input space to be approximated, threshold error, criteria for minimum acceptable region below which the region is to be considered as a singularity, minimum and maximum value of degree for Taylor polynomial and number of sample to be drawn for initial step. The proposed algorithm is detailed below.  Algorithm: Step 0: Check all inputs for consistency. Let the input space to be approximated be given as Step 1: Select n samples of developments points inside the input space based on either latin hyper cube sampling or random sampling or any other sampling method. Let each development point be { 1 , 2 , 3 , … … … … } where s is from 1 to n.
Step 2: Repeat step 3-4 for value of Taylor degree of polynomial from minimum to maximum Step 3: Compute the acceptable regions of approximation for all the sample points. Remove the sample which may be a singularity regions based on criteria for minimum acceptable region. Let an acceptable region for a sample be represented as 1  . Compute the vector = { 1 − 1 , 2 − 2 … … . . … , − }.
Step 5: Select the degree of Taylor polynomial as the degree with maximum Euclidean norm of the vector i.e. √( 1 − 1 ) 2 + ( 2 − 2 ) 2 … … … … . +( − ) 2 Step 6: Repeat step 7 for l=1 to k (number of dimensions of input data) Step 7: Compute spl=ceil( Step 11: If the computed error is less than the threshold error then output acceptable region of approximation for the development point as the expected value and go to next iteration if final iteration go to step 12 Else compute the acceptable region of approximation of the development point If the computed region is smaller than criteria for minimum acceptable region then consider region as singularity and go to step 12 Else if computed acceptable region of approximation is smaller than criteria for minimum acceptable region the go to step 0 with input space to be approximated as expected acceptable region of approximation Step 12: Check if whole input sub-space is enveloped if so output Taylor polynomial for each region. The basic idea behind the algorithm is to sample some development points in the input space in order to estimate the average region of approximation and use this average region of approximation as expected region of approximation to distributed development points uniformly across the input space. For these uniformly distributed development points if the actual region of approximation is larger than the average region of approximation then for these development points the region of approximation can be taken as average region of approximation. For those development points whose actual region of approximation is smaller than the average region of approximation the average region of approximation is divided into further sub-domains. This sub division process is done by considering the average region of approximation around the failed development point as the region to be approximated and samples points can be taken in this region and the whole steps for this region is repeated. Using mean of sampled development points region of approximation, it may be reasonably to expect around half of the development points in the input space to have actual region of approximation larger than average region of approximation so in order increases the number points having larger than expected region of approximation, mean minus standard deviation of region of approximation of sampled development points is used. The algorithm will be illustrated on 5 variable data set to approximate the ANN function in input space of  Tables 2 and 3. The detailed computation for point B using 3rd degree Taylor is already presented in preceding section. It can be clearly seen from Tables 2 and 3 that the norm of γ vector is larger for 3 rd degree Taylor polynomial as compared to a 2nd Taylor polynomial and hence degree of Taylor polynomial to approximate ANN function is taken as 3. The computed number of segment in each direction can be given as{sp 1 ,sp 2,   {3,2,2,1,1}. Hence the total input space will be divided into 12 (3*2*2*1*1) regions with 12 developments points. The co-ordinates of development points along with expected acceptable region of approximation is given in Table 4. It can be observed from Table 4 that the error in an expected acceptable region of approximation is less than threshold error for all but one sub region. For this region, the region of approximation would be computed by taking this region as baseline input space in next iteration.

Application of developed algorithm
In the preceding sections, the methodology to develop physics consistent ANN using inferencing algorithm is detailed. In, this section, the application of developed methodology has been illustrated on different datasets. Four benchmarking studies are provided which highlight different aspects of developed algorithm. In the first study, the difference between developed algorithm and LIME algorithm are detailed. The second study shows the singularity detection capabilities of the developed algorithm. The third and fourth studies show an example of interpretation of physics consistency using the developed algorithm and development of physics based loss function to obtain physics consistent ANN model.

Comparison with LIME
In this case study, the results from LIME algorithm will be compared with proposed methodology. An ANN model with 3 layers and 20 neurons each was built for 2-variable function ( f = x * exp(−(x 2 + y 2 ))) which was modified by adding of Gaussian noise of zero mean and 0.01 standard deviation of 0.01. The addition of noise was done to simulate real world dataset behaviour. The modified 2-variable function along with ANN model performance is show in Fig. 10. As mentioned before in the introduction LIME builds a local model over the ANN model by sampling input space in the vicinity of point of interest whereas the current methodology characterises the ANN model Table 4 Development points and expected region of approximation locally by using its local differentials. Furthermore, the local model from LIME algorithm depends on parameters chosen i.e. number of samples and standard deviation of weightage function. Moreover, since LIME model building involves sampling, different model would be outputted for different runs of LIME algorithm. This is confirmed by using the regression tutorial provided by authors of lime package (https:// github. com/ marco tcr/ lime). But the Taylor polynomial developed from current methodology is always unique. Below the equations of Taylor polynomial along with 4 LIME instances of similar polynomials generated at development point of (0, 0) are presented. For the 4 LIME models, the all sampling points are taken in the region of x ϵ [− 0.4, 0.4] and y ϵ [− 0.4, 0.4]. It is interesting to note that the coefficient of x and y terms which are equal to ANN first order differentials at (0, 0) are close to target function differential values of 1 and 0 for all the polynomials. Also, it can be observed that for different LIME instances the coefficients values and their signs are quite different hence offer different explanations of the ANN model (for instance, value of x 3 ,x 2 y and xy 2 coefficient and sign and value y 4 coefficient). Hence it is difficult for users to interpret and understand the physics consistency of ANN model by using LIME algorithm. It should be noted that the LIME explanation variation is even more significant in the tutorial provided by the authors but in this test case the variation was reduced due to sampling in small domain.  Fig. 11. Couple of things can be observed from error plots. Firstly, the range of error plots of Taylor polynomial and LIME 3 is similar but coefficients of some terms are vastly different. More importantly, it can be observed the Taylor polynomial error is less and is of one sign in the neighbourhood of development point which is desirable characteristics but this is not true for the LIME polynomial. This due to fact Taylor polynomial characterises the ANN model whereas LIME algorithm built a model on top of ANN. Moreover, using the current methodology Taylor polynomials with any user specified accuracy of ANN model can be built which is not possible with LIME. From a theoretical point of view the proposed Taylor polynomial can captures the behaviour of ANN model more efficiently than LIME algorithm. In summary, the major differences between proposed methodology and LIME model are (i) LIME built a local model in neighbourhood of development point whereas the proposed methodology develops a Taylor polynomial using local differentials which characteristics the ANN model (ii) LIME model is parameter dependent whereas proposed methodology is not (iii) Taylor polynomial from current methodology is unique whereas LIME outputs different polynomial for different runs (iv) proposed methodology can be used to develop series of Taylor polynomial to approximate ANN model within a given error limit which is not possible with LIME algorithm. Taylor

Singularity detection
One of the proposed usages of the proposed methodology is for detection of singularity zones in ANN prediction space. As seen from Sect. 3.3, Taylor polynomial has very minimal radius of convergence when the development point is near a singularity. Making use of this property, the singularity can be detected if the region of approximation of development point is less than threshold value. The threshold can be selected on heuristic judgement and previous experience. In a section, a subpart to proposed methodology is used to detect singularity. It is difficult from real world datasets to ascertain whether a region is actual singularity or not and hence an artificial dataset will used to confirm the accuracy of proposed methodology. 5 variable function which was used in Sect. 4.2 will be slightly modified to have singularity at x 4 = 0.5 and . 2500 input points were generated using latin hypercube sampling. An ANN model of 3 layers with 15 neurons was built using the data. The performance of the developed ANN model is shown in Fig. 12

Interpretation of physics consistency
For the next 2 studies, datasets were chosen from UCI repository maintained by the center for machine learning and intelligent systems at the University of California [34].
The first dataset will be used to illustrate the usage of the inferencing algorithm to obtain input output relations of ANN model which can be assessed for physics consistency. The first data set is the yatch hydrodynamic dataset generated using full scale experiments performed at delft ship hydromechanics laboratory using 22 different hull forms [35]. The data set contains 308 data points with 6 input variables i.e. longitudinal position of the center of buoyancy, prismatic coefficient, length-displacement ratio, beam-draught ratio, length-beam ratio, and Froude number and one output variable which is the residuary resistance of the ship hull. In first step, the regression model was developed from this. An ANN with 3 hidden layer each having 20 neurons was developed for this data.
The ANN model prediction of residuary resistance as compared to actual resistance is shown in Fig. 13. In next step, the inferencing algorithm would be applied to total input space or important sub-spaces. For the sake of brevity, the inferencing algorithm is illustrated for a single input sub-space: centre of buoyancy(  in the input sub space. The region of approximation for an 2nd degree Taylor polynomial with threshold error of 0.1 was computed using algorithm in Sect. 4.1. The values of region of acceptable approximation is not presented for the sake of brevity. Based on the average minus standard deviation values of acceptable length of approximation two development points in the input sub-space was selected. As seen from Table 5 the absolute error in both the expected region of approximation is less than the threshold error and hence the Taylor polynomial with initial development point can be used to approximate these regions. The Taylor polynomial for only region A is given below for the sake of brevity.   Some of the major relationship explanations inferred between input and output variables using Taylor polynomial in region A are: • Residual resistance is directly proportional to longitudinal position of the center of buoyancy, length-displacement ratio, and Froude number and inversely proportional to prismatic coefficient, beam-draught ratio, and length-beam ratio. • Froude number has largest influence of residual resistance. • Residual resistance is directly proportional to length-displacement ratio values but is inversely proportional to the square value of length-displacement ratio value. • Residual resistance is inversely proportional to beam-draught ratio values but is directly proportional to the square value of beam-draught ratio value • The interaction term of the prismatic coefficient and Froude number is third largest contributor to residual resistance and is a positive iteration term i.e. decrease/ increase of both variables will increase residual resistance but decrease of one variable and increase of other variable with decrease residual resistance From the developed relationships, the physics consistency can be assessed with the known physics knowledge. The first two observations seem to physics consistent from available open literature [35] whereas the third to fifth observations physics consistency can be assessed by domain expert. Similarly, inference algorithm can be applied to other regions and additional points can be generated for region where physics inconsistency is found. The process can be iterated till all the regions in ANN regression model satisfy the physics consistency.

Physics based differential loss function
The second engineering dataset which will be used to illustrate the physics based loss function is the concrete compressive strength dataset [34].The dataset consists of concrete compressive strength for different values mixture constituents along with the age  .
From the literature [36,37], it can be seen that the compressive strength (y) in is proportional to the amount of superplasticizer (x 5 ) which means the coefficient of x c5 in Taylor polynomial should be positive. But that is not the case which means this physics relationship is violated by the ANN regression model near the mean. One way to make the ANN model consistent with physics is to add more data points near mean of data and re-train the model. This is the preferred method. However, in cases where addition of more data is not possible, physics based loss can be added to total loss. This physics loss can be taken as an average of differential of output w.r.t. violated quantity over all the inputs points. Each constant value in the Taylor polynomial generated by inferencing algorithm represents a differential quantity and the differential quantity of the violated term can be added in physics loss function. If the physics relationship is directly proportional relation then negative differential should be used and positive differential should be used inversely proportional relationship.
where Ω is multiplier of physics loss (0.1 in present case) where n is number of data points.
The model is re-trained using the new loss function and plot for actual vs predicted is shown in Fig. 15. Again the inference algorithm is used with mean as the development point and developed the relationship shows that the ANN regression model is consistent with physics with regard to superplasticizer relationship. The relationship between compressive

Conclusions
The present manuscript presents a method to develop physics consistent ANN regression models in engineering and science applications. The developed method assesses the physics consistency of ANN model in different input sub-space regions and regions where physics inconsistency is found, additional points in that specific region are added and ANN model is re-trained. In cases where addition of points is not possible or addition of points isn't sufficient, a physics based loss can be used in the training of the ANN model. The assessment of physics consistency of the ANN model is done using an inferencing algorithm which interprets input output relationship of ANN regression model. The inferencing algorithm is based on Taylor polynomial. Taylor polynomial is a polynomial computed using the ANN function value and ANN gradients at an input point called development points. However, Taylor polynomial is only accurate if the distance between estimate point and development point is less than the radius of convergence. Hence in this paper region-wise Taylor polynomials are used which approximates the ANN function. The paper also studies basic properties of gradients of ANN function and effect of degree of Taylor polynomial on the approximation capability of the Taylor polynomial. The algorithm is split into two sections. The first section of the algorithm deals with computing the acceptable region of around a given development point. This algorithm estimates the initial region for investigation based on limiting traversal length in each input direction within threshold error. The initial region is compressed or expanded based on error in that region similar to a bisection algorithm until convergence criteria is met. The second section of the algorithm deals with strategy for selection of development points. Initially several development points are randomly sampled and acceptable region of approximation for all these sample point is computed using first section of algorithm. The expected acceptable region of approximation is estimated using the sampled development points. Then the input space is split into several domains based on the expected acceptable region of approximation. If error in split domain is less than threshold error than the computed Taylor polynomial is used to approximate that region. If the error in any of the regions is greater than threshold error, then the specific regions are split into several regions based on the same algorithm. The algorithms also find singular zones in ANN predictions space as the region which have very small acceptable region of approximation or the regions which cannot be approximated by a Taylor polynomial.
Furthermore, the algorithm was applied several datasets to demonstrate the process of developing the physics consistent regression model. The first case study was used to compare current methodology with LIME algorithm while second case study was (y − 35.81796) 16.69763 = 1.01053 + 1.50332x c1 + 0.95170x c2 − 1.41150x c3 − 0.13374x c4 + 0.54586x c5 − 0.16829x c6 + 3.092357x c7 − 0.63694x c8 . used to demonstrate singularity detection. The third case study is used to illustrate the inferencing algorithm to check physics consistency and fourth case study is used to illustrate the physics based loss function. The presented methodology will help engineer and researchers develop physics consistent ANN regression models.