Solving Forward and Inverse Problems of Contact Mechanics using Physics-Informed Neural Networks

This paper explores the ability of physics-informed neural networks (PINNs) to solve forward and inverse problems of contact mechanics for small deformation elasticity. We deploy PINNs in a mixed-variable formulation enhanced by output transformation to enforce Dirichlet and Neumann boundary conditions as hard constraints. Inequality constraints of contact problems, namely Karush-Kuhn-Tucker (KKT) type conditions, are enforced as soft constraints by incorporating them into the loss function during network training. To formulate the loss function contribution of KKT constraints, existing approaches applied to elastoplasticity problems are investigated and we explore a nonlinear complementarity problem (NCP) function, namely Fischer-Burmeister, which possesses advantageous characteristics in terms of optimization. Based on the Hertzian contact problem, we show that PINNs can serve as pure partial differential equation (PDE) solver, as data-enhanced forward model, as inverse solver for parameter identification, and as fast-to-evaluate surrogate model. Furthermore, we demonstrate the importance of choosing proper hyperparameters, e.g. loss weights, and a combination of Adam and L-BFGS-B optimizers aiming for better results in terms of accuracy and training time.


Introduction
Machine learning approaches usually require a large amount of simulation or experimental data, which might be challenging to acquire due to the complexity of simulations and the cost of experiments.Also, data scarcity can cause data-driven techniques to perform poorly in terms of accuracy.This is particularly true when using real-world observations that are noisy or datasets that are incorrectly labeled, as there is no physics-based feedback mechanism to validate the predictions.To tackle this problem, physics-informed neural networks (PINNs) have been developed.PINNs integrate boundary or initial boundary value problems and measurement data into the neural network's loss function to compensate for the lack of sufficient data and the black-box behavior of purely data-driven techniques [1].In terms of forward problems, PINNs can serve as a partial differential equation (PDE) solver even in cases where domains are irregular.This is because PINNs utilize automatic differentiation and therefore do not require any connectivity of the sampling points, making them a mesh-free method [2].Moreover, PINNs can break the curse of dimensionality when approximating functions in higher dimensions [3,4].Additionally, PINNs are a good candidate for addressing inverse problems due to the easy integration of measurement data [5].
To take advantage of these benefits, PINNs have been employed in various fields of engineering and science including geosciences [6], fluid mechanics [7,8], optics and electromagnetics [9][10][11], and industrial applications, e.g., fatigue prognosis of a wind turbine main bearing [12].Based on sensor data of a physical object, PINNs can be used in hybrid digital twins of civil engineering structures [13] and for critical infrastructure protection [14].Particularly in solid mechanics, PINNs have been developed for solving problems of linear elasticity, elastodynamics, elastoplasticity [15], and inverse problems for parameter identification [16].Rao et al. [17] propose PINNs in a mixed-variable formulation to solve elastodynamic problems inspired by hybrid finite element analysis [18].They introduce displacement and stress components as neural network output to enforce boundary conditions as hard constraints by deploying additional parallel networks.Also, it is claimed that a mixed-variable formulation enhances the accuracy and ease of training for the network.Samaniego et al. [19] utilize energy methods to develop PINNs for solving various examples in computational mechanics, i.e. elastodynamics, hyperelasticity and phase field modeling of fracture.Lu and colleagues develop physics-informed neural networks with hard constraints (hPINNs) to perform topology optimization [20].The authors enhance the loss formulation with the penalty method and the augmented Lagrangian method to enforce inequality constraints as hard constraints.Moreover, they deploy output transformation to enforce equality constraints explicitly for simple domains as introduced in the study of Lagaris et al. [21].In another study, Haghighat and colleagues utilize PINNs in the field of solid mechanics to tackle inverse problems and construct surrogate models [22].Their approach involves parallel networks based on the mixed-variable formulation for linear elasticity, and they expand their methodology to address nonlinear elastoplasticity problems including classical Karush-Kuhn-Tucker (KKT) type inequality constraints.They enforce KKT constraints as soft constraints via a sign function, which has discontinuous gradients.As an extension of their previous work on elastoplasticity, Haghighat et al. [15] deploy PINNs for constitutive model characterization and discovery through calibration by macroscopic mechanical testing on materials.As an alternative to the sign function, they adopt the Sigmoid function to enforce KKT constraints, since Sigmoid has well-defined gradients, but requires an additional hyperparameter.
As far as the authors are most aware, no previous work has been conducted on PINNs to solve contact mechanics problems.Here, we focus on the novel application of PINNs for contact mechanics for small deformation elasticity including benchmark examples, e.g.contact between an elastic block and a rigid flat surface, as well as the Hertzian contact problem.To enforce displacement and traction boundary conditions, we deploy PINNs with output transformation in the mixed-variable formulation inspired by the Hellinger-Reissner principle [23] in which displacement and stress fields are defined as network outputs.Additionally, contact problems involve a well-known set of Karush-Kuhn-Tucker type inequality and equality constraints sometimes also referred to as Hertz-Signorini-Moreau conditions in the contact mechanics community.We enforce this given set of equations as soft constraints via three different methods: sign-based method, Sigmoidbased method and a nonlinear complementarity problem (NCP) function, namely the Fischer-Burmeister function.NCP functions enable reformulating inequalities as a system of equations, and have proven particularly robust and efficient for the design of semi-smooth Newton methods in contact analysis [24][25][26][27].
To validate our PINN formulation for contact mechanics, two examples are investigated.The first example involves the contact between an elastic block and a rigid flat surface where all points in the possible contact area will actually be in contact.The second example is the famous Hertzian contact problem, where the actual contact area will be determined as part of the solution procedure.Furthermore, we illustrate four distinct PINN application cases for the Hertzian contact problem.In the first use case, we deploy the PINN as a pure forward solver to validate our approach by comparing results with a finite element simulation.PINNs can easily incorporate external data, such as measurements or simulations.In the second scenario, we therefore utilize displacement and stress fields obtained through FEM (in the sense of "virtual experiments") to enhance the accuracy of our PINN model.The third application is to deploy PINNs to solve inverse problems, particularly identifying the prescribed external load in the Hertzian contact problem based on FEM data.As a fourth and final example, the load (external pressure) is considered as another network input to construct a fast-to-evaluate surrogate model, which predicts displacement and stress fields for unseen pressure inputs.In the very active research field of physics-informed machine learning further advanced techniques, such as variational PINNs (VPINNs) [28,29] and integrated finite element neural networks (I-FENNs) [30], have been proposed recently.In particular, VPINNs as a Petrov-Galerkin scheme as compared to collocation in standard PINNs might be of interest for (nonsmooth) contact problems.However, these recent developments are beyond the scope of this first study and subject to future work.
The remainder of this article is structured as follows: "Problem formulation" section summarizes the fundamental equations and constraints of contact mechanics with small deformation elasticity.Also, the basics of the so-called mixed-variable formulation based on the Hellinger-Reissner principle are given.In "Physics-informed neural networks for solid and contact mechanics" section, a generalized formulation of PINNs with output transformation is outlined in detail, which in principle allows for the solution of arbitrary partial differential equations (PDEs).We narrow the field of interest down to solid and contact mechanics problems based on a mixed-variable formulation, and therefore different methods to enforce KKT constraints are explained.Several benchmark examples are analyzed in "Numerical examples" section, including the Lamé problem of elasticity, contact between an elastic block and a rigid domain, and the Hertzian contact problem."Conclusion" section concludes the paper by summarizing our key findings and providing an outlook on future research directions.

Contact mechanics
We consider a 2D contact problem between an elastic body and a fixed rigid obstacle as illustrated in Fig. 1.In the reference configuration, the elastic body is denoted by 0 , and in the current configuration, it is represented by t while the rigid obstacle has the Let us consider the boundary value problem (BVP) of small deformation elasticity where σ denotes the Chauchy stress tensor, u is the displacement vector representing the so-called primal variable, b denotes the body force vector, and n is the unit outward normal vector.Prescribed displacements are represented by û on ∂ u , and t denotes prescribed tractions on ∂ σ .Abbreviations BE, DBC and NBC denote the balance equation, Dirichlet boundary condition and Neumann boundary condition, respectively.The kinematic equation (KE) and constitutive equation (CE) for the deformable body are expressed as Here, ε is the infinitesimal strain tensor and C is the fourth-order elasticity tensor.In the specific case of linear isotropic elasticity, the constitutive equation can be expressed via Hooke's law as where λ and μ are the Lamé parameters, tr(•) is the trace operator to sum strain components on the main diagonal and I is the identity tensor.
The displacement vector u can be obtained for the elastic body by describing the motion from the reference configuration X to the current configuration x as follows (see Fig. 1a,  b) The gap function (GF) g n is defined as a distance measure between elastic and rigid bodies in the current configuration as The term x denotes the so-called closest point projection of x onto the surface of r (see Fig. 1b).Since all contact constraints will be defined in the current configuration, p n and t τ can be obtained by traction vector decomposition (TVD) of the contact traction vector t c as where Cauchy's stress theorem (CST) states that the stress tensor σ maps the normal vector to the traction vector t c .Note that the boundary vectors n and τ can be computed on the reference configuration assuming small deformation elasticity (linear elasticity) [31].
For a frictionless contact problem, we define the classical set of Karush-Kuhn-Tucker (KKT) conditions, commonly also referred to as Hertz-Signorini-Moreau (HSM) conditions, and the frictionless sliding condition (FSC) as Equation 11 enforces the kinematic aspect of non-penetration as shown in Fig. 1c.If two bodies are in contact, the gap vanishes, i.e., g n = 0.The term p n denotes the normal component of the contact traction, i.e., the contact pressure.Correspondingly, Eq. 12 guarantees that no adhesive stresses are allowed in the contact zone.Furthermore, the complementarity requirement in Eq. 13 necessitates that the gap should be zero when there is a non-zero contact pressure (point in contact), and the contact pressure should be zero if there is a positive gap (point not in contact).It should be noted that the tangential component of the traction vector vanishes for frictionless contact, resulting in Eq. 14.

Generic PINNs with output transformation
A generalized formulation for partial differential equations can be expressed in residual form with accompanying boundary conditions as Here, R[•] denotes a differential operator acting on a unknown solution u, B[•] is the boundary operator, g(x) represents the prescribed boundary condition, x are the spatial coordinates that span the domain and the boundary ∂ .Consider a fully-connected L-layers neural network to construct an approximated solution ũ to the BVP as follows [36,37] N L (z; θ) represents the network output in the output layer L, trainable network parameters, namely weights and biases, are denoted as θ, (.) represents a user-defined output transformation [20], and z is the network input such that x ⊂ z.Note that z can consist of spatial coordinates, time, and other additional input parameters.The network output is calculated using recursive L − 1 element-wise operations between the input layer and hidden layers as where ψ denotes the activation function that adds non-linearity to the layer output.
Output transformation enables the neural network to enforce boundary conditions explicitly.A user-defined output transformation can be obtained with suitable helper functions as where g is the prescribed boundary condition, s is a scaling parameter and h is a distanceto-boundary function fulfilling the following conditions: For simple boundaries ∂ , it is relatively easy to define an appropriate distance-toboundary function.On the other hand, it can become a quite challenging task in the case of arbitrary boundaries.One method to find a generalized distance-to-boundary function is to use NURBS parametrizations [38].Moreover, the scaling parameter can help the optimizer avoid getting stuck in local minima by balancing target governing equations (see "Domination of p n over g n " section).
To ensure that ũ is a reasonable approximation of u, the network parameters must be determined accordingly to the BVP.As shown in Fig. 3, the overall loss L(θ) consists of PDE losses L PDEs , boundary condition losses L BCs and experimental data losses L EXPs .Note that PDE derivatives are calculated using automatic differentiation (AD) [39].To minimize the overall loss, the optimization process continues until a prescribed tolerance is reached so that optimal network parameters θ * are calculated For each loss term, the inner loop sums up the mean squared error contributions of data points collected inside the domain or on the boundary.Specifically, {z i rp } N rp i rp =1 denotes the collocation points in the domain, {z i bp } N bp i bp =1 are the boundary points corresponding to the prescribed boundary conditions, and =1 represents the points on which measurement data u * is available.As the PDE residual R might have multiple terms and usually more than one boundary condition is defined, we use a lower index to explicitly point out that the inner summation is done for a given specific component, i.e.R i n =1 .=1 and {w i e } N e i e =1 denote the loss weights for the individual components of L PDEs , L BCs , L EXPs , respectively.We observe that the weighting of loss terms can be quite crucial for the convergence of the overall loss, since it avoids the optimizer expanding greater efforts on loss components that have a larger order of magnitude compared to others.It should be noted that the identical collocation points {z i rp } N rp i rp =1 are employed in the calculation of every component of L PDEs .However, the boundary and experimental points may vary in each L BCs and L EXPs contribution.

Application of PINNs to solid and contact mechanics
In the context of solid and contact mechanics problems, we use PINNs with output transformation in a mixed-variable formulation.In the mixed-variable formulation for quasi-static problems without additional network input parameters (i.e.z = x), a fullyconnected neural network (FNN) maps the given spatial coordinates x to the displacement vector u and stress tensor σ.In other words, the displacement and stress fields are chosen as the quantities of interest that the FNN approximates as (see Fig. 4) Combining the information provided in Fig. 3 for losses of general PINNs with the governing equations of solid mechanics, we obtain the total loss L E for linear elasticity (without contact) in the mixed-variable formulation with additional experimental data as where Here, terms L DBCs and L NBCs denote losses for Dirichlet and Neumann BCs, respectively, and the L EXPs term represents losses due to additional experimental data.In the mixedvariable formulation, the L PDEs term is constructed in a composite form to fulfill both the balance equation (BE) and the stress-to-stress coupling (SS) as depicted in Fig. 2. The index N m is used to distinguish the loss weights related to BE and SS.Since stress components are directly defined as network outputs, traction or Neumann BCs can be imposed as hard constraints using output transformation.Moreover, it is sufficient to calculate first-order derivatives of the neural network outputs with respect to the inputs, since the governing equations in the mixed-variable formulation contain only first-order derivatives.An alternative to the mixed-variable formulation is the classical displacementbased formulation in which only u is considered as the network output.However, such an approach requires second-order derivatives for evaluating the balance equation, and traction BCs can not be enforced as hard constraints [17,40].
Next, we construct the composite (total) loss function L C for linear elasticity including contact as follows: The first additional loss term enforces the frictionless sliding condition in the contact zone ∂ c (see Eqs. 9 and 10).For simplicity, we denote the mean squared error (MSE) as | • |, which can be calculated as The second additional term L KKT will be elaborated in the next section.While the various ways of evaluating the normal gap g n are a matter of intense discussions, especially within discretization schemes such as the finite element method [34,41], a very simple gap calculation is sufficient here due to the fact that we only consider contact problems between an elastic body and a rigid flat surface.The normal gap is consistently expressed by evaluating the orthogonal projection of the elastic body onto the rigid flat surface.

Enforcing the Karush-Kuhn-Tucker inequality constraints
There are several methods available to enforce inequality conditions in general.The direct approach is to formulate loss functions of inequalities and impose them as soft constraints with fixed loss weights [15,22].However, setting large loss weights can cause an ill-conditioned problem [20].On the other hand, when small loss weights are chosen, the estimated solution may violate the inequalities.To tackle this problem, authors in [20] suggest penalty and augmented Lagrangian methods, well-known from constrained optimization, which construct loss formulations with adaptive loss weights.In the following, we investigate three methods to enforce KKT conditions of normal contact problems based on soft constraints.

Sign-based method
One possible way to enforce KKT conditions is to use the sign function [22], which leads to Here, {w i=1 represent the loss weights on the corresponding KKT condition.Figure 5 illustrates that 1  2 1 − sign(g n ) gn contributes to the loss component L g n 0 when the gap g n is less than zero.On the other hand, 1  2 1 + sign(p n ) pn contributes to L p n 0 if the contact pressure p n is positive.Note that the sign function has gradient jumps, which is typically not a desired feature in the context of optimization.

Sigmoid-based method
An alternative approach to circumvent discontinuous gradients is to use the Sigmoid function [15] to obtain L KKT as where δ is the steepness parameter that controls the transition between zero and nonzero loss contributions.As depicted in Fig. 6, the Sigmoid function avoids gradient jumps through exponential regularization term.However, when δ is chosen too small, e.g.δ = 1, then significant unphysical loss function values are obtained for gn > 0 and pn < 0. On the other hand, setting δ too large recovers the sign-based implementation.Therefore, a parameter study must be conducted to find the optimal parameter δ opt .

A nonlinear complementarity problem function: Fischer-Burmeister
Nonlinear complementarity problem (NCP) functions are developed based on reformulating inequalities as equalities [42].One popular choice of NCP function is the Fischer-Burmeister function [43] expressed as By setting a = gn and b = −p n in the Fischer-Burmeister function, we obtain L KKT as follows The Fischer-Burmeister function is a particularly suitable choice for typical loss calculations based on the mean squared error (MSE), since (φ FB ) 2 is continuously differentiable also at a = b = 0 as reported [44].As shown in Fig. 7, the largest loss contribution comes from section IV in which both excessive penetrations, i.e. g n 0, and large adhesive stresses, i.e. p n 0, are present.We refer to [42][43][44][45] for more details about the Fischer-Burmeister function.Note also that having fewer loss terms generally eases the optimization process as well as parameter tuning, and in contrast to the previous variants in "Sign-based method" and "Sigmoid-based method" sections only one single loss weight is required in the case of the Fischer-Burmeister NCP function.

Algorithmic challenges Domination of p n over g n
The Fischer-Burmeister NCP function reduces a set of inequality constraints into a single equation.However, it might cause domination of one term over another.As explained in the previous sections, g n is derived from the displacement field u and p n is derived from the Cauchy stress field σ.Depending on the chosen problem parameters, e.g., stiffness, the estimated quantities u and σ might have different scales, and then so do g n and p n .As illustrated in Fig. 8, when g n and p n have similar scales, there is no domination.But increasing p n causes domination of p n over g n .In terms of optimization, the neural network will then expand more effort into minimizing the large-scale quantity p n , which might cause unacceptable violations of constraints related to g n .To tackle this problem, non-dimensionalization techniques can be used to ensure that p n and g n have similar scales [46].

Importance of output scaling
Output scaling is a functionality of output transformation that can prevent the optimizer to get stuck in local minima.In the context of solid and contact mechanics, the PINNs estimate the displacement field u and the stress field σ as neural network outputs.Assuming that no transfer learning is used, the first estimations of outputs are done randomly due to random initialization of the neural network parameters.Also, there are well-known and popular methods that generate initial network estimations following a normal distribution with a mean of zero and a standard deviation of one [47], e.g., Glorot initialization.However, using such initialization methods could cause convergence issues because of the output quantities having similar magnitudes.This issue can be explained through the example of the SS loss term (see Eq. 23) Minimization of L SS requires the condition σ = C : ε to hold.In case very large values of the material properties are chosen, e.g.Young's modulus, the term C : ε will initially dominate σ, which means a large loss contribution has to be handled by the optimizer.Therefore, to minimize the loss, either a significant increase in σ or a significant decrease in u is required.Such large increments in the optimization procedure are troublesome, as the gradient of the employed tanh() activation function tends to zero for large function arguments, which is also referred to as the vanishing gradient problem [48].To ease optimization, the network output u can be scaled by the inverse of the Young's modulus, i.e. by 1 E .This ensures that the initial magnitudes of both terms in the SS condition are comparable, which summarizes the benefits of output scaling in a nutshell.

Numerical examples
In the following, we investigate three numerical examples.The first example is the wellknown Lamé problem of elasticity, which is considered as a preliminary test without contact to verify that our PINN framework works as expected, including in particular the hard enforcement of DBC and NBC with output transformation.Afterward, our investigation focuses on examining two contact examples: a contact problem between a simple square block and a rigid flat surface, and the Hertzian contact problem.The main difference between the two contact examples is the fact that the actual contact area has to be identified by the PINN in the Hertzian example, while the potential and actual contact areas are the same in the case of the square block and the rigid flat surface.Note that 2D plane strain conditions are considered throughout the entire section and body forces are neglected.
All numerical examples have the following common settings.The PINN maps spatial coordinates x = (x, y) as inputs to transformed mixed-form outputs ( ũ, σ) = (ũ x , ũy , σxx , σyy , σxy ).Networks are initialized using the Glorot uniform initializer, and the tanh function is chosen as activation function.Models are first trained using the stochastic gradient descent optimizer Adam [49] with a learning rate, lr = 0.001, for 2000 epochs, and then we switch to the limited memory BFGS algorithm including box constraints (L-BFGS-B) [50] until one of the stopping criteria is met [51,52].Our workflow is developed based on the DeepXDE package [53] and we refer to the DeepXDE documentation for default L-BFGS-B options.Note that training points are generated by GMSH, since it has strong capabilities in mesh generation and visualization, and provides boundary normals at arbitrary query points [54].
As a common error metric, we report the vector-based relative L 2 errors for displacement E u L 2 and stress fields E σ L 2 as follows Here, ũi and σi denote PINN solutions and u i and σ i denote reference solutions that are obtained analytically or numerically.Also, the index j = (1, . . ., N test ) runs over the test points that are generated using structured meshes.We refer to Appendix 3 for an error comparison between vector-based and integral-based error measurements.

Lamé problem of elasticity
In the first example, we study a benchmark example without contact, namely the wellknown Lamé problem of a cylinder, subjected to an internal pressure p (see Fig. 9a).Since the problem is geometrically axisymmetric and the internal pressure is applied to the entire inner boundary, only a quarter of the annulus is considered.The analytical solution for the stress and displacement field can be derived in polar coordinates {r, α} as [55] Fig. 9 The Lamé problem of elasticity.a The full problem under internal pressure, and b the equivalent problem due to the axisymmetrical nature of both geometry and loading, and accompanying boundary conditions Here, R i and R o are the inner and outer radius of the annulus, respectively, p represents the internal pressure applied on the inner radius, E is the Young's modulus and ν is the Poisson's ratio.For our specific setup, we set R i = 1, R o = 2, p = 1, E = 2000, and ν = 0.3.Note that our formulation is based on Cartesian coordinates.Thus, polar transformation is performed to compare results.
To solve the Lamé problem, we deploy a PINN in the mixed variable formulation (see Eq. 39).The following output transformation is applied to enforce displacement and traction BCs as hard constraints on the edges numbered 1 and 3: As can be seen in Fig. 9, the constructed output transformation fulfills the displacement boundary conditions at x = 0 and y = 0. Also, the displacement outputs are scaled by 1/E to ease the optimization process (see "Importance of output scaling" section).Traction boundary conditions are enforced as hard constraints for edges numbered 1 and 3, which lets us represent zero shear stresses there.Since traction boundary conditions on edges 2 and 4 contain a coupling of normal and shear stresses, we enforce them as soft constraints.
The employed PINN is a fully connected neural network consisting of 3 hidden layers of 50 neurons each as indicated in Table 1.We train the network using 330 training points, of which 262 are located within the domain and the remaining 68 points are on the boundary.We refer to the introductory paragraph of Numerical examples" section for further settings of both network and optimizer.
Figure 10a, b show a comparison of the normalized stress and displacement solutions in radial direction.Relative L 2 errors for displacements and stresses are calculated on test points as 0.017% and 0.043%, respectively.While the network is trained with Adam, the convergence rate decreases along with epochs as shown in Fig. 10c.Applying L-BFGS-B just after Adam increases the convergence rates and leads to a further significant reduction of PDE and NBC losses.The average MSE for the PDE loss reaches approximately  1.06e−7, while the average MSE for the NBC loss is approximately 1.48e−8 when all stopping criteria are met.We observe that deploying Adam and L-BFGS-B optimizers in a sequential order is one of the key points to obtain a good accuracy since Adam avoids rapid convergence to a local minimum, which has also been mentioned in [52].Using a standard multi-core workstation as hardware, training takes 27.15 s and prediction takes 0.001 s.An additional version of the Lamé example is generated to demonstrate that PINNs with output transformation (see Eq. 18) can effectively tackle problems with large-scale parameters, e.g.E = 210 × 10 9 , ν = 0.25 and p = 1.5 × 10 5 .As provided in Table 2, relative L 2 errors for displacements and stresses are computed on test points as 0.023% and 0.048%.These results are very similar to the Lamé example with small-scale parameters since appropriate scaling constants are chosen, which dispels the common misconception that PINNs cannot handle very different magnitudes of the parameters involved.Thus, problems characterized by varying scales including those introduced by nonlinear contact terms can be effectively addressed through PINNs with proper output scaling.We refer to Appendix 2 for a comparison of the normalized stress and displacement solutions in radial direction, and the evolution of the cumulative PDE loss and NBC loss.

Contact between an elastic block and a rigid surface
The second example is a contact problem between a linear-elastic block and a rigid flat surface as depicted in Fig. 11a.The elastic block is subjected to an external pressure on its top surface and constrained in the horizontal direction on its left surface.The analytical solution [56] can easily be derived as follows and For our specific setup, we set the material parameters as E = 1.33, ν = 0.33, the edge length of the square block as l = 1 and the pressure as p = 0.1.Similar as before, we apply the following output transformation to enforce displacement and traction boundary conditions on the edges numbered 1, 2 and 3 as These output transformations can easily be derived based on Fig. 11b.For instance, the normal stress σ yy in the loading direction is equal to −p at y = l.Thus, we choose g(x) = −p, and h(x) = (l − y) so that the requirements given in Eq. 19 are fulfilled.On the other hand, the contact constraints at the bottom edge are enforced as soft constraints using the three different methods that have been explained earlier in "Enforcing the Karush-Kuhn-Tucker inequality constraints" section: the sign-based method, the Sigmoid-based method and the Fischer-Burmeister NCP function.For the Sigmoid-based method, we choose δ g n = 10 and δ p n = 100.For training, 514 points are used (434 points lie within the domain and 80 points lie on the boundary), and 11,827 points are used for testing.
As illustrated in Fig. 12, all of the investigated methods correctly capture that u y is linearly distributed and close to zero at the bottom due to the soft enforcement of contact constraints.The maximum absolute error for the displacement component u y , denoted by E u y abs,max , is larger for the sign-based formulation compared to the two other methods (see Table 3).Meanwhile, the normal stress component, σ yy , is close to − 0.  Table 3 The structure of hidden layers, performance measurements, and errors for the contact example between an elastic block and a rigid surface

Hertzian contact problem
In this example, we consider a long linear elastic half-cylinder (E = 200, ν = 0.3) lying on a rigid flat surface and being subjected to a uniform pressure p = 0.5 on its top surface as shown in Fig. 13a.The analytical solution for the contact pressure p c is given as [57,58] Here, b is the width of the contact zone, and R is the radius of the cylinder.For the chosen set of parameters (p = 0.5, E = 200, ν = 0.3, R = 1), b can be calculated as 0.076.An analytical solution is available only for the contact pressure.Reference solutions for displacement and stress fields within the half-cylinder are obtained with well-established FEM algorithms.The FEM simulations are performed with our in-house multi-physics research code BACI [59].The remaining loss weights are set as 1.See Appendix 1 for a detailed explanation of loss weights The following output transformation is applied to enforce displacement and traction BCs as hard constraints on the edges numbered 1 and 2 (see Fig. 13b) As for the Lamé problem, we scale the displacement field with the inverse of the Young's modulus 1/E.Additionally, only one non-zero helper function g(x) = −p is required for σyy .The traction BC on edge 3 and the contact constraints on edge 4 are enforced as soft constraints.To define the potential contact area, we set α = 15 • (corresponding to b = 0.259), which extends well beyond the actual contact area.Moreover, KKT constraints are enforced using the Fischer-Burmeister method.
In the following, we investigate four distinct PINN application cases for the Hertzian contact problem: Case 1: PINNs as pure forward model/PDE solver, Case 2: PINNs as dataenhanced forward model, Case 3: PINNs as inverse solver for parameter identification, and Case 4: PINNs as a fast-to-evaluate surrogate model.We refer to Table 4 for information on network architecture, and training and test points.To facilitate reproduction of the results, the table also reports the weights of individual loss terms.

Case 1: PINNs as pure forward model/PDE solver
In the first use case, we deploy PINNs as a pure forward solver for contact problems to validate our approach.Training takes a total of 4049.8 s and the prediction time is 0.091 s.Note that FE simulation of the same test case requires 19.2 s, a considerably longer duration compared to the prediction time of a trained PINN since the most time-consuming aspect of PINNs is the training process.Displacement and stress components obtained through PINN and FEM are compared with contour plots in Fig. 14.Errors for displacement and stress fields are quantified using the relative L 2 norm.As summarized in Table 5, we obtain a relative error E u L 2 = 2.24% for the displacement field and a relative error E σ L 2 = 3.74% for the stress field.Furthermore, the contact pressure distributions obtained via analytical solution, PINN and FEM are compared in Fig. 15.Since the zero traction boundary condition and the KKT constraints on the curved surface, numbered as edge 3 in Fig. 13, are not enforced as hard but soft constraints, the PINN result can only resolve the kink at x = ±0.076 in an approximate manner depending on the chosen set of training points.Consequently, the contact pressure p c reduces to zero smoothly and slightly violates the zero traction boundary condition in the non-contact zone.Accordingly, the rather large error values E σ L 2 are mostly related to this violation of the zero traction boundary conditions and KKT constraints close to the kink.Readers familiar with FEM modeling of contact problems will notice that a quite similar phenomenon occurs for mesh-based numerical methods where the transition between contact and non-contact zones cannot be perfectly resolved either and might even cause spurious oscillations in the case of higher-order interpolation functions [24,60,61].The term c pc denotes an integral of the predicted contact pressure pc over the potential contact boundary

Case 2: PINNs as data-enhanced forward model
One of the key features of PINNs is the capability of easily incorporating external data, such as measurement or simulation data, into the overall loss function.In this section, we enhance our PINN model with "artificial" measurement data obtained through FEM simulations, namely, data points for displacement and stress fields, to achieve better accuracy.The incorporated FEM data points are randomly selected, with 100 being selected within the domain and 100 being chosen along the boundary as depicted in Fig. 16b.A comparison of the contact pressure p c in the case of data enhancement with analytical solution and FEM reference solution is given in Fig. 16a.While the PINN accuracy is significantly improved upon close to the kink at x = ±0.076, the data-enhanced model underestimates the normal contact pressure around the origin.The relative L 2 errors confirm this assessment.While E σ L 2 is only slightly reduced, E u L 2 benefits dramatically from data enhancement.As provided in Table 5, the integrated contact pressure is close to the applied load, p = 0.5, due to the conservation of momentum.To fulfill the momentum equation, overshooting after the kink is balanced by undershooting around the origin.Similarly, in case 1, overshooting after the kink is balanced by the undershooting from around x = 0.04 to the kink.Moreover, we observe that results significantly rely on selecting appropriate additional data loss weights w EXPs (reported in Table 4).So far we identified the parameters through manual adjustments.In general, we recommend a hyperparameter analysis to determine optimal loss weights.Additionally, the data-enhanced model requires more training time compared to the pure forward model, which can be explained by the need to evaluate additional loss terms.However, after training is finished, the more accurate prediction takes essentially the same time as in case 1.

Case 3: PINNs as inverse solver for parameter identification
An interesting approach to solve an inverse problem is to simply add the unknown parameter to the set of network trainable parameters θ, and it can then be identified with the help of additional loss terms based on the difference between predictions and observations (see Eq. 37).In the following, we exemplarily identify the applied external pressure p acting on the half-cylinder using FEM results as "artificial" measurement data.* = arg min L C ( ) where, = (p, θ).(37) Figure 17 shows the convergence behavior of the identified pressure p compared to the actual pressure p through the number of epochs for different initial guesses po .First, we start training with a "good" initial guess po = 0.1 being quite close to the actual pressure p = 0.5, and then we increase it to po = 20 to measure how sensitive the PINN is to the choice of the initial guess.As depicted in Fig. 17c, convergence can be achieved even when a relatively large and therefore unphysical initial guess is made.There is a steep increase in convergence rate after 2000 epochs, which can be explained by the fact that we switch to the L-BFGS-B optimizer there.As mentioned in "Lamé problem of elasticity" section, we deploy Adam for 2000 epochs to avoid the optimization process getting stuck in local minima.Note, however, that switching from Adam to L-BFGS-B introduces oscillations in the transition region so that transition has to be handled with care.The relative error, denoted as E, is used to measure the difference between the identified and actual pressure values, resulting in the same value of 1.2% for all three initial guesses p o = 0.1, p o = 5, and p o = 20 as provided in Table 6.Additionally, a larger initial guess requires more computing time and epochs as expected.

Case 4: PINNs as fast-to-evaluate surrogate model
In the last use case, the load (applied external pressure) is considered as an additional network input, i.e. z = (x, p) (compare Eq. 16), to construct a fast-to-evaluate surrogate model that is capable of predicting displacement and stress fields for different pressure values.The relative error is defined as E * = abs (( * − * )/ * ) Fig. 18 An illustration of the procedure to include the applied external pressure as a neural network input Since a single network is deployed, the length of each network input must be the same.We sample the three-dimensional input space with N = 1946 training points (N = N rp + N bp (Eq.23)).While the spatial coordinates x are selected from a two-dimensional mesh (as in "Case 1: PINNs as pure forward model/PDE solver", "Case 2: PINNs as data-enhanced forward model", and "Case 3: PINNs as inverse solver for parameter identification" sections), the third (pressure) component of the input vector is drawn randomly from a uniform distribution over the considered pressure range.To improve the accuracy of the prediction, the N sampling points are repeated k times.In the context of network training, we refer to one instance of the N distinct sampling points as one chunk, so that k is the number of chunks as depicted in Fig. 18.Indeed, this process increases the computing time and complexity of the model, since the input size increases from (n, 2) to (n • k, 3).However, such a method can lead to better accuracy since the network is trained with a larger data set and also it enables batch training to reduce computational effort [47].For our specific example, we sample pressure values from a range of [0.2, 1.0] and we consider only two different numbers of chunks, namely k = 1 and k = 5.
As shown in Fig. 19, employing a single chunk is insufficient to accurately capture the influence of the applied external pressure p on the contact pressure distribution p c (x, p).However, increasing the number of chunks from k = 1 to k = 5 increases the accuracy.Table 7 provides the relative L 2 errors for the contact pressure distribution p c (x, p) between the analytical solution and the predictions of surrogate PINN models, and it can be seen that increasing the number of chunks indeed results in improved accuracy of the surrogate model.Nonetheless, the overall error level of 10-16% even in the case k = 5 is still too high from an engineering perspective and will be subject to further investigations.This example is only intended as a very first proof of concept, and we have already identified several algorithmic modifications that could possibly increase the accuracy of the surrogate model.For example, we use fixed loss weights even though pressure values in the input layer vary.Thus, adaptive loss weights should be implemented since the PINN

Conclusion
In this study, we have presented an extension of physics-informed neural networks (PINNs) for solving forward and inverse problems of contact mechanics under the assumption of linear elasticity.The framework has been tested on several benchmark examples with different use cases, e.g. the Hertzian contact problem, and has been validated by existing analytical solutions or numerical simulations using the finite element method (FEM).
As an alternative way of soft constraint enforcement as compared to existing methods, a nonlinear complementarity problem (NCP) function, namely Fischer-Burmeister, is explored and exploited to enforce the inequality constraints inherent to contact problems.This aspect has not been investigated in the context of PINNs so far to the best of the authors' knowledge.Besides using PINNs as pure forward PDE solver, we show that PINNs can serve as a hybrid model enhanced by experimental and/or simulation data to identify unknown parameters of contact problems, e.g. the applied external pressure.We even go one step further and deploy PINNs as fast-to-evaluate surrogate models, and could at least obtain a first proof of concept up to a certain level of accuracy.
A question that has emerged recently is whether data-driven approaches such as PINNs will replace classical numerical methods such as FEM in the near future.Within this study, we only considered benchmark examples that have been developed and solved decades ago using the FEM.Even for these simple examples, we came to conclusion that deploying PINNs as forward solvers for contact mechanics can not compete with FEM in terms of computational performance and accuracy.Therefore, we doubt the applicability of PINNs to complex engineering problems without data enhancement.However, PINNs can be a good candidate for solving data-enhanced forward problems and especially inverse problems due to the easy integration of additional data.Similarly, PINNs can break the curse of dimensionality of parametric models, so that more complex surrogate models can be generated.Also, it is observed that minimizing multiple loss functions simultaneously is one of the most significant challenges in training PINNs, and current optimization algorithms are not tailored to addressing this challenge.Therefore, using multi-objective optimization algorithms that are particularly designed for PINNs has the potential to be a gamechanger in improving their overall performance and accuracy.As an alternative to multi-objective optimization algorithms, it is observed that a proper scaling strategy via output transformation can be applied to ease the optimization.We believe that hybrid strategies can be a promising option to construct mixed models to benefit from the advantages of both classical and data-driven approaches.
This study reveals several possibilities for further exploration and investigation.Although the proposed PINN formulation for benchmark examples demonstrates acceptable results, further applications, particularly on complex domains including threedimensional problems should be analyzed.As an alternative strategy to scaling network outputs, a non-dimensionalized contact formulation can be implemented.Different NCP functions other than the Fischer-Burmeister function can be further investigated.The inverse solver has been applied to identify the applied external pressure, but it can be extended to also predict internal material parameters.Additionally, a hyperparameter optimization study can be performed to tune loss weights, network architecture and optimizer parameters.Last but not least, related techniques such as variational PINNs might overcome the limitations of collocation inherent to PINNs and instead provide a sound variational framework.

Appendix 1: Loss formulation for 2D linear isotropic elasticity under strain conditions
The elasticity tensor C under plane strain conditions can be expressed in terms of Lamé constants λ and μ as Inserting Eq. 38 into Eq.23 we can obtain the total loss for 2D linear isotropic elasticity under the plane strain condition as ( The out-of-plane stress component σ zz is not considered as network output since it can be calculated in the post-processing.

Appendix 2: Results for the Lamé problem of elasticity with large-scale parameters
See Fig. 20.

Appendix 3: Additional error comparisons
The vector-based L 2 error between the approximated PINN solution f and the analytical solution f , is denoted as,    The corresponding integral-based L 2 error between the approximated PINN solution f and the analytical solution f , is denoted as, where dx represents the area for Tables 8 and 9, while it represents the arc length for Table 10, since the contact pressure p c is integrated over the potential contact boundary ∂ c .We refer to "Lamé problem of elasticity", "Contact between an elastic block and a rigid surface", "Hertzian contact problem" sections for the respective number of test points N test .Note that in this first study, we used the vector-based error measure that is frequently used for PINNs [20,46] and easily implemented.For future studies, we suggest more expressive integral-based error estimates.

Fig. 1
Fig. 1 Contact problem between an elastic body and a rigid obstacle.a Reference configuration, b current configuration, c accompanying boundary conditions, illustration of the gap g n , tangential traction t τ and contact pressure p n

Fig. 2
Fig. 2 Tonti's diagram of Hellinger-Reissner (HR) principle for contact problems with small strain theory and frictionless sliding condition

Fig. 3
Fig.3The general representation of a physics-informed neural network for a BVP

Fig. 4
Fig. 4 Physics-informed neural networks in the mixed-variable form to solve quasi-static solid and contact mechanics problems without additional network parameters

Fig. 5 Fig. 6
Fig. 5 An illustration of the sign-based function depending on gap g n and contact pressure p n

Fig. 7
Fig. 7 The Fischer-Burmeister NCP function depending on gap g n and contact pressure p n as a 3D plot (a) and as a 2D contour plot (b)

Fig. 8
Fig. 8 Domination of p n over g n : a no domination, b intermediate domination and c strong domination

Fig. 10
Fig. 10 Results for the Lamé problem of linear elasticity.a Comparison of normalized stresses obtained from the analytical solution and the predicted values, b comparison of normalized displacements, and c evolution of the MSE for the cumulative PDE loss and NBC loss

Fig. 11
Fig. 11 Contact problem between an elastic block and a rigid flat surface, a geometry, supports and loading, and b an equivalent system including all relevant boundary conditions 1 and the shear stress component σ xy is close to zero.Since the traction boundary condition in y-direction on the top surface and the shear stress boundary conditions are enforced as hard constraints, absolute errors in the corresponding regions are zero for all cases.The sign-based formulation also performs worst in terms of the errors E σ yy abs,max and E σ xy abs,max .Moreover, L 2 relative errors show that the Fischer-Burmeister NCP function performs

Fig. 12
Fig. 12 Predictions of the displacement component u y , stresses σ yy and σ xy with corresponding absolute errors E uy abs , E σyy abs , E σxy abs .The contact constraints L KKT are enforced via three different methods: a sign-based method, b Sigmoid-based method and c the Fischer-Burmeister NCP function.Absolute error is defined as E * abs = abs( * − * )

5
Hidden layers Training time (s) Prediction time (s) E u L2 (%) E σ L2 (Three different methods to enforce the KKT constraints are compared: (a) sign-based, (b) Sigmoid-based and (c) Fischer-Burmeister best, i.e. with errors being up to one order of magnitude smaller than for the sign-based and Sigmoid-based variants.Additionally, it is observed that all investigated methods require similar computing time for training and prediction.Overall, it can be concluded that the Fischer-Burmeister NCP function yields the best results in terms of accuracy and computing time.

Fig. 13
Fig.13 The Hertzian contact problem between an elastic half-cylinder and a rigid flat surface a domain under uniform pressure on top and making use of symmetry, b accompanying boundary conditions

Fig. 14 Fig. 15
Fig. 14 PINNs as pure forward solver for the Hertzian contact problem.Comparison of stress and displacement components obtained by PINN and FEM

Fig. 16
Fig. 16 Comparison of contact pressure distributions p c (x) obtained by analytical solution, PINN and FEM for case 2. a PINNs as a data-enhanced forward solver, b distribution of data points generated through FEM simulations in the domain and on the boundary

Fig. 17
Fig. 17 Identification of the applied external pressure on the half-cylinder in case of different initial guesses a p o = 0.1, b p o = 5, c p o = 20

Fig. 19
Fig. 19 Comparison of the contact pressure distribution obtained through a PINN-based surrogate model to the analytical solution.The PINNs are trained using different numbers of chunks

Table 1
The structure of hidden layers, number of training and test points, performance measurements, and errors for the Lamé problem of linear elasticity

Table 2
Additional version of the Lamé problem with a challenging parameter set, selected scaling constants and computed L 2 errors

Table 4
The structure of hidden layers, number of input neurons, training and test points, and loss weights are given for different cases of the Hertzian contact problem

Table 5
Comparison of relative L 2 errors, training and prediction time for case 1 and case 2

Table 6
Comparison of relative errors, training time and number of epochs for the inverse Hertzian problem used to identify the applied external pressure

Table 7
Comparison of relative L 2 errors for the pressure distribution p c (x, p) between the analytical solution and the predictions of our surrogate PINN models.Predictions are based on unseen pressure values Predictions are based on unseen pressure values

Table 8
Comparison of the vector-based and integral-based L 2 error for the Lamé problem of elasticityThe evaluated quantities are displacement and stress components in polar coordinates (see "Lamé problem of elasticity" section for the example setup)

Table 9
Comparison of the vector-based and integral-based L 2 error for the contact problem between an elastic block and the rigid domain

Table 10
Comparison of the vector-based and integral-based L 2 for the contact pressure p c of the Hertzian contact problem for different cases (see "Case 1: PINNs as pure forward model/PDE solver" and "Case 2: PINNs as data-enhanced forward model" sections for the example setup)