Parasitic resistance as a predictor of faulty anodes in electro galvanizing: a comparison of machine learning, physical and hybrid models

In industrial electro galvanizing lines, aged anodes deteriorate the zinc coating distribution over the strip width, leading to increased electricity and zinc costs. We introduce a data-driven approach to the predictive maintenance of anodes that replaces the cost- and labor-intensive manual inspection still common for this task. The approach uses parasitic resistance as an indicator of anode condition, i.e. of whether an anode is aged or mis-installed. The parasitic resistance is indirectly observable via the difference between the measured voltage and the baseline (theoretical) voltage for a healthy anode. Here we calculate the baseline voltage with two approaches: (1) a physical model based on electrical and electrochemical laws, and (2) advanced machine learning techniques, including boosting and bagging regression. The data was collected on one exemplary rectifier unit equipped with two anodes over a period of approximately two years (22 months). The dataset consists of one target variable (rectifier voltage) and nine predictive variables describing the electrical current, the electrolyte, and the steel strip. For predictive modelling, we used Random Forest, Partial Least Squares and AdaBoost regression. The models were trained on intervals where the anodes were in good condition and then applied to the remaining segments as a proof of concept that bad anode conditions can be identified using the parasitic resistance predicted by our models. For the best model, our results show an RMSE of 0.24 V for the baseline rectifier voltage on the validation set, for a voltage of (mean ± standard deviation) 11.32 ± 2.53 V. The best-performing model is a hybrid Random Forest that incorporates meta-variables computed from the physical model. We found that a large predicted parasitic resistance coincides well with the results of the manual inspection.
The results of this work will be implemented in online monitoring of anode conditions to reduce operational cost at a production site.


Introduction
Electro galvanizing is a well-proven technology for producing corrosion-protected steel sheets with excellent surface quality, weldability and forming behavior [1]. Sometimes electro galvanizing is the only way to coat advanced high-strength steels, because it is a low-temperature process and the mechanical properties of the steel substrate remain mostly unaffected. Electro galvanizing has some unique features, such as one-sided coating and low coating thicknesses (less than 3 µm) [2]. The quality of the anodes has a strong impact on the zinc thickness distribution and on the overall energy consumption. Therefore, the anodes are subject to strict maintenance and have to be replaced when degraded. Unfortunately, there is no method for online detection of anode quality that can monitor critical anode properties and foresee the end of an anode's lifetime. With the help of machine learning models, however, the condition of anodes used in electro galvanizing lines can be monitored during their lifetime.
In steel manufacturing, the enormous progress in machine learning techniques, together with the remarkable increase in processing power and memory available at affordable prices, has spurred a significant number of applications, including blast furnace molten iron quality prediction [3], blast furnace stock line detection [4], continuous casting sticker detection [5], coating weight control [6], and prediction of the mechanical properties of galvanized steel coils [7]. In particular, when the development of purely physics-driven models is too complicated or time-consuming, machine learning techniques are a viable alternative. However, despite these achievements, steel manufacturing and heavy industry in general still pose significant challenges for the application of machine learning techniques. Typically, the amount of available labelled high-quality data is small. In such scenarios, it can be advantageous to use so-called physics-guided approaches as a form of hybrid modelling.
Hybrid modelling has recently gained popularity as an answer to the limitations of either purely data-driven or purely physics-driven model building. In hybrid modelling, the two approaches are systematically combined. There are various ways to combine these models, and we highlight some of the most prominent examples here. In a traditional data science setting, domain knowledge (e.g., physical laws) is exploited to shape the features that are fed to data-driven models [8,9]. Domain-specific knowledge can also be used for feature selection, such as selecting the most relevant frequencies to detect misfires in engines [10]. Another example from the same domain demonstrates a different strategy, in which a physics-based simulation model generates a dataset that in turn serves as training data for data-driven machine learning models [11]. Alternatively, a physics-driven and a data-driven model can be developed independently, with their results computed in parallel and later combined into a single result [12]. The term sequential hybrid models describes settings in which the output of one model serves as input to the other type of model. For example, a physically inspired model, e.g. based on differential equations and real-world measurements, can provide inputs to a data-driven model, which then learns how to combine the physics-based results and the features [13].
Moreover, an efficient data-driven model may be built as an approximation to a complex physics-based model in order to speed up the computations [14].
The latter setting is beneficial when the physics-based model provides sufficiently good results, but at the price of a high computational complexity that limits its applicability. The data-driven approximation provides only a rough estimate of the results, but at a much lower computational cost, and can therefore guide the exploration process. One domain where this is particularly important is materials science: because of the size of the search space, data-driven machine learning models are used to identify the most promising regions of the input space, which are then analyzed further with the more exact knowledge-driven model, leading to considerable speed-ups [15,16]. Physics-based knowledge has also been exploited to inform the design of machine learning algorithms and architectures, going back to the inception of Convolutional Neural Networks, which exploit spatial neighborhoods (local connectivity) to obtain a sparse parameter space. Recently, more complex physics-derived constraints have been integrated directly into the network architecture, for example in the form of physics-informed convolutions [17]. This approach of constraining a machine learning algorithm is not limited to deep learning architectures, but can also be successfully applied to simpler models, such as linear systems [18].
Finally, the output stage of a machine learning algorithm can also be adapted for hybrid modelling [19], for example by modifying the loss function so that the classical supervised scenario of labelled training examples is replaced by constraints derived from physical laws [20]. Karpatne et al. [21] proposed a rigorous scientific framework combining these different approaches, which they termed theory-guided data science. Their motivation is twofold: first, they aim to increase the interpretability of data-driven black-box models and, second, they intend to constrain data-driven models. The latter is aimed at improving the generalization ability of data-driven methods and at preventing physically inconsistent models. They outline how different parts of the model building process can be enhanced via theory-guided design, ranging from choosing appropriate link functions to selecting appropriate regularization functions. Subsequently, these methods have been applied specifically to deep neural networks [22]. Complementary approaches use physics-based theories to aid the interpretability of black-box models. As an example, Lei et al. [23] studied Convolutional Neural Networks using quantum mechanics, energy models and thermodynamic entropy to gain a deeper understanding of the intrinsic functionality of deep neural networks.

Our aim and contributions
To the best of our knowledge, this paper is the first to present a simplified physical model of the rectifier voltage in a large-scale electro galvanizing unit. It is also the first to use the physical model for labelling training and testing segments in an electro galvanizing production line and to employ sophisticated machine learning methods to predict the baseline rectifier voltage. We claim that aged or mis-installed anodes can be identified by the presence of parasitic resistances, which are indirectly represented by the difference between the measured and the theoretical (baseline) voltage, i.e. the parasitic voltage. To this end, the theoretical voltage was calculated with a simplified physical model and, alternatively, predicted by machine learning from the electrical current, steel strip and electrolyte parameters. A comparison between the results of the physical and machine learning models, together with the use of meta-variables from the physical model in hybrid machine learning models, rounds off our contribution.

Process unit
The electro galvanizing line was previously described in [2,24]. More details on the products and technology involved can be found in [1], while the GRAVITEL® plating cells are covered in [25,26]. A schematic drawing of the electro galvanizing line is depicted in Fig. 1. The line comprises 12 vertical plating cells with four anodes each. In the entry section, the steel strips are welded together to form an endless strip. The looper installed downstream compensates for short downtimes during welding in the entry section. The strip is then straightened by the tension leveler. Pre-treatment is a multistage process that produces a clean, grease-free surface, an important precondition for applying homogeneous zinc layers with good adhesion. In the plating section, the strip passes 12 GRAVITEL cells, in which zinc from the electrolyte is deposited on the strip surface by applying high electrical currents. This patented cell design allows single-sided or double-sided galvanizing of the strip. The conductor rolls and the sink roll guide the strip between the anode plates mounted on the anode boxes. One rectifier supplies two anodes, which are located on the top and the bottom side of the strip, as depicted in Fig. 2. The electrolyte is pumped into the wedge-shaped gap between anode and strip and flows downward under gravity, similar to a cascade. Under the influence of the applied DC voltage, zinc ions move from the electrolyte to the strip, where they are deposited. In the post-treatment section, the strip surface is treated with various chemicals to improve corrosion resistance and paintability. Subsequently, the strip is rinsed and dried with hot air, and the thickness of the zinc coating is measured. In the inspection stand, surface inspection and strip marking are carried out. Upon request, the strip is additionally oiled (corrosion protection, forming behavior). Finally, the strip is coiled, cut from the endless strip and weighed. In the exit section, a looper likewise compensates for short downtimes (cutting). Some production parameters are presented in Table 1.
Fig. 1 Schematic drawing of the electro galvanizing line. In the entry section, the strip is decoiled and welded to the previous strip. After alkaline cleaning and acidic pickling, the strip enters the plating section consisting of twelve GRAVITEL® cells. After rinsing and optional post-treatment, the strip is inspected and cut into coils again.

Anode aging
Anode aging has a variety of causes, the most important of which are strip contact, abrasion, corrosion by the electrolyte, and human error such as mis-installation. The anodes have a typical lifetime of 3-12 months in the cell, until they are removed based on the results of a visual inspection. The inspection is conducted monthly. During inspection, the strip is stopped and electro galvanizing is conducted statically, so that the current activity of the anodes is reflected on the strip. The strip segments are subsequently photographed and evaluated visually by a team of process experts and technicians. Typical pictures of strip segments corresponding to "good" and "bad" anode conditions are shown in Fig. 3. This inspection procedure is costly and time-consuming, and it depends on the skills of the plant staff. This motivated us to implement predictive modelling of the rectifier voltage to assess the anode condition and to determine the right time for replacing the anodes.

Data acquisition and preparation
The data used in this study was collected on one exemplary rectifier unit equipped with two anodes over the course of 22 months, at a sampling interval of 10 s. The dataset consists of one target variable (voltage) and nine predictive variables, which are given in Table 2. Data processing and regression model training were performed using our previously developed Python scripts [27].
The data was filtered prior to analysis, i.e. we selected data segments in which no external disturbances or maintenance occurred (determined from in-house records), in which production was set to double-sided galvanization (which accounts for up to 95% of production time), and in which the strip speed was above 35 m/min. Additional filtering was based on valid ranges for individual parameters, i.e. pH 0-14 and electrolyte temperature 40.0-100.0 °C. The electrolyte values were determined in the laboratory at eight-hour intervals by replicate measurements. The respective analytical methods are listed in Table 2. The laboratory values were linearly interpolated onto the 10 s sampling grid.
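The filtering and interpolation steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not our production script: the column names (`double_sided`, `strip_speed`, `ph`, `electrolyte_temp`) are hypothetical placeholders for the variables in Table 2, and the exclusion of maintenance periods from in-house records is not shown.

```python
import pandas as pd

# Hypothetical column names; the real variables are listed in Table 2.
MIN_STRIP_SPEED = 35.0      # m/min
PH_RANGE = (0.0, 14.0)
TEMP_RANGE = (40.0, 100.0)  # °C


def prepare_data(process: pd.DataFrame, lab: pd.DataFrame) -> pd.DataFrame:
    """Interpolate lab values onto the 10 s grid and apply the validity filters.

    Both frames must have a sorted DatetimeIndex. Exclusion of maintenance
    periods (taken from in-house records) is not shown here.
    """
    # Time-weighted linear interpolation of the 8-hourly lab values,
    # evaluated at the process timestamps.
    grid = process.index.union(lab.index)
    lab_on_grid = lab.reindex(grid).interpolate(method="time").reindex(process.index)
    data = process.join(lab_on_grid)

    # Rows with NaN lab values (before the first lab sample) fail the range check.
    mask = (
        data["double_sided"]
        & (data["strip_speed"] > MIN_STRIP_SPEED)
        & data["ph"].between(*PH_RANGE)
        & data["electrolyte_temp"].between(*TEMP_RANGE)
    )
    return data.loc[mask]


# Usage with a few synthetic 10 s samples and two lab measurements one minute apart.
idx = pd.date_range("2021-01-01 00:00:00", periods=7, freq="10s")
process = pd.DataFrame(
    {
        "double_sided": [True, True, True, True, True, True, False],
        "strip_speed": [40.0, 40.0, 40.0, 30.0, 40.0, 40.0, 40.0],
        "electrolyte_temp": [55.0] * 7,
    },
    index=idx,
)
lab = pd.DataFrame({"ph": [1.0, 1.6]}, index=idx[[0, 6]])
clean = prepare_data(process, lab)
```

In this toy example the sample at 00:00:30 is dropped for its low strip speed and the last sample for single-sided production; the pH at 00:00:10 is interpolated to 1.1.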

A simplified physical model
For the calculation of the baseline voltage, i.e. the voltage under good anode conditions, we developed a physical model based on electrochemical and physical laws. In steady state, the electrical circuit of a single electro galvanizing cell can be approximated by the equivalent circuit shown in Fig. 4.
• U_rect is the driving voltage generated by the rectifier and applied across the whole cell.
• R_supply is the resistance of the cables connecting the anodes and the conductor roll to the rectifier.
• R_roll is the resistance of the conductor roll itself plus the transition resistance from the conductor roll to the steel sheet (see the schematic in Fig. 2).
• R_sheet models the resistance within the steel sheet from the conductor roll down to the electrolyte and depends on steel quality and strip thickness.
The circuit then splits into two parallel branches, because both sides of the strip were plated in all cases we investigated.
• The electrochemical potential of zinc deposition (E_Zn) remains nearly constant during operation and causes a voltage drop of about 0.76 V [28].
• The high current densities typically applied in electro galvanizing lines drive the electrode far from its equilibrium potential, so an additional overpotential η_Zn has to be added. This overpotential can be calculated using the Tafel equation [29].
• R_anode is the resistance of the electrolyte between anode and strip. It depends mainly on the distance between anode and steel strip and on the specific resistance of the electrolyte. The inverse of the specific resistance, i.e. the electrolyte conductivity σ_electrolyte, can be calculated with an empirical equation [30].

Fig. 4 Equivalent circuit of the electroplating process

If the rectifier current I and the voltage U_rect, along with all other relevant parameters, are measured, the difference between the calculated and the measured rectifier voltage can be used to detect parasitic resistances within the circuit, such as those caused by aged anodes. A detailed explanation of the physical model is presented in Additional file 1.
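To make the structure of the equivalent circuit concrete, the following sketch computes a baseline voltage and the resulting parasitic voltage. It is an illustration under explicit assumptions, not our Eqs. 1-2 (which are given in Additional file 1): we assume that the two parallel branches are symmetric and each carry I/2, that the Tafel overpotential takes the form η_Zn = b·log10(j/j0), and that R_anode = d/(σ·A) for an anode-strip gap d and active area A. All parameter names and values (`gap`, `area`, `tafel_slope`, `j0`, and the resistances) are hypothetical placeholders.

```python
import math
from dataclasses import dataclass

E_ZN = 0.76  # V, magnitude of the zinc deposition potential quoted in the text [28]


@dataclass
class CellParameters:
    """Hypothetical parameters of the equivalent circuit (Fig. 4); values are placeholders."""
    r_supply: float     # ohm, cables between rectifier and anodes/conductor roll
    r_roll: float       # ohm, conductor roll plus roll-to-strip transition
    r_sheet: float      # ohm, strip from conductor roll to the electrolyte
    gap: float          # m, anode-strip distance
    area: float         # m^2, active anode area per strip side
    sigma: float        # S/m, electrolyte conductivity (empirical equation [30] not reproduced)
    tafel_slope: float  # V per decade of current density (assumed)
    j0: float           # A/m^2, exchange current density (assumed)


def tafel_overpotential(j: float, tafel_slope: float, j0: float) -> float:
    """Tafel equation in its high-overpotential form: eta = b * log10(j / j0)."""
    return tafel_slope * math.log10(j / j0)


def baseline_voltage(current: float, p: CellParameters) -> float:
    """Rectifier voltage for healthy anodes, assuming both branches carry I/2."""
    i_side = current / 2.0
    r_anode = p.gap / (p.sigma * p.area)  # electrolyte resistance of one branch
    eta = tafel_overpotential(i_side / p.area, p.tafel_slope, p.j0)
    series_drop = current * (p.r_supply + p.r_roll + p.r_sheet)
    branch_drop = E_ZN + eta + i_side * r_anode
    return series_drop + branch_drop


def parasitic_voltage(u_measured: float, current: float, p: CellParameters) -> float:
    """Measured minus theoretical voltage; large values hint at parasitic resistance."""
    return u_measured - baseline_voltage(current, p)


params = CellParameters(r_supply=1e-5, r_roll=2e-5, r_sheet=1e-5, gap=0.03,
                        area=2.0, sigma=20.0, tafel_slope=0.1, j0=50.0)
u_base = baseline_voltage(20_000.0, params)         # 0.8 + 0.76 + 0.2 + 7.5 = 9.26 V
u_par = parasitic_voltage(10.26, 20_000.0, params)  # about 1.0 V
```

Keeping the series and branch contributions separate also makes each term available as a candidate meta-variable for the hybrid models.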

Machine learning
For predicting the baseline rectifier voltage, we employed three different regression algorithms: Random Forest (RF) regression, AdaBoost (ADA) regression and partial least-squares (PLS) regression. Our choice includes both linear and non-linear regressors. The first two (RF, ADA) are non-linear black-box ensemble methods, which have gained popularity in industrial settings due to their prediction quality, the little preprocessing and tuning effort they require and, in the case of bagging, fast parallel training [30,32]. Ensembles train many weak learners and combine their predictions; the weak learners are commonly, but not exclusively, variations of decision trees, aggregated either through boosting (sequential training) or bagging (parallel training on random sub-samples). In recent years, ensemble methods have frequently achieved top results in global machine learning and data science challenges [33,34].

Partial least squares
PLS seeks the directions in the X-space (predictive variables) that explain the maximum variance in Y (the target variable), i.e. the directions of maximum covariance between X and Y [35]. The method compresses the X-space into a small set of vectors called latent components and builds a linear multivariable regression model of the target on these components. A detailed overview of the method is presented in Refs. [35,36].

Ensemble regressors
The Random Forest algorithm, conceptualized by Breiman [37], is based on bagging, with decision trees as weak learners. Averaging many decorrelated trees, each trained on a bootstrap sample, reduces the variance of the ensemble, while the bias and variance of the individual trees can be controlled by carefully optimizing weak learner hyperparameters, such as tree depth. Besides its good performance, RF accepts many feature representations and thus requires little preprocessing, which makes it convenient for many applications, including manufacturing. Because the trees are trained independently of each other, RF training can be parallelized, which is a major advantage on high-throughput computing infrastructures.
AdaBoost (adaptive boosting) [38] trains weak learners sequentially on sub-samples: each new learner is built with increased importance given to the samples that were mis-predicted by the previous learners in the sequence. The final ensemble combines the learners with weights based on their accuracy (in the AdaBoost.R2 variant implemented in scikit-learn, through a weighted median of their predictions). A comparison of the two algorithms with a more detailed description is presented in Ref. [32].
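The difference between bagging and boosting can be seen directly in scikit-learn. The sketch below fits both ensembles on synthetic data (assumed for illustration; the hyperparameters are placeholders, not those of Table 3) and checks that the RF prediction is simply the mean over its independently trained trees.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(2000, 9))
y = 8.0 + 6.0 * X[:, 0] ** 2 + 2.0 * np.sin(6.0 * X[:, 1]) + 0.1 * rng.normal(size=2000)

# Bagging: independent trees on bootstrap samples, trainable in parallel (n_jobs=-1).
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, n_jobs=-1, random_state=0)
rf.fit(X, y)

# Boosting: shallow trees fitted one after another on re-weighted samples.
ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=100, random_state=0)
ada.fit(X, y)

# The RF prediction is the plain mean of its trees' predictions ...
tree_mean = np.mean([tree.predict(X[:5]) for tree in rf.estimators_], axis=0)
rf_pred = rf.predict(X[:5])
# ... whereas AdaBoost combines its sequentially trained trees by a weighted median.
ada_pred = ada.predict(X[:5])
```

Only the RF training benefits from `n_jobs`, since each boosting stage depends on the sample weights produced by the previous one.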

Hybrid models
The hybrid models, as described in the introduction, combine predictors based on physical and machine learning models. In this work, we used the physical voltage model described above to generate meta-variables, which served as additional inputs to the machine learning models (PLS, RF, ADA) to improve prediction quality. To this end, all variables calculated in Eqs. 1-2 that were not part of the baseline data set were added as inputs to the three regressors.
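One way to implement such feature augmentation is to wrap the physics-based calculation in a scikit-learn pipeline, so that the meta-variables are computed identically during training and prediction. The sketch below is hypothetical: the column names, the three meta-variables and the constants (`GAP_M`, `AREA_M2`, `TAFEL_SLOPE`, `J0`) are placeholders and do not reproduce the seven meta-variables from Eqs. 1-2.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical physical constants (placeholders, not the paper's values).
GAP_M = 0.03        # anode-strip distance, m
AREA_M2 = 2.0       # active anode area per side, m^2
TAFEL_SLOPE = 0.1   # V per decade
J0 = 50.0           # exchange current density, A/m^2


def add_meta_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Append physics-derived meta-variables to the raw process variables."""
    out = df.copy()
    i_side = out["current"] / 2.0  # assumption: symmetric split between both strip sides
    out["r_anode"] = GAP_M / (out["conductivity"] * AREA_M2)
    out["eta_zn"] = TAFEL_SLOPE * np.log10(i_side / AREA_M2 / J0)
    out["u_electrolyte"] = i_side * out["r_anode"]
    return out


hybrid_rf = make_pipeline(
    FunctionTransformer(add_meta_variables),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

# Usage on synthetic raw data with three hypothetical columns.
rng = np.random.default_rng(2)
raw = pd.DataFrame({
    "current": rng.uniform(10_000, 30_000, 300),
    "conductivity": rng.uniform(15.0, 25.0, 300),
    "strip_speed": rng.uniform(40.0, 120.0, 300),
})
target = 0.76 + add_meta_variables(raw)["u_electrolyte"] + 0.05 * rng.normal(size=300)
hybrid_rf.fit(raw, target)
pred = hybrid_rf.predict(raw.head(5))
```

The same pipeline can be passed to a grid search unchanged, which keeps the hybrid and the purely data-driven models directly comparable.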

Model training and validation
We used the scikit-learn [39] implementations of the algorithms. The baseline rectifier voltage was set as the target variable, with the other variables in Table 2 as predictive variables, supplemented by the meta-variables for the hybrid models. The data was divided into "good" and "bad" segments. The segments were labelled according to the difference between theoretical and measured voltage and according to the results of the manual inspection of the anodes' condition (see "Results of the physical model and data labeling" section). From the "good" segments (a data set of ~ 1.06 million measurements), a training set (75%) and a validation set (25%) were chosen at random. The validation set was held out until the final model evaluation. The complete model pipeline is presented in Fig. 5. All models were trained on an in-house big data server described in [40]. The models were evaluated using the root mean squared error (RMSE, Eq. 5), the coefficient of determination (R², Eq. 6), and the mean absolute error (MAE, Eq. 7).
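For reference, the standard definitions of the three metrics, which we assume correspond to Eqs. 5-7, are given below for n validation samples with measured voltages y_i, predictions ŷ_i and mean measured voltage ȳ:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```

RMSE and MAE are expressed in volts, and RMSE is always at least as large as MAE because it weights large errors more heavily.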
For hyperparameter optimization, we used grid search [41] with tenfold cross-validation on the training set, an approach that performed well in our previous work [27]. The hyperparameter grids are presented in Additional file 1: Table S1.
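The split, tuning and evaluation steps can be sketched with scikit-learn as follows. The data is synthetic and the parameter grid is a hypothetical placeholder (the actual grids are listed in Table S1); the 75/25 random split, tenfold cross-validation and the three metrics follow the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(size=(1200, 9))
y = 11.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=1200)

# Random 75/25 split of the "good" data into training and held-out validation sets.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Hypothetical grid; tenfold cross-validation on the training set only.
param_grid = {"min_samples_leaf": [1, 10, 50], "max_features": [0.5, 1.0]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
    n_jobs=-1,
)
search.fit(X_train, y_train)

# Final evaluation on the held-out validation set.
y_pred = search.best_estimator_.predict(X_val)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
mae = mean_absolute_error(y_val, y_pred)
r2 = r2_score(y_val, y_pred)
```

By default, `GridSearchCV` refits the best configuration on the whole training set, so `best_estimator_` is the model that is finally evaluated.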

Results of the physical model and data labeling
We calculated the theoretical voltage with the simplified physical model (Eqs. 1-2) described in the section "A simplified physical model", given the electrical current, steel strip and electrolyte parameters. We assume that the difference between the measured and the theoretical voltage reveals parasitic resistance, which indicates bad anode conditions or mis-installation. Figure 6 shows a time series plot of the voltage difference (measured-theoretical), which can be seen as the "parasitic voltage", i.e. the voltage that cannot be explained by the other contributions. The times at which the anodes were changed are shown by vertical dashed lines. The maintenance personnel replaced the anodes at about the time the median voltage difference (parasitic voltage) reached ~ 1 V. The training periods for machine learning (shaded in orange) were chosen at times when the voltage difference was low, i.e. below 0.6 V, and when the maintenance personnel rated the anode condition as good. From the shaded areas containing data with good anode condition (which represent the model training and validation sets together), 25% of the data was chosen by random sampling for validation. The non-shaded areas were used for the proof of concept. The RMSE of the theoretical voltage calculated by the physical model on the validation set is 0.464 V. The period from month 18 onwards demonstrates the concept very well, since the model assigned low differences to the fresh anode installed in month 18.
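The labelling rule can be sketched with pandas. This is an illustration under assumptions: we assume the 0.6 V threshold is applied to a centered rolling median of the parasitic voltage (about 10 days before and after each point, as in Fig. 6) and combined with a boolean inspection verdict; the function and variable names are hypothetical, and a centered time-based window requires pandas 1.3 or later.

```python
import numpy as np
import pandas as pd

GOOD_THRESHOLD_V = 0.6  # voltage difference below which training segments were chosen (from the text)


def label_good_segments(parasitic_v: pd.Series, inspection_ok: pd.Series,
                        window: str = "20D") -> pd.Series:
    """Return True where the centered rolling median of the parasitic voltage
    is below the threshold and the inspection rated the anodes as good.

    parasitic_v needs a sorted DatetimeIndex; a centered, time-based window
    requires pandas >= 1.3.
    """
    rolling_median = parasitic_v.rolling(window, center=True).median()
    return (rolling_median < GOOD_THRESHOLD_V) & inspection_ok


# Synthetic daily values: 30 days with fresh anodes, then 30 days with aged anodes.
days = pd.date_range("2021-01-01", periods=60, freq="D")
parasitic = pd.Series(np.r_[np.full(30, 0.3), np.full(30, 1.2)], index=days)
inspection = pd.Series(True, index=days)
inspection.iloc[5] = False  # e.g. a day flagged by the maintenance personnel
labels = label_good_segments(parasitic, inspection, window="10D")

# Random 75/25 split of the "good" timestamps into training and validation sets.
good_times = labels.index[labels.to_numpy()]
val_times = good_times.to_series().sample(frac=0.25, random_state=0).index
train_times = good_times.difference(val_times)
```

On real data, the window would be applied to the 10 s series after filtering, so a time-based window (rather than a fixed number of samples) keeps the smoothing span constant across gaps.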

Fig. 6 Time series plot (complete data set) of the rectifier voltage difference (measured-theoretical), or "parasitic voltage", for an exemplary anode couple. The theoretical voltage for the plot is calculated from the physical model (Eqs. 1-2). The grey points are raw voltage differences (measured-theoretical); the green line is the rolling median of the raw difference data, computed over a window of approximately 10 days before and after each point. The dashed vertical lines with highlighted text mark anode changes. The orange shaded areas represent the "good segments" used for machine learning model training and validation. The unshaded areas represent "bad segments", although some unshaded areas with a low voltage difference were deliberately left out of training for our proof of concept.

Results of the machine learning models
We trained three different machine learning models described in the "Machine learning" section. The optimal hyperparameters chosen by grid search are presented in Table 3.
The predictive quality was evaluated on the validation set. Figure 7 shows the model residual plots, with the respective root mean squared error values included in the subplots for comparison. RF shows the best prediction on the validation set, with an RMSE of 0.334 V. It is followed by the physical model with an RMSE of 0.464 V and PLS with an RMSE of 0.819 V, while ADA performs worst with an RMSE of 1.570 V. Fig. 7 shows that RF and the physical model have higher errors in the upper voltage region, but no obvious global pattern in the error distribution. They also appear to be less prone to outliers. ADA and PLS appear to have failed to fit the data, as their residual point clouds show a curved structure, which indicates systematic errors.
We had, however, expected PLS to show fewer scattered points (outliers), since as a linear model it can extrapolate, whereas Random Forest bins the data into the known training instances and can therefore only predict values within the range of the training targets. Furthermore, the RF hyperparameter "min samples" was chosen to be 200, meaning that around 200 instances are binned to one voltage value. ADA performs poorly on this data set.
The hybrid models were trained with seven additional meta-variables capturing theory-driven information. Even though the additional variables are derived from the original ones (Table 2), we observed an improvement in the models' performance (Table 4). All models improved through the addition of meta-variables from the theoretical model, which adds electrochemical and physical knowledge to the models. This may have allowed the weak learners of RF and ADA to find better splitting criteria and thus a better fit. Although we employed three different metrics for result evaluation, all three follow the same pattern, with RF showing the best performance in each of them. This can be attributed to the size of the data set: with ~ 265,000 measurements in the validation set, the metrics are less sensitive to outliers than they would be on small data sets. The superior performance of RF compared with boosting algorithms (here ADA) was also observed in prior work [30,42].
Figure 8 shows a visual comparison of the median voltage difference (measured-theoretical) obtained with the physical model and with the best model (hybrid RF). The model was also used to predict the "bad segments", which are the unshaded areas in the plot. Overall, the best model calculated a lower baseline voltage, which is expected to be more realistic, especially after the installation of fresh anodes. The unexpectedly low voltage from month 18 onwards is attributed to a novel type of anode coating tested in the unit, which was not present in the training set. Both models show a slightly lower median voltage.

Concept transfer and limitations
Our proposed concept can be transferred to any setting where physical models are already employed but still have limited predictive quality. Manufacturing plants commonly employ physical models to describe dynamic processes in their individual production units. By including meta-variables derived from or generated within physical models, one can improve data-driven predictions; this can be considered a feature engineering method. The methodology can be applied to any condition monitoring setup in which suboptimal working conditions cause a measured value describing a condition (the target variable) to deviate from a predicted or physically modelled baseline. This requires that the target variable (a quantified condition) and its possible predictors can be clearly determined. Early detection of faulty conditions in condition monitoring can reduce material and energy costs. Ensemble regression models such as Random Forest and AdaBoost can improve predictive power and reduce preprocessing time, since they work well with heterogeneous data. The simplified data processing can therefore reduce the effort of creating online and semi-online automated learning pipelines.
Several limitations of this approach will be addressed in future research: the development of a universal model for all rectifiers, the removal of local outliers per strip prior to training, automated data labelling, and the inclusion of additional process variables that are not part of the physical model. The same limitations apply to other use cases in which more data is available in the system than is considered in the covariates of the target variable. Furthermore, industrial settings often have more than one operating or processing unit of the same kind, so universal models for all units might be desirable. Important aspects when modelling such systems are the properties of products and consumables, which, if changed, can cause drift in the covariance matrix. Besides that, the process units themselves age and, as dynamic systems, are affected by corrosion and tribological effects. We are confident that the model presented here was trained on data that is well distributed with respect to steel strip and anode material. If the steel strip or anode materials change in the future, re-training may be necessary to maintain the generalization ability of the presented model.

Conclusion
In this paper, we presented a hybrid machine learning approach to predict the baseline voltage of a rectifier in an electro galvanizing production line and compared it with a simplified physical voltage model. The difference between the measured and the baseline voltage reflects parasitic resistance due to bad anode condition, which is therefore employed as an indicator of the anode condition. The best model (a hybrid machine learning model) shows good predictive quality, with an RMSE of 0.263 V for a rectifier voltage of (mean ± std. dev.) 11.228 ± 2.525 V. The winning algorithm RF can easily be employed semi-online (needs periodical re-training) due to fast and parallelizable training and little