TestDCat 3.0: catalog of test debt subtypes and management activities

When deadlines and resources in software projects become scarce, testing is usually among the first activities to be aborted or reduced; however, if defects are not found, product quality can be affected. In the software development process, aborted or reduced activities that bring short-term benefits but can be harmful to the project in the long run are considered Technical Debt (TD), and when TDs impact testing activities they are called Test Debt. Although there are several studies dealing with Test Debt, current solutions often address specific types of tests (e.g., exploratory and automated tests) and do not cover the whole software testing process. Aiming to fill these gaps, this work proposes a Test Debt catalog, called TestDCat, with subtypes of Test Debt and Technical Debt management activities. The catalog is built on the results of an empirical study, a literature review, and semi-structured interviews conducted with practitioners who perform testing activities on five industry projects. To evaluate TestDCat, a case study is conducted in real projects in order to identify whether the catalog is user-friendly and whether its use helps Test Debt management during the execution of test activities in these software development projects.


Introduction
Technical Debts are technical commitments made in software projects that can bring short-term benefits but, in the long run, can be detrimental to project quality Li et al. (2015). The concept was first used by Cunningham (1992), who related Technical Debts to problems in the code and the need for refactoring to pay the acquired debts. Other studies have addressed Technical Debts in other activities of the software development process (e.g., tests, requirements, usability) and provided solutions to manage them in software projects Alves et al. (2016) Li et al. (2015).
When Technical Debts concern software testing, they are known as Test Debts. They arise from inadequate decisions regarding testing activities (e.g., lack of tests, test estimation errors) Samarthyam et al. (2017). For instance, members of a project may not perform (intentionally or unintentionally) some activities in order to achieve faster delivery and gain some competitive advantage Wiklund et al. (2017).
According to Samarthyam et al. (2017), failure to identify and, consequently, manage Test Debts during software development can directly impact the quality of the developed software. In this context, it is necessary to investigate more deeply the causes of Test Debts and how they can be managed to control their costs, so that they do not impact the maintenance/evolution of the software or make it unfeasible.
Although a significant number of works deal with this type of debt Alves et al. (2016) Li et al. (2015), most of them address only code-related debts by using, for example, tools for the analysis of test code coverage and unit tests. Therefore, a more in-depth investigation of debts that cannot be managed by code analysis alone is necessary. In addition, most studies focus on only a few specific causes of Test Debts (for example, lack of automated tests Wiklund et al. (2012) and exploratory tests Shah et al. (2014)). Thus, there is a need for studies covering a larger number of causes of this type of debt.
Thus, our work intends to fill the following gaps identified in the systematic mappings of Alves et al. (2016) and Li et al. (2015) and the literature review that is also done in this work: (i) lack of studies dealing with other types of Technical Debts (e.g., Test Debts); (ii) need for identifying non-code-related Technical Debts; (iii) few studies with more global approaches that cover more causes of Test Debts; and (iv) need for validated solutions in industry projects.
The knowledge acquired as a result of performing this work was organized in a catalog format and made available for the use of practitioners who work with software testing and Test Debts. The decision for this kind of format came from the fact that catalogs are ways of organizing the information and know-how that originates from practitioners Chung et al. (2012) and can be used as a source of information on how to deal with Technical Debt Kruchten et al. (2012) Ozkaya et al. (2011).
In a previous work , we proposed the catalog, which we called TestDCat, with subtypes of Test Debts and management activities that can help in the management of Test Debts. To build this catalog, we first investigated, through an empirical study in industry, problems that occurred in a testing factory, which performed tests on applications and systems, and we identified that these problems could be formulated and treated as Test Debt. Next, the catalog itself (TestDCat 1.0) was developed based on information gathered from semi-structured interviews and a literature review on Test Debts and their management. After analyzing the results of the first evaluation, we proposed TestDCat 2.0 to improve the first version. This second version was also evaluated, and changes and improvements were suggested. The results of the evaluation provided evidence that the information organized in the catalog could support the management of Test Debts. This last evaluation also brought out the need for new improvements to our catalog.
Thus, this paper extends our aforementioned previous work by presenting the third version of our catalog, TestDCat 3.0, and a case study to evaluate it. This new version includes improvements that we identified from the results of the previous evaluation of TestDCat 2.0 . Regarding the case study, it was conducted in industry software projects to assess the use of the catalog in assisting Test Debt management and to evaluate its usability. Additionally, we present a summary of the empirical study conducted in a test factory to build TestDCat. This paper is divided into six sections. In Section 2, we introduce the methodology and steps used to build the catalog. TestDCat 3.0 is described in Section 3. In Section 4, we present the case study conducted, and related works are discussed in Section 5. Finally, in Section 6, we present the final considerations and perspectives for future work.

Methodology
The construction of the catalog follows a methodology (see Fig. 1) that is partially based on the technology transfer model presented by Gorschek et al. (2006). This model favors cooperation between academia and industry and can be beneficial to both. In short, it gives researchers the opportunity to study relevant industry issues and validate their results in a real environment and, in return, professionals receive knowledge about new technologies that can, for example, optimize their processes.
Regarding the first step of the methodology, the step called identify the problem consists of the observation, by the researcher, of the activities performed in the industry in order to identify real problems/issues. In our case, we identified our problem from our experience in developing software in a successful long-term partnership with the industry in Research, Development and Innovation (R&D&I) projects that occur in the environment where this work was developed. In these kinds of projects, the GREat 1 Test Factory (GTF) team has followed a testing process as documented in . Once we identified our problem, we conducted an empirical study to identify the main issues faced by GTF. (The results of this empirical study are also presented in .) As a result of the first step, the problem was identified and formulated as Test Debt.

Fig. 1 The methodology used to build the catalog
The second step is called perform the literature review (see also details in Sections 3 and 5). The goal of this step is to better understand the problem, in our case Test Debts, and to identify how to deal with it. This review was performed based on the analysis of Test Debt studies identified in the systematic mappings carried out by Li et al. (2015) and Alves et al. (2016). Based on the snowballing principle Wohlin (2014), we also searched for studies that cited the set of previously selected papers.
The third step is to propose a candidate solution to tackle the identified problem. In our previous work, two candidate versions were presented . The first version was developed from three distinct and complementary sources: (i) the literature review; (ii) the empirical study; and (iii) the results of semi-structured interviews with professionals in the testing area. The knowledge acquired from these three sources was then organized in a catalog format. The second version of the candidate solution was obtained from the results of the evaluation of TestDCat 1.0 and the information collected in a new series of semi-structured interviews conducted with software developers who also performed testing activities. This work presents a third candidate solution, developed from the analysis of the results of the evaluations of the previous versions.
The next steps of the methodology refer to static and dynamic validation (i.e., perform the static validation and perform the dynamic validation). For the static validation, a survey was applied to evaluate the first version of the catalog, and a focus group Krueger and Casey (2002) was conducted to evaluate the second version. The dynamic validation was executed as a case study Runeson and Höst (2009) with industry projects (five applications) at the GREat Test Factory.
In the last step, release the solution, we released the third version of the catalog, TestDCat 3.0, and used it in industry projects.

TestDCat 3.0
This section presents the third version of TestDCat, which was built based on the evaluations of the catalog's previous versions . The steps and details for building TestDCat 3.0 can be seen in .
In TestDCat 3.0, improvements were made over the existing catalog versions, related to the addition or removal of actions as well as occasional improvements in the descriptions of each action (e.g., a change of the person responsible for the action). The structure of the catalog remains the same, using the 5W1H model Fernandes et al. (2013), commonly used to conduct action plans. The data are categorized according to TD management activities and subtypes of Test Debts, both mapped by Li et al. (2015). In addition to the subtypes identified by Li et al., two other subtypes were identified: Inadequate equipment and Inadequate allocation.
Regarding the subtype Inadequate equipment, the incorrect use of equipment can impact the quality of the tests performed, generating, for example, false positives. Inadequate allocations can impact the results of the tests performed when, for example, professionals who do not have the necessary expertise to perform tests on a given demand are allocated and critical defects are not detected. Figure 2 presents the catalog, highlighting parts of it with numbered circles. Circle number 1 highlights the Technical Debt management activities: Identification, Measurement, Prioritization, Communication, Monitoring, Repayment, Documentation, and Prevention. By clicking on any of these activities, the subtypes of Test Debts and their related actions are presented. In the example shown in Fig. 3, the "Identification" activity was selected.
Circle number 2 presents the subtypes of Test Debts: Low code coverage, Deferring testing, Lack of tests, Lack of test automation, Defects not found in tests, Expensive tests, Test effort estimation errors, Inadequate equipment, and Inadequate allocation. In the catalog, we present a set of actions for each subtype. For example, after clicking on the "Identification" activity, the user can choose which subtype he/she wants to handle. In this example, we chose "Low code coverage," which has two related actions.
Circle number 3 shows the actions identified through the semi-structured interviews conducted; they follow the 5W1H model. In this case, these are the suggested actions to help catalog users identify Test Debts caused by "Low code coverage." Actions can be used together or individually. Circle number 4 presents the functionality through which the user can select which actions are most related to his/her context. A customized action plan is then created.
A total of 63 actions were catalogued, and all the catalog's information can be viewed on the TestDCat website 2 .
To use the catalog, the user can take the management activities as a guide and perform all the actions of the chosen activity (for example, he/she may want to perform the identification of all the Test Debt subtypes without performing the measurement activity). The user can also take the Test Debt subtypes as a guide: depending on the subtype chosen, he/she defines which management activities to use; for example, he/she can choose the subtype "Lack of tests" and perform the identification actions for this subtype. Both forms of use are available on the website and can be changed by practitioners as needed.
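The two navigation modes described above (activity-first or subtype-first) can be pictured as simple filters over a set of 5W1H actions. The sketch below is only an illustration of the catalog's organization; the action texts and field values are hypothetical placeholders, not TestDCat's real entries:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A catalog action following the 5W1H model."""
    activity: str   # TD management activity (e.g., "Identification")
    subtype: str    # Test Debt subtype (e.g., "Low code coverage")
    what: str
    why: str
    who: str
    where: str
    when: str
    how: str

# Two hypothetical entries, loosely inspired by Table 1.
catalog = [
    Action("Identification", "Lack of tests",
           what="Analyze the list of software versions",
           why="Find indications that tests were not performed",
           who="Test leader", where="Version history",
           when="At the start of a test demand", how="Manual inspection"),
    Action("Identification", "Defects not found in tests",
           what="Analyze user feedback",
           why="Identify defects missed during testing",
           who="Test analyst", where="Feedback channels",
           when="After each release", how="Triage of reports"),
]

def by_activity(activity):
    """Navigate activity-first: all actions of one management activity."""
    return [a for a in catalog if a.activity == activity]

def by_subtype(subtype):
    """Navigate subtype-first: all actions for one Test Debt subtype."""
    return [a for a in catalog if a.subtype == subtype]

plan = by_activity("Identification")  # a customized action plan
```

A customized action plan, as created via circle number 4 in Fig. 2, is then just a user-selected subset of either filter's result.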

Examples of catalog actions
Only the Identification and Repayment activities have specific actions for each subtype of Test Debt. For the other activities, the described actions can be applied to all subtypes.

Identification
The Identification management activity refers to the tasks performed to identify Technical Debts during software development. At the end of this activity, a list of the TDs identified in the project is expected. Table 1 presents two identification actions. The first is expected to identify the Test Debt subtype Lack of tests by analyzing the list of versions of the software in order to find indications that tests were not performed on any of the analyzed versions. In the second action, it is expected that, by analyzing feedback from users, possible defects that were not detected at the time of testing will be identified.

Measurement
The Measurement management activity aims to quantify the benefit and cost of known Technical Debts. This estimate can be performed for an individual TD or for the entire system in order to identify the level of TDs in the system. Table 2 presents an example of an action to be taken in order to quantify the cost/benefit ratio of identified debts. Some examples of the cost calculation described in this action are: (i) the effort required to document and perform manual tests; (ii) the infrastructure required to perform the tests; and (iii) the impact of changes on other system test cases.

Prioritization
The Prioritization management activity is intended to rank the identified Technical Debts to help define which should be repaid first and which may be postponed. Table 3 presents two actions for the Prioritization activity. It is worth mentioning that these actions may be applied to all subtypes of Test Debts. The first action is related to the analysis of the cost/benefit ratio performed in the Measurement activity; based on this analysis, it is possible to perform the prioritization. The second action is related to the relevance of the functionalities that have known TDs: debts related to the most relevant functionalities should be prioritized.
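The catalog does not prescribe a ranking formula; as one hedged illustration of how the two actions above could be combined, a score weighting the measured benefit/cost ratio by the relevance of the affected functionality would rank debts for repayment. All names, values, and weights below are hypothetical:

```python
# Hypothetical prioritization sketch: combine the Measurement activity's
# benefit/cost ratio with the relevance of the affected functionality.
def priority(debt):
    # Higher repayment benefit, lower repayment cost, and more relevant
    # functionality all push a debt toward earlier repayment.
    return debt["benefit"] * debt["relevance"] / debt["cost"]

debts = [
    {"id": "TD1", "benefit": 8, "cost": 4, "relevance": 0.9},  # core feature
    {"id": "TD2", "benefit": 5, "cost": 1, "relevance": 0.3},  # minor feature
    {"id": "TD3", "benefit": 9, "cost": 3, "relevance": 1.0},  # critical path
]
ranked = sorted(debts, key=priority, reverse=True)
# ranked[0] is the debt to repay first
```

In practice, a team would calibrate such weights to its own context, since the catalog's actions are deliberately generic.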

Communication
The Communication management activity aims to inform all stakeholders of the identified debts. By conducting this activity, it is expected that project members will be aware of the existence of the identified TDs and will be able to take the necessary actions, if pertinent. Table 4 presents two examples of actions. The first concerns the use of follow-up meetings to report identified debts. The second is about the use of communication tools (e.g., Skype, email) to report the existence of debts.

Monitoring
The Monitoring management activity aims to monitor the cost/benefit ratio of each identified debt in order to notice changes in this ratio. This monitoring is important because Test Debts that are not relevant for repayment at the time of identification may become relevant due, for example, to a change in the importance of the functionality in which the Test Debt was identified. Table 5 presents some examples of the actions defined in the catalog for the Monitoring activity. The first action is related to monitoring the cost/benefit ratio of a known debt; this analysis can use the same techniques used to measure the TD and should be carried out periodically in order to detect such changes. The second action is related to the definition and monitoring of triggers: when a particular trigger (e.g., a change in the complexity of a feature that contains a debt, or the discontinuation of a product) is identified, a new analysis of the cost/benefit ratio of the debt should be performed and, if necessary, actions taken accordingly. The monitoring of these triggers must be constant.

Repayment
The repayment activity aims to resolve or mitigate a known debt. It is performed when the team believes that failure to pay the debt can cause a major impact on the maintenance of the software under development.
For the Repayment activity, actions were generated for all Test Debt subtypes. Table 6 presents examples of repayment actions for the subtypes Lack of tests and Defects not found in tests. The first action is related to the elaboration and execution of test cases for releases that have not been tested. For this subtype of Test Debt, it is very important that the debt is paid with high priority, because an untested release may contain many defects that have not yet been identified. The second action is, by analyzing the identified defects, to insert new test cases that cover the defects found or to change existing test cases in order to expand or improve them to cover the defect under analysis.

Documentation
The Documentation management activity is intended to document the identified TDs for the knowledge of all stakeholders and for future consultation. Table 7 presents an example of an action that aims to set a standard for the documentation of identified debts. Following such a standard is essential to ensure that important information is not neglected and is included in the documentation of a debt. This documentation can be accessed when necessary, and its content can be updated when relevant.

Prevention
This management activity aims to prevent the occurrence of TDs. Table 8 presents two actions related to this activity. The first is related to presenting the already identified debts to stakeholders in order to prevent new debts of the same type from being acquired again. The second is to add a review of the elaborated test cases with the purpose of identifying improvements and, thus, avoiding the occurrence of new debts.

Actions to improve the test process
In addition to the specific actions for each activity and subtype of Test Debt, actions with suggestions for improvements in the test process were implemented in TestDCat. These improvements are carried out in the existing activities of the test process and are designed to systematize the Test Debt management actions. Table 9 presents some examples of these types of actions in different management activities and Test Debt subtypes.

Limitations
The catalog proposed in this work aims to assist professionals and researchers who perform software testing activities in managing Test Debts. However, TestDCat has some limitations, which are presented as follows.
One limitation is the generality of the actions in the catalog. Because TestDCat can be used in different kinds of organizations, we opted to produce more generic actions. In this way, users of the catalog can make the necessary changes in the actions to meet the specificities of their organization. However, some users of the catalog may miss more specific actions; thus, the generality becomes a limitation of TestDCat.
Another limitation concerns the possible bias of the catalog. One of the TestDCat inputs was generated from semi-structured interviews with industry professionals, but only professionals from the same organization were interviewed. This is a limitation of the work, since it can introduce bias. However, as previously stated, we decided to include in the catalog more generic actions for the management of Test Debt in order to reduce the bias of the inserted actions. Besides, the catalog also contains actions identified in the literature, which have a context different from that of the organization of the interviewed professionals.

Case study
Following the research methodology used in this work (see Section 2), a dynamic validation was planned and conducted. This validation was performed through a case study at the GREat Test Factory (GTF) and aimed to validate the use of the catalog in real projects with the industry.

Case study design
Case studies are commonly used to study phenomena that occur in their natural context Runeson and Höst (2009). Thus, this work conducted a case study in a test factory to evaluate the use of the catalog for managing subtypes of Test Debts in real industry projects. The case study was based on the guidelines proposed by Runeson and Höst (2009), a relevant guide for conducting case studies in software engineering.

Definition of goals and questions
The case study was conducted with the intention of measuring the impact of using a catalog of Test Debts in a test factory that performs testing activities in real projects with the industry. To do this, it was necessary to understand the GTF participants' knowledge of Test Debts, measure the current status of the occurrence of Test Debts in GTF, and investigate the impact of using a catalog of Test Debts. The questions derived from this objective, which should be answered by conducting the case study, are:

- RQ1. Does the use of the catalog assist in the management of Test Debts? This question aimed to identify whether the use of the catalog really helped in the management of the Test Debt subtypes. As a collection method, in addition to the application of a survey, tools were used to monitor the test activities carried out by GTF, as well as the verification of indicators collected after the use of the catalog. Table 10 summarizes these metrics and their respective calculation formulas.
- RQ2. How easy is TestDCat to use? This question was designed to verify the perception of GTF participants regarding the use of TestDCat. As a collection method, the SUS questionnaire Brooke et al. (1996) was used to identify the level of usability of the catalog.
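For reference, the SUS questionnaire used for RQ2 yields a 0-100 score computed from ten answers on a 1-5 Likert scale: each odd-numbered item contributes its score minus 1, each even-numbered item contributes 5 minus its score, and the sum is multiplied by 2.5 (Brooke et al. 1996). A minimal scoring sketch:

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert answers,
    given in questionnaire order (items 1..10)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    odd = sum(r - 1 for r in responses[0::2])    # items 1,3,5,7,9: score - 1
    even = sum(5 - r for r in responses[1::2])   # items 2,4,6,8,10: 5 - score
    return (odd + even) * 2.5                    # scale 0..100

sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])  # -> 100.0
```

The alternating direction of the items is deliberate in SUS: it forces respondents to read each statement rather than answer uniformly.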

Study objects
The objects were two software products developed in partnership with the industry. The selected products were the ones that demanded tests from the GTF during the period of the case study. Table 11 presents the products, the platforms for which they were developed, the number of releases launched during the period of the case study, the number of lines of code (LoC) of each software product, and the number of test cases already elaborated in the GTF for each product. The names and versions were changed for confidentiality reasons. A total of five uses of the catalog were analyzed. The P1 product is a mobile application based on Android technology; it currently has 366,203 active users. The tests of this application are related to the validation of its state machine, which has many different states that need to be validated. In addition, it is necessary to perform the tests on several different Android versions to ensure the correct functioning of the application on the various versions on which it should operate.
The P2 product is a web tool used to support the internal processes of the company. This tool is used as a means to perform the activities of the company, and its malfunction impacts all the work done in the company. Thus, the tests must be careful to ensure that this tool works properly.

Case study participants
The GTF team who participated in the case study is composed of five professionals, divided as follows: (i) one test leader, responsible for the creation and execution of scenarios and test cases, who also monitors the execution of the activities performed and technically assists the team; (ii) two test analysts, responsible for the creation and execution of scenarios and test cases; and (iii) two testers, responsible for the execution of previously prepared test cases.
In addition to the practitioners, two researchers with knowledge of TestDCat and its actions, as well as vast knowledge of Test Debts, participated in the case study. These researchers made observations and took notes during the execution of the case study for further discussion.

Execution of the case study
In order to conduct the case study, the catalog was applied in five test demands of GTF: three in demands of the P1 project and two in demands of the P2 project. Before the catalog uses began, a presentation of TestDCat was given, covering the objective of the catalog and how it could be used.
The catalog was applied to the P1 and P2 software products when test demands were requested from GTF. The use of the catalog followed the steps defined by one of the researchers together with the GTF leader:

- The GTF team would meet to analyze the catalog and identify which subtypes of Test Debt would be managed in that demand. At this meeting, it was also defined which actions would be taken from each management activity of the previously chosen Test Debt subtypes.
- Analyze the chosen actions and adapt the activities of the test process established in GTF to manage the chosen Test Debt subtypes.
- Perform the actions together with the activities of the GTF testing process.

The main change to the GTF test process required by the use of the catalog was to insert the TestDCat analysis and the definition of the actions for the management of the Test Debt subtypes into the same meeting that defines the scope of the current test demand. The adaptations of the activities performed in GTF were carried out according to the Test Debt subtypes chosen to be managed in the current test demand. The adaptations made for each demand are presented in the next subsections.

First catalog use
After the training on how to use TestDCat, the first use of the catalog started with the test demand requested of GTF to perform the tests of a specific release of the P1 software product.
Following the steps established for the use of the catalog in test demands, the GTF team met to analyze the catalog and discuss which subtypes of Test Debt would be managed in the release in question.
For the first catalog use, the team chose to deal with the Test Debt subtype Defects not found in tests. This choice was due to the team's prioritization of dealing with debts that could generate defects found by the client, because problems found by the client would bring greater inconvenience to GTF. It is worth noting that this subtype was the first Test Debt subtype identified in the test factory, even before the existence of the catalog.
With the Test Debt subtype defined, the team analyzed and chose the actions it would take to manage the chosen subtype. According to the selected actions, it was necessary to add or change some activities of the testing process used in GTF to perform the management of the Test Debt subtype. Table 12 presents a summary of the actions chosen in this first use of the catalog. It can be seen that the team prioritized actions that were already fully or partially performed by GTF and that would have less impact on the activities already performed by the team.

Second catalog use
The second use of the catalog also occurred in the P1 project. At the catalog analysis meeting, the GTF team decided to manage the following subtypes of Test Debt in this session: (i) Defects not found in tests; (ii) Lack of tests; and (iii) Test effort estimation errors. All these subtypes had been identified in the empirical study carried out in GTF, and the team knew that they occurred frequently in GTF. Thus, the team decided to manage them in this second use of the catalog.
After defining the Test Debt subtypes, the team defined which actions would be performed to manage the chosen subtypes. Table 13 shows the actions selected for this new catalog use. It is worth noting that the management activities whose actions were not altered in relation to the previous use were omitted from the table, yet all the management activities were performed.

Third catalog use
The third use of the catalog was performed on the P1 product. In this use, three subtypes of Test Debt were chosen to be managed: (i) Test estimation errors; (ii) Inadequate allocation; and (iii) Inadequate equipment. For this use, two new subtypes were chosen, Inadequate allocation and Inadequate equipment, with the intention of identifying and managing different Test Debts that GTF did not yet know. In addition, a new member had joined the GTF team, so it would be possible to manage potential Test Debts caused by inadequate allocation. Table 14 presents details of the actions chosen to manage the subtypes of Test Debts selected in this third use. It is important to highlight that all management activities were performed, but this table only details the activities whose actions had not been performed in the two previous catalog uses.

Fourth catalog use
The fourth use of the catalog was carried out on the P2 software product. For this use, the GTF team decided to manage the following subtypes of Test Debt: (i) Deferring tests; (ii) Lack of tests; (iii) Test estimation errors; and (iv) Inadequate allocation. Of these subtypes, only Deferring tests had not been managed in previous uses of the catalog. The subtypes Lack of tests and Test estimation errors were chosen because they had already been identified in this product in previous releases (without the use of TestDCat). The subtype Inadequate allocation was chosen because a new GTF member was working on the tests of this product. Table 15 shows the actions chosen to manage the selected Test Debt subtypes. As in the previous uses, the management activities whose actions were not changed in relation to previous uses were omitted from the table.

Fifth catalog use
The fifth and last use of the catalog within the scope of this case study was performed on the P2 product. For this use, the following Test Debt subtypes were chosen: (i) Lack of tests; and (ii) Test estimation errors. These subtypes were chosen due to the frequent occurrence of these Test Debts. Only these two subtypes were chosen because this release was short and the team was worried about impacting the demand's deadline if more subtypes of Test Debt were treated.
For this use, no new actions were performed other than those already carried out in the previous uses performed in the P2 product.

Results
During the use of the catalog, some metrics were collected and a survey was applied to the participants of the case study in order to collect feedback on, for example, the catalog's usability and effectiveness.

Reasons for the choice: identify the cases in which testing was postponed; identify lack of tests in some releases; identify test estimation errors; and identify inadequate allocations that could impact the quality of the tests performed. The actions for the payment of these Test Debts were already commonly used in GTF. In addition to the presentation of Test Debts and the review of test cases and estimates, the GTF team identified the need to negotiate more flexible deadlines with the client requesting the tests.

Changes in activities: the following activities were added to the testing process: (i) analyze the scope of the test to identify postponed test cases; (ii) compare different versions of the product to identify the existence of new features and verify whether there is evidence that these features were tested; (iii) analyze the reported test effort; and (iv) analyze delays in test releases to identify inadequate allocations (e.g., the tester did not have the necessary knowledge to perform those specific tests).

Actions: based on the prioritization of postponed Test Debts, the team should define which tests should be developed and executed during the test demand; elaborate and perform tests for features that were not tested; change the reported effort according to the analysis performed; make changes in team allocation according to identified needs; hold a Test Debt presentation meeting; review test cases and scenarios as well as test estimates; and negotiate with the client for more flexible deadlines that take into account adequate test schedules.

Collected metrics
The metrics collected were part of the GTF process and are based on articles available in the white literature Seela and Yackel (2017) and in the academic literature Lazic and Mastorakis (2008).
In addition to the metrics already collected in GTF, the metric Number of Test Debts was also collected to measure the number of Test Debts identified during the test demands. Table 16 presents the results of the metrics collected after the use of the catalog in each of the test demands of projects P1 and P2. With regard to Test Debts, Fig. 3 presents the number of Test Debts identified, repaid (which refers to carrying out activities to resolve or mitigate a known Test Debt), and not repaid during the releases of the P1 and P2 products.
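As an illustration of how such per-release figures could be tallied, the following minimal sketch tracks the quantities reported in Table 16 and Fig. 3. The `Release` class and its field names are hypothetical, introduced only for this example; they are not part of the GTF process.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Bookkeeping for one test demand (all names are illustrative)."""
    name: str
    identified: int = 0        # Test Debts found during the release
    repaid: int = 0            # debts resolved or mitigated in the release
    planned_hours: float = 0.0
    actual_hours: float = 0.0

    @property
    def not_repaid(self) -> int:
        """Debts left to be documented and paid in later releases."""
        return self.identified - self.repaid

    @property
    def schedule_variance(self) -> float:
        """Positive when more effort was spent than planned."""
        return self.actual_hours - self.planned_hours
```

For example, the second catalog use (41 debts identified, 7 repaid) would be recorded as `Release("P1 second use", identified=41, repaid=7)`, from which the 34 outstanding debts follow directly.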
Regarding analysis time, the first use of the catalog required the most time, taking two hours in total; the other uses took an average of 30 minutes.

Survey results
A survey was administered to the participants of the case study. This survey was divided into three parts: (i) collection of the participants' profiles; (ii) application of the SUS questionnaire to evaluate the usability of the catalog; and (iii) subjective questions about the impact of using the catalog on the GTF test demands. A pilot test was conducted to verify whether the questionnaire was adequate to be applied with the other GTF members. After this test, the survey was administered to the GTF members who participated in the case study.
The participants' profile. As stated in subsection 4.1.3, five GTF members participated in the case study; thus, five responses to the survey were obtained. As all of them work with software testing, the profile of the participants was defined as Testing practitioners. Table 17 presents the profile of the participants who responded to the survey.
SUS application. The SUS questionnaire Brooke et al. (1996) was used to evaluate the usability of the TestDCat. This questionnaire aims to measure the usability of a system from the perspective of its users. It is based on the ISO 9241-11 Standard (1998) and is commonly used to evaluate the usability of systems, products, and services.
The questionnaire was administered to the participants of the case study, and five answers were obtained. After computing the results of the questionnaire, a score of 82.5 was obtained.
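The SUS score follows the standard Brooke (1996) scheme: each odd-numbered (positively worded) item contributes its 1-5 answer minus 1, each even-numbered (negatively worded) item contributes 5 minus its answer, and the sum is multiplied by 2.5, yielding a 0-100 score per respondent; the reported value is the average over respondents. A minimal sketch of this calculation:

```python
def sus_score(responses):
    """SUS score (0-100) for one respondent.

    `responses` is a list of ten answers on a 1-5 Likert scale,
    in questionnaire order (item 1 first).
    """
    total = 0
    for item, answer in enumerate(responses, start=1):
        if item % 2 == 1:           # odd items are positively worded
            total += answer - 1
        else:                       # even items are negatively worded
            total += 5 - answer
    return total * 2.5              # scale 0-40 raw sum to 0-100


def mean_sus(all_responses):
    """Average SUS score over all respondents."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

For instance, a respondent answering 5 to every odd item and 1 to every even item yields `sus_score(...) == 100.0`; the 82.5 reported here is the mean over the five respondents.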
General questions about the catalog. To capture the GTF members' perspective on the usefulness of the TestDCat, some questions were included in the survey: (i) I believe that the use of TestDCat was useful for the management of Test Debts during the execution of test demands; (ii) I was able to achieve my objectives with the use of TestDCat for the management of Test Debts; and (iii) I believe that I achieved my objectives with the use of TestDCat in the best possible way. These questions used a 5-point Likert rating scale. Figure 4 presents the answers collected in the third part of the survey.

Discussion
This section discusses the results achieved with the collection of the metrics and the application of the survey. After that, the research questions are discussed and the threats to the validity of the case study are presented.

Collected metrics
Regarding the data collected, it was possible to observe that Test Debts were identified in all applications. In the first use, five Test Debts of the subtype Defects not found in tests were identified, and all of them were repaid during the release. It was also observed that the schedule variance metric showed that the actual effort was higher than the estimated effort, which may indicate that the use of the catalog impacted the effort undertaken in the release.
In the second use, the GTF team chose to manage the Test Debt subtypes Defects not found in tests, Lack of tests, and Test estimation errors. However, only the subtype Lack of tests was identified. Forty-one Test Debts related to Lack of tests were identified, and only seven were repaid during the release. Of the remaining debts, ten were not paid because they were related to a system functionality that was not prioritized; the rest were documented to be paid in later releases. In this release, it was possible to observe that the use of the catalog did not impact the schedule variance, even with the large number of Test Debts identified.
In the third use, the following Test Debt subtypes were chosen to be managed: (i) Test estimation errors; (ii) Inadequate allocation; and (iii) Inadequate equipment. Of the chosen subtypes, only one debt related to Test estimation errors was identified, and no Test Debts related to Inadequate allocation or Inadequate equipment were identified. However, debts of subtypes that were not the focus of this release were also identified: sixteen of the subtype Defects not found in tests and two of Lack of tests. Due to a lack of prioritization, only two of the identified debts were paid during the release. Nevertheless, the remaining debts were paid during the execution of the subsequent release. The fact that Test Debts outside the focus of the release were identified may indicate that the team has internalized the management of the debts that had been the focus of previous releases.
Regarding the effort made in the release, despite the large number of test cases performed, no significant difference was observed between the actual and planned effort, which may indicate that the catalog did not significantly increase the hours worked by the team.
The fourth use occurred in product P2, and the following subtypes of Test Debt were chosen: (i) Deferring tests; (ii) Lack of tests; (iii) Test estimation errors; and (iv) Inadequate allocation. However, no Test Debts of the subtypes that were the focus of this release were identified; instead, five Test Debts related to Defects not found in tests were identified. These debts were not paid at release time due to low prioritization, but were documented for payment in later releases, if appropriate.
This release again showed an increase in the schedule variance metric. However, it is believed that this was due to the large number of bugs identified during the execution of the tests, which required more time for documenting and reporting these bugs.
In the fifth use, the following subtypes were chosen: (i) Lack of tests; and (ii) Test estimation errors. However, Test Debts related to Defects not found in tests and Lack of tests were identified: five and four, respectively. Of the debts identified, six were paid in the release, and six were documented and paid in the subsequent release. The schedule variance metric showed no significant change.
With the completion of the five applications of the catalog, it can be observed that the team always identified Test Debts in the releases, but the debts identified were not always those chosen to be managed in the test demand scoping meeting. This is positive, since the team identifies Test Debts regardless of the type chosen to be managed in the release.
It can also be observed that the main subtypes occurring in products P1 and P2 were Defects not found in tests and Lack of tests. Thus, for these products, the GTF team should pay more attention to preventing the acquisition of these debts.

Survey
Regarding the usability of the system, Bangor et al. (2008) present a scale for situating and qualifying the usability of a system after application of the SUS questionnaire, listing usability adjectives ranging from the worst imaginable to the best imaginable, as well as Acceptable and Unacceptable ranges of scores.
After applying the SUS questionnaire, the score obtained was 82.5, which indicates, according to Bangor et al. (2008), that the usability of TestDCat is in the Acceptable margin of usability. Regarding the adjective presented in the scale, the TestDCat is in the margin between Good and Excellent.
Regarding the perspective of GTF members about the usefulness of the TestDCat, it was observed that all participants believe that the catalog was useful for the management of the Test Debts identified in the case study. Most also agree that, with the use of the catalog, they felt able to manage the Test Debts they chose and that the actions used for this management were adequate and sufficient.

Research questions
After analyzing the results collected during the case study, it is possible to answer the research questions defined in the case study planning.

RQ1. Does the use of the catalog assist in the management of Test Debts?
In order to answer this question, the survey results were analyzed concerning the users' perception of the catalog's usefulness, along with some of the metrics collected during the case study.
Regarding the participants' feedback, all answers were positive about the usefulness of the catalog. Moreover, by analyzing the defined metrics, it was observed that using the catalog helped the GTF team manage their Test Debts. A total of 79 Test Debts were identified in the five catalog applications, and only five were not paid within the catalog application period. Therefore, it is believed that the catalog assisted in the management of Test Debts.
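The repayment rate implied by these totals follows from simple arithmetic (the variable names below are illustrative):

```python
identified = 79                       # Test Debts found across the five uses
unpaid = 5                            # not repaid within the case-study period
repaid = identified - unpaid          # debts resolved or mitigated
repayment_rate = repaid / identified  # fraction repaid

print(f"{repaid} of {identified} Test Debts repaid ({repayment_rate:.1%})")
```

That is, 74 of the 79 identified debts (roughly 94%) were repaid while the catalog was in use.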

RQ2. How easy is TestDCat to use?
To answer this question, the SUS questionnaire was applied to assess the usability level of the TestDCat. Also, an open question was inserted in the survey to receive feedback from participants about the usability of the system.
When analyzing the score obtained with SUS, it was observed that the usability of TestDCat can be considered between good and excellent. However, when analyzing the answers to the open question of the survey, some participants commented that the first use was confusing; with continued use of the catalog, they were able to use it without significant problems.
Therefore, it is believed that the catalog has acceptable usability, but it would be interesting to improve it in order to shorten the learning curve required for its use.

Threats to validity
Regarding the threats to validity, we discuss threats related to Internal Validity and External Validity. According to Wohlin et al. (2012) and Runeson and Höst (2009), threats to Internal Validity are influences that can affect the independent variable concerning causality, and threats to External Validity, in turn, are conditions that limit the ability to generalize the results to industrial practice.
In our case, the main threat to Internal Validity is the selection of subjects, which was based on convenience sampling Wohlin et al. (2012). In the case of External Validity, the small number of participants is our main threat. Despite these limitations, it is worth noting that the participants were professionals with significant industrial experience in software testing activities, including the testing of mobile and web software.

Related work
Regarding Test Debt, Samartyam et al. (2017) present an overview of TDs, the factors that contribute to this type of debt, and strategies for the repayment of acquired Test Debts. These authors also classify Test Debts into: (i) unit testing; (ii) exploratory testing; (iii) manual testing; and (iv) automated testing. For each type of test, they present possible factors that may generate debts. Aiming to support the repayment of TDs, the authors propose a process with three macro activities: (i) quantify the Test Debt, obtain the permission of upper management, and execute the repayment; (ii) repay debts periodically; and (iii) prevent Test Debts from accumulating. They also present strategies for the repayment of Test Debts that involve applying good practices in test automation and in the performance of testing activities. These authors also describe two industrial case studies and report their experiences with Test Debts. However, they do not propose an integration of these steps with an existing testing process in the organization, nor do they detail the two case studies and the impacts of repaying their Test Debts.
Regarding Test Debt management in a testing process, the work of Sousa (2016) aims to identify, within the software testing process, problems that could be considered Technical Debt. Sousa presents a set of 22 TDs related to the software testing process collected from the technical literature, together with their possible causes and indicators. To evaluate these TDs, a survey was conducted with test professionals, who answered whether they agreed with each TD, its causes, and its indicators. Furthermore, Sousa presents a map of the Technical Debts identified in the literature to support professionals in managing TDs that may occur during the execution of the testing process. This map was evaluated by applying a questionnaire to software testing professionals. Sousa, however, does not address other important TD management activities (e.g., measurement, prioritization, communication). In addition, the map was not applied in software organizations; thus, its practical support in real projects was not confirmed.
Other studies have investigated the impact of testing types or techniques in generating TDs. For example, Shah et al. (2014) performed a systematic review to answer the following questions: (i) "Does exploratory testing represent an archetypal example of Technical Debt inducing practice?" and (ii) "Shall it be repaid later in the application life cycle?". In this review, the authors present how exploratory testing influences test activities and the related Technical Debts. They conclude that: (i) the lack of defined test cases makes it difficult to perform regression tests and may cause residual defects; (ii) high human dependence, the lack of results evaluation, and the lack of test planning may cause residual defects; and (iii) lack of documentation may lead to a poor understanding of the functionalities, generating rework and causing incorrect effort planning. Therefore, the review by Shah et al. provides an overview of Test Debt concerning exploratory tests and presents how debts can be repaid or prevented. However, they do not cover other TD management activities.
Wiklund et al. (2012) present studies that indicate a general trend of high Technical Debt in many test execution automation implementations, leading, for example, to problems in the use, extension, and maintenance of systems. In addition to presenting test process improvement models and how they address the testing infrastructure, the authors report the results of a case study investigating potential sources of Technical Debt in test execution automation systems. They focused on automated testing and contributed observations specific to the area, such as the effects caused by the sharing of tools and the instability of test execution across different environments. Therefore, there was no concern with the identification of Test Debts in other types of software tests. Also, the work does not detail ways to manage those debts.
Table 18 presents a comparison of the aforementioned related work, regarding the following characteristics: (i) Comprehensive approach, which indicates that the study involves comprehensive strategies to manage Test Debts in the software industry considering the testing types and test process activities; (ii) Target, which refers to the Test Debts target of the study; (iii) Test Debt management in a testing process, which denotes that the work proposes Technical Debt management activities within test process (e.g., changes in a test process in order to manage the Test Debt); and (iv) Conduction of a case study, which indicates whether the work conducted a case study in which some Test Debt management activity, approach or strategy has been applied.
None of the works presents a comprehensive approach. They perform only a few management activities, mainly identification and repayment, and do not address the other management activities that may be useful for proper Test Debt management (e.g., measurement and prioritization). Besides that, the Test Debt target varies from test types Samartyam et al. (2017); Shah et al. (2014) and the test process Sousa (2016) to how testing is performed Wiklund et al. (2012).
With regard to the characteristic Test Debt management in a testing process, only the work of Sousa (2016) proposes the management of Test Debts in an integrated manner with the testing process. Regarding the conduction of a case study, only the work of Samartyam et al. (2017) presents case studies applying the approaches proposed for Test Debt management. Therefore, we conclude that none of the studies covers all the characteristics presented, or they implement them only partially, unlike our work, which proposes TestDCat 3.0, a comprehensive catalog to manage the Technical Debts related to both test subtypes and the test process. It proposes management activities based on the test process presented in ISO/IEC 29119, and its effectiveness was measured through a case study performed with industry products.

Conclusion
Test Debts have a high impact on software quality. Therefore, they require the use of management activities to keep them visible and under control. However, most studies in the literature dealing with this type of Technical Debt concentrate on specific subtypes of Test Debt (e.g., automated tests, exploratory tests), and few studies present an overview of the subtypes and ways of managing them. In addition, few articles focus on non-code-related Test Debts. Finally, despite the growing number of approaches that deal with Test Debts, there are few industrial case studies using these approaches, which makes it difficult to understand the real impact and cost of these management approaches.
Aiming to address these gaps and support practitioners in the management of Test Debts, a catalog, called TestDCat, was created. The TestDCat is based on the results of an empirical study, on a review of the literature, and on the information gathered from semi-structured interviews conducted with professionals from industry.
TestDCat presents an overview of management activities, subtypes of Test Debts, and actions to assist in management activities. As an initial assessment, the catalog was presented to the participants of the first series of semi-structured interviews. They answered a survey regarding clarity, ease of use, and completeness. In the second evaluation, a focus group was organized to analyze in detail the actions of the catalog and to suggest changes and improvements. The results of these two evaluations were used to improve the evaluated version of the catalog and generate its latest version.
From the latest version (3.0) of the TestDCat, a case study was conducted in the GTF. It aimed to evaluate the application of TestDCat to real products. Five applications of the catalog were performed in two software products, and the collected results were analyzed. Two research questions were defined for the case study: (i) RQ1. Does the use of the catalog assist in the management of Test Debts?; and (ii) RQ2. How easy is TestDCat to use?. The results obtained from the case study presented evidence that the information organized in the catalog can support the management of Test Debts and that the catalog has good usability. Thus, it may help development and testing teams to monitor current debts and to take the necessary actions.
From the results of this work, the main future research directions are as follows:
- Improve the catalog by adding more specifics for each proposed action (e.g., tools and steps taken to perform the action). These improvements can be identified when practitioners/researchers use the actions in real situations in their projects and document all activities, artifacts, and tools used to carry out each action.
- In order to make the catalog more comprehensive, usable in several organizations, and able to capture new Test Debts, expand the literature review carried out with a systematic review of Test Debts and their management.
- There are different types and subtypes of Technical Debt, and some of them are related in some way (e.g., through dependency). Thus, it is important to analyze how Test Debt subtypes relate to each other and how Test Debts can impact other types of Technical Debt.
- In the current version, the use of the catalog is entirely manual; research can be conducted on ways to automate some actions proposed in the catalog in order to facilitate its use.
- Regarding the case study, it is important to apply the TestDCat in other organizations that perform testing activities. Also, metrics should be studied to compare demands that use and that do not use TestDCat.
With the realization of these future works, the catalog would become more complete, with more specific practices and tools to assist practitioners in managing their Test Debts.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Bruno Sabóia Aragão received his M.Sc. in Computer Science from the Federal University of Ceará (UFC) and a Bachelor's degree in Computer Science from the State University of Ceará (UECE). He is currently a project manager at GREat (Software and Systems Engineering Network Group) at UFC. He has experience in projects using Scrum, software testing, web systems development, and mobile applications. He is also a Certified ScrumMaster (Scrum Alliance).