Dance Dance Gradation: A Generation of Fine-Tuned Dance Charts

This paper proposes a system to automatically generate dance charts with fine-tuned difficulty levels: Dance Dance Gradation (DDG). Using a dataset of dance charts at different difficulty levels as training data, the system learns the relationships between difficult and easy charts with a deep neural network. Through the learned model, a difficult chart is automatically adapted into easier charts. By mixing multiple difficulty levels in the training data, the generated charts take on the characteristics of each difficulty level, so the user can obtain charts with an intermediate difficulty level between two different levels. Through an objective evaluation and a discussion of the output results, it was suggested that the proposed system generates charts reflecting the characteristics of each difficulty level in the training dataset.


Introduction
Video games, of which there are many genres, are a popular form of entertainment around the world, and rhythm-based video games are one of the popular genres. In most rhythm-based video games, players perform actions corresponding to a displayed chart. Dance games, such as Dance Dance Revolution, are typical and popular rhythm-based video games worldwide. Playing dance games has attracted attention not only as entertainment but also as fitness conditioning.
Rhythm-based video games attract a wide range of players, from beginners to experts, so multiple charts at different difficulty levels are prepared for the same song. In general, game creators compose these charts manually. Because the difficulty levels of the charts are discrete, there is sometimes too large a gap between adjacent levels. Some players, especially mid-level players, are not satisfied because no chart at the appropriate difficulty level is available. For example, if the easiest chart is too easy but the second-easiest chart is too difficult, the player might want an intermediate difficulty level between the two. To satisfy such requests, game creators would have to compose an indefinitely large number of charts, which is impossible by hand. We believe that the automatic generation of fine-tuned charts is in demand.

Dance Dance Convolution (DDC) [2] is a system that composes dance charts automatically from audio tracks. In DDC, the large task of generating dance charts is divided into two subtasks: step placement and step selection. DDC generates dance charts based on the relationships between acoustic features and the charts. The user can specify the difficulty level of the generated charts on a five-level scale because the DDC model takes a one-hot difficulty vector as input. However, finer tuning of difficulty levels is not supported in DDC. Moreover, it has been reported that the quality of the easier charts generated by DDC is insufficient.
This paper proposes Dance Dance Gradation (DDG), a fine-tuning system for the difficulty levels of dance charts. For the step-placement task, DDG fine-tunes the difficulty levels of the generated charts by blending training data from different difficulty levels. Fig. 1 shows the concepts of DDC and DDG. Machine learning acquires a model of the relation between the inputs and outputs in the training dataset; that is, the output of the learned model should have the averaged characteristics of the training dataset. By using multiple difficulty levels as the training dataset, the generated chart should have an averaged difficulty level based on the mixing ratio of the difficulty levels. We then propose a method to determine the threshold for onset detection considering the mixing ratio.

Fig. 2. The concept of the beat layer. Each note belongs to its lowest layer. For example, the last note in this figure is regarded as a step in the 4th layer, though it also belongs to the 8th and 16th layers.

Definition
Some definitions key to this paper are as follows. The set of timings obtained by dividing a bar into n equal parts (n ≥ 4) is defined as the "nth beat layer." Fig. 2 shows the idea of the beat layer and the corresponding musical score. Let the lth layer be the lowest layer among the layers containing the timing of a given step; the step then belongs to the lth layer and is called an "lth step" or "a step in the lth layer." Here, "lowest" means the least n. The "4th" and "8th" labels above the notes in Fig. 2 show the beat layer of each note. This expression does not strictly correspond to a quarter note but is common in rhythm-based video games; this custom might arise from the fact that note length is not taken into consideration when playing rhythm-based games.
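The lowest-layer rule can be sketched in code as follows. This is a simplified illustration that considers only the power-of-two layers (4th, 8th, 16th, ...) appearing in Fig. 2, not triplet layers such as the 12th; the function name and interface are ours, not part of the proposed system.

```python
from fractions import Fraction

def beat_layer(position_in_bar: Fraction, max_layer: int = 64):
    """Return the lowest beat layer (least n) that a step belongs to.

    position_in_bar: offset of the step within its bar, as a fraction
    of the bar length (0 <= position < 1).
    """
    n = 4
    while n <= max_layer:
        # A step belongs to the nth layer if it falls exactly on one of
        # the n equal divisions of the bar.
        if (position_in_bar * n).denominator == 1:
            return n
        n *= 2
    return None  # finer than any layer considered here

# The last note in Fig. 2 lies 3/4 through the bar: its lowest layer is
# the 4th, although it also lies on the 8th and 16th grids.
beat_layer(Fraction(3, 4))   # -> 4
beat_layer(Fraction(1, 8))   # -> 8
beat_layer(Fraction(3, 16))  # -> 16
```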
If some steps belong to higher layers, the sequence of steps is more difficult and complex. Conversely, a sequence of steps belonging to lower layers is easier to perceive. Difficult charts contain steps belonging to higher layers, whereas easy charts consist of steps belonging to lower layers.
In this paper, we used the ITG dataset, which was also used in the existing work [2]. Table 1 shows the statistics of the dataset. The dataset contains multiple charts at different difficulty levels for a single audio track. The charts are named "Beginner (B)," "Easy (E)," "Medium (M)," "Hard (H)," and "Challenge (C)" in ascending order of difficulty.

Contribution
This paper offers the following contributions:
- We propose a system for constructing a dataset from data with different characteristics for supervised learning. Trained with this dataset, the learned model generates new data reflecting the strong points of each characteristic.
- We set a metric to evaluate how well the output reflects each characteristic in a balanced way. In this paper, we use this metric to determine the threshold for binarizing the step-placement probability.

Related Work
Procedural content generation (PCG) is a field of research on generating game content automatically [7]. In PCG research, game content is automatically generated depending on the player's skill and behavior. Hastings et al. proposed Galactic Arms Race, in which the character's weapon automatically evolves based on the player's behavior logs and preferences [3]. Pedersen et al. modeled players' behaviors and proposed a method to generate suitable maps for Super Mario Bros. [6]. This study can be categorized as PCG research using a machine learning method.

Dynamic Difficulty Adjustment is a field of research on dynamically changing the difficulty levels of games based on the skills and actions of the players [4]. Andrade et al. addressed the problem of adjusting the actions of a computer-controlled opponent in real-time games with reinforcement learning [1]. In this paper, we set the training dataset manually. If the training data were set based on the players' skills, the difficulty levels could be adjusted dynamically and automatically. How to model player skills and how to apply such a model is future work toward Dynamic Difficulty Adjustment in rhythm-based games.
This paper can be categorized not only as entertainment computing but also as music information retrieval research. In the field of music information retrieval, adjusting the difficulty level for musical instruments has long been a popular topic. For instance, Yazawa et al. proposed a method to generate guitar tablature adapted to the player's level from an audio track [8], and Nakamura and Sagayama proposed a method to reduce piano scores with a merged-output hidden Markov model [5]. Although such reduction methods can be powerful for particular instruments, different rules and features must be prepared for each instrument. On the other hand, a machine learning method based on difficult and easy charts can be applied to other instruments if a large number of corresponding easy and difficult charts can be collected. For actual musical instruments, collecting many scores at different difficulty levels for the same song is a hard task. However, in the field of rhythm-based games, which is the target of this paper, we can easily obtain an adequate number of charts at different difficulty levels for the same song. The proposed system exploits this specific characteristic of rhythm-based games.

The Proposed System
This paper proposes Dance Dance Gradation (DDG). DDG is a system that learns fine-tuned difficulty levels of dance charts for the step-placement task from a training dataset containing multiple difficulty levels. If only charts at a single difficulty level are used as the training data, the model should generate charts at that level. On the other hand, if charts at multiple difficulty levels are used as the training data, the model should generate charts moderately reflecting the characteristics of each difficulty level stored in the dataset. For example, if both Easy and Medium charts are used as the training data at 50% each, the generated charts would be harder than Easy but easier than Medium. That is, DDG gradationally tunes the difficulty level of dance charts by arranging the training dataset.
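One way to realize such a blended training set is sketched below. This is our own illustration, not the paper's implementation: it assumes the usage rate for a level can be treated as the fraction of that level's charts sampled into the training set, and the `charts_by_level` structure is hypothetical.

```python
import random

DIFFICULTIES = ["Beginner", "Easy", "Medium", "Hard", "Challenge"]

def blend_training_set(charts_by_level, dr, seed=0):
    """Blend charts from several difficulty levels according to DR.

    charts_by_level: dict mapping a difficulty name to its list of charts
    dr: usage rate per level; e.g. (1, 0, 0.2, 0, 0) keeps every
        Beginner chart and a random 20% of the Medium charts.
    """
    rng = random.Random(seed)
    blended = []
    for level, rate in zip(DIFFICULTIES, dr):
        charts = charts_by_level.get(level, [])
        k = round(rate * len(charts))   # number of charts to keep
        blended.extend(rng.sample(charts, k))
    return blended

# e.g. 10 charts per level with DR = (1, 0, 0.2, 0, 0):
pool = {name: [f"{name}-{i}" for i in range(10)] for name in DIFFICULTIES}
subset = blend_training_set(pool, (1, 0, 0.2, 0, 0))
# -> all 10 Beginner charts plus 2 Medium charts, 12 charts in total
```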
Note that the full chart is generated through both the step-placement and step-selection tasks; we use the same step-selection model as DDC with no modification. The details of the step-selection model can be found in the existing paper [2].

Constructing dataset for training characteristic of each difficulty level
In dance games, the characteristics of charts differ depending on the difficulty level. Through an objective analysis of the ITG dataset, we confirmed that the following score features related to the step-placement task differ depending on the difficulty level:

Feature 1: Frequency of steps
The more steps a chart has, the higher its difficulty level is; Beginner charts have approximately 0.6 steps per second on average, while Challenge charts have approximately 4.9 steps per second.

Feature 2: Rhythm complexity
The rhythm in easy charts tends to be simpler than in difficult charts; over 95% of the steps in Beginner and Easy charts lie on the 4th layer, whereas 33% of the steps in Challenge charts belong to the 8th layer and 22% to the 16th layer, to which steps in easier charts hardly belong.
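These two features can be computed directly from a chart's step timings and beat layers; the following sketch assumes the steps are already annotated with their layers.

```python
from collections import Counter

def score_features(step_times, step_layers, track_seconds):
    """Compute the two chart statistics discussed above.

    step_times: onset time of each step in seconds
    step_layers: beat layer (4, 8, 16, ...) of each step
    """
    steps_per_second = len(step_times) / track_seconds   # Feature 1
    counts = Counter(step_layers)                        # Feature 2
    layer_ratio = {n: c / len(step_layers) for n, c in counts.items()}
    return steps_per_second, layer_ratio

# A 10-second excerpt with 6 steps, mostly on the 4th layer:
sps, ratio = score_features(
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [4, 4, 4, 4, 8, 16], 10.0)
# sps = 0.6 steps/s (a Beginner-like density); ratio[4] = 4/6
```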

Model
We use an LSTM model to adapt the difficulty level of charts, modifying the model for step placement in DDC [2]. Fig. 3 compares the models for step placement in DDC and in this paper. In DDC, acoustic features obtained from the audio track with a CNN and a one-hot difficulty vector are the input to an LSTM, which estimates the probability that a step is placed in the chart. The proposed model instead uses the score features of the difficult chart at each time as the input; three score features based on the features mentioned in Section 3.1 are used. The output layer is a sigmoid function, so the output is a continuous value in the range (0, 1). The target at each time is expressed in binary: whether a step exists at that time or not. The output value represents the probability SP(t) that a step is placed at time t. Estimating SP(t) for all t in a given audio track yields the sequence of step probabilities SP.
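To make the input/output shape of this model concrete, here is a toy, randomly initialized single-layer LSTM with a sigmoid output unit: a score-feature vector goes in at each time step, and SP(t) comes out. This is a sketch of the computation only; layer sizes, weights, and the training procedure of the actual model are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StepPlacementLSTM:
    """Toy single-layer LSTM with a sigmoid output unit (random weights)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # one stacked weight matrix for the four LSTM gates
        self.W = rng.normal(0, 0.1, (4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.normal(0, 0.1, n_hidden)
        self.n_hidden = n_hidden

    def forward(self, xs):
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        out = []
        for x in xs:  # one time step per chart frame
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)          # input/forget/cell/output gates
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            out.append(sigmoid(self.w_out @ h))  # SP(t) in (0, 1)
        return np.array(out)

model = StepPlacementLSTM(n_features=3, n_hidden=8)
sp = model.forward(np.random.default_rng(1).normal(size=(100, 3)))
# sp is a length-100 sequence of step probabilities, each in (0, 1)
```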

Setting the threshold for step placement, especially for blended difficulty levels
The set of step-placement timings can be obtained from the sequence of step probabilities SP described in Section 3.2. The number of steps strongly influences the characteristics of a chart, as Feature 1 in Section 3.1 shows. Accordingly, the threshold for step placement should be appropriately determined.
The proposed system learns multiple difficulty levels, so the output charts are expected to have the characteristics of all of the learned difficulty levels. The characteristics of each difficulty level should be reflected in the output chart according to the usage rate of each difficulty level, DR. For example, if DR = (1, 1, 0, 0, 0), we expect the output charts to have the characteristics of Beginner and Easy fifty-fifty; if DR = (1, 0, 0.2, 0, 0), the output charts should be mainly Beginner-like with a little flavor of Medium. Based on this idea, we evaluate:

Criterion 1: How well the output charts have the characteristics of each difficulty level used as the training data.
Criterion 2: How similar the balance of Criterion 1 is to DR.

As Criterion 1, we use the F-score, which can be calculated by comparing the ground truth with the output charts. As Criterion 2, we use the harmonic mean of the F-scores for each difficulty level, weighted with DR. For all tracks in the validation data, the proposed system generates charts while changing the threshold, and the final threshold is determined as the value with the best harmonic mean of F-scores. The threshold is determined by the following procedure:

1. Obtain SP for all tracks in the validation data.
2. Select all local maximum values of SP as the set of local maxima.
3. Sort the set of local maxima in descending order.
4. For n = 1 to (size of the set of local maxima):
   i. Use the nth local maximum as threshold_n and detect the local maxima greater than threshold_n.
   ii. Compare the detected local maxima with the correct step placements at each difficulty level and calculate {F}_n = {F-score_B, F-score_E, F-score_M, F-score_H, F-score_C}.
   iii. Calculate the weighted harmonic mean of {F}_n as HM, where the weights equal DR.
5. The threshold_n with the highest HM is chosen as the threshold.
For example, when learning a model for Beginner and Easy with DR = (1, 1, 0, 0, 0), let {F}_a = {0.5, 0.4, NaN, NaN, NaN}. The NaN entries are ignored because their weights in DR are 0, so HM = 2 / (1/0.5 + 1/0.4) ≈ 0.44.
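The procedure above can be sketched as follows. This is a simplified, single-track version in which frame indices stand in for timings and a set-based F-score is used, with no timing tolerance; levels absent from the training data yield NaN and are skipped via their zero DR weights.

```python
def f_score(pred, truth):
    """Set-based F-score between predicted and ground-truth step frames."""
    if not pred or not truth:
        return float("nan")  # level absent from the training data
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def weighted_harmonic_mean(fs, weights):
    """Harmonic mean of F-scores weighted by DR; zero-weight levels are ignored."""
    if any(f == 0 for f, w in zip(fs, weights) if w > 0):
        return 0.0
    num = sum(w for w in weights if w > 0)
    den = sum(w / f for f, w in zip(fs, weights) if w > 0)
    return num / den

def choose_threshold(sp, truths_by_level, dr):
    """Steps 1-5: pick the threshold maximizing the DR-weighted HM."""
    # steps 1-3: local maxima of SP, candidate thresholds in descending order
    peaks = [t for t in range(1, len(sp) - 1) if sp[t - 1] < sp[t] >= sp[t + 1]]
    candidates = sorted((sp[t] for t in peaks), reverse=True)
    # steps 4-5: evaluate each candidate threshold and keep the best
    best_hm, best_thr = -1.0, None
    for thr in candidates:
        detected = {t for t in peaks if sp[t] >= thr}
        fs = [f_score(detected, truth) for truth in truths_by_level]
        hm = weighted_harmonic_mean(fs, dr)
        if hm > best_hm:
            best_hm, best_thr = hm, thr
    return best_thr

# Toy validation track: SP peaks at frames 1, 3, 5 with values 0.9, 0.6, 0.3;
# Beginner and Easy ground truths both place steps at frames 1 and 3.
sp = [0.1, 0.9, 0.1, 0.6, 0.1, 0.3, 0.1]
truths = [{1, 3}, {1, 3}, set(), set(), set()]
choose_threshold(sp, truths, (1, 1, 0, 0, 0))  # -> 0.6
```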

Experiments
We trained models with eight patterns of DR to verify the effectiveness of DDG. Table 2 lists the model names and the DR used for training. All models used Challenge charts as the input. None of the eight models used Hard or Challenge charts as the output; the 4th and 5th elements of DR are 0. The ITG dataset contains 120 tracks with five kinds of charts and 13 tracks with four kinds of charts. We randomly divided the tracks into 80% (107 tracks) for training, 10% (13 tracks) for validation, and 10% (13 tracks) for testing.

Training Methodology
The training methodology was the same as Donahue et al. [2] except for the termination condition. The target at each frame was the ground-truth value. We calculated the updates using backpropagation through time with 100 steps of unrolling. Binary cross-entropy was minimized using stochastic gradient descent. All models were trained with batches of size 256. We applied 50% dropout following each LSTM layer (only from input to output, not in the temporal direction) and each fully connected layer. All examples before the first step in the output chart were excluded from training.

Fig. 5 shows the charts for the track Queen of Light generated by six models. The B100 model placed all steps at the beginning of the bar, and the E100 model generated a sequence of 4th steps. The B100E100 model, which learned both Beginner and Easy charts, generated a sequence of 4th steps with some quarter rests; the chart generated by B100E100 appeared more difficult than the chart generated by B100 but easier than the one generated by E100. The three models trained with Medium charts generated charts including some 8th steps (circled with dashed lines). The frequency of 8th steps increased in the order of the B100E100M100, E100M100, and M100 models, suggesting that the more Medium charts influenced the learning, the harder the generated chart became.

Fig. 6 shows the charts for the track Lemmings on the Run generated by the B100, B100M50, B100M100, and M100 models. The B100 model places all steps at the beginning of the bar, as in Fig. 5. The more the rate of Medium charts is increased, the more steps are placed. The B100M50 and B100M100 models did not place any 8th steps, though the M100 model placed several. This suggests that the B100M50 and B100M100 models learned the characteristics of both Beginner and Medium charts and learned not to place the 8th steps seen only in the Medium charts.

F-score
We calculated precision, recall, and F-score over all tracks in the test data to evaluate the effectiveness of each model. The charts at the difficulty levels used in training served as the ground truth for these metrics. For example, for the B100E100 model, which used Beginner and Easy charts for training, we calculated the metrics against the Beginner and Easy charts.
A model that learned multiple difficulty levels is expected to generate intermediate-level charts. This means that a generated chart should cover all steps placed in the easier ground truth and also have extra steps; in other words, recall is more important than precision against the easier ground truth. On the other hand, the chart does not have to cover all steps in the harder ground truth but should not have extra steps; against the harder ground truth, precision is more important than recall. Table 3 shows the results of the proposed models. The results suggest that the B100M50, B100M100, B100E100M100, and E100M100 models satisfied these requirements; those models appear to tune the difficulty level. From these results, we confirmed that DDG can generate charts with a fine-tuned difficulty level. However, the B100E100 model achieved high recall for both Beginner and Easy charts but low precision; this model seems to have placed more steps than expected because its threshold was set too low.
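The precision/recall expectation above can be illustrated with hypothetical step-frame sets; the frame numbers below are made up for illustration, not taken from the dataset.

```python
def precision_recall(pred, truth):
    """Precision and recall between predicted and ground-truth step frames."""
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# A generated chart sitting between Beginner and Easy (hypothetical frames):
beginner  = {0, 16, 32, 48}                 # sparse 4th-layer steps
easy      = {0, 8, 16, 24, 32, 40, 48, 56}  # denser superset
generated = {0, 8, 16, 32, 48}              # intermediate chart

p_b, r_b = precision_recall(generated, beginner)
p_e, r_e = precision_recall(generated, easy)
# r_b = 1.0: every Beginner step is covered (high recall vs. the easier truth)
# p_e = 1.0: no step falls outside Easy (high precision vs. the harder truth)
```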

Conclusion
In this paper, we presented the new task of fine-tuning the difficulty level for rhythm-based video games. To tackle this task, we proposed Dance Dance Gradation (DDG), a system that trains a model with multiple difficulty levels. For this system, we proposed a method to determine the threshold that nicely blends the characteristics of each learned difficulty level. Through the experiments, it was confirmed that DDG generates appropriate charts at an averaged difficulty level reflecting the mixing ratio of the learned difficulty levels.
Our future work is to use acoustic features as input, combined with the features of difficult charts. These features might help generate more entertaining charts synchronized with the track.

Acknowledgement
This paper was supported in part by JSPS Grant-in-Aid for Young Scientists (B) #16K21482.