Digital Turbine's Multi-Testing mechanism lets publishers try out different variants within the same placement to find the configuration that yields the best returns. Once you have analyzed the results of a test, you can increase or decrease the percentage of traffic a variant receives, or deprecate it altogether.
The main benefit of Multi-Testing is that you can test a placement configuration on a subset of your users and understand the effect of the change before rolling it out to all of your traffic.
What Can Be Tested?
Using Multi-Testing, you can test:
- New networks: adding a new network instance to your waterfall and measuring its impact
- Auto vs. Fixed CPM: comparing a waterfall of Auto CPM instances against the same waterfall with Fixed CPM instances
- Bidding mediation: adding bidding mediated instances vs. traditional mediation instances
- Price floors: sending different price floors to DT Exchange
- Ad amount: how the number of ads each user is receiving impacts performance
- Ad frequency: how the frequency of ads each user is receiving impacts performance
- Geographic targeting: comparing different country targeting options
- Banner refresh rate: testing different refresh rates, starting at 10 seconds
The key components of a Multi-Testing experiment include:
|Experiment||A test that runs by showing different configurations to separate user groups|
|Variant||A user group that is exposed to one of the experiment configurations. The variants define how many groups there are and how users are distributed among them. One variant is always the control group|
|Frequency||The percentage of traffic to be allocated to each variant, at user level|
|Control Variant||A control variant or control group receives the current or default configurations of your placement. In other words, the control variant does not receive any changes and is used as a benchmark against which other test results are measured|
|Goal||The metric that you want to improve with the experiment. An experiment compares the goal metric across the variant groups so that you can see which configurations had the most desirable effect|
For an accurate split of traffic, make sure you are using DT FairBid SDK 3.1.0 or above.
Step 1: Pre-Planning
Ask the following questions regarding the purpose of the test:
What do you want to test?
Example: introducing another network to my waterfall for a rewarded placement in my app, called "Rewarded iOS on launch"
What is the experiment goal?
In other words, which metric are you optimising? Example: I'm optimising ARPDEU, the average revenue per daily engaged user (an engaged user is a user who had at least 1 impression on that day). I want to make the most revenue from each unique impression
How long should the experiment run?
Example: My placement has 20K impressions and 10K unique impressions per day. This means it would take about a week to get sufficient data for effective results.
What is the “actionable” threshold, below which you keep the control configuration and above which you adopt the test variant's configuration?
Example: If the test variant outperforms the control variant by at least 5%, I will use the test variant settings for my entire placement
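The decision rule from the last example can be written down before the test starts, so that the outcome is not judged after the fact. The following is a minimal sketch, not part of any DT tooling: the function name `pick_configuration` and its parameters are hypothetical, and it assumes the goal metric is ARPDEU and that "better by 5%" means a relative lift over the control.

```python
def pick_configuration(control_arpdeu, test_arpdeu, threshold=0.05):
    """Return "test" if the test variant beats the control by at least
    `threshold` (relative lift), otherwise "control".

    Hypothetical helper: the 5% default mirrors the example threshold
    above and is an assumption, not a DT recommendation.
    """
    if control_arpdeu <= 0:
        raise ValueError("control ARPDEU must be positive")
    # Relative lift of the test variant over the control variant.
    lift = (test_arpdeu - control_arpdeu) / control_arpdeu
    return "test" if lift >= threshold else "control"


if __name__ == "__main__":
    # A 10% lift clears the 5% threshold; a 2% lift does not.
    print(pick_configuration(0.050, 0.055))
    print(pick_configuration(0.050, 0.051))
```

A rule like this only makes sense once the test has run for its recommended duration on a large enough sample.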
Step 2: Configuring a Multi-Testing Experiment
- In the DT Console, go to the Placement Setup page
- Move the toggle to start the test
The only item on the list of variants is the current configuration of your placement; its name is derived from the name of your placement. This variant uses your existing placement settings and should be used as the control group.
Edit the name to make it more descriptive (you can always rename later). We recommend the following structure:
- The word: Control
- Experiment name
- Traffic Allocation percentage
- Experiment Start Date
Example: "Control Adding Verizon 80% 06 30 23"
- Using the control variant, you can either:
- Duplicate your existing variant and edit it (recommended); or
- Create a new variant from scratch, by clicking Add Variant
- Give the duplicate variant a descriptive name that follows the same structure:
- The word: Test or Treatment
- Experiment name
- Traffic allocation percentage (the allocations of all variants must total 100%)
- Experiment Start Date
Example: "Test Adding Verizon 20% 06 30 23"
- An estimate of the recommended test duration is provided. The test must run long enough to gather statistically significant data and produce meaningful results.
The recommended number of days depends on the placement's ARPDEU and the traffic allocation you set. The recommended duration is meant to ensure, at a 90% confidence level, that the results of the multi-test are reliable. Take into account any days with unusual user behavior, such as a holiday or special sales day, that may fall within the multi-test period and affect your results. We recommend adding one extra day to the recommended number of days.
- Click on Variant Setup to edit the configurations to the ones you wish to test.
To test whether adding a network instance helps to increase your Average Revenue Per Engaged User, add a mediated network instance to your waterfall.
- The experiment is set, and it will start generating data.
To create an A/B/C test, follow the same steps and add an additional variant, making sure that the traffic allocations of all variants total 100%.
After an experiment has started, we advise against changing variant configurations, because doing so can invalidate the test results.
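The naming structure and the 100% allocation rule described in this step are easy to check with a small script before you type the values into the console. This is only an illustrative sketch: `variant_name` and `check_allocations` are hypothetical helpers, not part of the DT Console or any DT API.

```python
from datetime import date


def variant_name(role, experiment, allocation_pct, start):
    """Build a name such as "Control Adding Verizon 80% 06 30 23".

    Follows the recommended structure: role, experiment name,
    traffic allocation percentage, start date (MM DD YY).
    """
    return f"{role} {experiment} {allocation_pct}% {start:%m %d %y}"


def check_allocations(allocations):
    """Raise ValueError unless the variant allocations total 100%."""
    total = sum(allocations.values())
    if total != 100:
        raise ValueError(f"allocations total {total}%, expected 100%")


if __name__ == "__main__":
    start = date(2023, 6, 30)
    allocations = {"Control": 80, "Test": 20}
    check_allocations(allocations)
    for role, pct in allocations.items():
        print(variant_name(role, "Adding Verizon", pct, start))
```

For an A/B/C test, add a third entry to `allocations`; the same check applies.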
Ending a Multi-Testing Experiment
- Choose which variant configurations you want to persist as the placement configuration
- Then click the Multi-Testing toggle to switch the status to off for the variant that you want to stop
Multi-Testing results are available in DT's Dynamic Reports.
- Go to Dynamic Reports
- Select App Performance
- Filter by: Publisher > App > Placement
- Split the data using the Variant Name as a dimension
- Add further dimensions relevant to your testing, such as Publisher Name, App Name, and Placement Name
- Add the metrics that you want to measure in your test, for example Avg. Rev. per Engaged User and Fill Rate
- Compare your variants to see which one performs better
|Avg. Rev. per Engaged User||Average Revenue (publisher payout) Per Daily Engaged User; Engaged Users are users that saw at least 1 ad from the particular ad placement being analyzed|
|Publisher Revenue||This metric depends on the allocation of the impressions. To perform a comparison, it should be normalized|
|Fill Rate||The number of ad requests filled by an ad network divided by the total number of ad requests, expressed as a percentage|
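As a rough illustration of how these metrics relate to raw counts, the sketch below computes ARPDEU and fill rate from exported figures. The function and parameter names are hypothetical, and the formulas are assumptions based on the definitions above (payout divided by daily engaged users for a single day, and filled requests divided by total requests); Dynamic Reports calculates these metrics for you.

```python
def arpdeu(publisher_payout, daily_engaged_users):
    """Average revenue per daily engaged user for one day (assumed formula)."""
    if daily_engaged_users <= 0:
        raise ValueError("daily_engaged_users must be positive")
    return publisher_payout / daily_engaged_users


def fill_rate(filled_requests, total_requests):
    """Fill rate as a percentage of ad requests that were filled (assumed formula)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * filled_requests / total_requests


if __name__ == "__main__":
    # Illustrative numbers only, not real report data.
    print(arpdeu(publisher_payout=50.0, daily_engaged_users=1000))
    print(fill_rate(filled_requests=750, total_requests=1000))
```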
To normalize the Publisher Payout:
|Variant||Avg eCPM||Publisher Payout|
|Test Adding Verizon 20% 06 30 20||1.5||1000|
|Control Adding Verizon 80% 06 30 20||1.2||1500|
Add a column to calculate what the revenue would have been had the variants received the same share of traffic (for example, 50% each):
|Variant||Avg eCPM||Publisher Payout||Actual Frequency||Normalized Payout (Assuming 50% Allocation)|
|Control Adding Verizon 80% 06 30 20||1.2||1500||80%||937.5|
|Test Adding Verizon 20% 06 30 20||1.5||1000||20%||2500 Winner|
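The normalization scales each variant's payout from its actual traffic share to a common target share. A minimal sketch of that arithmetic follows; `normalize_payout` is a hypothetical helper, and it assumes payout scales linearly with traffic allocation, which is the same assumption the table makes.

```python
def normalize_payout(payout, actual_pct, target_pct=50.0):
    """Scale a variant's payout to what it would be at `target_pct` traffic.

    Assumes payout grows linearly with the share of traffic.
    """
    if actual_pct <= 0:
        raise ValueError("actual_pct must be positive")
    return payout * target_pct / actual_pct


if __name__ == "__main__":
    # (payout, actual traffic share in %) taken from the table above.
    variants = {
        "Control Adding Verizon 80% 06 30 20": (1500, 80),
        "Test Adding Verizon 20% 06 30 20": (1000, 20),
    }
    for name, (payout, pct) in variants.items():
        print(name, normalize_payout(payout, pct))
```

The variant with the higher normalized payout is the better performer on an equal-traffic basis.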
You can choose to split the impressions between the 2 variants for several days and explore the results, or use the test version as the main configuration of the placement.
Multi-Testing Best Practices
- Run one test per placement at a time
The benefit of DT FairBid's Multi-Testing is that it allows you to run separate tests on each placement simultaneously. With sufficient traffic volume, you can safely run several experiments for each app.
Although you can technically run several experiments on one placement at the same time, DT recommends that you stick to one experiment per placement at a time. Different experiments may interact with each other, and some users will see the configurations of both experiments, which affects the results.
For example, if you are running a test to see the impact of introducing a new network to your interstitial placement waterfall, and you then start a second test on ad pacing for the same placement, it will be difficult to tell which experiment caused which changes in the results.
- Pay attention to your test sample sizes
If you do not run your test on enough DEUs (Daily Engaged Users), the results are likely to be unreliable.
For example, if one variant shows 150% more ARPDEU but its DEU count was only 5, the result is statistically insignificant.