华为算法精英实战营第三期 - 多租数据库中的缓冲区共享和预分配
举办方:华为技术有限公司
举办方:华为技术有限公司
Buffer Sharing and Pre-allocation for Multi-tenant Database
Time limit: 400 seconds
Memory limit: 1024 megabytes
Input: standard input
Output: standard output
Cloud computing technology allows enterprises to benefit from affordable, scalable, and secure managed database services with high availability and reliability. In cloud databases, multiple tenants may share a database instance. The multi-tenant database scheme provides them with mutually isolated database environments. In this way, each tenant can manage their data in their own database environment without affecting others.
The database uses buffer pool technology to speed up data access. When a tenant needs to access a certain data page, the database first checks whether the data page is in the buffer pool. If not, it reads it from the disk and stores it in the buffer pool. When the number of data pages in the buffer pool reaches its limitation, the eviction algorithm will evict some data pages so that new data pages can be loaded from the disk.
The business peak of tenants may have patterns, and we can pre-allocate buffer space for them. For example, the access volume of e-commerce services may suddenly increase during the promotion period; the access peak of some services usually occurs during the day, while the access peak of other services occurs at night.
We can abstract the access to the database as a sequence of
instructions. The access is concurrent, and there may be a number of instructions coming in per unit time. We assume that the number of tenants is
. In a multi-tenant environment, different instructions may come from different tenants. Therefore, each instruction
in the sequence can be regarded as a tuple of tenant
and page
,i.e.,
. The pages of tenants are independent of each other and are labeled separately. For example, instruction
and instruction
point to different pages.If the page
requested by instruction
already exists in the buffer pool, it is called a hit; otherwise, it is called a page fault. When a page fault occurs, the eviction algorithm selects a memory page in the buffer for eviction and uses the slot to store the new page
.
In a multi-tenant environment, all tenants share a large buffer pool with size, the maximum number of pages that can be stored in the buffer, and each tenant
occupies
of it. We should be able to respond to tenants' sudden request changes, dynamically adjust the amount of buffer used by each tenant and allocate enough buffer space for them before the business peak appears. When a page fault occurs, a page will be evicted from the buffer and replaced by the accessed page. In the cache replacement process, it is also necessary to ensure that the actual buffer size used by each tenant is not less than its minimum quota, that is,
. When the quota
of tenant
is less than or equa to
, other users are not allowed to occupy buffer location belonging to tenant
. Initially, each page in the buffer does not belong to any of the tenants. Note that evicting these initial pages still counts as evicting some other tenant.
Design a scheduling algorithm to implement buffer sharing and pre-allocation mechanisms. Maximize the overall multi-tenant usage experience in limited buffer space. The overall multi-tenant usage experience can be regarded as the gap between the number of page faults of each tenant and their basic expectation. We preferentially ensure the user experience of tenants with higher priority levels.
1. Input
The first line contains three integers N, Q, K . They respectively represent the number of tenants in the system, the total buffer size, and the maximum number of pre-allocated buffer locations per unit time.
The second line contains integers
, where
represents the priority level of tenant
.
The third line contains tuples, each consisting of two integers,
. Here,
denotes the minimum buffer size of tenant
, and
denotes the buffer size of tenant
for calculating the expected user experience, which will be explained in the “Scoring” section. It is guaranteed that
is less than or equal to
.
2. Interaction & Output
After your program reads the above input, the judge is ready to receive your output. You can output a series of integers in one line to specify which buffer locations are pre-allocated for which tenants. There is no number limit (but cannot exceed ) for this initial pre-allocation. If you choose not to pre-allocate any buffer, output a '0'; or if you choose to pre-allocate buffers, the format is
where '
' denotes the number of buffers for pre-allocation, and a tuple
denote the buffer location and the tenant id of the
-th buffer.
For example, output '4 1 1 2 1 3 2 4 2' denotes that there are 4 tuples , and
following, the first two tuples
and
pre-allocate buffer locations 1 and 2 for tenant 1, and the other two tuples
and
pre-allocate buffer locations 3 and 4 for tenant 2.
Then, you are going to respond to the tenants' access data. The judge will send you lines interactively, one line per time. For each line, the first integer is a status flag:
The scheduling algorithm must be online. It means the solution must print the answer to -th instruction before it gets the line with the
-th instruction. The very first instruction happens after the initial pre-allocation.
Variable constraints:
For correct interaction, print the end-of-line after each command and flush the output buffer:
Otherwise, your solution will get the "Idleness Limit Exceeded" outcome.
3. Scoring
We aim to evaluate the user experience of tenants, with a keen focus on tracking page faults and verifying the availability of allocated buffer locations. We introduce to represent the quantified experience of tenant
. More precisely,
models the loss of user experience. The smaller
the better experience.
For tenant , when a page fault occurs, we update
by:
where is the loss factor for allocating a buffer, and p is the penalty for your impromptu decision on which page to replace without pre-allocation. Specifically,
Assume that the loss of allocating buffer is unit time. When a buffer location is marked to be pre-allocated to a tenant, we mark
on the location. When the location is used by the tenant, we mark
. If a tenant directly replaces its own page or another's page without pre-allocation, it will take
unit time to finish the process, hence we can take
so that
. The initial pre-allocated buffer locations can be considered as being pre-allocated at time minus-infinity.
In this challenge, there are tests in total. And we set
,
, and
starts at 0 in each test.
In each test, we calculate the clipped ratio between , obtained by your algorithm, and the expected user experience
:
where is the tenant
's expected experience when using the database exclusively. It is equal to the number of page faults obtained by using the least recently used (LRU,https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU) algorithm when the buffer size of tenant
is
.
Then the intermediate score in the
-th test can be expressed as follows:
where the square function is to severely penalize the scheduling that creates a loss of user experience more than twice that of the baseline.
For every test, you should try to minimize your as possible as you can. It will eventually be compared with the baseline cost,
, which is a positive number and is calculated using a baseline solution developed in advance.
Finally, the total score can be expressed as follows:
The function sets each test to 500 points. your cost is expected to be within 5 times the baseline, otherwise you will earn no score.
4. Example
There is no standard answer, here we only explain the meaning of the values in the example input and output.
The first line, '2 10 2', in the example input indecates that there are 2 tenants and a buffer with size 10. It is up to 2 buffer locations can be pre-allocated per unit time. The following two line is the tenants' priority level, and
which you can refer to the Input section. Then the interaction begin.
In example output, '4 1 1 2 1 6 2 7 2' denotes it initially pre-allocates 4 buffer locations, locations 1 and 2 for tenant 1, and locations 6 and 7 for tenant 2. Then it responses the interaction messages:
5. Note
You can submit multiple times, but up to once per minute and 20 times per day.
There are two sets of tests in this problem. For the duration of the competition, each submission is tested on the preliminary set of tests. When the competition is finished, for each contestant:
The contestants are then ranked according to their final score. The final tests are similar, but not identical, to the preliminary tests. The number of final tests may be larger, but their distribution is similar to the distribution of preliminary tests.
It is not recommended to use multiple algorithms to select the optimal solution, which may cause timeouts. It is also not recommended to use a specific solution for each testset, which may be eliminated in the final ranking.
6. Languages Support
Only support C++ language
C++ compiler version: g++ 7.3.0
The default C++ language dialect option is -std=c++17