
Huawei Algorithm Elite Practical Camp, Session 3 – Buffer Sharing and Pre-allocation in Multi-tenant Databases

Status: ended (registration is closed).

This challenge targets buffer allocation techniques for multi-tenant databases. The access model is abstracted, and contestants are expected to apply a suitable replacement algorithm to dynamically schedule the buffer used by each tenant, and to design a sensible buffer pre-allocation strategy based on the business characteristics of the different tenants, thereby improving the overall quality of service.

Organizer: Huawei Technologies Co., Ltd.

Buffer Sharing and Pre-allocation for Multi-tenant Database


Time limit: 400 seconds

Memory limit: 1024 megabytes

Input: standard input

Output: standard output


Cloud computing technology allows enterprises to benefit from affordable, scalable, and secure managed database services with high availability and reliability. In cloud databases, multiple tenants may share a database instance. The multi-tenant database scheme provides them with mutually isolated database environments. In this way, each tenant can manage their data in their own database environment without affecting others.


The database uses buffer pool technology to speed up data access. When a tenant needs to access a certain data page, the database first checks whether the data page is in the buffer pool. If not, it reads the page from disk and stores it in the buffer pool. When the number of data pages in the buffer pool reaches its limit, the eviction algorithm evicts some data pages so that new data pages can be loaded from disk.
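As an informal illustration of this mechanism, the following is a minimal single-pool sketch, assuming an LRU eviction policy; the class name BufferPool and its members are placeholder names chosen for this example, not part of the problem.

#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Minimal single-tenant buffer pool sketch: on a miss the least recently
// used page is evicted to make room for the newly loaded page.
class BufferPool {
public:
    explicit BufferPool(std::size_t capacity) : capacity_(capacity) {}

    // Returns true on a hit, false on a page fault.
    bool access(std::uint64_t page) {
        auto it = table_.find(page);
        if (it != table_.end()) {                 // hit: move page to the front
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        ++faults_;                                // page fault
        if (lru_.size() == capacity_) {           // pool full: evict the LRU page
            table_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(page);                    // load the new page
        table_[page] = lru_.begin();
        return false;
    }

    std::uint64_t faults() const { return faults_; }

private:
    std::size_t capacity_;
    std::uint64_t faults_ = 0;
    std::list<std::uint64_t> lru_;                // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> table_;
};

The multi-tenant setting described below adds ownership and quota constraints on top of this basic behaviour.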


Tenants' business peaks may follow patterns, and we can pre-allocate buffer space for them accordingly. For example, the access volume of e-commerce services may suddenly increase during a promotion period; the access peak of some services usually occurs during the day, while that of other services occurs at night.


We can abstract the accesses to the database as a sequence of instructions. The access is concurrent, and several instructions may arrive per unit time. We assume that the number of tenants is N. In a multi-tenant environment, different instructions may come from different tenants; therefore, each instruction i in the sequence can be regarded as a tuple of tenant u_i and page p_i, i.e., (u_i, p_i). The pages of different tenants are independent of each other and are labeled separately: for example, instructions (1, 1) and (2, 1) point to different pages. If the page requested by an instruction already exists in the buffer pool, it is called a hit; otherwise, it is called a page fault. When a page fault occurs, the eviction algorithm selects a memory page in the buffer for eviction and uses the freed slot to store the new page p_i.


In a multi-tenant environment, all tenants share one large buffer pool of size Q, the maximum number of pages that can be stored in the buffer, and each tenant occupies a portion of it. We should be able to respond to sudden changes in tenants' requests, dynamically adjust the amount of buffer used by each tenant, and allocate enough buffer space for them before a business peak appears. When a page fault occurs, a page is evicted from the buffer and replaced by the accessed page. During cache replacement, it is also necessary to ensure that the actual buffer size q_i used by each tenant i is not less than its minimum quota Q_min_i. Whenever the quota q_i of tenant i is less than or equal to Q_min_i, other tenants are not allowed to occupy buffer locations belonging to tenant i. Initially, the pages in the buffer do not belong to any tenant; note that evicting one of these initial pages still counts as evicting another tenant's page.
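One way to enforce this quota rule inside the eviction step is sketched below. It is only an illustrative fragment under the reading of the constraint given above; the names owner, used, q_min, and can_evict are hypothetical.

#include <vector>

// Illustrative victim check honouring per-tenant minimum quotas.
// owner[b] : tenant currently owning buffer location b (0 = initially unowned), 1-based locations.
// used[t]  : number of buffer locations currently owned by tenant t.
// q_min[t] : minimum quota of tenant t.
// Returns true if location b may be handed over to tenant `requester`.
bool can_evict(int b, int requester,
               const std::vector<int>& owner,
               const std::vector<int>& used,
               const std::vector<int>& q_min) {
    int victim_owner = owner[b];
    if (victim_owner == 0) return true;            // initially unowned location: always usable
    if (victim_owner == requester) return true;    // a tenant may always replace its own page
    // Another tenant's location may only be taken while that tenant
    // holds strictly more locations than its minimum quota.
    return used[victim_owner] > q_min[victim_owner];
}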


Design a scheduling algorithm to implement buffer sharing and pre-allocation mechanisms. Maximize the overall multi-tenant usage experience in limited buffer space. The overall multi-tenant usage experience can be regarded as the gap between the number of page faults of each tenant and their basic expectation. We preferentially ensure the user experience of tenants with higher priority levels.


1. Input


The first line contains three integers N, Q, and K, which respectively represent the number of tenants in the system, the total buffer size, and the maximum number of pre-allocated buffer locations per unit time.


The second line contains N integers L_1, L_2, ..., L_N, where L_i represents the priority level of tenant i.


The third line contains N tuples, each consisting of two integers (Q_min_i, Q_base_i). Here, Q_min_i denotes the minimum buffer size of tenant i, and Q_base_i denotes the base buffer size of tenant i used for calculating the expected user experience, which is explained in the “Scoring” section. It is guaranteed that Q_min_i is less than or equal to Q_base_i.
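A minimal sketch of reading this input in C++ could look like the following; the variable names simply mirror the notation used above.

#include <cstdio>
#include <vector>

int main() {
    int N, Q, K;
    std::scanf("%d %d %d", &N, &Q, &K);            // tenants, buffer size, per-unit-time pre-allocation limit

    std::vector<int> L(N + 1), q_min(N + 1), q_base(N + 1);   // 1-based tenant ids
    for (int i = 1; i <= N; ++i) std::scanf("%d", &L[i]);     // priority levels
    for (int i = 1; i <= N; ++i)
        std::scanf("%d %d", &q_min[i], &q_base[i]);           // minimum and base buffer sizes

    // ... the interaction phase follows here ...
    return 0;
}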


2. Interaction & Output


After your program reads the above input, the judge is ready to receive your output. You can output a series of integers on one line to specify which buffer locations are pre-allocated to which tenants. There is no limit on the number of pre-allocations in this initial round (but it cannot exceed the total buffer size Q). If you choose not to pre-allocate any buffer, output a '0'; if you choose to pre-allocate buffers, the format is

m b_1 t_1 b_2 t_2 ... b_m t_m

where m denotes the number of buffer locations to pre-allocate, and each tuple (b_j, t_j) gives the buffer location and the tenant id of the j-th pre-allocation.


For example, the output '4 1 1 2 1 3 2 4 2' denotes that there are 4 tuples: the first two tuples, (1, 1) and (2, 1), pre-allocate buffer locations 1 and 2 for tenant 1, and the other two tuples, (3, 2) and (4, 2), pre-allocate buffer locations 3 and 4 for tenant 2.
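For instance, such a line could be produced by a small helper like the sketch below; send_preallocation and its argument are hypothetical names, and only the printed format is prescribed by the statement.

#include <cstdio>
#include <utility>
#include <vector>

// Print a pre-allocation line such as "4 1 1 2 1 3 2 4 2" and flush it.
// Usage (matches the example above): send_preallocation({{1, 1}, {2, 1}, {3, 2}, {4, 2}});
void send_preallocation(const std::vector<std::pair<int, int>>& prealloc) {
    std::printf("%zu", prealloc.size());
    for (const auto& [location, tenant] : prealloc)
        std::printf(" %d %d", location, tenant);
    std::printf("\n");
    std::fflush(stdout);                 // flushing is required in interactive problems
}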


Then you respond to the tenants' access data. The judge will send you lines interactively, one line at a time. For each line, the first integer is a status flag:

  1. If the flag is greater than or equal to '1', it denotes the current time and is followed by two integers that represent an access instruction (u_i, p_i). For each such instruction, you should output a single integer: either the existing buffer location of page p_i of tenant u_i, or the newly assigned one. The time is incremental (it begins from 1), and the number of instructions per unit time will not exceed the bandwidth (which is not given).
  2. If the flag is '0', it occurs in the gap between two unit times and denotes that you may send a pre-allocation instruction to the judge in the format described above. This is optional, and you can send up to K tuples to pre-allocate specific buffer locations. Even if you choose not to pre-allocate anything, you still have to print a line with '0' as the number of pre-allocations. If a buffer location already owned by a tenant has a page loaded, its content is cleared when you mark the location as pre-allocated.
  3. If the flag is '−1', it denotes the end of the interaction.


The scheduling algorithm must be online. This means the solution must print the answer to the i-th instruction before it gets the line with the (i+1)-th instruction. The very first instruction arrives after the initial pre-allocation.
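Putting the protocol together, a minimal complete interaction skeleton might look like the sketch below. The policies choose_location and maybe_preallocate are deliberately trivial placeholders (a real solution replaces them); the sketch only illustrates the flag handling, the mandatory '0' line, and the flushing.

#include <cstdio>
#include <vector>

// Placeholder policy: always reuse buffer location 1 (a real solution replaces this).
int choose_location(int /*tenant*/, long long /*page*/) { return 1; }

// Placeholder policy: decline to pre-allocate anything during the gaps.
void maybe_preallocate() { std::puts("0"); }

int main() {
    int N, Q, K;
    std::scanf("%d %d %d", &N, &Q, &K);
    std::vector<int> L(N), q_min(N), q_base(N);
    for (int i = 0; i < N; ++i) std::scanf("%d", &L[i]);
    for (int i = 0; i < N; ++i) std::scanf("%d %d", &q_min[i], &q_base[i]);

    std::puts("0");                            // initial pre-allocation: none in this sketch
    std::fflush(stdout);

    long long flag;
    while (std::scanf("%lld", &flag) == 1) {
        if (flag >= 1) {                       // access instruction at time `flag`
            int tenant; long long page;
            std::scanf("%d %lld", &tenant, &page);
            std::printf("%d\n", choose_location(tenant, page));
            std::fflush(stdout);
        } else if (flag == 0) {                // gap between unit times: optional pre-allocation
            maybe_preallocate();               // must print at least a line containing "0"
            std::fflush(stdout);
        } else {                               // flag == -1: end of interaction
            break;
        }
    }
    return 0;
}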


Variable constraints:

  1. the number of tenants: N
  2. the total buffer size: Q
  3. the maximum number of pre-allocated buffer locations per unit time: K
  4. the maximum number of instructions per unit time
  5. the priority level of tenant i: L_i
  6. the minimum and base buffer sizes of tenant i: Q_min_i and Q_base_i
  7. the requesting tenant of instruction i: u_i
  8. the requested page of instruction i: p_i


For correct interaction, print the end-of-line after each command and flush the output buffer:

  1. cout.flush() or fflush(stdout) for C++;

Otherwise, your solution will get the "Idleness Limit Exceeded" outcome.


3. Scoring


We aim to evaluate the user experience of tenants, with a keen focus on tracking page faults and verifying the availability of allocated buffer locations. We introduce C_i to represent the quantified experience of tenant i. More precisely, C_i models the loss of user experience: the smaller it is, the better the experience.


For tenant i, when a page fault occurs, we update C_i by a rule that involves a loss factor for allocating a buffer and a penalty p charged for an impromptu decision on which page to replace without pre-allocation.


Assume that allocating a buffer costs a certain number of unit times. When a buffer location is marked as pre-allocated to a tenant, we record a mark (the time of pre-allocation) on the location; when the location is later used by the tenant, we record another mark (the time of use). If a tenant directly replaces its own page or another tenant's page without pre-allocation, the process takes additional unit time to finish, and the penalty p is chosen to account for this extra cost. The initial pre-allocated buffer locations can be considered as being pre-allocated at time minus infinity.


In this challenge, there are multiple tests in total. The loss factor and the penalty p are fixed in advance, and C_i starts at 0 in each test.

In each test, we calculate a clipped ratio between C_i, obtained by your algorithm, and the expected user experience E_i, where E_i is the tenant's expected experience when using the database exclusively. It is equal to the number of page faults obtained by running the least recently used (LRU, https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU) algorithm on tenant i's accesses with a buffer of size Q_base_i.
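For intuition, E_i can be reproduced offline by replaying only tenant i's page requests through an LRU cache of capacity Q_base_i and counting the faults, roughly as in the sketch below; the function name and signature are placeholders.

#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Count page faults of a single tenant's access trace under LRU with the given capacity.
long long lru_fault_count(const std::vector<long long>& pages, std::size_t capacity) {
    long long faults = 0;
    std::list<long long> order;                                 // front = most recently used
    std::unordered_map<long long, std::list<long long>::iterator> where;
    for (long long p : pages) {
        auto it = where.find(p);
        if (it != where.end()) {                 // hit: refresh recency
            order.splice(order.begin(), order, it->second);
            continue;
        }
        ++faults;                                // page fault
        if (order.size() == capacity) {          // evict the least recently used page
            where.erase(order.back());
            order.pop_back();
        }
        order.push_front(p);
        where[p] = order.begin();
    }
    return faults;
}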


Then the intermediate score in each test is computed from these clipped ratios, where a square function is applied to severely penalize scheduling that creates a loss of user experience more than twice that of the baseline.


For every test, you should try to make your cost as small as you can. It will eventually be compared with the baseline cost, which is a positive number calculated using a baseline solution developed in advance.


Finally, the total score is computed from the per-test results. Each test is worth up to 500 points, and your cost is expected to be within 5 times the baseline; otherwise you will earn no score for that test.


4. Example



There is no standard answer; here we only explain the meaning of the values in the example input and output.

The first line, '2 10 2', in the example input indicates that there are 2 tenants and a buffer of size 10, and that up to 2 buffer locations can be pre-allocated per unit time. The following two lines give the tenants' priority levels and buffer size parameters, as described in the Input section. Then the interaction begins.

In the example output, '4 1 1 2 1 6 2 7 2' denotes that it initially pre-allocates 4 buffer locations: locations 1 and 2 for tenant 1, and locations 6 and 7 for tenant 2. Then it responds to the interaction messages.



5. Note


You can submit multiple times, but up to once per minute and 20 times per day.


There are two sets of tests in this problem. For the duration of the competition, each submission is tested on the preliminary set of tests. When the competition is finished, for each contestant:

  1. the jury takes the latest submission with non-zero score on preliminary tests;
  2. this submission is tested on the final set of tests;
  3. the score on final tests is the final score for the contestant.

The contestants are then ranked according to their final score. The final tests are similar, but not identical, to the preliminary tests. The number of final tests may be larger, but their distribution is similar to the distribution of preliminary tests.


It is not recommended to run multiple algorithms and select the best result, as this may cause timeouts. It is also not recommended to tailor a specific solution to each test set, as such solutions may be eliminated in the final ranking.


6. Languages Support


Only the C++ language is supported.

C++ compiler version: g++ 7.3.0

The default C++ language dialect option is -std=c++17