Access the Model Testing service, open the Test Jobs tab, and click the "Create New Job" button.
Model Source: Select where your model is hosted
You can choose between two model sources when configuring your setup:
A test suite is a collection of test cases designed to evaluate the performance, accuracy, and stability of a machine learning model.
Test suite: Select an evaluation benchmark or leaderboard task to auto-fill the test settings
Currently, only Nejumi Leaderboard 3 is supported. It benchmarks Japanese LLMs on language skills and alignment, using diverse datasets to evaluate performance and safety.
Tasks: Choose specific tasks within the selected test suite.
Nejumi Leaderboard 3 provides specific tasks such as Llm-jp-eval (jaster), JBBQ, and JTruthfulQA. You can view results for all tasks or select a single task.
Hover over each option to view detailed information about its task.
Parameter | Description | Default value |
---|---|---|
Log samples | Choose whether to save the model's inputs and outputs for later review. | True |
Max tokens | Set the maximum number of tokens the model can generate. | 1024 |
Few-shot | Set the number of few-shot examples to include in the prompt context. | 0 |
Temperature | Set the randomness of the model's output; lower values make the output more deterministic. | 0.00 |
Repetition penalty | Adjust how strongly repeated tokens are penalized to encourage more varied output; 1.00 applies no penalty. | 1.00 |
Seed | Enter a random seed so that results are reproducible across runs. | 1308 |
Top-K | Limit token selection to the k most likely tokens; -1 disables this limit. | -1 |
Top-P | Sample only from the smallest set of most likely tokens whose cumulative probability reaches this threshold. | 1.00 |
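The sampling parameters in the table are standard decoding controls. As a rough illustration only, the Python sketch below shows how such controls are commonly applied to a model's next-token scores (logits) in open-source inference engines. It is not the Model Testing service's implementation: the function name `sample_next_token` and the toy logits are hypothetical, and it assumes the usual conventions that Temperature 0 means greedy decoding, Top-K -1 means no limit, and a Repetition penalty of 1.00 means no penalty. Max tokens would cap how many times this step repeats, and Few-shot changes the prompt rather than this step.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    previous_tokens: List[int],
    temperature: float = 0.0,
    top_k: int = -1,
    top_p: float = 1.0,
    repetition_penalty: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Choose the next token id from raw logits (hypothetical, illustrative sketch)."""
    logits = list(logits)  # work on a copy

    # Repetition penalty (CTRL-style rule): shrink the logits of tokens seen before.
    # A penalty of 1.0 leaves every logit unchanged.
    for tok in set(previous_tokens):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # Temperature 0 is treated as greedy decoding: return the most likely token.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate token ids, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-K: keep only the k most likely tokens; in this sketch -1 (or 0) keeps all.
    if top_k > 0:
        order = order[:top_k]

    # Top-P (nucleus): keep the smallest prefix whose cumulative probability
    # reaches top_p; 1.0 keeps every remaining candidate.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # Sample from the surviving tokens; random.choices handles unnormalized weights.
    rng = rng if rng is not None else random.Random()
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical 4-token vocabulary
    # With the table's default temperature (0.00), decoding is greedy and deterministic.
    print(sample_next_token(toy_logits, previous_tokens=[]))  # 0
    # With temperature > 0, a fixed seed (1308, the table default) makes sampling repeatable.
    a = sample_next_token(toy_logits, [], temperature=0.8, rng=random.Random(1308))
    b = sample_next_token(toy_logits, [], temperature=0.8, rng=random.Random(1308))
    print(a == b)  # True
```

Under these assumptions, the table's defaults (Temperature 0.00, Top-K -1, Top-P 1.00, Repetition penalty 1.00) amount to plain greedy decoding, which is why evaluation runs with these settings tend to be repeatable.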
Select the GPU configuration for running your test job.
ft_[base model]_[timestamp]
To run the job manually, click the Run button.