1. Overview
This dataset contains the aggregated, structured results of a large-scale benchmark evaluating twelve single-pass, stream-based active learning query strategies. It accompanies the master's dissertation "A Quantitative and Comparative Analysis of Single-Pass Stream-Based Active Learning Query Algorithms".
The experiments span:
- 82 datasets
- 5 machine learning models
- 12 stream-based query strategies
- 5 labeling budgets: 5%, 10%, 20%, 50%, and 100%
- 20,000+ experimental runs
Each row represents a single experimental configuration, defined by:
(dataset, model, hyperparameters, query strategy, labeling budget)
This file is designed for statistical analysis, ranking, and comparative evaluation of strategies under constrained labeling scenarios.
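As a starting point, the file can be loaded with pandas. The sketch below is a minimal, hedged example: the inline one-row sample merely stands in for the real experiment_results.csv so the snippet is self-contained, and the column names match the column dictionary below.

```python
import io
import pandas as pd

# Tiny inline sample standing in for experiment_results.csv (same columns,
# one row per experimental run).
sample = io.StringIO(
    "dataset,model_name,model_params,query_strategy,budget,"
    "initial_score,percentage_queried,final_accuracy\n"
    "electricity,LogisticRegression,\"{'C': 0.01}\",random,0.1,0.61,0.102,0.74\n"
)
df = pd.read_csv(sample)

print(df.shape)  # → (1, 8)
```

For the real file, replace the StringIO sample with pd.read_csv("experiment_results.csv").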
2. File Structure
- Granularity: One row per experimental run
- Primary metric: Final model accuracy
- Evaluation setting: Single-pass stream-based active learning
3. Column Dictionary
Below is the semantic definition of each column in the dataset.
dataset
- Type: String
- Description: Dataset used in the experiment.
- Scope: 82 unique datasets.
- Purpose: Enables cross-dataset robustness analysis.
model_name
- Type: String
- Description: Machine learning algorithm used.
- Scope: 5 model families.
- Purpose: Allows studying model–strategy interaction.
model_params
- Type: String (serialized dictionary)
- Description: Hyperparameters used for the model.
- Example:
{'C': 0.01}
- Recommendation: Parse into a dictionary for reproducibility or hyperparameter grouping.
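Since the column stores a Python-literal dictionary string, one safe way to parse it (a sketch, assuming the serialized form shown in the example above) is ast.literal_eval, which evaluates literals only and never executes arbitrary code:

```python
import ast

# model_params is stored as a Python-literal dict string, e.g. "{'C': 0.01}".
params = ast.literal_eval("{'C': 0.01}")
print(params["C"])  # → 0.01
```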
query_strategy
- Type: String
- Description: Active learning strategy used in the stream.
- Scope: 12 strategies.
- Purpose: Main variable of interest for comparative evaluation.
budget
- Type: Float
- Values: 0.05, 0.10, 0.20, 0.50, 1.00
- Description: Fraction of instances allowed to be labeled.
- Interpretation: Controls labeling cost.
initial_score
- Type: Float
- Description: Baseline performance before applying active learning.
- Purpose: Reference point for measuring improvement.
percentage_queried
- Type: Float
- Description: Actual fraction of instances labeled.
- Note:
- May slightly differ from the defined budget due to stream dynamics.
- Reflects real labeling consumption.
final_accuracy
- Type: Float
- Description: Final model performance after active learning.
- Metric: Classification accuracy.
- Role: Primary evaluation metric.
4. Summary
experiment_results.csv is a large-scale benchmark dataset for evaluating stream-based active learning strategies under varying labeling budgets.
It supports:
- Cross-dataset comparisons
- Strategy ranking
- Budget sensitivity analysis
- Model–strategy interaction studies
- Efficiency and robustness evaluation
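A strategy ranking of the kind listed above can be sketched by averaging final_accuracy per query_strategy. The strategy names and scores below are illustrative stand-ins; the real analysis would aggregate the full 20,000+ runs (and likely condition on budget as well):

```python
import pandas as pd

# Illustrative toy runs (strategy names and scores are hypothetical).
df = pd.DataFrame({
    "query_strategy": ["random", "uncertainty", "random", "uncertainty"],
    "budget": [0.05, 0.05, 0.20, 0.20],
    "final_accuracy": [0.70, 0.74, 0.78, 0.81],
})

# Mean final accuracy per strategy, best first.
ranking = (df.groupby("query_strategy")["final_accuracy"]
             .mean()
             .sort_values(ascending=False))
print(ranking)
```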
The structure is analysis-ready and designed for statistical benchmarking and research publication purposes.