cmt.base_tasks.preprocessing

Preprocessing tasks.

Class PreCounter

class cmt.base_tasks.preprocessing.PreCounter(*args, **kwargs)[source]

Bases: DatasetTask, LocalWorkflow, HTCondorWorkflow, SGEWorkflow, SplittedTask, RDFModuleTask

Counts the events with and without applying the necessary weights. Weights are read from the config file; if they have to be computed first, RDF modules can be run.

Example command:

law run PreCounter --version test  --config-name base_config --dataset-name ggf_sm --workflow htcondor --weights-file weight_file

Parameters:
  • weights_file (str) – filename inside cmt/config/ (without extension) with the RDF modules to run.

  • systematic (str) – systematic to use for categorization.

  • systematic_direction (str) – systematic direction to use for categorization.

create_branch_map()[source]
Returns:

number of files for the selected dataset

Return type:

int

workflow_requires()[source]
requires()[source]

Each branch requires one input file

output()[source]
Returns:

One file per input file

Return type:

.json
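
The one-branch-per-file layout described above (one branch per input file, one JSON output per branch) can be pictured with a small sketch. It is not taken from the cmt source: the helpers `build_branch_map` and `output_name`, the file paths, and the output naming scheme are all assumptions made for illustration.

```python
import os


def build_branch_map(input_files):
    """Map each branch index to exactly one input file (hypothetical helper)."""
    return {branch: path for branch, path in enumerate(input_files)}


def output_name(input_file, extension=".json"):
    """Derive a per-branch output name from the input file name (assumed scheme)."""
    stem = os.path.splitext(os.path.basename(input_file))[0]
    return stem + extension


# Illustrative paths only; real datasets resolve their own file lists.
files = ["/store/ggf_sm/file_0.root", "/store/ggf_sm/file_1.root"]
branch_map = build_branch_map(files)
outputs = {branch: output_name(path) for branch, path in branch_map.items()}
print(len(branch_map), outputs)
```

With this layout the number of branches equals the number of input files, so each branch can run as an independent job.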

get_weight(weights, syst_name, syst_direction, **kwargs)[source]

Builds the product of all weights for the applied category/channel. Returns “1” for data samples.

Returns:

Product of all weights to be applied

Return type:

str
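
Since the return type is a string, the weight product is presumably an expression that can be evaluated later. A minimal sketch of that idea follows; the helper `weight_product`, its arguments, and the choice to return "1" for an empty weight list are assumptions, not the actual `get_weight` implementation.

```python
def weight_product(weight_names, is_data):
    """Return a string expression multiplying all weights (hypothetical helper).

    Data samples carry no weights, so the expression is simply "1".
    Returning "1" for an empty weight list is an assumption of this sketch.
    """
    if is_data or not weight_names:
        return "1"
    return " * ".join(weight_names)


print(weight_product(["genWeight", "puWeight"], is_data=False))
print(weight_product(["genWeight", "puWeight"], is_data=True))
```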

run()[source]

Creates one RDataFrame per input file, runs the desired RDFModules, and counts the number of events with and without additional weights.
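
The counting itself amounts to an unweighted count plus a sum of per-event weight products. The sketch below shows this in plain Python rather than with RDataFrame; the function `count_events`, the output keys `nevents` and `nweightedevents`, and the sample events are hypothetical and chosen only for illustration.

```python
import json


def count_events(events, weight_columns):
    """Count events and sum the per-event product of weights (hypothetical helper)."""
    n_events = 0
    n_weighted = 0.0
    for event in events:
        weight = 1.0
        for column in weight_columns:
            weight *= event[column]
        n_events += 1
        n_weighted += weight
    # The key names are assumptions, not the ones used by the actual task.
    return {"nevents": n_events, "nweightedevents": n_weighted}


events = [
    {"genWeight": 1.0, "puWeight": 0.5},
    {"genWeight": -1.0, "puWeight": 2.0},
    {"genWeight": 1.0, "puWeight": 1.0},
]
stats = count_events(events, ["genWeight", "puWeight"])
print(json.dumps(stats))
```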

Class PreCounterWrapper

class cmt.base_tasks.preprocessing.PreCounterWrapper(*args, **kwargs)[source]

Bases: DatasetSystWrapperTask

Wrapper task to run the PreCounter task over several datasets in parallel.

Example command:

law run PreCounterWrapper --version test  --config-name base_config --dataset-names tt_dl,tt_sl --PreCounter-weights-file weight_file --workers 2

Class PreprocessRDF

class cmt.base_tasks.preprocessing.PreprocessRDF(*args, **kwargs)[source]

Bases: PreCounter, DatasetTaskWithCategory

Performs the preprocessing step by applying a preselection and running RDF modules.

See requirements in PreCounter.

Example command:

law run PreprocessRDF --version test  --category-name base_selection --config-name base_config --dataset-name ggf_sm --workflow htcondor --modules-file modulesrdf --workers 10 --max-runtime 12h

Parameters:
  • modules_file (str) – filename inside cmt/config/ or “../config/” (without extension) with the RDF modules to run.

  • keep_and_drop_file (str) – filename inside cmt/config/ or “../config/” (without extension) with the RDF columns to save in the output file.

weights_file = None
output()[source]
Returns:

One file per input file with the tree + additional branches

Return type:

.root

run()[source]

Creates one RDataFrame per input file, applies a preselection and runs the desired RDFModules

Class PreprocessRDFWrapper

class cmt.base_tasks.preprocessing.PreprocessRDFWrapper(*args, **kwargs)[source]

Bases: DatasetCategorySystWrapperTask

Wrapper task to run the PreprocessRDF task over several datasets in parallel.

Example command:

law run PreprocessRDFWrapper --version test  --category-name base_selection --config-name ul_2018 --dataset-names tt_dl,tt_sl --PreprocessRDF-workflow htcondor --PreprocessRDF-max-runtime 48h --PreprocessRDF-modules-file modulesrdf  --workers 10

Class Categorization

class cmt.base_tasks.preprocessing.Categorization(*args, **kwargs)[source]

Bases: PreprocessRDF

Performs the categorization step running RDF modules and applying a post-selection

Example command:

law run Categorization --version test --category-name etau --config-name base_config --dataset-name tt_dl --workflow local --base-category-name base_selection --workers 10 --feature-modules-file features

Parameters:
  • base_category_name (str) – category name from the PreprocessRDF requirements.

  • feature_modules_file (str) – filename inside cmt/config/ or ../config/ (without extension) with the RDF modules to run.

  • skip_preprocess (bool) – whether to skip the PreprocessRDF task

region_name = None
workflow_requires()[source]
requires()[source]

Each branch requires one input file

output()[source]
Returns:

One file per input file with the tree + additional branches

Return type:

.root

run()[source]

Creates one RDataFrame per input file, runs the desired RDFModules and applies a post-selection

Class CategorizationWrapper

class cmt.base_tasks.preprocessing.CategorizationWrapper(*args, **kwargs)[source]

Bases: DatasetCategorySystWrapperTask

Wrapper task to run the Categorization task over several datasets in parallel.

Example command:

law run CategorizationWrapper --version test --category-names etau --config-name base_config --dataset-names tt_dl,tt_sl --Categorization-workflow htcondor --workers 20 --Categorization-base-category-name base_selection

Class MergeCategorization

class cmt.base_tasks.preprocessing.MergeCategorization(*args, **kwargs)[source]

Bases: DatasetTaskWithCategory, ForestMerge

Merges the output from the Categorization or PreprocessRDF tasks in order to reduce the parallelization entering the plotting tasks. By default everything is merged into a single output file; a larger number of output files can be set with the merging parameter in the dataset definition.

For simulated samples, hadd is used to perform the merging. For data samples, haddnano.py (safer but slower) is used instead, to avoid skipping events when the input files have different branches. Either tool can be forced with the parameters --force-hadd and --force-haddnano, respectively.
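
The tool selection described above can be summarized in a few lines. This is a sketch under stated assumptions, not the task's code: the function `choose_merge_tool` is hypothetical, and rejecting the case where both force flags are set is a choice made here, since the documentation does not say how that case is handled.

```python
def choose_merge_tool(is_data, force_hadd=False, force_haddnano=False):
    """Pick the merging tool for a dataset (hypothetical helper)."""
    if force_hadd and force_haddnano:
        # Assumption: the two force flags are treated as mutually exclusive.
        raise ValueError("force_hadd and force_haddnano cannot both be set")
    if force_hadd:
        return "hadd"
    if force_haddnano:
        return "haddnano.py"
    # Default: the safer but slower tool for data, plain hadd for simulation.
    return "haddnano.py" if is_data else "hadd"


print(choose_merge_tool(is_data=False))
print(choose_merge_tool(is_data=True))
print(choose_merge_tool(is_data=True, force_hadd=True))
```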

Example command:

law run MergeCategorization --version test --category-name etau --config-name base_config --dataset-name tt_sl --workflow local --workers 4

Parameters:
  • from_preprocess (bool) – whether it merges the output from the PreprocessRDF task (True) or Categorization (False, default)

  • force_hadd (bool) – whether to force hadd as tool to do the merging.

  • force_haddnano (bool) – whether to force haddnano.py as tool to do the merging.

  • systematic (str) – systematic to use for categorization.

  • systematic_direction (str) – systematic direction to use for categorization.

region_name = None

Class MergeCategorizationWrapper

class cmt.base_tasks.preprocessing.MergeCategorizationWrapper(*args, **kwargs)[source]

Bases: DatasetCategorySystWrapperTask

Wrapper task to run the MergeCategorization task over several datasets in parallel.

Example command:

law run MergeCategorizationWrapper --version test --category-names etau --config-name base_config --dataset-names tt_dl,tt_sl --workers 10

Class MergeCategorizationStats

class cmt.base_tasks.preprocessing.MergeCategorizationStats(*args, **kwargs)[source]

Bases: DatasetTask, ForestMerge

Merges the output from the PreCounter task in order to reduce the parallelization entering the plotting tasks.

Parameters:
  • systematic (str) – systematic to use for categorization.

  • systematic_direction (str) – systematic direction to use for categorization.

Example command:

law run MergeCategorizationStats --version test --config-name base_config --dataset-name dy_high --workers 10
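
Merging the per-file PreCounter outputs can be thought of as summing counts key by key. The sketch below illustrates that idea only; the function `merge_stats` and the keys `nevents` and `nweightedevents` are hypothetical, and the real task may combine its inputs differently.

```python
def merge_stats(per_file_stats):
    """Sum per-file counters key by key into one dictionary (hypothetical helper)."""
    merged = {}
    for stats in per_file_stats:
        for key, value in stats.items():
            merged[key] = merged.get(key, 0) + value
    return merged


# Illustrative per-file outputs; the key names are assumptions.
parts = [
    {"nevents": 100, "nweightedevents": 95.5},
    {"nevents": 50, "nweightedevents": 48.0},
]
merged = merge_stats(parts)
print(merged)
```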

Class MergeCategorizationStatsWrapper

class cmt.base_tasks.preprocessing.MergeCategorizationStatsWrapper(*args, **kwargs)[source]

Bases: DatasetSystWrapperTask

Wrapper task to run the MergeCategorizationStats task over several datasets in parallel.

Example command:

law run MergeCategorizationStatsWrapper --version test --config-name base_config --dataset-names tt_dl,tt_sl --workers 10

Class EventCounterDAS

class cmt.base_tasks.preprocessing.EventCounterDAS(*args, **kwargs)[source]

Bases: DatasetTask

Obtains the number of events in the dataset by querying DAS with dasgoclient, without processing the input files.

Example command:

law run EventCounterDAS --version test  --config-name base_config --dataset-name ggf_sm

Parameters:
  • use_secondary_dataset (bool) – whether to use the dataset included in the secondary_dataset parameter from the dataset instead of the actual dataset

requires()[source]

No requirements needed

output()[source]
Returns:

One file for the whole dataset

Return type:

.json

run()[source]

Queries the number of events with dasgoclient and stores it in the output JSON file.

Class EventCounterDASWrapper

class cmt.base_tasks.preprocessing.EventCounterDASWrapper(*args, **kwargs)[source]

Bases: DatasetSuperWrapperTask

Wrapper task to run the EventCounterDAS task over several datasets in parallel.

Example command:

law run EventCounterDASWrapper --version test  --config-name base_config --dataset-names tt_dl,tt_sl --workers 2