Commit d517d3f3 authored by Jussi Lindgren

Plugins: Added a simple Outlier Removal box

parent 86d6814c
/**
* \page BoxAlgorithm_OutlierRemoval Outlier Removal
__________________________________________________________________
Detailed description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Description|
The Outlier Removal box discards feature vectors with extremal values. The user specifies the desired quantile limits [min,max], each in [0,1].
The algorithm loops over the feature dimensions and computes the range r(j)=[quantile(min),quantile(max)] for each dimension j.
Example i is kept only if every feature j of the example lies inside r(j); otherwise it is discarded. The box should
receive all the feature vectors of interest before it receives the stimulation that starts the removal.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Description|
__________________________________________________________________
Inputs description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Inputs|
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Inputs|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Input1|
The stimulation to start the removal.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Input1|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Input2|
The feature vectors to prune.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Input2|
__________________________________________________________________
Outputs description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Outputs|
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Outputs|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Output1|
The stimulation to announce that the removal is complete.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Output1|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Output2|
The kept feature vectors.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Output2|
__________________________________________________________________
Settings description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Settings|
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Settings|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Setting1|
Lower quantile threshold. In [0,1].
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Setting1|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Setting2|
Upper quantile threshold. In [0,1].
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Setting2|
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Setting3|
The stimulation that starts the removal. The box passes the same stimulation to its output after the removal is complete.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Setting3|
__________________________________________________________________
Examples description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Examples|
With the choice [0.02,0.95], the box discards, per dimension, the examples whose feature value is among the lowest 2% or the highest 5% of the values in that dimension.
If the quantile range is specified as [0,1], the box passes out the original vector set unchanged.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Examples|
__________________________________________________________________
Miscellaneous description
__________________________________________________________________
* |OVP_DocBegin_BoxAlgorithm_OutlierRemoval_Miscellaneous|
The box can be used to try to remove artifacts when training classifiers that are sensitive to extremal values, for example LDA. In band-power based Motor Imagery, eye blinks can cause very strong band powers, which can then bias the classifier training. With a suitable choice of the upper quantile of this box, such examples can be pruned from the training set.
An intuitive way to think about the filtering done by the box is to imagine a hyperrectangle (a box) in the feature space. The boundaries of the hyperrectangle correspond to the estimated quantiles. Each feature vector that is fully inside it is kept.
It may be difficult to choose meaningful quantile limits without looking at the feature values. The values can be inspected with Signal Display. There can also be outliers that are not extremal in any way: for example, vectors that are misplaced in the feature space or that carry a wrong class label. This box cannot catch such problems.
* |OVP_DocEnd_BoxAlgorithm_OutlierRemoval_Miscellaneous|
*/
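The pruning rule described in the documentation above can be sketched outside OpenViBE as follows. This is a minimal illustration under stated assumptions, not the code of the commit: the function name `keptIndexes` and the use of plain `std::vector<std::vector<double> >` for the dataset are hypothetical, and an empty quantile range is assumed to keep nothing. The index computation (truncating `quantile * N`) follows `pruneSet()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the commit: returns the indices of the
// examples whose every feature lies inside the per-dimension quantile range.
std::vector<std::size_t> keptIndexes(const std::vector<std::vector<double> >& dataset,
                                     double lowerQuantile, double upperQuantile)
{
    std::vector<std::size_t> kept;
    const std::size_t n = dataset.size();
    if (n == 0) { return kept; }

    // Same truncating index computation as in pruneSet().
    const std::size_t lower = static_cast<std::size_t>(lowerQuantile * n);
    const std::size_t upper = static_cast<std::size_t>(upperQuantile * n);
    if (upper <= lower) { return kept; }  // assumption: empty range keeps nothing

    // Start with every example marked as kept.
    for (std::size_t i = 0; i < n; i++) { kept.push_back(i); }

    const std::size_t dims = dataset[0].size();
    for (std::size_t f = 0; f < dims; f++)
    {
        // Pair each value of feature f with its example index, then sort by
        // value (ties are broken by index).
        std::vector<std::pair<double, std::size_t> > values;
        for (std::size_t i = 0; i < n; i++)
        {
            assert(dataset[i].size() == dims);
            values.push_back(std::make_pair(dataset[i][f], i));
        }
        std::sort(values.begin(), values.end());

        // Examples at sorted positions [lower, upper) are inside the range.
        std::vector<std::size_t> inside;
        for (std::size_t j = lower; j < upper; j++) { inside.push_back(values[j].second); }
        std::sort(inside.begin(), inside.end());

        // Keep only the examples that survived all dimensions so far.
        std::vector<std::size_t> both;
        std::set_intersection(inside.begin(), inside.end(), kept.begin(), kept.end(),
                              std::back_inserter(both));
        kept.swap(both);
    }
    return kept;
}
```

Because the surviving sets are intersected across dimensions, the fraction of examples kept can be considerably smaller than `upperQuantile - lowerQuantile` when the extremal values of different features occur in different examples.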
#include "ovpCBoxAlgorithmOutlierRemoval.h"
// #include <cstdio>
#include <openvibe/ovITimeArithmetics.h>
#include <algorithm>
#include <iterator>
// #include <sstream>
using namespace OpenViBE;
using namespace OpenViBE::Kernel;
using namespace OpenViBE::Plugins;
using namespace OpenViBEPlugins;
using namespace OpenViBEPlugins::Classification;
using namespace std;
boolean CBoxAlgorithmOutlierRemoval::initialize(void)
{
m_oStimulationDecoder.initialize(*this, 0);
m_oFeatureVectorDecoder.initialize(*this, 1);
m_oStimulationEncoder.initialize(*this, 0);
m_oFeatureVectorEncoder.initialize(*this, 1);
// get the quantile parameters
m_f64LowerQuantile = FSettingValueAutoCast(*this->getBoxAlgorithmContext(), 0);
m_f64UpperQuantile = FSettingValueAutoCast(*this->getBoxAlgorithmContext(), 1);
m_ui64Trigger = FSettingValueAutoCast(*this->getBoxAlgorithmContext(), 2);
m_f64LowerQuantile = std::min<float64>(std::max<float64>(m_f64LowerQuantile, 0.0), 1.0);
m_f64UpperQuantile = std::min<float64>(std::max<float64>(m_f64UpperQuantile, 0.0), 1.0);
m_ui64TriggerTime = -1LL;
return true;
}
boolean CBoxAlgorithmOutlierRemoval::uninitialize(void)
{
m_oFeatureVectorEncoder.uninitialize();
m_oStimulationEncoder.uninitialize();
m_oFeatureVectorDecoder.uninitialize();
m_oStimulationDecoder.uninitialize();
for(uint32 i=0;i<m_vDataset.size();i++)
{
delete m_vDataset[i].m_pFeatureVectorMatrix;
m_vDataset[i].m_pFeatureVectorMatrix = NULL;
}
m_vDataset.clear();
return true;
}
boolean CBoxAlgorithmOutlierRemoval::processInput(uint32 ui32InputIndex)
{
getBoxAlgorithmContext()->markAlgorithmAsReadyToProcess();
return true;
}
// Orders (feature value, example index) pairs by feature value only.
bool pairLess(const std::pair<float64,uint32>& a, const std::pair<float64,uint32>& b)
{
return a.first < b.first;
}
boolean CBoxAlgorithmOutlierRemoval::pruneSet(std::vector<SFeatureVector>& l_vPruned)
{
if(m_vDataset.size()==0)
{
// nothing to do, ok
return true;
}
const uint32 l_ui32DatasetSize = m_vDataset.size();
const uint32 l_ui32FeatureDims = m_vDataset[0].m_pFeatureVectorMatrix->getDimensionSize(0);
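const uint32 l_ui32LowerIndex = static_cast<uint32>(m_f64LowerQuantile * l_ui32DatasetSize);
const uint32 l_ui32UpperIndex = static_cast<uint32>(m_f64UpperQuantile * l_ui32DatasetSize);
if(l_ui32UpperIndex <= l_ui32LowerIndex)
{
// Empty quantile range: nothing can be kept. This also avoids the unsigned underflow in (upper - lower) below.
this->getLogManager() << LogLevel_Warning << "The quantile range [" << m_f64LowerQuantile << "," << m_f64UpperQuantile << "] selects no examples.\n";
l_vPruned.clear();
return true;
}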
this->getLogManager() << LogLevel_Trace << "Examined dataset is [" << l_ui32DatasetSize << "x" << l_ui32FeatureDims << "].\n";
std::vector<uint32> l_vKeptIndexes;
l_vKeptIndexes.resize(l_ui32DatasetSize);
for(uint32 i=0;i<l_ui32DatasetSize;i++)
{
l_vKeptIndexes[i] = i;
}
std::vector< std::pair<float64,uint32> > l_vFeatureValues;
l_vFeatureValues.resize(l_ui32DatasetSize);
for(uint32 f=0;f<l_ui32FeatureDims;f++)
{
for(uint32 i=0;i<l_ui32DatasetSize;i++)
{
l_vFeatureValues[i] = std::pair<float64, uint32>(m_vDataset[i].m_pFeatureVectorMatrix->getBuffer()[f], i);
}
std::sort(l_vFeatureValues.begin(), l_vFeatureValues.end(), pairLess);
std::vector<uint32> l_vNewIndexes;
l_vNewIndexes.resize(l_ui32UpperIndex - l_ui32LowerIndex);
for(uint32 j=l_ui32LowerIndex,cnt=0;j<l_ui32UpperIndex;j++,cnt++)
{
l_vNewIndexes[cnt]=l_vFeatureValues[j].second;
}
this->getLogManager() << LogLevel_Trace << "For feature " << (f+1) << ", the retained range is [" << l_vFeatureValues[l_ui32LowerIndex].first
<< ", " << l_vFeatureValues[l_ui32UpperIndex-1].first << "]\n";
std::sort(l_vNewIndexes.begin(), l_vNewIndexes.end());
std::vector<uint32> l_vIntersection;
std::set_intersection(l_vNewIndexes.begin(), l_vNewIndexes.end(), l_vKeptIndexes.begin(), l_vKeptIndexes.end(), std::back_inserter(l_vIntersection));
l_vKeptIndexes = l_vIntersection;
this->getLogManager() << LogLevel_Debug << "After analyzing feature " << (f+1) << ", kept " << static_cast<uint64>(l_vKeptIndexes.size()) << " examples.\n";
}
this->getLogManager() << LogLevel_Trace << "Kept " << static_cast<uint64>(l_vKeptIndexes.size())
<< " examples in total (" << (100.0 * l_vKeptIndexes.size() / static_cast<float64>(m_vDataset.size()))
<< "% of " << static_cast<uint64>(m_vDataset.size()) << ")\n";
l_vPruned.clear();
for(uint32 i=0;i<l_vKeptIndexes.size();i++)
{
l_vPruned.push_back(m_vDataset[l_vKeptIndexes[i]]);
}
return true;
}
boolean CBoxAlgorithmOutlierRemoval::process(void)
{
// IBox& l_rStaticBoxContext=this->getStaticBoxContext();
IBoxIO& l_rDynamicBoxContext=this->getDynamicBoxContext();
// Stimulations
for(uint32 i=0; i<l_rDynamicBoxContext.getInputChunkCount(0); i++)
{
m_oStimulationDecoder.decode(i);
if(m_oStimulationDecoder.isHeaderReceived())
{
m_oStimulationEncoder.encodeHeader();
l_rDynamicBoxContext.markOutputAsReadyToSend(0, l_rDynamicBoxContext.getInputChunkStartTime(0, i), l_rDynamicBoxContext.getInputChunkEndTime(0, i));
}
if(m_oStimulationDecoder.isBufferReceived())
{
const IStimulationSet *stimSet = m_oStimulationDecoder.getOutputStimulationSet();
for(uint32 s=0;s<stimSet->getStimulationCount();s++)
{
if(stimSet->getStimulationIdentifier(s) == m_ui64Trigger)
{
std::vector<SFeatureVector> l_vPruned;
if(!pruneSet(l_vPruned))
{
return false;
}
// encode
for(uint32 f=0;f<l_vPruned.size();f++)
{
OpenViBEToolkit::Tools::Matrix::copy(*m_oFeatureVectorEncoder.getInputMatrix(), *l_vPruned[f].m_pFeatureVectorMatrix);
m_oFeatureVectorEncoder.encodeBuffer();
l_rDynamicBoxContext.markOutputAsReadyToSend(1, l_vPruned[f].m_ui64StartTime, l_vPruned[f].m_ui64EndTime);
}
const uint64 l_ui64HalfSecondHack = ITimeArithmetics::secondsToTime(0.5);
m_ui64TriggerTime = stimSet->getStimulationDate(s) + l_ui64HalfSecondHack;
}
}
m_oStimulationEncoder.getInputStimulationSet()->clear();
if(m_ui64TriggerTime >= l_rDynamicBoxContext.getInputChunkStartTime(0, i) && m_ui64TriggerTime < l_rDynamicBoxContext.getInputChunkEndTime(0, i))
{
m_oStimulationEncoder.getInputStimulationSet()->appendStimulation(m_ui64Trigger, m_ui64TriggerTime, 0);
m_ui64TriggerTime = -1LL;
}
m_oStimulationEncoder.encodeBuffer();
l_rDynamicBoxContext.markOutputAsReadyToSend(0, l_rDynamicBoxContext.getInputChunkStartTime(0, i), l_rDynamicBoxContext.getInputChunkEndTime(0, i));
}
if(m_oStimulationDecoder.isEndReceived())
{
m_oStimulationEncoder.encodeEnd();
l_rDynamicBoxContext.markOutputAsReadyToSend(0, l_rDynamicBoxContext.getInputChunkStartTime(0, i), l_rDynamicBoxContext.getInputChunkEndTime(0, i));
}
}
// Feature vectors
for(uint32 i=0; i<l_rDynamicBoxContext.getInputChunkCount(1); i++)
{
m_oFeatureVectorDecoder.decode(i);
if(m_oFeatureVectorDecoder.isHeaderReceived())
{
OpenViBEToolkit::Tools::Matrix::copyDescription(*m_oFeatureVectorEncoder.getInputMatrix(), *m_oFeatureVectorDecoder.getOutputMatrix());
m_oFeatureVectorEncoder.encodeHeader();
l_rDynamicBoxContext.markOutputAsReadyToSend(1, l_rDynamicBoxContext.getInputChunkStartTime(1, i), l_rDynamicBoxContext.getInputChunkEndTime(1, i));
}
// add the received feature vector to the dataset
if(m_oFeatureVectorDecoder.isBufferReceived())
{
const IMatrix* pFeatureVectorMatrix = m_oFeatureVectorDecoder.getOutputMatrix();
CBoxAlgorithmOutlierRemoval::SFeatureVector l_oFeatureVector;
l_oFeatureVector.m_pFeatureVectorMatrix=new CMatrix();
// Feature vectors arrive on input 1, so the chunk times must be read from input 1 as well.
l_oFeatureVector.m_ui64StartTime=l_rDynamicBoxContext.getInputChunkStartTime(1, i);
l_oFeatureVector.m_ui64EndTime=l_rDynamicBoxContext.getInputChunkEndTime(1, i);
OpenViBEToolkit::Tools::Matrix::copy(*l_oFeatureVector.m_pFeatureVectorMatrix, *pFeatureVectorMatrix);
m_vDataset.push_back(l_oFeatureVector);
}
if(m_oFeatureVectorDecoder.isEndReceived())
{
m_oFeatureVectorEncoder.encodeEnd();
l_rDynamicBoxContext.markOutputAsReadyToSend(1, l_rDynamicBoxContext.getInputChunkStartTime(1, i), l_rDynamicBoxContext.getInputChunkEndTime(1, i));
}
}
return true;
}
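The delayed output trigger in `process()` can be illustrated in isolation. The sketch below assumes that OpenViBE time values are 64-bit 32:32 fixed-point numbers (upper 32 bits hold whole seconds); `secondsToFixedPoint` is a simplified, hypothetical stand-in for `ITimeArithmetics::secondsToTime()`, and `isInChunk` is a hypothetical name for the half-open interval test that `process()` applies to `m_ui64TriggerTime`.

```cpp
#include <cstdint>

// Simplified stand-in (assumption): converts seconds to 32:32 fixed-point
// time by scaling with 2^32. The real ITimeArithmetics helper may differ.
std::uint64_t secondsToFixedPoint(double seconds)
{
    return static_cast<std::uint64_t>(seconds * 4294967296.0);
}

// True if 'time' lies in the half-open chunk interval [chunkStart, chunkEnd),
// which is the condition process() checks before appending the stimulation.
bool isInChunk(std::uint64_t time, std::uint64_t chunkStart, std::uint64_t chunkEnd)
{
    return time >= chunkStart && time < chunkEnd;
}
```

Under these assumptions, a trigger received at 2.0 s is scheduled for 2.5 s and is emitted with the stimulation chunk whose interval contains 2.5 s. The sentinel `-1LL` assigned to `m_ui64TriggerTime` converts to the largest `uint64` value, which can never be smaller than a chunk end time, so no stimulation is appended while no trigger is pending.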
#ifndef __OpenViBEPlugins_BoxAlgorithm_OutlierRemoval_H__
#define __OpenViBEPlugins_BoxAlgorithm_OutlierRemoval_H__
#include "../ovp_defines.h"
#include <openvibe/ov_all.h>
#include <toolkit/ovtk_all.h>
#include <vector>
#include <map>
#define OVP_ClassId_BoxAlgorithm_OutlierRemovalDesc OpenViBE::CIdentifier(0x11DA1C24, 0x4C7A74C0)
#define OVP_ClassId_BoxAlgorithm_OutlierRemoval OpenViBE::CIdentifier(0x09E41B92, 0x4291B612)
namespace OpenViBEPlugins
{
namespace Classification
{
class CBoxAlgorithmOutlierRemoval : public OpenViBEToolkit::TBoxAlgorithm < OpenViBE::Plugins::IBoxAlgorithm >
{
public:
virtual void release(void) { delete this; }
virtual OpenViBE::boolean initialize(void);
virtual OpenViBE::boolean uninitialize(void);
virtual OpenViBE::boolean processInput(OpenViBE::uint32 ui32InputIndex);
virtual OpenViBE::boolean process(void);
_IsDerivedFromClass_Final_(OpenViBEToolkit::TBoxAlgorithm < OpenViBE::Plugins::IBoxAlgorithm >, OVP_ClassId_BoxAlgorithm_OutlierRemoval);
protected:
typedef struct
{
OpenViBE::CMatrix* m_pFeatureVectorMatrix;
OpenViBE::uint64 m_ui64StartTime;
OpenViBE::uint64 m_ui64EndTime;
} SFeatureVector;
OpenViBE::boolean pruneSet( std::vector<CBoxAlgorithmOutlierRemoval::SFeatureVector>& l_vPruned);
OpenViBEToolkit::TFeatureVectorDecoder< CBoxAlgorithmOutlierRemoval > m_oFeatureVectorDecoder;
OpenViBEToolkit::TStimulationDecoder< CBoxAlgorithmOutlierRemoval > m_oStimulationDecoder;
OpenViBEToolkit::TFeatureVectorEncoder< CBoxAlgorithmOutlierRemoval > m_oFeatureVectorEncoder;
OpenViBEToolkit::TStimulationEncoder< CBoxAlgorithmOutlierRemoval > m_oStimulationEncoder;
std::vector < CBoxAlgorithmOutlierRemoval::SFeatureVector > m_vDataset;
OpenViBE::float64 m_f64LowerQuantile;
OpenViBE::float64 m_f64UpperQuantile;
OpenViBE::uint64 m_ui64Trigger;
OpenViBE::uint64 m_ui64TriggerTime;
};
class CBoxAlgorithmOutlierRemovalDesc : public OpenViBE::Plugins::IBoxAlgorithmDesc
{
public:
virtual void release(void) { }
virtual OpenViBE::CString getName(void) const { return OpenViBE::CString("Outlier removal"); }
virtual OpenViBE::CString getAuthorName(void) const { return OpenViBE::CString("Jussi T. Lindgren"); }
virtual OpenViBE::CString getAuthorCompanyName(void) const { return OpenViBE::CString("Inria"); }
virtual OpenViBE::CString getShortDescription(void) const { return OpenViBE::CString("Discards feature vectors with extremal values"); }
virtual OpenViBE::CString getDetailedDescription(void) const { return OpenViBE::CString("Simple outlier removal based on quantile estimation"); }
virtual OpenViBE::CString getCategory(void) const { return OpenViBE::CString("Classification"); }
virtual OpenViBE::CString getVersion(void) const { return OpenViBE::CString("1.0"); }
virtual OpenViBE::CIdentifier getCreatedClass(void) const { return OVP_ClassId_BoxAlgorithm_OutlierRemoval; }
virtual OpenViBE::Plugins::IPluginObject* create(void) { return new OpenViBEPlugins::Classification::CBoxAlgorithmOutlierRemoval; }
virtual OpenViBE::CString getStockItemName(void) const { return "gtk-cut"; }
virtual OpenViBE::boolean getBoxPrototype(
OpenViBE::Kernel::IBoxProto& rBoxAlgorithmPrototype) const
{
rBoxAlgorithmPrototype.addInput("Input stimulations", OV_TypeId_Stimulations);
rBoxAlgorithmPrototype.addInput("Input features", OV_TypeId_FeatureVector);
rBoxAlgorithmPrototype.addOutput("Output stimulations", OV_TypeId_Stimulations);
rBoxAlgorithmPrototype.addOutput("Output features", OV_TypeId_FeatureVector);
rBoxAlgorithmPrototype.addSetting("Lower quantile", OV_TypeId_Float, "0.01");
rBoxAlgorithmPrototype.addSetting("Upper quantile", OV_TypeId_Float, "0.99");
rBoxAlgorithmPrototype.addSetting("Start trigger", OV_TypeId_Stimulation, "OVTK_StimulationId_Train");
return true;
}
_IsDerivedFromClass_Final_(OpenViBE::Plugins::IBoxAlgorithmDesc, OVP_ClassId_BoxAlgorithm_OutlierRemovalDesc);
};
};
};
#endif // __OpenViBEPlugins_BoxAlgorithm_OutlierRemoval_H__
......@@ -17,6 +17,8 @@
#include "box-algorithms/ovpCBoxAlgorithmClassifierTrainer.h"
#include "box-algorithms/ovpCBoxAlgorithmClassifierProcessor.h"
#include "box-algorithms/ovpCBoxAlgorithmOutlierRemoval.h"
#if defined TARGET_HAS_ThirdPartyEIGEN
#include "algorithms/ovpCAlgorithmConditionedCovariance.h"
#include "algorithms/ovpCAlgorithmClassifierLDA.h"
......@@ -96,6 +98,8 @@ OVP_Declare_Begin();
OpenViBEPlugins::Classification::registerAvailableDecisionEnumeration(OVP_ClassId_Algorithm_ClassifierMLP, OVP_ClassId_Algorithm_ClassifierMLP_DecisionAvailable);
#endif // TARGET_HAS_ThirdPartyEIGEN
OVP_Declare_New(OpenViBEPlugins::Classification::CBoxAlgorithmOutlierRemovalDesc);
OVP_Declare_End();
#include<cmath>
......