Running Kilosort 1 or 2 and loading the results
KilosortDataset is a wrapper around the output of Kilosort or Kilosort2, which will load the output files back into Matlab for further analysis. Most of these fields are explained in detail in the Phy documentation but we document them here for convenience. Running Kilosort
To run Kilosort or Kilosort2 on an ImecDataset:
Or for Kilosort 2:
By default, the standard configuration settings will be used. For Kilosort1, these are hardcoded based on configFiles/StandardConfig_MOVEME.m. For Kilosort2, the script configFiles/configFile384.m will be run to produce the ops struct, unless a different configuration file is set in the environment variable KILOSORT_CONFIG_FILE, which must be on the path. Default configuration settings can be overridden by passing in extra parameters, e.g.:
npxutils.runKilosort1(imec, 'Th', [4 10], 'GPU', false);
npxutils.runKilosort2(imec, 'minfr_goodchannels', 0.1);
Loading Kilosort results
You can create a KilosortDataset instance by pointing at the folder containing the Kilosort output:
ks = npxutils.KilosortDataset(pathToKilosortOutput();
The constructor will optionally take an ‘imecDataset’ parameter providing the npxutils.ImecDataset instance if there is no .imec.ap.bin file in the Kilosort directory, and a ‘channelMap’ parameter in case the default is not correct. The results can then be loaded using ks.load().
The descriptions of each property can be found in the +Neuropixel/KilosortDataset.m code, copied here for convenience, originally described in the Phy documentation: - nChannels : number of channels used by Kilosort
- nSpikes : number of spikes extracted
- nClusters : number of unique clusters
- nTemplates : number of spike templates
- nPCFeatures number of spatiotemporal PC features used for templates
- nFeaturesPerChannel : number of PC features used for each channel
- amplitudes - [nSpikes] double vector with the amplitude scaling factor that was applied to the template when extracting that spike
- channel_ids - [nChannels] uint32 vector with the channel ids used for sorting
- channel_positions - [nChannels, 2] double matrix with each row giving the x and y coordinates of that channel. Together with the channel map, this determines how waveforms will be plotted in WaveformView (see below).
- pc_features - [nSpikes, nFeaturesPerChannel, nPCFeatures] single matrix giving the PC values for each spike. The channels that those features came from are specified in pc_features_ind. E.g. the value at pc_features[123, 1, 5] is the projection of the 123rd spike onto the 1st PC on the channel given by pc_feature_ind[5].
- pc_feature_ind - [nTemplates, nPCFeatures] uint32 matrix specifying which channels contribute to each entry in dim 3 of the pc_features matrix
- similar_templates - [nTemplates, nTemplates] single matrix giving the similarity score (larger is more similar) between each pair of templates similar_templates(:, :) single
- spike_templates - [nSpikes] uint32 vector specifying the identity of the template that was used to extract each spike
- spike_times - [nSpikes] uint64 vector giving the spike time of each spike in samples. To convert to seconds, divide by sample_rate from params.py.
- template_features - [nSpikes, nTempFeatures] single matrix giving the magnitude of the projection of each spike onto nTempFeatures other features. Which other features is specified in template_feature_ind.
- template_feature_ind - [nTemplates, nTempFeatures] uint32 matrix specifying which templateFeatures are included in the template_features matrix.
- templates - [nTemplates, nTimePoints, nTemplateChannels] single matrix giving the template shapes on the channels given in templates_ind
- templates_ind - [nTemplates, nTempChannels] double matrix specifying the channels on which each template is defined. In the case of Kilosort templates_ind is just the integers from 0 to nChannels-1, since templates are defined on all channels.
- whitening_mat - [nChannels, nChannels] double whitening matrix applied to the data during automatic spike sorting
- whitening_mat_inv - [nChannels, nChannels]double, the inverse of the whitening matrix.
- spike_clusters - [nSpikes] uint32 vector giving the cluster identity of each spike.
- cluster_groups - [nClusters] categorical vector giving the “cluster group” of each cluster (noise, mua, good, unsorted)
- cluster_ids - [nClusters] unique clusters in spike_clusters
Segmenting a Kilosort dataset into trials
ks.spike_times contains the times for each spike in samples from the beginning of the file, but there is a more useful representation for data collected with a trial structure: split the spikes into separate groups based on which trial they occurred in, and convert the times to milliseconds since the start of the trial.
TrialSegmentationInfo
In order to do this, you need to figure out where trials start and stop. You’ll need to write this code, since this will differ for each experimental setup. Essentially, you need to create a npxutils.TrialSegmentationInfo instance and populate its fields with the correct values:
Here is an example script that uses the sync channel to determine where trials begin and end. It expects one bit (named 'trialStart') to contain TTL pulses each time a trial starts, and another bit (named 'trialInfo') to contain ASCII-serialized bits of text occurring at the start of each trial. For example, the string id=1;c=2 would correspond to trialId=1 and conditionId=2. It also assumes that a trial ends when the next trial begins (or at the end of the file). Long trials can be subsequently truncated using tsi.truncateTrialsLongerThan(maxDurationSeconds).
function tsi = parseTrialInfoFromSync(syncRaw, fs, syncBitNames)
% fs is in samples per second
% parses the sync line of an neuropixel .imec.ap.bin data file
% and produces a scalar TrialSegmentationInfo
% parse the sync
if isempty(syncBitNames)
trialInfoBitNum = 1;
trialStartBitNum = 2;
else
[tf, trialInfoBitNum] = ismember('trialInfo', syncBitNames);
if ~tf, trialInfoBitNum = 1; end
[tf, trialStartBitNum] = ismember('trialStart', syncBitNames);
if ~tf, trialStartBitNum = 1; end
end
serialBit = bitget(syncRaw, trialInfoBitNum);
trialStart = bitget(syncRaw, trialStartBitNum);
% trials start when going high
idxStart = find(diff(trialStart) == 1) + 1;
nTrials = numel(idxStart);
tsi = npxutils.TrialSegmentationInfo(nTrials, fs);
samplesEachBit = round(fs / 1000); % each bit delivered per ms
for iR = 1:nTrials
if iR < nTrials
idxNext = idxStart(iR+1) - 1;
else
idxNext = numel(serialBit);
end
bitsByTrial = uint8(serialBit(floor(samplesEachBit/2) + idxStart(iR) : samplesEachBit : idxNext));
lastHigh = find(bitsByTrial, 1, 'last');
lastHigh = ceil(lastHigh / 8) * 8;
bitsByTrial = bitsByTrial(1:lastHigh);
infoThis = parseInfoString(bitsToString(bitsByTrial));
if isfield(infoThis, 'id')
tsi.trialId(iR) = str2double(infoThis.id); %#ok<*AGROW>
else
tsi.trialId(iR) = NaN;
end
if isfield(infoThis, 'c')
tsi.conditionId(iR) = str2double(infoThis.c);
else
tsi.conditionId(iR) = NaN;
end
tsi.idxStart(iR) = idxStart(iR);
tsi.idxStop(iR) = idxNext;
end
function out = bitsToString(bits)
nChar = numel(bits) / 8;
assert(nChar == round(nChar), 'Bit length must be multiple of 8');
out = blanks(nChar);
for iC = 1:nChar
idx = (1:8) + (iC-1)*8;
out(iC) = char(bin2dec(sprintf('%u', bits(idx))));
end
end
function out = parseInfoString(str)
keyval = regexp(str, '(?<key>\w+)=(?<value>[\d\.]+)', 'names');
if isempty(keyval)
warning('Could not parse info string "%s"', str);
out = struct();
else
for i = 1:numel(keyval)
out.(keyval(i).key) = keyval(i).value;
end
end
end
end
KilosortTrialSegmentedDataset
Once you have the trial boundaries stored in your TrialSegmentationInfo instance, you can split the properties of the KilosortDataset into each trial, resulting in a npxutils.KilosortTrialSegmentedDataset instance. To facilitate merging this into another data structure later, you will need to specify the ultimate trialId order you want the KilosortTrialSegmentedDataset to have. For example, if you have a behavioral data structure, you can extract the list of trial ids from that so that your KilosortTrialSegmentedDataset will have a matching trial sequence.
trialIds = cat(1, behaviorStruct.trialId);
Any trials not found in the TrialSegmentationInfo will simply be empty in the KilosortTrialSegmentedDataset. If you simply want to preserve the trials in the order they are in tsi, you can simply use:
trialIdsB = tsi.trialIds;
You can then segment the KilosortDataset using:
seg = npxutils.KilosortTrialSegmentedDataset(ks, tsi, trial_ids)