Running Kilosort 1 or 2 and loading the results

KilosortDataset is a wrapper around the output of Kilosort or Kilosort2, which will load the output files back into Matlab for further analysis. Most of these fields are explained in detail in the Phy documentation but we document them here for convenience.

Running Kilosort

To run Kilosort or Kilosort2 on an ImecDataset:

npxutils.runKilosort1(imec, ...);

Or for Kilosort 2:

npxutils.runKilosort2(imec, ...);

By default, the standard configuration settings will be used. For Kilosort1, these are hardcoded based on configFiles/StandardConfig_MOVEME.m. For Kilosort2, the script configFiles/configFile384.m will be run to produce the ops struct, unless a different configuration file is set in the environment variable KILOSORT_CONFIG_FILE, which must be on the path. Default configuration settings can be overridden by passing in extra parameters, e.g.:

npxutils.runKilosort1(imec, 'Th', [4 10], 'GPU', false);
npxutils.runKilosort2(imec, 'minfr_goodchannels', 0.1);

Loading Kilosort results

You can create a KilosortDataset instance by pointing at the folder containing the Kilosort output:

ks = npxutils.KilosortDataset(pathToKilosortOutput();
ks.load();

The constructor will optionally take an ‘imecDataset’ parameter providing the npxutils.ImecDataset instance if there is no .imec.ap.bin file in the Kilosort directory, and a ‘channelMap’ parameter in case the default is not correct. The results can then be loaded using ks.load().

The descriptions of each property can be found in the +Neuropixel/KilosortDataset.m code, copied here for convenience, originally described in the Phy documentation:

nChannels : number of channels used by Kilosort
nSpikes : number of spikes extracted
nClusters : number of unique clusters
nTemplates : number of spike templates
nPCFeatures number of spatiotemporal PC features used for templates
nFeaturesPerChannel : number of PC features used for each channel
amplitudes - [nSpikes] double vector with the amplitude scaling factor that was applied to the template when extracting that spike
channel_ids - [nChannels] uint32 vector with the channel ids used for sorting
channel_positions - [nChannels, 2] double matrix with each row giving the x and y coordinates of that channel. Together with the channel map, this determines how waveforms will be plotted in WaveformView (see below).
pc_features - [nSpikes, nFeaturesPerChannel, nPCFeatures] single matrix giving the PC values for each spike. The channels that those features came from are specified in pc_features_ind. E.g. the value at pc_features[123, 1, 5] is the projection of the 123rd spike onto the 1st PC on the channel given by pc_feature_ind[5].
pc_feature_ind - [nTemplates, nPCFeatures] uint32 matrix specifying which channels contribute to each entry in dim 3 of the pc_features matrix
similar_templates - [nTemplates, nTemplates] single matrix giving the similarity score (larger is more similar) between each pair of templates similar_templates(:, :) single
spike_templates - [nSpikes] uint32 vector specifying the identity of the template that was used to extract each spike
spike_times - [nSpikes] uint64 vector giving the spike time of each spike in samples. To convert to seconds, divide by sample_rate from params.py.
template_features - [nSpikes, nTempFeatures] single matrix giving the magnitude of the projection of each spike onto nTempFeatures other features. Which other features is specified in template_feature_ind.
template_feature_ind - [nTemplates, nTempFeatures] uint32 matrix specifying which templateFeatures are included in the template_features matrix.
templates - [nTemplates, nTimePoints, nTemplateChannels] single matrix giving the template shapes on the channels given in templates_ind
templates_ind - [nTemplates, nTempChannels] double matrix specifying the channels on which each template is defined. In the case of Kilosort templates_ind is just the integers from 0 to nChannels-1, since templates are defined on all channels.
whitening_mat - [nChannels, nChannels] double whitening matrix applied to the data during automatic spike sorting
whitening_mat_inv - [nChannels, nChannels]double, the inverse of the whitening matrix.
spike_clusters - [nSpikes] uint32 vector giving the cluster identity of each spike.
cluster_groups - [nClusters] categorical vector giving the “cluster group” of each cluster (noise, mua, good, unsorted)
cluster_ids - [nClusters] unique clusters in spike_clusters

Segmenting a Kilosort dataset into trials

ks.spike_times contains the times for each spike in samples from the beginning of the file, but there is a more useful representation for data collected with a trial structure: split the spikes into separate groups based on which trial they occurred in, and convert the times to milliseconds since the start of the trial.

TrialSegmentationInfo

In order to do this, you need to figure out where trials start and stop. You’ll need to write this code, since this will differ for each experimental setup. Essentially, you need to create a npxutils.TrialSegmentationInfo instance and populate its fields with the correct values:

tsi = npxutils.TrialSegmentationInfo(nTrials, fsAP);

tsi.idxStart = [list of start sample indices]

tsi.idxStop = [list of stop sample indices];

tsi.trialId = [list of trial ids];

Here is an example script that uses the sync channel to determine where trials begin and end. It expects one bit (named 'trialStart') to contain TTL pulses each time a trial starts, and another bit (named 'trialInfo') to contain ASCII-serialized bits of text occurring at the start of each trial. For example, the string id=1;c=2 would correspond to trialId=1 and conditionId=2. It also assumes that a trial ends when the next trial begins (or at the end of the file). Long trials can be subsequently truncated using tsi.truncateTrialsLongerThan(maxDurationSeconds).

function tsi = parseTrialInfoFromSync(syncRaw, fs, syncBitNames)

% fs is in samples per second

% parses the sync line of an neuropixel .imec.ap.bin data file

% and produces a scalar TrialSegmentationInfo

% parse the sync

if isempty(syncBitNames)

trialInfoBitNum = 1;

trialStartBitNum = 2;

else

[tf, trialInfoBitNum] = ismember('trialInfo', syncBitNames);

if ~tf, trialInfoBitNum = 1; end

[tf, trialStartBitNum] = ismember('trialStart', syncBitNames);

if ~tf, trialStartBitNum = 1; end

end

serialBit = bitget(syncRaw, trialInfoBitNum);

trialStart = bitget(syncRaw, trialStartBitNum);

% trials start when going high

idxStart = find(diff(trialStart) == 1) + 1;

nTrials = numel(idxStart);

tsi = npxutils.TrialSegmentationInfo(nTrials, fs);

samplesEachBit = round(fs / 1000); % each bit delivered per ms

for iR = 1:nTrials

if iR < nTrials

idxNext = idxStart(iR+1) - 1;

else

idxNext = numel(serialBit);

end

bitsByTrial = uint8(serialBit(floor(samplesEachBit/2) + idxStart(iR) : samplesEachBit : idxNext));

lastHigh = find(bitsByTrial, 1, 'last');

lastHigh = ceil(lastHigh / 8) * 8;

bitsByTrial = bitsByTrial(1:lastHigh);

infoThis = parseInfoString(bitsToString(bitsByTrial));

if isfield(infoThis, 'id')

tsi.trialId(iR) = str2double(infoThis.id); %#ok<*AGROW>

else

tsi.trialId(iR) = NaN;

end

if isfield(infoThis, 'c')

tsi.conditionId(iR) = str2double(infoThis.c);

else

tsi.conditionId(iR) = NaN;

end

tsi.idxStart(iR) = idxStart(iR);

tsi.idxStop(iR) = idxNext;

end

function out = bitsToString(bits)

nChar = numel(bits) / 8;

assert(nChar == round(nChar), 'Bit length must be multiple of 8');

out = blanks(nChar);

for iC = 1:nChar

idx = (1:8) + (iC-1)*8;

out(iC) = char(bin2dec(sprintf('%u', bits(idx))));

end

function out = parseInfoString(str)

keyval = regexp(str, '(?<key>\w+)=(?<value>[\d\.]+)', 'names');

if isempty(keyval)

warning('Could not parse info string "%s"', str);

out = struct();

else

for i = 1:numel(keyval)

out.(keyval(i).key) = keyval(i).value;

end

KilosortTrialSegmentedDataset

Once you have the trial boundaries stored in your TrialSegmentationInfo instance, you can split the properties of the KilosortDataset into each trial, resulting in a npxutils.KilosortTrialSegmentedDataset instance. To facilitate merging this into another data structure later, you will need to specify the ultimate trialId order you want the KilosortTrialSegmentedDataset to have. For example, if you have a behavioral data structure, you can extract the list of trial ids from that so that your KilosortTrialSegmentedDataset will have a matching trial sequence.

trialIds = cat(1, behaviorStruct.trialId);

Any trials not found in the TrialSegmentationInfo will simply be empty in the KilosortTrialSegmentedDataset. If you simply want to preserve the trials in the order they are in tsi, you can simply use:

trialIdsB = tsi.trialIds;

You can then segment the KilosortDataset using:

seg = npxutils.KilosortTrialSegmentedDataset(ks, tsi, trial_ids)