Running Kilosort 1/2 and loading the results

KilosortDataset is a wrapper around the output of Kilosort or Kilosort2 that loads the output files back into Matlab for further analysis. Most of these fields are explained in detail in the Phy documentation, but we document them here for convenience.

Running Kilosort

To run Kilosort or Kilosort2 on an ImecDataset:

npxutils.runKilosort1(imec, ...);

Or for Kilosort 2:

npxutils.runKilosort2(imec, ...);

By default, the standard configuration settings will be used. For Kilosort1, these are hardcoded based on configFiles/StandardConfig_MOVEME.m. For Kilosort2, the script configFiles/configFile384.m will be run to produce the ops struct, unless a different configuration file is set in the environment variable KILOSORT_CONFIG_FILE, which must be on the path. Default configuration settings can be overridden by passing in extra parameters, e.g.

npxutils.runKilosort1(imec, 'Th', [4 10], 'GPU', false);
npxutils.runKilosort2(imec, 'minfr_goodchannels', 0.1);
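
To point Kilosort2 at a different configuration script, set the KILOSORT_CONFIG_FILE environment variable before calling runKilosort2. This is a minimal sketch; myKilosortConfig.m is a placeholder name for a configuration script that must be on the Matlab path:

setenv('KILOSORT_CONFIG_FILE', 'myKilosortConfig.m');
npxutils.runKilosort2(imec);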

Loading Kilosort results

You can create a KilosortDataset instance by pointing at the folder containing the Kilosort output:

ks = npxutils.KilosortDataset(pathToKilosortOutput);
ks.load();

The constructor optionally takes an 'imecDataset' parameter providing the npxutils.ImecDataset instance (needed if there is no .imec.ap.bin file in the Kilosort directory), and a 'channelMap' parameter in case the default channel map is not correct. The results can then be loaded using ks.load().
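
For example, to supply an existing ImecDataset and a non-default channel map (a minimal sketch; imec and map are assumed to already exist as npxutils.ImecDataset and npxutils.ChannelMap instances in your workspace):

ks = npxutils.KilosortDataset(pathToKilosortOutput, ...
    'imecDataset', imec, 'channelMap', map);
ks.load();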

The description of each property can be found in the +npxutils/KilosortDataset.m code and is copied here for convenience; most of these fields are originally described in the Phy documentation:

>> ks

KilosortDataset with properties:

                  path: '/data/kilosort/neuropixel_01'
           raw_dataset: [1×1 npxutils.ImecDataset]
            channelMap: [1×1 npxutils.ChannelMap]
                  fsAP: 30000
           apScaleToUv: 2.3438
                  meta: [1×1 struct]
              pathLeaf: 'neuropixel_01'
              isLoaded: 1
         hasRawDataset: 1
             nChannels: 371
               nSpikes: 8181228
             nClusters: 592
            nTemplates: 653
           nPCFeatures: 32
   nFeaturesPerChannel: 3
          syncBitNames: [16×1 string]
              dat_path: 'neuropixel_01.imec.ap.bin'
        n_channels_dat: 385
                 dtype: 'int16'
                offset: 0
           sample_rate: 30000
           hp_filtered: 0
            amplitudes: [nSpikes × 1 double]
           channel_ids: [nChannels × 1 uint32]
     channel_positions: [nChannels × 2 double]
           pc_features: [nSpikes × nFeaturesPerChannel × nPCFeatures single]
        pc_feature_ind: [nTemplates × nPCFeatures uint32]
     similar_templates: [nTemplates × nTemplates single]
       spike_templates: [nSpikes × 1 uint32]
           spike_times: [nSpikes × 1 uint64]
     template_features: [nSpikes × nPCFeatures single]
  template_feature_ind: [nTemplates × nPCFeatures uint32]
             templates: [nTemplates × nTimePoints × nChannels single]
         templates_ind: [nTemplates × nChannels double]
         whitening_mat: [nChannels × nChannels double]
     whitening_mat_inv: [nChannels × nChannels double]
        spike_clusters: [nSpikes × 1 uint32]
        cluster_groups: [nClusters × 1 categorical]
           cluster_ids: [nClusters × 1 uint32]
         clusters_good: [nClustersGood × 1 uint32]
          clusters_mua: [nClustersMUA × 1 uint32]
        clusters_noise: [nClustersNoise × 1 uint32]
     clusters_unsorted: [nClustersUnsorted × 1 uint32]

  • nChannels : number of channels used by Kilosort

  • nSpikes : number of spikes extracted

  • nClusters : number of unique clusters

  • nTemplates : number of spike templates

  • nPCFeatures : number of spatiotemporal PC features used for templates

  • nFeaturesPerChannel : number of PC features used for each channel

  • amplitudes - [nSpikes] double vector with the amplitude scaling factor that was applied to the template when extracting that spike

  • channel_ids - [nChannels] uint32 vector with the channel ids used for sorting

  • channel_positions - [nChannels, 2] double matrix with each row giving the x and y coordinates of that channel. Together with the channel map, this determines how waveforms will be plotted in WaveformView (see below).

  • pc_features - [nSpikes, nFeaturesPerChannel, nPCFeatures] single matrix giving the PC values for each spike. The channels that those features came from are specified in pc_features_ind. E.g. the value at pc_features[123, 1, 5] is the projection of the 123rd spike onto the 1st PC on the channel given by pc_feature_ind[5].

  • pc_feature_ind - [nTemplates, nPCFeatures] uint32 matrix specifying which channels contribute to each entry in dim 3 of the pc_features matrix

  • similar_templates - [nTemplates, nTemplates] single matrix giving the similarity score (larger is more similar) between each pair of templates

  • spike_templates - [nSpikes] uint32 vector specifying the identity of the template that was used to extract each spike

  • spike_times - [nSpikes] uint64 vector giving the spike time of each spike in samples. To convert to seconds, divide by sample_rate from params.py (see the short example after this list).

  • template_features - [nSpikes, nTempFeatures] single matrix giving the magnitude of the projection of each spike onto nTempFeatures other features. Which other features is specified in template_feature_ind.

  • template_feature_ind - [nTemplates, nTempFeatures] uint32 matrix specifying which templateFeatures are included in the template_features matrix.

  • templates - [nTemplates, nTimePoints, nTemplateChannels] single matrix giving the template shapes on the channels given in templates_ind

  • templates_ind - [nTemplates, nTempChannels] double matrix specifying the channels on which each template is defined. In the case of Kilosort, templates_ind is just the integers from 0 to nChannels-1, since templates are defined on all channels.

  • whitening_mat - [nChannels, nChannels] double whitening matrix applied to the data during automatic spike sorting

  • whitening_mat_inv - [nChannels, nChannels] double, the inverse of the whitening matrix.

  • spike_clusters - [nSpikes] uint32 vector giving the cluster identity of each spike.

  • cluster_groups - [nClusters] categorical vector giving the “cluster group” of each cluster (noise, mua, good, unsorted)

  • cluster_ids - [nClusters] uint32 vector listing the unique cluster ids present in spike_clusters
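
For example, spike times can be converted from samples to seconds using the properties above. This is a minimal sketch on an already-loaded ks:

spike_seconds = double(ks.spike_times) / ks.sample_rate; % equivalently, divide by ks.fsAP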

Segmenting a Kilosort dataset into trials

ks.spike_times contains the times for each spike in samples from the beginning of the file, but there is a more useful representation for data collected with a trial structure: split the spikes into separate groups based on which trial they occurred in, and convert the times to milliseconds since the start of the trial.

TrialSegmentationInfo

In order to do this, you need to figure out where trials start and stop. You’ll need to write this code, since this will differ for each experimental setup. Essentially, you need to create a npxutils.TrialSegmentationInfo instance and populate its fields with the correct values:

tsi = npxutils.TrialSegmentationInfo(nTrials, fsAP);
tsi.idxStart = [list of start sample indices];
tsi.idxStop = [list of stop sample indices];
tsi.trialId = [list of trial ids];

Here is an example script that uses the sync channel to determine where trials begin and end. It expects one bit (named 'trialStart') to carry a TTL pulse each time a trial starts, and another bit (named 'trialInfo') to carry an ASCII-serialized text string at the start of each trial, delivered at one bit per millisecond. For example, the string id=1;c=2 would correspond to trialId=1 and conditionId=2. It also assumes that a trial ends when the next trial begins (or at the end of the file). Long trials can subsequently be truncated using tsi.truncateTrialsLongerThan(maxDurationSeconds).

function tsi = parseTrialInfoFromSync(syncRaw, fs, syncBitNames)
    % fs is in samples per second
    % parses the sync line of a Neuropixels .imec.ap.bin data file
    % and produces a scalar TrialSegmentationInfo

    % parse the sync
    if isempty(syncBitNames)
        trialInfoBitNum = 1;
        trialStartBitNum = 2;
    else
        [tf, trialInfoBitNum] = ismember('trialInfo', syncBitNames);
        if ~tf, trialInfoBitNum = 1; end
        [tf, trialStartBitNum] = ismember('trialStart', syncBitNames);
        if ~tf, trialStartBitNum = 2; end
    end

    serialBit = bitget(syncRaw, trialInfoBitNum);
    trialStart = bitget(syncRaw, trialStartBitNum);

    % trials start when going high
    idxStart = find(diff(trialStart) == 1) + 1;
    nTrials = numel(idxStart);

    tsi = npxutils.TrialSegmentationInfo(nTrials, fs);

    samplesEachBit = round(fs / 1000); % each bit delivered per ms
    for iR = 1:nTrials
        if iR < nTrials
            idxNext = idxStart(iR+1) - 1;
        else
            idxNext = numel(serialBit);
        end
        bitsByTrial = uint8(serialBit(floor(samplesEachBit/2) + idxStart(iR) : samplesEachBit : idxNext));

        lastHigh = find(bitsByTrial, 1, 'last');
        lastHigh = ceil(lastHigh / 8) * 8;
        bitsByTrial = bitsByTrial(1:lastHigh);

        infoThis = parseInfoString(bitsToString(bitsByTrial));

        if isfield(infoThis, 'id')
            tsi.trialId(iR) = str2double(infoThis.id); %#ok<*AGROW>
        else
            tsi.trialId(iR) = NaN;
        end
        if isfield(infoThis, 'c')
            tsi.conditionId(iR) = str2double(infoThis.c);
        else
            tsi.conditionId(iR) = NaN;
        end

        tsi.idxStart(iR) = idxStart(iR);
        tsi.idxStop(iR) = idxNext;
    end

    function out = bitsToString(bits)
        nChar = numel(bits) / 8;
        assert(nChar == round(nChar), 'Bit length must be multiple of 8');
        out = blanks(nChar);
        for iC = 1:nChar
            idx = (1:8) + (iC-1)*8;
            out(iC)  = char(bin2dec(sprintf('%u', bits(idx))));
        end
    end

    function out = parseInfoString(str)
        keyval = regexp(str, '(?<key>\w+)=(?<value>[\d\.]+)', 'names');
        if isempty(keyval)
            warning('Could not parse info string "%s"', str);
            out = struct();
        else
            for i = 1:numel(keyval)
                out.(keyval(i).key) = keyval(i).value;
            end
        end
    end
end
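
A sketch of how this parser might be used, assuming the raw sync vector has already been read from the ImecDataset (the exact method for reading the sync channel may differ in your version, so treat readSync as an assumption) and using an arbitrary 25 second cutoff for truncation:

syncRaw = imec.readSync();   % assumed: returns the raw sync channel as a vector of integers
tsi = parseTrialInfoFromSync(syncRaw, imec.fsAP, imec.syncBitNames);
tsi.truncateTrialsLongerThan(25);   % truncate trials longer than 25 seconds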

KilosortTrialSegmentedDataset

Once you have the trial boundaries stored in your TrialSegmentationInfo instance, you can split the properties of the KilosortDataset into each trial, resulting in a npxutils.KilosortTrialSegmentedDataset instance. To facilitate merging this into another data structure later, you will need to specify the ultimate trialId order you want the KilosortTrialSegmentedDataset to have. For example, if you have a behavioral data structure, you can extract the list of trial ids from that so that your KilosortTrialSegmentedDataset will have a matching trial sequence.

trialIds = cat(1, behaviorStruct.trialId);

Any trials not found in the TrialSegmentationInfo will simply be empty in the KilosortTrialSegmentedDataset. If you want to preserve the trials in the order they appear in tsi, you can use:

trialIds = tsi.trialId;

You can then segment the KilosortDataset using:

>> seg = npxutils.KilosortTrialSegmentedDataset(ks, tsi, trialIds)

KilosortTrialSegmentedDataset with properties:

                   dataset: [1×1 npxutils.KilosortDataset] % unsegmented KilosortDataset
                 trial_ids: [nTrials × 1 uint32] % trial ids
            trial_has_data: [nTrials × 1 logical] % indicator if trial found in tsi
               trial_start: [nTrials × 1 uint64] % nTrials start sample idx copied from tsi
                 trial_stop: [nTrials × 1 uint64] % nTrials stop sample idx copied from tsi
                 spike_idx: {nTrials × nClusters cell} % nTrials x nClusters lists of indices into ks.spike_times array for each spike
               cluster_ids: [nClusters × 1 uint32] % copied from ks
            cluster_groups: [nClusters × 1 categorical] % copied from ks
                      sync: {1×1 cell} %  segmented contents of sync channel
              syncBitNames: [16×1 string] % copied from ImecDataset
               raw_dataset: [1×1 npxutils.ImecDataset] % original ImecDataset copied from ks
                   nTrials: 1092 % number of trials as numel(trialIds)
           nTrialsHaveData: 1092 % number of trials with matching trialIds in tsi
                 nClusters: 592 % number of clusters (sorted units)
           nChannelsSorted: 385 % number of channels
                channel_ids: [nChannelsSorted × 1 uint32] % list of channel ids used for sorting
         trial_duration_ms: [nTrials × 1 double]
                      fsAP: 30000
                amplitudes: {nTrials × nClusters cell} % each of these will contain a vector with one entry for each spike from that cluster on that trial
               pc_features: {nTrials × nClusters cell}
               spike_times: {nTrials × nClusters cell} % raw sample times from ks
  spike_times_ms_rel_start: {nTrials × nClusters cell} % times from trial start in milliseconds
         template_features: {nTrials × nClusters cell}
           spike_templates: {nTrials × nClusters cell}
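
Each of the per-trial cell arrays can then be indexed by trial and cluster. For example, a minimal sketch pulling out the spike times (in milliseconds relative to trial start) for one cluster on one trial; the trial index and cluster id here are arbitrary placeholders:

iTrial = 1;                                  % arbitrary trial index
iCluster = find(seg.cluster_ids == 100, 1);  % column for a hypothetical cluster id 100
timesMs = seg.spike_times_ms_rel_start{iTrial, iCluster};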