Reading data from files

Import Pandas

The following line of code from assignment 3 imports the pandas library. Pandas contains the functions needed to read a CSV file.

import pandas as pd

Reading the file into a pandas DataFrame

Read the data from subject 1 into a DataFrame using the following line of code. The argument sep='\t' tells pandas that the values in the file are separated by tabs.

df = pd.read_csv('s01/s01.txt', sep='\t')
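As a minimal sketch of what sep='\t' does, the example below parses a short tab-separated string instead of the real s01/s01.txt file. The sample text and its three columns are made up for illustration and only mimic a few of the columns in the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for a subject file: columns are separated by tabs.
sample_text = "id\ttrial\trt\n1\t1\t0.349\n1\t2\t0.371\n"

# sep='\t' tells read_csv to split each line on tab characters, not commas.
demo_df = pd.read_csv(io.StringIO(sample_text), sep='\t')

print(demo_df.shape)          # (number of rows, number of columns)
print(list(demo_df.columns))  # column names taken from the first line
```

Leaving out sep='\t' here would make pandas look for commas, so each line would likely end up in a single column.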

Displaying the DataFrame

Subject 1's data is now held in the DataFrame df. A portion of df is shown below.

df.tail(5)
id year month day hour minute gender age handedness wait block trial target_location target flankers rt response error pre_target_response ITI_response target_on_error
187 1 2015 5 22 11 30 m 25 r 1.627 5 28 left white congruent 0.349 white True False False 0.024
188 1 2015 5 22 11 30 m 25 r 1.627 5 29 right white congruent 0.371 white True False False 0.023
189 1 2015 5 22 11 30 m 25 r 1.627 5 30 up black incongruent 0.549 black True False False 0.023
190 1 2015 5 22 11 30 m 25 r 1.627 5 31 left white neutral 0.463 white True False False 0.023
191 1 2015 5 22 11 30 m 25 r 1.627 5 32 right black neutral 0.430 black True False False 0.023

The code above shows the last 5 rows of df. Because 5 is also the default, calling df.tail() with no argument would show the same 5 rows; passing a different number, such as df.tail(10), changes how many rows are shown.
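To check how many rows tail() returns with and without an argument, here is a small sketch using a made-up 12-row DataFrame (not the subject data).

```python
import pandas as pd

# Hypothetical 12-row DataFrame standing in for a subject's data.
demo_df = pd.DataFrame({'trial': range(1, 13)})

default_rows = demo_df.tail()   # no argument: pandas uses n=5
three_rows = demo_df.tail(3)    # explicit n: the last 3 rows

print(len(default_rows), len(three_rows))
print(list(three_rows['trial']))
```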

Reading Multiple Files

Import and use glob

import glob
sub_files = glob.glob('s??/s??.txt')

Here, I import Python's glob module and use it to list all of the subjects' .txt files. Each subject has a folder and a file whose names start with s followed by a two-digit ID number. Because each '?' matches exactly one character, the pattern 's??/s??.txt' finds every subject's folder and the .txt file inside it.
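The sketch below shows how the '?' wildcard behaves, using a temporary folder layout created just for the demonstration. The folder and file names (including s100 and notes.txt) are hypothetical and are not part of the assignment data.

```python
import glob
import os
import tempfile

# Build a throwaway folder layout; all of these names are hypothetical.
with tempfile.TemporaryDirectory() as root:
    for folder, filename in [('s01', 's01.txt'), ('s02', 's02.txt'),
                             ('s100', 's100.txt'), ('s03', 'notes.txt')]:
        os.makedirs(os.path.join(root, folder), exist_ok=True)
        open(os.path.join(root, folder, filename), 'w').close()

    # Each '?' matches exactly one character, so 's100' is too long to match
    # and 'notes.txt' does not fit 's??.txt'.
    pattern = os.path.join(glob.escape(root), 's??', 's??.txt')
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))

print(matches)
```

glob.glob() returns paths in no guaranteed order, which is why the sketch sorts them before printing.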

Reading the .txt files

To read each participant's file, I used a list comprehension, which places a for loop inside the list brackets. The code loops through sub_files and adds each subject's data to a list called sub_data. This produces a list of individual DataFrames, one for each subject. To put all of the subjects' data into one DataFrame, I used pd.concat() to concatenate the DataFrames in the list.

sub_data = [pd.read_csv(file, sep='\t') for file in sub_files]
df = pd.concat(sub_data)
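The same two steps can be sketched without any files on disk. In the example below, two short tab-separated strings stand in for two subjects' files; the names fake_files, demo_data, and demo_df and all of the values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated text for two subjects (stand-ins for real files).
fake_files = {
    's01.txt': "id\ttrial\trt\n1\t1\t0.35\n1\t2\t0.37\n",
    's02.txt': "id\ttrial\trt\n2\t1\t0.41\n2\t2\t0.46\n",
}

# One DataFrame per subject, built with a list comprehension.
demo_data = [pd.read_csv(io.StringIO(text), sep='\t')
             for text in fake_files.values()]

# Stack the per-subject DataFrames on top of each other.
demo_df = pd.concat(demo_data)

print(len(demo_data), demo_df.shape)
print(list(demo_df.index))  # each subject's rows keep their own 0, 1 labels
```

Note that pd.concat() keeps each DataFrame's original row labels, so the combined index contains repeats.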

The complete DataFrame

Next I used df.reset_index() to give every trial a unique index number (the old index is kept as a column named index), and then displayed a random sample of 8 rows from df:

import glob
sub_files = glob.glob('s??/s??.txt')  # one .txt file per subject folder
sub_data = [pd.read_csv(file, sep='\t') for file in sub_files]
df = pd.concat(sub_data)
df = df.reset_index()  # new unique index; old labels kept in 'index' column
df.sample(8)
index id year month day hour minute gender age handedness wait block trial target_location target flankers rt response error pre_target_response ITI_response target_on_error
313 121 2 2015 5 25 14 36 f 21 r 12.508 3 26 right white congruent 0.409 white True False False 0.024
365 173 2 2015 5 25 14 36 f 21 r 3.096 5 14 left black neutral 0.429 black True False False 0.024
334 142 2 2015 5 25 14 36 f 21 r 2.156 4 15 down black congruent 0.460 black True False False 0.024
235 43 2 2015 5 25 14 36 f 21 r 4.677 1 12 left white incongruent 0.451 black False False False 0.024
525 141 1 2015 5 22 11 30 m 25 r 1.599 4 14 down white neutral 0.818 black False False True 0.023
498 114 1 2015 5 22 11 30 m 25 r 1.392 3 19 down black congruent 0.551 black True False False 0.023
409 25 1 2015 5 22 11 30 m 25 r 3.240 practice 26 up white congruent 0.425 white True False False 0.023
351 159 2 2015 5 25 14 36 f 21 r 2.156 4 32 left black incongruent 0.728 white False False False 0.024
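The index column in the output above holds each row's original label from its own file. As a rough sketch of why it appears, the example below compares reset_index() with reset_index(drop=True) on a small made-up DataFrame; the names stacked, kept, dropped, and picked are illustrative only.

```python
import pandas as pd

# Hypothetical concatenated data with repeated index labels 0, 1, 0, 1.
stacked = pd.concat([
    pd.DataFrame({'id': [1, 1], 'rt': [0.35, 0.37]}),
    pd.DataFrame({'id': [2, 2], 'rt': [0.41, 0.46]}),
])

kept = stacked.reset_index()              # old labels saved in an 'index' column
dropped = stacked.reset_index(drop=True)  # old labels discarded

print(list(kept.columns))
print(list(dropped.columns))

# random_state makes the random sample repeatable between runs.
picked = kept.sample(2, random_state=0)
print(len(picked))
```

Using drop=True is one option if the per-file row labels are not needed later; keeping them, as in the code above, preserves each trial's position within its own file.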