Programming With Python Video Transcript

Brian: Great, yes, well again, thanks to everyone for coming. My name is Brian Jackson. I’m a professor in the physics department. I’m an astronomer by training, and so I have some experience with data science and with Python, frankly. In astronomy right now, almost all the simple data-science-type things are being done in Python, so Python, I think, is a really good language for astronomy, particularly, but I think for a lot of scientific fields. And as those of you who have done any kind of coding before know, once you learn one language, you can kind of learn all languages in computing, so it’s useful. Python is a particularly readable language. I think it’s one of those languages where it’s easy to interpret what you’re seeing without even necessarily knowing a lot of coding, so this is a tremendous advantage in that sense. Okay, I’m just trying to clear some stuff out here. Yes, and thanks, Jenny, for the post. For anyone on Zoom, we’ll probably post links and things to Zoom as well as the Etherpad. We’ll drop links in there. I didn’t put up any Jupyter notebooks. If someone posted one, I didn’t put it in myself, so no, okay. I did post a few notes and things at this tiny URL: tinyurl.com/RCdays 2023 Python. So this is just a link to a couple of the notebooks and stuff that I started for this thing, but they’re probably not generally useful. I didn’t make them for anyone except me, honestly, but we put notes there that’ll be useful.

So let’s bring up Software Carpentry. We’re going to use Software Carpentry’s Python tutorial because it’s really great. How many of you have already done Software Carpentry tutorials? Anyone done one? A couple of you? Yeah, they’re really good. They’re very well constructed, and they’ve been tested and vetted by a variety of folks, so they’re pretty strong, solid instructional materials. Before we dive in though, let’s have introductions. Why doesn’t everyone just introduce themselves? At least give us your name, and that way, we can get to know each other a little better. Why don’t we start over here?

Molly: Hi, I’m Molly. I work in the genetics and infectious disease lab on campus, so if you ever get a COVID test, I’m the one running that for you. Is this connected to the Zoom? Can you hear us? Okay, yes.

Madeline: I’m Madeline. I’m a Materials Science PhD student in Eric Jankowski’s lab.

Andrea: Hi, Andrea, a PhD student at the University of Idaho.

Maria: Hi, I’m Maria. I’m a busy student in the biomolecular sciences program in Charles’ lab.

Aman: Hi, I’m Aman. I’m doing my Master’s with Ellen Enderlin in the geosciences department.

Jason: Hi, I’m Jason Watt, and I’m with OIT research computing. I’m helping out today as a Zoom moderator as well.

Joe: I’m Joe with the University of Idaho. I’m a system admin for Falcon.

Michael: I’m Michael from the University of Idaho State. I’m also a system admin for Falcon.

Jenny: Hi there, I’m Jenny Father Gill. I’m an HPC engineer with research computing services here at Boise State, and I’ll be helping out with this workshop.

Brian: Does anyone online want to introduce themselves? I see we have someone named Randy and someone named Jason. Jason’s in the room, and Randy, is Randy in the room? No? Okay, great. So let’s get into this. This is the Software Carpentry Python tutorial, and it uses as an example some fake data having to do with inflammation. Almost all of the Software Carpentry tutorials are focused around a real-world example problem so that you can apply what you’re learning as you go along, and so it provides some data files and things like that. So the first thing you’re going to want to do is go to the website here. Hopefully, you all can see that. Oh, that doesn’t make the URL any bigger. Oh, perfect, thanks so much, Jenny. So visit the URL, and you’re going to want to scroll down to where you see the setup. So here is the setup.

And the first thing we want to do is install Python. There are ways to run things online without installing Python, but I think it’ll be easier just to install it. So you’ll want to go to the Anaconda website, which is anaconda.com, and I think that’ll take you immediately to the download page. It’s smart enough to know what OS you’re using, so go ahead and download it, and it should download and install the right version for your system. Okay? I’m going to move forward a little bit because maybe half of you are done. Those of you who are still installing, don’t worry, we’ll try to go slow so you get a chance to catch up. So you will want to install Anaconda Navigator, which I think is what Anaconda installs when you download it. Do folks see Anaconda Navigator somewhere among their applications? Okay, so go ahead and fire up Anaconda Navigator, and you should see a screen that looks like this.

Okay, so these are all the different kinds of apps that Anaconda provides, of which I’ve used maybe two. We’re going to use Jupyter Notebook to run Python. When you click Launch, it’ll open up a tab in your browser, so it actually runs through the browser, and I think it opens in whatever your default browser is set to, so mine’s Firefox. On a Mac, it opens up a little terminal, and then it opens up a browser tab. Okay, and so you should see something that looks like this once you open that up. Wait a second. If you have not managed to install Anaconda thus far, would you put up an orange sticky so we know, and we can get some help to you?

And if you have managed to finish your Anaconda install, go ahead and take down your blue sticky. Exactly, thank you.

And now we’ll use the blue sticky for those of you who managed to get the Jupyter Notebook running from Anaconda. So if you managed to get to this place right here where I am, and you can see a window with files and stuff, put up your blue sticky.

For those of you who’ve managed to get this open, we’ll start by opening up a Jupyter notebook, a Python notebook. So if you go to new, by the way, wherever you are right now in the browser, that’s where your notebook is going to end up. So if you don’t want your notebook to end up in this case, this is actually the root directory for me, I opened up my Jupyter notebooks under a folder just specifically for this workshop.

I’m sorry, I mean it doesn’t do double-clicking, just single clicking. So navigate down to wherever you want your notebook to live for this class. So again, I put mine under documents, and once you get where you want to go, you want to click “New” and pick Python. Python 3 is probably what’s going to show up for you. When you do that, you’ll open up a new tab, and you should see a notebook that looks like this.

Can I enlarge this? So we’re looking at basically a Python command line here; we can run code right here.

Let’s see, who is not, is anyone not? If you have not gotten to this place here where you have a Python notebook actually open, put your orange sticky in for me. If you have any other stickies, go ahead and pull them down so we just don’t lose track. Okay. Yeah, that’s right. This, the stickies, is one of the software carpentry instructor like things they tell you to use. There’s a whole training that you can take to do software carpentry instruction, and they have the stickies as one of the things you’re supposed to use. Okay, so I don’t see any orange stickies. I’m going to presume that everybody is on a window that looks like this, is that right? Okay, so if you can switch back to your tab with the software carpentry Python tutorial, that’s where we’re going to start.

We should see a screen that looks like this, and then you should be able to switch back and forth between that tab and this tab where you’ve got your Python notebook, back and forth.

Okay, cool. All right, so we just did the setup; hopefully, everyone’s got through that. We’re going to dive into Python fundamentals. So if you scroll down on the software carpentry page, you should see a list schedule, and the number one thing labeled number one is Python fundamentals, so we’ll click on that and go into that. And I’m not going to read this stuff to you; obviously, you guys can read these things to yourself. But this tutorial basically just introduces you to some of the basic operations you can do with Python and Python variables.

So, yeah, sorry, and if you ever have an issue, you’re just going to put your orange sticky up, and one of the helpers in the room can help you, and I’ll slow down a little bit to make sure we don’t lose anybody.

So one of the simplest things you can do, of course, is arithmetic. So if you copy this expression right there to your clipboard and then go over to your Python notebook and paste it into the cell, it’ll probably paste it as code, so it’ll color-code things: numbers will show up as green and operators will show up as pink.

Okay, so let’s say we wanted to run that expression; we wanted Python to process that expression. So in the notebook, there are a couple of ways to process expressions. You can go up here and click “Run.” Okay, so that’ll run that one cell and provide the output for that. So I’ll click “Run.” Output: 23. Is that right? This is like one of those puzzles on Reddit you see all the time, right, with these arithmetic expressions. Anyway, so 23 is cool for us; that’s the answer.
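A minimal sketch of the arithmetic cell described above. The transcript never quotes the expression itself, only the output 23, so `3 + 5 * 4` is an assumption chosen to match that output:

```python
# Assumed expression: the transcript only reports the result, 23.
result = 3 + 5 * 4      # multiplication is evaluated before addition
print(result)           # 23

# Parentheses change the order of evaluation.
grouped = (3 + 5) * 4
print(grouped)          # 32
```

The second line is only there to show why the "puzzle" answer is 23 and not 32: operator precedence applies unless parentheses say otherwise.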

You can also press Control + Return, and that will run the cell, and it’ll run the cell but leave your cursor on that cell; do you see how this cell is still highlighted in blue here? So my cursor still lives on that cell, but if you want, you can press Shift + Enter, and that’ll run the cell and then move you down and create a new cell underneath if there’s not already a cell down there.

So let me, I’m going to delete this cell down here, okay, and press Shift + Enter, and you can see that it ran the expression and then created a new cell underneath. In fact, I think I can just do that; you can just create a bunch of empty cells; nothing happens, and it just makes new cells. You can use the scissors to delete cells as well.

Okay, so that’s about as simple a thing as you can do in a programming language: run an arithmetic calculation. You can also assign variables; this is probably something you’re very familiar with if you’ve ever used any computing language. So we’ll copy that from the lesson tab, and then paste it over in a new cell.

Now, you can create new cells by hitting Shift + Enter, but you can also go up to “Insert” in the Python notebook and then insert cells above or cells below using the insert menu. Not only can you use the menu to insert cells above and below, but if a particular cell is highlighted, as my cell is here, but your cursor is not in the typing window, you can press A or B to insert a new cell. So I’m just pressing A and B to insert new cells. If your cursor is active inside the cell, though, pressing A and B, of course, just makes A and B go in there.

Okay, so I’ll make a new cell, insert the cell below, and I’m going to paste that variable assignment into that cell. Okay, and it just doesn’t do anything; it’s just like, “all right.” And then if you want to see what the value of that variable is, you can do a print. In a Jupyter Notebook, the notebook is smart enough to close the parenthesis for you, so if you type any kind of opening parenthesis, it’ll automatically put in the closing parenthesis. Type print(weight_kg), then press Shift + Return, and you can see that it tells us what the value was.
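The assign-then-print step might look like the sketch below. The name `weight_kg` and the value 60 are the ones mentioned later in the session:

```python
# Assigning a value produces no output on its own.
weight_kg = 60

# print() displays the value currently stored in the variable.
print(weight_kg)   # 60
```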

Okay. You might at some point want to save your notebook, so you can go over to the little disk drive here. And it’ll save it. My notebook, of course, when you first open a notebook, has the title as “Untitled,” so if you move your mouse up to the title and click on it, it’ll let you name it. So if you want to change the title, you can name it something different. I’ll just say “example.” Let’s call it “WC Python Tutorial.” You can put spaces.

Okay. Jupyter Notebook is also pretty clever; it’ll auto-save for you. So you can see it’ll auto-save every few seconds.

Any questions so far? Is everyone cool? Specific place? Maybe it’s wrong. Did you put an underscore? It’s easy to type the wrong name. Okay, great!

So that’s variable assignment in Python. A name for a variable, a single equals sign, and a value will give you a variable assignment. You have a lot of latitude for variable names in Python. Variable names can include letters, digits, and underscores. You can’t start a variable name with a digit, so you can’t have something like “0_weights.” That is not an allowed variable name, but “weight0” would be fine. So you can have digits in them; a digit just can’t be the first character. Variable names are also case-sensitive, so if we go back to our example notebook here and I type Weight with a capital W, actually, let’s just do all caps, WEIGHT_KG equals 100, that will now be different from the other one.
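A small sketch of the naming rules just described. The all-caps name `WEIGHT_KG` is an assumption about what was typed in the session, and `str.isidentifier()` is used here only as a convenient way to check a name without triggering a syntax error:

```python
weight_kg = 60
WEIGHT_KG = 100    # assumed all-caps name; a separate variable from weight_kg

print(weight_kg, WEIGHT_KG)   # 60 100

# Digits are allowed in a name, just not as the first character.
print("weight0".isidentifier())     # True
print("0_weights".isidentifier())   # False
```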

Okay, so variable names are case-sensitive. I think that’s probably true for almost every language nowadays. Okay, there’s a wide variety of variable types in Python, and variable typing in Python can be quite sophisticated, but fortunately, Python is not like C; you don’t have to tell Python what the variable type is. Variables are not strictly typed in Python. If you’ve ever used other languages like C or C++, you have to tell the compiler, “Hey, this variable is going to be an integer; hey, this variable is going to be a string; hey, this variable is going to be a character.” Python doesn’t care about that; it’ll just work out from context what the variable type is.

So for example, if we go back to our Python notebook, you can see that weight_kg, all lowercase, is actually an integer because it’s just 60; we assigned it 60 without putting a decimal point after it, and so Python interprets that as an integer. You can figure out the type of a variable by typing print, let’s do the lowercase one, and then using the function type. You see when I type the word type, it turns green; that tells you that it’s a function that Python already knows about; it’s a built-in function. So we can do type, open a parenthesis, and ask it the type of, oops, ask it the type of the variable weight_kg, and it’s an integer.

So that’s one of the cool things about Python: you don’t have to declare types; variables are not strictly typed. It can get you into trouble in some cases, but for the most part, you won’t run into many issues. Python’s pretty clever.

If you want to assign a decimal value to weight, I’m going to copy-paste this: weight_kg equals 60.3. So underneath here, I’ll type weight_kg equals 60.3, and I’ll ask it to print the value for weight_kg and the type for weight_kg. Okay, so I’m going to print both of these things on the next line. Okay, and now you can see it tells me the value for weight_kg is 60.3, which it should, and it tells me that weight_kg is a float. So it’s no longer an integer; it’s a float, meaning that it has a decimal part. Okay. Probably most of the time, the numbers you work with are going to be decimal, float variables.
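The int-versus-float behavior can be reproduced with a short sketch, using the same values as in the session:

```python
weight_kg = 60
print(weight_kg, type(weight_kg))   # 60 <class 'int'>

# Reassigning with a decimal point changes the type as well as the value.
weight_kg = 60.3
print(weight_kg, type(weight_kg))   # 60.3 <class 'float'>
```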

You’ll notice, of course, now that weight_kg no longer has the value 60 that we assigned it earlier. So one of the things about Jupyter Notebooks that can get you into trouble is that you can have multiple conflicting assignments in the same notebook, and the notebook will just take whichever cell was run most recently and use that as the definition for a variable. Right, so remember, we originally gave weight_kg the value 60 and made it an integer, but now weight_kg has been reassigned as a float and has the value 60.3.

So what can very often happen, and it happens to me all the time in research, is I’ll have a big notebook with lots and lots of different cells in it, and I’ll run a cell, and it won’t work, or it will work once but then not again when I try to run it later. Almost always, what ended up happening was I ran some other cell that defined a variable in a different way from what I wanted, and then I run the cell that I’m trying to run, and it doesn’t work.
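The out-of-order problem can be imitated outside a notebook with a toy model. This is only an illustrative sketch, not how Jupyter is implemented: each "cell" is a string of code run with `exec` against one shared dictionary, and the names `namespace`, `cell_1`, and `cell_2` are hypothetical:

```python
# Toy model: one shared namespace stands in for the notebook's memory.
namespace = {}

cell_1 = "weight_kg = 60"      # an earlier cell
cell_2 = "weight_kg = 60.3"    # a later cell

exec(cell_1, namespace)
exec(cell_2, namespace)
print(namespace["weight_kg"])  # 60.3

# Re-running the earlier cell overwrites the value, whatever its position on the page.
exec(cell_1, namespace)
print(namespace["weight_kg"])  # 60
```

What matters is the order in which cells were run, not the order in which they appear.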

So I would say probably 30% of the time when I get a bug or an error in code, that’s what’s happened: I’ve run a previous cell. So for example, if we go back up to this cell here where we assign weight_kg a value of 60, and I run this cell again, in fact, let’s have it give us the value for weight_kg. Okay, so now I’m having it tell me the value for weight_kg and then the type of weight_kg, and so now weight_kg is 60 and an integer again. Do you understand what I’m saying here? I’m going to make a new cell at the bottom here and do the same thing: print weight_kg, type weight_kg. And you can see that it still has that same value even though there’s a cell above it where weight_kg is given a different value. Does that make sense? Because I didn’t run this cell here, weight_kg hasn’t changed values yet. I can do this: I’ll run that cell and then run this cell, and now it’s changed. Okay, so I just want to highlight that because that’s one thing that catches me a lot.

Of course, you can create strings, and the way you create a string is by setting a variable name equal to something in quotes. You can use single or double quotes.

Let me insert a cell below, and we’ll set up a variable here. Okay, and I’ll have it print the value of the variable and the type of the variable, just so you can see what happens. Oh my gosh. And now you can see that the variable has a value of 001 and it’s a string. Okay, let me ask, is this going too fast? I assume for everybody, everyone’s cool with this? Yeah, this is all pretty standard. I bet most of you are familiar with this. Okay.
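A sketch of the string cell. The variable name `patient_id` is taken from the next part of the session; `same_id` is a hypothetical extra name used to show that the two quote styles give the same result:

```python
patient_id = '001'                    # quotes make this a string, not a number
print(patient_id, type(patient_id))   # 001 <class 'str'>

# Single and double quotes produce the same string.
same_id = "001"
print(patient_id == same_id)          # True
```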

We can do variable arithmetic, of course. If you want to convert your weight in kilograms to weight in pounds, you can multiply by 2.2. The result will inherit the type of the most precise operand in the calculation. Okay, so you can see that I’m calculating weight in pounds by multiplying weight in kg by 2.2. Well, 2.2 is a float. 2.2 is not an int, it’s a float. And so by doing that, I will turn weight_lb into a float, even if weight_kg is an integer. Does that make sense?

You can concatenate. Concatenate means to stick additional values onto the end of a value. So if you have a string and you want to stick more strings onto it, you can do that using the plus sign. That’s one way to do it. So here you can see that we had previously defined patient ID to be 001 as a string, and now we’re going to redefine patient ID to be inflam_001, all as strings. So we’ll do that and tell it to print the patient ID. And now we’ve changed the value of that variable. So this is an example of what’s called operator overloading, where the operator will do different things in different contexts. In this case, the plus sign is being used as a string concatenation operator and not an arithmetic operator. And this is, I think, also different from maybe C, where if you add two strings together, it’ll do it arithmetically and not as strings.

Python has an enormous number of built-in functions. Print, we’ve already seen; you’ve seen the print statement several times. Type is another one. This is a list of all the functions. Okay, this is a definition. I’m sorry, but you can find any number of lists online of built-in functions. Super, super useful.
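Both ideas from this part of the session, mixed-type arithmetic and string concatenation with the plus sign, fit in a short sketch using the names mentioned above:

```python
weight_kg = 60                  # an integer
weight_lb = 2.2 * weight_kg     # int times float gives a float
print(type(weight_lb))          # <class 'float'>

patient_id = '001'
patient_id = 'inflam_' + patient_id   # here + joins strings instead of adding numbers
print(patient_id)                     # inflam_001
```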

Print is probably the very most useful statement you can make. You can print out things in a variety of different ways. So I’m copy-pasting this line here, and what it’ll do is give us a string where the patient ID is at the beginning, comma, and then the string ‘weight in kilograms’ and then the variable for weight in kilograms. So you can construct long strings from variable values this way by concatenating things or by putting commas between them. So in this case, print kind of implicitly concatenates all the strings here together when you put these commas between the different variables. See? Okay, print type. We just did this. You can check the type of a variable. That’s very useful.

You can, of course, do arithmetic within a print statement. So here it says print, and then it’s going to actually do arithmetic, and the return value for this operation will be cast as a string to print to the screen. So it automatically converts that float into a string to make it into a concatenated string.
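A sketch of the print behavior described above. The exact label text on the lesson page is not quoted in the transcript, so the strings here are assumptions. Strictly speaking, `print` converts each comma-separated argument to text and joins them with spaces, whereas `+` needs an explicit `str()` around a number:

```python
patient_id = 'inflam_001'
weight_kg = 60.3

# Comma-separated arguments are converted to text and joined with spaces.
print(patient_id, 'weight in kilograms:', weight_kg)
# inflam_001 weight in kilograms: 60.3

# Arithmetic inside print is evaluated first, then shown as text.
print('weight in pounds:', 2.2 * weight_kg)

# With +, the number has to be converted to a string explicitly.
message = patient_id + ' weight in kilograms: ' + str(weight_kg)
print(message)
```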

Same as we have.

There’s some description about how variables work; I suspect we don’t need to go through that in too much detail.

Python has a couple of different ways of commenting, but one of the easiest ways to do commenting is just to put a pound sign or hashtag in front of a line, and that’ll turn it into a comment. And so you have to do that for every single line. If I type stuff down below that comment, it’ll treat it as code.
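A brief sketch of the commenting rule just described, reusing the weight variables from earlier in the session:

```python
# This whole line is a comment; Python ignores it.
weight_kg = 60   # a comment can also follow code on the same line

# Every comment line needs its own pound sign.
weight_lb = 2.2 * weight_kg   # no pound sign at the start, so this runs as code
print(weight_lb > weight_kg)  # True
```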

Another thing that’s very useful about Jupyter notebook in particular is you can make, you can embed text as cells alongside code. So for instance, if I wanted to make this cell into something like a paragraph because I’m explaining the code to somebody, you can go up with your mouse to where it says ‘code’ here in the dropdown list, and you can choose markdown. And what that’s doing is it’s telling Jupyter notebook, “Hey, this one cell, interpret this cell as text and not as code.” So I can write comments and things. This cell is text, not code.

You can include long comments, equations, URLs, etc., within the body of a notebook. Okay, and so if I run that cell, which is what I did with Shift + Enter, it’ll actually just render it as text.

For those of you who’ve used LaTeX, if you use LaTeX to do equations, you can also include LaTeX equations directly here. As for LaTeX, now we have to do it correctly. Equation, I’m just going to show you just for the heck of it. If you’re not familiar with LaTeX, don’t worry about it. But you can, for those of you who are, include it.
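As a sketch of what such a Markdown cell might contain: the transcript does not record the equation typed in the session, so the pounds conversion from earlier is used here as an assumed example. In a Markdown cell, double dollar signs mark a displayed LaTeX equation:

```latex
$$ w_{\mathrm{lb}} = 2.2 \, w_{\mathrm{kg}} $$
```

Running the cell renders the equation as typeset math instead of showing the raw LaTeX.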

So this is a really, really nice feature of Jupyter notebooks, where if you write code or do some sort of calculation for a research project, you can actually embed an explanation of the calculation within the notebook itself. And notebooks can be exported as HTML files or PDFs, so it’s really, really easy to share calculations with colleagues. I use it all the time in astronomy. At least, it’s become a standard way of sharing the results of a research paper. You actually just provide the Jupyter notebook.

Any questions? Sir, let me just pause for a second. If you have any questions, yeah, please. I’m not 100% sure, so yeah, if you run a variable assignment, right? So if we just run the variable assignment here, I’m going to delete this line and just run the variable assignment. It doesn’t send any output to the screen, so I couldn’t tell you actually 100% why it decides sometimes to put output or not. It’s not usually an issue; it doesn’t affect anything.

I think if you put a semicolon at the end of the line, it won’t output anything. So sometimes, like if you make a plot, if you make a plot and sometimes you don’t, it’ll generate a bunch of text output that you don’t want, and if you put a semicolon after the command, it’ll suppress it. Yeah, in fact, if we need a parenthesis, that’s a good question. I think probably you do. Let’s say no, you don’t. So yeah, a semicolon after will suppress output. It doesn’t have any effect on the calculation.

Other questions? That’s a really good question. You’ll also notice that there’s a counter at the far left of each cell, and that’ll tell you which cell was run most recently, and whether a cell has been run at all. So, okay, if I do that, I haven’t run that cell, and so it doesn’t actually run that calculation. So if you do a variable assignment in a cell that you don’t run, Python will just ignore it. Anything else? That was a really good question. Okay. Comments, output, let’s see.

Okay, so why don’t we take two minutes, and if you’re on this page, see if you can guess what values these variables are going to have after running these statements. There’s a solution; I won’t open the solution yet, but take two minutes and see if you can work it out, and when you’re done with that part, put your blue sticky up.

All right, so Software Carpentry has these fun little quizzes throughout, and you can just cheat if you want and look at the solution. So after the first line, of course, mass has a value but age does not. After the next line, age is going to have a value. The next line down, you’ll notice, is a good example of where you can do something to a variable and then assign that new value back to the variable. If you look at this line on its own, it might seem a little confusing, but the right-hand side of the equals sign gets run first, and then the result gets put into the variable. Same thing down here: you subtract 20 from age and then you reassign it to the variable age, so age has the new value. Any questions about any of that? Does that kind of make sense? Is everybody cool?

Python has a bunch of capabilities having to do with lists and doing multiple assignments at once. So here’s a good example: you can assign multiple values to multiple variables using this kind of construction. So let’s copy-paste this into our notebook over here. And you’ll find that the variables first and second are set to Grace and Hopper, so this is one way to assign a bunch of variables all at once. I guess I tend not to do this all that often. It’s useful when you write a function and want to return multiple values from it, but I tend not to write assignments that look like this too much because it can be a little confusing, at least for me, and maybe other people are better at interpreting that.

As a general rule, and you probably know this, you want to write code that’s readable. I think it’s more important, depending on your context, to write code that’s readable than code that’s clever. So if code feels logically kind of clunky but it’s easy to read, that’s better than worrying about writing code that’s super optimized. You can always come back and optimize the code, but it may be very hard to interpret code that’s poorly written for readability. So as a general rule, better to write for readability than for optimization, and if you’re trying to write heavily optimized code anyway, you probably shouldn’t be using Python because Python is kind of slow compared to compiled languages. Okay, hopefully, we have a good sense of what this is. Any questions? We’re going to move on to the next page. We want to make sure there are no questions so far.
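The quiz and the multiple-assignment example can be sketched as below. The transcript only mentions "subtract 20 from age" and the names Grace and Hopper; the starting values 47.5 and 122 and the doubling of mass are assumptions, so the lesson page may differ:

```python
mass = 47.5          # assumed starting value
age = 122            # assumed starting value

# The right-hand side is evaluated first, then stored back in the same name.
mass = mass * 2.0
age = age - 20
print(mass, age)     # 95.0 102

# Several variables can be assigned in one statement.
first, second = 'Grace', 'Hopper'
print(first, second)   # Grace Hopper
```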

Okay, so click down here on this arrow because this will take us to the next page. And in this exercise, we’re going to learn how to import Python libraries and how to import data files, simple data files anyway. One of the best things about Python is that there’s an enormous number of libraries out there for it. If there’s some kind of data processing or data visualization thing you need to do, odds are somebody’s written a Python library that already does it, and so much of the source code is open, so it’s very easy to get a notebook or functionality that you might need. So we’ll start with one of the simplest, most commonly used libraries in Python, and that’s numpy. So grab this command here, copy it, and then we’ll paste it into a cell down here. Oops, we’ll paste it first. And you’ll run it, and it may take a second; you’ll notice it takes a little bit longer maybe to run that cell than other cells. That’s because Python is importing a whole library. So what you’ve done is you’ve imported the library numpy. You will import libraries and functions all the time. One very useful function that Python provides that allows you to get information about anything, libraries, variables, anything, is help.

So you can use help(numpy) and it’ll provide you with comprehensive documentation, assuming that whoever wrote the library bothered to write documentation. So this is documentation that’s inside the numpy library itself, so it’ll give you examples and links, all kinds of stuff. Also, if you do a Google search for numpy, they have really extensive documentation on their website as well. And odds are, again, if you run into some sort of calculation, something you need to do in Python, you can Google it, and someone has done it already, and you’ll go to a Stack Overflow page, and it’ll just give you some code that you can grab. Okay, but help is a really useful function. I’m going to do one thing just to show you what will happen if you did not import this library. So as we’ve said before, oftentimes you’ll be using a notebook, and you’ll forget to run certain cells to define things. I’m going to reset our notebook, so I’m going to clear out all the variables and clear out all the libraries. If you click this “restart the kernel” button right here, it’ll come up and say, “Really, you really want to do that?” And yes, I’m going to restart it, so now it’s cleared everything out, so it doesn’t know about any of the variable assignments or the libraries. If I now try to ask for help on numpy, what’s going to happen is it’s going to say, “I don’t know what you’re talking about.” So you have to import a library for Python to know about it. Libraries aren’t loaded automatically the way built-in functions are. So I can ask for help with, like, print, right? If I want to get help with print, it can provide me help with that because that’s a built-in function, but numpy is not built in. So you’ve got to import it, and I’m going to. And now that I’ve done that, I can get help on numpy.
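The "I don’t know what you’re talking about" message is a `NameError`. A sketch of the before-and-after, assuming a fresh session in which numpy has not been imported yet (the attribute access stands in for `help(numpy)` to keep the output short):

```python
# Before the import, the name numpy does not exist in this session.
try:
    numpy.loadtxt
except NameError as err:
    print("NameError:", err)

# Built-in functions such as print need no import.
print(callable(print))           # True

import numpy                     # after this line the name is defined

# help(numpy) would now print the library's documentation (very long output).
print(callable(numpy.loadtxt))   # True
```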

Okay, numpy has an enormous number of functions and classes, just so much stuff. You’ll end up using numpy all the time. One of the most useful functions that you’ll find in numpy is called loadtxt, and so this construction here, numpy.loadtxt, means go into the numpy library and use the function called loadtxt that’s defined in that library, okay? We’re going to run this in a second, but you need the data file that’s provided in the setup.

So if you go back to your, oops, where am I? If you just go back up to the top where it says Programming with Python, click there and go to the setup page again. If you scroll down, you’ll see links to two zip files right here. You want to download both of those zip files because they provide the data that we need for this exercise. So go ahead and download those, and move them to the same place on your computer where you have your Python notebook. That’s really important because if the Python notebook that you’re using right now is not in the same place as the data files, it won’t know where to find them. You can enter a long file path and tell it where to find them, but we’re probably not going to get that far in this. You just want to have it in the same folder.

Yeah, folks, my suggestion is at this point probably it’ll be easier to just open up a new notebook in the same directory where your code is, where your data files are. So my data files are here, if you’re on a Mac, you can actually get, do a right click or a control click, and get info, and it’ll tell you where that data file is. My suggestion is go back to your Jupyter notebook file navigator screen. You should have two Jupyter notebook tabs open right now; you should have the code that we’ve been playing with, and then another thing, that’s where the Jupyter notebook was originally opened.

So, using this Jupyter Notebook file navigator, I would navigate via these links up here to the folder where your data now live, and the data are in this data folder that you just downloaded, okay? I’m going to come around and help you with that because that might be a little confusing for folks. Okay, perfect. You guys are way ahead of where that’s how we were. Okay, perfect, so I’m here. So, here’s the notebook I’m going to use to load data, so I’ve copy-pasted it.

If we go back to our Python Programming page and go back down to the lesson where we were, which is analyzing patient data, you can copy-paste “import numpy” into your notebook. And then you can copy-paste this whole command, numpy.loadtxt, and then there’s a bunch of stuff in here. You just want to copy-paste everything here into your notebook. And when you do that, it’ll just give you the output from the file, but guess what? You haven’t actually loaded that data into a variable; all you’ve done is load it up and put it out on the screen. What you’re seeing here are elements in the array. numpy.loadtxt is a function that returns a value, and what it returns is a numpy array.

So a NumPy array is probably what you think of as an array; it’s just an n-dimensional list of numbers, or in some cases, it can be other data types, but in this case, it’s just numbers. It’s two-dimensional; in this case, it’s got left, right, up, and down. You can tell that by the fact that there are these two square brackets. But I’ve just loaded the array; I haven’t actually bothered to assign it to a variable yet. So you’ll want to assign it to a variable, so data equals the return value from numpy.loadtxt. Okay, so you do that, and you’ll notice, of course, it doesn’t have any output. Okay, but if we print data, we’ll get the same output we had before, more or less. So now, data is an array variable. And in fact, you can actually ask it the type of data too, and I think it’ll say, it should say numpy array, yeah, numpy.ndarray, yeah.
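The difference between just calling numpy.loadtxt and assigning its return value can be sketched as follows. This is a hedged example: the small `csv_text` string with made-up numbers stands in for the lesson’s downloaded CSV file so that the code runs without it. In the notebook you would pass the file name instead.

```python
import io
import numpy

# Made-up stand-in for the CSV file: three patients (rows), four days (columns).
csv_text = "0,1,2,3\n0,2,1,4\n1,1,3,2\n"

# Without an assignment, a notebook only displays the array; nothing is kept.
numpy.loadtxt(io.StringIO(csv_text), delimiter=",")

# Assigning the return value keeps the array around under the name "data".
data = numpy.loadtxt(io.StringIO(csv_text), delimiter=",")

print(data)
print(type(data))
```

The second print shows that the returned object is a numpy.ndarray, which is what type(data) reports in the session above.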

An array has information about itself, and so I can say data.dtype, so dtype is a field; I would say it’s a field inside the numpy array, so it’s something that the numpy array knows about. So we can type that as well, and what that’ll do is tell you, what is this, let me delete, what that’ll do is tell you the variable type of each of the elements within the array. In this case, it’s a float, a 64-bit float, so if you don’t know what 64-bit is, don’t worry about it; it’s just a float number. It has decimal places, a lot of them.

Arrays also know about their own shape, so you can do data.shape, and that’ll tell you what shape the array is in. In this case, it’s a 60 by 40 array; that means there are 60 entries along the columns and 40 entries along the rows. I can never remember whether it’s row first or column first.
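A short sketch of the two attributes just mentioned. The array of zeros is an assumption, a made-up stand-in with the same shape as the lesson’s data. In NumPy’s convention, shape lists the number of rows first and the number of columns second.

```python
import numpy

# Made-up stand-in with the same shape as the lesson's data.
data = numpy.zeros((60, 40))

print(data.dtype)   # type of each element in the array
print(data.shape)   # (number of rows, number of columns)

# Unpacking the shape makes the row-first order explicit.
n_rows, n_cols = data.shape
print(n_rows, n_cols)
```

So a shape of (60, 40) means 60 rows, each holding 40 values.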

You can access different elements of the array by indexing them. So here, for example, we say data[0, 0]; data with a square bracket. Now be sure you use square brackets, don’t use rounded parentheses, and don’t use curly braces; those have different meanings. In fact, if you put rounded parentheses here, it’s going to complain that you’re asking it to call a function. This is the error: numpy.ndarray object is not callable. What it means is, hey, you told me to call a function called data, there is no function called data. Okay, so the rounded parentheses mean you’re calling a function, so you need to use the square brackets for an array. That’s not a trivial distinction; in fact, you can try curly braces; it should also fail. Yeah, it’s also not correct. It’s got to be, gotta be square brackets. And that’ll give you the first value in data, the zeroth element. Python is like C in a lot of ways, in that it counts from zero. Sometimes it’s a little confusing if you use Fortran. Does Fortran still count from one? Do you know? You can tell it now. Okay. I see. I see. So other languages will actually start counting up from one instead of counting up from zero. Python counts up from zero. You can ask about some random value somewhere in the middle, so the 29 comma 19 value. Just some random number in there. One of the great things about Python is you can take slices of arrays, so you can take kind of chunks of arrays, and the way you set that up is by giving it number colon number. So here, for example, we’re going to grab rows 0, 1, 2, and 3 by columns zero, one, two, all the way up to column nine. So be careful, this can be a little confusing. I’ve been caught up by this a lot. That last number right there, this range that you’re giving Python is exclusive; it does not include the last one, so zero colon four means zero, one, two, three, but not four. Okay, that can be confusing. Okay, let’s see what else we got.
You can do the same thing; you don’t have to start at zero of course, you can start somewhere in the middle. You can also, if it’s implied from context, you don’t even have to give it the starting index or the ending index. So here, for example, data[:3, 36:], the blank colon 3 means start at zero. You don’t have to even tell it zero, start at zero and go up to zero, one, two, not the third one though. Okay, zero, one, two, and then the second entry means start at column 36 and go all the way to the end. Does that make sense? Okay, so we can, in fact, just to show you that this comes out the same, you can type the same thing with the zero. And you’ll see it comes out the same. Okay, so if you’re gonna have a slice of an array like that, and you just put a colon and don’t include the first or the last entry, it’ll assume either the beginning of the array or the end of the array.
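The indexing and slicing rules above can be sketched like this. The array is a made-up stand-in (consecutive numbers arranged as 60 rows by 40 columns), not the lesson’s patient data. The first index selects rows and the second selects columns.

```python
import numpy

# Made-up stand-in: the values 0..2399 arranged as 60 rows by 40 columns.
data = numpy.arange(60 * 40, dtype=float).reshape(60, 40)

first = data[0, 0]        # counting starts at zero
middle = data[29, 19]     # a value somewhere in the middle

# The end of a slice is exclusive: 0:4 means 0, 1, 2, 3.
block = data[0:4, 0:10]   # rows 0..3, columns 0..9

# A missing start means "from the beginning"; a missing end means "to the end".
corner = data[:3, 36:]
same_corner = data[0:3, 36:40]

# Rounded parentheses mean "call a function", which an array is not.
try:
    data(0, 0)
    call_error = None
except TypeError as err:
    call_error = str(err)

print(block.shape, corner.shape)
print(call_error)
```

Printing the shapes is a quick way to confirm that a slice picked up the number of rows and columns you expected.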

The NumPy library provides a variety of functions that can operate on floats or arrays of floats, so the mean, or the average, is one of those examples. So you can print numpy.mean of data, and it’ll return the average of all the values within data, all at once. Different functions have different requirements in terms of the arguments you provide.

I don’t know, this is where it’s going through. Of course, you can, you can… oh, sorry, yes, please. That’s a really good question. If I just say print and then some slice of the array, I mean, how many entries does it show? So that’s 10 entries across. I don’t know, I don’t know off the top of my head. Let’s see, so there’s 20. I just don’t know how far it’ll go. That doesn’t look right. Is that actually going all the way to 60? Is it 60 entries? I’m not sure, actually. I don’t know. So usually, if I have an enormous array like that, and I want to know what the values are, it’s pretty rare that I’ll print them out like this. I might print them out one at a time. At that point, it’s probably just better to start plotting the thing. Yeah, I don’t know. I’ve never run into that. I’ve never tried to make it print when I had an enormous array. It doesn’t seem limited, so I guess if you take slices of the array, it’ll always print out all the slices. Let me see if that’s true, yeah, because it doesn’t do that. I don’t know, yeah, that’s weird. I’m not sure. Usually, if you have a big array, I just plot it or print out a specific set of values. It’s pretty rare that I’ll try to do the whole thing all at once like that. Other question? Sorry, I missed that. Great, of course, you can do other operations, you know, all the standard kinds of arithmetic operations you might want to do to an array. You can ask for the maximum value, you can ask for the minimum value in the array, you can calculate the standard deviation, std. A very useful function that you might need at some point. I don’t think they talk about it in here. Occasionally you’ll run into cases where your data array will include things that aren’t numbers, so you’ll get infinities or not a number, NaN, and you might still want to know what the maximum value is of the values in that array.
If you have an array that has an infinity in it, which you can have in Python, and you ask for the maximum value, it’ll just return infinity, and you’re like, well, that’s not useful; I need to know the actual maximum for the numbers, and so there’s another function called nanmax. And nanmin, actually, let me do this as a separate cell.

In this case, data only has finite values, so you’re not actually going to miss anything. I don’t know if there’s a nan standard deviation, let’s see, yeah, so there you go. So nanmax, nanmin, and nanstd will calculate those values and ignore things that are not numbers. So I run into this a lot where you’ll have an infinity that came up in a calculation and you want to know the maximum but not include the infinity, so NaN is a good thing to use there. Mystery functions, what that’s all about. Oh, that’s one of the things you can do, tab completion.
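A small sketch of these functions on made-up values. One caveat worth checking in your own session: as far as I know, numpy.nanmax, numpy.nanmin, and numpy.nanstd skip NaN entries but still treat infinity as an ordinary value, so an infinity has to be filtered out separately. The numpy.isfinite mask below is one way to do that, offered as an assumption about what you might want rather than something shown in the session.

```python
import numpy

# Made-up values with one NaN in the middle.
values = numpy.array([1.0, 5.0, numpy.nan, 3.0])

plain_max = numpy.max(values)      # the NaN propagates into the result
safe_max = numpy.nanmax(values)    # NaN entries are skipped
safe_min = numpy.nanmin(values)
safe_std = numpy.nanstd(values)

# Infinity is not skipped by the nan* functions.
with_inf = numpy.array([1.0, 5.0, numpy.inf])
inf_max = numpy.nanmax(with_inf)

# Keeping only the finite entries drops both NaN and infinity.
finite_max = with_inf[numpy.isfinite(with_inf)].max()

print(plain_max, safe_max, safe_min, safe_std)
print(inf_max, finite_max)
```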

You may be familiar with this: you type numpy.am, and you’re like, I can’t remember the name of the max function, what was it, and you can do tab; if you press the tab button, it’ll show you suggestions that are consistent with what you’ve typed. Then you can kind of scroll between them here. Thanks for joining us! Do you have, like, Python installed? You might need help with that, probably, right? I’m talking to me right here, yeah. Okay, great for sure.

Okay, we’re on the, if you do a Google search for, Software Carpentry Python, you’ll get to the page where we’re at. Yeah, so you can use tab completion, and it’ll remind you of the available functions. That’s a super useful thing. We’re on analyzing patient data; that’s the example we’re working through.

Okay, let’s see what else happens. Sometimes you want to calculate the average, not of all the elements in the array but maybe the average across a column or across a row, and in that case, you can ask for the maximum, sorry, let’s say you can do numpy mean and say axis equals zero, and so what this will do is return an array that gives you the average value across all the rows. Okay, so if we print this, we’ll get that. Okay, this should be, how many, how many is that? 40. Yeah, so that’s 40 elements, so that’s the average across each row.

If I want to do the average across each column, then I set axis equal to one. And I’ll get averages across each column. This notation that I’m showing you here, where now I’ve got numpy.mean and then I send it a variable, and another variable equals something. In Python, there are a couple of different kinds of variables that you can send to functions. You’ll notice that before, mean worked without having to send it this axis thing. So, mean requires at least one argument; it has to have the thing you want to take the average of, but it has optional arguments as well that are referred to as keyword arguments.

Keyword arguments, and so what those usually do, is modify the behavior of the function in some way. So in this case, if you send it axis equals something, it’ll try to take the average along that axis. If you don’t send in anything, of course, it just gives you the, I’ll show you again, of course, it just gives you the average of everything. What happens if you try to send it an axis that the variable doesn’t have? So now I’m saying not along the row, not along the column, but along a depth. What’s going to happen there? It’ll crash because it doesn’t have that axis.

Does that make sense? Since this is a two-dimensional array, it only has axis zero and axis one; it doesn’t have an axis two, or axis three; that would be a third dimension that it doesn’t have, so it says it’s out of bounds. See, of course, you can ask for shape. That’s what I just did.
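The axis keyword argument can be sketched on a made-up 60 by 40 array (an assumption, standing in for the lesson’s data). Checking the shape of each result is a quick way to see which direction was averaged away: axis=0 collapses the rows and leaves one value per column, and axis=1 collapses the columns and leaves one value per row.

```python
import numpy

# Made-up stand-in: the values 0..2399 arranged as 60 rows by 40 columns.
data = numpy.arange(60 * 40, dtype=float).reshape(60, 40)

overall = numpy.mean(data)               # no keyword: average of everything
over_rows = numpy.mean(data, axis=0)     # one value per column
over_columns = numpy.mean(data, axis=1)  # one value per row

print(overall, over_rows.shape, over_columns.shape)

# A two-dimensional array has no axis 2, so this raises an error.
# NumPy's AxisError is a subclass of ValueError, so catching ValueError works.
try:
    numpy.mean(data, axis=2)
    axis_error = None
except ValueError as err:
    axis_error = str(err)

print(axis_error)
```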

Strings can also act like arrays. And so, you can take slices of strings. So we can go and try this one out. So, we set a variable equal to the string oxygen, and we can take the first three characters, so the zeroth, first, and second character, not the third character, right? Remember, it doesn’t take that. So, you can slice strings as if they’re arrays. I’m going to skip through that; you guys can work on this yourselves. I don’t want you to have to do this. Oh, actually, you know, let’s talk about this. You can also send Python negative indices for an array; this is a super useful feature. What this does is, so if I put the number one there, that will provide me the entry at index one. Okay, but if I put negative one there, what it’ll do is actually send me the last entry, the last character. Does that make sense? So, a minus number actually counts from the end, going the other direction. If you do the same thing and put a negative two there, what is it going to give you? It’s going to give us the second character from the end, right? E, and so you can actually slice that way too. You can say, give me all the elements from 0 up to minus two. O, X, Y, G. I guess, yeah, so that’s super useful, and we didn’t even need to put the zero there, right? We could have left the zero out because it’s implied. So, being able to slice back in the opposite direction is really, really useful.
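The string slicing and negative indexing described above, as a short sketch. The variable name `element` is just a label chosen for this illustration.

```python
element = "oxygen"

first_three = element[0:3]   # characters 0, 1, 2 (the end is exclusive)
index_one = element[1]       # counting from zero, so this is the second character
last = element[-1]           # negative indices count from the end
second_last = element[-2]
all_but_two = element[:-2]   # the leading zero is implied

print(first_three, index_one, last, second_last, all_but_two)
```

The same rules apply to lists and NumPy arrays, so a slice like [:-2] works there as well.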

You can combine arrays in a couple of different ways. Let’s see. So, if we create an array A, which is a two-dimensional array, so the first row is 1, 2, 3, the next row is 4, 5, 6, then 7, 8, 9, then you can define a variable B, which is A stacked with itself. I can never remember which way that goes. So, A stacked with itself horizontally, that’s B. C is A stacked on top of itself vertically, so that’s C. Here, you can see how they do different things, and then there’s even a dstack. Can I do D? See what that does? Don’t think this is going to work properly. What does that do? Oh yeah, then it makes it a third dimension, so actually, it stacks it up along a third dimension. That’s what dstack does, depth stack. I honestly, every single time I need to do a stack of an array, I have to test it first because I can never remember which way it stacks. I just can’t visualize these in my head, so yeah, you can see that this is a three-dimensional array now. D is three dimensions. In fact, we can ask it to tell us the shape of D. Let’s have it do all those shapes. Let’s see, the shape, D.shape. So, you can see that D now has this three-dimensional structure. Diff is another useful function; diff will take the difference between, did they leave something out here? Oh, I see. Yeah, diff will take the difference between one element and the next in an array. Actually, so let me go back here. We need to define this variable first, and then we’ll have it print the diff. Okay, so patient3_week1 is the first week for patient three, so it’s row three, up to the seventh column, and then we’re going to take a diff. So that’s asking, what’s the difference from one element to the next. One, two, three, four, five, six, seven, so there are seven entries in this, but there are only six entries, one, two, three, four, five, six, in the diff. Why would there only be six entries?
Because, if there are seven entries, you can only subtract neighboring elements pairwise six ways, right? You only get six. Okay. That can be tricky, because sometimes if you’re trying to calculate the derivative of a function that’s tabulated in an array like that, you’ll get a number of entries for the derivative that’s one fewer than the variable itself. So just be careful. I’m trying to think. That’s the one way that I’ve used diff, to try to calculate derivatives. Any questions? I’ll pause again real quick. So, we’re supposed to go until two to five. Should we take like maybe 10 minutes to stretch your legs and stuff? This might be a good place to break. Yeah, so it’s, I’ve got like three, let’s call it 3:20. We’ll come back at 3:30. Okay?
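For reference, the stacking functions and diff from this part of the session can be sketched as follows. The array A matches the one described above. The `week` values are made up for illustration and are not the lesson’s patient data. Printing the shapes is an easy way to check which direction each stack goes.

```python
import numpy

A = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

B = numpy.hstack([A, A])   # side by side: more columns
C = numpy.vstack([A, A])   # one on top of the other: more rows
D = numpy.dstack([A, A])   # stacked along a new third dimension

print(B.shape, C.shape, D.shape)

# Made-up values standing in for one patient's first week (seven entries).
week = numpy.array([0.0, 0.0, 2.0, 0.0, 4.0, 2.0, 2.0])

# diff subtracts each element from the next one, so the result is one shorter.
changes = numpy.diff(week)

print(len(week), len(changes))
print(changes)
```

Because the result of diff is one entry shorter than its input, a derivative estimated this way lines up with the gaps between the original points rather than with the points themselves.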