Friday, December 6, 2013

Sorting Algorithms

So, this final SLOG post may have slipped my mind, what with studying and classes finishing and all. However, I've still got 2 hours and a bit before 11:59 so let's give this a go.

Sorting algorithms are... algorithms used to sort lists (collections? I'm looking for a term without a more specific meaning in Python than there is in general programming) of objects.

Efficiency

In my experience, which is really not a lot, sorting algorithms are always taught alongside efficiency. This is because there are many sorting algorithms, all of which accomplish the same task but take differing amounts of time to do so.

Efficiency is denoted by Big-O or O() notation. The term inside the brackets is some type of function, which is always plotted with the number of items in the list as the independent variable and the time taken (which can be measured in various ways) as the dependent variable.

Some common efficiencies, in descending order of efficiency (i.e. from slowest-growing to fastest-growing), are:
  • c - constant, the time taken for a 1-item list is the same as the time taken for a 1000-item list
  • c log n - logarithmic
  • cn - linear
  • cn^2 - quadratic
  • cn^3 - cubic
  • More Powers!
  • c2^n - exponential
  • cn! - factorial
  • cn^n - not even sure what to call this
Within each category, there are different values of c. For example, an algorithm taking 2n^2 steps would be faster than one taking 3n^2 steps, but the difference is minimal compared to the difference between 2n^2 and 2n^3. (Strictly speaking, Big-O notation drops the constant entirely: both 2n^2 and 3n^2 are just O(n^2).)
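
To get a feel for how these categories separate, here's a quick toy script (the choice of n = 10 and the labels are my own, just for illustration) printing the step counts each function predicts:

import math

n = 10
print("constant:   ", 1)
print("logarithmic:", round(math.log(n, 2), 1))  # 3.3
print("linear:     ", n)                         # 10
print("quadratic:  ", n ** 2)                    # 100
print("cubic:      ", n ** 3)                    # 1000
print("exponential:", 2 ** n)                    # 1024
print("factorial:  ", math.factorial(n))         # 3628800
print("n^n:        ", n ** n)                    # 10000000000

Even at n = 10 the categories are separated by several orders of magnitude, and the gaps only widen as n grows.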

Another thing to note is that sorting algorithms can have different best-case and worst-case efficiencies, depending on the input. So if you know what kind of lists you will be dealing with (small ones, big ones, nearly-sorted ones), you may want to pick a different algorithm than you would otherwise.

Selection Sort

Selection sort has a best- and worst-case performance of O(n^2). Starting from the beginning of the list, you pick the item in that position. Then you move through the rest of the list, comparing each item to the item in that original position. If an item further down the list has a lower value (when sorting in ascending order), the two items are swapped, and the comparisons continue from where they left off, now against the new occupant of the original position. This is actually a lot simpler to put in code:

for i in range(len(L)):              # position currently being filled
    for j in range(i + 1, len(L)):   # scan the rest of the list
        if L[i] > L[j]:              # a smaller item sits further down
            L[i], L[j] = L[j], L[i]  # swap it into position i

The reason the best- and worst-case complexities are the same is that, even if L[i] is the smallest item in the list, it must still be compared to every item after it in the list.
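
As an aside, the version of selection sort in most textbooks does the same comparisons but only one swap per pass, by finding the smallest remaining item first. A quick sketch (my own naming, not course code):

for i in range(len(L)):
    min_index = i                     # assume position i holds the minimum
    for j in range(i + 1, len(L)):
        if L[j] < L[min_index]:       # found a smaller candidate
            min_index = j
    L[i], L[min_index] = L[min_index], L[i]   # one swap per pass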

And of course, I can't help but include this helpful visualization: [embedded visualization]

Quick Sort

From its name, you would assume quick sort to be fairly fast. It has a best-case performance of O(n log n) and a worst-case performance of O(n^2). The worst case is fairly rare, so quick sort is usually ranked by its typical O(n log n) performance. I didn't list that efficiency above; it would slot in below O(n) but above O(n^2) -- that is, slower than linear but faster than quadratic.

Quick sort is a recursive algorithm. You start by splitting the list around one element, known as the pivot. All items smaller than the pivot go before it, and all items larger than the pivot go after it. Thus, the pivot ends up in its correct position. Then the same algorithm is applied to the two smaller sublists to the left and right of the pivot. This continues until a sublist has three or fewer items, at which point one partitioning step leaves it completely sorted.

Obviously, this algorithm works best when the pivot would end up near the centre of the sorted list. If the pivot always ends up at one end (for example, the first element of an already-sorted list), each recursive step only shrinks the problem by one item, and the behaviour degrades to something very similar to the selection sort above.
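
Here's a rough sketch of quick sort in Python -- my own simplified version, which builds new lists rather than partitioning in place (real implementations usually partition in place and pick the pivot more carefully):

def quick_sort(L):
    """Return a new list with the items of L in ascending order."""
    if len(L) <= 1:                   # base case: nothing to split
        return L
    pivot = L[0]                      # naive pivot choice
    smaller = [x for x in L[1:] if x <= pivot]
    larger = [x for x in L[1:] if x > pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

# quick_sort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]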

This algorithm also comes with a dance.

Merge Sort

Merge sort is another recursive sorting algorithm. (Wow, recursion is important!) It has a best- and worst-case performance of O(n log n), making it a safer bet than quick sort if you think your lists have a high chance of already being sorted -- quick sort's worst case, at least with a naive pivot choice.

Merge sort is performed by first dividing the list in half. The merge sort algorithm is then performed on each half, and the two sorted results are merged -- not simply concatenated, but interleaved by repeatedly taking the smaller of the two front items. The algorithm keeps breaking the list down into smaller chunks until it reaches size two, at which point the two items are simply compared and swapped if needed.
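
A sketch of the whole thing (again my own naming, returning new lists rather than sorting in place):

def merge(left, right):
    """Interleave two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # take the smaller front item
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]   # one side is exhausted

def merge_sort(L):
    """Return a new list with the items of L in ascending order."""
    if len(L) <= 1:                   # base case
        return L
    mid = len(L) // 2
    return merge(merge_sort(L[:mid]), merge_sort(L[mid:]))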

By always splitting the list exactly in half, merge sort avoids quick sort's problem with an already-sorted list.

Thursday, November 21, 2013

New PoSt Requirements for Computer Science

Not sure if this is entirely relevant, but here goes. I feel like rant- discussing.

I, and presumably all the other first years taking a computer science course, received an email stating that the Comp. Sci. PoSts are going to have a high cut-off average, to be announced later. This is meant to ensure that all students enrolled in a PoSt can get into their required classes. That's really great for Comp. Sci. students, but it sucks for me.

I like computer science courses and programming in general because I see programming as a useful and interesting skill, even if it isn't particularly relevant to my degree focus or future career. Technology is a huge and important part of most people's lives (in developed countries, at least), and having an understanding of how it works is key.

I intended to take maybe one computer science course a year, just to keep going. However, these new requirements have put an end to that plan.

Oh well, I'll have to pursue my hobby outside of the university's bounds. That leads back to how I started with Python in the first place: an online programming course offered by MIT.

Resources are everywhere; I won't let this stop me.

Test 2 and Assignment 2

The final evaluations (besides this SLOG) for CSC 148 are complete and the marks are in.

Test 2

I was expecting a much wider variety of questions on Test 2, maybe something on sorting algorithms and Big-O notation. Instead, the questions focused on linked lists and binary search trees, and on writing methods that involved recursion. Thankfully for me, this is one of my strengths. I did well on the test, apart from a few minor errors.

One thing I have noticed is that if you didn't get the hang of recursion when it was first introduced, you would find yourself falling further and further behind as the course progressed. While iteration is sometimes a possibility, it seems so much more difficult to me.
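
For a flavour of the style of question I mean (a made-up example of mine, not an actual test question), here's a recursive method over a linked list:

class LLNode:
    """A node in a singly linked list (minimal version)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

def count_nodes(node):
    """Return the number of nodes in the list starting at node."""
    if node is None:                  # base case: empty list
        return 0
    return 1 + count_nodes(node.next_node)

# count_nodes(LLNode(1, LLNode(2, LLNode(3)))) == 3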

Assignment 2

This was the assignment on regexes. My code seemed to pass all the tests but one: matching a string against a regex that contained a * operator. I will probably go back and see if I can fix this over the winter break. Programming problems seem to stick with me, even outside of course work; I just like having a problem waiting to be solved. Other than that, there weren't many surprises.

I'm curious to know the grade distribution for this assignment, especially compared to the last one. Danny may have gone over it, but I left class after grabbing my test; I wasn't sure if there was more, but others were leaving. I found A2 to be much easier than A1, mostly due to A1's GUI component (discussed in an earlier post).

Wednesday, November 6, 2013

Assignment 2 - Regular Expressions

Assignment 2 was due this past Tuesday. Once again, I thought I did fairly well on it. I hope my feelings are more accurate than they were last time.

The assignment was centred around regular expressions -- expressing them in tree form, and determining whether a string matched a certain regular expression tree.

I just realized I forgot to write a proper helper function for my code shortcut. Whoops. Too late now.

While I understood everything the assignment asked, I'm still unclear on the role and uses of a regular expression. To the Google machine! (dun dun dun dun dunhhhhh)

From Wikipedia:
In computing, a regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations.

That helps, a little. One application of regexes is the Ctrl-F find feature that I so dearly love. Maybe I'll find some more uses down below.

On a quick scan/scroll through the article, I noticed:

  • There are a lot more complicated regular expressions than the ones we dealt with.
    • This is true both for characters (more than just '1', 'e', or '0')...
    • ...and for operators (parentheses can do fancier things, and there are also ?, +, $, and many, many more).
  • There are three algorithms for matching a regex (okay, I'm getting lazy -- it's an accepted shortcut) with a string.
    • I don't really understand the three, or which one I used, because Wikipedia describes them in 'formal language' and, well, I gave up.
  • More uses!
    • Apparently regexes are useful in 'the production of syntax highlighting systems, data validation, and many other tasks'.
    • Again, I'm just going to leave it at that. I'm happy with a vague understanding.
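
For anyone else fuzzy on this, here's a tiny taste of 'real' regexes using Python's built-in re module (the example strings are my own):

import re

# Pattern matching: find all words starting with 're'.
text = "regular expressions are really rewarding"
print(re.findall(r"\bre\w*", text))   # ['regular', 'really', 'rewarding']

# 'Find and replace': collapse every run of digits to '#'.
print(re.sub(r"\d+", "#", "call 416-978-2011"))   # call #-#-#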


Wednesday, October 30, 2013

Binary Search Trees

I guess I should try writing a post about a lecture topic at some point, so here goes.

On Monday we were introduced to Binary Search Trees -- binary trees that are sorted by value. They are used instead of regular, unsorted binary trees in order to allow a search method with better than O(n) efficiency: O(log n), when the tree is balanced.

How It Works
The tree has a number in each node. Any node with a number less than a given node goes into its left subtree, and any node with a greater number goes into its right subtree. And so on and so on, down the great branching tree.

This means that at each check, 'half' the tree can be eliminated. I put half in quotes because this only works if the tree is balanced (the left and right branches hold roughly the same number of nodes). As shown in class, it's possible to have a binary tree in which every node uses only one of its children -- essentially a linked list -- and in that case, no extra elimination is possible.
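
Putting that into code, here's a minimal sketch of my own (not the version from class):

class BSTNode:
    """A node in a binary search tree."""
    def __init__(self, value):
        self.value = value
        self.left = None              # subtree of smaller values
        self.right = None             # subtree of larger values

def insert(node, value):
    """Insert value into the tree rooted at node; return the root."""
    if node is None:
        return BSTNode(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node                       # duplicates are ignored

def search(node, value):
    """Return True if value is in the tree rooted at node."""
    if node is None:
        return False                  # fell off the tree: not present
    if value == node.value:
        return True
    if value < node.value:            # 'half' the tree is eliminated here
        return search(node.left, value)
    return search(node.right, value)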

My Question
How does the efficiency of finding a value in a binary search tree compare to the efficiency of finding a value in a sorted list? It seems to me that searching through a binary search tree follows the same process as doing a binary search on a regular sorted list (non-tree format). If there is no efficiency difference, what is the benefit? It was shown in class that a binary search tree isn't that helpful when it isn't balanced, which means an extra step on insertion is necessary to keep the search efficient. In addition, reshuffling a tree after deleting an item seems much more difficult than deleting an item from a list.

The only benefit to the binary tree that I can see is the linking. However, I still can't find a situation where linking would be useful. Any suggestions?

Monday, October 21, 2013

GUIs

Attempt at posting regularly: sort of accomplished?

Assignment 1 of this course has made me think a lot about Graphical User Interfaces (GUIs). I have worked with GUIs in all three of my computer courses, but I still don't feel like I've been 'taught' about them. Computer science courses tend to focus on the underlying concepts of programming: objects, functions, if/else, loops, etc. What I always feel is missing is the step that brings a program with complex logic to the end user.

The days of MS-DOS are gone. Current computer users are not comfortable with the console, and many wouldn't even be able to find it if asked. We now interact with computers through buttons and fields rather than keyboard commands, and everything else seems unnatural.

Creating and working with graphical interfaces is never thoroughly taught in introductory computer courses (in my experience). If they are mentioned, it is as a side topic: 'Create this program and, oh, make it with a GUI.' In high school, I was taught how to design a Java Swing GUI and used the automated features of NetBeans to create event handlers.

Things I would like to learn in a course:

  • Click and keypress events
  • How to make clickable buttons and text fields
  • How to design/draw an interface
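
We were never taught any of this, but as a sketch of the first two bullets, here's a minimal window using Python's standard tkinter module (all the naming is mine):

import tkinter as tk

def on_click():
    label.config(text="Button clicked!")

def on_key(event):
    label.config(text="Key pressed: " + repr(event.char))

root = tk.Tk()
label = tk.Label(root, text="Waiting...")
label.pack()
button = tk.Button(root, text="Click me", command=on_click)  # click event
button.pack()
entry = tk.Entry(root)                # a text field
entry.pack()
root.bind("<Key>", on_key)            # keypress event on the window
root.mainloop()                       # hand control to the GUI event loop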

Some of this crosses over into graphic design. It's more the shiny packaging than the actual meat of a program, which is probably why it isn't taught in the introductory courses. Perhaps there are full courses on it in upper years. However, a GUI is incredibly important to any program meant to be used by the masses, and incorporating GUIs into introductory computer science classes would give beginning programmers a vital tool.

Sunday, October 13, 2013

Recursion

This is another required-topic blog post.

Recursion is a process for solving a problem. Given a large problem, using recursion means breaking it down into similar, smaller cases, then applying the same logic to those cases until a final 'base' case is reached. The base case has a simple solution, which is then used to solve the previous, slightly larger problem, and so on until the original problem is solved.
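
A classic example (my own version, not from the course) is summing a nested list: the base case is a plain number, and every list is broken into smaller, similar cases:

def nested_sum(x):
    """Return the sum of all numbers in a possibly nested list."""
    if not isinstance(x, list):
        return x                      # base case: a plain number
    return sum(nested_sum(item) for item in x)   # smaller, similar cases

# nested_sum([1, [2, [3, 4]], 5]) == 15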

Recursion is another topic I was introduced to in my second high school computer science course. In my opinion, it is one of the most confusing topics I have encountered in computer science. Solving a problem by saying 'use this solution method I don't actually have yet' is very counterintuitive.

Yet recursion is incredibly useful in programming. Once you can wrap your head around it, you can solve once-complicated problems in just a few lines, and solve others that seemed impossible. From a programming-time standpoint, recursion saves time. In the beginning stages, a recursive solution may take as long to write as a non-recursive one because the logic is unfamiliar, but the end result is a few lines of code where the alternative could require multiple loops, variables, and if/else statements.

The other benefit is one I learned when studying the different sorting methods: recursion can also reduce computing time. After a program is written, it takes a certain amount of time each time it is run. For some programs, that time increases linearly with the size of the input (e.g. a single for loop); for others it increases quadratically (e.g. nested for loops), logarithmically, and so on. Going back to the sorting methods: selection sort, which doesn't use recursion, takes O(n^2) -- its time increases quadratically. Merge sort, which uses recursion, takes O(n log n) even in the worst case, which is a drastic improvement. While the two may take the same amount of computing time (and therefore computing power) when the input is small, the recursive approach becomes more and more beneficial from a resource standpoint as the input grows.