Source Code
# Python Program to find the factors of a number
# This function prints every factor of the argument passed
def print_factors(x):
    print("The factors of", x, "are:")
    for i in range(1, x + 1):
        if x % i == 0:
            print(i)
num = 320
print_factors(num)
Output
The factors of 320 are:
1
2
4
5
8
10
16
20
32
40
64
80
160
320
Note: To find the factors of another number, change the value of num.
In this program, the number whose factors are to be found is stored in num, which is passed to the print_factors() function. This value is assigned to the variable x in print_factors().
In the function, we use a for loop to iterate i from 1 to x. If x is perfectly divisible by i, then i is a factor of x.
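The loop above tests every integer up to x. As a rough sketch of one common shortcut (not part of the original program), factors can be collected in pairs so that the loop only needs to run up to the square root of x. The function name factors is hypothetical, and math.isqrt assumes Python 3.8 or later.

```python
import math

def factors(x):
    """Return the positive factors of x in ascending order (assumes x >= 1)."""
    small, large = [], []
    # Every factor i <= sqrt(x) has a partner x // i >= sqrt(x).
    for i in range(1, math.isqrt(x) + 1):
        if x % i == 0:
            small.append(i)
            if i != x // i:  # avoid listing the square root twice
                large.append(x // i)
    # large was filled in descending order, so reverse it before joining.
    return small + large[::-1]

print(factors(320))
# prints [1, 2, 4, 5, 8, 10, 16, 20, 32, 40, 64, 80, 160, 320]
```

Returning a list instead of printing makes the result easier to reuse, for example to count the factors with len().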
Here is an example that uses a list of prime numbers to go a lot faster. Such lists are easy to find on the internet. I added comments in the code.
# //primes.utm.edu/lists/small/10000.txt
# First 10000 primes
_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
353, 359, 367, 373, 379, 383, 389, 397, 401, 409,
419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541,
547, 557, 563, 569, 571, 577, 587, 593, 599, 601,
607, 613, 617, 619, 631, 641, 643, 647, 653, 659,
661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
739, 743, 751, 757, 761, 769, 773, 787, 797, 809,
811, 821, 823, 827, 829, 839, 853, 857, 859, 863,
877, 881, 883, 887, 907, 911, 919, 929, 937, 941,
947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013,
# Missing a lot of primes for the purpose of the example
]
from bisect import bisect_left as _bisect_left
from math import sqrt as _sqrt
def get_factors(n):
    assert isinstance(n, int), "n must be an integer."
    assert n > 0, "n must be greater than zero."
    limit = pow(_PRIMES[-1], 2)
    assert n
DataTransformerRegistry.enable('json')
Creating categories
Imagine that you have a variable that records month:
x1 = pd.Series(["Dec", "Apr", "Jan", "Mar"])
Using a string to record this variable has two problems:
There are only twelve possible months, and there’s nothing saving you from typos:
x2 = pd.Series(["Dec", "Apr", "Jam", "Mar"])
It doesn’t sort in a useful way:
x1.sort_values()
#> 1    Apr
#> 0    Dec
#> 2    Jan
#> 3    Mar
#> dtype: object
You can fix both of these problems with a factor. To create a factor you must start by creating a list of the valid levels:
month_levels = pd.Series([
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
])
Now you can create a factor:
y1 = pd.Categorical(x1, categories=month_levels)
y1
#> ['Dec', 'Apr', 'Jan', 'Mar']
#> Categories (12, object): ['Jan', 'Feb', 'Mar', 'Apr', ..., 'Sep', 'Oct', 'Nov', 'Dec']
y1.sort_values()
#> ['Jan', 'Mar', 'Apr', 'Dec']
#> Categories (12, object): ['Jan', 'Feb', 'Mar', 'Apr', ..., 'Sep', 'Oct', 'Nov', 'Dec']
And any values not in the set will be silently converted to NaN:
y2 = pd.Categorical(x2, categories=month_levels)
y2
#> ['Dec', 'Apr', NaN, 'Mar']
#> Categories (12, object): ['Jan', 'Feb', 'Mar', 'Apr', ..., 'Sep', 'Oct', 'Nov', 'Dec']
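Because the conversion is silent, it can help to look for unexpected values before creating the factor. The following is a minimal sketch of one way to do that with Series.isin(). The variable name invalid is hypothetical, and month_levels is written as a plain list here to keep the snippet self-contained.

```python
import pandas as pd

month_levels = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
]
x2 = pd.Series(["Dec", "Apr", "Jam", "Mar"])

# Values that are not valid levels; these would become NaN on conversion.
invalid = x2[~x2.isin(month_levels)]
print(invalid.tolist())  # prints ['Jam']

# After conversion, the typo shows up only as a missing value.
y2 = pd.Categorical(x2, categories=month_levels)
print(y2.isna().tolist())  # prints [False, False, True, False]
```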
Sometimes you’d prefer that the order of the levels match the order of the first appearance in the data. You can do that when creating the factor by setting categories to pd.unique(x):
f1 = pd.Categorical(x1, categories=pd.unique(x1))
f1
#> ['Dec', 'Apr', 'Jan', 'Mar']
#> Categories (4, object): ['Dec', 'Apr', 'Jan', 'Mar']
If you ever need to access the set of valid levels directly, you can do so with the .cat.categories accessor of a Series:
pd.Series(f1).cat.categories
#> Index(['Dec', 'Apr', 'Jan', 'Mar'], dtype='object')
Modifying factor order
It’s often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions:
relig_summary = gss_cat.groupby('relig').agg(
    age = ('age', np.mean),
    tvhours = ('tvhours', np.mean),
    n = ('tvhours', 'size')
).reset_index()
chart = (alt.Chart(relig_summary).
    encode(alt.X('tvhours'), alt.Y('relig')).
    mark_circle())
chart.save("screenshots/altair_cat_3.png")
It is difficult to interpret this plot because there’s no overall pattern. We can improve it by reordering the levels of relig using the sort argument in alt.Y(). The sort argument accepts '-x' to put the largest values at the top of the y-axis and 'x' to put the largest values at the bottom. To implement more intricate sorting, use alt.EncodingSortField() with the following arguments:
- field, the column to use for the sorting.
- op, the function you would like to use for the sort.
- Optionally, order, which allows you to take the values from the op argument function and sort them as 'descending' or 'ascending'.
Thus, if we were going to implement more detailed sorting we would use alt.EncodingSortField(field = 'tvhours', op = 'sum', order = 'ascending'). Note that sorting within Altair for boxplots is not very functional. You would need to use pd.Categorical() to put the categories in your preferred order.
chart = (alt.Chart(relig_summary).
    encode(alt.X('tvhours'), alt.Y('relig', sort = '-x')).
    mark_circle())
chart.save("screenshots/altair_cat_4.png")
Reordering religion makes it much easier to see that people in the “Don’t know” category watch much more TV, and Hinduism & Other Eastern religions watch much less.
As you start making more complicated transformations, I’d recommend moving them out of Altair and into a new variable using pandas.
chart = (alt.Chart(relig_summary).
    encode(alt.X('tvhours'), alt.Y('relig', sort = '-x')).
    mark_circle())
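A minimal sketch of doing the reordering in pandas instead, assuming a small made-up table in place of relig_summary (the values are invented, and the names toy_summary, ordered_levels and relig_ordered are hypothetical). The levels are ordered by tvhours and stored in a new categorical column; the resulting order could then be handed to Altair, for example as an explicit list through the sort argument.

```python
import pandas as pd

# Toy stand-in for relig_summary; the numbers are made up for illustration.
toy_summary = pd.DataFrame({
    "relig": ["A", "B", "C"],
    "tvhours": [2.5, 4.0, 1.8],
})

# Order the levels by tvhours, largest first.
ordered_levels = (
    toy_summary.sort_values("tvhours", ascending=False)["relig"].tolist()
)

# Store the result in a new column whose categories carry that order.
toy_summary["relig_ordered"] = pd.Categorical(
    toy_summary["relig"], categories=ordered_levels
)

print(ordered_levels)  # prints ['B', 'A', 'C']
# Sorting by the categorical column follows the category order.
print(toy_summary.sort_values("relig_ordered")["relig"].tolist())  # prints ['B', 'A', 'C']
```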
What if we create a similar plot looking at how average age varies across reported income level?
rincome_summary = gss_cat.groupby('rincome').agg(
    age = ('age', np.mean),
    tvhours = ('tvhours', np.mean),
    n = ('tvhours', 'size')
).reset_index()
chart = (alt.Chart(rincome_summary).
    encode(alt.X('age'), alt.Y('rincome', sort = '-x')).
    mark_circle())
chart.save("screenshots/altair_cat_5.png")
Here, arbitrarily reordering the levels isn’t a good idea! That’s because rincome already has a principled order that we shouldn’t mess with. Reserve sorting for factors whose levels are arbitrarily ordered.
Why do you think the average age for “Not applicable” is so high?
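For a factor with a principled order, one option is to record that order explicitly rather than sorting by a summary value. The following is a minimal sketch using an ordered categorical. The income labels and observations below are a small illustrative set, not the full rincome levels from gss_cat.

```python
import pandas as pd

# Illustrative income brackets, listed from lowest to highest.
income_levels = ["Lt $1000", "$1000 to 2999", "$3000 to 3999"]

observed = pd.Series(
    ["$3000 to 3999", "Lt $1000", "$1000 to 2999", "Lt $1000"]
)

# ordered=True tells pandas that the category order is meaningful.
income = pd.Categorical(observed, categories=income_levels, ordered=True)

# Sorting follows the declared order, not alphabetical order.
print(list(income.sort_values()))
# prints ['Lt $1000', 'Lt $1000', '$1000 to 2999', '$3000 to 3999']

# Ordered categoricals also support min() and max().
print(income.min(), "|", income.max())  # prints Lt $1000 | $3000 to 3999
```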
Exercises
There are some suspiciously high numbers in tvhours. Is the mean a good summary?
For each factor in gss_cat identify whether the order of the levels is arbitrary or principled.
Modifying factor levels
pandas provides four primary methods for editing the categories of a categorical:
- rename_categories(): simply pass a list of the new names.
- add_categories(): new list names are appended.
- remove_categories(): values which are removed are replaced with np.nan.
- remove_unused_categories(): drops categories with no values.
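The methods listed above can be tried out on a small Series through the .cat accessor. This is a minimal sketch; the variable names (s, renamed, added, removed, trimmed) are hypothetical. Note that with dtype="category" pandas sorts the string categories alphabetically, so the list passed to rename_categories() is matched by position against that sorted order.

```python
import pandas as pd

s = pd.Series(["Dec", "Apr", "Jan", "Mar"], dtype="category")
print(list(s.cat.categories))  # prints ['Apr', 'Dec', 'Jan', 'Mar']

# rename_categories(): new names are matched to the categories by position.
renamed = s.cat.rename_categories(["April", "December", "January", "March"])
print(list(renamed))  # prints ['December', 'April', 'January', 'March']

# add_categories(): the new category is appended to the end.
added = s.cat.add_categories(["Feb"])
print(list(added.cat.categories))  # prints ['Apr', 'Dec', 'Jan', 'Mar', 'Feb']

# remove_categories(): values in a removed category become NaN.
removed = s.cat.remove_categories(["Jan"])
print(removed.isna().tolist())  # prints [False, False, True, False]

# remove_unused_categories(): drops 'Feb', which has no values.
trimmed = added.cat.remove_unused_categories()
print(list(trimmed.cat.categories))  # prints ['Apr', 'Dec', 'Jan', 'Mar']
```

Each method returns a new Series and leaves s unchanged.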
You can read more about categories within pandas with the categorical data documentation.
Exercises
How have the proportions of people identifying as Democrat, Republican, and Independent changed over time?
How could you collapse rincome into a small set of categories?