The problem: https://projecteuler.net/problem=47
Maybe just a loop (in which we count how many distinct factors a number has) will do.
import timeit
seen_factors = dict()
seen_factors[1] = 0
def count_factors_of(given_n):
global seen_factors
n = given_n
i = 2
count_so_far = 0
while i <= n:
if n % i == 0:
count_so_far += 1
n //= i
while n % i == 0:
n //= i
if n in seen_factors:
c = seen_factors[n]
new_c = count_so_far + c
seen_factors[given_n] = new_c
return new_c
i += 1
seen_factors[given_n] = count_so_far
return count_so_far
for i in range(1, 9*4+1):
count_factors_of(i)
assert count_factors_of(3) == 1
assert count_factors_of(8) == 1
assert count_factors_of(9*4) == 2
start_time = timeit.default_timer()
n = 2
four_in_a_row = 0
while four_in_a_row < 4:
c = count_factors_of(n)
if c == 4:
four_in_a_row += 1
if four_in_a_row == 4:
print(4, "->", n-3, n-2, n-1, n)
else:
four_in_a_row = 0
n += 1
elapsed = timeit.default_timer() - start_time
print("Elapsed time", elapsed)
One way to try to make this code faster is to use $4$ as the loop step and check the $3$ neighboring numbers only when we find something with 4 distinct prime factors. This would be the following code.
start_time = timeit.default_timer()
def check_neighbors(n):
m = n-3
first = 0
while first < 3 and count_factors_of(m+first) != 4:
first += 1
found = True
for j in range(4):
if count_factors_of(m+first+j) != 4:
found = False
break
if found:
return [m+first, m+first+1, m+i+2, m+first+3]
else:
return None
n = 10 #anything large enough, but not too large, will do... for check_neighbors, we need n-3 >= 1
while True:
if count_factors_of(n) == 4:
result = check_neighbors(n)
if result is not None:
print(result)
break
n += 4
elapsed = timeit.default_timer() - start_time
print("Elapsed time", elapsed)
Ok. So that was actually faster. With pypy3
, this code runs in $9.46$ seconds.
Maybe another approach on this problem has to do with ordering prime powers. We can generate the sequence of all primes. Suppose we could generate the sequence of all products composed of four prime powers having pair-wise distinct basis. If we could do that, then we could just generate that list and, then, look for four-in-a-row in it.
Maybe, first, we should order all pairs $(p, n)$, where $p$ is a prime and $n$ is a positive integer, using the order relation $(p, n) \le (q, m) \iff p^n \leq q^m$. This is indeed an order relation.
It is not necessarilly true that an inifnite countable ordered set can be listed in an ordered way (e.g. rationals with the usual ordering), even if it has a perfectly fine order relation imposed on it (e.g. rationals, again). This one with the pairs can, though. The idea follows.
Start with a list of primes seen so far $P$. What we'll want to maintain is that the last entry $p$ of $P$ is such that $(p, 1)$ is above (according to the order relation above) all the other $(q, m)$, where $m$ is such that $(q,m-1)$ was already produced in our sequence of prime powers or $m=1$.
For each of the primes seen so far, we want the power associated to it which is to be produced. That is, for each prime $p$ in the seen primes so far, we want $n$ such that $(p,n)$ is the next to be put out when the next thing to be put out is a power of $p$. If $n$ is the number for the $j$-th entry in $P$, we'll call it $n_j$.
Our algorithm then is the following. Make sure the last prime in the list has $n=1$ (that is, it wasn't produced with $n=1$ yet) and that $(p, 1)$ is larger than all the other $(q, m)$ candidates that we have. Now, test all of them against one another. The lowest one gets produced. To see if we should extend our list of primes seen so far, we must check it against the last one that is being produced.
from random import randint
def is_prime(n):
if n <= 1:
return False
if n == 3 or n == 2:
return True
a = randint(2, n-1) # for fermat
if (a**(n-1) - 1) % n != 0:
return False
i = 2
while i*i <= n:
if n % i == 0:
return False
i += 1
return True
some_primes = [2, 3, 5, 7, 11, 13, 17, 127]
for p in some_primes:
assert is_prime(p)
def find_first_prime_gt(before_begin):
cand = before_begin+1
while True:
if is_prime(cand):
return cand
cand += 1
def all_pn_in_order():
primes = [2, 3]
powers = [1, 1]
while True:
best_p = primes[0]
best_n = powers[0]
best_pow = best_p**best_n
best_i = 0
for i in range(len(primes)):
p = primes[i]
n = powers[i]
its_pow = p**n
if its_pow < best_pow:
best_p = p
best_n = n
best_pow = its_pow
best_i = i
# We check if we need to update our primes list
last_n = powers[len(powers)-1]
if last_n == 1 and best_i < len(powers)-1:
# In here we can be sure that all the primes we haven't seen yet can't be the answer.
powers[best_i] += 1
yield (best_p, best_n)
else:
last_p = primes[len(primes)-1]
new_prime = find_first_prime_gt(last_p)
primes.append(new_prime)
powers.append(1)
for (p, n) in all_pn_in_order():
print(p, n, p**n)
if p >= 100:
break
With that built, I suspect we can list numbers that have exactly 4 prime factors in increasing order as well, which would essentially solve the problem. I have some ideas on how to do that, but they all seem messy and not really fast. To solve the problem the naive way, it took me just a few minutes for 1 minute and something to run the code. Just to be clear, then, this attempt on "improvement" is merely for intellectual curiosity. Maybe something interesting will appear (like the algorithm to output the power of prime).
I imagine, the same idea used to generate powers of primes can be used to generate products of powers of two primes and then powers of products of three primes. Or maybe I'll have to consider the order relation $(a,b) \leq (c,d) \iff ab \leq cd$ (this will be an order relation of $a,d,c,d$ are powers of primes.
What we really need here, informally speaking, is an algorithm $A$ to list in order the elements of $\{(c,d)\}$ given the elements of $\{c\}$ and the elements of $\{d\}$ in order. If we can do that, then we'll just have to apply a procedure which implements that algorithm three times. It's like $Q = A(A(P, P), A(P, P))$ where $P$ is the generator for the ordering of prime powers. Now we have to just run through $Q$. $Q$ will all the things that are at most power of 4 primes, but we can easily just ignore those that aren't powers of four primes because the things $Q$ will produce will be pairs of pairs: ( ( (p, n), (q, m) ), ( (p', n'), (q', m') ) )
It turns out we can do something better than that. At this point I abandoned this notebook file for some days and went off to do a generalization of this. The following python code is what I came up with.
from typing import Union, List, Callable, Iterator, Tuple, TypeVar, \
Iterable
ItemType = TypeVar('ItemType')
def quicksort(
items: List[ItemType],
less_or_eq: Callable[[ItemType, ItemType], bool]
) -> None:
def swap(i: int, j: int) -> None:
items[i], items[j] = items[j], items[i]
def less(a: ItemType, b: ItemType) -> bool:
return not less_or_eq(b, a)
def partition(begin: int, end: int) -> int:
import random
swap(begin, random.randint(begin, end-1))
pivot_value: ItemType = items[begin]
pivot: int = begin
for j in range(begin+1, end):
if less(items[j], pivot_value):
pivot += 1
swap(pivot, j)
swap(begin, pivot)
return pivot
def the_quicksort(begin: int, end: int) -> None:
while end - begin > 1:
pivot: int = partition(begin, end)
size_lower = pivot - begin
size_upper = end - (pivot+1)
if size_lower > size_upper:
the_quicksort(pivot+1, end)
end = pivot
else:
the_quicksort(begin, pivot)
begin = pivot+1
the_quicksort(0, len(items))
def incremental_product_sort(
iter0: Iterable,
iter1: Iterable,
prod: Callable,
le: Callable
) -> Iterable:
"""
Given two iterators (`iter0`, `iter1`) which produce elements
according to an order relation, and given a product (`prod`),
we compute the following (represented in pseudo-python-code):
SORT
[
(a, b, prod(a,b))
for a in iter0
for b in iter1
]
COMPARING WITH
lambda entry_x, entry_y: le(entry_x[2], entry_y[2])
The main usefulness of this function has to do with the fact
that iter0 and/or iter1 may be infinite sources. Under reasonable
assumptions, the above sorting can be done and this procedure
generates those entries in sorted order according to the `le`
comparison function given.
Let S0 and S1 be two sets containing the values iter0 and
iter1 produce respectively (that is, next(iter0) is always an
element of S0 and next(iter1) is always an element of S1). We
don't require S0 and S1 be the exact ranges from these iterators,
but you can think of them as being if you'd like.
Assume there are order relations on S0 and also on S1 such that
iter0 and iter1 respect that order (that is, if a is produced
before b by, say, iter1 then a is less than or equal to b in that
order relation on S1). Lets call these orderings LE0 and LE1.
Consider a space S2 and a mapping `prod` from S0 x S1 to S2. The
idea is that we get pairs (x,a) (x produced by iter0 and a produced
by iter1), which are elements of S0 x S1, and we map this pair
into another value v = prod(x,a) in S2. We want to work on v.
What we want to generate is the sequence of all (x,a,prod(x,a))
in sorted order according to `le` on prod(x,a).
The parameter fourth, `le`, is an order relation on S2 such that
((S0, LE0), (S1, LE1), (S2, le), prod)
form a compatible product-ordering system, meaning the following:
1.
(S0, LE0), (S1, LE1), (S2, le) are all (total) order relations.
2.
For all x, y in S0 and for all a, b in S1 such that x LE0 y and
a LE1 b, then prod(x,a) LE2 prod(y,b).
3.
For all x,y in S0 and a in S1, there is c in S1 such that
prod(x,a) LT2 prod(y,c) (this means prod(x,a) LE2 prod(y,c) and
not (prod(y,c) LE2 prod(x,a)) -- the use LT2 means the "less than"
derived from LE2).
4.
For all x in S0 and a,b in S1, there is z in S0 such that
prod(x,a) LT2 prod(z,b) (same remark as above).
5.
For every x in S0, iter0 will eventually produce something larger
than x (that is, it'll eventually produce a y such that x LE0 y and
not (y LE0 x)).
6.
For every y in S1, iter1 will eventually produce something larger
than or equal to y.
** Properties 3 and 4 mean that given a pair (x,a) and
a third value (be it in S0 or in S1), we can fill in the missing
one in order to become greater than (as derived from LE2) (x,a)
as viewed after calculating the product.
** Properties 5 and 6 just mean that iter0 and iter1 will eventually
produce large values, as large as we need if we just keep iterating.
With that, we generate a sequence of triples (x,a,prod(x,a))
for all x generated by iter0 combined with all the a generated by
iter1, and we want to generate this in order as given by LE2 on
prod(x,a).
For all (a,b) in range(iter0) x range(iter1), (a,b,prod(a,b)) will be
produced, including possible repetitions (e.g. consider
both iter0 and iter1 generating the sequence 1 1 2 2 3 3 4 4 ...
for example, then the pair (1,1, prod(1,1)) will appear 4 times in
the result).
As an example, consider iter0 and iter1 both being iterators for the
same infinite sequence (not the same iterator, but they generate
the same sequence of values independently): all the primes, in their
usual order. In here you can think of
S0 = S1 = { all the primes },
and
LE0 = LE1 = `the usual numeric <=`.
Consider prod(a,b) = a*b. And S2 = { all positive integers }, and
LE2 to be the usual numeric ordering as well.
That (S0, LE0), (S1, LE1), (S2, le) are all (total) order relations
is clear because the order relations, in here, are just the usual
ones that we know are indeed order relations.
To finish showing we have a compatible product-ordering system,
- For a,b,c primes, pick a prime p which is larger than ab (which
will eventually be produced by the iterators given they "generate
all the primes"). Then ab <= cp.
The other properties are trivial.
We should now generate something like the following
(2,2,4)
(2,3,6)
(3,2,6)
(3,3,9)
(2,5,10)
(5,2,10)
(2,7,14)
(7,2,14)
(3,5,15)
(5,3,15)
...
I said "something like" because there is room for different output
here. For example the third and second productions could have
been done in swapped order. In any case, we get the product of primes
in order.
If we wanted to, from that, get all the product of two distinct
primes, in order, we'd write a filter to discard adjacent
repetitions (based on the third entry) and only pass on the tuples
whose first two entries are different.
"""
# The idea is that we will read enough from both sources to produce
# a batch of all entries whose products are less than or equal to
# track_product.
#
# We work in ranges. The idea is to, at each time, produce every
# entry whose product is > last_track_product and also
# <= track_product.
track_product = None
last_track_product = None
last_track_set = False
# The following two indices allow for some prunning while
# generating entries. They are for efficiencies reasons only. The
# idea is that all the pairs that are in the i,j position with
# 0 <= i < srcs_prefix_produced[0] and
# 0 <= j < srcs_prefix_produced[1]
# were already produced.
srcs_prefix_produced = [0,0]
it0: Iterator = iter0.__iter__()
it1: Iterator = iter1.__iter__()
SRC0 = 0
SRC1 = 1
fetched: List[List] = [[], []]
finite_sources: List = [False, False]
def iprod(src_x: int, i_x: int,
src_y: int, i_y: int):
if src_x == SRC0:
assert src_x == SRC0 and src_y == SRC1
return prod(fetched[SRC0][i_x], fetched[SRC1][i_y])
else:
assert src_x == SRC1 and src_y == SRC0
return prod(fetched[SRC0][i_y], fetched[SRC1][i_x])
def fetch(src: int):
assert src == SRC0 or src == SRC1
try:
v = next(it0) if src == SRC0 else next(it1)
fetched[src].append(v)
return v
except StopIteration as e:
finite_sources[src] = True
raise e
def update_tracks(new_track_product):
nonlocal last_track_product
nonlocal last_track_set
nonlocal track_product
last_track_set = True
last_track_product = track_product
track_product = new_track_product
def gt_last_track(p):
# We assume `p` is greater than the last track if the last track
# product wasn't set yet. Otherwise, we compare.
return (not last_track_set) or not le(p, last_track_product)
def le_track(p):
return le(p, track_product)
def default_product_criteria(p):
if not le_track(p):
return 1
elif not gt_last_track(p):
return -1
else:
return 0
def produce_batch_given_product_criteria(
product_criteria: Callable
) -> List[Tuple]:
# The indices selection is through a binary search process.
#
# We first find a range of possible values from source 0 using
# binary search and a kind of "interval comparison". This is
# done by admissable_i_values and produce_i_region.
#
# Then, for each value found in this first-stage search, we
# binary search in the values found so far from source 1. This
# is done by prod_j_search and produce_j_region.
#
# OBS.: The code to do the "interval comparison"-based binary
# search on source 0 values is still here, but not used. It turns
# out that, by measurement, we concluded that it's faster to just
# go through all the values in fetched[SRC0]. I imagine for
# larger values of len(fetched[SRC0]), this will cease to be true,
# but, for the general case, "n is small". So in the statement
# to produce batch, below, we just have `fetched[SRC0]`
# instead of `admissable_i_values()`.
def produce_j_region(vi, j0) -> Iterable[Tuple]:
vj = fetched[SRC1][j0]
p = prod(vi, vj)
yield (vj, p)
loop_ranges: List[Tuple] = [
(j0-1, -1, -1),
(j0+1, len(fetched[SRC1]), 1)
]
for (begin,end,step) in loop_ranges:
for j in range(begin, end, step):
vj = fetched[SRC1][j]
p = prod(vi, vj)
if product_criteria(p) == 0:
yield (vj, p)
else:
break
def prod_j_search(vi) -> Iterable[Tuple]:
lo = 0
hi = len(fetched[SRC1])-1
while hi >= lo:
mid = lo + (hi-lo)//2
crit: int = product_criteria(prod(vi, fetched[SRC1][mid]))
if crit == 0:
yield from produce_j_region(vi, mid)
break
elif crit == 1:
hi = mid-1
else:
lo = mid+1
def produce_i_region(i0: int) -> Iterable:
vi = fetched[SRC0][i0]
yield vi
for i in range(i0+1, len(fetched[SRC0])):
pi = iprod(SRC0, i, SRC1, 0)
crit: int = product_criteria(pi)
if crit <= 0:
yield fetched[SRC0][i]
else:
break
for i in range(i0-1, -1, -1):
pi = iprod(SRC0, i, SRC1, -1)
crit: int = product_criteria(pi)
if crit >= 0:
yield fetched[SRC0][i]
else:
break
def admissable_i_values() -> Iterable:
lo = 0
hi = len(fetched[SRC0])-1
while hi >= lo:
mid = lo + (hi-lo)//2
pi_min = iprod(SRC0, mid, SRC1, 0)
pi_max = iprod(SRC0, mid, SRC1, -1)
crit_min: int = product_criteria(pi_min)
crit_max: int = product_criteria(pi_max)
if crit_max == -1:
# For i <= mid, any product involving i will be
# less than or equal to pi_max. If this pi_max is too small,
# there is no point producing for i <= mid.
lo = mid+1
elif crit_min == 1:
# For i >= mid, any product involving i will be greather
# than or equal to pi_min. If this pi_min is already too
# large, there is no point in producing for i >= mid.
hi = mid-1
else:
yield from produce_i_region(mid)
break
batch: List[Tuple] = [
(vi, vj, p)
for vi in fetched[SRC0] # admissable_i_values()
for (vj, p) in prod_j_search(vi)
]
quicksort(batch, lambda x, y: le(x[2], y[2]))
return batch
def produce_assuming_both_infinite() -> List[Tuple]:
while le(iprod(SRC0, -1, SRC1, 0), track_product):
fetch(SRC0)
while le(iprod(SRC0, 0, SRC1, -1), track_product):
fetch(SRC1)
new_track_product = min(
iprod(SRC0, -1, SRC1, 0),
iprod(SRC0, 0, SRC1, -1)
)
batch = produce_batch_given_product_criteria(
default_product_criteria)
update_tracks(new_track_product)
return batch
def produce_assuming_only_one_infinite() -> List[Tuple]:
finite_src = SRC0 if finite_sources[SRC0] else SRC1
other_src = 1 - finite_src
while le(iprod(other_src, -1, finite_src, 0), track_product):
fetch(other_src)
new_track_product = iprod(other_src, -1, finite_src, 0)
batch = produce_batch_given_product_criteria(
default_product_criteria)
update_tracks(new_track_product)
return batch
def assume_both_finite_and_finish() -> Iterable[Tuple]:
yield from produce_batch_given_product_criteria(
lambda p: 0 if gt_last_track(p) else -1
)
try:
v1 = fetch(SRC0)
v2 = fetch(SRC1)
track_product = prod(v1, v2)
except StopIteration:
raise ValueError("Empty source given: iter" +
str(SRC0 if finite_sources[SRC0] else SRC1) + ".")
try:
while True:
for entry in produce_assuming_both_infinite():
yield entry
except StopIteration:
try:
while True:
for entry in produce_assuming_only_one_infinite():
yield entry
except StopIteration:
yield from assume_both_finite_and_finish()
I won't write in here (in markdown) what it does. I did it in the docstring for incremental_product_sort
. The idea is that you give it to sequences (these could be iterators for infinite sequences) whose values are in order, a product operation and a $\leq$ comparator for the order relation on the range of this product operation (the assumptions on these things are specified, somewhat, in the docstring). This procedure will generate all the triples (a,b,prod(a,b))
in order according to the given le
on prod(a,b)
. So, for example, the first iterator could be for all primes, the second for all positive integers, the product prod = lambda a,b: a**b
and le = lambda x,y: x <= y
. This would produce powers of primes in order. The interesting thing is that this can be retrofitted into itself. With sequence of prime powers in order, I can produce products of two prime powers, in order, and then products of four prime powers, in order. Some details need to be taken care of such as removing duplicates and such. The following code does it.
def all_true(it: Iterable[bool]) -> bool:
for b in it:
if not b:
return False
return True
def multiplex_iterator(
it: Iterable,
ways: int,
cutoff_threshold: int = 100
) -> List[Iterable]:
assert ways >= 1
store: List = []
itor: Iterator = iter(it)
offset: int = 0
indices: List[int] = [0]*ways
def possibly_shrink():
nonlocal offset
if all_true(
indices[i] >= offset + cutoff_threshold
for i in range(ways)
):
offset += cutoff_threshold
del store[0:cutoff_threshold]
def an_iterable(me: int) -> Iterable:
i = 0
while True:
real_i: int = i - offset
if real_i == len(store):
store.append(next(itor))
yield store[real_i]
i += 1
if i % cutoff_threshold == 0:
indices[me] = i
possibly_shrink()
return [an_iterable(iter_id) for iter_id in range(ways)]
def ignore_sequential_repeats(
it: Iterable,
eqv: Callable
) -> Iterable:
itor: Iterator = iter(it)
done = next(itor)
yield done
for v in itor:
if not eqv(done, v):
done = v
yield v
def the_ints_from(a: int) -> Iterable[int]:
while True:
yield a
a += 1
def the_prime_numbers(
count: Union[int, None] = None
) -> Iterable[int]:
"""
By default, we generate all the primes. The second parameter,
`count` can be None or an integer. If it is None (the default
value), then we generate all the primes. If it's a given
non-negative integer, we generate `count` amount of primes.
"""
if (count is not None) and (count < 0):
raise ValueError("Given negative `count` " +
"for the_prime_numbers: " + str(count) + ".")
last_n = 1
def do_next_prime() -> int:
nonlocal last_n
last_n += 1
while not is_prime(last_n):
last_n += 1
return last_n
if count is None:
while True:
yield do_next_prime()
else:
for i in range(count):
yield do_next_prime()
All this code above has to do with a python module I've been writing for solving these project euler problems. I'll make a github thing for them. I'm also using these things as excuses to learn more about python programming. For example, I'm sure the my current use of iterators/iterables isn't quite correct.
import timeit
start_time = timeit.default_timer()
from itertools import filterfalse
def last_entry_eq(a: List, b: List) -> bool:
return a[-1] == b[-1]
def numerical_le(a: int, b: int) -> bool:
return a <= b
prime_powers = ignore_sequential_repeats(
incremental_product_sort(
the_prime_numbers(),
the_ints_from(1),
lambda p,n: p**n,
numerical_le
),
last_entry_eq
)
two_way_prime_powers = multiplex_iterator(prime_powers, 2)
prods_of_2_distinct_prime_powers = ignore_sequential_repeats(
filterfalse(
lambda p: p[0][0] == p[1][0],
incremental_product_sort(
two_way_prime_powers[0],
two_way_prime_powers[1],
lambda p,q: p[-1]*q[-1],
numerical_le
)
),
last_entry_eq
)
two_way_prod_2prime_powers = multiplex_iterator(
prods_of_2_distinct_prime_powers, 2)
prods_of_4_distinct_prime_powers = ignore_sequential_repeats(
filterfalse(
lambda p: p[0][0][0] == p[1][0][0] or
p[0][0][0] == p[1][1][0] or
p[0][1][0] == p[1][0][0] or
p[0][1][0] == p[1][1][0],
incremental_product_sort(
two_way_prod_2prime_powers[0],
two_way_prod_2prime_powers[1],
lambda p,q: p[-1]*q[-1],
numerical_le
)
),
last_entry_eq
)
p1 = next(prods_of_4_distinct_prime_powers)
p2 = next(prods_of_4_distinct_prime_powers)
p3 = next(prods_of_4_distinct_prime_powers)
p4 = next(prods_of_4_distinct_prime_powers)
SHOW_FIRST_N = 100
print("The first", SHOW_FIRST_N, "products of four prime powers whose basis are pair-wise distinct.")
print(p1)
print(p2)
print(p3)
print(p4)
i = 4
for p in prods_of_4_distinct_prime_powers:
if i < SHOW_FIRST_N:
print(p)
i += 1
if i == SHOW_FIRST_N:
print("Done.")
print("The actual answer to the problem comes next.")
if p2[-1] - p1[-1] == 1 and p3[-1] - p2[-1] == 1 and p4[-1] - p3[-1] == 1:
print(p1)
print(p2)
print(p3)
print(p4)
break
else:
p1,p2,p3,p4 = p2,p3,p4,p
elapsed = timeit.default_timer() - start_time
print("Elapsed " + str(int(1000*1000*elapsed)/1000) + "ms.")
By the way, pypy3
runs this in $33.38$ seconds (pretty interesting, having in mind the 6.48 minutes the above execution took).
In retrospective, it makes sense that this takes longer. We're doing way more than finding four numbers in a row that have exactly four prime factors (which was what the previous code was trying to do).