# Letter frequencies: plot a histogram ordered by value (Python)

by alienflow   Last Updated September 24, 2018 07:26 AM

I am trying to analyse the frequency of the letters in a text. As an example I will use a short sentence here, but the goal is to analyse huge texts, so the solution should be efficient.

Well, I have the following text:

``````
test = "quatre jutges dun jutjat mengen fetge dun penjat"
``````

Then I created a function which counts the relative frequencies:

``````
def create_dictionary2(txt):
    # Map each distinct character to its relative frequency in txt
    dictionary = {}
    for x in set(txt):
        dictionary[x] = txt.count(x) / len(txt)
    return dictionary
``````

And then I plot the result:

``````
import numpy as np
import matplotlib.pyplot as plt
test_dict = create_dictionary2(test)
plt.bar(test_dict.keys(), test_dict.values(), width=0.5, color='g')
``````

I obtain a bar chart with the following issues. First, I want to see all the letters, but some of them are not shown, even though the output says `Container object of 15 artists`. How can I expand the histogram? Second, I would like to sort the histogram so that the bars appear in order of frequency.


For counting, we can use a `Counter` object from `collections`. Its `most_common` method returns the key-count pairs ordered from most to least frequent:

``````
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt

c = Counter("quatre jutges dun jutjat mengen fetge dun penjat")
plt.bar(*zip(*c.most_common()), width=.5, color='g')
``````

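The `most_common` method returns a list of key-value tuples, sorted by descending count. The `*zip(*..)` idiom transposes that list into one sequence of keys and one sequence of counts, and unpacks them as the two positional arguments of `plt.bar`.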
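As a small illustration of the unpacking step, the sketch below runs the same idiom on a short sample string without plotting anything. The string and the variable names `pairs`, `letters` and `counts` are only chosen for this example.

```python
from collections import Counter

# Short sample string chosen only for illustration (not from the question)
c = Counter("abracadabra")

# most_common() returns (character, count) pairs, highest count first
pairs = c.most_common()

# zip(*pairs) transposes the pairs into one tuple of characters
# and one tuple of counts
letters, counts = zip(*pairs)

print(letters)
print(counts)
```

With these two tuples, `plt.bar(*zip(*pairs))` amounts to calling `plt.bar(letters, counts)`.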

Note: I haven't updated the width or color to match your result plots.
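Two further differences from the question: `Counter` gives absolute counts rather than relative frequencies, and it also counts the space character. A minimal sketch that handles both, assuming only alphabetic characters should be kept, could look like this. The helper name `letter_frequencies` is hypothetical and not part of any library.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most frequent first."""
    # Count only alphabetic characters in a single pass over the text
    counts = Counter(ch for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(letter, n / total) for letter, n in counts.most_common()]


test = "quatre jutges dun jutjat mengen fetge dun penjat"
freqs = letter_frequencies(test)

# Split into bar labels and bar heights, ready for plt.bar(labels, values)
labels, values = zip(*freqs)
```

Because the text is traversed once instead of once per distinct character, this should also scale better to large inputs than calling `txt.count` in a loop.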

ikkuh
September 24, 2018 07:09 AM

Another solution using pandas:

``````
import pandas as pd
import matplotlib.pyplot as plt

test = "quatre jutges dun jutjat mengen fetge dun penjat"

# convert input to list of chars so it is easy to get into pandas
char_list = list(test)

# create a dataframe where each char is one row
df = pd.DataFrame({'chars': char_list})
# drop all the space characters
df = df[df.chars != ' ']
# add a column for aggregation later
df['num'] = 1
# group rows by character, count the occurrences in each group
# and sort by number of occurrences
df = df.groupby('chars').sum().sort_values('num', ascending=False)

plt.bar(df.index, df.num, width=0.5, color='g')
plt.show()
``````

Result: a bar chart of the letters, sorted from most to least frequent.
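A shorter variant of the same idea, sketched here as an alternative rather than as the method used above, relies on `Series.value_counts`, which sorts by descending count by default and can return relative frequencies via `normalize=True`. The variable names `chars` and `freqs` are only illustrative.

```python
import pandas as pd

test = "quatre jutges dun jutjat mengen fetge dun penjat"

# One character per row, with the spaces removed beforehand
chars = pd.Series(list(test.replace(" ", "")))

# Relative frequency of each character, most frequent first
freqs = chars.value_counts(normalize=True)

print(freqs.head())
```

The result can be plotted in the same way as before, for example with `plt.bar(freqs.index, freqs.values, width=0.5, color='g')`.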

sobek
September 24, 2018 07:20 AM