python: having trouble returning a pandas data frame from a user defined function (probably user error)
I have a function that creates a DataFrame. Within the function I can print it. But I am doing something wrong in the return process, because I can't seem to access the DataFrame after running the function. Below is my dummy code and the resulting error.

    import pandas as pd

    def testfunction(new_df_to_output):
        new_df_to_output = pd.DataFrame()
        S1 = pd.Series([33,66], index=['a', 'b'])
        S2 = pd.Series([22,44], index=['a', 'b'])
        S3 = pd.Series([11,55], index=['a', 'b'])
        new_df_to_output = new_df_to_output.append([S1, S2, S3], ignore_index=True)
        print new_df_to_output
        print type(new_df_to_output)
        print dir()
        return new_df_to_output

    testfunction('Desired_DF_name')
    print dir()
    print Desired_DF_name

The DataFrame prints properly within the function. The dir() output shows that the DataFrame is not available after the function returns. Trying to print that DataFrame produces the following error:

    Traceback (most recent call last):
      File "functiontest.py", line 21, in <module>
        print Desired_DF_name
    NameError: name 'Desired_DF_name' is not defined

I am sure it is a simple mistake but I can't find the solution after searching Stack Overflow and Python tutorials. Any guidance is greatly appreciated.
Inside testfunction, the variable new_df_to_output is essentially a label that you are assigning to the passed-in object. testfunction('Desired_DF_name') doesn't do what you think: it binds the string 'Desired_DF_name' to the variable new_df_to_output; it does not create a new variable named Desired_DF_name. Basically it's the same as writing new_df_to_output = 'Desired_DF_name'.

You want to save the DataFrame that is returned from the function into a variable. So instead of

    testfunction('Desired_DF_name')

you want

    def testfunction():
        ...

    Desired_DF_name = testfunction()

(You can change the definition of testfunction to remove the new_df_to_output parameter. The function wasn't doing anything with it anyway because you immediately reassign the variable: new_df_to_output = pd.DataFrame().)
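To make the name-binding point concrete, here is a minimal sketch in plain Python 3, with no pandas needed. The function build_rows and the name desired_rows are made up for illustration and are not from the question; a list of lists stands in for the DataFrame.

```python
def build_rows(label):
    # `label` is only a local name for whatever the caller passed in.
    # Rebinding it here has no effect outside the function.
    label = [[33, 66], [22, 44], [11, 55]]
    return label

# Call without capturing the result: the returned list is discarded and
# no variable called desired_rows comes into existence.
build_rows('desired_rows')

try:
    desired_rows
    name_exists_before = True
except NameError:
    name_exists_before = False

print(name_exists_before)

# Capture the return value: this assignment is what creates the name.
desired_rows = build_rows('this argument is ignored')
print(desired_rows)
```

The string argument never influences which names exist in the caller; only the assignment on the calling side does.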
I think you really want something like this:

    import pandas as pd

    def testfunction():
        result = pd.DataFrame()
        S1 = pd.Series([33,66], index=['a', 'b'])
        S2 = pd.Series([22,44], index=['a', 'b'])
        S3 = pd.Series([11,55], index=['a', 'b'])
        # append returns a new DataFrame instead of modifying result in place,
        # so the return value has to be assigned back.
        # (DataFrame.append was removed in pandas 2.0; pd.concat replaces it.)
        result = result.append([S1, S2, S3], ignore_index=True)
        return result

    Desired_DF_name = testfunction()

You should carefully read Defining Functions and More on Defining Functions in the documentation.
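On current pandas releases, where DataFrame.append no longer exists, the same idea can be sketched by passing the Series straight to the DataFrame constructor. This is one possible rewrite for Python 3, not the answerer's code, and build_frame is a hypothetical name.

```python
import pandas as pd

def build_frame():
    # Each Series becomes one row; the shared index labels 'a' and 'b'
    # become the column names.
    s1 = pd.Series([33, 66], index=['a', 'b'])
    s2 = pd.Series([22, 44], index=['a', 'b'])
    s3 = pd.Series([11, 55], index=['a', 'b'])
    frame = pd.DataFrame([s1, s2, s3])
    # Renumber the rows 0..n-1, mirroring ignore_index=True in the
    # append-based version.
    return frame.reset_index(drop=True)

# The caller chooses the variable name by assigning the return value.
desired_df = build_frame()
print(desired_df)
```

Building the frame in a single constructor call also avoids growing a DataFrame row by row.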
    print Desired_DF_name

This line raises the NameError because nothing in your code snippet ever binds a DataFrame (or any other object) to the name Desired_DF_name, so there is nothing for print to look up.